Quantitative and Statistical Analysis in Statistical Package for the Social Sciences - Assignment Example

ASSESSMENT: 2015-2016

Q1:

(a) Make a statistical assessment of each of these three variables independently. [5 marks]

i) Subjective health status (health)

Subjective health status was measured as a self-reported categorical variable on a five-point Likert scale (1-5), so the results reflect respondents' own perceptions rather than a clinical assessment. The survey received 2417 valid responses from the sample, as shown below:

Subjective general health    Frequency    Valid Percent
Very good                    740          31%
Good                         985          41%
Fair                         509          21%
Bad                          151          6%
Very bad                     33           1%
Total                        2417         100%

Table 1: Frequency table for subjective health status

Only a very small share of respondents (1%) reported being in very bad health, while almost three quarters (71%, or 1725 of 2417) reported being in good or very good health. On the scale of 1-5, 1 being ‘Very good’ and 5 being ‘Very bad’, respondents gave an average rating of 2.07, which corresponds most closely to ‘Good’. The pie chart below shows the proportion of respondents in each response category.

Figure 1: Pie chart on subjective health status

A histogram was constructed to visualize the distribution of responses on self-reported health status. As seen below, the distribution is right skewed, with most responses concentrated at the healthy end of the scale. Self-report measures are prone to validity problems: respondents may, for example, over-report their health status to appear healthier than they are, so these results should be interpreted with caution.

Fig 2: Histogram showing the skewness of reported general health

ii) Age of respondent (agea)

Results from the survey indicate that the youngest respondent was aged 15 years. This minimum age is possibly an inclusion criterion set in the study protocol.
The oldest respondent was aged 98, as shown in the summary statistics below:

Descriptive Statistics
                                 N       Minimum    Maximum    Mean     Std. Deviation
Age of respondent, calculated    2415    15         98         47.34    18.693

Table 2: Summary statistics for respondent age

The above results indicate a mean respondent age of 47 years with a standard deviation of 19. Because age was calculated rather than rated subjectively, it is less prone to reporting bias than the self-reported health status discussed earlier. The age distribution is approximately normal, with a slight right skew, as shown by the normal curve superimposed onto the histogram below:

Fig 3: Histogram showing the age distribution of study participants

Most respondents are in the age bracket of 20-70. Having respondents spread across this range makes it less likely that answers to questions whose responses vary by age, such as health status, are dominated by a single age group.

iii) How happy are you? (happy)

Like health status, happiness was measured by a self-reported question, and its interpretation calls for similar caution. A large proportion of respondents reported being happy. Results are based on a Likert scale ranging from 0-10, 0 being the lowest (‘Extremely unhappy’) and 10 the highest (‘Extremely happy’). Responses from 0 to 4 were grouped as ‘unhappy’. The frequencies for the response categories are shown below:

How happy are you?    Frequency    Percent
0 - 4 unhappy         168          7%
5                     205          9%
6                     157          7%
7                     430          18%
8                     761          32%
9                     416          17%
extremely happy       282          12%
Total                 2419         100%

Table 3: Summary statistics for respondent happiness

On this scale, more than three quarters of the sampled population (78%) self-reported a ‘happiness score’ of 7 or above.
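The share of respondents scoring 7 or above can be recomputed directly from the Table 3 frequencies. The following is a minimal sketch in Python; the names `happy_counts` and `share` are illustrative and do not come from the original SPSS analysis.

```python
# Frequencies copied from Table 3 (happiness, 0-10 scale; scores 0-4 are pooled).
happy_counts = {
    "0-4": 168,
    "5": 205,
    "6": 157,
    "7": 430,
    "8": 761,
    "9": 416,
    "10": 282,
}


def share(counts, categories):
    """Return the proportion of all responses that fall in the given categories."""
    total = sum(counts.values())
    return sum(counts[c] for c in categories) / total


total_responses = sum(happy_counts.values())
share_7_plus = share(happy_counts, ["7", "8", "9", "10"])

print(total_responses)               # 2419, matching the Total row of Table 3
print(round(100 * share_7_plus, 1))  # 78.1
```

Working from the raw counts rather than the rounded percentages avoids accumulating rounding error when categories are combined.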
The distribution of responses is therefore expected to be left skewed, which the histogram below confirms:

Fig 4: Histogram showing the distribution of self-reports on happiness

(b) Consider whether and in what way there is an association between any combination of two of the three variables. [10 marks]

For this question, it was hypothesized that there is an association between health and age. Biological studies generally indicate that, controlling for other variables, health declines with age owing to the wear and tear of body tissues. To investigate whether this hypothesis holds in this study, two variables are used: subjective health status (health) and age of respondent (agea).

Preliminary analysis

The preliminary investigation is a crosstab of mean respondent age for each level of reported health status. The question is whether there is a pattern between the health status respondents report and their age. Mean age of respondents by health status is shown below:

Subjective general health    Mean, age of respondent, calculated    Frequency
Very good                    41.85                                  739
Good                         47.06                                  980
Fair                         52.29                                  509
Bad                          57.1                                   150
Very bad                     57.92                                  33
Total                        47.34                                  2410

Table 4: Mean respondent age differentiated by health status

Table 4 above indicates a possible correlation between age and health status. Better health outcomes are reported by respondents with a lower mean age; in other words, as reported health outcomes worsen, the mean age increases correspondingly. This relation is visualized in the graph below:

Fig 5: A graph showing a possible correlation between age and reported health outcomes

From the above figure, it is observed that as age increases, health outcomes steadily become worse. At a mean age of around 57 years the curve steepens sharply: mean age changes only slightly (from 57.1 to 57.92) while reported health outcomes move from ‘bad’ to ‘very bad’.
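The pattern in Table 4 can be checked with a few lines of Python. This is only a sketch using the figures printed in the table; the helper names `pooled_mean` and `is_strictly_increasing` are hypothetical and are not part of the original analysis.

```python
# (mean age, number of respondents) per health category, copied from Table 4.
table4 = {
    "Very good": (41.85, 739),
    "Good": (47.06, 980),
    "Fair": (52.29, 509),
    "Bad": (57.10, 150),
    "Very bad": (57.92, 33),
}


def pooled_mean(groups):
    """Weight each group mean by its group size to recover the overall mean."""
    total_n = sum(n for _, n in groups.values())
    weighted_sum = sum(mean * n for mean, n in groups.values())
    return weighted_sum / total_n


def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


group_means = [mean for mean, _ in table4.values()]
overall_mean = pooled_mean(table4)

print(round(overall_mean, 2))               # 47.34, as in the Total row
print(is_strictly_increasing(group_means))  # True: mean age rises as health worsens
```

The pooled mean reproduces the 47.34 reported in the Total row. The five group frequencies as printed add up to 2411, one more than the stated total of 2410; the source does not explain the difference, and it does not change the rounded mean.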
Confirmatory analysis / tests of association

Although a possible correlation between age and health outcome has been hypothesized, it can only be confirmed with an appropriate hypothesis test. For this test, we create two categorical variables and subject them to a Chi-square test of association. Since health is already a categorical variable, no transformation is necessary. Age of respondent is recoded into categories and stored in a new variable, age_cat (label: Age category), as shown below:

Age category    Value    Frequency
15-25           1        387
26-35           2        341
36-45           3        392
46-55           4        450
56-65           5        389
66-75           6        288
76-85           7        133
86-98           8        35
Total                    2415

With the age categories created, a Chi-square test of association can be used to test the hypothesized association between age and reported health. The test was run in SPSS and the output is shown below:

Chi-Square Tests
                      Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square    174.127    28    4.8E-23
N of Valid Cases      2410

The Chi-square test on the crosstab gives a p-value of approximately 0. We therefore reject the null hypothesis of no association and conclude that an association exists between age and subjective general health.

Assumptions

It is assumed that the observations are independent of each other, i.e. observations made on any variable are not correlated in any way.

(c) Draw a plausible causal diagram for the relationship between the three variables, justifying why you have drawn it in the way you do. [3 marks]

Age is an independent variable, i.e. its values do not change in response to changes in the other variables. It is therefore free of influence from either of the other two variables of interest. Health is partly determined by age, and happiness in turn is caused partly by good health. Hence, happiness is a direct result of general health and an indirect result of age.

(d) Analyse the causal diagram and explain what conclusions you can come to about the relationship between the three variables.
[13 marks]

From the earlier statistical analyses, age and general health are associated, and the model treats this relationship as causal, i.e. an increase in age causes a deterioration in health. Age is the antecedent variable, i.e. age is causally antecedent to general happiness. It is also known that a positive health outcome is linked to happiness. This link is based on existing theory and knowledge: happiness and health have long been linked together, giving rise to the popular phrase ‘laughter is the best medicine’.

To sum up, age has a direct effect on health, and health has a direct effect on happiness. Age therefore has an indirect effect on happiness and is the antecedent in this chain. If we intend to investigate the effect of age on happiness, it is prudent first to control for general health, which is the intervening variable in the model. This is important because the model does not imply a one-to-one transmission of effects from one stage to the next, i.e. external influences may also lead to improved health outcomes and hence happiness. Controlling for general health can be achieved through techniques such as matching. Any effect of age on happiness that remains after controlling for health would be a direct one.

(e) What are the possible limitations of your model? [2 marks]

The model does not account for latent external factors that are not included in it. These are hard to detect and measure using regular techniques. Methods such as factor analysis may help to identify them.

QUESTION 2

(a) How many minutes per day, on average, do women engage in paid work? [2 marks]

132 minutes

(b) What is the standard deviation of the sampling distribution for the average number of minutes women engage in paid work?
[2 marks]

SE = SD/√(sample size), hence SD = SE × √(sample size)

Standard deviation (SD) is a measure of dispersion, or spread, in the data, while standard error (SE) is the standard deviation of the sampling distribution of the mean and measures how close the sample mean is likely to be to the population mean. As sample size increases, SE tends to zero as the sample estimate gets closer to the population value. The sample SD, on the other hand, gets closer to the population SD. The standard deviation of the sampling distribution asked for here is therefore the SE itself. Hence the answer is 4.12 minutes.

(c) State whether the sampling distribution for the average number of minutes men engage in paid work is more spread out or less spread out than the sampling distribution for the average number of minutes women engage in paid work [2 marks]. Briefly explain how you know this [2 marks].

The sampling distribution for the average number of minutes men engage in paid work is more spread out. The SD is a measure of spread, and for a sampling distribution the relevant SD is the standard error. A higher value implies a more spread out distribution, and vice versa. Hence, since the value for men (taken from the reported SE) is higher than that for women, the men's sampling distribution is more spread out.

(d) Suppose that we double the number of men in our sample. Describe two ways in which this increase in sample size changes the sampling distribution for the average number of minutes men engage in paid work [4 marks].

As mentioned earlier, an increase in sample size reduces the standard error, while the sample mean tends to move closer to the population mean. First, because SE = SD/√(sample size), doubling the number of men reduces the standard error by a factor of 1/√2, to about 71% of its original value, so the sampling distribution becomes narrower. Second, the sampling distribution becomes more tightly concentrated around the population mean, so a sample mean far from it becomes less likely.

(e) a. Give one example of a possible confounding variable and explain why it might be a confounding variable [3 marks].

A possible confounding variable is the age of the respondent. Older women are likely to spend more time in paid work and less time in childcare than younger women.
The latter are more likely to split their time between childcare and paid work, as they have younger children who require their attention. Older women, by contrast, will largely have finished with childcare.

(e) b. Now give one example of a possible intervening variable and explain why it might be an intervening variable [3 marks].

An example of an intervening variable would be number of children. The number of children would affect the number of minutes women spend on childcare and also the time they spend on paid work.

(e) c. Now give one example of a possible instrumental variable and explain why it might be an instrumental variable [3 marks].

An instrumental variable would be the nature of work, as it has a direct effect on how many hours women spend on paid work.

(f) (g)

- One way of controlling for a confounding variable is through stratification. To investigate the relationship between minutes spent on childcare and paid work effectively, the population should be stratified by the confounding variable, i.e. age. For instance, analyses should be run separately for different age groups, and if the relationship is not found to be similar across the groups, the results should be presented separately for each age category.

- A second way to improve the investigation of the relationship would be to check first whether a relationship actually exists between the two variables and the intervening variable, number of children.

QUESTION 3
