Mathematical Statistics and Data Analysis Math Problem Example | Topics and Well Written Essays

Question 1

A. Mean and standard deviation: Because the sample was randomly selected, the sample mean is an unbiased estimator of the population mean. The SPSS output for the n = 258 exam results is:

Total
  Mean              57.1783
  Std. Deviation    17.97748
  Sum               14752.00

The point estimate of the population mean is therefore 57.1783, and the sample standard deviation is 17.97748.

B. Confidence interval: According to Stuart (1998), the confidence interval for the population mean is

$$P\left(\bar{X} - t\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t\,\frac{S}{\sqrt{n}}\right) = 90\%$$

where $\bar{X}$ is the sample mean, $S$ the sample standard deviation, $n$ the sample size, $\mu$ the population mean and $t$ the critical value of the t distribution with $n - 1 = 257$ degrees of freedom that leaves 5% in each tail ($t \approx 1.651$). The standard error is 17.97748 / sqrt(258) = 1.1192, so the margin of error is 1.651 x 1.1192 = 1.848 and

$$P(55.33 \le \mu \le 59.03) = 90\%$$

We are 90% confident that the population mean of the exam results lies between 55.33 and 59.03.

C. Justification of the formula: The formula combines the sample mean, the standard error of the mean and the t critical value. The standard deviation measures the dispersion of individual results about the mean; dividing it by the square root of n gives the standard error, which measures how far a sample mean is likely to fall from the population mean. The t distribution is appropriate because the population standard deviation is unknown and is estimated from a random sample. The 90% level means that intervals constructed in this way capture the population mean in 90% of repeated samples; for this sample the interval runs from 55.33 to 59.03.

D. Sample means for Australian and non-Australian residents: The SPSS output for exam results by country is:

Report: Exam
Country   Mean      N     Std. Deviation
1         60.0254   118   15.80461
6         56.2174   92    19.80167
7         52.4524   42    19.18231
8         49.0000   6     11.71324
Total     57.1783   258   17.97748

The mean for country 1 (60.0254) is higher than the mean for any other country.

E. Standard deviation: The standard deviations appear in the last column of the table above. They range from 11.71324 for country 8, which has only 6 students, to 19.80167 for country 6.

F. Difference in means: We test whether mean exam results for Australian residents are higher than for non-Australian residents, assuming that country 1 represents Australia.

Null hypothesis: H0: a = b
Alternative hypothesis: Ha: a > b

where a is the mean exam result for Australia and b is the mean result for the other country. SPSS reports two-tailed significance values, so for this one-sided alternative the p-value is half the reported value whenever the observed difference is in the hypothesised direction.

Country 1 and 6:

Group Statistics (Exam)
Country   N     Mean      Std. Deviation   Std. Error Mean
1         118   60.0254   15.80461         1.45493
6         92    56.2174   19.80167         2.06447

The Independent Samples Test table reported for this pair is identical to the table for countries 1 and 7: its 158 degrees of freedom (118 + 42 - 2) and its mean difference of 7.57304 (60.0254 - 52.4524) belong to that comparison. The observed difference between countries 1 and 6 is 60.0254 - 56.2174 = 3.8080. Using the standard errors in the group statistics, the unpooled standard error of this difference is sqrt(1.45493^2 + 2.06447^2) = 2.526, so t is approximately 3.8080 / 2.526 = 1.51, which corresponds to a one-tailed p-value of roughly 0.07.

Country 1 and 7:

Group Statistics (Exam)
Country   N     Mean      Std. Deviation   Std. Error Mean
1         118   60.0254   15.80461         1.45493
7         42    52.4524   19.18231         2.95989

Independent Samples Test (Exam)
Levene's Test for Equality of Variances: F = 4.642, Sig. = .033

                              t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
Equal variances assumed       2.517   158      .013              7.57304      3.00901            1.62998        13.51611
Equal variances not assumed   2.296   61.939   .025              7.57304      3.29815            .98000         14.16608

Country 1 and 8:

Group Statistics (Exam)
Country   N     Mean      Std. Deviation   Std. Error Mean
1         118   60.0254   15.80461         1.45493
8         6     49.0000   11.71324         4.78191

Independent Samples Test (Exam)
Levene's Test for Equality of Variances: F = .765, Sig. = .383

                              t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
Equal variances assumed       1.683   122      .095              11.02542     6.55283            -1.94657       23.99741
Equal variances not assumed   2.206   5.966    .070              11.02542     4.99835            -1.22182       23.27266

For country 7, Levene's test rejects equal variances (Sig. = .033), so the unequal-variances row applies: the one-tailed p-value is .025 / 2 = .0125 and H0 is rejected at the 5% level. For country 8, equal variances can be assumed (Sig. = .383): the one-tailed p-value is .095 / 2 = .0475, so H0 is rejected at the 5% level, although only narrowly and on the basis of 6 students. For country 6 the one-tailed p-value of roughly 0.07 does not reject H0 at the 5% level. The performance of Australian residents is therefore significantly higher than that of students from countries 7 and 8, while the higher Australian mean relative to country 6 is not statistically significant.

Question 2

Faculty codes: 1 = Business, 2 = Sciences, 3 = Engineering and Surveying.

A. Male students broken down by faculty and country (counts):

Country   Faculty 1   Faculty 2   Faculty 3
1         28          12          19
6         40          0           8
7         4           8           13
8         0           1           1

B. Female students broken down by faculty and country (counts):

Country   Faculty 1   Faculty 2   Faculty 3
1         29          26          4
6         44          0           0
7         6           11          0
8         4           0           0

C. Proportions:

Male students in the group: total in group = 258, total male = 134, proportion of males = 134/258 = 51.938%.

Male students from Australia enrolled in the business faculty: males from Australia = 59, of whom 28 are enrolled in business, proportion = 28/59 = 47.4576%.

Female students in the science faculty: females = 124, females in the science faculty = 37, proportion = 37/124 = 29.84%.

D. Confidence interval: We assume that the sample was randomly selected and therefore represents the entire population. Overseas students are those from countries 6, 7 and 8. The counts are summarised below:

Country   Male   Female   Total   Proportion of sample
1         59     59       118     0.457364
6         48     44       92      0.356589
7         25     17       42      0.162791
8         2      4        6       0.023256
Total     134    124      258     1.000000

Proportion of overseas students = 140/258 = 0.542636.

The 95% confidence interval for a population proportion is

$$P\left(\hat{p} - z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) = 95\%$$

where $\hat{p}$ is the sample proportion of overseas students, $n = 258$ and $z = 1.96$ is the standard normal critical value at the 95% level. The standard error is sqrt(0.542636 x 0.457364 / 258) = 0.0310, so the margin of error is 1.96 x 0.0310 = 0.0608 and

$$P(0.4818 \le p \le 0.6034) = 95\%$$

We are 95% confident that the proportion of overseas students in the population lies between 0.4818 and 0.6034.

Question 3

A. Lurking variable: A lurking variable is a variable that is not included in a study yet influences the relationship being studied. For example, a study that forecasts a firm's sales may not consider other factors affecting consumer expenditure; an omitted factor of this kind is a lurking variable.

B. Central limit theorem and law of large numbers: Stuart (1998) states that under the central limit theorem the distribution of the sum or mean of independent random variables with finite variance approaches a normal distribution as the number of variables increases. The law of large numbers concerns the value of the sample mean: as the sample size drawn from a population increases, the sample mean becomes stable and converges to the expected value.
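A small simulation can make the two statements concrete. The sketch below is an illustration only and is not part of the original analysis: it assumes an exponential population with mean 1 (chosen because it is clearly non-normal), and the function names, seed and sample sizes are arbitrary choices.

```python
import math
import random


def sample_mean_error(n_draws, seed=0):
    """Law of large numbers: absolute gap between the mean of n_draws
    exponential(1) observations and the true mean of 1."""
    rng = random.Random(seed)
    total = sum(rng.expovariate(1.0) for _ in range(n_draws))
    return abs(total / n_draws - 1.0)


def share_within_normal_limits(sample_size, n_samples, seed=0):
    """Central limit theorem: share of standardised sample means that fall
    inside +/-1.96, which is about 0.95 if the means are roughly normal."""
    rng = random.Random(seed)
    std_error = 1.0 / math.sqrt(sample_size)  # an exponential(1) variable has sd 1
    inside = 0
    for _ in range(n_samples):
        draws = [rng.expovariate(1.0) for _ in range(sample_size)]
        z = (sum(draws) / sample_size - 1.0) / std_error
        if abs(z) < 1.96:
            inside += 1
    return inside / n_samples


if __name__ == "__main__":
    # The gap to the true mean should tend to shrink as the sample grows.
    for n in (10, 1000, 100000):
        print(n, round(sample_mean_error(n), 4))
    # The share should be close to 0.95 even though the population is skewed.
    print(share_within_normal_limits(sample_size=50, n_samples=2000))
```

The first function looks only at how close one sample mean gets to the expected value, while the second looks at the shape of the distribution of many sample means.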
The two results therefore differ: the central limit theorem describes the shape of the distribution of the sample mean, which becomes approximately normal as the sample size grows, whereas the law of large numbers describes the stability of the sample mean itself, which approaches the expected value as the sample size grows.

C. Sample mean and population mean in random samples: The sample mean generally differs from the population mean because each random sample contains a different set of observations. With a small sample we expect the sample mean to deviate more from the population mean; with a large sample the sample mean is close to the population mean. The typical size of this deviation is the standard error, which is the standard deviation divided by the square root of the sample size, so it is larger when the data are more dispersed and smaller when the sample is larger.

D. Sample mean and population mean in voluntary response samples: In a voluntary response sample the respondents select themselves, and people with strong opinions or a particular interest in the topic are more likely to respond. The sample mean is therefore biased, and unlike the random deviation in part C this difference does not shrink as the sample size increases.

E. Parameter and statistic: A statistic is a quantity calculated from sample data, for example the sample mean. A parameter is a quantity that describes a characteristic of the population or model, and it is usually unknown. For example, when we estimate Y = a + bx, a and b are parameters because they define the relationship, and the values estimated for them from the sample are statistics.

F. Central limit theorem in statistics: The central limit theorem is important because it gives the approximate distribution of the sample mean for large samples whatever the shape of the population distribution. This justifies using normal-based confidence intervals and hypothesis tests for means even when the data themselves are not normally distributed.

G. Sampling distribution of the mean: The sampling distribution of the mean is the distribution of the sample means of all possible samples of the same size taken from the same population. Its mean equals the population mean and its standard deviation is the standard error, so the likely variability of a sample mean can be described before sampling.

H. Telephone survey problems: Telephone interviews are less expensive and less time-consuming than face-to-face interviews, but they raise validity problems. The first is sampling: the sample may not be a representative probability sample, because some people do not have phones in their homes and many have numbers that are not listed in directories. The second is the quality of the information obtained: respondents may refuse to answer or may answer untruthfully, and this is harder to detect over the telephone than in a face-to-face interview, where the interviewer can observe the respondent.

Question 4

The union official claims that 40% of trucks carry a heavy load. The main roads department found that 17 out of 60 trucks carried a heavy load, a sample proportion of 17/60 = 28.33%.

Hypotheses:

Null hypothesis: H0: a = b
Alternative hypothesis: Ha: a ≠ b

where a is the union official's probability and b is the probability from the main roads department; we test whether the two probabilities are equal.

According to Durrett (1996), the binomial probability function is

$$\Pr(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$$

where n is the number of trials, k is the number of successes and p is the probability of success. Substituting n = 60, k = 17 and p = 0.2833:

$$\Pr(X = 17) = \binom{60}{17} (0.2833)^{17} (1 - 0.2833)^{60 - 17} = 0.113674$$

If the probability of a heavy load is 0.2833, the probability of observing exactly 17 heavy-load trucks in 60 is 0.113674.
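The substitution above can be checked numerically. The sketch below uses only the Python standard library; the function and variable names are illustrative. It also evaluates, as an assumption that goes beyond the original working, the probability of seeing 17 or fewer heavy loads if the union official's 40% figure were true.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n_trucks, n_heavy = 60, 17

# Probability of exactly 17 heavy loads when p is the sample proportion 17/60.
pmf_at_sample = binomial_pmf(n_heavy, n_trucks, n_heavy / n_trucks)

# Assumption beyond the original working: probability of 17 or fewer heavy
# loads if the union official's 40% figure were true.
lower_tail_at_claim = binomial_cdf(n_heavy, n_trucks, 0.40)

print(f"P(X = 17 | p = 17/60) = {pmf_at_sample:.6f}")
print(f"P(X <= 17 | p = 0.40)  = {lower_tail_at_claim:.4f}")
```

The first figure should agree with the value obtained above, up to the rounding of p to 0.2833. The second is the kind of tail probability a test of the 40% claim relies on, since it measures how unusual the observed count would be under that claim.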
To test the union official's claim, the sample proportion is compared with the claimed value of 0.4. Under H0 the standard error of the sample proportion is sqrt(0.4 x 0.6 / 60) = 0.0632, so z = (0.2833 - 0.4) / 0.0632 = -1.84, which gives a two-sided p-value of about 0.065. At the 5% level we therefore do not reject the null hypothesis against the two-sided alternative. The sample proportion is nevertheless below the claimed 40%, and a one-sided test of whether the proportion is lower than 0.4 gives a p-value of about 0.033, which would reject the null hypothesis.

Question 5: Twins

A. Parametric test: We test whether there is a difference in test scores between foster twins and adopted twins.

Null hypothesis: H0: a = b
Alternative hypothesis: Ha: a ≠ b

The SPSS output for the paired-samples t test is:

Paired Samples Test: Pair 1, adopted - foster
  Mean                                  4.875
  Std. Deviation                        8.509
  Std. Error Mean                       3.009
  95% CI of the Difference (Lower)      -2.239
  95% CI of the Difference (Upper)      11.989
  t                                     1.620
  df                                    7
  Sig. (2-tailed)                       .149

The p-value of .149 is greater than .05 and the 95% confidence interval for the mean difference includes zero, so we do not reject the null hypothesis: the test does not show a significant difference between the scores of foster twins and adopted twins.

B. Non-parametric test: We use the Wilcoxon signed ranks test.

Ranks (foster - adopted)
                   N     Mean Rank   Sum of Ranks
Negative Ranks     7(a)  4.21        29.50
Positive Ranks     1(b)  6.50        6.50
Ties               0(c)
Total              8
a. foster < adopted
b. foster > adopted
c. foster = adopted

Test Statistics (Wilcoxon Signed Ranks Test), foster - adopted
  Z                         -1.620 (based on positive ranks)
  Asymp. Sig. (2-tailed)    .105

In 7 of the 8 pairs the adopted twin has the higher WISC score, but the p-value of .105 is greater than .05, so we do not reject the null hypothesis H0: a = b. The non-parametric test leads to the same conclusion as the t test.

C. Difference between the two tests: The parametric test uses the sizes of the paired differences, assumes that they are approximately normally distributed, and compares the resulting t statistic with the t distribution to test whether the two means are equal. The non-parametric test uses only the ranks of the absolute differences together with their signs, comparing the sums of positive and negative ranks, and does not require the normality assumption. The non-parametric output therefore also shows how many pairs differ in each direction, while the parametric test is more powerful when its assumptions hold.

References:

Durrett, R. (1996) Probability: Theory and Examples, London, Sage Publishers
Rice, J. (1995) Mathematical Statistics and Data Analysis, New York, McGraw-Hill Press
Stuart, D. (1998) Statistics: An Introduction, New Jersey, Prentice Hall Publishers
