Simple Data Analysis and Comparison Term Paper Example | Topics and Well Written Essays

Name:
Course:
Instructor:
University:
Date:

Executive Summary

Data is factual material used as a basis for decision making and other purposes. Data therefore refers to facts, observations or realizations of several underlying variables recorded on various events (Kirch). Data requires thorough analysis before it can be used for decision making. This project analyses three sets of data, namely D1, D2 and D3. The analysis determines the maximum and minimum values of each data set and gives a summary for a quick overview. It also uses descriptive measures such as the mean, the median, the mode, the standard deviation, kurtosis and skewness, which provide more information about the characteristics of the data. For instance, the range gives the difference between the highest and lowest values in a set of data, and the mean is a single value that represents the other scores in the set. These values can also be used to make comparisons between data sets. The project also presents the data as charts and graphs, which give a pictorial view of their characteristics. For instance, box plots, histograms, scatter graphs and time series charts show the skewness and distribution of the data. The findings derived from these methods, and the lessons learnt in the project, are summarised in the conclusion.

Introduction

Data is collected, deliberately or incidentally, in everyday events, and proper analysis of this data is crucial in all sectors. Businesses, the social disciplines, education and other fields require properly analysed data to make reliable decisions. The simple data analysis methods used in this project are of significant use in engineering.
They allow scholars, users and stakeholders to deal with the variability in real data. All decisions based on real data carry risk and uncertainty, and good decisions require that this risk be quantified. These methods are also crucial for understanding the sources or causes of variability; once those sources are understood, they can be removed (Vining and Kowalski). Measures such as the range show the highest and lowest values of a variable, which can help to discover miscoded data (Meier, Brudney and Bohte). The analysis also makes it easier for users to see relationships in the data and to understand the distribution and occurrence of different causes; for example, users can see the difference between the highest and the lowest value in each of the three data sets. However, these methods have limitations. First, they are only useful to those who can interpret what each measure indicates. Second, a method such as the empirical rule may not be effective, because it applies only to unimodal, symmetric data.

Project Work

The analysis uses various tools and methods of data analysis. It determines the minimum and maximum values in each of the three data sets. The range, the difference between the upper and lower bound, measures how far the values are spread out, and it can also reveal data entry errors. The mean takes every value, small or large, into account and can be used for comparisons with other data. The median is the middle value when the data are arranged in ascending or descending order; it is the score that best represents the majority of the other values (Vincent and Weir). The standard deviation measures the variability in a set of data.
It indicates how far above or below the mean the values typically lie, and how many values are likely to be found at any given distance from the mean (Lamond). Skewness is also important. When a set of data is positively skewed, its scores are clustered to the left and the tail extends to the right; when the scores are clustered to the right and the tail extends to the left, the data are negatively skewed. Kurtosis describes the peak of a distribution: a bell-shaped distribution has normal kurtosis, a sharper peak indicates positive kurtosis and a flatter distribution indicates negative kurtosis. Since ordinary kurtosis can never be negative, the negative values reported below are excess kurtosis, on which a normal distribution scores zero. The methods are applied to each data set below and summarised in Table 1.

Analysis of Data Sets D1, D2 and D3

D1
Min: the minimum or lowest value in a set of data. For D1, the Min is 181.52977.
Max: the maximum or highest value in a set of data. For D1, the Max is 365.29787.
Range: the difference between Max and Min. For D1, the range is 183.7681.
Count: the total frequency, or the number of values in a set of data. For D1, the Count is 147.
Mean: ∑fx / ∑f = 41079.4416 / 147 = 279.452.
Median: the middle value when the data are arranged in ascending or descending order. For D1, the median is 278.1439.
Mode: the value with the highest frequency in a set of data. For D1, the mode is 181.53.
Standard deviation: 39.935.
Kurtosis: -0.3001.
Skewness: -0.1578.

D2
Min: 4.08.
Max: 280.91.
Range: 276.84.
Count: 147.
Mean: ∑fx / ∑f = 19487.007 / 147 = 132.565.
Median: 126.130.
Mode: none.
Standard deviation: 80.76.
Kurtosis: -0.300.
Skewness: 0.154.

D3
Min: 244.748.
Max: 728.53.
Range: 483.782.
Count: 147.
Mean: ∑fx / ∑f = 67022.4543 / 147 = 455.9351.
Median: 442.3245.
Mode: none.
Standard deviation: 90.2632.
Kurtosis: 0.4771.
Skewness: 0.4316.

Table 1: Summary Table

      Min     Max     Range   Count  Mean    Median  Mode    Std Dev  Kurtosis  Skewness
D1    181.53  365.30  183.77  147    279.45  278.14  181.53  39.94    -0.300    -0.158
D2    4.08    280.91  276.84  147    132.57  126.13  -       80.76    -0.300    0.154
D3    244.75  728.53  483.78  147    455.94  442.32  -       90.26    0.477     0.432

Presentation of Data in Histograms

Histograms make it easier to see relationships in data and enable information users to understand the distribution of occurrences of different causes (Andersen and Fagerhaug). They also help users judge whether data are approximately normal.
For instance, the histograms in Charts 1, 2, 3 and 4 suggest that none of the data sets follows a perfectly normal distribution.

Chart 1: Histogram for D1
Chart 2: Histogram for D2
Chart 3: Histogram for D3
Chart 4: Histogram for D1, D2 and D3

Use of the Empirical Rule

The empirical rule approximates the percentage of values that lie within a given number of standard deviations of the mean when the data are normally distributed. For normal data, about 68% of values fall within mean ± 1 sd, about 95% within mean ± 2 sd and about 99.73% within mean ± 3 sd. However, the empirical rule is an approximation that applies only when the data are symmetric and unimodal (Merrill). The rule is applied by computing the interval mean ± k sd for k = 1, 2 and 3 from the means and standard deviations in Table 1; the percentages are the shares of values expected inside each interval, not multipliers applied to its bounds. The intervals are shown below and summarised in Table 2.

D1 (mean 279.45, sd 39.94)
279.45 - 39.94 = 239.51 and 279.45 + 39.94 = 319.39: about 68% of values expected between 239.51 and 319.39
279.45 - 79.88 = 199.57 and 279.45 + 79.88 = 359.33: about 95% expected between 199.57 and 359.33
279.45 - 119.82 = 159.63 and 279.45 + 119.82 = 399.27: about 99.73% expected between 159.63 and 399.27

D2 (mean 132.57, sd 80.76)
132.57 - 80.76 = 51.81 and 132.57 + 80.76 = 213.33: about 68% expected between 51.81 and 213.33
132.57 - 161.52 = -28.95 and 132.57 + 161.52 = 294.09: about 95% expected between -28.95 and 294.09
132.57 - 242.28 = -109.71 and 132.57 + 242.28 = 374.85: about 99.73% expected between -109.71 and 374.85

D3 (mean 455.94, sd 90.26)
455.94 - 90.26 = 365.68 and 455.94 + 90.26 = 546.20: about 68% expected between 365.68 and 546.20
455.94 - 180.52 = 275.42 and 455.94 + 180.52 = 636.46: about 95% expected between 275.42 and 636.46
455.94 - 270.78 = 185.16 and 455.94 + 270.78 = 726.72: about 99.73% expected between 185.16 and 726.72

Table 2: Empirical Rule Summary Table (interval bounds)

              D1      D2       D3
mean - 1sd    239.51  51.81    365.68
mean + 1sd    319.39  213.33   546.20
mean - 2sd    199.57  -28.95   275.42
mean + 2sd    359.33  294.09   636.46
mean - 3sd    159.63  -109.71  185.16
mean + 3sd    399.27  374.85   726.72

Five Number Summary

The analysis carries out a five-number summary, in which the maximum, third quartile, median, first quartile and minimum of each data set are given in Tables 3, 4, 5 and 6.

Table 3: Five Number Summary for D1

Item    Value
MAX     365.3
Q3      304.93
MEDIAN  278.14
Q1      255.6
MIN     181.53

Table 4: Five Number Summary for D2

Item    Value
MAX     280.91
Q3      200.71
MEDIAN  126.13
Q1      60.6
MIN     4.08

Table 5: Five Number Summary for D3

Item    Value
MAX     728.53
Q3      509.54
MEDIAN  442.32
Q1      402.77
MIN     244.75

Table 6: Five Number Summary for D1, D2 and D3

        D1      D2      D3
MAX     365.3   280.91  728.53
Q3      304.93  200.71  509.54
MEDIAN  278.14  126.13  442.32
Q1      255.6   60.6    402.77
MIN     181.53  4.08    244.75

Box Plots

The project uses box plots to display the location of each variable and to indicate the skewness and symmetry of the data. They are a convenient tool for comparing data sets, although they hide many details of the distribution.

Chart 5: Summary Box Plot for D1, D2 and D3

Probability Plot

The project uses normal probability plots to assess the distribution pattern of the data, as shown in Charts 6, 7, 8 and 9.

Chart 6: Normal Probability Plot for D1
Chart 7: Normal Probability Plot for D2
Chart 8: Normal Probability Plot for D3
Chart 9: Normal Probability Plot for D1, D2 and D3

Time Series

The project uses time series charts to show trends that can help predict future values, as shown in Charts 10, 11 and 12.

Chart 10: Time Series for D1
Chart 11: Time Series for D2
Chart 12: Time Series for D3

Results/Discussion

From the analysis, D3 has the largest range (483.78) and therefore the greatest spread between its extreme values. D1 is the least varied (range 183.77), and D2 lies in between (range 276.84). D1 is slightly negatively skewed (-0.158), while D2 (0.154) and D3 (0.432) are positively skewed; D1 and D2 are therefore close to symmetric, whereas D3 is more clearly skewed. In a histogram, this means that the scores of D1 cluster slightly to the right with a tail to the left, while those of D2 and D3 cluster to the left with a tail to the right, most markedly for D3. D3 also has the highest standard deviation (90.26, against 80.76 for D2 and 39.94 for D1), so its values are the most spread out.
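The columns of Table 1 can be recomputed from the raw values of any one data set. The sketch below is a minimal illustration, assuming the values are available as a Python list; `summarise` is a hypothetical helper, ties for the mode are broken by taking the smallest value, and skewness and kurtosis use the common bias-corrected sample formulas (G1, and excess kurtosis G2). The paper does not say which formulas or software produced its figures, so recomputed values may differ slightly.

```python
import statistics
from collections import Counter


def summarise(values):
    """Table 1 columns for one data set (needs at least 4 values, not all equal)."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    z = [(x - mean) / s for x in xs]

    # Mode: None when no value repeats; ties broken by the smallest value (assumption).
    counts = Counter(xs)
    top = max(counts.values())
    mode = None if top == 1 else min(v for v, c in counts.items() if c == top)

    # Bias-corrected sample skewness (G1) and excess kurtosis (G2);
    # both are about 0 for normally distributed data.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "min": xs[0], "max": xs[-1], "range": xs[-1] - xs[0], "count": n,
        "mean": mean, "median": statistics.median(xs), "mode": mode,
        "sd": s, "skewness": skew, "kurtosis": kurt,
    }
```

For example, `summarise([1, 2, 3, 4, 5])` gives a mean and median of 3, no mode, a skewness of 0 and a kurtosis of -1.2 (up to floating-point rounding). The printed figures for D1 are also internally consistent: 365.29787 - 181.52977 = 183.7681 and 41079.4416 / 147 ≈ 279.452.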
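The empirical-rule intervals can be computed directly from the rounded means and standard deviations in Table 1. In this sketch, `TABLE1`, `EXPECTED`, `interval` and `share_within` are hypothetical names, and the 68%, 95% and 99.73% figures are the rule's approximations for roughly normal data. Testing how well a data set actually follows the rule requires counting the raw values inside each interval, which is what `share_within` does; the raw data are not reproduced in this paper.

```python
# Means, standard deviations and extremes copied from Table 1 (rounded values).
TABLE1 = {
    "D1": {"mean": 279.45, "sd": 39.94, "min": 181.53, "max": 365.30},
    "D2": {"mean": 132.57, "sd": 80.76, "min": 4.08, "max": 280.91},
    "D3": {"mean": 455.94, "sd": 90.26, "min": 244.75, "max": 728.53},
}

# Approximate share of values expected within mean ± k sd if the data are normal.
EXPECTED = {1: 0.68, 2: 0.95, 3: 0.9973}


def interval(mean, sd, k):
    """Bounds of the interval mean ± k standard deviations, rounded to 2 d.p."""
    return round(mean - k * sd, 2), round(mean + k * sd, 2)


def share_within(values, mean, sd, k):
    """Fraction of the raw values that actually lie inside mean ± k sd."""
    lo, hi = mean - k * sd, mean + k * sd
    return sum(lo <= v <= hi for v in values) / len(values)


for name, row in TABLE1.items():
    for k in (1, 2, 3):
        lo, hi = interval(row["mean"], row["sd"], k)
        # If the whole observed range fits inside, every value is within k sd.
        covers_all = lo <= row["min"] and row["max"] <= hi
        print(f"{name}: mean ± {k} sd = {lo} to {hi}, "
              f"expected share ≈ {EXPECTED[k]:.2%}, whole range inside: {covers_all}")
```

Comparing the bounds with the Min and Max columns is already informative: the whole range of D2 (4.08 to 280.91) lies inside mean ± 2 sd (-28.95 to 294.09), so all of its values, rather than about 95%, fall within two standard deviations, whereas the maximum of D3 (728.53) lies just beyond mean + 3 sd (726.72).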
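The five-number summaries in Tables 3 to 5 can be recomputed from raw data, and the same quartiles give the whisker limits commonly used when drawing box plots. Two assumptions are labelled in the code: quartiles use the "inclusive" interpolation method of Python's `statistics.quantiles` (the paper does not state its quartile convention, and other conventions give slightly different Q1 and Q3), and the whiskers follow Tukey's 1.5 × IQR rule, which the paper does not say its box plots used. `FIVE_NUMBER`, `five_number` and `tukey_fences` are hypothetical names.

```python
import statistics

# (min, Q1, median, Q3, max) copied from Tables 3 to 5.
FIVE_NUMBER = {
    "D1": (181.53, 255.6, 278.14, 304.93, 365.3),
    "D2": (4.08, 60.6, 126.13, 200.71, 280.91),
    "D3": (244.75, 402.77, 442.32, 509.54, 728.53),
}


def five_number(values):
    """Min, Q1, median, Q3 and max of raw data.

    Assumption: quartiles use the 'inclusive' interpolation method; other
    conventions can give slightly different Q1 and Q3.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def tukey_fences(q1, q3, k=1.5):
    """Whisker limits under Tukey's rule (an assumption): Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


for name, (low, q1, median, q3, high) in FIVE_NUMBER.items():
    lower, upper = tukey_fences(q1, q3)
    print(f"{name}: IQR = {q3 - q1:.2f}, fences = {lower:.2f} to {upper:.2f}, "
          f"min outside: {low < lower}, max outside: {high > upper}")
```

With the rounded quartiles from Tables 3 to 5, the minimum of D1 (181.53) falls just below its lower fence (about 181.6) and the maximum of D3 (728.53) lies well above its upper fence (about 669.7), so a box plot drawn with this rule would show those values as separate points; D2 has no values beyond its fences.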
Using the normal probability plots, D1 and D2 can be categorised as approximately normal, while D3 does not exhibit the characteristics of a normal distribution: the points of the normal probability plots for D1 and D2 lie close to a straight line, whereas those for D3 do not. The means of the three data sets indicate the typical value to be expected in each: D3 has the highest mean (455.94) and D2 the lowest (132.57). The median of each data set is close to its mean (278.14 against 279.45 for D1, 126.13 against 132.57 for D2 and 442.32 against 455.94 for D3), so either measure describes a typical value. The charts, graphs and tables present the data in a form that users of the information can readily understand.

Conclusion

Raw data requires appropriate analysis before it can be used for making decisions and be helpful to information users. The main challenge is therefore to identify, among the many available methods, those that support reliable inferences. This analysis shows that data can be examined with a variety of methods, several of which indicate the type of distribution of a data set. For instance, the range shows how varied the values are, while the mean and median describe a typical value in the data set. Histograms, skewness and normal probability plots indicate that only D1 and D2 are approximately normally distributed. Since different methods can lead to the same inference, it is recommended to use more than one method to test a characteristic of a data set.

Works Cited

Andersen, Bjørn and Tom Fagerhaug. Root Cause Analysis: Simplified Tools and Techniques. Milwaukee: ASQ Quality Press, 2006.
Griffiths, Dawn. Head First Statistics. Farnham: O'Reilly Publications, 2008.
Kirch, Wilhelm. Encyclopedia of Public Health: With 75 Figures and 86 Tables. Dordrecht: Springer Publications, 2008.
Lamond, Joseph F. Significance of Tests and Properties of Concrete and Concrete-Making Materials. West Conshohocken: ASTM Press, 2006.
Meier, Kenneth J., Jeffrey L. Brudney and John Bohte. Applied Statistics for Public and Nonprofit Administration. Boston: Cengage Learning Press, 2012.
Merrill, Ray M. Introduction to Epidemiology. Sudbury: Jones and Bartlett Publishers, 2010.
Vincent, William J. and Joseph P. Weir. Statistics in Kinesiology. Champaign, IL: Human Kinetics Press, 2012.
Vining, G. Geoffrey and Scott M. Kowalski. Statistical Methods for Engineers. Boston: Cengage Learning Press, 2011.
