Data Classification Using Weka Software

A REPORT ON DATA CLASSIFICATION USING WEKA SOFTWARE

Introduction

In general terms, data mining, or knowledge discovery, is the process of giving meaning to a set of data through proper analysis. The data may come from a range of sources: it may be free text gathered from websites or social sites, or it may be structured data held in a structured data repository. Various data mining tools exist for deriving sense from a given set of data. Data mining is a widely used technique for addressing the complexities in data, especially unstructured data.

Weka is one of the easiest to use, yet complete, data mining tools. It is a machine learning tool that allows the use of various algorithms to perform data classification and other machine learning tasks. Weka's algorithms (the classifiers) can be called from other programs, such as Java or Python code. Weka can also be used directly through its own application, which is free software distributed under the GNU General Public License (Hall et al., 2009). Weka can perform the following machine learning tasks:

Data preprocessing: This refers to the set of operations that Weka allows users to perform on the data before the actual machine learning step.

Regression calculation: Regression calculation uses a statistical regression equation to calculate the relations among the data presented to Weka. This is a very important process in Weka's machine learning, as various other processes, such as data classification and clustering, depend on it.

Data classification: Data classification is the process of organizing (classifying) data into several categories for easy, efficient and effective use.
A properly designed classification algorithm makes data available for business use; in other words, it makes the data easy to retrieve and analyze.

Data clustering: This is the process of putting data into several different groups, with less attention to the inner details. Compared to classification, data clustering is a simple way of putting data into simple cluster classes.

Data visualization: Data visualization, as the name suggests, is a way of obtaining a pictorial view of a given data set. In many cases, data visualization is done through visual reports. The visual reports are the various statistical tools used to identify trends and make observations on the data.

Data association: Data association refers to the relationships among data sets. These associations are important in determining the classification classes of a given data set. Weka makes use of various association rules to classify data (Hall et al., 2009).

Data Description

Data description refers to the information about the data under study. It is closely related to the term metadata, which is data about data. In data mining and classification, data description plays a major role in determining the clusters or classes to which the data belongs. Weka is particular about the data types and file types it can process, so the files presented to Weka must be in one of the supported formats. Weka supports file formats such as .arff, .csv and C4.5, among others. Weka users therefore need some knowledge of data and file conversion to get the best from the tool. For this assignment, the data files provided were all .csv files, which are supported. There was therefore no need to convert the data into a different format; the data was used as provided. The data sets were imported into the Weka tool one set at a time. The data files provided for this assignment were sub-0.csv, sub-1.csv, train.csv and test.csv.
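
Although no conversion was needed for this assignment, the relationship between the .csv and .arff formats mentioned above can be illustrated with a minimal sketch. The function name csv_to_arff, the relation name and the sample columns (age, job, y) are hypothetical and are not taken from the assignment files. The sketch assumes comma-separated values with a header row, no missing values and no values that need quoting, and it treats a column as numeric only if every value parses as a number.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="bank"):
    """Convert CSV text with a header row into ARFF text.

    Simplifying assumptions: no missing values, no quoting needed,
    numeric if every value parses as a float, otherwise nominal.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if all(is_number(value) for value in column):
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: list the distinct labels seen in the column.
            labels = sorted(set(column))
            lines.append("@attribute " + name + " {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical two-row sample, not real assignment data.
sample = "age,job,y\n30,admin,no\n45,technician,yes\n"
print(csv_to_arff(sample))
```

The header of the resulting text declares each attribute and its type, which is the main information an .arff file carries beyond what a .csv file holds.
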
The data is bank client data containing a number of attributes. The dataset came with a description file explaining what every attribute represents.

Input

The input to the classifier was the set of data with its associated attributes. The bank client data was to undergo training and classification to obtain the most appropriate results. The data underwent a number of transformations, such as splitting the data to obtain a higher level of accuracy in the classification. Some columns were also converted to allow for proper classification.

Output

After running the data through a number of classifiers, the required output is a variable y, which answers the question of whether a client has subscribed to a term deposit. The output is a binary value, taking either 'yes' or 'no'.

Results

Below are some of the screenshots showing the Weka results obtained for the classifier algorithms chosen for this experiment. The experiment started by transforming the data and splitting the datasets into two.

Data Classification

Neural Networks

In Weka, the neural network classifier is MultilayerPerceptron. The most commonly used neural network architecture is feed-forward. It is characterized by an input layer, a hidden layer and finally an output layer. The output signal produced for an input vector of attributes indicates the class to which the object belongs. In this case, given that the output was a binary variable (0, 1), the neural network output was interpreted as a probability (results shown in the screenshots in the Results section). The value of the unit a corresponds to the probability that the specific input vector belongs to that class. The screenshots above give specific details on the logic and equations applied by the neural network. The output above gives the node types, the inputs and the weights.
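
The feed-forward computation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Weka's implementation: the network has one hidden layer of sigmoid units, the output unit is also passed through a sigmoid so that it can be read as a probability, and all weights, biases and inputs are hypothetical values that do not come from the Weka output.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a network with a single hidden layer."""
    # Each hidden unit takes a weighted sum of the inputs plus a bias.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output unit combines the hidden activations in the same way.
    activation = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(activation)


# Hypothetical weights for 2 inputs and 2 hidden units (illustration only).
probability = forward(
    inputs=[0.5, 1.0],
    hidden_weights=[[0.4, -0.6], [0.3, 0.8]],
    hidden_biases=[0.1, -0.2],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
# Threshold the probability at 0.5 to obtain the binary class label.
label = "yes" if probability >= 0.5 else "no"
print(probability, label)
```

Thresholding the output at 0.5 is one common way to turn the probability into the binary 'yes' or 'no' answer; a different threshold would shift the balance between the two classes.
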
Because the network topology was not altered, all the hidden layer nodes are sigmoid units and the output layer nodes are linear units. From the above results, we can see that the answer to our question according to MultilayerPerceptron is “No”, that is, the binary result 0.

Support Vector Machines

In Weka, the support vector machine classifier is trained with the sequential minimal optimization algorithm, hence the name SMO. To use the support vector machine, we went to the Classify tab, opened the functions group and chose SMO. One advantage of using this classifier for this assignment is its ability to convert nominal attributes to binary. In addition, it normalizes all attributes by default. As a result, the output coefficients are based on the normalized data and not the original data, which is important to keep in mind when interpreting the classifier. The screenshots above show the results of the classification done by the support vector machine algorithm in Weka. Support vector machines can be used for both classification and regression; for this experiment, the algorithm was used only for classification. As seen from the above results, the algorithm automatically normalizes the data, giving more precise and accurate results. The resulting coefficients, which are obtained from the normalized data and not the raw data, make the result easier to interpret. The interpretation of the above result shows a “No” answer to the task in question.

Discussion

Weka is a machine learning tool with various algorithms for data classification. It is an easy-to-use tool in which simple button clicks achieve the goals. It also has a simple CLI, which can be used to access most of the functionality. In this assignment, the graphical user interface was used. Weka also has three additional interfaces:

Explorer: provides the graphical front end to Weka's components and routines.
Experimenter: allows the user to build classification experiments in Weka.

Knowledge Flow: an alternative to the Explorer as a graphical front end for Weka's main algorithms.

The data set for the assignment was imported into Weka for classification. The next stage after importing the data was filtering: the imported data was passed through filters to clean it up. The filters in Weka cover normalization, re-sampling, discretization, transforming and combining attributes, attribute selection and so on. For this task, a data transformation was applied using the NominalToBinary filter. This transformation was needed for the MultilayerPerceptron algorithm, which works on numeric inputs. In this experiment the conversion was applied explicitly as a preprocessing filter rather than left to the classifier's own options. Preprocessing of the data is therefore a very important step when doing data classification in Weka.

The Classifiers

Two different classifiers were applied in this experiment: the Neural Network and the Support Vector Machine. Each classifier gave a different result depending on the parameters set. The difference lay mainly in the accuracy and the bias (threshold). SVM showed a high level of over-fitting compared to NN. The SVM classifier showed the highest level of accuracy and speed.

Reflection on the Assignment

After the provided dataset was classified with the two algorithms, Neural Network and Support Vector Machine, the differences between the algorithms were clear. It was observed that SVM completed its training much faster than the neural network. The output of SVM was also easier to interpret than that of the neural network. This made the Support Vector Machine better suited to this specific task than the Neural Network.

Conclusion

The assignment was successful and the classification was achieved using the Weka tool.
The algorithms used gave different results because the classifiers differ in several respects. Weka makes it easy to apply the various classification algorithms, vary their settings and compare the results. Through the use of Weka, the assignment was possible and a great deal of processing was done on the data. The recorded results are the exact outcomes produced by Weka.

Reference

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I.H., 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), pp.10-18.
