The paper “Practical Data Mining Using C Language” is an example of a lab report on logic and programming. By definition, data mining is the practice of examining large databases in order to generate new information. Put another way, it is a technique for handling big data by breaking it down into small sections and applying a set of algorithms to extract meaningful data that can be used to analyze a system. Through data mining, relationships that were not previously identified can be discovered and analyzed.
The analysis is carried out from different perspectives to produce a comprehensive, structured report on the data set. Hence, the initial stage of data mining is to prepare data sets that can be easily understood and processed so that relevant information and relationships can be extracted. Diverse methods can then be applied to the discovered relationships and patterns, and the results find practical use in a number of fields, including machine learning, artificial intelligence, database systems and models, and statistics. The core source of data in the data mining process is a database containing a mixture of data of different types.
This data is extracted, and models are applied to it so that the different types can be isolated and analyzed individually. Many data mining techniques exist today. They include classification, clustering, regression, anomaly detection, association rules, reinforcement learning, structured prediction, feature engineering, and summarization. Any of these techniques can be used on its own or in combination with one or more of the others to analyze a data set. Data mining is commonly identified as the fourth step of the Knowledge Discovery in Databases (KDD) process.
In this approach, a decision tree is first generated from the data set from which information is to be extracted. The decision tree is built using a well-defined algorithm; common choices include the ID3 algorithm and the C4.5 algorithm. Since the assignment requires the ID3 algorithm, a brief description of it follows. ID3 is a straightforward decision-tree learning algorithm that is best suited to data sets whose attributes are well defined and whose records belong to clearly distinguished classes.
The algorithm works through the input data set iteratively, starting from the root node, and builds a decision tree from it. At every node, the attribute that best classifies the data is chosen; in ID3 this is typically the attribute with the highest information gain. The code used for data mining in this assignment is written in C, follows the ID3 algorithm, and writes its output to the console. Here is a description of its functionality: Step 1: The program imports the standard library and the string libraries, which provide the prebuilt functions it needs to read string data from the file.
It then sets the string buffer size, expanding the buffer memory to 1024 1024 bytes so that it can accommodate the longest line in the file. The fields, which also act as data headers, correspond to the columns of the data table; for this data set there are 22. This value is established at the start of the program under the program constants. This spans lines 17 to 39 of the code, as indicated by the comments.
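The setup described in Step 1 might look roughly like the sketch below. It is not the assignment's actual code: the names `BUFFER_SIZE`, `NUM_FIELDS`, `split_line`, and `count_valid_rows` are hypothetical, the buffer size of 1024 bytes is an assumed value chosen for illustration, and the sketch assumes the data file is comma-separated with one record per line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 1024 /* assumed line-buffer size in bytes (illustrative) */
#define NUM_FIELDS  22   /* number of columns reported for the data set */

/* Split one comma-separated line in place. Pointers to the first
 * max_fields fields are stored in fields[]; the return value is the total
 * number of fields found on the line, which may exceed max_fields. */
int split_line(char *line, char *fields[], int max_fields)
{
    assert(line != NULL && fields != NULL);
    int count = 0;

    /* Cut the line at the first newline or carriage return, if any. */
    line[strcspn(line, "\r\n")] = '\0';

    char *p = line;
    for (;;) {
        if (count < max_fields)
            fields[count] = p;
        count++;
        char *comma = strchr(p, ',');
        if (comma == NULL)
            break;
        *comma = '\0'; /* terminate the current field */
        p = comma + 1;
    }
    return count;
}

/* Count the lines of a file that contain exactly NUM_FIELDS fields.
 * Returns -1 if the file cannot be opened. A line longer than the buffer
 * is read by fgets in several pieces, so the buffer has to be large
 * enough for the longest line. */
long count_valid_rows(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char buffer[BUFFER_SIZE];
    char *fields[NUM_FIELDS];
    long rows = 0;

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        if (split_line(buffer, fields, NUM_FIELDS) == NUM_FIELDS)
            rows++;
    }
    fclose(fp);
    return rows;
}
```

Defining the buffer size and field count as constants at the top of the file keeps them in one place, so they can be adjusted if the data set changes without touching the parsing logic.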