Knowledge Discovery for Business Information Systems Assignment Example | Topics and Well Written Essays

IT: Written Report

23rd April, 2012

Introduction

Today’s business world is changing at a quick and unpredictable pace for many organizations. As a result, several strategies have been devised to understand the changing business environment. One strategy that is gaining ground is business intelligence, which is used in effective data management. In the course of doing business, several decisions have to be made. Moreover, these decisions have to be based on factual information backed by evidence. The process of decision making involves many initiatives which need to be synchronized and analyzed from different perspectives. It also involves making use of datasets in the process of modelling and warehousing. Data modelling is very important in making decisions since it ensures that patterns in the data are known and used in business processes. Data mining methodologies such as SEMMA and CRISP-DM are used in analyzing patterns in data. This essay looks into CRISP-DM and data modelling algorithms in the process of loan processing. It also looks into data warehousing techniques and how data warehousing assists in developing knowledge.

Task 1

Section A

The process of applying for a loan is tedious, but it can be made simpler by data mining tools. These tools allow data to be analysed with different sets of techniques in order to arrive at a conclusive data set. There are a number of factors that need to be considered when credit worthiness is being assessed in the process of a loan application. Some of the factors include the income of the individual, the age, previous credit history, and the amount and purpose of the loan. The loan process requires business intelligence which should be rooted in evidence-based decision making (Paredes, 2009).
The performance of an organization could also be improved if data mining tools are integrated with other solutions in the process of making business decisions. Business intelligence calls for the use of different strategies and tools in building the performance of a business. In processing a loan, critical decisions have to be made based on the data obtained from different customers and sources. Sometimes this data is enormous, especially when there are many customers. In these scenarios, data mining tools and strategies are deployed to ensure that the data is properly analysed before decisions are made (Yeates, 2001). Business intelligence leads organizations to make use of data mining in analyzing different sets of data. The following process is involved in deciding the best method of analysing the credit data that is used in determining the credit worthiness of the customer.

Data Models: The first step in determining the credit worthiness of a customer, and in addressing the problems associated with assessing good credit history, is building data models. Data models are either predictive or descriptive. Predictive data models try to predict whether the customer is capable of paying off his/her loans (Paredes, 2009). In the case of loans, a predictive data model could make use of classification, which assigns each customer to a category such as good or bad credit risk. A model may instead predict a numeric factor, such as income level, from a number of other factors. In this case we refer to the model as a regression model. Descriptive data models, on the other hand, are divided into clustering and association. The clustering technique divides the data into groups of similar records, and this helps in reducing the complexity of data analysis (Olson, 2011). With association data models, we look into factors in a data set that occur in pairs.
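As a minimal sketch of how an association between paired items can be quantified, the example below computes support and confidence, the two measures commonly used for association rules. The baskets, item names and function names are hypothetical and invented purely for illustration; they are not taken from the assignment data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the whole itemset."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    base = count_containing(antecedent, transactions)
    if base == 0:
        return 0.0
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / base


def frequent_pairs(transactions, min_support):
    """Item pairs whose support is at least min_support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    print(support({"milk", "bread"}, baskets))
    print(confidence({"milk"}, {"bread"}, baskets))
    print(frequent_pairs(baskets, 0.4))
```

In this invented data, milk appears in four baskets and milk together with bread in three, so the rule "milk implies bread" has a confidence of 0.75. The same counting could in principle be applied to pairs of loan attributes rather than grocery items.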
In a retail setting, for instance, people who purchase milk may also purchase bread or wheat products. However, even though predictive models are mainly used for predictions, in some instances predictive data models are also descriptive.

CRISP-DM: This is the most widely used data mining process model, and it contains a number of steps that are meant to achieve data analysis. The process contains many procedures which are highly iterative, to ensure data is analyzed in a proper manner. The phases are stated below:

a) Business understanding: This is the initial phase, which requires an understanding of the business objectives and requirements.

b) Data understanding: This phase starts with the collection of data and familiarization with the data, with the aim of identifying problems or discovering insights. Getting familiar with the data and identifying its problems first makes the later data modelling easier.

c) Data preparation: The preparation phase takes into account all factors and data to construct the final dataset. This is where the data is fed into tables, and records and attributes are selected for data modelling. This phase may involve gathering data and assigning a score to each effective criterion of credit worthiness. Moreover, this step includes setting up rules and criteria for evaluating credit worthiness.

d) Modelling: This is the most important phase. Various modelling techniques are selected and applied at different times. Some techniques have specific requirements that need to be met in order to achieve the most favourable results.

e) Evaluation: This is the stage where the models that have been built are checked to ensure that they meet the business goals. In the loaning procedure, we look at whether the model captures information on the credit worthiness of the customer.
At the end of this phase, a decision has to be made based on the results of the data mining.

f) Deployment: The formation of the data model is not the end of the process, since the knowledge or information obtained from the data model will inform business strategies. For instance, in the loaning process, data models will be used in determining the criteria and factors that will be used in awarding loans based on credit worthiness (Silverston, 2002).

Data modelling follows different techniques and procedures, with the modelling process being very critical. Several data mining procedures will be used in ensuring the data follows the specified criteria (Olson, 2011). In the example above, we look at a data model whereby the customer has 22 attributes that ensure clarity. For instance, we ensure that the variables are compared to each other and that, if changes occur, the model is able to handle the added complexity.

Section B

The file provided as credit.csv contains several variables and fields that are used in determining which factors are necessary for the credit scoring of different clients. The five most important factors from the analysis are:

a) Credit history: The credit history of a customer says a lot about the customer’s ability to borrow and pay back his/her debts. The credit history is important since it helps in determining whether the customer is capable of repaying his/her debts (Paredes, 2009). In the analysis of the credit rating and scoring of different customers, we found that customers who possessed the attributes B and C are likely to be considered for a loan. This is because these customers have had a loan in the past that they have cleared, or a loan which they are in the process of clearing. This factor is very important since the box plot shows how close the credit ratings are. This is shown in figure 1 in the appendix.
The analysis also shows that credit history features in many instances in the decision tree in figure 2 in the appendix.

b) Savings Account: From the analysis conducted, we found that having a savings account is necessary in securing a loan. Moreover, the savings account must have some funds in it to enable a person to secure a loan. For instance, in the analysis, customers whose accounts held between 500 and 1,000 Deutsche Marks and those whose accounts held over 1,000 Deutsche Marks had the best chances of qualifying for a loan. From the analysis we see that the savings account is important since it gives a basis on which the assessment can be conducted. In the analysis of the savings account, we see that customers who have between 100 and 500 Deutsche Marks and between 500 and 1,000 Deutsche Marks are considered credit worthy. This is shown by the decision tree in figure 2.

c) Employment: Employment is an important factor that is taken into consideration during the process of securing a loan. A person's employment status is closely tied to his/her income, since people derive their income from their employment. In the assessment of the loan procedure, we found that people who had been employed for between 1 and 4 years had better chances of getting loans. The same applies to people who have had a job for between 4 and 7 years. The employment figures show that people who have been employed for more than one year are more likely to be given loans. Employment is an important factor since a person who is unemployed has a lower chance of getting a loan. Only around 16% of unemployed people are able to get loans. This is shown in figure 2.

d) Job type: Job type was another factor that was considered in the process of applying for a loan. In this case, we look at the skills of the employee in relation to his/her income and credit rating.
It is well known that the skills of an employee contribute to the amount of disposable income that is available. For instance, the analysis of the data shows that unemployed people tend to have little ability to pay their debts due to the low income they earn. Job type is an important factor in determining whether a loan can be secured based on credit worthiness. In the decision tree in figure 2, we see that a person who is unemployed and unskilled has around a 9% chance of securing a loan due to poor credit worthiness (Silverston, 2002). Moreover, people who are highly qualified and have managerial skills are likely to secure a loan due to good credit worthiness. This is shown by the decision tree in figure 2.

e) Credit Rating: Credit rating can be described as the evidence that supports the repayment of debts by different customers. In our case, we look at credit rating as an important factor since a good credit rating makes it likely that the person will pay off the loan, while a bad credit rating brings down the scores that are necessary for credit worthiness. Credit rating is an important marker, as seen in the decision tree for a person who has a good rating.

Section C

The task of data modelling is tedious since it involves a lot of procedures which are repetitive in nature. These procedures include making use of analytical and descriptive data to capture important aspects of the loan process. In the design of a data warehouse, the client in this case will have to consider multidimensional modelling, since the variables used in the loaning procedure are quite numerous. Multidimensional data warehouse models can be used to query data from different sources and present accurate information (Pittman, 2008).
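To make the idea of a multidimensional model more concrete, the following is a minimal sketch of a star schema, assuming one fact table of loan applications that refers to three small dimension tables. All codes, labels, amounts and names here are hypothetical and are not taken from credit.csv.

```python
from dataclasses import dataclass

# Dimension tables: hypothetical codes and labels, invented for illustration.
credit_history_dim = {1: "no previous loans", 2: "loans repaid", 3: "loan being repaid"}
employment_dim = {1: "unemployed", 2: "1 to 4 years", 3: "4 to 7 years"}
purpose_dim = {1: "car", 2: "education", 3: "furniture"}


@dataclass(frozen=True)
class LoanFact:
    """One fact-table row: foreign keys into the dimensions plus measures."""
    credit_history_id: int
    employment_id: int
    purpose_id: int
    amount: int      # hypothetical loan amount
    approved: bool   # hypothetical decision


# Hypothetical fact rows.
facts = [
    LoanFact(2, 2, 1, 5000, True),
    LoanFact(3, 3, 2, 3000, True),
    LoanFact(1, 1, 3, 2000, False),
    LoanFact(2, 3, 1, 4000, True),
]


def describe(fact):
    """Resolve the foreign keys of one fact row into readable labels."""
    return {
        "credit_history": credit_history_dim[fact.credit_history_id],
        "employment": employment_dim[fact.employment_id],
        "purpose": purpose_dim[fact.purpose_id],
        "amount": fact.amount,
        "approved": fact.approved,
    }


def total_amount_by(key_name, dim_table, rows):
    """Sum the amount measure for each label of a single dimension."""
    totals = {}
    for row in rows:
        label = dim_table[getattr(row, key_name)]
        totals[label] = totals.get(label, 0) + row.amount
    return totals


if __name__ == "__main__":
    print(describe(facts[0]))
    print(total_amount_by("credit_history_id", credit_history_dim, facts))
```

Keeping the measures in one fact table and the descriptive labels in separate dimensions is what lets the same data be queried along any of the loan variables. A real warehouse would hold these tables in a database rather than in memory.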
Based on the processes and procedures used in designing a loan process, it would be prudent to design a hybrid system for it. For instance, in the case of the loan process, metadata would be used in analyzing the data held in the data warehouse. In the development of a data warehouse it would be prudent to make use of a three-tier architecture, since this would ensure a proper flow of data in the loan process (Abramowicz, 2011). The three-tier architecture is good at ensuring that the stored data can be easily retrieved and used. The loaning process involves analyzing various variables in order to reach good decisions that keep credit risk and loan default levels low. From the loan procedure above we see that in some instances, to keep the data simple, we have to make use of a web-based data warehouse that is user friendly (Reeves, 2009). The most critical procedure in handling data is data modelling. Therefore we have to decide on the best system for ensuring that data warehouses are maintained. In the case of the loan procedure, parallel processing would be the most effective system for maintaining the data warehouse. The use of data migration tools is very important since it allows data to be stored and managed effectively (Hernandez, 2006). However, data migration is less important than the analysis tool, since in the loan procedure it is the analysis of the data that matters most. The best tool to be used in the analysis of the loan procedure is online transaction processing. This is because it has the ability to handle many transactions from huge databases (Sullivan, 1996). Transaction databases are usually huge, and making use of online transaction processing is effective in relational databases. Within the relational databases we have to make use of operations such as slicing, rolling up or pivoting, which are usually associated with online analytical processing (OLAP), to analyze the data.
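As a minimal sketch of the three operations named above, assuming the loan data has been flattened into records with a few dimensions and one numeric measure, slicing, rolling up and pivoting could look as follows. The records, dimension names and function names are hypothetical and invented for illustration only.

```python
from collections import defaultdict

# Hypothetical loan records: three dimensions and one measure (amount).
rows = [
    {"region": "North", "year": 2008, "purpose": "car", "amount": 10},
    {"region": "North", "year": 2009, "purpose": "car", "amount": 15},
    {"region": "North", "year": 2009, "purpose": "education", "amount": 5},
    {"region": "South", "year": 2008, "purpose": "car", "amount": 20},
    {"region": "South", "year": 2009, "purpose": "education", "amount": 25},
]


def slice_rows(records, **fixed):
    """Slice: keep only the records whose dimensions equal the fixed values."""
    return [r for r in records if all(r[k] == v for k, v in fixed.items())]


def roll_up(records, dims, measure="amount"):
    """Roll-up: sum the measure, keeping only the dimensions listed in dims."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


def pivot(records, row_dim, col_dim, measure="amount"):
    """Pivot: cross-tabulate the measure with row_dim down and col_dim across."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r[row_dim]][r[col_dim]] += r[measure]
    return {row: dict(cols) for row, cols in table.items()}


if __name__ == "__main__":
    print(slice_rows(rows, year=2009))
    print(roll_up(rows, ["region"]))
    print(pivot(rows, "region", "year"))
```

Rolling up by region alone collapses the year and purpose dimensions into a single total per region, while the pivot lays regions against years in the same way a spreadsheet pivot table does.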
In the relational database for a loan system, these operations are used to analyse the data in segments before the results are stored for easy reference.

Task 2

Using pivot tables, we can analyze a given set of data with various techniques to produce different views of the data, modelled in various ways. The table below shows data/graphs modelled using pivot tables (Yeates, 2001).

a) In the analysis of the Excel data for sales figures in the different regions, we are able to draw several conclusions. First, we notice that the glue gun is the best selling product in three regions, with the exception of the West region. The North region has average consumption of the three products. The buying patterns in the North and East regions nearly replicate each other in the data set. For example, we see that glue guns are popular in these regions, while the level of demand for light sabres is the same in both. Moreover, the South and West regions also display the same demand patterns for the three products. In these two regions demand for transponders is high while the demand for light sabres is low. This is seen in figure 5 in the appendix section.

b) From the analysis of the sales figures for the different salesmen in the different regions, we see that Luke Skywalker is the top sales person. He tops the sales figures in three regions, the exception being the North region. We are also able to deduce that Chewbacca is a consistent sales person, because his sales figures in all the regions rarely differ. Hansolo is the worst sales person among the four sales people, and this could be attributed to low sales in the West region. The same trend can be seen in the sales figures registered by James Kirk. Generally, all the sales people sold fewer products in the West region, leading to low sales there, as seen in figure 6 in the appendix section.
c) From the analysis of the sales figures in the Excel file, we learn about the purchases made by different organizations. In the analysis using pivot tables, we see that Enterprise was the highest purchaser of the different goods. Planet, on the other hand, was the lowest purchaser of goods among the three customers. Generally, the sales for the year 2008 were lower than the sales for 2009. Transponders were the least selling goods among the three products. Planet performed poorly due to poor sales in the year 2008. Overall, glue guns were popular among these customers, since they sold a lot of units.

d) From the analysis, we see that the South region has the highest sales among all the regions. Luke Skywalker was the best salesman among the sales people. Glue guns were the best selling product in three of the regions. Transponders and light sabres recorded different sales figures across the regions. For instance, in the North region transponders sold more than light sabres, while light sabres sold more than transponders in the other regions. Hansolo was the worst performing salesman among all the sales people, and he sold fewer goods in the West region.

Conclusion

The world of business intelligence is complicated since it involves making use of data from different locations. This data is made up of complex and simple variables arranged to make up a complete dataset. In most cases, these datasets consist of huge sets of variables, which hinders research into different sections and aspects of the business. As a result, data modelling has been used to make the analysis of these datasets easier, and in most cases different strategies are deployed. For instance, the CRISP-DM and SEMMA methodologies are quite effective. In this essay we looked at how CRISP-DM was deployed in analysing and drawing inferences on the process of securing a loan from a bank.
The process was quite tedious since it involved analyzing relational datasets. In the last exercise we looked at ways of using pivot tables to analyze the different datasets and, at the same time, at the importance of this kind of data analysis. Data warehousing is indeed an important aspect of business, since the data stored can later be used in the decision making process. It is therefore prudent to combine these different techniques of data modelling, data warehousing and other analysis tools in making business decisions.

References

Abramowicz, W. and Żurada, J., 2011. Knowledge discovery for business information systems. Boston, MA: John Wiley and Sons.
Hernandez, T., 2006. Progressive Methods in Data Warehousing and Business Intelligence. Canadian Journal of Regional Science, 29(1), pp.31-35.
Olson, D. and Delen, D., 2011. Advanced data mining techniques. Sydney: Cengage Learning.
Paredes, J., 2009. The Multidimensional Data Modeling Toolkit: Making Your Business Grow. Lowell, MA: John Wiley and Sons.
Pittman, K., 2008. Comparison of data mining techniques used to predict student retention. Chicago, IL: Routledge.
Reeves, L., 2009. A Manager's Guide to Data Warehousing. London: Palgrave.
Silverston, L. and Agnew, P., 2002. Data, models, and decisions: The fundamentals of management science. New York, NY: Lippincott Williams & Wilkins.
Sullivan, O., 1996. Data Warehousing. ABA Banking Journal, 88(1), pp.54-57.
Yeates, M., 2001. Business Intelligence & Data Modelling: A Developing Field. Canadian Journal of Regional Science, 24(1), pp.22-24.

Appendix

Figures 1 to 7
