Web Page Classifier System

Introduction

Document classification demands accuracy and must preserve the integrity of the information being classified. These concerns come into play whenever a classifier system is designed. Most developers of such systems aim to produce a system that makes articles easy to manage and easy to navigate. The system covered in this report, however, has several problems.

One notable problem is the correctness of the classified documents. The system does not guarantee that the documents it has classified are classified correctly. The user therefore has no assurance that the classifications and search results the system returns are correct, and could be getting wrong results. The criterion used to make the classifications is not sound, and users cannot rely on the results of the queries they submit. This is one area that needs to be looked into.

Another problem is that documents can only be classified according to existing rules. When a user tries to classify documents that are not yet in the system, it is hard to integrate them into the existing categories. The rules in place force the user to create new rules that either alter the existing categories or create entirely new category folders. This creates redundancy in the classification system, because many folders end up serving the same purpose.

Case study

The web classifier system will be analyzed using The Australian magazine as a case study.
The Australian magazine has many categories, and each plays an important role in the magazine. The most important include National Affairs, Business, Australian IT, Higher Education and Cars, to mention but a few. Each category has editors and journalists who carry out research in it. Without good research and innovation, the magazine's marketing and popularity will suffer. There is therefore a need for a good classification system that handles these categories well. Each category has a chief editor who manages the editing and publishing of articles. The system must be easy to manipulate because the content is dynamic.

Overview of similarity-based classification

The problem of classifying samples by their pairwise similarities has two parts: measuring the similarity that exists between samples, and classifying the samples by making use of those pairwise similarities. The system that has been developed classifies by making use of the similarities that exist between samples (Huang, Heutte & Loog 2007).

The developed system has several important features. Top of the list is the ability to create folders and subfolders. This simplifies classification, as subfolders allow documents to be stored neatly, and users can easily identify the documents and folders that interest them. At the user level, every user is assigned a domain in which to save their classifications. This way, the whole system is not affected if one user feels the need to change something (Oza 2005).

In standard learning, samples are represented by d-dimensional numerical feature vectors in a Euclidean space. Each element of such a vector quantifies a given feature of the sample.
An example is that of fish, where the width of a fish and the lightness of its scales might be two features used by a system that packs similar fish together (Lanzi, Stolzmann & Wilson 2001). The natural way to assess the similarity between two fish is to compute the Euclidean distance, or some other dissimilarity, between their feature vector representations. In general, the feature vectors may be taken to be embedded in a linear d-dimensional space endowed with a metric that measures their dissimilarity. Classic classifiers rely on the numerical feature vector representation to describe the samples and on the accompanying distance metric to quantify their pairwise similarities (Huang, Heutte & Loog 2007).

In practice, judging the similarity between samples involves many disparate data types, which poses challenges for data representation and quantitative comparison. For example, multimedia databases can store text, video and audio, while Internet databases store information such as mouse click histories, user profiles and the marketing rules used on the site. These database objects are described by both numerical and non-numerical data. Data held in such disparate formats does not fit naturally into a single metric space, so it may not be reliable to rely on metric space classifiers (Perez et al. 2011). In addition, in some applications only the pairwise similarities can be observed, and the underlying features may not be accessible. The need to bring data types together and to generalize metrics to similarities has prompted research into various kinds of similarity functions (Huang, Heutte & Loog 2007).
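
The fish example above can be made concrete with a short sketch. It is only an illustration under stated assumptions, not the report's implementation: the feature values, the labels "type A" and "type B", and the helper names euclidean_distance, similarity and nearest_label are all hypothetical, and the mapping from distance to similarity (1 / (1 + d)) is just one common choice.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map a distance to a similarity in (0, 1]; identical vectors give 1.0.

    The 1 / (1 + d) form is an assumption made for this sketch.
    """
    return 1.0 / (1.0 + euclidean_distance(a, b))


def nearest_label(sample, labelled_samples):
    """Return the label of the labelled sample closest to `sample`."""
    closest = min(labelled_samples,
                  key=lambda item: euclidean_distance(sample, item[0]))
    return closest[1]


# Hypothetical fish described by (width in cm, lightness of scales).
labelled_fish = [
    ((10.0, 0.8), "type A"),
    ((4.0, 0.3), "type B"),
]

new_fish = (9.0, 0.7)
label = nearest_label(new_fish, labelled_fish)
```

Here the new fish is assigned the label of its nearest neighbour. In a real system the features would normally be rescaled first, because width in centimetres would otherwise dominate a lightness value that lies between 0 and 1.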

Importance of similarity-based classification

In the case study, similarity-based classification matters for several reasons. Users are free to delete folders, which makes the classification system easy to manage. There is some control over this: a folder cannot be deleted if it has a subfolder (Cazzanti & Washington 2007). The system also provides places where rules can be added. This is a better way of classifying, as it allows the heads of the various sections to add rules that help them manage those sections. Because files are easy to create and delete, new content is easy to create and manage in the various departments. The nature of the business undertaken at The Australian makes this a reliable and appropriate choice. Each media section has its own rules, and other rules can be added as needed. The rules must ensure that data and information are handled safely, and this information is contained in the rules created while manipulating the data (Chu & Lin 2005).

The classification system allows folders and documents to be given appropriate names. No predefined names are imposed, so users choose the names they see fit, subject to some naming conventions that must be followed. The user is free to choose the folder in which a selected article is saved. The path to a given article is shown in the classification area, which makes browsing the articles and folders easy.

Similarity-based classification is important because it allows articles with the same content to be put together, which also makes them easier to handle and alter. The Australian magazine has many sections that deal with different issues.
Each of these issues needs to be handled separately, and the issues dealt with in these sections change every day. There is therefore a need to ensure that each issue is handled professionally (Lanzi, Stolzmann & Wilson 2001).

Criteria to be followed

The system that has been developed has many articles to manage. Each section of The Australian will have its own articles, unique to that department. Similarity-based classification is therefore required to manage this information and achieve this organization. With a knowledge-based system, it would be hard to achieve this type of classification and organization. The root document needs to be broken down into smaller units that are easier to manage (Huang, Heutte & Loog 2007).

Another criterion is that many sub-categories will need to be created using the create-folder rule. These folders are created within the various categories, so the similarity-based classification criteria must be applied when they are created (Cazzanti & Washington 2007).

Performance of knowledge-based systems

Knowledge management is an emerging discipline that has been debated by many professionals. The discipline brings together data quality, data management, business process management, and management of the risks that surround the handling of data in any organization. It is through data governance that an organization gets a chance to exercise control over its business processes and the inferences associated with them (Cazzanti & Washington 2007).

One controversy in knowledge management is that it has been seen as a small sub-discipline of information systems. This is not the case, as knowledge management and information management are far apart. Information management entails the response to anticipated stimuli using predetermined steps.
This means that the responses made in information management are all anticipated in advance: a problem must have been foreseen and a solution created for it. Knowledge management, on the other hand, consists of responses to new opportunities and challenges (Lanzi, Stolzmann & Wilson 2001).

Data governance is a collection of processes that ensures that important and confidential data are managed in a formal way throughout the enterprise. It ensures that the integrity of data is assured and that people can trust the data at any stage of handling. To achieve this, people are held accountable for managing the organization's data and for keeping its quality high. People are also given responsibility for fixing and preventing data issues whenever they arise, which keeps data handling efficient. Data governance is about using technology and educating people on the importance of, and the techniques required for, managing data efficiently. When companies want total control of their data, they have to empower their people and employ the right technologies to make this achievable (Sun 2004).

Knowledge management brings many benefits for The Australian. One benefit is increased productivity, since knowledge management offers a way of solving problems in a volatile environment. Data governance is a key concept for any company, especially given how much there is to deal with in the data world.
As noted above, data governance holds people accountable for the quality of the organization's data (Roli, Kittler & Windeatt 2004) and depends on empowering people and employing the right technologies (Perez et al. 2011).

One of the main challenges in implementing knowledge-based classification is the volume of data that needs to be managed. The implementers must also gauge the relevance of the data to be included, and the data should be accessible and of good quality. With data being added in large volumes through data mining, there is a lot to sieve through when looking for quality data to put into the knowledge base. As The Australian develops its knowledge base, data must be vetted for quality so that only the required data is included (Oza 2005).

Another challenge in implementing knowledge-based classification is the handling of tacit knowledge, which is pivotal in the organization. Handling tacit knowledge well positions the organization for success. It is naturally hard to convert the knowledge in people's heads into a form that can be communicated (Li, Wang & Dong 2005).
One characteristic of tacit knowledge is that it keeps changing, reshaped by its owner according to the owner's new experiences. Because it is ever changing, it needs to be handled dynamically, and the changes must be reflected in the knowledge base (Lanzi, Stolzmann & Wilson 2001).

Comparison with similarity-based classification

When knowledge-based and similarity-based classification are compared, it is clear that the similarity-based approach is more advantageous. Knowledge-based classification relies on databases more than on classification, whereas similarity-based classification makes use of trees and similar articles (Huang, Heutte & Loog 2007). Knowledge-based classification makes use of a pool of data and the way information is organized in that pool. It therefore uses database terminology and search algorithms to look for information and data. In similarity-based classification, information is searched according to its propensity to be similar to other articles and content (Chu & Lin 2005).

In terms of speed, similarity-based classification is found to be faster than knowledge-based classification, because pairwise-similar data are grouped together and are therefore easier to manage. It is easier to look for data from a similar group because it is known where they reside. In knowledge-based classification, the program has to look for information everywhere. Because there is no defined place where information can be looked up, the process is tedious and takes time before the article is found. This is one drawback of knowledge-based classification (Butz 2002).

Evaluation of auto classification results

Given the range of people who use the system, it is normal for different people to create different rules. Each of these rules has to be reflected in the system, and users should not be confined to rules created by others. Different users have different tastes and therefore different rules.
The system gives these users that freedom. The authors and editors at The Australian have different stories that fall under different categories, including sports, news, nature and research. All of these require the authors to have different rules for storing them.

The web classifier system classifies documents using rules that are created by users. Rules help to classify documents according to the procedures carried out in the organization. The classifier system uses a similarity-based function to put similar documents together. As argued earlier, the system is not fully correct, because the correctness of the data is not assured. It therefore calls for keenness and thoroughness on the part of the users, and it is an issue that requires the attention of the system's developers.

When a document is used as a similarity base for finding other documents in the system, the system returns five other documents. This would be an efficient way of looking for documents within the system. The only problem is, again, correctness: there is no guarantee that the system returns the right results. If this issue is dealt with, the system will be a good and handy tool for classification. It will suit The Australian, where the large number of documents and articles to be classified daily makes this tedious work.

Possible extensions

Since similarity-based classification is based on the similarity of pairs, there is a need to place it on a generative footing. The framework needs to be extended so that it is generative, and the system that has been developed needs to be extended to include generative features (Cazzanti & Washington 2007).
Classifiers obtained from a generative framework have advantages in terms of performance and interpretability. The generative framework has not been disregarded altogether, but it has been applied where samples are described by numerical vectors, which is the standard description used in metric learning. The probabilistic models are based on descriptive statistics for the classes. It is important to have access to probability estimates (Cazzanti & Washington 2007).

In the extension proposed here, the framework should be able to seamlessly accommodate multi-class classifiers with asymmetric costs and class priors. In addition, probability estimates are easily integrated into larger systems and are useful for identifying abnormal samples that have low probability in every class. The extensions proposed in this paper arise as solutions to constrained maximum entropy problems, where the constraints are placed on the mean values of the similarity-based descriptive statistics.

Conclusion

The research shows that different classification systems are suited to different scenarios. In our case, similarity-based classification is appropriate, as the rules are based on the users and on how they would like the processes in the system to be carried out. Users are the ones who dictate how items are classified in the system. In knowledge-based systems, databases are used, and items are categorized with other items that are similar. In similarity-based classification, rules are created at the discretion of the users, whereas in knowledge-based systems rules are created by the database administrators. In a media company like The Australian, users need the freedom to create their own rules. This is the approach that the system has taken.

References

Butz, M 2002, Anticipatory learning classifier systems, Springer, New York.
Cazzanti, L & Washington, UO 2007, Generative models for similarity-based classification, University of Washington, Washington.
Chu, W & Lin, T 2005, Foundations and advances in data mining, Springer, New York.
Huang, D-S, Heutte, L & Loog, M 2007, 'Advanced intelligent computing theories and applications: With aspects of contemporary intelligent computing techniques', Third International Conference on Intelligent Computing, Springer, New York.
Lanzi, PL, Stolzmann, W & Wilson, S 2001, 'Advances in learning classifier systems', 4th international workshop, IWLCS2001, San Francisco, CA, USA, July 2001, Springer, New York.
Li, X, Wang, S & Dong, ZY 2005, 'Advanced data mining and applications', First international conference, ADMA2005, Wuhan China, July 2005, Springer, New York.
Oza, N 2005, 'Multiple classifier systems', 6th international workshop, MCS 2005, Seaside, CA, USA, June 2005, Springer, New York.
Perez, J, Corchado, J, Moreno, M, Matthieu, P, Canada-Bago, J, Ortega, A & Fernadez, A 2011, 'Highlights in practical applications of agents and multiagent systems', 9th International conference on practical applications of agents and multiagent, Springer, New York.
Roli, F, Kittler, J & Windeatt, T 2004, 'Multiple classifier systems', 5th international workshop, MCS 2004, Cagliari, Italy, June 2004, Springer, New York.
Sun, B 2004, A web page classification system using genetic algorithm, Lamar University, Texas.
