Language Assessment Test Evaluation: How Useful is the Test for its Intended Purpose?

Running Head: Language Test Evaluation
Name:
Course:
Tutor’s Name:
22nd October, 2009

Contents
1.0 A Brief Description of the Assessment Task
2.0 Bachman and Palmer’s Framework Principles
3.0 Evaluation of the Language Test
3.1 Reliability
3.2 Construct Validity
3.3 Authenticity
3.4 Interactiveness
3.5 Impact
3.6 Practicality
4.0 Summary of the Evaluation
5.0 Reference List
6.0 Appendix

1.0 A Brief Description of the Assessment Task

The language test paper presented in the appendix as the case to be evaluated is a key stage two, year three paper (Appendix A), one of the test papers given at the end of each year to evaluate the progress made by the children at any level or stage (English Test, 2008; English Test b, 2008). It is a Standard Assessment Test (SAT) paper used nationally in England, Wales and Northern Ireland (WJS, 2009). The aim of the test paper is to determine the children's levels compared to the national standard levels and, by measuring those levels, to establish the progress the children have made. A child of average ability should have reached level 2 by the end of key stage one, and a child at key stage two should have reached level 4 by the end of the year (WJS, 2009). The SATs help teachers determine the weaknesses and strengths of their pupils. The English test, for example, helps teachers find out the areas where the pupils are weak, what they have learnt and retained, and where their strengths lie.
The tests show what the pupils have learnt during the teaching period and help grade the children into different levels across all subjects. There are Mathematics, Science and English SATs. The test results show whether a pupil has attained the expected national curriculum level at the end of key stage two (level 4) (WJS, 2009).

The text presented is a narrative in a silent reading format with different response formats, including single-word questions and open-ended questions. It is timed, and the timing is uniform for all examinees. It has different types of questions, including inferential, literal and lexical questions, and assesses different skills the children have acquired in school (English Test, 2008; English Test b, 2008). These are reading skills, written language skills and others, all included in the comprehension skills evident in the schemes of work (Appendix C). Attainment of the national curriculum level is the main objective of the teachers, and the test paper assesses the ability of the pupils in order to determine their levels. But is this test paper useful? Test usefulness is evaluated in this paper using Bachman and Palmer’s framework, which treats interactiveness, authenticity, construct validity, impact, reliability and practicality as the qualities of test usefulness.

2.0 Bachman and Palmer’s Framework Principles

Three principles guide the use of this framework in evaluating a test.
These are:

1) That “Test usefulness and the proper balance amid the different qualities cannot be prescribed in general, but must be determined for each specific testing situations” (Bachman & Palmer, 1996, p. 18).
2) That “It is the overall usefulness of the test that is to be examined, rather than the individual qualities that affect usefulness” (Bachman & Palmer, 1996, p. 18).
3) That “The individual test qualities cannot be evaluated independently, but must be evaluated in terms of their combined effect on the overall usefulness of the test” (Bachman & Palmer, 1996, p. 18).

3.0 Evaluation of the Language Test Usefulness

3.1 Reliability

An important question to ask when evaluating this test paper is whether the scores obtained are reliable. According to Bachman and Palmer (1996), reliable scores are consistent: they should not vary with the assessor, the testing situation or the raters. The test paper allocates specific marks under each section. Section 5, for example, covers the questions on pages 4 and 5, and each question has marks allocated to it (English Test b, 2008). This allocation does not change with the assessor, the rating scale or the test situation, so it is consistent. Reliable scores give reliable results that can be used to grade the students (Bachman & Palmer, 1996; Westbrook, 2009). The interviewee also commented that the allocation of marks gives teachers exactly what they want. He noted that with this kind of grading, any pupil could be assessed anywhere and graded with others, provided they were all at the same level (Appendix B).

3.2 Construct Validity

Construct validity concerns the validity of the interpretations of the test scores, which grade the children into various levels (Bachman & Palmer, 1996). The question is how this interpretation is made valid. Specific levels correspond to specific abilities, and these abilities are measured by particular English language questions or tasks.
The national curriculum provides guidelines on what should be covered at each key stage and the skills the children should acquire. These skills are assessed by means of the tests provided. It is the national curriculum setters or the assessors who determine what marks to allocate to specific parts of the assessment tasks, based on the skills being assessed. In the paper provided, the Qualifications and Curriculum Authority allocated the marks with the aim of testing what the children had learnt over a whole year and determining their levels. The appropriateness and meaningfulness of the interpretations of the test scores therefore lie in the content of the test (Bachman & Palmer, 1996; Westbrook, 2009). According to Burt (1996), comprehension tests have high construct validity since they are based on the curriculum of each stage. Comprehension is a test that measures a variety of skills. It is an advanced test that measures writing and reading skills and the ability to communicate in writing (LearningRX Center, 2009). Some of the comprehension questions examine the children’s understanding of the texts within the comprehension (Alderson, 2000; Rathvon, 2004). The comprehension presented as the case test paper tests the following skills, all of which should be attained by the end of key stage two.

The Comprehension Skills Assessed

As noted earlier, the test paper has a silent reading format. Silent reading formats assess whether the examinees actually read the stimulus material, whether they answer the questions by guessing, based on their ability to make out some key words and to make inferences from context and pictorial clues, or whether they answer the questions at random.

Constructed and Multiple Choice Responses: The test paper has a constructed response format (English Test b, 2008), which instructs the examinees to answer questions using single words or extended verbal productions (Rathvon, 2004; Burt, 1996).
It also has multiple choice questions, which require the examinee to select the correct response from a list of given responses (English Test b, 2008). Multiple choice questions measure recognition, but they do not measure the ability to recall or to construct meaning from what has been read, and they are highly subject to guessing. Constructed responses, by contrast, measure the ability of the examinee to construct meaning from the text (for example, the question that asked “Explain why Garnet wanted it to rain?”) (English Test b, 2008) and to recall what was read earlier. They provide a lot of information for instructional planning and diagnosis (Alderson et al., 1995; Rathvon, 2004).

Written Language Skills Assessed

The paper also assesses written language proficiency, which covers a variety of skills with the following components:

a) Grammar/linguistics: the child’s ability to use syntax, vocabulary and sentence structure correctly.
b) Content: the ability to produce a meaningful communication.
c) Handwriting/copying: the ability to form legible words, sentences, letters and numbers.
d) Conventions: the ability to apply capitalization, spelling and punctuation rules.
e) Writing fluency: the automatic nature of the examinee’s writing (Wren, 2009; SEDL, 2009).

As indicated earlier, a test should be valid for the stage or level of the examinees taking it. The case presented belongs to key stage two examinees, and all the assessment is done with this level in mind. Each level has a specific curriculum that determines what should be taught, and these different curricula differentiate the requirements of each level (Lewis, 1995). The interviewee, however, had a contrasting view of the skills assessed. He noted that the test covers a variety of skills, which makes it difficult to interpret the results.
He extended his comments, however, by noting the test's importance. He said that the paper is nevertheless very good, since it measures the ability the child has gained from one year of teaching. It would not be appropriate to let the child move on without knowledge of what progress he or she has made, or without knowledge of the teacher's efforts (Appendix B). The teacher plays a very important role in molding society by providing knowledge, and that knowledge begins at a tender age. At this age there are basics or guidelines that should be followed in teaching, and following them leads to continuous progress through the stages. Progress cannot be established without assessment, and the test is one very important assessment of progress (Appendix B).

3.3 Authenticity

Authenticity is the extent of correspondence between the language test tasks and the features of the target language use task. According to Bachman and Palmer (1996), to be considered useful a test should show the real abilities of the child in other domains as well, not in the test alone. This is like testing the application of the knowledge gained in school. In the comprehension provided, the aim is to examine the child’s understanding of text and to assess the variety of skills mentioned previously. The test would be considered authentic if the child can understand the content and meaning of other stories, not only in an exam situation (Bachman & Palmer, 1996; Westbrook, 2009). During teaching, the curriculum may require a child to be able to comprehend a text. Comprehension is not confined to tests, and this provides the basis for investigating whether the scores allocated are valid. That is, the target language tasks provide the basis of assessment and determine what scores to allocate. The children have learnt many skills during the year, and these skills are based on features of the target language (English) such as speaking, syntax, lexis and several others.
The test is authentic since it assesses several skills that correspond to various English language tasks. One example is the ability to comprehend, which is tested in the paper and is a feature of the English language. Many other skills have also been assessed, as indicated above.

3.4 Interactiveness

Interactiveness is the extent of the test taker’s involvement in the exam. Bachman and Palmer indicated that there are three individual characteristics that should be tested in language: topical knowledge, language ability (strategic competence or metacognitive strategies, and language knowledge) and affective schemata (Allison, 1999; Bachman & Palmer, 1996; Westbrook, 2009). Interactiveness is measured by the level of engagement of these characteristics, and a test developer should consider all of them when developing a language test. Bachman and Palmer also noted that interactiveness provides a link with construct validity. On this argument, a test has to engage all types of individual characteristics differently in order to produce results that are based on those characteristics. Interactiveness explains why some pupils belong to level 3, 4 or 5 after a test. The given sample is a very good example of an interactive test. Different skills are tested, and different skills require different individual characteristics, all of which are engaged when interacting with the test tasks (Bachman & Palmer, 1996; Westbrook, 2009).

3.5 Impact

Test usefulness is also measured by the test's impact on society and on educational systems. A test with no impact on society or the educational system is considered useless (Bachman & Palmer, 1996; Westbrook, 2009). The aim of the Qualifications and Curriculum Authority was to ensure that more key stage 2 pupils attain level 4. That can be achieved only by improving the educational system, which will ensure that more pupils improve (WJS, 2009).
A test that engages pupils’ characteristics, is reliable, is authentic and has high construct validity helps educational systems achieve their aim of improving performance in schools at different levels. This sample test paper has all these characteristics. My interviewee supported this, noting that the test improves not only performance in school but also the performance of society, considering the role education plays in it (Appendix B).

3.6 Practicality

This quality concerns the development of the test, the methods of implementing it and whether it will be used at all. It considers the resources available, from human resources to stationery. The available resources should be considered before a language test is developed; otherwise the test will be impractical. The test presented in the appendix was taken by key stage two pupils as their end-of-year English Language task, which means that the resources for its development were available and that it was administered. It is impractical to set a test that cannot be taken by pupils of the same level in different regions. This is not the case with the test paper presented. As noted earlier, it is a national assessment paper.

4.0 Summary of the Evaluation

A test has to be reliable, practical, authentic and interactive, and it has to have construct validity. To be considered useful, it also has to have some positive impact on teachers, educational systems and society (Bachman, 2009). All these qualities have been considered in the evaluation of the English test paper provided in the appendix. Based on the evaluation, it is evident that the test paper is very useful, with limited weaknesses raised only by a user. It gives reliable results since the test questions are standardized and every key stage two child gets similar questions from the same comprehension passage.
It has a standardized assessment strategy and adequate levels of reliability and validity, and it clearly assesses comprehension together with vocabulary, background knowledge, word reading and other skills. The test takes the form of silent reading, which is considered a more accurate way of assessing the children’s abilities since they focus on accurate reading rather than on reading aloud, which requires them to comprehend the meaning of the text as they read (Rathvon, 2004; SEDL, 2009). Its qualities are attributed to the fact that it is a national, standardized paper based on the curriculum. Exam authorities have to ensure that the tests they produce for assessment are very useful.

5.0 Reference List

Alderson, J. C. (2000). Assessing Reading. Cambridge, England: Cambridge University Press.
Alderson, J. C., Clapham, C. & Wall, D. (1995). Language Test Construction and Evaluation. 2nd Ed. Cambridge, England: Cambridge University Press.
Allison, D. (1999). Language Testing and Evaluation: An Introductory Course. Boston, MA: World Scientific.
Bachman, L. F. (2009). Statistical Analyses for Language Assessment: Test Usefulness. Chapter 1. Cambridge: Cambridge University Press. http://assets.cambridge.org/97805210/03285/excerpt/9780521003285_excerpt.pdf.
Bachman, L. F. & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.
Burt, A. M. (1996). Key Comprehension. Oxford, UK: Ginn.
English Test. (2008). Rain and Shine Booklet. Retrieved on 18th Oct, 2009 from: http://www.stjosephspickering.nyorks.sch.uk/SATs%20papers/English%20SATs%20papers/2008%20English/Reading%20Booklet%202008.pdf.
English Test b. (2008). Reading Answer Booklet Rain and Shine. Key Stage 2 Levels; 3–5. Retrieved on 18th Oct, 2009 from: http://www.st-josephs-pickering.nyorks.sch.uk/SATs%20papers/English%20SATs%20papers/2008%20English/Reading%20Answer%20Booklet%202008.pdf.
LearningRX Center. (2009). Reading Comprehension Skills. Retrieved on 20th Oct, 2009 from: http://www.learningrx.com/reading-comprehension-skills.htm.
Lewis, A. (1995). Primary Special Needs and the National Curriculum. 2nd Ed. London: Routledge.
Rathvon, N. (2004). Early Reading Assessment: A Practitioner's Handbook. New York: Guilford Press.
SEDL. (2009). Reading Assessment Techniques. Retrieved on 20th Oct, 2009 from: http://www.sedl.org/reading/framework/assessment.html.
Westbrook, C. (2009). Developing and Validating a Test of English Language Skills for Admissions Purposes. Retrieved on 19th Oct, 2009 from: http://portal-live.solent.ac.uk/course/teach_learn/news/resources/cwestbrook_poster.pdf
Woodlands Junior School (WJS). (2009). Understanding SATS Tests & Reports in 2009. Retrieved on 18th Oct, 2009 from: http://www.woodlands-junior.kent.sch.uk/SATS.html
Wren, S. (2009). Methods of Assessing Cognitive Aspects of Early Reading Development. The Southwest Educational Development Laboratory. (800) 476-6861 http://www.sedl.org/reading/topics/assessment.pdf.
