SERVE Center for Continuous Improvement

State Assessment

Test Scores Are an Estimate, Not a “True” Score

Tests are not like thermometers, which are very accurate in determining a person’s temperature. They are more like blood pressure gauges, which give variable results. For example, Table 1 shows an individual’s systolic blood pressure (SBP) readings for five consecutive days.

Table 1: Systolic Blood Pressure Readings

Day        Systolic Blood Pressure
1          124
2          134
3          126
4          136
5          121
Average    128

SBP between 130 and 139 is considered “High Normal,” while SBP between 120 and 129 is considered “Normal.” If the individual had only the Day 2 or Day 4 reading, a call to the doctor might seem to be in order. If only the Day 1, 3, or 5 reading were used, the individual would not call the doctor. The best estimate of SBP is the five-day average of 128, which falls in the Normal range but close to the High Normal boundary, so this individual might want to think about calling the doctor. This variability is known as measurement error. That is, blood pressure readings naturally change because of the time of day, the amount of stress the individual is under, or the placement of the cuff. Similarly, achievement test scores may vary for a number of reasons (for example, a person’s anxiety level on a certain day). So, a student who took the same test on different days might get slightly different scores.
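The arithmetic behind this example can be checked with a short Python sketch. The readings and the two ranges come from Table 1 and the paragraph above; the function name `classify_sbp` is made up for illustration, and readings outside the two named ranges are simply left unlabeled.

```python
from statistics import mean

# Readings from Table 1 (day -> systolic blood pressure).
readings = {1: 124, 2: 134, 3: 126, 4: 136, 5: 121}

def classify_sbp(sbp):
    """Label a reading using only the two ranges named in the text."""
    if 120 <= sbp <= 129:
        return "Normal"
    if 130 <= sbp <= 139:
        return "High Normal"
    return "outside the ranges discussed"

# Any single day can land in either category.
for day, sbp in readings.items():
    print(f"Day {day}: {sbp} -> {classify_sbp(sbp)}")

# The five-day average is the steadier estimate.
average = mean(readings.values())
print(f"Five-day average: {average:.1f} -> {classify_sbp(round(average))}")
```

Two of the five single-day readings fall in one category and three in the other, which is why a single reading is a shaky basis for a decision.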

There is some uncertainty in any measure. Even physical measurement instruments (like scales, gauges, rulers, and the like) have some room for error, and the amount of error is often stated in the manufacturers’ specifications. Measures of things “inside the head” (like knowledge, skills, attitudes and so on) have even more uncertainty because unlike physical measurements, such “mental” measures are indirect. Test questions or opinion items are constructed to tap into what’s inside the head, and are thus subject to the interpretation of whoever is taking the test. So even good test questions yield scores that contain error.

Tests that report scores usually give confidence bands around scores. A confidence band is a range of scores around the score the student received that includes the margin of error. If a student scores 150 on a test, and the confidence band is 140-160, that means the student’s true performance is estimated to be in that range, and might fluctuate within that range if the student took the same test again. So the confidence band is a better descriptor than the actual score on a given day.
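As a rough illustration, the band in this example can be written as the observed score plus or minus a margin of error. The sketch below assumes a margin of 10 points, which is what the 140-160 band around a score of 150 implies; the retest scores are invented for illustration, and real score reports compute the margin from the test's own statistics.

```python
def confidence_band(score, margin_of_error):
    """Return the (low, high) range around an observed score."""
    return score - margin_of_error, score + margin_of_error

observed = 150
low, high = confidence_band(observed, margin_of_error=10)  # assumed margin
print(f"Observed score {observed}, confidence band {low}-{high}")

# Hypothetical retest scores: anything inside the band is consistent
# with the same underlying performance.
for retest in (143, 158, 165):
    inside = low <= retest <= high
    print(retest, "inside band" if inside else "outside band")
```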

Some tests report scores in categories. For example, North Carolina reports students in Achievement Levels I, II, III, and IV. Florida reports Levels 1 through 5. These categories are based on setting cut scores on the score scale for each category. You can usually find the cut scores on a given test on the state’s website, if you are interested. For example, see page 4 of Understanding FCAT Reports 2004 for Florida’s achievement levels.

Some state tests provide individual printouts with confidence bands around the scale scores, showing how they overlap with different achievement levels. You can see an example in the Georgia 2004 CRCT Score Interpretation Guide (PDF). Georgia reports achievement in three categories: Does not meet, Meets, and Exceeds (the standard). Individual student score reports, however, show the student’s scale score and confidence band overlaid on a graph, so you can see whether the whole range of possible performance (the whole confidence band) is in one achievement level or whether it overlaps onto another.
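The idea of a confidence band overlapping achievement levels can be sketched in Python. The cut scores (300 and 350), the student score, and the margin below are hypothetical values chosen for illustration and are not taken from any state's documentation; only the three category names follow the Georgia example above.

```python
from bisect import bisect_right

# Hypothetical cut scores: the lowest scale score that earns each higher level.
CUT_SCORES = [300, 350]
LEVELS = ["Does not meet", "Meets", "Exceeds"]

def level_for(score):
    """Map a single scale score to its achievement level."""
    return LEVELS[bisect_right(CUT_SCORES, score)]

def levels_in_band(low, high):
    """List every achievement level that the confidence band touches."""
    first = bisect_right(CUT_SCORES, low)
    last = bisect_right(CUT_SCORES, high)
    return LEVELS[first:last + 1]

score, margin = 305, 10                      # hypothetical student and margin
band = (score - margin, score + margin)
print(f"Score {score} -> {level_for(score)}")
print(f"Band {band[0]}-{band[1]} touches: {levels_in_band(*band)}")
```

With these made-up numbers, the reported level is Meets, but the band dips below the cut score, so the student's performance could plausibly fall in either of two levels.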

The purpose of categorizing students into levels of achievement is to make scores easier to interpret. Think of it this way: the scale score answers the question, “How did my child do?” and the level designation answers the question, “How good is that compared with expectations in my state?” The categories add information beyond the scale score alone.

Keep in mind, though, that testing experts repeatedly tell us that decisions should not be made based on a single test’s results (for more information, see www.ccsso.org). It takes multiple ways of looking at student achievement to determine whether students need additional help.

So remember that both the scale scores and the categories onto which they are mapped are subject to measurement error and are only estimates of “true” performance. If a child’s test performance is very different from what you expect, the estimate may be off. Check the student’s performance on some other measures.
