Last spring, when administrators gave spelling scores from standardized achievement tests to the Salt Lake Board of Education, the results spelled trouble with a capital "T."
Students in second through ninth grades scored below the 50th percentile in spelling. Third-graders scored at only the 43rd percentile, down from the 50th percentile two years earlier.
At first glance, it looked as though Salt Lake students needed a strong dose of remedial work. But are they really poorer spellers than their peers nationwide? The answer may lie more in the tests than in the students.
Tests measure student achievement and, by implication, school success. In the name of accountability, educators and the public have come to rely on standardized achievement tests. At times, in a sort of educational "keeping up with the Joneses," schools and districts have been pitted against each other, their test scores compared to see which is the most "successful."
But there is a growing national movement that sees standardized testing as a misleading yardstick by which the public and educators measure school success. Opponents of testing have formed their own special-interest groups, including the three-year-old National Center for Fair and Open Testing, of Cambridge, Mass.
A New Mexico physician, John J. Cannell, sued several test publishers after surveying 167 school districts. He found that 150 of them, including some whose schools have long been recognized as having serious problems, had standardized test scores above the national norm.
His results were soon dubbed the "Lake Wobegon Effect." Humorist Garrison Keillor calls his mythical town of Lake Wobegon a place where all women are strong, all men are good-looking and "all children are above average."
Somewhere between such strong criticism of test scores and unquestioning acceptance lies reality.
"We put too much emphasis on the tests themselves," said Steven Bossert, an expert on testing and chairman of the University of Utah's Educational Administration Department. "They measure only a very small fraction of what is expected to be learned in school."
It is a misunderstanding of what the tests are, what they measure and how the scores should be evaluated that leads to a distorted picture of achievement.
Several standardized tests are used widely. No school district is required to use a particular test, or any standardized test, for that matter.
In the metropolitan Salt Lake area, three different tests are used by the five school districts. Salt Lake, Granite and Davis use the Stanford Achievement Test, Jordan gives the Iowa Test of Basic Skills and Murray uses the Comprehensive Test of Basic Skills.
"Tests for the elementary grades tend to cover the same types of basic issues. There aren't huge differences in the items," Bossert said.
David Nelson, coordinator of evaluation and assessment in the State Office of Education, agrees. "It's not a question of which test is better but whether or not it does the job of measuring your curriculum."
Robert DeVries, who runs Salt Lake School District's testing program, said his school district chose the Stanford test because "it fits our needs the best. Generally, they are all fairly good tests. There are no real big differences, but there are differences in how they break out achievement."
The difference DeVries refers to is how the scores are clustered by various skills. In 1987, Salt Lake switched from the California Achievement Test to the Stanford because a committee of teachers preferred certain features in the Stanford test.
Within the next two years, most area school districts will purchase newly normed tests and, in the process, plan to use teacher committees to help select them.
"When we choose a standardized test, we go through the curriculum guide to be certain that the test covers what we're teaching," explained Barbara Brunker, Murray School District testing coordinator.
Test norm confusion
One large difference between the tests is the norm. While most people would recognize the difficulty in comparing scores from different tests, many are not aware of test norms that can throw a statistical monkey wrench into comparisons of the same test.
Students in the Salt Lake, Granite and Davis districts all take the Stanford test, but each district uses a version normed in a different year.
Unlike most other tests, standardized achievement tests have no passing score. Instead, they rely on a national norm set at the 50th percentile: half of the students in the norm group scored above this midpoint and half scored below it. A score is described as above or below average relative to this norm.
The norm is the midpoint of scores achieved by a sample group of students who took the test before it was released for general use.
Like pollsters conducting political opinion polls, testing company officials choose a sample group of students for test norming. Because these students are to statistically represent the U.S. student population, they're matched proportionately by race, socioeconomic status and geographic location.
For example, the norm group for the 1986 Stanford test included nearly 300,000 students, chosen to match the U.S. school population as reported in the 1980 census.
Because norming is expensive for the companies that develop the tests, and buying new tests is expensive for the districts that give them, a normed test typically stays in use for eight years, although most companies issue a renormed version of the same test after about four years.
Administrators and testing experts agree that the problem with this practice is that achievement changes over time, so the norms become outdated.
There is a general feeling nationwide that student achievement is increasing overall, so students would be expected to perform above the norm set several years ago. However, school administrators often use these "above national norm" scores to support claims of exceptional improvement, rather than factoring in the expected increase in scores.
"The closer you are to the norm, the more meaningful the results," said Nelson.
When Superintendent John W. Bennion took the helm of the Salt Lake District in 1985, he found the district was using a test normed in 1977. That led to the switch to the Stanford test normed in 1986. "I didn't have much confidence that it (the old test) said much about where we were at all," he said.
The startling drop in the district's spelling scores comes from an apples-to-oranges comparison of different tests and norms. The first results came from a California Achievement Test normed in 1977 and the second from the Stanford Achievement Test normed in 1986.
Assessing specific achievement
But there is another problem, peculiar to spelling, that underscores the danger of using a standardized test of general knowledge to assess specific achievement.
Bossert explained that it's difficult to assess spelling accurately because most children learn to spell from word lists based on their textbooks.
Some texts, for example, introduce the letter "X" in the second grade while "X" doesn't appear in others until the fifth or sixth grade. "Is it fair to rate a child's ability to spell using a test with words with which he is unfamiliar?" he asked.
"Curriculum alignment, especially in the early grades, is very important in standardized testing. Studies have shown that some districts use tests that only overlap (the curriculum) 40 or 50 percent of the time, while other districts are taking tests that overlap 90 percent of the time. Some districts in Utah have gone through the process of trying to align more closely with the tests," Bossert said.
Test critic Cannell believes that "teaching to the test," rather than poor curriculum alignment, is a bigger problem with standardized achievement tests. As teachers use identical tests year after year, they become familiar with the questions and can therefore gear their lessons to them.
Bennion said that may not be so bad, if the children are learning essential information. "In teaching, the objective is that you want the kids to learn what you teach them. If that's teaching to the test, fine. Maybe we should say the test is testing to the curriculum."
Another problem in assessing what standardized tests have measured is the socioeconomic background of the students. Educational studies have repeatedly shown that children from lower socioeconomic backgrounds don't do as well on standardized achievement tests as those from more affluent backgrounds.
"That doesn't mean the poor don't learn. They do, but the test scores can be low, even if the school is doing an excellent job," Bossert said.
Why does socioeconomic background affect test scores?
A number of factors are involved, said Bossert, including the parents' educational level, the availability of more books and educational toys in the home, language use and reading in the home, reading skills of parents and exposure to stimulating activities such as family vacations and trips to museums.
"There are exceptions, but the more affluent usually do better (on standardized tests). How would the average single mother, who may be working two jobs to support her family, have the opportunity to spend a lot of time with her child?" Bossert said.
Of course, the question for affluent students would be: Are their performances consistent with their advantages?
Bennion said, "You could have a pretty mediocre school in a high socioeconomic area, and the kids will look quite good on standardized test scores. That doesn't mean that they're getting the best shot that they should be getting from the school. Maybe they should have been at the 70th percentile, instead of the 55th percentile."
The experts said the same holds true for school districts. A more affluent, homogeneous school district should be expected to have higher test scores than one with a diverse student population.
But no school district should be expected to score extremely high. The tests have been specifically constructed to produce bell-curve results, with most students' scores bunched near the middle of the distribution, around the 50th percentile.
Nelson, of the State Office of Education, said he would be suspicious of any district whose average score topped the 80th percentile. "I'd want to look at how it got scored and who made Xerox copies of the test."
A narrow picture
With the limitations of standardized testing, what value do the experts see? All agree that standardized testing is worthwhile, to a degree, but they emphasize that it only gives a narrow view of the schools at a specific time.
"It's a general indicator of how we're doing relative to other students and districts around the nation, but by itself, standardized testing is not very adequate," Bennion said.
But educators also said test scores from a single year become more meaningful when compared with scores from other years. With a series of scores, the progress of a student, school or district can be charted to reveal trends.
Still, the educators favor the move toward criterion-referenced testing, in which the test is matched to the material taught in class. They say it offers a more valid assessment of academic performance.
The State Office of Education is developing criterion-referenced tests for Utah. Granite District has been involved in pilot trials of the new state-developed tests and may choose to spend more money on criterion-referenced tests. But it's not abandoning standardized tests.
"It will be valuable to have both," said James Henderson of the district staff. "We still need to see how we are doing comparatively in the nation."
Salt Lake's Bennion added, "Standardized testing is here to stay, but I think there is a growing recognition that it is inadequate as the sole measure of student progress, school progress or district progress. As time goes on, I think we'll be able to show parents what it measures and what it doesn't measure."