In Districts Experimenting With New Tests, Writing Questions Is Only Half The Task

Apr 2, 2015

Teachers at a calibration workshop sort out which activities they will be grading.
Credit Sam Evans-Brown / NHPR

While hundreds of parents in the Manchester school district are pulling their children out of the state’s new standardized test, the Smarter Balanced, four districts are trying something new. The Rochester, Sanborn, and Epping school districts, along with Souhegan High School, have recently received permission to design and implement their own assessments.

Think back to the standardized tests you took when you were in high school. Did you ever get a math question like this?

Your town’s population is predicted to increase over the next three years. As one of the town’s planners, you are asked to address this issue in terms of the town’s water supply. In order to meet the future needs of the town you need to design and propose a water-tower somewhere on town property that will be capable of holding 45,000 cubic feet of water.

Another example: a student is told their parents won the lottery, and is asked whether they should take the money as a single payment or in installments, given that they are saving for retirement.

These questions will show up in front of students in four districts.

“So again, another real world application and they’re applying the geometry pieces,” says Mary Moriarty, assistant superintendent from Rochester, as she reads over the task intended to be given in a geometry class.

About a month ago, those districts learned the federal government will allow them to use these tests to cut the number of times their students take the new statewide standardized test – called the Smarter Balanced – from seven to three.

Teachers in these districts believe these long, multi-part questions are a better barometer of whether kids are learning the things they need to know.

“Years ago we probably would have just asked someone to graph an equation. We would have given them an equation and asked them to graph it,” says Moriarty. “Now we’re giving them a scenario, and they have to take the math they know to create the analysis for that scenario.”

You can’t possibly answer a performance assessment by just filling in a bubble, and of course you can’t grade them as easily either. So when teachers from these four districts wrote these tasks earlier this year, that was only half the job.

Training Four Districts To Be ‘Psychometricians’

On a cold day in March, math, English, and science teachers from the Epping, Rochester, and Sanborn school districts are trying to calibrate their grading. To hear Sanborn’s assistant principal, Michael Turmelle, say it, there are a lot of expectations heaped on the shoulders of these teachers.

“When we talk about the Smarter Balanced, there are psychometricians who have worked for years on this test,” Turmelle tells the room full of assembled teachers, referring to the experts who design tests. “So there’s validity, reliability, fairness, equity in the system. Today is all about the scoring and calibration, so it has to be at a scientific level, to the point where we can say our work meets those criteria.”

They sit down in groups, all read the same essay, and then score it on a number of criteria – organization, evidence, grammar and vocabulary – on a scale of 1 to 4. Once they’re all done, they compare what they’ve come up with.

While some of the tasks schools will present will likely have a different "prompt" (as in students will be asked to write essays on different subjects), shared rubrics are an essential part of keeping the grading standard.

One group reviewed an essay students had to write to the local school board opposing or supporting a cut in the athletics budget.

“I have trouble, I have trouble going with much above a two for most of it because I feel like the writer sort of coopted the prompt,” says Sanborn English teacher Aaron Cass, explaining why his scores were lower than the other teachers’.

This last bit – sharing and talking about how you grade – can be tricky.

“Grading is really personal, just like teaching is very personal, and sometimes you don’t want to hurt other people’s feelings,” says Crystal Lavoie, a 9th grade teacher at Sanborn. But she says it doesn’t take long before teachers start to converge on the same scores.

“You want to make sure that you’re grading things as fairly as your peers are, too,” she explains, “You have to have these discussions as a large group… talk as a group about what does it mean to have something skillfully constructed. Does it have conjunctions in it? Is it a compound sentence? And you can’t do that until you talk to other people about what that looks like.”

“Not For the Faint of Heart”

This is the process known as calibration, and according to Scott Marion – a consultant from the New Hampshire-based Center for Assessment, which has been working with the state to develop the PACE pilot program – it works remarkably well.

He says education thinkers have been reconsidering the term standardization. For students’ grades to be comparable, do they really have to be taking the exact same test in the exact same way?

“The ultimate arbiter of comparability is the student work, because that doesn’t lie,” says Marion.

Marion says teachers are working from a shared rubric, or scoring guide, but those are just words. “The problem with just scoring guides is you can interpret the words one way, I could interpret it another way. So we create anchor papers. And we come to agreement on that. So then when we come to new papers I say what does it look like compared to these papers. Is it a borderline three? Is it a prototypical three? Is it a high three?”

He says, after training, typically 85 percent of teachers give essays the same score, and 95 percent are within one point of one another. They are calibrated.
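For readers curious how figures like these can be tallied, here is a minimal sketch in Python. It is not the method used by the Center for Assessment or the PACE districts, which the story does not describe. It assumes agreement is counted over every pair of teachers who scored the same essay, and the function name `agreement_rates` and the sample scores are hypothetical, invented purely for illustration.

```python
from itertools import combinations


def agreement_rates(scores_by_essay):
    """Return (exact, within_one) agreement rates over all rater pairs.

    scores_by_essay: a list of lists; each inner list holds the 1-to-4
    scores that different teachers gave to the same essay.
    Assumption: agreement is measured pairwise, which is only one of
    several possible ways to define it.
    """
    exact = within_one = pairs = 0
    for scores in scores_by_essay:
        for a, b in combinations(scores, 2):
            pairs += 1
            if a == b:
                exact += 1
            if abs(a - b) <= 1:
                within_one += 1
    if pairs == 0:
        raise ValueError("need at least one essay scored by two or more raters")
    return exact / pairs, within_one / pairs


# Hypothetical scores, not real workshop data.
sample = [
    [3, 3, 3],  # all three teachers agree
    [2, 3, 2],  # one teacher is a point higher
    [4, 4, 2],  # one teacher is two points lower
]
exact_rate, within_one_rate = agreement_rates(sample)
print(f"same score: {exact_rate:.0%}, within one point: {within_one_rate:.0%}")
```

Under this pairwise definition, a group would count as calibrated in Marion's sense when the first number is around 85 percent and the second around 95 percent.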

This has all been done before – for instance, the people who score the essay sections of other standardized tests all have to calibrate. But in New Hampshire, the Department of Education would eventually like the PACE program to expand to every school statewide. That would mean all of the state’s English, math, and science teachers would need to calibrate.

“This is not for the faint of heart. It’s more work, we think it’s the right work,” says Marion, “Districts have to be ready, but there’s no stopwatch running on this.”

A Fluid Situation

The federal government okayed the PACE pilot only a month ago, and the schools involved in the program are already in the throes of administering the test. The districts have had to contend with a shifting landscape, as the rules of the assessment changed during negotiations with the US Department of Education.

For instance, teachers like Rochester Middle School English teacher Cassandra Sweatt didn’t learn until this training session that they’d be using one standard grading guide.

Folks got a little testy.

“We’re not getting a chance to use the rubric that we’re going to have to use for the assessments that we’ve already created. So that’s where I’m confused because if we don’t get to practice with those,” Sweatt said in the middle of the calibration session, clearly irritated.

“It’s – as I tried to explain – it’s the messiness of trying to implement a statewide performance assessment, in each of four separate districts at the same time that we’re negotiating with the federal government to get a waiver,” explained Dan French, after the session. He’s another consultant with the Boston-based Center for Collaborative Education, which has been working to develop the PACE program and ran this workshop.

That messiness will play out over the next few weeks, as the districts administer these tests.

Dan French from the Center for Collaborative Education oversees the calibration workshop
Credit Sam Evans-Brown / NHPR

But despite the fact that these tests are being rolled out at full speed and questions that come up are being sorted out on the fly, the teachers involved are optimistic.

“It’s not multiple choice, and it’s not about being a good test-taker, it’s about being a good learner and a good student,” says Jennifer Andrews, a middle school teacher from Rochester, who says teaching to this kind of assessment is what gets her excited about being a teacher.

Once the tests are all done in a few weeks, they will be graded, then swapped to a new teacher in the same school and graded again, and then exchanged between the four districts and graded a third time. Those psychometricians at the federal and state level will pore over the results, and see not just how the students did, but how these districts did in trying something totally new.

Correction: An earlier version of this story gave the incorrect surname for Dan French.