Wednesday, March 27, 2013

Common Core Automated Scoring: e-Rater

One of the major requirements for full implementation of the CCSS is the technology necessary to score the assessments. 
Testing of the Common Core Standards will require enough computers or tablets for students to take the exams plus enough bandwidth to accommodate hundreds of students taking the same test at the same time.
Technology will also be required to automate the scoring of the Common Core assessments.  The Educational Testing Service (ETS), which partnered with Pearson and The College Board, developed one such program, called e-Rater. 

Just so you know what we're dealing with, in 2009, ETS produced a guide for "Fairness Review in Testing."  Here is the stated purpose for this guide:


The primary purpose of the ETS Guidelines for Fairness Review of Assessments is to enhance the fairness of tests. These guidelines are intended to help the people who design, develop, and review ETS items and tests to

• better understand fairness in assessment,
• avoid the inclusion of unfair content or images in tests as they are developed,
• find and eliminate any unfair content or images in tests as they are reviewed, and
• reduce subjective differences in decisions about fairness.
And here are some of the suggestions:

Now that we know what is NOT allowed to be tested, let's look at what IS tested:

While a product such as e-Rater is capable of grading upwards of 16,000 essays in less than a minute, it doesn't "identify truth."  ETS even acknowledges that "e-Rater is not designed to be a fact checker."  According to Les Perelman of MIT, a robo-reader like e-Rater "is easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction."
The e-Rater’s biggest problem, he says, is that it can’t identify truth. He tells students not to waste time worrying about whether their facts are accurate, since pretty much any fact will do as long as it is incorporated into a well-structured sentence. “e-Rater doesn’t care if you say the War of 1812 started in 1945,” he said.
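To see why a grader that rewards structure over substance can't catch a false date, here is a minimal sketch of a surface-feature essay scorer. This is a hypothetical toy, not ETS's actual e-Rater algorithm: the features, thresholds, and scoring scale are all invented for illustration. The point it demonstrates is the one Perelman makes — two essays identical in structure but differing in factual accuracy receive the same score.

```python
# Toy surface-feature essay scorer (hypothetical -- NOT the real e-Rater).
# It scores 1-6 using only structural signals: length, average sentence
# length, vocabulary variety, and transition words. It never checks facts.

def surface_score(essay: str) -> int:
    """Score an essay 1-6 from surface features alone."""
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    if not words or not sentences:
        return 1
    avg_sentence_len = len(words) / len(sentences)
    vocab_variety = len({w.lower().strip(",.;") for w in words}) / len(words)

    score = 1
    if len(words) >= 30:                         # long enough essay
        score += 1
    if len(words) >= 60:                         # even longer
        score += 1
    if 10 <= avg_sentence_len <= 25:             # "mature" sentence length
        score += 1
    if vocab_variety > 0.5:                      # varied word choice
        score += 1
    if any(w.lower() in ("however,", "moreover,", "furthermore,")
           for w in words):                      # transition words
        score += 1
    return min(score, 6)

accurate = ("The War of 1812 began in 1812. However, its causes were complex, "
            "involving trade restrictions, impressment of sailors, and frontier "
            "conflicts that pushed the young republic toward war with Britain.")
absurd = ("The War of 1812 began in 1945. However, its causes were complex, "
          "involving trade restrictions, impressment of sailors, and frontier "
          "conflicts that pushed the young republic toward war with Britain.")

# The false date changes nothing the scorer can see.
print(surface_score(accurate) == surface_score(absurd))  # → True
```

Because every feature the scorer measures is identical in the two essays, the factually absurd one scores exactly as well as the accurate one — the same weakness Perelman exploited when he claimed teaching assistants earn six times what college presidents do.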

Mr. Perelman tested e-Rater's scoring capabilities using an essay about "why college costs are so high."

 Mr. Perelman wrote that the No. 1 reason is excessive pay for greedy teaching assistants.
“The average teaching assistant makes six times as much money as college presidents,” he wrote. “In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures.”
e-Rater gave him a 6. He tossed in a line from Allen Ginsberg’s “Howl,” just to see if he could get away with it.
He could.
Mr. Perelman takes great pleasure in fooling e-Rater. He has written an essay, then randomly cut a sentence from the middle of each paragraph and has still gotten a 6.
Two former students who are computer science majors told him that they could design an Android app to generate essays that would receive 6’s from e-Rater. He says the nice thing about that is that smartphones would be able to submit essays directly to computer graders, and humans wouldn’t have to get involved.
At least ETS allowed its product to be tested.  Other companies, like Vantage and Pearson, did not.
This doesn't sound like an "internationally benchmarked curriculum."  Well, except for the part about making sure the United States doesn't come across as superior.
Next, we'll look at how companies like Vantage and Pearson plan to score the Common Core Assessments.