What you should really know about testing in education

SMW, April 30, 2014 5:59 PM



"Why does it matter that most of our tests
are written by specialists in statistics?
Well, when psychometricians write a test,
they spend a lot of time
ensuring standardization and reliability."


(Editor's Note: The following is a great primer for every non-academic who wants to understand testing in education: the basics of summative vs. formative assessments, and high-stakes tests vs. testing in general, demystified in an essential read.
Great work by Anya Kamenetz for NPR, originally posted. - C.J. Westerberg)



What Are Education Tests For, Anyway?
by Anya Kamenetz for NPR

Pay attention to this piece. There's going to be a test at the end.

Did that trigger scary memories of the 10th grade? Or are you just curious how you'll measure up?

If the answer is "C: Either of the above," keep reading.

Tests have existed throughout the history of education. Today they're being used more than ever before - but not necessarily as designed.

Different types of tests are best for different purposes. Some help students learn better. Some are there to sort individuals. Others help us understand how a whole population is doing.

But these types of tests are easily confused, and more easily misused. As the U.S. engages in another debate over how - and how much - we test kids, it might be helpful to do a little anatomy of assessment, or a taxonomy of tests.

Teachers divide tests into two big categories: formative and summative.

Formative assessment, aka formative feedback, is the name given to the steady little nudges that happen throughout the school day - when the teacher calls on someone, or sends a student up to the board to solve a problem, or pops a quiz to make sure you did the reading.

Any test given for purely diagnostic reasons can also be formative. Say a new student comes to school and teachers need to see what math class she should be in. What distinguishes formative assessments is that they're not there to judge you as a success or failure. The primary purpose is to guide both student and teacher.

Summative assessment, on the other hand, sums up all your learning on one big day: the unit test, the research paper, the final exam, the exhibition.

When it comes to summative tests, U.S. schools really love a particular subcategory of them: psychometrically validated and standardized tests. Psychometrics literally means "mind measurement" - the discipline of test-making. It's a statistical pursuit, which means it's mostly math. Giant chunks of social science are based on the work of 19th-century psychometricians, who came up with tools and concepts like correlation and regression analysis.

But the most famous of those tools is the bell curve. Almost any aspect of the human condition, when plotted on a graph, tends to assume this famous shape: crime, poverty, disease, marriage, suicide, weight, height, births, deaths. And, of course, when Alfred Binet developed the first widely used intelligence tests in the early 1900s, he made sure that the results conformed to that same bell curve.



Why does it matter that most of our tests are written by specialists in statistics? Well, when psychometricians write a test, they spend a lot of time ensuring standardization and reliability.



(cartoon added by editor The Daily Riff)

Reliability means if you give the same test to the same person on two different occasions, her scores should not be wildly different. And standardization means that, across a broad population, the results of the test will conform to an expected distribution - that bell curve, or something like it. If you give the same test to 20,000 people and they all score a 75, that's not a very useful test.
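
For readers who like to see the arithmetic, here is a minimal Python sketch of the two ideas above. It treats reliability as the correlation between two sittings of the same test, which is one common way to quantify it and is an assumption here, not something the article specifies. The scores and the `pearson` helper are hypothetical, made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores.

    Returns None when either list has no spread, because the
    correlation is undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for the same five students on two occasions.
first_sitting = [62, 71, 75, 80, 93]
second_sitting = [65, 70, 78, 79, 95]

# Test-retest reliability: do the two sittings rank students alike?
reliability = pearson(first_sitting, second_sitting)

# The "everyone scores a 75" case: no spread, so nothing to rank.
everyone_75 = [75] * 5
flat_result = pearson(everyone_75, first_sitting)

print("test-retest correlation:", reliability)
print("all-75 test correlation:", flat_result)
```

With these made-up numbers the two sittings track each other closely, so the correlation comes out near 1. The all-75 list has no spread at all, so there is nothing to correlate or rank, which is the sense in which such a test is not very useful.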

These rules are the reason that 4 million U.S. students are taking extra tests this year. Not for their own practice, but to test the tests themselves. These are the new tests developed to align with the Common Core State Standards. Large field tests are required to establish their standardization and reliability.

A psychometric test is historically grounded, mathematically precise and suitable for ranking large human populations. But those strengths can also be weaknesses.

Reliable tests don't change much from year to year. That can make them easier to coach.
The need for reliable scoring often drives designers to use multiple choice questions, to avoid ambiguity. But that format has a hard time measuring a whole range of crucial human abilities: creativity, problem-solving, communication, teamwork and leadership, to name a few.
The multiple-choice format and the need for predictability mean psychometric tests, whether a state third-grade reading test or an SAT, all somewhat resemble each other. And so, they can end up testing a student's test-taking ability more than actual subject knowledge.

Reliability and standardization can be at odds with the third key problem in psychometrics: validity. That is, does this test actually tell us anything important? Especially, is it predictive of future performance in the real world? Validity ideally is established by comparing students' test scores with some sort of ground truth, such as grades in school, or later success in college. But that takes a long time and a lot of number crunching. And in practice the process is often pretty circular: the validity of test results tends to be based on their correlation with other test results.
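
The circularity described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any real test maker validates a test: the three score lists are invented, and "college GPA" stands in for whatever ground truth a validity study might use.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """Pearson correlation, computed from population statistics."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = mean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return covariance / (pstdev(xs) * pstdev(ys))

# Hypothetical data for the same five students.
new_test = [55, 60, 70, 80, 90]           # the test being validated
older_test = [50, 62, 71, 78, 92]         # another standardized test
college_gpa = [3.1, 2.4, 3.6, 2.9, 3.3]   # stand-in for "ground truth"

# Circular check: does the new test agree with another test?
agreement_with_test = correlation(new_test, older_test)

# Ground-truth check: does the new test predict later outcomes?
agreement_with_outcome = correlation(new_test, college_gpa)

print("correlation with older test:", agreement_with_test)
print("correlation with college GPA:", agreement_with_outcome)
```

In this invented example the new test agrees strongly with the older test but only weakly with later GPA. A validity claim built on the first number alone would say little about real-world performance.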


So, those are the keys to how test makers see the test. But in the world of education, it's not just how they're written, but how they're used.

That brings us to the tests that so many Americans love to hate: high-stakes tests. The ones that decide whether our kids move up to the 4th grade, get a full-ride scholarship, or someday, a job.

In practice, we accept one kind of high-stakes test: the standalone gatekeeper test. Everyone wants a pilot who passed her licensing exam, or a lawyer who passed the bar. We like transparent, objective standards, especially when it's other people who have to meet them.

No, it's the other kind of high-stakes test that draws the most ire: accountability tests. They get this name because they are given to judge the performance of schools, teachers and states, not just students. Accountability tests determine school reorganization and closure decisions, teacher evaluations and state funding.

So, got all that? Good. Now here's your essay question:

Under the federal No Child Left Behind law, passed in 2001, public school accountability has rested largely on the results of psychometrically validated and standardized, largely multiple-choice summative assessments covering math and English only, given annually in 3rd through 8th grades and once in high school. Given what you've just read about the strengths and weaknesses of this test format, is it wise to attach so many consequences to the results? State the reasons for your response.