
What you should really know about testing in education

SMW, April 30, 2014 5:59 PM



"Why does it matter that most of our tests
are written by specialists in statistics?
Well, when psychometricians write a test,
they spend a lot of time
ensuring standardization and reliability."


(Editor's Note: The following is a great primer for any non-academic seeking to understand testing in education - the basics of summative vs. formative assessments, high-stakes testing vs. testing in general - demystified in an essential read.
Great work by Anya Kamenetz for NPR, where it was originally posted. - C.J. Westerberg)



What Are Education Tests For, Anyway?
by Anya Kamenetz for NPR

Pay attention to this piece. There's going to be a test at the end.

Did that trigger scary memories of the 10th grade? Or are you just curious how you'll measure up?

If the answer is "C: Either of the above," keep reading.

Tests have existed throughout the history of education. Today they're being used more than ever before - but not necessarily as designed.

Different types of tests are best for different purposes. Some help students learn better. Some are there to sort individuals. Others help us understand how a whole population is doing.

But these types of tests are easily confused, and more easily misused. As the U.S. engages in another debate over how - and how much - we test kids, it might be helpful to do a little anatomy of assessment, or a taxonomy of tests.

Teachers divide tests into two big categories: formative and summative.

Formative assessment, aka formative feedback, is the name given to the steady little nudges that happen throughout the school day - when the teacher calls on someone, or sends a student up to the board to solve a problem, or pops a quiz to make sure you did the reading.

Any test given for purely diagnostic reasons can also be formative. Say a new student comes to school and teachers need to see what math class she should be in. What distinguishes formative assessments is that they're not there to judge you as a success or failure. The primary purpose is to guide both student and teacher.

Summative assessment, on the other hand, sums up all your learning on one big day: the unit test, the research paper, the final exam, the exhibition.

When it comes to summative tests, U.S. schools really love a particular subcategory of them: psychometrically validated and standardized tests. Psychometrics literally means "mind measurement" - the discipline of test-making. It's a statistical pursuit, which means it's mostly math. Giant chunks of social science are based on the work of 19th century psychometricians, who came up with tools and concepts like correlation and regression analysis.

But the most famous of those tools is the bell curve. Almost any aspect of the human condition, when plotted on a graph, tends to assume this famous shape: crime, poverty, disease, marriage, suicide, weight, height, births, deaths. And, of course, when Alfred Binet developed the first widely used intelligence tests in the early 1900s, he made sure that the results conformed to that same bell curve.


Why does it matter that most of our tests are written by specialists in statistics? Well, when psychometricians write a test, they spend a lot of time ensuring standardization and reliability.




Reliability means if you give the same test to the same person on two different occasions, her scores should not be wildly different. And standardization means that, across a broad population, the results of the test will conform to an expected distribution - that bell curve, or something like it. If you give the same test to 20,000 people and they all score a 75, that's not a very useful test.
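To make the first of those two ideas concrete, here's a minimal sketch (not from the article - all scores are invented) of how test-retest reliability is typically quantified: as the Pearson correlation between two administrations of the same test to the same students.

```python
# Illustrative sketch: test-retest reliability as the Pearson correlation
# between two sittings of the same test by the same ten students.
# The scores below are made up for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

first_try  = [62, 71, 55, 88, 94, 73, 67, 81, 59, 76]
second_try = [65, 69, 58, 85, 96, 75, 64, 83, 61, 74]

r = pearson(first_try, second_try)
print(f"test-retest reliability: r = {r:.2f}")
```

A correlation near 1.0 means each student's two scores track each other closely - the article's "not wildly different." A reliability near zero would mean the test is, in effect, a coin flip.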

These rules are the reason that 4 million U.S. students are taking extra tests this year - not for their own practice, but to test the tests themselves. These are the newly developed assessments designed to align with the Common Core State Standards. Large field tests are required to establish their standardization and reliability.

A psychometric test is historically grounded, mathematically precise and suitable for ranking large human populations. But those strengths can also be weaknesses.

A reliable test doesn't change much from year to year, which can make it easier to coach students for.
The need for reliable scoring often drives designers to use multiple-choice questions, to avoid ambiguity. But that format has a hard time measuring a whole range of crucial human abilities: creativity, problem-solving, communication, teamwork and leadership, to name a few.
The multiple-choice format and the need for predictability mean psychometric tests, whether a state third-grade reading test or an SAT, all somewhat resemble each other. And so they can end up testing a student's test-taking ability more than actual subject knowledge.

Reliability and standardization can be at odds with the third key concept in psychometrics: validity. That is, does this test actually tell us anything important? Especially, is it predictive of future performance in the real world? Validity ideally is established by comparing students' test scores with some sort of ground truth, such as grades in school, or later success in college. But that takes a long time and a lot of number crunching. And in practice the process is often pretty circular: the validity of test results tends to be based on their correlation with other test results.



So, those are the keys to how test makers see the test. But in the world of education, it's not just how they're written, but how they're used.

That brings us to the tests that so many Americans love to hate: high-stakes tests. The ones that decide whether our kids move up to the 4th grade, get a full-ride scholarship, or someday, a job.

In practice, we accept one kind of high-stakes test: the standalone gatekeeper test. Everyone wants a pilot who passed her licensing exam, or a lawyer who passed the bar. We like transparent, objective standards, especially when it's other people who have to meet them.

It's the other kind of high-stakes test that draws the most ire: accountability tests. They get this name because they are given to judge the performance of schools, teachers and states, not just students. Accountability tests determine school reorganization and closure decisions, teacher evaluations and state funding.

So, got all that? Good. Now here's your essay question:

Under the federal No Child Left Behind law, passed in 2001, public school accountability has rested largely on the results of psychometrically validated and standardized, largely multiple-choice summative assessments covering math and English only, given annually in 3rd through 8th grade and once in high school. Given what you've just read about the strengths and weaknesses of this test format, is it wise to attach so many consequences to their results? State the reasons for your response.
###
