Diagnostic Assessment of Reading

First Author: Gina Biancarosa -- University of Oregon
Keywords: Assessment, Measurement, Diagnostic measure, Reading
Abstract / Summary: 

This symposium brings together four strands of research aimed at accentuating the knowledge we gain from reading measures by providing diagnostic information on individual readers. Diagnostic assessments provide information that can help focus instruction and intervention most effectively for individual children. The four studies presented represent varying efforts to develop diagnostic reading measures for use across a broad range of grades. The first two studies focus on early and intermediate grade readers and involve measures designed to supplement curriculum-based measurement (CBM) practices in schools by providing information that existing CBM assessments do not offer. The latter two focus on middle and high school readers and on a more comprehensive diagnosis of the component skills with which readers struggle. Each uses rigorous methods, including multidimensional scaling, to establish the technical adequacy of the new measures and represents a unique advancement in reading measurement. Results and their implications will be discussed by an expert in applied educational measurement.

Symposium Papers: 

Diagnostic Decoding Assessment: Technical Characteristics and Implications

First Author/Chair: Michelle Hosp -- University of Massachusetts, Amherst
Additional authors/chairs: 
John Hosp

The majority of students who struggle in reading have difficulties at the word level. Being able to diagnose specific patterns of difficulty is crucial for planning intervention. The purpose of this study was to develop a set of decoding probes covering both mono- and multisyllabic skills that represent the most common patterns in English targeted within curricula and standards.
A total of 300 second, third, and fourth graders from two schools in two western U.S. states were administered 605 nonwords representing patterns across 12 skills: CVC (5), CVCC (5), CVCe (4), r-controlled (5), blends (14), digraphs (7), vowel teams (16), double closed syllable (CVC/CVC: 5), prefixes (12), short vowel suffixes (16), long vowel suffixes (7), and contractions (5). Items were evaluated using both classical test theory and a 2-parameter logistic IRT model. In addition, 75 students took the WJIII Word Attack (WA) and Letter-Word ID (LWID) subtests as well as curriculum-based measures (CBM) of oral reading to provide criterion-related validity evidence, and 75 retook all 605 decoding items after a 2-week retest interval.
Average IRT discrimination across items within a probe was good (1.73 to 2.30), as was average difficulty (-1.77 to 0.90). Concurrent validity was moderate to strong with the WJIII WA (.50-.75), WJIII LWID (.51-.74), WJIII Broad Reading (.50-.77), and CBM (.60-.71). Test-retest coefficients were .75-.95.
There is evidence for strong technical characteristics at both the item and pattern probe level. Additional research into the technical characteristics and predictive utility of measures is ongoing.
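
For readers less familiar with the 2-parameter logistic (2PL) model mentioned above, the following is a minimal sketch of how discrimination and difficulty shape the probability of a correct response. It is an illustration only, not the authors' analysis code: the function names are hypothetical, and the parameter values are chosen for illustration from within the probe-level ranges reported above.

```python
import math


def two_pl_probability(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the standard 2PL model.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = two_pl_probability(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Illustrative (assumed) parameters: a discriminating, relatively easy item.
    a, b = 2.0, -1.0
    for theta in (-2.0, -1.0, 0.0, 1.0):
        p = two_pl_probability(theta, a, b)
        info = item_information(theta, a, b)
        print(f"theta={theta:+.1f}  P(correct)={p:.3f}  information={info:.3f}")
```

Under this model an item is most informative for students whose ability is near its difficulty, which is one reason a probe whose items span a range of difficulties can be useful for diagnosis.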

Diagnosing the Reading Comprehension Processes of Poor Comprehenders: Year 2 Results of the Multiple-choice Online Causal Comprehension Assessment

First Author/Chair: Gina Biancarosa -- University of Oregon
Additional authors/chairs: 
Mark Davison; Sarah Carlson; Ben Seipel

Purpose. Teachers and researchers alike have long complained of a lack of practical measures of reading comprehension processes. The current study attempts to fill this need by designing a measure that not only identifies poor comprehenders, but also distinguishes between two types of poor comprehenders based on the way they process text in the effort to comprehend. Importantly, prior research has demonstrated these two poor comprehender types respond differentially to intervention.

Method, Results, & Conclusion. In winter 2016, U.S. third through fifth grade students will take a pilot version of the Multiple-Choice Online Causal Comprehension Assessment (MOCCA; n = 1500 per grade). MOCCA items involve stories missing a sentence necessary for the story ending to be causally coherent. Answer choices include meaningful distractors representing comprehension processes that think-aloud studies have shown poor comprehenders overuse (i.e., paraphrases and lateral connections). Students will be identified as good comprehenders or one of three types of poor comprehenders: paraphrasers, who tend to paraphrase rather than make inferences or connections; lateral connectors, who tend to elaborate, associate, or explain without maintaining causal coherence; and indeterminate, which may include students with poor component skills or inconsistent comprehension processing. We compare results from 1- and 2-parameter multidimensional item response theory models that accommodate the use of informative distractors: nominal response models and a decision-theory-based model. Results will include comparisons of model fit and of accuracy in identifying good and poor comprehenders. Implications and future validity analyses will be discussed.
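
To illustrate how a nominal response model can treat distractors as informative, the sketch below computes option-choice probabilities for a single item. It is a simplified, unidimensional version of Bock's nominal response model, not the multidimensional models compared in the study, and the option labels, slopes, and intercepts are hypothetical values invented for illustration.

```python
import math
from typing import List, Sequence


def nominal_response_probs(
    theta: float, slopes: Sequence[float], intercepts: Sequence[float]
) -> List[float]:
    """Option probabilities under a unidimensional nominal response model.

    Each option k has a logit slopes[k] * theta + intercepts[k]; the
    probabilities are the softmax of those logits.
    """
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical item: one causally coherent answer and two distractor types.
    options = ["causally coherent", "paraphrase", "lateral connection"]
    slopes = [1.0, -0.4, -0.6]      # assumed values, for illustration only
    intercepts = [0.0, 0.2, 0.1]    # assumed values, for illustration only
    for theta in (-2.0, 0.0, 2.0):
        probs = nominal_response_probs(theta, slopes, intercepts)
        summary = ", ".join(f"{o}: {p:.2f}" for o, p in zip(options, probs))
        print(f"theta={theta:+.1f} -> {summary}")
```

Because each distractor has its own parameters, the pattern of wrong answers a student selects carries information, rather than every error being scored as simply incorrect.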

Vertical scaling and longitudinal predictive relationships of scenario-based assessments across grades

First Author/Chair: Tenaha O'Reilly -- Educational Testing Service
Additional authors/chairs: 
John Sabatini; Jonathan Weeks; Jonathan Steinberg; Szu-Fu Chao; Zuowei Wang

Purpose – This study has two purposes: 1) to examine the properties of theoretically driven, scenario-based assessments (SBA) of reading comprehension; and 2) to understand changes in student reading comprehension across time on such assessments.

Method – About 20,000 students across grades 3-12 completed an initial 45-minute SBA form, and a longitudinal subsample of 1,900 students completed an additional SBA form one year later. SBA forms were of varying difficulty and designed to measure higher-level reading skills. SBA tasks included summary writing, completing graphic organizers, multiple source comprehension, evaluation of digital content, and understanding of different perspectives. Classical psychometric analyses, IRT, correlations, and regressions are among the techniques used to evaluate the data.

Results – We will report results of using a MIRT analytic approach to create a single vertical scale across grades 3-12 from a set of 19 SBA forms that vary in content and skills assessed. We will then examine results of the longitudinal follow-ups, in which students took a second form about a year later. We will look at strength of relationships, generalizability, mean change, and how student ability on the first SBA interacts with change over time.

Conclusions – SBAs are built upon an expanded framework of reading comprehension. While the SBAs here still rely largely on selected response items, they have numerous features that are unique compared to traditional reading comprehension tests. It is therefore important to understand whether these differences are construct-relevant to better evaluate validity claims before widespread use is considered.

Relation of reading components to scenario-based assessments in adolescents

First Author/Chair: John Sabatini -- Educational Testing Service
Additional authors/chairs: 
Tenaha O'Reilly; Kelly Bruce

Purpose: To examine the relations between scenario-based assessments (SBAs) and performance on reading component subtests designed for struggling adolescent readers. The SBAs target a broader construct of higher-order reading comprehension skills that are not typically addressed in traditional reading tests. The components target difficulties that could become focal areas for instructional interventions at the word, sentence, and discourse levels.

Method: In Fall 2013, a sample of about 255 students across Grades 6 through 11 completed a computerized battery of six subtests measuring word recognition & decoding; vocabulary; morphology; sentence processing; efficiency of basic reading comprehension; and reading comprehension. An IRT-based developmental scale was computed for each subtest (alpha reliabilities range from .83 to .92). In Spring 2014, a subsample of students also completed one of three SBA forms appropriate to their grade levels.
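
As a point of reference for the reliabilities reported above, the following minimal sketch computes an internal-consistency coefficient from item-level scores. It assumes that "alpha" refers to Cronbach's alpha, which the abstract does not state explicitly; the function name and the tiny data set are hypothetical and serve only to show the calculation.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a students-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])  # number of items
    item_variances: List[float] = [
        variance([row[i] for row in scores]) for i in range(k)
    ]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical responses: 5 students x 4 dichotomously scored items.
    responses = [
        [1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Higher values indicate that the items within a subtest vary together across students, which is what the reported range of .83 to .92 would suggest under this assumption.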

Results: Results indicate moderate correlations between SBAs and component subtests (r = .39-.67). Multiple regression analyses show that the first five components (excluding the reading comprehension subtest) predicted a proportion of variance ranging from .39 to .54. Results of hierarchical regressions were complex, providing potential insights into how subskills relate to SBA performance. [Note: recently completed vertical scaling will be used to re-analyze results on a common scale for the conference.]

Conclusions: The results help elaborate on the relations between higher-level SBA performance and underlying component reading skills across the middle grades. This in turn can inform the most efficient plans for implementing the two types of assessments to yield the most informative interpretations for instructional decisions.

The Future of Diagnostic Assessment

First Author/Chair: Yaacov Petscher (Discussant) -- Florida Center for Reading Research

Yaacov Petscher will serve as the discussant. His research interests include educational assessment, developmental psychology, and quantitative psychology. An experienced data analyst and researcher in education, he has won the Educational Researcher of the Year Award from the Florida Educational Research Association, the Dina Feitelson Research Award from the International Literacy Association, and the Rebecca Sandak Award from the Society for the Scientific Study of Reading. He has co-edited books on applied quantitative analysis and fluency as a measurement construct and serves or has served on the editorial boards of Assessment for Effective Intervention, the International Journal of Behavioral Development, and the Journal of Research on Educational Effectiveness. His commentary will focus on the need for diagnostic measurement in reading and the future potential of multidimensional IRT models for generating diagnostic information.