Creative reuse and repurposing of data

First Author: Wilhelmina van Dijk -- Florida State University
Keywords: Measurement, Open Science, Methodology, Structural equation modelling
Abstract / Summary: 

Data are often collected with a specific purpose in mind. Often, however, other researchers come up with innovative research questions unrelated to this purpose that can be answered with the same data, or with a combination of several data sets. The intent of this symposium is to illustrate the benefits, pitfalls, and possible outcomes of creative reuse of extant data. The presenters will provide an overview of strategies to maximize the reuse and repurposing of extant data, discuss an example of a methodological innovation needed to prepare data for reuse, demonstrate a technique on combining data sets from different sources, and finally present an example of creative reuse of data. Active discussion of the topic will be encouraged.

Symposium Papers: 

Strategies to leverage extant data for the next generation of collaborative studies of reading and reading development

First Author/Chair: Erik Willcutt -- University of Colorado, Boulder
Additional authors/chairs: 
Lauren McGrath; Richard K. Olson

Over the past two decades, the NICHD Learning Disabilities Research Centers (LDRCs) and Research Hubs have generated extensive data on nearly all aspects of individual differences and difficulties in reading. This presentation will lay the foundation for the subsequent talks by describing several strategies to maximize the use of these and other existing datasets to address research questions that cannot easily be answered by any individual study, including key questions about the etiology, neurophysiology, and treatment of reading disability.

Method / Results:
We will describe strategies that include online publication and documentation of deidentified datasets for public use, collaborative analyses of pooled samples, meta-analysis of summary effect sizes across studies, and mega-analyses that include both published effect sizes and new analyses of a subset of datasets. For each method we will use specific examples from our work to illustrate both the potential strengths and the unique challenges that are inherent in these approaches. Important considerations include the diverse range of measures used across studies to assess core reading constructs, differences in Institutional Review Board requirements for data to be shared across sites, and the development of procedures to ensure that the rights of individual laboratories and investigators are protected throughout the entire pipeline from collaborative analyses to eventual publication.

We will conclude by summarizing best practices to facilitate the effective implementation of these approaches in the future, and discuss opportunities for laboratories outside the NICHD LDRCs and Hubs to contribute to these and other related collaborative projects.

Evaluating measurement invariance with effect sizes: Avoiding the sample size paradox

First Author/Chair: Wilhelmina van Dijk -- Florida State University

For many applied researchers, combining data from multiple sets is an attractive way to answer novel research questions without collecting copious amounts of new data. Different methods exist to evaluate the extent of measurement invariance (MI) across multiple groups. The traditional MI method compares differences in chi-square values between models with varying parameter constraints. This method is highly sensitive to sample size, however, and with large samples even trivial differences can reach statistical significance. Moreover, it does not quantify the size of the difference, which may be small. Several effect size estimation methods have been proposed as alternatives.
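The sample size problem can be illustrated with a minimal sketch in Python. The discrepancy value, degrees of freedom, and sample sizes below are hypothetical, chosen only to show how a fixed, trivial amount of misfit becomes "significant" as N grows:

```python
from scipy.stats import chi2

# Hypothetical illustration of the sample size paradox: the chi-square
# difference statistic is roughly N times a fixed discrepancy, so the
# same tiny misfit flips from non-significant to significant as N grows.
f_min = 0.005   # assumed fixed minimum-fit-function discrepancy (tiny misfit)
df_diff = 4     # assumed difference in degrees of freedom between models

for n in (200, 2000, 20000):
    t = n * f_min              # chi-square difference statistic
    p = chi2.sf(t, df_diff)    # upper-tail p-value
    print(f"N={n:6d}  chi2_diff={t:6.1f}  p={p:.4f}")
```

At N = 200 the identical misfit is nowhere near significant, while at N = 20,000 it is overwhelmingly so, which is why effect-size-based evaluation is attractive.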

We combined eight data sets from different reading intervention research projects, using students’ pretest scores to estimate a one-factor CFA model with five indicators. We then applied the following methods to evaluate MI: (a) the traditional MI approach constraining parameters across groups, (b) standardized parameter differences across groups, (c) the f-statistic (i.e., average difference from the overall mean), and (d) estimating meaningful differences by modeling parameters across standardized factor scores.
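As a rough sketch of how effect-size approaches like (b) and (c) quantify non-invariance, consider a single loading compared across groups (the values below are made up for illustration, not estimates from our data):

```python
import numpy as np

# Hypothetical standardized loadings for one indicator across 8 groups
# (illustrative values only, not results from the actual analysis).
loadings = np.array([0.72, 0.68, 0.75, 0.70, 0.66, 0.74, 0.71, 0.69])

# (b) standardized parameter differences: here, the largest
# pairwise gap between any two groups' loadings
max_diff = loadings.max() - loadings.min()

# (c) f-statistic-style effect size: the average absolute
# deviation of each group's loading from the overall mean
f_stat = np.mean(np.abs(loadings - loadings.mean()))

print(f"largest pairwise difference: {max_diff:.3f}")
print(f"average deviation from overall mean: {f_stat:.3f}")
```

Unlike a chi-square difference test, these quantities do not grow with sample size; they describe how large the group differences actually are.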

We compare results across the methods, evaluate the advantages and disadvantages of each, and provide recommendations for interpreting and using effect size measures in measurement invariance evaluation.

Using integrative data analyses to answer questions in the field of reading

First Author/Chair: Christopher Schatschneider -- Florida State University

Purpose. With the increasing popularity of Open Science and the availability of multiple datasets to the field at large, there is a need to rigorously evaluate and potentially combine datasets so they may be analyzed as a single study. One method for doing this is Integrative Data Analysis (IDA; Curran & Hussong, 2009). The purpose of this presentation is to introduce IDA, discuss its benefits and limitations, and demonstrate how to employ it by presenting an analysis of data from eight separate projects, all treatment studies of early reading intervention.
Method. Data from eight studies of early reading were evaluated and combined using IDA. These datasets were all funded by NICHD, and all were treatment studies examining early reading interventions.
Results. IDA allowed for the estimation of a factor score for reading that leveraged information across all datasets, even though the datasets did not all include the same measures. Then, for each study, we compared the results obtained using observed indicators from that particular study to the results obtained using the factor score from IDA.
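The pooling step can be sketched as follows. Study labels and measure names here are hypothetical; the sketch only illustrates how stacking studies with partially overlapping measures yields planned missingness that a factor model estimated with full-information methods can accommodate:

```python
import pandas as pd

# Hypothetical scores from two studies that share one measure
# but each administered one measure the other did not.
study_a = pd.DataFrame({"study": ["A", "A"],
                        "word_id": [0.5, 0.7],
                        "decoding": [0.6, 0.4]})
study_b = pd.DataFrame({"study": ["B", "B"],
                        "word_id": [0.3, 0.9],
                        "fluency": [0.2, 0.8]})

# Stacking produces one file with missingness by design: measures a
# study never collected are NaN for its rows, but every row still
# contributes information through the measures it does have.
pooled = pd.concat([study_a, study_b], ignore_index=True)
print(pooled)
```

The shared measure (`word_id` in this toy example) is what links the studies and allows a common factor score to be estimated across all of them.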
Conclusions. We will conclude by discussing the strengths and limitations of this approach in the analysis of treatment studies in the field of reading research.

Reanalysis of the CLASS using a bias-correction model

First Author/Chair: Jessica Logan -- The Ohio State University

Meta-analytic work demonstrates that the Classroom Assessment Scoring System (CLASS) is only weakly related to children’s outcomes (Keys, 2013). A potential reason may be a specific type of measurement error. Typically, the ten CLASS dimensions (and three domains) are calculated as an average across three 20-minute observation cycles. In this investigation, we instead use all thirty data points (ten dimensions, three cycles). We fit a bias-correction model to the data, with latent factors estimated both for the three domains and for the three cycles. The bias factors capture the error associated with a particular cycle that is not representative of the rest of the day (e.g., an injured child), thus removing that error from the estimates of the CLASS domains.
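The crossed loading pattern can be sketched as follows. The dimension shorthand and the 4/3/3 grouping into domains follow the published CLASS Pre-K structure, but the names below are illustrative labels, not the instrument's exact wording:

```python
# Sketch of the crossed factor structure: each of the 30 dimension-by-cycle
# scores loads on one domain factor and one cycle ("bias") factor.
# Dimension names are shorthand for the CLASS Pre-K labels.
cycles = ["cycle1", "cycle2", "cycle3"]
dimensions_by_domain = {
    "emotional_support": ["pos_climate", "neg_climate",
                          "sensitivity", "regard_for_perspectives"],
    "classroom_organization": ["behavior_mgmt", "productivity",
                               "learning_formats"],
    "instructional_support": ["concept_dev", "quality_feedback",
                              "language_modeling"],
}

# Each observed score maps to its (domain factor, cycle factor) pair.
loading_pattern = {
    f"{dim}_{cyc}": (domain, cyc)
    for domain, dims in dimensions_by_domain.items()
    for dim in dims
    for cyc in cycles
}

print(len(loading_pattern))  # 10 dimensions x 3 cycles = 30 observed scores
```

Because every observed score loads on both a domain factor and a cycle factor, cycle-specific disturbances are absorbed by the cycle factors rather than contaminating the domain estimates.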

Data for the current study were drawn from the National Research Center on Early Childhood Education (NRCEC) Teacher Professional Development Study (Pianta & Burchinal, 2007-2011) and were retrieved from ICPSR (; 216 classrooms with complete data). SEMs were fit in Mplus using the WLSMV estimator.

Factor loadings on the domain scores were strong, similar to other published findings. The loadings for the bias factor were small to moderate in size (-.08 to .70), which suggests that each of the error terms is capturing some reliable variance.

The results of this bias-correction model should be more reliable and should produce more accurate representations of individual differences in preschool classroom quality. In our presentation, we will also demonstrate how the model predicts children’s emergent literacy outcomes.


Discussant: Christopher Schatschneider -- Florida State University

The discussant will briefly discuss each talk with an emphasis on pointing out similarities and differences across the presentations. The discussant will also address perceived strengths and hurdles to overcome with the movement toward Open Science.