When thinking about the validity of our assessments, we focus primarily upon construct validity**. Construct validity is usually thought of as the degree to which assessments measure what they are designed to measure, but since our assessments do much more than provide a score, we think of construct validity more broadly—as the degree to which our assessments accomplish what they are designed to accomplish. When we design and deliver assessments, we consider (1) the extent to which their scores capture the Lectical™ dimension (the skill level of the performance); (2) how well they target the domain or topic of interest; (3) their relevance, particularly with respect to the relevance of their feedback; and (4) their utility, particularly with respect to the value of their feedback.
We track two forms of reliability: (1) internal consistency, which we examine with Rasch modeling software, and (2) inter-rater reliability.
We have been conducting research on (and with) our assessments for several years. Some of this research has been published in peer-reviewed journals; other research is documented in reports. The following section shows how some of our publications and reports relate to various aspects of reliability and validity. (Click on titles to view PDF documents.)
Predictive validity 1

Evidence

LectaTests are designed to target real-world skills—skills that make us better at what we do at work and in our personal lives. If we are doing a good job, working with our assessments should support behavioral change.

In a preliminary analysis of Clear Impact's ambitious 40-hour, 9-month leadership training initiative involving four levels of management in a large North American city, we examined the effects of embedding up to 8 LectaTests (including pre- and post-LDMAs) on managers' growth and collaborative behavior. Most of the results reported here are restricted to the LDMA data of supervisors who (1) completed pre- and post-LDMAs and (2) had two or more supervisees who had completed pre- and post-LDMAs.
Regression results | n | r | p
The number of LectaTests taken by supervisors predicts their own Lectical growth | 161 | .19 | .01
Greater Lectical growth of supervisors predicts higher 360 scores from direct reports | 9 | .59 | .09
Higher 360 scores for supervisors predict higher average Lectical growth of direct reports | 10 | .60 | .07
Higher average Lectical growth of direct reports predicts higher 360 scores from peers | 10 | .46 | .18
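For readers who want to see what this kind of analysis looks like in practice, here is a minimal sketch of a correlation of the sort reported in the table above. The values are hypothetical placeholders, not the study data; the actual analysis related the number of LectaTests taken by each supervisor to that supervisor's Lectical growth.

```python
# Minimal sketch: Pearson correlation between number of LectaTests taken and
# Lectical growth. All values below are hypothetical, for illustration only.
import numpy as np
from scipy import stats

tests_taken = np.array([1, 2, 3, 4, 5, 6, 7, 8])           # per supervisor
lectical_growth = np.array([0.02, 0.08, 0.04, 0.10, 0.09, 0.15, 0.12, 0.22])

r, p = stats.pearsonr(tests_taken, lectical_growth)         # r and two-tailed p
print(f"n = {len(tests_taken)}, r = {r:.2f}, p = {p:.3f}")
```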
Predictive validity 2

Upper-level managers, on average, have higher-level decision-making skills than lower-level managers.
Unidimensionality

Rasch modeling shows that the LAS captures a robust dimension of performance.

References
Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.
Transformational learning

Rasch modeling shows that development along the latent dimension measured by the LAS is wave-like, a pattern that is consistent with the cognitive developmental postulate that development is characterized by a series of nested, hierarchical reorganizations of knowledge structures (rather than the simple accumulation of knowledge).

References
Xie, Y., & Dawson, T. L. (2006). Multidimensional models in a developmental context. In M. Garner, G. Engelhard, M. Wilson, & W. Fisher (Eds.), Advances in Rasch Measurement. JAM Press.
Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.
Dawson-Tunik, T. L. (2005, June). Cognitive change is stage-like: The cumulative evidence from a decade of Rasch modeling. Paper presented at the Annual Meeting of the Jean Piaget Society, Vancouver.
Dawson, T. L. (2006). Stage-like patterns in the development of conceptions of energy. In X. Liu & W. Boone (Eds.), Applications of Rasch measurement in science education (pp. 111-136). Maple Grove, MN: JAM Press.
Internal consistency

The internal consistency of the LAS has historically been above .90. (As of 2009, we are maintaining alphas of .95 and above. In general, reliability studies show that we can have confidence in Lectical scores to within 1/5 to 1/4 of a level, which means we can detect 4-7 distinct phases of performance within a typical classroom.)

References
Dawson, T. L. (2000). Moral reasoning and evaluative reasoning about the good life. Journal of Applied Measurement, 1, 372-397.
Dawson, T. L. (2002). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3, 146-189.
Dawson, T. L., Xie, Y., & Wilson, M. (2003). Domain-general and domain-specific developmental assessments: Do they measure the same thing? Cognitive Development, 18, 61-78.
Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.
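As a rough illustration of the statistics behind these claims, the sketch below computes Cronbach's alpha and a standard error of measurement (SEM) from a small, hypothetical persons-by-tasks score matrix. The operational analyses use Rasch person-separation reliability rather than raw-score alpha, so this is only a classical-test-theory approximation.

```python
# Minimal sketch: Cronbach's alpha and the standard error of measurement (SEM)
# for a persons-by-tasks matrix of hypothetical Lectical-style scores.
import numpy as np

scores = np.array([            # rows = test takers, columns = scored tasks
    [10.8, 10.9, 11.0, 10.9],
    [11.2, 11.3, 11.1, 11.2],
    [11.5, 11.4, 11.6, 11.5],
    [10.9, 11.0, 11.0, 10.8],
    [11.3, 11.2, 11.4, 11.3],
])

k = scores.shape[1]                               # number of tasks
item_vars = scores.var(axis=0, ddof=1).sum()      # sum of task variances
total_var = scores.sum(axis=1).var(ddof=1)        # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars / total_var)   # Cronbach's alpha

# SEM links a reliability coefficient to score precision (the basis of
# "confidence to within roughly 1/5 to 1/4 of a level" statements).
person_sd = scores.mean(axis=1).std(ddof=1)       # spread of person-level scores
sem = person_sd * np.sqrt(1 - alpha)

print(f"alpha = {alpha:.2f}, SEM = {sem:.2f} of a level")
```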
Inter-rater reliability

Inter-rater reliability for the LAS has consistently been above 85% agreement within 1/3 of a Lectical level. (As of 2007, we maintain an inter-rater agreement rate of 85% within 1/4 of a level.)

References
Dawson-Tunik, T. L. (2004). A good education is: The development of evaluative thought across the life-span. Genetic, Social, and General Psychology Monographs, 130, 4-112.
Dawson-Tunik, T. L., Commons, M., Wilson, M., & Fischer, K. (2005). The shape of development. The European Journal of Developmental Psychology, 2, 163-196.
Dawson, T. L. (2006). Stage-like patterns in the development of conceptions of energy. In X. Liu & W. Boone (Eds.), Applications of Rasch measurement in science education (pp. 111-136). Maple Grove, MN: JAM Press.
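The agreement statistic quoted above can be made concrete with a small sketch: the share of performances for which two raters' Lectical scores fall within a fixed tolerance of each other (here 0.25, i.e., 1/4 of a level). The ratings below are hypothetical.

```python
# Minimal sketch: percent agreement between two raters within a tolerance band.
# Ratings are hypothetical; the tolerance 0.25 corresponds to 1/4 of a level.
import numpy as np

rater_a = np.array([11.20, 11.35, 10.95, 11.50, 11.10, 11.30])
rater_b = np.array([11.25, 11.30, 11.25, 11.45, 11.15, 11.40])

tolerance = 0.25
within_band = np.abs(rater_a - rater_b) <= tolerance   # True where raters agree
agreement = 100 * within_band.mean()

print(f"{agreement:.0f}% agreement within {tolerance} of a level")
```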
Statistical reliability

Alphas and variance explained by factor 1

In Rasch analyses of the assessments developed by DTS, the Lectical dimension (hierarchical complexity) consistently explains 79–99% of the variance in performances.

Test | Project | N | Items | Alpha | Variance explained
LDMA | CI | 1099 | 5 | .968 | 88.73%
FOLA | DTS | 61 | 2 | .894 | 91.44%
LMBE | LMBE | 13 | 5 | .984 | 95.13%
LLRA | LLRA | 54 | 5 | .988 | 95.47%
LRJA | LRJA | 224 | 7 | .958 | 79.89%
Evaluation studies

Lectical assessments have been used in a number of evaluation studies. They have been shown to capture learning over relatively short interventions and in small cohorts. (See the table below for the results of several paired samples t-tests.)

References
Dawson-Tunik, T. L., & Stein, Z. (2004, July). Critical Thinking Seminar pre and post assessment results. Hatfield, MA: Developmental Testing Service, Inc.
Dawson, T. L., & Stein, Z. (2006). National decision-making curriculum: Results of the pre- and post-instruction developmental assessments. Northampton, MA: Developmental Testing Service.
Dawson, T. L., & Stein, Z. (2006). Mind Brain & Education study: Final report. Northampton, MA: Developmental Testing Service, Inc.
Embedding LectaTests

This table shows growth during a current leadership training program in which LectaTests are embedded as formative assessments. (Embedded assessments inform instruction; for example, they may be used as part of a lesson plan, much as a written assignment might be used to help learners solidify what they are learning.) Although class time is limited to 40 hours, participants are demonstrating development that is equivalent to what we have measured in year-long college courses (without embedding).
Scale | Overall (114): Pre / Post | Upper (7): Pre / Post | Middle (43): Pre / Post | Supers (57): Pre / Post
Lectical score | 11.3 / 11.5 | 11.5 / 11.6 | 11.4 / 11.6 | 11.2 / 11.4
Perspective taking | 22 / 38 | 30 / 35 | 25 / 41 | 19 / 36
Perspective seeking | 8 / 17 | 8 / 20 | 10 / 19 | 6 / 15
Perspective coordination | 30 / 59 | 27 / 66 | 31 / 63 | 28 / 55
Collaborative capacity | 34 / 56 | 33 / 66 | 37 / 59 | 31 / 52
Contextual thinking | 31 / 52 | 33 / 65 | 33 / 56 | 29 / 48
Decision-making process | 28 / 54 | 32 / 57 | 28 / 60 | 26 / 50
Study | N | Interval | Program length (hrs) | Embedded | Mean growth
IT 2004, LDMA | 40 | 6 mos | 60 | No | 0.06
IT 2005, LDMA | 32 | 6 mos | 60 | Yes | 0.27
MH 2010, LRJA | 43 | 13 mos | 42 | No | 0.13
AU 2010, LDMA | 28 | 12 mos | 43 | No | 0.09
ZV 2012, LDMA | 18 | 4 mos | 40 | Yes | 0.18
NA 2012, LDMA | 24 | 1 mo | 40 | No | 0.03
NA1 2013, LDMA | 16 | 4 mos | 40 | No | 0.05
NA2 2013, LDMA | 19 | 4 mos | 40 | No | 0.07
ST 2012, LDMA | 27 | 6 mos | 40 | No | 0.15
CI 2013, LDMA | 512 | 9 mos | 40 | Yes | 0.18
The table above shows average growth for several evaluation studies. In some of these, LectaTests were embedded in curricula; in others, they were not. The average growth for undergraduates in college is about .13 of a level per year. The average growth for the 7 studies in which LectaTests were not embedded is .083, whereas the average growth for the 3 studies in which LectaTests were embedded is .21.
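For transparency, the two averages quoted above can be reproduced directly from the mean-growth column of the table (a quick check, using the values as transcribed from the table):

```python
# Reproducing the embedded vs. non-embedded growth averages from the table above.
embedded = [0.27, 0.18, 0.18]                              # IT 2005, ZV 2012, CI 2013
not_embedded = [0.06, 0.13, 0.09, 0.03, 0.05, 0.07, 0.15]  # the remaining 7 studies

print(round(sum(not_embedded) / len(not_embedded), 3))     # 0.083
print(round(sum(embedded) / len(embedded), 2))             # 0.21
```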
Detecting change over short periods and in small samples
In the table below, paired samples t-tests show levels of detectable growth in several program evaluations that were conducted with Lectical Assessments. They demonstrate that our measures can detect growth as small as .05 of a level in an average-sized classroom (NA1 2013, LDMA). Moreover, they show that measurable growth can occur with minimal instruction in a well-designed program. For example, the individuals in CI 2012 met only 4 times over the course of 3 months.
Study | N | Interval | DF | Mean time 1 | Mean time 2 | t2 - t1 | t | p
IT 2005, LDMA | 32 | 6 mos | 31 | 10.98 | 11.17 | 0.27 | 7.05 | .001
CI 2012, LDMA | 31 | 3 mos | 30 | 11.24 | 11.30 | 0.06 | 2.01 | .053
ST 2012, LDMA | 27 | 3 mos | 26 | 11.18 | 11.27 | 0.09 | 2.64 | .014
AU 2010, LDMA | 44 | 12 mos | 43 | 10.92 | 11.08 | 0.16 | 2.19 | .034
CI 2013, LDMA | 185 | 9 mos | 184 | 11.31 | 11.49 | 0.17 | 14.39 | .001
AU 2011, LDMA | 57 | 12 mos | 56 | 11.24 | 11.28 | 0.04 | 1.50 | .140
AU 2011, LDMA | 38 | 12 mos | 37 | 11.25 | 11.32 | 0.07 | 2.28 | .030
ZV 2012, LDMA | 18 | 4 mos | 17 | 11.26 | 11.44 | 0.18 | 5.91 | .001
NA 2012, LDMA | 24 | 1-3 mos | 23 | 11.25 | 11.28 | 0.03 | 1.41 | .170
NA1 2013, LDMA | 16 | 3 mos | 15 | 11.24 | 11.29 | 0.05 | 3.30 | .001
NA2 2013, LDMA | 19 | 3 mos | 18 | 11.23 | 11.30 | 0.07 | 3.63 | .001
MH 2010, LRJA | 43 | 13 mos | 42 | 11.32 | 11.19 | 0.13 | 2.22 | .031
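For readers unfamiliar with the analysis, the sketch below shows a paired samples t-test of the kind summarized in the table above, run on a small set of hypothetical pre/post Lectical scores (not the study data).

```python
# Minimal sketch: paired samples t-test on hypothetical pre/post Lectical scores.
import numpy as np
from scipy import stats

pre  = np.array([11.2, 11.3, 11.1, 11.4, 11.2, 11.3, 11.2, 11.1])
post = np.array([11.3, 11.3, 11.2, 11.6, 11.2, 11.5, 11.3, 11.2])

t_stat, p_value = stats.ttest_rel(post, pre)    # paired (dependent) samples
mean_growth = (post - pre).mean()               # corresponds to the "t2 - t1" column
df = len(pre) - 1                               # degrees of freedom

print(f"growth = {mean_growth:.2f}, t({df}) = {t_stat:.2f}, p = {p_value:.3f}")
```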
*All Lectical assessments meet or exceed the validity and reliability standards for educational and psychological testing set jointly by the APA, AERA, and NCME.
**See: Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35(11), 1012-1027.