Faculty will soon be faced with claims based on, and the consequences of the use of, a device purported to measure the amount of learning produced by CUNY undergraduate programs. The device, the methodology for its application, and the conclusions drawn from the data it produces are, however, quite controversial. Perhaps the two most significant concerns or criticisms of the use of the device are these:
1. The device does not provide what the public and CUNY may want to know, or may report that it knows, namely, "We need to know what students know when they enter CUNY and what they know when they graduate." The device is only a measure of rudimentary cognitive skills.
2. Use of the device will not provide CUNY with the curriculum-specific information needed to improve teaching and learning.
CUNY has decided to end the use of the CPE. CUNY acknowledges the need for some evidence of the value added by an education at CUNY, and so it formed a task force to make recommendations for measures and processes for the assessment of learning. The CUNY Task Force (TF) on System-Wide Assessment was charged to do something more than, and other than, provide a replacement for the CPE.
The TF moved rather directly to recommend the use of the Collegiate Learning Assessment (CLA) produced by the Council for Aid to Education (CAE). (FULL DISCLOSURE: the chairman of the Board of Trustees of the CAE, Benno Schmidt, is also chairman of the CUNY Board of Trustees, and the CUNY Vice Chancellor of Community Colleges, Eduardo J. Marti, is also on the CAE board of trustees.) See http://www.cuny.edu/about/administration/offices/ue/cue/AssessmentTaskForceReport2011.pdf for the Task Force Report (TFR) itself.
Here are just some of the concerns:
· The CUNY Assessment Task Force (TF) recommends the use of the Collegiate Learning Assessment (CLA) for CUNY-wide assessment of learning, yet the TF reports that the CLA has problems providing what CUNY charged the Task Force to recommend. The Task Force Report (TFR) indicates that the CLA is not, as CUNY's charge requires, an instrument that will:
- measure other learning outcomes associated with general education at CUNY;
- benchmark learning gains against those of comparable institutions outside CUNY;
- use the results to improve teaching and learning throughout CUNY.
· Such a discrepancy supports the contention that the purpose of the recommended test is not to measure learning outcomes but to provide data that can support a public relations presentation showcasing CUNY's effectiveness, and to do so through dubious methodological means.
· The use of the Collegiate Learning Assessment (CLA) provides for the examination of the skills of two disparate groups selected using a cross-sectional rather than a longitudinal design. By design or by accident, the random selection method chosen, which examines one group of students entering programs of study and another group of those leaving, permits the selection of people with quite different levels of entering skills and background knowledge. Comparisons of results are likely to show a higher level of skills in those graduating. Those entering, particularly at the community colleges, are likely to include ESL students and students in need of remediation in more than one area; those exiting are not likely to include similar numbers of such students. Uncritical reliance on vendor-supplied rhetoric about psychometrics to displace critiques of the methodology does not allay concerns over the merits of using the CLA.
· There is no recommendation or provision for comparing the results of CUNY's use of the CLA with results from people of ages similar to the two groups of CUNY students but who have spent that time outside formal instruction and who, through the maturation of brain structures and operations and growing experience in handling information acquired through interaction with their social and informational environments, have developed the cognitive skills that the CLA assesses.
· Whatever the CLA provides, it is, by the admission of the TF and of the CLA's developers, not the curriculum-specific information CUNY would need to improve teaching and learning.
· There is concern that the TFR contains more cautions against using the CLA than recommendations for its use.
· There are indications of weaknesses in the basic design and in the use of the results of the CLA, and cautions about how to interpret the results.
· The inferences drawn from the results of the CLA are dubious, as the subject groups are heterogeneous in age, prior academic preparation, interests, motivations, programs of study, and transfers between programs. The TFR gives no indication that alternative explanations for the results of administering the CLA were considered.
· The CLA is not an assessment of knowledge at all, and its vendor acknowledges this. It does not provide what CUNY may want to know, or may report that it knows, namely, "We need to know what students know when they enter CUNY and what they know when they graduate."
· The TF gave no consideration to the latest findings of neuroscience and developmental psychology.
· There is no indication of a literature review of criticisms of the devices examined.
· The CLA is not normed against the general population in the age range of the typical test subjects.
· The CLA testing (sampling) model is severely flawed.
· Test subjects' performance can be severely compromised by a range of motivations.
· The test-subject selection process can easily be compromised or manipulated to produce desired results (gaming the system).
The Charge to the Task Force
The CUNY Assessment Task Force recommendations do not meet the Task Force's charge, and use of the Collegiate Learning Assessment (CLA) instrument cannot, and thus will not, meet the desired goals expressed by the Chancellery.
The TFR offers this significant background:
After extensive deliberations, the CPE Task Force recommended that CUNY discontinue the use of the CPE (CUNY Proficiency Examination Task Force, 2010). As a certification exam, the CPE had become redundant. Nearly every student who was eligible to take the exam— by completing 45 credits with a 2.0 GPA or better— passed the exam. Further, given that the CPE was designed by CUNY and administered only within CUNY, it could not be used to benchmark achievements of CUNY students against those of students at comparable institutions. Because it was administered only at a single point in time, the CPE also did not measure learning gains over time. Finally, the development and administration of the test had become prohibitively expensive, projected at $5 million per year going forward. The Board of Trustees took action to discontinue the CPE in November 2010.
----Task Force Report (TFR), 4-5
In January 2011, the CUNY Task Force on System-Wide Assessment of
Undergraduate Learning Gains (Assessment Task Force) was convened by
Executive Vice Chancellor Alexandra Logue and charged as follows:
The Chancellery wishes to identify and adopt a standardized assessment instrument to measure learning gains at all of CUNY's undergraduate institutions. The instrument should be designed to assess the ability to read and think critically, communicate effectively in writing, and measure other learning outcomes associated with general education at CUNY. It must be possible for each college and the University to benchmark learning gains against those of comparable institutions outside CUNY. It is the responsibility of the Task Force to identify the most appropriate instrument and to advise the Chancellery on how best to administer the assessment and make use of the results.
The Task Force is charged with the following specific
responsibilities:
1. Taking into account psychometric quality, the alignment of the domain of the instrument with broad learning objectives at CUNY colleges, cost, facility of obtaining and using results, and the ability to benchmark results externally, select an assessment instrument from among those commercially available at this time.
2. Develop recommendations for the chancellery on how the assessment should best be administered so as to
a. represent each college's undergraduate student body;
b. generate a valid assessment of learning;
c. facilitate comparisons across CUNY colleges and between CUNY and other postsecondary institutions.
3. Develop recommendations on how the colleges and the chancellery can best use the results to improve teaching and learning throughout CUNY.
----Task Force Report (TFR) Executive Summary and Introduction
As CUNY seeks to address the real and appropriate concern for public accountability and wants to provide some assurance to the public that there is real value in a CUNY education, some measure is needed to provide that assurance. University and college officials often state that there is a need to know what a student knows when entering CUNY and what the student knows when graduating. The TFR indicates that the Collegiate Learning Assessment (CLA) is not an instrument that will:
· measure other learning outcomes associated with general education at CUNY;
· benchmark learning gains against those of comparable institutions outside CUNY;
· use the results to improve teaching and learning throughout CUNY.
Instead, the TFR recommends use of a device that does not measure what a student knows at all but only, and perhaps in a dubious manner, provides some indication of what a student can do in terms of basic cognitive skills.
Consider the following limitations acknowledged in the TFR:
The Task Force emphasizes that the CLA assesses a limited domain
and should not be regarded as a comprehensive measure of general
education outcomes defined by CUNY colleges. The test is not
intended to evaluate all aspects of institutional effectiveness and
is not designed to assess individual student or faculty performance.
—(TFR,3)
The Task Force does not, however, endorse the CLA for all
purposes. CLA results are intended for use in evaluating learning
outcomes only at the institutional level and primarily as a
“signaling tool to highlight differences in programs that can lead
to improvements in teaching and learning” (from the introduction to
the sample 2009-2010 CLA Institutional Report). —(TFR,16)
As indicated earlier, the CLA assesses learning in a limited
domain and cannot be regarded as a comprehensive measure of general
education outcomes as currently defined by CUNY colleges or as may
be defined by the Pathways initiative. —(TFR,16)
and again here:
Given the impossibility of capturing all outcomes with a single
instrument, the Task Force identified the core learning outcomes
common across CUNY: reading, critical thinking, written
communication, quantitative reasoning and information literacy. The
Task Force acknowledges that these competencies do not represent the
full range of learning outcomes deemed essential
by CUNY colleges and institutions across the country (see Liberal
Education and America’s Promise, 2007). Nor do they adequately
represent discipline-specific knowledge and competencies. The
assessment instrument best aligned with this restricted domain must
therefore be seen as one component of a more comprehensive
assessment system comprised of the many formative and summative
measures tailored to assess general education learning outcomes. —(TFR,6-7)
The concern is that CUNY as a whole and its units may use the CLA
results to make claims about the effectiveness of curricula and of
the General Education Core in particular when the instrument cannot
support such claims. Nor can the results be used to advance teaching
and learning as they are non-specific to programs of instruction.
The TFR notes that:
The Task Force discussed the methodological issues associated with assessing learning gains, and this report contains some initial recommendations for administering the test. However, these questions merited additional deliberation, and more detailed recommendations will be presented in a supplementary report. —(TFR,6)
That supplementary report has not been produced, and this is a major issue, as administering the CLA poses numerous challenges. Indeed, it may prove most formidable to develop processes and protocols for selecting student groups that would be considered acceptable according to standard criteria for the conduct of such assessments.
The importance of proper test administration is noted again here:
Administering the test to properly oriented students under
standard and secure conditions is essential for gathering quality
data. —(TFR,12)
Yet there is still no description of the processes or protocols needed to secure quality data.
Cautions are included as well:
noting the need to conduct research on the validity of the
electronic scoring methodology to be fully implemented soon by the
Council for Aid to Education (CAE), the organization that develops
and scores the CLA.—(TFR,2)
CUNY may be able to learn from other institutions how best to
motivate randomly selected students to demonstrate their true
ability on the assessment. —(TFR,2)
Finally, the Task Force calls attention to the fact that the
national sample of colleges that have administered the CLA differs
in important respects from the CUNY student body, and that only a
handful of community colleges have administered the community
college version of the CLA to date. This lack of comparability may
initially hamper CUNY’s ability to interpret its learning gains with
reference to national averages. All of the other candidate tests are
characterized by this important constraint.—(TFR,3)
The Council for Aid to Education (CAE) produces the CLA, and the TFR notes that:
However, because the CAE has recently implemented machine scoring
for all of its unstructured response tests, the Task Force
recommends that the University obtain more information about the
validity of the scoring process and consider the possible
implications for the interpretation of test scores. —(TFR,14)
There is no indication of any effort by the CAE to provide the information on the validity of the machine-scoring process, and there are open questions concerning such an important matter.
The Task Force also urges caution with
respect to interpreting the available benchmarking data. In its
standard report to participating colleges, the CAE provides data
comparing the learning gains at each college to gains measured in
the national sample. The validity of these comparisons may be
affected by the extent to which the colleges comprising the
benchmark sample resemble CUNY and the degree to which the sample of
tested students in the benchmark colleges reflects the total
population of undergraduates in those colleges. —(TFR,16)
The test is not intended to evaluate all
aspects of institutional effectiveness and is not designed to assess
individual student or faculty performance. —(TFR,16)
Questionable claims are made relating to basic concepts of testing and measurement:
The CLA will be administered to samples of students who are just
beginning their undergraduate studies and to students who are
nearing the end of their undergraduate career. The sampling must be
done randomly to produce representative results; yet random sampling
will pose logistical challenges. —(TFR,2)
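The weight of that requirement can be shown with a small sketch, offered only as an illustration: every number in it is an assumption, not CLA, CAE, or CUNY data. It shows how departing from random sampling, whether through self-selected volunteers or hand-picked test takers (the "gaming the system" concern noted among the bullets above and in Dean Savage's comment below), inflates a cross-sectional freshman-to-senior difference even when the underlying populations have not changed.

# Illustrative sketch only: all figures are assumptions, not CLA or CUNY data.
# It shows how a non-random (hand-picked) senior sample inflates the measured
# freshman-to-senior score gap relative to the true population difference.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical score populations with a true average difference of 40 points.
freshmen = rng.normal(1000, 150, 10_000)
seniors = rng.normal(1040, 150, 10_000)

def sample_mean(pop, n=100, bias_toward_top=False):
    """Mean score of a tested sample: drawn at random, or only from the stronger half."""
    if bias_toward_top:
        pop = np.sort(pop)[pop.size // 2:]  # "game the system": test only stronger students
    return rng.choice(pop, size=n, replace=False).mean()

random_gap = sample_mean(seniors) - sample_mean(freshmen)
gamed_gap = sample_mean(seniors, bias_toward_top=True) - sample_mean(freshmen)

print("True population difference:    40 points")
print(f"Gap with random samples:      {random_gap:4.0f} points")
print(f"Gap with hand-picked seniors: {gamed_gap:4.0f} points")

Under these invented numbers the hand-picked senior sample inflates the apparent difference several-fold; the point is not the particular figures but that the reported results depend heavily on how the samples are drawn.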
The TFR indicates that:
This report contains the Task Force’s
recommendations for a test instrument. A supplementary report will
provide guidance on test administration and use of test results by
faculty and academic administrators.—(TFR,16)
Criticism of the cross-sectional method for sampling:
My understanding is that the CLA assessment is based on a cross-sectional design in which a group of freshmen at an institution are compared to a group of seniors at the same institution. The groups are equated through adjustments based on the observed influences of covariates (e.g., SAT). This approach seems reasonable if you are trying to measure a relatively uniform treatment (all students receive very similar training during the 4 years), but I have not seen any discussion of the much more complicated environment at CUNY.
A plurality of CUNY students are transfer students. Their educational experience is influenced by two or more institutions and they frequently take a great deal of time to graduate. The challenges to a cross-sectional design are considerable.
First, there is the obvious problem of identifying the effects that each institution had on a student even when one is reasonably certain that the seniors began their education with similar skill levels to the freshmen to which they are being compared.
Second, there are a number of threats to the validity of the cross-sectional design that make it very difficult to be certain that confounds have not created differences that can be attributed to the proper source.
One problem is that students self-select, and the student who transfers from BCC to Lehman may be different in many ways from a BCC student who transfers to Baruch. It seems quite possible that these students could differ in area of interest, motivation, or other factors that are not captured by the standard covariates that are used to equate the groups.
Another problem is that admission criteria differ significantly at the senior colleges, and students who may be admissible to Lehman or Queens may not be admissible to Baruch. This difference constitutes a kind of selection bias because students will begin their education at a senior college with systematically different skills.
Moreover, the transfer admission criteria are not typically included as covariates to the CLA process, and the students who have had the most significant gains at a 2yr school are exactly those students who will be disproportionately represented at a senior college with tougher transfer admission criteria.
--Professor Kevin Sailor (Psychology, Lehman)
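To make these concerns concrete, here is a second small sketch, again purely illustrative: all of the numbers are assumptions, and the adjustment shown is a simplified stand-in for, not a reproduction of, the covariate adjustment described above. The simulated instructional effect is deliberately set to zero, yet because the senior group is a selected group, the SAT-like covariate is only a noisy proxy for the unmeasured differences between the groups, and natural maturation continues regardless of schooling, the adjusted comparison still reports an apparent "gain."

# Illustrative sketch only: invented numbers, not CLA, CAE, or CUNY data, and a
# simplified covariate adjustment rather than the CAE's actual procedure.
# The simulated instructional effect is zero, yet an apparent "gain" emerges from
# selection (seniors are a stronger surviving group), noise in the covariate,
# and natural maturation over four or more years.
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # students per tested group

# Latent ability drives both the SAT-like covariate and the CLA-like score.
ability_freshmen = rng.normal(0.0, 1.0, n)
ability_seniors = rng.normal(0.4, 0.9, n)   # a selected, stronger group

# Observed covariate: a noisy proxy for ability.
sat_freshmen = 500 + 80 * ability_freshmen + rng.normal(0, 60, n)
sat_seniors = 500 + 80 * ability_seniors + rng.normal(0, 60, n)

maturation = 0.3          # gain from simply growing older, in ability units
instruction_effect = 0.0  # deliberately zero: the college adds nothing here

cla_freshmen = 1000 + 100 * ability_freshmen + rng.normal(0, 50, n)
cla_seniors = (1000 + 100 * ability_seniors
               + 100 * (maturation + instruction_effect)
               + rng.normal(0, 50, n))

# Simplified cross-sectional adjustment: predict senior scores from the freshman
# score-on-SAT regression and call any excess the "learning gain."
slope, intercept = np.polyfit(sat_freshmen, cla_freshmen, 1)
predicted_seniors = slope * sat_seniors + intercept
apparent_gain = cla_seniors.mean() - predicted_seniors.mean()

print(f"Raw senior-minus-freshman difference: {cla_seniors.mean() - cla_freshmen.mean():.0f} points")
print(f"Apparent 'gain' after SAT adjustment: {apparent_gain:.0f} points")
print("True instructional effect in this simulation: 0 points")

Nothing in this sketch shows that CUNY's measured gains would be spurious; it shows only that a covariate-adjusted cross-sectional comparison cannot, by itself, separate instruction from selection and maturation, which is precisely the concern raised above.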
“A major problem has to do with implementation. If a college were
to choose a weak group of freshmen and an outstanding group of
seniors to take the test, a college would appear to be doing very
well. Would anyone anywhere try to game the system in this way?
Hmmm.”-- Dean Savage (Sociology, Queens College)
Lisa A. Ellis (Library, Baruch College), who
served on the Task Force, has replied:
This is not correct for a number of reasons. First it is not
“implementation” but “sampling” which is a problem. The report
notes the differences between cross-sectional and longitudinal
studies and gives the testing of freshmen and seniors as an example
of how these particular study methodologies differ. This is not to
be read as a statement of how sampling will be done if such a test
were to be administered on CUNY campuses. On page 17, it reads,
“However, both designs present challenges associated with the
treatment of drop-outs and transfer students, and solutions to these
issues must be standardized if the measurement of gains is to be
benchmarked across institutions.” The Task Force not only
recommends a cross-sectional design but also, “recommends testing
students at the beginning of their academic career [freshmen], at roughly
the 60th credit [upper sophomores or those nearing completion of
Associate’s degrees], and for students pursuing the bachelor’s
degree, when approaching the 120th credit [presumably seniors, if
they have not taken an excess of credits].”
Absent the implementation instructions the TFR itself indicates are needed, the reply of Dr. Ellis does not suffice to dismiss the concerns raised by others concerning the use of the cross-sectional approach.
The absence of that supplementary report, which it is claimed will be available in December 2011, leaves open significant questions concerning how it would be possible to administer the CLA in CUNY in a manner that would be valid and would stand up to scrutiny under the strict standards employed by the social sciences.
The TFR acknowledges some of the challenges in meeting those
standards:
To measure learning gains, CUNY must choose either a
cross-sectional or a longitudinal design. In a cross-sectional
study, random samples of freshmen and seniors are drawn during the
school year— freshmen in the fall and seniors in the spring. In a
longitudinal study, a group of freshmen is tested in their first
year, and then again as seniors. In theory, the two designs should
yield equivalent results. However, both designs present challenges
associated with the treatment of drop-outs and transfer students,
and solutions to these issues must be standardized if the
measurement of gains is to be benchmarked across institutions.
Because of the multi-year period required to execute a longitudinal
design, the Task Force endorses a cross-sectional design. Moreover,
because CUNY wishes to use the same instrument to test learning
outcomes at all of its colleges—community and senior—the Task Force
recommends testing students at the beginning of their academic
career, at roughly the 60th credit, and for students pursuing the
bachelors degree, when approaching the 120th credit. Finally, in
developing a sampling scheme, analysts must take into account the
numbers of ESL and remedial students, and the appropriateness of
including them in the college’s representative sample. Both groups
may face special challenges in a timed testing situation.
The methodological issues of sampling will have a direct effect
not only on assessments of learning at the institutional level, but also on calculations of
learning gains and subsequent derivations of the learning gains to be ascribed to the college
rather than to natural maturation.
A further complication to measuring learning gains is determining
the nature and significance of any gain. The assessment of learning gains must take into account
changes in performance from one point in time to the next, as well as gain relative to
specific standards. With both methodological and substantive complexities in play, the Task
Force recommends caution in the
initial administrations of the test and the use of multiple
alternative measures to help in the interpretation of results. —(TFR,16)
Issues with other factors contributing to the results:
Again the TFR notes that:
The methodological issues of sampling will have a direct effect
not only on assessments of learning at the institutional level, but also on calculations of
learning gains and subsequent derivations of the learning gains to be ascribed to the college
rather than to natural maturation. —(TFR,16)
Neither the CAE nor the TFR appears to treat as fundamentally significant the failure to account for natural maturation in comparing groups that range from 17 to 20 years of age with groups that will be four or more years older.
Recent studies indicate that the higher cognitive processes in the human brain develop through that period of time regardless of formal education:
Reyna, Valerie F. and Farley, Frank. Risk and Rationality in Adolescent Decision Making: Implications for Theory, Practice and Public Policy. Psychological Science in the Public Interest, Volume 7, No. 1, September 2006.
Sowell, Elizabeth R., Thompson, Paul M., Holmes, Colin J., Jernigan, Terry L., and Toga, Arthur W. In vivo evidence for post-adolescent brain maturation in frontal and striatal regions. Nature Neuroscience, Volume 2, No. 10, October 1999.
Blakemore, Sarah-Jayne and Choudhury, Suparna. Development of the adolescent brain: implications for executive function and social cognition. Journal of Child Psychology and Psychiatry, Vol. 47, No. 3/4, 2006, pp. 296-312.
Dahl, Ronald E. Adolescent Brain Development: A Period of Vulnerabilities and Opportunities. Annals of the New York Academy of Sciences, 1021, 2004, 1-22.
Criticism of the failure to account for other factors contributing to the results of the CLA:
…the outside influence on a student's intellectual skills is quite likely to be much more variable for transfer students than it is for native matriculants. For example, many students work and it seems quite likely that they pick up some skills through their work experience. Consider a student who works in a white collar environment (an accounting department at a large firm) versus one who works in a blue collar environment (night security). It seems likely that the student in a white collar environment will be exposed to information and tasks that would have a greater impact on skills assessed by the CLA. If this exposure lasts for four or five years then the contributions might be significant.
Thus, some gains may be due to the outside environment rather than the school environment. Moreover, this possible influence would undermine comparisons across schools or programs unless these kinds of experiences are distributed equally across the populations who transfer from each 2 year school to every 4 year school.
--Professor Kevin Sailor (Psychology, Lehman)
Lisa A. Ellis (Library, Baruch College), who
served on the Assessment Task Force, has written:
“…In truth, there are a number of factors that may impact learning
gains which may or may not include teaching.”
Use of CLA in CUNY
Some, like Dean Savage (Sociology, Queens College), caution:
“It's very likely that Central will use the CLA results to evaluate
college effectiveness, so everyone on the campuses should be aware
of the test's limitations, and be prepared to evaluate the results
accordingly.”
Lisa A. Ellis (Library, Baruch College), who
served on the Assessment Task Force, has written the following:
“We had numerous
discussions on the committee about how the test results will be used
(i.e PPM, cross campus comparisons, eliminate programs, etc.) and as
a group were firmly opposed to such use. Minutes were taken and
this appears numerous times during the minutes as a caution to what
information the test results can provide each campus in terms of
what actions can be reasonably taken to improve or change learning
gains depending on the results received. In truth, there are a
number of factors that may impact learning gains which may or may
not include teaching.”
Need for other measures and devices to be employed, and for implementation guidelines
Indeed the TFR includes:
The Task Force identified sampling design, motivation of
students, and involvement of faculty as keys to the successful
implementation of the CLA. Sampling must be conducted carefully so
that the test results accurately reflect the level of learning and
unique demographics at each CUNY institution. Because the test is
not high stakes, CUNY must devise a strategy for encouraging test
takers to demonstrate their true abilities on the test. Finally,
unless faculty believe that the test is a valuable tool for
assessing the learning goals they are attempting to advance in their
own classrooms, the information generated by the assessment will not
become a resource for improving learning outcomes of undergraduate
students. —(TFR,16)
It needs to
be emphasized that the TFR specifically cautions:
…. With both methodological and substantive complexities in play,
the Task Force recommends caution in the initial administrations of the test and the use of multiple
alternative measures to help in the interpretation of results. —(TFR,16)
Yet no multiple measures have been described or provided thus far, and the concern is that none will be, and that the results of the CLA, however administered, may be interpreted in a manner to suit the interests of those who would require its use for purposes other than improving pedagogy.
Although the TFR reports that "The assessment instrument best aligned with this restricted domain must therefore be seen as one component of a more comprehensive assessment system comprised of the many formative and summative measures tailored to assess general education learning outcomes" (TFR,6-7) and "recommends caution in the initial administrations of the test and the use of multiple alternative measures to help in the interpretation of results" (TFR,16),
it is now reported by the Chancellery that
“the Task Force and the Chancellery
share the strong view that responsibility for assessment of all
kinds rests with the colleges and especially with the faculty. It
is the purview of the faculty to define learning goals and outcomes,
identify appropriate measures and evidence to assess progress toward
those goals, and use the results of many strands of evidence for
improvement. “
--David Crook
So the CLA is to be just one component of a more comprehensive
assessment system comprised of the many formative and summative
measures tailored to assess general education learning outcomes that
the colleges will devise or acquire and use. The use of multiple alternative measures to help in the interpretation of results will depend on the colleges developing or acquiring those measures. The Chancellery will supply no such measures. This will
leave the CLA results to be the only assessment of “learning” across
CUNY that will be used to report on the efficacy of the curricula of
the University: an assessment of rudimentary skills using a
problematic instrument.
###################################
Works Cited by CUNY Task Force
American
Educational Research Association, American Psychological
Association, and National Council on Measurement in Education.
(1999). Standards for Educational and Psychological Testing.
Washington, D.C.: American Educational Research Association.
Arum, R. and J. Roksa (2011). Academically Adrift: Limited Learning on College Campuses. University of Chicago Press.
CUNY
Proficiency Examination Task Force. (2010). Report of the CUNY
Proficiency Examination Task Force. New York: City University of New
York.
Ewell, P. (2009). Assessment, Accountability and Improvement:
Revisiting the Tension. University of Illinois and University of
Indiana: National Institute for Learning Outcomes Assessment.
Hutchings, P. (2010). Opening Doors to Faculty Involvement in Assessment. University of Illinois at Urbana-Champaign: National Institute for Learning Outcomes Assessment.
Liberal Education and America's Promise. (2007). Liberal Education
and America's Promise (LEAP) - Essential Learning Outcomes.
Retrieved June 21, 2011, from AAC&U - Association of American
Colleges and Universities: http://www.aacu.org/leap/vision.cfm
National Institute for Learning Outcomes Assessment. (2011). Tool
Kit: Tests. Retrieved July 13, 2011, from http://www.learningoutcomesassessment.org/tests.htm
Rhodes, T. (Ed.). (2010). Assessing Outcomes and Improving Achievement: Tips and Tools for Using Rubrics. Washington, D.C.: Association of American Colleges and Universities.
VALUE: Valid Assessment of Learning in Undergraduate Education
Project. (2007). VALUE: Valid Assessment of Learning in
Undergraduate Education Overview. Retrieved June 21, 2011, from AAC&U
Association of American Colleges and Universities: http://www.aacu.org/value/index.cfm
Voluntary System of Accountability (2007). About VSA. Retrieved June
21, 2011, from
Voluntary System of Accountability: http://www.voluntarysystem.org
###################################
OTHER RELEVANT RESOURCES
AAC&U. (2005).
Liberal education outcomes. Washington, DC: Association of
American Colleges and Universities.
AASCU. (Spring
2006). Value-added Assessment. Perspectives. Washington, DC:
American Association of State Colleges and Universities.
Arum, R., Roksa, J., & Velez, M. (2008).
Learning to reason and communicate in college: Initial report of
findings from the CLA longitudinal study.
Brooklyn, NY: The Social Science Research Council.
Banta, T. W.,
and G. R. Pike. 2007. Revisiting the blind alley of value-added.
Assessment Update 19 (1), 1,2,14,15.
Benjamin, R. and M. Chun (2003). A new field of dreams: The Collegiate Learning Assessment project. Peer Review, 5(4), 26-29. (http://www.cae.org/content/pro_collegiate_reports_publications.htm; 5/23/08).
Benjamin, R., Chun, M., & Shavelson, R. (2007). Holistic Tests in a sub-score world: The diagnostic logic of the Collegiate Learning Assessment. New York, NY: Council for Aid to Education. Found at (11/24/07): http://www.cae.org/content/pdf/WhitePaperHolisticTests.pdf
Benjamin, R., & Chun, M. (2009). Returning to learning in an age of assessment: A synopsis of the argument. New York, NY: Council for Aid to Education monograph.
Blakemore, Sarah-Jayne and Choudhury, Suparna. Development of the adolescent brain: implications for executive function and social cognition. Journal of Child Psychology and Psychiatry, Vol. 47, No. 3/4, 2006, pp. 296-312.
Braun, H.J. (2005). Using student progress to evaluate teachers: A primer on value-added models. New Jersey: Educational Testing Service.
Case, R. (1996). Changing views of knowledge and their impact on educational research and practice. In D.R. Olson & N. Torrance (Eds.), Handbook of human development in education: New models of learning, teaching, and schooling. Oxford: Blackwell.
Brennan, R.L.
(1995). The conventional wisdom about group mean scores. Journal of
Educational Measurement, 32(4), 385-396.
CLA (2006)
Sample Institutional Report.
www.cae.org/cla
Carpenter, Andrew N. and Bach, Craig. Learning Assessment: Hyperbolic Doubts Versus Deflated Critiques. http://ellis.academia.edu/AndrewCarpenter/Papers/172152/Learning_Assessment_Hyperbolic_Doubts_Versus_Deflated_Critiques
Carroll, J.B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Cronbach, L.J. (1990). Essentials of psychological and educational testing. 5th edition. New York: Harper Collins.
Council for
Aid to Education. (Fall, 2006). CLA Interim Institutional Report.
New York, NY:Council for Aid to Education.
Council for Aid to Education (2006)
Collegiate Learning Assessment.
New York, NY: Council for Aid to Education.
Council for Aid to Education. (2008).
CLA
Interim Institutional Report.
New York, NY: Council for Aid to Education.
Dahl, Ronald E. Adolescent Brain Development: A Period of Vulnerabilities and Opportunities. Annals of the New York Academy of Sciences, 1021, 2004, 1-22.
Dwyer, C. A., Millett, C. M., & Payne, D. G. (2006). A Culture of Evidence: Postsecondary assessment and learning outcomes. Princeton, N.J.: Educational Testing Service.
Ekman, R., & Pelletier, S. (2008). Assessing student learning: A
work in progress.
Change: The Magazine of Higher Learning, 40(4),
14-19.
Erwin, D., &
Sebrell, K.W. (2003). Assessment of critical thinking: ETS’s tasks
in critical thinking. The Journal of General Education, 52(1),
50-70.
Ewell, P. T.
(1994). A policy guide for assessment: Making good use of the
Tasks in
Critical Thinking.
Princeton, NJ: Educational Testing Service.
Garrett, J. (2009). English composition report. Los Angeles, CA: California State University.
Glenn, David. Scholar Raises Doubts about the Value of a Test of
Student Learning. The Chronicle of Higher Education, June 2, 2010.
Graff, G., &
Birkenstein, C. (May/June 2008). A Progressive Case for Educational
Standardization: How not to respond to the Spellings report. Academe
Online,
http://www.aaup.org/AAUP/pubsres/academe/2008/MJ/Feat/graf.htm
(May 20, 2008).
Hafner, A. (2010). NSSE 2009 findings: Comparisons between CSU students and far West peers and trends over time. Los Angeles, CA: California State University.
Hardison, C. M., & Vilamovska, A. (2009).
The
Collegiate Learning Assessment: Setting standards for performance at
a college or university.
Santa Monica, CA: RAND Education.
Hardison, C.M., & Vilamovska, A-M. (2008). Critical thinking performance tasks: Setting and applying standards for college-level performance. PM-2487-CAE. Santa Monica, CA: Rand.
Hosch, Braden
J. Time on Test, Student Motivation, and Performance on the
Collegiate Learning Assessment: Implications for Institutional
Accountability, Association for Institutional Research Annual Forum,
Chicago, IL, June, 2010.
Klein, S., Kuh,
G., Chun, M., Hamilton, L., & Shavelson, R. (2005). An approach to
measuring cognitive outcomes across higher-education institutions.
Research in
Higher Education, 46,
3, 251-276.
Klein, S. & Bolus, R. (1982). An analysis of the relationship between clinical skills and bar examination results. Report prepared for the Committee of Bar Examiners of the State Bar of California and the National Conference of Bar Examiners.
Klein, S. (1983). Relationship of bar examinations to performance tests of lawyering skills. Paper presented to the American Educational Research Association, Montreal, April. (Reprinted in Professional Education Researcher Notes, 1982, 4, 10-11.)
Klein, S. (2008). Characteristics of hand and machine-assigned scores to college students' answers to open-ended tasks. In Festschrift for David Freedman, D. Nolan and T. Speed, editors. Beachwood, OH: Institute for Mathematical Statistics, 2008.
Klein, S., Benjamin, R., Shavelson, R., & Bolus, R. (2007). The
Collegiate Learning Assessment: Facts and fantasies.
Evaluation Review, 31(5),
415-439.
Klein, S., Freedman, D., Shavelson, R., & Bolus, R. (2008).
Assessing school effectiveness.
Evaluation Review, 32(6),
511-525.
Klein, S. P., Kuh, G. D., Chun, M., Hamilton, L., & Shavelson, R.
(2003, April).
The
search for value-added: Assessing and validating selected higher
education outcomes.
Paper presented at the 88th
American Educational Research Association (AERA).
Klein, S. P., Kuh, G. D., Chun, M., Hamilton, L., & Shavelson, R.
(2005). An approach to measuring cognitive outcomes across higher
education institutions.
Research in Higher Education, 46(3),
251-276.
Klein, S., Liu, O. L., Sconing, J., Bolus, R., Bridgeman, B.,
Kugelmass, H., Nemeth, A., Robbins, S., & Steedle, J. (2009).
Test validity study report.
Retrieved March 31, 2010,
from
the Web:http://www.voluntarysystem.org/docs/reports/TVSReport_Final.pdf
Kuh, G.
(2006). Director’s Message in: Engaged Learning: Fostering Success
for All Students. Bloomington, Indiana: National Survey of Student
Engagement.
Landgraf, K.
(2005). Cover letter accompanying the distribution of Braun (2005)
report.
McClelland, D.C. (1973). Testing for competence rather than for "intelligence." American Psychologist, 28(1), 1-14.
Powers, D.,
Burstein, J., Chodorow, M., Fowles, M., & Kukich, K. (2000).
Comparing
the validity of automated and human essay scoring
(GRE No. 98-08a, ETS
RR-00-10).
Princeton, NJ: Educational Testing Service.
Powers, D., Burstein, J., Chodorow, M., Fowles, M., & Kukich, K. (2001). Stumping e-rater: Challenging the validity of automated scoring. (GRE No. 98-08Pb, ETS RR-01-03). Princeton, NJ: Educational Testing Service.
Reyna, Valerie
F. and Farley, Frank. Risk and Rationality in Adolescent Decision
Making: Implications for Theory, Practice and Public Policy.
Psychological Science in the Public Interest, Volume 7, No. 1,
September 2006
Sackett, P.R.,
Borneman, M.J., & Connelly, B.S. (2008). High-stakes testing in
higher education and employment. American Psychologist, 63(4),
215-227.
Shavelson,
R.J. (2007a). Assessing student learning responsibly: From history
to an audacious proposal. Change. January/February, 2007.
Shavelson,
R.J. (2007b). Student learning assessment: From history to an
audacious proposal. AAC&U.
Shavelson, R., & Huang, L. (2005).
CLA
conceptual framework.
New York: Council for Aid to Education.
Shavelson,
R.J. (2007). A brief history of student learning: How we got where
we are and a proposal for where to go next. Washington, DC:
Association of American Colleges and Universities’ The Academy in
Transition.
Shavelson, R.
(2007 January/February). Assessing student learning responsibly:
From history to an audacious proposal. Change, 26-33.
Shavelson,
R.J. (2008a,b): Aspen Paper and Wingspread Paper
Shermis, M. D.
(2008). The Collegiate Learning Assessment: A critical perspective.
Assessment Update, 20(2), 10-12.
Sowell, Elizabeth R., Thompson, Paul M., Holmes, Colin J., Jernigan, Terry L., and Toga, Arthur W. In vivo evidence for post-adolescent brain maturation in frontal and striatal regions. Nature Neuroscience, Volume 2, No. 10, October 1999.
Steedle, J.
(2009). Advancing institutional value-added score estimation. New
York: Council for Aid to Education
Taylor, K.L.,
& Dionne, J-P. (2000). Accessing Problem-Solving Strategy Knowledge:
The Complementary Use of Concurrent Verbal Protocols and
Retrospective Debriefing. Journal of Educational Psychology, 2000,
92(3), 413-425.
U.S.
Department of Education (2006). A test of leadership: Charting the
Future of U.S. Higher Education. Washington, D.C.