
Original Research Article

Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation

Henry I. Braun, Richard J. Shavelson, Olga Zlatkin-Troitschanskaia and Katrina Borowiec

  • 1 Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, United States
  • 2 Graduate School of Education, Stanford University, Stanford, CA, United States
  • 3 Department of Business and Economics Education, Johannes Gutenberg University, Mainz, Germany

Enhancing students’ critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves dealing with dilemmas involving ambiguity or conflicts among principles and contradictory information. We argue that performance assessment provides the most realistic—and most credible—approach to measuring CT. From this conceptualization and construct definition, we describe one possible framework for building performance assessments of CT with attention to extended performance tasks within the assessment system. The framework is a product of an ongoing, collaborative effort, the International Performance Assessment of Learning (iPAL). The framework comprises four main aspects: (1) The storyline describes a carefully curated version of a complex, real-world situation. (2) The challenge frames the task to be accomplished. (3) A portfolio of documents in a range of formats is drawn from multiple sources chosen to have specific characteristics. (4) The scoring rubric comprises a set of scales, each linked to a facet of the construct. We discuss a number of use cases, as well as the challenges that arise with the use and valid interpretation of performance assessments. The final section presents elements of the iPAL research program, which involve various refinements and extensions of the assessment framework and a number of empirical studies, along with linkages to current work in online reading and information processing.

Introduction

In their mission statements, most colleges declare that a principal goal is to develop students’ higher-order cognitive skills such as critical thinking (CT) and reasoning (e.g., Shavelson, 2010 ; Hyytinen et al., 2019 ). The importance of CT is echoed by business leaders ( Association of American Colleges and Universities [AACU], 2018 ), as well as by college faculty (for curricular analyses in Germany, see e.g., Zlatkin-Troitschanskaia et al., 2018 ). Indeed, in the 2019 administration of the Faculty Survey of Student Engagement (FSSE), 93% of faculty reported that they “very much” or “quite a bit” structure their courses to support student development with respect to thinking critically and analytically. In a listing of 21st century skills, CT was the most highly ranked among FSSE respondents ( Indiana University, 2019 ). Nevertheless, there is considerable evidence that many college students do not develop these skills to a satisfactory standard ( Arum and Roksa, 2011 ; Shavelson et al., 2019 ; Zlatkin-Troitschanskaia et al., 2019 ). This state of affairs represents a serious challenge to higher education – and to society at large.

In view of the importance of CT, as well as evidence of substantial variation in its development during college, its proper measurement is essential to tracking progress in skill development and to providing useful feedback to both teachers and learners. Feedback can help focus students’ attention on key skill areas in need of improvement, and provide insight to teachers on choices of pedagogical strategies and time allocation. Moreover, comparative studies at the program and institutional level can inform higher education leaders and policy makers.

The conceptualization and definition of CT presented here is closely related to models of information processing and online reasoning, the skills that are the focus of this special issue. These two skills are especially germane to the learning environments that college students experience today when much of their academic work is done online. Ideally, students should be capable of more than naïve Internet search, followed by copy-and-paste (e.g., McGrew et al., 2017 ); rather, for example, they should be able to critically evaluate both sources of evidence and the quality of the evidence itself in light of a given purpose ( Leu et al., 2020 ).

In this paper, we present a systematic approach to conceptualizing CT. From that conceptualization and construct definition, we present one possible framework for building performance assessments of CT with particular attention to extended performance tasks within the test environment. The penultimate section discusses some of the challenges that arise with the use and valid interpretation of performance assessment scores. We conclude the paper with a section on future perspectives in an emerging field of research – the iPAL program.

Conceptual Foundations, Definition and Measurement of Critical Thinking

In this section, we briefly review the concept of CT and its definition. In accordance with the principles of evidence-centered design (ECD; Mislevy et al., 2003 ), the conceptualization drives the measurement of the construct; that is, implementation of ECD directly links aspects of the assessment framework to specific facets of the construct. We then argue that performance assessments designed in accordance with such an assessment framework provide the most realistic—and most credible—approach to measuring CT. The section concludes with a sketch of an approach to CT measurement grounded in performance assessment .

Concept and Definition of Critical Thinking

Taxonomies of 21st century skills ( Pellegrino and Hilton, 2012 ) abound, and it is neither surprising that CT appears in most taxonomies of learning, nor that there are many different approaches to defining and operationalizing the construct of CT. There is, however, general agreement that CT is a multifaceted construct ( Liu et al., 2014 ). Liu et al. (2014) identified five key facets of CT: (i) evaluating evidence and the use of evidence; (ii) analyzing arguments; (iii) understanding implications and consequences; (iv) developing sound arguments; and (v) understanding causation and explanation.

There is empirical support for these facets from college faculty. A 2016–2017 survey conducted by the Higher Education Research Institute (HERI) at the University of California, Los Angeles found that a substantial majority of faculty respondents “frequently” encouraged students to: (i) evaluate the quality or reliability of the information they receive; (ii) recognize biases that affect their thinking; (iii) analyze multiple sources of information before coming to a conclusion; and (iv) support their opinions with a logical argument ( Stolzenberg et al., 2019 ).

There is general agreement that CT involves the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion (e.g., Erwin and Sebrell, 2003 ; Kosslyn and Nelson, 2017 ; Shavelson et al., 2018 ). We further suggest that CT includes dealing with dilemmas of ambiguity or conflict among principles and contradictory information ( Oser and Biedermann, 2020 ).

Importantly, Oser and Biedermann (2020) posit that CT can be manifested at three levels. The first level, Critical Analysis , is the most complex of the three levels. Critical Analysis requires both knowledge in a specific discipline (conceptual) and procedural analytical (deduction, inclusion, etc.) knowledge. The second level is Critical Reflection , which involves more generic skills “… necessary for every responsible member of a society” (p. 90). It is “a basic attitude that must be taken into consideration if (new) information is questioned to be true or false, reliable or not reliable, moral or immoral etc.” (p. 90). To engage in Critical Reflection, one must not only apply analytic reasoning, but also adopt a reflective stance toward the political, social, and other consequences of choosing a course of action. It also involves analyzing the potential motives of various actors involved in the dilemma of interest. The third level, Critical Alertness , involves questioning one’s own or others’ thinking from a skeptical point of view.

Wheeler and Haertel (1993) categorized higher-order skills, such as CT, into two types: (i) those used when solving problems and making decisions in professional and everyday life, for instance in relation to civic affairs and the environment; and (ii) those developed through formal instruction, usually in a discipline, in situations that exercise various mental processes (e.g., comparing, evaluating, and justifying). Hence, in both settings, individuals must confront situations that typically involve a problematic event, contradictory information, and possibly conflicting principles. Indeed, there is an ongoing debate concerning whether CT should be evaluated using generic or discipline-based assessments ( Nagel et al., 2020 ). Whether CT skills are conceptualized as generic or discipline-specific has implications for how they are assessed and how they are incorporated into the classroom.

In the iPAL project, CT is characterized as a multifaceted construct that comprises conceptualizing, analyzing, drawing inferences or synthesizing information, evaluating claims, and applying the results of these reasoning processes to various purposes (e.g., solve a problem, decide on a course of action, find an answer to a given question, or reach a conclusion) ( Shavelson et al., 2019 ). In the course of carrying out a CT task, an individual typically engages in activities such as specifying or clarifying a problem; deciding what information is relevant to the problem; evaluating the trustworthiness of information; avoiding judgmental errors based on “fast thinking”; avoiding biases and stereotypes; recognizing different perspectives and how they can reframe a situation; considering the consequences of alternative courses of action; and communicating decisions and actions clearly and concisely. The order in which activities are carried out can vary among individuals, and the processes can be non-linear and reciprocal.

In this article, we focus on generic CT skills. The importance of these skills derives not only from their utility in academic and professional settings, but also from the many situations involving challenging moral and ethical issues – often framed in terms of conflicting principles and/or interests – to which individuals have to apply these skills ( Kegan, 1994 ; Tessier-Lavigne, 2020 ). Conflicts and dilemmas are ubiquitous in the contexts in which adults find themselves: work, family, civil society. Moreover, to remain viable in the global economic environment – one characterized by increased competition and advances in second generation artificial intelligence (AI) – today’s college students will need to continually develop and leverage their CT skills. Ideally, colleges offer a supportive environment in which students can develop and practice effective approaches to reasoning about and acting in learning, professional and everyday situations.

Measurement of Critical Thinking

Critical thinking is a multifaceted construct that poses many challenges to those who would develop relevant and valid assessments. For those interested in current approaches to the measurement of CT that are not the focus of this paper, consult Zlatkin-Troitschanskaia et al. (2018) .

In this paper, we have singled out performance assessment because it offers important advantages for measuring CT. Extant tests of CT typically employ response formats such as forced-choice, short-answer, and scenario-based tasks (for an overview, see Liu et al., 2014 ). They all suffer from moderate to severe construct underrepresentation; that is, they fail to capture important facets of the CT construct such as perspective taking and communication. High-fidelity performance tasks are viewed as more authentic in that they provide a problem context and require responses that are more similar to what individuals confront in the real world than what is offered by traditional multiple-choice items ( Messick, 1994 ; Braun, 2019 ). This greater verisimilitude promises higher levels of construct representation and lower levels of construct-irrelevant variance. Such performance tasks have the capacity to measure facets of CT that are imperfectly assessed, if at all, using traditional assessments ( Lane and Stone, 2006 ; Braun, 2019 ; Shavelson et al., 2019 ). However, these assertions must be empirically validated, and the measures should be subjected to psychometric analyses. The reliability, validity, and interpretative challenges of performance assessment (PA) are extensively detailed in Davey et al. (2015).

We adopt the following definition of performance assessment:

A performance assessment (sometimes called a work sample when assessing job performance) … is an activity or set of activities that requires test takers, either individually or in groups, to generate products or performances in response to a complex, most often real-world task. These products and performances provide observable evidence bearing on test takers’ knowledge, skills, and abilities—their competencies—in completing the assessment ( Davey et al., 2015 , p. 10).

A performance assessment typically includes an extended performance task and short constructed-response and selected-response (i.e., multiple-choice) tasks (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). In this paper, we refer to both individual performance- and constructed-response tasks as performance tasks (PTs; for an example, see Table 1 in section “iPAL Assessment Framework”).


Table 1. The iPAL assessment framework.

An Approach to Performance Assessment of Critical Thinking: The iPAL Program

The approach to CT presented here is the result of ongoing work undertaken by the International Performance Assessment of Learning collaborative (iPAL 1 ). iPAL is an international consortium of volunteers, primarily from academia, who have come together to address the dearth in higher education of research and practice in measuring CT with performance tasks ( Shavelson et al., 2018 ). In this section, we present iPAL’s assessment framework as the basis of measuring CT, with examples along the way.

iPAL Background

The iPAL assessment framework builds on the Council for Aid to Education’s Collegiate Learning Assessment (CLA). The CLA was designed to measure cross-disciplinary, generic competencies, such as CT, analytic reasoning, problem solving, and written communication ( Klein et al., 2007 ; Shavelson, 2010 ). Ideally, each PA contained an extended PT (e.g., examining a range of evidential materials related to the crash of an aircraft) and two short PTs: one in which students critique an argument and one in which they propose a solution to a real-world societal issue.

Motivated by considerations of adequate reliability, the CLA was modified in 2012 to create the CLA+. The CLA+ includes two subtests: a PT and a 25-item Selected Response Question (SRQ) section. The PT presents a document or problem statement and an assignment based on that document which elicits an open-ended response. The CLA+ added the SRQ section (which is not linked substantively to the PT scenario) to increase the number of student responses to obtain more reliable estimates of performance at the student-level than could be achieved with a single PT ( Zahner, 2013 ; Davey et al., 2015 ).

iPAL Assessment Framework

Methodological Foundations

The iPAL framework evolved from the Collegiate Learning Assessment developed by Klein et al. (2007) . It was also informed by the results from the AHELO pilot study ( Organisation for Economic Co-operation and Development [OECD], 2012 , 2013 ), as well as the KoKoHs research program in Germany (for an overview see, Zlatkin-Troitschanskaia et al., 2017 , 2020 ). The ongoing refinement of the iPAL framework has been guided in part by the principles of Evidence Centered Design (ECD) ( Mislevy et al., 2003 ; Mislevy and Haertel, 2006 ; Haertel and Fujii, 2017 ).

In educational measurement, an assessment framework plays a critical intermediary role between the theoretical formulation of the construct and the development of the assessment instrument containing tasks (or items) intended to elicit evidence with respect to that construct ( Mislevy et al., 2003 ). Builders of the assessment framework draw on the construct theory and operationalize it in a way that provides explicit guidance to PT’s developers. Thus, the framework should reflect the relevant facets of the construct, where relevance is determined by substantive theory or an appropriate alternative such as behavioral samples from real-world situations of interest (criterion-sampling; McClelland, 1973 ), as well as the intended use(s) (for an example, see Shavelson et al., 2019 ). By following the requirements and guidelines embodied in the framework, instrument developers strengthen the claim of construct validity for the instrument ( Messick, 1994 ).

An assessment framework can be specified at different levels of granularity: an assessment battery (“omnibus” assessment, for an example see below), a single performance task, or a specific component of an assessment ( Shavelson, 2010 ; Davey et al., 2015 ). In the iPAL program, a performance assessment comprises one or more extended performance tasks and additional selected-response and short constructed-response items. The focus of the framework specified below is on a single PT intended to elicit evidence with respect to some facets of CT, such as the evaluation of the trustworthiness of the documents provided and the capacity to address conflicts of principles.

From the ECD perspective, an assessment is an instrument for generating information to support an evidentiary argument and, therefore, the intended inferences (claims) must guide each stage of the design process. The construct of interest is operationalized through the Student Model , which represents the target knowledge, skills, and abilities, as well as the relationships among them. The student model should also make explicit the assumptions regarding student competencies in foundational skills or content knowledge. The Task Model specifies the features of the problems or items posed to the respondent, with the goal of eliciting the evidence desired. The assessment framework also describes the collection of task models comprising the instrument, with considerations of construct validity, various psychometric characteristics (e.g., reliability) and practical constraints (e.g., testing time and cost). The student model provides grounds for evidence of validity, especially cognitive validity; namely, that the students are thinking critically in responding to the task(s).

In the present context, the target construct (CT) is the competence of individuals to think critically, which entails solving complex, real-world problems, and clearly communicating their conclusions or recommendations for action based on trustworthy, relevant and unbiased information. The situations, drawn from actual events, are challenging and may arise in many possible settings. In contrast to more reductionist approaches to assessment development, the iPAL approach and framework rests on the assumption that properly addressing these situational demands requires the application of a constellation of CT skills appropriate to the particular task presented (e.g., Shavelson, 2010 , 2013 ). For a PT, the assessment framework must also specify the rubric by which the responses will be evaluated. The rubric must be properly linked to the target construct so that the resulting score profile constitutes evidence that is both relevant and interpretable in terms of the student model (for an example, see Zlatkin-Troitschanskaia et al., 2019 ).

iPAL Task Framework

The iPAL ‘omnibus’ framework comprises four main aspects: a storyline , a challenge , a document library , and a scoring rubric . Table 1 displays these aspects, brief descriptions of each, and the corresponding examples drawn from an iPAL performance assessment (version adapted from the original in Hyytinen and Toom, 2019). Storylines are drawn from various domains; for example, the worlds of business, public policy, civics, medicine, and family. They often involve moral and/or ethical considerations. Deriving an appropriate storyline from a real-world situation requires careful consideration of which features are to be kept in toto , which adapted for purposes of the assessment, and which discarded. Framing the challenge demands care in wording so that there is minimal ambiguity in what is required of the respondent. The difficulty of the challenge depends, in large part, on the nature and extent of the information provided in the document library , the amount of scaffolding included, as well as the scope of the required response. The amount of information and the scope of the challenge should be commensurate with the amount of time available. As is evident from the table, the characteristics of the documents in the library are intended to elicit responses related to facets of CT. For example, with regard to bias, the information provided is intended to play to judgmental errors due to fast thinking and/or motivated reasoning. Ideally, the situation should accommodate multiple solutions of varying degrees of merit.
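
To make this structure concrete, the sketch below renders the four aspects of the omnibus framework as a small data model. It is illustrative only: the class and field names (e.g., trustworthiness, intended_facets) are hypothetical and are not drawn from the iPAL materials, and the toy content is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical data model for a performance task (PT); names are illustrative, not iPAL's.

@dataclass
class Document:
    title: str
    source_type: str            # e.g., "news article", "blog post", "official statistics"
    trustworthiness: str        # intended evidential quality, e.g., "high", "mixed", "low"
    intended_facets: List[str]  # facets of CT the document is meant to elicit

@dataclass
class RubricDimension:
    name: str                   # e.g., "evaluating information"
    anchors: Dict[int, str]     # scale point -> short behavioral description

@dataclass
class PerformanceTask:
    storyline: str              # curated version of a complex, real-world situation
    challenge: str              # the assignment posed to the respondent
    library: List[Document] = field(default_factory=list)
    rubric: List[RubricDimension] = field(default_factory=list)

# Toy instance (content invented for illustration only)
task = PerformanceTask(
    storyline="A regional council must decide how to house newly arrived refugees.",
    challenge="Write a recommendation to the council and justify it using the documents provided.",
    library=[Document("Op-ed on housing costs", "opinion piece", "mixed",
                      ["evaluating evidence", "recognizing bias"])],
    rubric=[RubricDimension("evaluating information",
                            {1: "accepts sources uncritically",
                             6: "weighs trustworthiness systematically"})],
)
```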

The dimensions of the scoring rubric are derived from the Task Model and Student Model ( Mislevy et al., 2003 ) and signal which features are to be extracted from the response and indicate how they are to be evaluated. There should be a direct link between the evaluation of the evidence and the claims that are made with respect to the key features of the task model and student model . More specifically, the task model specifies the various manipulations embodied in the PA and so informs scoring, while the student model specifies the capacities students employ in more or less effectively responding to the tasks. The score scales for each of the five facets of CT (see section “Concept and Definition of Critical Thinking”) can be specified using appropriate behavioral anchors (for examples, see Zlatkin-Troitschanskaia and Shavelson, 2019 ). Of particular importance is the evaluation of the response with respect to the last dimension of the scoring rubric; namely, the overall coherence and persuasiveness of the argument, building on the explicit or implicit characteristics related to the first five dimensions. The scoring process must be monitored carefully to ensure that (trained) raters are judging each response based on the same types of features and evaluation criteria ( Braun, 2019 ) as indicated by interrater agreement coefficients.
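
Interrater agreement can be monitored with various indices; the snippet below is a minimal sketch of one simple option, unweighted Cohen’s kappa for two raters scoring the same responses. The scores shown are invented, and the iPAL studies cited in the following paragraphs report generalizability coefficients, so this sketch is illustrative rather than a reconstruction of their analysis.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters over the same set of responses."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)

    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement derived from each rater's marginal score distribution
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten responses on a six-point scale
rater1 = [4, 5, 3, 6, 2, 4, 5, 3, 4, 6]
rater2 = [4, 5, 3, 5, 2, 4, 4, 3, 4, 6]
print(round(cohens_kappa(rater1, rater2), 2))  # ~0.74 for these invented scores
```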

The scoring rubric of the iPAL omnibus framework can be modified for specific tasks ( Lane and Stone, 2006 ). This generic rubric helps ensure consistency across rubrics for different storylines. For example, Zlatkin-Troitschanskaia et al. (2019 , p. 473) used the following scoring scheme:

Based on our construct definition of CT and its four dimensions: (D1-Info) recognizing and evaluating information, (D2-Decision) recognizing and evaluating arguments and making decisions, (D3-Conseq) recognizing and evaluating the consequences of decisions, and (D4-Writing), we developed a corresponding analytic dimensional scoring … The students’ performance is evaluated along the four dimensions, which in turn are subdivided into a total of 23 indicators as (sub)categories of CT … For each dimension, we sought detailed evidence in students’ responses for the indicators and scored them on a six-point Likert-type scale. In order to reduce judgment distortions, an elaborate procedure of ‘behaviorally anchored rating scales’ (Smith and Kendall, 1963) was applied by assigning concrete behavioral expectations to certain scale points (Bernardin et al., 1976). To this end, we defined the scale levels by short descriptions of typical behavior and anchored them with concrete examples. … We trained four raters in 1 day using a specially developed training course to evaluate students’ performance along the 23 indicators clustered into four dimensions (for a description of the rater training, see Klotzer, 2018).

Shavelson et al. (2019) examined the interrater agreement of the scoring scheme developed by Zlatkin-Troitschanskaia et al. (2019) and “found that with 23 items and 2 raters the generalizability (“reliability”) coefficient for total scores to be 0.74 (with 4 raters, 0.84)” (Shavelson et al., 2019, p. 15). In the study by Zlatkin-Troitschanskaia et al. (2019, p. 478), three student score profiles (low-, middle-, and high-performers) were identified. Proper interpretation of such profiles requires care. For example, there may be multiple possible explanations for low scores, such as poor CT skills, a lack of a disposition to engage with the challenge, or the two attributes jointly. These alternative explanations for student performance can potentially pose a threat to the evidentiary argument. In this case, auxiliary information may be available to aid in resolving the ambiguity. For example, student responses to selected- and short-constructed-response items in the PA can provide relevant information about the levels of the different skills possessed by the student. When sufficient data are available, the scores can be modeled statistically and/or qualitatively in such a way as to bring them to bear on the technical quality or interpretability of the claims of the assessment: reliability, validity, and utility evidence ( Davey et al., 2015 ; Zlatkin-Troitschanskaia et al., 2019 ). These kinds of concerns are less critical when PT’s are used in classroom settings. The instructor can draw on other sources of evidence, including direct discussion with the student.
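
As a rough plausibility check on the figures quoted above, the classical Spearman-Brown prophecy formula, used here as a simplified single-facet stand-in for the generalizability (D-study) analysis reported in those studies, projects how reliability changes with the number of raters:

```python
def spearman_brown(rel_k, k_old, k_new):
    """Project reliability when the number of raters changes from k_old to k_new.

    Classical Spearman-Brown prophecy; a simplification of the full G-study
    design used in the cited studies, shown for illustration only.
    """
    rel_1 = rel_k / (k_old - (k_old - 1) * rel_k)     # implied single-rater reliability
    return k_new * rel_1 / (1 + (k_new - 1) * rel_1)

# Reported: 0.74 with 2 raters; projecting to 4 raters gives about 0.85,
# close to the reported 0.84 (the exact value depends on the full G-study design).
print(round(spearman_brown(0.74, 2, 4), 2))
```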

Use of iPAL Performance Assessments in Educational Practice: Evidence From Preliminary Validation Studies

The assessment framework described here supports the development of a PT in a general setting. Many modifications are possible and, indeed, desirable. If the PT is to be more deeply embedded in a certain discipline (e.g., economics, law, or medicine), for example, then the framework must specify characteristics of the narrative and the complementary documents as to the breadth and depth of disciplinary knowledge that is represented.

Preliminary field trials employing the omnibus framework (i.e., a full set of documents) indicate that 60 min is generally not enough time for students to engage with the full set of complementary documents and to craft a complete response to the challenge (for an example, see Shavelson et al., 2019 ). Accordingly, it would be helpful to develop modified frameworks for PT’s that require substantially less time. For an example, see the short performance assessments of civic online reasoning, which require response times of 10 to 50 min ( Wineburg et al., 2016 ). Such assessment frameworks could be derived from the omnibus framework by focusing on a reduced number of facets of CT, and specifying the characteristics of the complementary documents to be included – or, perhaps, choices among sets of documents. In principle, one could build a ‘family’ of PT’s, each using the same (or nearly the same) storyline and a subset of the full collection of complementary documents.

Paul and Elder (2007) argue that the goal of CT assessments should be to provide faculty with important information about how well their instruction supports the development of students’ CT. In that spirit, the full family of PT’s could represent all facets of the construct while affording instructors and students more specific insights on strengths and weaknesses with respect to particular facets of CT. Moreover, the framework should be expanded to include the design of a set of short-answer and/or multiple-choice items to accompany the PT. Ideally, these additional items would be based on the same narrative as the PT to collect more nuanced information on students’ precursor skills, such as reading comprehension, while enhancing the overall reliability of the assessment. Areas where students are under-prepared could be addressed before, or even in parallel with, the development of the focal CT skills. The parallel approach follows the co-requisite model of developmental education. In other settings (e.g., for summative assessment), these complementary items would be administered after the PT to augment the evidence in relation to the various claims. The full PT, taking 90 min or more, could serve as a capstone assessment.

As we transition from simply delivering paper-based assessments by computer to taking full advantage of the affordances of a digital platform, we should learn from the hard-won lessons of the past so that we can make swifter progress with fewer missteps. In that regard, we must take validity as the touchstone – assessment design, development and deployment must all be tightly linked to the operational definition of the CT construct. Considerations of reliability and practicality come into play with various use cases that highlight different purposes for the assessment (for future perspectives, see next section).

The iPAL assessment framework represents a feasible compromise between commercial, standardized assessments of CT (e.g., Liu et al., 2014 ), on the one hand, and, on the other, freedom for individual faculty to develop assessment tasks according to idiosyncratic models. It imposes a degree of standardization on both task development and scoring, while still allowing some flexibility for faculty to tailor the assessment to meet their unique needs. In so doing, it addresses a key weakness of the AAC&U’s VALUE initiative 2 (retrieved 5/7/2020) that has achieved wide acceptance among United States colleges.

The VALUE initiative has produced generic scoring rubrics for 15 domains including CT, problem-solving and written communication. A rubric for a particular skill domain (e.g., critical thinking) has five to six dimensions with four ordered performance levels for each dimension (1 = lowest, 4 = highest). The performance levels are accompanied by language that is intended to clearly differentiate among levels. 3 Faculty are asked to submit student work products from a senior level course that is intended to yield evidence with respect to student learning outcomes in a particular domain and that, they believe, can elicit performances at the highest level. The collection of work products is then graded by faculty from other institutions who have been trained to apply the rubrics.

A principal difficulty is that there is neither a common framework to guide the design of the challenge, nor any control on task complexity and difficulty. Consequently, there is substantial heterogeneity in the quality and evidential value of the submitted responses. This also causes difficulties with task scoring and inter-rater reliability. Shavelson et al. (2009) discuss some of the problems arising with non-standardized collections of student work.

In this context, one advantage of the iPAL framework is that it can provide valuable guidance and an explicit structure for faculty in developing performance tasks for both instruction and formative assessment. When faculty design assessments, their focus is typically on content coverage rather than other potentially important characteristics, such as the degree of construct representation and the adequacy of their scoring procedures ( Braun, 2019 ).

Concluding Reflections

Challenges to Interpretation and Implementation

Performance tasks such as those generated by iPAL are attractive instruments for assessing CT skills (e.g., Shavelson, 2010 ; Shavelson et al., 2019 ). The attraction mainly rests on the assumption that elaborated PT’s are more authentic (direct) and more completely capture facets of the target construct (i.e., possess greater construct representation) than the widely used selected-response tests. However, as Messick (1994) noted, authenticity is a “promissory note” that must be redeemed with empirical research. In practice, there are trade-offs among authenticity, construct validity, and psychometric quality such as reliability ( Davey et al., 2015 ).

One reason for Messick’s (1994) caution is that authenticity does not guarantee construct validity. The latter must be established by drawing on multiple sources of evidence ( American Educational Research Association et al., 2014 ). Following the ECD principles in designing and developing the PT, as well as the associated scoring rubrics, constitutes an important type of evidence. Further, as Leighton (2019) argues, response process data (“cognitive validity”) are needed to validate claims regarding the cognitive complexity of PT’s. Relevant data can be obtained through cognitive laboratory studies involving methods such as think-aloud protocols or eye-tracking. Although time-consuming and expensive, such studies can yield not only evidence of validity, but also valuable information to guide refinements of the PT.

Going forward, iPAL PT’s must be subjected to validation studies as recommended in the Standards for Psychological and Educational Testing by American Educational Research Association et al. (2014) . With a particular focus on the criterion “relationships to other variables,” a framework should include assumptions about the theoretically expected relationships among the indicators assessed by the PT, as well as the indicators’ relationships to external variables such as intelligence or prior (task-relevant) knowledge.

Complementing the necessity of evaluating construct validity, there is the need to consider potential sources of construct-irrelevant variance (CIV). One pertains to student motivation, which is typically greater when the stakes are higher. If students are not motivated, then their performance is likely to be impacted by factors unrelated to their (construct-relevant) ability ( Lane and Stone, 2006 ; Braun et al., 2011 ; Shavelson, 2013 ). Differential motivation across groups can also bias comparisons. Student motivation might be enhanced if the PT is administered in the context of a course with the promise of generating useful feedback on students’ skill profiles.

Construct-irrelevant variance can also occur when students are not equally prepared for the format of the PT or do not fully appreciate the response requirements. This source of CIV could be alleviated by providing students with practice PT’s. Finally, the use of novel forms of documentation, such as those from the Internet, can potentially introduce CIV due to differential familiarity with forms of representation or contents. Interestingly, this suggests that there may be a conflict between enhancing construct representation and reducing CIV.

Another potential source of CIV is related to response evaluation. Even with training, human raters can vary in accuracy and usage of the full score range. In addition, raters may attend to features of responses that are unrelated to the target construct, such as the length of the students’ responses or the frequency of grammatical errors ( Lane and Stone, 2006 ). Some of these sources of variance could be addressed in an online environment, where word processing software could alert students to potential grammatical and spelling errors before they submit their final work product.

Performance tasks generally take longer to administer and are more costly than traditional assessments, making it more difficult to reliably measure student performance ( Messick, 1994 ; Davey et al., 2015 ). Indeed, it is well known that more than one performance task is needed to obtain high reliability ( Shavelson, 2013 ). This is due to both student-task interactions and variability in scoring. Sources of student-task interaction include differential familiarity with the topic ( Hyytinen and Toom, 2019 ) and differential motivation to engage with the task. The level of reliability required, however, depends on the context of use. For use in formative assessment as part of an instructional program, reliability can be lower than is required for summative purposes. In the former case, other types of evidence are generally available to support interpretation and guide pedagogical decisions. Further studies are needed to obtain estimates of reliability in typical instructional settings.
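
To illustrate why multiple tasks (and raters) matter, the sketch below computes a generalizability coefficient for a fully crossed person x task x rater design from a set of variance components. The variance components are invented for illustration; they are not estimates from iPAL data.

```python
def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """Relative generalizability coefficient for a crossed p x t x r random design."""
    error = var_pt / n_tasks + var_pr / n_raters + var_ptr_e / (n_tasks * n_raters)
    return var_p / (var_p + error)

# Hypothetical variance components (person, person x task, person x rater, residual)
components = dict(var_p=0.30, var_pt=0.40, var_pr=0.05, var_ptr_e=0.25)

for n_tasks in (1, 2, 3):
    print(n_tasks, round(g_coefficient(**components, n_tasks=n_tasks, n_raters=2), 2))
# With these made-up components, the coefficient rises from about 0.35 with one task
# to about 0.60 with three tasks, mirroring the point that a single PT rarely suffices.
```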

With sufficient data, more sophisticated psychometric analyses become possible. One challenge is that the assumption of unidimensionality required for many psychometric models might be untenable for performance tasks ( Davey et al., 2015 ). Davey et al. (2015) provide the example of a mathematics assessment that requires students to demonstrate not only their mathematics skills but also their written communication skills. Although the iPAL framework does not explicitly address students’ reading comprehension and organization skills, students will likely need to call on these abilities to accomplish the task. Moreover, as the operational definition of CT makes evident, the student must not only deploy several skills in responding to the challenge of the PT, but also carry out component tasks in sequence. The former requirement strongly indicates the need for a multi-dimensional IRT model, while the latter suggests that the usual assumption of local item independence may well be problematic ( Lane and Stone, 2006 ). At the same time, the analytic scoring rubric should facilitate the use of latent class analysis to partition data from large groups into meaningful categories ( Zlatkin-Troitschanskaia et al., 2019 ).
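
For readers less familiar with the modeling options mentioned above, a common starting point is the compensatory multidimensional two-parameter logistic (M2PL) model sketched below. This is a standard formulation from the IRT literature, offered for orientation only; it is not a model fitted in the iPAL studies, and handling local dependence among components of the same PT would require extensions such as testlet or bifactor models.

```latex
% Compensatory multidimensional 2PL (M2PL), shown for illustration only:
%   theta_i = examinee i's vector of latent CT facets
%   a_j     = discrimination (loading) vector of task component j
%   d_j     = intercept of task component j
P(X_{ij} = 1 \mid \boldsymbol{\theta}_i)
  = \frac{1}{1 + \exp\left[ -\left( \mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j \right) \right]}
```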

Future Perspectives

Although the iPAL consortium has made substantial progress in the assessment of CT, much remains to be done. Further refinement of existing PT’s and their adaptation to different languages and cultures must continue. To this point, there are a number of examples: The refugee crisis PT (cited in Table 1 ) was translated and adapted from Finnish to US English and then to Colombian Spanish. A PT concerning kidney transplants was translated and adapted from German to US English. Finally, two PT’s based on ‘legacy admissions’ to US colleges were translated and adapted to Colombian Spanish.

With respect to data collection, there is a need for sufficient data to support psychometric analysis of student responses, especially the relationships among the different components of the scoring rubric, as this would inform both task development and response evaluation ( Zlatkin-Troitschanskaia et al., 2019 ). In addition, more intensive study of response processes through cognitive laboratories and the like are needed to strengthen the evidential argument for construct validity ( Leighton, 2019 ). We are currently conducting empirical studies, collecting data on both iPAL PT’s and other measures of CT. These studies will provide evidence of convergent and discriminant validity.

At the same time, efforts should be directed at further development to support different ways CT PT’s might be used—i.e., use cases—especially those that call for formative use of PT’s. Incorporating formative assessment into courses can plausibly be expected to improve students’ competency acquisition ( Zlatkin-Troitschanskaia et al., 2017 ). With suitable choices of storylines, appropriate combinations of (modified) PT’s, supplemented by short-answer and multiple-choice items, could be interwoven into ordinary classroom activities. The supplementary items may be completely separate from the PT’s (as is the case with the CLA+), loosely coupled with the PT’s (as in drawing on the same storyline), or tightly linked to the PT’s (as in requiring elaboration of certain components of the response to the PT).

As an alternative to such integration, stand-alone modules could be embedded in courses to yield evidence of students’ generic CT skills. Core curriculum courses or general education courses offer ideal settings for embedding performance assessments. If these assessments were administered to a representative sample of students in each cohort over their years in college, the results would yield important information on the development of CT skills at a population level. For another example, these PA’s could be used to assess the competence profiles of students entering Bachelor’s or graduate-level programs as a basis for more targeted instructional support.

Thus, in considering different use cases for the assessment of CT, it is evident that several modifications of the iPAL omnibus assessment framework are needed. As noted earlier, assessments built according to this framework are demanding with respect to the extensive preliminary work required by a task and the time required to properly complete it. Thus, it would be helpful to have modified versions of the framework, focusing on one or two facets of the CT construct and calling for a smaller number of supplementary documents. The challenge to the student should be suitably reduced.

Some members of the iPAL collaborative have developed PT’s that are embedded in disciplines such as engineering, law and education ( Crump et al., 2019 ; for teacher education examples, see Jeschke et al., 2019 ). These are proving to be of great interest to various stakeholders and further development is likely. Consequently, it is essential that an appropriate assessment framework be established and implemented. It is both a conceptual and an empirical question as to whether a single framework can guide development in different domains.

Performance Assessment in Online Learning Environment

Over the last 15 years, increasing amounts of time in both college and work have been spent using computers and other electronic devices. This has led to the formulation of models for the new literacies that attempt to capture some key characteristics of these activities. A prominent example is a model proposed by Leu et al. (2020) . The model frames online reading as a process of problem-based inquiry that calls on five practices during online research and comprehension:

1. Reading to identify important questions,

2. Reading to locate information,

3. Reading to critically evaluate information,

4. Reading to synthesize online information, and

5. Reading and writing to communicate online information.

The parallels with the iPAL definition of CT are evident and suggest there may be benefits to closer links between these two lines of research. For example, a report by Leu et al. (2014) describes empirical studies comparing assessments of online reading using either open-ended or multiple-choice response formats.

The iPAL consortium has begun to take advantage of the affordances of the online environment (for examples, see Schmidt et al. and Nagel et al. in this special issue). Most obviously, supplementary materials can now include archival photographs, audio recordings, or videos. Additional tasks might include the online search for relevant documents, though this would add considerably to the time demands. This online search could occur within a simulated Internet environment, as is the case for the IEA’s ePIRLS assessment ( Mullis et al., 2017 ).

The prospect of having access to a wealth of materials that can add to task authenticity is exciting. Yet it can also add ambiguity and information overload. Increased authenticity, then, should be weighed against validity concerns and the time required to absorb the content in these materials. Modifications of the design framework and extensive empirical testing will be required to decide on appropriate trade-offs. A related possibility is to employ some of these materials in short-answer (or even selected-response) items that supplement the main PT. Response formats could include highlighting text or using a drag-and-drop menu to construct a response. Students’ responses could be automatically scored, thereby containing costs. With automated scoring, feedback to students and faculty, including suggestions for next steps in strengthening CT skills, could also be provided without adding to faculty workload. Therefore, taking advantage of the online environment to incorporate new types of supplementary documents, and perhaps to introduce new response formats as well, should be a high priority. Finally, further investigation of the overlap between this formulation of CT and the characterization of online reading promulgated by Leu et al. (2020) is a promising direction to pursue.

Data Availability Statement

All datasets generated for this study are included in the article/supplementary material.

Author Contributions

HB wrote the article. RS, OZ-T, and KB were involved in the preparation and revision of the article and co-wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This study was funded in part by the Spencer Foundation (Grant No. 201700123).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We would like to thank all the researchers who have participated in the iPAL program.

  • ^ https://www.ipal-rd.com/
  • ^ https://www.aacu.org/value
  • ^ When test results are reported by means of substantively defined categories, the scoring is termed “criterion-referenced”. This is in contrast to results reported as percentiles; such scoring is termed “norm-referenced”.

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. Washington, D.C: American Educational Research Association.


Arum, R., and Roksa, J. (2011). Academically Adrift: Limited Learning on College Campuses. Chicago, IL: University of Chicago Press.

Association of American Colleges and Universities (n.d.). VALUE: What is Value? Available online at: https://www.aacu.org/value (accessed May 7, 2020).

Association of American Colleges and Universities [AACU] (2018). Fulfilling the American Dream: Liberal Education and the Future of Work. Available online at: https://www.aacu.org/research/2018-future-of-work (accessed May 1, 2020).

Braun, H. (2019). Performance assessment and standardization in higher education: a problematic conjunction? Br. J. Educ. Psychol. 89, 429–440. doi: 10.1111/bjep.12274


Braun, H. I., Kirsch, I., and Yamamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th grade NAEP reading assessment. Teach. Coll. Rec. 113, 2309–2344.

Crump, N., Sepulveda, C., Fajardo, A., and Aguilera, A. (2019). Systematization of performance tests in critical thinking: an interdisciplinary construction experience. Rev. Estud. Educ. 2, 17–47.

Davey, T., Ferrara, S., Shavelson, R., Holland, P., Webb, N., and Wise, L. (2015). Psychometric Considerations for the Next Generation of Performance Assessment. Washington, DC: Center for K-12 Assessment & Performance Management, Educational Testing Service.

Erwin, T. D., and Sebrell, K. W. (2003). Assessment of critical thinking: ETS’s tasks in critical thinking. J. Gen. Educ. 52, 50–70. doi: 10.1353/jge.2003.0019


Haertel, G. D., and Fujii, R. (2017). “Evidence-centered design and postsecondary assessment,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 313–339. doi: 10.4324/9781315709307-26

Hyytinen, H., and Toom, A. (2019). Developing a performance assessment task in the Finnish higher education context: conceptual and empirical insights. Br. J. Educ. Psychol. 89, 551–563. doi: 10.1111/bjep.12283

Hyytinen, H., Toom, A., and Shavelson, R. J. (2019). “Enhancing scientific thinking through the development of critical thinking in higher education,” in Redefining Scientific Thinking for Higher Education: Higher-Order Thinking, Evidence-Based Reasoning and Research Skills , eds M. Murtonen and K. Balloo (London: Palgrave MacMillan).

Indiana University (2019). FSSE 2019 Frequencies: FSSE 2019 Aggregate. Available online at: http://fsse.indiana.edu/pdf/FSSE_IR_2019/summary_tables/FSSE19_Frequencies_(FSSE_2019).pdf (accessed May 1, 2020).

Jeschke, C., Kuhn, C., Lindmeier, A., Zlatkin-Troitschanskaia, O., Saas, H., and Heinze, A. (2019). Performance assessment to investigate the domain specificity of instructional skills among pre-service and in-service teachers of mathematics and economics. Br. J. Educ. Psychol. 89, 538–550. doi: 10.1111/bjep.12277

Kegan, R. (1994). In Over Our Heads: The Mental Demands of Modern Life. Cambridge, MA: Harvard University Press.

Klein, S., Benjamin, R., Shavelson, R., and Bolus, R. (2007). The collegiate learning assessment: facts and fantasies. Eval. Rev. 31, 415–439. doi: 10.1177/0193841x07303318

Kosslyn, S. M., and Nelson, B. (2017). Building the Intentional University: Minerva and the Future of Higher Education. Cambridge, MA: The MIT Press.

Lane, S., and Stone, C. A. (2006). “Performance assessment,” in Educational Measurement , 4th Edn, ed. R. L. Brennan (Lanham, MD: Rowman & Littlefield Publishers), 387–432.

Leighton, J. P. (2019). The risk–return trade-off: performance assessments and cognitive validation of inferences. Br. J. Educ. Psychol. 89, 441–455. doi: 10.1111/bjep.12271

Leu, D. J., Kiili, C., Forzani, E., Zawilinski, L., McVerry, J. G., and O’Byrne, W. I. (2020). “The new literacies of online research and comprehension,” in The Concise Encyclopedia of Applied Linguistics , ed. C. A. Chapelle (Oxford: Wiley-Blackwell), 844–852.

Leu, D. J., Kulikowich, J. M., Kennedy, C., and Maykel, C. (2014). “The ORCA Project: designing technology-based assessments for online research,” in Paper Presented at the American Educational Research Annual Meeting , Philadelphia, PA.

Liu, O. L., Frankel, L., and Roohr, K. C. (2014). Assessing critical thinking in higher education: current state and directions for next-generation assessments. ETS Res. Rep. Ser. 1, 1–23. doi: 10.1002/ets2.12009

McClelland, D. C. (1973). Testing for competence rather than for “intelligence.”. Am. Psychol. 28, 1–14. doi: 10.1037/h0034092

McGrew, S., Ortega, T., Breakstone, J., and Wineburg, S. (2017). The challenge that’s bigger than fake news: civic reasoning in a social media environment. Am. Educ. 4, 4-9, 39.

Mejía, A., Mariño, J. P., and Molina, A. (2019). Incorporating perspective analysis into critical thinking performance assessments. Br. J. Educ. Psychol. 89, 456–467. doi: 10.1111/bjep.12297

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educ. Res. 23, 13–23. doi: 10.3102/0013189x023002013

Mislevy, R. J., Almond, R. G., and Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Res. Rep. Ser. 2003, i–29. doi: 10.1002/j.2333-8504.2003.tb01908.x

Mislevy, R. J., and Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educ. Meas. Issues Pract. 25, 6–20. doi: 10.1111/j.1745-3992.2006.00075.x

Mullis, I. V. S., Martin, M. O., Foy, P., and Hooper, M. (2017). ePIRLS 2016 International Results in Online Informational Reading. Available online at: http://timssandpirls.bc.edu/pirls2016/international-results/ (accessed May 1, 2020).

Nagel, M.-T., Zlatkin-Troitschanskaia, O., Schmidt, S., and Beck, K. (2020). “Performance assessment of generic and domain-specific skills in higher education economics,” in Student Learning in German Higher Education , eds O. Zlatkin-Troitschanskaia, H. A. Pant, M. Toepper, and C. Lautenbach (Berlin: Springer), 281–299. doi: 10.1007/978-3-658-27886-1_14

Organisation for Economic Co-operation and Development [OECD] (2012). AHELO Feasibility Study Report, Vol. 1: Design and Implementation. Paris: OECD.

Organisation for Economic Co-operation and Development [OECD] (2013). AHELO Feasibility Study Report, Vol. 2: Data Analysis and National Experiences. Paris: OECD.

Oser, F. K., and Biedermann, H. (2020). “A three-level model for critical thinking: critical alertness, critical reflection, and critical analysis,” in Frontiers and Advances in Positive Learning in the Age of Information (PLATO) , ed. O. Zlatkin-Troitschanskaia (Cham: Springer), 89–106. doi: 10.1007/978-3-030-26578-6_7

Paul, R., and Elder, L. (2007). Consequential validity: using assessment to drive instruction. Found. Crit. Think. 29, 31–40.

Pellegrino, J. W., and Hilton, M. L. (eds) (2012). Education for life and work: Developing Transferable Knowledge and Skills in the 21st Century. Washington DC: National Academies Press.

Shavelson, R. (2010). Measuring College Learning Responsibly: Accountability in a New Era. Redwood City, CA: Stanford University Press.

Shavelson, R. J. (2013). On an approach to testing and modeling competence. Educ. Psychol. 48, 73–86. doi: 10.1080/00461520.2013.779483

Shavelson, R. J., Zlatkin-Troitschanskaia, O., Beck, K., Schmidt, S., and Marino, J. P. (2019). Assessment of university students’ critical thinking: next generation performance assessment. Int. J. Test. 19, 337–362. doi: 10.1080/15305058.2018.1543309

Shavelson, R. J., Zlatkin-Troitschanskaia, O., and Marino, J. P. (2018). “International performance assessment of learning in higher education (iPAL): research and development,” in Assessment of Learning Outcomes in Higher Education: Cross-National Comparisons and Perspectives , eds O. Zlatkin-Troitschanskaia, M. Toepper, H. A. Pant, C. Lautenbach, and C. Kuhn (Berlin: Springer), 193–214. doi: 10.1007/978-3-319-74338-7_10

Shavelson, R. J., Klein, S., and Benjamin, R. (2009). The limitations of portfolios. Inside Higher Educ. Available online at: https://www.insidehighered.com/views/2009/10/16/limitations-portfolios

Stolzenberg, E. B., Eagan, M. K., Zimmerman, H. B., Berdan Lozano, J., Cesar-Davis, N. M., Aragon, M. C., et al. (2019). Undergraduate Teaching Faculty: The HERI Faculty Survey 2016–2017. Los Angeles, CA: UCLA.

Tessier-Lavigne, M. (2020). Putting Ethics at the Heart of Innovation. Stanford, CA: Stanford Magazine.

Wheeler, P., and Haertel, G. D. (1993). Resource Handbook on Performance Assessment and Measurement: A Tool for Students, Practitioners, and Policymakers. Palm Coast, FL: Owl Press.

Wineburg, S., McGrew, S., Breakstone, J., and Ortega, T. (2016). Evaluating Information: The Cornerstone of Civic Online Reasoning. Executive Summary. Stanford, CA: Stanford History Education Group.

Zahner, D. (2013). Reliability and Validity – CLA+. Council for Aid to Education. Available online at: https://pdfs.semanticscholar.org/91ae/8edfac44bce3bed37d8c9091da01d6db3776.pdf

Zlatkin-Troitschanskaia, O., and Shavelson, R. J. (2019). Performance assessment of student learning in higher education [Special issue]. Br. J. Educ. Psychol. 89, i–iv, 413–563.

Zlatkin-Troitschanskaia, O., Pant, H. A., Lautenbach, C., Molerov, D., Toepper, M., and Brückner, S. (2017). Modeling and Measuring Competencies in Higher Education: Approaches to Challenges in Higher Education Policy and Practice. Berlin: Springer VS.

Zlatkin-Troitschanskaia, O., Pant, H. A., Toepper, M., and Lautenbach, C. (eds) (2020). Student Learning in German Higher Education: Innovative Measurement Approaches and Research Results. Wiesbaden: Springer.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., and Pant, H. A. (2018). “Assessment of learning outcomes in higher education: international comparisons and perspectives,” in Handbook on Measurement, Assessment, and Evaluation in Higher Education , 2nd Edn, eds C. Secolsky and D. B. Denison (Abingdon: Routledge), 686–697.

Zlatkin-Troitschanskaia, O., Shavelson, R. J., Schmidt, S., and Beck, K. (2019). On the complementarity of holistic and analytic approaches to performance assessment scoring. Br. J. Educ. Psychol. 89, 468–484. doi: 10.1111/bjep.12286

Keywords : critical thinking, performance assessment, assessment framework, scoring rubric, evidence-centered design, 21st century skills, higher education

Citation: Braun HI, Shavelson RJ, Zlatkin-Troitschanskaia O and Borowiec K (2020) Performance Assessment of Critical Thinking: Conceptualization, Design, and Implementation. Front. Educ. 5:156. doi: 10.3389/feduc.2020.00156

Received: 30 May 2020; Accepted: 04 August 2020; Published: 08 September 2020.

Copyright © 2020 Braun, Shavelson, Zlatkin-Troitschanskaia and Borowiec. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Henry I. Braun, [email protected]

This article is part of the Research Topic: Assessing Information Processing and Online Reasoning as a Prerequisite for Learning in Higher Education.


Predicting Everyday Critical Thinking: A Review of Critical Thinking Assessments


1. Introduction

2. How Critical Thinking Impacts Everyday Life

3. Critical Thinking: Skills and Dispositions

“the use of those cognitive skills and abilities that increase the probability of a desirable outcome. It is used to describe thinking that is purposeful, reasoned, and goal directed—the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions” ( Halpern 2014, p. 8 ).

4. Measuring Critical Thinking

4.1. Practical Challenges

4.2. Critical Thinking Assessments

4.2.1. California Critical Thinking Dispositions Inventory (CCTDI; Insight Assessment, Inc. n.d.)

4.2.2. California Critical Thinking Skills Test (CCTST; Insight Assessment, Inc. n.d.)

4.2.3. Cornell Critical Thinking Test (CCTT; The Critical Thinking Company n.d.)

4.2.4. California Measure of Mental Motivation (CM3; Insight Assessment, Inc. n.d.)

4.2.5. Ennis–Weir Critical Thinking Essay Test (Ennis and Weir 2005)

4.2.6. Halpern Critical Thinking Assessment (HCTA; Halpern 2012)

4.2.7. Test of Everyday Reasoning (TER; Insight Assessment, Inc. n.d.)

4.2.8. Watson–Glaser™ II Critical Thinking Appraisal (W-GII; NCS Pearson, Inc. 2009)

“Virtual employees, or employees who work from home via a computer, are an increasing trend. In the US, the number of virtual employees has increased by 39% in the last two years and 74% in the last five years. Employing virtual workers reduces costs and makes it possible to use talented workers no matter where they are located globally. Yet, running a workplace with virtual employees might entail miscommunication and less camaraderie and can be more time-consuming than face-to-face interaction”.

5. Conclusions

Institutional Review Board Statement, Informed Consent Statement, Data Availability Statement, Conflicts of Interest

  • Ali, Marium, and AJLabs. 2023. How Many Years Does a Typical User Spend on Social Media? Doha: Al Jazeera. Available online: https://www.aljazeera.com/news/2023/6/30/how-many-years-does-a-typical-user-spend-on-social-media (accessed on 13 November 2023).
  • Arendasy, Martin, Lutz Hornke, Markus Sommer, Michaela Wagner-Menghin, Georg Gittler, Joachim Häusler, Bettina Bognar, and M. Wenzl. 2012. Intelligenz-Struktur-Batterie (Intelligence Structure Battery; INSBAT). Mödling: Schuhfried GmbH.
  • Arum, Richard, and Josipa Roksa. 2010. Academically Adrift. Chicago: The University of Chicago Press.
  • Bakshy, Eytan, Solomon Messing, and Lada Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348: 1130–32.
  • Bart, William. 2010. The Measurement and Teaching of Critical Thinking Skills. Tokyo: Invited colloquium given at the Center for Research on Education Testing.
  • Bruine de Bruin, Wandi, Andrew Parker, and Baruch Fischhoff. 2007. Individual differences in adult decision-making competence. Journal of Personality and Social Psychology 92: 938–56.
  • Butler, Heather. 2012. Halpern Critical Thinking Assessment predicts real-world outcomes of critical thinking. Applied Cognitive Psychology 26: 721–29.
  • Butler, Heather, and Diane Halpern. 2020. Critical Thinking Impacts Our Everyday Lives. In Critical Thinking in Psychology, 2nd ed. Edited by Robert Sternberg and Diane Halpern. Cambridge, UK: Cambridge University Press.
  • Butler, Heather, Chris Dwyer, Michael Hogan, Amanda Franco, Silvia Rivas, Carlos Saiz, and Leandro Almeida. 2012. Halpern Critical Thinking Assessment and real-world outcomes: Cross-national applications. Thinking Skills and Creativity 7: 112–21.
  • Butler, Heather, Chris Pentoney, and Mabelle Bong. 2017. Critical thinking ability is a better predictor of life decisions than intelligence. Thinking Skills and Creativity 24: 38–46.
  • Ennis, Robert. 2005. The Ennis-Weir Critical Thinking Essay Test . Urbana: The Illinois Critical Thinking Project. Available online: http://faculty.ed.uiuc.edu/rhennis/supplewmanual1105.htm (accessed on 22 October 2023).
  • Ennis, Robert, and Eric Weir. 2005. Ennis-Weir Critical Thinking Essay Test . Seaside: The Critical Thinking Company. Available online: https://www.academia.edu/1847582/The_Ennis_Weir_Critical_Thinking_Essay_Test_An_Instrument_for_Teaching_and_Testing (accessed on 22 October 2023).
  • Facione, Peter. 1990. California Critical Thinking Dispositions Inventory. Millbrae: The California Academic Press.
  • Facione, Peter, Noreen Facione, and Kathryn Winterhalter. 2012. The Test of Everyday Reasoning—(TER): Test Manual. Millbrae: California Academic Press.
  • Forsyth, Carol, Philip Pavlik, Arthur C. Graesser, Zhiqiang Cai, Mae-lynn Germany, Keith Millis, Robert P. Dolan, Heather Butler, and Diane Halpern. 2012. Learning gains for core concepts in a serious game on scientific reasoning. In Proceedings of the 5th International Conference on Educational Data Mining. Edited by Kalina Yacef, Osmar Zaïane, Arnon Hershkovitz, Michael Yudelson and John Stamper. Chania: International Educational Data Mining Society, pp. 172–75.
  • French, Brian, Brian Hand, William Therrien, and Juan Valdivia Vazquez. 2012. Detection of sex differential item functioning in the Cornell Critical Thinking Test. European Journal of Psychological Assessment 28: 201–7.
  • Frenkel, Sheera, and Mike Isaac. 2018. Facebook ‘Better Prepared’ to Fight Election Interference, Mark Zuckerberg Says . Manhattan: New York Times. Available online: https://www.nytimes.com/2018/09/13/technology/facebook-elections-mark-zuckerberg.html (accessed on 22 October 2023).
  • Gheorghia, Olimpiu. 2018. Romania’s Measles Outbreak Kills Dozens of Children: Some Doctors Complain They Don’t Have Sufficient Stock of Vaccines . New York: Associated Press. Available online: https://www.nbcnews.com/health/health-news/romania-s-measles-outbreak-kills-dozens-children-n882771 (accessed on 13 November 2023).
  • Giancarlo, Carol, Stephen Bloom, and Tim Urdan. 2004. Assessing secondary students’ disposition toward critical thinking: Development of the California Measure of Mental Motivation. Educational and Psychological Measurement 64: 347–64.
  • Halpern, Diane. 1998. Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist 53: 449–55.
  • Halpern, Diane. 2012. Halpern Critical Thinking Assessment . Mödling: Schuhfried (Vienna Test System). Available online: http://www.schuhfried.com/vienna-test-system-vts/all-tests-from-a-z/test/hcta-halpern-critical-thinking-assessment-1/ (accessed on 13 January 2013).
  • Halpern, Diane. 2014. Thought and Knowledge: An Introduction to Critical Thinking, 5th ed. New York: Routledge Publishers.
  • Halpern, Diane, Keith Millis, Arthur Graesser, Heather Butler, Carol Forsyth, and Zhiqiang Cai. 2012. Operation ARIES!: A computerized learning game that teaches critical thinking and scientific reasoning. Thinking Skills and Creativity 7: 93–100.
  • Huber, Christopher, and Nathan Kuncel. 2015. Does college teach critical thinking? A meta-analysis. Review of Educational Research 86: 431–68.
  • Insight Assessment, Inc. n.d. Critical Thinking Attribute Tests: Manuals and Assessment Information . Hermosa Beach: Insight Assessment. Available online: http://www.insightassessment.com (accessed on 22 October 2023).
  • Jain, Anjali, Jaclyn Marshall, Ami Buikema, Tim Bancroft, Jonathan Kelly, and Craig Newschaffer. 2015. Autism occurrence by MMR vaccine status among US children with older siblings with and without autism. Journal of the American Medical Association 313: 1534–40.
  • Klee, Miles, and Nikki McCann Ramirez. 2023. AI Has Made the Israel-Hamas Misinformation Epidemic Much, Much Worse . New York: Rollingstone. Available online: https://www.rollingstone.com/politics/politics-features/israel-hamas-misinformation-fueled-ai-images-1234863586/amp/?fbclid=PAAabKD4u1FRqCp-y9z3VRA4PZZdX52DTQEn8ruvHeGsBrNguD_F2EiMrs3A4_aem_AaxFU9ovwsrXAo39I00d-8NmcpRTVBCsUd_erAUwlAjw16x1shqeC6s22OCpSSx2H-w (accessed on 27 October 2023).
  • Klepper, David. 2022. Poll: Most in US Say Misinformation Spurs Extremism, Hate . New York: Associated Press-NORC Center for Public Affairs Research. Available online: https://apnorc.org/poll-most-in-us-say-misinformation-spurs-extremism-hate/ (accessed on 27 October 2023).
  • Landis, Richard, and William Michael. 1981. The factorial validity of three measures of critical thinking within the context of Guilford’s Structure-of-Intellect Model for a sample of ninth grade students. Educational and Psychological Measurement 41: 1147–66.
  • Liedke, Jacob, and Luxuan Wang. 2023. Social Media and News Fact Sheet . Washington, DC: Pew Research Center. Available online: https://www.pewresearch.org/journalism/fact-sheet/social-media-and-news-fact-sheet/ (accessed on 15 November 2023).
  • Lilienfeld, Scott, Rachel Ammirati, and Kristin Landfield. 2009. Giving debiasing away: Can psychological research on correcting cognitive errors promote human welfare? Perspective on Psychological Science 4: 390–98.
  • Michael, Joan, Roberta Devaney, and William Michael. 1980. The factorial validity of the Cornell Critical Thinking Test for a junior high school sample. Educational and Psychological Measurement 40: 437–50.
  • National Center for Health Statistics. 2015. Health, United States, 2015, with Special Feature on Racial and Ethnic Health Disparities ; Washington, DC: U.S. Government Printing Office.
  • NCS Pearson, Inc. 2009. Watson-Glaser II Critical Thinking Appraisal: Technical Manual and User’s Guide . London: Pearson. Available online: http://www.talentlens.com/en/downloads/supportmaterials/WGII_Technical_Manual.pdf (accessed on 22 October 2023).
  • Stanovich, Keith, and Richard West. 2008. On the failure of cognitive ability to predict myside and one-sided thinking biases. Thinking & Reasoning 14: 129–67.
  • The Critical Thinking Company. n.d. Critical Thinking Company. Available online: www.criticalthinking.com (accessed on 13 October 2023).
  • Tsipursky, Gleb. 2018. (Dis)trust in Science: Can We Cure the Scourge of Misinformation? New York: Scientific American. Available online: https://blogs.scientificamerican.com/observations/dis-trust-in-science/ (accessed on 11 April 2022).
  • Walsh, Catherina, Lisa Seldomridge, and Karen Badros. 2007. California Critical Thinking Disposition Inventory: Further factor analytic examination. Perceptual and Motor Skills 104: 141–51.
  • World Health Organization. 2018. Europe Observes a 4-Fold Increase in Measles Cases in 2017 Compared to Previous Year . Geneva: World Health Organization. Available online: http://www.euro.who.int/en/media-centre/sections/press-releases/2018/europe-observes-a-4-fold-increase-in-measles-cases-in-2017-compared-to-previous-year (accessed on 22 October 2023).
Comparison of the eight critical thinking assessments reviewed above:

| | CCTDI | CCTST | CCTT | CM3 | E-W | HCTA | TER | W-GII |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Construct | Disposition | Skills | Skills | Disposition | Skills | Skills | Skills | Skills |
| Respondent age | 18+ | 18+ | 10+ | 5+ | 12+ | 18+ | Late childhood to adulthood | 18+ |
| Format(s) | Digital and paper | Digital | Paper | Digital and paper | Paper | Digital | Digital and paper | Digital |
| Length | 75 items | 40 items | 52–76 items | 25 items | 1 problem | 20–40 items | 35 items | 40 items |
| Administration time | 30 min | 55 min | 50 min | 20 min | 40 min | 20–45 min | 45 min | 30 min |
| Response format | Multiple-choice | Multiple-choice | Multiple-choice | Multiple-choice | Essay | Multiple-choice and short-answer | Dichotomous choice | Multiple-choice |
| Fee | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Evidence—reliability | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Evidence—validity | No | Yes | No | Yes | Yes | Yes | None available | Yes |
| Credential required for administration | Yes | No | No | No | No | No | Developer scores | No |

Citation: Butler, H.A. Predicting Everyday Critical Thinking: A Review of Critical Thinking Assessments. J. Intell. 2024, 12, 16. https://doi.org/10.3390/jintelligence12020016

Critical Thinking Is About Asking Better Questions

  • John Coleman


Six practices to sharpen your inquiry.

Critical thinking is the ability to analyze and effectively break down an issue in order to make a decision or find a solution. At the heart of critical thinking is the ability to formulate deep, different, and effective questions. For effective questioning, start by holding your hypotheses loosely. Be willing to fundamentally reconsider your initial conclusions — and do so without defensiveness. Second, listen more than you talk, practicing active listening. Third, leave your queries open-ended, and avoid yes-or-no questions. Fourth, consider the counterintuitive to avoid falling into groupthink. Fifth, take the time to stew in a problem, rather than making decisions unnecessarily quickly. Last, ask thoughtful, even difficult, follow-ups.

Are you tackling a new and difficult problem at work? Recently promoted and trying to both understand your new role and bring a fresh perspective? Or are you new to the workforce and seeking ways to meaningfully contribute alongside your more experienced colleagues? If so, critical thinking — the ability to analyze and effectively break down an issue in order to make a decision or find a solution — will be core to your success. And at the heart of critical thinking is the ability to formulate deep, different, and effective questions.


  • John Coleman is the author of the HBR Guide to Crafting Your Purpose. Subscribe to his free newsletter, On Purpose, follow him on Twitter @johnwcoleman, or contact him at johnwilliamcoleman.com.


  • DOI: 10.1002/ACE.7502

Assessing Critical Thinking

  • S. Brookfield
  • Published 1 September 1997
  • Education, Psychology
  • New Directions for Adult and Continuing Education


A critical issue: assessing the critical thinking skills and dispositions of undergraduate health science students

  • Open access
  • Published: 15 August 2023
  • Volume 2, article number 21 (2023)


  • Anthony Dissen, ORCID: orcid.org/0000-0003-0828-387X


Critical thinking skills and dispositions are important factors that aid in one’s ability to understand and solve complex problems. Within the field of higher education, critical thinking skills and dispositions are valued and encouraged but are not always fully developed at the completion of the undergraduate degree experience. Students who are studying to enter the healthcare field are particularly in need of strong critical thinking skills and dispositions in order to provide patients and communities with effective, evidence-based care in an increasingly complex environment. What program of study a student selects, and the unique curriculum design of that program, may impact the development of these skills and dispositions during undergraduate study. This quantitative study sought to explore and understand the critical thinking skills and dispositions of undergraduate students enrolled in a BS in Health Science (BSHS) degree program, and how these skills compared to those of the national student population. During the Spring 2022 academic semester, 140 senior-level BSHS students were given the opportunity to complete the California Critical Thinking Skills Test and the California Critical Thinking Disposition Inventory. Results show weaker development of critical thinking skills compared to the national student population, with Numeracy being the most poorly developed skill and Truth-Seeking the most inconsistently held disposition among participants. Implications of these findings, particularly for faculty who teach students planning to enter the healthcare field, are discussed, including recommendations for curricular design and modification.


1 Introduction

Critical thinking skills and dispositions allow students to gather, interpret, and reflect upon how new information and data can be applied to address personal and professional needs and situations [ 1 ]. While there is no one singular definition, critical thinking is often described as an active, attentive, and purposeful method by which one analyzes facts and information to form a judgment or accomplish a specific goal [ 2 ]. This is an important set of skills and attitudes for students in the health sciences to possess, as critical thinking allows one to be comfortable with the possibilities of new perspectives and ideas, which is crucial for healthcare practice. Additionally, critical thinking is necessary for the development of current and future clinical reasoning skills [ 3 ]. This is partly due to the need for students to learn to be appropriately skeptical when reviewing treatment techniques, best practice guidelines, and new research that may impact their means of practice and care delivery [ 4 ]. To be able to work effectively and rationally in the healthcare and medical fields, critical thinking skills and dispositions must be properly developed and supported in educational settings [ 5 ].

The Carnegie Foundation for Teaching and Learning [ 6 ] has proposed four major categories of recommendations for the reform of medical and health science education: teaching and learning to promote integration, promoting habits of inquiry and improvement, individualizing learning using standardized assessments, and supporting the progressive development of professional identity. These recommendations parallel the subsequent evolution of education and teaching theories over the past century [ 7 ], namely the dynamic nature of the learning and teaching processes and the importance of the teaching environment. Both undergraduate and graduate-level health science educational programs are recognizing that these reforms are needed in order to meet the current and future demands being placed upon healthcare professionals, and that the environment in which learning takes place is as important as the content being shared. Much of the emphasis behind these proposed reforms centers on the need for future healthcare professionals to not only know the didactic and intellectual aspects of their work, but also to be able to solve complex problems and to think critically about their work and their identities as healthcare workers.

As such, critical thinking is a fundamental aspect of quality clinical decision-making across a variety of healthcare professions. The ability to think rationally and clearly, especially when encountering problems and uncertainty at work, is necessary to be effective in the kinds of environments and situations that are common in the healthcare and medical fields [ 5 ]. Undergraduate health-focused students who have critical thinking education embedded into their curriculum have shown improvements in their problem-solving skills [ 8 ], which may be particularly important for promoting patient safety. Health education programs that teach critical thinking have been found to help reduce diagnostic errors, improve overall patient safety, and reduce those cognitive biases that can lead to poorer patient outcomes and professional practice [ 9 ]. This need for critical thinking is not just present in professional practice, but during pre-professional educational experiences as well, where the ability to enhance the capacity for problem-solving and wider reasoning is necessary to perform well academically [ 10 ]. This is especially important considering the significant pressure placed upon students at the undergraduate level to perform well academically in order to secure spots in clinical and graduate programs after completing their baccalaureate degrees.

The consequences of not possessing critical thinking skills in healthcare and medicine can be significant. Healthcare professionals who do not possess a capacity for critical thinking and problem-solving have a measurable negative impact on the health of their patients and communities, specifically poor rates of compliance with health recommendations and treatments, as well as direct harm to the health and wellbeing of those being served [ 11 ]. Given the importance of having a healthcare workforce that can practice critical thinking as part of their professional work, it is necessary to better understand how critical thinking skills and attitudes can be instilled within healthcare professionals, both during their pre-professional education and throughout their professional careers.

By understanding the current level of critical thinking skills and attitudes of health science students before they enter their professional fields of practice, it can be possible to identify those areas of strength, those areas of weakness, and how to make changes as needed within health science education programs to better prepare students for a professional field that demands strong critical thinking skills, attitudes, and applications. In addition, by understanding how skills, attitudes, and overall academic performance relate to one another, health science education programs can be more purposeful in how they advise students, develop curriculum, and track student progress throughout their academic journey.

This study sought to answer the following research questions:

RQ1: What are the critical thinking skills of undergraduate health science students at a four-year, public, comprehensive state university?

RQ2: How do the overall critical thinking skills of undergraduate health science students at a four-year, public, comprehensive state university compare to the national population of undergraduate students?

RQ3: What are the dispositions towards the importance of critical thinking of undergraduate health science students at a four-year, public, comprehensive state university?

RQ4: To what degree is overall academic performance as measured by grade point average (GPA) a reasonable indicator of critical thinking development?

2 Study methods

The theoretical framework for this study was heavily influenced by the work of Dr. Peter Facione, whose seminal work in the field of critical thinking assessment is utilized by educators, employers, and policymakers who recognize the need for students and alumni of institutions of higher education to be able to properly demonstrate these skills and dispositions as a result of their time in higher education [ 12 ]. An additional component to the assessment work developed by Facione is the need for not only developing critical thinking skillsets, but also the development of those dispositions and attitudes, what Facione and colleagues call the critical spirit, that are needed in order to possess the internal drive and motivation to apply critical thinking skills in various aspects of one’s personal, professional, and social spheres of life [ 13 ].

The work done by Facione in the development of this framework has been studied and utilized by other researchers, particularly around assessing the critical thinking skills and disposition of healthcare professionals and health science students. This framework has been utilized by Nair et al. [ 14 ] in the development of their Critical Thinking Self-Assessment Scale, which was built specifically to be utilized by nurses as part of their own critical thinking self-assessment. Facione’s critical thinking assessment work has also been used to evaluate the effectiveness of different educational interventions with regard to their ability to improve critical thinking in pre- and post-exposure to treatment. A 2020 study by Wu et al. [ 15 ] utilized the disposition assessment tool developed using Facione’s work to evaluate the effectiveness of mind mapping exercises to increase critical thinking inclination of students. Additionally, the assessment tools built from this framework have been used to evaluate the potential predictors of critical thinking abilities of undergraduate students, such as taking courses online or transferring courses from another college or university [ 16 ].

2.1 Population and sample selection

The participants for this study were BS in Health Science (BSHS) students enrolled at a four-year, public, comprehensive state university located on the east coast of the United States. All participants were 18 years of age or older, were enrolled in one of five sections of the senior-level BSHS research course that was offered in the Spring 2022 academic semester, and agreed to participate in this study. Two additional sections of the course were offered that did not participate in the data collection efforts of this study. Participants in this study were those students in attendance during the class period that was utilized to administer the critical thinking assessment tools.

Students had declared one of the following concentrations within the BSHS degree: General Concentration, Pre-Occupational Therapy, Pre-Physical Therapy, or Pre-Communication Disorders. Participants were given the opportunity to complete each assessment tool in a voluntary capacity and were not required to complete either or both assessments under any conditions. As the study participants were recruited as part of the senior-level research course of the BS in Health Science degree, all participants were nearing the culmination of their undergraduate career at the time of data collection. No exclusionary criteria were used in selecting study participants beyond their enrollment within the BS in Health Science degree and current enrollment in the senior-level research course. All data collection took place after obtaining all necessary approvals from the Stockton University IRB Committee, including CITI training by the researcher. IRB approval was obtained after submitting all required documentation, proof of CITI training, study procedures, and informed consent documents (Stockton University IRB Approval Number #2021.175). A total of 194 students were enrolled in the senior-level research course during the Spring 2022 semester; 5 sections of this course agreed to participate in data collection efforts, which in total represented 140 enrolled students, or 72% of the total student population enrolled in the course that semester.

2.2 Instrumentation

The researcher used the California Critical Thinking Skills Test (CCTST) and the California Critical Thinking Disposition Inventory (CCTDI) for data collection purposes, and administered each assessment to students enrolled in the senior-level research course for the BS in Health Science degree. These tests are owned and administered by Insight Assessment and were developed in part from the work in Critical Thinking Assessment (CTA) theory described by Facione et al. [ 12 , 13 ]. The CCTST is a 34-item, multiple-choice, non-discipline-specific test that evaluates critical thinking along 8 different subscales: Analysis, Interpretation, Inference, Evaluation, Explanation, Inductive Reasoning, Deductive Reasoning, and Numeracy. It is estimated that the CCTST takes an average of 45 min to complete. Each multiple-choice question item is related to generic situations not unique to any particular domain of work. Scores are calculated using a proprietary formula and fall into one of five levels: low or not manifested, weak, moderate, strong, or superior development [ 17 ].

The CCTDI consists of 75 generic statements rated on a 6-point Likert response scale and is also non-discipline specific. The CCTDI evaluates disposition towards critical thinking along 7 different subscales: Truth Seeking, Analyticity, Open-Mindedness, Systematicity, Confidence in Reasoning, Inquisitiveness, and Maturity of Judgement. It is estimated that the CCTDI takes an average of 15–20 min to complete. Scores can range from 5 to 60 for each subscale and indicate weak, positive, or strong disposition development [ 18 ]. Each tool is delivered via an online web-based portal owned and operated by Insight Assessment. Both tools were selected due to their previously established validity and reliability in assessing the critical thinking skills and attitudes of study participants [ 13 ]. The CCTST has documented strength in both the content validity of each of the skill domains and in construct validity. Validity has been demonstrated by correlational studies relating critical thinking skills to additional measurements such as GPA and GRE scores, as well as by evidence of criterion (predictive) validity [ 13 , 19 , 20 ]. The CCTST has also shown strong internal reliability, with documented Cronbach’s Alpha coefficients ranging from 0.60 to 0.78 on individual scales and 0.90 or above for the overall measure [ 17 ]. See Appendix A for the breakdown of the score ranges that pertain to each level of development for the CCTST and the CCTDI.

The CCTDI has also been assessed for validity and reliability, with the inventory items found to be valid and to show an internal consistency reliability of 0.887 [ 21 ]. Cross-cultural applications of the CCTDI have likewise shown high content validity across cultural versions of the inventory, with alpha coefficients ranging from 0.81 to 0.97 [ 22 ]. A recent meta-analysis by Orhan explored the reliability of the CCTDI using 98 alpha values across 87 unique studies and found the instrument to be reliable across samples, with an alpha value of 0.83 [ 23 ]. Together, these studies demonstrate consistent validity and reliability for the CCTDI as an instrument for assessing the critical thinking dispositions of students.
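
Internal-consistency figures such as those cited above can be reproduced from raw item responses. The following is a minimal Python sketch of Cronbach's alpha for a respondents-by-items score matrix; the file name and column layout are hypothetical illustrations, not part of the published instruments or this study.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    items = items.dropna()
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # per-item variance
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical usage: one row per student, one column per CCTDI item (1-6 Likert ratings).
# responses = pd.read_csv("cctdi_item_responses.csv")
# print(f"Cronbach's alpha = {cronbach_alpha(responses):.3f}")
```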

To assure ongoing validity and reliability for both the CCTST and the CCTDI in this study, both tools were delivered exactly as instructed by Insight Assessment. No variations were made to either instrument, no questions or sections were added, omitted, or changed, and study participants met all requirements for participation as described by the Insight Assessment user’s manual [ 17 , 18 ].

2.3 Data collection and management

All data collection took place during the first 2 weeks of March 2022. Study participants and faculty were informed that participation in the data collection phase of this study was purely voluntary and that there would be no penalty for not participating in the study. All participant information has been kept confidential, and participants were provided with an informed consent form prior to the data collection beginning. Participants were also informed that the information collected for this study would not be shared with members of the public in any identifiable way and that all study findings would be presented as aggregated data. All data collection took place during the traditional meeting time of each confirmed section of the senior-level research course, with two sections meeting via Zoom conference, and three sections meeting face-to-face in a university computer lab.

Distribution and completion of each of the assessments took place via the online portal offered through Insight Assessment. Each class meeting allowed for adequate time for both assessments to be administered in a single class meeting. Data collection took place as an in-class activity for that day’s class meeting, and there was no course penalty for not taking part in the data collection. Should a student have declined to participate in the in-class activity during the day of data collection, they would have been provided with an assigned reading on critical thinking in the healthcare field that would be utilized for in-class discussions after the data collection activity had concluded. No student declined to participate in the study. In an effort to reduce student anxiety, students were assured that all results were purely for the purpose of assessment and that class rankings or comparisons would not be shared. Additionally, there was no additional course credit given for participation, nor were there any extra credit or similar potentially coercive incentives provided for data collection participation. To ensure each student participant had the opportunity to access the online platform, all in-person meetings took place in a university computer lab. For the 2 sections that met with the researcher via Zoom conferencing, all students had access to a laptop or computer with internet access. For these sections, the faculty member teaching the course was present in the Zoom room. It should be noted that since these meetings took place over Zoom, the continuity of the environment in which students were completing the assessment could not be guaranteed when compared to those students completing the assessment in a university computer lab.

2.4 Data analysis

All data collection took place during the first 2 weeks of March 2022, with all data being collected before the beginning of the Spring break period of the term. After all data collection was completed, reports were generated by Insight Assessment to provide results of Overall Critical Thinking Skills, Critical Thinking Skills across each Subscale, Overall Critical Thinking Dispositions, and Critical Thinking Dispositions across each Subscale. The overall critical thinking skill score population means for both the national undergraduate student population and the national health science undergraduate student population were obtained through Insight Assessment to allow for comparison between these two national populations and the study sample.

Descriptive statistics were generated for overall and subscale scores for the CCTST and the CCTDI, and frequency statistics were generated for ethnicity, gender, and declared concentration within the BSHS degree. T-tests for independent samples were conducted for gender for both the CCTST and the CCTDI. Overall student scores for the CCTST were analyzed for comparison to the national population of undergraduate students via one sample t-test. For the demographic variable of Degree Concentration, which includes General Concentration, Pre-Occupational Therapy Concentration, Pre-Physical Therapy Concentration, and Pre-Communication Disorders Concentration, the researcher conducted a one-way analysis of variance. All statistical analysis was conducted using the IBM SPSS software Version 25.
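
The analyses above were run in SPSS; as a rough illustration only, the same steps could be reproduced in Python with pandas and SciPy. The file name bshs_scores.csv and its columns (overall_ct, gender, concentration) are hypothetical placeholders for the cleaned score export, not artifacts of the actual study.

```python
import pandas as pd
from scipy import stats

# Hypothetical cleaned export: one row per student with scores and demographics.
df = pd.read_csv("bshs_scores.csv")  # columns assumed: overall_ct, gender, concentration

# Descriptive statistics for the overall CCTST score.
print(df["overall_ct"].describe())

# Independent-samples t-test on the overall score by gender.
female = df.loc[df["gender"] == "female", "overall_ct"]
male = df.loc[df["gender"] == "male", "overall_ct"]
print(stats.ttest_ind(female, male))

# One-way ANOVA across the four degree concentrations.
groups = [g["overall_ct"].to_numpy() for _, g in df.groupby("concentration")]
print(stats.f_oneway(*groups))
```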

3 Results

3.1 Research question 1

RQ1 sought to understand the critical thinking skills of undergraduate health science students at a four-year, public, comprehensive state university by utilizing the CCTST offered through Insight Assessment. Of the 140 total students who were invited to participate in this study, 130 completed the CCTST, as 10 students did not attend class on the day of data collection. Using the criteria set forth by Insight Assessment, the results from 5 participants were removed from the final data set because they completed the CCTST in under 15 min, which is not considered an adequate amount of time to thoughtfully respond to each question. As a result, a total of 125 students completed the CCTST in full, representing approximately 89% of the potential sample. The mean age was 22 years, with 79% indicating female gender identity. See Table 1 for ethnicity findings and Table 2 for the frequency of each concentration identified within the BS in Health Science degree.
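
As a sketch of the exclusion rule described above (again with a hypothetical file and column name), the under-15-minute attempts could be filtered out before any scoring or analysis:

```python
import pandas as pd

# Hypothetical raw export from the testing portal.
raw = pd.read_csv("cctst_raw_export.csv")  # assumed column: completion_minutes

# Drop attempts completed in under 15 minutes, per the criterion described above.
valid = raw[raw["completion_minutes"] >= 15].copy()
print(f"{len(raw) - len(valid)} records removed; {len(valid)} valid CCTST records retained")
```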

The CCTST is designed to assess and measure the critical thinking and numeracy skills that are used in the process of reflective reasoning in order to make an informed judgment about what to do, or what to believe, in a particular situation or setting. The CCTST provides an overall critical thinking score, as well as scores across 8 sub-domains: Analysis, Inference, Evaluation, Induction, Deduction, Interpretation, Explanation, and Numeracy. A brief description of each domain is described in Appendix B .

The CCTST scores are calculated by Insight Assessment via a proprietary formula for both the overall score and the score of each sub-domain. Study sample scores for overall critical thinking ability, as well as across each sub-domain, are shown via descriptive statistics in Table 3 . One-way analysis of variance showed no statistically significant difference ( p  = 0.708) in the Overall Critical Thinking Skills Scores of participants among the different degree concentration options (Pre-Occupational Therapy, Pre-Physical Therapy, Pre-Communication Disorders, and General) within the BS in Health Science students (see Tables 4 , 5 ).

3.2 Research question 2

RQ2 sought to understand how the overall critical thinking skills of undergraduate health science students at a four-year, public, comprehensive state university compared to those of the national population of undergraduate students. Aggregate data provided by Insight Assessment show that the population mean score for overall critical thinking skills of four-year college/university undergraduate students is currently 75.3, compared with an overall critical thinking skills score of 69.96 for the study sample. The results of the one-sample t-test showed that the overall critical thinking skills score of the study sample is significantly lower than that of the national four-year college/university undergraduate student population (see Table 6 ).

As Insight Assessment does not collect aggregate data for the sub-domain measurements, comparison between the national four-year college/university undergraduate student population and the study sample for each sub-domain was not possible.
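
A one-sample t-test of this kind compares the sample mean against a fixed population value. The sketch below shows the comparison against the national mean of 75.3 reported above; the data file and column name are hypothetical.

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("bshs_scores.csv")   # hypothetical export; column assumed: overall_ct
national_mean = 75.3                  # aggregate mean reported by Insight Assessment

# One-sample t-test of the sample's overall CCTST scores against the national mean.
t_stat, p_value = stats.ttest_1samp(df["overall_ct"], popmean=national_mean)
print(f"sample mean = {df['overall_ct'].mean():.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```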

3.3 Research question 3

RQ3 sought to understand the dispositions towards the importance of critical thinking of undergraduate health science students at a four-year, public, comprehensive state university. Of the 140 total students who were invited to participate in this study, 130 completed the CCTDI, as 10 students did not attend class on the day of data collection; this represents approximately 93% of the potential sample. The mean age was 22 years, with 80% indicating female gender identity. See Table 7 for ethnicity findings, and Table 8 for the frequency of each concentration identified within the BS in Health Science degree. It is important to note that the sample size for the CCTDI (n = 130) is larger than for the CCTST analyses in RQ1 and RQ2 (n = 125), as all students who completed the CCTDI did so at or above the minimum amount of time deemed necessary to ensure the validity and accuracy of the results.

The CCTDI is designed to assess the critical thinking mindset and attitudes of individuals toward critical thinking. The CCTDI provides scores across 7 subdomains: Truth-Seeking, Open-Mindedness, Inquisitiveness, Analyticity, Systematicity, Confidence in Reasoning, and Maturity of Judgment. A brief description of each domain is described in Appendix C .

Study sample scores for overall critical thinking dispositions, as well as across each sub-domain, are shown via descriptive statistics in Table 9 , with the sub-domain of Inquisitiveness showing the highest mean score of 46.5, and the sub-domain of Truth-Seeking showing the lowest mean score of 35.4. It is important to note that there are no national population means available for comparative purposes, as Insight Assessment does not collect this kind of national mean data for the CCTDI. The reason for this is that there is no correct or incorrect answer for each of the 6-point Likert questions asked in the CCTDI, and there is no ideal mean score for study results to be measured against.

3.4 Research question 4

RQ4 sought to understand to what degree overall academic performance, as measured by grade point average (GPA), is a reasonable indicator of critical thinking development. To help correct for multiple comparisons, a Bonferroni correction was applied. An adjusted p-value was computed by dividing the 0.05 level of significance by the number of correlations for both the CCTST and the CCTDI. For the CCTST, the adjusted p-value (0.05/9) was 0.005. For the CCTDI, the adjusted p-value (0.05/8) was 0.006. Tables 10 and 11 show the correlations between critical thinking skills and GPA and between critical thinking dispositions and GPA, respectively. For both tables, relationships that are significant at the 0.05 level are marked with a single asterisk (*) and those that are significant at the adjusted p-value levels are marked with a double asterisk (**). Pearson correlation shows a statistically significant positive correlation between GPA and overall critical thinking skills (0.235, p  = 0.008), as well as across all critical thinking subscales (Table 10 ), with the subscale of numeracy showing the highest correlation with GPA (0.300, p  = 0.001). Pearson correlation shows a statistically significant positive correlation between GPA and the critical thinking disposition subscale of systematicity only (0.175, p  = 0.047), with no other subscale showing a statistically significant correlation (Table 11 ).
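
The Bonferroni-corrected correlation analysis can be expressed compactly in code. The sketch below assumes a hypothetical data file with a GPA column and one column per CCTST scale; the division 0.05/9 mirrors the adjustment described above for the nine skill correlations.

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("bshs_scores.csv")  # hypothetical file with gpa and CCTST scale columns

cctst_scales = ["overall_ct", "analysis", "inference", "evaluation", "induction",
                "deduction", "interpretation", "explanation", "numeracy"]
alpha_adjusted = 0.05 / len(cctst_scales)  # Bonferroni correction: 0.05 / 9

# Pearson correlation of GPA with each scale, flagged at the two significance levels.
for scale in cctst_scales:
    r, p = stats.pearsonr(df["gpa"], df[scale])
    flag = "**" if p < alpha_adjusted else ("*" if p < 0.05 else "")
    print(f"{scale:15s} r = {r:6.3f}   p = {p:.4f} {flag}")
```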

4 Discussion

The aim of this study was to understand the critical thinking skills and dispositions of undergraduate students enrolled in a BS in Health Science degree program at a four-year university. The findings of this study are in agreement with the published research pertaining to critical thinking skills development in undergraduate students as a whole, as some estimates indicate that 45% of undergraduate students show no meaningful improvement in their critical thinking skills by graduation, and even this figure may be an underestimate [ 24 ]. As this study was not longitudinal in nature, it is not known to what degree critical thinking skills or dispositions did or did not improve over the course of a student’s higher education experience. Rather, this study provides a snapshot of the skills and dispositions found at the culmination of their program of study. Therefore, the findings do not necessarily suggest that this particular Health Science undergraduate program fails to develop critical thinking skills and dispositions. Instead, the study provides insight into the degree to which critical thinking skills and attitudes have been developed upon the conclusion of academic study, with opportunities to evaluate ways in which to further enhance critical thinking skill and disposition development by understanding the current baseline.

Earlier research conducted by Keeley et al. [ 25 ] points to a common resistance of students to engage in critical thinking, which these authors suggest may be due to a generalized resistance to engaging in different forms of learning and studying behaviors than they have previously utilized in their education in an effort to “avoid change, work, and pain.” The authors also suggest that students who do not regularly engage in self-reflection (i.e. why am I resistant to engaging in critical thinking?) are less likely to be aware of their hesitation in the first place.

Another potential reason for this deficiency in critical thinking skill development may be pedagogical in nature. Higher education pedagogy is often content-based and seeks to imbue students and learners with deep knowledge about a series of subjects, whereas a more critical thinking-oriented pedagogy is rooted in teaching students and learners how to think complexly and across a number of different areas [ 26 ]. As a result of a heavily content-based pedagogy, undergraduate students may not be receiving the kind of complex and problem-based learning environment needed to develop a more robust critical thinking skillset. Research by Matthews & Lowe also suggests both pedagogical and environmental reasons as to why students may be resistant to engaging in more critical thinking and critically reflective mindsets [ 27 ]. Particularly, these researchers highlight the need for the development of the critical thinking disposition (the critical spirit described by Facione) in order for students to overcome resistance to both developing and utilizing critical thinking skills in their educational and professional endeavors. Without possession of a strong disposition toward critical thinking, more overt resistance to the utilization of critical thinking may remain.

Participants in this study showed the strongest development in Inference and Induction skills. Inference, the ability to draw logical conclusions based on presented data, is an important subdomain of critical thinking skills. Healthcare practice and research both require the utilization of inferential reasoning in order to appropriately draw conclusions and make recommendations in situations and environments that are not always pristine or ideal [ 28 ]. This allows for the greater development of a “what if?” mindset that can be of significant importance in health-related environments. The similar level of development found within induction is interesting to note, as induction can be seen as a sub-category of inferential reasoning. Clinical reasoning requires the development and application of inductive reasoning in order to make larger generalizations and conclusions based on the individual clinical scenarios or patterns that are being witnessed and observed [ 29 ]. While development in the areas of Inference and Induction was only at a moderate level, as opposed to strong or superior development as described by the CCTST, it is still important to note that these areas are of significant importance when it comes to future work in the healthcare field.

More concerning are the critical thinking skill areas that were weakly developed in this study. Weak development in the area of evaluation is worthy of special attention, as the healthcare field is riddled with dubious claims, misinformation campaigns, and conspiracy theories. Recent research by Lantian, Bagneux, Delouvée, and Gauvrit provided insight into the link between evaluative and critical thinking abilities and subscribing to conspiratorial beliefs and theories [ 30 ]. Courses that emphasize evaluation skills have been shown to reduce adherence to pseudoscientific beliefs while also building a more skeptical frame of mind when encountering new information or claims [ 31 ].

The skillset with the lowest level of development was numeracy, with weak to no development in this area being shown by the CCTST. This is, in some ways, not surprising, as adults in the United States have been found to perform well below average in numeracy skills when compared to adults in other developed nations, according to the Organization for Economic Cooperation and Development [ 32 ]. However, in the present study it must be noted that not only was numeracy the most poorly developed critical thinking skillset, but mean numeracy scores also fell within the weak-to-not-developed range. This is a finding of great importance, as numeracy is a required skill within the field of healthcare. Regarding critical thinking as a whole, the study sample’s mean score of 69.96 for overall critical thinking skills was statistically significantly lower than the national 4-year college/university undergraduate mean of 75.3. Using the criteria provided in the CCTST, the study sample mean shows weak to moderate development, whereas the national student population shows moderate development at the higher end of the moderate range. This means that the study participants not only show lower development in their critical thinking skills than the national population, but that they fall a full category of development lower.

Results show that the majority of dispositions assessed in the CCTDI showed positive development among the study participants. Open-Mindedness, Inquisitiveness, Analyticity, and Confidence in Reasoning were all found to fall within the positive range of personal development. Higher scores on the CCTDI have been found to be associated with greater problem-solving skills, showing that these affective qualities are important in the overall critical thinking attributes of students [ 33 ]. Open-mindedness and inquisitiveness are especially important dispositions to possess, as they are paramount to supporting the desire to learn and to enhance personal knowledge within students, which has further been associated with better student performance in higher education [ 34 ].

Particularly with students pursuing health-related careers, open-mindedness again has been found to be associated with academic success and graded work in courses [ 35 ]. While these other domains of analyticity and confidence in reasoning are associated with problem-solving overall, they are not as predictive of student success and readiness as open-mindedness and inquisitiveness [ 36 ], although higher dispositions overall are an important aspect of building problem-based learning skills.

What is perhaps most concerning amongst the findings pertaining to this research question is that Truth-Seeking showed the lowest disposition development, with results showing inconsistent to ambivalent demonstration. Truth-seeking is a necessary disposition to possess in order to seek out the best possible evidence and information to understand a situation or issue. As such, truth-seeking behavior has been described as the main predictive dispositional factor of an individual possessing a robust overall critical thinking behavior [ 37 ]. In particular, truth-seeking allows one to question their previously held beliefs or ideas about a topic, which is critical in the healthcare field, as new information and science are always coming forward. This new information often may displace or change previously held theories or practices, and a truth-seeking disposition is required in order to critically evaluate and accept new information that is found to be factually based.

Part of the reason dispositions and attitudes towards Truth-Seeking may be so hard to foster is the subjective and often abstract nature of what constitutes truth, compounded by the copious amounts of information students must process when attempting to determine factualness. As described by Arth et al. [38], "…information is available to people in unrecordable amounts and insurmountable ways." The sheer amount of information students confront is only increasing, and without proper information literacy preparation, particularly digital information literacy, students may be both unprepared and unmotivated to seek out the information that would point towards the truth. This point is reinforced by Gibbs [39], who emphasizes the additional consideration of trust in self: without a level of trust in one's own ability to seek out true information and simultaneously recognize false or misleading information, students may not possess the confidence necessary to develop stronger attitudes towards truth-seeking as a behavior.

Regarding the correlations between GPA and critical thinking skills and dispositions, academic performance and GPA have been shown to be associated with greater critical thinking skill development [40]. While GPA is not the only indicator of skill development, overall academic performance and success may be one way of gauging the potential for critical thinking skillset enhancement. The finding that numeracy was the subscale most positively correlated with GPA is an important one, given the overall poor development of numeracy skills in this study sample. However, numeracy development may be influenced by more than overall GPA and academic performance. Research seeking to understand why mathematics and numeracy skills are often poorly developed in American students has identified negative stereotypes, stigma, and poor sense of self as significant influences. Negative stereotyping can operate on two fronts: instructor biases about what kind of student tends to be better at mathematics, and students' own beliefs about whether they are the kind of student who is good at math [41]. Stereotype threat has been identified as a potential key factor in the overall lack of mathematics development across student groups and demographics [42], with female-identifying students in particular being highly vulnerable to these stereotyping images and messages [43]. Considering the high percentage of female-identifying students in this study, the potential impact of stereotyping and stereotype threat, particularly its role in mathematics and numeracy skill development and utilization, cannot be ignored.
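
To illustrate the kind of correlational analysis discussed in this section, the sketch below computes Pearson correlations between GPA and selected CCTST subscale scores. The data frame, values, and column names are hypothetical and are not drawn from the study's data.

```python
# Minimal sketch (hypothetical data): Pearson correlations between GPA and
# CCTST subscale scores, mirroring the kind of analysis discussed above.
import pandas as pd
from scipy import stats

# Hypothetical student-level records; column names are illustrative only.
df = pd.DataFrame({
    "gpa":       [3.2, 3.8, 2.9, 3.5, 3.1, 3.9, 2.7, 3.4],
    "numeracy":  [62, 71, 58, 66, 61, 74, 55, 65],
    "analysis":  [70, 78, 66, 73, 69, 80, 63, 72],
    "inference": [68, 75, 64, 71, 67, 77, 61, 70],
})

for subscale in ["numeracy", "analysis", "inference"]:
    r, p = stats.pearsonr(df["gpa"], df[subscale])
    print(f"GPA vs {subscale}: r = {r:.2f}, p = {p:.4f}")
```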

With respect to critical thinking dispositions, the finding that systematicity was the only subscale associated with GPA is in some ways not surprising, as systematicity is the tendency to approach problems in an ordered, disciplined, and systematic way. Those with higher GPAs may naturally be inclined to approach their work and studies more systematically, which may explain this correlation. However, no other disposition subscale was correlated with GPA, which suggests that GPA and academic grade achievement may not be indicators of disposition and attitude towards critical thinking. This highlights the limitation of using GPA as a barometer for critical thinking development, as it cannot fully capture or predict how a student will conceptualize and utilize critical thinking in their personal or professional lives.

However, this finding does highlight the phenomenon that students may possess critical thinking skills without the disposition necessary to put those skills to use, which may in part be influenced by the dispositions of the educators teaching them. A recent study by Shin et al. [44] explored the role of a critical reflection competency program for nurse educators in improving the educators' dispositions. Participation in a 4-week critical reflection competency program improved the critical thinking dispositions and teaching efficacy of nurse educators, which in turn gives nursing educators greater opportunity to instill these dispositions and attitudes in their students. How educators project their own attitudes towards the importance of critical thinking may have a significant impact not only on how they design curriculum and teaching methods, but also on how they create an environment that fosters a curious mind and a stronger disposition towards employing critical thinking skills in work.

An additional factor influencing the development of critical thinking dispositions may be the opportunity for students to explore and utilize creativity in the classroom. Qiang et al. [45] found that students' critical thinking dispositions were positively related to their self-concepts of creativity, and scientific creativity in particular. This was further emphasized by Khoshgoftar et al. [46], who found a direct relationship between critical thinking dispositions and reflective creative capacities. The significance of these findings is twofold: first, classroom learning opportunities that emphasize creativity and reflection may help to further bolster critical thinking dispositions within students; second, a student's ability to be reflective and creative may not always be properly captured in GPA scoring. Educators, particularly those working with students in the health sciences, may benefit not only from improving their own dispositions towards critical thinking, but also from finding opportunities to properly assign, assess, and capture reflection and creative capacity in their students to further enhance disposition development.

4.1 Implications for practice

The findings of this study are important because future healthcare professionals need the critical thinking skills and dispositions necessary to perform their work accurately and safely, especially in a work environment of ever-increasing complexity. Because this study was conducted with pre-professional health science students, its findings apply to health pre-professional education specifically in terms of developing these skills and dispositions before clinical education and/or encounters with patients or community members begin. This speaks to the general development of cognitive skills and attitudes, as opposed to clinical skills and attitudes, which are developed during post-baccalaureate education and training.

An important area to note is the correlational relationships among the different subscales of the CCTST and the CCTDI, particularly those showing the strongest relationships to one another. Overall critical thinking skill was most strongly correlated with analysis, inference, induction, and deduction skills, which suggests where curricular and pedagogical changes might be focused to increase students' overall critical thinking skills. Course assignments, projects, lessons, and readings that require students to use analytical, inferential, and both inductive and deductive skills may be of particular benefit in strengthening overall critical thinking. Numeracy skills, which were the most poorly developed, were most positively correlated with explanatory skills. This is an important finding, as strengthening explanatory skills, which refer to a student's ability to defend and justify a belief or a response to a question, may simultaneously support a student's development in numeracy.

Regarding critical thinking dispositions, although truth-seeking was the most poorly developed attitude, it also showed the strongest correlation with overall critical thinking disposition. Therefore, exposing students to opportunities that strengthen and support their critical thinking dispositions overall may have the added benefit of supporting their desire to seek out the truth. Maturity of judgment also showed a higher correlation with truth-seeking, which again provides helpful insight: because maturity of judgment allows students to understand and accept that multiple solutions or options may be possible when approaching a question or issue, and that complexity is an inherent aspect of many problems, fostering this disposition may likewise support their development in truth-seeking.

To put these findings into practical use, the first and most immediate practice-based recommendation is to evaluate programmatic curricula and teaching approaches that have been shown to promote critical thinking skill development in higher education settings. Mahmoud & Mohamed [47] provide several evidence-based recommendations for enhancing critical thinking skills and abilities. A few of these recommendations are described below; readers, particularly educators who work with health-oriented students, are encouraged to read the paper by Mahmoud & Mohamed in its entirety to fully appreciate the breadth of curricular and teaching approaches recommended.

Problem-Based Learning: A major component of pre-health profession education should be problem-based learning, a student-centered approach to the learning process that focuses on solving open-ended problems through collaborative engagement with other learners in a group setting.

Programmatic Orientation: Students often do not fully understand the philosophy and core concepts of the programs they select. Just as students are oriented to their college or university after admittance, so too should they be fully oriented to the program of study they choose as their major.

Clinical Scenarios: Context-dependent activities ask learners to bring their life experiences, prior learning, and personal skills into the classroom. Approached this way, recall and application of knowledge are enhanced, allowing students to encode information so that it can be easily retrieved when they encounter a specific scenario.

An additional recommendation is to encourage faculty members of pre-health educational programs to adjust their curriculum and teaching styles, for example by using a flipped classroom model, to promote critical thinking dispositions. This may be particularly helpful in developing the disposition of truth-seeking, which was found to be poorly developed not only in this study but also in other studies examining the dispositions of students in the healthcare field [48]. However, as previously discussed, resistance to new methods of teaching can influence how effective a flipped-classroom approach is in fostering critical thinking skills and dispositions. Oudbier, Spaai, Timmermans, & Boerboom highlight how student self-regulation, the motivation of the faculty member, and variation in assessment approaches can all play a significant role in whether a flipped classroom approach will be effective [48]. To increase the likelihood of positive outcomes from such an approach, Arth et al. [38] provide valuable insights and recommendations from professors on how to encourage critical thinking and truth-seeking dispositions in undergraduate students. Selected examples of their recommendations, linked with curricular design and teaching strategies, follow.

Research Information Skills: The ability to properly seek out and evaluate information should be incorporated throughout the curriculum in a variety of classes rather than localized in a research-specific course. In particular, students need to learn the difference between researching information via the scientific method and simply looking up information.

Belief Bias & Skepticism: An important aspect of developing a critically oriented mindset is understanding one's own biases and how they can influence the way information is sought out and interpreted. In this way, confirmation bias can be avoided and a healthy level of skepticism maintained.

Discernment of Good vs. Bad Information: Avoiding belief bias and maintaining a skeptical mindset also links to the desire to find reliable information and the ability to discern good-quality from poor-quality information. Given the proliferation of questionable claims found in online sources, educators need to teach the skills necessary to determine the reliability of information obtained during the research process.

The Constant Pursuit of Truth: Although it may initially seem counterintuitive, one of the most important ways to encourage a truth-seeking disposition in undergraduate students is to design a curriculum that reinforces the idea that nothing can ever be known with complete certainty, particularly in the health sciences. This is true not only because of the abundance of information of questionable validity and reliability, but also because information is constantly changing as new research is conducted and new evidence is gathered. As shared in the research by Arth et al. [38], students need to be encouraged to see the pursuit of truth as an ever-evolving behavior because of the plethora of new information being shared, particularly via digital platforms. This requires students to be comfortable with a lack of finality in the pursuit of truthful and factual information. Students exposed to educational environments that encourage such comfort through purposefully designed learning experiences, modeling techniques, and reflection time from their faculty have been shown to improve in both their critical thinking skills and their attitudes towards seeking out truth [49, 50].

It should be noted that this study is an internal review and assessment of a single academic program within the field of health science. While this limits generalizability to other educational programs, the study underscores the need for higher education programs to engage in exactly this kind of assessment and evaluation of their students' critical thinking skills and attitudes. Without an internal assessment and audit of student critical thinking skills and attitudes, educators and curriculum developers lack the information needed to determine whether their curricular program, and the pedagogical methods employed by faculty, are producing robust development in critical thinking skills and attitudes. Such methods are not currently in place at meaningful levels within the program in which the students in this study were enrolled, and it is the hope of this researcher that they will be expanded in an effort to strengthen critical thinking skills and dispositions over time.

4.2 Study strengths and weaknesses

As with any scholarly research, there are limitations in the research design and data collection methods that influence the results of this study. First, data collection relied on a convenience sample. The researcher is a faculty member in this BS in Health Science program and therefore focused data collection solely on students to whom he had easy and convenient access. Because all respondents were part of this single program, the results of the CCTST and the CCTDI cannot be fully generalized to the undergraduate health science population as a whole. While this allows for a more specific analysis of this particular cohort of students, it limits how the study findings can be extended to other institutions of higher education.

A second limitation is the inability to compare the critical thinking skills and dispositions of BS in Health Science students to those of other undergraduate students at the same university enrolled in other programs of study. While a comparison to national population means was possible for overall critical thinking scores, it would have been illuminating to compare mean scores across each subdomain of critical thinking skill as well. Because these subdomain data are not collected or reported by Insight Assessment, such a comparison would have required drawing data from other students at the same university; given the logistical and financial constraints that existed, it was not possible to collect data from an adequate number of non-health science students, leaving this kind of subdomain comparison absent from the study. It should also be noted that two sections of students completed the assessment over Zoom rather than in an in-person computer lab setting. While there is no evidence of a significant difference in performance or adherence to assessment guidelines between students completing the assessment online and those completing it in person, this difference may nonetheless have influenced student outcomes.

An additional consideration is the comparison of the study sample to the national population of students providing Overall Critical Thinking Skill scores. Because the exact demographic and program-of-study breakdown of this national population is not known, there is a natural limit to how informative this comparison can be. Future research that aims to compare a study sample to another sample or population would benefit from more specific demographic and educational descriptors in order to support stronger inferences.

Lastly, data collection took place among a group of senior-level students who had spent the previous 2 years of their undergraduate education in the COVID-19 global pandemic environment. The impact of COVID-19, and especially the ways in which it significantly affected higher education and learning as a whole, is still being assessed and understood. For the purposes of this study, it was not possible to control for the ways in which COVID-19 may have temporarily or permanently affected critical thinking skills and dispositions. The results of this study must therefore be viewed through this lens, as it is possible that the scores for skills and dispositions would have been different in a non-COVID-impacted learning environment.

5 Conclusion

The undergraduate health science students in this study population show low to moderate development of critical thinking skills, with numeracy skills being particularly poorly developed and grade point average being moderately but significantly associated with critical thinking skill development across all subscales. While students show positive development across most critical thinking disposition subscales, they show inconsistent and ambivalent dispositions towards truth-seeking, and grade point average is not a significant indicator of these attitudes and dispositions. Health science education programs that hope to enhance and strengthen both critical thinking skills and dispositions may wish to implement evidence-based pedagogical practices to ensure students are prepared for professional practice in the field of health science, which requires strong critical thinking.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Arnott SR. Evidence beyond the rules: a critical thinking approach to teaching evidence law to undergraduate students. J Scholarsh Teach Learn. 2018. https://doi.org/10.14434/josotl.v18i4.22812 .


Hitchcock D. Stanford encyclopedia of philosophy—critical thinking. 2018. https://plato.stanford.edu/entries/critical-thinking . Accessed Jan 15 2023.

Allen DD, Toth-Cohen S. Use of case studies to promote critical thinking in occupational therapy students. J Occup Ther Ed. 2019. https://doi.org/10.26681/jote.2019.030309 .

Morris RJ, Gorham-Rowan MM, Robinson RJ, Scholz K. Assessing and teaching critical thinking in communication science and disorders. Teach Learn Commun Sci Disord. 2018. https://doi.org/10.30707/TLCSD2.1Morris .

Sharples JM, et al. Critical thinking in healthcare and education. Brit Med J. 2017. https://doi.org/10.1136/bmj.j2234 .

Irby D, Cooke M, O’Brien B. Calls for reform of medical education by the Carnegie Foundation for the Advancement of Teaching: 1910 and 2010. Acad Med. 2010. https://doi.org/10.1097/ACM.0b013e3181c88449 .

Mann KV. Theoretical perspective in medical education: past experience and future possibilities. Med Ed. 2011. https://doi.org/10.1111/j.1365-2923.2010.03757.x .

Kanbay Y, Okanlı A. The effect of critical thinking education on nursing students’ problem-solving skills. Contemp Nurs. 2017. https://doi.org/10.1080/10376178.2017.1339567 .

Chacon JA, Janssen H. Teaching critical thinking and problem-solving skills to healthcare professionals. Med Sci Ed. 2021. https://doi.org/10.1007/s40670-020-01128-3 .

Hanley P, Slavin RE, Elliot L. Thinking, doing, talking science. Evaluation report and executive summary: Education endowment foundation. 2015. https://educationendowmentfoundation.org.uk/projects-and-evaluation/projects/thinking-doing-talking-science/ . Accessed Jan 15 2023.

Cummings L. Critical thinking in medicine and health. Fall in Med Health. 2020. https://doi.org/10.1007/978-3-030-28513-5_1 .

Facione NC, Facione PA. Externalizing the critical thinking in clinical judgment. Nurs Outlook. 1996. https://doi.org/10.1016/S0029-6554(06)80005-9 .

Facione NC, Facione PA, Sanchez C. Critical thinking disposition as a measure of competent clinical judgment: the development of the California critical thinking disposition inventory. J Nurs Ed. 1994. https://doi.org/10.3928/0148-4834-19941001-05 .

Nair GG, Stambler LL. A conceptual framework for developing a critical thinking self-assessment scale. J Nurs Ed. 2013. https://doi.org/10.3928/01484834-20120215-01 .

Wu HZ, Wu QT. Impact of mind mapping on the critical thinking ability of clinical nursing students and teaching application. J Int Med Res. 2020. https://doi.org/10.1177/0300060519893225 .

Terry N, Ervin B. Student performance on the California critical thinking skills test. Acad Ed Lead J. 2012;16:S25.


CCTST User Manual and Resource Guide. Insight Assessment. Oakland: The California Academic Press; 2021.

CCTDI User Manual and Resource Guide. Insight Assessment. Oakland: The California Academic Press; 2021.

Denia A. Association of critical thinking skills with clinical performance in fourth-year optometry students. Optom Ed. 2008;33:103–6.

Paans W, Sermeus W, Nieweg R, van der Schans C. Determinants of the accuracy of nursing diagnoses: Influence of ready knowledge, knowledge sources, disposition toward critical thinking and reasoning skills. J Prof Nurs. 2010. https://doi.org/10.1016/j.profnurs.2009.12.006 .

Redhana I, Sudria IBN. Validity and reliability of critical thinking disposition inventory. Proceedings of the 3rd International Conference on Innovative Research Across Disciplines. 2020. https://doi.org/10.2991/assehr.k.200115.046 .

İskifoğlu G. Cross-cultural equivalency of the California critical thinking disposition inventory. Ed Sci Theory Prac. 2013. https://doi.org/10.12738/estp.2014.1.1840 .

Orhan A. California critical thinking disposition inventory: reliability generalization meta-analysis. J Psychoeduc. 2022. https://doi.org/10.1177/07342829211048962 .

Lane D, Oswald FL. Do 45% of college students lack critical thinking skills? Revisiting a central conclusion of academically adrift. Ed Meas Iss Pract. 2016. https://doi.org/10.1111/emip.12120 .

Keeley SM, Shemberg KM, Cowell BS, Zinnbauer BJ. Coping with student resistance to critical thinking. Coll Teach. 1995. https://doi.org/10.1080/87567555.1995.9925537 .

Flores KL, Matkin GS, Burbach ME, Quinn CE, Harding H. Deficient critical thinking skills among college graduates: implications for leadership. Ed Phil Theory. 2012. https://doi.org/10.1111/j.1469-5812.2010.00672.x .

Mathews SR, Lowe K. Classroom environments that foster a disposition for critical thinking. Learn Envir Res. 2011. https://doi.org/10.1007/s10984-011-9082-2 .

Moser A, Puhan MA, Zwahlen M. The role of causal inference in health services research I: tasks in health services research. Int J Pub Health. 2020. https://doi.org/10.1007/s00038-020-01333-2 .

Shin HS. Reasoning processes in clinical reasoning: from the perspective of cognitive psychology. Korean J Med Ed. 2019. https://doi.org/10.3946/kjme.2019.140 .

Lantian A, Bagneux V, Delouvée S, Gauvrit N. Maybe a free thinker but not a critical one: high conspiracy belief is associated with low critical thinking ability. Appl Cog Psych. 2021. https://doi.org/10.1002/acp.3790 .

Wilson JA. Reducing pseudoscientific and paranormal beliefs in university students through a course in science and critical thinking. Sci Ed. 2018. https://doi.org/10.1007/s11191-018-9956-0 .

OECD—Skills matter: additional results from the survey of adult skills—United States. 2019. https://www.oecd.org/skills/piaac/publications/countryspecificmaterial/PIAAC_Country_Note_USA.pdf . Accessed 25 Jan 2023.

Tümkaya S, et al. An investigation of university student’s critical thinking disposition and perceived problem solving skills. Euras J Ed Res. 2009;36:57–74.

Comer RD, Schweiger TA, Shelton P. Impact of students’ strengths, critical thinking skills and disposition on academic success in the first year of a PharmD program. Amer J Pharm Ed. 2019. https://doi.org/10.5688/ajpe6499 .

Ozcan H, Elkoca A. Critical thinking skills of nursing candidates. Int J Car Sci. 2019;12:1600–6.

Pu D, et al. Influence of critical thinking disposition on the learning efficiency of problem-based learning in undergraduate medical students. BMC Med Ed. 2019. https://doi.org/10.1186/s12909-018-1418-5 .

Rahmawati M, Kurniati D, Trapsilasiwi D, Osman S. Students’ truth-seeking behaviour in solving problems with no specified universal set given. Kreano. 2021. https://doi.org/10.15294/kreano.v12i2.32549 .

Arth A, Griffin D, Earnest W. Professors’ perspectives on truth-seeking and new literacy. J Med Lit Ed. 2019. https://doi.org/10.23860/JMLE-2019-11-3-6 .

Gibbs P. Why academics should have a duty of truth telling in an epoch of post-truth? High Ed. 2019. https://doi.org/10.1007/s10734-018-0354-y .

Ghazivakili Z, Norouzi NR, Panahi F, Karimi M, Gholsorkh H, Ahmadi Z. The role of critical thinking skills and learning styles of university students in their academic performance. J Advanc Med Ed Prof. 2014;2:95–102.

Reyna C. Lazy, dumb, or industrious: when stereotypes convey attribution information in the classroom. Educ Psych Rev. 2000. https://doi.org/10.1023/A:1009037101170 .

Appel M, Kronberger N. Stereotypes and the achievement gap: stereotype threat prior to test taking. Educ Psych Rev. 2012. https://doi.org/10.1007/s10648-012-9200-4 .

Chang F, Luo M, Walton G, Aguilar L, Bailenson J. Stereotype threat in virtual learning environments: effects of avatar gender and sexist behavior on women’s math learning outcomes. Cyberpsych Behav Soc Net. 2019. https://doi.org/10.1089/cyber.2019.0106 .

Shin S, Lee I, Kim J, Oh E, Hong E. Effectiveness of a critical reflection competency program for clinical nurse educators: a pilot study. BMC Nurs. 2023. https://doi.org/10.1186/s12912-023-01236-6 .

Qiang R, Han Q, Guo Y, Bai J, Karwowski M. Critical thinking disposition and scientific creativity: the mediating role of creative self-efficacy. J Cret Behav. 2020. https://doi.org/10.1002/jocb.347 .

Khoshgoftar Z, Barkhordari-Sharifabad M. Medical students’ reflective capacity and its role in their critical thinking disposition. BMC Med Ed. 2023. https://doi.org/10.1186/s12909-023-04163-x .

Mahmoud SA, Mohamed HA. Critical thinking disposition among nurses working in public hospitals at port-said governorate. Int J Nurs Sci. 2017. https://doi.org/10.1016/j.ijnss.2017.02.006 .

Oudbier J, Spaai G, Timmermans K, Boerboom T. Enhancing the effectiveness of flipped classroom in health science education: a state-of-the-art review. BMC Med Ed. 2022. https://doi.org/10.1186/s12909-021-03052-5 .

Medina MS, Castleberry AN, Persky AM. Strategies for improving learner metacognition in health professional education. Am J Pharm Ed. 2017. https://doi.org/10.5688/ajpe81478 .

Abiogu GC, et al. Cognitive-behavioral reflective training for improving critical thinking disposition of nursing students. Medicine. 2020. https://doi.org/10.1097/MD.0000000000022429 .


Author information

Authors and affiliations.

School of Health Sciences, Stockton University, 101 Vera King Farris Dr., Galloway, NJ, 08205, USA

Anthony Dissen


Contributions

Author completed all data collection, analysis, table formatting, literature review, and manuscript writing.

Corresponding author

Correspondence to Anthony Dissen .

Ethics declarations

Ethics approval and consent to participate.

The research instruments and research methods for this study were approved by the Stockton University Institutional Review Board. All data collection took place after obtaining all necessary approvals from the Stockton University IRB Committee, including CITI training by the researcher. IRB approval was obtained after submitting all required documentation, proof of CITI training, study procedures, and informed consent documents (Stockton University IRB Approval Number #2021.175). All research activities were carried out following the guidelines set forth by the Stockton University IRB.

Competing interests

The author received no funding as part of this study, nor does he have any competing interests related to the design or implementation of this study.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Score ranges for CCTST and CCTDI

1.1 CCTST score ranges

CCTST score ranges by level of development:

  • Not manifested: 50–62
  • Weak: 63–69
  • Moderate: 70–78
  • Strong: 79–85
  • Superior: 86 or higher

1.2 CCTDI score ranges

CCTDI score ranges by category:

  • Strong negative: 10–19
  • Negative: 20–29
  • Inconsistent/ambivalent: 30–39
  • Positive: 40–49
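
The following minimal sketch (in Python) shows one way to encode the cut-points listed above so that raw scores can be translated into their qualitative categories; the function names are our own and are not part of any Insight Assessment tool.

```python
# Minimal sketch: mapping raw scores to the qualitative categories listed in
# the tables above. Function and variable names are illustrative only.

def cctst_category(score: float) -> str:
    """Return the development category for an overall CCTST score."""
    if score >= 86:
        return "Superior"
    if score >= 79:
        return "Strong"
    if score >= 70:
        return "Moderate"
    if score >= 63:
        return "Weak"
    return "Not manifested"  # 50-62

def cctdi_category(score: float) -> str:
    """Return the category for a CCTDI score within the 10-49 range shown above."""
    if score >= 40:
        return "Positive"
    if score >= 30:
        return "Inconsistent/ambivalent"
    if score >= 20:
        return "Negative"
    return "Strong negative"  # 10-19

print(cctst_category(69.96))  # "Weak" (the study sample mean, just below the moderate cut-point)
print(cctdi_category(38))     # "Inconsistent/ambivalent"
```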

Appendix B: California critical thinking skills test domain descriptions

Overall critical thinking skills score: the overall ability and strength of a student to use reflective judgment and reasoning when making choices about a particular set of actions or when developing an informed belief or opinion. This score predicts capacity for success in educational and professional settings that require reasoned decision making and complex problem-solving.

Analysis score: a measurement of overall analytical skill. This score reflects the ability to identify the reasons, themes, assumptions, and evidence that must be considered and utilized when making an argument or offering an explanation for phenomena.

Inference score: the skills and abilities that allow one to draw conclusions from the evidence, experiences, and observations presented. Inference scores also show how one uses personal values, beliefs, and reasoning skills to draw conclusions.

Evaluation score: the ability to assess the credibility of claims and assertions made by others, as well as the quality of the reasoning others use when making an argument or giving an explanation.

Induction skill score: one's ability to estimate the likely outcomes of certain decisions or choices. Inductive reasoning and decision making are often assessed after reviewing case studies, reflecting on prior life experiences, performing statistical analyses, participating in simulations, reviewing hypothetical situations, or studying patterns that emerge in a set of events.

Deduction critical thinking skills score: the ability to engage in logical decision making based on a given set of rules, beliefs, conditions, values, principles, and/or policies.

Interpretation critical thinking skills score: development in the process of discovering and assigning meaning to information or events. Interpretive skills can be applied to verbal information, written text, and graphical and/or pictorial information.

Explanation critical thinking skills score: development in the process of justifying a decision that has been made or a belief that has been stated. Strong skills in this sub-domain rely on the ability to provide evidence and to explain the methods used to reach the decision.

Numeracy critical thinking skills score: the ability to make judgments and decisions based on quantitative information in a variety of environments and contexts. This includes how quantitative information is gathered, adjusted, manipulated, represented, and explained.

Appendix C: California critical thinking disposition inventory domain descriptions

Truth-seeking score: the habit and desire to seek out the best possible understanding of any given situation or issue. Truth-seeking requires following the best available evidence to an informed conclusion, even if this leads one to question previously held beliefs or ideas.

Open-mindedness score: the tendency to give space to others to voice their views, opinions, and beliefs, even when one may not personally agree with what is being shared. Open-mindedness is necessary for regarding the opinions of others and for understanding the complexities that exist in a pluralistic and intersectional society.

Inquisitiveness score: curiosity at the intellectual level that is motivated by a desire to know and understand. Inquisitiveness reflects an inherent desire to know, even when the information does not appear to be immediately useful or relevant.

Analyticity score: the tendency to be actively aware of what may happen next during an occurrence or event. Analyticity involves anticipating both positive and negative outcomes, as well as the various choices, plans, and proposals that can be considered at any given time.

Systematicity score: the tendency to approach issues or problems in an ordered, disciplined, and systematic way. Systematicity provides the desire to approach questions and uncertain situations in a purposeful manner, even when one does not possess a strong background or skill in a particular approach.

Confidence in reasoning score: the tendency and habit of solving problems and making decisions by trusting in reflective thinking and assessment. This relates not only to confidence in one's own reasoning process but also to the reasoning used by groups and teams.

Maturity of judgment score: the habit and desire to make timely decisions when confronted with complex issues and situations. An emphasis on maturity of judgment allows one to understand and accept that multiple solutions or options may be possible when approaching a question or issue, and to recognize that black-and-white thinking is not appropriate.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Dissen, A. A critical issue: assessing the critical thinking skills and dispositions of undergraduate health science students. Discov Educ 2 , 21 (2023). https://doi.org/10.1007/s44217-023-00044-z


Received : 01 March 2023

Accepted : 02 August 2023

Published : 15 August 2023

DOI : https://doi.org/10.1007/s44217-023-00044-z


Keywords

  • Critical thinking
  • Educational assessment
  • Health science education
  • Undergraduate education
  • Open access
  • Published: 09 March 2020

Rubrics to assess critical thinking and information processing in undergraduate STEM courses

  • Gil Reynders 1 , 2 ,
  • Juliette Lantz 3 ,
  • Suzanne M. Ruder 2 ,
  • Courtney L. Stanford 4 &
  • Renée S. Cole   ORCID: orcid.org/0000-0002-2807-1500 1  

International Journal of STEM Education volume  7 , Article number:  9 ( 2020 ) Cite this article


Process skills such as critical thinking and information processing are commonly stated outcomes for STEM undergraduate degree programs, but instructors often do not explicitly assess these skills in their courses. Students are more likely to develop these crucial skills if there is constructive alignment between an instructor’s intended learning outcomes, the tasks that the instructor and students perform, and the assessment tools that the instructor uses. Rubrics for each process skill can enhance this alignment by creating a shared understanding of process skills between instructors and students. Rubrics can also enable instructors to reflect on their teaching practices with regard to developing their students’ process skills and facilitating feedback to students to identify areas for improvement.

Here, we provide rubrics that can be used to assess critical thinking and information processing in STEM undergraduate classrooms and to provide students with formative feedback. As part of the Enhancing Learning by Improving Process Skills in STEM (ELIPSS) Project, rubrics were developed to assess these two skills in STEM undergraduate students’ written work. The rubrics were implemented in multiple STEM disciplines, class sizes, course levels, and institution types to ensure they were practical for everyday classroom use. Instructors reported via surveys that the rubrics supported assessment of students’ written work in multiple STEM learning environments. Graduate teaching assistants also indicated that they could effectively use the rubrics to assess student work and that the rubrics clarified the instructor’s expectations for how they should assess students. Students reported that they understood the content of the rubrics and could use the feedback provided by the rubric to change their future performance.

The ELIPSS rubrics allowed instructors to explicitly assess the critical thinking and information processing skills that they wanted their students to develop in their courses. The instructors were able to clarify their expectations for both their teaching assistants and students and provide consistent feedback to students about their performance. Supporting the adoption of active-learning pedagogies should also include changes to assessment strategies to measure the skills that are developed as students engage in more meaningful learning experiences. Tools such as the ELIPSS rubrics provide a resource for instructors to better align assessments with intended learning outcomes.

Introduction

Why assess process skills?

Process skills, also known as professional skills (ABET Engineering Accreditation Commission, 2012 ), transferable skills (Danczak et al., 2017 ), or cognitive competencies (National Research Council, 2012 ), are commonly cited as critical for students to develop during their undergraduate education (ABET Engineering Accreditation Commission, 2012 ; American Chemical Society Committee on Professional Training, 2015 ; National Research Council, 2012 ; Singer et al., 2012 ; The Royal Society, 2014 ). Process skills such as problem-solving, critical thinking, information processing, and communication are widely applicable to many academic disciplines and careers, and they are receiving increased attention in undergraduate curricula (ABET Engineering Accreditation Commission, 2012 ; American Chemical Society Committee on Professional Training, 2015 ) and workplace hiring decisions (Gray & Koncz, 2018 ; Pearl et al., 2019 ). Recent reports from multiple countries (Brewer & Smith, 2011 ; National Research Council, 2012 ; Singer et al., 2012 ; The Royal Society, 2014 ) indicate that these skills are emphasized in multiple undergraduate academic disciplines, and annual polls of about 200 hiring managers indicate that employers may place more importance on these skills than on applicants’ content knowledge when making hiring decisions (Deloitte Access Economics, 2014 ; Gray & Koncz, 2018 ). The assessment of process skills can provide a benchmark for achievement at the end of an undergraduate program and act as an indicator of student readiness to enter the workforce. Assessing these skills may also enable instructors and researchers to more fully understand the impact of active learning pedagogies on students.

A recent meta-analysis of 225 studies by Freeman et al. ( 2014 ) showed that students in active learning environments may achieve higher content learning gains than students in traditional lectures in multiple STEM fields when comparing scores on equivalent examinations. Active learning environments can have many different attributes, but they are commonly characterized by students “physically manipulating objects, producing new ideas, and discussing ideas with others” (Rau et al., 2017 ) in contrast to students sitting and listening to a lecture. Examples of active learning pedagogies include POGIL (Process Oriented Guided Inquiry Learning) (Moog & Spencer, 2008 ; Simonson, 2019 ) and PLTL (Peer-led Team Learning) (Gafney & Varma-Nelson, 2008 ; Gosser et al., 2001 ) in which students work in groups to complete activities with varying levels of guidance from an instructor. Despite the clear content learning gains that students can achieve from active learning environments (Freeman et al., 2014 ), the non-content-gains (including improvements in process skills) in these learning environments have not been explored to a significant degree. Active learning pedagogies such as POGIL and PLTL place an emphasis on students developing non-content skills in addition to content learning gains, but typically only the content learning is assessed on quizzes and exams, and process skills are not often explicitly assessed (National Research Council, 2012 ). In order to fully understand the effects of active learning pedagogies on all aspects of an undergraduate course, evidence-based tools must be used to assess students’ process skill development. The goal of this work was to develop resources that could enable instructors to explicitly assess process skills in STEM undergraduate classrooms in order to provide feedback to themselves and their students about the students’ process skills development.

Theoretical frameworks

The incorporation of these rubrics and other currently available tools for use in STEM undergraduate classrooms can be viewed through the lenses of constructive alignment (Biggs, 1996 ) and self-regulated learning (Zimmerman, 2002 ). The theory of constructivism posits that students learn by constructing their own understanding of knowledge rather than acquiring the meaning from their instructor (Bodner, 1986 ), and constructive alignment extends the constructivist model to consider how the alignment between a course’s intended learning outcomes, tasks, and assessments affects the knowledge and skills that students develop (Biggs, 2003 ). Students are more likely to develop the intended knowledge and skills if there is alignment between the instructor’s intended learning outcomes that are stated at the beginning of a course, the tasks that the instructor and students perform, and the assessment strategies that the instructor uses (Biggs, 1996 , 2003 , 2014 ). The nature of the tasks and assessments indicates what the instructor values and where students should focus their effort when studying. According to Biggs ( 2003 ) and Ramsden ( 1997 ), students see assessments as defining what they should learn, and a misalignment between the outcomes, tasks, and assessments may hinder students from achieving the intended learning outcomes. In the case of this work, the intended outcomes are improved process skills. In addition to aligning the components of a course, it is also critical that students receive feedback on their performance in order to improve their skills. Zimmerman’s theory of self-regulated learning (Zimmerman, 2002 ) provides a rationale for tailoring assessments to provide feedback to both students and instructors.

Zimmerman’s theory of self-regulated learning defines three phases of learning: forethought/planning, performance, and self-reflection. According to Zimmerman, individuals ideally should progress through these three phases in a cycle: they plan a task, perform the task, and reflect on their performance, then they restart the cycle on a new task. If a student is unable to adequately progress through the phases of self-regulated learning on their own, then feedback provided by an instructor may enable the students to do so (Butler & Winne, 1995 ). Thus, one of our criteria when creating rubrics to assess process skills was to make the rubrics suitable for faculty members to use to provide feedback to their students. Additionally, instructors can use the results from assessments to give themselves feedback regarding their students’ learning in order to regulate their teaching. This theory is called self-regulated learning because the goal is for learners to ultimately reflect on their actions to find ways to improve. We assert that, ideally, both students and instructors should be “learners” and use assessment data to reflect on their actions, although with different aims. Students need consistent feedback from an instructor and/or self-assessment throughout a course to provide a benchmark for their current performance and identify what they can do to improve their process skills (Black & Wiliam, 1998 ; Butler & Winne, 1995 ; Hattie & Gan, 2011 ; Nicol & Macfarlane-Dick, 2006 ). Instructors need feedback on the extent to which their efforts are achieving their intended goals in order to improve their instruction and better facilitate the development of process skills through course experiences.

In accordance with the aforementioned theoretical frameworks, tools used to assess undergraduate STEM student process skills should be tailored to fit the outcomes that are expected for undergraduate students and be able to provide formative assessment and feedback to both students and faculty about the students’ skills. These tools should also be designed for everyday classroom use to enable students to regularly self-assess and faculty to provide consistent feedback throughout a semester. Additionally, it is desirable for assessment tools to be broadly generalizable to measure process skills in multiple STEM disciplines and institutions in order to increase the rubrics’ impact on student learning. Current tools exist to assess these process skills, but they each lack at least one of the desired characteristics for providing regular feedback to STEM students.

Current tools to assess process skills

Current tests available to assess critical thinking include the Critical Thinking Assessment Test (CAT) (Stein & Haynes, 2011 ), California Critical Thinking Skills Test (Facione, 1990a , 1990b ), and Watson Glaser Critical Thinking Appraisal (Watson & Glaser, 1964 ). These commercially available, multiple-choice tests are not designed to provide regular, formative feedback throughout a course and have not been implemented for this purpose. Instead, they are designed to provide summative feedback with a focus on assessing this skill at a programmatic or university level rather than for use in the classroom to provide formative feedback to students. Rather than using tests to assess process skills, rubrics could be used instead. Rubrics are effective assessment tools because they can be quick and easy to use, they provide feedback to both students and instructors, and they can evaluate individual aspects of a skill to give more specific feedback (Brookhart & Chen, 2014 ; Smit & Birri, 2014 ). Rubrics for assessing critical thinking are available, but they have not been used to provide feedback to undergraduate STEM students nor were they designed to do so (Association of American Colleges and Universities, 2019 ; Saxton et al., 2012 ). The Critical Thinking Analytic Rubric is designed specifically to assess K-12 students to enhance college readiness and has not been broadly tested in collegiate STEM courses (Saxton et al., 2012 ). The critical thinking rubric developed by the Association of American Colleges and Universities (AAC&U) as part of its Valid Assessment of Learning in Undergraduate Education (VALUE) Institute and Liberal Education and America’s Promise (LEAP) initiative (Association of American Colleges and Universities, 2019 ) is intended for programmatic assessment rather than specifically giving feedback to students throughout a course. As with tests for assessing critical thinking, current rubrics to assess critical thinking are not designed to act as formative assessments and give feedback to STEM faculty and undergraduates at the course or task level. Another issue with the assessment of critical thinking is the degree to which the construct is measurable. A National Research Council report (National Research Council, 2011 ) has suggested that there is little evidence of a consistent, measurable definition for critical thinking and that it may not be different from one’s general cognitive ability. Despite this issue, we have found that critical thinking is consistently listed as a programmatic outcome in STEM disciplines (American Chemical Society Committee on Professional Training, 2015 ; The Royal Society, 2014 ), so we argue that it is necessary to support instructors as they attempt to assess this skill.

Current methods for evaluating students’ information processing include discipline-specific tools such as a rubric to assess physics students’ use of graphs and equations to solve work-energy problems (Nguyen et al., 2010 ) and assessments of organic chemistry students’ ability to “[manipulate] and [translate] between various representational forms” including 2D and 3D representations of chemical structures (Kumi et al., 2013 ). Although these assessment tools can be effectively used for their intended context, they were not designed for use in a wide range of STEM disciplines or for a variety of tasks.

Despite the many tools that exist to measure process skills, none has been designed and tested to facilitate frequent, formative feedback to STEM undergraduate students and faculty throughout a semester. The rubrics described here have been designed by the Enhancing Learning by Improving Process Skills in STEM (ELIPSS) Project (Cole et al., 2016 ) to assess undergraduate STEM students’ process skills and to facilitate feedback at the classroom level with the potential to track growth throughout a semester or degree program. The rubrics described here are designed to assess critical thinking and information processing in student written work. Rubrics were chosen as the format for our process skill assessment tools because the highest level of each category in rubrics can serve as an explicit learning outcome that the student is expected to achieve (Panadero & Jonsson, 2013 ). Rubrics that are generalizable to multiple disciplines and institutions can enable the assessment of student learning outcomes and active learning pedagogies throughout a program of study and provide useful tools for a greater number of potential users.

Research questions

This work sought to answer the following research questions for each rubric:

Does the rubric adequately measure relevant aspects of the skill?

How well can the rubrics provide feedback to instructors and students?

Can multiple raters use the rubrics to give consistent scores?

This work received Institutional Review Board approval prior to any data collection involving human subjects. The sources of data used to construct the process skill rubrics and answer these research questions were (1) peer-reviewed literature on how each skill is defined, (2) feedback from content experts in multiple STEM disciplines via surveys and in-person, group discussions regarding the appropriateness of the rubrics for each discipline, (3) interviews with students whose work was scored with the rubrics and teaching assistants who scored the student work, and (4) results of applying the rubrics to samples of student work.

Defining the scope of the rubrics

The rubrics described here and the other rubrics in development by the ELIPSS Project are intended to measure process skills, which are desired learning outcomes identified by the STEM community in recent reports (National Research Council, 2012 ; Singer et al., 2012 ). In order to measure these skills in multiple STEM disciplines, operationalized definitions of each skill were needed. These definitions specify which aspects of student work (operations) would be considered evidence for the student using that skill and establish a shared understanding of each skill by members of each STEM discipline. The starting point for this work was the process skill definitions developed as part of the POGIL project (Cole et al., 2019a ). The POGIL community includes instructors from a variety of disciplines and institutions and represented the intended audience for the rubrics: faculty who value process skills and want to more explicitly assess them. The process skills discussed in this work were defined as follows:

Critical thinking is analyzing, evaluating, or synthesizing relevant information to form an argument or reach a conclusion supported with evidence.

Information processing is evaluating, interpreting, and manipulating or transforming information.

Examples of critical thinking include the tasks that students are asked to perform in a laboratory course. When students are asked to analyze the data they collected, combine data from different sources, and generate arguments or conclusions about their data, we see this as critical thinking. However, when students simply follow the so-called “cookbook” laboratory instructions that require them to confirm pre-determined conclusions, we do not think students are engaging in critical thinking. One example of information processing is when organic chemistry students are required to re-draw molecules in different formats. The students must evaluate and interpret various pieces of one representation, and then they recreate the molecule in another representation. However, if students are asked to simply memorize facts or algorithms to solve problems, we do not see this as information processing.

Iterative rubric development

The development process was the same for the information processing rubric and the critical thinking rubric. After defining the scope of the rubric, an initial version was drafted based upon the definition of the target process skill and how each aspect of the skill is defined in the literature. A more detailed discussion of the literature that informed each rubric category is included in the “Results and Discussion” section. This initial version then underwent iterative testing in which the rubric was reviewed by researchers, practitioners, and students. The rubric was first evaluated by the authors and a group of eight faculty from multiple STEM disciplines who made up the ELIPSS Project’s primary collaborative team (PCT). The PCT was a group of faculty members with experience in discipline-based education research who employ active-learning pedagogies in their classrooms. This initial round of evaluation was intended to ensure that the rubric measured relevant aspects of the skill and was appropriate for each PCT member’s discipline. This evaluation determined how well the rubrics were aligned with each instructor’s understanding of the process skill including both in-person and email discussions that continued until the group came to consensus that each rubric category could be applied to student work in courses within their disciplines. There has been an ongoing debate regarding the role of disciplinary knowledge in critical thinking and the extent to which critical thinking is subject-specific (Davies, 2013 ; Ennis, 1990 ). This work focuses on the creation of rubrics to measure process skills in different domains, but we have not performed cross-discipline comparisons. This initial round of review was also intended to ensure that the rubrics were ready for classroom testing by instructors in each discipline. Next, each rubric was tested over three semesters in multiple classroom environments, illustrated in Table 1 . The rubrics were applied to student work chosen by each PCT member. The PCT members chose the student work based on their views of how the assignments required students to engage in process skills and show evidence of those skills. The information processing and critical thinking rubrics shown in this work were each tested in at least three disciplines, course levels, and institutions.

After each semester, feedback was collected from the faculty testing the rubrics, and further changes to the rubrics were made. Feedback was collected in the form of survey responses along with in-person group discussions at annual project meetings. After the first administration of the survey, the PCT members met with the authors to discuss how they were interpreting each survey question. This meeting helped ensure that the surveys were gathering valid data regarding how well the rubrics were measuring the desired process skills. Survey questions such as “What aspects of the student work provided evidence for the indicated process skill?” and “Are there edits to the rubric/descriptors that would improve your ability to assess the process skill?” allowed the authors to determine how well the rubric scores matched the student work and to identify necessary changes to the rubrics. Further questions asked about the nature and timing of the feedback given to students in order to address how well the rubrics provide feedback to instructors and students. The survey questions are included in the Supporting Information. The survey responses were analyzed qualitatively to identify themes related to each research question.

In addition to the surveys given to faculty rubric testers, twelve students were interviewed in fall 2016 and fall 2017. In the United States of America, the fall semester typically runs from August to December and is the first semester of the academic year. Each student participated in one interview that lasted about 30 min. These interviews were intended to gather further data on how well the rubrics measured the process skills that students used when completing their assignments and to ensure that the information provided by the rubrics made sense to students. The protocol for these interviews is included in the Supporting Information. In fall 2016, the students interviewed were enrolled in an organic chemistry laboratory course for non-majors at a large, research-intensive university in the United States. Thirty students agreed to have their work analyzed by the research team, and nine of these students were interviewed. The rubrics were not a component of the laboratory course grading. Instead, the first author assessed the students’ reports for critical thinking and information processing, and the students were then provided electronic copies of their laboratory reports and scored rubrics in advance of the interview. The first author had recently been a graduate teaching assistant for the course and was familiar with the instructor’s expectations for the laboratory reports. During the interview, the students were given time to review their reports and the completed rubrics, and then they were asked how well they understood the content of the rubrics and how accurately each category score represented their work.

In fall 2017, students enrolled in a physical chemistry thermodynamics course for majors were interviewed. The physical chemistry course took place at the same university as the organic laboratory course, but there was no overlap between participants. Three students and two graduate teaching assistants (GTAs) were interviewed. The course included daily group work, and process skill assessment was an explicit part of the instructor’s curriculum. At the end of each class period, students assessed their groups using portions of ELIPSS rubrics, including the two process skill rubrics included in this paper. About every 2 weeks, the GTAs assessed the student groups with a complete ELIPSS rubric for a particular skill, then gave the groups their scored rubrics with written comments. The students’ individual homework problem sets were assessed once with rubrics for three skills: critical thinking, information processing, and problem-solving. The students received the scored rubric with written comments when the graded problem set was returned to them. In the last third of the semester, the students and GTAs were interviewed about how rubrics were implemented in the course, how well the rubric scores reflected the students’ written work, and how the use of rubrics affected the teaching assistants’ ability to assess the student skills. The protocols for these interviews are included in the Supporting Information .

Gathering evidence for utility, validity, and reliability

The utility, validity, and reliability of the rubrics were examined throughout the development process. Utility is the degree to which the rubrics are perceived as practical by experts and practitioners in the field. Through multiple meetings, the PCT faculty, who represented multiple STEM disciplines, determined that early drafts of the rubrics seemed appropriate for use in their classrooms. Rubric utility was reexamined multiple times throughout the development process to ensure that the rubrics would remain practical for classroom use. Validity can be defined in multiple ways. For example, the Standards for Educational and Psychological Testing (Joint Committee on Standards for Educational Psychological Testing, 2014) defines validity as “the degree to which all the accumulated evidence supports the intended interpretation of test scores for the proposed use.” For the purposes of this work, we drew on the ways in which two distinct types of validity have been examined in the rubric literature: content validity and construct validity. Content validity is the degree to which the rubrics cover relevant aspects of each process skill (Moskal & Leydens, 2000). In this case, the process skill definition and a review of the literature determined which categories were included in each rubric. The literature review was considered complete once the data were saturated, that is, when no new aspects of the skill were found. Construct validity is the degree to which the levels of each rubric category accurately reflect the process that students performed (Moskal & Leydens, 2000). Evidence of construct validity was gathered via the faculty surveys, teaching assistant interviews, and student interviews. In the student interviews, students were given one of their completed assignments and asked to explain how they completed the task. Students were then asked to explain how well each category applied to their work and whether any changes were needed to the rubric to more accurately reflect their process. Due to logistical challenges, we were not able to obtain evidence for convergent validity; this is discussed further in the “Limitations” section.

Adjacent agreement, also known as “interrater agreement within one,” was chosen as the measure of interrater reliability due to its common use in rubric development projects (Jonsson & Svingby, 2007). Adjacent agreement is the percentage of cases in which two raters either give the same rating or differ by one level (i.e., they give adjacent ratings to the same work). Jonsson and Svingby (2007) found that most of the rubrics they reviewed had adjacent agreement scores of 90% or greater. However, they noted that the agreement threshold varied based on the number of possible levels of performance for each category in the rubric, with three and four being the most common numbers of levels. Because the rubrics discussed in this report have six levels (scores of zero through five) and are intended for low-stakes assessment and feedback, a goal of 80% adjacent agreement was selected. To calculate agreement for the critical thinking and information processing rubrics, two researchers discussed the scoring criteria for each rubric and then independently assessed the organic chemistry laboratory reports.
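
To make the agreement calculation concrete, the following short Python sketch computes exact and adjacent (within-one) agreement for two raters scoring the same set of work on a 0–5 rubric category. It is a minimal illustration only; the function names and example scores are hypothetical and are not data from this study.

# Illustrative sketch: exact and adjacent (within-one) interrater agreement
# for two raters scoring the same pieces of student work on a 0-5 rubric scale.
# The example scores below are hypothetical, not data from this study.

def exact_agreement(rater_a, rater_b):
    # Percentage of cases in which both raters gave the identical score.
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * matches / len(rater_a)

def adjacent_agreement(rater_a, rater_b, tolerance=1):
    # Percentage of cases in which the two scores differ by at most `tolerance`.
    matches = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return 100 * matches / len(rater_a)

if __name__ == "__main__":
    rater_1 = [5, 4, 3, 5, 2, 4, 4, 3, 5, 1]   # hypothetical scores for ten reports
    rater_2 = [5, 3, 3, 4, 2, 4, 5, 3, 4, 0]
    print(f"Exact agreement:    {exact_agreement(rater_1, rater_2):.0f}%")
    print(f"Adjacent agreement: {adjacent_agreement(rater_1, rater_2):.0f}%")

With scores compiled in this way, the 80% adjacent agreement goal described above can be checked directly for each rubric.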

Results and discussion

The process skill rubrics to assess critical thinking and information processing in student written work were completed after multiple rounds of revision based on feedback from various sources. These sources include feedback from instructors who tested the rubrics in their classrooms, TAs who scored student work with the rubrics, and students who were assessed with the rubrics. The categories for each rubric will be discussed in terms of the evidence that the rubrics measure the relevant aspects of the skill and how they can be used to assess STEM undergraduate student work. Each category discussion will begin with a general explanation of the category followed by more specific examples from the organic chemistry laboratory course and physical chemistry lecture course to demonstrate how the rubrics can be used to assess student work.

Information processing rubric

The definition of information processing and the focus of the rubric presented here (Fig. 1 ) are distinct from cognitive information processing as defined by the educational psychology literature (Driscoll, 2005 ). The rubric shown here is more aligned with the STEM education construct of representational competency (Daniel et al., 2018 ).

Figure 1. Rubric for assessing information processing

Evaluating

When solving a problem or completing a task, students must evaluate the provided information for relevance or importance to the task (Hanson, 2008; Swanson et al., 1990). Not all of the information provided in a prompt (e.g., a homework or exam question) is necessarily relevant for addressing every part of the prompt. Students should ideally show evidence of their evaluation process by identifying what information is present in the prompt/model, indicating what information is or is not relevant, and indicating why information is relevant. Responses with these characteristics earn high rubric scores for this category. Although students may not explicitly state what information is necessary to address a task, the information they do use can act as indirect evidence of the degree to which they have evaluated all of the available information in the prompt. Evidence that students have inaccurately evaluated information for relevance includes the inclusion of irrelevant information or the omission of relevant information in an analysis or in completing a task.

When evaluating the organic chemistry laboratory reports, the focus for the evaluating category was the information students presented when identifying the chemical structure of their products. For students who received a high score, this information included their measured value for the product’s melting point, the literature (expected) value for the melting point, and the peaks in a nuclear magnetic resonance (NMR) spectrum. NMR spectroscopy is a technique commonly used in chemistry to obtain structural information about a compound. Lower scores were given if students omitted any of the necessary information or if they included unnecessary information. For example, if a student discussed their reaction yield when arguing for the identity of their product, they would receive a low evaluating score because the yield does not help determine the product’s identity; the yield, in this case, is unnecessary information. In the physical chemistry course, students often did not show evidence that they had determined which information was relevant to answer the homework questions and thus earned low evaluating scores. These omissions are further addressed in the “Interpreting” section.

Interpreting

In addition to evaluating, students must often interpret information using their prior knowledge to explain the meaning of something, make inferences, match data to predictions, and extract patterns from data (Hanson, 2008; Nakhleh, 1992; Schmidt et al., 1989; Swanson et al., 1990). Students earn high scores for this category if they assign correct meaning to labeled information (e.g., text, tables, graphs, diagrams), extract specific details from information, explain information in their own words, and determine patterns in information. For the organic chemistry laboratory reports, students received high scores if they accurately interpreted their measured values and NMR peaks. Almost every student obtained melting point values that differed from the expected values due to measurement error or impurities in their products, so they needed to describe what types of impurities could cause such discrepancies. Also, each NMR spectrum contained one peak that corresponded to the solvent used to dissolve the student’s product, so the students needed to use their prior knowledge of NMR spectroscopy to recognize that this peak did not correspond to part of their product.

In physical chemistry, the graduate teaching assistant often gave students low scores for inaccurately explaining changes to chemical systems such as changes in pressure or entropy. The graduate teaching assistant who assessed the student work used the rubric to identify both the evaluating and interpreting categories as weaknesses in many of the students’ homework submissions. However, the students often earned high scores for the manipulating and transforming categories, so the GTA was able to give students specific feedback on their areas for improvement while also highlighting their strengths.

Manipulating and transforming (extent and accuracy)

In addition to evaluating and interpreting information, students may be asked to manipulate and transform information from one form to another. These transformations should be complete and accurate (Kumi et al., 2013 ; Nguyen et al., 2010 ). Students may be required to construct a figure based on written information, or conversely, they may transform information in a figure into words or mathematical expressions. Two categories for manipulating and transforming (i.e., extent and accuracy) were included to allow instructors to give more specific feedback. It was often found that students would either transform little information but do so accurately, or transform much information and do so inaccurately; the two categories allowed for differentiated feedback to be provided. As stated above, the organic chemistry students were expected to transform their NMR spectral data into a table and provide a labeled structure of their final product. Students were given high scores if they converted all of the relevant peaks from their spectrum into the table format and were able to correctly match the peaks to the hydrogen atoms in their products. Students received lower scores if they were only able to convert the information for a few peaks or if they incorrectly matched the peaks to the hydrogen atoms.
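
As a rough illustration of how category-level scores of this kind might be recorded and turned into feedback, the sketch below encodes the four information processing categories on a 0–5 scale and flags the categories to target for improvement. The category names follow the rubric discussed above, but the data structure, threshold, and example scores are hypothetical simplifications rather than the published rubric descriptors.

# Illustrative sketch (not the published rubric descriptors): recording
# category-level information processing scores and flagging areas to improve.

from dataclasses import dataclass

IP_CATEGORIES = [
    "Evaluating",
    "Interpreting",
    "Manipulating and transforming (extent)",
    "Manipulating and transforming (accuracy)",
]

@dataclass
class RubricScore:
    category: str
    score: int          # 0 (no evidence) through 5 (strong evidence)
    comment: str = ""   # free-text feedback from the rater

def areas_for_improvement(scores, threshold=3):
    # Return the categories scored below the threshold so feedback can be focused.
    return [s.category for s in scores if s.score < threshold]

if __name__ == "__main__":
    # Hypothetical scores for one laboratory report
    report = [
        RubricScore("Evaluating", 5, "Used melting point and NMR peaks; no extraneous data."),
        RubricScore("Interpreting", 2, "Solvent peak assigned to the product."),
        RubricScore("Manipulating and transforming (extent)", 4),
        RubricScore("Manipulating and transforming (accuracy)", 3),
    ]
    assert all(s.category in IP_CATEGORIES for s in report)
    print("Focus feedback on:", areas_for_improvement(report))

Keeping each category as a separate record mirrors the way the rubric supports differentiated feedback, for example on the extent versus the accuracy of a transformation.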

Critical thinking rubric

Critical thinking can be broadly defined in different contexts, but we found that the categories included in the rubric (Fig. 2 ) represented commonly accepted aspects of critical thinking (Danczak et al., 2017 ) and suited the needs of the faculty collaborators who tested the rubric in their classrooms.

Figure 2. Rubric for assessing critical thinking

Evaluating

When completing a task, students must evaluate the relevance of information that they will ultimately use to support a claim or conclusion (Miri et al., 2007; Zohar et al., 1994). An evaluating category is included in both the critical thinking and information processing rubrics because evaluation is a key aspect of both skills. From our previous work developing a problem-solving rubric (manuscript in preparation) and our review of the literature for this work (Danczak et al., 2017; Lewis & Smith, 1993), we observed overlap among information processing, critical thinking, and problem solving. Additionally, while the evaluating category in the information processing rubric assesses a student’s ability to determine the importance of information for completing a task, the evaluating category in the critical thinking rubric places a heavier emphasis on using the information to support a conclusion or argument.

When scoring student work with the evaluating category, students receive high scores if they indicate what information is likely to be most relevant to the argument they need to make, determine the reliability of the source of their information, and determine the quality and accuracy of the information itself. The information used to assess this category can be indirect as with the Evaluating category in the information processing rubric. In the organic chemistry laboratory reports, students needed to make an argument about whether they successfully produced the desired product, so they needed to discuss which information was relevant to their claims about the product’s identity and purity. Students received high scores for the evaluating category when they accurately determined that the melting point and nearly all peaks except the solvent peak in the NMR spectrum indicated the identity of their product. Students received lower scores for evaluating when they left out relevant information because this was seen as evidence that the student inaccurately evaluated the information’s relevance in supporting their conclusion. They also received lower scores when they incorrectly stated that a high yield indicated a pure product. Students were given the opportunity to demonstrate their ability to evaluate the quality of information when discussing their melting point. Students sometimes struggled to obtain reliable melting point data due to their inexperience in the laboratory, so the rubric provided a way to assess the student’s ability to critique their own data.

Analyzing

In tandem with evaluating information, students also need to analyze that same information to extract meaningful evidence to support their conclusions (Bailin, 2002; Lai, 2011; Miri et al., 2007). The analyzing category provides an assessment of a student’s ability to discuss information and explore its possible meaning, extract patterns from data/information that could be used as evidence for their claims, and summarize information that could be used as evidence. For example, in the organic chemistry laboratory reports, students needed to compare the information they obtained to the expected values for a product. Students received high scores for the analyzing category if they could extract meaningful structural information from the NMR spectrum and their two melting points (observed and expected) for each reaction step.

Synthesizing

Often, students are asked to synthesize or connect multiple pieces of information in order to draw a conclusion or make a claim (Huitt, 1998 ; Lai, 2011 ). Synthesizing involves identifying the relationships between different pieces of information or concepts, identifying ways that different pieces of information or concepts can be combined, and explaining how the newly synthesized information can be used to reach a conclusion and/or support an argument. While performing the organic chemistry laboratory experiments, students obtained multiple types of information such as the melting point and NMR spectrum in addition to other spectroscopic data such as an infrared (IR) spectrum. Students received high scores for this category when they accurately synthesized these multiple data types by showing how the NMR and IR spectra could each reveal different parts of a molecule in order to determine the molecule’s entire structure.

Forming arguments (structure and validity)

The final key aspect of critical thinking is forming a well-structured and valid argument (Facione, 1984; Glassner & Schwarz, 2007; Lai, 2011; Lewis & Smith, 1993). We observed that students can earn high scores for evaluating, analyzing, and synthesizing but still struggle to form arguments. This was particularly common when assessing problem sets in the physical chemistry course.

As with the manipulating and transforming categories in the information processing rubric, two forming arguments categories were included to allow instructors to give more specific feedback. Some students may be able to include all of the expected structural elements of their arguments but use faulty information or reasoning. Conversely, some students may be able to make scientifically valid claims but not necessarily support them with evidence. The two forming arguments categories are intended to accurately assess both of these scenarios. For the forming arguments (structure) category, students earn high scores if they explicitly state their claim or conclusion, list the evidence used to support the argument, and provide reasoning to link the evidence to their claim/conclusion. Students who do not make a claim or who provide little evidence or reasoning receive lower scores.

For the forming arguments (validity) category, students earn high scores if their claim is accurate and their reasoning is logical and clearly supports the claim with the provided evidence. Organic chemistry students earned high scores for the two forming arguments categories if they made explicit claims about the identity and purity of their product and provided complete and accurate evidence for their claim(s), such as the melting point values and the positions of NMR peaks that correspond to their product. Additionally, the students provided evidence for the purity of their products by pointing to the presence or absence of peaks in their NMR spectrum that would match other potential side products. They also needed to provide logical reasoning for why the peaks indicated the presence or absence of a compound. As previously mentioned, the physical chemistry students received lower scores for the forming arguments categories than for the other aspects of critical thinking. These students were asked to make claims about the relationships between entropy and heat and then provide relevant evidence to justify those claims. Often, the students would make clearly articulated claims but would provide little evidence to support them. As with the information processing rubric, the critical thinking rubric allowed the GTAs to assess aspects of these skills independently and identify specific areas for student improvement.

Validity and reliability

The goal of this work was to create rubrics that can accurately assess student work (validity) and be consistently implemented by instructors or researchers within multiple STEM fields (reliability). The evidence for validity includes the alignment of the rubrics with literature-based descriptions of each skill, review of the rubrics by content experts from multiple STEM disciplines, interviews with undergraduate students whose work was scored using the rubrics, and interviews of the GTAs who scored the student work.

The definitions for each skill, along with multiple iterations of the rubrics, underwent review by STEM content experts. As noted earlier, the instructors who were testing the rubrics were given a survey at the end of each semester and were invited to offer suggested changes to the rubric to better help them assess their students. After multiple rubric revisions, survey responses from the instructors indicated that the rubrics accurately represented the breadth of each process skill as seen in each expert’s content area and that each category could be used to measure multiple levels of student work. By the end of the rubrics’ development, instructors were writing responses such as “N/A” or “no suggestions” to indicate that the rubrics did not need further changes.

Faculty responses to the survey item “What aspects of the student work provided evidence for the indicated process skill?” also indicated that the rubrics were measuring the intended constructs. For example, one instructor noted that for information processing, she saw evidence of the manipulating and transforming categories when “students had to transform their written/mathematical relationships into an energy diagram.” Another instructor elicited evidence of information processing during an in-class group quiz: “A question on the group quiz was written to illicit [sic] IP [information processing]. Students had to transform a structure into three new structures and then interpret/manipulate the structures to compare the pKa values [acidity] of the new structures.” For this instructor, the structures written by the students revealed evidence of their information processing by showing what information they omitted in the new structures or inaccurately transformed. For critical thinking, an instructor assessed short research reports with the critical thinking rubric and “looked for [the students’] ability to use evidence to support their conclusions, to evaluate the literature studies, and to develop their own judgements by synthesizing the information.” Another instructor used the critical thinking rubric to assess their students’ abilities to choose an instrument to perform a chemical analysis. According to the instructor, the students provided evidence of their critical thinking because “in their papers, they needed to justify their choice of instrument. This justification required them to evaluate information and synthesize a new understanding for this specific chemical analysis.”

Analysis of student work indicates multiple levels of achievement for each rubric category (illustrated in Fig. 3 ), although there may have been a ceiling effect for the evaluating and the manipulating and transforming (extent) categories in information processing for organic chemistry laboratory reports because many students earned the highest possible score (five) for those categories. However, other implementations of the ELIPSS rubrics (Reynders et al., 2019 ) have shown more variation in student scores for the two process skills.

Figure 3. Student rubric scores from an organic chemistry laboratory course. The two rubrics were used to evaluate different laboratory reports. Thirty students were assessed for information processing and 28 were assessed for critical thinking.
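
The kind of ceiling effect noted above can be checked by tabulating the score distribution for each rubric category and flagging categories in which a large share of students earned the top score. The sketch below is illustrative only; the category names match the rubric, but the scores and the 50% flagging threshold are hypothetical rather than the data plotted in Fig. 3.

# Illustrative sketch: tabulate rubric score distributions by category and
# flag possible ceiling effects. Scores are hypothetical, not the Fig. 3 data.

from collections import Counter

def score_distribution(scores):
    # Count how many students earned each score (0-5) in a category.
    return Counter(scores)

def has_ceiling_effect(scores, top_score=5, threshold=0.5):
    # Flag a category if at least `threshold` of students earned the top score.
    return scores.count(top_score) / len(scores) >= threshold

if __name__ == "__main__":
    category_scores = {
        "Evaluating": [5, 5, 5, 4, 5, 5, 3, 5, 5, 4],
        "Interpreting": [3, 4, 2, 5, 3, 4, 3, 2, 4, 3],
    }
    for category, scores in category_scores.items():
        flag = "possible ceiling effect" if has_ceiling_effect(scores) else ""
        print(category, dict(score_distribution(scores)), flag)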

To provide further evidence that the rubrics were measuring the intended skills, students in the physical chemistry course were interviewed about their thought processes and how well the rubric scores reflected the work they performed. During these interviews, students described how they used various aspects of information processing and critical thinking skills. The students first described how they used information processing during a problem set where they had to answer questions about a diagram of systolic and diastolic blood pressure. Students described how they evaluated and interpreted the graph to make statements such as “diastolic [pressure] is our y-intercept” and “volume is the independent variable.” The students then demonstrated their ability to transform information from one form to another, from a graph to a mathematical equation, by recognizing “it’s a linear relationship so I used Y equals M X plus B” and “integrated it cause it’s the change, the change in V [volume].”

For critical thinking, students described their process on a different problem set. In this problem set, the students had to explain why the change in Helmholtz energy and the change in Gibbs free energy were equivalent under a given condition. Students first demonstrated how they evaluated the relevant information and analyzed what would and would not change in their system. One student said, “So to calculate the final pressure, I think I just immediately went to the ideal gas law because we know the final volume and the number of moles won’t change and neither will the temperature in this case. Well, I assume that it wouldn’t.” Another student showed evidence of their evaluation by writing out all the necessary information in one place and stating, “Whenever I do these types of problems, I always write what I start with which is why I always have this line of information I’m given.” After evaluating and analyzing, students had to form an argument by claiming that the two energy values were equal and then defending that claim. Students explained that they were not always as clear as they could be when justifying their claim. For instance, one student said, “Usually I just write out equations and then hope people understand what I’m doing mathematically” but they “probably could have explained it a little more.”

Student feedback throughout the organic chemistry course and near the end of the physical chemistry course indicated that the rubric scores were accurate representations of the students’ work, with a few exceptions. For example, some students felt that they should have received either a lower or higher score for certain categories, but they did say that the categories themselves applied well to their work. Most notably, one student reported that the forming arguments categories in the critical thinking rubric did not apply to her work because she “wasn’t making an argument” when she was demonstrating that the Helmholtz and Gibbs energy values were equal in her thermodynamics assignment. We see this as an instance where some students and instructors may define argument in different ways. The process skill definitions and the rubric categories are meant to articulate intended learning outcomes from faculty members to their students, so if a student defines the skills or categories differently than the faculty member does, the rubrics can serve to promote a shared understanding of the skill.

As previously mentioned, reliability was measured by two researchers assessing ten laboratory reports independently to ensure that multiple raters could use the rubrics consistently. The average adjacent agreement scores were 92% for critical thinking and 93% for information processing. The exact agreement scores were 86% for critical thinking and 88% for information processing. Additionally, two different raters assessed a statistics assignment that was given to sixteen first-year undergraduates. The average pairwise adjacent agreement scores were 89% for critical thinking and 92% for information processing for this assignment. However, the exact agreement scores were much lower: 34% for critical thinking and 36% for information processing. In this case, neither rater was an expert in the content area. While the exact agreement scores for the statistics assignment are much lower than desirable, the adjacent agreement scores do meet the threshold for reliability as seen in other rubrics (Jonsson & Svingby, 2007 ) despite the disparity in expertise. Based on these results, it may be difficult for multiple raters to give exactly the same scores to the same work if they have varying levels of content knowledge, but it is important to note that the rubrics are primarily intended for formative assessment that can facilitate discussions between instructors and students about the ways for students to improve. The high level of adjacent agreement scores indicates that multiple raters can identify the same areas to improve in examples of student work.

Instructor and teaching assistant reflections

The survey responses from faculty members determined the utility of the rubrics. Faculty members reported that when they used the rubrics to define their expectations and be more specific about their assessment criteria, the students seemed to be better able to articulate the areas in which they needed improvement. As one instructor put it, “having the rubrics helped open conversations and discussions” that were not happening before the rubrics were implemented. We see this as evidence of the clear intended learning outcomes that are an integral aspect of achieving constructive alignment within a course. The instructors’ specific feedback to the students, and the students’ increased awareness of their areas for improvement, may enable the students to better regulate their learning throughout a course. Additionally, the survey responses indicated that the faculty members were changing their teaching practices and becoming more cognizant of how assignments did or did not elicit the process skill evidence that they desired. After using the rubrics, one instructor said, “I realize I need to revise many of my activities to more thoughtfully induce process skill development.” We see this as evidence that the faculty members were using the rubrics to regulate their teaching by reflecting on the outcomes of their practices and then planning for future teaching. These activities represent the reflection and forethought/planning aspects of self-regulated learning on the part of the instructors. Graduate teaching assistants in the physical chemistry course indicated that the rubrics gave them a way to clarify the instructor’s expectations when they were interacting with the students. As one GTA said, “It’s giving [the students] feedback on direct work that they have instead of just right or wrong. It helps them to understand like ‘Okay how can I improve? What areas am I lacking in?’” A more detailed account of how the instructors and teaching assistants implemented the rubrics has been reported elsewhere (Cole et al., 2019a ).

Student reflections

Students in both the organic and physical chemistry courses reported that they could use the rubrics to engage in the three phases of self-regulated learning: forethought/planning, performing, and reflecting. In an organic chemistry interview, one student was discussing how they could improve their low score for the synthesizing category of critical thinking by saying “I could use the data together instead of trying to use them separately,” thus demonstrating forethought/planning for their later work. Another student described how they could use the rubric while performing a task: “I could go through [the rubric] as I’m writing a report…and self-grade.” Finally, one student demonstrated how they could use the rubrics to reflect on their areas for improvement by saying that “When you have the five column [earn a score of five], I can understand that I’m doing something right” but “I really need to work on revising my reports.” We see this as evidence that students can use the rubrics to regulate their own learning, although classroom facilitation can have an effect on the ways in which students use the rubric feedback (Cole et al., 2019b ).

Limitations

The process skill definitions presented here represent a consensus understanding among members of the POGIL community and the instructors who participated in this study, but these skills are often defined in multiple ways by various STEM instructors, employers, and students (Danczak et al., 2017 ). One issue with critical thinking, in particular, is the broadness of how the skill is defined in the literature. Through this work, we have evidence via expert review to indicate that our definitions represent common understandings among a set of STEM faculty. Nonetheless, we cannot claim that all STEM instructors or researchers will share the skill definitions presented here.

There is currently a debate in the STEM literature (National Research Council, 2011) about whether the critical thinking construct is domain-general or domain-specific, that is, whether or not one’s critical thinking ability in one discipline can be applied to another discipline. We cannot make claims about the generality of the construct based on the data presented here because the same students were not tested across multiple disciplines or courses. Additionally, we did not gather evidence for convergent validity, which is “the degree to which an operationalized construct is similar to other operationalized constructs that it theoretically should be similar to” (National Research Council, 2011). In other words, evidence for convergent validity would be the comparison of multiple measures of information processing or critical thinking. However, none of the instructors who used the ELIPSS rubrics also used a secondary measure of the constructs. Although the rubrics were examined by a multidisciplinary group of collaborators, this group consisted primarily of chemists and included eight faculty members from other disciplines, so the content validity of the rubrics may be somewhat limited.

Finally, the generalizability of the rubrics is limited by the relatively small number of students who were interviewed about their work. During their interviews, the students in the organic and physical chemistry courses each said that they could use the rubric scores as feedback to improve their skills. Additionally, as discussed in the “Validity and Reliability” section, the processes described by the students aligned with the content of the rubric and provided evidence of the rubric scores’ validity. However, the data gathered from the student interviews only represents the views of a subset of students in the courses, and further study is needed to determine the most appropriate contexts in which the rubrics can be implemented.

Conclusions and implications

Two rubrics were developed to assess and provide feedback on undergraduate STEM students’ critical thinking and information processing. Faculty survey responses indicated that the rubrics measured the relevant aspects of each process skill in the disciplines that were examined. Faculty survey responses, TA interviews, and student interviews over multiple semesters indicated that the rubric scores accurately reflected the evidence of process skills that the instructors wanted to see and the processes that the students performed when they were completing their assignments. The rubrics showed high inter-rater agreement scores, indicating that multiple raters could identify the same areas for improvement in student work.

In terms of constructive alignment, courses should ideally have alignment between their intended learning outcomes, student and instructor activities, and assessments. By using the ELIPSS rubrics, instructors were able to explicitly articulate the intended learning outcomes of their courses to their students. The instructors were then able to assess and provide feedback to students on different aspects of their process skills. Future efforts will be focused on modifying student assignments to enable instructors to better elicit evidence of these skills. In terms of self-regulated learning, students indicated in the interviews that the rubric scores were accurate representations of their work (performance), that the scores could help them reflect on their previous work (self-reflection), and that the feedback they received could inform their future work (forethought). Not only did the students indicate that the rubrics could help them regulate their learning, but the faculty members also indicated that the rubrics had helped them regulate their teaching. With the individual categories on each rubric, the faculty members were better able to observe their students’ strengths and areas for improvement and then tailor their instruction to meet those needs. Our results indicated that the rubrics helped instructors in multiple STEM disciplines and at multiple institutions reflect on their teaching and then make changes to better align their teaching with their desired outcomes.

Overall, the rubrics can be used in a number of different ways to modify courses or for programmatic assessment. As previously stated, instructors can use the rubrics to define expectations for their students and provide them with feedback on desired skills throughout a course. The rubric categories can be used to give feedback on individual aspects of student process skills to provide specific feedback to each student. If an instructor or department wants to change from didactic lecture-based courses to active learning ones, the rubrics can be used to measure non-content learning gains that stem from the adoption of such pedagogies. Although the examples provided here for each rubric were situated in chemistry contexts, the rubrics were tested in multiple disciplines and institution types. The rubrics have the potential for wide applicability to assess not only laboratory reports but also homework assignments, quizzes, and exams. Assessing these tasks provides a way for instructors to achieve constructive alignment between their intended outcomes and their assessments, and the rubrics are intended to enhance this alignment to improve student process skills that are valued in the classroom and beyond.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AAC&U: American Association of Colleges and Universities

CAT: Critical Thinking Assessment Test

CU: Comprehensive University

ELIPSS: Enhancing Learning by Improving Process Skills in STEM

LEAP: Liberal Education and America’s Promise

NMR: Nuclear Magnetic Resonance

PCT: Primary Collaborative Team

PLTL: Peer-led Team Learning

POGIL: Process Oriented Guided Inquiry Learning

PUI: Primarily Undergraduate Institution

RU: Research University

STEM: Science, Technology, Engineering, and Mathematics

VALUE: Valid Assessment of Learning in Undergraduate Education

ABET Engineering Accreditation Commission. (2012). Criteria for Accrediting Engineering Programs . Retrieved from http://www.abet.org/accreditation/accreditation-criteria/criteria-for-accrediting-engineering-programs-2016-2017/ .

American Chemical Society Committee on Professional Training. (2015). Undergraduate Professional Education in Chemistry: ACS Guidelines and Evaluation Procedures for Bachelor's Degree Programs . Retrieved from https://www.acs.org/content/dam/acsorg/about/governance/committees/training/2015-acs-guidelines-for-bachelors-degree-programs.pdf

Association of American Colleges and Universities. (2019). VALUE Rubric Development Project. Retrieved from https://www.aacu.org/value/rubrics .

Bailin, S. (2002). Critical Thinking and Science Education. Science and Education, 11 , 361–375.

Biggs, J. (1996). Enhancing teaching through constructive alignment. Higher Education, 32 (3), 347–364.

Biggs, J. (2003). Aligning teaching and assessing to course objectives. Teaching and learning in higher education: New trends and innovations, 2 , 13–17.

Biggs, J. (2014). Constructive alignment in university teaching. HERDSA Review of higher education, 1 (1), 5–22.

Black, P., & Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy & Practice, 5 (1), 7–74.

Bodner, G. M. (1986). Constructivism: A theory of knowledge. Journal of Chemical Education, 63 (10), 873–878.

Brewer, C. A., & Smith, D. (2011). Vision and change in undergraduate biology education: A call to action. Washington, DC: American Association for the Advancement of Science.

Brookhart, S. M., & Chen, F. (2014). The quality and effectiveness of descriptive rubrics. Educational Review , 1–26.

Butler, D. L., & Winne, P. H. (1995). Feedback and Self-Regulated Learning: A Theoretical Synthesis. Review of Educational Research, 65 (3), 245–281.

Cole, R., Lantz, J., & Ruder, S. (2016). Enhancing Learning by Improving Process Skills in STEM. Retrieved from http://www.elipss.com .

Cole, R., Lantz, J., & Ruder, S. (2019a). PO: The Process. In S. R. Simonson (Ed.), POGIL: An Introduction to Process Oriented Guided Inquiry Learning for Those Who Wish to Empower Learners (pp. 42–68). Sterling, VA: Stylus Publishing.

Cole, R., Reynders, G., Ruder, S., Stanford, C., & Lantz, J. (2019b). Constructive Alignment Beyond Content: Assessing Professional Skills in Student Group Interactions and Written Work. In M. Schultz, S. Schmid, & G. A. Lawrie (Eds.), Research and Practice in Chemistry Education: Advances from the 25 th IUPAC International Conference on Chemistry Education 2018 (pp. 203–222). Singapore: Springer.

Danczak, S., Thompson, C., & Overton, T. (2017). ‘What does the term Critical Thinking mean to you?’A qualitative analysis of chemistry undergraduate, teaching staff and employers' views of critical thinking. Chemistry Education Research and Practice, 18 , 420–434.

Daniel, K. L., Bucklin, C. J., Leone, E. A., & Idema, J. (2018). Towards a Definition of Representational Competence. In Towards a Framework for Representational Competence in Science Education (pp. 3–11). Switzerland: Springer.

Davies, M. (2013). Critical thinking and the disciplines reconsidered. Higher Education Research & Development, 32 (4), 529–544.

Deloitte Access Economics. (2014). Australia's STEM Workforce: a survey of employers. Retrieved from https://www2.deloitte.com/au/en/pages/economics/articles/australias-stem-workforce-survey.html .

Driscoll, M. P. (2005). Psychology of learning for instruction . Boston, MA: Pearson Education.

Ennis, R. H. (1990). The extent to which critical thinking is subject-specific: Further clarification. Educational researcher, 19 (4), 13–16.

Facione, P. A. (1984). Toward a theory of critical thinking. Liberal Education, 70 (3), 253–261.

Facione, P. A. (1990a). The California Critical Thinking Skills Test—College Level. Technical Report #1: Experimental Validation and Content Validity.

Facione, P. A. (1990b). The California Critical Thinking Skills Test—College Level. Technical Report #2: Factors Predictive of CT Skills.

Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wenderoth, M. P. (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111 (23), 8410–8415.

Gafney, L., & Varma-Nelson, P. (2008). Peer-led team learning: Evaluation, dissemination, and institutionalization of a college level initiative (Vol. 16). Netherlands: Springer Science & Business Media.

Glassner, A., & Schwarz, B. B. (2007). What stands and develops between creative and critical thinking? Argumentation? Thinking Skills and Creativity, 2 (1), 10–18.

Gosser, D. K., Cracolice, M. S., Kampmeier, J. A., Roth, V., Strozak, V. S., & Varma-Nelson, P. (2001). Peer-led team learning: A guidebook. Upper Saddle River, NJ: Prentice Hall.

Gray, K., & Koncz, A. (2018). The key attributes employers seek on students' resumes. Retrieved from http://www.naceweb.org/about-us/press/2017/the-key-attributes-employers-seek-on-students-resumes/ .

Hanson, D. M. (2008). A cognitive model for learning chemistry and solving problems: implications for curriculum design and classroom instruction. In R. S. Moog & J. N. Spencer (Eds.), Process-Oriented Guided Inquiry Learning (pp. 15–19). Washington, DC: American Chemical Society.

Hattie, J., & Gan, M. (2011). Instruction based on feedback. Handbook of research on learning and instruction , 249-271.

Huitt, W. (1998). Critical thinking: an overview. In Educational psychology interactive Retrieved from http://www.edpsycinteractive.org/topics/cogsys/critthnk.html .

Joint Committee on Standards for Educational Psychological Testing. (2014). Standards for Educational and Psychological Testing : American Educational Research Association.

Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2 (2), 130–144.

Kumi, B. C., Olimpo, J. T., Bartlett, F., & Dixon, B. L. (2013). Evaluating the effectiveness of organic chemistry textbooks in promoting representational fluency and understanding of 2D-3D diagrammatic relationships. Chemistry Education Research and Practice, 14 , 177–187.

Lai, E. R. (2011). Critical thinking: a literature review. Pearson's Research Reports, 6 , 40–41.

Lewis, A., & Smith, D. (1993). Defining higher order thinking. Theory into Practice, 32 , 131–137.

Miri, B., David, B., & Uri, Z. (2007). Purposely teaching for the promotion of higher-order thinking skills: a case of critical thinking. Research in Science Education, 37 , 353–369.

Moog, R. S., & Spencer, J. N. (Eds.). (2008). Process oriented guided inquiry learning (POGIL) . Washington, DC: American Chemical Society.

Moskal, B. M., & Leydens, J. A. (2000). Scoring rubric development: validity and reliability. Practical Assessment, Research and Evaluation, 7 , 1–11.

Nakhleh, M. B. (1992). Why some students don't learn chemistry: Chemical misconceptions. Journal of Chemical Education, 69 (3), 191.

National Research Council. (2011). Assessing 21st Century Skills: Summary of a Workshop . Washington, DC: The National Academies Press.

National Research Council. (2012). Education for Life and Work: Developing Transferable Knowledge and Skills in the 21st Century . Washington, DC: The National Academies Press.

Nguyen, D. H., Gire, E., & Rebello, N. S. (2010). Facilitating Strategies for Solving Work-Energy Problems in Graphical and Equational Representations. 2010 Physics Education Research Conference, 1289 , 241–244.

Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31 (2), 199–218.

Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: a review. Educational Research Review, 9 , 129–144.

Pearl, A. O., Rayner, G., Larson, I., & Orlando, L. (2019). Thinking about critical thinking: An industry perspective. Industry & Higher Education, 33 (2), 116–126.

Ramsden, P. (1997). The context of learning in academic departments. The experience of learning, 2 , 198–216.

Rau, M. A., Kennedy, K., Oxtoby, L., Bollom, M., & Moore, J. W. (2017). Unpacking “Active Learning”: A Combination of Flipped Classroom and Collaboration Support Is More Effective but Collaboration Support Alone Is Not. Journal of Chemical Education, 94 (10), 1406–1414.

Reynders, G., Suh, E., Cole, R. S., & Sansom, R. L. (2019). Developing student process skills in a general chemistry laboratory. Journal of Chemical Education , 96 (10), 2109–2119.

Saxton, E., Belanger, S., & Becker, W. (2012). The Critical Thinking Analytic Rubric (CTAR): Investigating intra-rater and inter-rater reliability of a scoring mechanism for critical thinking performance assessments. Assessing Writing, 17 , 251–270.

Schmidt, H. G., De Volder, M. L., De Grave, W. S., Moust, J. H. C., & Patel, V. L. (1989). Explanatory Models in the Processing of Science Text: The Role of Prior Knowledge Activation Through Small-Group Discussion. J. Educ. Psychol., 81 , 610–619.

Simonson, S. R. (Ed.). (2019). POGIL: An Introduction to Process Oriented Guided Inquiry Learning for Those Who Wish to Empower Learners . Sterling, VA: Stylus Publishing, LLC.

Singer, S. R., Nielsen, N. R., & Schweingruber, H. A. (Eds.). (2012). Discipline-Based education research: understanding and improving learning in undergraduate science and engineering . Washington D.C.: The National Academies Press.

Smit, R., & Birri, T. (2014). Assuring the quality of standards-oriented classroom assessment with rubrics for complex competencies. Studies in Educational Evaluation, 43 , 5–13.

Stein, B., & Haynes, A. (2011). Engaging Faculty in the Assessment and Improvement of Students' Critical Thinking Using the Critical Thinking Assessment Test. Change: The Magazine of Higher Learning, 43 , 44–49.

Swanson, H. L., Oconnor, J. E., & Cooney, J. B. (1990). An Information-Processing Analysis of Expert and Novice Teachers Problem-Solving. American Educational Research Journal, 27 (3), 533–556.

The Royal Society. (2014). Vision for science and mathematics education. London, England: The Royal Society Science Policy Centre.

Watson, G., & Glaser, E. M. (1964). Watson-Glaser Critical Thinking Appraisal Manual . New York, NY: Harcourt, Brace, and World.

Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory into Practice, 41 (2), 64–70.

Zohar, A., Weinberger, Y., & Tamir, P. (1994). The Effect of the Biology Critical Thinking Project on the Development of Critical Thinking. Journal of Research in Science Teaching, 31 , 183–196.

Acknowledgements

We thank members of our Primary Collaboration Team and Implementation Cohorts for collecting and sharing data. We also thank all the students who have allowed us to examine their work and provided feedback.

Supporting information

• Product rubric survey

• Initial implementation survey

• Continuing implementation survey

Funding

This work was supported in part by the National Science Foundation under collaborative grants #1524399, #1524936, and #1524965. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and affiliations.

Department of Chemistry, University of Iowa, W331 Chemistry Building, Iowa City, Iowa, 52242, USA

Gil Reynders & Renée S. Cole

Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia, 23284, USA

Gil Reynders & Suzanne M. Ruder

Department of Chemistry, Drew University, Madison, New Jersey, 07940, USA

Juliette Lantz

Department of Chemistry, Ball State University, Muncie, Indiana, 47306, USA

Courtney L. Stanford

Contributions

RC, JL, and SR performed an initial literature review that was expanded by GR. All authors designed the survey instruments. GR collected and analyzed the survey and interview data with guidance from RC. GR revised the rubrics with extensive input from all other authors. All authors contributed to reliability measurements. GR drafted all manuscript sections. RC provided extensive comments during manuscript revisions; JL, SR, and CS also offered comments. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Renée S. Cole .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1. Supporting Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Reynders, G., Lantz, J., Ruder, S.M. et al. Rubrics to assess critical thinking and information processing in undergraduate STEM courses. IJ STEM Ed 7 , 9 (2020). https://doi.org/10.1186/s40594-020-00208-5

Download citation

Received : 01 October 2019

Accepted : 20 February 2020

Published : 09 March 2020

DOI : https://doi.org/10.1186/s40594-020-00208-5


Keywords

  • Constructive alignment
  • Self-regulated learning
  • Process skills
  • Professional skills
  • Critical thinking
  • Information processing

Assessing Critical Thinking in the Digital Era

Sponsored Content

  • Technology is poised to revolutionize education. Instead of being disrupted by the new tech, schools should participate in its development.
  • Technology can be particularly useful in helping schools assess critical thinking skills, which have become even more important in a world that increasingly relies on artificial intelligence.
  • Peregrine Global Services has worked with institutions of higher learning to launch a new Critical Thinking Assessment tool to help schools measure both retained knowledge and acquired competencies.

  Technology has traditionally disrupted education, and higher education institutions have struggled to keep pace with these changes. However, when institutions of higher education partner with the technology sector, they can become sources of disruption themselves.

One of the most notable examples of how technology disrupted the educational field is the calculator. As Sarah Banks outlines in a 2011 master’s thesis that analyzes historical attitudes about the use of calculators in junior high and high school math classrooms, the invention met with mixed responses from educators.

Some educators viewed calculators as helpful tools that could speed up calculations and save time, allowing students to focus on more complex mathematical concepts. Others expressed concern that calculators would become crutches for students, hindering their ability to develop basic arithmetic skills. Eventually, of course, calculators became indispensable tools in the classroom and beyond.

More recently, artificial intelligence (AI) has emerged as a powerful new technology that has the potential to revolutionize education. However, educators such as Andre Perry and Nicol Turner Lee have expressed concerns about the possible negative impacts of AI. Among other things, they note that its algorithms can perpetuate bias and discrimination. Industry observers such as Lyss Welding point out that AI poses a risk to academic integrity because it allows students to plagiarize and cheat on homework in ways that are easier, faster, and harder to detect.


Despite these concerns, AI technology has become an integral part of modern education as more educators are actively adapting and leveraging it to benefit their learners. But teachers should not introduce technology into their classrooms unless they are also helping students develop their skills in higher-order thinking. While technology provides tools to assist with calculations, information access, and other tasks, critical thinking enables students to make sense of that information and use it effectively.

The Importance of Assessment

However, while critical thinking is widely recognized as an essential skill, it can be challenging for higher education institutions to quantify or measure how well students have learned it. Assessment is a vital and dynamic component of teaching knowledge, skills, and competencies. It informs program and institutional improvement, providing invaluable information that administrators, faculty, and staff can use to make data-driven decisions that lead to better student outcomes.

One of the key difficulties in assessing critical thinking is defining what it is and how it should be measured. Critical thinking is a complex skill that involves the ability to analyze and evaluate information, think creatively, and make reasoned judgments, as Richard Paul and Linda Elder outline in their 2019 publication . It is not a single skill that can be easily quantified or measured through traditional assessments. As a result, educators have had to develop more nuanced approaches to evaluating critical thinking skills, such as project-based assessments and open-ended questions that require students to demonstrate their reasoning and problem-solving abilities.


Another challenge in measuring critical thinking is ensuring that assessments are fair and unbiased. Assessments that are overly reliant on multiple-choice questions or rote memorization can unfairly disadvantage students who may excel in other areas of critical thinking.

For these reasons, educators need effective assessment methods that accurately measure critical thinking skills in a variety of contexts. These assessments should use consistent and objective criteria to ensure that all students are given equal opportunities to demonstrate their abilities.

However, building such assessment tools and overcoming the barriers associated with measuring critical thinking places a large and sometimes overwhelming administrative burden on faculty and staff. Unfortunately, there can be a negative impact on student performance when faculty members must allocate more time and resources to handling administrative tasks than to teaching courses and supporting learner success.

A Partnership Between Industry and Academia

The need for critical thinking assessment tools is being addressed through a recent partnership between various higher education institutions and Peregrine Global Services, an education technology company specializing in assessment and instructional solutions. Peregrine recently launched its Critical Thinking Assessment to help colleges and universities evaluate this important skill.

To ensure that the assessment tool would meet the specific needs of the higher education community, the company developed its Peregrine Partner Program, which involved beta testing the tool with programs of varying sizes and types during the fall of 2022 and the spring of 2023. Each educational partner provided valuable feedback on how to present data to help schools make informed decisions, how to remove administrative burdens associated with assessment, and how to foster a culture of quality.

The partnership between Peregrine and the higher education institutions has led to several unforeseen advancements in technology. These include the ability to analyze exam data by course, cohort, or program, as well as the implementation of blind scoring to remove scoring bias. The new tool also adopts an innovative approach to assessing critical thinking and generating the data necessary to analyze exam results. For example, schools will be able to sort and filter data by levels of higher-order thinking.

The Critical Thinking Assessment uses a standardized rubric covering six critical thinking subcriteria and provides institutions with the flexibility to customize the exams to meet their needs. Academic programs can tailor the service to cover specific disciplines and assess varying levels of higher-order thinking. Learners receive scenarios randomly, ensuring a unique testing experience for each student.

The system auto-scores multiple-choice questions, while designated program faculty and assessment administrators use a rubric to manually score open-ended items. The short case studies and scenario questions are written and validated by subject matter experts with practical and teaching experience in each specific discipline.
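Conceptually, this is a two-part scoring pipeline: machine-scored multiple-choice items combined with faculty-assigned rubric scores for the open-ended items. The following minimal sketch shows one way such results could be combined into a single score; the subcriterion names, answer key, scale, and weighting are illustrative assumptions, not Peregrine's actual rubric or scoring rules.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical answer key for the auto-scored multiple-choice items.
ANSWER_KEY = {"q1": "B", "q2": "D", "q3": "A"}

# Six placeholder subcriteria standing in for the rubric's six critical thinking subcriteria.
SUBCRITERIA = ["interpretation", "analysis", "evaluation",
               "inference", "explanation", "self-regulation"]
RUBRIC_MAX = 4  # assumed top score per subcriterion

@dataclass
class StudentResult:
    student_id: str
    mc_responses: Dict[str, str]   # item id -> selected option
    rubric_scores: Dict[str, int]  # subcriterion -> faculty-assigned score (0..RUBRIC_MAX)

def score_multiple_choice(responses: Dict[str, str]) -> float:
    """Auto-score the MC items as the proportion answered correctly."""
    correct = sum(1 for item, key in ANSWER_KEY.items() if responses.get(item) == key)
    return correct / len(ANSWER_KEY)

def combined_score(result: StudentResult, mc_weight: float = 0.5) -> float:
    """Blend the MC proportion with the mean rubric score, both rescaled to 0..1."""
    mc = score_multiple_choice(result.mc_responses)
    rubric = sum(result.rubric_scores.get(c, 0) for c in SUBCRITERIA) / (RUBRIC_MAX * len(SUBCRITERIA))
    return mc_weight * mc + (1 - mc_weight) * rubric

example = StudentResult("s001", {"q1": "B", "q2": "C", "q3": "A"}, {c: 3 for c in SUBCRITERIA})
print(round(combined_score(example), 2))  # 0.71 with the assumed weighting
```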

“The Critical Thinking Assessment helps make assessment a facultywide effort, where everyone has buy-in,” says Melodie Philhours, associate professor of marketing and director of assessment at Arkansas State University’s Neil Griffin College of Business in Jonesboro. “The assessment tool significantly reduces the time and resources required for assessment, allowing faculty to focus on teaching and improving student learning outcomes. One of the most significant benefits has been the removal of the administrative burden related to compiling and entering the data, as the results are readily available after the assessment is fully scored.”

At the Forefront of Disruption

The collaboration between Peregrine and its partner schools will benefit not only the institutions involved, but also the broader field of education. Any time higher education and the technology sector can work together, they will drive innovation and disruption, ultimately leading to better learner outcomes. With the Critical Thinking Assessment tool, Peregrine aims to help higher education institutions assess not just retained knowledge, but also acquired skills and competencies.

In the future, Peregrine plans to incorporate AI into the assessment and build an aggregate pool, so schools can compare their results over periods of time, internally and externally, allowing them to benchmark against schools with similar demographics. Until then, Peregrine is offering the tool to schools as a course-level assessment they can use in their overall assessment portfolio. 

The partnership between Peregrine and universities highlights the potential for industry and academia to come together to address the challenges faced by higher education. It demonstrates that when universities are at the forefront of disrupting education in a positive manner, they can move along with technology rather than lag behind it.



A Brief Guide for Teaching and Assessing Critical Thinking in Psychology

In my first year of college teaching, a student approached me one day after class and politely asked, “What did you mean by the word ‘evidence’?” I tried to hide my shock at what I took to be a very naive question. Upon further reflection, however, I realized that this was actually a good question, for which the usual approaches to teaching psychology provided too few answers. During the next several years, I developed lessons and techniques to help psychology students learn how to evaluate the strengths and weaknesses of scientific and nonscientific kinds of evidence and to help them draw sound conclusions. It seemed to me that learning about the quality of evidence and drawing appropriate conclusions from scientific research were central to teaching critical thinking (CT) in psychology.

In this article, I have attempted to provide guidelines to psychology instructors on how to teach CT, describing techniques I developed over 20 years of teaching. More importantly, the techniques and approach described below are ones that are supported by scientific research. Classroom examples illustrate the use of the guidelines and how assessment can be integrated into CT skill instruction.

Overview of the Guidelines

Confusion about the definition of CT has been a major obstacle to teaching and assessing it (Halonen, 1995; Williams, 1999). To deal with this problem, we have defined CT as reflective thinking involved in the evaluation of evidence relevant to a claim so that a sound or good conclusion can be drawn from the evidence (Bensley, 1998). One virtue of this definition is it can be applied to many thinking tasks in psychology. The claims and conclusions psychological scientists make include hypotheses, theoretical statements, interpretation of research findings, or diagnoses of mental disorders. Evidence can be the results of an experiment, case study, naturalistic observation study, or psychological test. Less formally, evidence can be anecdotes, introspective reports, commonsense beliefs, or statements of authority. Evaluating evidence and drawing appropriate conclusions along with other skills, such as distinguishing arguments from nonarguments and finding assumptions, are collectively called argument analysis skills. Many CT experts take argument analysis skills to be fundamental CT skills (e.g., Ennis, 1987; Halpern, 1998). Psychology students need argument analysis skills to evaluate psychological claims in their work and in everyday discourse.

Some instructors expect their students will improve CT skills like argument analysis skills by simply immersing them in challenging course work. Others expect improvement because they use a textbook with special CT questions or modules, give lectures that critically review the literature, or have students complete written assignments. While these and other traditional techniques may help, a growing body of research suggests they are not sufficient to efficiently produce measurable changes in CT skills. Our research on acquisition of argument analysis skills in psychology (Bensley, Crowe, Bernhardt, Buchner, & Allman, in press) and on critical reading skills (Bensley & Haynes, 1995; Spero & Bensley, 2009) suggests that more explicit, direct instruction of CT skills is necessary. These results concur with results of an earlier review of CT programs by Chance (1986) and a recent meta-analysis by Abrami et al. (2008).

Based on these and other findings, the following guidelines describe an approach to explicit instruction in which instructors can directly infuse CT skills and assessment into their courses. With infusion, instructors can use relevant content to teach CT rules and concepts along with the subject matter. Directly infusing CT skills into course work involves targeting specific CT skills, making CT rules, criteria, and methods explicit, providing guided practice in the form of exercises focused on assessing skills, and giving feedback on practice and assessments. These components are similar to ones found in effective, direct instruction approaches (Walberg, 2006). They also resemble approaches to teaching CT proposed by Angelo (1995), Beyer (1997), and Halpern (1998). Importantly, this approach has been successful in teaching CT skills in psychology (e.g., Bensley et al., in press; Bensley & Haynes, 1995; Nieto & Saiz, 2008; Penningroth, Despain, & Gray, 2007). Directly infusing CT skill instruction can also enrich content instruction without sacrificing learning of subject matter (Solon, 2007).

Seven Guidelines for Teaching and Assessing Critical Thinking

1. Motivate your students to think critically

Critical thinking takes effort. Without proper motivation, students are less inclined to engage in it. Therefore, it is good to arouse interest right away and foster commitment to improving CT throughout a course. One motivational strategy is to explain why CT is important to effective, professional behavior. Often, telling a compelling story that illustrates the consequences of failing to think critically can motivate students. For example, the tragic death of 10-year-old Candace Newmaker at the hands of her therapists practicing attachment therapy illustrates the perils of using a therapy that has not been supported by good empirical evidence (Lilienfeld, 2007).

Instructors can also pique interest by taking a class poll posing an interesting question on which students are likely to have an opinion. For example, asking students how many think that the full moon can lead to increases in abnormal behavior can be used to introduce the difference between empirical fact and opinion or common sense belief. After asking students how psychologists answer such questions, instructors might go over the meta-analysis of Rotton and Kelly (1985). Their review found that almost all of the 37 studies showed no association between the phase of the moon and abnormal behavior, with only a few, usually poorly controlled, studies supporting it. The effect size over all studies was very small (.01). Instructors can use this to illustrate how psychologists draw a conclusion based on the quality and quantity of research studies as opposed to what many people commonly believe. For other interesting thinking errors and misconceptions related to psychology, see Bensley (1998; 2002; 2008), Halpern (2003), Ruscio (2006), Stanovich (2007), and Sternberg (2007).

Attitudes and dispositions can also affect motivation to think critically. If students lack certain CT dispositions such as open-mindedness, fair-mindedness, and skepticism, they will be less likely to think critically even if they have CT skills (Halpern, 1998). Instructors might point out that even great scientists noted for their powers of reasoning sometimes fail to think critically when they are not disposed to use their skills. For example, Alfred Russel Wallace who used his considerable CT skills to help develop the concept of natural selection also believed in spiritualistic contact with the dead. Despite considerable evidence that mediums claiming to contact the dead were really faking such contact, Wallace continued to believe in it (Bensley, 2006). Likewise, the great American psychologist William James, whose reasoning skills helped him develop the seeds of important contemporary theories, believed in spiritualism despite evidence to the contrary.

2. Clearly state the CT goals and objectives for your class

Once students are motivated, the instructor should focus them on what skills they will work on during the course. The APA task force on learning goals and objectives for psychology listed CT as one of 10 major goals for students (Halonen et al., 2002). Under critical thinking, they have further specified outcomes such as evaluating the quality of information, identifying and evaluating the source and credibility of information, and recognizing and defending against thinking errors and fallacies. Instructors should publish goals like these in their CT course objectives in their syllabi and more specifically as assignment objectives in their assignments. Given the pragmatic penchant of students for studying what is needed to succeed in a course, this should help motivate and focus them.

To make instruction efficient, course objectives and lesson objectives should explicitly target CT skills to be improved. Objectives should specify the behavior that will change in a way that can be measured. A course objective might read, “After taking this course, you will be able to analyze arguments found in psychological and everyday discussions.” When the goal of a lesson is to practice and improve specific microskills that make up argument analysis, an assignment objective might read “After successfully completing this assignment, you will be able to identify different kinds of evidence in a psychological discussion.” Or another might read “After successfully completing this assignment, you will be able to distinguish arguments from nonarguments.” Students might demonstrate they have reached these objectives by showing the behavior of correctly labeling the kinds of evidence presented in a passage or by indicating whether an argument or merely a claim has been made. By stating objectives in the form of assessable behaviors, the instructor can test these as assessment hypotheses.

Sometimes when the goal is to teach students how to decide which CT skills are appropriate in a situation, the instructor may not want to identify specific skills. Instead, a lesson objective might read, “After successfully completing this assignment, you will be able to decide which skills and knowledge are appropriate for critically analyzing a discussion in psychology.”

3. Find opportunities to infuse CT that fit content and skill requirements of your course

To improve their CT skills, students must be given opportunities to practice them. Different courses present different opportunities for infusion and practice. Stand-alone CT courses usually provide the most opportunities to infuse CT. For example, the Frostburg State University Psychology Department has a senior seminar called “Thinking like a Psychologist” in which students complete lessons giving them practice in argument analysis, critical reading, critically evaluating information on the Internet, distinguishing science from pseudoscience, applying their knowledge and CT skills in simulations of psychological practice, and other activities.

In more typical subject-oriented courses, instructors must find specific content and types of tasks conducive to explicit CT skill instruction. For example, research methods courses present several opportunities to teach argument analysis skills. Instructors can have students critically evaluate the quality of evidence provided by studies using different research methods and designs they find in PsycINFO and Internet sources. This, in turn, could help students write better critical evaluations of research for research reports.

A cognitive psychology teacher might assign a critical evaluation of the evidence on an interesting question discussed in textbook literature reviews. For example, students might evaluate the evidence relevant to the question of whether people have flashbulb memories such as accurately remembering the 9-11 attack. This provides the opportunity to teach them that many of the studies, although informative, are quasi-experimental and cannot show causation. Or, students might analyze the arguments in a TV program such as the fascinating Nova program Kidnapped by Aliens on people who recall having been abducted by aliens.

4. Use guided practice, explicitly modeling and scaffolding CT.

Guided practice involves modeling and supporting the practice of target skills, and providing feedback on progress towards skill attainment. Research has shown that guided practice helps students more efficiently acquire thinking skills than unguided and discovery approaches (Mayer, 2004).

Instructors can model the use of CT rules, criteria, and procedures for evaluating evidence and drawing conclusions in many ways. They could provide worked examples of problems, writing samples displaying good CT, or real-world examples of good and bad thinking found in the media. They might also think out loud as they evaluate arguments in class to model the process of thinking.

To help students learn to use complex rules in thinking, instructors should initially scaffold student thinking. Scaffolding involves providing product guidelines, rules, and other frameworks to support the process of thinking. Table 1 shows guidelines like those found in Bensley (1998) describing nonscientific kinds of evidence that can support student efforts to evaluate evidence in everyday psychological discussions. Likewise, Table 2 provides guidelines like those found in Bensley (1998) and Wade and Tavris (2005) describing various kinds of scientific research methods and designs that differ in the quality of evidence they provide for psychological arguments.

In the cognitive lesson on flashbulb memory described earlier, students use the framework in Table 2 to evaluate the kinds of evidence in the literature review. Table 1 can help them evaluate the kinds of evidence found in the Nova video Kidnapped by Aliens. Specifically, they could use it to contrast scientific authority with less credible authority. The video includes statements by scientific authorities like Elizabeth Loftus based on her extensive research contrasted with the nonscientific authority of Bud Hopkins, an artist turned hypnotherapist and author of popular books on alien abduction. Loftus argues that the memories of alien abduction in the children interviewed by Hopkins were reconstructed around the suggestive interview questions he posed. Therefore, his conclusion that the children and other people in the video were recalling actual abduction experiences was based on anecdotes, unreliable self-reports, and other weak evidence.

Modeling, scaffolding, and guided practice are especially useful in helping students first acquire CT skills. After sufficient practice, however, instructors should fade these and have students do more challenging assignments without these supports to promote transfer.

5. Align assessment with practice of specific CT skills

Test questions and other assessments of performance should be similar to practice questions and problems in the skills targeted but differ in content. For example, we have developed a series of practice and quiz questions about the kinds of evidence found in Table 1 used in everyday situations but which differ in subject matter from practice to quiz. Likewise, other questions employ research evidence examples corresponding to Table 2. Questions ask students to identify kinds of evidence, evaluate the quality of the evidence, distinguish arguments from nonarguments, and find assumptions in the examples with practice examples differing in content from assessment items.

6. Provide feedback and encourage students to reflect on it

Instructors should focus feedback on the degree of attainment of CT skill objectives in the lesson or assessment. The purpose of feedback is to help students learn how to correct faulty thinking so that in the future they monitor their thinking and avoid such problems. This should increase their metacognition or awareness and control of their thinking, an important goal of CT instruction (Halpern, 1998).

Students must use their feedback for it to improve their CT skills. In the CT exercises and critical reading assignments, students receive feedback in the form of corrected responses and written feedback on open-ended questions. They should be advised that paying attention to feedback on earlier work and assessments should improve their performance on later assessments.

7. Reflect on feedback and assessment results to improve CT instruction

Instructors should use the feedback they provide to students and the results of ongoing assessments to ‘close the loop,’ that is, use these outcomes to address deficiencies in performance and improve instruction. In actual practice, teaching and assessment strategies rarely work optimally the first time. Instructors must be willing to tinker with these to make needed improvements. Reflection on reliable and valid assessment results provides a scientific means to systematically improve instruction and assessment.

Instructors may find the direct infusion approach as summarized in the seven guidelines to be efficient, especially in helping students acquire basic CT skills, as research has shown. They may especially appreciate how it allows them to take a scientific approach to the improvement of instruction. Although the direct infusion approach seems to efficiently promote acquisition of CT skills, more research is needed to find out if students transfer their skills outside of the classroom or whether this approach needs adjustment to promote transfer.

Table 1. Strengths and Weaknesses of Nonscientific Sources and Kinds of Evidence

Commonsense belief: informal beliefs and folk theories of mind commonly assumed to be true
  • Strengths: is a view shared by many, not just a few people; is familiar and appeals to everyday experience.
  • Weaknesses: is not based on careful, systematic observation; may be biased by cultural and social influences; often goes untested.

Anecdote: a story or example, often biographical, used to support a claim
  • Strengths: can vividly illustrate an ability, trait, behavior, or situation; provides a ‘real-world’ example.
  • Weaknesses: is not based on careful, systematic observation; may be unique, not repeatable, and cannot be generalized for large groups.

Testimonial or introspective self-report: reports of one’s own experience, often in the form of testimonials and introspective self-reports
  • Strengths: tells what a person may be feeling, experiencing, or aware of at the time; is compelling and easily identified with.
  • Weaknesses: is often subjective and biased; may be unreliable because people are often unaware of the real reasons for their behaviors and experiences.

Statement of authority: a statement made by a person or group assumed to have special knowledge or expertise
  • Strengths: may be true or useful when the authority has relevant knowledge or expertise; is convenient because acquiring one’s own knowledge and expertise takes a lot of time.
  • Weaknesses: is misleading when the presumed authority does not have or pretends to have special knowledge or expertise; may be biased.

Table 2. Strengths and Weaknesses of Scientific Research Methods/Designs Used as Sources of Evidence

Case study: detailed description of one or a few subjects
  • Strengths: provides much information about one person; may inform about a person with special or rare abilities, knowledge, or characteristics.
  • Weaknesses: may be unique and hard to replicate; may not generalize to other people; cannot show cause and effect.

Naturalistic observation: observations of behavior made in the field or natural environment
  • Strengths: allows observations to be readily generalized to the real world; can be a source of hypotheses.
  • Weaknesses: allows little control of extraneous variables; cannot test treatments; cannot show cause and effect.

Survey: a method like a questionnaire that allows many questions to be asked
  • Strengths: allows economical collection of much data; allows for study of many different questions at once.
  • Weaknesses: may have problems of self-reports such as dishonesty, forgetting, and misrepresentation of self; may involve biased sampling.

Correlational study: a method for finding a quantitative relationship between variables
  • Strengths: allows the researcher to calculate the strength and direction of the relation between variables; can be used to make predictions.
  • Weaknesses: does not allow random assignment of participants or much control of subject variables; cannot test treatments; cannot show cause and effect.

Quasi-experiment: a method for comparing treatment conditions without random assignment
  • Strengths: allows comparison of treatments; allows some control of extraneous variables.
  • Weaknesses: does not allow random assignment of participants or much control of subject variables; cannot show cause and effect.

True experiment: a method for comparing treatment conditions in which variables can be controlled through random assignment
  • Strengths: allows true manipulation of treatment conditions; allows random assignment and much control of extraneous variables; can show cause and effect.
  • Weaknesses: cannot manipulate and test some variables; may control variables and conditions so much that they become artificial and not like the ‘real world’.

Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., et al. (2008). Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 78(4), 1102–1134.

Angelo, T. A. (1995). Classroom assessment for critical thinking. Teaching of Psychology , 22(1), 6–7.

Bensley, D.A. (1998). Critical thinking in psychology: A unified skills approach. Pacific Grove, CA: Brooks/Cole.

Bensley, D.A. (2002). Science and pseudoscience: A critical thinking primer. In M. Shermer (Ed.), The Skeptic encyclopedia of pseudoscience. (pp. 195–203). Santa Barbara, CA: ABC–CLIO.

Bensley, D.A. (2006). Why great thinkers sometimes fail to think critically. Skeptical Inquirer, 30, 47–52.

Bensley, D.A. (2008). Can you learn to think more like a psychologist? The Psychologist, 21, 128–129.

Bensley, D.A., Crowe, D., Bernhardt, P., Buckner, C., & Allman, A. (in press). Teaching and assessing critical thinking skills for argument analysis in psychology. Teaching of Psychology .

Bensley, D.A. & Haynes, C. (1995). The acquisition of general purpose strategic knowledge for argumentation. Teaching of Psychology, 22 , 41–45.

Beyer, B.K. (1997). Improving student thinking: A comprehensive approach . Boston: Allyn & Bacon.

Chance, P. (1986). Thinking in the classroom: A review of programs. New York: Teachers College Press.

Ennis, R.H. (1987). A taxonomy of critical thinking dispositions and abilities. In J. B. Baron & R. J. Sternberg (Eds.), Teaching thinking skills: Theory and practice (pp. 9–26). New York: Freeman.

Halonen, J.S. (1995). Demystifying critical thinking. Teaching of Psychology, 22 , 75–81.

Halonen, J.S., Appleby, D.C., Brewer, C.L., Buskist, W., Gillem, A. R., Halpern, D. F., et al. (APA Task Force on Undergraduate Major Competencies). (2002) Undergraduate psychology major learning goals and outcomes: A report. Washington, DC: American Psychological Association. Retrieved August 27, 2008, from http://www.apa.org/ed/pcue/reports.html .

Halpern, D.F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist , 53 , 449–455.

Halpern, D.F. (2003). Thought and knowledge: An introduction to critical thinking . (3rd ed.). Mahwah, NJ: Erlbaum.

Lilienfeld, S.O. (2007). Psychological treatments that cause harm. Perspectives on Psychological Science , 2 , 53–70.

Mayer, R.E. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59, 14–19.

Nieto, A.M., & Saiz, C. (2008). Evaluation of Halpern’s “structural component” for improving critical thinking. The Spanish Journal of Psychology , 11 ( 1 ), 266–274.

Penningroth, S.L., Despain, L.H., & Gray, M.J. (2007). A course designed to improve psychological critical thinking. Teaching of Psychology , 34 , 153–157.

Rotton, J., & Kelly, I. (1985). Much ado about the full moon: A meta-analysis of lunar-lunacy research. Psychological Bulletin , 97 , 286–306.

Ruscio, J. (2006). Critical thinking in psychology: Separating sense from nonsense. Belmont, CA: Wadsworth.

Solon, T. (2007). Generic critical thinking infusion and course content learning in introductory psychology. Journal of Instructional Psychology , 34(2), 972–987.

Stanovich, K.E. (2007). How to think straight about psychology . (8th ed.). Boston: Pearson.

Sternberg, R.J. (2007). Critical thinking in psychology: It really is critical. In R. J. Sternberg, H. L. Roediger, & D. F. Halpern (Eds.), Critical thinking in psychology. (pp. 289–296) . Cambridge, UK: Cambridge University Press.

Wade, C., & Tavris, C. (2005) Invitation to psychology. (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Walberg, H.J. (2006). Improving educational productivity: A review of extant research. In R. F. Subotnik & H. J. Walberg (Eds.), The scientific basis of educational productivity (pp. 103–159). Greenwich, CT: Information Age.

Williams, R.L. (1999). Operational definitions and assessment of higher-order cognitive constructs. Educational Psychology Review , 11 , 411–427.


About the Author

D. Alan Bensley is Professor of Psychology at Frostburg State University. He received his Master’s and PhD degrees in cognitive psychology from Rutgers University. His main teaching and research interests concern the improvement of critical thinking and other cognitive skills. He coordinates assessment for his department and is developing a battery of instruments to assess critical thinking in psychology. He can be reached by email at [email protected].

Association for Psychological Science, December 2010, Vol. 23, No. 10



Yes, We Can Define, Teach, and Assess Critical Thinking Skills


Jeff Heyck-Williams (He, His, Him), Director of the Two Rivers Learning Institute in Washington, DC


Today’s learners face an uncertain present and a rapidly changing future that demand far different skills and knowledge than were needed in the 20th century. We also know so much more about enabling deep, powerful learning than we ever did before. Our collective future depends on how well young people prepare for the challenges and opportunities of 21st-century life.

Critical thinking is a thing. We can define it; we can teach it; and we can assess it.

While the idea of teaching critical thinking has been bandied around in education circles since at least the time of John Dewey, it has taken greater prominence in the education debates with the advent of the term “21st century skills” and discussions of deeper learning. There is increasing agreement among education reformers that critical thinking is an essential ingredient for long-term success for all of our students.

However, there are still those in the education establishment and in the media who argue that critical thinking isn’t really a thing, or that these skills aren’t well defined and, even if they could be defined, they can’t be taught or assessed.

To those naysayers, I have to disagree. Critical thinking is a thing. We can define it; we can teach it; and we can assess it. In fact, as part of a multi-year Assessment for Learning Project, Two Rivers Public Charter School in Washington, D.C., has done just that.

Before I dive into what we have done, I want to acknowledge that some of the criticism has merit.

First, there are those who argue that critical thinking can only exist when students have a vast fund of knowledge, meaning that a student cannot think critically if they don’t have something substantive about which to think. I agree. Students do need a robust foundation of core content knowledge to effectively think critically. Schools still have a responsibility for building students’ content knowledge.

However, I would argue that students don’t need to wait to think critically until after they have mastered some arbitrary amount of knowledge. They can start building critical thinking skills when they walk in the door. All students come to school with experience and knowledge which they can immediately think critically about. In fact, some of the thinking that they learn to do helps augment and solidify the discipline-specific academic knowledge that they are learning.

The second criticism is that critical thinking skills are always highly contextual. In this argument, the critics make the point that the types of thinking that students do in history are categorically different from the types of thinking students do in science or math. Thus, the idea of teaching broadly defined, content-neutral critical thinking skills is impossible. I agree that there are domain-specific thinking skills that students should learn in each discipline. However, I also believe that there are several generalizable skills that elementary school students can learn that have broad applicability to their academic and social lives. That is what we have done at Two Rivers.

Defining Critical Thinking Skills

We began this work by first defining what we mean by critical thinking. After a review of the literature and looking at the practice at other schools, we identified five constructs that encompass a set of broadly applicable skills: schema development and activation; effective reasoning; creativity and innovation; problem solving; and decision making.

[Image: critical thinking competency framework]

We then created rubrics to provide a concrete vision of what each of these constructs looks like in practice. Working with the Stanford Center for Assessment, Learning and Equity (SCALE), we refined these rubrics to capture clear and discrete skills.

For example, we defined effective reasoning as the skill of creating an evidence-based claim: students need to construct a claim, identify relevant support, link their support to their claim, and identify possible questions or counterclaims. Rubrics provide an explicit vision of the skill of effective reasoning for students and teachers. By breaking the rubrics down for different grade bands, we have been able not only to describe what reasoning is but also to delineate how the skills develop in students from preschool through 8th grade.

[Image: effective reasoning rubric]

Before moving on, I want to freely acknowledge that in narrowly defining reasoning as the construction of evidence-based claims we have disregarded some elements of reasoning that students can and should learn. For example, the difference between constructing claims through deductive versus inductive means is not highlighted in our definition. However, by privileging a definition that has broad applicability across disciplines, we are able to gain traction in developing the roots of critical thinking, in this case the ability to formulate well-supported claims or arguments.
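To make the structure of this definition concrete, here is a minimal sketch of how the four components of an evidence-based claim might be recorded and checked for completeness; the field names and the example response are illustrative assumptions, not Two Rivers' actual rubric or tooling, and the check flags missing components rather than judging quality.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvidenceBasedClaim:
    """One student response, organized by the four effective-reasoning components described above."""
    claim: str
    support: List[str]        # pieces of relevant evidence
    links: List[str]          # how each piece of support connects to the claim
    counterclaims: List[str]  # possible questions or counterclaims the student identified

def missing_components(response: EvidenceBasedClaim) -> List[str]:
    """Return which components are absent, as a simple formative check (not a quality score)."""
    gaps = []
    if not response.claim.strip():
        gaps.append("claim")
    if not response.support:
        gaps.append("relevant support")
    if len(response.links) < len(response.support):
        gaps.append("links between support and claim")
    if not response.counterclaims:
        gaps.append("questions or counterclaims")
    return gaps

example = EvidenceBasedClaim(
    claim="Recess improves afternoon focus.",
    support=["Class observation log", "Student survey results"],
    links=["Fewer off-task behaviors were logged after recess."],
    counterclaims=[],
)
print(missing_components(example))  # ['links between support and claim', 'questions or counterclaims']
```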

Teaching Critical Thinking Skills

The definitions of critical thinking constructs were only useful to us in as much as they translated into practical skills that teachers could teach and students could learn and use. Consequently, we have found that to teach a set of cognitive skills, we needed thinking routines that defined the regular application of these critical thinking and problem-solving skills across domains. Building on Harvard’s Project Zero Visible Thinking work, we have named routines aligned with each of our constructs.

For example, with the construct of effective reasoning, we aligned the Claim-Support-Question thinking routine to our rubric. Teachers then were able to teach students that whenever they were making an argument, the norm in the class was to use the routine in constructing their claim and support. The flexibility of the routine has allowed us to apply it from preschool through 8th grade and across disciplines from science to economics and from math to literacy.

[Image: argumentative writing anchor charts]

Kathryn Mancino, a 5th grade teacher at Two Rivers, has deliberately taught three of our thinking routines to students using the anchor charts above. Her charts name the components of each routine and have a place for students to record when they’ve used each routine and what they have figured out about it. By using this structure with a chart that can be added to throughout the year, students see the routines as broadly applicable across disciplines and are able to refine their application over time.

Assessing Critical Thinking Skills

By defining specific constructs of critical thinking and building thinking routines that support their implementation in classrooms, we have operated under the assumption that students are developing skills that they will be able to transfer to other settings. However, we recognized both the importance and the challenge of gathering reliable data to confirm this.

With this in mind, we have developed a series of short performance tasks around novel discipline-neutral contexts in which students can apply the constructs of thinking. Through these tasks, we have been able to provide an opportunity for students to demonstrate their ability to transfer the types of thinking beyond the original classroom setting. Once again, we have worked with SCALE to define tasks where students easily access the content but where the cognitive lift requires them to demonstrate their thinking abilities.

These assessments demonstrate that it is possible to capture meaningful data on students’ critical thinking abilities. They are not intended to be high-stakes accountability measures. Instead, they are designed to give students, teachers, and school leaders discrete formative data on hard-to-measure skills.

While it is clearly difficult, and we have not solved all of the challenges to scaling assessments of critical thinking, we can define, teach, and assess these skills. In fact, knowing how important they are for the economy of the future and our democracy, it is essential that we do.

Jeff Heyck-Williams (He, His, Him)

Director of the Two Rivers Learning Institute

Jeff Heyck-Williams is the director of the Two Rivers Learning Institute and a founder of Two Rivers Public Charter School. He has led work around creating school-wide cultures of mathematics, developing assessments of critical thinking and problem-solving, and supporting project-based learning.


  • Open access
  • Published: 08 April 2022

Assessing critical thinking through L2 argumentative essays: an investigation of relevant and salient criteria from raters’ perspectives

  • Takanori Sato, ORCID: orcid.org/0000-0003-1483-966X

Language Testing in Asia, volume 12, Article number: 9 (2022)


Although some second language (L2) pedagogical approaches recognize critical thinking (CT) as an important skill, its assessment is challenging because it is not a well-defined construct with varying definitions. This study aimed to identify the relevant and salient features of argumentative essays that allow for the assessment of L2 students’ CT skills. This study implemented a convergent mixed-methods research design, collecting and analyzing both quantitative and qualitative data to collate the results. Five raters assessed 140 causal argumentative essays written by Japanese university students attending Content and Language Integrated Learning courses based on five criteria: Task Achievement, Coherence and Cohesion, Lexical Resource, Grammatical Range and Accuracy, and CT Skills. A standard multiple regression was conducted to examine the relationships among these criteria. Additionally, raters’ written verbal protocols were collected to identify the essay features to be considered when assessing students’ CT skills. The results indicated that raters’ judgments of students’ CT were closely linked to Task Achievement. Furthermore, their assessments were affected by the essay’s relevancy to the question, content development, logicality, and quality of ideas. This study’s findings help to conceptualize CT as a construct and should be incorporated into the assessment criteria of various L2 educational contexts.

Introduction

Some second language (L2) pedagogical approaches, including English for academic purposes (EAP) and Content and Language Integrated Learning (CLIL), stress the importance of critical thinking (CT) since these skills are vital in academia and help students engage with world knowledge (de Chazal, 2014; Mehisto & Ting, 2017). Because CT is an integral part of such instructional approaches, it must be assessed to inform both summative and formative decisions in the course. In this context, essay writing assignments are considered an effective tool for assessing CT skills, as they provide students with time to carefully consider reasons for their assertions and refine their ideas (Nosich, 2022; Wade, 1995).

However, assessing CT through essay writing is challenging because “the term ‘critical thinking’ is a notoriously fuzzy construct in education” (Yuan & Stapleton, 2020, p. 41) and “critical thinking as a concept is diffuse” (Wilson, 2016, p. 257). While performance assessment requires rating scales to enable assessors to measure students’ L2 output (McNamara, 1996), the elusive construct of CT makes it difficult to clearly decide what to assess. Although numerous elements of CT have been explicated for general education (e.g., Paul & Elder, 2014), it has not been operationalized specifically for L2 pedagogical settings, and hence relevant and salient criteria have not been established for assessing L2 students’ CT through their essays. In particular, delineating this construct is warranted for argumentative writing, which is an imperative type of writing that L2 students are likely to engage in across various academic contexts (Hirvela, 2017).

One effective approach to disentangling such an elusive construct is to investigate people’s intuitive judgments of it. In other words, studies on how essay readers evaluate the writers’ CT skills can provide empirical data that can help researchers identify relevant and salient features of the construct. Nevertheless, no existing studies have implemented this research approach to delineate CT for L2 writing assessment. Therefore, this study aimed to identify the features of argumentative essays that allow for the assessment of L2 students’ CT by investigating how readers rate and judge the writers’ CT skills through their argumentative essays. This study’s findings contribute to the conceptualization of CT as a construct and the development of rating scales for measuring it in L2 educational contexts.

Literature review

CT theories and argumentative writing

CT is known as a fuzzy and elusive concept because of its various competing definitions and interpretations (Wilson, 2016). Davies and Barnett (2015) indicate how widely CT has been defined by summarizing its concepts in three movements: the critical thinking movement, which focuses on argumentation skills and dispositions; the criticality movement, which addresses ethical actions and morality in society; and the critical pedagogy movement, which aims to overcome the oppression that restricts human freedom. One widely utilized definition for CT is “Critical thinking is a reasonable reflective thinking focused on deciding what to believe and do” (Ennis, 2011, p. 10). This conception outlines 12 dispositions (e.g., trying to be well informed and being open-minded) and 16 abilities (e.g., analyzing arguments and judging the credibility of sources) that describe the characteristics of ideal critical thinkers. While argumentative skills are required to demonstrate reflective thinking, this view focuses more on judgment formation and decision-making than on the mechanisms of argumentation (Davies & Barnett, 2015). Ennis (2011) claims that CT “should be a very important part of our personal, civic, and vocational lives and should receive attention in our education system” (p. 10).

CT skills in L2 pedagogies are built on Ennis’s ( 2011 ) conception and focus on promoting argumentation and cognitive thinking skills. Dummett and Hughes ( 2019 ) defined CT in the English language teaching context as “a mindset that involves thinking reflectively (being curious), rationally (thinking analytically), and reasonably (coming to sensible conclusions)” [emphasis in original] (p. 4) and illustrated how it is associated with Anderson and Krathwohl’s ( 2001 ) categories of cognitive process dimensions. Anderson and Krathwohl ( 2001 ) specified six cognitive process categories that education should incorporate to help students improve their retention abilities and the transfer of learning. These categories are as follows: to remember (retrieving knowledge from memory), understand (building connections between prior and new knowledge), apply (using the acquired knowledge in new situations), analyze (breaking down concepts into constituent parts and verifying how they relate to each other), evaluate (making judgments using certain criteria), and create (making new products using previous learning experience). They are regarded as relevant skills for CT development that should be taught in EAP (de Chazal, 2014 ) and CLIL (Coyle et al., 2010 ). Among them, “analyze” and “evaluate” are most often associated with CT (de Chazal, 2014 ).

The ability to present arguments is an essential CT skill because it involves presenting one’s views with both reasons and evidence (Chaffee, 2019; Fisher, 2011; Nosich, 2022). As Cottrell (2017) states, “essays are exercises in critical thinking” (p. 161). Notably, in argumentative essay writing tasks—with or without source materials—students must not only present their ideas but also assess their own reasoning. At a minimum, essay writing involves remembering (retrieving relevant information), creating (writing an essay), and evaluating (critiquing one’s own ideas) (Anderson & Krathwohl, 2001). Because critiquing one’s own reasoning and engaging in dialectical thinking are fundamental CT abilities (Tanaka & Gilliland, 2017), and because essays require refinement, essay writing is appropriate for assessing students’ CT skills (Wade, 1995).

CT assessment criteria

Scholars have proposed various criteria for assessing CT skills, including cognitive thinking and reasoning skills. Chaffee ( 2019 ) and Fisher ( 2011 ) provided two criteria focusing on reasoning: whether the reasons support its conclusion (validity) and whether the reasons are true and acceptable (truth). An argument that includes accurate reasons that fully support the writer’s claims is considered a sound argument. Furthermore, Paul and Elder ( 2014 ) proposed the following nine intellectual standards for assessing reasoning: (a) clarity of statements, (b) accuracy of information (i.e., truth), (c) precision of statements, (d) relevance of ideas, (e) depth of thoughts, (f) breadth of viewpoints, (g) logicalness (i.e., validity), (h) significance of information, and (i) fairness of arguments (see also Nosich, 2022 ). These were proposed for use by those who study CT to evaluate a given argument and improve the quality of their own reasoning. Thus, these criteria were not specifically designed for assessing the CT skills of L2 learners through their argumentative essays. Yanning ( 2017 ) developed a rating scale based on Paul and Elder’s ( 2014 ) standards and implemented it to measure Chinese students’ CT through their L2 argumentative essays. However, as the aim of the study was to gauge the effectiveness of a pedagogical approach, the appropriateness of the scale itself was not scrutinized.

Some scholars have proposed certain criteria to specifically assess CT skills through argumentative essays. Cottrell’s ( 2017 ) description of critical writers enlists the following features of essays that reflect CT skills: presenting arguments clearly to make them comprehensible to readers, selecting the most controversial points to discuss in detail, placing arguments in logical order to emphasize the most controversial points, and using discourse markers to help readers understand the arguments. Additionally, the Washington State University (WSU) Center for Teaching, Learning, and Technology ( 2009 ) developed a rating scale for CT skills displayed in argumentative essays consisting of seven criteria with detailed descriptors. The rating scale examines students’ (a) identification of an issue, (b) consideration of the issue’s context, (c) presentation and assessment of supporting evidence, (d) integration of diverse perspectives, (e) presentation of their own perspectives, (f) identification of implications and consequences, and (g) communication of the message. These criteria were identified based on the practical experiences of WSU’s staff members. This scale has also been included in writing-intensive courses in a U.S. university’s general education curriculum (Morozov, 2011 ). Although Cottrell’s ( 2017 ) description and WSU’s rating scale connect CT skills with writing abilities, they were neither developed specifically to assess L2 learners’ CT skills nor based on research. Hence, these criteria, developed for native English speakers, do not necessarily consider the characteristics of L2 students’ writing.

Finally, Stapleton ( 2001 ) created a scheme to quantify the CT displayed in argumentative passages written in English by Japanese university students. The scheme covers the key elements of CT and counts the numbers of (a) arguments presented (opinions and their reasons), (b) pieces of evidence given in support of each reason, (c) opposing arguments presented, (d) refutations of these counterarguments, and (e) any potential fallacies. Nevertheless, the quantified outcomes do not necessarily reflect an essay’s CT level or logical quality. For example, presenting numerous pieces of supporting evidence does not mean that the writer possesses high CT skills. Thus, the scheme cannot readily be adopted to measure the level of CT displayed in L2 students’ argumentative essays.

In summary, a wide range of criteria has been suggested to assess CT skills based on theories conceptualizing CT. A significant limitation of the current CT criteria is that they are neither empirically derived nor supported for use in L2 educational contexts. Therefore, it remains unclear whether the suggested criteria are relevant to and salient in L2 essay writing assessments and whether other important criteria exist that have not yet been acknowledged.

Conceptualizing constructs for the development of a rating scale

Investigating raters’ intuitive judgments of CT skills is helpful for conceptualizing the construct in L2 educational contexts. An empirical investigation of raters’ judgments would reveal the construct’s components and facilitate the development of a rating scale. Researchers have identified the influential features of various constructs in applied linguistics, including oral fluency (e.g., Bosker et al., 2013 ), accentedness and comprehensibility (e.g., Saito et al., 2017 ), oral communicative ability (Sato, 2012 ; Sato & McNamara, 2019 ; McNamara, 1990 ), and writing proficiency (e.g., Cumming et al., 2001 ). These studies scrutinized raters’ intuitive judgments of the targeted constructs without providing descriptors or rigorous training for assessing them.

Furthermore, they identified the influential components of raters’ intuitive judgments of the constructs using one of the following three approaches. The first approach investigated the correlation between raters’ judgments and objective measurements of the linguistic features of the performances (Bosker et al., 2013 ; Saito et al., 2017 ). The second approach examined the relationship between raters’ judgments and their ratings of specific performance features (Sato, 2012 ; McNamara, 1990 ). The third approach required raters to judge performances and verbalize their rating process to identify features that affected their judgments (Sato & McNamara, 2019 ; Cumming et al., 2001 ). The first and second approaches identify features that unconsciously influence raters’ judgments (e.g., McNamara, 1990 ). However, they do not consider the influence of other factors. The third approach compensates for this limitation. Nonetheless, analyzing verbal protocols may not be sufficient because raters’ reports may not accurately represent the actual factors that affected their judgments (Gass & Mackey, 2017 ).

These studies have had important implications for the development of rating scales for oral fluency as well as overall speaking and writing proficiency. However, raters’ judgments have not yet been investigated in the context of assessing L2 learners’ CT through argumentative essay writing.

Theoretical background

Rating scales are tools composed of criteria for assessing test-takers’ performance. Consequently, it is important to shortlist the criteria to be included in a scale by operationally defining the target construct and specifying its constituents. In general, a theory explicating the target construct is an important frame of reference for operationally defining it (Bachman & Palmer, 2010 ). However, there is no agreed theory of writing explaining the construct of L2 writing itself (Knoch, 2022 ), and none of the CT assessment models was developed specifically for L2 writing on the basis of research (e.g., Paul & Elder, 2014 ). In this context, empirically identified components of CT contribute to conceptualizing it for L2 writing assessment and can be included as features in a rating scale.

This study aimed to identify the features of argumentative essays that allow for the assessment of CT by investigating how readers judge the writers’ CT skills. More specifically, the study addresses the following research questions (RQs):

RQ1: What is the relationship between rater judgments of students’ CT skills and their ratings using the assessment criteria for L2 writing proficiency?

RQ2: What essay features do raters consider when judging students’ CT skills?

The second and third approaches (see the "Conceptualizing Constructs for the Development of a Rating Scale" section) were applied to answer RQs 1 and 2, respectively. RQ1 is concerned with whether raters’ intuitive judgments of writers’ CT skills correspond to any of the criteria used to measure L2 writing proficiency. Additionally, to explore influential features beyond the criteria uncovered by RQ1, a verbal protocol analysis was employed for RQ2. This study therefore aims to conceptualize CT by combining the two research approaches and compensating for their respective limitations.

Methodology

The present study is exploratory research employing inductive reasoning, as its purpose is to identify the relevant and salient CT features of argumentative essays without applying existing CT theoretical frameworks. This study implemented a convergent mixed-methods research design, which involves the collection and analysis of both quantitative and qualitative data to merge the results (Creswell, 2015 ). More specifically, the scores for students’ argumentative essays awarded by five raters were analyzed to examine the relationships among the assessment criteria. Additionally, the raters’ verbal protocol data were analyzed to reveal the essay features that influenced the raters’ judgments of the students’ CT skills.

Participants

Eighty-nine first- and second-year university students who attended CLIL courses participated in this study and took both the pre- and post-tests (see the "Data Collection Instrument" section). They were drawn from two elementary, three lower-intermediate, and two upper-intermediate English courses. Based on their Test of English for Academic Purposes scores (a placement test), the students’ proficiency levels were roughly equivalent to levels A2 to B1 of the Common European Framework of Reference (CEFR) (Council of Europe, 2001 ). In the CLIL courses, the students were taught an academic subject selected by each instructor (e.g., Japanese culture and world Englishes) and encouraged to use the four English language skills (reading, listening, speaking, and writing). CT development was an explicit aim stated in the course syllabus, although the degree to which it was covered in class depended on each instructor. Moreover, details about CT—including its definition and assessment—were not included in the syllabus. The students signed an informed consent form agreeing to have their essays used for research purposes.

Five native English speakers with work experience as examiners of the International English Language Testing System (IELTS) participated in the study as raters. Table  1 shows their background information. All raters had a Master’s degree and at least 16 years’ experience in English language teaching and 6 years’ experience as IELTS examiners. They were chosen because this research examined the effectiveness of the CLIL program using the IELTS rating scale.

Data collection instrument

This study used students’ performance data derived from a course evaluation project that examined the effectiveness of a CLIL program offered at a private Japanese university. Students of seven 28-class (14-week) CLIL courses underwent identical speaking and writing tests on two occasions: during the 2nd/3rd class (pre-test) and the 27th class (post-test). This study then used the students’ essays in the pre- and post-writing tests and their subsequent ratings in its analysis.

Pre- and post-tests to measure students’ productive skills were developed for the course evaluation. A timed, independent writing task was developed by the researcher and a colleague, in which students were instructed to write an essay answering the following prompt: "What motivates students to study their subject at university? Give specific details and examples to explain your answer." They were asked to write approximately 300 words in 30 min using either a computer or pen and paper. This is a causal argumentative essay task in which students are required to speculate on the possible causes of a given phenomenon (Ramage et al., 2015 ). The task was considered suitable for eliciting students’ CT skills because it involves critiquing and refining one’s reasoning while formulating arguments. Moreover, a similar writing task has also been employed by some well-known English proficiency tests (e.g., IELTS and the Test of English as a Foreign Language) to assess L2 learners’ argumentation (Hirvela, 2017 ). The topic was selected because it was assumed that students would not need any specialized background knowledge to respond to it, but would instead be able to use their creativity and personal examples to construct their argument.

Data collection procedure

Students were informed that the test’s purpose was to examine improvements in their productive language skills following a one-semester CLIL course. They were also told that their test results would not affect their grades in this course. However, they were not informed that their CT skills would be assessed through the tests.

The handwritten essays were typed in Microsoft Word, and the same formatting style was applied to all of them, including those typed by the students themselves (Times New Roman, 12-point, single-spaced). Then, the two essays produced by each of 70 students were collected, yielding a total of 140 essays (70 each from the pre- and post-tests). These students were chosen from among those who wrote more than 120 words, as it would have been difficult to assess multiple linguistic features and CT skills in shorter essays. Furthermore, the sample comprised approximately equal numbers of students randomly selected from each of the three English course levels (elementary: n = 23, lower-intermediate: n = 24, and upper-intermediate: n = 23). The essays ranged between 121 and 356 words, with an average word count of 205.6.

Next, the five raters were given the 140 essays for assessment. Each essay was scored by two to three of the five raters, with the connectivity required for a Rasch analysis being established. Each rater was requested to assess 60 essays. The five raters were asked to rate the essays using the IELTS Task 2 Writing band descriptors (public version) (British Council, n.d. ), which consist of four criteria: Task Achievement, Coherence and Cohesion, Lexical Resource, and Grammatical Range and Accuracy. The IELTS scale was used because it includes a wide range of writing proficiency components with detailed descriptors. Additionally, as Plakans and Gebril ( 2017 ) claim, argumentation could be measured by these criteria. Although the IELTS rating scale includes 10 levels (0–9), this study used only six (1–6) because the students’ proficiency was not high enough for them to obtain scores of seven or higher.
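
The "connectivity" (linking) requirement means that the raters must share enough common essays for all scores to be placed on a single measurement scale in the many-facet Rasch analysis. The exact allocation used in the study is not reported; the Python sketch below, with invented numbers (two raters per essay rather than two to three), merely illustrates one way to rotate essays through rater pairs and to check that the resulting rater network is connected.

```python
from itertools import combinations, cycle

RATERS = ["R1", "R2", "R3", "R4", "R5"]
N_ESSAYS = 140

# Rotate essays through all 10 rater pairs so that every rater is linked,
# directly or indirectly, to every other rater (a "connected" design).
pair_cycle = cycle(combinations(RATERS, 2))
assignment = {essay: set(next(pair_cycle)) for essay in range(N_ESSAYS)}

def is_connected(assignment, raters):
    """Treat raters as nodes that are joined whenever they score a common essay."""
    linked = {raters[0]}
    changed = True
    while changed:
        changed = False
        for shared in assignment.values():
            if shared & linked and not shared <= linked:
                linked |= shared
                changed = True
    return linked == set(raters)

print(is_connected(assignment, RATERS))                                # True
print({r: sum(r in s for s in assignment.values()) for r in RATERS})
# Each rater scores 56 essays under this illustrative two-rater design,
# close to (but not identical with) the 60 essays per rater in the study.
```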

In addition to assessing the essays according to the IELTS criteria, the raters were also asked to judge the level of each student’s CT skills. The following two definitions of CT from online English dictionaries were provided:

“The objective analysis and evaluation of an issue in order to form a judgement” (Oxford Dictionary).

“The process of thinking carefully about a subject or idea, without allowing feelings or opinions to affect you” (Cambridge Dictionary).

These definitions include the key elements of CT addressed in L2 pedagogies, such as careful thinking (reflective thinking), objective analysis and evaluation (rational thinking), and judgment formation (reasonable thinking) (Dummett & Hughes, 2019 ). Definitions from English dictionaries, rather than those found in the CT literature, were provided because they are concise and easy to understand. The raters scored the students’ CT skills using a six-level semantic differential scale ranging from one “Poor” to six “Excellent” with unspecified midpoints. Descriptors and training in using the scale were not provided because this study’s main aim was to investigate raters’ interpretation of students’ CT without the influence of any pre-existing rating scale. The raters were informed that this criterion was adopted for research purposes rather than for course evaluation.

Each rater performed a retrospective written verbal protocol (Gass & Mackey, 2017 ) by writing comments on eight to 10 randomly selected essays. The raters were asked to indicate which part of the students’ essays influenced the judgments of their CT skills and explain how these identified portions influenced the scores assigned using Microsoft Word’s comment function immediately after scoring the essays. In total, 103 comments were given to 18 essays. Unfortunately, a written verbal protocol could undermine the validity of the reports because the raters would be able to write about things that they did not think about while actually rating students’ CT skills. However, an oral concurrent verbal protocol could not be adopted because the raters concurrently scored other features of the essays, whereas the focus of this study was only on CT.

Data analysis

The scores given to the 140 essays were statistically analyzed to answer RQ1. First, rater reliability was confirmed using the many-facet Rasch measurement. The pre- and post-test data were analyzed separately using the FACETS 3.83.0 software (Linacre, 2019 ). Rater infit mean-square values, which indicate rater reliability, were within the acceptable range (0.7–1.3). This suggests that all the raters scored the students’ argumentative essays consistently using the IELTS band descriptors and the scale for assessing CT skills. Second, a multiple regression (MR) was conducted using the raw scores to examine the relative importance of the four IELTS criteria (the predictor variables) in predicting the raters’ judgments of students’ CT skills (the outcome variable). As there was no hypothesis about the relative strength of the predictors, a standard MR was performed with all predictors entered simultaneously. The pre- and post-test data were analyzed separately using SPSS Statistics version 26. The assumptions for the MR (the number of data cases, multicollinearity, normality, linearity, and homoscedasticity) were examined. This study had 150 data points each in the pre- and post-tests, which exceeded the required 15 cases of data per predictor (Field, 2018 ). As the variance inflation factor values ranged from 1.89 to 2.74, staying well below the conventional threshold of 10, multicollinearity was not present among the predictor variables. The last three assumptions were examined using a scatterplot of residuals, a histogram, and a P-P plot. All the assumptions were satisfied except for normally distributed errors in both the pre- and post-tests. However, the violation of this assumption is not of great concern given the amount of data in this study (Field, 2018 ).
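
For readers who want to reproduce this type of analysis outside SPSS, the following minimal Python sketch runs a standard (simultaneous-entry) multiple regression with the same assumption checks: variance inflation factors for multicollinearity and residual plots for linearity, homoscedasticity, and normality. The file name and column names are placeholders rather than the study’s actual data, and standardized betas would additionally require z-scoring the variables before fitting.

```python
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder data: one row per rater-by-essay rating.
df = pd.read_csv("pretest_ratings.csv")
predictors = ["task_achievement", "coherence_cohesion",
              "lexical_resource", "grammatical_range"]
X = sm.add_constant(df[predictors])
y = df["ct_judgment"]

# Standard (simultaneous) entry: all four IELTS criteria predict the CT rating.
model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, p-values, R-squared

# Multicollinearity check: VIF values well below 10 indicate no problem.
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))

# Residual diagnostics for linearity, homoscedasticity, and normality.
residuals = model.resid
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(model.fittedvalues, residuals)  # residuals vs. fitted values
axes[1].hist(residuals, bins=20)                # distribution of residuals
sm.qqplot(residuals, line="45")                 # normal probability plot
plt.show()
```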

The raters’ verbal protocol data were then analyzed to answer RQ2. Thematic analysis, which involves identifying themes within the data (Braun & Clarke, 2013 ), was carried out to identify the features of students’ argumentative essays affecting their CT scores as assessed by the raters. First, each of the raters’ comments was read to generate initial codes that grouped similar concepts. Second, coding categories, based on the generated codes, were developed, with all comments being sorted into the developed categories using NVivo 11. Third, the coding categories were reexamined and collated to identify any overarching themes. Therefore, the analysis was inductive, with the identified themes being linked to the data. Finally, a PhD student in applied linguistics was asked to code 30% of the data to ensure inter-coder reliability. The kappa coefficient was 0.71, demonstrating adequate agreement. Disagreements were resolved through subsequent discussions, and the categories were finalized.
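
As a rough illustration of the inter-coder reliability check, the short Python sketch below computes Cohen’s kappa for two coders’ category assignments. The comment codes shown are invented for illustration only; they are not the study’s data, and a kappa of about 0.70 or above is conventionally read as adequate agreement.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical category codes assigned independently to ten rater comments by
# the researcher and the second coder (the study double-coded 30% of the data).
researcher = ["Relevancy", "Logicality", "Logicality", "Quality of Ideas",
              "Content Development", "Relevancy", "Other Features",
              "Quality of Ideas", "Logicality", "Content Development"]
second_coder = ["Relevancy", "Logicality", "Content Development", "Quality of Ideas",
                "Content Development", "Relevancy", "Other Features",
                "Quality of Ideas", "Logicality", "Relevancy"]

# Cohen's kappa corrects raw percentage agreement for chance agreement.
kappa = cohen_kappa_score(researcher, second_coder)
print(round(kappa, 2))
```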

Results

This section presents the results of the MR answering RQ1 (identifying the relationship between rater judgments of students’ CT skills and their ratings of other criteria for L2 writing proficiency) and the thematic analysis answering RQ2 (exploring the essay features that the raters consider while judging students’ CT skills).

Relationship between CT and other criteria

Tables 2 and 3 present the descriptive statistics of the scores on each criterion and the results of the MR analyses, respectively. The regression results indicate that the Task Achievement scores made the largest contribution to the raters’ intuitive judgments of CT skills in both the pre- and post-tests (βs = .39 and .30, ps = .000 and .002, respectively). Additionally, the contribution of Lexical Resource was statistically significant and was the second largest predictor in the post-test (β = .29, p = .006). In contrast, the other criteria made minor, nonsignificant contributions (ps > .05). Overall, the four IELTS criteria explained 47% and 33% of the variance in CT skills in the pre- and post-tests, respectively, suggesting that elements other than the IELTS criteria also explain rater judgments of CT. To summarize, the raters’ judgments of participants’ CT skills were explained most strongly by their Task Achievement scores, followed by their Lexical Resource scores.

Influential features on rater judgments of CT

Table 4 shows the findings of the thematic analysis of the comments written by the raters. The following five features, representing the criteria the raters used to judge the students’ CT skills, were identified: Relevancy, Content Development, Logicality, Quality of Ideas, and Other Features. The first category, Relevancy, concerned whether the written ideas addressed the given question (What motivates students to study their subject at the university?). The raters positively evaluated essays that maintained their focus on the question and negatively judged content that deviated from it. Second, Content Development referred to how deeply students discussed their ideas by including supporting details and examples. Essays with a sufficient amount of detail, examples, and ideas were considered to display high CT skills. Third, the raters noted the logicality of the arguments, the links between written ideas, and coherence. The raters considered that high CT skills were demonstrated by logical connections among ideas, especially the link between the writers’ main claim and supporting evidence. Fourth, the raters evaluated the quality of ideas, focusing on the validity and originality of ideas as well as on the breadth of perspective presented in the essays. Students discussing well-thought-out and original ideas from multiple points of view were regarded as possessing high CT skills, whereas those presenting poor and commonplace thoughts based only on their own personal experience were evaluated otherwise. Finally, linguistic accuracy and miscellaneous features were categorized as Other Features.

Three raters (R1, R2, and R3) made some comments on linguistic errors found in the essays. For example, R3 pointed to linguistic accuracy by saying, "Despite the inaccuracies in language and grammar, the student is able to present a weak link between motivation and being able to pursue one’s own interests." However, linguistic features were not regarded as an independent factor influencing rater judgments of students’ CT skills. First, comments on linguistic errors were not prevalent within the protocol data (5.2% of all the comments). Second, half of the comments on linguistic errors appeared in "although" clauses or "despite" phrases, as in the example above, suggesting that the influence of linguistic features may be weaker than that of the other features presented in Table 4 .

To illustrate the essay features that affected raters’ judgments of students’ CT skills, three essays and the corresponding raters’ feedback are presented in Tables 5 , 6 , and 7 . These essays received positive, mixed (positive and negative), and negative comments, respectively. Additionally, they displayed a wide range of the features presented in Table 4 .

Table  5 presents the body paragraph of the essay written by Student 15 in the pre-test. The essay was rated by R1, R3, and R5, and the scores for CT skills given by the raters were 5, 4, and 3.5, respectively. The body paragraph contained two factors that motivate students to study their subjects at the university and supporting details. Comments 1 to 3 were given to the first factor, while Comments 4 to 6 were given to the second factor.

In the essay, Student 15 argues that curiosity motivates students to study at the university and explains this by providing a negative case in which students are not curious about math, which eventually leads to less concentration, interest, and motivation. In the latter half of the paragraph, she points out that positive feelings toward learning motivate students to study and presents a concrete example of how learning leads to more interest in the subjects. Overall, the raters’ comments on Student 15’s CT skills were positive. The raters appeared to perceive that both factors were supported by logical reasoning and relevant examples, which positively contributed to their judgments of her CT skills. For example, a chain of reasoning explaining why curiosity is important (from the second to the fifth sentence) was perceived as logical and connected to motivation for studying. The second argument (positive feelings toward learning) was also judged to be connected to motivation by R3 and R5. Furthermore, Comment 6 made by R5 indicates that the originality of the idea was part of CT from the raters’ perspective. The example of learning about Chinese literature was considered original and evaluated positively. Simply presenting commonplace arguments and supporting details may give the impression that students did not consider the given question carefully.

Table  6 presents the entire essay written by Student 69 in the post-test. The essay was rated by R1, R3, and R5, and the scores for CT skills given by the raters were 3, 3.5, and 3, respectively. It contained two factors that motivate students to study at the university and supporting details. Comments 1 to 3 were given to the first factor, while Comments 4 to 6 were given to the second factor.

In the essay, Student 69 argues that students’ dreams and Grade Point Averages (GPAs) motivate them to study at the university. First, she claims that university students’ dreams motivate them to study, providing a personal example in which she is able to study science hard because being a scientist is her dream for the future. Second, the student mentions that GPA is an incentive to study, as students cannot study at the laboratory they wish to join if they have a low GPA. The raters acknowledged that her arguments successfully addressed the question (Comments 1 and 3). However, they commented negatively that the supporting evidence was based primarily on the student’s personal experience and therefore considered it egocentric. Although the support for the second factor was judged positively by R5 (as the student explains how a low GPA affects all university students and not only herself), R3 commented that the argument should have been supported with more logical reasoning. The inclusion of her personal feeling ("If I could not enter the room which we are not interesting in, I’m very sad and I can’t do my best.") may have made the second factor sound less logical and coherent. Finally, R1 wrote that the essay does not display the student’s CT skills (Comment 6). In a different essay, he also noted: "Perhaps it is difficult to show great CT skills with this task, as they are not really analyzing a text or doing any research." This suggests that R1 believes that timed independent essay writing cannot appropriately elicit a writer’s CT skills.

Table  7 presents the body paragraphs in the essay written by Student 50 in the pre-test. The essay was rated by R2 and R4, and the scores for CT skills given by the raters were 2 and 3, respectively. The comments refer to the entire essay.

In his essay, Student 50 discusses three points: a personal dream or plan for the future, a plan to travel abroad, and a desire to make friends. However, he fails to explain clearly and explicitly how these three points motivate students to study. R2 perceived that the points were not relevant to the question and judged that the student had not engaged with the topic appropriately. R2 also mentioned that it was difficult to rate the student’s CT skills because of the irrelevant opinions presented. Although his arguments are based solely on his personal experience, as in Student 69’s essay, this feature was not mentioned by the raters. Furthermore, R4 commented that the three points raised by the student are basic and universal, which influenced his rating of the student’s CT skills. As discussed above, the presentation of universal opinions may negatively affect raters’ impressions of a writer’s CT skills. Nevertheless, it was not clear how the raters judged the extent to which written thoughts were universal or original.

Discussion

RQ1 asked: "What is the relationship between rater judgments of students’ CT skills and their ratings on the criteria used to measure L2 writing proficiency?" The results indicate that raters’ judgments of students’ CT skills are most strongly explained by Task Achievement scores, although Lexical Resource scores were another significant predictor in the post-test.

Task Achievement measures how adequately a student addresses all parts of the task, presents their position, and develops their main ideas with relevant details (British Council, n.d. ). Therefore, in the argumentative essay task used in this study, this criterion concerned the extent to which students adequately answered the prompt and supported their answers with relevant and specific details as well as examples. As some of the literature indicates, these elements are related to CT. Specifically, they are equivalent to two intellectual standards proposed by Paul and Elder ( 2014 ): relevance (how well the idea is connected to the question) and clarity (how well the idea is explained and elaborated). Moreover, Task Achievement appears to involve some aspects recognized in Stapleton’s ( 2001 ) scheme: the presence of arguments (opinions and their reasons) and supporting evidence. This finding suggests that raters’ judgments of writers’ CT skills may be influenced more by the content of the argument than by how it is presented, even in learners’ essays that include linguistic errors.

This study found a weak relationship between CT skills and the linguistic features displayed in the participants’ essays, suggesting that high linguistic quality does not guarantee positive judgments of CT skills from readers. This supports the claim made by de Chazal ( 2014 ) that language proficiency is not a predictor of CT ability. However, using diverse and accurate vocabulary, as measured by the Lexical Resource criterion, may give readers a better impression of one’s CT skills. This may be because diverse vocabulary supports the development of ideas, which was judged to be a relevant element of CT. Additionally, vocabulary errors in the L2 students’ essays might have undermined the clarity and comprehensibility of the content. As clarity of statement is fundamental in the sense that other elements cannot be evaluated unless the content is written clearly (Nosich, 2022 ; Paul & Elder, 2014 ), vocabulary use that affects the accurate conveyance of messages could be a linguistic feature relevant to CT skills, especially in L2 argumentative writing. However, it remains unknown why Lexical Resource scores were not a significant predictor of CT in the pre-test.

To further examine the essay features that contributed to raters’ judgments of CT, RQ2 asked, "What essay features do raters consider when judging students’ CT skills?" The analysis of the protocol data revealed five categories: relevancy to the question, content development, logicality, quality of ideas, and other features. The first two categories support the results of RQ1 and align with the concepts of relevance and clarity in Paul and Elder’s ( 2014 ) criteria. The scarcity of comments on linguistic features also partially concurs with the outcome of RQ1. Overall, the raters seemed to construe the CT skills displayed through the writing task as argumentation skills, as emphasized in the critical thinking movement (Davies & Barnett, 2015 ) and the CT literature (e.g., Cottrell, 2017 ; Fisher, 2011 ; Nosich, 2022 ).

Rater judgments of CT skills also included elements that were not addressed by the Task Achievement criterion: logicality and quality of ideas (see Table 4 ). Comments on logicality (logical structure, connections between ideas, and reasons supporting claims) showed that the raters focused on the logical reasoning supporting students’ claims and on fallacies, both of which are regarded as assessment criteria for arguments and CT skills (Paul & Elder, 2014 ; Stapleton, 2001 ). This is partly addressed by the Coherence and Cohesion criterion in the IELTS rating scale (logical sequencing of information and ideas) (British Council, n.d. ). Logicality was considered an essential component of CT in the argumentative writing task, in which "an author states a claim, uses some form of evidence—data, reasons, examples, etc.—to support the claim, and shows how the evidence supports the claim" (Hirvela & Belcher, 2021 , p. 1). The central purpose of the writing task could have drawn the raters’ attention to logicality. Moreover, the raters might have applied their critical reading skills, which involve appraising the degree to which the students adequately justified their opinions (Wallace & Wray, 2021 ).

The quality of ideas was primarily related to the range of perspectives displayed in the essay and their originality. First, supporting a claim by simply citing personal experience was judged negatively and regarded as egocentric (see Table 6 ). In contrast, the raters positively evaluated writers who explained how a certain factor motivates university students in general, not only themselves, to study at university. This suggests that the type of evidence used influences rater judgments of a writer’s CT, and anecdotal evidence can be perceived as less persuasive than other types, including causal evidence (Hornikx & Hoeken, 2007 ). Additionally, supporting claims via personal experience can be perceived as failing to consider the question from other perspectives. This may negatively influence judgments of a writer’s CT skills, as engaging in broader thinking by seeing situations from different perspectives has been identified as a key component of CT (Chaffee, 2019 ; Nosich, 2022 ; Paul & Elder, 2014 ). Second, the raters judged original thoughts positively but evaluated common and universal ideas negatively. This is related to a disposition of critical thinkers known as intellectual autonomy (Paul & Elder, 2014 ), which entails having authorship of one’s own thoughts rather than simply accepting or borrowing those of others. The raters’ focus on originality of thought resonates with a conceptualization of CT given by academics in history, philosophy, and literary/cultural studies in Moore’s ( 2013 ) study. In particular, originality may be relevant to tasks involving creating or producing ideas (Anderson & Krathwohl, 2001 ), including the argumentative writing used in this study.

It was found that the raters neither mentioned all of the features recognized in the literature nor focused on the same features in essays written by different students. For example, the raters seldom commented on the accuracy of supporting evidence (Paul & Elder, 2014 ) and did not comment on the breadth of thinking or the inclusion of counterarguments in the essays (Stapleton, 2001 ; Washington State University Center for Teaching, Learning, and Technology, 2009 ). This may be because only a few students supported their ideas using evidence other than personal experience or included counterarguments that challenged their own points. It suggests that the relevant and salient CT criteria depend on the writing task (e.g., independent or integrated). In this study, the raters’ foci might have been narrower than those in theoretical models because the students were required to write an argument, without any external resources, in a short period of time. Furthermore, the raters focused on relevancy to the question in essays written by students who obtained low CT scores (Table 7 ) and focused more on logicality and the quality of ideas in essays written by those who obtained medium to high CT scores (Table 5 ). This suggests that the essay features influencing raters’ judgments depend not only on the CT features displayed in the essays, but also on the students’ overall CT level.

Conclusion

This study investigated five raters’ judgments of students’ CT skills through reading and rating their argumentative essays, thereby revealing the features of the essays that contributed to their judgments. The results indicate that the raters’ intuitive perceptions of students’ CT skills were linked to the Task Achievement and, partly, Lexical Resource criteria in the IELTS (British Council, n.d. ). Additionally, the raters’ written comments revealed that their judgments of the writers’ CT skills were affected by the essay’s relevancy to the question, content development, logicality, and quality of ideas.

The findings of this study help to delineate the CT skills addressed in L2 pedagogies so that they can be assessed through argumentative essays. In particular, test developers and teachers who wish to assess test-takers’ CT skills could incorporate the identified features into their assessment criteria. Since there are distinct elements pertinent to CT skills, it is possible to address them in different criteria for assessing essays. If a pre-established rating scale needs to be adopted because of practical constraints, the Task Achievement category in the IELTS rating scale (British Council, n.d. ), which addresses relevancy and content development, can be a viable option, as its scores predict students’ CT skills to some degree. However, it is recommended to incorporate the other features identified by this study into rating scales for a more precise measurement of CT skills, because positive judgments from raters are likely to require more than what the Task Achievement category covers, such as logicality, range of perspectives (or types of evidence), and originality of ideas. Logicality can be assessed through the category of coherence, which typically focuses on progression of ideas and logical sequencing (Knoch, 2007 ). Although this category already entails some components of logicality identified in the study, it could explicitly mention the connection between a claim and its supporting evidence to measure CT more precisely. The quality of ideas and originality could be evaluated through the category of content. While some existing content categories address relevancy and content development (e.g., Jacobs et al., 1981 ), the creativity of ideas and their ability to create interest can also be addressed in the criterion (see Bae et al., 2016 ), although the assessment of these features is highly subjective.

It is also possible to create a single, independent assessment criterion for CT that includes all the features identified by the study. This option is beneficial because the same features are not necessarily relevant and salient across different proficiency levels. This study indicated that the raters focused on relevancy to the question in essays written by students who obtained low CT scores and focused more on logicality and quality of ideas in essays written by those who obtained medium to high CT scores. This finding suggests that descriptors for low CT levels should focus on the essay’s relevancy to the topic, while descriptors for higher levels should address the logicality and originality of the content.

This study’s findings also have some implications for classroom-based assessment activities, especially self- and peer-assessment (Brown & Abeywickrama, 2019 ). In EAP and CLIL classrooms, which emphasize cultivating CT skills, self- and peer-assessment is recognized as a useful activity for improving students’ task performance (e.g., Coyle et al., 2010 ; Ferris, 2018 ). The identified features can be incorporated into checklists, which convey the construct to students in a simple manner and are suitable for use in self- and peer-assessment (Green, 2021 ). For instance, after reading their own or peers’ argumentative essays, students could be asked to respond to statements such as "The essay discusses the given prompt without any irrelevant piece of content" or "The essay includes opinions from various perspectives" by indicating yes or no. Can-do statements can also be created based on the study’s findings for self-assessment, such as "I can give sufficient examples supporting my opinion" or "I can connect ideas logically." In so doing, students can grasp the elusive concept of CT and recognize the characteristics of highly evaluated essays without studying the definition of CT itself in depth.

This study has some limitations. First, it collected data from only five native English speakers who have worked as IELTS examiners. Because of this small sample size, it is difficult to generalize the findings to broader contexts, including readers with different backgrounds. Moreover, the raters did not necessarily have a deep understanding of CT and exhibited rater bias in their ratings. Second, this study collected data from students with relatively low English proficiency levels. Students with higher English proficiency may display a wider range of features related to CT skills, including reflecting on their own supporting evidence (Chason et al., 2017 ). Third, written verbal protocol data may suffer from non-veridicality, such that the results may have included features that the raters did not actually consider while rating or may not have comprehensively covered all of the features that they considered (Gass & Mackey, 2017 ). Post-marking interviews should have been conducted to triangulate the findings. Fourth, the assigned essay topic and writing task also had some limitations. Whether the students were able to demonstrate their CT skills through the topic was not determined; in other words, the topic’s appropriateness was not examined. Lastly, only one type of writing assignment (timed, independent, causal argumentative essay writing) was employed to examine the relevant criteria for measuring CT skills. As noted by R1, the task adopted in this study did not involve any analysis or research, meaning that it may not always be suitable for assessing CT. Using other types of essays, such as source-based argumentative writing (Plakans & Ohta, 2021 ), might reveal different dimensions of CT; that is, different essay features might be found to be relevant to raters’ judgments of CT skills.

Therefore, further research is recommended to investigate the way in which CT skills are related to various essay writing tasks, including integrated writing tasks or research projects, which can be done by examining rater judgments and collecting data that compensate for limitations of verbal protocols (e.g., interviews). Such research will help in revealing the elusive concept of CT skills in L2 pedagogies.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

CEFR: Common European Framework of Reference

CLIL: Content and Language Integrated Learning

CT: Critical thinking

EAP: English for academic purposes

GPA: Grade point average

IELTS: International English Language Testing System

L2: Second language

MR: Multiple regression

RQ: Research question

WSU: Washington State University

Anderson, L., & Krathwohl, D. (Eds.) (2001). A taxonomy for learning, teaching, and assessing: a revision of Bloom’s taxonomy of educational objectives . London: Pearson.

Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: developing language assessments and justifying their use in the real world . Oxford: Oxford University Press.

Bae, J., Bentler, P. M., & Lee, Y.-S. (2016). On the role of content in writing assessment. Language Assessment Quarterly , 13 (4), 302–328. https://doi.org/10.1080/15434303.2016.1246552 .


Bosker, H. R., Pinget, A., Quené, H., Sanders, T., & de Jong, N. H. (2013). What makes speech sound fluent? The contributions of pauses, speed and repairs. Language Testing , 30 (2), 159–175. https://doi.org/10.1177/0265532212455394 .

Braun, V., & Clarke, V. (2013). Successful qualitative research: a practical guide for beginners . London: Sage.

British Council. (n.d.). IELTS task 2 writing band descriptors (public version) . Retrieved October 1, 2018, from https://takeielts.britishcouncil.org/sites/default/files/ielts_task_2_writing_band_descriptors.pdf#search=%27IELTS+TASK+2+Writing+band+descriptors+%28public+version%29%27 .

Brown, H. D., & Abeywickrama, P. (2019). Language assessment: principles and classroom practices , (3rd ed.). Hoboken: Pearson.

Chaffee, J. (2019). Thinking critically , (12th ed.). Boston: Cengage Learning.

Chason, L., Loyet, D., Sorenson, L., & Stoops, A. (2017). An approach for embedding critical thinking in second language paragraph writing. TESOL Journal , 8 (3), 582–612. https://doi.org/10.1002/tesj.288 .

Cottrell, S. (2017). Critical thinking skills: effective analysis, argument and reflection , (3rd ed.). London: Red Globe Press.

Council of Europe (2001). Common European Framework of Reference for languages: learning, teaching, assessment . Cambridge: Cambridge University Press.

Coyle, D., Hood, P., & Marsh, D. (2010). CLIL: Content and language integrated learning . Cambridge: Cambridge University Press.

Creswell, J. W. (2015). A concise introduction to mixed methods research . Thousand Oaks: Sage.

Cumming, A. H., Kantor, R., & Powers, D. E. (2001). Scoring TOEFL essays and TOEFL 2000 prototype writing tasks: an investigation into raters’ decision making and development of a preliminary analytic framework. (TOEFL Monograph No. MS-22) . Princeton: Educational Testing Service.


Davies, M., & Barnett, R. (2015). Introduction. In M. Davies, & R. Barnett (Eds.), The Palgrave handbook of critical thinking in higher education , (pp. 1–25). New York: Palgrave Macmillan.

de Chazal, E. (2014). English for academic purposes . Oxford: Oxford University Press.

Dummett, P., & Hughes, J. (2019). Critical thinking in ELT: a working model for the classroom . Boston: National Geographic Learning.

Ennis, R. (2011). Critical thinking: reflection and perspective Part I. Inquiry: Critical Thinking Across the Disciplines , 26 (1), 4–18. https://doi.org/10.5840/inquiryctnews20112613 .

Ferris, D. R. (2018). Writing instruction and assessment: activities, feedback, and options. In J. M. Newton, D. R. Ferris, C. C. M. Goh, W. Grabe, F. L. Stoller, & L. Vandergrift (Eds.), Teaching English to second language learners in academic contexts: reading, writing, listening, and speaking , (pp. 106–122). New York: Routledge.

Field, A. (2018). Discovering statistics using IBM SPSS statistics , (5th ed.). London: Sage.

Fisher, A. (2011). Critical thinking: an introduction , (2nd ed.). Cambridge: Cambridge University Press.

Gass, S. M., & Mackey, A. (2017). Stimulated recall methodology in applied linguistics and L2 research , (2nd ed.). New York: Routledge.

Green, A. (2021). Exploring language assessment and testing: language in action , (2nd ed.). Oxon: Routledge.

Hirvela, A. (2017). Argumentation and second language writing: are we missing the boat? Journal of Second Language Writing , 36 , 69–74. https://doi.org/10.1016/j.jslw.2017.05.002 .

Hirvela, A., & Belcher, D. (2021). Introduction. In A. Hirvela, & D. Belcher (Eds.), Argumentative writing in a second language: perspectives on research and pedagogy , (pp. 1–9). Ann Arbor: University of Michigan Press.

Hornikx, J., & Hoeken, H. (2007). Cultural differences in the persuasiveness of evidence types and evidence quality. Communication Monographs , 74 (4), 443–463. https://doi.org/10.1080/03637750701716578 .

Jacobs, H., Zinkgraf, S., Wormuth, D., Hartfiel, V., & Hughey, J. (1981). Testing ESL composition: a practical approach . Rowley: Newbury House.

Knoch, U. (2007). ‘Little coherence, considerable strain for reader’: a comparison between two rating scales for the assessment of coherence. Assessing Writing , 12 (2), 108–128. https://doi.org/10.1016/j.asw.2007.07.002 .

Knoch, U. (2022). Assessing writing. In G. Fulcher, & L. Harding (Eds.), The Routledge handbook of language testing , (2nd ed., pp. 236–253). Oxon: Routledge.

Linacre, J. M. (2019). Facets computer program for many-facet Rasch measurement, version 3.83.0 [Computer Software] . Winsteps.com .

McNamara, T. F. (1990). Item Response Theory and the validation of an ESP test for health professionals. Language Testing , 7 (1), 52–75. https://doi.org/10.1177/026553229000700105 .

McNamara, T. F. (1996). Measuring second language performance . London: Pearson.

Mehisto, P., & Ting, T. (2017). CLIL essentials for secondary school teachers . Cambridge: Cambridge University Press.

Moore, T. (2013). Critical thinking: seven definitions in search of a concept. Studies in Higher Education , 38 (4), 506–522. https://doi.org/10.1080/03075079.2011.586995 .

Morozov, A. (2011). Student attitudes toward the assessment criteria in writing-intensive college courses. Assessing Writing , 16 (1), 6–31. https://doi.org/10.1016/j.asw.2010.09.001 .

Nosich, G. (2022). Critical writing: a guide to writing a paper using the concepts and processes of critical thinking . London: Rowman and Littlefield.

Paul, R., & Elder, L. (2014). Critical thinking: tools for taking charge of your learning and your life , (2nd ed.). Lanham: Rowman and Littlefield.

Plakans, L., & Gebril, A. (2017). An assessment perspective on argumentation in writing. Journal of Second Language Writing , 36 , 85–86. https://doi.org/10.1016/j.jslw.2017.05.008 .

Plakans, L., & Ohta, R. (2021). Source-based argumentative writing assessment. In A. Hirvela, & D. Belcher (Eds.), Argumentative writing in a second language: perspectives on research and pedagogy , (pp. 64–81). Ann Arbor: University of Michigan Press.

Ramage, J. D., Bean, J. C., & Johnson, J. (2015). Writing arguments: a rhetoric with readings , (10th ed.). Boston: Pearson.

Saito, K., Trofimovich, P., & Isaacs, T. (2017). Using listener judgments to investigate linguistic influences on L2 comprehensibility and accentedness: a validation and generalization study. Applied Linguistics , 38 (4), 439–462. https://doi.org/10.1093/applin/amv047 .

Sato, T. (2012). The contribution of test-takers’ speech content to scores on an English oral proficiency test. Language Testing, 29 (2), 223–241. https://doi.org/10.1177/0265532211421162 .

Sato, T., & McNamara, T. (2019). What counts in second language oral communication ability? The perspective of linguistic laypersons. Applied Linguistics, 40 (6), 894–916. https://doi.org/10.1093/applin/amy032 .

Stapleton, P. (2001). Assessing critical thinking in the writing of Japanese university students: insights about assumptions and content familiarity. Written Communication , 18 (4), 506–548. https://doi.org/10.1177/0741088301018004004 .

Tanaka, J., & Gilliland, B. (2017). Critical thinking instruction in English for academic purposes writing courses: a dialectical thinking approach. TESOL Journal , 8 (3), 657–674. https://doi.org/10.1002/tesj.291 .

Wade, C. (1995). Using writing to develop and assess critical thinking. Teaching of Psychology , 22 (1), 24–28. https://doi.org/10.1207/s15328023top2201_8 .

Wallace, M., & Wray, A. (2021). Critical reading and writing for postgraduates , (4th ed.). London: Sage Publication.

Washington State University Center for Teaching, Learning, and Technology. (2009). Critical and integrative thinking . Retrieved March 3, 2020, from https://www.colorado.edu/sei/sites/default/files/attached-files/wsu-critical-thinking-rubric-2009.pdf .

Wilson, K. (2016). Critical reading, critical thinking: delicate scaffolding in English for Academic Purposes (EAP). Thinking Skills and Creativity , 16 , 256–265. https://doi.org/10.1016/j.tsc.2016.10.002 .

Yanning, D. (2017). Teaching and assessing critical thinking in second language writing: an infusion approach. Chinese Journal of Applied Linguistics , 40 (4), 431–451. https://doi.org/10.1515/cjal-2017-0025 .

Yuan, R., & Stapleton, P. (2020). Student teachers’ perceptions of critical thinking and its teaching. ELT Journal , 74 (1), 40–48. https://doi.org/10.1093/elt/ccz044 .


Acknowledgements

Not applicable.

Author information

Authors and affiliations.

Sophia University, Bld#6 510, 7-1 Kioi-cho, Chiyoda-ku, Tokyo, 102-8554, Japan

Takanori Sato


Contributions

The author read and approved the final manuscript.

Corresponding author

Correspondence to Takanori Sato .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Sato, T. Assessing critical thinking through L2 argumentative essays: an investigation of relevant and salient criteria from raters’ perspectives. Lang Test Asia 12 , 9 (2022). https://doi.org/10.1186/s40468-022-00159-4


Received : 05 January 2022

Accepted : 04 March 2022

Published : 08 April 2022

DOI : https://doi.org/10.1186/s40468-022-00159-4


Keywords

  • Second language writing
  • Argumentative writing
  • Assessment criteria
  • Rating scale development


Critical Thinking: Creating Job-Proof Skills for the Future of Work

Daniela Dumitru

1 Teacher Training Department, Bucharest University of Economic Studies, 010374 Bucharest, Romania

2 Doctoral School of Psychology and Educational Sciences, University of Bucharest, 050663 Bucharest, Romania

Diane F. Halpern

3 Department of Psychology, Claremont McKenna College, Claremont, CA 91711, USA; dianefhalpern@gmail.com

In this study, we explore the transformative impact of artificial intelligence (AI) on the job market and argue for the growing importance of critical thinking skills in the face of job automation and changing work dynamics. Advancements in AI have the potential to disrupt various professions, including, for example, programming, legal work, and radiology. However, solely relying on AI systems can lead to errors and misjudgments, emphasizing the need for human oversight. The concept of “job-proof skills” is introduced, highlighting the importance of critical thinking, problem-solving, empathy, ethics, and other human attributes that machines cannot replicate with the same standards and agility. We maintain that critical thinking can be taught and learned through appropriate classroom instruction and transfer-focused approaches. The need for critical thinking skills is further reinforced by the influx of information and the spread of misinformation in the age of social media. Moreover, employers increasingly value critical thinking skills in their workforce, yet there exists a gap between the demand for these skills and the preparedness of college graduates. Critical thinking is not only essential for the future of work, but also for informed citizenship in an increasingly complex world. The potential impact of AI on job disruption, wages, and employment polarization is discussed, highlighting the correlation between jobs requiring critical thinking skills and their resistance to automation. We conclude by discussing collaborative efforts between universities and labor market organizations to adapt curricula and promote the development of critical thinking skills, drawing on examples from European initiatives. The need to prioritize critical thinking skills in education and address the evolving demands of the labor market is emphasized as a crucial step for navigating the future of work and opportunities for workers.

1. Introduction: Critical Thinking: Creating Job-Proof Skills for the Future of Work

The rapid evolution of online technologies has ushered in a paradigm shift in employment, redefining the nature of work and the skills required to succeed in the digital age. This transformative landscape, characterized by the ubiquitous presence of the Internet, social media platforms, and advanced artificial intelligence systems, has created a plethora of new opportunities and challenges in the labor market. As we navigate this digital frontier, it is becoming increasingly clear that traditional employment paradigms are undergoing a profound transformation. The convergence of online technologies with the demands of a networked world has not only created new job opportunities, but it has also disrupted established industries, rendering some job roles obsolete while creating demand for previously unforeseen skills. In this era of unprecedented connectivity and innovation, examining the intricate interplay between online technologies and jobs is paramount as it holds the key to understanding the dynamics of our rapidly evolving workforce.

Artificial intelligence (AI) is disrupting many jobs and promises “to change the way the world works” ( adminGPT 2023, para. 13 ). The number and range of AI programs are increasing at a rapid pace, and they are likely to continually improve to meet user demands. Consider, for example, ChatGPT, which can respond to questions and requests in a way that seems to come from a human rather than a computer program. GPT stands for “generative pretrained transformer”. It is generative in that it can provide responses that it never “learned”; it is pretrained with a large language model ( Bushwick et al. 2023 ). Newer versions can describe visual images, although thus far, they cannot create visual images. Its uses are seemingly endless. It is easy to imagine how such programs can change the lives of blind individuals. In fact, it can and will change the lives of all of us.

In this paper, we argue that these advances in online technologies will make critical thinking (CT) more important than ever before. Many who are preparing to enter the job market, and many who are already employed, will need to adapt to new forms of job automation and different ways of working.

Consider, for example, that an early achievement of ChatGPT was its generation of Python code (a computer language) for various tasks, such as data analysis. Apparently, getting ChatGPT to generate code is so easy that several YouTube videos have popped up claiming they can teach novice users to generate code with ChatGPT in 90 seconds ( Data Professor 2023 ). The benefits are obvious, but so are the potential job losses for people who code in Python. Python programmers will need to upgrade their skills, perhaps first becoming experts in the use of ChatGPT and similar programs, but this also has a positive side--they can spend more time working on larger questions, such as which analyses are needed, and, of course, carefully reviewing the work produced by AI to ensure that it is accurate and understandable. Early versions of ChatGPT responses often contained errors. A New York lawyer learned this the hard way: Steven A. Schwartz, a lawyer for 30 years, used ChatGPT to create a legal document ( Weiser and Schweber 2023 ). It was filled with fake citations and bogus judicial opinions. Sadly, Mr. Schwartz never checked the accuracy of the document he filed in court. The judge was not amused. This highly public and embarrassing event should be a lesson for all of us: current AI programs cannot be trusted to take over our work, though they may be able to aid or supplement it. In other domains the balance is different; for example, some AI programs can “read” radiographs more accurately than human radiologists, which benefits both radiologists and patients. There is an immediate positive effect of this advancement: radiologists will have more time to work directly with patients, and yes, they must also check the accuracy of the outputs from their programs when presenting diagnoses.

For the rest of us, whether we are students or early or late in our careers, we need to focus on developing “job-proof skills” in the face of AI advances. A report from the United Nations describes such future-proof skills this way: “conceptual and strategic thinking, problem-solving, empathy, optimism, ethics, emotional intelligence, and judgments are the future-proof skills and attributes that machines will not be able to replicate with the same standards and agility as qualified human beings” ( Elkeiy 2022, para. 5 ). In other words, critical thinking skills will always be needed.

2. What Is Critical Thinking?

Although some scholars in the field of critical thinking have emphasized differences among various definitions, we believe that the commonalities are evident (cf. Dwyer 2017 ; Nisbett 2015 ; Lipman 1991 ; Fisher 2001 ). There are some differences in terminology, and some definitions weight particular skills more heavily than others, but all of the definitions (more or less) conform to our preferred definition: “Critical thinking is the use of those cognitive skills and abilities that increase the probability of a desirable outcome. It is purposeful, reasoned, and goal directed. It is the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions. Critical thinkers use these skills appropriately, without prompting, and usually with conscious intent, in a variety of settings. That is, they are predisposed to think critically. When we think critically, we are evaluating the outcomes of our thought processes--how good a decision is or how well a problem is solved. Critical thinking also involves evaluating the thinking process--the reasoning that went into the conclusion we’ve arrived at, or the kinds of factors considered in making a decision” ( Halpern and Dunn 2023, pp. 6–7 ).

The reason we need a common definition of critical thinking is that, without it, instructors can and have passed almost anything off as instruction in critical thinking. However, common ground can be found among CT definitions. In a European project, which we shall refer to in Section 4.3 , the critical thinking definition is based on the works of Halpern and Dunn ( 2023 ), Facione ( 1990 ), Paul and Elder ( 2008 ), and Kuhn ( 1999 ). During two debate sessions, 33 international participants from higher education and the labor market defined critical thinking as a deliberate cognitive process guided by conscious, dynamic, self-directed, self-monitored, and self-correcting thought ( Rebelo et al. 2023 ). It relies on both disciplinary and procedural knowledge, along with metacognitive aspects (including metacognitive, meta-strategic, and epistemological dimensions). Critical thinking can be cultivated and enhanced through the development of competencies, and it is facilitated by various attitudes, such as systematic thinking, open-mindedness, empathy, flexibility, and cognitive maturity. Additionally, it encompasses intellectual skills such as reflection, self-regulation, analysis, inference, explanation, synthesis, and systematic thought. Critical thinking not only stimulates problem-solving capabilities but also facilitates effective communication, fosters independent and holistic thinking, and bolsters decision-making and active citizenship ( Pnevmatikos et al. 2021 ).

2.1. Can Critical Thinking Be Learned?

We teach writing, oral communication, and mathematics with the (often implicit) belief that these skills will be learned and transferred to multiple settings both inside and outside of the classroom. There is a large and growing research literature showing that, with appropriate classroom instruction in critical thinking, including specific instruction designed for transfer, the skills will spontaneously transfer, even in uncued situations (i.e., when there are no reminders to use the critical thinking skill that was learned in class) ( Dumitru 2012 ; Heijltjes et al. 2014 ; Tiruneh 2019 ). Several such studies were presented by Dwyer ( 2017 ) and Halpern and Dunn ( 2023 ). For the sake of brevity, we review just one recent study, which was designed to counteract the effects of conspiracy theories. When people believe conspiracy theories, they often act in harmful ways–such as refusing to get the COVID-19 vaccine, which resulted in the deaths of large numbers of people around the world, or attacking the United States Capitol Building on 6 January 2021 in the belief that there was a conspiracy afoot designed to steal the 2020 United States presidential election from Donald Trump. In a review of the research literature on the efficacy of interventions, the researchers found that “there was one intervention which was characteristically different to the rest” ( O’Mahony et al. 2023, para. 23 ). It was a semester-long university course in critical thinking that was designed to teach students the difference between good scientific practices and pseudoscience. Such courses require effort and commitment, but they are effective. The same conclusion applies to all interventions designed to enhance critical thinking: there are no fast and easy “once and done” strategies that work. This is why we recommend continuous and pervasive coursework to make sure that the learning of CT skills “sticks.”

2.2. The Need for Critical Thinking Skills

Job loss and job redesign related to online technologies (including AI) are not the only reasons why we need to concentrate on teaching and learning the skills of critical thinking. COVID-19 left 140 million people out of work, and many of their jobs will never return ( Roslansky 2021 ). We are drowning in a tsunami of information, confronted with advertisements online, in news reports, social media, podcasts, and more. The need to be able to distinguish good information from bad is critical. In addition, employers want to hire people with critical thinking skills. In a survey of 501 business executives conducted by Hart Research Associates ( 2018 ), 78% said that critical thinking/analytic reasoning is the most important skill they want in their employees, but they also said that only 34% of college graduates arrive well prepared in critical thinking. This gap between what employers want and their perception of the preparedness of the workforce was larger for critical thinking than for any other area. In fact, every report on the future of work makes this same point. Consider this quote from the World Economic Forum ( 2020 ) on the future of jobs: “Skills gaps continue to be high as in-demand skills across jobs change in the next five years. The top skills and skill groups which employers see as rising in prominence in the lead up to 2025 include groups such as critical thinking and analysis as well as problem-solving.” (p. 5). In a report from the Publications Office of the European Union, Key Competences for Lifelong Learning, the commissioner wrote that “Critical thinking, media literacy, and communication skills are some of the requirements to navigate our increasingly complex world” ( Navracsics 2019, p. 3 ). Of course, critical thinking is not just needed in the world of work. A true democracy requires an educated citizenry, with citizens who can think critically about social issues such as the use and threat of AI, war, poverty, climate change, and so much more. Irrational voters are a threat to all of us—and to themselves.

The need to think critically is not new, but it has taken on a new urgency as social media and other forms of communication have made the deliberate spread of misinformation move at the speed of light. There is nothing new about the use of lies, half-truths, and innuendos to get people to believe something that is not true. Anyone can post anything on popular media sites, and this “fake news” is often copied and shared thousands of times. Sometimes the information is spread with a deliberate attempt to mislead; other times, it is copied and spread by people who believe it is true. These messages are often used to discredit political adversaries, create social unrest, and incite fear. It can be difficult to determine what to believe and what to discard. Vosoughi et al. ( 2018 ) analyzed approximately 126,000 news stories spread on Twitter by approximately 3 million people. How did the researchers discriminate true stories from false ones? The same way we all should: they used several different fact-checking sites and found 95% to 98% agreement regarding the truth or falsehood of the information. They found that false news spread more quickly and more widely than true news because the false stories tended to be novel and sensational, rendering them salient and seductive.
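As an illustration of how such cross-source agreement can be quantified, the sketch below computes pairwise percentage agreement among fact-checking verdicts. This is not the procedure used by Vosoughi et al. ( 2018 ); the site names and verdicts are invented for the example.

```python
from itertools import combinations

# Hypothetical verdicts ("true"/"false") from three fact-checking sites
# for the same five claims; the data are illustrative only.
verdicts = {
    "site_a": ["true", "false", "false", "true", "false"],
    "site_b": ["true", "false", "false", "true", "true"],
    "site_c": ["true", "false", "true", "true", "false"],
}

def pairwise_agreement(labels_1, labels_2):
    """Fraction of claims on which two sources give the same verdict."""
    matches = sum(a == b for a, b in zip(labels_1, labels_2))
    return matches / len(labels_1)

# Report agreement for every pair of sites.
for (name_1, labels_1), (name_2, labels_2) in combinations(verdicts.items(), 2):
    print(f"{name_1} vs {name_2}: {pairwise_agreement(labels_1, labels_2):.0%} agreement")
```

In practice, researchers often restrict their analyses to the claims on which the fact-checking sources agree.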

In today’s landscape, the imperative to foster critical thinking skills is becoming increasingly apparent as we grapple with the rapid rise of social media and artificial intelligence technologies and their profound impact on the future of work. The confluence of these transformative forces has ushered in a new era characterized by the potential for significant job disruption. As online technologies advance and automation becomes more widespread, certain traditional job roles may become obsolete, requiring the development of innovative skills and adaptability in the workforce. In this context, critical thinking emerges as a central element in preparing individuals to navigate the evolving job market. It equips individuals with the ability to analyze complex information, discern credible sources from the proliferation of social media information, and make informed decisions in an era of blurring boundaries between human and machine contributions to the workforce. Cultivating critical thinking skills will be essential to ensuring that individuals can take advantage of the opportunities presented by new technologies while mitigating the challenges of job disruption in this AI-driven future.

3. Critical Thinking Skills and Job Disruption and Replacement

Eloundou et al. ( 2023 ) estimated that about 15% of all U.S. workers’ jobs could be accomplished much faster, at the same level of quality, with currently available AI. There are large differences in the extent to which various occupations and industries will be affected by advancements in AI. For example, tasks that require a high degree of human interaction, highly specialized domain knowledge, or the creation of innovative technologies will be minimally affected, whereas tasks such as providing captions for images or answering questions about a text or document are more likely to be affected. Routine-based jobs in general are more likely to be displaced by advanced technologies ( Acemoglu 2002 ). Using the basic definitions of skills that are standard in O*NET, Eloundou et al. ( 2023 ) found a clear negative correlation between jobs requiring knowledge of science and critical thinking skills and the likelihood that AI will “take over” the job. These findings reinforce our main point—the best way to gain job-proof skills is with critical thinking.
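To make this kind of analysis concrete, here is a minimal sketch of how an occupation-level association between critical thinking requirements and AI exposure might be computed. The column names and values are invented for illustration; they are not taken from Eloundou et al. ( 2023 ) or from O*NET.

```python
import pandas as pd
from scipy.stats import spearmanr

# Illustrative occupation-level data (values are made up): "ct_importance"
# stands in for an O*NET-style rating of critical thinking importance and
# "ai_exposure" for an estimated share of tasks exposed to current AI.
occupations = pd.DataFrame({
    "occupation":    ["data entry", "paralegal", "radiologic tech", "research scientist", "therapist"],
    "ct_importance": [2.1, 3.4, 3.6, 4.6, 4.3],
    "ai_exposure":   [0.72, 0.55, 0.40, 0.18, 0.10],
})

# A negative rank correlation indicates that CT-heavy jobs are less exposed to AI.
rho, p_value = spearmanr(occupations["ct_importance"], occupations["ai_exposure"])
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```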

The effect of online technologies on wages is complicated because of the large number of factors that come together to determine earnings. Acemoglu and Autor ( 2011 ) advocated for a model that simultaneously considers the level of the tasks required for any job (low, medium, or high, where a high skill level is one that allows employees to perform a variety of tasks), the demand for those tasks, and the technological changes that can complement a task or replace it. They assert that employment has become increasingly polarized, with growth in both high-education, high-wage occupations and low-education, low-wage occupations in the United States and the European Union. To understand and predict which occupations will be most disrupted by AI (and other developing technologies), an investigator will need to consider all of these variables simultaneously. Technological advancements can generate shifts in demand, favoring either high- or low-skilled workers. According to Acemoglu and Autor ( 2011 ), we can expect some of the largest disruptive effects at the middle level of skills, where some of the tasks performed at this level can be more easily replaced by new technologies, alongside widespread employment growth in high- and low-skilled occupations.

4. Business-University Collaborations

The pursuit of high standards of critical thinking in university students across academic disciplines is a challenging endeavor that benefits from collaboration with stakeholders. In such collaborations, stakeholders can help refine the skills required by learners and bring their own perspectives to academic instruction. This close partnership between universities and stakeholders helps minimize gaps and mismatches in the transition to the labor market, facilitates research collaboration, and increases student motivation.

Collaborations between businesses and universities have gained increasing importance in today’s rapidly evolving educational and economic landscape. These partnerships are instrumental in bridging the gap between academic learning and the real-world skills demanded by the job market. One key aspect of business-university collaboration (BUC) is the alignment of curricula with the dynamic needs of industries. This entails the joint effort of higher education institutions (HEIs) and industry experts to design, develop, and deliver educational programs that equip students with practical, job-ready skills. The curriculum design phase involves tailoring study programs, courses, and modules to address skills gaps and align with the specific requirements of employers.

Moreover, BUC extends beyond the classroom. Collaborations often involve business engagement in educational activities, including guest lectures, internships, co-op programs, and research projects. These interactions provide students with invaluable exposure to real-world scenarios, allowing them to apply theoretical knowledge in practical settings.

In essence, BUC is a multifaceted partnership that benefits both students and businesses. It ensures that educational programs remain relevant, fostering a seamless transition from academia to the workforce. This collaborative approach not only enhances students’ employability but also contributes to the overall growth and innovation of industries.

Operationalizing the collaboration requires a particular focus on curriculum design, development, and delivery. These involve collaboration between higher education institutions and labor market partners to create or enhance undergraduate or postgraduate study programs, courses, or modules. This collaborative effort aims to address skills gaps, align curricula with employers’ needs, integrate training initiatives, and improve graduates’ employability. Additionally, curriculum delivery includes various forms of business involvement, such as guest lectures, placements, supervision, mentoring, and work-based learning activities.

While the existing literature often discusses the barriers and motivations for university-business collaboration ( Healy et al. 2014 ; Orazbayeva et al. 2020 ), there is a need for more empirical insights into the roles and responsibilities of each party engaged in joint curriculum design, development, and delivery, as well as lessons learned from these collaborations ( Rebelo et al. 2023 ).

4.1. Why Do We Need Higher Education’s Help?

In the preceding sections of this paper, we delved into the disruptive forces of artificial intelligence (AI) on the job market and the critical need for individuals to adapt to these changes by developing “job-proof skills”. The rise of online technologies such as ChatGPT presents both opportunities and challenges, particularly in fields where middle-level skills are required. To effectively tackle these challenges, we must turn our attention to the pivotal role of education and the cultivation of essential skills such as critical thinking.

We highlighted how AI is rapidly transforming various industries and the need for individuals to adapt to these changes. Moreover, we explored the question of whether critical thinking can be learned, showcasing research evidence that supports the teachability of this skill. Now, we shall explore practical strategies for fostering critical thinking skills through collaborations between universities and businesses. The idea here is to create an educational framework that equips students with the capabilities needed to thrive in the evolving workforce.

Building upon the success of two European projects, “Critical thinking across higher education curricula—CRITHINKEDU” and “Critical thinking for successful jobs—THINK4JOBS”, we argue that incorporating practical experience and CT development through apprenticeships is one way to improve higher education classes. This collaborative approach between HEIs and labor market organizations (LMOs), designed to address the differing perspectives and terminologies the two use regarding critical thinking, could be an important element of curriculum design for better adaptation to job market technology disruptions.

Research conducted by Eloundou et al. ( 2023 ), which shows that critical thinking and science skills are less likely to be taken over by AI, supports the THINK4JOBS apprenticeship curricula as a possible teaching protocol for enhancing critical thinking in the face of the challenges posed by AI at work.

The results from these projects demonstrate significant progress in students’ critical thinking skills and dispositions. These improvements, as highlighted below in Section 4.3 , underscore the effectiveness of embedding critical thinking in the curriculum. The guidelines formulated for implementing Critical Thinking Blended Apprenticeship Curricula provide a roadmap for educators to follow when effectively integrating critical thinking into their courses.

As we ponder the possibility of a world where critical thinking is widespread, we can envision a future where individuals are equipped to confront the ideological fanaticism that threatens global stability. Critical thinking, as both a cognitive skill and a disposition, has the potential to shape a workforce capable of adapting to the ever-changing landscape of work, making informed decisions, and contributing to a more rational and democratic world. The THINK4JOBS project emphasizes the practical steps taken to prepare students for the future job market and sets the stage for further exploration of the role of critical thinking in addressing global challenges, including AI presence in the job market.

4.2. CRITHINKEDU Protocol for Critical Thinking Education across Curricula

Given that the best education for the future of work is the acquisition of critical thinking skills, how can we facilitate this sort of education? One way to obtain a job-proof education is to create classes with the help of labor market organizations. Two projects funded by the European Union were designed to bring to life the idea that better communication and collaboration between universities and employers result in a better adaptation of the curriculum, especially a curriculum involving critical thinking skill development.

Between 2016 and 2019, the project “Critical thinking across the European higher education curriculum—CRITHINKEDU” focused on how CT is taught in various academic domains. The CRITHINKEDU project, involving universities across Europe, exemplifies how academia and industry can join forces to bridge the gap between classroom learning and real-world job demands. This initiative aimed to enhance the curriculum by explicitly emphasizing critical thinking skill development. It revealed that employers across various fields value critical thinking, and they perceive it as essential for recent graduates entering the workforce.

The participants were eleven universities from nine European countries (Belgium, Czech Republic, Greece, Italy, Spain, Portugal, Romania, Lithuania, and Ireland; Dominguez 2018 ). Qualitative research was conducted with 32 focus groups composed of professionals from various European countries and fields. The findings align with previous studies: CT is a set of interconnected skills (interpretation, inference, analysis, explanation, evaluation, and self-regulation) and dispositions (open-mindedness, reflection, attentiveness, organization, perseverance, and intrinsic goal motivation) essential for recent graduates in response to labor market demands ( Payan-Carreira et al. 2023, p. 16 ). However, an important consideration is that the practical application of CT varies across professional fields. The participants in this study defined the ideal critical thinker as someone with a cultivated mindset, motivated to learn and improve, and equipped with cognitive and behavioral tools to anticipate, regulate, and monitor their thinking. CT is associated with problem-solving and decision-making and is intertwined with other skills such as proactivity, adaptability, creativity, emotional intelligence, communication, and teamwork. The report from this project also introduced “a European collection of the Critical Thinking skills and dispositions needed in different professional fields for the 21st century” ( Dominguez 2018 ), which categorizes CT skills and dispositions by professional field and offers a basis for defining learning objectives and adapting university curricula. The study provides valuable insights from 189 European employers into the CT needs of new graduates in the labor market. The interviewed professionals showed a clear preference for CT skills in STEM fields and for dispositions in the Humanities; Social Sciences and biomedical sciences professionals were about equally interested in CT skills and dispositions, with a slight preference for dispositions ( Dominguez 2018, p. 28 ).

4.3. Next Steps: THINK4JOBS Blended Apprenticeship Curricula

After the termination of the CRITHINKEDU project, partners from Romania, Greece, Lithuania, and Portugal, with the addition of a new partner from Germany, proposed a new research application: “Critical Thinking for Successful Jobs—THINK4JOBS” ( www.think4jobs.uowm.gr ). The idea was to utilize the results from the previous project and, together with labor market organizations, create new courses that are more adapted to the reality of the future of work. The core element of the classes was explicit teaching of critical thinking, using real-life cases and methods. In an apprenticeship model, critical thinking skills are embedded in a relevant context. The value of realistic contexts is that students can see the need for the skills being taught in a workplace scenario. Relevant contexts enhance student engagement and motivation to learn. Dumitru et al. ( 2021 ) focused on improving students’ critical thinking skills and dispositions through collaboration between Higher Education Institutions (HEIs) and Labor Market Organizations (LMOs). The aim was to bridge the gap between HEI curricula and the expectations of the labor market by incorporating apprenticeships that provide practical experience and CT development.

The process of mapping responses from those in the labor market organizations onto college curricula involved the use of research methods such as observation, focus groups, and documentary analysis, with stakeholders from HEIs and LMOs participating. The findings indicated that while there were no definitive “gaps” between HEIs and LMOs, there were contextual differences in the approach to CT. HEIs focus on long-term career preparation, while LMOs emphasize short-term learning strategies. The terminology and expression of CT also differed between the two contexts. Based on the findings, ten work-based scenarios were created, with one from each discipline involved in the project. Overall, the report ( Dumitru et al. 2021 ) highlighted the different goals and perspectives of HEIs and LMOs regarding CT, emphasizing the need for collaboration and a common understanding of which skills should be included in the college curriculum.

The context for CT also differs: HEIs usually use different learning activities, focusing more on career preparation with long-term goals, while LMOs follow compact, short-term learning and teaching strategies. Furthermore, the findings suggest that CT is a new workplace requirement and that HEIs and LMOs do not choose the same terminology when referring to the concept, with HEIs usually choosing scientific terms. Another element that emerged is that CT is generally expressed in a declarative way in higher education institutions, while in LMOs its application to specific cases follows a more procedural approach. Put another way, LMOs are focused on making a profit, while HEIs are focused on being socially responsible.

In the second phase of the project, partners ( Pnevmatikos et al. 2021 ) focused on the development of a collaborative training curriculum for Higher Education Instructors and LMO tutors. The purpose of the training was to enhance comprehension and knowledge of critical thinking for both sides of this collaboration, since previous research indicated a potential lack of conceptual and procedural understanding between these two entities. Additionally, the training aimed to facilitate the promotion, support, and evaluation of students’ CT skills within apprenticeship curricula, as well as the creation of blended curricula utilizing an open-source learning platform. The training course encompassed workshops that delved into various aspects of CT, including analyzing and reassembling ideas about CT, formulating a working definition of CT, instructional methodologies, blended learning techniques, usage of a learning platform, CT assessment, and the development of a Memorandum of Understanding (MoU) between higher education institutions and LMOs. The participants’ knowledge about these topics was assessed through pre- and post-training online questionnaires. Although data analysis showed various predicted trends, only perceived self-confidence in the topics covered during the training obtained statistical significance ( Pnevmatikos et al. 2021 ).

In the final report from this project, Payan-Carreira et al. ( 2023 ) presented the results of the implementation of the critical thinking Blended Apprenticeships Curricula (CTBAC) and discussed the improvements in critical thinking skills and dispositions observed in students. The study involved cross-disciplinary analysis and assessed changes before and after the piloting activities. A total of 609 students participated, and their critical thinking skills and dispositions were evaluated.

The consortium chose the Critical Thinking Self-Assessment Scale (CTSAS) developed by Nair ( 2011 ) as an instrument to assess CT skills based on an earlier conceptualization ( Facione 1990 ). The questionnaire has been tested in various geographic and cultural contexts, demonstrating good reliability, internal consistency, and confirmatory factor analysis results. However, the original CTSAS was considered too long to complete, consisting of 115 items, so a shorter version was specifically developed for this project. The short form of the questionnaire (CTSAS-SF) was created through a two-step process. Items with loading weights below .500 were eliminated, resulting in 84 remaining items. Redundant and non-cognitive-focused items were marked for elimination, leaving 60 items. The short form maintained the original scale’s framework and utilized a seven-point Likert scale ranging from 0 (Never) to 6 (Always) for students to respond to items assessing various dimensions and subdimensions of CT skills.
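The first, loading-based step of that reduction can be expressed in a few lines of code. The sketch below only illustrates the filtering rule described above; the item labels, subscales, and loading values are invented and are not the actual CTSAS results.

```python
import pandas as pd

# Hypothetical factor-loading table for a handful of CTSAS items;
# the labels and loadings are invented for illustration.
loadings = pd.DataFrame({
    "item":     ["item_001", "item_002", "item_003", "item_004"],
    "subscale": ["interpretation", "analysis", "evaluation", "inference"],
    "loading":  [0.62, 0.47, 0.55, 0.71],
})

# Step 1 of the reported reduction: drop items whose loading is below .500.
retained = loadings[loadings["loading"] >= 0.500].reset_index(drop=True)
print(retained)

# A second, judgment-based pass (removing redundant and non-cognitive items)
# would follow before arriving at the 60-item short form.
```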

The CTSAS-SF validation process, with confirmatory factor analysis, resulted in two models with equivalent satisfactory goodness-of-fit indices. Model 4, the second-order factor model (RMSEA = .051; TLI = .924; CFI = .927), had a chi-square/df ratio of 2.33. The Cronbach alpha of the overall instrument was excellent (α = .969). Sample items are shown in Table 1 .

Table 1. Sample items from the Critical Thinking Self-Assessment Scale (CTSAS), Nair ( 2011 ).

No. of Item | Item | Skill
1 | | Interpretation
10 | | Analysis
22 | | Evaluation
31 | | Inference
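The internal-consistency figure reported above (α = .969) is a Cronbach’s alpha, which can be computed directly from a respondents-by-items matrix of Likert scores. The sketch below shows a standard textbook computation on invented data; it is not the project’s analysis code.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert scores."""
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative responses from six students to four items on the 0-6 CTSAS-SF scale.
responses = np.array([
    [5, 4, 5, 6],
    [3, 3, 2, 4],
    [6, 5, 6, 6],
    [2, 3, 2, 1],
    [4, 4, 5, 4],
    [5, 5, 4, 5],
])
print(f"alpha = {cronbach_alpha(responses):.3f}")
```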

Compared with instruments for assessing CT skills, the availability of instruments for measuring CT dispositions is limited. One instrument adopted by the consortium to assess CT dispositions is the Student-Educator Negotiated Critical Thinking Dispositions Scale (SENCTDS), developed by Quinn et al. ( 2020 ) and validated with a mixed population of Irish and American undergraduate students. The scale considers a variety of CT dispositions that the authors consider important for the labor market and real-world decision-making. Some of the items combine Facione’s ( 1990 ) original CT dispositions into new dimensions relevant to academic and labor market success, such as organization, perseverance, and intrinsic goal motivation. The scale consists of six dimensions (Reflection, Attentiveness, Open-mindedness, Organization, Perseverance, and Intrinsic Goal Motivation) and presents statements for students to respond to on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The original version of the SENCTDS contains 21 items. The validation process, with confirmatory factor analysis, identified only one model with a satisfactory goodness-of-fit index—model 3, comprising six correlated factors (RMSEA = .054; TLI = .974; CFI = .969) with a chi-square/df ratio of 2.57. The instrument presented a high Cronbach alpha (α = .842), suggesting strong internal consistency. Sample items are presented in Table 2 .

Table 2. Sample items from the Student-Educator Negotiated Critical Thinking Dispositions Scale (SENCTDS), developed by Quinn et al. ( 2020 ).

No. of Item | Item | Disposition
2 | | Reflection
6 | | Attentiveness
11 | | Open-mindedness
13 | | Organization
21 | | Intrinsic goal motivation

The analysis showed gains in critical thinking skills and indicated that changes were more pronounced in skills than in dispositions. All skills (interpretation, analysis, inference, explanation, self-regulation, and evaluation) showed significant pretest-posttest differences (p ≤ .0001 for every skill), and the integrated critical thinking skills score also improved significantly (t = 9.705, p ≤ .0001), demonstrating a strong difference between pre- and posttest. Dispositions displayed no significant difference in the integrated score; among the individual dispositions, open-mindedness (t = 2.636, p = .009) and organization (t = 2.568, p = .011) improved significantly, while reflection (t = 1.766, p = .079) and intrinsic goal motivation (t = 1.712, p = .088) only approached significance.
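The pre-post comparisons summarized above are t-tests on the same students’ scores before and after the piloting activities. A paired-samples t-test of this kind can be run as sketched below; the scores are invented for illustration, whereas the actual analysis used CTSAS-SF and SENCTDS scores from the 609 participating students.

```python
import numpy as np
from scipy.stats import ttest_rel

# Invented pre- and post-test integrated CT skill scores for the same eight students.
pre_scores  = np.array([3.1, 3.4, 2.8, 3.9, 3.0, 3.6, 2.5, 3.3])
post_scores = np.array([3.8, 3.9, 3.1, 4.4, 3.6, 4.0, 3.0, 3.7])

# A positive t statistic indicates higher scores after the intervention.
t_statistic, p_value = ttest_rel(post_scores, pre_scores)
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")
```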

Based on the findings from the implementation of the blended apprenticeship curricula, the following guidelines were formulated for implementing Critical Thinking Blended Apprenticeship Curricula ( Payan-Carreira et al. 2023 ):

  • Provide an explanation of the importance of critical thinking—Clearly communicate to students why critical thinking is a vital skill in today’s workforce and how it is valued in specific professions. Explicitly incorporate the development of critical thinking as an outcome of the course.
  • Emphasize continuous and pervasive CT training—To achieve success, there should be a concerted effort across disciplinary curricula to foster students’ critical thinking skills and dispositions. Skills require training, and dispositions necessitate the internalization of desired attitudes. Therefore, sufficient time and a collaborative approach at the disciplinary level are necessary for consistent and significant progress.
  • Allocate dedicated time—Building on the previous point, it is essential to allocate specific time within the course to work on the proposed critical thinking goals. Students and educators need to schedule activities and create opportunities for preparation, development, and feedback exchange. This ensures that the intervention leads to meaningful, lasting learning.
  • Establish connections with real-world scenarios—Foster student engagement and improve their perception of learning experiences by incorporating case studies that reflect situations professionals encounter in their daily work. By grounding the learning content in reality, students are more likely to be motivated and actively participate in the educational process.

  • Foster reflection on CT skills and dispositions—Offer students the chance to reflect on their reasoning processes and the attitudes they have developed throughout their learning experiences. Encouraging reflective thinking enhances the effectiveness of learning interventions and helps cultivate a deeper understanding of one’s experiences.

These steps aim to guide educators in effectively implementing the critical thinking blended apprenticeship curricula while also maximizing the impact of critical thinking development in students.

The two European projects made a great start in integrating the skills that employers want into university curricula, but the results are nonetheless provisional. There is no clear agreement among participating universities regarding how best to teach critical thinking, nor regarding its importance for future jobs. We urge that more work be done to nurture critical thinking within university curricula in order to give our current students—who represent the future of the workforce—the job-proof skills they need.

5. European Recommendations and Good Practices

Critical thinking stands as a pivotal goal for European Higher Education Institutions. To facilitate the attainment of this objective, we present an educational protocol that draws from comprehensive research and practical experiences, including insights from the CRITHINKEDU project. This protocol amalgamates insights from both theoretical and empirical studies on critical thinking with practical strategies for its cultivation.

A first recommendation is to sign memoranda of understanding between universities and labor market organizations to cultivate strong partnerships ( Rebelo et al. 2023 ). Effective collaboration between universities and businesses is crucial in fostering critical thinking. This partnership thrives on the synergy that results when academic institutions and businesses combine their expertise, resources, and perspectives. Strategies such as aligning goals, fostering long-term commitment, and promoting a culture of collaboration can strengthen these partnerships and ensure that academic research is aligned with real-world needs.

Another recommendation relates to the formulation of compelling goals. Accurate and transparent goals are fundamental to the successful implementation of university-industry collaborations to promote critical thinking. These goals must be clearly defined and easily understood at multiple levels, from the institutional to the program and course levels. Recognition of critical thinking as an overarching goal implies its integration into assessment and evaluation processes.

Another recommendation is to develop flexible curricula. To effectively foster critical thinking, curricula must demonstrate adaptability and responsiveness to emerging trends and market demands. The use of agile curriculum design methodologies and the involvement of business partners in curriculum development is of great value. Approaches such as problem-based and case-based learning facilitate rapid adaptation to evolving market needs, such as the use of AI-powered software to solve work tasks better and faster. Regular feedback mechanisms and ongoing collaboration with business partners ensure that curricula remain relevant and flexible.

Incorporating real-world challenges and case studies into curricula bridges the gap between academia and the business world, creating an environment that encourages experiential learning. The active involvement of business stakeholders in providing relevant challenges plays a key role. Students’ problem-solving skills are enhanced by shifting from traditional teaching methods to project-based, problem-based, or case-based learning. Engaging students through apprenticeships, internships, guest lectures, and seminars immerses them in authentic work environments and fosters their professional development.

Ongoing, multi-faceted evaluation is a cornerstone of the collaboration between higher education and the business community to cultivate critical thinking. Assessment includes measuring learners’ progress in critical thinking, the effectiveness of curricula, and the impact of partnerships through the use of key performance indicators.

Regarding how to implement a critical thinking curriculum, pedagogical research ( Elen et al. 2019 ) suggests that in the development of critical thinking, whether it is regarded as a skill, disposition, or a combination of both, three categories of supportive measures can be identified: modeling, induction, and declaration.

Modeling: Support the development of critical thinking skills by demonstrating what it means to think critically at the institutional, programmatic, and course levels, considering multiple perspectives and alternative viewpoints.

Induction: Support critical thinking development by provoking critical thinking through the presentation of open-ended questions, unstructured tasks, complex problems, and real-world issues. The exact nature of “induction” and how it is implemented may vary across fields and disciplines. Induction can be carried out in a variety of ways; for example, presenting unstructured problems, providing authentic tasks, encouraging constructive controversy, asking “why” questions, or encouraging student autonomy.

Declaration: Promote the development of critical thinking by articulating or explicitly stating what is at stake, what strategies can be used, and what criteria must be met. This can take the form of oral or written communication and should always be explicit and specific. Declaring and making things explicit can be accomplished in a variety of ways, including using critical thinking rubrics, developing elaborate concept maps, providing feedback on critical thinking, and engaging in discussion and reflection on critical issues.

This integrated approach, encompassing university-business collaboration and an educational protocol, underscores the significance of critical thinking in higher education. It provides a structured framework for nurturing this essential skill by aligning objectives, fostering partnerships, adapting curricula, and implementing ongoing evaluation practices. In doing so, educational institutions are better poised to equip students with the critical thinking skills needed to thrive in a rapidly evolving world.

6. Concluding Remarks, or Can Critical Thinking Save the World?

In summary, the dynamic interaction between universities, businesses, and the evolving technology landscape, including the rise of artificial intelligence (AI) and online technologies, underscores the critical need to nurture and develop students’ critical thinking skills. As we navigate the challenges posed by AI and the ever-expanding digital realm, collaborative efforts between academia and industry have proven to be instrumental in preparing students for the future job market.

Incorporating real-world experiences, such as apprenticeships, into the curriculum is an important step toward improving students’ critical thinking skills in real-world contexts. Projects such as “Critical thinking across higher education curricula—CRITHINKEDU” and “Critical thinking for successful jobs—THINK4JOBS” have demonstrated the potential of these collaborations to bridge the gap between classroom learning and industry needs. In addition, the development of flexible curricula that can adapt to the evolving needs of the job market, especially considering online technologies, is essential. By integrating real-world challenges and case studies into the curriculum, students gain valuable problem-solving skills and are better prepared to navigate the complexities of the digital age.

Ongoing assessment and evaluation are critical components of this collaborative effort, ensuring that critical thinking remains a central focus and that students are making meaningful progress in acquiring this essential skill.

With the disruption of AI and the ubiquity of online technologies, the integration of critical thinking into higher education curricula is more important than ever. It enables students not only to thrive in a technology-driven world, but also to contribute to a rational, democratic, and globally interconnected society. The partnerships forged between universities and businesses, along with a well-defined educational protocol, provide a roadmap for cultivating these essential skills and preparing students for the challenges and opportunities of the future job market. The imperative to foster critical thinking in university curricula remains a fundamental step in equipping tomorrow’s workforce to navigate the complexities of an AI-influenced job market and a rapidly changing world.

Lilienfeld ( 2007, para. 3 ) said it well: “The greatest threat to the world is ideological fanaticism, by ideological fanaticism I mean the unshakeable conviction that one’s belief system and that of other in-group members is always right and righteous and that others’ belief systems are always wrong and wrong-headed”. Imagine a world where (most or even many) people use the skills of critical thinking. Just maybe, CT could save the world.

The job market will require a psychologically adaptable toolkit, and we propose that critical thinking is an essential component therein. The disruptions imposed by new technological advances such as AI will require students to learn new employable skills because we will need not just an engineer, but a critical thinking engineer; not just a programmer, but a critical thinking programmer; and not just a journalist, but a critical thinking journalist. The dignity of workers—their humanity and our collective survival—may well depend on CT, a very human creation.

Acknowledgments

We sincerely thank Dana Dunn, Moravian University, for comments on an earlier version of this manuscript.

Funding Statement

Daniela Dumitru received funding from European Commission/EACEA, through the ERASMUS+ Programme, “Critical Thinking for Successful Jobs—Think4Jobs” Project, with the reference number 2020-1-EL01-KA203-078797.

Author Contributions

Conceptualization, D.F.H. and D.D.; investigation, D.F.H. and D.D.; resources, D.F.H. and D.D.; writing—original draft preparation, D.F.H. and D.D.; writing—review and editing, D.F.H. and D.D. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.


  • Acemoglu Daron. Technical Change, Inequality, and the Labor Market. Journal of Economic Literature. 2002; 40:7–72. doi: 10.1257/jel.40.1.7. Available online: http://www.jstor.org/stable/2698593 (accessed on 15 May 2023).
  • Acemoglu Daron, Autor David. Skills, Tasks and Technologies: Implications for Employment and Earnings. In: Ashenfelter Orley, Card David, editors. Handbook of Labor Economics. 1st ed. North Holland-Elsevier; San Diego: 2011. pp. 1043–71.
  • adminGPT. The Future Is Here: Analytics and Artificial Intelligence in Every Industry. 22 May 2023. Available online: https://chatgpt.com/27739697/the-future-is-here-analytics-and-artificial-intelligence-in-every-industry#/ (accessed on 2 June 2023).
  • Bushwick Sophie, Harper Kelso, Bose Tulika. What You Need to Know about GPT-4. Scientific American Podcasts. 2023. Available online: https://www.scientificamerican.com/podcast/episode/what-you-need-to-know-about-gpt-4/ (accessed on 31 May 2023).
  • Data Professor. How to Use ChatGPT to Generate Code in 90 Seconds. 2023. Available online: https://www.youtube.com/watch?v=ELJzUcYrAIQ (accessed on 31 May 2023).
  • Dominguez Caroline (coord.). A European Collection of the Critical Thinking Skills and Dispositions Needed in Different Professional Fields for the 21st Century. UTAD; Vila Real: 2018. Available online: https://crithinkedu.utad.pt/en/intellectual-outputs/ (accessed on 2 June 2023).
  • Dumitru Daniela. Critical Thinking and Integrated Programs. The Problem of Transferability. Procedia-Social and Behavioral Sciences. 2012; 33:143–7. doi: 10.1016/j.sbspro.2012.01.100. Available online: http://www.sciencedirect.com/science/article/pii/S1877042812001085 (accessed on 15 May 2023).
  • Dumitru Daniela, Christodoulou Panagiota, Lithoxoidou Angeliki, Georgiadou Triantafyllia, Pnevmatikos Dimitrios, Marin Drămnescu Aurel, Enachescu Vladimir, Stăiculescu Camelia, Lăcătuş Maria Liana, Paduraru Monica Elisabeta, et al. Think4Jobs Toolkit: Ten Work-Based Learning Scenarios. University of Western Macedonia; Greece: 2021. Available online: https://think4jobs.uowm.gr/results/intellectualoutput1 (accessed on 22 May 2023).
  • Dwyer Cristopher P. Critical Thinking: Conceptual Perspectives and Practical Guidelines. Cambridge University Press; Cambridge: 2017.
  • Elen Jan, Jiang Lai, Huyghe Steven, Evers Marleen, Verburgh Ann, Dumitru Daniela, Palaigeorgiou George. Promoting Critical Thinking in European Higher Education Institutions: Towards an Educational Protocol. In: Dominguez C., Payan-Carreira R., editors. UTAD; Vila Real: 2019. Available online: https://repositorio.utad.pt/bitstream/10348/9227/1/CRITHINKEDU%20O4%20%28ebook%29_FINAL.pdf (accessed on 30 August 2023).
  • Elkeiy Gabriel. Future-Proof Skills Can Help Balance Individual and Societal Progress. United Nations, UN Chronicle. 5 August 2022. Available online: https://www.un.org/en/un-chronicle/future-proof-skills-can-help-balance-individual-and-societal-progress#:~:text=Conceptual%20and%20strategic%20thinking%2C%20creativity,agility%20as%20qualified%20human%20beings (accessed on 25 May 2023).
  • Eloundou Tyna, Manning Sam, Mishkin Pamela, Rock Daniel. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models. 2023. Available online: https://arxiv.org/pdf/2303.10130.pdf (accessed on 1 June 2023).
  • Facione Peter A. Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction. Research Findings and Recommendations. 1990. Available online: http://files.eric.ed.gov/fulltext/ED315423.pdf (accessed on 10 May 2023).
  • Fisher Alec. Critical Thinking: An Introduction. Cambridge University Press; Cambridge: 2001.
  • Halpern Diane F., Dunn Dana S. Thought and Knowledge: An Introduction to Critical Thinking. 6th ed. Routledge Taylor & Francis; New York: 2023.
  • Hart Research Associates. Fulfilling the American Dream: Liberal Education and the Future of Work. Conducted on behalf of the Association of American Colleges and Universities. 2018. Available online: https://dgmg81phhvh63.cloudfront.net/content/user-photos/Research/PDFs/2018EmployerResearchReport.pdf (accessed on 20 June 2023).
  • Healy Adrian, Perkmann Markus, Goddard John, Kempton Louise. Measuring the Impact of University Business Cooperation. Directorate General for Education and Culture, European Commission. European Union; Brussels: 2014.
  • Heijltjes Anita, van Gog Tamara, Paas Fred. Improving Students’ Critical Thinking: Empirical Support for Explicit Instructions Combined with Practice. Applied Cognitive Psychology. 2014; 28:518–30. doi: 10.1002/acp.3025.
  • Kuhn Deanna. A Developmental Model of Critical Thinking. Educational Researcher. 1999; 28:16–46. doi: 10.3102/0013189X028002016.
  • Lilienfeld Scott. Can Psychology Change the World? The British Psychological Society, Research Digest. 2007. Available online: http://bps-research-digest.blogspot.com/2007/09/can-psychology-save-world.html (accessed on 31 May 2023).
  • Lipman Matthew. Thinking in Education. Cambridge University Press; New York: 1991.
  • Nair Girija. Preliminary Psychometric Characteristics of the Critical Thinking Self-Assessment Scale. University of Saskatchewan; Saskatoon: 2011. Available online: https://harvest.usask.ca/bitstream/handle/10388/ETD-2011-09-103/girija.nair.phd.thesis.pdf;jsessionid=F19CA2ACBE3978E8DF9E19C77CB3198E?sequence=3 (accessed on 18 May 2023).
  • Navracsics Tibor. Foreword. In: Key Competences for Lifelong Learning. European Commission, Directorate-General for Education, Youth, Sport and Culture. Publications Office; 2019. Available online: https://data.europa.eu/doi/10.2766/569540 (accessed on 22 May 2023).
  • Nisbett Richard. Mindware: Tools for Smart Thinking. Doubleday Canada; Toronto: 2015.
  • O’Mahony Cian, Brassil Maryanne, Murphy Gillian, Linehan Conor. The efficacy of interventions in reducing belief in conspiracy theories: A systematic review. PLoS ONE. 2023; 18:e0280902. doi: 10.1371/journal.pone.0280902.
  • Orazbayeva Balzhan, Davey Todd, Plewa Carolin, Galán-Muros Victoria. Engagement of academics in education-driven university-business cooperation: A motivation-based perspective. Studies in Higher Education. 2020; 45:1723–36. doi: 10.1080/03075079.2019.1582013.
  • Paul Richard, Elder Linda. The Miniature Guide to Critical Thinking Concepts and Tools. Foundation for Critical Thinking Press; Santa Barbara: 2008.
  • Payan-Carreira Rita, Rebelo Hugo, Sebastião Luis, Sacau Ana, Ferreira David, Simões Margarida, Pnevmatikos Dimitrios, Christodoulou Panagiota, Lithoxoidou Angeliki, Georgiadou Triantafyllia, et al. THINK4JOBS Guidelines: A Protocol for Critical Thinking Transfer from Curricula to Labour Market. University of Western Macedonia; Greece: 2023. Available online: https://think4jobs.uowm.gr/results/intellectualoutput4 (accessed on 2 June 2023).
  • Pnevmatikos Dimitrios, Christodoulou Panagiota, Georgiadou Triantafyllia, Lithoxoidou Angeliki, Dimitriadou Catherine, Payan-Carreira Rita, Simões Margarida, Ferreira David, Rebelo Hugo, Sebastião Luis. THINK4JOBS TRAINING: Critical Thinking Training Packages for Higher Education Instructors and Labour Market Tutors. University of Western Macedonia; Greece: 2021. Available online: https://think4jobs.uowm.gr/results/intellectualoutput2 (accessed on 10 June 2023).
  • Quinn Sarah, Hogan Michael, Dwyer Cristopher, Finn Patrick, Fogarty Emer. Development and Validation of the Student-Educator Negotiated Critical Thinking Dispositions Scale (SENCTDS). Thinking Skills and Creativity. 2020; 38:100710. doi: 10.1016/j.tsc.2020.100710.
  • Rebelo Hugo, Christodoulou Panagiota, Payan-Carreira Rita, Dumitru Daniela, Mäkiö Elena, Mäkiö Juho, Pnevmatikos Dimitrios. University-Business Collaboration for the Design, Development and Delivery of Critical Thinking Blended Apprenticeships Curricula: Lessons Learned from a Three-Year Project. Education Sciences. 2023; 2023:2023081992. doi: 10.20944/preprints202308.1992.v1.
  • Roslansky Ryan. You Need a Skills-Based Approach to Hiring and Developing Talent. Harvard Business Review. 2021. Available online: https://hbr.org/2021/06/you-need-a-skills-based-approach-to-hiring-and-developing-talent (accessed on 1 June 2023).
  • Tiruneh Dawit. Transfer of Critical Thinking Skills Across Domains: Implicit or Explicit Instructional Approaches? Paper presented at the 2019 AERA Annual Meeting; Toronto, ON, Canada. 4 June 2019.
  • Vosoughi Soroush, Roy Deb, Aral Sinan. The spread of true and false news online. Science. 2018; 359:1146–51. doi: 10.1126/science.aap9559.
  • Weiser Benjamin, Schweber Nate. The ChatGPT Lawyer Explains Himself. The New York Times. 2023. Available online: https://www.nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-sanctions.html (accessed on 11 June 2023).
  • World Economic Forum. The Future of Jobs Report 2020. 2020. Available online: https://www3.weforum.org/docs/WEF_Future_of_Jobs_2020.pdf (accessed on 31 May 2023).

Critical thinking definition

articles on assessing critical thinking

Critical thinking, as described by Oxford Languages, is the objective analysis and evaluation of an issue in order to form a judgement.

Critical thinking requires the active and skillful conceptualization, analysis, synthesis, and evaluation of information gathered from observation, experience, reflection, reasoning, or communication, as a guide to belief and action, which is why it is so often emphasized in education and academic work.

Some even view it as the backbone of modern thought.

However, it is a skill, and skills must be trained and practiced before they can be used to their full potential.

People turn to various approaches to improve their critical thinking, such as:

  • Developing technical and problem-solving skills
  • Engaging in more active listening
  • Actively questioning their assumptions and beliefs
  • Seeking out more diversity of thought
  • Cultivating their intellectual curiosity

Is critical thinking useful in writing?

Critical thinking can help you plan your paper and make it more concise, though its value is not always obvious at first. Here are some of the questions you should ask yourself when applying critical thinking to your writing:

  • What information should be included?
  • Which information resources should the author look to?
  • What degree of technical knowledge should the report assume its audience has?
  • What is the most effective way to show information?
  • How should the report be organized?
  • How should it be designed?
  • What tone and level of language difficulty should the document have?

Critical thinking shapes more than the outline of your paper; it also raises the question of how to apply it to the problems posed by your topic.

Suppose you are preparing a PowerPoint presentation on how critical thinking can reduce poverty in the United States. You will first have to define critical thinking for your audience, and then use guiding questions and related terms so that viewers become familiar with your methods and engage in the thinking process themselves.

Are there any services that can help me use more critical thinking?

We understand that it's difficult to learn how to use critical thinking more effectively in just one article, but our service is here to help.

We are a team specializing in writing essays and other assignments for college students and anyone else who needs a helping hand. We cover a wide range of topics, deliver high-quality work on time, and aim to leave our customers completely satisfied with what they ordered.

The ordering process is fully online, and it goes as follows:

  • Select the topic and the deadline of your essay.
  • Provide us with any details, requirements, statements that should be emphasized, or particular parts of the essay-writing process you struggle with.
  • Leave the email address where your completed order will be sent.
  • Select your preferred payment type, sit back and relax!

With years of experience on the market, professionally degreed essay writers, 24/7 online customer support, and incredibly low prices, you won't find a service offering a better deal than ours.


Positive thinking: Stop negative self-talk to reduce stress

Positive thinking helps with stress management and can even improve your health. Practice overcoming negative self-talk with examples provided.

Is your glass half-empty or half-full? How you answer this age-old question about positive thinking may reflect your outlook on life, your attitude toward yourself, and whether you're optimistic or pessimistic — and it may even affect your health.

Indeed, some studies show that personality traits such as optimism and pessimism can affect many areas of your health and well-being. The positive thinking that usually comes with optimism is a key part of effective stress management. And effective stress management is associated with many health benefits. If you tend to be pessimistic, don't despair — you can learn positive thinking skills.

Understanding positive thinking and self-talk

Positive thinking doesn't mean that you ignore life's less pleasant situations. Positive thinking just means that you approach unpleasantness in a more positive and productive way. You think the best is going to happen, not the worst.

Positive thinking often starts with self-talk. Self-talk is the endless stream of unspoken thoughts that run through your head. These automatic thoughts can be positive or negative. Some of your self-talk comes from logic and reason. Other self-talk may arise from misconceptions that you create because of lack of information or expectations due to preconceived ideas of what may happen.

If the thoughts that run through your head are mostly negative, your outlook on life is more likely pessimistic. If your thoughts are mostly positive, you're likely an optimist — someone who practices positive thinking.

The health benefits of positive thinking

Researchers continue to explore the effects of positive thinking and optimism on health. Health benefits that positive thinking may provide include:

  • Increased life span
  • Lower rates of depression
  • Lower levels of distress and pain
  • Greater resistance to illnesses
  • Better psychological and physical well-being
  • Better cardiovascular health and reduced risk of death from cardiovascular disease and stroke
  • Reduced risk of death from cancer
  • Reduced risk of death from respiratory conditions
  • Reduced risk of death from infections
  • Better coping skills during hardships and times of stress

It's unclear why people who engage in positive thinking experience these health benefits. One theory is that having a positive outlook enables you to cope better with stressful situations, which reduces the harmful health effects of stress on your body.

It's also thought that positive and optimistic people tend to live healthier lifestyles — they get more physical activity, follow a healthier diet, and don't smoke or drink alcohol in excess.

Identifying negative thinking

Not sure if your self-talk is positive or negative? Some common forms of negative self-talk include:

  • Filtering. You magnify the negative aspects of a situation and filter out all the positive ones. For example, you had a great day at work. You completed your tasks ahead of time and were complimented for doing a speedy and thorough job. That evening, you focus only on your plan to do even more tasks and forget about the compliments you received.
  • Personalizing. When something bad occurs, you automatically blame yourself. For example, you hear that an evening out with friends is canceled, and you assume that the change in plans is because no one wanted to be around you.
  • Catastrophizing. You automatically anticipate the worst without any facts that the worst will happen. The drive-through coffee shop gets your order wrong, and then you think that the rest of your day will be a disaster.
  • Blaming. You try to say someone else is responsible for what happened to you instead of yourself. You avoid being responsible for your thoughts and feelings.
  • Saying you "should" do something. You think of all the things you think you should do and blame yourself for not doing them.
  • Magnifying. You make a big deal out of minor problems.
  • Perfectionism. Holding yourself to impossible standards and trying to be perfect sets you up for failure.
  • Polarizing. You see things only as either good or bad. There is no middle ground.

Focusing on positive thinking

You can learn to turn negative thinking into positive thinking. The process is simple, but it does take time and practice — you're creating a new habit, after all. Following are some ways to think and behave in a more positive and optimistic way:

  • Identify areas to change. If you want to become more optimistic and engage in more positive thinking, first identify areas of your life that you usually think negatively about, whether it's work, your daily commute, life changes or a relationship. You can start small by focusing on one area to approach in a more positive way. Think of a positive thought to manage your stress instead of a negative one.
  • Check yourself. Periodically during the day, stop and evaluate what you're thinking. If you find that your thoughts are mainly negative, try to find a way to put a positive spin on them.
  • Be open to humor. Give yourself permission to smile or laugh, especially during difficult times. Seek humor in everyday happenings. When you can laugh at life, you feel less stressed.
  • Follow a healthy lifestyle. Aim to exercise for about 30 minutes on most days of the week. You can also break it up into 5- or 10-minute chunks of time during the day. Exercise can positively affect mood and reduce stress. Follow a healthy diet to fuel your mind and body. Get enough sleep. And learn techniques to manage stress.
  • Surround yourself with positive people. Make sure those in your life are positive, supportive people you can depend on to give helpful advice and feedback. Negative people may increase your stress level and make you doubt your ability to manage stress in healthy ways.
  • Practice positive self-talk. Start by following one simple rule: Don't say anything to yourself that you wouldn't say to anyone else. Be gentle and encouraging with yourself. If a negative thought enters your mind, evaluate it rationally and respond with affirmations of what is good about you. Think about things you're thankful for in your life.

Here are some examples of negative self-talk and how you can apply a positive thinking twist to them:

Putting positive thinking into practice

Negative self-talk → Positive thinking

  • I've never done it before. → It's an opportunity to learn something new.
  • It's too complicated. → I'll tackle it from a different angle.
  • I don't have the resources. → Necessity is the mother of invention.
  • I'm too lazy to get this done. → I couldn't fit it into my schedule, but I can re-examine some priorities.
  • There's no way it will work. → I can try to make it work.
  • It's too radical a change. → Let's take a chance.
  • No one bothers to communicate with me. → I'll see if I can open the channels of communication.
  • I'm not going to get any better at this. → I'll give it another try.

Practicing positive thinking every day

If you tend to have a negative outlook, don't expect to become an optimist overnight. But with practice, eventually your self-talk will contain less self-criticism and more self-acceptance. You may also become less critical of the world around you.

When your state of mind is generally optimistic, you're better able to handle everyday stress in a more constructive way. That ability may contribute to the widely observed health benefits of positive thinking.


  • Forte AJ, et al. The impact of optimism on cancer-related and postsurgical cancer pain: A systematic review. Journal of Pain and Symptom Management. 2021; doi:10.1016/j.jpainsymman.2021.09.008.
  • Rosenfeld AJ. The neuroscience of happiness and well-being. Child and Adolescent Psychiatric Clinics of North America. 2019;28:137.
  • Kim ES, et al. Optimism and cause-specific mortality: A prospective cohort study. American Journal of Epidemiology. 2016; doi:10.1093/aje/kww182.
  • Amonoo HL, et al. Is optimism a protective factor for cardiovascular disease? Current Cardiology Reports. 2021; doi:10.1007/s11886-021-01590-4.
  • Physical Activity Guidelines for Americans. 2nd ed. U.S. Department of Health and Human Services. https://health.gov/paguidelines/second-edition. Accessed Oct. 20, 2021.
  • Seaward BL. Essentials of Managing Stress. 4th ed. Burlington, Mass.: Jones & Bartlett Learning; 2021.
  • Seaward BL. Cognitive restructuring: Reframing. Managing Stress: Principles and Strategies for Health and Well-Being. 8th ed. Burlington, Mass.: Jones & Bartlett Learning; 2018.
  • Olpin M, et al. Stress Management for Life. 5th ed. Cengage Learning; 2020.


COMMENTS

  1. Assessing Critical Thinking in Higher Education: Current State and

    Critical thinking is one of the most frequently discussed higher order skills, believed to play a central role in logical thinking, decision making, and problem solving (Butler, 2012; Halpern, 2003). It is also a highly contentious skill in that researchers debate about its definition; its amenability to assessment; its degree of generality or specificity; and the evidence of its practical ...

  2. Frontiers

    Enhancing students' critical thinking (CT) skills is an essential goal of higher education. This article presents a systematic approach to conceptualizing and measuring CT. CT generally comprises the following mental processes: identifying, evaluating, and analyzing a problem; interpreting information; synthesizing evidence; and reporting a conclusion. We further posit that CT also involves ...

  3. Bridging critical thinking and transformative learning: The role of

    In recent decades, approaches to critical thinking have generally taken a practical turn, pivoting away from more abstract accounts - such as emphasizing the logical relations that hold between statements (Ennis, 1964) - and moving toward an emphasis on belief and action. According to the definition that Robert Ennis (2018) has been advocating for the last few decades, critical thinking is ...

  4. Guidelines for a Scientific Approach to Critical Thinking Assessment

    This article examines benefits of taking a scientific approach to critical thinking assessment and proposes guidelines for planning, conducting, and using assessment research. Specifically, we discuss study design options and strategies for improving the quality of assessment data and the use of such data to improve critical thinking ...

  5. J. Intell.

    Given the importance of critical thinking to our everyday lives, the fair and unbiased assessment of critical thinking is useful for guiding educators in their classrooms, for the sake of self-improvement, and in employment decisions. This chapter will also review the psychometric properties of several critical thinking assessments, with a ...

  6. Critical Thinking Is About Asking Better Questions

    Summary. Critical thinking is the ability to analyze and effectively break down an issue in order to make a decision or find a solution. At the heart of critical thinking is the ability to ...

  7. [PDF] Assessing Critical Thinking

    Assessing Critical Thinking. S. Brookfield. Published 1 September 1997. Education, Psychology. New Directions for Adult and Continuing Education. The author discusses the process of critical thinking and his belief that it is social process, then offers three locally grounded strategies for assessing critical thinking: experiential, behavioral ...

  8. Assessing critical thinking in higher education: the HEIghten™ approach

    Abstract. Critical thinking is a learning outcome highly valued by higher education institutions and the workforce. The Educational Testing Service (ETS) has designed a next generation assessment, the HEIghten™ critical thinking assessment, to measure students' critical thinking skills in analytical and synthetic dimensions. This paper introduces the theoretical framework that guided the ...

  9. Assessment of Critical Thinking: An Evolutionary Approach

    ... standardized tests to assess critical thinking. All of this suggests that campus-based attempts at defining these outcomes have provided a renewed impetus toward assessing critical thinking because campus administrators and faculty are able to do so on their own terms. Purpose: In this article, we describe our experience with assessing critical thinking ...

  10. A critical issue: assessing the critical thinking skills and

    The theoretical framework for this study was heavily influenced by the work of Dr. Peter Facione, whose seminal work in the field of critical thinking assessment is utilized by educators, employers, and policymakers who recognize the need for students and alumni of institutions of higher education to be able to properly demonstrate these skills and dispositions as a result of their time in ...

  11. Rubrics to assess critical thinking and information processing in

    Process skills such as critical thinking and information processing are commonly stated outcomes for STEM undergraduate degree programs, but instructors often do not explicitly assess these skills in their courses. Students are more likely to develop these crucial skills if there is constructive alignment between an instructor's intended learning outcomes, the tasks that the instructor and ...

  12. Assessing Critical Thinking in the Digital Era

    One of the key difficulties in assessing critical thinking is defining what it is and how it should be measured. Critical thinking is a complex skill that involves the ability to analyze and evaluate information, think creatively, and make reasoned judgments, as Richard Paul and Linda Elder outline in their 2019 publication.

  13. A Brief Guide for Teaching and Assessing Critical Thinking in

    Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 4, 1102-1134. Angelo, T. A. (1995). Classroom assessment for critical thinking. Teaching of Psychology, 22(1), 6-7. Bensley, D.A. (1998). Critical thinking in psychology: A unified skills approach.

  14. Assessing critical thinking in business education: Key issues and

    As a starting point, at the centre of our framework (Fig. 1) sit the four critical thinking criteria that informed the rubric used by the external marker to assess the reports and also used for accreditation purposes (Appendix B); that is, 1) critical evaluation of the issues; 2) development and presentation of the arguments; 3) application of theories and ideas to real-world context; and 4 ...

  15. Assessing Adolescent Critical Thinking

    A taxonomy of learning, teaching, and assessment: A revision of Bloom's taxonomy of educational objectives. Longman. ... Critical thinking—What every person needs to survive in a rapidly changing world (3rd ed.). Foundation for Critical Thinking.

  16. Teaching, Measuring & Assessing Critical Thinking Skills

    Yes, We Can Define, Teach, and Assess Critical Thinking Skills. Critical thinking is a thing. We can define it; we can teach it; and we can assess it. While the idea of teaching critical thinking has been bandied around in education circles since at least the time of John Dewey, it has taken greater prominence in the education debates with the ...

  17. What Are Critical Thinking Skills and Why Are They Important?

    It makes you a well-rounded individual, one who has looked at all of their options and possible solutions before making a choice. According to the University of the People in California, having critical thinking skills is important because they are [1]: Universal. Crucial for the economy. Essential for improving language and presentation skills.

  18. Assessing critical thinking through L2 argumentative essays: an

    Although some second language (L2) pedagogical approaches recognize critical thinking (CT) as an important skill, its assessment is challenging because it is not a well-defined construct with varying definitions. This study aimed to identify the relevant and salient features of argumentative essays that allow for the assessment of L2 students' CT skills. This study implemented a convergent ...

  19. Critical Thinking: Creating Job-Proof Skills for the Future of Work

    The consortium chose the Critical Thinking Self-Assessment Scale (CTSAS) developed by Nair as an instrument to assess CT skills based on an earlier conceptualization (Facione 1990). The questionnaire has been tested in various geographic and cultural contexts, demonstrating good reliability, internal consistency, and confirmatory factor ...

  20. Assessing and Developing Students' Critical Thinking

    Abstract. The development of critical thinking is an important step in achieving the goals of holistic education, not only through helping students gain knowledge but above all through ensuring that they think effectively. The aim of this article is to explore the concept of critical thinking and to discuss possibilities to develop students ...

  21. Defining Critical Thinking

    Critical thinking is, in short, self-directed, self-disciplined, self-monitored, and self-corrective thinking. It presupposes assent to rigorous standards of excellence and mindful command of their use. It entails effective communication and problem solving abilities and a commitment to overcome our native egocentrism and sociocentrism.

  22. Using Critical Thinking in Essays and other Assignments

    Critical thinking, as described by Oxford Languages, is the objective analysis and evaluation of an issue in order to form a judgement. Active and skillful approach, evaluation, assessment, synthesis, and/or evaluation of information obtained from, or made by, observation, knowledge, reflection, acumen or conversation, as a guide to belief and ...

  23. Positive thinking: Reduce stress by eliminating negative self-talk

    Positive thinking often starts with self-talk. Self-talk is the endless stream of unspoken thoughts that run through your head. These automatic thoughts can be positive or negative. Some of your self-talk comes from logic and reason. Other self-talk may arise from misconceptions that you create because of lack of information or expectations due ...

  24. Using Writing to Develop and Assess Critical Thinking

    Abstract. Written work has several important advantages over oral discussion in the development and assessment of students' critical-thinking skills. This article describes a set of short writing assignments that tap eight essential aspects of critical and creative thought. Examples illustrate how the assignments can be tied to topics in ...

  25. Book Review: Against critical thinking in health, social care and

    Based on: Grimwood Tom, Against Critical Thinking in Health, Social Care and Social Work: Reframing Philosophy for Professional Practice. Taylor & Francis, 2023; 176 pp., ISBN 9780367642358, $130 (h/c), OnlineFirst.