Mental Workload

Status: established
Last updated: 2026-05-31
Sources: 9781119636113.Ch7.Pdf
Tags: [mental-workload, cognitive-workload, workload-assessment, NASA-TLX, attention, human-factors, neuroergonomics]

Summary

Mental workload (MWL) is a foundational construct in human factors and ergonomics, representing the relationship between cognitive demand imposed by tasks and the operator's capacity to meet those demands. A useful operational definition characterizes MWL as "the operator's allocation of limited processing capacity or resources to meet task demands; that is, the balance of internal resources and external demands" (Matthews & Reinerman-Jones, 2017, pp. 3-4). Mental workload and cognitive workload are synonymous terms addressing the same construct (Hancock et al., 2021).

Body

Context

Hancock et al. (2021), in their handbook chapter on mental workload, examine MWL as the balance between the cognitive demand a task imposes and the operator's limited processing capacity. The chapter traces the construct's origins, its theoretical bases, the four families of measurement methods, computational modeling, and the challenges automation poses. Mental workload and cognitive workload are treated as synonymous terms for the same construct. Within this knowledge base the article sits in the cognitive-ergonomics core named in Human Factors Ergonomics Discipline: it draws its limited-capacity premise from Information Processing, shares its overload concern with Situation Awareness, rests on the perceptual demands of Sensation And Perception, and supplies the physiological-measurement link taken further in Neuroergonomics.

Key Points

Mental workload assessment emerged during and after World War II, when the demands of combat raised concerns about human performance capacities, and aviation was central to its development. The Cooper-Harper Scale (1969) let NASA pilots rate aircraft handling qualities through a decision tree yielding a rating from 1 (best) to 10 (worst). It influenced later instruments including the NASA-TLX and SWAT, which originated in aviation research at NASA and the U.S. Air Force respectively (PDF pp. 2–3, orig. pp. 203–204).

Three theories ground the construct. Resource Theory (Kahneman, 1973; Gopher & Donchin, 1986) treats attention as a limited resource pool: the higher the demand, the more heavily a task draws on finite resources. Norman and Bobrow's (1975) Performance Resource Functions distinguish resource-limited processing (performance rises with allocated resources) from data-limited processing (performance is capped by data quality, so extra resources yield no gain). Multiple Resource Theory (Wickens, 1984, 2008) extends the single pool to several pools indexed by processing nature, stage, and modality, explaining why tasks that draw on different pools (for example visual versus auditory input, or vocal versus manual response) interfere less than tasks that share one. Assessment focuses on the thresholds between underload, moderate workload, and overload. Overload, in which all resources are allocated and no spare capacity remains, most often produces performance decrements, and this "red line" has driven much research (Hancock & Caird, 1993; Young et al., 2015) (PDF pp. 3–4, orig. pp. 204–205).
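The distinction between resource-limited and data-limited processing can be illustrated with a toy performance resource function. This is a minimal sketch, not a model from the source: the piecewise-linear shape, the function names, and the `gain` and `data_limit` values are all hypothetical choices made for illustration.

```python
def performance(resources: float, gain: float = 1.0, data_limit: float = 0.6) -> float:
    """Toy performance resource function (illustrative assumption only).

    resources  -- share of processing capacity allocated to the task (0.0 to 1.0)
    gain       -- hypothetical performance gained per unit of resource
    data_limit -- hypothetical ceiling imposed by the quality of the data
    """
    if not 0.0 <= resources <= 1.0:
        raise ValueError("resources must lie between 0 and 1")
    # Performance rises with resources until the data-quality ceiling is reached.
    return min(gain * resources, data_limit)


def is_data_limited(resources: float, gain: float = 1.0, data_limit: float = 0.6) -> bool:
    """True once allocating more resources can no longer raise performance."""
    return gain * resources >= data_limit


for r in (0.2, 0.4, 0.6, 0.8, 1.0):
    regime = "data-limited" if is_data_limited(r) else "resource-limited"
    print(f"resources={r:.1f}  performance={performance(r):.2f}  {regime}")
```

In this sketch, performance tracks the allocated resources up to the ceiling and then stays flat, which is the data-limited region where extra effort yields no gain.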

Measurement spans four families, and applied researchers favor combining them rather than relying on one (Gopher & Kimchi, 1989; Hockey et al., 1989).

Primary-task measures infer load from performance on the focal task. Their limitation is that added difficulty within capacity may not degrade performance, and low-MWL situations are often data-limited.

Secondary-task measures index spare capacity. In the Peripheral Detection Task, for example, an LED lights every 3–5 seconds and responses slow as workload rises. Such measures are intrusive, impractical in safety-critical work, and weak for studying underload.

Subjective measures are the most widely used because of their usability (Estes, 2015; Matthews et al., 2015). The NASA-TLX (Hart & Staveland, 1988) dominates, with six subscales (mental demand, physical demand, temporal demand, performance, effort, and frustration). Its paired-comparison weighting step can often be omitted safely, giving the Raw TLX (Hendy et al., 1993; Hill et al., 1992), and variants include SURG-TLX and DALI. Other instruments include SWAT, Cooper-Harper, Workload Profile, ISA, the Bedford Scale, and RSME. Estes (2015) modelled a nonlinear, S-shaped relationship between actual workload and ratings, with perceptions distorted at the extremes (Desmond et al., 1998; Foy & Chapman, 2018). Subjective measures are cheap, unobtrusive when administered after the task, diagnostic, and high in face validity, but they depend on short-term memory, lose within-task variation, cannot run in real time without disruption, and are subject to psychometric and cultural effects (Johnson & Widyanti, 2011).

Physiological measures offer objective, continuous, non-invasive data, though no single measure conclusively indexes MWL given individual variability (Charles & Nixon, 2019). They are evaluated against seven criteria (sensitivity, diagnosticity, selectivity, reliability, intrusiveness, practical constraints, operator acceptance; Eggemeier, 1988; O'Donnell & Eggemeier, 1986). P300 amplitude falls as MWL rises (Brouwer et al., 2012; Käthner et al., 2014; Prinzel et al., 2003); heart rate variability falls with sympathetic arousal and stress (Kim et al., 2018); EMG amplitude rises with load (Fallahi et al., 2016b); and skin conductance rises with arousal but lags by 1–3 seconds and is sensitive to drugs and ambient conditions (PDF pp. 4–12, orig. pp. 205–213).
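The difference between the weighted NASA-TLX and the Raw TLX mentioned above can be shown with a short scoring sketch. It assumes the conventional procedure, which this article does not spell out: each subscale is rated on a 0–100 scale, and each weight is the number of times that subscale was chosen across the 15 pairwise comparisons, so the weights sum to 15. The function names and the example ratings are hypothetical.

```python
from itertools import combinations

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

# Six subscales produce 15 unordered pairs for the weighting step.
N_PAIRS = len(list(combinations(SUBSCALES, 2)))


def raw_tlx(ratings: dict) -> float:
    """Raw TLX: the unweighted mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: dict, weights: dict) -> float:
    """Weighted TLX: ratings weighted by paired-comparison tallies."""
    if sum(weights[s] for s in SUBSCALES) != N_PAIRS:
        raise ValueError(f"weights must sum to {N_PAIRS}")
    if any(not 0 <= weights[s] <= len(SUBSCALES) - 1 for s in SUBSCALES):
        raise ValueError("each weight must lie between 0 and 5")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / N_PAIRS


# Hypothetical ratings (0-100) and tallies from one participant.
ratings = {"mental": 70, "physical": 10, "temporal": 60,
           "performance": 40, "effort": 65, "frustration": 55}
weights = {"mental": 5, "physical": 0, "temporal": 4,
           "performance": 1, "effort": 3, "frustration": 2}

print(f"pairs to compare: {N_PAIRS}")
print(f"Raw TLX:      {raw_tlx(ratings):.1f}")
print(f"Weighted TLX: {weighted_tlx(ratings, weights):.1f}")
```

With these made-up numbers the two scores differ because the participant weighted the highly rated subscales most heavily. Skipping the weighting step saves the 15 comparisons per participant.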

Computational modeling aggregates dimensions in several ways: additive aggregation (Workload Profile, equal weights), weighted aggregation (NASA-TLX paired comparisons, which scale poorly as dimensions grow), and ranking-based aggregation (SWAT, three attributes at three levels via 27-card sorting, ignoring interactions). The Xie and Salvendy (2000a) framework proposes multiple indices — instantaneous, peak, accumulated, average, and overall workload — plus effective versus ineffective workload, a degrading factor, and management load. Longo (2014, 2015) frames MWL as a defeasible concept modelled with AI argumentation, where premises support claims and attack relations capture contradictions, allowing interactions and inconsistencies to be represented formally. Machine learning can learn workload models directly from data, including deep learning on EEG features (Moustafa et al., 2017; Yin & Zhang, 2018) (PDF pp. 13–17, orig. pp. 214–218).
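To make the idea of multiple workload indices concrete, the sketch below derives peak, accumulated, and average workload from a sampled trace of instantaneous workload. The definitions used here (maximum, rectangle-rule sum, and sum divided by duration) are simple assumptions for illustration and are not claimed to be the formulas of Xie and Salvendy (2000a). The class name, function name, and sample values are hypothetical, and the overall-workload, degrading-factor, and management-load terms are left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WorkloadIndices:
    peak: float         # highest instantaneous value in the trace
    accumulated: float  # instantaneous workload summed over time
    average: float      # accumulated workload divided by duration


def summarize(samples: Sequence[float], dt: float = 1.0) -> WorkloadIndices:
    """Summarize a trace of instantaneous workload sampled every dt seconds."""
    if not samples:
        raise ValueError("samples must not be empty")
    if dt <= 0:
        raise ValueError("dt must be positive")
    accumulated = sum(samples) * dt   # rectangle-rule approximation
    duration = len(samples) * dt
    return WorkloadIndices(
        peak=max(samples),
        accumulated=accumulated,
        average=accumulated / duration,
    )


# Hypothetical instantaneous workload values on an arbitrary scale.
trace = [2.0, 4.0, 6.0, 8.0, 5.0]
indices = summarize(trace)
print(indices)
```

Two traces with the same average can have very different peaks, which is one reason to report more than a single index.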

A central assessment challenge is that evidence from different measures does not always converge (Hancock & Matthews, 2019). Measures can relate in three ways: association (methods converge), insensitivity (one method shows no change), and dissociation (methods disagree on direction). Dissociation is not new (Yeh & Wickens, 1988), and it can be informative because methods tap different facets of workload. For example, subjective ratings track the effort invested to hold performance constant, so they can dissociate from primary-task measures (Hockey, 1997; Hilburn, 1997; Tsang & Vidulich, 2006) (PDF pp. 8–9, orig. pp. 209–210).
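The three relations between measures can be expressed as a small decision rule. The sketch below is a hypothetical illustration, not a procedure from the source. It assumes that each measure's change between two conditions has already been standardized and oriented so that positive values mean higher workload, and that changes smaller than an arbitrary `tolerance` count as no change.

```python
def classify_relation(delta_a: float, delta_b: float, tolerance: float = 0.1) -> str:
    """Label how two workload measures relate across a change in conditions.

    delta_a, delta_b -- change in each measure, oriented so positive = more workload
    tolerance        -- hypothetical threshold below which a change is ignored
    """
    a_changed = abs(delta_a) > tolerance
    b_changed = abs(delta_b) > tolerance
    if not a_changed and not b_changed:
        return "no change"        # neither measure moved, nothing to compare
    if a_changed != b_changed:
        return "insensitivity"    # exactly one measure shows no change
    same_direction = (delta_a > 0) == (delta_b > 0)
    return "association" if same_direction else "dissociation"


# Hypothetical standardized changes: (subjective rating, primary-task error rate).
for pair in [(0.5, 0.8), (0.5, 0.0), (0.5, -0.6)]:
    print(pair, "->", classify_relation(*pair))
```

The third case mirrors the example in the text: ratings rise because more effort is invested, while primary-task errors do not rise with them.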

Conclusion

Hancock et al. (2021) conclude that automation is reshaping rather than eliminating mental workload: operators shift from active control to teammate, supervisor, or monitor roles, exposing them to underload (boredom, inattentiveness, fatigue), to overload when supervising several autonomous systems, and to rapid transitions between the two, as in self-driving-vehicle takeovers requiring response within seconds (Eriksson & Stanton, 2017). Advances in wireless micro-electronics and neuroergonomic sensors will generate large data volumes, turning assessment into a signal-to-noise problem. The authors close on hedonomics: with advanced assessment, work could be redesigned so that workers face interesting, challenging, and rewarding tasks rather than under- or overload (Hancock et al., 2005).

References

Brouwer, A.-M., Hogervorst, M.A., Van Erp, J.B., Heffelaar, T., Zimmerman, P.H. & Oostenveld, R. (2012) 'Estimating workload using EEG spectral power and ERPs in the n-back task', Journal of Neural Engineering, 9(4), 045008. To be validated.

Charles, R.L. & Nixon, J. (2019) 'Measuring mental workload using physiological measures: A systematic review', Applied Ergonomics, 74, pp. 221–232. To be validated.

Cooper, G.E. & Harper, R.P. (1969) The use of pilot ratings in the evaluation of aircraft handling qualities. Advisory Group for Aerospace Research and Development (AGARD), Report 567, NATO. London: Technical Editing and Reproduction Ltd. To be validated.

Desmond, P.A., Hancock, P.A. & Monette, J.L. (1998) 'Fatigue and automation-induced impairments in simulated driving performance', Transportation Research Record, 1628, pp. 8–14. To be validated.

Eggemeier, F.T. (1988) 'Properties of workload assessment techniques', in P.A. Hancock & N. Meshkati (eds.) Human mental workload. Amsterdam: Elsevier, pp. 41–62. To be validated.

Eriksson, A. & Stanton, N.A. (2017) 'Takeover time in highly automated vehicles: Noncritical transitions to and from manual control', Human Factors, 59(4), pp. 689–705. To be validated.

Estes, S. (2015) 'The workload curve: Subjective mental workload', Human Factors, 57(7), pp. 1174–1187. To be validated.

Fallahi, M., Motamedzade, M., Heidarimoghadam, R., Soltanian, A.R. & Miyake, S. (2016b) 'Effects of mental workload on physiological and subjective responses during traffic density monitoring: A field study', Applied Ergonomics, 52, pp. 95–103. To be validated.

Foy, H.J. & Chapman, P. (2018) 'Mental workload is reflected in driver behaviour, physiology, eye movements and prefrontal cortex activation', Applied Ergonomics, 73, pp. 90–99. To be validated.

Gopher, D. & Donchin, E. (1986) 'Workload: An examination of the concept', in K.R. Boff, L. Kaufman & J.P. Thomas (eds.) Handbook of perception and human performance, Vol. II, Cognitive processes and performance. New York: Wiley. To be validated.

Gopher, D. & Kimchi, R. (1989) 'Engineering psychology', Annual Review of Psychology, 40, pp. 431–455. To be validated.

Hancock, G.M., Longo, L., Young, M.S. & Hancock, P.A. (2021) 'Mental workload', in G. Salvendy & W. Karwowski (eds.) Handbook of human factors and ergonomics (5th ed.). John Wiley & Sons, pp. 203–226. hancock2021mentalworkload

Hart, S.G. & Staveland, L.E. (1988) 'Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research', in P.A. Hancock & N. Meshkati (eds.) Human mental workload (Advances in Psychology, Vol. 52). Amsterdam: North-Holland, pp. 139–183. doi: 10.1016/S0166-4115(08)62386-9. hart1988nasatlx (see Nasa Tlx)

Hendy, K.C., Hamilton, K.M. & Landry, L.N. (1993) 'Measuring subjective workload: When is one scale better than many?', Human Factors, 35, pp. 579–601. To be validated.

Hilburn, B. (1997) 'Dynamic decision aiding: The impact of adaptive automation on mental workload', in D. Harris (ed.) Engineering psychology and cognitive ergonomics. Vol. I: Transportation systems. Aldershot: Ashgate, pp. 193–200. To be validated.

Hill, S.G., Iavecchia, H.P., Byers, J.C., Bittner, A.C., Zakland, A.L. & Christ, R.E. (1992) 'Comparison of four subjective workload rating scales', Human Factors, 34, pp. 429–439. To be validated.

Hockey, G.R.J. (1997) 'Compensatory control in the regulation of human performance under stress and high workload: A cognitive-energetical framework', Biological Psychology, 45, pp. 73–93. To be validated.

Hockey, G.R.J., Briner, R.B., Tattersall, A.J. & Wiethoff, M. (1989) 'Assessing the impact of computer workload on operator stress: The role of system controllability', Ergonomics, 32(11), pp. 1401–1418. To be validated.

Johnson, A. & Widyanti, A. (2011) 'Cultural influences on the measurement of subjective mental workload', Ergonomics, 54(6), pp. 509–518. To be validated.

Kahneman, D. (1973) Attention and effort. Englewood Cliffs, NJ: Prentice-Hall. To be validated.

Käthner, I., Wriessnegger, S.C., Müller-Putz, G.R., Kübler, A. & Halder, S. (2014) 'Effects of mental workload and fatigue on the P300, alpha and theta band power during operation of an ERP (P300) brain–computer interface', Biological Psychology, 102, pp. 118–129. To be validated.

Kim, H.-G., Cheon, E.-J., Bai, D.-S., Lee, Y.H. & Koo, B.-H. (2018) 'Stress and heart rate variability: A meta-analysis and review of the literature', Psychiatry Investigation, 15(3), pp. 235–245. To be validated.

Matthews, G. & Reinerman-Jones, L. (2017) Workload assessment: How to diagnose workload issues and enhance performance. Santa Monica, CA: Human Factors and Ergonomics Society. To be validated.

Matthews, G., Reinerman-Jones, L.E., Barber, D.J. & Abich, J. (2015) 'The psychometrics of mental workload: Multiple measures are sensitive but divergent', Human Factors, 57(1), pp. 125–143. To be validated.

Moustafa, K., Luz, S. & Longo, L. (2017) 'Assessment of mental workload: A comparison of machine learning methods and subjective assessment techniques', in International Symposium on Human Mental Workload: Models and Applications. Cham: Springer, pp. 30–50. To be validated.

Norman, D.A. & Bobrow, D.G. (1975) 'On data-limited and resource-limited processes', Cognitive Psychology, 7, pp. 44–64. To be validated.

O'Donnell, R. & Eggemeier, F.T. (1986) 'Workload assessment methodology', in K.R. Boff, L. Kaufman & J.P. Thomas (eds.) Handbook of perception and human performance. Vol. II: Cognitive processes and performance. New York: Wiley. To be validated.

Prinzel, L.J. III, Freeman, F.G., Scerbo, M.W., Mikulka, P.J. & Pope, A.T. (2003) 'Effects of a psychophysiological system for adaptive automation on performance, workload, and the event-related potential P300 component', Human Factors, 45(4), pp. 601–614. To be validated.

Tsang, P.S. & Vidulich, M.A. (2006) 'Mental workload and situation awareness', in G. Salvendy (ed.) Handbook of human factors and ergonomics (3rd ed.). Hoboken, NJ: Wiley, pp. 243–268. To be validated.

Wickens, C.D. (1984) 'Processing resources in attention', in R. Parasuraman & D.R. Davies (eds.) Varieties of attention. New York: Academic Press, pp. 63–102. To be validated.

Wickens, C.D. (2008) 'Multiple resources and mental workload', Human Factors, 50(2), pp. 449–454. To be validated.

Xie, B. & Salvendy, G. (2000a) 'Prediction of mental workload in single and multiple tasks environments', International Journal of Cognitive Ergonomics, 4, pp. 213–242. To be validated.

Yeh, Y. & Wickens, C.D. (1988) 'Dissociation of performance and subjective measures of workload', Human Factors, 30, pp. 111–120. To be validated.

Yin, Z. & Zhang, J. (2018) 'Task-generic mental fatigue recognition based on neurophysiological signals and dynamical deep extreme learning machine', Neurocomputing, 283, pp. 266–281. To be validated.

Open Questions

  • How can workload assessment techniques adapt to rapid transitions between under- and overload in automated systems?
  • What methods best resolve associations, insensitivities, and dissociations between different measurement types?
  • How should team workload be assessed in human-AI collaborative systems?
  • Can machine learning models of workload achieve sufficient explainability for practical HFE applications?
  • How do cultural differences systematically affect workload ratings, and how should assessment be adapted?