NASA-TLX (Task Load Index)

Status: emerging
Last updated: 2026-06-03
Sources: S0166 4115_2808_2962386 9.Pdf
Tags: [nasa-tlx, mental-workload, workload-assessment, subjective-workload, rating-scale, human-factors, aviation]

Summary

The NASA Task Load Index (NASA-TLX) is a multidimensional subjective rating procedure for assessing perceived workload, developed by Hart & Staveland (1988) from a multi-year NASA-Ames research program. It scores workload on six subscales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration) and combines them into a single overall score using importance weights obtained from each rater by pairwise comparison. The six subscales were distilled from ten candidate factors studied across sixteen experiments, and the weighting step was designed to reduce the between-subject variability that limits single-scale workload ratings. NASA-TLX is among the most widely used subjective workload instruments and the canonical example of the rating techniques surveyed in Mental Workload.

Body

Context

Hart & Staveland (1988) report the empirical and theoretical work behind the NASA Task Load Index, a subjective workload rating scale. Their chapter reviews a multi-year program that obtained subjective evaluations of ten workload-related factors from sixteen experiments (spanning simple cognitive and manual-control tasks, complex laboratory and supervisory-control tasks, and aircraft simulation) and used the results to derive a six-factor instrument. Within this knowledge base, the article is the primary source for the workload-measurement method that Mental Workload discusses, where NASA-TLX had previously been cited only indirectly. It connects to Multiple Resource Theory and Working Memory Capacity through the cognitive demands it measures, and to Supervisory Control Of Automation and Situation Awareness, where operator workload is a primary evaluative criterion.

Key Points

Why a multidimensional, weighted scale. Workload remains an important and measurable construct despite disagreement about its definition, and subjective ratings are the most commonly used measure and the criterion against which others are compared (PDF p. 1, orig. p. 139). Two problems motivate the design: subjective ratings show high between-subject variability, and the sources of workload are numerous and vary across tasks. Hart & Staveland propose a multidimensional technique that identifies the specific sources of workload relevant to a given task and combines them into a global rating, reducing experimentally irrelevant between-subject variability while preserving relevant differences (PDF p. 1, orig. p. 139).

Six subscales. The candidate set of ten workload-related factors was reduced to six that were either consistently related to workload or diagnostic of particular task types: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration (PDF p. 8, orig. p. 146). Some factors (such as mental demand and mental effort) were related to workload consistently across subjects and experiments, while others (such as physical effort and own performance) were closely related only under some conditions — which is why the instrument keeps several dimensions rather than collapsing to one (PDF p. 8, orig. p. 146).

Two-part procedure: ratings plus importance weights. Each task is rated on all six subscales. Because raters weight the factors differently, an importance weight is obtained from each individual by pairwise comparison: every pair of the six subscales is presented (15 pairs in all) and the rater chooses the more relevant member, giving each subscale a weight equal to the number of times it was chosen (0 to 5, with the six weights summing to 15). The overall workload score is the weighted combination of the six subscale ratings. The weight and rating data were drawn from large pooled databases (the rating database held 3,461 entries per scale; weights came from 247 subjects), and the pairwise-comparison weighting was shown to capture systematic individual differences in how people define workload (PDF p. 11, orig. p. 149).
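
The two-part procedure can be illustrated with a minimal Python sketch. It is not taken from the source. It assumes ratings on a 0 to 100 scale, and it assumes the "weighted combination" is a weighted mean, that is, the sum of weight times rating divided by the total weight (15 for six subscales). The function names, the `priority` ordering used to simulate a rater's choices, and the ratings themselves are hypothetical values chosen for illustration. An unweighted mean is included for comparison with Raw TLX scoring.

```python
from itertools import combinations

SUBSCALES = (
    "Mental Demand", "Physical Demand", "Temporal Demand",
    "Performance", "Effort", "Frustration",
)


def tally_weights(choices):
    """Turn pairwise choices into weights: one point each time a subscale is chosen.

    `choices` maps each unordered pair (a frozenset of two subscale names)
    to the member the rater judged more relevant to workload.
    """
    expected_pairs = {frozenset(pair) for pair in combinations(SUBSCALES, 2)}
    if set(choices) != expected_pairs:
        raise ValueError("need exactly one choice for each of the 15 pairs")
    weights = dict.fromkeys(SUBSCALES, 0)
    for pair, chosen in choices.items():
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not a member of its pair")
        weights[chosen] += 1
    return weights


def weighted_tlx(ratings, weights):
    """Weighted mean of the subscale ratings (assumed form of the overall score)."""
    total_weight = sum(weights[name] for name in SUBSCALES)
    return sum(ratings[name] * weights[name] for name in SUBSCALES) / total_weight


def raw_tlx(ratings):
    """Unweighted mean of the six ratings (Raw TLX)."""
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


# Hypothetical rater: always prefers whichever subscale comes first in `priority`.
priority = ["Mental Demand", "Effort", "Temporal Demand",
            "Performance", "Frustration", "Physical Demand"]
choices = {
    frozenset(pair): min(pair, key=priority.index)
    for pair in combinations(SUBSCALES, 2)
}

# Hypothetical ratings for one task, on an assumed 0-100 scale.
ratings = {
    "Mental Demand": 70, "Physical Demand": 10, "Temporal Demand": 60,
    "Performance": 40, "Effort": 65, "Frustration": 30,
}

weights = tally_weights(choices)
print(weights["Mental Demand"], weights["Physical Demand"])  # 5 0
print(weighted_tlx(ratings, weights))                        # 60.0
print(round(raw_tlx(ratings), 1))                            # 45.8
```

In this made-up case the weighted score (60.0) sits well above the unweighted mean (about 45.8) because the rater gave the highest ratings to the subscales they also judged most important. A subscale never chosen in any pair, here Physical Demand, gets a weight of zero and drops out of the weighted score entirely.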

Validation across task types. Subjective evaluations varied as a function of difficulty manipulations within experiments, of different workload sources between experiments, and of individual differences in how workload was defined (PDF p. 1, orig. p. 139). Analyses across the sixteen experiments related each subscale to overall workload to identify the primary sources of workload in each task type, supporting the claim that the instrument is both sensitive and diagnostic (PDF p. 11, orig. p. 149).

Conclusion

Hart & Staveland (1988) conclude by proposing a multidimensional rating scale in which information about the magnitude and sources of six workload-related factors is combined to derive a sensitive and reliable estimate of workload. By separating the rating of each dimension from the rater's own weighting of its importance, NASA-TLX addresses the between-subject variability of single-scale measures while remaining diagnostic of where workload comes from. It has become a standard subjective workload instrument and the reference point for the workload-assessment methods treated in Mental Workload.

References

Hart, S.G. & Staveland, L.E. (1988) 'Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research', in Hancock, P.A. & Meshkati, N. (eds.) Human Mental Workload (Advances in Psychology, Vol. 52). Amsterdam: North-Holland, pp. 139–183. doi: 10.1016/S0166-4115(08)62386-9. hart1988nasatlx

Open Questions

How much does the pairwise-comparison weighting improve sensitivity over unweighted (Raw TLX) scoring in practice?
How do the six subscales behave under automation, where physical demand falls but monitoring and temporal demand may rise (connecting to Supervisory Control Of Automation)?
How stable is the instrument across cultures and operational domains beyond the aviation and laboratory tasks in the original study?