Response Interpolation and Scale Sensitivity: Evidence Against 5-Point Scales

Kraig Finstad

Journal of Usability Studies, Volume 5, Issue 3, May 2010, pp. 104 - 110

Five-point Likert items appear not to be sensitive enough to capture a usability test participant’s true evaluations, and are thus more likely to elicit attempts at responses outside the confines of the instrument. When a questionnaire with such items is administered in person, the impact may be reduced, because the facilitator can ask respondents to revise their responses to fit the available categories. For an electronically distributed survey with 5-point Likert items, however, the practical implication is that the survey may not capture the data adequately. Participants whose true subjective evaluation of an item is not offered as a valid option have only two recourses: choose a different, inaccurate response, or skip the item entirely. In survey tools that do not strictly regulate and validate responses, skipping an item may cause more serious data loss in the form of missing cases. This is especially problematic in an instrument like the SUS, where item scores are summed into a composite final score: a single discarded response invalidates the entire response set for that participant. There are also negative implications for single-item usability evaluations. Sauro and Dumas (2009) noted the possibility of errors introduced by a small number (five or seven) of discrete Likert responses, and observed that computerized sliding scales can offer higher sensitivity. Their study nevertheless concluded that a single post-test, 7-point Likert item can be a sensitive and robust measure. The current research would predict that a comparable 5-point single-item evaluation would not perform as well, because errors (evidenced by interpolation) are significantly more likely to occur with 5-point scales. The data lost in a 10-item instrument like the SUS (through insensitivity, not missing cases) may be negligible, but if a usability evaluation relies on just one data point, the impact is much greater.
When evaluating the appropriateness of such a single-item scale, the rate of interpolation itself can serve as a quick pilot test of whether a Likert item is likely to elicit measurement error.
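To make the two data-handling points above concrete, the following Python sketch shows (a) standard SUS scoring, in which odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5, so that a single skipped item leaves the composite undefined; and (b) a simple interpolation rate for pilot-testing an item. The function names and the convention of recording an interpolated response as a non-integer value (for example, 3.5 for a mark placed between anchors 3 and 4) are illustrative assumptions, not the recording procedure used in this study.

```python
from typing import Optional, Sequence


def sus_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Standard SUS composite (0-100) from ten responses on a 1-5 scale.

    Odd-numbered items contribute (response - 1); even-numbered items
    contribute (5 - response); the sum is multiplied by 2.5.
    Returns None if any item was skipped, since the composite is undefined.
    """
    if len(responses) != 10:
        raise ValueError("the SUS has exactly 10 items")
    if any(r is None for r in responses):
        return None  # one skipped item discards the participant's whole score
    total = 0
    for i, r in enumerate(responses, start=1):
        if not 1 <= r <= 5:
            raise ValueError(f"item {i} is outside the 1-5 range: {r}")
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


def interpolation_rate(responses: Sequence[float]) -> float:
    """Fraction of recorded responses that fall between scale anchors.

    Assumption (hypothetical convention): an interpolated response is stored
    as a non-integer value, e.g. 3.5 for a mark between anchors 3 and 4.
    """
    if not responses:
        raise ValueError("no responses to evaluate")
    interpolated = sum(1 for r in responses if not float(r).is_integer())
    return interpolated / len(responses)
```

For example, `sus_score([5, 1, 5, 1, None, 1, 5, 1, 5, 1])` returns `None`, while `interpolation_rate([3, 4, 3.5, 2, 4.5])` returns 0.4 for this hypothetical pilot data. Comparing the rates for 5-point and 7-point versions of an item would additionally call for a significance test suited to the sample size.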

It appears that a 7-point Likert item is more likely than a 5-point item to reflect a respondent’s true subjective evaluation of a usability questionnaire item. Considered alongside previous research on the balance between sensitivity and efficiency (Diefenbach, Weinstein, & O’Reilly, 1993; Russell & Bobko, 1992), the 7-point item may represent a “sweet spot” in survey construction: it is sensitive enough to minimize interpolations, yet compact enough to be answered efficiently. In fact, the results of this study can be seen as a behavioral validation of the subjective results reported by Diefenbach et al. (1993), who found that the 7-point item excelled not only in objective accuracy but also in perceived accuracy and ease of use.
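One way to see why fewer anchors can introduce error, in the sense noted by Sauro and Dumas (2009), is an idealized model that is an assumption here rather than part of this study: treat a respondent’s true evaluation as a continuous value in [0, 1], and suppose a k-point item records the nearest of k equally spaced anchors. The worst-case rounding error is then half the anchor spacing, 1/(2(k − 1)), which is 0.125 for a 5-point item and about 0.083 for a 7-point item. The Python sketch below checks this numerically; the function names are hypothetical.

```python
from typing import Tuple


def nearest_anchor(x: float, k: int) -> float:
    """Map a true evaluation x in [0, 1] to the nearest of k equally spaced anchors.

    Idealized assumption: respondents always pick the closest anchor.
    Python's round() breaks exact ties toward the even anchor index, which
    does not change the size of the error at a tie.
    """
    step = 1.0 / (k - 1)
    return round(x / step) * step


def rounding_errors(k: int, samples: int = 10_001) -> Tuple[float, float]:
    """Worst-case and mean absolute rounding error over an even grid of true values."""
    grid = (i / (samples - 1) for i in range(samples))
    errors = [abs(x - nearest_anchor(x, k)) for x in grid]
    return max(errors), sum(errors) / len(errors)


if __name__ == "__main__":
    for k in (5, 7):
        worst, mean = rounding_errors(k)
        print(f"{k}-point: worst {worst:.4f} (closed form {1 / (2 * (k - 1)):.4f}), mean {mean:.4f}")
```

This model captures only the sensitivity side of the trade-off; the efficiency cost of adding anchors, on which the “sweet spot” argument also depends, is not represented.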

The perception of accuracy is also important here, as participants in Diefenbach et al. (1993) ranked 5-point items lower because of this subjective lack of accuracy. This feeling about 5-point items, that the available categories do not match the respondent’s true evaluation, may manifest behaviorally as interpolations when the opportunity is present. Conversely, the absence of such behavior in this study’s 7-point item condition reflects the higher perceived accuracy reported by Diefenbach et al. (1993).

