Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale

Aaron Bangor, Philip Kortum, and James Miller

Journal of Usability Studies, Volume 4, Issue 3, May 2009, pp. 114-123

There are numerous surveys available to usability practitioners to aid them in assessing the usability of a product or service. Many of these surveys are used to evaluate specific types of interfaces, while others can be used to evaluate a wider range of interface types. The System Usability Scale (SUS) (Brooke, 1996) is one of the surveys that can be used to assess the usability of a variety of products or services. Several characteristics of the SUS make its use attractive. First, it is composed of only ten statements, so it is relatively quick and easy for study participants to complete and for administrators to score. Second, it is nonproprietary, so it is cost effective to use and can be scored immediately after completion. Third, the SUS is technology agnostic, which means that it can be used by a broad group of usability practitioners to evaluate almost any type of user interface, including Web sites, cell phones, interactive voice response (IVR) systems (both touch-tone and speech), TV applications, and more. Lastly, the result of the survey is a single score, ranging from 0 to 100, that is relatively easy to understand for a wide range of people from other disciplines who work on project teams.
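To illustrate how ten statements reduce to a single 0 to 100 score, the following is a minimal sketch of the standard SUS scoring procedure described by Brooke (1996): odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name `sus_score` and the list-based input format are assumptions made for this example and do not come from the original text.

```python
def sus_score(responses):
    """Compute a SUS score from ten responses on a 1-5 agreement scale.

    `responses` is ordered by item number (item 1 first). The name and
    input format are illustrative assumptions, not part of the SUS itself.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: contribute response - 1.
            total += response - 1
        else:
            # Even-numbered items are negatively worded: contribute 5 - response.
            total += 5 - response

    # Each item contributes 0-4, so the sum is 0-40; scale it to 0-100.
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

Because every item contributes between 0 and 4 points, a participant who answers neutrally (3) on every statement receives a score of 50.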

Bangor, Kortum, and Miller (2008) described the results of 2,324 SUS surveys from 206 usability tests collected over a ten-year period. That study found the SUS to be highly reliable (alpha = 0.91) and useful over a wide range of interface types. The study also concluded that while there was a small, significant correlation between age and SUS scores (SUS scores decreasing with increasing age), there was no effect of gender. Further, it confirmed that the SUS was predictive of the impact of user interface changes on usability when multiple changes to a single product were made over a large number of iterations. Other researchers have also found that the SUS is a compact and effective instrument for measuring usability. Tullis and Stetson (2004) measured the usability of two Web sites using five different surveys (the Questionnaire for User Interaction Satisfaction [QUIS], the SUS, the Computer System Usability Questionnaire [CSUQ], and two vendor-specific surveys) and found that the SUS provided the most reliable results across a wide range of sample sizes. One of the unanswered questions from previous research has been the meaning of a specific SUS score in describing a product's usability. Is a score of 50 sufficient to say that a product is usable, or is a score of 75 or 100 required?

Over the course of the 10-year study reported by Bangor, Kortum, and Miller (2008), an anecdotal pattern in the test scores began to emerge that equated quite well with the letter grades given at most major universities. The concept of applying a letter grade to the usability of a product was appealing because it is familiar to most of the people who work on design teams, regardless of their discipline. A familiar reference point that engineers and project managers can easily understand facilitates the communication of test results. As with the standard letter grade scale, products that scored in the 90s were exceptional, products that scored in the 80s were good, and products that scored in the 70s were acceptable. Anything below 70 had usability issues that were cause for concern. While this concept was intuitive, we believed that a validated scale in which the usability of a product could be assigned an adjective description might be even more useful.
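The anecdotal letter-grade pattern described above can be sketched as a simple threshold lookup. This is only an illustration of that informal pattern, not the validated adjective scale that the paper goes on to develop. The function name `anecdotal_band` is hypothetical, and treating a score of exactly 100 as part of the top band is an assumption, since the text speaks only of scores "in the 90s".

```python
def anecdotal_band(score):
    """Map a SUS score (0-100) to the informal description given in the text.

    Hypothetical helper for illustration; the thresholds follow the
    letter-grade analogy (90s, 80s, 70s, below 70).
    """
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    if score >= 90:
        # Assumption: a perfect 100 is grouped with scores in the 90s.
        return "exceptional"
    if score >= 80:
        return "good"
    if score >= 70:
        return "acceptable"
    # Anything below 70 signals usability issues that are cause for concern.
    return "cause for concern"


if __name__ == "__main__":
    for s in (95, 82.5, 70, 50):
        print(s, anecdotal_band(s))
```

Because SUS scores are multiples of 2.5, the comparisons use the lower bound of each decade rather than integer ranges.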

Bangor, Kortum, and Miller (2008) reported the results of a pilot study that sought to map descriptive adjectives (e.g., good, awful) to the range of SUS scores. This paper presents the final results of that study.

