JUS - Journal of Usability Studies
An international peer-reviewed journal

Heuristic Evaluation Quality Score (HEQS): Defining Heuristic Expertise

Shazeeye Kirmani

Journal of Usability Studies, Volume 4, Issue 1, November 2008, pp. 31-48

Heuristic evaluation is a discount usability engineering method in which a few evaluators judge the compliance of an interface with a set of heuristics. Because it is difficult for one evaluator to find all the usability problems in an interface, a few evaluators, preferably three to five, are recommended; this range gives the best benefit-to-cost ratio (Nielsen & Landauer, 1993). The quality of the evaluation depends heavily on the evaluators' skills, so it is critical to measure these skills to ensure that evaluations meet a certain standard. The popularity of the technique, which is used by 76% of the usability community (UPA Survey, 2005), and its high return, a cost-to-benefit ratio of 1:48 (Nielsen, 1994), make the assessment of heuristic evaluation skills all the more important.

Moreover, evaluators are extremely confident in their abilities as experts. An online heuristic evaluation competition held by UPA Bangalore, India, in November 2007 asked contestants to rate themselves on a scale of 5, where 5 meant that they were absolutely sure they would win the competition. Results showed that 85% of the 20 contestants (17 contestants) felt confident of winning, rating themselves 4 or more on the scale of 5. This confidence, or rather the inability to verify it, threatens the quality of heuristic evaluation. Experts could misuse this false confidence, intentionally or unintentionally, to deliver evaluations of a suboptimal standard, limiting the usability of the applications they evaluate. This can pose a grave risk to users who, in some cases, depend on these applications to save their lives. Hence, it is critical to quantify this expertise.

A framework to quantify heuristic evaluation skills was proposed by Kirmani and Rajasekaran (2007). Quantification is based on the number of unique, valid issues identified by an evaluator and on the severity of each issue. In this context, a unique issue is a problem that may recur in more than one place but is still counted once, as a single problem with several instances. Unique, valid issues are categorized by eight user interface parameters and by three severity levels. The three severity categories are showstoppers (catastrophic issues that prevent users from accomplishing their goals), major issues (issues that cause users to waste time or that significantly increase the learning effort), and irritants (cosmetic issues that violate minor usability guidelines). Weights of 5, 3, and 1 are assigned to showstoppers, major issues, and irritants, respectively.

A Heuristic Evaluation Quality Score (HEQS) is computed for each evaluator by multiplying the number of issues in each severity category by that category's weight and summing the products. For example, if Evaluator A identified 2 showstoppers, 10 major issues, and 20 irritants, his HEQS = 2×5 + 10×3 + 20×1 = 60. A benchmark HEQS, computed from the collated evaluations of all the evaluators, is used to compare skills both within and across applications. If the benchmark HEQS is 200, then Evaluator A's HEQS% is 60/200, or 30%. Skills are also computed for eight User Interface (UI) parameters to identify the strengths and weaknesses of the evaluators. The eight parameters are Interaction Design, Information Architecture, Visual Design, Navigation, Labeling, Content, Functionality, and Other (for issues that do not fall into the first seven).
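The HEQS scoring rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Issue` record, the convention that repeated instances of one problem share an `issue_id`, and all function names are hypothetical, and each issue is assumed to have been judged valid before scoring. Filtering the count by UI parameter is one plausible reading of how per-parameter skills are computed; the text does not specify the exact procedure.

```python
from collections import Counter
from dataclasses import dataclass

# Weights from the text: showstoppers 5, major issues 3, irritants 1.
SEVERITY_WEIGHTS = {"showstopper": 5, "major": 3, "irritant": 1}


@dataclass(frozen=True)
class Issue:
    """Hypothetical record of one reported, already-validated issue instance.

    Assumption: instances of the same underlying problem share an issue_id.
    """
    issue_id: str
    severity: str             # "showstopper", "major", or "irritant"
    parameter: str = "Other"  # one of the eight UI parameters


def heqs_from_counts(counts):
    """HEQS = sum over severities of (weight x number of unique issues)."""
    return sum(SEVERITY_WEIGHTS[sev] * n for sev, n in counts.items())


def heqs_from_issues(issues, parameter=None):
    """Count each issue_id once, optionally keep one UI parameter, then score."""
    unique = {}
    for issue in issues:
        unique.setdefault(issue.issue_id, issue)  # keep the first instance only
    counts = Counter(
        i.severity for i in unique.values()
        if parameter is None or i.parameter == parameter
    )
    return heqs_from_counts(counts)


def heqs_percent(score, benchmark):
    """Evaluator's score as a percentage of the benchmark HEQS."""
    return 100.0 * score / benchmark


if __name__ == "__main__":
    # Evaluator A from the text: 2 showstoppers, 10 major issues, 20 irritants.
    score_a = heqs_from_counts({"showstopper": 2, "major": 10, "irritant": 20})
    print(score_a, heqs_percent(score_a, 200))  # 60 30.0

    # A problem reported on two screens is one unique issue with two instances.
    reports = [
        Issue("no-undo", "showstopper", "Interaction Design"),
        Issue("no-undo", "showstopper", "Interaction Design"),
        Issue("vague-label", "irritant", "Labeling"),
    ]
    print(heqs_from_issues(reports))              # 6
    print(heqs_from_issues(reports, "Labeling"))  # 1
```

The first line of output reproduces the worked example (HEQS of 60, or 30% of a benchmark of 200). The duplicated "no-undo" report contributes its showstopper weight only once, which is how the uniqueness rule keeps repeated instances from inflating a score.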

This metric has been used to compare the heuristic expertise of individual evaluators with that of other evaluators, within or across applications, so that evaluations can be assigned according to individual strengths. It has also been used to identify weaknesses and to train evaluators in the corresponding skills. Other applications include measuring improvement after training and tailoring training programs to groups or individuals.

What the previous study did not address is a definition of heuristic expertise at a global level. This study aims to define such standards for the worldwide usability community. It is recognized that many such competitions will need to be conducted before these results can be generalized; this study is a first attempt to do so.

In particular, the following questions are addressed:
