
How To Specify the Participant Group Size for Usability Studies: A Practitioner’s Guide

Ritch Macefield

Journal of Usability Studies, Volume 5, Issue 1, Nov 2009, pp. 34 - 45


Studies Related to Problem Discovery

Many commercial usability studies are concerned with problem discovery in interfaces, and here practitioners need to keep in mind two important and interrelated facts.

First, unlike widgets and people, it is not always easy to objectively define and/or identify a problem. This is primarily because, as pointed out by Caulton (2001), problems are a function of the interaction and do not necessarily constitute a static feature of the interface. So a feature of the system may constitute a problem for one user but not another and, similarly, it may constitute a problem for a user on one day but not the next. Problems also arise from rich and complex interrelationships between features, so it is not always easy to “pin them down.” In summary, problems with interfaces are often fuzzy and subjective in nature. Indeed, these properties of problems are one reason why there is so much controversy as to which statistical methods and thinking best apply to these studies.

Second, an important goal of these studies is typically to rank the severity of problems. Put another way, simple enumeration of problems (and analysis on that basis) would not typically be a useful exercise within these studies. Yet such ranking is an issue that is not well addressed in the current research literature (although it is often mentioned, e.g., Faulkner, 2003). A possible reason for this is that ranking problems is a complex and highly subjective matter. There may even be disagreement within a study team as to what mechanism and heuristics should be used to rank problems. Similarly, practitioners often disagree as to whether a feature of the system constitutes a problem at all.

Problem discovery level and context criticality

Table 1 is an extract from Faulkner (2003) showing how, based on a large number of studies, various participant group sizes (“No. Users” column) probably influence the problem discovery level that a study will achieve. If we accept this advice, we can simply specify the group size according to the probable mean and/or minimum level of problem discovery we are seeking.

Table 1. Extract from Faulkner (2003)
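Figures like those in Table 1 are commonly modelled with the cumulative discovery formula popularized by Virzi (1992) and Nielsen and Landauer (1993): the proportion of problems found by n participants is 1 − (1 − p)^n, where p is the probability that any single participant encounters a given problem. A minimal sketch of this model follows; the function name and the default value p = 0.31 (the often-quoted average from Nielsen and Landauer's data) are illustrative assumptions, not figures from this article:

```python
def expected_discovery(n_users: int, p: float = 0.31) -> float:
    """Expected proportion of problems discovered by a group of
    n_users, assuming every problem is independently detected by
    any single participant with the same probability p.

    p = 0.31 is the often-quoted average from Nielsen and
    Landauer (1993); real studies vary widely.
    """
    return 1 - (1 - p) ** n_users

# For example, five participants under this model:
# expected_discovery(5) ≈ 0.84
```

Note that the model's core assumption, a single detection probability shared by all problems and all participants, is exactly what Caulton (2001) and Woolrych and Cockton (2001) questioned, so its output should be treated as indicative only.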

This leaves the challenge of how to determine what problem discovery level is appropriate for a particular study. There are some factors to aid us in meeting this challenge; we can easily argue that higher problem discovery levels are desirable in the following contexts:

Complexity of the study

Another key reason why we must be careful not to over-generalize advice concerning study group sizes relates to the complexity of a study. For example, Hudson (2001) and Spool and Schroeder (2001) have criticized the advice in Nielsen (2000) that five participants is optimal for these studies because this advice is underpinned by relatively simple studies utilizing quite closed/specific tasks. By contrast, Spool and Schroeder (2001) conducted more complex studies, utilizing very open tasks, and found that five participants would probably discover only 35% of the problems in an interface. Similarly, Caulton (2001) and Woolrych and Cockton (2001) attacked Nielsen’s advice on the basis that he had grossly underestimated the impact of variation across individual participants within a particular study.

Taking this into account, it is argued here that the optimal group size should be influenced by the study’s complexity, with larger numbers of participants being required for more complex studies.
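One way to see the effect of complexity is to invert the standard discovery model, 1 − (1 − p)^n: more complex studies with open tasks imply a lower per-participant detection probability p, and hence a larger group for the same discovery target. The sketch below uses illustrative values that are assumptions, not figures from this article (p ≈ 0.31 for simple studies; p ≈ 0.08 roughly reproduces Spool and Schroeder's finding that five users found only about 35% of problems):

```python
import math

def users_needed(target: float, p: float) -> int:
    """Smallest group size n for which the expected discovery
    rate 1 - (1 - p)**n reaches `target`, given per-participant
    detection probability p (requires 0 < p < 1, 0 <= target < 1)."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

# Simple study (illustrative p = 0.31):
#   users_needed(0.85, 0.31) -> 6
# Complex, open-task study (illustrative p = 0.08):
#   users_needed(0.85, 0.08) -> 23
```

The same 85% discovery target thus demands roughly four times as many participants once tasks become open and complex, which is the quantitative intuition behind scaling group size with study complexity.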

This leads us to the challenge of assessing a study’s complexity and, again, there are factors to aid us here. It is easy to argue that a study’s complexity typically increases along with the following factors:

Another key factor here is the nature and volume of any training that the target user group would be given on the system, and which must then be reflected in the study design. Studies requiring such training are common with many non-pervasive systems (e.g., call centre applications or accounting systems), and this has the potential to increase a study’s complexity because any variation in the training input can easily become a contaminating experimental effect. On the other hand, if the training input is consistent and closely reflects the training actually used for the target users, we can argue that this decreases complexity because the study participants should closely reflect the target users in terms of what relevant knowledge they will bring to the interactions.

These factors can also be used as criteria to help determine the relevance of particular research literature; i.e., it is preferable that practitioners are informed by literature underpinned by studies of similar complexity to the one they are designing.

To summarize here, there is no “one size fits all” figure for the optimal group size for usability studies related to problem discovery. Rather, this should be influenced by the study’s context and complexity. Further, practitioners should accept that these studies will inevitably involve a degree of subjectivity and that any numeric values that result are indicative. Similarly, they should view these studies as formative and diagnostic exercises rather than (quasi) experiments designed to give objective answers. Indeed, it could be argued that the considerable volume of research literature that seeks to apply statistical methods to this type of study is not as important as some might think, particularly given that this literature has (understandably) little to offer as to how statistical methods might account for problems of differing severity.

However, there is the following advice from the research community that is useful to consider here:

Studies related to problem discovery in early conceptual prototypes

Usability practitioners often need to study novel interface design concepts. These range from new types of control to whole new interface paradigms. Most of these studies involve an early conceptual prototype and are worthy of special consideration here for the following reasons:

These studies are typically interested primarily in discovering severe usability problems (“show stoppers”) at an early stage so that we do not waste resources refining design concepts that are ultimately unviable.

Because the conceptual prototypes are produced early in the SDLC, they are more likely to contain errors than would be the case with more mature prototypes or working systems. These may be technical errors (bugs) or articulatory errors (errors in the way in which a concept works).

Interfaces exploiting novel design concepts typically present significantly greater usability challenges for users than is the case for more conventional interface designs. This is because the novelty, by its very nature, limits the usefulness of any existing (tacit) knowledge that the user has of operating interfaces (e.g., Macefield, 2005, 2007; Raskin, 1994; Sasse, 1993, 1997).

Given this, it is easy to argue that these prototypes are likely to contain more (severe) usability problems than systems exploiting more conventional interface design concepts. In turn, it is easy to argue that this significantly increases the likelihood that fewer study participants will be required to discover these problems. Therefore, we can argue that with studies involving early conceptual prototypes, the degree of novelty is inversely proportional to the number of participants that are likely to be required.
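Under the same simple binomial view of discovery (an assumption layered onto this argument, not the article's own method), severe problems in a highly novel prototype correspond to a high per-participant detection probability, so the chance that even a small group misses them falls off quickly:

```python
def miss_probability(n_users: int, p: float) -> float:
    """Probability that a given problem is missed by all n_users,
    when each participant independently encounters it with
    probability p. High-p ("show stopper") problems are rarely
    missed, even by very small groups."""
    return (1 - p) ** n_users

# Illustrative values (assumed, not from the article): a severe
# problem hit by each user with p = 0.5 is missed by a group of
# three with probability 0.5 ** 3 = 0.125
```

This is the quantitative side of the claim above: the more a prototype's novelty inflates the frequency and severity of its problems, the fewer participants are needed to surface them.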

Another factor that drives the optimal group size for this type of study towards the lower end of the range is that early conceptual prototypes are typically quite low fidelity and very limited in scope. This is primarily to mitigate the risk of expending resources on developing unviable design concepts. As a consequence, these prototypes are typically capable of supporting only simple/constrained tasks. As such, it is easy to argue that these studies are often relatively simple in nature and, therefore, that the advice from, e.g., Nielsen (2000) to use small study group sizes is particularly relevant here (because Nielsen’s advice is underpinned by relatively simple studies).

To summarize, it is easy to argue that for most studies related to problem discovery a group size of 3-20 participants is valid, with 5-10 participants being a sensible baseline range, and that the group size should be increased along with the study’s complexity and the criticality of its context. In the case of studies related to problem discovery in early conceptual prototypes, there are typically factors that drive the optimal group size towards the lower end of this range.
