Journal of Usability Studies (JUS)
An international peer-reviewed journal

Card Sort Analysis Best Practices

Carol Righi, Janice James, Michael Beasley, Donald L. Day, Jean E. Fox, Jennifer Gieber, Chris Howe, and Laconya Ruby

Journal of Usability Studies, Volume 8, Issue 3, May 2013, pp. 69–89

Create the Top-Level Categories

After an initial look at the item-by-item matrix and dendrogram, you can now create your top-level categories.

Studies spanning more than half a century have shown that, for most cognitive tasks, people are most comfortable considering about seven plus or minus two distinct items [2]. In a card sort study, this corresponds to the number of top-level groupings on a website or application, often depicted as tabs. The interactive dendrograms generated by card sorting tools provide a slider that helps you find the optimum point, somewhere between about five and nine final content categories.
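To make the slider concrete, here is a minimal sketch of what moving it does to the data. It assumes the item-by-item matrix holds the fraction of participants who grouped each pair of items together, and it uses a single-linkage cut (any pair at or above the threshold links two items). That is one common choice rather than necessarily what a given card sorting tool does. The function names and the toy grocery data are invented for illustration.

```python
from itertools import combinations

def cooccurrence(sorts, items):
    """Fraction of participants who placed each pair of items in the same group."""
    counts = {pair: 0 for pair in combinations(sorted(items), 2)}
    for groups in sorts:                          # one participant's sort
        for group in groups:
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}

def cut(items, sim, threshold):
    """Single-linkage cut (an assumption): link every pair with sim >= threshold."""
    parent = {i: i for i in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]         # path halving
            x = parent[x]
        return x

    for (a, b), s in sim.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for i in items:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())

def thresholds_in_range(items, sim, lo=5, hi=9):
    """Slider positions (thresholds) that yield between lo and hi categories."""
    levels = sorted(set(sim.values()))
    return [t for t in levels if lo <= len(cut(items, sim, t)) <= hi]

# Toy data: three hypothetical participants sorting six cards.
items = ["apples", "bananas", "carrots", "peas", "milk", "cheese"]
sorts = [
    [["apples", "bananas"], ["carrots", "peas"], ["milk", "cheese"]],
    [["apples", "bananas", "carrots", "peas"], ["milk", "cheese"]],
    [["apples", "bananas"], ["carrots", "peas", "milk"], ["cheese"]],
]
sim = cooccurrence(sorts, items)
print(cut(items, sim, 2 / 3))   # three categories at this threshold
print(cut(items, sim, 1.0))     # stricter threshold: four categories
```

With real data you would call `thresholds_in_range(items, sim)` with its default range of five to nine. The toy set is too small for that, so a narrower range such as `lo=3, hi=4` is more illustrative.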

While trying to derive five to nine top-level categories for a website or application using the slider described above, it is also important to pay attention to the size of the categories. A category with a relatively large amount of content may signal that the group needs to be broken into multiple subcategories. In this respect, analyzing card sort data is a balance between two competing objectives: the number of categories and their size.

Note that it’s typically not possible to define all categories with a single placement of the vertical line (slider). For example, when you settle on a number of categories between five and nine, you may find that all but one of the categories you have created are reasonable. In this case, you need to move the slider to the left to raise the correlation standard for membership in that group only, creating more subdivisions within the group while maintaining the structure of the other categories. In fact, this activity is how you can create subcategories from the data, even when the participants sorted items at only one hierarchical level. We will address this activity in greater depth later.
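The per-group adjustment described above can be sketched as re-cutting a single category at a stricter threshold while leaving the rest untouched. This is an illustrative sketch only. It assumes pairwise agreement scores between 0 and 1 stored under alphabetically ordered pairs, and it links items whenever any pair meets the threshold (single linkage). The names `components` and `refine` and the sample scores are hypothetical.

```python
def components(members, sim, threshold):
    """Connected groups among members, linking pairs with similarity >= threshold."""
    remaining, groups = set(members), []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            a = stack.pop()
            linked = {b for b in remaining
                      if sim.get((min(a, b), max(a, b)), 0.0) >= threshold}
            remaining -= linked
            group |= linked
            stack.extend(linked)
        groups.append(sorted(group))
    return sorted(groups)

def refine(categories, sim, which, stricter):
    """Split only categories[which] at a stricter threshold; keep the others as is."""
    subcategories = components(categories[which], sim, stricter)
    return [subcategories if i == which else c for i, c in enumerate(categories)]

# Hypothetical agreement scores; pairs not listed are treated as 0.
sim = {
    ("billing", "invoices"): 0.9,
    ("refunds", "returns"): 0.8,
    ("returns", "shipping"): 0.7,
    ("invoices", "refunds"): 0.45,
    ("careers", "press"): 0.6,
}
# Categories as they might look with the slider at 0.4: the first is too large.
categories = [
    ["billing", "invoices", "refunds", "returns", "shipping"],
    ["careers", "press"],
]
refined = refine(categories, sim, which=0, stricter=0.6)
print(refined)
```

Here the oversized first category breaks into two subcategories, one about payment paperwork and one about returns and shipping, while the second category keeps its original structure.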

Once manipulating the slider in the dendrogram has given you a satisfactory balance between the number of categories and the number of items within each, go back to the item-by-item matrix. Look to see how many cells have high correlations (those cells with the darker background color in Figure 1). Realistically, if half to two-thirds of rows have a high correlation, your categories are in good shape, and you can move on to the next step in your analysis. If not, continue experimenting with the number and size of categories in the dendrogram and recheck the item-by-item matrix.
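One way to turn this rule of thumb into a quick check is sketched below. It rests on two assumptions that are not stated in the text: that a row counts as having a high correlation when the item pairs with at least one other item at or above a cutoff, and that 0.7 is a reasonable cutoff for "high". Both are placeholders to adjust for your own tool and data.

```python
def share_of_strong_rows(items, sim, high=0.7):
    """Fraction of matrix rows (items) with at least one pairing at or above `high`."""
    strong = set()
    for (a, b), score in sim.items():
        if score >= high:
            strong.update((a, b))
    return len(strong & set(items)) / len(items)

# Hypothetical item-by-item agreement scores (upper triangle only).
items = ["apples", "bananas", "carrots", "peas", "milk", "cheese"]
sim = {
    ("apples", "bananas"): 1.0,
    ("carrots", "peas"): 1.0,
    ("cheese", "milk"): 0.67,
    ("apples", "carrots"): 0.33,
}
share = share_of_strong_rows(items, sim)
in_good_shape = share >= 0.5    # lower bound of the "half to two-thirds" guideline
print(f"{share:.0%} of rows have a high correlation; good shape: {in_good_shape}")
```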

If you have fewer cells with high agreement, it may be a result of different participants having very different mental models for how your content should be organized. To test this hypothesis, try separating your participants’ data by user type and then recheck the matrix. If both sets of data show more items that correlate highly with one another, then you are probably working with participant groups who have very different mental models. At this point, you will need to decide how to accommodate the different groupings. Possible design solutions may include using one of the groupings for the IA and accommodating the other via crosslinks or search filters.
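The split-and-recheck test above can be sketched as follows. It assumes each participant's sort is stored alongside a user-type label. The labels, the 0.7 cutoff for "high" agreement, and the function names are all hypothetical. In this toy data, novices group by product type and experts group by where the product is used, so the pooled data shows no strong pairs while each subgroup agrees perfectly.

```python
from collections import defaultdict
from itertools import combinations

def pair_agreement(sorts):
    """Fraction of the given participants who grouped each pair of items together."""
    counts = defaultdict(int)
    for groups in sorts:
        for group in groups:
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}

def strong_pairs(sorts, high=0.7):
    """Pairs whose agreement meets the (assumed) cutoff for a high correlation."""
    return sorted(p for p, s in pair_agreement(sorts).items() if s >= high)

def by_user_type(participants):
    """Group sorts by user-type label (a hypothetical field recorded at recruitment)."""
    split = defaultdict(list)
    for user_type, groups in participants:
        split[user_type].append(groups)
    return dict(split)

participants = [
    ("novice", [["kettle", "toaster"], ["drill", "saw"]]),
    ("novice", [["kettle", "toaster"], ["drill", "saw"]]),
    ("expert", [["kettle", "drill"], ["toaster", "saw"]]),
    ("expert", [["kettle", "drill"], ["toaster", "saw"]]),
]

pooled = strong_pairs([groups for _, groups in participants])
print("pooled:", pooled)                      # no pair reaches the cutoff
for user_type, sorts in by_user_type(participants).items():
    print(user_type, strong_pairs(sorts))     # each subgroup agrees strongly
```

If the subgroups show many more strong pairs than the pooled data, as they do here, that pattern is consistent with the groups holding different mental models.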

If splitting the data by participant groups doesn’t result in stronger agreements, then you should examine your data more closely. First, recheck for outliers as discussed earlier and try removing them from the data to see if results become clearer. The problem may also be that you didn’t have enough participants for a clear model to emerge [3] or that your content space is very complex and not easily understood.

If you are concerned that your data set is too small to draw any definitive conclusions, then you may consider running more card sorting sessions. Or, if the complexity of the content seems to be causing your participants a lot of uncertainty in their categorizations, then you may want to recruit more expert participants. However, this will skew results toward the expert user’s perspective, so you will need to provide additional supporting design elements for new users as they learn both the domain and your content.

[2] Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. Note that since this original article was published, many studies have provided a more nuanced interpretation of this guideline; however, using seven plus or minus two categories is a widely accepted heuristic for IA development for most typical websites.
[3] Researchers have explored the issue of how many participants are required to generate valid card sort data, e.g., Tullis, T., & Wood, L. (2004). How many users are enough for a card-sorting study? http://home.comcast.net/~tomtullis/publications/UPA2004CardSorting.pdf

