The Usability of Computerized Card Sorting: A Comparison of Three Applications by Researchers and End Users
Barbara S. Chaparro, Veronica D. Hinkle, and Shannon K. Riley
Journal of Usability Studies, Volume 4, Issue 1, November 2008, pp. 31-48
Results
The following sections discuss task success, task difficulty, task completion time, satisfaction scores, and preference rankings.
Task success
All participants were successful on all tasks with all programs.
Task difficulty
Mean difficulty scores for each task by program are presented in Table 4 and summarized in Figure 8. A two-way within-subjects ANOVA (task × program) was conducted to compare average difficulty across tasks and applications. Results indicated a significant main effect of application, F(2, 14) = 9.90, p < .01, η² = .59, but no significant main effect of task and no significant task × application interaction. Post-hoc comparisons revealed that CardZort was rated significantly more difficult overall than OpenSort and WebSort. Although difficulty differed across programs, all mean ratings were fairly low, indicating that participants were able to complete the tasks with relative ease.
Table 4. Mean difficulty ratings (standard deviations in parentheses) for each task by application

| Task | CardZort | WebSort | OpenSort |
|---|---|---|---|
| Sort items into groups. | 2.00 (1.07) | 1.88 (.64) | 1.00 (.00) |
| Name the groups. | 2.13 (.99) | 1.13 (.35) | 1.38 (1.06) |
| Move items from any group to another group. | 2.13 (.83) | 2.00 (1.07) | 1.00 (.00) |
| Complete the sorting session. | 1.50 (1.50) | 1.00 (.00) | 1.50 (.76) |
| Average across tasks | 1.94 (.56) | 1.50 (.38) | 1.22 (.28) |

Figure 8. Mean task difficulty across applications.
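Both the effect size and the p value reported for application can be recovered from the F ratio and its degrees of freedom. The sketch below assumes the reported η² is partial eta squared, F·df_effect / (F·df_effect + df_error); the article does not name the variant, but this formula reproduces .59. Because the effect has 2 degrees of freedom, the upper-tail p value of F(2, df_error) also has a closed form, (1 + 2F/df_error)^(-df_error/2), so no statistics library is needed. The function names are illustrative and not part of the study.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Assumption: the article's eta^2 is the partial variant.
    """
    return (f * df_effect) / (f * df_effect + df_error)


def f_upper_tail_p_df1_2(f: float, df_error: int) -> float:
    """Upper-tail p value of F(2, df_error).

    Valid only when the numerator df is 2, where the F survival function
    reduces to (1 + 2F / df_error) ** (-df_error / 2).
    """
    return (1.0 + 2.0 * f / df_error) ** (-df_error / 2.0)


# Main effect of application on rated difficulty: F(2, 14) = 9.90
eta = partial_eta_squared(9.90, 2, 14)
p = f_upper_tail_p_df1_2(9.90, 14)
print(f"partial eta^2 = {eta:.2f}, p = {p:.4f}")  # partial eta^2 = 0.59, p = 0.0021
```

The printed values agree with the reported η² = .59 and p < .01.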
Task completion time
Time-on-task was measured in seconds from the start to the end of each task and averaged across all participants for each application. A two-way within-subjects ANOVA (task × program) was conducted to compare time across tasks and applications. Results showed no significant main effect of program or task and no significant interaction.
Satisfaction
Satisfaction was measured using the 10-item System Usability Scale (SUS; Brooke, 1996), which yields a total score from 0 to 100. A one-way within-subjects ANOVA revealed significant differences among the applications' scores, F(2, 14) = 5.88, p = .014, η² = .46. Post-hoc tests revealed that participants were more satisfied with OpenSort than with WebSort (Figure 9).

Figure 9. Mean satisfaction scores across applications.
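For readers unfamiliar with how the SUS arrives at a 0 to 100 total, the sketch below applies the standard scoring rule from Brooke (1996): each odd-numbered (positively worded) item contributes its rating minus 1, each even-numbered (negatively worded) item contributes 5 minus its rating, and the resulting 0 to 40 sum is multiplied by 2.5. The function name and the example responses are hypothetical; they are not data from this study.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's System Usability Scale questionnaire.

    `responses` holds the ten item ratings in questionnaire order, each on
    the 1 (strongly disagree) to 5 (strongly agree) scale.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each rating must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: contribution = rating - 1
            total += rating - 1
        else:
            # Even items are negatively worded: contribution = 5 - rating
            total += 5 - rating
    # Each item contributes 0-4, so the raw sum is 0-40; scale it to 0-100.
    return total * 2.5


# Hypothetical responses from a single participant (not study data)
example = [5, 1, 4, 2, 5, 1, 4, 2, 5, 2]
print(sus_score(example))  # 87.5
```

Averaging such per-participant scores for each application gives the means compared in the ANOVA above.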
Preference
Participants unanimously chose OpenSort as their most preferred card sorting application (Figure 10). Preference differences across applications were analyzed using a Friedman test, χ²(2, N = 8) = 12.25, p < .01. Post-hoc tests showed that OpenSort was preferred over WebSort (mean rank = 1.0 and 2.62, respectively).

Figure 10. Application preference ranking. Each bar represents the number of participants who ranked that application first, second, or third.
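The Friedman statistic can be reproduced from the reported mean ranks. With n = 8 participants each ranking k = 3 applications, the statistic is χ² = 12/(nk(k+1)) ΣR_j² − 3n(k+1), where R_j is an application's rank sum (its mean rank × n). CardZort's mean rank is not reported, so the sketch infers it from the fact that each participant's ranks sum to 6; it also assumes WebSort's 2.62 is a rounded 2.625 (a rank sum of 21). Under those assumptions the computation returns exactly the reported 12.25.

```python
import math


def friedman_chi_square(rank_sums: list[float], n: int) -> float:
    """Friedman test statistic from per-condition rank sums (assumes no ties).

    rank_sums[j] is the sum over n participants of the rank given to
    condition j; k = len(rank_sums) conditions were ranked 1..k.
    """
    k = len(rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


n = 8  # participants
# Rank sums = mean rank * n. OpenSort: 1.0 * 8 = 8; WebSort: 2.625 * 8 = 21
# (reported as 2.62). CardZort is inferred: with three applications every
# participant's ranks add up to 1 + 2 + 3 = 6, so the rank sums total 48
# and CardZort's is 48 - 8 - 21 = 19.
rank_sums = {"OpenSort": 8, "WebSort": 21, "CardZort": 19}

chi2 = friedman_chi_square(list(rank_sums.values()), n)
# For df = k - 1 = 2, the chi-square upper tail is exactly exp(-x / 2).
p = math.exp(-chi2 / 2)
print(f"chi2(2, N = {n}) = {chi2:.2f}, p = {p:.4f}")  # chi2(2, N = 8) = 12.25, p = 0.0022
```

The recovered p of about .002 is consistent with the reported p < .01.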
