JUS - Journal of Usability Studies
An international peer-reviewed journal

The Combined Walkthrough: Measuring Behavioral, Affective, and Cognitive Information in Usability Testing

Timo Partala and Riitta Kangaskorte

Journal of Usability Studies, Volume 5, Issue 1, Nov 2009, pp. 21 - 33


Discussion

The results of the experiment suggest that the approach developed for this research succeeded in producing useful information about the participants’ behavior, affects, and cognition within single usability testing sessions of interactive media software. In this experiment, we used quantitative scales to study the participants’ emotions. This approach was easily understood by the participants and produced significant variations in experienced affective valence and arousal across tasks. The participants were given detailed instructions validated in basic psychological research, and they appeared to have no difficulty distinguishing between emotional valence and arousal, or between their overall emotional experience and the experience related to a particular media element alone. The valence ratings were useful in evaluating the severity of the usability problems found. The neutral midpoint of the valence dimension could be used as a critical value: if the test participants’ subjective ratings of valence fall on the negative side on average, an intervention in the form of redesigning (a part of) the user interface is clearly necessary. Arousal ratings gave further information about the depth of the participants’ responses, especially in conjunction with negative emotional valence.
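
The critical-value rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-9 rating scale with a neutral midpoint of 5, the function name `tasks_needing_redesign`, and the ratings themselves are hypothetical and are not taken from the experiment.

```python
from statistics import mean

# Assumption: valence is rated on a 1-9 scale whose neutral midpoint is 5.
NEUTRAL_MIDPOINT = 5


def tasks_needing_redesign(ratings_by_task, midpoint=NEUTRAL_MIDPOINT):
    """Return the tasks whose mean valence rating falls on the negative side."""
    flagged = {}
    for task, ratings in ratings_by_task.items():
        average = mean(ratings)
        if average < midpoint:  # strictly below the neutral midpoint
            flagged[task] = average
    return flagged


# Hypothetical ratings from four participants per task.
ratings = {
    "task_1": [7, 6, 8, 5],
    "task_2": [3, 4, 2, 5],
    "task_3": [5, 5, 6, 4],
}

flagged = tasks_needing_redesign(ratings)
print(flagged)
```

In this sketch, a task whose mean rating sits exactly at the midpoint is not flagged. Whether such borderline cases should trigger a redesign is a judgment call that the rule itself does not settle.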

Significant correlations were found between the different usability measures. Many other studies have reported lower correlations. In a recent meta-analysis of correlations among usability measures in 73 human-computer interaction studies, Hornbæk and Law (2007) found mostly low (absolute value <.3) correlations between measures of effectiveness (e.g., errors), efficiency (e.g., task times), and subjective satisfaction (e.g., satisfaction rating or preference). The correlations from this experiment are comparable to those presented by Hornbæk and Law, because in both studies the correlations were calculated at the least aggregated level possible (in the case of this experiment, at the level of single tasks). The higher correlations obtained in this experiment may be due to the context of ordinary usability testing, whereas the studies reviewed by Hornbæk and Law were in many cases experiments comparing innovative user interfaces or interaction techniques. Even though the between-measure correlations were higher in the present study than those reported by Hornbæk and Law (2007), the current results still suggest that it is beneficial to measure different aspects of usability in order to obtain a more coherent view of the usability of software. The coefficients of determination between the objective and subjective measures were below .5, meaning that each measure explained less than half of the variance in the other, which suggests that each measure gave information that complements the other measures. For example, measuring task times or experienced affective valence alone does not allow the other variable to be predicted reliably.
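
The relationship between a correlation and its coefficient of determination can be made concrete with a small worked example. The following is a minimal sketch, assuming a plain Pearson correlation computed over per-task observations. The data values and the helper name `pearson_r` are hypothetical and do not come from the experiment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-task observations: task time in seconds and valence rating.
task_times = [30, 45, 60, 90, 120]
valence = [7, 6, 7, 4, 6]

r = pearson_r(task_times, valence)
r_squared = r ** 2          # coefficient of determination
unexplained = 1 - r_squared  # share of variance not shared by the two measures

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, unexplained = {unexplained:.2f}")
```

With these made-up values, r is about -0.51 and the coefficient of determination is about 0.26, so roughly three quarters of the variance in one measure is not accounted for by the other. This illustrates why a clearly non-trivial correlation can still leave each measure with information of its own.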

The use of positive and negative affective media elements significantly affected the valence ratings of the overall subjective experiences related to the interaction when completing the tasks. In the context of a positive media element, the participants evaluated their overall user experience related to the information retrieval task as more positive than in the context of a negative media element. The media elements used in the experiment were deliberately chosen to be clearly positive or clearly negative. These results suggest that user experiences of interactive media can be guided in the desired direction by using dominant audio and video media elements with emotional valence. These findings have some design implications. In order to design for coherent emotional user experiences, designers have to take into account both the affective effects of the media elements used in the product and the effects of different user interface solutions and usability problems.

Compared to other existing usability evaluation methods, the current approach has some advantages. First, it is an empirical method, and the data gathered is based on observations of real system usage. Second, it is an integrated method capable of producing cognitive, behavioral, and affective information in the same test trials. Third, it combines some of the benefits of usability testing and expert evaluation. The method yields objective data based on the participants’ performance in the tasks, and it also allows expert judgment to be used to detect problems while observing the participants carrying out the tasks. In the current experiment, the instructions for the researcher were detailed, but experienced usability experts could use their judgment more freely in identifying usability problems when observing the participant. In addition to the cognitive questions developed on the basis of the cognitive walkthrough, they could also ask the participant additional questions in a more open-ended fashion.

The method also has some challenges. First, learning the method demands some practice, and the researcher also has to be familiar with the software in order to design representative tasks. On the other hand, the number of test participants does not have to be very large. Because of our research goals, we carried out this study with 16 participants, but in practice a smaller number of participants would have been enough to find a clear majority of the problems and to get a realistic idea of task times and of the users’ experiences and processes on the affective and cognitive levels. The second challenge is the possibility of human error. For example, in this experiment, the evaluator accidentally skipped a couple of problems that she was supposed to react to. Conversely, there were a few occasions when participants did not agree with the evaluator about the existence of a usability problem. These disadvantages can be mitigated with good instructions, practice, and a post-hoc video analysis that makes it possible to detect missed usability problems afterwards.
