upa - home page JUS - Journal of usability studies
An international peer-reviewed journal

The Combined Walkthrough: Measuring Behavioral, Affective, and Cognitive Information in Usability Testing

Timo Partala and Riitta Kangaskorte

Journal of Usability Studies, Volume 5, Issue 1, Nov 2009, pp. 21 - 33

Article Contents


The following sections discuss the identified usability problems, the emotional experiences, task times and completion rates, correlations of measures, and experiences of using the method.

Identified Usability Problems

By observing the participants during the experiment, the researcher identified a total 146 usability problems. Of these, 136 (93.2%, on average 8.5 per participant and 1.2 per task) were also judged as usability problems based on the participants’ answers to the cognitive questions (if the participant answered “No” to any of the cognitive walkthrough questions for the identified problem, or he or she could not give a simple yes or no answer, then the researcher judged that the participant’s comments indicated the presence of a usability problem). For the three questions, the participants’ answers indicated that there was a problem related to the first question (awareness) 60 times, to the second question (association) 78 times, and to the third question (feedback) 15 times. Further analyses revealed that 116 out of the 136 problems were identified by only question 1 and 2. All the usability problems indicated by question 3 were also indicated by either or both of the other questions. The distribution of found usability problems by unique question is presented in Table 2.

Table 2. The Number of Usability Problems Detected by Each Question or Combination of Questions.

Table 2

The figures presented in Table 2 contain multiple instances of the same usability problems. When analyzing the found usability problems, it was in some cases difficult to determine which of the found problems were unique because of the differences in the participants’ behavior and answers to the cognitive questions concerning the usability problems. Thus, a formal analysis of the number of unique usability problems was not fully feasible, but on estimate about one fourth of the problems presented in Table 2 (35 cases) were unique.

Among the incomplete tasks, there were only five cases in which the researcher’s questions related to the participant’s cognition did not give any indication of the nature of the usability problem resulting in a failure to complete the task. In the other 31 cases the cognitive questions found at least one potential explanation for the critical problem.

The found usability problems were related, for example, to the following aspects of the media software:

Emotional Experiences

The average overall task-related emotional experiences of the participants for all tasks were 4.6 on the valence scale and also 4.6 on the arousal scale. However, there was a significant amount of variation in the ratings of emotional experiences. The average overall task-related emotional experiences for the first three tasks (which did not include dominant media elements) were 4.9 on the valence scale and 4.5 on the arousal scale. The overall task-related user experiences for the four tasks (which contained a dominant media element) are presented in Figure 1.

Figure 1

Figure 1. The average overall task-level affective experiences for the tasks containing media elements

The statistical analysis showed that the participants evaluated their task-level user experience as more positive in valence in response to the tasks containing a positive audio element than to tasks containing a negative audio element Z = 3.2, p < 0.01. The other pairwise differences for the evaluations of task-level affective experiences were not statistically significant. The average emotional experiences of the participants in relation to the dominant audio and video elements alone are presented in Figure 2.

Figure 2

Figure 2. Average emotional experiences for the video and audio clips alone

As expected, the participants rated their experienced affective responses to the positive audio as significantly more positive in valence than to the negative audio Z = 3.0, p < 0.01. Similarly, the participants rated their experienced affective responses to the positive video as significantly more positive in valence than to the negative video Z = 2.9, p < 0.01. The differences in experienced arousal were not significant, but they were approaching statistical significance (p < 0.07 for both audio and video).

Task Times and Completion Rates

Of the 111 analyzed tasks, in 36 cases (32.4%) the participants did not reach the goal in four minutes (double the estimated average task time); these cases were considered as incomplete tasks. Average task time for the tasks that were successfully completed (N=75, 67.6%) was 125.2 seconds. The average task times for the different tasks varied from 96.5 seconds to 198.8 seconds.

Correlations of Measures

The correlations between the different measures were first calculated for all the completed tasks. For comparison, correlations were also calculated for completed tasks 1-3 (the tasks that did not contain dominant media elements, N=36). In order to also obtain a view on the incomplete tasks, correlations were calculated separately for that data (N=36) excluding the data for task times. The correlations for the different measures are presented in Table 3.

Table 3. Correlations of Different Measures Measured for the Completed Tasks (N=75)

Table 3

For the completed tasks 1-3, the correlations were as follows: task times: usability problems .64, task times: valence -.48, task times: arousal .43, and valence: arousal -.52, all significant (p<0.01). The following correlations were also significant (p < 0.05): usability problems: valence -.40 and usability problems: arousal .35. For the tasks that the participants could not finish by themselves (N=36), the correlations were as follows: usability problems: valence .01, usability problems: arousal .19, and valence: arousal -.53 (the last correlation is significant, p < 0.01). Task times were not included in this analysis.

Experiences of Using the Method

A researcher in usability conducted the experiment. She had previous experience of using the cognitive walkthrough method as a usability inspection method. Before the experiment began, the creator of the combined walkthrough method thoroughly instructed the researcher on the method. The method and arrangements were tested in pilot studies before the actual experiment occurred. The researcher analyzed introspectively her experiences of using the method in practice. According to her, the method was clearly defined and efficient. However, using the method required some practice. Firstly, it was a challenge for the researcher to note all the points in which a test participant takes an incorrect action during a task. This was especially true for the interactive media software used in the experiment in which there is more than one way to finish the tasks. Secondly, the researcher has to be able to interpret the test participants’ answers to the cognitive walkthrough if the test participants cannot give a simple yes or no answer, but instead explains his or her thoughts with many sentences. However, information obtained from free comments related to a particular problematic point with the user interface can be valuable in further developing the user interface.

It is essential for a novice researcher to have detailed instructions for using the combined walkthrough method. In addition, a researcher has to be familiar with the software that is the object of evaluation in order to be able to design the tasks and detect incorrect actions. In the case of this experiment, the creator of the method assisted the researcher in designing the tasks.

Previous | Next