
Discourse Variations Between Usability Tests and Usability Reports

Erin Friess

Journal of Usability Studies, Volume 6, Issue 3, May 2011, pp. 102 - 116


During the three oral usability reports, each of which lasted about four minutes on average, 31 total usability findings were presented to the large group. Team A (Ericka and John, who interviewed P1) presented 11 findings. Team B (Tom and Tara, who interviewed P2) presented 13 findings. Team C (Laura and Tom, who interviewed P3) presented 7 findings. Of these 31 findings, 26 (83.9%) could be traced to some basis (though not always an accurate basis) in the respective UT session, either through the language used by the usability participant or through reasonable assumptions drawn from the actions described in the think aloud. The remaining 5 findings (16.1%) had no discernible basis in the language of the end-user participants in the usability sessions. The breakdown of the findings presented in the oral reports is summarized in Figure 1.

Figure 1. Breakdown of source of findings in oral usability reports

Accurate Findings Stemming from Usability Sessions

Of the findings mentioned in the oral report that had some basis in UT, 65.4% seemingly accurately conveyed the intent of the statement made in the evaluation. Though the finding in the oral report did not have to be identical to the finding in the study in order to be considered accurate, the finding had to clearly represent the same notion that was conveyed in the test. Figure 2 shows the breakdown of accurate and potentially inaccurate findings in the findings with some basis in usability testing.

Figure 2. Breakdown of findings with some basis in usability testing

Accurate “Sound Bite” Findings

Of the seemingly accurate findings, 70.6% were derived from singular, discrete clauses in the UT that I call “sound bite” findings, as opposed to interpretation findings. Sound bite findings are based on a small portion of a conversation and are often verbatim repetitions of the words participants used during the usability testing.

As an example of a sound bite finding, in a post-task questioning, Tom asked P2 if there was “anything in particular that caused [P2] uncertainty while [he was] completing the task.” After initially denying that there was anything, P2 said, “Well, that one thing said ‘mailpiece.’ I don’t [know] what that meant. Sounds like ‘hairpiece.’” In the subsequent oral report, Tom said, “He also mentioned that he didn’t like the term ‘mailpiece.’” Clearly, Tom’s statement stemmed directly from P2’s statements during the think aloud study. Additionally, Ericka stated in her oral report that P1 “said straight up that she skipped [a section of the document] because it reminded her of a textbook.” This clearly referenced an exchange in the usability study itself:

P1: Yeah, I didn’t read this page but I hate it.
Ericka: Why?
P1: It reminds me of my 10th grade history textbook.
Ericka: In what way?
P1: Just a lot of text, ya know.

Figure 3 shows the breakdown of sound bite and interpretation findings in the accurate findings from usability testing.

Figure 3. Breakdown of accurate findings from usability testing

Accurate Interpretation Findings

The remaining accurate findings (29.4% of the accurate findings) were not derived from a singular sound bite, but were accurate interpretations of the UT. For example, John, in the report of P1’s test, said, “Just to clarify, she only used the table of contents. She didn’t read from front to end, and that jumping back and forth might have hurt her.” Though P1 never said that she only used the table of contents (and not the headings) and she never said that she was “jumping back and forth,” the actions and behaviors revealed in the think aloud show this to be an accurate account of what occurred. Indeed, in one task, P1 flipped to the table of contents five times and spent over 10 minutes attempting to complete a task, but in that time she never looked at the particular page that had the information required to complete the task.

Similarly, Tom said of P3, “She seemed to have a little trouble navigating the information at the beginning.” Again, P3 never said that she had trouble navigating the information, but her actions, which involved her flipping through pages of the book and then, after acknowledging “getting frustrated” (though she never explicitly said at what), arbitrarily picking a response, seemed to appropriately indicate that she did, indeed, have “trouble navigating the information.”

Potentially Inaccurate Findings Stemming from Usability Sessions

Of the findings mentioned in the oral report that had some basis in the UT, 34.6% of the findings seemingly inaccurately conveyed the intent of the statement made in the evaluation. A breakdown of the types of inaccurate findings is shown in Figure 4.

Figure 4. Breakdown of potentially inaccurate findings from usability testing

Potentially Inaccurate Sound Bite Findings

Of these inaccurate findings, 22.2% were singular, discrete findings taken out of context. For example, in the oral report about P2’s test, Tom said, “He liked the chart on the extra services page.” Technically, this is correct, as P2 said in his usability test shortly after seeing the chart, “I really like this chart.” However, P2 could not use the chart to find an appropriate solution to the task. Additionally, though P2 said that he “really like[d]” the chart at first blush, he later recanted that statement with, “Wow, it looks good, but now that I’m trying to, you know [make a decision with it], it’s kinda confusing.” Tom asked, “What’s confusing about it?” P2 said, “I guess it’s all these terms. I don’t know what they mean.” Therefore, although P2 did initially say that he liked the chart, Tom’s presentation of that statement is not entirely accurate.

In another example, Ericka said of P1, “she was a little confused by [the icons].” And, indeed, P1 did say in her usability test, “What are these supposed to be?” However, less than a minute later, P1 said, “Oh, these are pictures of the shapes of things I could be mailing. So I can use Delivery Confirmation if I have a box, or a tube, or a big letter, but I can’t use it if I have only a little letter.” She then completed the task using the chart with the icons. Therefore, although P1 did initially show some trepidation with the icons, through her use of the document she learned how the icons were supposed to be used. However, Ericka seized upon a singular statement of confusion and used that as a finding, rather than placing it within the context of the study.

Potentially Inaccurate Interpretation Findings

The remaining inaccurate findings that had some basis in the usability study dealt with findings that seemed to potentially have problems in their interpretation. For example, Tom, in his report of the session with P2 said, “He didn’t like the font on the measuring pages.” Indeed, P2 did discuss the font on the pages that dealt with measuring mailpieces, but his comments did not necessarily indicate that he “didn’t like” the font (which P2 refers to as a “script”). The exchange went as follows:

P2: Huh, that’s an interesting script.
Tom: Where?
P2: Here. I don’t think I’ve ever seen that before, and I read a lot. Okay, here we go. This here looks like it’s discussing how to pack...

While P2 commented that the “script” was “interesting” and that he hadn’t seen it before, he didn’t indicate, either through his language or his intonation, that he did not like the font. Tom did not question P2 further to determine whether his comments meant that P2 truly did not like the font or whether the comments could be taken at face value. Therefore, the assumption that P2 “didn’t like the font” was perhaps faulty.

Similarly, in a discussion of a chart, Laura said of P3, “She liked the chart.” P3 did mention the chart in her usability session, but her comments were more equivocal than what Laura presented. In her session, P3 said, “There’s something about this chart...there’s all the stuff I need, but...wait I think I’m on the wrong page.” Laura did not follow up with what P3 meant by “there’s something about this chart.” From her language choices, it was not clear whether the “something” was a positive attribute or a negative attribute. Therefore, Laura’s interpretation that P3 “liked the chart” may or may not be appropriate.

Findings with No Substantiation in Usability Sessions

In addition to the findings that potentially had some basis in usability sessions, 16.1% of the findings reported in the oral reports seemingly had no basis in the usability sessions. For example, Tom, in discussing what P2 said of a service chart, said, “...he really liked the chart. Said that it made him feel confident in the document.” However, at no time during P2’s session did he ever mention the word “confident.” Further, in the discussion of the particular chart, P2 indicated that this particular chart was “odd” and “misleading.” Therefore, it appears that Tom’s statements in the oral report that P2 felt confident about the chart have no grounding in the usability study.

In another example, Ericka said that P1 “...didn’t like the hand drawn images, but liked the line art.” In this example, the first clause regarding the hand drawn images is an accurate sound bite because in her study P1 said, “the icons that were drawn [by hand] didn’t help [me feel like this was a document for adults and not for kids]. Yeah, these ones [indicating the hand drawn icons], not so much.” However, the second clause referencing the line art has no basis in the usability study. At no point in the study did P1 indicate that she “liked” any of the images that were created using line art. The only alternative to the hand drawn images given by P1 was photographs, in “I think maybe photos are the way to go.” Therefore, the indication that P1 liked line art was unsubstantiated in the usability study itself.

Findings from Usability Sessions Not Mentioned in Oral Reports

Not surprisingly, many potential findings in the usability studies were not mentioned in the brief oral reports. The two raters determined these potential findings by compiling lists of items that were mentioned in the study but not mentioned in the report. The raters agreed on 96.8% of the items; the single finding identified by only one rater was excluded from the list. According to the raters, the three tests averaged 33.3 discrete findings per session, with a high of 37 findings from Team A and a low of 28 findings from Team C. (Though an issue might have been mentioned by a UT participant several times, it was counted only once as a finding.) During the oral reports, 10.3 findings were reported on average, with a high of 13 findings from Team B and a low of 7 findings from Team C. After removing the reported findings that had no basis in the usability test, approximately 26% of the total findings from the usability sessions were reported in the oral reports.
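As a quick sanity check, the summary arithmetic in this section can be reproduced in a few lines. The per-team counts are those stated above; the total of roughly 100 session findings is not stated directly and is derived here from the reported average of 33.3 findings across the three sessions:

```python
# Reported findings per team, from the oral reports described above.
reported = {"Team A": 11, "Team B": 13, "Team C": 7}
total_reported = sum(reported.values())              # 31 findings

with_basis = 26                                      # findings traceable to the sessions
no_basis = total_reported - with_basis               # 5 findings

print(round(with_basis / total_reported * 100, 1))   # -> 83.9 (%)
print(round(no_basis / total_reported * 100, 1))     # -> 16.1 (%)

# The three sessions averaged 33.3 discrete findings each,
# i.e. roughly 100 potential findings overall (a derived figure).
total_session = round(33.3 * 3)                      # -> 100

# Share of all session findings that appeared in the oral reports,
# after excluding the 5 reported findings with no basis:
print(round(with_basis / total_session * 100))       # -> 26 (%)
```

The computed percentages match those reported in the text (83.9%, 16.1%, and approximately 26%).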

Figure 5. Summary of usability findings presented in oral usability reports

In this case study, it appears that the group used findings from usability testing in highly particular ways (see Figure 5). Ultimately, it appears that three results emerged:

