JUS - Journal of Usability Studies
An international peer-reviewed journal

Reverse Engineering of Content to Find Usability Problems: A Healthcare Case Study

Shadi Ghajar-Khosravi, Flora Wan, Samir Gupta, and Mark Chignell

Journal of Usability Studies, Volume 8, Issue 1, November 2012, pp. 16 - 28



Results

The following sections discuss the time each participant took to complete each method, the usability problems encountered with the tool, the usability questionnaire, and a summary of the results. Overall, participants found the reverse engineering approach an enjoyable testing process, although they spent more time performing the tasks and encountered more usability problems while doing so.

Time

The task-based scenario method took participants an average of 12.6 minutes when done first and 11.4 minutes when done second. In contrast, the reverse engineering method was much faster when done second (an average of 9.9 minutes) than when done first (an average of 16.8 minutes). The longer completion time for the reverse engineering method when it was used first was likely due to the greater demands of that method. In the reverse engineering method, participants had to recreate the plan so that it looked exactly like the one given, whereas in the task-based method, participants had more freedom to make certain decisions, such as whether the plan should be in portrait or landscape format. The more rigid requirements of the reverse engineering method caused participants to explore the tool in more depth (using more time) and along more paths (which would tend to take longer and potentially produce more errors).
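The order effect described above can be made explicit by subtracting each method's second-position mean from its first-position mean. The Python sketch below does this using only the four group means reported in this section; the `MEAN_MINUTES` table and the `order_effect` helper are hypothetical names introduced for illustration. Because no per-participant times or variances are reported here, the differences are descriptive only and are not a statistical test.

```python
# Group mean completion times (minutes) as reported in the Time section,
# keyed by (method, position in the session).
MEAN_MINUTES = {
    ("task-based", "first"): 12.6,
    ("task-based", "second"): 11.4,
    ("reverse engineering", "first"): 16.8,
    ("reverse engineering", "second"): 9.9,
}


def order_effect(method: str) -> float:
    """Minutes saved, on average, when `method` is performed second rather than first.

    Hypothetical helper: a plain difference of two group means, rounded to
    one decimal place to hide floating-point noise.
    """
    first = MEAN_MINUTES[(method, "first")]
    second = MEAN_MINUTES[(method, "second")]
    return round(first - second, 1)


if __name__ == "__main__":
    for method in ("task-based", "reverse engineering"):
        print(f"{method}: {order_effect(method):.1f} min faster when done second")
```

Run as a script, this prints a difference of 1.2 minutes for the task-based method and 6.9 minutes for the reverse engineering method, which is the asymmetry the paragraph attributes to the greater demands of reverse engineering when it is encountered first.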

Usability Problems

The detailed results concerning usability problems are shown in Table 1. A total of 110 usability problem instances were encountered, representing repeated occurrences of 11 unique problems. The usability problems were identified by one of the authors (Flora Wan) acting as the tester, based on observation notes written during the sessions, verbal feedback from participants, and problems with the asthma action plans that were produced. In protocol B (where the reverse engineering testing method was performed first), more unique usability problems were found across the two methods (11, compared to 9 in protocol A). All 11 unique problems were discovered during the reverse engineering testing method when it was performed first (in protocol B).

Usability problems can vary greatly in severity, and uncovering more severe problems is generally more beneficial than finding minor ones (e.g., Molich & Dumas, 2008). We asked one respirologist and one human factors specialist to independently rate the 11 unique problems on a severity scale (0 = not a problem, 1 = cosmetic, 2 = minor, 3 = major, 4 = catastrophic). The average severity rating (across the two experts) for each usability problem is shown in the second column of Table 1 (in parentheses, following the column title). Average ratings ranged from 1, for attempting a search (a feature the tool did not support), to 3.5 (midway between major and catastrophic), for a problem involving the pre-selection of options.

In protocol B, participants had more difficulty using the Wikibreathe tool than participants in protocol A. The six participants tested with protocol A encountered a total of 29 (16 + 13) usability problems across both testing methods. In contrast, the six participants tested with protocol B encountered a total of 81 (79 + 2) usability problems across both methods, 79 of which were encountered during the first (reverse engineering) method.

Table 1. Usability Problems (TM = Task-based testing method, REM = Reverse engineering testing method; total number of participants = 12; participants in each protocol = 6)

Table 1

Reverse engineering demands more attention to detail

As noted above, participants who were tested with the reverse engineering method first (protocol B) found two additional problems that were not found by the group of participants who were tested with the task-based scenario method first (protocol A). The two problems that were missed in protocol A were the following:

Figure 2

Figure 2. Selecting the layout of the asthma action plan

Both of these problems had an average severity of 2.5, placing them midway between minor and major on the severity scale.

One possible conclusion is that the reverse engineering testing method (when done first) demands more attention to detail from participants (concerning layout and fonts in this case study), because they need to compare the artifact provided to them as a model with the one they are trying to create.

Why were some problems not encountered in the reverse engineering tests that were conducted after the task-based method (in protocol A)? One possible explanation is that participants who moved on to the reverse engineering method (the second method) after performing the task-based method were influenced by their experience with the previous method. Interacting with the product for the second time, they paid less attention to detail because they were already accustomed to the design of the tool and had already made some assumptions about how the system worked. This behavior would not be expected in a usability test in which only reverse engineering was used.

It is possible that some of the specific deficiencies found with task-based usability testing in our case (e.g., failure to uncover the layout problem) may have resulted from the specific tasks that we chose to test. However, even if this were the case, it highlights the difficulty in choosing an appropriate set of tasks in the task-based approach when doing usability testing on a reasonably complex piece of software.

Usability Questionnaire

Immediately after each method, participants were asked to complete a questionnaire about the usability of the tool and their attitude towards the particular usability testing method that was used. The first part of the questionnaire was based on the System Usability Scale (SUS). Overall SUS scores, whose means are reported in Table 2, can range from 0 to 100; the closer a score is to 100, the more satisfied the participant is with the product and, probably, the more usable the product is perceived to be.
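The article does not spell out how overall SUS scores were computed, and its questionnaire was only "based on" the SUS. Assuming the standard SUS scoring rule was applied to ten items answered on a 1 to 5 scale, an individual participant's overall score could be computed as in the Python sketch below; `sus_score` is a hypothetical helper, not part of any tool used in the study. A condition mean such as those in Table 2 would then be the average of these per-participant scores.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Overall SUS score (0-100) from ten 1-5 Likert responses, item 1 first.

    Assumes standard SUS scoring; the study's questionnaire was only
    described as "based on" the SUS, so this is an illustration.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, response in enumerate(responses, start=1):
        if index % 2 == 1:
            # Odd-numbered items are positively worded: contribute response - 1.
            total += response - 1
        else:
            # Even-numbered items are negatively worded: contribute 5 - response.
            total += 5 - response

    # Each item contributes 0-4, so the 0-40 sum is scaled by 2.5 to give 0-100.
    return total * 2.5
```

For example, a hypothetical participant who answers 3 (neutral) to every item receives 50.0, and the most favorable possible pattern (5 on every odd item, 1 on every even item) receives 100.0. These response patterns are illustrative only, not study data.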

System Usability Scale (SUS)

We found a potentially interesting relationship between the number of usability problems found and the overall usability scores calculated from SUS ratings. Seventy-nine of the 81 problems encountered by the six participants following protocol B occurred during the first testing method, i.e., the reverse engineering method. Yet the average SUS score in protocol B was higher than the corresponding average score for participants following protocol A, and the highest overall SUS score across all four conditions was obtained for the reverse engineering method done first. However, the differences are relatively small. Without a larger number of participants and a statistical test, we cannot be sure that these results are reliable.

Table 2. Mean Overall SUS Scores (out of 100) Across the Study Conditions (TM=Task-based testing method, REM=Reverse engineering testing method)

Table 2

Attitude toward the usability method

After completing the SUS questionnaire, participants were asked to answer a number of open-ended questions about their attitude towards the tool (for example, whether they felt confident or frustrated while using it) and towards the usability evaluation method. Although our sample was small, attitudes toward the tool did not appear to differ between the task-based and reverse engineering methods, even though participants spent more time on the reverse engineering method and encountered more issues with it. Overall, participants had a positive attitude toward both methods.

During the post-experiment interview, several participants commented that it was more difficult to create the plan using the reverse engineering method, but that the method also encouraged them to explore the tool in more depth. These comments suggest that participants were not less satisfied with the reverse engineering method in spite of the longer time it took to perform the tasks. By exploring the tool, these participants encountered problems that the other group did not. On the other hand, participants who followed the written instructions first commented that doing so helped them “learn” how to use the tool.

Summary of Results

The results of comparing the reverse engineering and task-based methods for usability testing can be summarized in the following major points.

Reverse engineering has the potential to discover more issues

The results of the case study showed that the reverse engineering testing method took longer to complete on average and that participants encountered more issues with it than with the task-based method: 79 vs. 16 issues when each method was performed first, and 13 vs. 2 issues when each was performed second. Thus, reverse engineering has the potential to uncover more usability problems than the traditional task-based testing approach. Notably, 79 of the 110 usability problem instances (72%) were found in just one of the four contexts examined: the reverse engineering method done first.
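The counts quoted in this summary can be cross-checked against the per-condition figures given under Usability Problems. The Python sketch below encodes those four counts (the `INSTANCES` layout and the helper names are hypothetical, chosen only for illustration) and recomputes the protocol totals and the share attributed to reverse engineering done first.

```python
from typing import Tuple

# Usability-problem instances reported in the Results, keyed by
# (protocol, method, position in the session).
INSTANCES = {
    ("A", "task-based", "first"): 16,
    ("A", "reverse engineering", "second"): 13,
    ("B", "reverse engineering", "first"): 79,
    ("B", "task-based", "second"): 2,
}


def total_instances() -> int:
    """All problem instances across the four conditions."""
    return sum(INSTANCES.values())


def protocol_total(protocol: str) -> int:
    """Instances for one protocol, summed over both methods."""
    return sum(n for (p, _, _), n in INSTANCES.items() if p == protocol)


def share(key: Tuple[str, str, str]) -> float:
    """Fraction of all instances that occurred in one condition."""
    return INSTANCES[key] / total_instances()


if __name__ == "__main__":
    print("total:", total_instances())
    print("protocol A:", protocol_total("A"), "protocol B:", protocol_total("B"))
    print(f"REM first share: {share(('B', 'reverse engineering', 'first')):.0%}")
```

Run as a script, it prints a total of 110 instances, 29 for protocol A and 81 for protocol B, and a 72% share for reverse engineering done first (79 / 110 ≈ 0.718).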

Reverse engineering is an enjoyable task

Even though the reverse engineering task took longer and caused participants to commit more errors, their attitudes toward the tool and usability method were consistently positive. One participant even commented that she enjoyed the visual aspect of being able to look at something and then recreate it.

We speculate that reverse engineering activities (i.e., recreating artifacts) may take away some of the stress participants normally experience when carrying out a usability test. This may be because reverse engineering makes testing feel more like a problem-solving activity than a performance task, and thus more enjoyable. In addition, reverse engineering does not require test administrators to select and order task scenarios, as the task-based method does.

Future case studies

While the case study that we carried out has some important implications for the use of reverse engineering in usability testing, it would be useful to extend the results to other products and better-matched participants. One limitation of this study was that the participants were not real asthmatics, respirologists, or asthma educators. A future case study could focus on a different product whose test participants are its intended users. It would also be interesting to compare the task-based and reverse engineering methods against a “self-directed” method in which participants are asked to accomplish their own goals with a tool.
