
Conducting Iterative Usability Testing on a Web Site: Challenges and Benefits

Jennifer C. Romano Bergstrom, Erica L. Olmsted-Hawala, Jennifer M. Chen, and Elizabeth D. Murphy

Journal of Usability Studies, Volume 7, Issue 1, November 2011, pp. 9-30



Iteration 1: Conceptual Design

Iteration 1 was a low-fidelity usability test of the conceptual design, which was represented solely on paper (Romano, Olmsted-Hawala, & Murphy, 2009).  The original design was created by the development team with input from the project’s requirements management team, a group independent of the usability team that was also responsible for the design phase of the project.  The AFF project manager contacted us for usability testing once the conceptual design was ready.  The primary purpose of the first low-fidelity usability test was to assess the usability of the conceptual design of the new Web site from the participants’ perspectives, as revealed by their observed performance and self-reported satisfaction.  Participants were not asked to identify usability issues as such, because doing so requires professional training and experience.

We tested the interface with seven novices.  We did not recruit experts in the first round of testing because the goal was to see whether users grasped the conceptual design of the site, and we assumed that if novice users understood the site, experts would too (Chadwick-Dias & Bergel, 2010; Redish, 2010; Summers & Summers, 2005).  Participants completed 11 tasks.  The objectives were to identify the users’ expectations and logic as they attempted to find information and to identify design features that supported users’ expectations.  The main objective was to compare the design concepts, known as the designer’s conceptual model of the product (Rubin & Chisnell, 2008), with participants’ understanding of the user interface.  Through low-fidelity testing, we aimed to understand whether users readily grasped the concepts of the new AFF and understood the Web site’s capabilities.

Materials and Testing Procedure

We used preliminary paper versions of the Main page and Search Results page of the user interface and supporting materials (e.g., some “pop-up” pages).  A member of the usability team created the paper prototypes based on mock-ups that the development team had designed and presented at an internal Census Bureau presentation.  The participant and test administrator sat side by side at a desk in a 10 ft. by 12 ft. room, and the participant pointed at the paper prototypes to “walk through” tasks given by the test administrator.  The test administrator acted as the computer (Snyder, 2003), bringing up paper versions of the screens and pop-up windows that would have appeared if it were a live Web site and overlaying them onto the paper prototype.  The paper prototypes are shown in the left panels of Figure 3 (Main page) and Figure 4 (Search Results page).  The right panels show the prototypes used in later iterations, as discussed below.  Each session lasted about an hour.

Figure 3. Main page. Iteration 1: left panel; Iteration 2: right panel

Figure 4. Search Results page. Iteration 1: left panel; Iteration 3: right panel

Results

This section highlights accuracy, satisfaction, and some of the high-severity issues that the usability team identified during testing.  The average accuracy score across all tasks and participants was quite low: 40%.  Some participants were unable to complete any of the tasks correctly, and some tasks were not completed correctly by any of the participants.  Accuracy ranged from zero to 82% across participants and from zero to 71% across tasks.²  Satisfaction was also very low: the average satisfaction score was 4.79 on a 9-point scale, with 1 being low and 9 being high.
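As a quick illustration of the arithmetic behind these summary numbers, the sketch below computes overall accuracy, the per-participant and per-task ranges, and mean satisfaction from a task-completion matrix. The matrix and ratings shown are hypothetical placeholders, not the study’s data.

```python
# Minimal sketch of how the summary scores could be computed.
# Rows are participants, columns are the 11 tasks, 1 = completed correctly.
completion = [
    [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1],   # hypothetical participant 1
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # hypothetical participant 2 (failed all tasks)
    # ... one row per remaining participant
]
satisfaction = [5, 3, 6, 4, 5, 6, 4]      # hypothetical 1-9 ratings, one per participant

overall_accuracy = sum(map(sum, completion)) / (len(completion) * len(completion[0]))
accuracy_by_participant = [sum(row) / len(row) for row in completion]
accuracy_by_task = [sum(col) / len(col) for col in zip(*completion)]
mean_satisfaction = sum(satisfaction) / len(satisfaction)

print(f"Overall accuracy: {overall_accuracy:.0%}")
print(f"Range across participants: {min(accuracy_by_participant):.0%} to {max(accuracy_by_participant):.0%}")
print(f"Range across tasks: {min(accuracy_by_task):.0%} to {max(accuracy_by_task):.0%}")
print(f"Mean satisfaction (1-9 scale): {mean_satisfaction:.2f}")
```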

While we may have been the “bearers of bad news,” no one on the AFF team was surprised at the findings because they had observed participants struggling with the prototypes during the test sessions, and we had been discussing the findings throughout testing.  The contractors were not worried that their contracts might be at risk because usability testing had been planned as part of the requirements; therefore, results, feedback, and changes were expected.  The AFF team responded readily to those results.

We examined participants’ behavior and comments, along with the accuracy and satisfaction scores, to assess the usability of the Web site and to infer the likely design elements that caused participants to experience difficulties.  The usability team then grouped usability issues into categories based on severity ratings that were regularly used in our lab.  The purpose of our rankings was to place the issues in a rough order of importance so the designers and developers were aware of the items that should be dealt with first.  The lab defined high-severity issues as those that brought most participants to a standstill (i.e., they could not complete the task), medium-severity issues as those that caused some difficulty or confusion but the participants were able to complete the task, and low-severity issues as those that caused minor annoyances but did not interfere with the flow of the tasks.  The AFF team was familiar with our scale from our previous collaborations.  In the following sections, we highlight the high-severity issues discovered during this usability test.
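To make the rubric concrete, here is a minimal sketch that encodes the three severity levels as a simple classifier; the names and structure are our illustration and were not part of the lab’s actual tooling.

```python
from enum import Enum

class Severity(Enum):
    HIGH = "high"      # brought most participants to a standstill (task not completed)
    MEDIUM = "medium"  # caused difficulty or confusion, but the task was completed
    LOW = "low"        # minor annoyance that did not interfere with task flow

def rate_issue(stopped_most_participants: bool, caused_confusion: bool) -> Severity:
    """Apply the lab's severity definitions to one observed usability issue."""
    if stopped_most_participants:
        return Severity.HIGH
    if caused_confusion:
        return Severity.MEDIUM
    return Severity.LOW

# Example: an icon that confused participants but did not block task completion.
print(rate_issue(stopped_most_participants=False, caused_confusion=True))  # Severity.MEDIUM
```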

Finding 1: There was no direct, useful guidance displayed about what the user needed to do.

In commenting on the two site pages we tested, participants said they were confused about what they needed to do to select information and how to proceed to get results.  Some issues were related to the placement of information and to extra items on the page, which left participants unsure whether parts of the page worked together or separately.  On the Search Results page (left panel of Figure 4), participants did not know where to direct their attention because the page was overloaded with information.  We found that the page needed more white space to allow the design to direct users’ attention to critical regions of the display (Mullet & Sano, 1995).  Further, users needed clear directions on what they were supposed to do to understand, interpret, and act on the information they were seeing.  Based on participants’ experience and comments, and on the discussions we had with the observers at the end of each session, we made recommendations to improve the design of the pages and the communication with the user (e.g., explicit instructions about what the user could do).  We expected that these changes would help users better understand the site and would subsequently increase its usability.

Finding 2: Jargon and poorly worded statements confused participants.

Throughout the two pages tested, all participants said that they had trouble understanding Census-specific jargon and poorly worded statements.  For example, on the Main page, the phrase “Select the topics you would like to see products for below” was confusing: it was not clear what “below” modified.  Did it modify “see,” as in “see below,” or did it modify “Select the topics,” as in “Select the topics…below”?  Also, the word “products” was not clear.  Although it was meant to refer to data tables, charts, maps, and documents, participants said that it conjured up images of cleaning products, groceries, and items produced by manufacturers.  Presumably, the typical, non-technical user does not associate the word “product” with a data table or a document.  Participants said they did not know what many terms meant and that they would not know what to do with these terms on the site.  We recommended reducing jargon and using plain language (Redish, 2007) to make the interface easier to understand.  We gave examples of how the developers could re-word some jargon text.  For example, we recommended changing the word “product” to a simpler term, such as “information,” which conveys the same meaning but isn’t as awkward for people who don’t know Census jargon.

Finding 3: Icons were confusing for participants. 

Participants said they had difficulty understanding what the action icons along the right side of the Search Results page represented (see Figure 4, right side of left panel).  Many of the participants said they were confused by the different icons, especially the comma-separated values (CSV) and Excel icons.  In general, participants had different interpretations of the icons.  On the Main page and on some pop-up pages of the Search Results page, participants said they were confused by the red X within a red circle icon (shown in Figure 3, left panel), which was supposed to allow users to delete an item.  On some pop-up screens, that icon appeared near a black X within a black box icon, which was supposed to show users which items were selected.  Participants said they were not sure which X icon deleted items, and some said they thought they could select items by clicking on the icons.  While actions, as opposed to objects, may be difficult to depict representationally (Nielsen, 1993a), we recommended creating user-identifiable icons that could be easily understood without text.  We also recommended eliminating either the red X within the red circle icon or the black X within the black box icon.  If these changes were implemented, we expected users’ understanding of the icons to increase.

Plans for Iteration 2

Over 13 days, we discussed the findings with the AFF team, along with potential changes to alleviate the usability issues (which their team had observed).  We documented our verbal conversations in a preliminary report and sent it to the AFF team.  The preliminary report documented the usability issues ranked as high severity and included some mock-ups of recommended changes, but it did not contain the detailed introduction or methodology sections that were in the final report.  We met with the design team three weeks later to recap findings and plan the next test.

There were no conflicts between the development team and the usability team over the recommendations because the problems with the Web site were clear to both teams from usability testing.  The specific recommendations and design changes to ameliorate the problems, however, were many and varied.  While we were testing, the AFF team was conducting a technical assessment of the conceptual design, which indicated that the design was highly complex and could not be built in the allotted time frame; it exceeded the resources available for the solution.  The usability results confirmed that the user interface was difficult and more suited to experts, and they gave the contractors valuable feedback on how to go about simplifying the system.

The designers made major changes to the user interface design on both pages to improve usability.  Some of the changes were in response to the sessions they had attended and our joint observations and recommendations, some were based on ideas from a senior team member who had just been brought onto the project, and some were based on feedback from stakeholders in other areas of the Census Bureau.³  See the right panel in Figure 3 for the new Main page and the left panel in Figure 5 for the new Table View page (one step past the Search Results page, tested in Iteration 1).  This page was designed to appear once an item was selected on the Search Results page.  With these changes in place, we planned usability testing for the revised Web site.

Figure 5. Table View page. Iteration 2: left panel; Iteration 3: right panel

Figure 6. Map View page. Iteration 2: left panel; Iteration 3: right panel


 

² One middle-aged participant with a Bachelor’s degree failed all tasks.  All participants failed one task.

³ We do not know how our recommendations matched the stakeholders’ recommendations; we only know that our findings were consistent with the feedback from other parts of the Census Bureau, as told to us by the AFF project manager. Stakeholders had not observed the usability sessions.

 
