
Adapting Web 1.0 Evaluation Techniques for E-Government in Second Life

Alla Keselman, Victor Cid, Matthew Perry, Claude Steinberg, Fred B. Wood, and Elliot R. Siegel

Journal of Usability Studies, Volume 6, Issue 4, August 2011, pp. 204 - 225



Pilot Exercises: Results

The following sections discuss the results of this project: usability, user feedback, performance, and usage.

Usability

The following subsections discuss the pros and cons of a multi-player quest, the ability to obtain known usability metrics, communication challenges and strategies during the session, and technical issues with recording and preparing data.

Pros and cons of a multi-player quest

The Infothon exercise succeeded in simulating “real-life” VW experience, characterized by multi-player interactions and complex navigational choices. However, administering and analyzing data from multi-user sessions proved challenging. While we anticipated not being able to follow each participant throughout the whole performance, we expected that frequent spot-checking, supplemented by video analysis, would suffice. However, as described in the subsequent sections, inferring participants’ intentions and attracting their attention with clarifying questions proved more difficult than in traditional usability studies.

Because interactivity and socialization are key to VW engagement, we continued seeing value in retaining the authenticity of the VW experience and conducting multi-user sessions. Improvements to our approach may include specifying the order in which tasks should be performed, providing participants with ways for indicating the beginning and the end of each task, and assigning separate moderators to teams or pairs of participants. Proposed techniques for improving the moderator’s ability to communicate with participants are described in a separate subsection, Communication challenges and strategies during the session.

Ability to obtain known usability metrics

The expert panel concluded that most traditional usability metrics, such as learnability, ease of navigation, efficiency, user satisfaction, and user errors, were applicable and measurable in VWs. Table 4 lists specific component variables of these metrics and summarizes our findings about their ease of implementation in this exercise.

Table 4. Ease of Assessing Specific Traditional Usability Metrics in the Infothon


Traditionally, learnability and navigation ease are measured as a user’s ability to accomplish a task (task completion) and to find optimal or near-optimal paths to desired information or an object without retracing. Evaluating task completion for information seeking in the exercise proved relatively straightforward: participants succeeded if they obtained the correct answer to a question. Evaluating successful completion for finding objects was slightly more complex: because the moderator did not follow every participant, probing questions were necessary to verify success. Judging partial success proved more difficult than in Web studies, where it is defined as (a) reaching the right destination page but missing the answer or (b) going down the right path but veering astray at the end. Because avatar movement in VWs is a continuous flow rather than a sequence of discrete steps, it was often difficult to determine whether participants passing by a relevant location were overlooking it or ignoring it. Table 5 provides examples of partial task completion in Second Life, as compared to the traditional Web.

Table 5. Second Life Analogues of Traditional Measures of Partial Task Completion


In traditional Web usability, navigation ease is usually measured as users’ ability to find the shortest path to the destination, using their understanding of the site’s architecture and navigational aids (e.g., menus). The exercise brought to our attention three factors that added complexity to evaluating navigation in VWs:

Table 6. VW Analogues of Web 1.0 Navigational Aids Problems


Efficiency, or the ability to accomplish tasks with minimal effort, is closely related to ease of navigation. In Web 1.0, efficiency is typically measured as time on task and the number of steps (clicks) leading to completion. As suggested earlier, taking the shortest, fastest path is not always optimal in VWs. When efficiency is desirable, however, measuring it as time on task is straightforward, unless participants are multi-tasking. Assessing the number of steps in the continuous flow of the VW experience is more challenging, given the variety of ways to move about a VW: teleporting, flying, or running/walking.
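Measuring time on task in a multi-user session reduces to pairing each participant’s task-start and task-end markers, for example the explicit start/end signals to the moderator proposed earlier. The sketch below illustrates this with a hypothetical timestamped event log; the event names and log format are our own illustration, not an artifact of the study.

```python
from datetime import datetime

# Hypothetical event log: (timestamp, participant, event) tuples, e.g. built
# from chat markers where participants announce the start and end of a task.
events = [
    ("2009-11-18 13:00:10", "avatar_A", "task1_start"),
    ("2009-11-18 13:06:40", "avatar_A", "task1_end"),
    ("2009-11-18 13:01:05", "avatar_B", "task1_start"),
    ("2009-11-18 13:09:35", "avatar_B", "task1_end"),
]

def time_on_task(events, participant, task):
    """Return seconds between a participant's task start and end markers."""
    fmt = "%Y-%m-%d %H:%M:%S"
    times = {ev: datetime.strptime(ts, fmt)
             for ts, who, ev in events if who == participant}
    return (times[f"{task}_end"] - times[f"{task}_start"]).total_seconds()

print(time_on_task(events, "avatar_A", "task1"))  # 390.0 seconds
```

Note that this yields wall-clock time only; it cannot distinguish productive task work from multi-tasking, which is exactly the caveat noted above.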

User errors are behaviors that lead to task failures of two kinds: (a) incorrect assumption of task completion and (b) user confusion and/or frustration. In our exercise, incorrect assumptions of success were rare. Errors leading to confusion fell into several distinct categories. One involved encountering the relevant information source (opening a note card) but failing to locate the information within it. Another was searching for information in the wrong places. These difficulties were closely related to the effect of the 3-D space and the physical space metaphor of the town. Information in Tox Town in Second Life is distributed among physical objects and “traditional” information resources. For example, information about a water pollutant chemical may be found in the Tox Town in Second Life environment by interacting with a water fountain, as well as on a Web poster in the library. Users were much more likely to rely on the physical space metaphor and search for information in objects rather than in information products. When a seemingly relevant object or place did not exist or did not contain the desired information, confusion resulted. Yet another error category, similarly related to the town space metaphor, included walking into doors that were not open, getting “stuck” behind virtual objects, and so on. The exercise suggests that user errors identified in VW usability studies are highly informative for the design of virtual environments.

Communication challenges and strategies during the session

VW interactions typically involve simultaneously managing many streams of synchronous and asynchronous communication with multiple avatars. Communication modes in Second Life include voice and local text chat for avatars in close proximity, and non-local instant messaging for avatars at a greater distance or for private communication (local chat is visible to all nearby avatars). The high demand that Second Life places on users’ attention affected both communication with the moderator and interactions among the participants. Participants and the moderator often had multiple chat tabs open on their screens. Periodically, the moderator would pose a question and not receive a response, either because the question displayed in a hidden tab or because the participant was preoccupied. Similar failures to connect occurred among the participants. For the second of the two usability sessions, the problem was solved by creating chat groups, which reduced the number of communication tabs each participant needed to monitor, a solution we would recommend.

A special case of communication involved the moderator’s interviews with participants about the answers to the quest questions, conducted via text chat. To make the process more efficient, the moderator had created chat macros (customized text created as “gestures” in Second Life) with pre-entered interview questions, triggered by keywords. The strategy was effective, although some questions required real-time modification and qualification. To avoid accidentally activating a macro in the course of spontaneous conversation, we recommend using uncommon words as triggers. Another limitation involved Second Life’s limit on the number of chat characters per line, which caused some questions to be split across lines. Sometimes, portions of a question vanished from the local chat before they could be read.
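The keyword-trigger mechanism can be illustrated with a minimal sketch. The trigger tokens and question texts below are invented for illustration; in Second Life itself this behavior is configured through the gesture editor rather than code. Note how the deliberately uncommon trigger words avoid accidental activation during ordinary conversation.

```python
# Minimal sketch of keyword-triggered chat macros, mirroring Second Life
# "gestures." Trigger tokens are deliberately uncommon so they cannot fire
# during spontaneous conversation. All names here are illustrative.
MACROS = {
    "zq1": "How did you find the answer to this question?",
    "zq2": "How confident are you that your answer is correct?",
    "zq3": "Did anything in the environment confuse you along the way?",
}

def expand_macros(chat_line, macros=MACROS):
    """Replace any trigger token in a chat line with its full question."""
    return " ".join(macros.get(word, word) for word in chat_line.split())

print(expand_macros("zq1"))
# How did you find the answer to this question?
print(expand_macros("please answer when ready"))  # unchanged: no triggers
```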

Technical issues with recording and preparing data

Technologies originally created for conducting evaluations of Internet sites present their own challenges for conducting user research in VWs. Our screen recording and screen event tracking tool, Techsmith Morae, was not originally designed for testing in VWs. As such, it exhibited several limitations:

User Feedback

The following sections discuss the use of focus groups and opinion surveys to assess user feedback.

Focus groups

The discussion during the post-activity, in-world focus group was extremely lively. The discussion occurred via text messaging, with no communication delays or difficulty obtaining participants’ attention. Based on this pilot exercise, we recommend in-world focus groups as a method for capturing user feedback.

Employing opinion surveys

The Web-based and the in-world survey formats collected similar numbers of responses (10 for the Web survey and 11 for the in-world version). Given the small sample, we did not attempt to collect data on the preferred invitation format (pop-up vs. posters and signs). Some participants expressed annoyance at receiving multiple pop-up invitations while they were engaged in other VW activities; they expected the invitations to stop once they had either accepted or declined the invitation. It may be possible to increase participation by placing in-world survey invitations at likely destination points rather than en route to destinations.

Overall, participants did not experience difficulties with the opinion survey. However, comparing survey results for the in-world and the Web versions suggests the need to further explore the effect of survey medium, trigger, and timing on reported satisfaction. On five out of seven questions about the ease of Tox Town in Second Life, mean responses were more positive for the in-world version. The small sample size and uncertainty about the actual number of survey-takers prevent us from any conclusive interpretation of the situation. Possibly, completing the surveys in-world felt more like part of the Second Life experience, while transferring to take the survey in a Web browser outside of the VW created more distance and a more critical outlook. It is also possible that the disparity has to do with the place and time, rather than the medium of the survey. Until we better understand the influences of the medium, trigger, and timing on satisfaction, it may be advisable to standardize these variables in evaluations.

Performance

The application performance experiments were not conducted during the Infothon, but at a later date. To run our experiments, we chose a VW scene that had multiple textures with textual and graphical information, resembling an exhibit hall. Our prototype performance monitoring software reported an average of 114 seconds to render this Second Life scene over the duration of the experiment, with individual times ranging from 95 to 155 seconds. No real connectivity problems occurred during the testing phase, but our tool was able to report simulated connectivity failures and VW platform downtimes. To emulate the type of performance monitoring service that firms like Keynote, Inc., or Gomez provide for the Web, a more elaborate version of our tools could run on a number of computers across the Internet, with the measured results sent to a central location for reporting and for alerting when performance degrades relative to baseline measurements. This way, application developers on the Second Life platform could monitor the average time it takes users to access their content, evaluate the impact of content changes on user experience, and determine how reliably the platform keeps the content accessible to users. One potential problem with this approach is that it is not possible to reliably test the same scene from different locations at exactly the same time. For example, a single avatar cannot be used to test from multiple computers simultaneously, and even if different avatars are used, they may interfere with each other during the rendering of the scene.
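The central-collection scheme described above can be sketched as follows. This is a simplified illustration under our own assumptions, not the prototype tool itself: the render step and probe locations are simulated, the baseline of 114 seconds and the 95-155 second range come from the experiment, and the 1.5x alert threshold is an arbitrary example value. A real probe would drive a Second Life viewer and report over the network.

```python
import statistics

# Baseline from the experiment described above; the alert factor is an
# illustrative choice, not a value from the study.
BASELINE_SECONDS = 114.0
ALERT_FACTOR = 1.5  # alert when a probe exceeds 1.5x the baseline

class Collector:
    """Central endpoint that aggregates render times from remote probes."""
    def __init__(self):
        self.measurements = []

    def report(self, probe_id, seconds):
        self.measurements.append((probe_id, seconds))
        if seconds > BASELINE_SECONDS * ALERT_FACTOR:
            print(f"ALERT: probe {probe_id} rendered the scene in {seconds:.0f}s")

    def summary(self):
        times = [s for _, s in self.measurements]
        return {"mean": statistics.mean(times),
                "min": min(times), "max": max(times)}

collector = Collector()
# Simulated render times from probes at different network locations.
for probe_id, seconds in [("probe-east", 95.0), ("probe-europe", 155.0),
                          ("probe-west", 210.0)]:
    collector.report(probe_id, seconds)

print(collector.summary())
```

The design choice here mirrors Web monitoring services: probes stay simple and stateless, while alerting logic and baselines live at the collector, so thresholds can be adjusted without redeploying probes.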

Usage

Maya Realities sends its paying Second Life clients a weekly usage report via email every Sunday. It reported 248 avatar visits from 73 unique visitors to the Tox Town in Second Life region during the week of the Infothon. These numbers are larger than the number of avatars in the usability exercise, which can be explained by the increased avatar activity on the region attracting other avatars to the island; Second Life users tend to visit regions that show more activity on the Second Life maps (crowds attract more people). The usage statistics also show an average visit duration of 91 minutes, for a total of 111 hours of combined use of the region. See Figure 5 for the heatmap of the region for that week and Figure 6 for a chart of the total number of minutes per day spent by visitors to the region during that week (the Infothon took place on November 18, 2009). The Maya Realities Web site allows access, via a private user ID and password, to detailed daily and hourly usage statistics. The statistics reported during the hours of the Infothon were consistent with our observations during the exercise, indicating that the data provided by this commercial service reflect the actual use of the virtual space. However, the actual use that avatars make of the information content in the virtual world is not captured.
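The reported aggregates are internally consistent if the 91-minute average duration is taken per unique visitor rather than per visit, an interpretation we infer from the numbers themselves:

```python
# Consistency check of the reported usage statistics, assuming the 91-minute
# average visit duration is per unique visitor (our interpretation).
unique_visitors = 73
avg_visit_minutes = 91

total_hours = unique_visitors * avg_visit_minutes / 60
print(round(total_hours, 1))  # 110.7, matching the reported ~111 hours
```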


Figure 5. Maya Realities’ Virtual NLM heatmap during the week of the Infothon: Each blue point represents an avatar’s position, sampled once per minute; red dots represent avatars that stayed inactive in the same location for more than 10 minutes.


Figure 6. Number of minutes spent by all visitors on the Virtual NLM region during the week of the Infothon

 
