Adapting Web 1.0 Evaluation Techniques for E-Government in Second Life
Alla Keselman, Victor Cid, Matthew Perry, Claude Steinberg, Fred B. Wood, and Elliot R. Siegel
Journal of Usability Studies, Volume 6, Issue 4, August 2011, pp. 204 - 225
Pilot Exercises: Results
The following sections discuss the results of this project: usability, user feedback, performance, and usage.
Usability
The following subsections discuss the pros and cons of a multi-player quest, the ability to obtain known usability metrics, communication challenges and strategies during the session, and technical issues with recording and preparing data.
Pros and cons of a multi-player quest
The Infothon exercise succeeded in simulating a “real-life” VW experience, characterized by multi-player interactions and complex navigational choices. However, administering and analyzing data from multi-user sessions proved challenging. While we anticipated not being able to follow each participant throughout the whole performance, we expected that frequent spot-checking, supplemented by video analysis, would suffice. As described in the subsequent sections, though, inferring participants’ intentions and attracting their attention with clarifying questions proved more difficult than in traditional usability studies.
Because interactivity and socialization are key to VW engagement, we continued seeing value in retaining the authenticity of the VW experience and conducting multi-user sessions. Improvements to our approach may include specifying the order in which tasks should be performed, providing participants with ways for indicating the beginning and the end of each task, and assigning separate moderators to teams or pairs of participants. Proposed techniques for improving the moderator’s ability to communicate with participants are described in a separate subsection, Communication challenges and strategies during the session.
Ability to obtain known usability metrics
The expert panel concluded that most traditional usability metrics, such as learnability, ease of navigation, efficiency, user satisfaction, and user errors, were applicable and measurable in VWs. Table 4 lists specific component variables of these metrics and summarizes our findings about their ease of implementation in this exercise.
Table 4. Ease of Assessing Specific Traditional Usability Metrics in the Infothon

Traditionally, learnability and navigation ease are measured as a user’s ability to accomplish a task (task completion) and find optimal or near-optimal paths to desired information or an object without retracing. Evaluating task completion for information seeking in the exercise proved relatively straightforward: participants succeeded if they obtained the correct answer to a question. Evaluating successful completion for finding objects was slightly more complex: because the moderator did not follow every participant, probing questions were necessary to verify success. Judging partial success proved more difficult than in Web studies, where it is defined as (a) going to the right destination page but missing the answer or (b) going down the right path but veering astray at the end. As avatar movement in VWs is a continuous flow rather than a sequence of discrete steps, it was often difficult to determine whether participants passing by a relevant location were overlooking it or ignoring it. Table 5 provides examples of partial task completion in Second Life, as compared to the traditional Web.
Table 5. Second Life Analogues of Traditional Measures of Partial Task Completion

In traditional Web usability, navigation ease is usually measured as users’ ability to find the shortest path to the destination, using their understanding of the site’s architecture and navigational aids (e.g., menus). The exercise brought to our attention three factors that added complexity to evaluating navigation in VWs:
- The first was the variety of ways to move about Second Life: users could teleport, fly, or run/walk. If the optimal path is viewed as being the shortest path to the information, then teleporting directly to the information object is more desirable than flying, and flying is more desirable than running/walking. While this hierarchy provides an easy-to-score metric, it fails to address potential advantages of slowing down (e.g., exploring the VW environment and/or looking for social contacts to interact with).
- The second complicating factor was that, compared with Web 1.0, navigational aids are less common and often do not persist from one scene to another. For example, Tox Town in Second Life has a billboard with a town map on the main square, but the map cannot be picked up and carried around. Our evaluation of the navigational paths, therefore, largely focused on the placement and helpfulness of navigational aids. Table 6 presents Second Life analogues of Web 1.0 problems with navigational menus, identified in our pilot.
- The third complicating factor was the importance of social interactivity and user satisfaction in VWs. In some situations, the optimal path may not be the shortest, but the one that leads to the most interesting, interactive, satisfying “travel” experience.
Table 6. VW Analogues of Web 1.0 Navigational Aids Problems

Efficiency, or the ability to accomplish tasks with minimal effort, is closely related to ease of navigation. In Web 1.0, efficiency is typically measured as time on task and number of steps (clicks) leading to completion. As suggested earlier, taking the shortest, fastest path is not always optimal in VWs. When efficiency is desirable, however, measuring it as time on task is straightforward, unless participants are multi-tasking. Assessing the number of steps in the continuous flow of the VW experience is more challenging, due to the variety of ways to move about a VW: teleport, fly, or run/walk.
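As a minimal sketch of how time on task might be computed, assume that participants mark the beginning and end of each task (as proposed earlier among the possible improvements) and that these markers are exported as timestamped events. The Event record, the "start"/"end" marker names, and the sample values are all hypothetical and are not drawn from the study.

```python
from dataclasses import dataclass

@dataclass
class Event:
    participant: str
    task: str
    kind: str   # "start" or "end" (hypothetical marker names)
    t: float    # seconds since the session began

def time_on_task(events):
    """Return {(participant, task): seconds} summed over start/end pairs."""
    starts = {}
    durations = {}
    for e in sorted(events, key=lambda e: e.t):
        key = (e.participant, e.task)
        if e.kind == "start":
            starts[key] = e.t
        elif e.kind == "end" and key in starts:
            # Close the open interval and add it to the running total.
            durations[key] = durations.get(key, 0.0) + e.t - starts.pop(key)
    return durations

# Illustrative events only, not data from the study.
events = [
    Event("P1", "find-fountain", "start", 10.0),
    Event("P1", "find-fountain", "end", 70.0),
    Event("P2", "find-fountain", "start", 15.0),
    Event("P2", "find-fountain", "end", 45.0),
]
durations = time_on_task(events)
```

This sketch does not handle multi-tasking: if a participant has two tasks open at once, the overlapping time is counted toward both.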
User errors are behaviors that lead to task failures of two kinds: (a) incorrect assumption of task completion and (b) user confusion and/or frustration. In our exercise, incorrect assumptions of success were rare. Errors leading to confusion fell into several distinct categories. One involved encountering the relevant information source (opening a note card), but failing to locate the information. Another was searching for information in the wrong places. These difficulties were closely related to the effect of the 3-D space and the physical space metaphor of the town. Information in Tox Town in Second Life is distributed among physical objects and “traditional” information resources. For example, information about a water pollutant chemical may be found in the Tox Town in Second Life environment upon interaction with a water fountain, as well as on a Web poster in the library. Users were much more likely to attempt to utilize the physical space metaphor and search for information in objects, rather than information products. If a seemingly relevant object or place did not exist or did not contain desired information, this led to confusion. Yet another error category was similarly related to the town space metaphor and included walking into doors that were not open, getting “stuck” behind virtual objects, etc. The exercise suggests that user errors, identified in VW usability studies, are highly informative for the design of virtual environments.
Communication challenges and strategies during the session
VW interactions typically involve simultaneously managing many streams of synchronous and asynchronous communication with multiple avatars. Communication modes in Second Life include voice and local text chat for avatars in close proximity, and non-local instant messaging for avatars at a greater distance or for private communication (local chat is visible to all nearby). The high demand that Second Life places on users’ attention impacted both communications with the moderator and the interactions among the participants. Participants and the moderator often had multiple chat tabs open on their screen. Periodically, the moderator would pose a question and not receive a response, either because the question would display in a hidden tab or because the participant was preoccupied. Similar failures to connect were happening among the participants. For the second of the two usability sessions, the problem was solved by creating chat groups, which reduced the number of communication tabs each participant needed to control, a solution we would recommend.
A special case of communication involved the moderator’s interviews with participants about the answers to the quest questions, conducted via text chat. In an attempt to make the process more efficient, the moderator had created chat macros (customized text created as “gestures” in Second Life) with pre-entered interview questions, which were triggered by keywords. The strategy was effective, although some questions required real-time modifications and qualification. To avoid accidentally activating a macro in the course of spontaneous conversation, we recommend using uncommon words as triggers. Another limitation involved Second Life’s limit on the number of chat text characters on one line, which caused some questions to be split. Sometimes, portions of a question vanished from the local chat before they could be read.
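One way to reduce the impact of the line-length limit is to split long questions at word boundaries in advance and number the parts, so that participants can tell when a portion is missing. The sketch below illustrates the idea in Python rather than in Second Life’s own gesture mechanism. The limit of 60 characters, the trigger word, and the question text are hypothetical, since the actual limit is not stated here.

```python
import textwrap

# Hypothetical limit: the actual Second Life value is not stated in the article.
MAX_CHARS_PER_LINE = 60

# Hypothetical trigger word and question text, for illustration only.
# The trigger is deliberately an uncommon string, unlikely to occur in normal chat.
MACROS = {
    "zqx1": ("Where in the town did you find the answer to this question, "
             "and which object or note card did you open to get it?"),
}

def split_question(question, limit=MAX_CHARS_PER_LINE):
    """Break a question at word boundaries into numbered chat lines."""
    # Reserve room for a "(i/n) " prefix so each full line fits the limit.
    prefix_room = len("(99/99) ")
    chunks = textwrap.wrap(question, width=limit - prefix_room)
    total = len(chunks)
    return [f"({i}/{total}) {chunk}" for i, chunk in enumerate(chunks, start=1)]

def expand(message):
    """Return the macro's chat lines if the whole message is a trigger, else None."""
    question = MACROS.get(message.strip())
    return split_question(question) if question is not None else None

lines = expand("zqx1")
```

Matching the trigger against the whole message, rather than searching for it inside longer text, further lowers the chance of firing a macro during spontaneous conversation.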
Technical issues with recording and preparing data
Technologies originally created for conducting evaluations of Internet sites present their own challenges for conducting user research in VWs. Our screen recording and screen event tracking tool, Techsmith Morae, was not originally designed for testing in VWs. As such, it exhibited several limitations:
- Recording files in Morae’s proprietary format, already large and cumbersome for Web studies where page content remains the same for minutes at a time, increased in size by an order of magnitude to keep pace with the fast-paced actions of avatars on screen. Analysis of the resulting video in Morae Manager on all but the highest-end computers became painfully slow. We therefore recommend saving or converting recordings to a more manageable video format and viewing the recordings in conjunction with an exported time-coded notation log.
- Because Morae cannot see inside the Second Life browser window, it cannot track browser events as it can in a browser like Internet Explorer. Users must click an external link that opens in another browser for Morae to register any activity.
- Keystroke tracking, though useful for tracking search strings on Web sites and unaffected by the browser in use, was not an effective means of recording participant chats as each keystroke appears on a separate line. We recommend instead having each user configure the VW browser to record text chats before interaction with other avatars begins. We recognize that while chat log recorders in certain VWs (like There) record all keystrokes, those in other VWs (like Second Life) record only what is actually sent to other avatars.
User Feedback
The following sections discuss focus groups and employing opinion surveys to assess user feedback.
Focus groups
The discussion during the post-activity, in-world focus group was extremely lively. The discussion occurred via text messaging, with no communication delays or difficulty obtaining participants’ attention. Based on this pilot exercise, we recommend in-world focus groups as a method for capturing user feedback.
Employing opinion surveys
The Web-based and the in-world survey formats collected a similar number of responses (10 for the Web survey and 11 for the in-world version). Given the small sample, we did not attempt to collect data on the preferred invitational format (pop-up vs. posters and signs). Some participants expressed annoyance at receiving multiple pop-up invitations while they were engaged in other VW activities. These participants expected the invitations to desist once they had either accepted or refused the invitation. It may be possible to increase participation by placing in-world survey invitations at likely destination points rather than en route to destinations.
Overall, participants did not experience difficulties with the opinion survey. However, comparing survey results for the in-world and the Web versions suggests the need to further explore the effect of survey medium, trigger, and timing on reported satisfaction. On five out of seven questions about the ease of Tox Town in Second Life, mean responses were more positive for the in-world version. The small sample size and uncertainty about the actual number of survey-takers prevent us from any conclusive interpretation of the situation. Possibly, completing the surveys in-world felt more like part of the Second Life experience, while transferring to take the survey in a Web browser outside of the VW created more distance and a more critical outlook. It is also possible that the disparity has to do with the place and time, rather than the medium of the survey. Until we better understand the influences of the medium, trigger, and timing on satisfaction, it may be advisable to standardize these variables in evaluations.
Performance
The application performance experiments were not conducted during the Infothon, but at a later date. To run our experiments, we chose a VW scene that had multiple textures with textual and graphical information resembling an exhibit hall. Our prototype performance monitoring software reported an average of 114 seconds to render the specific Second Life scene used throughout the experiment. Individual render times varied from 95 to 155 seconds. No real connectivity problems occurred during the testing phase, but our tool was able to report simulated connectivity failures and VW platform down times. To emulate the type of performance monitoring service performed by firms like Keynote, Inc., or Gomez on the Web, a more elaborate version of our tools could run on a number of computers across the Internet and send the measured results to a central location for reporting and for alerting when undesirable performance, relative to baseline measurements, is detected. This way, application developers on the Second Life platform could monitor the average time it takes users to access their content, evaluate the impact of content changes on user experience, and determine how reliably the platform keeps the content accessible to users. One potential problem with this approach is that it is not possible to reliably test the same scene from different locations at exactly the same time. For example, a single avatar cannot be used to do the testing at the same time from multiple computers, and even if different avatars are used, they may interfere with each other during the rendering of the scene.
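The baseline-and-alert idea can be sketched as follows. This is not the prototype tool described above. The function names, the 50% tolerance, the probe names, and all sample values are hypothetical. The baseline values were chosen only so that their summary matches the reported average (114 seconds) and range (95 to 155 seconds); they are not the study’s measurements.

```python
from statistics import mean

def summarize(samples):
    """Summarize render times (seconds) as mean, minimum, and maximum."""
    return {"mean": mean(samples), "min": min(samples), "max": max(samples)}

def is_degraded(render_seconds, baseline_mean, tolerance=0.5):
    """True if a render time exceeds the baseline mean by more than `tolerance`."""
    return render_seconds > baseline_mean * (1 + tolerance)

# Illustrative values only, NOT the study's measurements.
baseline_samples = [95.0, 110.0, 155.0, 96.0]
baseline = summarize(baseline_samples)

# Hypothetical latest reports from monitoring computers at different locations,
# as they might arrive at a central collection point.
reports = {"probe-a": 120.0, "probe-b": 190.0}
alerts = [name for name, secs in reports.items()
          if is_degraded(secs, baseline["mean"])]
```

In practice the tolerance would need tuning against the natural spread of render times. With a range as wide as 95 to 155 seconds, a tight threshold would raise frequent false alarms.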
Usage
Maya Realities sends its paying Second Life clients a weekly usage report via email every Sunday. They reported 248 avatar visits from 73 unique visitors to the Tox Town in Second Life region during the week of the Infothon. These numbers are larger than the number of avatars used for the usability exercise, but this can be explained by the increased avatar activity in the region attracting other avatars to the island. Second Life users commonly visit regions that show more activity on the Second Life maps (crowds attract more people). The usage statistics also show that the average visit duration was 91 minutes, for a total of 111 hours of combined use time of the region. See Figure 5 for the heatmap of the region for that week and Figure 6 for a chart of the total number of minutes per day spent by visitors to the region during that week (the Infothon took place on November 18, 2009). The Maya Realities Web site allows access, via a private user ID and password, to detailed daily and hourly usage statistics. The statistics reported during the hours of the Infothon were consistent with our observations during the exercise. Therefore, the data provided by this commercial service reflects the actual use of the virtual space. However, the actual use that the avatars make of the information content in the virtual world is not captured.
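A quick arithmetic check shows how the reported usage figures relate to one another. The usage report does not state whether the 91-minute average applies to each unique visitor or to each avatar visit, so the sketch below simply computes the total under both readings. The interpretation is an assumption, not something the report confirms.

```python
# Figures reported for the week of the Infothon.
unique_visitors = 73
avatar_visits = 248
avg_minutes = 91
reported_total_hours = 111

# Total hours if the 91-minute average is per unique visitor (assumption).
hours_if_per_visitor = unique_visitors * avg_minutes / 60

# Total hours if the 91-minute average is per avatar visit (assumption).
hours_if_per_visit = avatar_visits * avg_minutes / 60

print(round(hours_if_per_visitor), round(hours_if_per_visit))
```

Under the per-visitor reading the total rounds to the reported 111 hours, whereas the per-visit reading gives roughly 376 hours. The figures are therefore consistent with an average of 91 minutes per unique visitor over the week.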

Figure 5. Maya Realities’ Virtual NLM heatmap during week of Infothon: Every blue point represents an avatar position every minute; red dots represent avatars staying inactive in the same location for more than 10 minutes.

Figure 6. Number of minutes spent by all visitors on Virtual NLM region during the week of the Infothon
