
Adapting Web 1.0 Evaluation Techniques for E-Government in Second Life

Alla Keselman, Victor Cid, Matthew Perry, Claude Steinberg, Fred B. Wood, and Elliot R. Siegel

Journal of Usability Studies, Volume 6, Issue 4, August 2011, pp. 204-225

Expert Panel Reviews: Findings

The sessions' findings indicated that many Web 1.0 evaluation metrics and methods appeared applicable to VW environments (Table 2). The panels recorded possible applications of these methods and metrics to VWs and suggested potential modifications. The details are as follows.

Table 2. Web Evaluation Methods and Their Potential VW Analogues


Usability

The panel concluded that most Web 1.0 usability metrics were relevant for VWs, with efficiency somewhat less important and enjoyment/satisfaction extremely important (see Table 3). This reflects the goals of VW interactions, which are more likely to be about immersion, socialization, and exploration than about the shortest path to the needed information. Ease of navigation and learnability were deemed relevant and likely to be affected by the virtual topography of the 3-D environment. According to the panel experts, most traditional usability methods, like the metrics, are also relevant in VWs. User testing and focus groups were thought largely applicable, with more adjustments needed for user testing. Heuristic reviews might be applicable in the future; however, a separate investigation is needed to adapt existing heuristics to VWs.

Figure 4. Expert panel's recommendations

Table 3. Relevance of Traditional Usability Metrics to Virtual Worlds (based on the expert panel)


The panelists felt that the modifications required to adapt traditional Web user testing methods to VWs stem from the unique characteristics of VW environments. One such characteristic is the social nature of VWs: user experience and satisfaction are likely to be greatly affected by interactions with other avatars and by their appearance, experience, friendliness, etc. Another is the game-like or quest-like nature of many VW environments. To preserve the authenticity of the experience during testing, panel members recommended that our VW usability evaluation engage multiple participants and motivate them to perform search-oriented tasks by presenting the tasks in a quest or game-like format within the VW.

The expert panel participants also indicated that VW environments may require innovative methods for inferring users' intentions, as the complex, multitasking nature of a VW experience may complicate eliciting think-aloud protocols. This challenge could potentially be addressed through post-study think-alouds and through question-and-answer sessions and activities that fit into the flow of the experience. Another potential methodological challenge lies in the need to distinguish between the usability of the VW platform (e.g., Second Life), which is beyond the application designer's control, and that of the application created on the platform (e.g., Tox Town in Second Life). This issue resembles usability testing in the early days of the Web, when browser controls were not standardized and users were not uniformly experienced with them (e.g., Netscape Navigator 3.0 vs. Internet Explorer 2.0). Addressing this challenge may require testing experienced VW users and novices separately, and providing novices with an introduction to and support for the specific VW platform.

The panelists believed that in addition to unique theoretical and methodological challenges, usability testing in VWs is likely to involve distinctive technical considerations. For example, to fully understand the nature of VW interactions, evaluators might want to record sessions in which multiple participants take part simultaneously, each on a different monitor, with the views of all monitors captured in parallel on a single recording. This allows later comparison of user actions, avatar motions, or viewing angles at any given point in time. With the help of a screen-mirroring utility like RealVNC, this feature appears to be supported by Morae version 3.1, at least for two screens at a time. Displaying and recording more than two participants' screens in tandem would require testing Morae with multiple graphics cards.
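The parallel-capture idea can also be illustrated outside of Morae. The following Python sketch, which is not the Morae/RealVNC setup described above, tiles two locally visible screens (e.g., mirrored from participants' machines via VNC) side by side into a single time-aligned recording using the mss and OpenCV libraries. The monitor indices, matching resolutions, frame rate, and output settings are all illustrative assumptions.

    import time

    import cv2           # pip install opencv-python
    import numpy as np
    from mss import mss  # pip install mss

    FPS, SECONDS = 10, 60  # a modest frame rate is enough for later review

    with mss() as sct:
        # monitors[0] is the whole desktop; [1] and [2] are the two physical
        # (or VNC-mirrored) screens, assumed here to share one resolution.
        mon_a, mon_b = sct.monitors[1], sct.monitors[2]
        size = (mon_a["width"] + mon_b["width"], mon_a["height"])
        out = cv2.VideoWriter("session.avi", cv2.VideoWriter_fourcc(*"XVID"),
                              FPS, size)
        next_t = time.monotonic()
        for _ in range(FPS * SECONDS):
            a = np.array(sct.grab(mon_a))[:, :, :3]   # BGRA -> BGR
            b = np.array(sct.grab(mon_b))[:, :, :3]
            out.write(np.hstack((a, b)))              # both views, one frame
            next_t += 1 / FPS
            time.sleep(max(0.0, next_t - time.monotonic()))
        out.release()

Because both views land in the same frame, any moment in the recording shows what every participant saw at that instant, which is exactly the comparison the panel had in mind.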

User Feedback

The panelists suggested that, as with usability, some VW user satisfaction metrics are likely to mirror those of the traditional Web (e.g., the Content, Functionality, Look and Feel, and Search and Navigation dimensions of the American Customer Satisfaction Index, ACSI). Others are likely to resemble the satisfaction dimensions of video games (Isbister & Schaffer, 2008) and include, among other things, the ability to interact with others and to control avatars and the environment. Additional research is needed to define specific metrics. At the present time, assessment of satisfaction should combine traditional Web measures with the newer gaming measures.
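As a concrete illustration of combining the two kinds of measures, the sketch below computes a composite satisfaction score from ACSI-style Web dimensions plus game-like VW dimensions. The dimension names, the 1-10 rating scale, and the equal weighting are hypothetical choices for illustration, not taken from the ACSI or from the panel.

    # Hypothetical instrument: dimension names, 1-10 scale, and equal
    # weighting are illustrative only.
    WEB_DIMENSIONS = ("content", "functionality", "look_and_feel",
                      "search_navigation")
    VW_DIMENSIONS = ("social_interaction", "avatar_control",
                     "environment_control")

    def composite_satisfaction(ratings: dict) -> float:
        """Average one respondent's 1-10 ratings across all dimensions."""
        dims = WEB_DIMENSIONS + VW_DIMENSIONS
        return sum(ratings[d] for d in dims) / len(dims)

    respondent = {"content": 8, "functionality": 7, "look_and_feel": 9,
                  "search_navigation": 6, "social_interaction": 9,
                  "avatar_control": 7, "environment_control": 8}
    print(f"composite satisfaction: {composite_satisfaction(respondent):.1f}")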

Panelists felt that the leading methods of obtaining user feedback about traditional Web applications, focus groups and surveys, are applicable to VWs. Further, the panelists thought VWs would allow a greater variety of communication modes and triggers. In addition to the traditional face-to-face, Web, and phone modes, focus groups about VW applications can be conducted in-world via typed chat or audio (i.e., avatars talking within the VW, with their users communicating through computer microphones and speakers). VW surveys can be triggered via static invitations (e.g., banners), event-based pop-ups (e.g., based on an avatar's proximity to an object), or an invitation from a pollster, who can be a staffed avatar or a drone. It was additionally noted that surveys can be implemented in a Web browser, via popular tools such as Zoomerang or SurveyMonkey, or through in-world interactive scripts embedded within objects. As users of VWs may reflect their in-world personas more than their real-world personas, more research is needed to determine what effects this may have on in-world survey responses (Palfrey & Gasser, 2008).
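To make the event-based trigger concrete, here is a minimal Python sketch of the proximity logic an in-world survey script might implement (in Second Life itself this would be written in LSL using a sensor event). The kiosk position, trigger radius, and one-invitation-per-avatar rule are illustrative assumptions.

    import math
    from dataclasses import dataclass

    @dataclass
    class Avatar:
        name: str
        x: float
        y: float
        z: float

    # Hypothetical values: region coordinates of the survey kiosk and the
    # distance (in meters) at which an invitation fires.
    KIOSK = (128.0, 128.0, 22.0)
    TRIGGER_RADIUS = 5.0
    already_invited: set = set()

    def maybe_invite(avatar: Avatar) -> bool:
        """Offer the survey once per avatar when it nears the kiosk."""
        dist = math.dist((avatar.x, avatar.y, avatar.z), KIOSK)
        if dist <= TRIGGER_RADIUS and avatar.name not in already_invited:
            already_invited.add(avatar.name)
            return True  # in-world: show a dialog or hand out a survey URL
        return False

    print(maybe_invite(Avatar("Visitor_A", 126.0, 130.0, 22.0)))  # True
    print(maybe_invite(Avatar("Visitor_A", 126.0, 130.0, 22.0)))  # False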

Performance

On the Web side, many organizations collect IT performance information at different levels of detail and for different purposes (e.g., content management or technical infrastructure management). For this study, our focus was on basic performance information that can support maintenance of the information content. It was suggested that useful performance metrics should, at a minimum, include content availability and download speed over time. In our panel sessions and through subsequent empirical testing, we were unable to identify automated performance tools that we considered useful for VWs. Linden Lab makes some performance metrics available in its client software, but it is difficult to relate those data to the actual information content in the VW.
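In the absence of a suitable automated tool, the two suggested metrics are straightforward to probe by script for Web-reachable content. The following Python sketch measures availability and download speed over HTTP; the endpoint URLs are placeholders, and VW content served through a proprietary protocol would need a different transport.

    import time
    import urllib.request

    # Placeholder URLs standing in for pieces of VW-hosted content.
    ENDPOINTS = ["https://example.org/toxtown/asset1",
                 "https://example.org/toxtown/asset2"]

    def probe(url: str, timeout: float = 10.0):
        """Return (available, seconds elapsed, bytes downloaded)."""
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
            return True, time.monotonic() - start, len(data)
        except OSError:
            return False, time.monotonic() - start, 0

    for url in ENDPOINTS:
        ok, secs, nbytes = probe(url)
        speed = nbytes / secs / 1024 if ok and secs > 0 else 0.0
        print(f"{url}: available={ok} elapsed={secs:.2f}s "
              f"speed={speed:.1f} KiB/s")

Run on a schedule (e.g., via cron), such a probe yields exactly the availability-over-time and speed-over-time series the panel recommended as a minimum.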

Usage

The panel agreed that existing automated commercial services (e.g., Maya Technologies) provide a number of useful metrics per period of time, including the number of unique visitors, total visits, unique visitors interacting with specific content, number of hours spent by visitors at a location, and average visit duration. These same metrics are used on the Web today. Commercial VW usage services also employ some unique data visualization methods, such as 3-D heat maps. The challenge lies in extending to VWs metrics that are common on the Web but not easy to define in VWs, such as page views. A VW metric similar to page views could be useful, but the part of the content visible to the user at any given moment is difficult to determine automatically, even when the location of the avatar is known.
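The panel's core usage metrics are easy to derive once enter/exit events are logged, for example by an in-world sign-in script. The following Python sketch computes them from a hypothetical visit log; the records and field layout are invented for illustration.

    from datetime import datetime

    # Hypothetical visit log: (avatar_name, enter_time, exit_time) per visit.
    visits = [
        ("Avatar_A", datetime(2011, 8, 1, 10, 0), datetime(2011, 8, 1, 10, 45)),
        ("Avatar_B", datetime(2011, 8, 1, 11, 0), datetime(2011, 8, 1, 11, 20)),
        ("Avatar_A", datetime(2011, 8, 2, 9, 30), datetime(2011, 8, 2, 9, 50)),
    ]

    total_visits = len(visits)
    unique_visitors = len({name for name, _, _ in visits})
    hours = [(end - start).total_seconds() / 3600 for _, start, end in visits]
    total_hours = sum(hours)
    avg_visit_hours = total_hours / total_visits

    print(f"visits={total_visits} unique={unique_visitors} "
          f"hours={total_hours:.2f} avg={avg_visit_hours:.2f}h")

A page-view analogue would require adding, to each record, which content was actually in the avatar's view, which, as noted above, is the hard part.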

 
