An international peer-reviewed journal

A Meta-Analytical Review of Empirical Mobile Usability Studies

Constantinos K. Coursaris and Dan J. Kim

Journal of Usability Studies, Volume 6, Issue 3, May 2011, pp. 117 - 171


Results of Analysis

The literature review we performed of empirical research on mobile usability appears in the Appendix. The review results are summarized in terms of the context defined in each study, the key usability dimensions measured, the research methodology used, the sample size, and the key findings.

The following sets of analyses pertain to the contextual factors studied among the 100 empirical mobile usability studies reviewed. The independent variables studied are described under each of the four contextual framework categories of Figure 1. Overall, empirical mobile usability studies have focused on investigating task characteristics (47%), followed by technology (46%), environment (14%), and user characteristics (14%; studies of single-nation populations are not included here, although one might consider them cultural studies depending on the frame of reference). (Note: the distribution exceeds 100% because multiple areas may have been studied in a single study.) Hence, there is a lack of empirical research on the relevance of user characteristics and the impact of the environment on mobile usability. For example, because on-screen keyboards are now standard in smartphone technology, it would be important to understand the optimal design of on-screen keyboards for smartphones and other mobile devices according to target user groups and their characteristics.
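
The note that the distribution exceeds 100% follows from multi-label coding: a single study can be tagged with more than one contextual category, and each share is computed against the number of studies rather than the number of tags. A minimal sketch in Python illustrates the arithmetic. The five study IDs and their tags are hypothetical and are not drawn from the Appendix.

```python
from collections import Counter

# Hypothetical coding of five studies (illustration only, not Appendix data).
# Each study may carry several contextual-category tags.
coded_studies = {
    "S1": {"task", "technology"},
    "S2": {"task"},
    "S3": {"technology", "environment"},
    "S4": {"user"},
    "S5": {"task", "technology"},
}


def category_shares(studies):
    """Return the percentage of studies tagged with each category."""
    counts = Counter(tag for tags in studies.values() for tag in tags)
    total_studies = len(studies)
    return {tag: 100 * n / total_studies for tag, n in counts.items()}


shares = category_shares(coded_studies)
total_share = sum(shares.values())

for tag, share in sorted(shares.items()):
    print(f"{tag}: {share:.0f}%")
print(f"sum of shares: {total_share:.0f}%")
```

Because three of the five hypothetical studies carry two tags, the shares sum to more than 100%, which mirrors the pattern reported above.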

By contrast, in our earlier data set of 45 empirical studies published by 2006, research emphasis was distributed across task (56%), user (26%), technology (22%), and environment characteristics (7%). It is interesting to note that the proportion of studies that considered the environment doubled; part of this increased emphasis results from a number of recent studies that compared and contrasted different usability testing methods and environments. Also, articles in this study’s larger sample focus on tasks and related technologies far more frequently than on the other two dimensions, i.e., the user and the environment. Thus, it appears that the human needs to be put back into the Human-Computer Interaction investigations that focus on mobile usability.

Task characteristics: Open and unstructured tasks, and interactivity and complexity understudied

The framework called for the identification of either closed or open tasks. Closed tasks were used most frequently (58%); examples include checking the list of received calls, finding a “Welcome Note” on a mobile website or a mobile app, enabling the vibrating alert, setting the phone on silent mode, and other tasks that have a predefined state or outcome. Open tasks were used in 35% of studies; examples include interacting with a network of services using verbal or visual information, keeping a pocket diary and filling in forms with each use of the Internet, logging in to websites and rewriting web diaries that were first written in a pocket diary, and other tasks that do not have a predefined outcome (i.e., the outcome is user dependent). Nine percent of reviewed studies did not report on tasks. Hence, there is a relative lack of research involving open and unstructured tasks. Also, the effects of task interactivity and task complexity on mobile usability were not investigated. Given the increasingly important role of mobile devices in academia, an important question is to what extent such devices can enhance a learner’s experience. Exploring the potential interaction effect between task interactivity and task complexity can help inform the design and use of mobile technology, applications, and services in the classroom or in education environments at large.

This research design pattern is fairly consistent with our earlier analysis from 2006, where closed and open tasks were used in 69% and 22% of studies, respectively (with 9%, again, not reporting). Hence, the same research gap exists surrounding open and unstructured tasks and factors such as interactivity and complexity as they relate to mobile usability.

User characteristics: A narrow focus on studied user dimensions is prevalent

The most prominent user-related variable studied in empirical mobile usability research was (prior) experience, with studies focusing on novices (16%), experts (13%), or both (16%). Culture (3%) and job-specific roles (e.g., physicians, engineers; 8%) were also measured. Disability was explored only twice (i.e., 2%), examining the role of technology in assisting users with visual impairment and memory loss, respectively. No empirical mobile usability research studied the role of gender or age, and mobility was investigated in just 6% of studies. These statistics make it apparent that research has been limited in both the range and the frequency of user characteristics studied. Examples of such limitations are found in the myriad disabilities that can negatively impact a mobile user’s experience, or even prohibit the use of certain services, and yet remain extremely underserved.

Comparing these statistics with our 2006 sample, a small shift appears away from convenient, novice samples (from 25% to 16%) toward an examination of the impact of experience (from 9% to 16%) on the dependent constructs. Cross-cultural studies did not emerge significantly during this period, which is somewhat surprising considering the uptake of mobile devices around the world; by contrast, work-related context was investigated proportionately twice as much, while convenient samples of students were utilized at similar rates. Thus, the same need and corresponding opportunities for user-centered empirical mobile usability studies still exist.

Technology characteristics: Enabling technology beyond the interface is overlooked in mobile studies

The most popular technology-related variable investigated in these studies was the interface. These studies involved mobile phones (44%), PDAs (38%), Pocket PCs (5%), and various other interfaces (19%), including a desktop, a tablet PC, a discman, and wearable or prototype devices. Again, these frequencies exceed 100% because a few studies involved multiple devices. The above distribution was quite similar to that of the 2006 sample. Hence, the lack of research on technology beyond the interface continues. For example, knowing whether the lack of support for Flash on iOS (as was the case at the time this paper was written) significantly impacts usability for mobile (iPhone/iPad) users, or to what extent network interoperability enhances a device’s mobile usability, would be of significant value, particularly among the practitioner community, while extending previously validated research models and theories in the mobile domain.

Environment characteristics: Area with greatest potential for future mobile usability research

Eleven percent of studies explored factors as they relate to the environment. This focus has increased from the 7% incidence rate reported for the 2006 sample, partly because usability evaluation methods have become more relevant and scholars have taken an interest in comparing lab-based to field-based methods. Lighting and noise levels, which had been studied previously, were joined by studies on sound, temperature, acceleration, humidity, and social aspects. Hence, physical, psychosocial, and other environment-specific factors present a significant opportunity for future research in mobile usability. For example, little is known about the impact of co-location (i.e., a mobile user being in physical proximity to other individuals) on the use of a mobile device (e.g., which types of applications are more likely to be used when alone vs. co-located with familiar or unfamiliar individuals). Such insight could further advance the contextual designs of mobile devices, whether through user-configured settings, sensors, or other means.

Methodology characteristics: A call for neuroscience research in mobile usability 

The final set of analyses pertains to the experiment setup and methodology. Laboratory studies were conducted most often (47%), followed by field studies (21%), while 10% of studies involved both. Hence, lab-tested mobile usability research was dominant, which was also the trend found in our 2006 sample. Next, multiple methodologies were identified in these studies, including questionnaires (61%); device data (33%); direct observation (7%); focus groups (7%); discussions (3%); voice mail and web mail diaries, as well as the Think Aloud Method (each at 2%); and single studies leveraging a usability test, expert evaluation, participatory design, card sorting, or task analysis. Frequencies of methodology used exceed 100% because many studies (45%) involved a multi-method approach. Specifically, device data were most commonly triangulated with questionnaire (13%), observation (5%), or interview data (4%). However, because only 13% of the studies did so, there is limited research that contrasts self-reported data with device data, something that has remained unchanged from the results of our 2006 sample. Lastly, there were no studies involving neuroscience, an area that is of particular importance in mobile usability. With the associated cost of the needed technology to employ related methods, e.g., eye tracking and brain imaging, the area is prime for growth and novel contributions to the field. Knowledge dissemination outlets can both benefit from and support the fueling of such research through special calls for related works.

Analysis of Mobile Usability Measurement Dimensions

Because the focus of this study was on the usability dimensions measured in empirical mobile usability studies, we reorganized the reviewed studies in terms of those dimensions. Table 1 presents a summary of the 31 measured usability dimensions.

Table 1. Frequency of Usability Measures Used in the Reviewed Studies

Original List of Measures

MEASURE | SOURCES | COUNT
Efficiency | Barnard, Yi, Jacko, & Sears, 2005; Bohnenberger, Jameson, Kruger, & Butz, 2002; Brewster, 2002; Brewster & Murray, 2000; Bruijn, Spence, & Chong, 2002; Butts & Cockburn, 2002; Buyukkoten, Garcia-Molina, & Paepcke, 2001; Chin & Salomaa, 2009; Chittaro & Dal Cin, 2002; Chittaro & Dal Cin, 2001; Clarkson, Clawson, Lyons, & Starner, 2005; Costa, Silva, & Aparicio, 2007; Duda, Schiel, & Hess, 2002; Fitchett & Cockburn, 2009; Fithian, Iachello, Moghazy, Pousman, & Stasko, 2003; Goldstein, Alsio, & Werdenhoff, 2002; Gupta & Sharma, 2009; Huang, Chou, & Bias, 2006; James & Reischel, 2001; Jones, Buchanan, & Thimbleby, 2002; Kaikkonen, Kallio, Kekäläinen, Kankainen, & Cankar, 2005; Kim, Chan, & Gupta, 2007; Kjeldskov & Graham, 2003; Kjeldskov, Skov, & Stage, 2010; Koltringer & Grechenig, 2004; Langan-Fox, Platania-Phung, & Waycott, 2006; Liang, Huang, & Yeh, 2007; Lindroth, Nilsson, & Rasmussen, 2001; Massimi & Baecker, 2008; Nagata, 2003; Nielsen, Overgaard, Pedersen, Stage, & Stenild, 2006; Olmsted, 2004; Poupyrev, Maruyama, & Rekimoto, 2002; Pousttchi & Thurnher, 2006; Rodden, Milic-Frayling, Sommerer, & Blackwell, 2003; Ross & Blasch, 2002; Ryan & Gonsalves, 2005; Seth, Momaya, & Gupta, 2008; Shami et al., 2005; Sodnik, Dicke, Tomazic, & Billinghurst, 2008; Wigdor & Balakrishnan, 2003 | 41
Errors | Andon, 2004; Brewster & Murray, 2000; Butts & Cockburn, 2002; Cheverst, Davies, Mitchell, Friday, & Efstratiou, 2000; Danesh, Inkpen, Lau, Shu, & Booth, 2001; Fitchett & Cockburn, 2009; Gupta & Sharma, 2009; Huang et al., 2006; James & Reischel, 2001; Jones, Buchanan, & Thimbleby, 2002; Juola & Voegele, 2004; Kaikkonen, 2005; Kaikkonen et al., 2005; Kim, Kim, Lee, Chae, & Choi, 2002; Kjeldskov & Graham, 2003; Koltringer & Grechenig, 2004; Langan-Fox et al., 2006; Lehikoinen & Salminen, 2002; Lindroth et al., 2001; MacKenzie, Kober, Smith, Jones, & Skepner, 2001; Massimi & Baecker, 2008; Nagata, 2003; Palen & Salzman, 2002; Ross & Blasch, 2002; Ryan & Gonsalves, 2005; Waterson, Landay, & Matthews, 2002; Wigdor & Balakrishnan, 2003 | 27
Ease of Use | Cheverst et al., 2000; Chong, Darmawan, Ooi, & Binshan, 2010; Cyr, Head, & Ivanov, 2006; Ebner, Stickel, Scerbakov, & Holzinger, 2009; Ervasti & Helaakoski, 2010; Fang, Chan, Brzezinski, & Xu, 2003; Fithian et al., 2003; Hinckley, Pierce, Sinclair, & Horvitz, 2000; Hsu, Lu, & Hsu, 2007; Jones, Buchanan, & Thimbleby, 2002; Kim et al., 2002; Kim et al., 2007; Kim et al., 2010; Li & Yeh, 2010; Licoppe & Heurtin, 2001; Mao, Srite, Thatcher, & Yaprak, 2005; Massey, Khatri, & Ramesh, 2005; Olmsted, 2004; Pagani, 2004; Palen & Salzman, 2002; Pousttchi & Thurnher, 2006; Qiu, Zhang, & Huang, 2004; Roto, Popescu, Koivisto, & Vartiainen, 2006; Ryan & Gonsalves, 2005; Wu & Wang, 2005; Xu, Liao, & Li, 2008 | 26
Usefulness | Bødker, Gimpel, & Hedman, 2009; Chong et al., 2010; Cyr et al., 2006; Ebner et al., 2009; Ervasti & Helaakoski, 2010; Fang et al., 2003; Fithian et al., 2003; Hsu et al., 2007; Hummel, Hess, & Grill, 2008; Kim et al., 2010; Li & Yeh, 2010; Mao et al., 2005; Pagani, 2004; Palen & Salzman, 2002; Pousttchi & Thurnher, 2006; Wu & Wang, 2005; Xu et al., 2008 | 17
Effectiveness | Barnard et al., 2005; Bohnenberger et al., 2002; Brewster, 2002; Brewster & Murray, 2000; Chin & Salomaa, 2009; Costa et al., 2007; Duh, Tan, & Chen, 2006; Goldstein et al., 2002; Huang et al., 2006; Kleijnen, Ruyter, & Wetzels, 2007; Liang et al., 2007; Nielsen et al., 2006; Pousttchi & Thurnher, 2006; Ryan & Gonsalves, 2005; Shami et al., 2005; Sodnik et al., 2008 | 16
Satisfaction | Dahlberg & Öörni, 2007; Ebner et al., 2009; Huang et al., 2006; Hummel et al., 2008; Juola & Voegele, 2004; Kallinen, 2004; Kim et al., 2002; Kim et al., 2007; Kleijnen et al., 2007; Lindroth, 2001; Nielsen et al., 2006; Olmsted, 2004; Palen & Salzman, 2002; Ryan & Gonsalves, 2005; Shami et al., 2005 | 15
Accuracy | Barnard et al., 2005; Burigat, Chittaro, & Gabrielli, 2008; Clarkson et al., 2005; Duh et al., 2006; Keeker, 1997; Koltringer & Grechenig, 2004; Olmsted, 2004; Thomas & Macredie, 2002; Wigdor & Balakrishnan, 2003; Wu & Wang, 2005 | 10
Learnability | Butts & Cockburn, 2002; Dahlberg & Öörni, 2007; Fithian et al., 2003; Kaikkonen et al., 2005; Lindroth, 2001; MacKenzie et al., 2001; Roto et al., 2006; Ryan & Gonsalves, 2005 | 8
Workload | Barnard et al., 2005; Chan, Fang, & Brzezinski, 2002; Chin & Salomaa, 2009; Jones, Jones, Marsden, Patel, & Cockburn, 2005; Li & McQueen, 2008; Seth et al., 2008; Sodnik et al., 2008 | 7
Accessibility | King & Mbogho, 2009; Mao et al., 2005; Pagani, 2004; Palen, Salzman, & Youngs, 2001; Suzuki et al., 2009 | 6
Reliability | Andon, 2004; Barnard et al., 2005; Costa et al., 2007; Kleijnen et al., 2007; Lin, Goldman, Price, Sears, & Jacko, 2007; Wu & Wang, 2005 | 6
Attitude | Goldstein et al., 2002; Juola & Voegele, 2004; Khalifa & Cheng, 2002; Palen & Salzman, 2002; Strom, 2001 | 5
Problems Observed | Kaikkonen, 2005; Kaikkonen et al., 2005; Kjeldskov & Graham, 2003; Nielsen et al., 2006 | 4
Enjoyment | Cyr et al., 2006; Ebner et al., 2009; Hummel, 2008; Kim et al., 2010 | 4
Acceptability | Andon, 2004; Butts & Cockburn, 2002; Juola & Voegele, 2004 | 3
Quality | Barnard, Yi, Jacko, & Sears, 2007; Bødker et al., 2009; Kleijnen et al., 2007 | 3
Security | Andon, 2004; Fang et al., 2003; Kim et al., 2007 | 3
Aesthetics | Cyr et al., 2006; Li & Yeh, 2010; Wang, Zhong, Zhang, Lv, & Wang, 2009 | 3
Utility | Duda et al., 2002; Hassanein & Head, 2003 | 2
Operability | Chittaro & Dal Cin, 2002; Kaikkonen et al., 2005 | 2
Memorability | Langan-Fox et al., 2006; Lindroth et al., 2001 | 2
Responsiveness | Barnard et al., 2007; Kleijnen et al., 2007 | 2
Content | Kim, Kim, & Lee, 2005; Koivumäki, Ristola, & Kesti, 2006 | 2
Attractiveness | Lin et al., 2007 | 1
Flexibility | Cheverst et al., 2000 | 1
Playfulness | Fang et al., 2003 | 1
Technicality | Hummel et al., 2008 | 1
Availability | Pagani, 2004 | 1
Functionality | Pagani, 2004 | 1
Interconnectivity | Andon, 2004 | 1
Integrity | Costa et al., 2007 | 1

Collapsed List of Measures

MEASURE | UNIQUE COUNT | %
Efficiency | 61 | 33
Effectiveness | 49 | 27
Satisfaction | 18 | 10
Accessibility | 15 | 8
Learnability | 8 | 4
Workload | 7 | 4
Enjoyment | 4 | 2
Acceptability | 3 | 2
Quality | 3 | 2
Security | 3 | 2
Aesthetics | 4 | 2
Utility | 2 | 1
Memorability | 2 | 1
Content | 2 | 1
Flexibility | 1 | 1
Playfulness | 1 | 1

 

 

 

A preliminary inspection of Table 1 shows that the constructs of efficiency, errors, ease of use, effectiveness, satisfaction, and learnability are most commonly measured in empirical mobile usability studies. All of these measures were defined, with slight variations, in the work of Han et al. (2001) on the classification of performance and image/impression dimensions. The measure of errors was defined by Nielsen (1993) as the “number of such actions made by users while performing some specified task” (p. 32). Han et al. (2001) addressed errors through two measures: (a) error prevention (i.e., “ability to prevent the user from making mistakes and errors,” p. 147) and (b) effectiveness (i.e., “accuracy and completeness with which specified users achieved specified goals,” p. 147). In the reviewed literature, mobile usability studies measured the error rate associated with the system, as opposed to error prevention. Hence, we collapsed the errors, accuracy, and problems observed measures found in this literature review with effectiveness (effectiveness offering a broader definition and operationalization). This broader interpretation of effectiveness may be extended to encompass the extent to which a system achieves its intended objective, or simply put, its usefulness. Hence, usefulness may also be collapsed with effectiveness. Similarly, the second-order measure of efficiency often attempts to capture the first-order factor of ease of use. This is supported conceptually, because the “easier” a system is to use, the fewer resources are consumed during the task. Hence, ease of use may be collapsed with efficiency. Furthermore, Shackel (2009) defined attitude as the “level of user satisfaction with the system” (p. 341), and Han et al. (2001) defined satisfaction as “the degree to which a product is giving contentment or making the user satisfied” (p. 147). Hence, attitude (as defined in these usability studies) may be collapsed into the single measure of satisfaction.

It should be noted that the frequency count for each collapsed criterion is based on unique counts of publications (i.e., if errors and effectiveness were both measured in the same study, that publication is counted only once in the unique count). In addition, accessibility was studied in most of the cited studies as the degree to which a system was accessible; this clarification distinguishes it from accessibility in the context of vulnerable or disabled users. Hence, other measures found in studies that speak to this concept, namely reliability, responsiveness, availability, functionality, and interconnectivity, can be collapsed under accessibility. Lastly, attractiveness speaks to the broader concept of aesthetics, and integrity is a security dimension, so these can be grouped respectively.
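
The unique-count rule described above amounts to taking the set union of the publications behind every original measure that maps to the same collapsed measure. The following Python sketch illustrates that rule under stated assumptions: the Errors and Effectiveness source lists are deliberately truncated subsets of Table 1 and serve only as an illustration, the citation strings are abbreviated, and the function and variable names are hypothetical. The Aesthetics and Attractiveness lists are complete as given in Table 1.

```python
# Source lists keyed by original measure. Errors and Effectiveness are
# truncated subsets of Table 1 (illustration only); Aesthetics and
# Attractiveness are complete as listed in Table 1.
original_measures = {
    "Errors": {"Brewster & Murray, 2000", "Andon, 2004", "Nagata, 2003"},
    "Effectiveness": {"Brewster & Murray, 2000", "Brewster, 2002"},
    "Aesthetics": {"Cyr et al., 2006", "Li & Yeh, 2010", "Wang et al., 2009"},
    "Attractiveness": {"Lin et al., 2007"},
}

# Mapping from original to collapsed measure, following the text above.
collapse_map = {
    "Errors": "Effectiveness",
    "Effectiveness": "Effectiveness",
    "Aesthetics": "Aesthetics",
    "Attractiveness": "Aesthetics",
}


def unique_counts(measures, mapping):
    """Count each publication once per collapsed measure (set union)."""
    collapsed = {}
    for measure, sources in measures.items():
        collapsed.setdefault(mapping[measure], set()).update(sources)
    return {name: len(sources) for name, sources in collapsed.items()}


counts = unique_counts(original_measures, collapse_map)
print(counts)
```

In this sketch, Brewster & Murray (2000) appears under both Errors and Effectiveness but contributes only once to the collapsed Effectiveness count. The collapsed Aesthetics count of 4 (three Aesthetics sources plus one Attractiveness source) agrees with the unique count shown for Aesthetics in Table 1.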

Upon review of the measures’ relative appearance in the reviewed literature, the three core constructs for the measurement of usability, based on the collapsed list in Table 1, appear to be efficiency (33%), effectiveness (27%), and satisfaction (10%).

The above findings are arguably neither surprising nor favorable for the field, as these factors have been set as the standard for more than a decade, regardless of significant advances in technology and changes in use settings and scenarios; the usability scholar’s lens has gone unchanged. However, the growing popularity of games and similarly engaging and hedonically oriented experiences in the use of mobile devices suggests that both the factors studied and the definitions set forth for mobile usability may need to be revisited before too long.

The remaining measures identified in Table 1 reflect the peripheral dimensions measured in empirical mobile usability studies cited in the Appendix, including Accessibility (8%), Learnability (4%), Workload (4%), Aesthetics (2%), Enjoyment (2%), Acceptability (2%), Quality (2%), Security (2%), Utility (1%), Playfulness (1%), Memorability (1%), Content (1%), and Flexibility (1%).
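
The text does not state how the percentage column of Table 1 was derived. The reported values appear consistent with dividing each collapsed unique count by the sum of all collapsed unique counts and rounding to the nearest whole number. The Python sketch below checks that reading; it is an inference from the published numbers, not a method confirmed by the authors, and the variable names are hypothetical.

```python
# Collapsed unique counts as reported in Table 1.
collapsed_counts = {
    "Efficiency": 61, "Effectiveness": 49, "Satisfaction": 18,
    "Accessibility": 15, "Learnability": 8, "Workload": 7,
    "Enjoyment": 4, "Acceptability": 3, "Quality": 3, "Security": 3,
    "Aesthetics": 4, "Utility": 2, "Memorability": 2, "Content": 2,
    "Flexibility": 1, "Playfulness": 1,
}

# Assumed denominator: the sum of all collapsed unique counts.
total = sum(collapsed_counts.values())

# Share of each measure, rounded to the nearest whole percent.
percentages = {
    measure: round(100 * count / total)
    for measure, count in collapsed_counts.items()
}

print(total)
for measure, pct in percentages.items():
    print(f"{measure}: {pct}%")
```

Under this assumption the denominator is 183, and the rounded shares reproduce every percentage in the table, including 33% for efficiency, 27% for effectiveness, and 10% for satisfaction.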

 
