
Improving the Usability of E-Book Readers

Eva Siegenthaler, Pascal Wurtz, Rudolf Groner

Journal of Usability Studies, Volume 6, Issue 1, November 2010, pp. 25 - 38


Results

The following sections present the reading performance, the results of the usability ratings, and a comparison between subjective user data and objective eye-movement data.

Reading Performance

We analyzed reading performance and eye-movement data (for a detailed eye-movement analysis, see Siegenthaler, Wurtz & Groner, 2010). Analysis of reading speed was based on the time codes of the video recordings. The start and stop times of reading and page-turns were coded for later statistical analysis.

Statistical analysis was performed using F-statistics based on a repeated measures ANOVA with the within-subjects factor “reading device” (iRex, Bookeen, BeBook, Sony, Ectaco, classic paper book). In cases of unequal variances within the groups, Friedman tests (using χ2-statistics) were employed.
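As an illustration of the rank-based test mentioned above, the following is a minimal sketch of the Friedman statistic in plain Python. It assumes the data are arranged as one row per participant and one column per reading device; the function names are hypothetical, and no tie correction is applied, so statistical packages that correct for ties may give slightly different values.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_chi_square(rows):
    """Return (chi-square, degrees of freedom) for a participants x devices table.

    Assumption: the uncorrected textbook formula is used (no tie correction).
    """
    n = len(rows)        # number of participants
    k = len(rows[0])     # number of devices
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    chi2 = 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    return chi2, k - 1
```

With six reading devices, the degrees of freedom returned are 5, which matches the χ2(5) values reported in this section.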

Reading time

No significant effect was found for total reading duration, F(5,40) = 0.857, p = .518. The time needed to read the text did not differ significantly between the devices.

Reading speed

We measured reading speed as the number of words read per minute. No significant effect was found for reading speed between the different devices, F(5,40) = 1.113, p = .369.
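The words-per-minute measure can be sketched as follows. This is an illustrative helper, not the authors' procedure: the function name is hypothetical, and it assumes that the start and stop time codes are given in seconds and that the word count of the text is known.

```python
def words_per_minute(word_count, start_s, stop_s):
    """Reading speed from a word count and start/stop time codes in seconds."""
    duration_s = stop_s - start_s
    if duration_s <= 0:
        raise ValueError("stop time must be later than start time")
    return word_count / (duration_s / 60.0)


# Hypothetical example: 500 words read between 0 s and 120 s.
speed = words_per_minute(500, 0.0, 120.0)
```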

Total page-turn duration

The time needed for each page-turn (where no reading takes place) was assessed, i.e., the period of time between the last reading fixation on the bottom of a page and the first reading fixation on the top of the next page. By summing up those durations, we measured the total time needed for page-turns. The means of total page-turn durations differed significantly between the reading devices, χ2(5) = 22.016, p < .01.

Proportion of time spent for page-turns

The proportion of time spent turning the pages, i.e., the ratio of the time needed to turn the pages to the total reading time, is shown in Table 1. The Friedman test revealed significant differences between the e-readers, χ2(5) = 19.857, p < .01.
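The page-turn measures described above can be sketched in a few lines. The data layout and function names are assumptions made for illustration: each page is represented by the timestamps (in seconds) of its first and last reading fixation, and the total reading time is taken to run from the first fixation on the first page to the last fixation on the last page, page-turns included.

```python
def page_turn_durations(pages):
    """Gaps between the last fixation on one page and the first on the next.

    pages: list of (first_fixation_s, last_fixation_s) tuples in reading order.
    """
    return [nxt[0] - cur[1] for cur, nxt in zip(pages, pages[1:])]


def page_turn_proportion(pages):
    """Percentage of the total reading time that was spent on page-turns."""
    total_turn_s = sum(page_turn_durations(pages))
    total_reading_s = pages[-1][1] - pages[0][0]  # assumed to include page-turns
    return 100.0 * total_turn_s / total_reading_s


# Hypothetical example: two pages with a 2-second page-turn in between.
example_pages = [(0.0, 50.0), (52.0, 100.0)]
```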

Table 1. Mean Proportion (in %) and Standard Deviations (SD) of Time Spent for Page-Turns

Table 1

Mean fixation duration

Visual fixation duration is a well-established indicator of the difficulty of perceptual and/or cognitive processing (Just & Carpenter, 1980; Menz & Groner, 1982). The mean duration of visual fixations differed significantly between the reading devices, χ2(5) = 25.063, p < .01. Figure 3 shows differences in means and standard deviations of the fixation durations.

Figure 3

Figure 3. Mean fixation durations (ms) for reading on the different devices, with bars indicating one standard deviation. A single asterisk indicates a significant difference (p < .05) from the device with the shortest mean fixation duration (iRex); double asterisks indicate a significant difference at p < .01.

Number of letters per fixation

The mean number of letters read per fixation was significantly affected by the reading device, χ2(5) = 14.460, p < .05. In the classic paper book, one fixation covered the largest number of letters. Table 2 shows the number of letters per fixation for each reading device. Note that the paper book had the smallest font.

Table 2. Mean Number of Letters per Fixation and Standard Deviations (SD)

Table 2

Results of the Usability Ratings

The following sections describe the usability analysis based on the usability tasks and the ratings of the participants after they had tried to solve the tasks. Statistical analysis was performed using a Friedman test with the reading device as a within-subjects factor (iRex, Bookeen, BeBook, Sony, Ectaco, classic paper book).

Success rate of the usability tasks

After working on the usability tasks, participants reported whether they had solved each task successfully. Table 3 shows the success rates of the five usability tasks.

Table 3. Percentage of Success Rates for the Five Usability Tasks¹

Table 3

Subjective usability ratings

Design

The question was, “How do you like the design?” The ratings of the design of the reading devices (on a 1–6 Likert scale) differed significantly between devices, χ2(5) = 20.388, p < .01. Table 4 shows the results.

Table 4. Mean Rating and Standard Deviations (SD) for Design on a Likert Scale from 1 (Very Bad) to 6 (Very Good)

Table 4

Navigation

Participants judged the navigation on a Likert scale from 1 (very bad) to 6 (very good). The question was, “How do you judge the navigation?” We found significant differences between the devices, χ2(5) = 25.064, p < .01. As a control question, we asked, “How are you getting along with the reading device?” and replicated the above result, χ2(5) = 29.411, p < .01. Table 5 shows the results and ranks.

Table 5. Mean Ratings and Standard Deviations (SD) for Navigation, Likert Scale from 1 (Very Bad) to 6 (Very Good)

Table 5

Functionality

The question, “How do you judge the different functions/applications (like reading, listening music, storage of pictures…) of the reading device?” was judged on a Likert scale from 1 (very bad) to 6 (very good). We found significant differences in the functionality of the reading devices, χ2(5) = 19.265, p < .01. Table 6 shows results and ranks for functionality.

Table 6. Mean Rating and Standard Deviation (SD) of Functionality Ratings on a Likert Scale Ranging from 1 (Very Bad) to 6 (Very Good)

Table 6

Handiness

The question was, “How handy do you rate the reading device?” Participants judged handiness on a Likert scale ranging from 1 (very bad) to 6 (very good). We found significant differences in the handiness of the reading devices, χ2(5) = 15.111, p < .05. Table 7 shows the results and ranks.

Table 7. Mean Rating and Standard Deviations (SD) for Handiness on a Likert Scale Ranging from 1 (Very Bad) to 6 (Very Good)

Table 7

Usability ratings based on a questionnaire

After participants performed the requested tasks in the usability test, they filled out the usability questionnaire by Huang et al. (2006). The questionnaire originally used a 10-point Likert scale that, for comparison with the other rating scales, was transformed into a 6-point Likert scale. The usability ratings differed significantly between the different reading devices, χ2(5) = 26.667, p < .01. Table 8 shows the mean usability ratings.
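The text does not state how the questionnaire scores were transformed. One common choice is a linear rescaling, sketched below under the assumption that the original scores ranged from 1 to 10 and the target scale from 1 to 6; the function name and default ranges are hypothetical.

```python
def rescale(score, old_min=1.0, old_max=10.0, new_min=1.0, new_max=6.0):
    """Linearly map a score from [old_min, old_max] to [new_min, new_max].

    Assumption: a linear transformation; the source does not specify the method.
    """
    if not old_min <= score <= old_max:
        raise ValueError("score lies outside the original scale")
    return new_min + (score - old_min) * (new_max - new_min) / (old_max - old_min)
```

Under this mapping the endpoints are preserved (1 stays 1, and 10 becomes 6), and the midpoint of the original scale, 5.5, maps to the midpoint of the new scale, 3.5.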

Table 8. Mean Usability Ratings (Based on the Questionnaire by Huang et al. 2006) and Standard Deviations (SD) Transformed to a Likert Scale Ranging from 1 (Very Bad) to 6 (Very Good)

Table 8

Comparison Between Subjective User Data and Objective Eye-Movement Data

In the multifunctional approach employed in our analysis, different usability methods were combined. In the second legibility test, we found a dissociation, that is, a discrepancy between perception (eye tracking) and evaluation (interviews). The results of the first legibility test showed a significant correlation between preference and legibility (r = .356, p < .01), whereas the second legibility test showed a dissociation between preference and legibility (r = –.002). In the first session, 60% of the participants preferred the iRex device to read with; however, after having used all devices, only 30% still preferred it. Figure 4 shows the comparison between subjective and objective measures for the two tests: a similar distribution between interview data and eye-movement data in Test 1 (left side of Figure 4) and a dissociation in Test 2 (right side).

Figure 4

Figure 4. The two diagrams in the upper half show the percentage of “favorite to read with” as judged by participants. The lower diagrams show the eye-movement data as the percentage of participants who had the shortest visual fixations when reading on the device compared to the other devices.
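The eye-movement percentages described in the caption can be derived from per-participant mean fixation durations. The following is a minimal sketch; the nested-dictionary layout and the function name are assumptions, and if a participant has two devices tied for the shortest duration, the first one encountered is counted.

```python
from collections import Counter


def shortest_fixation_shares(mean_fixation_ms):
    """Percentage of participants whose shortest mean fixation was on each device.

    mean_fixation_ms: {participant: {device: mean fixation duration in ms}}
    """
    # For each participant, pick the device with the smallest mean duration.
    winners = Counter(
        min(durations, key=durations.get) for durations in mean_fixation_ms.values()
    )
    n = len(mean_fixation_ms)
    devices = sorted({d for durations in mean_fixation_ms.values() for d in durations})
    return {device: 100.0 * winners[device] / n for device in devices}


# Hypothetical data for four participants and two devices.
example = {
    "p1": {"A": 200.0, "B": 250.0},
    "p2": {"A": 210.0, "B": 205.0},
    "p3": {"A": 190.0, "B": 230.0},
    "p4": {"A": 220.0, "B": 240.0},
}
```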

This result suggests that subjective appraisal by participants is prone to bias: When users are asked to rate legibility, their judgment is biased by their overall impression of the device, including its usability. Test setup and procedure (such as the order of tasks and questions) also influence the participants’ appraisal.
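For readers who want to reproduce this kind of comparison, the Pearson correlation coefficient between a preference variable and a legibility variable can be computed as sketched below. How the two variables were coded in the study is not described here, so the inputs are simply assumed to be two equally long lists of numbers, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near zero, as in the second legibility test, indicates that the two measures share almost no linear relationship.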

 

¹ Percentage of success: for example, 100% means that all participants solved the task successfully.
