JUS - Journal of Usability Studies
An international peer-reviewed journal

Usability Evaluation of Randomized Keypad

Young Sam Ryu, Do Hyong Koh, Brad L. Aday, Xavier A. Gutierrez, and John D. Platt

Journal of Usability Studies, Volume 5, Issue 2, Feb 2010, pp. 65 - 75


Results

The following sections report the results of the pre-test survey, the user test with 4-digit PINs, the user test with 8-digit PINs, the mixed ANOVA combining both user tests, and the post-test survey.

Pre-test Survey

According to the pre-test survey of all 50 participants, 48 (96%) were familiar with touch-screen numeric keypads. Among these 48 participants, 44 were familiar with touch-screen numeric keypads at ATMs (91.7%), 34 at grocery store checkouts (70.8%), 14 on door locks (29.2%), 36 on touch-screen phones (75.0%), and 4 on other devices, such as GPS and PC (8.3%). Of those participants familiar with touch-screen keypads, the majority indicated that they used the technology regularly (36% "every day," 40% "a few times a week," 20% "a few times a month"). Thirty-four of the 50 participants (68%) reported that they had felt insecure (yes or somewhat) when typing PINs on touch-screen keypads in public (Figure 2); the percentage was identical in both groups (4 digits vs. 8 digits), 68% or 17 of 25 each. Because the 95% adjusted-Wald binomial confidence interval for this percentage (54.13% to 79.30%) lies entirely above 50%, the percentage is significantly greater than 50%, p<.05. Further, 33 of the 50 participants (66%) liked the idea of using a randomized keypad to provide more security (Figure 3). The 95% adjusted-Wald binomial confidence interval for this percentage (52.11% to 77.61%) also lies entirely above 50%, so this percentage is likewise significantly greater than 50%, p<.05.
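The adjusted-Wald (Agresti-Coull) interval used above adds z²/2 successes and z² trials to the observed counts before applying the ordinary Wald formula. The following is a minimal standard-library sketch, not the authors' analysis code; the function names are illustrative, and the "significant" check simply asks whether the whole interval lies above 50%, as the text does. For 34 of 50 it gives 54.13% to 79.30%, matching the reported limits.

```python
from statistics import NormalDist


def adjusted_wald_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Adjusted-Wald (Agresti-Coull) confidence interval for a binomial proportion."""
    if not 0 <= successes <= n or n <= 0:
        raise ValueError("need 0 <= successes <= n and n > 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n_adj = n + z ** 2                        # add z^2 pseudo-trials
    p_adj = (successes + z ** 2 / 2) / n_adj  # add z^2/2 pseudo-successes
    margin = z * (p_adj * (1 - p_adj) / n_adj) ** 0.5
    # Clamp to [0, 1]; the adjusted interval can overshoot for extreme counts.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def above_threshold(successes: int, n: int, threshold: float = 0.5) -> bool:
    """True when the entire 95% interval lies above the threshold."""
    lower, _ = adjusted_wald_ci(successes, n)
    return lower > threshold


for label, x, n in [("felt insecure (pre-test)", 34, 50),
                    ("short-PIN group, more secure (post-test)", 18, 25)]:
    lo, hi = adjusted_wald_ci(x, n)
    print(f"{label}: {x}/{n}, 95% CI {lo:.2%} to {hi:.2%}, above 50%: {above_threshold(x, n)}")
```

The same function can be reused for the "significantly less than 50%" checks in the post-test survey by testing whether the upper limit falls below 0.5.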

Figure 2. Pre-test survey question

Figure 3. Pre-test survey question

User Test with 4-Digit PINs

A total of 25 participants performed the task of entering a randomly generated 4-digit PIN. Each participant completed 20 trials of this task with a conventional keypad and 40 trials with a randomized keypad. The additional randomized-keypad trials were included to assess whether performance improved with practice (i.e., a learning effect).

The average completion time with the randomized keypad (4.598 seconds, SD=0.795) was significantly longer than that with the conventional keypad (3.405 seconds, SD=0.612), F(1,24)=72.80, p<.001 (Figure 4). However, completion time did not differ significantly between the first 20 trials (4.634 seconds, SD=0.788) and the last 20 trials (4.563 seconds, SD=0.877) of the randomized task, F(1,24)=0.49, p=0.4894 (Figure 5). Thus, no significant learning effect was observed when participants entered 4-digit PINs on the randomized keypad.
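Because every participant used both keypads, the keypad comparison above is a within-subjects test of two conditions. For two conditions, a repeated-measures F(1, n-1) equals the square of the paired t statistic, assuming the test is run on one mean completion time per participant per condition. The sketch below illustrates that equivalence; the function name `paired_f` and the sample values are hypothetical, not the study's data or analysis code.

```python
import math
from statistics import mean, stdev


def paired_f(cond_a: list[float], cond_b: list[float]) -> tuple[float, int, int]:
    """Repeated-measures F for a two-level within-subjects factor.

    cond_a and cond_b hold one value per participant, in the same participant order.
    For two levels, F(1, n - 1) equals the square of the paired t statistic.
    """
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need matching lists with at least two participants")
    diffs = [b - a for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t ** 2, 1, n - 1


# Hypothetical per-participant mean completion times (seconds), not study data.
conventional = [3.0, 3.5, 4.0]
randomized = [4.0, 4.8, 5.1]
f, df1, df2 = paired_f(conventional, randomized)
print(f"F({df1},{df2}) = {f:.2f}")  # prints F(1,2) = 165.14
```

With 25 participants per group, the same calculation yields the F(1,24) degrees of freedom reported in this section.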

Figure 4. Average completion time for the 4-digit PINs user test

Figure 5. Average completion time for the 4-digit PINs user test

As part of the data gathering process, the error rate per trial was also measured. The average error rate for the conventional keypad (0.0440, SD=0.0545) was slightly higher than that for the randomized keypad (0.0350, SD=0.0445) (Figure 6), but the difference was not significant, F(1,24)=1.45, p=0.2408. For both keypads, the standard deviation of the error rate was large relative to its mean. There was also no significant difference in error rate between the first 20 trials (0.04, SD=0.07) and the last 20 trials (0.03, SD=0.04) of the randomized task, F(1,24)=1.20, p=0.2832 (Figure 7).
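How large the error-rate spread is relative to the mean can be quantified with the coefficient of variation (standard deviation divided by the mean); a value above 1 means the spread exceeds the average. The short sketch below applies it to the 4-digit values reported above; the function name is illustrative.

```python
def coefficient_of_variation(mean_value: float, sd_value: float) -> float:
    """Ratio of standard deviation to mean; above 1 means the spread exceeds the average."""
    if mean_value == 0:
        raise ValueError("mean must be non-zero")
    return sd_value / mean_value


# Mean and SD of the per-trial error rate reported for the 4-digit test.
reported = {"conventional": (0.0440, 0.0545), "randomized": (0.0350, 0.0445)}

for keypad, (mean_rate, sd_rate) in reported.items():
    print(f"{keypad}: CV = {coefficient_of_variation(mean_rate, sd_rate):.2f}")
```

Both ratios come out at roughly 1.24 and 1.27, so the standard deviation exceeds the mean for both keypads.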

Figure 6. Average error rate for the 4-digit PINs user test

Figure 7. Average error rate for the 4-digit PINs user test

User Test with 8-Digit PINs

Data from the long (8-digit) PIN test showed that the average completion time with the randomized keypad (8.932 seconds, SD=1.546) was significantly longer than that with the conventional keypad (6.277 seconds, SD=1.480), F(1,24)=111.59, p<.001 (Figure 8). However, completion times for the randomized keypad did not differ significantly between the first 20 trials (9.060 seconds, SD=1.531) and the last 20 trials (8.80 seconds, SD=1.630), F(1,24)=3.65, p=0.0681 (Figure 9). Thus, no significant learning effect was observed when participants entered 8-digit PINs on the randomized keypad.
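The learning-effect checks compare each participant's first 20 randomized-keypad trials with the last 20. A minimal sketch of that split follows, assuming each participant's 40 completion times are stored in presentation order; the data layout and function names are hypothetical.

```python
from statistics import mean


def half_means(trial_times: list[float]) -> tuple[float, float]:
    """Mean of the first and second halves of one participant's trials.

    Assumes trial_times is in presentation order, e.g. 40 randomized-keypad trials.
    """
    if not trial_times or len(trial_times) % 2:
        raise ValueError("expected a non-empty, even number of trials")
    half = len(trial_times) // 2
    return mean(trial_times[:half]), mean(trial_times[half:])


def mean_practice_gain(participants: dict[str, list[float]]) -> float:
    """Average first-half minus second-half time across participants.

    Positive values mean participants became faster with practice; the
    per-participant (first, last) pairs are what a within-subjects test compares.
    """
    gains = []
    for times in participants.values():
        first, last = half_means(times)
        gains.append(first - last)
    return mean(gains)


# Hypothetical data for two participants with four trials each.
demo = {"P01": [5.0, 5.0, 4.0, 4.0], "P02": [6.0, 6.0, 6.0, 6.0]}
print(mean_practice_gain(demo))
```

Because every participant contributes both halves, the mean gain equals the difference between the two reported means; for the 8-digit group that is 9.060 - 8.80 = 0.26 s, a difference the test above did not find significant.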

Figure 8. Average completion time for the 8-digit PINs user test

Figure 9. Average completion time for the 8-digit PINs user test

Unlike the result from the 4-digit task, the average error rate for the conventional keypad (0.0380, SD=0.0525) was significantly lower than the error rate for the randomized keypad (0.0690, SD=0.0755), F(1,24)=4.79, p=0.0386 (Figure 10). However, there was no significant difference in error rate between the first 20 trials of the randomized task (0.0720, SD=0.0817) and the last 20 trials of the randomized task (0.0660, SD=0.0702), F(1,24)=0.10, p=0.7549 (Figure 11).

Figure 10. Average error rate for the 8-digit PINs user test

Figure 11. Average error rate for the 8-digit PINs user test

User Test: Mixed ANOVA

Combining the data from the 4-digit and 8-digit groups, a mixed ANOVA was performed to assess the main effects and any interaction effects involving PIN length. PIN length (4 digits vs. 8 digits) was the between-subjects variable, while keypad type (conventional vs. randomized) and trial (1st to 20th) were the within-subjects variables. To match the 20 trials of the conventional keypad, only the first 20 of the 40 randomized-keypad trials were included in the mixed ANOVA.
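The degrees of freedom reported for this design follow from its structure. Assuming a balanced, complete design and no sphericity correction, the between-subjects factor is tested against subjects within groups, and each within-subjects effect is tested against its interaction with subjects within groups. The hypothetical helper below reproduces the F(1,48) and F(19,912) degrees of freedom that appear in the following paragraphs.

```python
def mixed_anova_dfs(group_sizes: list[int], n_keypads: int, n_trials: int) -> dict[str, tuple[int, int]]:
    """(effect df, error df) for one between-subjects and two within-subjects factors.

    Assumes a complete, balanced design and no sphericity correction.
    """
    n_groups = len(group_sizes)
    n_subjects = sum(group_sizes)
    df_subjects_within_groups = n_subjects - n_groups  # error df for the between factor
    df_length = n_groups - 1
    df_keypad = n_keypads - 1
    df_trial = n_trials - 1
    return {
        "PIN length": (df_length, df_subjects_within_groups),
        "keypad": (df_keypad, df_keypad * df_subjects_within_groups),
        "PIN length x keypad": (df_length * df_keypad, df_keypad * df_subjects_within_groups),
        "trial": (df_trial, df_trial * df_subjects_within_groups),
        "PIN length x trial": (df_length * df_trial, df_trial * df_subjects_within_groups),
    }


# 25 participants per PIN-length group, 2 keypads, first 20 trials per keypad.
for effect, (df_effect, df_error) in mixed_anova_dfs([25, 25], 2, 20).items():
    print(f"{effect}: F({df_effect},{df_error})")
```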

Completion time

As expected, the main effect of PIN length on completion time was significant, F(1,48)=146.47, p<.0001. The main effect of keypad type was also significant, indicating that tasks took longer with the randomized keypad than with the conventional keypad regardless of PIN length, F(1,48)=203.79, p<.0001. There was also a significant interaction between PIN length and keypad type, F(1,48)=30.57, p<.001 (Figure 12). This interaction indicates that the additional completion time required by the randomized keypad, relative to the conventional keypad, was larger for the longer PINs, where users had to type more digits.
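The interaction can be made concrete with the cell means reported in the 4-digit and 8-digit sections. Because the mixed ANOVA used only the first 20 randomized-keypad trials, the sketch below pairs the first-20 randomized means (4.634 s and 9.060 s) with the conventional means (3.405 s and 6.277 s). It is back-of-the-envelope arithmetic on published means, not a re-analysis of the data.

```python
# Mean completion times in seconds, taken from the 4-digit and 8-digit sections.
# Randomized values are the first-20-trial means used in the mixed ANOVA.
MEANS = {
    4: {"conventional": 3.405, "randomized": 4.634},
    8: {"conventional": 6.277, "randomized": 9.060},
}


def randomized_penalty(pin_length: int) -> float:
    """Extra seconds needed with the randomized keypad for a given PIN length."""
    cell = MEANS[pin_length]
    return cell["randomized"] - cell["conventional"]


penalty_4 = randomized_penalty(4)             # roughly 1.23 s
penalty_8 = randomized_penalty(8)             # roughly 2.78 s
interaction_contrast = penalty_8 - penalty_4  # roughly 1.55 s

for length in (4, 8):
    per_digit = randomized_penalty(length) / length
    print(f"{length}-digit PIN: +{randomized_penalty(length):.3f} s total, +{per_digit:.3f} s per digit")
print(f"difference in penalty (interaction contrast): {interaction_contrast:.3f} s")
```

The per-digit penalty is of similar size for both lengths (about 0.31 s and 0.35 s), which is roughly consistent with the reading that the gap widens because more digits must be typed.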

Figure 12. Interaction plot between the PIN length and type of keypad for completion time

Next, the main effect of trial and its interactions were analyzed. The main effect of trial was significant, F(19,912)=3.43, p<.0001. This implies that completion time varied considerably across trials, which used different PINs. There was also a significant interaction between PIN length and trial, F(19,912)=1.60, p=0.0495. To interpret this interaction, a one-way ANOVA with trial as the factor was performed separately for each type of keypad within each PIN-length group (Table 1). According to the p-values in Table 1, trial was not a significant factor for short (4-digit) PINs, but it was a significant factor for long (8-digit) PINs. Thus, the interaction between PIN length and trial can be attributed to greater variation in completion time across trials with longer PINs. In other words, differences among individual PINs accounted for a significant proportion of the variation in completion time for longer PINs.
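The one-way ANOVA summarized in Table 1 can be sketched as follows. This is a classic between-groups one-way ANOVA in which each group holds the completion times recorded on one trial (one value per participant) for a single PIN-length and keypad cell. Whether the original analysis treated trial as a between-groups or a repeated-measures factor is not stated, so this layout is an assumption; `scipy.stats.f_oneway` computes the same between-groups F statistic and also returns a p-value.

```python
from statistics import mean


def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Between-groups one-way ANOVA; returns (F, df_between, df_within)."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: three "trials", each with values from three participants.
f_stat, df_b, df_w = one_way_anova([[4.1, 4.5, 4.3], [4.9, 5.2, 5.0], [4.4, 4.6, 4.2]])
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

With 25 participants and 20 trials, this layout gives df = (19, 480); the p-value would then come from the F distribution, for example `scipy.stats.f.sf(f_stat, 19, 480)`.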

Table 1. One-way ANOVA result of completion time with the trial as the factor

Error rate

PIN length had no significant main effect on the number of errors, F(1,48)=0.63, p=0.4309. Similarly, keypad type had no significant main effect, F(1,48)=2.23, p=0.1422. The interaction between PIN length and keypad type approached but did not reach significance, F(1,48)=2.82, p=0.0997 (Figure 13); this relatively low p-value can be explained by the significant effect of keypad type in the 8-digit task reported in the previous section, F(1,24)=4.79, p=0.0386. Neither the main effect of trial, F(19,912)=1.23, p=0.2211, nor any other interaction was significant.

Figure 13. Interaction plot between the PIN length and type of keypad for average error rate

Post-test Survey

In the post-test survey, 18 of 25 participants (72%) with short (4-digit) PINs and 16 of 25 participants (64%) with long (8-digit) PINs believed (yes or probably) that the randomized keypad would provide more security. According to the adjusted-Wald binomial confidence interval, the percentage for the short-PIN group was significantly greater than 50% (95% CI: 52% to 86%), whereas the percentage for the long-PIN group was not, because its interval includes 50% (95% CI: 44% to 79%). However, a Fisher's exact test showed no significant difference in beliefs between the two groups, p=0.7624.
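The group comparisons in this survey use Fisher's exact test on a 2 x 2 table of group by response. A standard-library sketch of the two-sided test follows; it assumes the "yes or probably" responses form one column and all other responses the other, and the function name is illustrative. With the counts reported in this section (18 of 25 vs. 16 of 25, and 7 of 25 vs. 10 of 25) it gives p of about 0.7624 and 0.5512, and `scipy.stats.fisher_exact` should return the same two-sided values.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups and columns are responses (e.g., yes/no). The p-value sums the
    hypergeometric probabilities of all tables with the same margins that are no
    more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x: int) -> float:
        # Probability that the first row has x counts in the first column.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so equally likely tables are not lost to rounding.
        if p <= observed * (1 + 1e-7):
            p_value += p
    return min(p_value, 1.0)


# "More security" question: short-PIN yes/other vs. long-PIN yes/other.
print(round(fisher_exact_two_sided(18, 7, 16, 9), 4))
# "More difficult to use" question.
print(round(fisher_exact_two_sided(7, 18, 10, 15), 4))
```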

Seven of 25 participants (28%) in the short-PIN group and 10 of 25 (40%) in the long-PIN group reported (yes or probably) that the randomized keypad was somewhat more difficult to use. According to the adjusted-Wald binomial confidence interval, the percentage of short-PIN participants reporting difficulty was significantly less than 50% (95% CI: 14% to 47%), while the percentage of long-PIN participants was not (95% CI: 23% to 59%). A Fisher's exact test showed no significant difference in responses between the two groups, p=0.5512.
