
Comparing Computer Versus Human Data Collection Methods for Public Usability Evaluations of a Tactile-Audio Display

Maria Karam, Carmen Branje, John-Patrick Udo, Frank Russo, and Deborah I. Fels

Journal of Usability Studies, Volume 5, Issue 4, August 2010, pp. 132 - 146

Results

We report on the results from the researcher-administered and the self-administered questionnaires, in addition to the advantages and disadvantages experienced with both methods. Results from the human-administered questionnaire were described in a previous paper (Branje et al., 2009). In this section, we review results from the self-administered version and then compare these results to those of the human-administered version. A Kolmogorov-Smirnov test for normality showed that the data departed significantly from normality, due to the difference in the number of participants and question responses between the two conditions. As a result, a non-parametric Mann-Whitney test was performed to examine whether there were significant differences between the two survey methods for each survey question. Seven of the eighteen response questions, including four of the five questions concerning the detectability of the various aspects of the Emoti-Chair, showed a statistically significant difference (p < 0.05) between survey methods (see Table 2).
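As a rough illustration of this analysis pipeline (the data and code below are hypothetical and not drawn from the study), the following Python sketch shows how a Kolmogorov-Smirnov normality check followed by a Mann-Whitney U comparison could be run with SciPy on 5-point Likert responses gathered under the two survey methods.

```python
# Illustrative sketch only: hypothetical 5-point Likert responses for a single
# question under each survey method; these are not the study's data.
import numpy as np
from scipy import stats

self_administered = np.array([5, 4, 4, 3, 5, 2, 4, 5, 3, 4, 1, 5, 4, 3, 5])
researcher_administered = np.array([4, 3, 4, 2, 3, 4, 3, 2, 4, 3])

# Kolmogorov-Smirnov test against a normal distribution fitted to the pooled data.
pooled = np.concatenate([self_administered, researcher_administered])
ks_stat, ks_p = stats.kstest(pooled, "norm",
                             args=(pooled.mean(), pooled.std(ddof=1)))
print(f"K-S normality check: D = {ks_stat:.3f}, p = {ks_p:.3f}")

# Because the ratings depart from normality, compare the two groups with the
# non-parametric Mann-Whitney U test rather than a t-test.
u_stat, mw_p = stats.mannwhitneyu(self_administered, researcher_administered,
                                  alternative="two-sided")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {mw_p:.3f}")
```

In practice such a comparison would be repeated for each of the 18 questions, which is where the multiple-comparison concern discussed later in the Analysis arises.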

Researcher-Administered Questionnaire Results

Having a researcher present to administer questions to a participant is an effective way of reducing problems associated with the integrity of the results and of obtaining additional information through open-ended questions and interviews about the user experience with the system. Human interviewers have an advantage over computers in that they can make observations, alter their approach, or answer questions while the participant is engaged. However, it is also more challenging to approach people in a public setting without interfering with the natural flow of the user experience.

Several problems common to human-administered questionnaires were also present in this study, including the potential for participants to feel obliged to respond positively because they were speaking with a person and may not have wanted to provide negative feedback. While human researchers can obtain additional information through interviews and open-ended questions, we were more interested in leveraging the large turnover of visitors who could provide us with general feedback on the system. This approach gave us a broad overview of users’ perspectives on the system within the public entertainment domain and helped us further identify problems and other issues with the universality of the design.

Self-Administered Questionnaire Results

Many aspects of the self-administered questionnaire made it preferable to the researcher-administered one. First, although initial programming work was required to develop the automated questionnaire, very little ongoing labor was required from the research team to administer it. The questionnaire was remotely accessible, so data could be collected on a regular basis and any problems with the questionnaire could be identified and fixed early in the pilot stage of the study. The self-administered questionnaire yielded a high number of participants: even after anomalous entries were deleted, over 550 usable questionnaire entries were collected over a two-month period.

As we expected, a major problem with the self-administered questionnaire was associated with the public nature of the display. Researchers observed participants pressing question keys in rapid succession and observed others walking away from incomplete questionnaires.

Because this was a children’s science museum, many attention-grabbing exhibits competed for visitors’ attention, and it appeared that in some cases the questionnaire could not retain participants’ focus long enough for them to complete the 25 questions.

Comparing Results Between Survey Methods

In this section, we discuss the differences in survey results between the two questionnaire methods, organized by section of the questionnaires. Most of the results showed no statistically significant difference between the two methods, an expected outcome suggesting that both methods yield similar results. The few questions that did show differences are discussed below.

Table 1. Age groups of the participants who took part in each survey method.


Demographics

There were no participants in the oldest age category (65+) surveyed using the human-administered questionnaire, and only 21 were surveyed using the computer-administered survey. The largest number of participants for both the computer- and human-administered surveys were youths, followed by adults and children. This appears to reflect the demographics of visitors to the museum, which is geared towards school groups in which a few adults supervise many children.

For both methodologies, the age distribution of participants differed significantly from an even split; that is, the groups were not equally divided by age. We suspect that the variation in the age groups taking part in the studies may be due to several factors. First, children under 10 are likely to have lower comprehension and reading levels than youths or adults, which may have discouraged many from taking the self-administered survey. Youths may have been more confident in taking the survey because they are more likely to be comfortable with computer systems and touch screens. Additionally, the survey was mounted on the wall approximately 36 cm off the floor, which may have been out of reach for some of the younger visitors.
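The claim that the age groups were not equally divided could be checked with a goodness-of-fit test; the study does not specify which test was used, so the sketch below is only an assumed illustration using a chi-square test on invented age-group counts.

```python
# Illustrative sketch: chi-square goodness-of-fit test of whether participants
# are evenly spread across age groups. The counts are invented, not the
# study's actual demographics.
from scipy import stats

age_groups = ["children", "youths", "adults", "65+"]
observed_counts = [90, 260, 150, 20]  # hypothetical counts per age group

# Null hypothesis: every age group contains the same number of participants.
chi2, p = stats.chisquare(observed_counts)
print(f"chi-square = {chi2:.1f}, p = {p:.4f}")
# A small p-value indicates the age groups were not equally represented.
```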

Detection of Tactile Sensations

One of the most important aspects of the Emoti-Chair explored in this study was the detectability of each of its tactile components. We found that responses to questions about the detectability of sensations differed significantly between the two survey methods. Table 2 shows the mean and standard deviation (SD) for the questions with significant results. In general, participants interviewed by the researchers indicated that they could detect the whole-body vibrations, the lower back vibrations, and the arm vibrations more readily than participants who took the self-administered survey did. However, the upper back vibrations were significantly less obvious to respondents of the researcher-administered survey than to respondents of the automated version. There are several possible reasons for this result. First, the signals emitted through the tactile devices located on the upper back were relatively weaker than those on the lower parts of the back. The upper back stimulators were used to convey high-frequency signals, while the seat presented vibrations in the mid to low frequency range. At high frequencies the voice coils do not vibrate as strongly as they do at lower frequencies, making them more difficult to sense.

Table 2. Significant differences between computer-based and researcher-collected survey data using the Mann-Whitney U test. Ratings were provided on a 5-point Likert scale where 1 is the least detectable and 5 is the most detectable.


Though the power delivered to each of the vibrators could be adjusted individually, it could not be tailored to individual participants in this environment. Also, fewer participants took part in the human-administered evaluation, which may have yielded a larger percentage of respondents who did not notice the higher frequencies along the upper back. Finally, people of different stature may have experienced the upper back vibrations along different parts of their back (e.g., taller people may have felt the upper back vibrations in their mid-back instead), but our data did not account for individual body types in this study.

The difference in responses regarding the detectability of sensations in the middle back was not significant between the computer and human versions of the survey. The anonymity offered by the computer-based survey may have elicited more candid responses, making those who did not detect certain sensations more willing to say so. Anonymity, or the perception of privacy, is reported to be one of the leading explanations for the increase in honest responses associated with computer-based surveys (see Richman, Kiesler, Weisband, & Drasgow, 1999, for a meta-analysis comparing computer-based and human-delivered surveys for non-cognitive data).

Enjoyment Ratings

Significant differences between the two survey methods were also found in the enjoyment ratings for the back vibrations and the linear actuators. Table 3 shows a comparison of means and standard deviations for each survey method for this category of question. Generally, participants responding to the self-administered questionnaire rated these features of the Emoti-Chair higher on enjoyment than did the participants who took part in the researcher-administered questionnaire.

Table 3. Significant differences between computer-based and researcher-collected survey data. Ratings were provided on a 5-point Likert scale where 1 is least enjoyable and 5 is most enjoyable.


Other enjoyment variables that were not statistically significant were overall enjoyment, overall comfort, and enjoyment of the air jets, seat vibrations, and arm vibrations.

These results would seem counterintuitive given the literature (e.g., Richman et al., 1999) on the candidness of ratings in computer-based versus researcher-based surveys, as the computer surveys showed higher enjoyment ratings. There are two possible explanations for this result. The first is that the result is a Type I error (a false positive): we compared 18 questions, so we would expect roughly one question (5% of 18) to reach significance by chance alone. The second is that enjoyment is a more abstract concept than whether or not a sensation is detectable. The researcher-based survey allowed elaboration on the concept of enjoyment, which could have led to the more negative ratings. If enjoyment had been further explained or better worded in the computer-based survey, more negative answers might have resulted.
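To make the arithmetic behind the first explanation concrete, the short sketch below, which assumes 18 independent tests at α = 0.05 (a simplification), computes the number of significant results expected by chance, the probability of at least one false positive, and the stricter Bonferroni-corrected threshold an adjusted analysis might use.

```python
# Illustrative arithmetic for the Type I error argument, assuming 18
# independent significance tests at alpha = 0.05.
alpha, n_tests = 0.05, 18

expected_false_positives = alpha * n_tests       # about 0.9 questions
p_at_least_one = 1 - (1 - alpha) ** n_tests      # roughly 0.60

print(f"Expected chance significances: {expected_false_positives:.2f}")
print(f"P(at least one false positive): {p_at_least_one:.2f}")

# A Bonferroni correction would require p < alpha / n_tests (about 0.0028)
# before declaring any single question significant.
print(f"Bonferroni-corrected threshold: {alpha / n_tests:.4f}")
```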

Information and Exhibit Responses

A significant difference between the two survey methods was also found in participants’ ratings of how well the Emoti-Chair aligned with the concept of an eco-footprint. Fit was rated on a 5-point Likert scale, with 1 being no alignment and 5 being complete alignment: U(623) = 17210, Z = -2.38, p < 0.05; M(self-administered) = 2.92, SD = 1.39; M(researcher-administered) = 2.51, SD = 1.18. Participants responding to the self-administered questionnaire felt that the Emoti-Chair fit better with the eco-footprint concept than did participants responding to the researcher-administered questionnaire. Although the exact reason for this difference is difficult to ascertain, we speculate that people who took the initiative to complete the self-administered survey may also have spent more time reading through the exhibit material that outlined the purpose of the display.

Five variables in this category showed non-significant differences between survey types: the relationship between the video material and the chair rocking, the air jets, the background information provided, the vibrations, and the fit of the Emoti-Chair with the polar year theme of the exhibit.

Analysis

The results of this study are promising, suggesting that there is value in using an automated response system to obtain user feedback in a public usability domain. While the responses to both surveys showed similar trends for most of the questions, several interesting findings were revealed, suggesting where improvements to our approach can be made.

First, a more detailed description of the system and the different tactile sensations it provides would be beneficial for those experiencing the system. For example, some people may not have experienced all of the devices in the chair, especially if they were not fully seated and could not feel the back vibrations. This was a problem for both versions of the study, because participants may not have wanted to ask the researcher, or to re-read the display information, to clarify what each of the different sensations was.

Second, the greater availability of the self-administered questionnaire compared with the human-administered version was advantageous for both researchers and participants. Human researchers cannot always be available to interview participants, while the automated version was in operation throughout museum hours. Human researchers may also be selective in whom they approach to interview, and participants may be more hesitant to approach a researcher than to walk up to the computer.

Third, the inclusion of a touch screen to obtain user feedback adds another element of interest for visitors to the museum, who often seek out new experiences. The computer adds an interesting element for users, who may be keener to explore interactions with a computer screen associated with the exhibit than to break the flow of that experience to speak with a human researcher. However, human interviewers offer opportunities to explain unclear terminology or concepts to the interviewee. The self-administered version does not offer the same level of communication; however, we believe that a carefully designed help system could facilitate access to information about the system. Humans can also probe interviewees to clarify their answers or to gather further data, such as why the interviewee gave a particular response. In addition, interviewers who are frequently at the museum (multiple days and weeks) can specifically target interviewees in under-represented groups. Although this did not happen in our case (we are missing responses from people in the 65+ group in the human interviewer data), such an adjustment is possible. The self-administered version cannot selectively target respondents from demographic categories with low numbers of participants.

Another limitation of our approach was the lack of data on people’s levels of understanding of the exhibit content, including the Emoti-Chair. This is some of the most difficult information to collect, particularly in a public setting. Using typical multiple-choice, fill-in-the-blank, matching, or essay-type questions is problematic because it introduces new factors such as difficulties with literacy, test-taking anxiety, compliance with completing the survey, and the uncertain relationship between scores on “tests” and understanding. In addition, people may feel embarrassed about wrong answers or about appearing not to have paid attention, which may make them hesitant to ask for clarification or explanations. While the goal of our reported study was not to evaluate participants’ comprehension of the display, this is a factor we would like to explore in future studies. One approach would be to implement a “test your knowledge” element as a separate entity integrated with the exhibit. In this particular museum a number of “test your knowledge” stations already exist, which are designed to be fun and engaging. Using the museum’s existing approach, combined with a tracking mechanism that collects people’s responses, could provide this type of data.

Although there is considerable research showing that the presence of human interviewers can introduce more response bias (Richman et al., 1999), there are benefits to using both human and computer-based data collection techniques. The computer version is valuable for acquiring a large number of participants, while the human-administered version is useful for testing the questions and discovering any potential problems with the exhibit, or with the questions themselves, before the automated version is finalized. Human-collected data are also important for gathering more detailed probing data that explain the user’s perception of the exhibit, the system, and the questions.
