Intra- and Inter-Cultural Usability in Computer-Supported Collaboration

Ravi Vatrapu and Dan Suthers

Journal of Usability Studies, Volume 5, Issue 4, August 2010, pp. 172 - 197

Results

Results are grouped under the following four subsections: Demographics, Culture Measures, Objective Usability Measures, and Subjective Usability Measures. The empirical data generated by the experimental study were analyzed at four levels: culture (American, Chinese), gender (female, male), dyadic culture (American-American, American-Chinese, Chinese-Chinese), and dyadic gender (female-female, female-male, male-female).

Demographics

The age of the participants (n=60) ranged from 22 to 45 years. The average age of the participants was 28.20 years (SD = 4.6, SE = 0.60). There was no age difference at any of the four levels of analysis (culture, gender, dyadic culture, dyadic gender). As expected, American participants reported having spent significantly more time in the United States of America than the Chinese participants. On the other hand, the time spent by the participants in Hawai‘i did not differ significantly by culture or gender. Of the 60 participants, 30 reported being doctoral students and the other 30 reported being master's students. Of the 30 Chinese participants, 14 were doctoral students and 16 were master's students. Of the 30 American participants, 16 were doctoral students and 14 were master's students. Participants belonged to 30 different departments at the University of Hawai‘i at Mānoa. There were no significant differences at any of the four levels of analysis for prior experience with experimental studies, prior knowledge about the experimental task, or partner familiarity.

Culture Measures

As mentioned before, a PVQ (Schwartz et al., 2001) was used to measure culture at the individual level. The GLOBE instrument (House et al., 2004) was used to measure culture at the group level.

Ten individual values were measured by the PVQ (Schwartz et al., 2001). Statistical analysis showed significant differences between the American and Chinese participants on the following PVQ values: Conformity, F(1,56)=7.71, p=0.008; Benevolence, F(1,56)=5.60, p=0.02; Universalism, F(1,56)=6.66, p=0.01; Self-Direction, F(1,56)=7.48, p=0.01; Stimulation, F(1,56)=10.02, p=0.003; and Security, F(1,56)=30.76, p<0.0001.
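As a rough cross-check on p-values of this kind, an F statistic with one numerator degree of freedom is the square of a t statistic, so its p-value equals the two-tailed tail area of a t distribution with the error degrees of freedom. The sketch below is not the authors' analysis; it integrates the t density numerically with Simpson's rule using only the Python standard library, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    scale = math.exp(log_norm) / math.sqrt(df * math.pi)
    return scale * (1 + x * x / df) ** (-(df + 1) / 2)

def p_value_f1(f_stat, df_error, steps=2000):
    """Upper-tail p-value of F(1, df_error), computed as a two-tailed t test.

    Integrates the t density from 0 to sqrt(F) with composite Simpson's rule
    (steps must be even), then converts the central area to a tail area.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    central_half = total * h / 3
    return 1 - 2 * central_half

# Three of the F(1, 56) statistics reported above for the PVQ values
reported = {"Conformity": 7.71, "Benevolence": 5.60, "Security": 30.76}
for name, f_stat in reported.items():
    print(f"{name}: F = {f_stat}, p = {p_value_f1(f_stat, 56):.4f}")
```

The recomputed values should agree with the reported ones only up to rounding, since the F statistics themselves are reported to two decimal places.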

Significant differences were observed on both sections of the GLOBE instrument. For the “AS IS” section, significant differences between the American and Chinese groups were observed for the following: Institutional Collectivism, F(1,56)=43.55, p<0.01; In-Group Collectivism, F(1,56)=102.43, p<0.01; and Assertiveness, F(1,56)=28.57, p<0.01. For the “SHOULD BE” section of the GLOBE instrument, statistically significant differences were found for the following: Uncertainty Avoidance, F(1,56)=49.65, p<0.01; Assertiveness, F(1,56)=4.20, p=0.04; Future Orientation, F(1,56)=14.23, p=0.01; Humane Orientation, F(1,56)=7.90, p=0.007; and Gender Egalitarianism, F(1,56)=4.89, p=0.03.

In summary, there is sufficient evidence to conclude that Chinese and American participants differ significantly on specific PVQ individual values as well as on GLOBE cultural dimensions. Even though a nation-state-based stratified random sampling frame was utilized, systematic variation between the two participant groups was empirically documented rather than stereotypically assumed or dogmatically asserted.

Objective Usability Measures

Objective usability measures consisted of efficiency (total task time in minutes) and effectiveness (usage of certain features of interest). A multivariate analysis of variance with the independent variable of dyadic culture (American-American, American-Chinese, and Chinese-Chinese) and the dependent variables of task time, cross-referencing, shared workspace refresh, threaded discussion messages, embedded discussion notes, evidential relation links (for+against+unknown), data nodes, hypothesis nodes, and unspecified nodes was statistically significant, Roy’s Largest Root=0.767, F(9, 50)=4.259, p<0.001.

With respect to the independent variable of culture (American, Chinese), a multivariate analysis of variance with the same set of dependent variables was also statistically significant, Roy’s Largest Root=0.586, F(9, 50)=3.758, p=0.003. Each of the dependent variables is discussed below.

Efficiency

On average, task time was greater for the Chinese participants (M=156.07 minutes, SD=19.22) than for the American participants (M=144.96, SD=25.14). On average, the female participants’ task time (M=155.58, SD=20.88) was greater than that of the male participants (M=145.44, SD=24.00). A two-way ANOVA showed marginal main effects for culture, F(1,56)=3.77, p=0.06, and gender, F(1,56)=3.14, p=0.08. On the other hand, total task time varied significantly between the intra- and inter-cultural conditions of the experimental study, F(2,51)=5.17, p=0.009. A Bonferroni post-hoc comparison showed that the American intra-cultural group had significantly lower task time than both the Chinese intra-cultural group and the American-Chinese inter-cultural group. No significant differences were observed at the dyadic gender level.
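For the comparison across the three dyadic-culture conditions, which has two numerator degrees of freedom, the upper-tail probability of the F distribution has a closed form, p = (1 + 2F/d)^(-d/2), where d is the error degrees of freedom. The minimal sketch below applies that formula to the task-time result; it is a cross-check rather than the authors' procedure, and the function name is hypothetical.

```python
def p_value_f2(f_stat, df_error):
    """Upper-tail p-value of an F(2, df_error) statistic (closed form)."""
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)

# Task time across the three dyadic-culture conditions: F(2, 51) = 5.17
p = p_value_f2(5.17, 51)
print(f"p = {p:.3f}")  # rounds to the reported p = 0.009
```

The closed form holds only when the numerator has exactly two degrees of freedom; other cases need the general F distribution.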

Effectiveness

Effectiveness measures counted the number of uses of the software features of structural and functional significance to computer-supported collaborative learning. In the following sections, each measure is introduced and briefly discussed, and the empirical results are then presented.

Shared workspace refresh

As discussed in the Software section in the Methodology section of this paper, the shared workspace (information organizer + discussion) could be refreshed (a) automatically after returning from the game or (b) on demand when the participant clicked on the Refresh button (see top-right in Figure 1). There were four reports and a final page. Participants had to play and quit the game in order to receive the next report. All the participants played and quit the game at least four times and therefore, received all four reports. However, the refresh count varied due to the differences in the number of on-demand refreshes of the shared workspace. There was no significant main effect for refresh count with respect to culture or dyadic culture. Figure 2 presents the effectiveness results (minus the cross-referencing usage measure) at the culture level of analysis. Figure 3 presents the effectiveness results at the dyadic culture level of analysis.

Figure 2. Effectiveness with respect to culture

Figure 3. Effectiveness with respect to dyadic culture

Cross-referencing

Video analysis of the screen recordings of participant sessions was done to obtain the counts for cross-referencing (graphical objects embedded in the threaded discussion messages, see Figure 1). Although American participants on average used cross-referencing more than the Chinese participants, no statistically significant differences were found at any of the four levels of analysis (culture, gender, dyadic culture, dyadic gender).

Threaded discussion messages

Counts for discourse usage were obtained from the software logs of participant sessions. American participants created more threaded discussion messages than the Chinese participants, and the difference was statistically significant, F(1,56)=8.88, p=0.004. Significant differences were also observed at the dyadic culture level of analysis, F(2,57)=8.84, p<0.001. Participants in the American-Chinese inter-cultural condition created the highest number of threaded discussion messages, followed by the American intra-cultural group and the Chinese intra-cultural group.

Embedded discussion notes

For the embedded discussion notes, no statistically significant differences were found at the culture level of analysis. However, the observed empirical trend was that Chinese participants created more embedded discussion notes than the American participants. At the dyadic culture level of analysis, significant differences were observed between the three experimental conditions, F(2,57)=4.76, p=0.012. Post-hoc comparisons showed that participants in the American intra-cultural group created more embedded discussion notes than those in the Chinese intra-cultural group, while participants in the American-Chinese inter-cultural group created the lowest number of embedded discussion notes.

Evidential relation links

Counts for evidential relation links were obtained from the software logs of participant sessions. The total number of evidential links created was the sum of the Against links, the For links, and the Unknown links created. On average, American participants created significantly more evidential relation links compared to the Chinese participants of the experimental study, F(1,56)=5.54, p=0.02. However, no significant differences were observed at the dyadic culture level of analysis.

Knowledge-map nodes

No significant differences were observed between the Chinese and American participants in the number of data and hypothesis nodes created. However, Chinese participants created fewer unspecified nodes than the American participants, F(1,56)=5.76, p=0.02. At the gender level of analysis, female participants created significantly more hypothesis nodes than the male participants, F(1,56)=4.68, p=0.035. No significant differences were observed at the dyadic gender level of analysis.

Subjective Usability Measures

As mentioned earlier, a validated usability instrument, the QUIS questionnaire (Chin, Diehl, & Norman, 1988), was administered to collect the participants’ subjective perceptions and preferences of the learning environment. In addition to various system measures, the QUIS 7.0 instrument also measured participants’ subjective satisfaction with the instructions and the software tutorial. The coding key for the QUIS instrument was used for the quantitative analysis of the data (http://lap.umd.edu/QUIS/QuantQUIS.htm).

Overall system user satisfaction

On average, the overall user satisfaction for the Chinese participants was higher than that for the Americans (see Figure 4). However, no significant differences were found at any of the four levels of analysis (culture, gender, dyadic culture, dyadic gender). A consistent empirical trend at the dyadic level of analysis was that the mean subjective rating for the inter-cultural group was situated between the mean ratings for the two intra-cultural groups (Figure 5).

Information display

Significant differences were observed between the Chinese and American participants on the QUIS section for information display on the screen, F(1,56)=8.00, p=0.01. Chinese participants’ subjective satisfaction scores for the screen information display were lower than those of the American participants.

System terminology

The System Terminology and System Information section of the QUIS instrument received significantly lower ratings from the Chinese participants, F(1,56)=4.84, p=0.03. Significant differences were also observed at the dyadic culture level of analysis, F(2,57)=3.30, p=0.04. Chinese intra-cultural participants had the least subjective satisfaction with the system terminology, followed by the inter-cultural group and the American intra-cultural group.

System capabilities

At the culture level of analysis, no significant differences were observed, although the Chinese participants gave lower scores than the American participants. No significant differences were observed at the dyadic level of analysis.

Ease of learning of the system

Results for the Learning section of the QUIS instrument showed a marginally significant difference on the ease of learning measure at the level of culture. No significant differences were observed at the dyadic level of analysis.

Software demonstration and tutorial

Results for the Tutorial section of the QUIS instrument showed no significant difference for participants’ subjective evaluation of the software demo and experimental instructions at any of the four levels of analysis. Therefore, experimenter bias and “demand characteristics” (Orne, 1962) are ruled out as confounding variables in the study.

Another interesting empirical trend was that the average subjective ratings in the inter-cultural condition always fell between the ratings for the two intra-cultural conditions.

Figure 4. QUIS ratings with respect to culture

Figure 5. QUIS ratings with respect to dyadic culture

In summary, there was a discrepancy between Chinese participants’ higher overall satisfaction ratings and their lower satisfaction ratings for the specific components of the system (Screen and Terminology & System Information). Similarly, American participants reported lower overall reaction ratings but higher satisfaction ratings for the specific components of the system. Figure 4 presents a summary of the QUIS results with respect to culture. Figure 5 presents a summary of the QUIS results with respect to dyadic culture. It must be emphasized that the mean difference in the overall ratings between the three collaboration conditions (Chinese-Chinese, American-Chinese, and American-American) and between the two cultural groups (Chinese, American) is small and not statistically significant. However, a qualitative analysis of the user comments on the QUIS questionnaire tells a different story, as discussed in the following section.

Analysis of comments

The QUIS instrument includes an open-ended comments solicitation at the end of each of the six sections. The user comments were transcribed. A few illustrative user comments are included below:

C2P2³ (Chinese, Female): “The link function is very helpful and I'll expect a drag and drop from the organizer to the message panel.”

C6P2 (Chinese, Male): “Having a zoom in/out feature might help.”

I3P1 (Chinese, Female): “I like the screen. But the text boxes are somehow difficult to organize, if there are too many of them.”

I3P2 (American, Female): “I like the idea and that can link data boxes. However, a function that would put everything into a condensed list or a short outline to see all data at once should be helpful b/c sometimes too much information is displayed at once to work with in a rational manner.”

I8P1 (American, Female): “I found that performing an operation did not always lead to a predictable result. Sometimes, I was unable to move the text over or the wrong copied text appeared in the box.”

I8P2 (Chinese, Male): “In general the system speed is satisfactory, and it's reliable, but it needs more on other functions such as undo, correcting typo, etc.”

I11P2 (Chinese, Male): “It is good if there is a ‘undo’ and ‘redo’ (ctrl+z or ctrl+y).”

I12P2 (Chinese, Female): “I think it would be better to separate the links sent by partner and mine. It seems all the links are put together and inconvenient [sic] to read.”

A3P1 (American, Female): “When new text boxes appear from the partner, they should be in a separate section so it is easy to see them and sort them out from mine and older ones. They should be color coded differently until read. The size of screens should be adjustable to allow more.”

A7P2 (American, Male): “Instructions were well laid out and easy to use. Messages sometimes appear overlapping, difficult to see everything that way.”

A8P1 (American, Female): “Overall fairly clear & easy to navigate.”

A9P2 (American, Male): “In the boxes, the word ‘text’ should be eliminated in a click. It shouldn't need deleting.”

Qualitative analysis of the comments shows that undo, copy + paste, zooming, and color coding of contributions were the most frequent usability suggestions. Usability problems mentioned included scrolling issues, font size, default text, etc. Negative comments were mainly about screen clutter.

The coding scheme developed by Vatrapu and Pérez-Quiñones (2006) was adapted for the content analysis of the comments. The modified coding scheme is described below:

  • Usability problem (U): Interaction design flaw or a user difficulty that is directly associated with an interface/interaction design flaw.
  • Suggestion (S): Participant’s subjective preference regarding an implemented design choice or tradeoff.
  • Positive comment (P): Participant’s subjective approval of a design choice or tradeoff.
  • Negative comment (N): Participant’s subjective disapproval of a design choice or tradeoff.
  • Other comment (O): User comment that could not be categorized under one of the above categories.

Total comments = usability problems (U) + suggestions (S) + negative comments (N) + positive comments (P)
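The tally implied by this formula can be sketched as follows. The coded comments in the example are invented placeholders for illustration, not data from the study; only the category codes and the total formula come from the text, and the function name is hypothetical.

```python
from collections import Counter

# Category codes from the coding scheme above
CATEGORIES = {"U": "usability problem", "S": "suggestion",
              "P": "positive comment", "N": "negative comment",
              "O": "other comment"}

def tally(codes):
    """Count coded comments per category and compute the total.

    Following the formula above, the total is U + S + N + P;
    "other" comments (O) are counted but excluded from the total.
    """
    unknown = set(codes) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown category codes: {sorted(unknown)}")
    counts = Counter(codes)
    per_category = {code: counts.get(code, 0) for code in CATEGORIES}
    total = sum(per_category[c] for c in ("U", "S", "N", "P"))
    return per_category, total

# Hypothetical codes assigned to seven comments (not study data)
per_category, total = tally(["S", "S", "U", "P", "N", "O", "S"])
print(per_category, total)
```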

Figure 6 presents the results for the aggregate count of user comments with respect to culture. Figure 7 presents the aggregate count of user comments with respect to the dyadic culture.

Even though the Chinese participants made more usability suggestions, more positive comments, and fewer negative comments than the American participants, no significant differences were observed at any of the four levels of analysis. Another interesting but statistically non-significant empirical trend was that more usability suggestions, negative comments, and positive comments were made in the inter-cultural condition than in either of the two intra-cultural conditions.

Figure 6. Aggregate count of user comments with respect to culture

Figure 7. Aggregate count of user comments with respect to dyadic culture

³ After both participants in an experimental session had read and signed the informed consent form, each participant was assigned a unique participant ID of the form NxPy, where N refers to the treatment condition, one of the three cultural profiles (C for the Chinese-Chinese condition, A for the American-American condition, I for the American-Chinese condition); x refers to the experimental session number within that condition (1-12); and Py refers to the information distribution assignment (P1 or P2).
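The ID scheme described in this note can be decoded mechanically. The sketch below is illustrative only: the function and field names are hypothetical, and the pattern assumes that IDs follow the NxPy form exactly.

```python
import re

# Condition letters as defined in the note above
CONDITIONS = {"C": "Chinese-Chinese",
              "A": "American-American",
              "I": "American-Chinese"}
ID_PATTERN = re.compile(r"([CAI])(\d{1,2})P([12])")

def parse_participant_id(pid):
    """Split an ID of the form NxPy into condition, session, and assignment."""
    match = ID_PATTERN.fullmatch(pid)
    if match is None:
        raise ValueError(f"not a valid participant ID: {pid!r}")
    condition, session, assignment = match.groups()
    session = int(session)
    if not 1 <= session <= 12:
        raise ValueError(f"session number out of range (1-12): {session}")
    return {"condition": CONDITIONS[condition],
            "session": session,
            "assignment": f"P{assignment}"}

print(parse_participant_id("I8P2"))
```

Under this scheme, the IDs attached to the quoted comments identify only the condition, session, and information assignment; the culture and gender of each commenter are given separately in parentheses.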