
Creating Effective Decision Aids for Complex Tasks

Caroline Clarke Hayes and Farnaz Akhavi

Journal of Usability Studies, Volume 3, Issue 4, August 2008, pp. 152-172

Results

The following sections discuss rankings, time, and user preferences.

Rankings

The rankings for a set of alternatives, ordering them from best to worst, represent a decision; the top-ranked alternative represents the decision maker's top choice. However, not all decisions are of equal quality. One question we wished to answer was whether the decision method had an impact on decision quality.

Unfortunately, decision quality is difficult to assess directly for many reasons. The knowledge and skill of the decision maker affect the likelihood that the alternative identified as "best" will actually prove to be the best once it is built and criteria (such as cost, performance, reliability, and marketability) can be tested. We define a high-quality decision as one in which the decision maker's rankings, based on his or her specified criteria, accurately match the rankings computed from empirically measured data once the design alternatives are built and deployed.

Although alternative prototypes are sometimes built and tested during a design process, doing so is often prohibitively expensive, particularly for complex devices like lunar exploration vehicles. Thus, in many cases, it is not possible to measure decision quality directly because most of the alternatives are never built.

However, there are other measures that one can use as indicators of decision quality when it cannot be measured directly. While experts lack perfect judgment, they are far better than others at making judgments in their own area of expertise. Experienced conceptual designers for space missions can estimate cost to within 10% of the actual cost (Mark, 2002), which is quite impressive given the novelty of the designs and the number of unknowns they must manage. Additionally, it has been found in domains ranging from manufacturing plans (Hayes & Parzen, 1997; Hayes & Wright, 1989) to medical diagnosis (Aikins et al., 1983) that while experts may sometimes disagree on which is the top alternative, there are high correlations in their rankings even when those rankings are arrived at independently, without consultation. In other words, even if two experts independently rank two different alternatives as their top choices, it is likely that both alternatives will be ranked near the top by most experts. If one assumes that experts are able to judge quality, then one indicator of quality is the correlation of a decision maker's rankings with those of experts. Figure 6 shows the average rank correlations between the expert subjects' rankings, and between the intermediates' rankings and the experts'. Thus, the taller the bar in Figure 6, the higher the level of agreement with the experts (or of the experts with each other).

Figure 6. The average correlation of subjects' rankings to expert rankings, using three different decision methods for intermediate-level and expert designers.

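To make the rank-correlation indicator concrete, the sketch below (in Python) computes Spearman's rank correlation between two subjects' rankings of the same set of alternatives. This is an illustration only, not the authors' analysis code: the article does not state which correlation statistic was used, and the alternative names and rankings are hypothetical.

# Illustrative sketch (not the authors' code): Spearman rank correlation
# between two subjects' rankings of the same set of design alternatives.
# The correlation statistic and the example data are assumptions.

def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation between two untied rankings.

    Each ranking is a list of alternative names ordered from best to worst,
    and both rankings must contain exactly the same alternatives.
    """
    n = len(ranking_a)
    assert sorted(ranking_a) == sorted(ranking_b), "rankings must cover the same alternatives"
    # Map each alternative to its rank position (1 = best) in each ranking.
    rank_a = {alt: i + 1 for i, alt in enumerate(ranking_a)}
    rank_b = {alt: i + 1 for i, alt in enumerate(ranking_b)}
    # Spearman's rho for untied rankings: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    d_squared = sum((rank_a[alt] - rank_b[alt]) ** 2 for alt in ranking_a)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical example: an intermediate designer's ranking vs. an expert's.
expert       = ["design C", "design A", "design D", "design B"]
intermediate = ["design A", "design C", "design D", "design B"]
print(spearman_rho(expert, intermediate))  # 0.8 -- high agreement despite different top choices

In this hypothetical example the two subjects disagree on the top choice but still agree closely on the overall ordering, which is the pattern described above for independently produced expert rankings.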

These results show that experts were indeed more consistent in their rankings than were the intermediates. The decision aid used also made a difference in the average correlation with expert rankings.

Each subject's rankings were produced independently. They were not allowed to discuss the relative merits of the alternatives in a set with other subjects prior to the experiment. Thus, the correlation between subjects is not the result of group discussions. We feel that these results indicate that experts are better at making decisions than intermediates, most likely because their greater experience allowed them to assess the alternatives more accurately. Both decision aids helped the experts to produce better decisions, possibly by encouraging them to think more systematically and carefully about the criteria, the alternatives, and their relative merits. The decision aids did not appear to make a significant difference in the consistency of the rankings for the intermediate-level designers, possibly because they lacked the experience to make good assessments of the likely cost, performance, etc. of the alternatives.

What was surprising was that the fuzzy method's impact on the consistency of rankings was not significantly different from that of the deterministic method. In fact, it appears slightly worse than the deterministic method for both groups (although not significantly so). There are many possible explanations for this result. Perhaps the subjects were more used to thinking of criteria in terms of single values than ranges (e.g., they are more comfortable thinking in terms like "the cost is $10" as opposed to "the cost is probably between $8 and $14"); perhaps they were not good at estimating uncertain values (Tversky, 2003); or perhaps the display used in the experiment did not support reasoning about uncertain values in a way that fit their internal concepts.
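The difference between these two ways of expressing a criterion can be illustrated with a small sketch. This is not the authors' deterministic or fuzzy implementation (those are described earlier in the article); it only contrasts, using assumed cost figures, a single-valued estimate with an interval estimate and one simple way (midpoints) of ordering interval-valued alternatives.

# Illustrative sketch (not the authors' implementation): contrast between a
# deterministic point estimate and a range-based estimate for one criterion
# (cost). The designs, cost figures, and midpoint comparison are assumptions
# used only to show the representational difference discussed above.

# Deterministic: a single value per alternative.
deterministic_cost = {"design A": 10.0, "design B": 12.0}

# Range-based: a (low, high) interval per alternative,
# e.g. "the cost is probably between 8 and 14".
interval_cost = {"design A": (8.0, 14.0), "design B": (11.0, 13.0)}

def interval_midpoint(interval):
    """One simple way to compare intervals: reduce each to its midpoint."""
    low, high = interval
    return (low + high) / 2

# Under point estimates, design A looks clearly cheaper; under the intervals,
# the midpoints are closer (11.0 vs. 12.0) and the ranges overlap, so the
# preference is less clear-cut -- the kind of extra judgment the text
# speculates subjects found harder to make.
print(sorted(deterministic_cost, key=deterministic_cost.get))
print(sorted(interval_cost, key=lambda d: interval_midpoint(interval_cost[d])))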
