Resources: UPA 2004 Idea Markets
Best Practices for Utest Logging
Activator: David Dayton, SPSU
As a result of this Idea Market session, I have identified four types of utest logging:
- The first type is described in popular usability textbooks: the logger records predictable events which are sorted on the fly into one or more categories. This type of logging produces quantitative data that can be analyzed with statistical methods and compared to pre-defined benchmarks to arrive at conclusions about the usability of the product tested. None of the five session participants who shared their logging methods with me reported that they used this type of logging.
- Another approach to logging observations uses free-form handwritten note-taking to capture significant events indicating a possible usability problem. The notes are later analyzed to group and categorize events and rate their severity. Four of the five participants who shared information on this topic used some form of this approach.
- A third approach uses software designed for logging to combine the first two methods: the logger codes events that fall into certain pre-set categories but also types descriptive notes so that observers of the session can later review the log to identify and discuss the most significant problems. One participant used this approach.
- The logging approach I use focuses on capturing the “story” of a test session in shorthand form with a running description that highlights significant events; time-stamping is used to sync each block of log text to the video tape. Occasionally, I will type a category label for a particular event, such as “navigation problem” or “mental model gap,” to make sure the event is noticed and examined by the entire team during the analysis session.
One participant’s simple criterion for deciding whether a log is effective met with general agreement among the participants who heard or read it: a log is effective if it allows a team to find answers to its questions without having to review the session tapes. This same participant noted that a common mistake among those new to logging is what I would call “observation overkill”: recording so much information that it impedes the team when reviewing the logs during analysis sessions.
After mulling over the information collected during the session, I found myself intrigued by these observations:
- No one who stopped by during the session spoke up for the purely quantitative approach to logging described as the first type above.
- The description of an effective log that emerged from the session implies that a log’s main function is to serve as a shared record that the team can use in lieu of the session tapes to reach a consensus about what happened. However, that function seems at odds with the second type of logging described above. Three of the free-form note takers did not mention whether their observations were time-stamped and shared with others during analysis sessions as a baseline record of what occurred during the sessions. Should their methods count as “logging” if their notes do not constitute a baseline record subjected to inspection and analysis by the entire team of observers? For similar reasons, should the individual note taking by members of a team that culminates in an affinity analysis session count as logging, or is it a method of recording observations that should be distinguished from logging with a different name?
Though only five participants shared their methods of utest logging during this session, at least that many more stopped by to read the questions and notes and to listen to whatever discussion was happening at the moment. I think the session produced worthwhile information that I will use as a starting point for a deeper investigation into logging practices.
In this Idea Market session, I solicited answers to the following questions:
- What approaches are most widely used to log observations during usability tests?
- What are the hallmarks of an effective usability test log?
- What are the key issues that practitioners disagree about with regard to logging?
- What common mistakes do newbie loggers make?
- What important guidelines should newbie loggers learn?
Under headings referring to these questions, I will summarize in detail the information and opinions that Idea Market participants shared with me.
Approaches to Logging
Some widely used textbook explanations of usability testing procedures describe logging as a purely quantitative data-collection process: the logger records the occurrence of certain predictable events using codes that put each event into one or more categories; the time of each event is also recorded, along with the total time for each task.
None of the five participants I talked with during this session used this approach. Only one participant sorted events on the fly into pre-set categories. He used category coding in combination with a running description typed into a software program designed for utest logging. Three other session participants reported that they logged utest observations with handwritten notes. Two of the three mentioned that they used either a highlighter or sticky notes to mark significant events in their notes when analyzing and categorizing the events.
One participant described logging as a function shared by all the observers on the utest team. At this participant’s company, each team member observing a test uses sticky notes to record short descriptions of usability problems as they occur, with a limit of one significant event per sticky note. The data thus collected are analyzed one task at a time after each test session. During the data-analysis sessions, team members use affinity diagramming to group and label their sticky notes.
At Southern Polytechnic’s Usability Center, I use the logging approach described by my colleague Carol Barnum in Usability Testing and Research. I use a software application with a time-stamp function that we sync with the video tape for each test session. I use the note-taking function in the software to type a detailed running description of the evaluator’s actions, along with his or her comments revealing reactions, assumptions, expectations, and problem-solving process. I also record significant observations whispered by other members of the team in the control room.
Our utest teams always include members of the client’s product development team and, usually, documentation and tech support people as well. During the data analysis session held immediately after the final evaluator has finished, the team sits down to review the logs one at a time. Working from their own notes, team members suggest significant events, which we find in the log and discuss, deciding whether each event represents a problem and, if so, agreeing on terms to describe it concisely. These findings are listed on a flip chart for each user.
At the end of the analysis session, the team reviews the flip chart pages taped to the walls to sort and categorize the significant events in a table whose column headings use terms such as “mental model,” “navigation,” “feedback,” etc. The table shows which events apply to which users; this helps the team to identify problems that should be given top priority.
Hallmarks of an Effective Log
One participant defined an effective log as one that enables the team to answer the questions being asked without having to review the video tapes of the test sessions.
That definition seemed to satisfy everyone who heard or read it during the session.
The same participant said that effective logging software provides a note-taking function, a time-stamp function, and an optional function that lets the logger quickly categorize events with pre-set labels or terms. This participant said the following were the most important categories he used when logging:
- Usability problem
- Error message
- User action
- User suggestion
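The three software functions and four categories just listed can be made concrete with a minimal Python sketch of the kind of log record such a tool might keep. It assumes, hypothetically, that each entry carries a wall-clock time stamp, a free-form note, and zero or more labels drawn from the four categories above. The names LogEntry, tape_offset, and entries_in are illustrative only and are not taken from any actual logging product. Subtracting the time the tape started from an entry's time stamp gives the point on the video where the event can be found, and filtering by label gathers every event coded, for example, as a usability problem.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical label set mirroring the four categories listed above.
CATEGORIES = frozenset({"usability problem", "error message", "user action", "user suggestion"})


@dataclass
class LogEntry:
    time: datetime  # time stamp captured when the logger starts the entry
    note: str  # free-form running description typed by the logger
    categories: frozenset = field(default_factory=frozenset)  # optional pre-set labels (may be empty)

    def __post_init__(self) -> None:
        # Reject labels outside the pre-set list so coding stays consistent.
        unknown = set(self.categories) - CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")


def tape_offset(entry: LogEntry, tape_start: datetime) -> timedelta:
    """Position of the entry on the session video, assuming the tape started at tape_start."""
    return entry.time - tape_start


def entries_in(log: list, category: str) -> list:
    """All entries coded with one category, in time order."""
    return sorted((e for e in log if category in e.categories), key=lambda e: e.time)


if __name__ == "__main__":
    # Illustrative session data, not drawn from any real test.
    start = datetime(2000, 1, 1, 10, 0, 0)
    log = [
        LogEntry(start + timedelta(minutes=4, seconds=10),
                 "Looks for Save under Edit menu; says 'I thought it would be here'",
                 frozenset({"usability problem", "user action"})),
        LogEntry(start + timedelta(minutes=1, seconds=30), "Reads task card aloud"),
        LogEntry(start + timedelta(minutes=6), "Suggests a toolbar button for Save",
                 frozenset({"user suggestion"})),
    ]
    for e in entries_in(log, "usability problem"):
        print(tape_offset(e, start), e.note)
```

Because the labels are optional, an entry with no category is still valid, so the typed notes remain the backbone of the log. Validating labels against a fixed list is one assumed way to keep coding consistent; whether any given tool enforces this is not something the session established.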
Key Issues in Contention
Participants in the session offered no remarks on this question. I had thought I might hear about disagreements concerning the value of the strictly quantitative approach to logging that is described in some widely known usability textbooks. However, only one participant reported the logging of events with pre-set category labels, and he put as much emphasis, or more, on the notes he typed to describe events.
I have examined several usability-test logging applications, and they all provide the option of coding events on the fly with pre-set categories. Some of the applications allow as many as three different categories to be applied to a single event. The quantitative logging paradigm, then, would seem to be strong and widely applied. Whether it really is widely used and to what degree is one of several questions about utest logging that I intend to investigate further.
Common Newbie Mistakes
My question about mistakes commonly committed by inexperienced loggers drew one response. A participant said that newbie loggers tend to try to capture everything that happens and everything the user says. The result is a log that’s impossible to review rapidly to identify significant events.
Guidelines for Newbies
Participants did not offer specific guidelines for newbie loggers to learn, but the participant who noted the newbie tendency to try to capture too much information ended up suggesting a guideline by implication: To become effective loggers, newcomers need to filter their observations on the fly, keeping unimportant and marginally informative details to a minimum. I believe that learning how to do that effectively requires loggers to work with their teams to achieve a clear, concise style of recording observations that captures what the team thinks is important and avoids, as much as possible, the injection of purely subjective judgments into the data.