A PTS Professional Paper


Web Site Design, Usability, Documentation

Three Usability Methods
                     


Phillip T. Scarborough
PTS Technical Writing
 
 
 

A COMPARISON OF THREE METHODS FOR THE 
USABILITY EVALUATION OF WEB SITES

Phillip Scarborough
Tec-Ed, Inc.
P.O. Box 51054
Palo Alto, CA 94303
 

Stephanie Rosenbaum
Tec-Ed, Inc.
P.O. Box 51054
Palo Alto, CA 94303
 

This paper compares three usability assessment methods for Web sites: usability focus groups, heuristic evaluation, and laboratory testing. It describes Web site usability issues and presents case histories of studies conducted by the authors, illustrating why each assessment method was chosen and how it was applied.

Usability Focus Groups

Usability focus groups apply a method that originated in market research to obtain rich qualitative information from target users, often when a Web site is still in the planning or early development stages. Focus groups are helpful for:

  • Assessing what users need

  • Determining how users make decisions

  • Exploring user opinions of new ideas

  • Collecting user feedback in order to improve existing Web sites

In a focus group, people with similar characteristics who don’t know one another discuss selected topics with the assistance of a moderator [1]. Focus groups place people in relaxed group situations, as opposed to the controlled situations typical of laboratory testing.

Strengths of Focus Groups

Encourage candor. The focus group moderator creates an atmosphere in which participants feel free to express diverse points of view, with no pressure to agree on or even to support particular ideas. Participants develop perceptions and make choices in part by interacting with the other people in the group, just as they do in real life. The relaxed environment leads to increased candor by the participants.

Allow discussion flexibility. The relaxed, open environment of the focus group allows new issues to emerge for discussion. The moderator can explore these unanticipated issues with participants and ask probing questions to collect detailed information.

Enable larger sample sizes. Laboratory testing is usually done with fewer than 20 participants because of the time and cost constraints of individual sessions. Not only are group (rather than individual) sessions a “multiplier” of the moderator’s time, but also the groups can grow somewhat in size—from a low of 5 or 6 people to a high of about 10 people per session—without dramatically increasing the resources needed.

Concerns About Focus Groups

Permit off-topic discussions. The moderator has less control of a focus group than of a highly structured laboratory test session. Focus group participants may influence not only each other but also the course of the discussion. Sharing group control can be inefficient and lead to irrelevant discussions, unless the moderator is skilled enough to keep the group focused.

Produce difficult-to-analyze data. Participants may modify or even reverse their positions after interacting with others, making focus group data more difficult to analyze. Usability specialists must take care to interpret comments in context and in order, and to avoid drawing conclusions too early.

Require trained moderators. A skilled moderator is key to obtaining the best results from a focus group. Untrained moderators typically lack the expertise to ask open-ended questions, use techniques like pauses and probes, and avoid influencing the discussion (especially if they are part of the product development or marketing team).

Behave in varied ways. Each focus group has unique behaviors. One group might be lethargic and dull, the next energetic and stimulating. Conducting multiple sessions helps balance the individual differences that may emerge from a single focus group.

Complicate scheduling. Each focus group requires that all participants come to a specified place at an appointed time. Laboratory testing offers more flexibility in scheduling, because individual participants usually have choices of test times and dates.

Methodology for Usability Focus Groups

The authors’ methodology for usability focus groups is to create a team of two usability specialists who work together to design and carry out the study. One usability specialist moderates the focus group sessions; the other takes detailed notes to expedite reporting of the results.

We use two-person teams because an effective focus group moderator must concentrate on the facilitating task: drawing out quiet group members, eliciting explanations of ambiguous or incomplete comments, making sure everyone’s opinions are respected. As a result, the moderator’s notes about the participants may be sketchy and uneven.

Also, a trained observer provides “outside objectivity” that’s hard for members of a product team to achieve. The same immersion in their development and marketing goals that creates a successful product makes it difficult for the product team to step back and see it with fresh eyes.

Each focus group typically consists of 7 to 10 participants who share key characteristics that relate to the study topics. The authors’ firm employs trained interviewers on our staff to recruit participants. Depending on the focus group topic, participant candidates may come from existing customer lists, our internal database of potential participants, or targeted newspaper ads. A detailed screening script and questionnaire, developed in conjunction with the client, ensure consistency in selecting participants.

The moderator conducts the focus groups from a script outline that lists high-level questions and issues, with supplementary “probing questions” for each. However, because of the dynamics of the group environment, focus group scripts can’t be as detailed as those for laboratory testing. The moderator must remain flexible, which in turn puts more demands for in-depth note-taking on the observer. We supplement our detailed notes with videotapes and back-up audiotapes of the sessions. We then analyze the collected information and prepare a written report on the focus group findings.

Product developers are sometimes concerned that usability specialists who moderate focus groups won’t have the in-depth product knowledge to answer participants’ questions, especially if the moderator comes from a consulting firm. The authors normally invite developers to observe focus group sessions. If difficult questions arise, the moderator offers the participants the opportunity to talk with the developers at the end of the session. This approach also gives developers the opportunity to ask questions that occur to them while observing, without interfering with the planned agenda of the focus group.

Usability Focus Group Case History: New Web-based Product

The authors recently conducted usability focus groups to explore a company’s new Web-based product, one that supported searching for information on the Web. The study sought to learn why and how intended users might use the product, and to gain specific feedback on alternative user interface designs.

Two 90-minute focus group sessions were held in the company’s offices, in a conference room set up with portable video equipment. A monitor was set up in an adjacent room so that the developers could observe the focus groups without disturbing or influencing the participants. One usability specialist moderated the sessions, and another took notes at the back of the room.

The focus groups began with questions about the participants’ current Web use, which helped us to understand the participants’ opinions in the context of their experiences. For example, we asked the participants how they currently conducted Web searches and what they thought about various aspects of browser and search engine technology.

We next introduced the first user interface design for the product by projecting slides on a screen at the front of the room. We asked participants what they perceived as the advantages and disadvantages of the product, how it would help or hinder their own search processes, and how they would feel about using it on a daily basis.

We repeated similar questions for each of the three user interface designs being considered. To counterbalance order effects (the influence of which design participants saw first), we reversed the order in which the alternative designs were presented to the second group. We would have preferred to conduct three focus group sessions, but project resources did not permit it. However, the behavior and opinions of the two groups were quite consistent.
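The counterbalancing step can be sketched in a few lines. This is a minimal sketch, not the authors’ procedure: the design labels "A", "B", and "C" and the function name are hypothetical. For two groups it reproduces the reversed order used in the case history; for three or more groups it assumes a simple cyclic rotation (a basic Latin square), which is one common option rather than something the study used.

```python
def presentation_orders(designs: list[str], n_groups: int) -> list[list[str]]:
    """Return one presentation order of the designs for each focus group."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups <= 2:
        # Two groups: original order, then reversed (as in the case history).
        return [list(designs), list(reversed(designs))][:n_groups]
    # Three or more groups (hypothetical extension): cyclic rotations, so each
    # design appears in each position once per len(designs) groups.
    k = len(designs)
    return [designs[g % k:] + designs[:g % k] for g in range(n_groups)]


print(presentation_orders(["A", "B", "C"], 2))
print(presentation_orders(["A", "B", "C"], 3))
```

Rotation balances which position each design occupies, but not which design immediately follows which; with only two sessions, simple reversal is the practical choice.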

At the end of each session, we asked what participants thought about specific features that the developers wanted feedback on. We also discussed which of the alternative designs the participants preferred, and why.

The focus group results showed that most participants in both groups clearly preferred the same user interface design. Typical concerns centered on the placement of buttons and icons, and on the overall ease of use of the product. However, participants did not agree on which features would be more or less useful to them.

Heuristic Evaluation

Heuristic evaluations are expert evaluations of products or systems, including information systems and documentation. They’re conducted by usability specialists, domain experts, or, preferably, “double experts” with both usability and domain experience. Evaluators use industry-accepted guidelines for usability (“heuristics”), their own experience from prior usability studies, their domain knowledge, and their ability to “put on the user’s hat” when identifying problems and recommending solutions.

Heuristic evaluation by two or more usability specialists can identify a majority of the usability problems in a Web site, with the problem-identification percentage increasing as you add evaluators [2]. More evaluators not only find more problems, but also provide a better indication of problem severity when they jointly analyze the results of their independent evaluations.
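The effect of adding evaluators is often described with a simple cumulative-discovery model: if each evaluator independently finds any given problem with probability λ, then i evaluators together find an expected proportion 1 − (1 − λ)^i of the problems. The sketch below assumes that model. The rate of 0.3 is an illustrative assumption, not a figure from the studies described in this paper, and real problems vary in how easy they are to spot.

```python
def proportion_found(detection_rate: float, evaluators: int) -> float:
    """Expected share of problems found by independent evaluators, assuming
    each finds any given problem with the same probability `detection_rate`."""
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    if evaluators < 0:
        raise ValueError("evaluators must be non-negative")
    # A problem is missed only if every evaluator misses it.
    return 1.0 - (1.0 - detection_rate) ** evaluators


# Illustrative only: 0.3 is an assumed per-evaluator detection rate.
for n in range(1, 6):
    print(n, round(proportion_found(0.3, n), 2))
```

Under this assumed rate, two evaluators find about half of the problems and five find roughly four in five; each added evaluator contributes less than the one before.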

Strengths of Heuristic Evaluation

Works with limited time and resources. Skilled evaluators can produce high-quality results in a limited time—usually two or three weeks, including a report of findings and recommendations—because the method doesn’t involve time-consuming participant recruiting or detailed scripting. Heuristic evaluation can enable many usability improvements to take place before a release deadline that would not permit laboratory testing.

Increases the value of lab testing. As the first phase of a two-phase usability effort, heuristic evaluation can greatly increase the value of laboratory testing. Heuristic evaluation “harvests the low-hanging fruit” by identifying obvious or clear-cut usability problems.

Unmasks hidden usability problems. Without prior heuristic evaluation, test participants may spend much of their sessions struggling with an obvious usability problem. Meanwhile, other equally important usability problems can be “masked” by the first problem and not be found during testing.

Is suitable for early use. The two-phase approach of heuristic evaluation followed by laboratory testing is consistent with current iterative software development practices. For example, heuristic evaluation can take place on an early prototype, before changes become costly, while laboratory testing can follow at the alpha stage.

Concerns About Heuristic Evaluation

Does not provide user data. The major drawback of heuristic evaluation is that, regardless of the evaluators’ skill and experience, they remain surrogate users and not typical users of the Web site. The results of heuristic evaluation are not actual (“primary”) user data and thus are slightly suspect. Real users can surprise us: they often have problems we don’t expect, or they breeze through where we expect them to bog down.

Rarely emulates key audience groups. Heuristic evaluation rarely emulates all the key audience groups for a Web site. For example, user groups accessing a site devoted to financial planning might include accountants, stock brokers, insurance company professionals, insurance agents, financial planners, bankers, SEC attorneys, and more.

Depends on evaluator expertise. Heuristic evaluation is highly dependent on the skills and experience of the evaluators. Usability specialists may lack domain expertise; domain specialists are rarely trained or experienced in usability methodology. The authors prefer to concentrate on usability expertise because the developers can usually fill gaps in domain knowledge.

Can appear to be just another opinion. The developers of a new Web site often have strong design opinions of their own. The results of a heuristic evaluation can sound like just another opinion; why should the developers accept the usability specialists’ opinion over their own?

Methodology for Heuristic Evaluation

The authors’ methodology for performing heuristic evaluation is to create a team of at least two usability specialists, who perform independent evaluations of the user interface and take notes on their findings. The evaluators then discuss their separate findings and find common ground for communicating the findings to the developers.

The findings are usability problems and concerns about the Web site, as well as notes of successful features that shouldn’t be changed. Often we can recommend specific improvements; sometimes we only suggest design directions to follow.

We generally organize our findings into four categories: user task support, UI behavior, presentation, and terminology. Although there tends to be overlap in findings among these categories, using the categories ensures that we give full attention to each aspect of a usability problem.
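One way to support the joint analysis is to tag each evaluator’s findings with one of the four categories and record which evaluators reported each problem. This is a hypothetical sketch, not the authors’ tooling: the `Finding` type, the function name, and the sample findings are illustrative. It also assumes the evaluators have already agreed on a shared wording for each problem, since deciding whether two notes describe the same problem is part of the discussion itself.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four categories named in the text.
CATEGORIES = ("user task support", "UI behavior", "presentation", "terminology")


@dataclass(frozen=True)
class Finding:
    category: str
    problem: str  # agreed-upon short description used to match findings


def merge_findings(by_evaluator: dict[str, list[Finding]]) -> dict[str, dict[str, set[str]]]:
    """Group findings by category, recording which evaluators reported each problem."""
    merged: dict[str, dict[str, set[str]]] = {c: defaultdict(set) for c in CATEGORIES}
    for evaluator, items in by_evaluator.items():
        for f in items:
            if f.category not in merged:
                raise ValueError(f"unknown category: {f.category}")
            merged[f.category][f.problem].add(evaluator)
    return merged


# Hypothetical findings from two independent evaluations.
findings = {
    "evaluator_1": [
        Finding("terminology", "unfamiliar label on search button"),
        Finding("presentation", "search results hard to scan"),
    ],
    "evaluator_2": [Finding("terminology", "unfamiliar label on search button")],
}

for category, problems in merge_findings(findings).items():
    for problem, who in sorted(problems.items(), key=lambda kv: -len(kv[1])):
        print(f"{category}: {problem} (reported by {len(who)})")
```

Problems reported independently by several evaluators are natural starting points for the severity discussion, though how many people noticed a problem is not the same as how severe it is.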

The evaluation team always delivers a written report of findings and recommendations. When practical, we give an oral results presentation as well, to discuss the findings with the developers.

Heuristic Evaluation Case History: Industrial Product Information Web Site, Phase 1

A major publisher of industrial product information has developed a Web site that engineers, managers, purchasing personnel, and other audiences use to look up product information. The target audiences may or may not already use Web information resources, so the site’s success depends on successful first use. The site must also remain easy to use for repeat visitors.

The publisher commissioned a series of usability studies of the Web site user interface. The first study was a heuristic evaluation to identify first-tier problems that could be found without collecting user data, such as inconvenient placement of screen elements, unfamiliar terminology, and cross-platform readability issues.

The software engineers developing the site were receptive to the value added by usability assessments. Many of the issues the evaluators identified had already emerged in development discussions and informal UI walkthroughs. In addition, although the prototype user interface had not yet undergone graphic redesign, the heuristic evaluation results gave the site’s graphic designers more insight into how users approached their search tasks.

Meanwhile, the developers worked from the evaluators’ suggestions to create a more usable interface for the next prototype, on which we conducted laboratory testing.

Laboratory Testing

In laboratory-based usability testing, people whose characteristics (or “profiles”) match those of the Web site’s target audience perform a sequence of typical tasks using the site. The test participants, usually working one at a time, perform the same tasks under controlled conditions.

A detailed description of formal usability testing methodology is beyond the scope of this paper. Several recent books and papers discuss laboratory testing in detail [3, 4]. A previous paper by one of the authors compares laboratory testing to several other usability methods [5].

Laboratory testing of Web sites can explore questions with measurable answers, confirm or challenge the assumptions of developers, and help choose between design alternatives. Recording user behavior on Web sites is especially challenging, because users of Web sites can take numerous possible paths to reach their goal—and often cycle through pages repeatedly.

Strengths of Laboratory Testing

Provides measurable data for decision-making. Laboratory testing is valuable when making clear-cut design decisions about Web sites. It can answer questions like:

  • Which of two alternative designs for a home page is more successful, and why?

  • What problems do people encounter performing product registration on a Web site? How long does the registration process take?

  • How long does it take people to find desired information on a search site? How many and what kind of errors do people make in specifying the desired information?

  • What problems do people encounter when downloading software from a Web site? How long does a typical download process take?
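Questions like these are usually answered with a few per-task measures. The sketch below is hypothetical: the `TaskResult` fields, the design labels, and all the numbers are invented for illustration, not data from the authors’ studies. It summarizes completion rate, median time for completed attempts, and mean errors for each design alternative. With the small samples typical of laboratory tests, such figures describe what happened in the sessions rather than establishing statistically significant differences.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskResult:
    participant: str
    design: str       # which design alternative the participant used
    completed: bool
    seconds: float    # time on task
    errors: int


def summarize(results: list[TaskResult]) -> dict[str, dict[str, float]]:
    """Per-design completion rate, median time of completed attempts, and mean errors."""
    summary = {}
    for design in sorted({r.design for r in results}):
        rows = [r for r in results if r.design == design]
        done = [r.seconds for r in rows if r.completed]
        summary[design] = {
            "completion_rate": len(done) / len(rows),
            "median_seconds": median(done) if done else float("nan"),
            "mean_errors": sum(r.errors for r in rows) / len(rows),
        }
    return summary


# Hypothetical results for two home-page designs, "A" and "B".
results = [
    TaskResult("P1", "A", True, 95, 1),
    TaskResult("P2", "A", True, 120, 0),
    TaskResult("P3", "A", False, 300, 4),
    TaskResult("P4", "B", True, 70, 0),
    TaskResult("P5", "B", True, 80, 1),
    TaskResult("P6", "B", True, 90, 0),
]
print(summarize(results))
```

Using the median rather than the mean keeps one very slow participant from dominating the time figure.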


Builds credibility. Measurable, quantitative data builds credibility for usability research, especially in technical or engineering-driven organizations.

Reassures managers. Corporate managers accustomed to numerical data usually find laboratory testing reassuring.

Convinces observers. If the Web site developers can watch actual test participants having problems using the site, this experience is often more convincing than the opinions of usability specialists, even when those opinions identify the same problems. (A dedicated laboratory facility isn’t required; developers can observe at a remote monitor through a video-camera feed, or watch videotapes after the test sessions.)


Concerns About Laboratory Testing

Ongoing revisions discourage testing. Web sites are revised more quickly and more often than software that will reside on a user’s computer. There may be less motivation to conduct formal usability testing because site developers are already planning revisions to the version of the site that will be tested.

Web sites can be moving targets. Especially when navigation from the home page is an issue, a changing Web site can degrade the script developed to explore the issues identified for the lab test. Web site developers have to resist modifying a particular Web page version while laboratory testing takes place; usability specialists have to be willing to adjust the script right up to the day before the test.

Lab tests require more resources. Even usability testing with tightly focused issues and 4 to 6 participants per audience group [6] requires more resources than heuristic evaluation and usually requires more resources than focus groups.

Lab tests often take longer. Because of the need to recruit participants with profiles that match the target audience for the Web site, it’s difficult to gain reliable data from a laboratory test in less than three weeks from the project start date, and many laboratory tests take considerably longer. The entire process typically takes from four to six weeks, including results reporting.

Methodology for Laboratory Testing

Most of the authors’ methodology for laboratory testing of Web sites is consistent with the published literature. We first describe the questions to be answered and the desired participant characteristics in a test design document. This document becomes the basis for a recruiting script, a participant-screening questionnaire, and a test administrator script that ensures all participants receive the same instructions and error remediation.

The vast number of user path alternatives at a Web site, especially a large informational Web site, makes usability test task scenarios trickier to scope. Rather than directing users to specific paths, our approach has been to allow users to go wherever they please to perform a task; we track where they go and their stated reasons. The greater the number of users recruited, the more we can assess which pathways are more frequently traveled and why.

The browser history list does not adequately record the order of pages visited, the links selected, or how much time users spent on each page. Server logs provide vast amounts of data that requires time-consuming analysis, and even then one does not know why a user spent a lot of time on a page.

The note-taking method used at the authors’ firm captures these types of information, which we believe are critical to understanding the scope of usability problems at a Web site [7]. We embed in the script specific prompts for note-taking about user activities. We also have a printout of the Web pages themselves on which to note where users visited and in what page order. Of course, we also videotape the test sessions, but our clients usually want the results more quickly than we can deliver if we need to watch all the videos.
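The information these notes capture (page order, links chosen, time on each page, and the participant’s stated reasons) maps naturally onto a simple per-participant log. This is a hypothetical sketch, not the authors’ note-taking form: it assumes the note-taker records the time, in seconds from the start of the task, whenever the participant arrives at a page. From that log it derives time spent on each visit and counts how many participants followed each exact page sequence.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Visit:
    page: str           # page identifier, e.g. as marked on the printout
    arrived_at: float   # seconds since the start of the task
    link: str = ""      # link or control used to reach the page, if noted
    reason: str = ""    # participant's stated reason, if given


def time_on_pages(visits: list[Visit], task_end: float) -> list[tuple[str, float]]:
    """Time spent on each visit, in order; a page visited twice appears twice."""
    result = []
    for i, v in enumerate(visits):
        leave = visits[i + 1].arrived_at if i + 1 < len(visits) else task_end
        result.append((v.page, leave - v.arrived_at))
    return result


def path_counts(sessions: list[list[Visit]]) -> Counter:
    """How many participants followed each exact sequence of pages."""
    return Counter(tuple(v.page for v in s) for s in sessions)


# Hypothetical sessions for one task.
p1 = [Visit("home", 0), Visit("search", 12, link="Search"), Visit("results", 40),
      Visit("search", 95, reason="too many results"), Visit("results", 110)]
p2 = [Visit("home", 0), Visit("search", 10), Visit("results", 30)]
p3 = [Visit("home", 0), Visit("search", 8), Visit("results", 25)]

print(time_on_pages(p1, task_end=150))
print(path_counts([p1, p2, p3]).most_common(1))
```

Counting exact sequences is the simplest choice; with many participants, grouping paths by shared prefixes or by the first link chosen may show the more frequently traveled pathways more clearly.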

Laboratory Testing Case History: Industrial Product Information Web Site, Phase 2

Concurrent with performing the heuristic evaluation of the industrial products Web site, the usability team planned the first round of laboratory testing, which was performed with a limited prototype that reflected the recommendations arising from the heuristic evaluation. The task scenarios for the usability test were based on the preliminary product information that the Web site would access and the concerns identified during heuristic evaluation. The results of the laboratory testing informed the graphic design revisions already underway.

Two usability specialists administered and observed the test sessions, using participants who met the screening criteria for people who would be likely users of the Web site. During the test sessions, participants “walked through” some two dozen Web pages to search or browse for particular types of products, view product information, and refine searches.

The usability team collected both qualitative and quantitative data, including which choices users made to find product information and how satisfied they were with the results. The test administrator also interviewed participants about the improvements they wanted in the final Web site and their preferences for planned features, such as automatic updates about selected products.

Participant experiences and comments indicated that the Web pages were generally logical and easy to use. However, certain terminology, screen elements, and page-layout choices continued to slow first-time use; participants also wanted more options at the highest level of the information hierarchy. The authors’ recommendations included alternatives that would address participants’ problems in these areas.

Conclusions

Which of the three methods presented in this paper should an organization try first for evaluating Web site usability? Suppose an organization or company has just a small window of time in which to prove the value of usability research in the development cycle. In that case, the authors recommend starting with collection of primary user data through laboratory testing. Actual user data will convince more people, especially in engineering-driven companies, than will usability focus groups or heuristic evaluation.

If an organization is receptive to usability research or already has a usability program in place, an iterative sequence of usability focus groups, heuristic evaluation, and laboratory testing achieves the greatest value from each method:

  • The usability focus groups gather user requirements and opinions to support product design.

  • The heuristic evaluation makes a pass at catching the most visible usability problems (“the low-hanging fruit”) in an early prototype.

  • The laboratory testing validates the resulting product improvements and focuses on deeper issues. In addition, iterative testing is critical to uncovering issues arising from resolution of earlier problems.

Would the authors ever recommend usability focus groups or heuristic evaluation alone? Yes, when resources and priorities make it necessary—because some usability evaluation is better than none at all.

For example, if a new Web site differs greatly from its predecessors or if the development team recognizes they need more information about user needs and desires, it’s especially important to conduct usability focus groups early in the design process. Another opportunity may occur later to obtain resources for usability testing.

On the other hand, if a site has already undergone iterative testing and is now receiving minor revisions, or if a site has an extremely small usability budget, then a modest heuristic evaluation project can be entirely appropriate.

Every usability project the authors have performed over the past ten years has produced many recommendations, usually both ideas for immediate implementation and others that influence long-term development. Although developers can generalize more reliably from laboratory testing data, all usability research methods can produce valuable results.

Acknowledgments

Portions of this paper appeared in a slightly different form in the SIGDOC 97 Proceedings, published by the Association for Computing Machinery.

References

1.   Krueger, R. (1988). Focus Groups: A Practical Guide for Applied Research. Newbury Park, CA: Sage Publications, Inc.

2.   Nielsen, J. (1993). Usability Engineering. New York, NY: Academic Press, Inc.

3.   Dumas, J.S. and Redish, J.C. (1993). A Practical Guide to Usability Testing. Norwood, NJ: Ablex.

4.   Kantner, L. (1994). “Techniques for Managing a Usability Test.” IEEE Transactions on Professional Communication, 37(3), pp. 143-148.

5.   Rosenbaum, S. and Kantner, L. (1995). “Alternative Methods for Usability Testing.” ErgoCon ’95 Proceedings, San Jose, CA, pp. 47-51.

6.   Virzi, R.A. (1992). “Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough?” Human Factors, 34(4).

7.   Kantner, L. (in press). “Following a Fast-Moving Target: Recording User Behavior in Web Usability Testing.” Common Ground.