Concept and Diagnostic Research for the Web:
Lessons Drawn from Case Studies 1995-1998

by: Cheryl Harris, Ph.D.

Paper presented at: ESOMAR "Worldwide Internet Seminar," Paris, January 1998
Copyright Cheryl Harris 1998. All rights reserved.

Forrester Research has projected that expenditures for website development alone will reach $10 billion annually before the year 2000 (Forrester, 1997). Currently, the cost of building a full website in major metropolitan areas such as New York City is estimated at $302,550 -- and that is without many of the added features that are increasingly becoming standard, such as Java applets, cookies, transaction capabilities, dynamic page delivery, a robust search engine, or chat forums (NetMarketing, 1997). Although online spending still represents only a fraction of the overall advertising and marketing budgets of most advertisers (Jupiter Communications AdSpend, 1997), developing an online presence already demands significant investment. Furthermore, there is evidence that a poorly conceptualized and executed website may do more to harm established brand equity than to enhance it.

These are reasons enough to consider ways in which the research industry may cooperate in developing appropriate methods for testing website concepts, content, features and functionality with the target audience(s) of the site. New media agencies and website developers find they are being asked to track the performance of their sites after launch, well beyond the insights provided by traffic data, and they are increasingly interested in ensuring that their work matches audience expectations prior to and during site development. Based on a large number of accumulated international case studies in which the websites of "Fortune 100" firms in various stages of development have been tested with prospective visitors, this essay will discuss implications for creating models and protocols for future website tests.

Approaches for Website Testing

A wide range of methods has been applied to the problem of testing websites. Perhaps because many agencies specializing in interactive media have strong ties to the software industry, the "usability" testing method has been popular and widely used. Usability testing is a means by which software developers watch how users interact with their product in a laboratory environment. Users are given a set of tasks to accomplish and are asked to report problems they encounter in the software as they work on their tasks.

Typically, users are debriefed after the session and patterns of interaction are examined across several such sessions. Sometimes usability tests are conducted in groups, with pairs of peers or friends assigned to a workstation and commenting on the software in tandem. Despite the widespread application of usability studies in the software industry, there appear to be many problems with extending the methodology to the World Wide Web. First, there are a number of distinctive differences between a software package and a website. People generally use a given software package to accomplish a limited number of fixed tasks, such as word-processing, statistical analysis, or image manipulation.

The expectations associated with a particular software package tend to cluster around a predictable set of functions and interface issues. However, WWW users come to the Web with almost unlimited objectives. Each user's interaction with a site warrants an independent investigation of the situational expectations attached to that interaction. For example, "usability of a site depends on what users are trying to accomplish. Are they surfing? Doing research? Buying products? Downloading software?" (Spool, 1997). Furthermore, individual websites are constructed with differing objectives in mind, which could include branding, retailing, or public service, among many other potential goals. This greatly stretches the "usability" methodology: in a recent manual entitled "WebSite Usability: A Designer's Guide" (Spool, 1997), the authors admit that after applying usability testing methods to a number of websites, they found that "websites aren't like software....We assumed that websites would be just another form of software and could be tested similarly, but we were wrong." The authors suggest that the accepted rules for testing software do not seem to apply to website evaluation, and that few of their hypotheses held in this new area. In short, usability testing in its conventional form has many problems in addressing the Web, and it also takes little account of more than 50 years of social-scientific and cultural-studies-based literature in communication and media research.

Usability testing models, for example, are ill-equipped to consider the influence of competing advertising within a given environment, nor can they take into account the evolving editorial matter which forms the basis of most sites. Moreover, the experimental laboratory method is not ideal for evaluating audience/user behavior, because it severely decontextualizes that behavior -- something audience researchers have long recognized as a problem in evaluating the interaction between media and audiences. Therefore, other methods of evaluating user-website interaction, methods which allow this interaction to take place in its naturalistic setting, appear to have more promise.

Websites are also being tested in one or more of these ways. Surveys are posted on the website and all users are invited to fill them out. In an elaboration of this approach, the server selects every nth visitor to the site and serves that visitor a "pop-up" invitation to complete a survey (which can link to an internal form or take the visitor offsite to a vendor's server, then return the visitor to the original page request). Occasionally, surveys will be sent out to a registered user base or even randomly, in the hope that the person on the other end of the e-mail address may have visited the site in question at some time in the past. Perhaps the most significant problem in using surveys for website evaluation is that a comprehensive evaluation of a site seems better suited to qualitative approaches: the exploratory nature of the web and the variety of objectives associated with its use cannot easily be explained by imposing categories on it from above. Lacking externally and internally valid models to predict web user behavior at this early stage of the medium, it is difficult to construct adequate survey instruments.
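As a concrete illustration of the nth-visitor intercept just described, the sketch below invites every nth page request to a pop-up survey. It is a minimal, hypothetical example in Python rather than a description of any particular vendor's CGI software; the counter, the selection interval, and the survey URL are all assumptions.

```python
# Minimal sketch of an nth-visitor survey intercept (hypothetical, for illustration only).
# A real deployment would run inside the web server or a CGI script and would also need
# to suppress repeat invitations (e.g., via a cookie) for visitors already intercepted.

import itertools

SURVEY_URL = "https://research.example.com/survey"  # assumed vendor survey location
INTERCEPT_INTERVAL = 100                            # invite every 100th page request

_request_counter = itertools.count(1)

def maybe_invite(page_html: str) -> str:
    """Return the requested page, appending a pop-up invitation for every nth request."""
    n = next(_request_counter)
    if n % INTERCEPT_INTERVAL == 0:
        popup = (
            "<script>"
            f"window.open('{SURVEY_URL}', 'survey', 'width=400,height=300');"
            "</script>"
        )
        return page_html + popup
    return page_html

if __name__ == "__main__":
    # Simulate 300 page requests; requests 100, 200 and 300 receive the invitation.
    invited = sum("window.open" in maybe_invite("<html>page</html>") for _ in range(300))
    print(f"{invited} of 300 simulated visitors were invited")
```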

There is also the difficulty that survey responses obtained in this manner tend to be polarized -- only the most delighted visitors or the most dissatisfied seem to respond, thus skewing the data. Alternatively, some companies have been developing ways of performing qualitative website evaluations in an online environment. Initially these tended to take place in chat-room environments such as the IRC (Internet Relay Chat) system, MUDs (Multi-User Dungeons) or MOOs (Multi-User Dungeons, Object Oriented), but because these environments have little or no ability to include graphic images, they offered no way to evaluate the site in a structured fashion while looking at it in real time. These text-based areas of the Internet also frequently had security problems. For this reason, firms like my own have focused on developing secure interviewing environments that are web-based, so that discussion with one or more visitors may take place along with full multimedia immersion (website, webpages, audio, video, photos, etc.). Other advantages of web-based interviewing environments include the ability to observe or participate in an interview from wherever web access is available, which is now quite widespread. In addition, respondents experience a website as they do "naturally" -- on their own equipment, at their own pace, and with the bandwidth they would normally use. This decreases the chance of introducing bias due to the superior equipment or connection speed that a centralized "laboratory" test environment might enjoy.

Sampling for Website Evaluations

As the author (Harris, 1996, 1997, 1998) and other researchers have noted, sampling for online research is one of the greatest challenges faced in moving the field forward. Because no master database of online users exists (or is likely to be available in the foreseeable future), probability or true random sampling designs cannot realistically be achieved. In addition to this limitation, Internet users are notoriously intolerant of unsolicited communications (known as "spam"), and so invitations to participate in surveys or other research projects, even by credible and well-known research firms, are often met with anger or even various forms of unpleasant retribution.

Research firms which randomly solicit participation by "broadcast" e-mail are in danger of having their Internet access pulled by their service providers, who have agreed to a zero-tolerance policy on spam. This has led most research organizations who practice online research to form internal panels, which can be screened, validated and sampled on demand. Much work needs to be done to advance our knowledge of panel management for this special population, which is quite transient and which is also quite capable of cloaking or disguising identity in ways perhaps unavailable to members of "traditional" mail or telephone panels. We do not yet know what constitutes panel maturation effects for online panels, or even what the optimum "black-out" period might be to avoid panel wear-out for members. It is clear that online panels can be very large and capable of very fine segmentation with the right tools. The fully international scope of online panels and the ease of access to members via e-mail, coupled with the low cost of maintenance, make for a powerful equation. Recruiting panel members has been accomplished in a number of ways, some of them rather costly: buying lists (which still puts one in danger of spamming), inviting participation through a phone or mail contact, buying online banner presence that invites clickthrough to a screener, or simply registering one's site with various search engines/directories and hoping for traffic. Several major research companies in the U.S., such as Simmons Market Research Bureau (SMRB) and National Family Opinion (NFO), have benefited by being able to identify subsets of their large consumer panels who report online usage. These subset panels have the advantage of already being well-screened, with a wealth of household data attached to them, but it is unclear whether there are intervening variables associated with pre-existing membership in a panel which may make these panelists less representative of the online population.
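To make the panel-management issues above more concrete, the following sketch shows one way an internal panel might be screened and sampled on demand while enforcing a "black-out" period between invitations. It is a hypothetical Python illustration; the field names, the 90-day black-out window, and the screening criterion are assumptions, not a description of any firm's actual panel software.

```python
# Hypothetical sketch: sampling an internal online panel with a "black-out" period.
# Field names, the 90-day window and the screening rule are illustrative assumptions.

import random
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class PanelMember:
    member_id: str
    country: str
    uses_web_weekly: bool          # screening attribute collected at recruitment
    last_invited: date | None      # None if never invited before

def eligible(member: PanelMember, today: date, blackout_days: int = 90) -> bool:
    """Screen for the study and enforce the black-out period between invitations."""
    if not member.uses_web_weekly:
        return False
    if member.last_invited is None:
        return True
    return (today - member.last_invited) > timedelta(days=blackout_days)

def draw_sample(panel: list[PanelMember], n: int, today: date) -> list[PanelMember]:
    """Randomly sample n eligible members (simple random sample of the eligible pool)."""
    pool = [m for m in panel if eligible(m, today)]
    return random.sample(pool, min(n, len(pool)))

if __name__ == "__main__":
    today = date(1998, 1, 15)
    panel = [
        PanelMember("A01", "FR", True, date(1997, 9, 1)),   # outside black-out: eligible
        PanelMember("A02", "US", True, date(1997, 12, 20)), # inside black-out: excluded
        PanelMember("A03", "UK", False, None),              # fails screener: excluded
        PanelMember("A04", "DE", True, None),               # never invited: eligible
    ]
    for m in draw_sample(panel, 2, today):
        print(m.member_id, m.country)
```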

Generally, we have found quota and stratified sampling schemes to be reasonably robust for online applications. Although nth-selection sampling is a promising way of sampling website visitor traffic, it must also be used with caution, for several reasons. First, the software used for nth selection is typically based on CGI code with a JavaScript pop-up at various points in the intercept. Some web browsers interact improperly with JavaScript or have problems with CGI calls. More research also needs to be done to determine response-bias effects. For example, what differentiates the "refusals" from the "cooperatives" in a website intercept attempt? Some respondents have reported being irritated by the obtrusiveness of the intercept device, or indeed see it as a violation of their privacy. As online users are highly sensitive to privacy issues and are fearful of the ways in which computers can track their behavior without their knowledge, it is reasonable to expect that response effects related to the intercept method of sampling could be significant.

Toward a Taxonomy of Website Elements

Even a casual web user is aware of the many different elements which are now deployed in website design. These include advertising content delivered in many ways (animated or "flat" banners, with animated banners now far more common than the outdated "flat" form; interstitial ads; dynamically served ads; ads in multiple page positions; "keyword" ads; etc.), the "editorial" content of a site, graphics and other multimedia elements (such as audio, video, or animation delivered by software such as Shockwave and Flash), and page navigation strategies. Additional features might include avatar- or text-based chat communities, e-mail, and shopping baskets or other aids to online commerce.
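One way to keep such a taxonomy consistent across evaluations is to encode it as an explicit coding scheme that every page or site is scored against. The sketch below is a hypothetical Python illustration of that idea, built from the element categories listed above; the category labels, items, and scoring function are illustrative assumptions rather than a published standard.

```python
# Hypothetical coding scheme built from the element taxonomy discussed above.
# The categories and items are illustrative; a real scheme would be refined per study.

WEBSITE_ELEMENT_TAXONOMY = {
    "advertising": ["animated banner", "flat banner", "interstitial",
                    "dynamically served ad", "keyword ad"],
    "editorial": ["articles", "product information", "news"],
    "multimedia": ["audio", "video", "Shockwave/Flash animation"],
    "navigation": ["menus", "search engine", "site map", "frames"],
    "community": ["text chat", "avatar chat", "e-mail contact"],
    "commerce": ["shopping basket", "transaction capability"],
}

def code_page(observed_elements: set[str]) -> dict[str, list[str]]:
    """Apply the same categorical definitions to a page: which items from each
    category were observed? Uncategorized observations are flagged for review."""
    coded = {cat: [e for e in items if e in observed_elements]
             for cat, items in WEBSITE_ELEMENT_TAXONOMY.items()}
    known = {e for items in WEBSITE_ELEMENT_TAXONOMY.values() for e in items}
    coded["uncoded"] = sorted(observed_elements - known)
    return coded

if __name__ == "__main__":
    splash_page = {"animated banner", "search engine", "video", "guestbook"}
    for category, found in code_page(splash_page).items():
        print(f"{category}: {found}")
```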

Because it is commonly accepted that users exhibit very purposeful behavior when visiting sites -- they want to find what they were seeking, and as quickly as possible -- and often do not go beyond the "splash" or home page in deciding whether or not the site will be productive, designers try to load up the splash page with as much information and as many features as possible to entice visitors to stay and explore. However, this practice has in many cases resulted in very cluttered design solutions. Developing a comprehensive taxonomy of elements to be evaluated in a test is a difficult and ever-evolving task. In our evaluation protocols, we first parse each site's elements by asking several questions as we analyze a site internally. For example: what are the elements, particularly on the splash page, that are most likely to contribute to a user's perception of a consistent brand identity or image? Are there patterns in the color palette or graphics utilized which could be isolated for analysis? What strategies are available to encourage the visitor to remain in the site as long as possible? What elements in the page layouts influence navigation choices (and possibilities)? Where should we be most alert for opportunities for the visitor to leave prematurely? When we interact with users during the test sessions, we ask why visitors exhibit each and every navigation behavior that we see.

We sometimes create a content analytic scheme for comparative studies, and are careful to apply the same categorical definitions across pages or across sites. This is helpful in making certain that critical elements are examined in our discussions with visitors. Other approaches which have been used for website evaluation include subjecting users to a limited battery of uses-and-gratifications oriented scales (Eighmey, 1997), and measurements which examine behavior in terms of time spent on a page and number of page requests per visit while factors such as page background color, image size, use of JavaScript, presence of frames, and celebrity endorsements are manipulated (Dreze and Zufryden, 1997). While these approaches do not seem well suited to reflecting the comprehensive nature of a user's web experiences, they do shed light on some of the many factors which may influence the perceived quality of those experiences.
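For readers who want to see how the two behavioral variables just mentioned might be derived in practice, the sketch below computes time spent per page and page requests per visit from a simplified clickstream. It is a hypothetical Python illustration under assumed data structures, not the instrumentation used by Dreze and Zufryden (1997).

```python
# Hypothetical sketch: deriving "time spent on a page" and "page requests per visit"
# from a simplified clickstream. The record format is an assumption for illustration.

from collections import defaultdict

# (visitor_id, seconds_since_session_start, url) -- each visitor here has one visit
clickstream = [
    ("v1", 0,   "/index.html"),
    ("v1", 40,  "/products.html"),
    ("v1", 95,  "/order.html"),
    ("v2", 0,   "/index.html"),
    ("v2", 20,  "/chat.html"),
]

requests_per_visit = defaultdict(int)
time_on_page = defaultdict(list)            # url -> list of dwell times in seconds

# Sort by visitor and time, then pair each request with the next one from the same visitor;
# the gap approximates dwell time (the final page of a visit has no observable dwell time).
ordered = sorted(clickstream)
for (visitor, t, url), (next_visitor, next_t, _) in zip(ordered, ordered[1:] + [(None, 0, None)]):
    requests_per_visit[visitor] += 1
    if visitor == next_visitor:
        time_on_page[url].append(next_t - t)

for url, dwells in time_on_page.items():
    print(f"{url}: mean dwell {sum(dwells) / len(dwells):.0f}s over {len(dwells)} views")
for visitor, count in requests_per_visit.items():
    print(f"{visitor}: {count} page requests this visit")
```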

Interviewing Techniques

In recent months we have experimented widely with using projective interviewing techniques within a range of website evaluation tests. Projective interviewing is commonly used in qualitative research, and has proven valuable in identifying underlying associations with brands, products, and concepts. The difficulty with applying projective tests online appears to be the lack of nonverbal supportive data (for example, the respondent's facial expression, intonation, or body language), which can contribute greatly to an overall understanding of the cultural and social assumptions attached to a verbalization. We are currently studying ways in which we can allow respondents to re-introduce this nonverbal data within a virtual environment that is predominantly text-based. We do this by providing devices such as an extensive customizable library of emotive icons and vocabulary clusters which express various postures and other reactions (for example, the respondent may select the descriptors "nods enthusiastically" or "grins wickedly" from the library as an attachment to a typed verbalization). This encourages participants to stay in touch with their physical and emotional responses as part of their overall contribution to the discussion, and to include these continually in their remarks. We have also had good results with associative games and exercises in which respondents are asked to ascribe personality attributes to a website, brand, or representation of a concept. The standard focus group technique of asking participants to think of the product as a person, and to describe that person fully ("What gender would this person be? What would he/she look like? What kind of work would he/she do? What would he/she wear? Drive? Think about?"), has worked well for us in online environments, once the nonverbal expressive strategies are added to the mix. In fact, we have found consistently that online respondents are less subject to the peer-pressure effect so frequently observed in focus groups around projective exercises in particular, probably due to the influence of their apparent anonymity. We acquire extensive and very detailed data in this way, going beyond what we might expect in "offline" groups and without the over-emphasis on consensus-building which seems to be a feature of group discussions. Projective exercises across an accumulated series of groups or depth interviews, with 25 data points or more, produce patterns which are easily recognizable, yet with enough variance to give us confidence in the thoroughness of the interviewing.

Incentivization

Ensuring reliable cooperation rates in online studies has been an ongoing concern for researchers. While participants in qualitative studies that demand a substantial investment of time require per-respondent incentive fees, just as conventionally executed focus groups and depth interviews do, we have found that the "magic number" for compensating online respondents tends to be lower than in offline groups. For example, a face-to-face focus group in a major urban center in North America which requires an "average" demographic profile may run as much as $50-$80 per respondent in incentive fees -- much more for a specialized or rarefied demographic. Online, cooperation rates are acceptably high with incentive fees of between $25 and $40, which represents a considerable savings in overall costs per project. This may be because the time commitment for online groups is shorter (typically one hour as opposed to two in "offline" groups) and there is no travel time, as respondents may participate from anywhere they have Internet access. For online survey research, we have observed no appreciable difference in cooperation rates between per-respondent incentivization plans and more generalized strategies such as awarding a prize by random drawing and then announcing the winner. Because the cost and administrative burden of fulfilling incentive fees for a large survey sample could be very steep, we have found positioning incentives as "awards" -- as long as the reward is perceived as a valuable one for the population in question -- to be a powerful means of incentivization.
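To make the cost difference concrete, the sketch below works through the per-group incentive arithmetic implied by the figures above, using an assumed group size of ten respondents; the group size and the comparison itself are illustrative assumptions only.

```python
# Illustrative arithmetic only: comparing incentive outlay for an offline versus an
# online focus group, using the fee ranges cited in the text and an assumed size of 10.

GROUP_SIZE = 10                      # assumed number of respondents per group
offline_fee = (50, 80)               # USD per respondent, face-to-face group
online_fee = (25, 40)                # USD per respondent, online group

def group_cost(fee_range: tuple[int, int], size: int = GROUP_SIZE) -> tuple[int, int]:
    """Total incentive outlay for one group, as a (low, high) range."""
    return fee_range[0] * size, fee_range[1] * size

offline_low, offline_high = group_cost(offline_fee)
online_low, online_high = group_cost(online_fee)

print(f"Offline group incentives: ${offline_low}-${offline_high}")
print(f"Online group incentives:  ${online_low}-${online_high}")
# Under these assumptions, online incentives run roughly half the offline outlay,
# before any savings on facility rental, travel, or refreshments are counted.
```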
Appropriate Stages for Website Testing

Websites may benefit from systematic evaluation at multiple stages of their development. Of course, the protocols utilized must be adjusted depending on the stage of development at the time of the test, but there should be awareness of the importance of gathering data capable of being analyzed longitudinally. Stages which are appropriate for testing include:

- Concept stage: themes, ideas and topics can be tested with the target audience(s) in the same environment in which the finished product will be encountered.
- Storyboard stage: content/copy can be roughed in at this point, or the layout/graphic design and planned features can be targeted for test.
- "Beta" stages: a website goes through a number of iterations as features and functions are laid in; editorial content usually continues to change as the beta evolves.
- Pre-launch: a finalized beta, ready for launch but not yet public.
- Specified points within the first year after launch, or in response to observed problems such as underperformance against goals (traffic, sales, ad clickthrough, etc.).
- Pre-determined re-design periods.

Cross-Methodological Comparisons

A great deal of work remains to be done in looking at the influence of the interviewing environment as an independent variable. While this work is well advanced in comparisons among techniques such as mail, panel, and telephone surveys, little is yet known about how interviewing online may affect respondent-interviewer communication behaviors and strategies. A few side-by-side studies are now being reported, such as Alecia Helton's analysis of Texas Instruments' experience in conducting online and telephone surveys. We are currently analyzing data from a recent series of online focus groups done for our client HBO during the same week in which HBO conducted a set of "offline" focus groups on the same topic, with the same recruitment criteria. Early results suggest that there are significant differences in the expression of responses, but few appreciable differences in the conclusions. The training and expertise of the interviewer in online studies is probably a critical factor in this equation, and as there are few "experts" in online interviewing and no formal training available, it remains to be seen how methodological comparisons may be stabilized across practitioners so that reasonable conclusions may be drawn from this data. We plan to release the results of this qualitative cross-methodological comparison this spring.

CONCLUSIONS/SUMMARY

Within the past 12 months, the discipline of online interviewing, and particularly the practices associated with website evaluation, has come a long way. The challenges in further advancing the field have become clearer, although much work needs to be done to better understand such aspects as sampling, managing panels, and applying accepted models from the history of marketing and advertising research to an online environment. Researchers interested in participating in online research must be willing to stay constantly alert to changes not only in the online user population but in the cycles of the new media industries -- as the business models evolve, so will the relevant criteria for doing effective website evaluation. Additionally, there will be increased pressure to invest heavily and continually in new hardware and software that improves online interviewing performance. Around the corner, for example, could be multi-point videoconferencing systems that bring respondents and interviewers together in a true virtual space. While at first glance this would relieve us of struggling with such issues as the artificial introduction of nonverbal data, it also suggests a range of new problems in adapting our research models to fit the mediation of advanced technological systems. For researchers willing to accept the challenge of continual change, this vision of the future appears utopian. In a rapidly evolving environment such as the one the Internet brings to a researcher's agenda, change and the demand for adaptation appear to be the only sure bets.

References

Dreze, X. and Zufryden, F. (1997). Testing Website Design and Promotional Content. Journal of Advertising Research, March/April, pp. 77-91.

Eighmey, J. (1997). Profiling User Responses to Commercial Websites. Journal of Advertising Research, May/June, pp. 59-66.

Forrester Research, Inc. (1997). Marketing on the WWW Conference, New York City, May 21.

Gaal, O. (1997). WebTrack AdSpend: A Monthly Data Report. Jupiter Communications.

Harris, C. (1996). An Internet Education: A Guide to Doing Research on the Internet. Wadsworth/ITP.

Harris, C. (1997). Theorizing Interactivity: Models and Cases in Online Research. Marketing and Research Today, ESOMAR, November.

Harris, C. (1998, forthcoming). Strategic Interactive Communications.

NetMarketing (1997, July). http://www.netb2b.com/cgi-bin/cgi_wpi_archive/wpi/96/09/01/article.1

Spool, J. (1997). WebSite Usability: A Designer's Guide. User Interface Engineering.

About the Author

Cheryl Harris, Ph.D., is an experienced e-business executive and entrepreneur, as well as a respected educator. A former professor at California State University and Parsons School of Design, New York, a published author and frequent international public speaker, she is well known as one of the leaders in user experience and usability research. In 1996 she founded Northstar Interactive, an online research and consulting firm, and led the firm to its successful acquisition in February 2000. Northstar developed web-based software and usability tools and consulted on strategy and design issues for such clients as Procter & Gamble, Motorola, Sprint, IBM, Netscape, Sony, AT&T, Time Warner, Roadrunner, Ogilvy & Mather, Grey, Modemmedia, Monsterboard, Mastercard, Citibank, eBay, Office.com, Insweb, Ziff Davis, Conde Nast, NBC, HBO, Discovery, and CNBC. She was also SVP, Interactive Strategy at Datek Online, where her redesign of the online brokerage's site resulted in a doubling of customer accounts in less than four months. The new site received recognition or top awards from Money magazine, TheStreet.com, Gomez Advisors, PC Computing, Red Herring, and several others. She is on the boards of several institutions, including the Lower Manhattan Cultural Council, the University of Massachusetts IT initiative, and WNET reelnewyork, and is a juror for several digital media festivals. Her publications include the books An Internet Education (International Thomson Press, 1996) and Theorizing Fandom (Hampton Press, 1998), as well as numerous articles. She received her Ph.D. from the University of Massachusetts-Amherst in 1992.