The Story of Subject Naught:
A Cautionary but Optimistic Tale of Internet Survey Research






*University of Minnesota
†University of Texas
 

Abstract

In a web-based sexual behavior risk study using a rigorous response validation protocol, we identified 124 invalid responses out of 1,150 total (11% rejection). Nearly all of these (119) were due to repeat survey submissions from the same participants, and 65 of them came from a single participant. This brief describes how we were able to detect these repeat submissions using the validation protocol, and highlights the importance of using both automated and manual validation techniques.

Introduction

Critics of Internet survey research raise concerns about participant validity, specifically the difficulties of assuring participant eligibility and unique participation. The Men's Internet Study (MINTS) is a web-based, cross-sectional HIV/STI prevention study of the sexual risk behaviors of Internet-using Latino men who have sex with men. In conducting this Internet survey, we piloted a multifaceted strategy to validate participant responses. This brief describes our experiences in using this validation protocol and offers lessons for researchers considering Internet-based survey research.

Methods

The MINTS study was advertised through banner advertisements at Gay.com. The study offered $20 to Latino adult men who met the criteria of living in the U.S. and having had sex, at least once, with another man. A priori, only completed surveys were included for analysis. The web-based survey contained 455 data values and took an average of 42 minutes to complete.

Because of the available compensation and the political sensitivity of the subject matter, we recognized motivation for ineligible persons to participate and for respondents to complete the survey on multiple occasions. Although unable to anticipate all forms of deception, we developed an extensive validation protocol to manually review data. Table 1 shows the validation techniques we used, the number of suspect records that were flagged, and the number of such records subsequently invalidated. The validation techniques marked with a * were not anticipated in the initial design; they were added after manual inspection revealed specific problems. We retained the first complete survey from any subject, so techniques for detecting duplicates (marked with a †) necessarily detected more surveys than were actually invalidated. In addition, these techniques detected surveys that duplicated prior incomplete surveys when the subject re-registered and started over rather than completing the original survey.

Validation Goal and Method                                Number of Completed   Number of Completed
                                                          Surveys Flagged       Surveys Eventually Invalidated
Validate Eligibility Criteria
  Breadth of internet use                                 .80                   .49
  Cross-check Latino Identity                             3                     3
  Cross-check MSM status (sex with another man)           2                     0
  US Residence - non US payment address                   0                     0
  US Residence - non US IP address                        2                     0
  US Residence - Inconsistent ZIP code                    34                    2
  Age - birth date shows under 18 years old               0                     0
  Age - birth day/age inconsistency (> 1 year)            22                    0
Identify Duplicate Survey Submission
  Duplicate IP address (entire address)†                  188                   103
  Duplicate IP address (first 24 bits, includes above)†   203                   104
  Duplicate e-mail Address†                               21                    3
  Duplicate name†                                         16                    3
  Duplicate payment information (check address)†          16                    0
  Duplicate e-payment receipt*                            114                   104
Detect Suspicious Completion Time
  Short completion time (12 minutes or less)*             114                   91
Table 1. Summary of MINTS validation methods and results

Note: Totals do not sum to numbers in Figure 1, because a single survey could be flagged multiple times before invalidation.
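
The duplicate-IP checks in Table 1 compare either the entire address or only its first 24 bits. A minimal Python sketch of that comparison is given here; the record layout, the function name, and the sample addresses are illustrative assumptions, not the MINTS implementation.

```python
from collections import defaultdict
from ipaddress import ip_network


def flag_duplicate_ips(surveys, prefix_bits=32):
    """Return ids of surveys that share an IPv4 address (or prefix) with another survey.

    surveys: iterable of (survey_id, ip_string) pairs (hypothetical record layout).
    prefix_bits: 32 compares the entire address; 24 compares the first 24 bits.
    """
    groups = defaultdict(list)
    for survey_id, ip in surveys:
        # strict=False masks the host bits, so 192.0.2.10/24 becomes 192.0.2.0/24.
        network = ip_network(f"{ip}/{prefix_bits}", strict=False)
        groups[network].append(survey_id)

    flagged = set()
    for ids in groups.values():
        if len(ids) > 1:
            flagged.update(ids)
    return flagged


# Illustrative records only; the addresses come from documentation ranges.
records = [
    ("s1", "192.0.2.10"),
    ("s2", "192.0.2.10"),
    ("s3", "192.0.2.77"),
    ("s4", "198.51.100.5"),
]
exact = flag_duplicate_ips(records, prefix_bits=32)
by_prefix = flag_duplicate_ips(records, prefix_bits=24)
```

Because the 24-bit comparison is looser, everything flagged by the exact match is also flagged by the prefix match, which is consistent with the "includes above" note in the table. A flag of this kind only marks a record for review; it does not by itself show that two surveys came from the same person.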

Results

Figure 1 traces potential subjects through recruitment, enrollment, participation, and validation. Of 1,150 completed surveys, we rejected 124 (11%), including 119 (10%) that were repeat surveys. Of those repeat surveys, 65 (6% of all completed surveys) came from a single participant, the person we call Subject Naught.

Figure 1. Subject recruitment, retention, and validation in the MINTS study

We found few incidents of internally inconsistent responses, despite a large number of internal consistency checks. Although many invalid surveys were detected through a combination of duplicate IP address, e-mail address, and birthdate/age checks, most were detected by monitoring completion time, comparing start/end times across surveys, and reviewing payment records. Subject Naught was identified only because of these last three indicators. He submitted different but internally consistent content for each survey, used a variety of e-mail addresses, completed the survey on several computers with different IP addresses (specifically, computers in a university library), and requested payments to different accounts. He completed the surveys in an average of under eight minutes each (so fast as to raise suspicion) and typically started the next survey within three minutes of completing the previous one. A post-hoc review of e-payments confirmed a common account holder for the different accounts.
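
The two timing indicators described above (a short completion time, and a new survey starting shortly after another one ends) can be sketched in Python as follows. The 12-minute threshold comes from Table 1 and the three-minute gap from the description of Subject Naught; the record layout, names, and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

SHORT_COMPLETION = timedelta(minutes=12)  # threshold listed in Table 1
QUICK_RESTART = timedelta(minutes=3)      # gap described for Subject Naught


def timing_flags(surveys):
    """Return (short_ids, chained_ids) for a list of survey records.

    surveys: dicts with 'id', 'start', and 'end' keys, where 'start' and
    'end' are datetime objects (hypothetical record layout).
    """
    short = {s["id"] for s in surveys if s["end"] - s["start"] <= SHORT_COMPLETION}

    chained = set()
    for earlier in surveys:
        for later in surveys:
            if earlier is later:
                continue
            gap = later["start"] - earlier["end"]
            # Flag both surveys when one starts soon after the other ends.
            if timedelta(0) <= gap <= QUICK_RESTART:
                chained.update((earlier["id"], later["id"]))
    return short, chained


# Illustrative timestamps only.
day = datetime(2004, 1, 1)
sample = [
    {"id": "a", "start": day.replace(hour=12, minute=0), "end": day.replace(hour=12, minute=7)},
    {"id": "b", "start": day.replace(hour=12, minute=9), "end": day.replace(hour=12, minute=16)},
    {"id": "c", "start": day.replace(hour=12, minute=30), "end": day.replace(hour=13, minute=12)},
]
short_ids, chained_ids = timing_flags(sample)
```

The pairwise comparison is quadratic in the number of surveys, which is acceptable for a dataset of roughly a thousand records. With many simultaneous respondents, unrelated surveys will occasionally fall within the gap by chance, so these flags are best read together with payment records rather than alone.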

To assess the potential impact of using these invalid surveys, we repeated our analyses on a dataset with the invalidated surveys included. When all surveys were included, a statistically significant relationship appeared (p<.01) showing a greater demand from Spanish-speakers for on-line HIV-prevention materials. When the invalid surveys were removed, no significant relationship was found. In this case, had we not had a strong validation protocol, erroneous recommendations for intervention (e.g., developing more Spanish-language prevention material) would have been made.
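
A sensitivity check of this kind amounts to running the same test twice, once on all completed surveys and once on the validated subset. The brief does not say which statistical test was used, so the Python sketch below substitutes a Pearson chi-square test on a 2x2 table purely as an illustration. The field names and the toy counts are invented for the example and do not reflect MINTS data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def compare_with_and_without(rows, invalid_ids):
    """Run the test on all rows and on the rows that survive validation.

    rows: (survey_id, speaks_spanish, wants_materials) tuples with boolean
    fields (hypothetical field names).
    """
    def table(subset):
        counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
        for _, spanish, wants in subset:
            counts[(spanish, wants)] += 1
        return (counts[(True, True)], counts[(True, False)],
                counts[(False, True)], counts[(False, False)])

    everything = chi_square_2x2(*table(rows))
    valid_only = chi_square_2x2(*table([r for r in rows if r[0] not in invalid_ids]))
    return everything, valid_only


# Toy data: 40 balanced "valid" rows plus 20 repeat rows that all fall in one cell.
rows = []
for i, cell in enumerate([(True, True), (True, False), (False, True), (False, False)]):
    rows += [(f"v{i}-{k}", *cell) for k in range(10)]
rows += [(f"x{k}", True, True) for k in range(20)]
invalid_ids = {f"x{k}" for k in range(20)}

(all_stat, all_p), (valid_stat, valid_p) = compare_with_and_without(rows, invalid_ids)
```

In the toy data the repeat submissions all land in one cell, so they create an apparent association that disappears once they are removed. This mirrors the direction of the effect reported above, not its size.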

Discussion

To our knowledge, this is the first study to demonstrate both the importance of validation in Internet-based survey research, and the vulnerability of such research to sabotage by one subject. We identify four key lessons about validity threats in web-based survey research for compensation.

First, validity checking is an essential part of Internet-based studies. Our protocol identified 11% of study responses as invalid; had these responses been retained, they would have changed the results and recommendations of the study.

Second, although automated testing can "flag" suspicious survey completion patterns, manual review is essential, and the final decision to exclude should be a human decision, not an automated one. Without some level of automated testing, it would have been impossible for us to validate over 1,000 data records. Still, we found many invalid surveys through payment verification, a process that was added only after a manual review of the data revealed certain anomalies. Also, some automatically "flagged" records were later determined to be valid. For example, we automated detection of duplicate IP addresses, but duplicate IP addresses sometimes originate from dial-up ISPs, or from eligible subjects who share a computer or Internet connection.
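
One way to keep the exclusion decision in human hands is to let automated checks only attach flags, and to build the final sample solely from reviewer judgments. The Python sketch below illustrates that separation and the rule of retaining the first complete survey from any subject. The class, field names, and sample records are hypothetical, not taken from the MINTS system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Survey:
    survey_id: str
    submitted_order: int                 # position in the submission sequence
    flags: set = field(default_factory=set)   # set by automated checks
    subject_key: Optional[str] = None    # set by a reviewer for confirmed repeats
    reviewer_excluded: bool = False      # set by a reviewer, e.g. for ineligibility


def review_queue(surveys):
    """Flagged surveys for manual review, most heavily flagged first."""
    flagged = [s for s in surveys if s.flags]
    ordered = sorted(flagged, key=lambda s: (-len(s.flags), s.submitted_order))
    return [s.survey_id for s in ordered]


def final_sample(surveys):
    """Apply reviewer decisions only; automated flags never exclude a survey."""
    kept, seen_subjects = [], set()
    for s in sorted(surveys, key=lambda s: s.submitted_order):
        if s.reviewer_excluded:
            continue
        if s.subject_key is not None:
            if s.subject_key in seen_subjects:
                continue  # later repeat from a subject already represented
            seen_subjects.add(s.subject_key)
        kept.append(s.survey_id)
    return kept


# Illustrative records only.
surveys = [
    Survey("s1", 1, {"dup_ip"}, subject_key="A"),
    Survey("s2", 2, {"dup_ip", "short_time"}, subject_key="A"),
    Survey("s3", 3, {"dup_ip"}),  # shared computer, judged valid on review
    Survey("s4", 4),
    Survey("s5", 5, {"latino_crosscheck"}, reviewer_excluded=True),
]
queue = review_queue(surveys)
sample = final_sample(surveys)
```

Here s3 stays in the sample even though it was flagged, because the reviewer did not link it to another subject. This corresponds to the shared-computer case described above.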

Third, use of a rigorous validation protocol provides greater confidence in the study sample. It may never be possible to claim 100% confidence in eligibility or uniqueness of subjects, but elsewhere (Ross, Rosser, & Stanton, 2004; Ross, Rosser, Stanton, & Konstan, 2004) we demonstrate that once the invalidated surveys were removed, the sample approximated the U.S. census figures for Hispanics on location and race.

Fourth, and most important, web-based survey research is still highly worthwhile, and can achieve high validity. The web does not provide a "cheap and easy" research platform, but it does provide an easier way to reach a large and geographically dispersed population. The substantial investment made in validity checks can increase confidence in the quality of data collected and therefore in the results of the research.

References

Ross, M. W., Rosser, B. R. S., Stanton, J., & Konstan, J. (2004). Characteristics of Latino men who have sex with men on the Internet who complete and drop out of an Internet-based sexual behavior survey. AIDS Education and Prevention, 16(6), 526-537.

Ross, M. W., Rosser, B. R. S., & Stanton, J. (2004). Beliefs about cybersex and Internet-mediated sex of Latino men who have Internet sex with men: Relationships with sexual practices in cybersex and in real life. AIDS Care, 16(8), 1002-1011.


About the Authors

Joseph A. Konstan (Ph.D., University of California, Berkeley, 1993) is Associate Professor of Computer Science and Engineering at the University of Minnesota, Twin Cities. His interests include human-computer interaction, personalization systems, and research on the applications and use of the Internet.
Address: Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA

B. R. Simon Rosser (Ph.D., The Flinders University of South Australia, 1990) is Professor in the Program in Human Sexuality in the Department of Family Practice and Community Health at the University of Minnesota Medical School. He is director of the HIV/STI Intervention and Prevention Studies (HIPS) Center and a licensed psychologist with advanced degrees in psychology, behavioral medicine, and epidemiology.
Address: HIPS Center, University of Minnesota Medical School, 1300 South Second St., Minneapolis, MN 55454 USA

Michael W. Ross (Ph.D., University of Melbourne, 1980) is Professor in the WHO Center for Health Promotion and Prevention Research in the School of Public Health of the University of Texas at Houston. He has several advanced degrees in psychology, public health, community health education, venereology, and criminology.
Address: WHO Center for Health Promotion and Prevention Research, University of Texas, PO Box 20036, Houston, TX 77225 USA

Jeffrey Stanton (M.P.A., University of Minnesota, 2004) was project coordinator for the MINTS project. He is currently working for Family Health International.
Address: c/o HIPS Center, University of Minnesota Medical School, 1300 South Second St., Minneapolis, MN 55454 USA

Weston M. Edwards (Ph.D., University of Minnesota, 2004) was the research assistant for the MINTS project. He is currently in private practice as a psychologist.
Address: c/o HIPS Center, University of Minnesota Medical School, 1300 South Second St., Minneapolis, MN 55454 USA