Using paradata to evaluate response quality in surveys

Code

J5-9334 (C) - included in ARIS records

Head

PhD Vasja Vehovar

Period

7/1/2018 - 6/30/2021

Range in 2021

0.75 FTE

Science

Social sciences (8)
Other (1)

Researcher status

Researcher (7)
Junior expert or technical associate (2)

Education

Doctoral degree (6)
Other (3)

Sex

Woman (4)
Man (5)

Status

Employed at RO and RRD (8)
Retired (1)

No. of publications

0 (1)
1–9 (1)
10–99 (3)
100–999 (4)

Projects / Programmes source: ARIS

Research activity

Code	Science	Field	Subfield
5.03.00	Social sciences	Sociology

Code	Science	Field
S274	Social sciences	Research methodology in science

Code	Science	Field
5.04	Social Sciences	Sociology

Keywords

paradata, surveys, data quality, web surveys

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Organisations (1), Researchers (9)

0582 University of Ljubljana, Faculty of Social Sciences

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	30704	PhD Jernej Berzelak	Sociology	Researcher	2018 - 2020	140
2.	34789	PhD Gregor Čehovin	Sociology	Researcher	2019 - 2021	66
3.	17913	PhD Katja Lozar Manfreda	Sociology	Researcher	2018 - 2021	192
4.	38368	PhD Miha Matjašič	Sociology	Researcher	2018 - 2021	69
5.	38051	Bojana Novak-Fajfar		Technical associate	2018 - 2020	0
6.	29060	Ajda Petek	Sociology	Researcher	2018	29
7.	27574	PhD Andraž Petrovčič	Sociology	Researcher	2018 - 2021	349
8.	51405	Katja Trebežnik	Sociology	Researcher	2019	4
9.	10155	PhD Vasja Vehovar	Sociology	Head	2018 - 2021	887

Abstract

Technological developments are driving surveys towards paper-free and interview-free data collection methods. Computerised self-administered surveys (among which web surveys dominate) can be conducted much more cheaply, although there are rising concerns about data quality. Within this context, survey paradata – data about the process of collecting survey data – are playing an increasingly important role. One essential type of paradata is the time a respondent needs to answer a survey question. Together with other paradata (e.g. mouse movements and keystrokes), this can help identify respondents who perform the response process in an undesirable way (e.g. answering too fast or providing inaccurate answers). Paradata can also serve in evaluations of survey instruments and for online interventions, such as real-time alerts. To capture and process paradata effectively, a proper approach is needed concerning the level of measurement (item, question, page or questionnaire), the metrics used (statistical or cognitive measures), and the criteria for removing respondents with low response quality (e.g. the quickest 1% of respondents). A preliminary literature review and research practice (e.g. online panels) show that current approaches are extremely diverse and inconsistent. Particularly critical is the lack of comprehensive insight into the relationship between paradata, response quality indicators and the effects on estimates when respondents with a high probability of low response quality are removed.

The proposed project addresses issues that are highly relevant to general social science methodology and narrows existing knowledge gaps by pursuing the following objectives:
1. providing new knowledge via a systematic overview of the literature and global research practice on the use of paradata;
2. establishing original knowledge on the link between paradata, response quality and the effects of removing low-quality responses;
3. developing new approaches to identify respondents who, according to paradata and response data, show a high probability of low response quality, whereby their removal would increase the overall data quality;
4. developing a set of standardised paradata-based compound indicators, related to the quality of responses and also to the social and psychological traits of respondents, to enrich response data.

The above objectives will be achieved by studying dedicated web surveys where detailed paradata were captured and specific methodological questions were used, including the European Social Survey online panel (CRONOS 2017). In addition, new studies will be conducted with a leading Slovenian online panel (e.g. Valicon), with a global online panel (e.g. Survey Monkey) and with non-panel respondents. Another research stream will comprise a meta-study of around 100,000 web surveys from the 1KA open-survey platform. The project is to be run by a team that is one of the pioneers of web survey research (since 1996). Besides publications (particularly the monograph Callegaro, Lozar-Manfreda and Vehovar: Web survey methodology, 2015, Sage), the team is internationally recognised for maintaining a central resource for web survey methodology (WebSM.org) and for the open-source survey platform 1KA. The project's objectives aim to bring some conclusive insights into paradata issues (objectives 1 and 2), areas where considerable controversies exist. The objectives also seek to develop a basis for key industry standards (objectives 3 and 4) and for general breakthrough improvements in survey data quality. Assurance that these ambitious goals will be achieved comes from the past achievements and extremely high competence of the Scientific Advisory Board, composed of leading researchers in this area: Mick Couper (U of Michigan), Michael Bosnjak (U of Trier), Frauke Kreuter (U of Maryland) and Jon Krosnick (Stanford University).
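
The abstract gives "the quickest 1% of respondents" as an example of a removal criterion based on response times. A minimal sketch of such a rule is shown below, assuming questionnaire-level completion times are already available; the function name `flag_speeders`, the input layout and the respondent ids are hypothetical, and this is only an illustration, not the procedure developed in the project.

```python
import math

def flag_speeders(total_seconds, share=0.01):
    """Return the ids of the quickest `share` of respondents.

    total_seconds: dict mapping a (hypothetical) respondent id to the
    questionnaire completion time in seconds.
    At least one respondent is flagged when the sample is non-empty.
    """
    n_flag = max(1, math.floor(len(total_seconds) * share))
    # Sort ids from the shortest to the longest completion time.
    ranked = sorted(total_seconds, key=total_seconds.get)
    return set(ranked[:n_flag])

# Illustrative data: 198 ordinary respondents and two very fast ones.
times = {f"r{i}": 300 + i for i in range(198)}
times["fast_a"] = 20
times["fast_b"] = 35
print(flag_speeders(times))
```

A fixed share is only one possible criterion; the same function could be applied at page or item level by passing times measured at that level.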

Significance for science

Survey paradata currently face a dilemma: their potential is still not fully recognised, at least not in an operational manner that supports large-scale, standardised use. On the one hand, they are extensively researched and examined in several specific studies; on the other hand, they are not yet fully accepted and used as a standard component of computerised survey data collection.

This project will be one of the first to develop a comprehensive methodology for the general and standardised exploitation of survey paradata. With this, the project is at the forefront of innovation in survey methodology, hence also representing a valuable improvement in social science methodology in general. 

More specifically, the project also provides grounds for new solutions with the prospect of becoming a foundation for corresponding survey industry standards:
• Data cleaning is a procedure in the post-survey adjustment process and includes several tasks. One of them is removing respondents whose response quality is highly likely to be unacceptably low (e.g. due to speeding). However, current data cleaning procedures very often do not rely on paradata at all. Even when they do, many different approaches exist, sometimes with contradictory results. This project will provide the basis for developing a new optimised approach for identifying respondents with a high probability of unacceptably low response quality, who may be better removed.
• Data augmentation means supplementing survey response data with a standardised set of paradata-based compound indicators. This added information needs to be correlated with response quality and also with the behavioural and psychological characteristics of respondents. The project will study these relations in detail. Based on the results, the large amount of information contained in paradata will be fully exploited, yet condensed into a handful of indicators relevant to data quality evaluation and also to substantive use. We estimate that this set of paradata-based compound indicators will have around 10 dimensions.
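
The two ideas in the list above can be illustrated together with a small sketch: a single compound speed indicator is derived from item-level response times, and a survey estimate is compared before and after removing respondents flagged by that indicator. Everything here is an assumption made for illustration: the record layout (`times`, `answer`), the function names, the ratio-to-median indicator and the 0.3 cut-off are hypothetical and are not the indicators or thresholds proposed by the project.

```python
from statistics import mean, median

def item_medians(respondents):
    """Median response time per question across all respondents."""
    questions = respondents[0]["times"].keys()
    return {q: median(r["times"][q] for r in respondents) for q in questions}

def speed_index(times, medians):
    """Mean ratio of a respondent's item time to the sample median.

    Values well below 1 indicate answering much faster than is typical.
    Assumes every median is positive.
    """
    return mean(times[q] / medians[q] for q in medians)

def estimates_before_after(respondents, min_index=0.3):
    """Mean of `answer` for all respondents and for those kept after cleaning."""
    medians = item_medians(respondents)
    kept = [r for r in respondents
            if speed_index(r["times"], medians) >= min_index]
    # statistics.mean raises StatisticsError if no respondent is kept.
    return (mean(r["answer"] for r in respondents),
            mean(r["answer"] for r in kept))

# Illustrative data: four typical respondents and one very fast respondent.
sample = [{"times": {"q1": 10, "q2": 20}, "answer": a} for a in (3, 4, 3, 4)]
sample.append({"times": {"q1": 1, "q2": 2}, "answer": 5})
before, after = estimates_before_after(sample)
print(before, after)
```

Comparing the two estimates shows how much a cleaning rule shifts the result, which is the kind of effect the project sets out to study systematically.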

Within this project, prototypes implementing the above-mentioned solutions will be developed within 1KA, the open-source platform for web surveys, so that the contributions can be easily evaluated and exploited.

The project’s scope is also closely related to other emerging developments in social science methodology, including the analysis of large datasets (‘big data’), measurements based on digital footprints, survey data collection related to the Internet of Things, the analysis of activities in online social networks and the collection of ambient characteristics in mobile survey data collection.

Significance for the country

Most important scientific results

Interim report

Most important socioeconomically and culturally relevant results