Projects / Programmes source: ARIS

Using paradata to evaluate response quality in surveys

Research activity

Code Science Field Subfield
5.03.00  Social sciences  Sociology   

Code Science Field
S274  Social sciences  Research methodology in science 

Code Science Field
5.04  Social Sciences  Sociology 
Keywords
paradata, surveys, data quality, web surveys
Evaluation (rules)
source: COBISS
Researchers (9)
no. Code Name and surname Research area Role Period No. of publications
1.  30704  PhD Jernej Berzelak  Public health (occupational safety)  Researcher  2018 - 2020  120 
2.  34789  PhD Gregor Čehovin  Sociology  Researcher  2019 - 2021  52 
3.  17913  PhD Katja Lozar Manfreda  Sociology  Researcher  2018 - 2021  181 
4.  38368  Miha Matjašič  Sociology  Researcher  2018 - 2021  33 
5.  38051  Bojana Novak-Fajfar    Technical associate  2018 - 2020 
6.  29060  Ajda Petek  Sociology  Researcher  2018  20 
7.  27574  PhD Andraž Petrovčič  Sociology  Researcher  2018 - 2021  290 
8.  51405  Katja Trebežnik  Sociology  Researcher  2019 
9.  10155  PhD Vasja Vehovar  Sociology  Head  2018 - 2021  840 
Organisations (1)
no. Code Research organisation City Registration number No. of publications
1.  0582  University of Ljubljana, Faculty of Social Sciences  Ljubljana  1626957  40,369 
Abstract
Technological developments are driving surveys towards paper-free and interview-free data collection methods. Computerised self-administered surveys (among which web surveys dominate) can be conducted much more cheaply, although there are rising concerns about data quality. Within this context, survey paradata – data about the process of collecting survey data – are playing an increasingly important role. One essential type of paradata is the time a respondent needs to answer a survey question. Together with other paradata (e.g. mouse movements and keystrokes), this can help identify respondents who perform the response process in an undesirable way (e.g. answering too fast or providing inaccurate answers). Paradata can also serve in evaluations of survey instruments and in online interventions, such as real-time alerts. To capture and process paradata effectively, a proper approach is needed concerning the level of measurement (item, question, page or questionnaire), the metrics used (statistical or cognitive measures), and the criteria for removing respondents with low response quality (e.g. the quickest 1% of respondents). A preliminary literature review and research practice (e.g. online panels) show that current approaches are extremely diverse and inconsistent. Particularly critical is the lack of comprehensive insight into the relationship between paradata, response quality indicators and the effects on estimates when respondents with a high probability of low response quality are removed. The proposed project addresses issues that are highly relevant to general social science methodology and narrows existing knowledge gaps by pursuing the following objectives:
1. providing new knowledge via a systematic overview of the literature and global research practice on the use of paradata;
2. establishing original knowledge on the link between paradata, response quality and the effects of removing low-quality responses;
3. developing new approaches to identify respondents who, according to paradata and response data, show a high probability of low response quality, such that their removal would increase the overall data quality;
4. developing a set of standardised paradata-based compound indicators, related to response quality and also to the social and psychological traits of respondents, to enrich response data.
The above objectives will be achieved by studying dedicated web surveys in which detailed paradata were captured and specific methodological questions were used, including the European Social Survey online panel (CRONOS 2017). In addition, new studies will be conducted with a leading Slovenian online panel (e.g. Valicon), with a global online panel (e.g. Survey Monkey) and with non-panel respondents. Another research stream will comprise a meta-study of around 100,000 web surveys from the 1KA open-survey platform. The project is to be run by a team that has been one of the pioneers of web survey research (since 1996). Besides publications (particularly the monograph Callegaro, Lozar-Manfreda and Vehovar: Web survey methodology, 2015, Sage), the team is internationally recognised for maintaining a central resource for web survey methodology (WebSM.org) and for the open-source survey platform 1KA. The project’s objectives aim to bring conclusive insights into paradata issues (objectives 1 and 2), areas where considerable controversies exist. The objectives also seek to develop a basis for key industry standards (objectives 3 and 4) and for general breakthrough improvements in survey data quality. Assurance that these ambitious goals will be achieved comes from the past achievements and extremely high competence of the Scientific Advisory Board, composed of leading researchers in this area: Mick Couper (U of Michigan), Michael Bosnjak (U of Trier), Frauke Kreuter (U of Maryland) and Jon Krosnick (Stanford University).
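As an illustration of the kind of removal criterion the abstract mentions (e.g. the quickest 1% of respondents), the following is a minimal Python sketch. It is not the project's method or the 1KA implementation; the function name flag_speeders, the respondent ids and the completion times are hypothetical, and it assumes a single total completion time per respondent rather than item-level or page-level paradata.

```python
def flag_speeders(completion_times, share=0.01):
    """Return the ids of the quickest `share` of respondents.

    completion_times: dict mapping a respondent id to total completion
    time in seconds (hypothetical input format).
    share: fraction of respondents to flag, e.g. 0.01 for the quickest 1%.
    """
    if not 0 < share < 1:
        raise ValueError("share must be between 0 and 1")
    # Number of respondents to flag, rounded down.
    n_flagged = int(len(completion_times) * share)
    # Respondent ids ordered from fastest to slowest.
    ranked = sorted(completion_times, key=completion_times.get)
    return set(ranked[:n_flagged])


# Made-up completion times in seconds for ten respondents.
times = {
    "r01": 412.0, "r02": 38.5, "r03": 298.2, "r04": 350.9, "r05": 275.0,
    "r06": 501.3, "r07": 44.1, "r08": 322.7, "r09": 289.6, "r10": 367.4,
}

# With only ten respondents, a 20% share is used so that someone is flagged.
print(flag_speeders(times, share=0.2))
```

A purely rank-based cut-off like this always flags the same share of respondents regardless of how fast they actually were, which is one reason the abstract treats the choice of level of measurement, metric and criterion as an open question.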
Significance for science
Survey paradata currently face a dilemma: their potential is still not fully recognised, at least not in an operational manner that supports massive and standardised use. On the one hand, they are extensively researched and examined in several specific studies; on the other hand, they are still not fully accepted and used as a standard component of computerised survey data collection. This project will be one of the first to develop a comprehensive methodology for the general and standardised exploitation of survey paradata. With this, the project is at the forefront of innovation in survey methodology, and hence also represents a valuable improvement in social science methodology in general. More specifically, the project also provides grounds for new solutions with the prospect of becoming a foundation for corresponding survey industry standards:
• Data cleaning is a procedure in the post-survey adjustment process and includes several tasks. One of them is removing respondents whose response quality has a high probability of being unacceptably low (e.g. due to speeding). However, current data cleaning procedures very often do not rely on paradata at all. Even when they do, many different approaches exist, sometimes with contradictory results. This project will provide the basis for developing a new optimised approach for identifying respondents with a high probability of unacceptably low response quality, who may be better removed.
• Data augmentation relates to survey response data, which can be supplemented with a standardised set of paradata-based compound indicators. This added information needs to be correlated with the response quality and also with the behavioural and psychological characteristics of the respondents. The project will study these relations in detail.
Based on these results, the large amount of information contained in paradata will be fully exploited but condensed into a handful of indicators relevant to data quality evaluation and to substantive use. We estimate that this set of paradata-based compound indicators will have roughly 10 dimensions. Within this project, prototypes implementing the above-mentioned solutions will be developed within 1KA, the open-source platform for web surveys, so that the contributions can be easily evaluated and exploited. The project’s scope is also closely related to other emerging developments in social science methodology, including the analysis of large datasets (‘big data’), measurements based on digital footprints, survey data collection related to the Internet of Things, the analysis of activities in online social networks and the collection of ambient characteristics related to mobile survey data collection.
Significance for the country
Survey paradata currently face a dilemma: their potential is still not fully recognised, at least not in an operational manner that supports massive and standardised use. On the one hand, they are extensively researched and examined in several specific studies; on the other hand, they are still not fully accepted and used as a standard component of computerised survey data collection. This project will be one of the first to develop a comprehensive methodology for the general and standardised exploitation of survey paradata. With this, the project is at the forefront of innovation in survey methodology, and hence also represents a valuable improvement in social science methodology in general. More specifically, the project also provides grounds for new solutions with the prospect of becoming a foundation for corresponding survey industry standards:
• Data cleaning is a procedure in the post-survey adjustment process and includes several tasks. One of them is removing respondents whose response quality has a high probability of being unacceptably low (e.g. due to speeding). However, current data cleaning procedures very often do not rely on paradata at all. Even when they do, many different approaches exist, sometimes with contradictory results. This project will provide the basis for developing a new optimised approach for identifying respondents with a high probability of unacceptably low response quality, who may be better removed.
• Data augmentation relates to survey response data, which can be supplemented with a standardised set of paradata-based compound indicators. This added information needs to be correlated with the response quality and also with the behavioural and psychological characteristics of the respondents. The project will study these relations in detail.
Based on these results, the large amount of information contained in paradata will be fully exploited but condensed into a handful of indicators relevant to data quality evaluation and to substantive use. We estimate that this set of paradata-based compound indicators will have roughly 10 dimensions. Within this project, prototypes implementing the above-mentioned solutions will be developed within 1KA, the open-source platform for web surveys, so that the contributions can be easily evaluated and exploited. The project’s scope is also closely related to other emerging developments in social science methodology, including the analysis of large datasets (‘big data’), measurements based on digital footprints, survey data collection related to the Internet of Things, the analysis of activities in online social networks and the collection of ambient characteristics related to mobile survey data collection.
Most important scientific results Interim report
Most important socioeconomically and culturally relevant results