IVACS2024: Ana E. Sancho-Ortiz presents the compilation challenges of the SciDis Database

Ana E. Sancho-Ortiz took part in the 11th Inter-Varietal Applied Corpus Studies (IVACS) conference with a paper entitled “Methodological challenges in working with digitally-mediated data: The compilation of the SciDis (Science Dissemination) Database”. For this event, held at the University of Cambridge on 16 and 17 July, our PhD candidate prepared a presentation explaining the methodological problems our research group has faced in compiling the SciDis (Science Dissemination) Database. Specifically, she concentrated on the issue of representativeness, delving into its relevance for compiling corpora of digital texts that span diverse types of dissemination practice, are written by expert authors who play different roles in the generation and mediation of specialised knowledge, and display different discursive and rhetorical features depending on the disciplinary culture to which they belong.

You can read the abstract of her talk below:

The development of technology has prompted the exploration of innovative corpus-compilation techniques that enable the creation of multimodal mega- and micro-corpora of offline and online texts (O’Keeffe and McCarthy 2022). This possibility of engaging with extensive datasets entails the recognition of corpus building as an ongoing decision-making process characterized by the constant emergence of methodological challenges (Collins 2019). In this context, the SciDis (Science Dissemination) database has emerged as a static collection of digitally mediated texts intended to represent diverse discursive phenomena in online scientific communication. This database is the object of study of the SciDis project, which investigates digital professional practices in English in the context of science dissemination, a domain primarily characterized by its reliance on knowledge recontextualization.

This study addresses the methodological challenges in the compilation of the SciDis database, with a central emphasis on representativeness with regard to digital discursive practices. The initial challenges relate to the typology of the practices considered and the need for the database to capture the dynamic nature of digitally mediated texts. The decision was taken to observe web-hosted practices, on the one hand, and social media practices, on the other. Further difficulties concerned who generates the content and the degree to which expert users intervene in the communication of knowledge. Here, it was decided to classify the selected texts into two categories: author-generated knowledge, for the practices endorsed by users, and writer-mediated knowledge, for those in which users mediate between authors and the knowledge they generate. Lastly, a third set of challenges pertains to the disciplinary and idiosyncratic differences identified among the three fields of knowledge selected for the analysis (health, economy and natural sciences). Thus, to ensure representativeness (Biber et al. 1998), it was decided to explore all practices, both those specific to each discipline and those shared between them, but to compile and analyze only those common to all three. The ultimate aim of compiling and analyzing this database is to explore the discursive processes that take place in digital knowledge dissemination, such as recontextualization, dialogicity and identity construction.

References:

Biber, Douglas, Susan Conrad and Randi Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press.
Collins, Luke Curtis. 2019. Corpus Linguistics for Online Communication: A Guide for Research. Routledge.
O’Keeffe, Anne and Michael J. McCarthy (eds.). 2022. The Routledge Handbook of Corpus Linguistics. 2nd ed. Routledge. https://doi.org/10.4324/9780367076399