Our team member Daniel Pascual led a hands-on workshop on digital tools and apps for retrieving and storing texts from websites for corpus compilation. Together we explored how they work, along with their affordances and limitations. He focused on three main tools:
The first one was NCapture, a tool used with the software NVivo. It is a free web-browser extension that allows the user to gather web content and import it into NVivo for analysis. Its limitations? The more multimodal elements and hyperlinks the captured webpage includes, the more disorganised and convoluted the content will appear in NVivo, making it somewhat hard to analyse. However, this tool works very efficiently with webpages that consist mostly of text.
The second tool we explored was GoFullPage. This is another browser extension, which captures a high-quality screenshot of the full webpage that can be downloaded as an image or PDF. The tool even allows the user to edit the screenshots by cropping or blurring them or adding shapes and text, although this is a paid feature. It can be very useful for analysing the different visual elements present in webpages.
The third and last tool Daniel taught us how to use was the Wayback Machine. It is part of the Internet Archive, and it allows users to access websites that are no longer online and to consult past versions of pages that have since changed. This can be a very useful resource for exploring the changes that different sites have undergone, as well as for compiling corpora.
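For team members who would like to look up archived versions programmatically rather than through the browser, here is a minimal sketch in Python. It was not part of the workshop; it assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`), and the function names `build_query`, `closest_snapshot` and `lookup` are our own illustrative choices.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Internet Archive (assumed reachable).
API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the request URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a parsed JSON reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine for the snapshot closest to the given date."""
    with urlopen(build_query(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.com", "20150101")` asks for the capture closest to 1 January 2015 and returns `None` when no capture is available. Running it over a list of site addresses and dates could be one way to gather the snapshot links needed for a diachronic corpus.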
Thank you very much, Daniel, for such a useful workshop, which will help the InterGedi team advance our research.