Project: Text Analysis Methods and Tools for Similarity Metrics in Large National Text Corpora

The project was approved on 17 December 2020 under the Latvian–Ukrainian Joint Programme of Scientific and Technological Cooperation and is funded by the Latvian State Education Development Agency. The project, Text Analysis Methods and Tools for Similarity Metrics in Large National Text Corpora: the Case of the Latvian National Digital Library (LNDL) and the National Repository of Academic Texts of Ukraine (NRATU), will be carried out by the National Library of Latvia (principal investigator Dr.hist. Valters Scerbinskis) and the Ukrainian Institute of Scientific and Technical Expertise and Information (principal investigator Dr.oec. Olena Serhiivna Chmyr).

Project objectives:

  1. To investigate existing algorithms and methods of similarity metrics and identify the most suitable approaches for text analysis and similarity detection in the given text corpora (LNDL and NRATU). This will provide the theoretical and methodological grounding for further testing of the chosen methods and contribute to the body of knowledge in the field of similarity metrics.
  2. To design testing environments (adjacent to LNDL and NRATU) that will make it possible to conduct the experiments for this project and will serve as prototypes for text analysis tools whose development will continue beyond this project.
  3. To apply the prototyped tools and methods and test the validity of the selected approaches in three to four case studies on topics relevant to social sciences and humanities research. This will make it possible to compare the methods (including the usability of similar methods in different languages) and contribute to digital humanities and digital social sciences research.
  4. To inform and engage social sciences and humanities students and members of the research community, encouraging them to apply methods of digital text analysis and similarity metrics in their research and to discover the content of LNDL and NRATU in a new way. To foster networks among digital humanities and digital social sciences enthusiasts, computer science specialists, and libraries.

Project partners: National Library of Latvia, Ukrainian Institute of Scientific and Technical Expertise and Information

Project implementation period: 2021–2022

Project funding: 38 610,00 EUR

Project sponsor: State Education Development Agency, Republic of Latvia