Head of the research group: izr. dr. Lan Umek
Bibliometric analysis has become increasingly important in recent years as a means of evaluating and analyzing the scientific literature. As the proportion of bibliometric documents in total scientific output increases dramatically, there is a need to use more advanced statistical methods, especially those related to data mining and subgroup analysis, to improve bibliometric analysis. Although subgroups occur naturally in bibliographic data (temporal dimension, geographic scope, topic, etc.), they have rarely been evaluated or analyzed. In this project, we will present concrete examples of data mining methods that could be integrated into bibliometrics, especially in terms of prediction (classification and regression) and subgroup analysis.
To address this gap, we will be the first to implement two subgroup discovery approaches in bibliometrics. Both algorithms aim to discover subgroups of bibliographic documents that reflect significant relationships between two aspects, such as keywords and authors. The first algorithm combines a partitioning clustering approach with contingency table analysis and extracts subgroups of documents that reflect significant relationships between the analyzed aspects. The second algorithm will combine a hierarchical clustering approach and statistical classification techniques (such as logistic regression, support vector machines, neural networks, etc.) to extract subgroups that are similar with respect to one analyzed aspect and can be reliably separated from the rest of the documents by the second analyzed aspect.
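The idea behind the first algorithm can be illustrated with a minimal sketch. It is not the project's implementation: it assumes k-means as the partitioning method, a plain 2x2 chi-square test as the contingency table analysis, and toy records with hypothetical field names (`keywords`, `authors`) and placeholder author labels. Documents are clustered by keywords, and each cluster is then tested against each author for a significant positive association.

```python
import math

# Toy bibliographic records; field names and author labels are illustrative only.
DOCS = [
    {"keywords": {"clustering", "data mining"}, "authors": {"Author A", "Author C"}},
    {"keywords": {"clustering", "classification"}, "authors": {"Author A"}},
    {"keywords": {"data mining", "classification"}, "authors": {"Author A"}},
    {"keywords": {"clustering", "data mining", "classification"}, "authors": {"Author A"}},
    {"keywords": {"taxation", "public sector"}, "authors": {"Author B", "Author C"}},
    {"keywords": {"taxation", "e-government"}, "authors": {"Author B"}},
    {"keywords": {"public sector", "e-government"}, "authors": {"Author B"}},
    {"keywords": {"taxation", "public sector", "e-government"}, "authors": {"Author B"}},
]


def to_vectors(docs, field):
    """Encode one aspect (e.g. keywords) as binary document vectors."""
    vocab = sorted(set().union(*(d[field] for d in docs)))
    return [[1.0 if term in d[field] else 0.0 for term in vocab] for d in docs]


def kmeans(points, init_idx, n_iter=20):
    """Plain k-means; initial centers are the points at the given indices."""
    centers = [list(points[i]) for i in init_idx]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi2_2x2(a, b, c, d):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 dof, via the normal distribution.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def discover_subgroups(docs, init_idx, alpha=0.05):
    """Cluster on keywords, then test each cluster against each author."""
    labels = kmeans(to_vectors(docs, "keywords"), init_idx)
    authors = sorted(set().union(*(d["authors"] for d in docs)))
    found = []
    for cluster in sorted(set(labels)):
        for author in authors:
            pairs = [(lab == cluster, author in d["authors"])
                     for d, lab in zip(docs, labels)]
            a = pairs.count((True, True))    # in cluster, has author
            b = pairs.count((True, False))   # in cluster, lacks author
            c = pairs.count((False, True))   # outside cluster, has author
            d_ = pairs.count((False, False)) # outside cluster, lacks author
            _, p = chi2_2x2(a, b, c, d_)
            # Keep only significant, positive associations.
            if p < alpha and a * d_ > b * c:
                found.append((cluster, author, p))
    return labels, found


if __name__ == "__main__":
    labels, found = discover_subgroups(DOCS, init_idx=(0, 4))
    for cluster, author, p in found:
        print(f"cluster {cluster} is associated with {author} (p = {p:.4f})")
```

With counts as small as in this toy data, the chi-square approximation is unreliable, so an exact test such as Fisher's would be the safer choice in practice. A correction for multiple testing would also be needed when many cluster and author pairs are examined.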
As part of the project, we will implement basic and advanced bibliometric techniques in a Python package called Biblium. Biblium will be the most comprehensive Python package for bibliometric analysis, as it will integrate all the procedures from the R package Bibliometrix along with more sophisticated methods for analyzing bibliographic data, including data mining methods and subgroup analysis. In addition, we will perform the bibliometric analysis itself and implement several state-of-the-art approaches and visualizations that are currently scattered across different programs rather than available under one umbrella.
In the final phase of the project, we will integrate Biblium with the open-source data mining software Orange as its add-on Orangebib. This integration will combine bibliometric analysis with data mining methods in user-friendly software that does not require programming skills. By using Orangebib together with existing Orange add-ons (bioinformatics, advanced text mining, geomaps, etc.), Orange users will be able to find new, creative ways to combine different aspects of bibliographic data and make an important contribution to the field of bibliometrics.
We plan to apply data mining and subgroup discovery techniques to several areas, including applications in the natural sciences (medicine, drug repurposing, genetics, etc.) and the social sciences (public administration, online learning, taxation, artificial intelligence and disruptive technologies in the public sector, etc.).
We intend to publish several papers as results of the project, including software and methodology papers in leading journals of scientometrics and data mining, as well as applications of the developed and implemented tools in several journals of natural and social sciences. We also plan to participate in several (inter)national conferences in the field of scientometrics, presenting Biblium and Orangebib. As a final deliverable, we plan to organize a free one-day online workshop where users will learn how to use Orangebib to easily perform advanced bibliometric analyses.
Duration (from/to):
1. 10. 2023 – 30. 9. 2026
Contracting Authority:
Slovenian Research and Innovation Agency
Financing:
The project is financed with 2,571 hours per year (price category A) for 3 years.
Members of the research group and links to the SICRIS portal:
prof. dr. Aleksander Aristovnik
Suzana Mišić, from 1.12.2023
The project will be conducted during a 3-year period and organized in seven complementary work packages (WP1–WP7), as described in the detailed description of work programmes. The tasks within each WP and their deadlines are clearly shown in the Gantt chart of the project (Figure).
- Work package 1: Project management
- Work package 2: Beta version of Biblium
- Work package 3: Bibliometrics of bibliometrics
- Work package 4: Implementation of subgroup discovery algorithms
- Work package 5: Implementation of the stable version of Biblium
- Work package 6: Implementation of Biblium functionalities in Orange
- Work package 7: Dissemination of the results
Gantt chart of the project: