Fig. 2 | Systematic Reviews

Fig. 2

From: Screening PubMed abstracts: is class imbalance always a challenge to machine learning?

Computational plan. For each systematic review considered, the set of documents was imported, converted into a corpus, and preprocessed, and the corresponding document-term matrix (DTM) was created for training. Next, for each combination of machine learning technique (MLT), each of the ten corresponding randomly selected tuning parameters, and balancing technique adopted, the training set was split into five folds for the cross-validation (CV) process. In each step of the CV, the DTM was rescaled to term frequency-inverse document frequency (TF-IDF) weights, which were retained to rescale all the samples in the corresponding (i.e., out-of-fold) test set. Next, the imbalance was treated with the selected algorithm, and the classifier was trained. Once the features in the test set had been adapted to the training set (i.e., additional features were removed, missing ones were added with zero weight, and all of them were reordered accordingly), the trained model was applied to the test set to provide the statistics of interest.
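
The per-fold steps in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`fit_idf`, `to_tfidf`, `random_oversample`), the toy documents, the smoothed IDF formula, and the use of random oversampling as the balancing step are all assumptions made for illustration. The sketch shows IDF weights learned on the training fold only, test documents mapped onto the training vocabulary (extra terms dropped, missing terms set to zero, columns in training order), and the training fold balanced before a classifier would be fitted.

```python
import math
import random
from collections import Counter


def fit_idf(train_docs):
    """Learn the vocabulary and IDF weights from the training fold only."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for tokens in train_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vocab = sorted(doc_freq)
    # Assumed variant: smoothed IDF, log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf


def to_tfidf(tokens, vocab, idf):
    """Weight one document using the retained training vocabulary and IDF.

    Terms absent from the training vocabulary are dropped, vocabulary terms
    absent from the document get weight 0.0, and the column order follows
    the training vocabulary.
    """
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocab]


def random_oversample(rows, labels, seed=0):
    """Illustrative balancing step: resample smaller classes with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy fold: three training abstracts and one held-out abstract.
train_docs = [["cancer", "trial"], ["trial", "random"], ["cancer", "cohort"]]
train_labels = [1, 0, 0]
test_docs = [["trial", "placebo", "cancer", "cancer"]]

vocab, idf = fit_idf(train_docs)
X_train = [to_tfidf(doc, vocab, idf) for doc in train_docs]
X_test = [to_tfidf(doc, vocab, idf) for doc in test_docs]  # "placebo" is dropped

X_bal, y_bal = random_oversample(X_train, train_labels)
# A classifier would be trained on (X_bal, y_bal) and evaluated on X_test.
```

Fitting the IDF weights inside each fold, rather than once on the full DTM, keeps information from the held-out documents out of the training features, which is the reason the caption has the weights retained and reused for the test set.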
