| Datasets (size) | Data scenarios | Downsampling | Word frequencies | Classification algorithms | Model metrics<sup>b</sup> |
|---|---|---|---|---|---|
| Psoriasis (4442) | Abstract screening | With | Removing words appearing < 5 times across all citations | SVM | ROC |
| Lung cancer (12,769) | Full-text screening | Without | Removing words appearing < 10 times across all citations | Naïve Bayes | Sensitivity |
| Liver cancer (8507) | Removing full-text excludes | | Removing words appearing < 100 times across all citations | Bagged CART | |
| Melanoma (3089) | | | Removing words appearing < 500 times across all citations | | |
| Obesity (5187) | | | Keeping top 50 words in terms of variable importance<sup>a</sup> | | |
| | | | Keeping top 100 words in terms of variable importance<sup>a</sup> | | |
| | | | Keeping top 500 words in terms of variable importance<sup>a</sup> | | |
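
The frequency-based options in the "Word frequencies" column can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, and that "appearing" means total occurrences across all citations rather than the number of citations containing the word; none of these details are stated in the table. The function name `remove_rare_words` and the toy citations are hypothetical.

```python
from collections import Counter


def remove_rare_words(citations, min_count):
    """Drop words whose total count across all citations is below min_count.

    Assumption: counts are total occurrences, not document frequencies.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [text.lower().split() for text in citations]
    # Count every word over the whole collection of citations.
    totals = Counter(word for tokens in tokenized for word in tokens)
    # Keep only words that reach the threshold.
    return [[w for w in tokens if totals[w] >= min_count] for tokens in tokenized]


# Toy citations (hypothetical, not taken from the datasets in the table).
citations = [
    "psoriasis treatment trial",
    "psoriasis cohort study",
    "melanoma treatment outcomes",
]
filtered = remove_rare_words(citations, min_count=2)
```

With the thresholds in the table, `min_count` would be 5, 10, 100, or 500; the toy example uses 2 only so that the effect is visible on three short citations.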