Semi-automating abstract screening with a natural language model pretrained on biomedical literature

Table 1 List of performance measures assessed

Measure	Definition	Estimate
Recall/sensitivity	\(\frac{\mathrm{Number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{included}\;\mathrm{by}\;\mathrm{human}\;\mathrm{reviewer}\;\mathrm{and}\;\mathrm{pBERT}}{\mathrm{Number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{included}\;\mathrm{by}\;\mathrm{human}\;\mathrm{reviewer}}\)	37.7%
Precision/positive predictive value	\(\frac{\mathrm{Number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{included}\;\mathrm{by}\;\mathrm{human}\;\mathrm{reviewer}\;\mathrm{and}\;\mathrm{pBERT}}{\mathrm{Number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{included}\;\mathrm{by}\;\mathrm{pBERT}}\)	37.7%
F1	\(2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}\)	37.7%
Accuracy	\(\frac{\mathrm{Number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{with}\;\mathrm{the}\;\mathrm{same}\;\mathrm{decision}\;\mathrm{by}\;\mathrm{human}\;\mathrm{reviewer}\;\mathrm{and}\;\mathrm{pBERT}}{\mathrm{Total}\;\mathrm{number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{screened}}\)	70.2%
Disagreement	\(\frac{\mathrm{Number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{with}\;\mathrm{different}\;\mathrm{decisions}\;\mathrm{by}\;\mathrm{human}\;\mathrm{reviewer}\;\mathrm{and}\;\mathrm{pBERT}}{\mathrm{Total}\;\mathrm{number}\;\mathrm{of}\;\mathrm{abstracts}\;\mathrm{screened}}\)	3.0%
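The sketch below makes the definitions in Table 1 concrete. It computes all five measures from two parallel lists of per-abstract decisions (True = include, False = exclude): one list from the human reviewer and one from the model.

It is a minimal illustration under stated assumptions, not the authors' evaluation code:

- The names `screening_metrics` and `ScreeningMetrics` are hypothetical.
- Accuracy is taken in the conventional sense: the share of abstracts on which both decisions agree, whether both include or both exclude.
- A ratio whose denominator is zero is reported as 0.0, which is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScreeningMetrics:
    """Hypothetical container for the measures listed in Table 1."""
    recall: float
    precision: float
    f1: float
    accuracy: float
    disagreement: float


def screening_metrics(human: Sequence[bool], model: Sequence[bool]) -> ScreeningMetrics:
    """Compare include (True) / exclude (False) decisions, one per abstract.

    `human` and `model` are assumed to be parallel lists: element i is the
    decision on abstract i by the human reviewer and by the model (e.g. pBERT).
    """
    if len(human) != len(model) or len(human) == 0:
        raise ValueError("need two non-empty decision lists of equal length")

    total = len(human)
    both_include = sum(1 for h, m in zip(human, model) if h and m)
    human_include = sum(1 for h in human if h)
    model_include = sum(1 for m in model if m)
    differ = sum(1 for h, m in zip(human, model) if h != m)

    # Recall and precision share the same numerator (abstracts included by both).
    recall = both_include / human_include if human_include else 0.0
    precision = both_include / model_include if model_include else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Assumption: accuracy counts agreement on both include and exclude
    # decisions, so in this sketch accuracy == 1 - disagreement.
    accuracy = (total - differ) / total
    disagreement = differ / total
    return ScreeningMetrics(recall, precision, f1, accuracy, disagreement)


if __name__ == "__main__":
    # Toy decisions for ten abstracts (made up for illustration only).
    human = [True, True, False, False, True, False, False, False, False, False]
    model = [True, False, True, False, True, False, False, False, False, False]
    m = screening_metrics(human, model)
    # recall = precision = f1 = 2/3; accuracy = 0.8; disagreement = 0.2
    print(m)
```

In the toy example, the reviewer and the model each include three abstracts and agree on two of them. Recall and precision are therefore both 2/3, and F1 equals them as well.

More generally, recall and precision share a numerator. They are therefore equal (when nonzero) exactly when the two raters include the same number of abstracts, and F1 then reduces to \(2P^2/2P = P\). The identical 37.7% estimates for recall, precision and F1 imply that pBERT and the human reviewer included the same number of abstracts.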

ISSN: 2046-4053