Table 1 List of performance measures assessed

From: Semi-automating abstract screening with a natural language model pretrained on biomedical literature

| Measure | Definition | Estimate |
| --- | --- | --- |
| Recall/sensitivity | \(\frac{\text{Number of abstracts included by human reviewer and pBERT}}{\text{Number of abstracts included by human reviewer}}\) | 37.7% |
| Precision/positive predictive value | \(\frac{\text{Number of abstracts included by human reviewer and pBERT}}{\text{Number of abstracts included by pBERT}}\) | 37.7% |
| F1 | \(2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}\) | 37.7% |
| Accuracy | \(\frac{\text{Number of abstracts included by human reviewer and pBERT}}{\text{Total number of abstracts screened}}\) | 70.2% |
| Disagreement | \(\frac{\text{Number of abstracts with different decisions by human reviewer and pBERT}}{\text{Total number of abstracts screened}}\) | 3.0% |
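
The definitions in Table 1 can be illustrated with a minimal sketch that derives each measure from a 2x2 tally of screening decisions. The `ScreeningCounts` class, the `performance_measures` function, and the counts in the usage example are hypothetical and are not taken from the study. One further assumption: accuracy is computed here in the conventional way, as the share of abstracts on which the human reviewer and pBERT reach the same decision (both include or both exclude), which is broader than the numerator written in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Hypothetical 2x2 tally of human vs. pBERT screening decisions."""
    both_include: int   # included by human reviewer and by pBERT
    human_only: int     # included by human reviewer, excluded by pBERT
    model_only: int     # excluded by human reviewer, included by pBERT
    both_exclude: int   # excluded by both

    @property
    def total(self) -> int:
        return (self.both_include + self.human_only
                + self.model_only + self.both_exclude)


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def performance_measures(c: ScreeningCounts) -> dict[str, float]:
    """Compute the measures listed in Table 1 as fractions in [0, 1]."""
    if c.total == 0:
        raise ValueError("no abstracts were screened")

    human_included = c.both_include + c.human_only
    model_included = c.both_include + c.model_only

    recall = _ratio(c.both_include, human_included)
    precision = _ratio(c.both_include, model_included)
    f1 = _ratio(2 * precision * recall, precision + recall)

    # Assumption: conventional accuracy, counting agreement on inclusion
    # and on exclusion, rather than joint inclusions only.
    accuracy = (c.both_include + c.both_exclude) / c.total
    disagreement = (c.human_only + c.model_only) / c.total

    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "disagreement": disagreement,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not the study's data.
    counts = ScreeningCounts(both_include=30, human_only=10,
                             model_only=20, both_exclude=940)
    for name, value in performance_measures(counts).items():
        print(f"{name}: {value:.1%}")
```

Under this conventional definition, accuracy and disagreement always sum to 1, which gives a quick consistency check on any reported pair of values.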