Table 1 Characteristics of the document-term matrices (DTMs)

From: Screening PubMed abstracts: is class imbalance always a challenge to machine learning?

| Systematic reviews | Documents | Tokens | Non-zero entries | Zero entries | Sparsity |
|---|---|---|---|---|---|
| Yang et al. 2014 [15] | 418 | 61208 | 147445 | 25437499 | 0.99 |
| Meng et al. 2014 [16] | 209 | 35821 | 73977 | 7412612 | 0.99 |
| Segelov et al. 2014 [17] | 413 | 58351 | 125027 | 23963936 | 0.99 |
| Li et al. 2014 [18] | 206 | 33851 | 68826 | 6904480 | 0.99 |
| Lv et al. 2014 [19] | 412 | 57485 | 138846 | 23544974 | 0.99 |
| Wang et al. 2015 [20] | 832 | 101418 | 288432 | 84091344 | 1.00 |
| Zhou et al. 2014 [21] | 209 | 33389 | 69854 | 6908447 | 0.99 |
| Liu et al. 2014 [22] | 623 | 88108 | 219258 | 54672026 | 1.00 |
| Douxfils et al. 2014 [23] | 413 | 58133 | 141721 | 23869208 | 0.99 |
| Kourbeti et al. 2014 [24] | 1675 | 187947 | 603479 | 314207746 | 1.00 |
| Li et al. 2014 [25] | 209 | 33653 | 69130 | 6964347 | 0.99 |
| Cavender et al. 2014 [26] | 414 | 59572 | 141105 | 24521703 | 0.99 |
| Chatterjee et al. 2014 [27] | 418 | 54458 | 130782 | 22632662 | 0.99 |
| Funakoshi et al. 2014 [28] | 1043 | 131172 | 370385 | 136442011 | 1.00 |

  1. For each DTM, the table reports the number of documents included (number of rows), the number of tokens computed within those documents (number of columns), the number of matrix cells holding a positive weight (non-zero entries) or a 0 (zero entries), and the sparsity, i.e., the ratio of zero entries to the total number of entries
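
The arithmetic linking the columns can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dtm_summary` and its dictionary keys are hypothetical. It assumes the total number of cells is documents × tokens and takes sparsity as the share of zero entries, which is the reading that reproduces the tabulated values.

```python
def dtm_summary(documents: int, tokens: int, non_zero: int) -> dict:
    """Derive zero-entry count and sparsity for a document-term matrix.

    Hypothetical helper for illustration only.
    Assumes the matrix has documents x tokens cells in total.
    """
    total = documents * tokens   # rows x columns
    zero = total - non_zero      # cells holding a 0
    sparsity = zero / total      # share of zero cells
    return {"total": total, "zero": zero, "sparsity": sparsity}


# First row of the table: Yang et al. 2014 [15]
yang = dtm_summary(documents=418, tokens=61208, non_zero=147445)
print(yang["zero"])                # 25437499
print(round(yang["sparsity"], 2))  # 0.99
```

Applied to the largest matrix (Kourbeti et al. 2014 [24]), the same calculation gives a sparsity that rounds to 1.00, matching the table.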