
Should methodological filters for diagnostic test accuracy studies be used in systematic reviews of psychometric instruments? a case study involving screening for postnatal depression



Searching for diagnostic test accuracy (DTA) studies presents challenges, including the design of DTA search strategies and the selection of appropriate filters. This paper compares the performance of three MEDLINE search strategies for psychometric DTA studies in postnatal depression.


A reference set of six relevant studies was derived from a forward citation search via Web of Knowledge. The performance of the 'target condition and index test' method recommended by the Cochrane DTA Group was compared to two alternative strategies which included methodological filters. Outcome measures were total citations retrieved, sensitivity, precision and associated 95% confidence intervals (95%CI).


The Cochrane recommended strategy and one of the filtered search strategies were equivalent in performance: both retrieved a total of 105 citations, sensitivity was 100% (95% CI 61%, 100%) and precision was 5.7% (2.6%, 11.9%). The second filtered search retrieved a total of 31 citations; sensitivity was 66.6% (30%, 90%) and precision was 12.9% (5.1%, 28.6%). This search missed the DTA study with most relevance to the DTA review.


The Cochrane recommended search strategy, the 'target condition and index test' method, was pragmatic and sensitive. It was considered the optimum method for retrieval of relevant studies for a psychometric DTA review (in this case for postnatal depression). Potential limitations of using filtered searches during a psychometric mental health DTA review should be considered.



The advent of systematic reviews has generated challenges to develop optimum methods with which to identify studies from electronic bibliographic databases [1]. There is a great deal of expertise in this matter for systematic reviews of randomised trials [2]. However, the design of optimum information retrieval strategies for recent developments such as Diagnostic Test Accuracy (DTA) reviews is not yet resolved; challenges that exist when searching for DTA studies have been acknowledged and include the design of DTA search strategies and selection of appropriate filters [3-5]. DTA studies are important for the assessment of new or existing screening tests: the accuracy of a screening test is assessed by comparing the test to a 'gold standard' to examine whether the screening test can accurately classify those with and without the disease. Methodologically rigorous DTA reviews are therefore an important contribution to the overall evidence for a new or existing screening test.

The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [6] recommends that a search strategy for identification of DTA studies focus primarily on search terms in relation to the 'target condition' (for example, the illness or medical condition) and the 'index test' (for example, the new test to be compared to a gold standard or some other 'reference' test); no specific filter terms such as 'sensitivity' or 'specificity' are recommended. An important contribution to the debate on search filters, by Whiting et al. (2010) [7], comprised a substantial review which sought to clarify the use of search filters by evaluating whether an optimum search strategy for retrieval of DTA studies was available. The review found that inclusion of filters in DTA searches missed relevant studies. In a head-to-head comparison of different strategies, the review identified seven systematic reviews, which included 506 primary studies, as a 'reference set'. This set was used to examine the utility and sensitivity of the 'subject search', the search method recommended by the Cochrane Handbook [6], compared to 22 published strategies that used filter terms combined with the 'subject search'. However, the reference set only included studies of biochemical laboratory tests (for example, tests for urinary tract infection and faecal occult blood tests) and imaging techniques. When we came to conduct a systematic review in a related but distinct field - psychometrics and the identification of mental disorders - we sought to adapt the approaches of Whiting and colleagues, since their review did not address this field.

DTA reviews in mental health may present new challenges as the 'index' test will likely be an eponymous psychometric measure, for example the Patient Health Questionnaire-9 [8] or Edinburgh Postnatal Depression Scale [9]. Several studies have specifically examined optimum search strategies for retrieval of mental health studies; however, these have focused on the identification of specific study designs (for example, intervention studies for mental health) or on optimum strategies to retrieve papers with content related to a specific mental health condition, for example depression [10-12]. DTA reviews of psychometric measures for depression have been undertaken; however, none of the search strategies explicitly described following methodological guidelines [13-16]. In 2007 the National Institute for Health and Clinical Excellence (NICE) issued guidance on the use of questionnaires to detect postnatal depression [17]. This guidance advocates the use of a standardised method to detect postnatal depression and recommends the use of brief case-finding questions:

1) "During the past month, have you often been bothered by feeling down, depressed or hopeless?"

2) "During the past month, have you often been bothered by little interest or pleasure in doing things?"

A third question should be considered if the woman answers "yes" to either of the initial screening questions: "Is this something you feel you need or want help with?"

Recently a DTA review was warranted to evaluate the psychometric properties of these questions and examine the utility of this policy [18]. To our knowledge, the Cochrane recommended search strategy for DTA studies has not been evaluated specifically in the area of psychometrics and mental health. In this paper we examine whether a useful search strategy is available to identify studies for a DTA review of a brief psychometric measure for postnatal depression.


Three alternative search strategies, each run in MEDLINE (1996 to June, week 3, 2011), were compared head-to-head against a reference set of studies.

Reference set

A forward citation search, using the first publication to conduct a DTA study of the brief psychometric measure as the reference point [19], was performed in ISI Web of Knowledge and retrieved 350 primary study and review citations. Titles and abstracts were screened for relevant papers. Inclusion criteria consisted of primary studies which examined the brief case-finding questions in a postnatal population, where the accuracy of these questions was compared to a gold standard or 'reference' standard test. A total of six studies which used the case-finding questions for postnatal depression were identified and selected as the reference set [20-25].


Each of the three searches contained comparable components in relation to the review question; the searches contained parallel constructs related to the 'target condition' of postnatal depression and parallel constructs related to the 'index test' (the case-finding questions). Search terms related to the index test were constructed by review of titles and abstracts of citations identified via the forward citation search. Search terms for the target condition were constructed from search strings developed for a Health Technology Assessment (HTA) review of postnatal depression [13]. Two of the three searches contained a methodological filter designed to detect DTA studies. The first filtered search was an adapted strategy based on the comprehensive search strategy conducted by the University of York, Centre for Reviews and Dissemination (CRD) for the HTA review of methods to identify postnatal depression in primary care [13], hereafter referred to as the 'CRD filter search' (Table 1). The second filtered search used specific filter terms developed by Vincent et al. (2003) [26]; their high sensitivity was demonstrated by Ritchie et al. (2007) [27] and these terms are recommended by the Scottish Intercollegiate Guidelines Network (SIGN) [28]. This search is hereafter referred to as the 'Vincent filter search' (Table 2). The Cochrane DTA Handbook advises against adding a methodological filter and uses only two concepts, the 'target condition' and the 'index test'. This was the third search strategy tested, hereafter referred to as the 'Cochrane search' (Table 3). Additional terms for filters, for example use of the floating sub-heading 'di.fs', were suggested by an information specialist at the University of York.
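The logical structure of the three searches can be sketched with sets of record identifiers. This is an illustration only: the record IDs and the contents of the three concept sets are hypothetical, and the sketch assumes that a methodological filter is combined with the subject search using AND. The actual Ovid strategies are those given in Tables 1 to 3.

```python
# Hypothetical record identifiers matched by each concept (illustrative only;
# these are not records from the MEDLINE searches reported in this paper).
target_condition = {1, 2, 3, 4, 5, 6, 7, 8}   # e.g. postnatal depression terms
index_test = {3, 4, 5, 6, 7, 9}               # e.g. case-finding question terms
dta_filter = {4, 5, 9, 10}                    # e.g. methodological filter terms


def subject_search(condition, test):
    """'Target condition AND index test', with no methodological filter."""
    return condition & test


def filtered_search(condition, test, method_filter):
    """Subject search further restricted by a filter (assumed AND combination)."""
    return subject_search(condition, test) & method_filter


unfiltered = subject_search(target_condition, index_test)
filtered = filtered_search(target_condition, index_test, dta_filter)

# Records retrieved by the subject search but removed by the filter.
lost = unfiltered - filtered
print(sorted(unfiltered), sorted(filtered), sorted(lost))
```

Under this assumption a filter can only remove records from the subject search, never add them, so the question for any filter is whether the records it removes include relevant studies.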

Table 1 CRD Filter Search conducted in MEDLINE (OVID SP) from 1996 to June, Week 3, 2011
Table 2 Vincent Filter Search conducted in MEDLINE (OVID SP) from 1996 to June, Week 3, 2011
Table 3 Cochrane Search conducted in MEDLINE (OVID SP) from 1996 to June, Week 3, 2011


The performance of the three searches was compared to the reference set as described by Whiting et al. [7]. Completeness of retrieval was assessed by calculating the sensitivity of each search (number of relevant studies retrieved / number of studies in the reference set × 100) and the number of missed studies (number of studies in the reference set - number of relevant studies retrieved). Efficiency of the assessment process for the researcher was assessed using precision (number of relevant studies retrieved / total number of citations retrieved by the search × 100). Associated 95% confidence intervals were calculated using the Vassar online statistics calculator for proportions [29].
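The outcome measures defined above can be sketched in a few lines of Python. The function names are illustrative, and the interval method is an assumption: the paper names only the Vassar calculator, and the sketch uses the Wilson score interval without continuity correction, which gives bounds close to those reported for sensitivity. The count of four relevant studies for the Vincent filter search is inferred from the reported 66.6% sensitivity against a reference set of six.

```python
import math


def sensitivity(relevant_retrieved, reference_total):
    """Percentage of reference-set studies that the search retrieved."""
    return 100.0 * relevant_retrieved / reference_total


def precision(relevant_retrieved, total_retrieved):
    """Percentage of retrieved citations that are relevant."""
    return 100.0 * relevant_retrieved / total_retrieved


def missed(relevant_retrieved, reference_total):
    """Number of reference-set studies that the search failed to retrieve."""
    return reference_total - relevant_retrieved


def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% interval for a proportion, returned as percentages.

    Assumption: no continuity correction is applied.
    """
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    lower = max(0.0, 100 * (centre - half))
    upper = min(100.0, 100 * (centre + half))
    return lower, upper


# Counts taken or inferred from the text: reference set of 6 studies;
# Cochrane and CRD filter searches: 105 citations, 6 relevant;
# Vincent filter search: 31 citations, 4 relevant (inferred).
for name, relevant, total in [("Cochrane/CRD", 6, 105), ("Vincent", 4, 31)]:
    print(name,
          round(sensitivity(relevant, 6), 1), wilson_ci(relevant, 6),
          round(precision(relevant, total), 1), wilson_ci(relevant, total),
          missed(relevant, 6))
```

Small differences from the published interval bounds are possible if the original calculation used a different interval method or rounding.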


Table 4 presents the results of the head-to-head comparison of the three strategies. The Cochrane search and CRD filter search each retrieved a total of 105 citations; precision was low (5.7%) due to the number of irrelevant studies identified; however, both searches were 100% sensitive and retrieved all reference set studies. The Vincent filter search retrieved a total of 31 citations; although precision was slightly higher due to the smaller number of studies retrieved, the search had poor sensitivity. Of the six papers identified as the reference set, only one paper examined the DTA of the questions compared to 'gold standard' diagnostic criteria [20]; therefore only this study would be eligible for inclusion in a DTA review of the questions. Whilst the Cochrane search and CRD filter search identified this study [20], the Vincent filter search failed to do so. A Venn diagram (Figure 1) shows the commonality and discrepancy between the three searches in MEDLINE.

Table 4 Summary table of the performance of the three search strategies
Figure 1 Head-to-head comparison of three alternative search strategies conducted in MEDLINE, 1996 to June, Week 3, 2011.


This small exploratory case study sought to identify a useful search strategy for identification of DTA studies of a brief psychometric measure recommended for identification of postnatal depression. Relatively few DTA reviews have been conducted in mental health, and those that have addressed this area do not describe a formal method to construct their search strategy. Studies that have examined the sensitivity and accuracy of various search strategies to inform search techniques for DTA reviews, including current published Cochrane DTA Reviews [6], tend to focus on DTA of physical and laboratory tests. In a review of 12 search filters for retrieval of DTA studies, Leeflang et al. [4] included 27 diagnostic systematic reviews, of which one examined the DTA of two measures to identify alcohol problems in primary care. However, to our knowledge, no paper has specifically examined whether a useful search strategy is available to conduct a DTA review of a psychometric measure for a mental health disorder.

Three strategies with alternative methods and search terms were compared to a reference set of records. The Cochrane search and CRD filter search identified all studies in the reference set. Despite the complex use of a variety of filter terms within the CRD filter search, the total number of records retrieved and the completeness of retrieval for relevant studies were identical to those of the Cochrane search (no methodological filter included). The construction of the strategies in relation to the use of filters was different, yet the Cochrane search did not suffer loss of any relevant studies. It therefore had the advantage of providing a less complex and more pragmatic search strategy, which could be applied to a wide range of electronic databases with a low risk of missing key citations. A potential consequence of not adding a methodological filter when searches are conducted in a large evidence base is loss of precision, and low precision was observed with the Cochrane search. Searches that leave researchers with many citations to screen in order to identify relevant studies may not be viewed as the most efficient strategy. There is a potential trade-off between high sensitivity and low precision that needs consideration with this approach. This may point to further work to balance the need for precision, the pragmatism of less complex searches (as recommended by the Cochrane Group) and the need to refine filter terms related to psychometric tests. The Vincent filter search included specific filter terms to identify DTA studies, so precision was slightly higher than that observed with the Cochrane search. At the same time, sensitivity was lower in comparison with the Cochrane search. In particular, the Vincent filter search failed to retrieve the study most important to the review, a DTA study which used gold standard diagnostic criteria [20].
A possible explanation for this is that mental health DTA studies are likely to suffer from indexing problems under the filter terms in MEDLINE in the same way as studies of biochemical, laboratory or imaging tests do. Indexing problems of filter terms associated with the Vincent filter search might explain the low number of retrievals using the specific DTA filter terms. Additionally, psychometric tests are often associated with terms related to reliability and validity rather than diagnostic accuracy terms. Ritchie et al. (2007) [27] have acknowledged the difficulty of transcription, indexing and lack of transparency in reporting search filters in papers and electronic sources used to conduct DTA searches. Use of the Standards for the Reporting of Diagnostic Accuracy Studies (STARD) [30] guidelines to ensure accuracy, transparency and completeness of reporting DTA studies may assist indexing and identification of psychometric DTA studies in electronic databases, as STARD recommends use of diagnostic terminology (for example, sensitivity and specificity) for all studies reporting diagnostic accuracy. In addition, a recent and very positive development by EMBASE is the addition of a diagnostic test accuracy indexing term in EMTREE, which should assist researchers searching for DTA studies in this particular database [31].

This was a small exploratory search with limitations; for example, the reference set may not be considered a 'gold standard' comparator as it was derived from a forward citation search. However, there are very few DTA reviews of psychometric measures for postnatal depression [13, 18], so there was a relatively small chance of missing relevant studies in this specific area. Evaluation of the Cochrane search strategy for DTA reviews would benefit from further consideration in other mental health disorders and alternative electronic databases, as it is difficult to generalise the results of this small study to the retrieval of psychometric DTA studies for all mental health conditions.


The findings from this exploratory study reflect the conclusions of Whiting et al. [7]. In our case, the 'target condition and index test' Cochrane search provided an effective and pragmatic strategy when constructing a systematic search strategy for retrieval of DTA studies of a brief psychometric measure for postnatal depression in electronic databases. In addition, use of the Cochrane search provided a means to report the conduct of the DTA review search strategy with transparency, according to published guidelines. It is therefore reasonable to conclude that researchers may find it preferable to consider use of the 'target condition and index test' method recommended within the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [6] rather than filtered searches for a DTA search strategy of a psychometric measure for mental health disorders such as postnatal depression.

Author's Information

The researcher (RM) faced the decision of choosing an appropriate search strategy when conducting a DTA review of a psychometric measure in postnatal depression. As definitive guidelines for conduct of diagnostic reviews were published last year in the first Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (2010), the researcher was interested to know whether the Cochrane recommended search for diagnostic reviews was relevant to a psychometric diagnostic review in mental health. Of particular interest was the complexity of using multiple, unfamiliar search filters compared to the pragmatism of a brief, relatively simple method. Overall, the ability to justify the search strategy used in the review was deemed important in terms of reporting transparency, as other psychometric diagnostic reviews in mental health have not specified a strategy method or cited specific guidance for their strategy choice.


95% CI: 95% confidence interval
CRD: Centre for Reviews and Dissemination
DTA: Diagnostic Test Accuracy
HTA: Health Technology Assessment
NICE: National Institute for Health & Clinical Excellence
SIGN: Scottish Intercollegiate Guidelines Network


  1. Hunt DL, McKibbon KA: Locating and appraising systematic reviews. Ann Intern Med. 1997, 126: 532-538.
  2. Cochrane Handbook for Systematic Reviews of Interventions. Edited by: Higgins JPT, Green S. 2009, Version 5.0.2 [updated September 2009]: The Cochrane Collaboration.
  3. Leeflang M, Deeks J, Gatsonis C, Bossuyt P: Systematic reviews of diagnostic test accuracy. Ann Intern Med. 2008, 149: 889-897.
  4. Leeflang MMG, Scholten RJPM, Rutjes AWS, Reitsma JB, Bossuyt PMM: Use of methodological search filters to identify diagnostic accuracy studies can lead to the omission of relevant studies. J Clin Epidemiol. 2006, 59: 234-240. 10.1016/j.jclinepi.2005.07.014.
  5. Glanville J, Bayliss S, Booth A, Dundar Y, Fernandes H, Fleeman ND, Foster L, Fraser C, Fry-Smith A, Golder S, Lefebvre C, Miller C, Paisley S, Payne L, Price A, Welch K: So many filters, so little time: the development of a search filter appraisal checklist. J Med Libr Assoc. 2008, 96: 356. 10.3163/1536-5050.96.4.011.
  6. Cochrane Diagnostic Test Accuracy Working Group: Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. 2010, [accessed 5/09/2011].
  7. Whiting P, Westwood M, Beynon R, Burke M, Sterne JAC, Glanville J: Inclusion of methodological filters in searches for diagnostic test accuracy misses relevant studies. J Clin Epidemiol. 2011, 64: 602-607. 10.1016/j.jclinepi.2010.07.006.
  8. Spitzer RL, Kroenke K, Williams JBW, the Patient Health Questionnaire Primary Care Study Group: Validation and utility of a self-report Version of PRIME-MD: The PHQ Primary Care Study. JAMA. 1999, 282: 1737-1744. 10.1001/jama.282.18.1737.
  9. Cox J, Holden J, Sagovsky R: Detection of postnatal depression: development of a 10 item postnatal depression scale. Br J Psychiatry. 1987, 150: 782-786. 10.1192/bjp.150.6.782.
  10. Adams CE, Power A, Frederick K, Lefebvre C: An investigation of the adequacy of MEDLINE searches for randomized controlled trials (RCTs) of the effects of mental health care. Psychol Med. 1994, 24: 741-748. 10.1017/S0033291700027896.
  11. Watson RJ, Richardson PH: Identifying randomized controlled trials of cognitive therapy for depression: comparing the efficiency of Embase, Medline and PsycINFO bibliographic databases. Br J Med Psychol. 1999, 72: 535-542. 10.1348/000711299160220.
  12. Wilczynski NL, Haynes RB, Team Hedges: Optimal search strategies for identifying mental health content in MEDLINE: an analytical survey. Ann Gen Psychiatry. 2006, 5: 4. 10.1186/1744-859X-5-4.
  13. Hewitt CE, Gilbody SM, Brealey S, Paulden M, Palmer S, Mann R, Green J, Morrell J, Barkham M, Light K, Richards D: Methods to identify postnatal depression in primary care: an integrated evidence synthesis and value of information analysis. Health Technol Assess. 2009, 13: 1-145, 147-230.
  14. Reuland DS, Cherrington A, Watkins GS, Bradford DW, Blanco RA, Gaynes BN: Diagnostic Accuracy of Spanish Language Depression-Screening Instruments. Ann Fam Med. 2009, 7: 455-462. 10.1370/afm.981.
  15. Vodermaier A, Linden W, Siu C: Screening for Emotional Distress in Cancer Patients: A Systematic Review of Assessment Instruments. J Natl Cancer Inst. 2009, 101: 1464-1488. 10.1093/jnci/djp336.
  16. Wittkampf KA, Naeije L, Schene AH, Huyser J, van Weert HC: Diagnostic accuracy of the mood module of the Patient Health Questionnaire: a systematic review. Gen Hosp Psychiatry. 2007, 29: 388-395. 10.1016/j.genhosppsych.2007.06.004.
  17. National Institute for Health & Clinical Excellence: Antenatal and Post-Natal Mental Health, The NICE Guideline on Clinical Management and Service Guidance. 2007, London: The British Psychological Society and The Royal College of Psychiatrists.
  18. Mann R, Gilbody SM: Validity of two case finding questions to detect postnatal depression: A review of diagnostic test accuracy. J Affect Disord. 2011, 133: 388-397. 10.1016/j.jad.2010.11.015.
  19. Whooley MA, Avins AL, Miranda J, Browner WS: Case-finding instruments for depression: two questions are as good as many. J Gen Intern Med. 1997, 12: 439-445. 10.1046/j.1525-1497.1997.00076.x.
  20. Gjerdingen D, Crow S, McGovern P, Miner M, Center B: Postpartum Depression Screening at Well-Child Visits: Validity of a 2-Question Screen and the PHQ-9. Ann Fam Med. 2009, 7: 63-70. 10.1370/afm.933.
  21. Bennett IM, Coco A, Coyne JC, Mitchell AJ, Nicholson J, Johnson E, Horst M, Ratcliffe S: Efficiency of a two-item pre-screen to reduce the burden of depression screening in pregnancy and postpartum: An IMPLICIT network study. J Am Board Fam Med. 2008, 21: 317-325. 10.3122/jabfm.2008.04.080048.
  22. Cutler CB, Legano LA, Dreyer BP, Fierman AH, Berkule SB, Lusskin SI, Tomopoulos S, Roth M, Mendelsohn AL: Screening for maternal depression in a low education population using a two item questionnaire. Arch Womens Ment Health. 2007, 10: 277-283. 10.1007/s00737-007-0202-z.
  23. Mishina H, Hayashino Y, Fukuhara S: Test performance of two-question screening for postpartum depressive symptoms. Pediatr Int. 2009, 51: 48-53. 10.1111/j.1442-200X.2008.02659.x.
  24. Olson AL, Dietrich AJ, Prazar G, Hurley J: Brief maternal depression screening at well-child visits. Pediatrics. 2006, 118: 207-216. 10.1542/peds.2005-2346.
  25. Olson AL, Dietrich AJ, Prazar G, Hurley J, Tuddenham A, Hedberg V, Naspinsky DA: Two approaches to maternal depression screening during well child visits. J Dev Behav Pediatr. 2005, 26: 169-176. 10.1097/00004703-200506000-00002.
  26. Vincent S, Greenley S, Beaven O: Clinical Evidence Diagnosis: developing a sensitive search strategy to retrieve diagnostic studies on deep vein thrombosis: a pragmatic approach. Health Info Libr J. 2003, 20: 150-159. 10.1046/j.1365-2532.2003.00427.x.
  27. Ritchie G, Glanville J, Lefebvre C: Do published search filters to identify diagnostic test accuracy studies perform adequately?. Health Info Libr J. 2007, 24: 188-192. 10.1111/j.1471-1842.2007.00735.x.
  28. Scottish Intercollegiate Guidelines Network (SIGN): SIGN: Search Filters. 2010, [accessed 23.06.2011].
  29. Vassar statistics online calculator. [accessed 24.6.2011].
  30. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC, STARD Group: Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Br Med J. 2003, 326: 41-44. 10.1136/bmj.326.7379.41.
  31. The Cochrane Collaboration. [accessed 27.6.2011].



The authors wish to thank Janette Colclough for her advice with regard to identification of search terms and the conduct of the search strategies, and also thank the two reviewers for their comments and suggestions on earlier drafts of the manuscript.

Author information



Corresponding author

Correspondence to Rachel Mann.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

RM designed the concept, undertook the searches and drafted the final manuscript. SG commented on and helped draft the final manuscript. All authors read and approved the final manuscript.


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Mann, R., Gilbody, S.M. Should methodological filters for diagnostic test accuracy studies be used in systematic reviews of psychometric instruments? a case study involving screening for postnatal depression. Syst Rev 1, 9 (2012).



  • Diagnostic test accuracy
  • Systematic review
  • Methodological filters
  • Literature searching
  • Psychometrics
  • Mental Health

