Systematic Review | Definition, Example & Guide

Published on June 15, 2022 by Shaun Turney. Revised on November 20, 2023.

A systematic review is a type of review that uses repeatable methods to find, select, and synthesize all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer.

For example, in a systematic review of probiotics for eczema, Boyle and colleagues answered the question “What is the effectiveness of probiotics in reducing eczema symptoms and improving quality of life in patients with eczema?”

In this context, a probiotic is a health product that contains live microorganisms and is taken by mouth. Eczema is a common skin condition that causes red, itchy skin.

Table of contents

  • What is a systematic review?
  • Systematic review vs. meta-analysis
  • Systematic review vs. literature review
  • Systematic review vs. scoping review
  • When to conduct a systematic review
  • Pros and cons of systematic reviews
  • Step-by-step example of a systematic review
  • Other interesting articles
  • Frequently asked questions about systematic reviews

A review is an overview of the research that’s already been completed on a topic.

What makes a systematic review different from other types of reviews is that the research methods are designed to reduce bias. The methods are repeatable, and the approach is formal and systematic:

  • Formulate a research question
  • Develop a protocol
  • Search for all relevant studies
  • Apply the selection criteria
  • Extract the data
  • Synthesize the data
  • Write and publish a report

Although multiple sets of guidelines exist, the Cochrane Handbook for Systematic Reviews is among the most widely used. It provides detailed guidelines on how to complete each step of the systematic review process.

Systematic reviews are most commonly used in medical and public health research, but they can also be found in other disciplines.

Systematic reviews typically answer their research question by synthesizing all available evidence and evaluating the quality of the evidence. Synthesizing means bringing together different information to tell a single, cohesive story. The synthesis can be narrative (qualitative), quantitative, or both.


Systematic reviews often quantitatively synthesize the evidence using a meta-analysis. A meta-analysis is a statistical analysis, not a type of review.

A meta-analysis is a technique to synthesize results from multiple studies. It’s a statistical analysis that combines the results of two or more studies, usually to estimate an effect size.
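For readers who want to see the arithmetic behind “combining the results”, the formula below is a minimal sketch of fixed-effect (inverse-variance) pooling, one common way a meta-analysis estimates a summary effect. The symbols are generic and not taken from any specific study.

```latex
% Fixed-effect (inverse-variance) pooling of k study effect sizes:
% d_i is the effect size of study i and v_i its sampling variance.
\[
w_i = \frac{1}{v_i}, \qquad
\hat{d} = \frac{\sum_{i=1}^{k} w_i d_i}{\sum_{i=1}^{k} w_i}, \qquad
\mathrm{SE}(\hat{d}) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}
\]
```

Studies with smaller variance (usually larger samples) get more weight, which is why a meta-analysis is more than a simple average of study results.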

A literature review is a type of review that uses a less systematic and formal approach than a systematic review. Typically, an expert in a topic will qualitatively summarize and evaluate previous work, without using a formal, explicit method.

Although literature reviews are often less time-consuming and can be insightful or helpful, they have a higher risk of bias and are less transparent than systematic reviews.

Similar to a systematic review, a scoping review is a type of review that tries to minimize bias by using transparent and repeatable methods.

However, a scoping review isn’t a type of systematic review. The most important difference is the goal: rather than answering a specific question, a scoping review explores a topic. The researcher tries to identify the main concepts, theories, and evidence, as well as gaps in the current research.

Sometimes scoping reviews are an exploratory preparation step for a systematic review, and sometimes they are a standalone project.


A systematic review is a good choice of review if you want to answer a question about the effectiveness of an intervention, such as a medical treatment.

To conduct a systematic review, you’ll need the following:

  • A precise question, usually about the effectiveness of an intervention. The question needs to be about a topic that’s previously been studied by multiple researchers. If there’s no previous research, there’s nothing to review.
  • If you’re doing a systematic review on your own (e.g., for a research paper or thesis), you should take appropriate measures to ensure the validity and reliability of your research.
  • Access to databases and journal archives. Often, your educational institution provides you with access.
  • Time. A professional systematic review is a time-consuming process: it will take the lead author about six months of full-time work. If you’re a student, you should narrow the scope of your systematic review and stick to a tight schedule.
  • Bibliographic, word-processing, spreadsheet, and statistical software. For example, you could use EndNote, Microsoft Word, Excel, and SPSS.

Systematic reviews have many pros.

  • They minimize research bias by considering all available evidence and evaluating each study for bias.
  • Their methods are transparent, so they can be scrutinized by others.
  • They’re thorough: they summarize all available evidence.
  • They can be replicated and updated by others.

Systematic reviews also have a few cons .

  • They’re time-consuming.
  • They’re narrow in scope: they only answer the precise research question.

The seven steps for conducting a systematic review are explained below, using the review of probiotics for eczema by Boyle and colleagues as a running example.

Step 1: Formulate a research question

Formulating the research question is probably the most important step of a systematic review. A clear research question will:

  • Allow you to more effectively communicate your research to other researchers and practitioners
  • Guide your decisions as you plan and conduct your systematic review

A good research question for a systematic review has four components, which you can remember with the acronym PICO:

  • Population(s) or problem(s)
  • Intervention(s)
  • Comparison(s)
  • Outcome(s)

You can rearrange these four components to write your research question:

  • What is the effectiveness of I versus C for O in P?

Sometimes, you may want to include a fifth component, the type of study design. In this case, the acronym is PICOT:

  • Type of study design(s)

In the example review, Boyle and colleagues’ question had the following components:

  • The population of patients with eczema
  • The intervention of probiotics
  • In comparison to no treatment, placebo, or non-probiotic treatment
  • The outcome of changes in participant-, parent-, and doctor-rated symptoms of eczema and quality of life
  • Randomized controlled trials, a type of study design

Their research question was:

  • What is the effectiveness of probiotics versus no treatment, a placebo, or a non-probiotic treatment for reducing eczema symptoms and improving quality of life in patients with eczema?
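To show how the PICO pieces slot into the question template above, here is a minimal sketch in Python; the `PICOQuestion` class and its field names are invented for this illustration, not part of any standard tool.

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    """Hypothetical container for the four PICO components."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def as_question(self) -> str:
        # Template: "What is the effectiveness of I versus C for O in P?"
        return (f"What is the effectiveness of {self.intervention} "
                f"versus {self.comparison} for {self.outcome} "
                f"in {self.population}?")

# The example question from Boyle and colleagues, reassembled:
question = PICOQuestion(
    population="patients with eczema",
    intervention="probiotics",
    comparison="no treatment, a placebo, or a non-probiotic treatment",
    outcome="reducing eczema symptoms and improving quality of life",
)
print(question.as_question())
```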

Step 2: Develop a protocol

A protocol is a document that contains your research plan for the systematic review. This is an important step because having a plan allows you to work more efficiently and reduces bias.

Your protocol should include the following components:

  • Background information: Provide the context of the research question, including why it’s important.
  • Research objective(s): Rephrase your research question as an objective.
  • Selection criteria: State how you’ll decide which studies to include or exclude from your review.
  • Search strategy: Discuss your plan for finding studies.
  • Analysis: Explain what information you’ll collect from the studies and how you’ll synthesize the data.

If you’re a professional seeking to publish your review, it’s a good idea to bring together an advisory committee. This is a group of about six people who have experience in the topic you’re researching. They can help you make decisions about your protocol.

It’s highly recommended to register your protocol. Registering your protocol means submitting it to a database such as PROSPERO or ClinicalTrials.gov.

Step 3: Search for all relevant studies

Searching for relevant studies is the most time-consuming step of a systematic review.

To reduce bias, it’s important to search for relevant studies very thoroughly. Your strategy will depend on your field and your research question, but sources generally fall into these four categories:

  • Databases: Search multiple databases of peer-reviewed literature, such as PubMed or Scopus. Think carefully about how to phrase your search terms and include multiple synonyms of each word. Use Boolean operators if relevant (see the sketch after this list).
  • Handsearching: In addition to searching the primary sources using databases, you’ll also need to search manually. One strategy is to scan relevant journals or conference proceedings. Another strategy is to scan the reference lists of relevant studies.
  • Gray literature: Gray literature includes documents produced by governments, universities, and other institutions that aren’t published by traditional publishers. Graduate student theses are an important type of gray literature, which you can search using the Networked Digital Library of Theses and Dissertations (NDLTD). In medicine, clinical trial registries are another important type of gray literature.
  • Experts: Contact experts in the field to ask if they have unpublished studies that should be included in your review.
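To make the Boolean-operator advice concrete, here is a minimal sketch that assembles a database-style search string from synonym groups. The synonym groups and the query grammar are illustrative only, since each database has its own syntax.

```python
def boolean_query(*synonym_groups):
    """Join synonyms with OR inside each group, then AND the groups.

    Multi-word phrases are quoted so they are searched as units.
    """
    def quote(term):
        return f'"{term}"' if " " in term else term

    groups = ["(" + " OR ".join(quote(t) for t in group) + ")"
              for group in synonym_groups]
    return " AND ".join(groups)

# Illustrative synonym groups for the probiotics-and-eczema example:
query = boolean_query(
    ["probiotic", "probiotics", "lactobacillus"],
    ["eczema", "atopic dermatitis"],
)
print(query)
# (probiotic OR probiotics OR lactobacillus) AND (eczema OR "atopic dermatitis")
```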

At this stage of your review, you won’t read the articles yet. Simply save any potentially relevant citations using bibliographic software, such as Scribbr’s APA or MLA Generator.

In the example review, Boyle and colleagues searched the following sources:

  • Databases: EMBASE, PsycINFO, AMED, LILACS, and ISI Web of Science
  • Handsearch: Conference proceedings and reference lists of articles
  • Gray literature: The Cochrane Library, the metaRegister of Controlled Trials, and the Ongoing Skin Trials Register
  • Experts: Authors of unpublished registered trials, pharmaceutical companies, and manufacturers of probiotics

Step 4: Apply the selection criteria

Applying the selection criteria is a three-person job. Two of you will independently read the studies and decide which to include in your review based on the selection criteria you established in your protocol. The third person’s job is to break any ties.

To increase inter-rater reliability, ensure that everyone thoroughly understands the selection criteria before you begin.

If you’re writing a systematic review as a student for an assignment, you might not have a team. In this case, you’ll have to apply the selection criteria on your own; you can mention this as a limitation in your paper’s discussion.

You should apply the selection criteria in two phases:

  • Based on the titles and abstracts: Decide whether each article potentially meets the selection criteria based on the information provided in the abstracts.
  • Based on the full texts: Download the articles that weren’t excluded during the first phase. If an article isn’t available online or through your library, you may need to contact the authors to ask for a copy. Read the articles and decide which articles meet the selection criteria.

It’s very important to keep a meticulous record of why you included or excluded each article. When the selection process is complete, you can summarize what you did using a PRISMA flow diagram.
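As a small illustration of the record-keeping this step involves, the sketch below tallies screening decisions into the counts that a PRISMA flow diagram reports. The study IDs and exclusion reasons are invented for the example.

```python
from collections import Counter

# Hypothetical screening log: (study_id, phase, decision).
screening_log = [
    ("study-001", "title/abstract", "include"),
    ("study-002", "title/abstract", "exclude: wrong population"),
    ("study-003", "title/abstract", "include"),
    ("study-001", "full text", "include"),
    ("study-003", "full text", "exclude: no control group"),
]

# Count decisions per phase; keeping exclusion reasons visible in the
# counts is exactly the record a PRISMA flow diagram summarizes.
for phase in ("title/abstract", "full text"):
    counts = Counter(decision for _, p, decision in screening_log if p == phase)
    print(phase, dict(counts))
```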

After screening titles and abstracts, Boyle and colleagues found the full texts for each of the remaining studies. Boyle and Tang read through the articles to decide whether any more studies needed to be excluded based on the selection criteria.

When Boyle and Tang disagreed about whether a study should be excluded, they discussed it with Varigos until the three researchers came to an agreement.

Step 5: Extract the data

Extracting the data means collecting information from the selected studies in a systematic way. There are two types of information you need to collect from each study:

  • Information about the study’s methods and results. The exact information will depend on your research question, but it might include the year, study design, sample size, context, research findings, and conclusions. If any data are missing, you’ll need to contact the study’s authors.
  • Your judgment of the quality of the evidence, including risk of bias.

You should collect this information using forms. You can find sample forms in The Registry of Methods and Tools for Evidence-Informed Decision Making and the Grading of Recommendations, Assessment, Development and Evaluations Working Group.
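One lightweight way to keep extraction consistent is to define the form as a fixed data structure, so every study yields the same fields. The sketch below is illustrative only; the field names follow the items listed above and are assumptions, not a standard form.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractionForm:
    """Illustrative data extraction form; field names are assumptions."""
    study_id: str
    year: Optional[int] = None
    study_design: Optional[str] = None
    sample_size: Optional[int] = None
    findings: Optional[str] = None
    risk_of_bias: Optional[str] = None  # e.g., "low", "some concerns", "high"
    notes: str = ""

# One completed record per included study keeps the dataset uniform:
record = ExtractionForm(
    study_id="study-001",
    year=2008,
    study_design="randomized controlled trial",
    sample_size=56,
    risk_of_bias="low",
)
```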

Extracting the data is also a three-person job. Two people should do this step independently, and the third person will resolve any disagreements.

In the example review, Boyle and colleagues also collected data about possible sources of bias, such as how the study participants were randomized into the control and treatment groups.

Step 6: Synthesize the data

Synthesizing the data means bringing together the information you collected into a single, cohesive story. There are two main approaches to synthesizing the data:

  • Narrative (qualitative): Summarize the information in words. You’ll need to discuss the studies and assess their overall quality.
  • Quantitative: Use statistical methods to summarize and compare data from different studies. The most common quantitative approach is a meta-analysis, which allows you to combine results from multiple studies into a summary result (see the sketch after this list).
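The sketch below implements the inverse-variance pooling shown earlier, under the simplifying assumption of a fixed-effect model. The effect sizes and variances are made up for illustration.

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling.

    Returns the pooled effect size and its standard error.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Invented standardized mean differences and their variances:
pooled, se = fixed_effect_meta([-0.30, -0.10, 0.05], [0.04, 0.02, 0.09])
print(f"pooled effect = {pooled:.3f}, "
      f"95% CI = [{pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f}]")
```

A real synthesis would usually also quantify heterogeneity and consider a random-effects model, but the weighting idea is the same.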

Generally, you should use both approaches together whenever possible. If you don’t have enough data, or the data from different studies aren’t comparable, then you can take just a narrative approach. However, you should justify why a quantitative approach wasn’t possible.

Boyle and colleagues also divided the studies into subgroups, such as studies about babies, children, and adults, and analyzed the effect sizes within each group.

Step 7: Write and publish a report

The purpose of writing a systematic review article is to share the answer to your research question and explain how you arrived at this answer.

Your article should include the following sections:

  • Abstract: A summary of the review
  • Introduction: Including the rationale and objectives
  • Methods: Including the selection criteria, search method, data extraction method, and synthesis method
  • Results: Including results of the search and selection process, study characteristics, risk of bias in the studies, and synthesis results
  • Discussion: Including interpretation of the results and limitations of the review
  • Conclusion: The answer to your research question and implications for practice, policy, or research

To verify that your report includes everything it needs, you can use the PRISMA checklist.

Once your report is written, you can publish it in a systematic review database, such as the Cochrane Database of Systematic Reviews, and/or in a peer-reviewed journal.

In their report, Boyle and colleagues concluded that probiotics cannot be recommended for reducing eczema symptoms or improving quality of life in patients with eczema.

Note: Generative AI tools like ChatGPT can be useful at various stages of the writing and research process and can help you to write your systematic review. However, we strongly advise against trying to pass AI-generated text off as your own work.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi-square tests
  • Confidence interval
  • Quartiles & Quantiles
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Prospective cohort study

Research bias

  • Implicit bias
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic
  • Social desirability bias

Frequently asked questions about systematic reviews

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation, or research paper, in order to situate your work in relation to existing knowledge.

A literature review is a survey of credible sources on a topic, often used in dissertations, theses, and research papers. Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts, with an introduction, a main body, and a conclusion.

An annotated bibliography is a list of source references that has a short description (called an annotation) for each of the sources. It is often assigned as part of the research process for a paper.

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Turney, S. (2023, November 20). Systematic Review | Definition, Example & Guide. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/methodology/systematic-review/



How to write the methods section of a systematic review


Covidence breaks down how to write a methods section

The methods section of your systematic review describes what you did, how you did it, and why. Readers need this information to interpret the results and conclusions of the review. Often, a lot of information needs to be distilled into just a few paragraphs. This can be a challenging task, but good preparation and the right tools will help you to set off in the right direction 🗺️🧭.

Systematic reviews are so-called because they are conducted in a way that is rigorous and replicable. So it’s important that these methods are reported in a way that is thorough, clear, and easy to navigate for the reader – whether that’s a patient, a healthcare worker, or a researcher. 

Like most things in a systematic review, the methods should be planned upfront and ideally described in detail in a project plan or protocol. Reviews of healthcare interventions follow the PRISMA guidelines for the minimum set of items to report in the methods section. But what else should be included? It’s a good idea to consider what readers will want to know about the review methods and whether the journal you’re planning to submit the work to has expectations on the reporting of methods. Finding out in advance will help you to plan what to include.


Describe what happened

While the research plan sets out what you intend to do, the methods section is a write-up of what actually happened. It’s not a simple case of rewriting the plan in the past tense – you will also need to discuss and justify deviations from the plan and describe the handling of issues that were unforeseen at the time the plan was written. For this reason, it is useful to make detailed notes before, during, and after the review is completed. Relying on memory alone risks losing valuable information, and trawling through emails when the deadline is looming can be frustrating and time-consuming!

Keep it brief

The methods section should be succinct but include all the noteworthy information. This can be a difficult balance to achieve. A useful strategy is to aim for a brief description that signposts the reader to a separate section or sections of supporting information. This could include datasets, a flowchart to show what happened to the excluded studies, a collection of search strategies, and tables containing detailed information about the studies. This separation keeps the review short and simple while enabling the reader to drill down to the detail as needed. And if the methods follow a well-known or standard process, it might suffice to say so and give a reference, rather than describe the process at length.

Follow a structure

A clear structure provides focus. Use of descriptive headings keeps the writing on track and helps the reader get to key information quickly. What should the structure of the methods section look like? As always, a lot depends on the type of review but it will certainly contain information relating to the following areas:

  • Selection criteria ⭕
  • Search 🕵🏾‍♀️
  • Data collection and analysis 👩‍💻
  • Study quality and risk of bias ⚖️

Let’s look at each of these in turn.

1. Selection criteria ⭕

The criteria for including and excluding studies are listed here. This includes detail about the types of studies, the types of participants, the types of interventions, and the types of outcomes and how they were measured.

2. Search 🕵🏾‍♀️

Comprehensive reporting of the search is important because this means it can be evaluated and replicated. The search strategies are included in the review, along with details of the databases searched. It’s also important to list any restrictions on the search (for example, language), describe how resources other than electronic databases were searched (for example, non-indexed journals), and give the date that the searches were run. The PRISMA-S extension provides guidance on reporting literature searches.


Systematic reviewer pro-tip: Copy and paste the search strategy to avoid introducing typos.

3. Data collection and analysis 👩‍💻

This section describes:

  • how studies were selected for inclusion in the review
  • how study data were extracted from the study reports
  • how study data were combined for analysis and synthesis

To describe how studies were selected for inclusion , review teams outline the screening process. Covidence uses reviewers’ decision data to automatically populate a PRISMA flow diagram for this purpose. Covidence can also calculate Cohen’s kappa to enable review teams to report the level of agreement among individual reviewers during screening.
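For readers curious about what that agreement statistic involves, here is a minimal sketch of Cohen’s kappa for two screeners making include/exclude decisions. This is a generic illustration, not Covidence’s implementation, and the decision lists are invented.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)

    # Observed agreement: proportion of items where the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement if the raters decided independently.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)

    return (p_o - p_e) / (1 - p_e)

a = ["include", "exclude", "exclude", "include", "exclude"]
b = ["include", "exclude", "include", "include", "exclude"]
print(round(cohens_kappa(a, b), 2))  # 0.62: agreement beyond chance
```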

To describe how study data were extracted from the study reports , reviewers outline the form that was used, any pilot-testing that was done, and the items that were extracted from the included studies. An important piece of information to include here is the process used to resolve conflict among the reviewers. Covidence’s data extraction tool saves reviewers’ comments and notes in the system as they work. This keeps the information in one place for easy retrieval ⚡.

To describe how study data were combined for analysis and synthesis, reviewers outline the type of synthesis (narrative or quantitative, for example), the methods for grouping data, the challenges that came up, and how these were dealt with. If the review includes a meta-analysis, it will detail how this was performed and how the treatment effects were measured.

4. Study quality and risk of bias ⚖️

Because the results of systematic reviews can be affected by many types of bias, reviewers make every effort to minimise it and to show the reader that the methods they used were appropriate. This section describes the methods used to assess study quality and an assessment of the risk of bias across a range of domains. 

Steps to assess the risk of bias in studies include looking at how study participants were assigned to treatment groups and whether patients and/or study assessors were blinded to the treatment given. Reviewers also report their assessment of the risk of bias due to missing outcome data, whether that is due to participant drop-out or non-reporting of the outcomes by the study authors.

Covidence’s default template for assessing study quality is Cochrane’s risk of bias tool but it is also possible to start from scratch and build a tool with a set of custom domains if you prefer.

Careful planning, clear writing, and a structured approach are key to a good methods section. A methodologist will be able to refer review teams to examples of good methods reporting in the literature. Covidence helps reviewers to screen references, extract data and complete risk of bias tables quickly and efficiently. Sign up for a free trial today!

Laura Mellor. Portsmouth, UK





Annual Review of Psychology

Volume 70, 2019, Review Article

How to Do a Systematic Review: A Best Practice Guide for Conducting and Reporting Narrative Reviews, Meta-Analyses, and Meta-Syntheses

  • Andy P. Siddaway 1, Alex M. Wood 2, and Larry V. Hedges 3
  • Affiliations: 1 Behavioural Science Centre, Stirling Management School, University of Stirling, Stirling FK9 4LA, United Kingdom; email: [email protected] 2 Department of Psychological and Behavioural Science, London School of Economics and Political Science, London WC2A 2AE, United Kingdom 3 Department of Statistics, Northwestern University, Evanston, Illinois 60208, USA; email: [email protected]
  • Vol. 70:747-770 (Volume publication date January 2019) https://doi.org/10.1146/annurev-psych-010418-102803
  • First published as a Review in Advance on August 08, 2018
  • Copyright © 2019 by Annual Reviews. All rights reserved

Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information. We outline core standards and principles and describe commonly encountered problems. Although this guide targets psychological scientists, its high level of abstraction makes it potentially relevant to any subject area or discipline. We argue that systematic reviews are a key methodology for clarifying whether and how research findings replicate and for explaining possible inconsistencies, and we call for researchers to conduct systematic reviews to help elucidate whether there is a replication crisis.



Cochrane Training

Chapter 1: Starting a review

Toby J Lasserson, James Thomas, Julian PT Higgins

Key Points:

  • Systematic reviews address a need for health decision makers to be able to access high quality, relevant, accessible and up-to-date information.
  • Systematic reviews aim to minimize bias through the use of pre-specified research questions and methods that are documented in protocols, and by basing their findings on reliable research.
  • Systematic reviews should be conducted by a team that includes domain expertise and methodological expertise, who are free of potential conflicts of interest.
  • People who might make – or be affected by – decisions around the use of interventions should be involved in important decisions about the review.
  • Good data management, project management and quality assurance mechanisms are essential for the completion of a successful systematic review.

Cite this chapter as: Lasserson TJ, Thomas J, Higgins JPT. Chapter 1: Starting a review [last updated August 2021]. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook . 

1.1 Why do a systematic review?

Systematic reviews were developed out of a need to ensure that decisions affecting people’s lives can be informed by an up-to-date and complete understanding of the relevant research evidence. With the volume of research literature growing at an ever-increasing rate, it is impossible for individual decision makers to assess this vast quantity of primary research to enable them to make the most appropriate healthcare decisions that do more good than harm. By systematically assessing this primary research, systematic reviews aim to provide an up-to-date summary of the state of research knowledge on an intervention, diagnostic test, prognostic factor or other health or healthcare topic. Systematic reviews address the main problem with ad hoc searching and selection of research, namely that of bias. Just as primary research studies use methods to avoid bias, so should summaries and syntheses of that research.

A systematic review attempts to collate all the empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made (Antman et al 1992, Oxman and Guyatt 1993). Systematic review methodology, pioneered and developed by Cochrane, sets out a highly structured, transparent and reproducible methodology (Chandler and Hopewell 2013). This involves: the a priori specification of a research question; clarity on the scope of the review and which studies are eligible for inclusion; making every effort to find all relevant research and to ensure that issues of bias in included studies are accounted for; and analysing the included studies in order to draw conclusions based on all the identified research in an impartial and objective way.

This Handbook is about systematic reviews on the effects of interventions, and specifically about methods used by Cochrane to undertake them. Cochrane Reviews use primary research to generate new knowledge about the effects of an intervention (or interventions) used in clinical, public health or policy settings. They aim to provide users with a balanced summary of the potential benefits and harms of interventions and give an indication of how certain they can be of the findings. They can also compare the effectiveness of different interventions with one another and so help users to choose the most appropriate intervention in particular situations. The primary purpose of Cochrane Reviews is therefore to inform people making decisions about health or health care.

Systematic reviews are important for other reasons. New research should be designed or commissioned only if it does not unnecessarily duplicate existing research (Chalmers et al 2014). Therefore, a systematic review should typically be undertaken before embarking on new primary research. Such a review will identify current and ongoing studies, as well as indicate where specific gaps in knowledge exist, or evidence is lacking; for example, where existing studies have not used outcomes that are important to users of research (Macleod et al 2014). A systematic review may also reveal limitations in the conduct of previous studies that might be addressed in the new study or studies.

Systematic reviews are important, often rewarding and, at times, exciting research projects. They offer the opportunity for authors to make authoritative statements about the extent of human knowledge in important areas and to identify priorities for further research. They sometimes cover issues high on the political agenda and receive attention from the media. Conducting research with these impacts is not without its challenges, however, and completing a high-quality systematic review is often demanding and time-consuming. In this chapter we introduce some of the key considerations for potential review authors who are about to start a systematic review.

1.2 What is the review question?

Getting the research question right is critical for the success of a systematic review. Review authors should ensure that the review addresses an important question to those who are expected to use and act upon its conclusions.

We discuss the formulation of questions in detail in Chapter 2 . For a question about the effects of an intervention, the PICO approach is usually used, which is an acronym for Population, Intervention, Comparison(s) and Outcome. Reviews may have additional questions, for example about how interventions were implemented, economic issues, equity issues or patient experience.
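For illustration only, a PICO question can first be drafted as a structured record before it is written out in prose. The sketch below uses Python; the topic and all field values are invented, and the dictionary format is simply a convenient way to keep the four elements explicit, not a prescribed tool.

```python
# A hypothetical PICO question held as a structured record;
# the topic and values are invented for illustration only.
pico = {
    "Population":   "adults with chronic low back pain",
    "Intervention": "supervised exercise therapy",
    "Comparison":   "usual care",
    "Outcome":      "pain intensity at 12 weeks",
}

# Render the four elements as a single answerable question.
question = ("In {Population}, what is the effect of {Intervention} "
            "compared with {Comparison} on {Outcome}?").format(**pico)
print(question)
```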

To ensure that the review addresses a relevant question in a way that benefits users, it is important to ensure wide input. In most cases, question formulation should therefore be informed by people with various relevant – but potentially different – perspectives (see Chapter 2, Section 2.4 ).

1.3 Who should do a systematic review?

Systematic reviews should be undertaken by a team. Indeed, Cochrane will not publish a review that is proposed to be undertaken by a single person. Working as a team not only spreads the effort, but ensures that tasks such as the selection of studies for eligibility, data extraction and rating the certainty of the evidence will be performed by at least two people independently, minimizing the likelihood of errors. First-time review authors are encouraged to work with others who are experienced in the process of systematic reviews and to attend relevant training.

Review teams must include expertise in the topic area under review. Topic expertise should not be overly narrow, to ensure that all relevant perspectives are considered. Perspectives from different disciplines can help to avoid assumptions or terminology stemming from an over-reliance on a single discipline. Review teams should also include expertise in systematic review methodology, including statistical expertise.

Arguments have been made that methodological expertise is sufficient to perform a review, and that content expertise should be avoided because of the risk of preconceptions about the effects of interventions (Gøtzsche and Ioannidis 2012). However, it is important that both topic and methodological expertise is present to ensure a good mix of skills, knowledge and objectivity, because topic expertise provides important insight into the implementation of the intervention(s), the nature of the condition being treated or prevented, the relationships between outcomes measured, and other factors that may have an impact on decision making.

A Cochrane Review should represent an independent assessment of the evidence, and avoiding financial and non-financial conflicts of interest often requires careful management. It is important to consider whether there are any relevant interests that may constitute a conflict of interest. In some situations, employment, the holding of patents or other financial support should prevent people from joining an author team. Funding of Cochrane Reviews by commercial organizations with an interest in the outcome of the review is not permitted. To ensure that any issues are identified early in the process, authors planning Cochrane Reviews should consult the Conflict of Interest Policy. Authors should make complete declarations of interest before registration of the review, refresh these annually thereafter until publication, and update them just prior to publication of the protocol and of the review. For review updates, this must be done at the time of the decision to update the review, annually thereafter until publication, and just prior to publication. Authors should also update their declarations at any point when their circumstances change.

1.3.1 Involving consumers and other stakeholders

Because the priorities of decision makers and consumers may be different from those of researchers, it is important that review authors consider carefully what questions are important to these different stakeholders. Systematic reviews are more likely to be relevant to a broad range of end users if they are informed by the involvement of people with a range of experiences, in terms of both the topic and the methodology (Thomas et al 2004, Rees and Oliver 2017). Engaging consumers and other stakeholders, such as policy makers, research funders and healthcare professionals, increases relevance, promotes mutual learning, improves uptake and decreases research waste.

Mapping out all potential stakeholders specific to the review question is a helpful first step to considering who might be invited to be involved in a review. Stakeholders typically include: patients and consumers; consumer advocates; policy makers and other public officials; guideline developers; professional organizations; researchers; funders of health services and research; healthcare practitioners, and, on occasion, journalists and other media professionals. Balancing seniority, credibility within the given field, and diversity should be considered. Review authors should also take account of the needs of resource-poor countries and regions in the review process (see Chapter 16 ) and invite appropriate input on the scope of the review and the questions it will address.

It is established good practice to ensure that consumers are involved and engaged in health research, including systematic reviews. Cochrane uses the term ‘consumers’ to refer to a wide range of people, including patients or people with personal experience of a healthcare condition, carers and family members, representatives of patients and carers, service users and members of the public. In 2017, a Statement of Principles for consumer involvement in Cochrane was agreed. This seeks to change the culture of research practice to one where both consumers and other stakeholders are joint partners in research from planning, conduct, and reporting to dissemination. Systematic reviews that have had consumer involvement should be more directly applicable to decision makers than those that have not (see online Chapter II ).

1.3.2 Working with consumers and other stakeholders

Methods for working with consumers and other stakeholders include surveys, workshops, focus groups and involvement in advisory groups. Decisions about which methods to use will typically be based on resource availability, but review teams should be aware of the merits and limitations of such methods. Authors will need to decide whom to involve and how to provide adequate support for their involvement. This can include financial reimbursement, the provision of training, and clearly stating expectations of involvement, possibly in the form of terms of reference.

While a small number of consumers or other stakeholders may be part of the review team and become co-authors of the subsequent review, it is sometimes important to bring in a wider range of perspectives and to recognize that not everyone has the capacity or interest in becoming an author. Advisory groups offer a convenient approach to involving consumers and other relevant stakeholders, especially for topics in which opinions differ. Important points to ensure successful involvement include the following.

  • The review team should co-ordinate the input of the advisory group to inform key review decisions.
  • The advisory group’s input should continue throughout the systematic review process to ensure relevance of the review to end users is maintained.
  • Advisory group membership should reflect the breadth of the review question, and consideration should be given to involving vulnerable and marginalized people (Steel 2004) to ensure that conclusions on the value of the interventions are well-informed and applicable to all groups in society (see Chapter 16 ).

Templates such as terms of reference, job descriptions, or person specifications for an advisory group help to ensure clarity about the task(s) required and are available from INVOLVE. The website also gives further information on setting up and organizing advisory groups. See also the Cochrane training website for further resources to support consumer involvement.

1.4 The importance of reliability

Systematic reviews aim to be an accurate representation of the current state of knowledge about a given issue. As understanding improves, the review can be updated. Nevertheless, it is important that the review itself is accurate at the time of publication. There are two main reasons for this imperative for accuracy. First, health decisions that affect people’s lives are increasingly taken based on systematic review findings. Current knowledge may be imperfect, but decisions will be better informed when taken in the light of the best of current knowledge. Second, systematic reviews form a critical component of legal and regulatory frameworks; for example, drug licensing or insurance coverage. Here, systematic reviews also need to hold up as auditable processes for legal examination. Because systematic reviews need both to be correct and to be seen to be correct, detailed evidence-based methods have been developed to guide review authors on the most appropriate procedures to follow and on what information to include in their reports to aid auditability.

1.4.1 Expectations for the conduct and reporting of Cochrane Reviews

Cochrane has developed methodological expectations for the conduct, reporting and updating of systematic reviews of interventions (MECIR) and their plain language summaries (Plain Language Expectations for Authors of Cochrane Summaries; PLEACS). Developed collaboratively by methodologists and Cochrane editors, they are intended to describe the desirable attributes of a Cochrane Review. The expectations are not all relevant at the same stage of review conduct, so care should be taken to identify those that apply at specific points during the planning, conduct, reporting and updating of the review.

Each expectation has a title, a rationale and an elaboration. For the purposes of publication of a review with Cochrane, each has the status of either ‘mandatory’ or ‘highly desirable’. Items described as mandatory are expected to be applied, and if they are not then an appropriate justification should be provided; failure to implement such items may be used as a basis for deciding not to publish a review in the Cochrane Database of Systematic Reviews (CDSR). Items described as highly desirable should generally be implemented, but there are reasonable exceptions and justifications are not required.

All MECIR expectations for the conduct of a review are presented in the relevant chapters of this Handbook . Expectations for reporting of completed reviews (including PLEACS) are described in online Chapter III . The recommendations provided in the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Statement have been incorporated into the Cochrane reporting expectations, ensuring compliance with the PRISMA recommendations and summarizing attributes of reporting that should allow a full assessment of the methods and findings of the review (Moher et al 2009).

1.5 Protocol development

Preparing a systematic review is complex and involves many judgements. To minimize the potential for bias in the review process, these judgements should be made as far as possible in ways that do not depend on the findings of the studies included in the review. Review authors’ prior knowledge of the evidence may, for example, influence the definition of a systematic review question, the choice of criteria for study eligibility, or the pre-specification of intervention comparisons and outcomes to analyse. It is important that the methods to be used should be established and documented in advance (see MECIR Box 1.5.a , MECIR Box 1.5.b and MECIR Box 1.5.c ).

Publication of a protocol for a review that is written without knowledge of the available studies reduces the impact of review authors’ biases, promotes transparency of methods and processes, reduces the potential for duplication, allows peer review of the planned methods before they have been completed, and offers an opportunity for the review team to plan resources and logistics for undertaking the review itself. All chapters in the Handbook should be consulted when drafting the protocol. Since systematic reviews are by their nature retrospective, an element of knowledge of the evidence is often inevitable. This is one reason why non-content experts such as methodologists should be part of the review team (see Section 1.3 ). Two exceptions to the retrospective nature of a systematic review are a meta-analysis of a prospectively planned series of trials and some living systematic reviews, as described in Chapter 22 .

The review question should determine the methods used in the review, and not vice versa. The question may concern a relatively straightforward comparison of one treatment with another; or it may necessitate plans to compare different treatments as part of a network meta-analysis, or assess differential effects of an intervention in different populations or delivered in different ways.

The protocol sets out the context in which the review is being conducted. It presents an opportunity to develop ideas that are foundational for the review. This concerns, most explicitly, the definition of the eligibility criteria such as the study participants and the choice of comparators and outcomes. The eligibility criteria may also be defined following the development of a logic model (or an articulation of the aspects of an extant logic model that the review is addressing) to explain how the intervention might work (see Chapter 2, Section 2.5.1).

MECIR Box 1.5.a Relevant expectations for conduct of intervention reviews

Planning the search

Searches should be motivated directly by the eligibility criteria for the review, and it is important that all types of eligible studies are considered when planning the search. If searches are restricted by publication status or by language of publication, there is a possibility of publication bias, or language bias (whereby the language of publication is selected in a way that depends on the findings of the study), or both. Removing language restrictions in English language databases is not a good substitute for searching non-English language journals and databases.
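As a concrete but deliberately simplified illustration of pre-specifying a search, the sketch below assembles a Boolean query from concept blocks. All terms are hypothetical; real strategies are far longer, are adapted to each database's own syntax and index terms, and are best drafted with an information specialist.

```python
# A hypothetical Boolean search assembled from concept blocks; real
# strategies include many more synonyms, index terms and databases.
population   = '("low back pain" OR backache)'
intervention = '(exercise OR "physical therapy" OR physiotherapy)'
study_filter = '(randomized OR randomised OR "controlled trial")'

search_string = " AND ".join([population, intervention, study_filter])
print(search_string)  # paste into the database's query interface
```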

A key purpose of the protocol is to make plans to minimize bias in the eventual findings of the review. Reliable synthesis of available evidence requires a planned, systematic approach. Threats to the validity of systematic reviews can come from the studies they include or the process by which reviews are conducted. Biases within the studies can arise from the method by which participants are allocated to the intervention groups, awareness of intervention group assignment, and the collection, analysis and reporting of data. Methods for examining these issues should be specified in the protocol. Review processes can generate bias through a failure to identify an unbiased (and preferably complete) set of studies, and poor quality assurance throughout the review. The availability of research may be influenced by the nature of the results (i.e. reporting bias). To reduce the impact of this form of bias, searching may need to include unpublished sources of evidence (Dwan et al 2013) ( MECIR Box 1.5.b ).

MECIR Box 1.5.b Relevant expectations for the conduct of intervention reviews

Planning the assessment of risk of bias in included studies

Predefining the methods and criteria for assessing risk of bias is important since analysis or interpretation of the review findings may be affected by the judgements made during this process. For randomized trials, use of the Cochrane risk-of-bias tool is Mandatory, so it is sufficient (and easiest) simply to refer to the tool’s definitions of low risk, unclear risk and high risk of bias.

Developing a protocol for a systematic review has benefits beyond reducing bias. Investing effort in designing a systematic review will make the process more manageable and help to inform key priorities for the review. Defining the question, referring to it throughout, and using appropriate methods to address the question focuses the analysis and reporting, ensuring the review is most likely to inform treatment decisions for funders, policy makers, healthcare professionals and consumers. Details of the planned analyses, including investigations of variability across studies, should be specified in the protocol, along with methods for interpreting the results through the systematic consideration of factors that affect confidence in estimates of intervention effect ( MECIR Box 1.5.c ).

MECIR Box 1.5.c Relevant expectations for conduct of intervention reviews

Planning the synthesis of results

Predefining the synthesis methods, particularly the statistical methods, is important, since analysis or interpretation of the review findings may be affected by the judgements made during this process.

Planning subgroup analyses

Pre-specification reduces the risk that large numbers of undirected subgroup analyses will lead to spurious explanations of heterogeneity.

Planning the GRADE assessment and ‘Summary of findings’ table

Methods for assessing the certainty of evidence for the most important outcomes in the review need to be pre-specified. In ‘Summary of findings’ tables the most important feature is to predefine the choice of outcomes in order to guard against selective presentation of results in the review. The table should include the essential outcomes for decision making (typically up to seven), which generally should not include surrogate or interim outcomes. The choice of outcomes should not be based on any anticipated or observed magnitude of effect, or because they are likely to have been addressed in the studies to be reviewed.

While the intention should be that a review will adhere to the published protocol, changes in a review protocol are sometimes necessary. This is also the case for a protocol for a randomized trial, which must sometimes be changed to adapt to unanticipated circumstances such as problems with participant recruitment, data collection or event rates. While every effort should be made to adhere to a predetermined protocol, this is not always possible or appropriate. It is important, however, that changes in the protocol should not be made based on how they affect the outcome of the research study, whether it is a randomized trial or a systematic review. Post hoc decisions made when the impact on the results of the research is known, such as excluding selected studies from a systematic review, or changing the statistical analysis, are highly susceptible to bias and should therefore be avoided unless there are reasonable grounds for doing this.

Enabling access to a protocol through publication (all Cochrane Protocols are published in the CDSR) and registration on the PROSPERO register of systematic reviews reduces duplication of effort and research waste, and promotes accountability. Changes to the methods outlined in the protocol should be transparently declared.

This Handbook provides details of the systematic review methods developed or selected by Cochrane. They are intended to address the need for rigour, comprehensiveness and transparency in preparing a Cochrane systematic review. All relevant chapters – including those describing procedures to be followed in the later stages of the review – should be consulted during the preparation of the protocol. A more specific description of the structure of Cochrane Protocols is provided in online Chapter II.

1.6 Data management and quality assurance

Systematic reviews should be replicable, and retaining a record of the inclusion decisions, data collection, transformations or adjustment of data will help to establish a secure and retrievable audit trail. They can be operationally complex projects, often involving large research teams operating in different sites across the world. Good data management processes are essential to ensure that data are not inadvertently lost, facilitating the identification and correction of errors and supporting future efforts to update and maintain the review. Transparent reporting of review decisions enables readers to assess the reliability of the review for themselves.

Review management software, such as Covidence and EPPI-Reviewer , can be used to assist data management and maintain consistent and standardized records of decisions made throughout the review. These tools offer a central repository for review data that can be accessed remotely throughout the world by members of the review team. They record independent assessment of studies for inclusion, risk of bias and extraction of data, enabling checks to be made later in the process if needed. Research has shown that even experienced reviewers make mistakes and disagree with one another on risk-of-bias assessments, so it is particularly important to maintain quality assurance here, despite its cost in terms of author time. As more sophisticated information technology tools begin to be deployed in reviews (see Chapter 4, Section 4.6.6.2 and Chapter 22, Section 22.2.4 ), it is increasingly apparent that all review data – including the initial decisions about study eligibility – have value beyond the scope of the individual review. For example, review updates can be made more efficient through (semi-) automation when data from the original review are available for machine learning.
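To make the screening-automation idea concrete, here is a minimal sketch of one common approach: ranking unscreened records by predicted relevance using decisions already made by the review team. It assumes scikit-learn is available; the abstracts and labels are invented, and a tool like this is meant to prioritize human effort, not to replace independent dual screening.

```python
# A minimal sketch of semi-automated title/abstract screening, assuming
# a labelled sample of screening decisions; all data are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Abstracts already screened by two reviewers (1 = include, 0 = exclude).
screened_abstracts = [
    "Randomized trial of exercise therapy for chronic low back pain",
    "Case report: rare adverse reaction to a common analgesic",
    "RCT of supervised physiotherapy versus usual care for back pain",
    "Editorial on healthcare funding policy",
]
labels = [1, 0, 1, 0]

# TF-IDF features plus logistic regression: a transparent baseline model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(screened_abstracts, labels)

# Rank unscreened records so likely-relevant ones are reviewed first.
unscreened = ["Exercise programmes and pain intensity in low back pain: an RCT"]
print(model.predict_proba(unscreened)[:, 1])  # predicted relevance scores
```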

1.7 Chapter information

Authors: Toby J Lasserson, James Thomas, Julian PT Higgins

Acknowledgements: This chapter builds on earlier versions of the Handbook . We would like to thank Ruth Foxlee, Richard Morley, Soumyadeep Bhaumik, Mona Nasser, Dan Fox and Sally Crowe for their contributions to Section 1.3 .

Funding: JT is supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care North Thames at Barts Health NHS Trust. JPTH is a member of the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

1.8 References

Antman E, Lau J, Kupelnick B, Mosteller F, Chalmers T. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts: treatment for myocardial infarction. JAMA 1992; 268 : 240–248.

Chalmers I, Bracken MB, Djulbegovic B, Garattini S, Grant J, Gulmezoglu AM, Howells DW, Ioannidis JP, Oliver S. How to increase value and reduce waste when research priorities are set. Lancet 2014; 383 : 156–165.

Chandler J, Hopewell S. Cochrane methods – twenty years experience in developing systematic review methods. Systematic Reviews 2013; 2 : 76.

Dwan K, Gamble C, Williamson PR, Kirkham JJ, Reporting Bias Group. Systematic review of the empirical evidence of study publication bias and outcome reporting bias: an updated review. PloS One 2013; 8 : e66844.

Gøtzsche PC, Ioannidis JPA. Content area experts as authors: helpful or harmful for systematic reviews and meta-analyses? BMJ 2012; 345.

Macleod MR, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis JP, Al-Shahi Salman R, Chan AW, Glasziou P. Biomedical research: increasing value, reducing waste. Lancet 2014; 383 : 101–104.

Moher D, Liberati A, Tetzlaff J, Altman D, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Medicine 2009; 6 : e1000097.

Oxman A, Guyatt G. The science of reviewing research. Annals of the New York Academy of Sciences 1993; 703 : 125–133.

Rees R, Oliver S. Stakeholder perspectives and participation in reviews. In: Gough D, Oliver S, Thomas J, editors. An Introduction to Systematic Reviews . 2nd ed. London: Sage; 2017. p. 17–34.

Steel R. Involving marginalised and vulnerable people in research: a consultation document (2nd revision). INVOLVE; 2004.

Thomas J, Harden A, Oakley A, Oliver S, Sutcliffe K, Rees R, Brunton G, Kavanagh J. Integrating qualitative research with trials in systematic reviews. BMJ 2004; 328 : 1010–1012.



Methodology of a systematic review

Affiliations

  • 1 Hospital Universitario La Paz, Madrid, Spain. Electronic address: [email protected].
  • 2 Hospital Universitario Fundación Alcorcón, Madrid, Spain.
  • 3 Instituto Valenciano de Oncología, Valencia, Spain.
  • 4 Hospital Universitario de Cabueñes, Gijón, Asturias, Spain.
  • 5 Hospital Universitario Ramón y Cajal, Madrid, Spain.
  • 6 Hospital Universitario Gregorio Marañón, Madrid, Spain.
  • 7 Hospital Universitario de Canarias, Tenerife, Spain.
  • 8 Hospital Clínic, Barcelona, Spain; EAU Guidelines Office Board Member.
  • PMID: 29731270
  • DOI: 10.1016/j.acuro.2018.01.010

Context: The objective of evidence-based medicine is to employ the best scientific information available to apply to clinical practice. Understanding and interpreting the scientific evidence involves understanding the available levels of evidence, where systematic reviews and meta-analyses of clinical trials are at the top of the levels-of-evidence pyramid.

Acquisition of evidence: The review process should be well developed and planned to reduce biases and eliminate irrelevant and low-quality studies. The steps for implementing a systematic review include (i) correctly formulating the clinical question to answer (PICO), (ii) developing a protocol (inclusion and exclusion criteria), (iii) performing a detailed and broad literature search and (iv) screening the abstracts of the studies identified in the search and subsequently of the selected complete texts (PRISMA).
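The record counts produced during steps (iii) and (iv) are conventionally reported in a PRISMA flow diagram. A toy tally, with invented numbers, looks like this:

```python
# A toy tally of the kind summarized in a PRISMA flow diagram (step iv);
# all numbers are invented for illustration.
records_identified = 1250
duplicates_removed = 310
records_screened = records_identified - duplicates_removed            # 940

excluded_on_title_abstract = 860
full_texts_assessed = records_screened - excluded_on_title_abstract   # 80

full_texts_excluded = 62
studies_included = full_texts_assessed - full_texts_excluded          # 18
print(f"{studies_included} studies included in the synthesis")
```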

Synthesis of the evidence: Once the studies have been selected, we need to (v) extract the necessary data into a form designed in the protocol to summarise the included studies, (vi) assess the biases of each study, identifying the quality of the available evidence, and (vii) develop tables and text that synthesise the evidence.
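A data-extraction form (step v) can be as simple as a structured record completed once per included study. The sketch below shows hypothetical fields only; real forms are piloted, much fuller, and completed independently by two reviewers.

```python
# A hypothetical per-study extraction record; all fields are illustrative.
extraction_record = {
    "study_id": "Author 2020",     # first author and year
    "design": "randomized trial",
    "n_participants": 120,
    "population": "…",             # completed during extraction
    "intervention": "…",
    "comparator": "…",
    "outcomes": ["…"],
    "effect_estimates": {},        # filled from reported results
    "risk_of_bias": None,          # judged with a pre-specified tool (step vi)
}
```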

Conclusions: A systematic review involves a critical and reproducible summary of the results of the available publications on a particular topic or clinical question. To improve scientific writing, the methodology for implementing a systematic review is presented here in a structured manner.

Keywords: Meta-analysis; Methodology; Systematic review.

Copyright © 2018 AEU. Published by Elsevier España, S.L.U. All rights reserved.


1.2.2 What is a systematic review?

A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made (Antman 1992, Oxman 1993). The key characteristics of a systematic review are:

  • a clearly stated set of objectives with pre-defined eligibility criteria for studies;
  • an explicit, reproducible methodology;
  • a systematic search that attempts to identify all studies that would meet the eligibility criteria;
  • an assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias; and
  • a systematic presentation, and synthesis, of the characteristics and findings of the included studies.

Many systematic reviews contain meta-analyses. Meta-analysis is the use of statistical methods to summarize the results of independent studies (Glass 1976). By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review (see Chapter 9, Section 9.1.3 ). They also facilitate investigations of the consistency of evidence across studies, and the exploration of differences across studies.
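To make the statistical idea tangible, the sketch below pools three invented effect estimates using a fixed-effect, inverse-variance meta-analysis, the simplest of the methods alluded to above; real analyses typically also consider random-effects models and assess heterogeneity.

```python
# A minimal sketch of fixed-effect, inverse-variance meta-analysis.
# Effect sizes and standard errors are invented for illustration only.
import math

effects = [0.30, 0.45, 0.20]      # e.g., log risk ratios from three trials
std_errors = [0.15, 0.20, 0.10]

weights = [1 / se**2 for se in std_errors]   # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval for the pooled effect
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Note how the pooled estimate is pulled toward the most precise study (the one with the smallest standard error), which is exactly the "more precise estimates" property described above.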

Introduction to Systematic Reviews


What is a Systematic Review?

Knowledge synthesis is a term used to describe the method of synthesizing results from individual studies and interpreting these results within the larger body of knowledge on the topic. It requires highly structured, transparent and reproducible methods using quantitative and/or qualitative evidence. Systematic reviews, meta-analyses, scoping reviews, rapid reviews, narrative syntheses, practice guidelines, among others, are all forms of knowledge syntheses. For more information on types of reviews, visit the "Types of Reviews" tab on the left.

A systematic review differs from an ordinary literature review in that it uses a comprehensive, methodical, transparent and reproducible search strategy to ensure conclusions are as unbiased and as close to the truth as possible. The Cochrane Handbook for Systematic Reviews of Interventions defines a systematic review as:

"A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a given research question. Researchers conducting systematic reviews use explicit methods aimed at minimizing bias, in order to produce more reliable findings that can be used to inform decision making [...] This involves: the a priori specification of a research question; clarity on the scope of the review and which studies are eligible for inclusion; making every effort to find all relevant research and to ensure that issues of bias in included studies are accounted for; and analysing the included studies in order to draw conclusions based on all the identified research in an impartial and objective way." ( Chapter 1: Starting a review )

See also the video “What are systematic reviews?” from Cochrane on YouTube.

  • Open access
  • Published: 08 June 2023

Guidance to best tools and practices for systematic reviews

  • Kat Kolaski 1 ,
  • Lynne Romeiser Logan 2 &
  • John P. A. Ioannidis 3  

Systematic Reviews volume 12, Article number: 96 (2023)


Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly [ 3 ], the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometric increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 , 5 , 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 , 19 , 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 , 24 , 25 , 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1 ). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 , 33 , 34 , 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection, given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19 [ 2 , 42 ], but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 , 45 , 46 , 47 , 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 , 50 , 51 , 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 , 56 , 57 , 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1 ) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Fig. 1 Distinguishing types of research evidence

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of interventions (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to report their summary estimates separately [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with the developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, which complicates any attempt to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 , 85 , 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no methodological guidance for this approach is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

These tools are often invoked interchangeably when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a thoroughly reported systematic review may still be biased and flawed, an otherwise unbiased one may suffer from deficient documentation, and a review may exhibit both problems.

We direct attention to the currently recommended tools listed in Table 3.1  but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake, impact, and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine whether certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 , 100 , 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered as evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines” and “ROBIS AND clinical practice guidelines”; 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record, given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 , 104 , 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, as with reports about the actual uptake of these tools, time will tell. Additional data on user experience are also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy, and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses, and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of the PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. Moreover, PRISMA checklists evaluate how completely an element of review conduct was reported; they do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not assume that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. Training in these tools can sensitize peer reviewers and editors to major issues that may affect a review’s trustworthiness and completeness of reporting when they consider a manuscript for publication. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved in developing different types of systematic reviews; however, while inclusion of the suggested elements aligns a research question with a particular type of review and its methods, it does not necessarily make the question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope that allows them to stray from predefined choices relating to key comparisons and outcomes.

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

Systematic reviews with published prospective protocols have been reported to show better attainment of AMSTAR standards [ 134 ]. However, completeness of reporting does not seem to differ between reviews with a protocol and those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. For example, the most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires that authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized studies may not necessarily be more real-world oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses or to provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trial registries may also reveal important details about topics that would otherwise be missed [ 149 , 150 , 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in the gray literature and trial registries [ 41 , 151 , 152 , 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant even less delay: in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards for RoB assessment. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.
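To make the one-tool-per-design principle concrete, the following minimal sketch pairs the study designs named above with the RoB tools cited in this section. All names in the code are hypothetical helpers for illustration, not a validated mapping; a real review team would pre-specify and justify its choices in the protocol.

```python
# Illustrative lookup pairing study designs with the RoB tools cited in
# this section; the dictionary and function names are hypothetical.
ROB_TOOL_BY_DESIGN = {
    "RCT": "Cochrane RoB 2 [157]",
    "NRSI": "ROBINS-I [158]",
    "case report or case series": "Murad et al. framework [82]",
    "single-case experimental design": "design-specific SCED tool [155, 156]",
}

def select_rob_tool(design: str) -> str:
    """Return the pre-specified RoB tool for a study design; fail loudly
    rather than silently applying a tool built for a different design."""
    if design not in ROB_TOOL_BY_DESIGN:
        raise ValueError(f"No RoB tool pre-specified for design: {design!r}")
    return ROB_TOOL_BY_DESIGN[design]

print(select_rob_tool("NRSI"))  # ROBINS-I [158]
```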

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When applied appropriately, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about the heterogeneity of effect estimates. We refer to standard references for a thorough introduction and formal training [ 165 , 166 , 167 ].
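As a minimal illustration of the weighted-average principle, the sketch below pools hypothetical log odds ratios using fixed-effect inverse-variance weights, then repeats the analysis without the high-RoB study, as suggested in the RoB discussion above. The study names and numbers are invented; real analyses typically call for dedicated software, random-effects models, and statistical expertise.

```python
import math

# Hypothetical study data: log odds ratios (log_or) and standard errors (se).
# All values are invented for illustration; they come from no real review.
studies = [
    {"id": "Study A", "log_or": -0.35, "se": 0.12, "high_rob": False},
    {"id": "Study B", "log_or": -0.10, "se": 0.20, "high_rob": True},
    {"id": "Study C", "log_or": -0.42, "se": 0.15, "high_rob": False},
]

def pool_fixed_effect(data):
    """Fixed-effect inverse-variance pooling: a weighted average of effect
    estimates, with each study weighted by 1 / variance."""
    weights = [1 / s["se"] ** 2 for s in data]
    pooled = sum(w * s["log_or"] for w, s in zip(weights, data)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

est, ci = pool_fixed_effect(studies)
print(f"All studies: pooled log OR {est:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")

# Sensitivity analysis, as discussed under RoB assessment above:
# re-run the pooling after excluding studies at high RoB.
est_low, ci_low = pool_fixed_effect([s for s in studies if not s["high_rob"]])
print(f"Low-RoB only: pooled log OR {est_low:.3f}, "
      f"95% CI ({ci_low[0]:.3f}, {ci_low[1]:.3f})")
```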

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
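A minimal sketch of vote counting based on direction of effect follows; it formalizes the tally with a two-sided sign (binomial) test, one way to analyze direction-of-effect counts. The tallies are hypothetical, and note that this method conveys nothing about the magnitude of effects.

```python
from scipy.stats import binomtest

# Hypothetical directions of effect extracted from ten studies:
# True = observed effect favors the intervention, False = favors comparator.
directions = [True, True, True, False, True, True, False, True, True, False]

favorable, n = sum(directions), len(directions)

# Sign test: under the null hypothesis, favorable and unfavorable
# directions are equally likely (p = 0.5).
result = binomtest(favorable, n, p=0.5)
print(f"{favorable}/{n} studies favor the intervention; "
      f"two-sided p = {result.pvalue:.3f}")
```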

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Reporting an assessment of the overall certainty of evidence in a systematic review is an important new standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 , 185 , 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
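For orientation only, the sketch below reduces the GRADE rating mechanics for a single outcome to their simplest form: a starting level set by study type, rated down or up by the standard criteria. As emphasized above, actual GRADE judgments are continuous and require trained, documented consensus, so this arithmetic is purely illustrative.

```python
# A deliberately simplified sketch of GRADE rating logic for one outcome.
# Nothing here is an official algorithm; GRADE judgments cannot be
# reduced to arithmetic.
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(study_type: str, down: int, up: int = 0) -> str:
    """study_type: 'RCT' starts at high; 'NRSI' starts at low.
    down: total levels subtracted across the rating-down criteria
          (risk of bias, inconsistency, indirectness, imprecision,
          publication bias).
    up:   levels added for the rating-up criteria (large effect,
          dose-response gradient, plausible residual confounding)."""
    start = 3 if study_type == "RCT" else 1
    return LEVELS[max(0, min(3, start - down + up))]

print(grade_certainty("RCT", down=1))         # moderate (eg, imprecision)
print(grade_certainty("NRSI", down=0, up=1))  # moderate (eg, large effect)
```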

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
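Interrater reliability of categorical ratings, such as the figures cited above, is commonly quantified with chance-corrected agreement statistics such as Cohen’s kappa. As a minimal sketch with invented ratings, kappa for two raters’ certainty ratings can be computed as follows.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical certainty ratings assigned by two raters to ten outcomes;
# the values are invented for illustration.
rater_1 = ["high", "moderate", "low", "low", "moderate",
           "very low", "high", "moderate", "low", "moderate"]
rater_2 = ["high", "moderate", "moderate", "low", "moderate",
           "low", "high", "low", "low", "moderate"]

# Cohen's kappa corrects observed agreement for agreement expected by
# chance: 0 is chance-level agreement, 1 is perfect agreement.
print(f"kappa = {cohen_kappa_score(rater_1, rater_2):.2f}")
```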

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE Working Group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included on Table 6.3 , which is introduced in the next section.

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers, and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence it summarizes is sparse, weak, and/or biased [24]. Many intervention systematic reviews, including those developed by Cochrane [203] and those applying GRADE [202], ultimately find no evidence or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [183]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [130]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [230, 231], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [2], the paradigm shift to living systematic review and NMA platforms [232, 233], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [234]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined and less duplicative and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.
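As a concrete illustration of the machine learning component mentioned above, the sketch below shows the general shape of a study-design classifier used to prioritize records during screening. It is a minimal sketch in Python with invented example abstracts and a generic TF-IDF plus logistic-regression pipeline; it is not the Cochrane RCT classifier or the Evidence Pipeline itself, whose models and training data are far larger.

```python
# Minimal sketch of a screening classifier in the spirit of machine
# learning-assisted study identification (cf. the Cochrane RCT classifier).
# All data here are invented; a real system trains on many thousands of
# labelled records and calibrates its cut-offs carefully.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled abstracts: 1 = randomized trial, 0 = other design
abstracts = [
    "Patients were randomly assigned to the intervention or placebo arm.",
    "We retrospectively reviewed charts of consecutive patients.",
    "A double-blind randomised controlled trial of early mobilisation.",
    "This cross-sectional survey assessed prevalence of low back pain.",
]
labels = [1, 0, 1, 0]

# Word and bigram frequencies feed a simple linear classifier
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(abstracts, labels)

# Rank incoming records by predicted probability of being an RCT so that
# human screeners see the most likely includes first
new_records = ["Participants were allocated at random to two groups."]
print(model.predict_proba(new_records)[:, 1])
```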

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Muka T, Glisic M, Milic J, Verhoog S, Bohlius J, Bramer W, et al. A 24-step guide on how to design, conduct, and successfully publish a systematic review and meta-analysis in medical research. Eur J Epidemiol. 2020;35(1):49–60.

Thomas J, McDonald S, Noel-Storr A, Shemilt I, Elliott J, Mavergames C, et al. Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for cochrane reviews. J Clin Epidemiol. 2021;133:140–51.

Fontelo P, Liu F. A review of recent publication trends from top publishing countries. Syst Rev. 2018;7(1):147.

Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.

Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:1–7.

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358: j4008.

Goldkuhle M, Narayan VM, Weigl A, Dahm P, Skoetz N. A systematic assessment of Cochrane reviews and systematic reviews published in high-impact medical journals related to cancer. BMJ Open. 2018;8(3): e020869.

Ho RS, Wu X, Yuan J, Liu S, Lai X, Wong SY, et al. Methodological quality of meta-analyses on treatments for chronic obstructive pulmonary disease: a cross-sectional study using the AMSTAR (Assessing the Methodological Quality of Systematic Reviews) tool. NPJ Prim Care Respir Med. 2015;25:14102.

Tsoi AKN, Ho LTF, Wu IXY, Wong CHL, Ho RST, Lim JYY, et al. Methodological quality of systematic reviews on treatments for osteoporosis: a cross-sectional study. Bone. 2020;139(June): 115541.

Arienti C, Lazzarini SG, Pollock A, Negrini S. Rehabilitation interventions for improving balance following stroke: an overview of systematic reviews. PLoS ONE. 2019;14(7):1–23.

Kolaski K, Romeiser Logan L, Goss KD, Butler C. Quality appraisal of systematic reviews of interventions for children with cerebral palsy reveals critically low confidence. Dev Med Child Neurol. 2021;63(11):1316–26.

Almeida MO, Yamato TP, Parreira PCS, do Costa LOP, Kamper S, Saragiotto BT. Overall confidence in the results of systematic reviews on exercise therapy for chronic low back pain: a cross-sectional analysis using the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) 2 tool. Braz J Phys Ther. 2020;24(2):103–17.

Mayo-Wilson E, Ng SM, Chuck RS, Li T. The quality of systematic reviews about interventions for refractive error can be improved: a review of systematic reviews. BMC Ophthalmol. 2017;17(1):1–10.

Matthias K, Rissling O, Pieper D, Morche J, Nocon M, Jacobs A, et al. The methodological quality of systematic reviews on the treatment of adult major depression needs improvement according to AMSTAR 2: a cross-sectional study. Heliyon. 2020;6(9): e04776.

Riado Minguez D, Kowalski M, Vallve Odena M, Longin Pontzen D, Jelicic Kadic A, Jeric M, et al. Methodological and reporting quality of systematic reviews published in the highest ranking journals in the field of pain. Anesth Analg. 2017;125(4):1348–54.

Churuangsuk C, Kherouf M, Combet E, Lean M. Low-carbohydrate diets for overweight and obesity: a systematic review of the systematic reviews. Obes Rev. 2018;19(12):1700–18.

Storman M, Storman D, Jasinska KW, Swierz MJ, Bala MM. The quality of systematic reviews/meta-analyses published in the field of bariatrics: a cross-sectional systematic survey using AMSTAR 2 and ROBIS. Obes Rev. 2020;21(5):1–11.

Franco JVA, Arancibia M, Meza N, Madrid E, Kopitowski K. [Clinical practice guidelines: concepts, limitations and challenges]. Medwave. 2020;20(3):e7887 ([Spanish]).

Brito JP, Tsapas A, Griebeler ML, Wang Z, Prutsky GJ, Domecq JP, et al. Systematic reviews supporting practice guideline recommendations lack protection against bias. J Clin Epidemiol. 2013;66(6):633–8.

Zhou Q, Wang Z, Shi Q, Zhao S, Xun Y, Liu H, et al. Clinical epidemiology in China series. Paper 4: the reporting and methodological quality of Chinese clinical practice guidelines published between 2014 and 2018: a systematic review. J Clin Epidemiol. 2021;140:189–99.

Lunny C, Ramasubbu C, Puil L, Liu T, Gerrish S, Salzwedel DM, et al. Over half of clinical practice guidelines use non-systematic methods to inform recommendations: a methods study. PLoS ONE. 2021;16(4):1–21.

Faber T, Ravaud P, Riveros C, Perrodeau E, Dechartres A. Meta-analyses including non-randomized studies of therapeutic interventions: a methodological review. BMC Med Res Methodol. 2016;16(1):1–26.

Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485–514.

Møller MH, Ioannidis JPA, Darmon M. Are systematic reviews and meta-analyses still useful research? We are not sure. Intensive Care Med. 2018;44(4):518–20.

Moher D, Glasziou P, Chalmers I, Nasser M, Bossuyt PMM, Korevaar DA, et al. Increasing value and reducing waste in biomedical research: who’s listening? Lancet. 2016;387(10027):1573–86.

Barnard ND, Willet WC, Ding EL. The misuse of meta-analysis in nutrition research. JAMA. 2017;318(15):1435–6.

Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction - GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–94.

Page MJ, Shamseer L, Altman DG, Tetzlaff J, Sampson M, Tricco AC, et al. Epidemiology and reporting characteristics of systematic reviews of biomedical research: a cross-sectional study. PLoS Med. 2016;13(5):1–31.

World Health Organization. WHO handbook for guideline development, 2nd edn. WHO; 2014. Available from: https://www.who.int/publications/i/item/9789241548960 . Cited 2022 Jan 20

Higgins J, Lasserson T, Chandler J, Tovey D, Thomas J, Flemyng E, et al. Methodological expectations of Cochrane intervention reviews. Cochrane; 2022. Available from: https://community.cochrane.org/mecir-manual/key-points-and-introduction . Cited 2022 Jul 19

Cumpston M, Chandler J. Chapter II: Planning a Cochrane review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Jan 30

Henderson LK, Craig JC, Willis NS, Tovey D, Webster AC. How to write a cochrane systematic review. Nephrology. 2010;15(6):617–24.

Page MJ, Altman DG, Shamseer L, McKenzie JE, Ahmadzai N, Wolfe D, et al. Reproducible research practices are underused in systematic reviews of biomedical interventions. J Clin Epidemiol. 2018;94:8–18.

Lorenz RC, Matthias K, Pieper D, Wegewitz U, Morche J, Nocon M, et al. AMSTAR 2 overall confidence rating: lacking discriminating capacity or requirement of high methodological quality? J Clin Epidemiol. 2020;119:142–4.

Posadzki P, Pieper D, Bajpai R, Makaruk H, Könsgen N, Neuhaus AL, et al. Exercise/physical activity and health outcomes: an overview of Cochrane systematic reviews. BMC Public Health. 2020;20(1):1–12.

Wells G, Shea B, O’Connell D, Peterson J, Welch V, Losos M. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomized studies in meta-analyses. The Ottawa Hospital; 2009. Available from: https://www.ohri.ca/programs/clinical_epidemiology/oxford.asp . Cited 2022 Jul 19

Stang A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol. 2010;25(9):603–5.

Stang A, Jonas S, Poole C. Case study in major quotation errors: a critical commentary on the Newcastle-Ottawa scale. Eur J Epidemiol. 2018;33(11):1025–31.

Ioannidis JPA. Massive citations to misleading methods and research tools: Matthew effect, quotation error and citation copying. Eur J Epidemiol. 2018;33(11):1021–3.

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review. J Clin Epidemiol. 2022;144:22–42.

Crequit P, Boutron I, Meerpohl J, Williams H, Craig J, Ravaud P. Future of evidence ecosystem series: 2. Current opportunities and need for better tools and methods. J Clin Epidemiol. 2020;123:143–52.

Shemilt I, Noel-Storr A, Thomas J, Featherstone R, Mavergames C. Machine learning reduced workload for the cochrane COVID-19 study register: development and evaluation of the cochrane COVID-19 study classifier. Syst Rev. 2022;11(1):15.

Nguyen P-Y, Kanukula R, McKenzie J, Alqaidoom Z, Brennan SE, Haddaway N, et al. Changing patterns in reporting and sharing of review data in systematic reviews with meta-analysis of the effects of interventions: a meta-research study. medRxiv; 2022. Available from: https://doi.org/10.1101/2022.04.11.22273688v3 . Cited 2022 Nov 18

Afshari A, Møller MH. Broken science and the failure of academics—resignation or reaction? Acta Anaesthesiol Scand. 2018;62(8):1038–40.

Butler E, Granholm A, Aneman A. Trustworthy systematic reviews–can journals do more? Acta Anaesthesiol Scand. 2019;63(4):558–9.

Negrini S, Côté P, Kiekens C. Methodological quality of systematic reviews on interventions for children with cerebral palsy: the evidence pyramid paradox. Dev Med Child Neurol. 2021;63(11):1244–5.

Page MJ, Moher D. Mass production of systematic reviews and meta-analyses: an exercise in mega-silliness? Milbank Q. 2016;94(3):515–9.

Clarke M, Chalmers I. Reflections on the history of systematic reviews. BMJ Evid Based Med. 2018;23(4):121–2.

Alnemer A, Khalid M, Alhuzaim W, Alnemer A, Ahmed B, Alharbi B, et al. Are health-related tweets evidence based? Review and analysis of health-related tweets on twitter. J Med Internet Res. 2015;17(10): e246.

Haber N, Smith ER, Moscoe E, Andrews K, Audy R, Bell W, et al. Causal language and strength of inference in academic and media articles shared in social media (CLAIMS): a systematic review. PLoS ONE. 2018;13(5): e0196346.

Swetland SB, Rothrock AN, Andris H, Davis B, Nguyen L, Davis P, et al. Accuracy of health-related information regarding COVID-19 on Twitter during a global pandemic. World Med Health Policy. 2021;13(3):503–17.

Nascimento DP, Almeida MO, Scola LFC, Vanin AA, Oliveira LA, Costa LCM, et al. Letter to the editor – not even the top general medical journals are free of spin: a wake-up call based on an overview of reviews. J Clin Epidemiol. 2021;139:232–4.

Ioannidis JPA, Fanelli D, Dunne DD, Goodman SN. Meta-research: evaluation and improvement of research methods and practices. PLoS Biol. 2015;13(10):1–7.

Munn Z, Stern C, Aromataris E, Lockwood C, Jordan Z. What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences. BMC Med Res Methodol. 2018;18(1):1–9.

Pollock M, Fernandez R, Becker LA, Pieper D, Hartling L. Chapter V: overviews of reviews. Cochrane handbook for systematic reviews of interventions. In:  Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-v . Cited 2022 Mar 7

Tricco AC, Lillie E, Zarin W, O’Brien K, Colquhoun H, Kastner M, et al. A scoping review on the conduct and reporting of scoping reviews. BMC Med Res Methodol. 2016;16(1):1–10.

Garritty C, Gartlehner G, Nussbaumer-Streit B, King VJ, Hamel C, Kamel C, et al. Cochrane rapid reviews methods group offers evidence-informed guidance to conduct rapid reviews. J Clin Epidemiol. 2021;130:13–22.

Elliott JH, Synnot A, Turner T, Simmonds M, Akl EA, McDonald S, et al. Living systematic review: 1. Introduction—the why, what, when, and how. J Clin Epidemiol. 2017;91:23–30.

Higgins JPT, Thomas J, Chandler J. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Jan 25

Aromataris E, Munn Z. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jan 15]. Available from: https://synthesismanual.jbi.global .

Tufanaru C, Munn Z, Aromataris E, Campbell J, Hopp L. Chapter 3: Systematic reviews of effectiveness. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jan 25]. Available from: https://synthesismanual.jbi.global .

Leeflang MMG, Davenport C, Bossuyt PM. Defining the review question. In: Deeks JJ, Bossuyt PM, Leeflang MMG, Takwoingi Y, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy [internet]. Cochrane; 2022 [cited 2022 Mar 30]. Available from: https://training.cochrane.org/6-defining-review-question .

Noyes J, Booth A, Cargo M, Flemming K, Harden A, Harris J, et al. Qualitative evidence. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions [internet]. Cochrane; 2022 [cited 2022 Mar 30]. Available from: https://training.cochrane.org/handbook/current/chapter-21#section-21-5 .

Lockwood C, Porritt K, Munn Z, Rittenmeyer L, Salmond S, Bjerrum M, et al. Chapter 2: Systematic reviews of qualitative evidence. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Jul 11]. Available from: https://synthesismanual.jbi.global .

Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460.

Moola S, Munn Z, Tufanaru C, Aromataris E, Sears K, Sfetcu R, et al. Systematic reviews of etiology and risk. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Mar 30]. Available from: https://synthesismanual.jbi.global/ .

Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.

Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147–57.

Munn Z, Moola S, Lisy K, Riitano D, Tufanaru C. Chapter 5: Systematic reviews of prevalence and incidence. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis [internet]. JBI; 2020 [cited 2022 Mar 30]. Available from: https://synthesismanual.jbi.global/ .

Centre for Evidence-Based Medicine. Study designs. CEBM; 2016. Available from: https://www.cebm.ox.ac.uk/resources/ebm-tools/study-designs . Cited 2022 Aug 30

Hartling L, Bond K, Santaguida PL, Viswanathan M, Dryden DM. Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy. J Clin Epidemiol. 2011;64(8):861–71.

Crowe M, Sheppard L, Campbell A. Reliability analysis for a proposed critical appraisal tool demonstrated value for diverse research designs. J Clin Epidemiol. 2012;65(4):375–83.

Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series—paper 5: a checklist for classifying studies evaluating the effects on health interventions—a taxonomy without labels. J Clin Epidemiol. 2017;89:30–42.

Reeves BC, Deeks JJ, Higgins JPT, Shea B, Tugwell P, Wells GA. Chapter 24: including non-randomized studies on intervention effects.  In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-24 . Cited 2022 Mar 1

Reeves B. A framework for classifying study designs to evaluate health care interventions. Forsch Komplementarmed Kl Naturheilkd. 2004;11(Suppl 1):13–7.

Rockers PC, Røttingen J, Shemilt I. Inclusion of quasi-experimental studies in systematic reviews of health systems research. Health Policy. 2015;119(4):511–21.

Mathes T, Pieper D. Clarifying the distinction between case series and cohort studies in systematic reviews of comparative studies: potential impact on body of evidence and workload. BMC Med Res Methodol. 2017;17(1):8–13.

Jhangiani R, Cuttler C, Leighton D. Single subject research. In: Jhangiani R, Cuttler C, Leighton D, editors. Research methods in psychology, 4th edn. Pressbooks KPU; 2019. Available from: https://kpu.pressbooks.pub/psychmethods4e/part/single-subject-research/ . Cited 2022 Aug 15

Higgins JP, Ramsay C, Reeves BC, Deeks JJ, Shea B, Valentine JC, et al. Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4(1):12–25.

Cumpston M, Lasserson T, Chandler J, Page M. 3.4.1 Criteria for considering studies for this review, Chapter III: Reporting the review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-iii#section-iii-3-4-1 . Cited 2022 Oct 12

Kooistra B, Dijkman B, Einhorn TA, Bhandari M. How to design a good case series. J Bone Jt Surg. 2009;91(Suppl 3):21–6.

Murad MH, Sultan S, Haffar S, Bazerbachi F. Methodological quality and synthesis of case series and case reports. Evid Based Med. 2018;23(2):60–3.

Robinson K, Chou R, Berkman N, Newberry S, Fu R, Hartling L, et al. Methods guide for comparative effectiveness reviews integrating bodies of evidence: existing systematic reviews and primary studies. AHRQ; 2015. Available from: https://archive.org/details/integrating-evidence-report-150226 . Cited 2022 Aug 7

Tugwell P, Welch VA, Karunananthan S, Maxwell LJ, Akl EA, Avey MT, et al. When to replicate systematic reviews of interventions: consensus checklist. BMJ. 2020;370: m2864.

Tsertsvadze A, Maglione M, Chou R, Garritty C, Coleman C, Lux L, et al. Updating comparative effectiveness reviews: current efforts in AHRQ’s effective health care program. J Clin Epidemiol. 2011;64(11):1208–15.

Cumpston M, Chandler J. Chapter IV: Updating a review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Aug 2

Pollock M, Fernandes RM, Newton AS, Scott SD, Hartling L. A decision tool to help researchers make decisions about including systematic reviews in overviews of reviews of healthcare interventions. Syst Rev. 2019;8(1):1–8.

Pussegoda K, Turner L, Garritty C, Mayhew A, Skidmore B, Stevens A, et al. Identifying approaches for assessing methodological and reporting quality of systematic reviews: a descriptive study. Syst Rev. 2017;6(1):1–12.

Bhaumik S. Use of evidence for clinical practice guideline development. Trop Parasitol. 2017;7(2):65–71.

Moher D, Eastwood S, Olkin I, Rennie D, Stroup D. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Lancet. 1999;354:1896–900.

Stroup D, Berlin J, Morton S, Olkin I, Williamson G, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA. 2000;283(15):2008–12.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. 2009;62(10):1006–12.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372: n71.

Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44(11):1271–8.

Centre for Evidence-Based Medicine. Critical appraisal tools. CEBM; 2015. Available from: https://www.cebm.ox.ac.uk/resources/ebm-tools/critical-appraisal-tools . Cited 2022 Apr 10

Page MJ, McKenzie JE, Higgins JPT. Tools for assessing risk of reporting biases in studies and syntheses of studies: a systematic review. BMJ Open. 2018;8(3):1–16.

Ma LL, Wang YY, Yang ZH, Huang D, Weng H, Zeng XT. Methodological quality (risk of bias) assessment tools for primary and secondary medical studies: what are they and which is better? Mil Med Res. 2020;7(1):1–11.

Banzi R, Cinquini M, Gonzalez-Lorenzo M, Pecoraro V, Capobussi M, Minozzi S. Quality assessment versus risk of bias in systematic reviews: AMSTAR and ROBIS had similar reliability but differed in their construct and applicability. J Clin Epidemiol. 2018;99:24–32.

Swierz MJ, Storman D, Zajac J, Koperny M, Weglarz P, Staskiewicz W, et al. Similarities, reliability and gaps in assessing the quality of conduct of systematic reviews using AMSTAR-2 and ROBIS: systematic survey of nutrition reviews. BMC Med Res Methodol. 2021;21(1):1–10.

Pieper D, Puljak L, González-Lorenzo M, Minozzi S. Minor differences were found between AMSTAR 2 and ROBIS in the assessment of systematic reviews including both randomized and nonrandomized studies. J Clin Epidemiol. 2019;108:26–33.

Lorenz RC, Matthias K, Pieper D, Wegewitz U, Morche J, Nocon M, et al. A psychometric study found AMSTAR 2 to be a valid and moderately reliable appraisal tool. J Clin Epidemiol. 2019;114:133–40.

Leclercq V, Hiligsmann M, Parisi G, Beaudart C, Tirelli E, Bruyère O. Best-worst scaling identified adequate statistical methods and literature search as the most important items of AMSTAR2 (A measurement tool to assess systematic reviews). J Clin Epidemiol. 2020;128:74–82.

Bühn S, Mathes T, Prengel P, Wegewitz U, Ostermann T, Robens S, et al. The risk of bias in systematic reviews tool showed fair reliability and good construct validity. J Clin Epidemiol. 2017;91:121–8.

Gates M, Gates A, Duarte G, Cary M, Becker M, Prediger B, et al. Quality and risk of bias appraisals of systematic reviews are inconsistent across reviewers and centers. J Clin Epidemiol. 2020;125:9–15.

Perry R, Whitmarsh A, Leach V, Davies P. A comparison of two assessment tools used in overviews of systematic reviews: ROBIS versus AMSTAR-2. Syst Rev. 2021;10(1):273.

Gates M, Gates A, Guitard S, Pollock M, Hartling L. Guidance for overviews of reviews continues to accumulate, but important challenges remain: a scoping review. Syst Rev. 2020;9(1):1–19.

Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Chapter 10: umbrella reviews. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020. Available from: https://synthesismanual.jbi.global . Cited 2022 Jul 11

Pieper D, Lorenz RC, Rombey T, Jacobs A, Rissling O, Freitag S, et al. Authors should clearly report how they derived the overall rating when applying AMSTAR 2—a cross-sectional study. J Clin Epidemiol. 2021;129:97–103.

Franco JVA, Meza N. Authors should also report the support for judgment when applying AMSTAR 2. J Clin Epidemiol. 2021;138:240.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009;6(7): e1000100.

Page MJ, Moher D. Evaluations of the uptake and impact of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement and extensions: a scoping review. Syst Rev. 2017;6(1):263.

Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372: n160.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. Updating guidance for reporting systematic reviews: development of the PRISMA 2020 statement. J Clin Epidemiol. 2021;134:103–12.

Welch V, Petticrew M, Petkovic J, Moher D, Waters E, White H, et al. Extending the PRISMA statement to equity-focused systematic reviews (PRISMA-E 2012): explanation and elaboration. J Clin Epidemiol. 2016;70:68–89.

Beller EM, Glasziou PP, Altman DG, Hopewell S, Bastian H, Chalmers I, et al. PRISMA for abstracts: reporting systematic reviews in journal and conference abstracts. PLoS Med. 2013;10(4): e1001419.

Moher D, Shamseer L, Clarke M. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev. 2015;4(1):1.

Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med. 2015;162(11):777–84.

Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, et al. Preferred reporting items for a systematic review and meta-analysis of individual participant data: The PRISMA-IPD statement. JAMA. 2015;313(16):1657–65.

Zorzela L, Loke YK, Ioannidis JP, Golder S, Santaguida P, Altman DG, et al. PRISMA harms checklist: Improving harms reporting in systematic reviews. BMJ. 2016;352: i157.

McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM, Clifford T, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–96.

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73.

Wang X, Chen Y, Liu Y, Yao L, Estill J, Bian Z, et al. Reporting items for systematic reviews and meta-analyses of acupuncture: the PRISMA for acupuncture checklist. BMC Complement Altern Med. 2019;19(1):1–10.

Rethlefsen ML, Kirtley S, Waffenschmidt S, Ayala AP, Moher D, Page MJ, et al. PRISMA-S: An extension to the PRISMA statement for reporting literature searches in systematic reviews. J Med Libr Assoc. 2021;109(2):174–200.

Blanco D, Altman D, Moher D, Boutron I, Kirkham JJ, Cobo E. Scoping review on interventions to improve adherence to reporting guidelines in health research. BMJ Open. 2019;9(5): e26589.

Koster TM, Wetterslev J, Gluud C, Keus F, van der Horst ICC. Systematic overview and critical appraisal of meta-analyses of interventions in intensive care medicine. Acta Anaesthesiol Scand. 2018;62(8):1041–9.

Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51.

Pollock A, Berge E. How to do a systematic review. Int J Stroke. 2018;13(2):138–56.

Gagnier JJ, Kellam PJ. Reporting and methodological quality of systematic reviews in the orthopaedic literature. J Bone Jt Surg. 2013;95(11):1–7.

Martinez-Monedero R, Danielian A, Angajala V, Dinalo JE, Kezirian EJ. Methodological quality of systematic reviews and meta-analyses published in high-impact otolaryngology journals. Otolaryngol Head Neck Surg. 2020;163(5):892–905.

Boutron I, Crequit P, Williams H, Meerpohl J, Craig J, Ravaud P. Future of evidence ecosystem series 1. Introduction-evidence synthesis ecosystem needs dramatic change. J Clin Epidemiol. 2020;123:135–42.

Ioannidis JPA, Bhattacharya S, Evers JLH, Der Veen F, Van SE, Barratt CLR, et al. Protect us from poor-quality medical research. Hum Reprod. 2018;33(5):770–6.

Lasserson T, Thomas J, Higgins J. Section 1.5 Protocol development, Chapter 1: Starting a review. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/archive/v6/chapter-01#section-1-5 . Cited 2022 Mar 20

Stewart L, Moher D, Shekelle P. Why prospective registration of systematic reviews makes sense. Syst Rev. 2012;1(1):7–10.

Allers K, Hoffmann F, Mathes T, Pieper D. Systematic reviews with published protocols compared to those without: more effort, older search. J Clin Epidemiol. 2018;95:102–10.

Ge L, Tian J, Li Y, Pan J, Li G, Wei D, et al. Association between prospective registration and overall reporting and methodological quality of systematic reviews: a meta-epidemiological study. J Clin Epidemiol. 2018;93:45–55.

Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, et al. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350: g7647.

Pieper D, Rombey T. Where to prospectively register a systematic review. Syst Rev. 2022;11(1):8.

PROSPERO. PROSPERO will require earlier registration. NIHR; 2022. Available from: https://www.crd.york.ac.uk/prospero/ . Cited 2022 Mar 20

Kirkham JJ, Altman DG, Williamson PR. Bias due to changes in specified outcomes during the systematic review process. PLoS ONE. 2010;5(3):3–7.

Victora CG, Habicht JP, Bryce J. Evidence-based public health: moving beyond randomized trials. Am J Public Health. 2004;94(3):400–5.

Peinemann F, Kleijnen J. Development of an algorithm to provide awareness in choosing study designs for inclusion in systematic reviews of healthcare interventions: a method study. BMJ Open. 2015;5(8): e007540.

Loudon K, Treweek S, Sullivan F, Donnan P, Thorpe KE, Zwarenstein M. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350: h2147.

Junqueira DR, Phillips R, Zorzela L, Golder S, Loke Y, Moher D, et al. Time to improve the reporting of harms in randomized controlled trials. J Clin Epidemiol. 2021;136:216–20.

Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JPA. Routinely collected data and comparative effectiveness evidence: promises and limitations. CMAJ. 2016;188(8):E158–64.

Murad MH. Clinical practice guidelines: a primer on development and dissemination. Mayo Clin Proc. 2017;92(3):423–33.

Abdelhamid AS, Loke YK, Parekh-Bhurke S, Chen Y-F, Sutton A, Eastwood A, et al. Use of indirect comparison methods in systematic reviews: a survey of cochrane review authors. Res Synth Methods. 2012;3(2):71–9.

Jüni P, Holenstein F, Sterne J, Bartlett C, Egger M. Direction and impact of language bias in meta-analyses of controlled trials: empirical study. Int J Epidemiol. 2002;31(1):115–23.

Vickers A, Goyal N, Harland R, Rees R. Do certain countries produce only positive results? A systematic review of controlled trials. Control Clin Trials. 1998;19(2):159–66.

Jones CW, Keil LG, Weaver MA, Platts-Mills TF. Clinical trials registries are under-utilized in the conduct of systematic reviews: a cross-sectional analysis. Syst Rev. 2014;3(1):1–7.

Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ. 2017;356: j448.

Fanelli D, Costas R, Ioannidis JPA. Meta-assessment of bias in science. Proc Natl Acad Sci USA. 2017;114(14):3714–9.

Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. Grey literature in systematic reviews: a cross-sectional study of the contribution of non-English reports, unpublished studies and dissertations to the results of meta-analyses in child-relevant reviews. BMC Med Res Methodol. 2017;17(1):64.

Hopewell S, McDonald S, Clarke M, Egger M. Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database Syst Rev. 2007;2:MR000010.

Shojania K, Sampson M, Ansari MT, Ji J, Garritty C, Rader T, et al. Updating systematic reviews. AHRQ Technical Reviews; 2007. Report No. 07-0087.

Tate RL, Perdices M, Rosenkoetter U, Wakim D, Godbee K, Togher L, et al. Revision of a method quality rating scale for single-case experimental designs and n-of-1 trials: The 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychol Rehabil. 2013;23(5):619–38.

Tate RL, Perdices M, McDonald S, Togher L, Rosenkoetter U. The design, conduct and report of single-case research: Resources to improve the quality of the neurorehabilitation literature. Neuropsychol Rehabil. 2014;24(3–4):315–31.

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366: l4894.

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355: i4919.

Igelström E, Campbell M, Craig P, Katikireddi SV. Cochrane’s risk of bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: a methodological systematic review. J Clin Epidemiol. 2021;140:22–32.

McKenzie JE, Brennan SE. Chapter 12: Synthesizing and presenting findings using other methods. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-12 . Cited 2022 Apr 10

Ioannidis J, Patsopoulos N, Rothstein H. Reasons or excuses for avoiding meta-analysis in forest plots. BMJ. 2008;336(7658):1413–5.

Stewart LA, Tierney JF. To IPD or not to IPD? Eval Health Prof. 2002;25(1):76–97.

Tierney JF, Stewart LA, Clarke M. Chapter 26: Individual participant data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-26 . Cited 2022 Oct 12

Chaimani A, Caldwell D, Li T, Higgins J, Salanti G. Chapter 11: Undertaking network meta-analyses. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Oct 12.

Cooper H, Hedges L, Valentine J. The handbook of research synthesis and meta-analysis. 3rd ed. Russell Sage Foundation; 2019.

Sutton AJ, Abrams KR, Jones DR, Sheldon T, Song F. Methods for meta-analysis in medical research. John Wiley & Sons; 2000.

Deeks J, Higgins JPT, Altman DG. Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic review of interventions. Cochrane; 2022. Available from: http://www.training.cochrane.org/handbook . Cited 2022 Mar 20.

Clarke MJ. Individual patient data meta-analyses. Best Pract Res Clin Obstet Gynaecol. 2005;19(1):47–55.

Catalá-López F, Tobías A, Cameron C, Moher D, Hutton B. Network meta-analysis for comparing treatment effects of multiple interventions: an introduction. Rheumatol Int. 2014;34(11):1489–96.

Debray T, Schuit E, Efthimiou O, Reitsma J, Ioannidis J, Salanti G, et al. An overview of methods for network meta-analysis using individual participant data: when do benefits arise? Stat Methods Med Res. 2016;27(5):1351–64.

Tonin FS, Rotta I, Mendes AM, Pontarolo R. Network meta-analysis: a technique to gather evidence from direct and indirect comparisons. Pharm Pract (Granada). 2017;15(1):943.

Tierney JF, Vale C, Riley R, Smith CT, Stewart L, Clarke M, et al. Individual participant data (IPD) meta-analyses of randomised controlled trials: guidance on their use. PLoS Med. 2015;12(7): e1001855.

Rouse B, Chaimani A, Li T. Network meta-analysis: an introduction for clinicians. Intern Emerg Med. 2017;12(1):103–11.

Cochrane Training. Review Manager RevMan Web. Cochrane; 2022. Available from: https://training.cochrane.org/online-learning/core-software/revman . Cited 2022 Jun 24

MetaXL. EpiGear; 2016. Available from: http://epigear.com/index_files/metaxl.html . Cited 2022 Jun 24.

JBI. JBI SUMARI. JBI; 2019. Available from: https://sumari.jbi.global/ . Cited 2022 Jun 24.

Ryan R. Cochrane Consumers and Communication Review Group: data synthesis and analysis. Cochrane Consumers and Communication Review Group; 2013. Available from: http://cccrg.cochrane.org . Cited 2022 Jun 24

McKenzie JE, Beller EM, Forbes AB. Introduction to systematic reviews and meta-analysis. Respirology. 2016;21(4):626–37.

Campbell M, Katikireddi SV, Sowden A, Thomson H. Lack of transparency in reporting narrative synthesis of quantitative data: a methodological assessment of systematic reviews. J Clin Epidemiol. 2019;105:1–9.

Campbell M, McKenzie JE, Sowden A, Katikireddi SV, Brennan SE, Ellis S, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368: l6890.

McKenzie JE, Brennan S, Ryan R. Summarizing study characteristics and preparing for synthesis. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook . Cited 2022 Oct 12

AHRQ. Systems to rate the strength of scientific evidence. Evidence report/technology assessment no. 47. AHRQ; 2002. Available from: https://archive.ahrq.gov/clinic/epcsums/strengthsum.htm . Cited 2022 Apr 10.

Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. BMC Health Serv Res. 2004;4(1):38.

Ioannidis JPA. Meta-research: the art of getting it wrong.  Res Synth Methods. 2010;1(3–4):169–84.

Lai NM, Teng CL, Lee ML. Interpreting systematic reviews:  are we ready to make our own conclusions? A cross sectional study. BMC Med. 2011;9(1):30.

Glenton C, Santesso N, Rosenbaum S, Nilsen ES, Rader T, Ciapponi A, et al. Presenting the results of Cochrane systematic reviews to a consumer audience: a qualitative study. Med Decis Making. 2010;30(5):566–77.

Yavchitz A, Ravaud P, Altman DG, Moher D, Hrobjartsson A, Lasserson T, et al. A new classification of spin in systematic reviews and meta-analyses was developed and ranked according to the severity. J Clin Epidemiol. 2016;75:56–65.

Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, et al. GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490.

GRADE Working Group. Organizations. GRADE; 2022 [cited 2023 May 2].  Available from: www.gradeworkinggroup.org .

Hartling L, Fernandes RM, Seida J, Vandermeer B, Dryden DM. From the trenches: a cross-sectional study applying the GRADE tool in systematic reviews of healthcare interventions. PLoS One. 2012;7(4):e34697.

Hultcrantz M, Rind D, Akl EA, Treweek S, Mustafa RA, Iorio A, et al. The GRADE working group clarifies the construct of certainty of evidence. J Clin Epidemiol. 2017;87:4–13.

Schünemann H, Brozek J, Guyatt G, Oxman AD, Editors. Section 6.3.2. Symbolic representation. GRADE Handbook [internet].  GRADE; 2013 [cited 2022 Jan 27]. Available from: https://gdt.gradepro.org/app/handbook/handbook.html#h.lr8e9vq954 .

Siemieniuk R, Guyatt G. What is GRADE? [internet]. BMJ Best Practice; 2017 [cited 2022 Jul 20]. Available from: https://bestpractice.bmj.com/info/toolkit/learn-ebm/what-is-grade/ .

Guyatt G, Oxman AD, Sultan S, Brozek J, Glasziou P, Alonso-Coello P, et al. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. J Clin Epidemiol. 2013;66(2):151–7.

Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol. 2011;64(12):1311–6.

Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence - Study limitations (risk of bias). J Clin Epidemiol. 2011;64(4):407–15.

Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence - Imprecision. J Clin Epidemiol. 2011;64(12):1283–93.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence - Inconsistency. J Clin Epidemiol. 2011;64(12):1294–302.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence - Indirectness. J Clin Epidemiol. 2011;64(12):1303–10.

Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, et al. GRADE guidelines: 5. Rating the quality of evidence - Publication bias. J Clin Epidemiol. 2011;64(12):1277–82.

Andrews JC, Schünemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, et al. GRADE guidelines: 15. Going from evidence to recommendation - Determinants of a recommendation’s direction and strength. J Clin Epidemiol. 2013;66(7):726–35.

Fleming PS, Koletsi D, Ioannidis JPA, Pandis N. High quality of the evidence for medical and other health-related interventions was uncommon in Cochrane systematic reviews. J Clin Epidemiol. 2016;78:34–42.

Howick J, Koletsi D, Pandis N, Fleming PS, Loef M, Walach H, et al. The quality of evidence for medical interventions does not improve or worsen: a metaepidemiological study of Cochrane reviews. J Clin Epidemiol. 2020;126:154–9.

Mustafa RA, Santesso N, Brozek J, Akl EA, Walter SD, Norman G, et al. The GRADE approach is reproducible in assessing the quality of evidence of quantitative evidence syntheses. J Clin Epidemiol. 2013;66(7):736-742.e5.

Schünemann H, Brozek J, Guyatt G, Oxman A, editors. Section 5.4: Overall quality of evidence. GRADE Handbook. GRADE; 2013. Available from: https://gdt.gradepro.org/app/handbook/handbook.html#h.lr8e9vq954a . Cited 2022 Mar 25.

GRADE Working Group. Criteria for using GRADE. GRADE; 2016. Available from: https://www.gradeworkinggroup.org/docs/Criteria_for_using_GRADE_2016-04-05.pdf . Cited 2022 Jan 26

Werner SS, Binder N, Toews I, Schünemann HJ, Meerpohl JJ, Schwingshackl L. Use of GRADE in evidence syntheses published in high-impact-factor nutrition journals: a methodological survey. J Clin Epidemiol. 2021;135:54–69.

Zhang S, Wu QJ, Liu SX. A methodologic survey on use of the GRADE approach in evidence syntheses published in high-impact factor urology and nephrology journals. BMC Med Res Methodol. 2022;22(1):220.

Li L, Tian J, Tian H, Sun R, Liu Y, Yang K. Quality and transparency of overviews of systematic reviews. J Evid Based Med. 2012;5(3):166–73.

Pieper D, Buechter R, Jerinic P, Eikermann M. Overviews of reviews often have limited rigor: a systematic review. J Clin Epidemiol. 2012;65(12):1267–73.

Cochrane Editorial Unit. Appendix 1: Checklist for auditing GRADE and SoF tables in protocols of intervention reviews. Cochrane Training; 2022. Available from: https://training.cochrane.org/gomo/modules/522/resources/8307/Checklist for GRADE and SoF methods in Protocols for Gomo.pdf. Cited 2022 Mar 12

Ryan R, Hill S. How to GRADE the quality of the evidence. Cochrane Consumers and Communication Group. Cochrane; 2016. Available from: https://cccrg.cochrane.org/author-resources .

Cunningham M, France EF, Ring N, Uny I, Duncan EA, Roberts RJ, et al. Developing a reporting guideline to improve meta-ethnography in health research: the eMERGe mixed-methods study. Health Serv Deliv Res. 2019;7(4):1–116.

Tong A, Flemming K, McInnes E, Oliver S, Craig J. Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ. BMC Med Res Methodol. 2012;12:181.

Gates M, Gates G, Pieper D, Fernandes R, Tricco A, Moher D, et al. Reporting guideline for overviews of reviews of healthcare interventions: development of the PRIOR statement. BMJ. 2022;378:e070849.

Whiting PF, Reitsma JB, Leeflang MMG, Sterne JAC, Bossuyt PMM, Rutjes AWSS, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(4):529–36.

Hayden JA, van der Windt DA, Cartwright JL, Côté P, Bombardier C. Assessing bias in studies of prognostic factors. Ann Intern Med. 2013;158(4):280–6.

Critical Appraisal Skills Programme. CASP qualitative checklist. CASP; 2018. Available from: https://casp-uk.net/images/checklist/documents/CASP-Qualitative-Studies-Checklist/CASP-Qualitative-Checklist-2018_fillable_form.pdf . Cited 2022 Apr 26

Hannes K, Lockwood C, Pearson A. A comparative analysis of three online appraisal instruments’ ability to assess validity in qualitative research. Qual Health Res. 2010;20(12):1736–43.

Munn Z, Moola S, Riitano D, Lisy K. The development of a critical appraisal tool for use in systematic reviews addressing questions of prevalence. Int J Health Policy Manag. 2014;3(3):123–8.

Lewin S, Bohren M, Rashidian A, Munthe-Kaas H, Glenton C, Colvin CJ, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings-paper 2: how to make an overall CERQual assessment of confidence and create a Summary of Qualitative Findings table. Implement Sci. 2018;13(suppl 1):10.

Munn Z, Porritt K, Lockwood C, Aromataris E, Pearson A.  Establishing confidence in the output of qualitative research synthesis: the ConQual approach. BMC Med Res Methodol. 2014;14(1):108.

Flemming K, Booth A, Hannes K, Cargo M, Noyes J. Cochrane Qualitative and Implementation Methods Group guidance series—paper 6: reporting guidelines for qualitative, implementation, and process evaluation evidence syntheses. J Clin Epidemiol. 2018;97:79–85.

Lockwood C, Munn Z, Porritt K. Qualitative research synthesis:  methodological guidance for systematic reviewers utilizing meta-aggregation. Int J Evid Based Health. 2015;13(3):179–87.

Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 1.  Study design, risk of bias, and indirectness in rating the certainty across a body of evidence for test accuracy. J Clin Epidemiol. 2020;122:129–41.

Schünemann HJ, Mustafa RA, Brozek J, Steingart KR, Leeflang M, Murad MH, et al. GRADE guidelines: 21 part 2. Test accuracy: inconsistency, imprecision, publication bias, and other domains for rating the certainty of evidence and presenting it in evidence profiles and summary of findings tables. J Clin Epidemiol. 2020;122:142–52.

Foroutan F, Guyatt G, Zuk V, Vandvik PO, Alba AC, Mustafa R, et al. GRADE Guidelines 28: use of GRADE for the assessment of evidence about prognostic factors:  rating certainty in identification of groups of patients with different absolute risks. J Clin Epidemiol. 2020;121:62–70.

Janiaud P, Agarwal A, Belbasis L, Tzoulaki I. An umbrella review of umbrella reviews for non-randomized observational evidence on putative risk and protective factors [internet]. OSF protocol; 2021 [cited 2022 May 28]. Available from: https://osf.io/xj5cf/ .

Mokkink LB, Prinsen CA, Patrick DL, Alonso J, Bouter LM, et al. COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs) - user manual. COSMIN; 2018 [cited 2022 Feb 15]. Available from:  http://www.cosmin.nl/ .

Thomas J, Petticrew M, Noyes J, Chandler J, Rehfuess E, Tugwell P, et al. Chapter 17: Intervention complexity. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane handbook for systematic reviews of interventions. Cochrane; 2022. Available from: https://training.cochrane.org/handbook/current/chapter-17 . Cited 2022 Oct 12

Guise JM, Chang C, Butler M, Viswanathan M, Tugwell P. AHRQ series on complex intervention systematic reviews—paper 1: an introduction to a series of articles that provide guidance and tools for reviews of complex interventions. J Clin Epidemiol. 2017;90:6–10.

Riaz IB, He H, Ryu AJ, Siddiqi R, Naqvi SAA, Yao Y, et al. A living, interactive systematic review and network meta-analysis of first-line treatment of metastatic renal cell carcinoma. Eur Urol. 2021;80(6):712–23.

Créquit P, Trinquart L, Ravaud P. Live cumulative network meta-analysis: protocol for second-line treatments in advanced non-small-cell lung cancer with wild-type or unknown status for epidermal growth factor receptor. BMJ Open. 2016;6(8):e011841.

Ravaud P, Créquit P, Williams HC, Meerpohl J, Craig JC, Boutron I. Future of evidence ecosystem series: 3. From an evidence synthesis ecosystem to an evidence ecosystem. J Clin Epidemiol. 2020;123:153–61.

Acknowledgements

The authors thank Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Author information

Authors and affiliations.

Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC, USA

Kat Kolaski

Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY, USA

Lynne Romeiser Logan

Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA, USA

John P. A. Ioannidis

Contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Kat Kolaski .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Supplementary Information

Additional file 2A.

Overviews, scoping reviews, rapid reviews and living reviews.

Additional file 2B.

Practical scheme for distinguishing types of research evidence.

Additional file 4.

Presentation of forest plots.

Additional file 5A.

Illustrations of the GRADE approach.

Additional file 5B.

 Adaptations of GRADE for evidence syntheses.

Additional file 6.

 Links to Concise Guide online resources.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Kolaski, K., Logan, L.R. & Ioannidis, J.P.A. Guidance to best tools and practices for systematic reviews. Syst Rev 12, 96 (2023). https://doi.org/10.1186/s13643-023-02255-9

Received: 03 October 2022

Accepted: 19 February 2023

Published: 08 June 2023

DOI: https://doi.org/10.1186/s13643-023-02255-9

Keywords

  • Certainty of evidence
  • Critical appraisal
  • Methodological quality
  • Risk of bias
  • Systematic review

Meta-analysis and systematic review of the diagnostic value of contrast-enhanced spectral mammography for the detection of breast cancer

  • Jiulin Liu 1,2
  • Ran Xiao 3
  • Huijia Yin 1
  • Ying Hu 1
  • Siyu Zhen 1
  • Shihao Zhou 1,2
  • Dongming Han 1 ( http://orcid.org/0000-0001-8516-1396 )
  • 1 Department of Magnetic Resonance Imaging (MRI), The First Affiliated Hospital of Xinxiang Medical University, Weihui, Henan, China
  • 2 Department of Radiology, Luoyang Orthopedic-Traumatological Hospital of Henan Province (Henan Provincial Orthopedic Hospital), Zhengzhou, Henan, China
  • 3 Department of Respiratory Medicine, The First Affiliated Hospital of Xinxiang Medical University, Weihui, Henan, China
  • Correspondence to Dr Dongming Han; 625492590@qq.com

Objective To evaluate the diagnostic effectiveness of contrast-enhanced spectral mammography (CESM) in the diagnosis of breast cancer.

Data sources PubMed, Embase and the Cochrane Library, searched up to 18 June 2022.

Eligibility criteria for selecting studies We included prospective studies of CESM in the diagnosis of breast cancer, compared results across studies, and calculated the diagnostic value of CESM for breast cancer.

Data extraction and synthesis The methodological quality of all included studies was evaluated with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses specification. In addition to sensitivity and specificity, other important parameters of CESM accuracy for breast cancer diagnosis were analysed. For overall accuracy estimation, summary receiver operating characteristic curves were calculated. STATA V.14.0 was used for all analyses.

Results This meta-analysis included a total of 12 studies. According to the summary estimates for CESM in the diagnosis of breast cancer, the pooled sensitivity and specificity were 0.97 (95% CI 0.92 to 0.98) and 0.76 (95% CI 0.64 to 0.85), respectively. The positive likelihood ratio was 4.03 (95% CI 2.65 to 6.11), the negative likelihood ratio was 0.05 (95% CI 0.02 to 0.09) and the diagnostic odds ratio was 89.49 (95% CI 45.78 to 174.92). The area under the summary receiver operating characteristic curve was 0.95.

Conclusions CESM has high sensitivity and good specificity for evaluating breast cancer, particularly in women with dense breasts. It can thus provide more information for clinical diagnosis and treatment.

  • breast imaging
  • breast tumours
  • diagnostic radiology

Data availability statement

Data sharing is not applicable as no datasets were generated and/or analysed for this study.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjopen-2022-069788


STRENGTHS AND LIMITATIONS OF THIS STUDY

This systematic review was a comprehensive search of experimental and observational studies on contrast-enhanced spectral mammography (CESM) in the diagnosis of breast cancer.

We included only prospective studies, which are of higher quality and less prone to bias, and our screening criteria were developed prior to the meta-analysis.

Study screening was conducted by two reviewers and was strictly based on the inclusion criteria.

The data in this study were summarised using sound statistical methods.

Recent literature was added, and where several studies came from the same institution, only the most recent or largest-sample study was included.

We summarised the sensitivity and specificity of CESM in the diagnosis of breast cancer.

Introduction

Globally, female breast cancer has overtaken lung cancer as the most commonly diagnosed cancer and is the fifth leading cause of cancer death. 1 Since the mid-2000s, the incidence of breast cancer in women has been increasing slowly by about 0.5% per year. 2 At present, the diagnostic methods for breast cancer include MRI, full-field digital mammography (FFDM) and ultrasound (US). MRI is currently the most sensitive examination for the diagnosis of breast cancer. 3 However, it has some disadvantages, such as being unsuitable for patients with claustrophobia and its high price. In addition, although FFDM is an effective diagnostic method for breast cancer, it carries the risk of recall for further testing. 4 Ultrasonography has good diagnostic efficacy for breast cancer, especially in women with dense breasts; however, it has a relatively low positive predictive value. 5 Contrast-enhanced spectral mammography (CESM), which visualises breast neovascularisation in a manner similar to MRI, is an emerging technology that uses an iodinated contrast agent. 6 CESM has the advantages of patient friendliness and low cost. Previous studies have shown that CESM has clear advantages over US in displaying lesions. The advantage of CESM is that it can show changes in anatomy and local blood perfusion, which may be caused by tumour angiogenesis. 7 Moreover, CESM is useful in working up suspicious findings from routine breast imaging, 7 although its reported sensitivity and specificity differ across studies.

Several meta-analyses have examined the diagnostic performance of CESM for breast cancer; however, their pooled results differed and had several limitations. 8–11 On the one hand, the sensitivity and specificity differed across the above-mentioned meta-analyses. 8 10 11 On the other hand, the numbers of included studies were limited. In addition, some meta-analyses included non-English studies and overlapping studies, which might have affected their pooled results. In the past few years, several studies evaluating the diagnostic value of CESM in breast cancer have been published. Therefore, we conducted this meta-analysis using available evidence to comprehensively determine whether CESM is effective in detecting breast cancer in women.

Material and methods

We conducted our study following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) specification, 12 which meets the requirements of a diagnostic systematic review.

Search strategy

To evaluate the accuracy of CESM in diagnosing breast cancer, we searched the following databases: PubMed, Embase and the Cochrane Library. Two reviewers, JL and RX, independently searched the above databases up to 18 June 2022. Our search terms included ‘contrast-enhanced spectral mammography’, ‘Dual-Energy Contrast-Enhanced Spectral Mammography’, ‘CESM’, ‘contrast-enhanced digital mammography’, ‘CEDM’, ‘Breast Neoplasms’, ‘Breast Neoplasm’, ‘Breast Tumor’, ‘Breast Tumors’, ‘Breast Cancer’, ‘Malignant Neoplasm of Breast’, ‘Breast Malignant Neoplasm’, ‘Breast Carcinomas’, ‘Breast Carcinoma’, ‘breast mass’, ‘breast lesion’, ‘breast lesions’ and ‘breast diseases’. In addition, the references of all included studies were also reviewed.

Inclusion and exclusion criteria

The inclusion criteria were: (1) studies diagnosing breast cancer, (2) studies providing data on sensitivity and specificity, (3) studies involving ≥10 patients or cases, (4) English language and (5) prospective studies. The exclusion criteria were: (1) overlapping research, (2) commentaries, letters, editorials or abstracts or (3) studies on artificial intelligence and radiomics.

Study screening

The titles and abstracts of records from the electronic databases were initially screened by two authors following the above inclusion and exclusion criteria. Each of the two researchers screened twice to avoid omissions. Any disagreement was resolved by consulting the third author. Full texts of potentially eligible studies were downloaded and further screened. If the authors and institutions of studies were the same, we included the most recently published study with the largest sample size. If the research institutions were the same but the authors differed, we emailed the corresponding authors to ask; if we did not receive a reply, we included the most recently published study with the largest sample size.

Data abstraction

Two reviewers extracted data; where necessary, differences were resolved by a third reviewer. The following information was extracted from each study: first author name, publication year, country, numbers of patients and lesions, median age, and the numbers of true-positive (TP), false-positive (FP), false-negative (FN) and true-negative (TN) results.

Quality assessment

The methodological quality of the included publications was assessed with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. 13 QUADAS-2 focuses on four domains with minimal overlap: patient selection, index test, reference standard, and flow and timing, which capture the main quality aspects of a diagnostic study. Each domain is assessed for risk of bias, and the first three domains are also assessed for applicability. The risk of bias was considered low if a study met the above criteria and high otherwise. Disagreements between the two reviewers on quality assessment were resolved by consensus.

Statistical analysis

STATA V.14.0 was used for all analyses. The I² measure was used to quantify heterogeneity between studies. If there was no statistical heterogeneity, a fixed-effects model was used to pool the data; otherwise, a random-effects model was used. Sensitivity was calculated as TP/(TP+FN), where TP is the number of true-positive results and FN is the number of false-negative results. Specificity was calculated as TN/(TN+FP), where TN is the number of true-negative results and FP is the number of false-positive results. 14 We also computed other important measures for the evaluation of diagnostic tests, such as the positive likelihood ratio (PLR), negative likelihood ratio (NLR) and diagnostic OR (DOR). The summary receiver operating characteristic (SROC) curve and the area under the curve (AUC) of the SROC curve were also computed.
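To make these definitions concrete, here is a minimal sketch of how the accuracy measures follow from a single 2×2 table. This is not the authors' STATA code (the pooled estimates above come from bivariate random-effects modelling across studies), and the counts are hypothetical, chosen so that sensitivity and specificity land near the pooled values reported in this article:

```python
# Hypothetical 2x2 table; counts are illustrative, not from any included study.
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    sensitivity = tp / (tp + fn)           # true-positive rate
    specificity = tn / (tn + fp)           # true-negative rate
    plr = sensitivity / (1 - specificity)  # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity  # negative likelihood ratio
    dor = plr / nlr                        # diagnostic odds ratio = (TP*TN)/(FP*FN)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "PLR": plr, "NLR": nlr, "DOR": dor}

print(diagnostic_metrics(tp=97, fp=24, fn=3, tn=76))
# sensitivity = 0.97, specificity = 0.76, PLR ≈ 4.04, NLR ≈ 0.04, DOR ≈ 102
```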

Study characteristics

After the systematic search, we included 12 studies. 15–26 The complete selection process is detailed in the PRISMA flowchart ( figure 1 ). Of 544 screened records, 85 studies underwent full-text reading. The characteristics of all 12 included studies are shown in table 1 . All 12 studies were prospective and published between 2014 and 2022. Most patients had US, mammography and related examinations before the CESM examination. Patients with dense breasts accounted for approximately two-thirds of those included. In addition, the methodological quality assessment of all included studies is shown in online supplemental table 1 .

Table 1: Study characteristics of each included study.

Figure 1: Workflow for study screening and selection. CESM, contrast-enhanced spectral mammography.

Diagnostic accuracy of CESM

The sensitivity and specificity values are shown in forest plots ( figure 2 ). A very high pooled sensitivity of 0.97 (95% CI 0.92 to 0.98) was estimated. The pooled specificity was 0.76 (95% CI 0.64 to 0.85). The PLR was 4.03 (95% CI 2.65 to 6.11), the NLR was 0.05 (95% CI 0.02 to 0.09) ( figure 3 ) and the DOR was 89.49 (95% CI 45.78 to 174.92) ( online supplemental figure 1 ). The I² values for sensitivity, specificity, PLR, NLR and DOR were 76.60%, 87.95%, 86.25%, 65.73% and 99.78%, respectively.

Figure 2: Forest plot of estimates of sensitivity and specificity for contrast-enhanced spectral mammography in the diagnosis of breast cancer.

Figure 3: Forest plot of estimates of positive likelihood ratio and negative likelihood ratio for contrast-enhanced spectral mammography in the diagnosis of breast cancer.

As shown in figure 4 , the SROC curve has an AUC of 0.95 (95% CI 0.93 to 0.97). The confidence contour is an interval estimate for the summary (average) operating point, whereas the prediction contour is an interval estimate for an individual future study.

Figure 4: Summary bivariate ROC (SROC) curve for CESM diagnostic accuracy, showing a confidence contour and a prediction contour. AUC, area under the curve; CESM, contrast-enhanced spectral mammography; ROC, receiver operating characteristic curve; SENS, sensitivity; SPEC, specificity; SROC, summary receiver operating characteristic curve.

Fagan plots were drawn to relate the prior probability (pretest prevalence) to the posterior probability (the probability of disease estimated after the diagnostic test). In our sample, the pretest probability of malignancy was 50%; a positive CESM finding gave a post-test probability of 80%, while a negative finding gave a post-test probability of 4% ( online supplemental figure 2 ).
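The Fagan nomogram is simply Bayes' theorem on the odds scale: post-test odds equal pretest odds multiplied by the likelihood ratio. A minimal sketch using the pooled PLR and NLR reported above (the ≈5% figure for a negative test differs slightly from the reported 4%, which likely reflects rounding of the NLR point estimate):

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a post-test probability via a likelihood ratio."""
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

print(posttest_probability(0.50, 4.03))  # ≈ 0.80 after a positive CESM finding
print(posttest_probability(0.50, 0.05))  # ≈ 0.05 after a negative CESM finding
```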

Regression analysis

We analysed the possible influence of several covariates (number of lesions, number of patients, inclusion of dense breasts only, year of publication) on the diagnostic accuracy of CESM. The regression analysis showed that the sensitivity of studies that included only dense breasts differed from that of the other studies, but both were high ( online supplemental figure 3 ). However, only a limited number of studies were included, which reduces the reliability of the regression analysis.

Publication bias

A funnel plot drawn with STATA V.14.0 was used to assess publication bias among the included studies ( online supplemental figure 4 ). The included studies were evenly distributed on both sides of the regression line, suggesting no obvious publication bias (p=0.78).

Discussion

CESM is emerging as a valuable tool for the diagnosis and staging of breast cancer. CESM combines the contrast enhancement caused by tumour neovascularisation with information on anatomical changes. Lesions are highlighted by image subtraction, which further increases the sensitivity of CESM for the diagnosis of breast cancer. It improves accuracy in diagnosing breast cancer, providing more accurate tumour size and identification of multifocal disease, especially in patients with dense breasts. 27

Our pooled sensitivity (0.97, 95% CI 0.92 to 0.98) was higher, and our pooled specificity (0.76, 95% CI 0.64 to 0.85) slightly lower, than in a previous meta-analysis 9 which reported a pooled sensitivity of 0.89 (95% CI 0.88 to 0.91) and a pooled specificity of 0.84 (95% CI 0.82 to 0.85). The higher sensitivity may reflect our more rigorous study screening, the inclusion of the latest literature, and the increasing clinical use of CESM in recent years. Moreover, all the studies we included are prospective, which makes them less susceptible to bias than retrospective studies. Another previous meta-analysis 8 found that CESM has high sensitivity but low specificity for the diagnosis of breast cancer. This may be due to the following reasons: three studies included in that meta-analysis were similar and written by the same first author, and that meta-analysis included only eight studies, with the pooled specificity derived from only six. All these factors may introduce bias. During our screening, there were five studies from the same authors 15 28–31 with similar results; we included only one, which was prospective and had a large sample size and the longest time span.

In addition, compared with other studies, this study included the latest studies from recent years and used a more rigorous article-screening process, with each of the two researchers screening twice.

The DOR is a common statistic in epidemiology that expresses the strength of the association between exposure and disease. 32 The DOR of a test is the ratio of the odds of a positive result in those with the disease to the odds of a positive result in those without it. In our meta-analysis, the DOR was 89.49 (95% CI 45.78 to 174.92), which is high: the odds of a positive CESM result were about 89 times higher in patients with breast cancer than in those without. The DOR offers considerable advantages in a meta-analysis of diagnostic studies by combining results from different studies into a more precise pooled estimate. The I² statistic, also known as the inconsistency index, measures heterogeneity or variability across studies in a meta-analysis; it quantifies the proportion of the total variation in effect estimates that is due to heterogeneity rather than chance. Heterogeneity in the estimated DOR may arise from differences in study populations, as the included studies varied in patient characteristics such as age, mammary gland type, disease severity and comorbidities, and from clinical and contextual factors such as variations in disease prevalence, healthcare settings and geographic locations.
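For readers unfamiliar with I², it is derived from Cochran's Q and the degrees of freedom (number of studies minus one). A hedged sketch follows; the Q value here is hypothetical, chosen only so the output matches the sensitivity I² reported above:

```python
def i_squared(q: float, k: int) -> float:
    """I^2 = max(0, (Q - df) / Q) * 100, where df = k - 1 for k studies."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0

print(i_squared(q=47.0, k=12))  # ≈ 76.6%, matching the sensitivity I² above
```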

The SROC curve method takes into account possible heterogeneity of thresholds. 33 The SROC indicates the relationship between the TP rate and the FP rate at different diagnostic thresholds. 34 In general, an AUC between 0.5 and 0.7 indicates low accuracy, between 0.7 and 0.9 good accuracy, and above 0.9 high accuracy. Our SROC curve shows an AUC of 0.95, indicating high accuracy.

Hobbs et al 35 note that patients' preference for CESM would provide further evidence supporting its adoption as an alternative to contrast-enhanced MRI in selected clinical indications, if diagnostic non-inferiority of CESM is confirmed. Ferranti et al 25 suggested that CESM may compensate for MRI through a slight false-negative tendency. Furthermore, Clauser et al 36 found the specificity of CESM to be higher than that of MRI. CESM detects breast cancer based on assessment of tumour angiogenesis. 24 Growth factors secreted by cancer cells promote the formation of new blood vessels that supply the proliferating tumour cells. It is because of the increased vascular endothelial cell gaps and permeability that contrast is enhanced in the tumour area. CESM may thus combine the high sensitivity of MRI with the low cost and availability of FFDM. 37

However, this study has some limitations. First, the participants in the primary studies were all patients with lesions detected by breast US or mammography, which may induce selection bias. Second, the majority of participants had dense breasts; while this highlights the suitability of CESM for dense breast examination, it may still introduce some bias. Third, because of the large number of retrieved records, we included only prospective studies written in English, so some reliable studies and results may have been missed.

CESM has high sensitivity and good specificity for evaluating breast cancer, particularly in women with dense breasts. It can thus provide more information for clinical diagnosis and treatment.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

Not applicable.

References

The numbered references cited in this article (1–37) are listed in the original publication (https://doi.org/10.1136/bmjopen-2022-069788).

Contributors JL and RX designed the study. SZou and YH gathered data. JL and SZhen performed the analysis. HY and DH revised it critically for important intellectual content. DH acted as guarantor. All authors contributed to the article and approved the submitted version.

Funding This work was funded by the Henan Medical Science and Technology Research Program (LHGJ20210498, LHGJ20230528).

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.


Repetitive Behaviors In Autism And OCD: A Systematic Review

Olivia Guy-Evans, MSc, Associate Editor for Simply Psychology

Saul McLeod, PhD, Editor-in-Chief for Simply Psychology

Autism and obsessive-compulsive disorder ( OCD ) share notable similarities in repetitive behaviors, presenting challenges for differential diagnosis and treatment planning. Both conditions can involve various types of repetitive behaviors, including:

  • Ritualized routines: Following strict patterns or sequences in daily activities.
  • Repetitive movements: Such as hand-flapping in autism or compulsive touching in OCD.
  • Intense interests : Preoccupation with specific topics or objects, though the nature and function may differ between conditions.
  • Ordering and arranging: Needing items to be in a particular order or symmetry .
  • Repetitive thoughts: Persistent, intrusive thoughts in OCD or fixations on specific topics in autism.
  • Verbal repetition: Echolalia in autism or repetitive phrases in OCD.

These behaviors can vary in intensity and presentation between individuals and across the lifespan. Additionally, autism and OCD often co-occur, with over 17% of autistic individuals also meeting criteria for OCD, further complicating the clinical picture.


  • The study found considerable overlap in the expression and content of repetitive behaviors between autistic individuals and those with OCD, making differential diagnosis challenging.
  • No significant differences were generally found in overall intensity of autism-related restricted and repetitive behaviors and interests (RRBIs) between autistic individuals, those with OCD, and those with both conditions.
  • Individuals with OCD typically showed greater overall obsessive-compulsive symptom severity compared to autistic individuals without OCD.
  • Mixed findings were reported regarding the content of repetitive behaviors, with some studies finding differences in specific types of behaviors (e.g., hoarding, checking) between groups.
  • Factors associated with repetitive behaviors included neurological differences, executive functioning, social/communication factors, and age, though findings were often mixed or limited.
  • The research was limited by a lack of diversity in sampling, with underrepresentation of certain groups (e.g., non-White individuals, those with intellectual impairments).
  • The study highlights the importance of understanding the function and experience of repetitive behaviors, rather than just their outward expression, for accurate differential diagnosis and treatment planning.

This systematic review aimed to examine similarities and differences in repetitive behaviors between autistic individuals and those with obsessive-compulsive disorder (OCD).

The rationale for this study stems from the diagnostic overlap and high co-occurrence rates between autism and OCD, as well as the phenotypic similarities in repetitive behaviors observed in both conditions (van Steensel et al., 2011).

Previous research has highlighted challenges in differentiating between repetitive behaviors attributable to autism versus those related to OCD, which can impact accurate diagnosis and appropriate treatment planning (Paula-Pérez, 2013; Jiujias et al., 2017).

While previous narrative reviews have begun to explore this topic (Paula-Pérez, 2013; Jiujias et al., 2017), a systematic approach was needed to provide a comprehensive and up-to-date synthesis of the quantitative literature.

This study aimed to address this gap by systematically examining similarities and differences in repetitive behaviors across these populations in terms of expression, content, and associated factors.

The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement guidelines.

A systematic search was conducted across five electronic databases: Science Direct, PsycINFO, Web of Science, Scopus, and CINAHL.

Search strategy and terms

Search terms were grouped by target population (autism and OCD) and behavior of interest (RRBIs and compulsive behaviors).

The initial search was conducted on August 18, 2022, with an updated search on June 30, 2023.

Inclusion and exclusion criteria

Inclusion criteria were:

  • Participants of any age with autism and/or OCD diagnosis
  • Quantitative study design
  • Included a measure of RRBIs and/or compulsive behaviors
  • Comparison between diagnostic groups
  • Peer-reviewed and full-text articles
  • Written in English

Exclusion criteria included non-human research, participants without formal diagnoses, single clinical population studies, and qualitative studies.

Statistical measures

The review employed both quantitative and narrative synthesis of the data. Categories of behavior were determined based on the measures used to assess repetitive behaviors across studies.

Expression of Repetitive Behaviors:

  • Most studies (91%) found no significant differences in overall intensity of RRBIs between autistic individuals, those with OCD, and those with both conditions.
  • The majority of studies (86%) reported individuals with OCD had significantly greater total obsessive-compulsive symptom severity compared to autistic individuals without OCD.
  • Studies comparing individuals with OCD to those with both conditions generally found no significant differences in total obsessive-compulsive symptom severity (67% of studies).

Content of Repetitive Behaviors:

  • Mixed findings were reported for the content of autism-related RRBIs, with limited studies examining this aspect.
  • Counting and repeating compulsions were generally similar across groups.
  • Checking compulsions were more frequently reported in individuals with OCD.
  • Hoarding compulsions were more frequently endorsed by autistic adults compared to adults with OCD.
  • Mixed findings were reported for washing/cleaning and ordering/arranging compulsions.

Factors Associated with Repetitive Behaviors:

  • Neurological factors: Some studies associated the intensity of repetitive behaviors with differences in glutamate concentration or striatal activity, though findings were limited.
  • Executive functioning: Challenges in executive functioning were associated with greater intensity of repetitive behaviors in autistic children, but not consistently in children with OCD.
  • Social and communication factors: Some studies found associations between social difficulties and obsessive-compulsive symptom severity, though findings were limited.
  • Psychological/experiential factors: Limited studies reported on these factors, generally finding no differences in metacognitive beliefs or control associated with repetitive behaviors between groups.
  • Age: One study found sameness behavior to be more prevalent in younger children with OCD, but not in autistic children.

This systematic review provides a comprehensive analysis of the current literature on repetitive behaviors in autism and OCD, highlighting the considerable overlap in behavioral presentations between these conditions.

The findings suggest that while some distinctions can be made at a group level, particularly in terms of overall obsessive-compulsive symptom severity, differentiating between autism and OCD based solely on the expression or content of repetitive behaviors remains challenging.

The study extends previous research by systematically examining a wide range of factors associated with repetitive behaviors, including neurological, executive functioning, and social/communication factors.

However, the mixed and often limited findings in these areas underscore the need for further research to better understand the underlying mechanisms driving repetitive behaviors in each condition.

A key insight from this review is the importance of considering the function and experience of repetitive behaviors, rather than just their outward expression, for accurate differential diagnosis and treatment planning.

This aligns with previous suggestions by Paula-Pérez (2013) regarding the potential importance of emotional valence in differentiating between autism and OCD-related repetitive behaviors.

Future research directions could include:
  • More in-depth examinations of the content of autism-related RRBIs between groups.
  • Studies focusing on the underlying motivations and functions of repetitive behaviors in each condition.
  • Longitudinal studies tracking changes in repetitive behaviors over time and their associations with various factors.
  • Research including more diverse samples, particularly individuals with intellectual impairments and from diverse ethnic backgrounds.
Strengths

The study had several methodological strengths:
  • Adherence to PRISMA guidelines for systematic reviews.
  • Comprehensive search strategy across multiple databases.
  • Inclusion of a wide range of studies examining various aspects of repetitive behaviors.
  • Rigorous quality assessment of included studies.
  • Consideration of both autism and OCD-specific measures of repetitive behaviors.
  • Examination of associated factors beyond just behavioral presentations.

Limitations

This study also had several methodological limitations, including:
  • Limited diversity in sampling, with underrepresentation of non-White individuals and those with intellectual impairments.
  • Exclusion of grey literature and non-English publications, potentially introducing bias.
  • Lack of input from individuals with lived experience of autism or OCD in the included studies.
  • Heterogeneity in measures used across studies, making direct comparisons challenging.
  • Limited examination of sex and gender differences in repetitive behaviors.

These limitations impact the generalizability of the findings and highlight the need for more diverse and inclusive research in this area.

Implications

The findings of this review have significant implications for clinical practice and research:
  • Differential diagnosis: The considerable overlap in repetitive behaviors between autism and OCD suggests that clinicians should not rely solely on the presence or intensity of these behaviors for diagnosis. A more comprehensive assessment considering the function, motivation, and subjective experience of repetitive behaviors is necessary.
  • Treatment planning: Understanding the similarities and differences in repetitive behaviors across these conditions can inform more tailored interventions. For instance, treatments targeting executive functioning may be more relevant for autism-related repetitive behaviors.
  • Comorbidity considerations: The high rates of co-occurrence between autism and OCD highlight the importance of screening for both conditions in clinical settings, even when one diagnosis is already established.
  • Research directions: The review underscores the need for more nuanced research examining specific subtypes of repetitive behaviors and their underlying mechanisms in both autism and OCD.
  • Inclusive practices: The limitations in sampling diversity emphasize the importance of more inclusive research practices to ensure findings are generalizable across different populations.

Primary reference

O’Loghlen, J., McKenzie, M., Lang, C., & Paynter, J. (2024). Repetitive behaviors in autism and obsessive-compulsive disorder: A systematic review.  Journal of Autism and Developmental Disorders.  Advance online publication.  https://doi.org/10.1007/s10803-024-06357-8

Other references

Jiujias, M., Kelley, E., & Hall, L. (2017). Restricted, repetitive behaviors in autism spectrum disorder and obsessive–compulsive disorder: A comparative review. Child Psychiatry and Human Development, 48 (6), 944–959. https://doi.org/10.1007/s10578-017-0717-0

Paula-Pérez, I. (2013). Differential diagnosis between obsessive compulsive disorder and restrictive and repetitive behavioural patterns, activities and interests in autism spectrum disorders. Revista De Psiquiatría Y Salud Mental (English Edition), 6 (4), 178–186. https://doi.org/10.1016/j.rpsmen.2012.07.006

van Steensel, F. J. A., Bögels, S. M., & Perrin, S. (2011). Anxiety disorders in children and adolescents with autistic spectrum disorders: A meta-analysis. Clinical Child and Family Psychology Review, 14 (3), 302–317. https://doi.org/10.1007/s10567-011-0097-0

Keep Learning

  • How might the ego-syntonic vs. ego-dystonic nature of repetitive behaviors in autism and OCD, respectively, influence treatment approaches?
  • What ethical considerations should be taken into account when conducting research on repetitive behaviors in autistic individuals or those with OCD?
  • How might cultural factors influence the expression and interpretation of repetitive behaviors in different populations?
  • Given the overlap in repetitive behaviors between autism and OCD, how might this impact our understanding of these conditions as distinct diagnostic categories?
  • What role might sensory processing differences play in the manifestation of repetitive behaviors in autism versus OCD?


Barriers and enablers to a healthy lifestyle in people with infertility: a mixed-methods systematic review

Lisa Moran and Stephanie Cowan are joint senior authors.

Sophia Torkel, Rui Wang, Robert J Norman, Lijun Zhao, Kai Liu, Dana Boden, Wentong Xu, Lisa Moran, Stephanie Cowan, Barriers and enablers to a healthy lifestyle in people with infertility: a mixed-methods systematic review, Human Reproduction Update, Volume 30, Issue 5, September-October 2024, Pages 569–583, https://doi.org/10.1093/humupd/dmae011

While there is a recognized role of optimizing lifestyle (diet and physical activity) behaviours in the management of infertility, the best practice remains unknown and factors influencing the lifestyle of people with infertility are not well understood.

This systematic review evaluated barriers and enablers to a healthy lifestyle in people with infertility, from the perspectives of people with infertility and health professionals, in order to inform optimal behavioural change strategies.

Ovid MEDLINE(R), PsycINFO, EMBASE, EBM Reviews, and CINAHL were searched from inception to 28 August 2023. Eligible studies were qualitative and quantitative primary studies that explored barriers and/or enablers to lifestyle for infertility management. Quality assessment was performed using the Centre for Evidence-Based Management Critical Appraisal of a Survey Tool and the Critical Appraisal Skills Programme Qualitative Checklist. Data were analysed by thematic analysis with themes mapped to the Capability, Opportunity, Motivation and Behaviour (COM-B) model and Theoretical Domains Framework (TDF).

After screening 12 326 abstracts and 99 full-texts, 27 studies were included (12 quantitative, 6 qualitative and 9 mixed-methods) with 22 studies of women with infertility (n = 2524), 11 studies of men with infertility (n = 1407), and 6 studies of health professionals (n = 372). We identified barriers and enablers relating to capability (e.g. strategies for behaviour change), opportunity (e.g. limited time, resources, and money), and motivation (e.g. interplay between lifestyle and emotional state). Based on the identified themes, suggested intervention components to integrate into lifestyle management of infertility include facilitating development of self-management skills to support lifestyle change (e.g. self-monitoring, action planning, and goal setting) and incorporating mental health strategies (e.g. providing information about the benefits of healthy lifestyle behaviours for mental health and encouraging patients to reframe healthy lifestyle behaviours as self-care strategies).

The findings have identified important factors that influence lifestyle management in people with infertility and have suggested relevant intervention components to consider when designing interventions. Given the paucity of qualitative studies identified, more research is needed to further understand the complex and interacting factors that shape lifestyle during the fertility journey.

Several interacting factors influence lifestyle management of infertility, and these factors can be targeted to optimize interventions.

Infertility, defined as the inability to achieve a clinical pregnancy after 12 months of regular, unprotected sexual intercourse, affects up to 186 million individuals globally ( Inhorn and Patrizio, 2015 ; Zegers-Hochschild et al. , 2017 ). Couples with infertility experience increased psychological stress and reduced quality of life ( Chachamovich et al. , 2010 ; Greil et al. , 2011 ). Existing medical treatments for infertility have significant physical, psychological, and financial burdens, which could lead to treatment discontinuation before achieving a live birth ( Gameiro et al. , 2012 ). Additionally, the live birth rate per cycle of IVF is ∼25–40% ( Chambers et al. , 2021 ), highlighting a crucial need to investigate the role of modifiable factors that can influence fertility. A healthy lifestyle is recommended in multiple clinical guidelines for infertility ( Practice Committee of the American Society for Reproductive Medicine and the Practice Committee of the Society for Reproductive Endocrinology and Infertility, 2022 ; Romualdi et al. , 2023 ; Teede et al. , 2023 ).

Preconception lifestyle factors positively associated with improved fertility outcomes following ART include optimal physical activity ( Rao et al. , 2018 ) and diet ( Kellow et al. , 2022 ; Sanderman et al. , 2022 ; Alesi et al. , 2023 ; Winter et al. , 2023 ) and having a BMI within the healthy weight range ( Oostingh et al. , 2019 ; Purewal et al. , 2019 ; Sermondade et al. , 2019 ). Conversely, preconception lifestyle factors that decrease fertility include tobacco use and heavy alcohol use ( Augood et al. , 1998 ; Oostingh et al. , 2019 ). To support the development of successful lifestyle interventions for infertility, there is a need to examine factors influencing the lifestyle behaviours of people living with infertility and to understand the perspectives of health professionals working in infertility management. People with infertility experience unique challenges during the preconception period. There is widespread use of, and interest in, adjuncts to fertility treatment, including lifestyle management, to aid conception ( Vincent and Furnham, 1996 ; Smith et al. , 2010 ). However, the time, financial, and mental health impacts of fertility treatment may reduce the capacity of individuals to engage in healthy lifestyle behaviours ( Gameiro et al. , 2012 ; Wu et al. , 2013 ). Health professionals working in fertility clinics also experience multiple stressors, including a high workload and concerns about patient expectations of care ( Boivin et al. , 2017 ), which are potential barriers to effectively promoting lifestyle management. No previous systematic reviews have examined barriers and enablers to lifestyle change in people with infertility from the perspectives of people with infertility and health professionals. Understanding these barriers and enablers ensures a targeted approach that can optimize intervention effectiveness.

The Behaviour Change Wheel (BCW) is a systematic and evidence-based method used to ensure interventions address key barriers and enablers ( Michie et al. , 2011 ). In the BCW method, barriers and enablers are classified using the Capability, Opportunity, Motivation and Behaviour (COM-B) model, which considers the individual and contextual factors shaping behaviours and posits that behaviours are influenced by three interacting components: capability, opportunity, and motivation ( Michie et al. , 2011 ). The Theoretical Domains Framework (TDF) further subdivides each component of the COM-B model and consists of 14 domains ( Atkins et al. , 2017 ). Behaviour change is then elicited via behaviour change techniques (BCTs) mapped to these domains ( Michie et al. , 2013 ). Analysing barriers and enablers to lifestyle change using these frameworks allows for the systematic development of suitable interventions ( Michie et al. , 2011 ).
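As an orientation to how the TDF subdivides the COM-B model, the sketch below follows the commonly used alignment from Atkins et al. (2017). It is illustrative only and is not taken from this review; note that the 'skills' domain spans both capability subcomponents:

```python
# One common alignment of COM-B components with the 14 TDF domains
# (after Atkins et al., 2017); illustrative, not from this review.
COM_B_TO_TDF = {
    "Physical capability": ["Skills (physical)"],
    "Psychological capability": [
        "Knowledge",
        "Skills (cognitive and interpersonal)",
        "Memory, attention and decision processes",
        "Behavioural regulation",
    ],
    "Physical opportunity": ["Environmental context and resources"],
    "Social opportunity": ["Social influences"],
    "Reflective motivation": [
        "Social/professional role and identity",
        "Beliefs about capabilities",
        "Optimism",
        "Beliefs about consequences",
        "Intentions",
        "Goals",
    ],
    "Automatic motivation": ["Reinforcement", "Emotion"],
}
```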

The aim of this mixed-methods systematic review was to evaluate barriers and enablers to lifestyle change in people with infertility, from perspectives of people with infertility and health care professionals, using the COM-B model and TDF in order to inform optimal behavioural change strategies.

This review was prospectively registered in the Prospective Register of Systematic Reviews (PROSPERO) (CRD42022359424) and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement ( Page et al. , 2021 ) and the Enhancing transparency in reporting the synthesis of qualitative research (ENTREQ) statement ( Tong et al. , 2012 ).

Search strategy

A comprehensive search strategy was developed utilizing free-text words and subject headings. The following databases were searched from inception to 28 August 2023: Ovid MEDLINE(R), PsycINFO, EMBASE, and EBM Reviews. CINAHL Plus was searched from inception to 12 September 2022 and CINAHL Complete was searched from 12 September 2022 to 28 August 2023, due to an upgrade in our access. The search strategy reflected three key concepts: eligible populations (terms related to infertility), eligible interventions (terms related to lifestyle interventions, defined as diet, physical activity, and/or behavioural interventions), and eligible outcomes (terms related to barriers, enablers, experiences, and/or perceptions). Search terms within the same concept were combined using the ‘or’ Boolean operator, and the three concepts were combined using the ‘and’ Boolean operator. The full search strategy is available in Supplementary Table S1 . Additional methods of study identification included expert referral and handsearching of references of included studies and relevant systematic reviews.
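As a sketch of the Boolean structure described above, the snippet below combines terms within each concept with 'or' and the three concepts with 'and'. The terms are illustrative placeholders, not the review's actual strategy, which also used subject headings and is given in full in Supplementary Table S1:

```python
# Illustrative concept blocks only; not the review's full strategy.
population = ["infertility", "subfertility", "assisted reproduction"]
intervention = ["diet", "physical activity", "lifestyle intervention"]
outcome = ["barrier", "enabler", "perception", "experience"]

def or_block(terms):
    """Join terms within one concept with OR."""
    return "(" + " OR ".join(terms) + ")"

# Concepts are combined with AND across blocks.
query = " AND ".join(or_block(block) for block in (population, intervention, outcome))
print(query)
# (infertility OR subfertility OR ...) AND (diet OR ...) AND (barrier OR ...)
```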

Study selection

Details of inclusion and exclusion criteria are available in Supplementary Table S2 . Studies were eligible if they reported on barriers, enablers, experiences, and/or perceptions in relation to lifestyle change in people with infertility from the perspectives of people with infertility and/or health professionals involved in infertility management. Qualitative, quantitative, and mixed-methods primary studies were eligible in any language. Narrative reviews, letters, editorials, commentaries, and systematic reviews were ineligible. Conference abstracts were only included if raw data were available from study authors. For studies published as conference abstracts only, or with insufficient information to determine eligibility, study authors were contacted via email once to request further information. If sufficient information to determine eligibility remained unavailable, the study was not included and categorized as ‘Awaiting Classification’. All screening was performed by two independent reviewers, distributed among seven reviewers for title and abstract screening (D.B., K.L., L.M., L.Z., S.C., S.T., and W.X.) and four reviewers for full-text screening (D.B., S.C., S.T., and W.X.). Eligibility assessment was performed using Covidence. Conflicts were resolved by discussion until consensus was achieved.

Quality assessment

For qualitative studies, the Critical Appraisal Skills Programme (CASP) Qualitative Checklist was used for quality assessment ( Critical Appraisal Skills Programme, 2018 ). For quantitative studies, the Centre for Evidence-Based Management (CEBMa) critical appraisal of a survey tool was used ( Center for Evidence Based Management, 2014 ), regardless of study design. The use of this tool for all quantitative studies allowed for comparison between studies and was deemed suitable because our review did not aim to examine intervention effects. Both tools prompt evaluation of internal and external validity, and facilitate appraisal of methodological quality and transparency of reporting. For all studies, quality assessment was performed by two independent reviewers (S.T. for all studies and L.Z., K.L., or W.X. as the second reviewer). Discrepancies were resolved by discussion or referral to a third reviewer (L.M.). Studies were assessed as low quality if they met <65% of the criteria, moderate quality if they met 65–90%, and high quality if they met >90%. Although CASP and CEBMa do not assign a scoring system to these tools, our scoring system is similar to that of prior systematic reviews that used them ( Goossens et al. , 2018 ; Njau et al. , 2019 ).
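A minimal sketch of the quality banding described above; the thresholds come from the text, while the criteria counts passed in are hypothetical:

```python
def quality_band(criteria_met: int, criteria_total: int) -> str:
    """Band a study by the percentage of checklist criteria it meets."""
    pct = 100 * criteria_met / criteria_total
    if pct < 65:
        return "low"
    if pct <= 90:
        return "moderate"
    return "high"

print(quality_band(6, 10))   # 60%  -> "low"
print(quality_band(8, 10))   # 80%  -> "moderate"
print(quality_band(10, 10))  # 100% -> "high"
```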

Data extraction

Data were extracted from all included English language studies by one reviewer (S.T.) with verification by two reviewers (L.M. and S.C.). For the two included studies published in Chinese, data were extracted by two reviewers (K.L. and S.T.). Data extracted included author, year of publication, country, setting, study type, data collection method, aims, sample size, inclusion criteria, exclusion criteria, response rate and participant demographic characteristics.

Data analysis and synthesis

Data analyses were conducted using the convergent integrated approach to mixed-methods systematic reviews ( Stern et al. , 2020 ), by which quantitative data that answered the review question were first converted to textual descriptions and then analysed along with qualitative data. This approach to developing a preliminary synthesis of data is used to explore relationships across and between different studies with different methodology ( Popay et al. , 2006 ; Stern et al. , 2020 ). The results section of qualitative studies and textualized data from quantitative and mixed-methods studies were imported into NVivo 20.3 and thematically synthesized according to previously published methods for systematic reviews of qualitative research ( Thomas and Harden, 2008 ). Codes were inductively generated, organized, and refined until subthemes and themes emerged through an iterative process. Perspectives of people with infertility and health professionals were analysed and presented separately.

Themes were deductively mapped and tabulated to the COM-B and TDF and intervention functions and policy categories as guided by the BCW ( Michie et al. , 2011 ; Atkins et al. , 2017 ). Three reviewers (S.T., S.C., and L.M.) worked collaboratively in data analysis and mapping of themes. The identified themes were discussed and agreed upon by all reviewers. Any discrepancies over mapping of themes to domains were resolved by discussion. All three reviewers were dietitians and two were experienced qualitative researchers (S.C. and L.M.). Suggested intervention components were developed in accordance with corresponding intervention functions and policy categories.

The study selection process is illustrated in Fig. 1 . The database searches returned 15 986 records; after the removal of 3660 duplicates, 12 326 titles and abstracts and 99 full-texts were screened. Finally, 27 studies (28 reports) were included (25 studies published in English and 2 studies in Chinese). Two studies were published as conference abstracts ( Simon et al. , 2013 ; Uraz et al. , 2019 ) and unpublished data provided by study authors were included. Exclusion reasons and citations for studies excluded based on full-text are presented in Supplementary Table S3 .

Figure 1: Flowchart of study selection.

Study characteristics

Characteristics of included studies are presented in Table 1 , and detailed information on demographic characteristics of participants are presented in Supplementary Tables S4 and S5 . Of the included studies, 12 were quantitative, 6 were qualitative, and 9 were mixed-methods. There were 22 studies that reported on perspectives of women with infertility (n = 2524) and 11 studies that reported on perspectives of men with infertility (n = 1407), while 6 studies reported on perspectives of health professionals (n = 372), of whom the majority were medical doctors (n = 224, 60%) or nurses/midwives (n = 115, 31%).

Characteristics of included studies.

| Population | Country | Study type | Data collection methods | Sample (n) |
|---|---|---|---|---|
| People with infertility only | Belgium | Mixed-methods | | Women (14) and men (3) with infertility undergoing IVF |
| | USA | Quantitative | Surveys | Women (150) with infertility attending a fertility clinic |
| | USA | Qualitative | Interviews | Women (7) with infertility with current or previous IVF use |
| | Belgium and The Netherlands | Qualitative | In-depth interviews | Heterosexual couples (10) and women (2) with infertility who were attempting natural conception after a fertility work-up |
| | India | Quantitative | Surveys | Women (332) with infertility who have not used ART |
| | Australia | Mixed-methods | Structured interviews | Heterosexual couples (10) attending a fertility clinic |
| | Australia | Quantitative | Surveys | Heterosexual couples (23) attending a fertility clinic |
| | UK | Quantitative | Surveys | Men (1149) attending a hospital andrology department for infertility investigation |
| | The Netherlands | Quantitative | Surveys | Women (289) with infertility and BMI ≥29 kg/m² who were randomized to the intervention group of the LIFEstyle RCT |
| | USA | Mixed-methods | | Women (5) with infertility who had completed cancer treatment at least 6 months before the study and had resumed menstruation |
| | Iran | Qualitative | Semi-structured interviews | Heterosexual couples (25) with primary infertility undergoing ART |
| | China | Quantitative | Surveys | Women (573) with infertility attending their first appointment at a hospital fertility clinic |
| | China | Quantitative | Surveys | Men (120) with male-factor infertility |
| | Canada | Mixed-methods | | Women (49) scheduled to begin IVF at a public fertility clinic |
| | UK | Qualitative | In-depth interviews | Heterosexual couples (25) attending a hospital fertility clinic for the first time |
| | Canada | Qualitative | Dyadic interviews | Heterosexual couples (32) seeking medical treatment for infertility |
| | USA | Qualitative | In-depth interviews | Women (40) with infertility and BMI ≥35 kg/m² seeking fertility treatment |
| | USA | Quantitative | Surveys | Women (148) with infertility attending a hospital fertility centre |
| | USA | Quantitative | Surveys | Women (9) with infertility and BMI ≥27 kg/m² attending a fertility clinic |
| | Canada | Mixed-methods | Surveys | Women (13) and men (6) with infertility attending a fertility clinic |
| | Canada | Quantitative | Surveys | Women (400) with infertility presenting to a university fertility clinic for the first time |
| People with infertility and health professionals | Iran | Quantitative | Surveys | Women (220) with infertility who had undergone at least 3 months of fertility treatment, nutritionists (10) and infertility specialists (5) |
| | The Netherlands | Mixed-methods | Surveys | Women (130) on the waiting list for ART and nurses (7) working in a preconception clinic for these women |
| | The Netherlands | Mixed-methods | | Women (18) and men (4) with an indication for ART + health professionals (9) (gynaecologist (1), midwife (1), general practitioner (1), fertility doctor (1), preventive health care centre physician (1), dietitian (1), medical advisor from a health insurance company (1), representative of the municipality of Rotterdam (1), and representative of the Dutch association of parent and patient organizations (1)) |
| Health professionals only | Belgium | Mixed-methods | Surveys | Health professionals (111) (nurses/midwives (50), gynaecologists (42), embryologists (13), psychologists (4), scientists (2)) who interact with patients and work in registered Belgian fertility clinics |
| | Australia | Mixed-methods | Surveys | Nurses (53) who manage IVF clinics |
| | USA | Quantitative | Surveys | Obstetrics and gynaecology residents (177) |

NR, not reported.

Sample characteristics are reported for the subset of each study sample that provided data relevant to our review question.

Details of quality assessment are shown in Supplementary Tables S6 and S7 . Overall, study quality was low to moderate. Of the 15 qualitative and mixed-methods studies, all had a clear statement of aims and an appropriate qualitative methodology, but none adequately considered the relationship between the researcher and participants. Of the 21 quantitative and mixed-methods studies, all had a clear research question and an appropriate study design, but fewer than half used valid and reliable surveys (8/21), recruited a representative sample (6/21), or based their sample size on power calculations (4/21).
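
For readers reproducing this style of appraisal summary, the per-criterion fractions reported above are simple tallies over a study-by-criterion checklist. The following is a minimal Python sketch; the study names and criterion labels are invented placeholders, not the actual CASP/CEBMa checklist items:

    # Tally how many studies met each appraisal criterion.
    # Study names and checklist values are invented for illustration.
    appraisals = {
        "Study A": {"valid_survey": True, "representative_sample": False, "power_calculation": False},
        "Study B": {"valid_survey": False, "representative_sample": True, "power_calculation": False},
        "Study C": {"valid_survey": True, "representative_sample": False, "power_calculation": True},
    }

    for criterion in ["valid_survey", "representative_sample", "power_calculation"]:
        met = sum(1 for checks in appraisals.values() if checks[criterion])
        print(f"{criterion}: {met}/{len(appraisals)} studies")  # e.g. "valid_survey: 2/3 studies"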

Themes on barriers and enablers to a healthy lifestyle during infertility, from the perspectives of people with infertility and health professionals

Thematic analysis identified 17 themes for both patients and health professionals, mapped to 5 COM-B components and 12 TDF domains ( Table 2 ). Detailed information on themes, subthemes, supporting examples, and data sources (the relative contribution of quantitative and qualitative data) is provided in Supplementary Tables S8 and S9 .
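
Conceptually, the synthesis treats each theme as a record carrying its population, COM-B component, TDF domain, and barrier/enabler flags, which is what allows Table 2 to be grouped by COM-B component. A minimal Python sketch of that structure follows; the class and the example TDF domain assignments are our own illustration, and the authoritative mappings are those in Supplementary Tables S8 and S9:

    from dataclasses import dataclass

    @dataclass
    class Theme:
        name: str
        population: str   # "people with infertility" or "health professionals"
        com_b: str        # COM-B component, e.g. "Opportunity (Physical)"
        tdf_domain: str   # TDF domain; the values below are plausible illustrations
        barrier: bool
        enabler: bool

    themes = [
        Theme("Limited time, resources, and money", "people with infertility",
              "Opportunity (Physical)", "Environmental context and resources", True, False),
        Theme("Support from partner, peers, and health professionals", "people with infertility",
              "Opportunity (Social)", "Social influences", True, True),
    ]

    # Group theme names by COM-B component, as in Table 2.
    by_component = {}
    for t in themes:
        by_component.setdefault(t.com_b, []).append(t.name)
    print(by_component)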

Themes on barriers and enablers to a healthy lifestyle in people with infertility, from the perspectives of people with infertility and health professionals.

COM-B | People with infertility | Health professionals
Capability (Psychological) | Varying levels of understanding of the relationship between lifestyle and fertility (B, E) | Health professionals’ knowledge, skills and training (B, E)
 | Strategies for behaviour change (B, E) | Health professional identification of appropriate content and recipients for lifestyle education (B, E)
Opportunity (Physical) | Limited time, resources, and money (B) |
 | Mode of delivery (B, E) |
 | Reshaping the food environment (B, E) |
 | Unmet needs from health professional-led lifestyle advice (B) |
Opportunity (Social) | Support from partner, peers, and health professionals (B, E) |
 | Catering for others (B) |
Motivation (Reflective) | Understanding the importance and consequences of lifestyle change (B, E) |
 | Confidence in skills required to follow a healthy lifestyle (B, E) | Belief that patients do not want health professional-led lifestyle education (B)
 | Considering the stage of change (B) | Professional responsibility (B, E)
Motivation (Automatic) | Interplay between lifestyle and emotional state (B, E) | Providing lifestyle intervention is rewarding (E)

B, barriers; COM-B, Capability, Opportunity, Motivation and Behaviour; E, enablers.

Quantitative data contributed to this theme.

Qualitative data contributed to this theme.

Themes relating to capability

Varying levels of understanding of the relationship between lifestyle and fertility

The majority of people with infertility were aware that unhealthy diet ( Homan and Norman, 2009 ), physical inactivity ( Jayasena et al. , 2020 ; Gundimi et al. , 2022 ), and having a BMI outside the healthy weight range ( Homan and Norman, 2009 ) were risk factors for infertility. They were also aware of the adverse effects of having an elevated BMI on fertility outcomes ( Cardozo et al. , 2012 ; Jayasena et al. , 2020 ; Gundimi et al. , 2022 ), and could self-identify when their BMI was above the healthy range ( Homan and Norman, 2009 ; Cardozo et al. , 2012 ). Conversely, people at or below a healthy weight experienced weight misperception, overestimating their BMI category ( Homan and Norman, 2009 ; Cardozo et al. , 2012 ). Level of education was also a barrier to understanding the role of lifestyle in fertility management; those with a lower level of education struggled to identify the relationship between high BMI and decreased fertility ( Cardozo et al. , 2012 ; Jayasena et al. , 2020 ). Overall, people were interested in learning more about the role of diet and physical activity in the fertility journey ( Vause et al. , 2009 ; Uraz et al. , 2019 ; Jayasena et al. , 2020 ; Boedt et al. , 2021 ; Dreischor et al. , 2022 ; Latifi et al. , 2022 ). Topics of interest included foods to improve fertility ( Boedt et al. , 2021 ; Latifi et al. , 2022 ) and contraindications to physical activity during fertility treatment ( Boedt et al. , 2021 ; Latifi et al. , 2022 ).
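
Several of these findings hinge on participants placing themselves in the correct BMI category, so it is worth recalling the underlying arithmetic: BMI = weight (kg) / height (m)². A minimal Python sketch using the standard WHO adult cut-offs (the function and example values are ours, not data from the included studies):

    def bmi_category(weight_kg: float, height_m: float) -> str:
        """Classify adult BMI using the standard WHO cut-offs."""
        bmi = weight_kg / height_m ** 2
        if bmi < 18.5:
            return "underweight"
        if bmi < 25.0:
            return "healthy weight"
        if bmi < 30.0:
            return "overweight"
        return "obese"

    # 95 kg at 1.65 m gives a BMI of ~34.9 ("obese"), just under the
    # >=35 kg/m2 eligibility threshold used by some included studies.
    print(bmi_category(95, 1.65))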

Health professionals’ knowledge, skills, and training

Varying levels of confidence among health professionals in supporting patients to improve their lifestyle were seen as a major barrier. While nurses felt they had insufficient skills to facilitate meaningful change in patients with low motivation ( Ockhuijsen et al. , 2012 ), their confidence improved following adequate training ( Ockhuijsen et al. , 2012 ). Health professionals’ awareness of lifestyle risk factors for infertility was an important enabler, with almost all obstetrics and gynaecology residents correctly identifying underweight and obesity as infertility risk factors ( Roberts et al. , 2020 ).

Strategies for behaviour change

People with infertility identified difficulty in breaking unhealthy habits as a barrier to making lifestyle changes ( Homan et al. , 2012 ). Strategies for behaviour change such as self-monitoring (e.g. via calorie-counting applications) and action planning (e.g. by documenting plans to change habits) supported people with infertility in adhering to a healthy lifestyle ( Van Dijk et al. , 2017 ; Sacha et al. , 2018 ; Uraz et al. , 2019 ).

Health professional identification of appropriate content and recipients for lifestyle education

Barriers to the provision of optimal lifestyle management included differing perceptions among nurses regarding which patients may benefit from lifestyle interventions ( Homan et al. , 2018 ) and an emphasis on weight, rather than diet and exercise, in fertility management ( Boedt et al. , 2023 ). Nurses considered lifestyle counselling important for people with multiple lifestyle risk factors, and lower-intensity interventions (e.g. written information or telephone availability) sufficient in other cases ( Ockhuijsen et al. , 2012 ), highlighting the potential for provision of targeted care. Providing lifestyle education as routine practice could enable nurses to establish consistent routines for care ( Homan et al. , 2018 ).

Themes relating to opportunity

Reshaping the food environment

Women with infertility described barriers to following a healthy diet, including external food cues, such as the sight and smell of food increasing their susceptibility to eating unhealthy foods or overeating ( Karsten et al. , 2019 ), and the external food environment, such as limited availability of healthy foods and hesitancy to visit grocery stores during the COVID-19 pandemic ( Claydon et al. , 2022 ). Women adapted during the pandemic by using grocery delivery services and bulk-buying foods to reduce the frequency of their grocery store visits ( Claydon et al. , 2022 ).

Limited time, resources, and money

People with infertility cited time constraints ( Homan and Norman, 2009 ; Homan et al. , 2012 ; Simon et al. , 2013 ; Uraz et al. , 2019 ; Gundimi et al. , 2022 ), limited finances ( Homan and Norman, 2009 ), and lack of exercise facilities ( Gundimi et al. , 2022 ) as barriers to a healthy lifestyle. Similarly, health professionals working in fertility clinics cited time constraints and lack of resource allocation from health organizations as reasons for not providing lifestyle modification programmes ( Homan et al. , 2018 ; Boedt et al. , 2023 ). Some nurses also doubted the cost-effectiveness of providing lifestyle modification programmes ( Homan et al. , 2018 ).

Unmet needs from health professional-led lifestyle advice

People with infertility did not receive health professional-led lifestyle advice that met their needs. They felt nutrition services were not adequately integrated into the health system ( Klobodu et al. , 2023 ), and wanted fertility clinics to improve access to nutrition professionals ( Latifi et al. , 2022 ). Women felt that lifestyle was not discussed early enough in treatment ( Ockhuijsen et al. , 2012 ; Riggan et al. , 2023 ), particularly where their BMI affected treatment eligibility ( Riggan et al. , 2023 ). For those who did receive lifestyle advice, recommendations were seen as irrelevant, impractical, and lacking in detail ( Porter and Bhattacharya, 2008 ; Ockhuijsen et al. , 2012 ; Klobodu et al. , 2023 ). Consequently, patients often sought information from alternative avenues, such as the internet and print media, which they considered to be helpful ( Porter and Bhattacharya, 2008 ).

Mode of delivery

People with infertility expressed that feelings of anxiety triggered by attending fertility clinics ( Uraz et al. , 2019 ) and logistical challenges ( Uraz et al. , 2019 ; Boedt et al. , 2021 ) are barriers to engaging in clinic-led lifestyle programmes. Both people with infertility and health professionals felt the use of digital health would support adoption of healthy lifestyles ( Van Dijk et al. , 2017 ; Sacha et al. , 2018 ; Uraz et al. , 2019 ). While men considered face-to-face lifestyle consultations to be fully replaceable by mobile health, women preferred a hybrid model combining face-to-face consultations with mobile health ( Van Dijk et al. , 2017 ). Both women with infertility and nurses emphasized the value of an online, self-administered, patient-facing lifestyle risk screening tool, which would enable nurses to efficiently address risk factors in a personalized way ( Ockhuijsen et al. , 2012 ) and streamline low-risk patients to less intensive telehealth consultations ( Ockhuijsen et al. , 2012 ). Educational handouts for patients (both physical leaflets and multimedia) and manuals/guides for health professionals were also viewed as enablers to adoption and delivery of lifestyle recommendations ( Ockhuijsen et al. , 2012 ; Jayasena et al. , 2020 ). People with infertility also appreciated lifestyle advice that provided practical strategies for scalable changes delivered over time, ideally involving six sessions ( Ockhuijsen et al. , 2012 ; Van Dijk et al. , 2017 ; Uraz et al. , 2019 ).
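
The screening-and-triage workflow described here is straightforward to sketch in code. The rule below is a hypothetical illustration of the idea only; the risk factors, scoring, and routing labels are our placeholders, as the cited studies did not publish an algorithm:

    # Hypothetical triage for a self-administered lifestyle risk screener.
    RISK_FACTORS = ["bmi_outside_healthy_range", "poor_diet_quality",
                    "physically_inactive", "smokes", "high_alcohol_intake"]

    def triage(answers: dict) -> str:
        score = sum(bool(answers.get(factor)) for factor in RISK_FACTORS)
        if score == 0:
            return "written information only"
        if score == 1:
            return "telehealth lifestyle consultation"  # less intensive
        return "face-to-face consultation with nurse or dietitian"

    print(triage({"physically_inactive": True}))  # -> telehealth lifestyle consultation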

Support from partner, peers, and health professionals

While people with infertility were apprehensive about lifestyle interventions delivered in large groups ( Oron et al. , 2015 ; Uraz et al. , 2019 ) and disliked it when group members raised questions irrelevant to their own needs ( Uraz et al. , 2019 ), they felt more comfortable with group dynamics over time ( Uraz et al. , 2019 ). Women also acknowledged the benefits of peer support, with group interventions facilitating feedback, accountability, and a sense of connection ( Simon et al. , 2013 ; Oron et al. , 2015 ; Uraz et al. , 2019 ). Patients also considered support from their partner ( Simon et al. , 2013 ; Uraz et al. , 2019 ) and from health professionals to be an important enabler of lifestyle change ( Simon et al. , 2013 ; Uraz et al. , 2019 ; Boedt et al. , 2021 ; Riggan et al. , 2023 ), particularly when health professionals provided an empathetic approach to care ( Ockhuijsen et al. , 2012 ; Riggan et al. , 2023 ).

Catering for others

Women with infertility considered picky eaters in the household and social obligations (e.g. entertaining) to be barriers to healthy eating at least some of the time ( Simon et al. , 2013 ).

Themes relating to motivation

Understanding the importance and consequences of lifestyle change

For people with infertility, misconceptions about the impact of lifestyle change on fertility stemmed from cultural and societal influences. These included adapting their diet to align with cultural practices (e.g. the Chinese medicine principle of hot and cold body energies) ( Read et al. , 2014 ), beliefs that exercise was unnecessary or harmful while trying to conceive ( Porter and Bhattacharya, 2008 ; Gundimi et al. , 2022 ), and concerns that delaying fertility treatment for preconception weight loss would narrow the reproductive window ( Riggan et al. , 2023 ). Health professionals (nurses, nutritionists, and infertility specialists) were more likely to consider weight management important than diet and exercise ( Homan et al. , 2018 ; Langarizadeh et al. , 2022 ). Overall, people with infertility considered preconception lifestyle change important in influencing fertility outcomes ( Homan and Norman, 2009 ; Vause et al. , 2009 ; Simon et al. , 2013 ) and recognized benefits for overall health that extended beyond fertility ( Homan and Norman, 2009 ; Gundimi et al. , 2022 ).

Belief that patients do not want health professional-led lifestyle education

Health professionals (gynaecologists and fertility doctors) were concerned that patients are not interested in health professional-led lifestyle education ( Van Dijk et al. , 2017 ). Similarly, nurses cited previous poor uptake of lifestyle modification programmes as a reason why their clinics did not offer these programmes ( Homan et al. , 2018 ). In addition, nurses were concerned about appearing accusatory when discussing topics such as weight management ( Ockhuijsen et al. , 2012 ).

Confidence in skills required to follow a healthy lifestyle

Women with infertility reported varying levels of confidence in the skills required to follow a healthy lifestyle, with low confidence in their cooking skills emphasized as a key barrier ( Simon et al. , 2013 ). However, following a dietitian-led group intervention, their confidence in their ability to prepare healthy meals and make healthy shopping choices improved ( Simon et al. , 2013 ). Similarly, women who participated in a nurse-led lifestyle intervention described improved self-efficacy to manage their infertility ( Li et al. , 2017 ). Women also voiced that health professional supervision would help them to lose more weight in a shorter amount of time ( Sacha et al. , 2018 ) and to meet BMI eligibility criteria for fertility treatment ( Riggan et al. , 2023 ).

Considering the stage of change

For some individuals with infertility, competing demands such as their career made it difficult to treat lifestyle change as a priority ( Homan and Norman, 2009 ). In addition, the stage of change (as defined by the transtheoretical model ( Prochaska and Velicer, 1997 )) varied across lifestyle components: women with obesity and infertility had a higher level of readiness to change for weight loss than for physical activity ( Karsten et al. , 2019 ).
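
Because readiness can differ across behaviours, any assessment needs to record a stage per behaviour rather than a single stage per person. A minimal Python sketch of that structure, using the five classic transtheoretical stages (the patient data are invented for illustration):

    from enum import Enum

    class Stage(Enum):  # transtheoretical model (Prochaska and Velicer, 1997)
        PRECONTEMPLATION = 1
        CONTEMPLATION = 2
        PREPARATION = 3
        ACTION = 4
        MAINTENANCE = 5

    # Readiness is tracked per behaviour, not per person; values are invented.
    patient_readiness = {
        "weight loss": Stage.PREPARATION,
        "physical activity": Stage.CONTEMPLATION,
    }

    # Target advice to the behaviour the patient is most ready to change.
    most_ready = max(patient_readiness, key=lambda b: patient_readiness[b].value)
    print(most_ready)  # -> weight loss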

Professional responsibility

Some health professionals employed in fertility clinics believed lifestyle management was not the clinic’s responsibility, instead placing the responsibility on patients or on external health professionals ( Boedt et al. , 2023 ). However, the vast majority of fertility clinic staff considered that clinic doctors and nurses have a responsibility to promote a healthy lifestyle for patients seeking ART ( Homan et al. , 2018 ; Boedt et al. , 2023 ), with nurses being more likely than gynaecologists to adopt this perspective ( Boedt et al. , 2023 ). Reasons for fertility clinic staff viewing healthy lifestyle promotion as part of their professional role included its effect on treatment outcomes (including fertility, reduction of pregnancy complications, and offspring health), reduced health care costs for society, and patient education and empowerment ( Boedt et al. , 2023 ). Although nurses recognized the crucial role of their clinics in providing healthy lifestyle information, they emphasized the value of coordinating multidisciplinary care by providing referrals to allied health providers who could offer more tailored and structured programmes ( Homan et al. , 2018 ; Boedt et al. , 2023 ).

Interplay between lifestyle and emotional state

Emotional eating was a barrier to a healthy lifestyle in women with infertility ( Porter and Bhattacharya, 2008 ; Simon et al. , 2013 ; Karsten et al. , 2019 ). Distress experienced during the infertility journey was compounded by feelings of self-blame following unsuccessful attempts to reduce weight and make lifestyle changes ( Porter and Bhattacharya, 2008 ). Some people also experienced a lack of enjoyment from exercise ( Homan and Norman, 2009 ). In contrast, people who made positive lifestyle changes during their infertility journey described a feeling of empowerment from proactively taking action to improve their chances of successful conception ( Porter and Bhattacharya, 2008 ; Oron et al. , 2015 ). Although some people only viewed their lifestyle changes positively after conception ( Porter and Bhattacharya, 2008 ), others described how lifestyle changes such as yoga assisted with stress management ( Oron et al. , 2015 ; Dreischor et al. , 2022 ). Additionally, women who participated in lifestyle interventions that incorporated mental health strategies reported that this helped them to feel more at peace with the future regardless of the pregnancy outcome ( Oron et al. , 2015 ). Low-intensity forms of exercise, including yoga and walking, were viewed more favourably by women with infertility ( Gundimi et al. , 2022 ).

Providing lifestyle intervention is rewarding

An enabler to health professional provision of lifestyle intervention was that nurses enjoyed motivating and supporting patients with reaching their goals ( Ockhuijsen et al. , 2012 ).

Intervention functions, policy categories, BCTs, and components

Suggested components to include in lifestyle interventions for people with infertility are displayed in Supplementary Tables S10 and S11 , with selected examples illustrated in Fig. 2 . Overall, suggested intervention components emphasize the importance of shared decision-making, empowering patients with practical strategies, tailoring lifestyle education to individuals’ needs, and incorporating mental health strategies. Suggested intervention functions and policy categories (to deliver and enable implementation of the intervention functions) are displayed in Table 3 . There were 34 BCTs identified across the suggested intervention functions, encompassing 14 of the 16 BCT categories ( Table 4 ).
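
The statement that the 34 suggested BCTs span 14 of the 16 categories is a simple coverage count over the BCT Taxonomy v1 ( Michie et al. , 2013 ). A Python sketch of the bookkeeping follows; the technique-to-category assignments below are genuine BCTTv1 pairings but are chosen here as examples, not the review’s actual selection:

    # Each suggested BCT belongs to one of the 16 BCTTv1 categories.
    suggested_bcts = {
        "Goal setting (behavior)": "1. Goals and planning",
        "Action planning": "1. Goals and planning",
        "Self-monitoring of behavior": "2. Feedback and monitoring",
        "Social support (practical)": "3. Social support",
        "Information about health consequences": "5. Natural consequences",
    }

    categories_covered = set(suggested_bcts.values())
    print(f"{len(suggested_bcts)} BCTs covering {len(categories_covered)} of 16 categories")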

Suggested intervention components to include in lifestyle interventions for infertility. COM-B, Capability, Opportunity, Motivation and Behaviour; HP, health professional; SMS, short message service.


Intervention functions and policy categories corresponding to the suggested interventions to address barriers and enablers within each theme.

Column headings — intervention functions: Education, Persuasion, Incentivization, Training, Environmental restructuring, Modelling, Enablement; policy categories: Communication/marketing, Guidelines, Fiscal, Environmental/social planning, Service provision.

Row headings — themes for people with infertility: Varying levels of understanding of the relationship between lifestyle and fertility; Strategies for behaviour change; Reshaping the food environment; Limited time, resources, and money; Mode of delivery; Unmet needs from health professional-led lifestyle advice; Support from partner, peers, and health professionals; Catering for others; Understanding the importance and consequences of lifestyle change; Confidence in skills required to follow a healthy lifestyle; Considering the stage of change; Interplay between lifestyle and emotional state.

Row headings — themes for health professionals: Identification of appropriate content and recipients for lifestyle education; Health professionals’ knowledge, skills, and training; Limited time, resources, and money; Mode of delivery; Belief that patients do not want health professional-led lifestyle education; Understanding the importance and consequences of lifestyle change; Professional responsibility; Providing lifestyle intervention is rewarding.

Categories of behaviour change techniques suggested to address barriers and enablers within each theme.

Column headings — BCT categories: 1. Goals and planning; 2. Feedback and monitoring; 3. Social support; 4. Shaping knowledge; 5. Natural consequences; 6. Comparison of behaviour; 7. Associations; 8. Repetition and substitution; 9. Comparison of outcomes; 10. Reward and threat; 11. Regulation; 12. Antecedents; 13. Identity; 15. Self-belief.

Row headings — themes for people with infertility: Varying levels of understanding of the relationship between lifestyle and fertility; Strategies for behaviour change; Reshaping the food environment; Limited time, resources, and money; Mode of delivery; Unmet needs from health professional-led lifestyle advice; Support from partner, peers, and health professionals; Catering for others; Understanding the importance and consequences of lifestyle change; Confidence in skills required to follow a healthy lifestyle; Considering the stage of change; Interplay between lifestyle and emotional state.

Row headings — themes for health professionals: Identification of appropriate content and recipients for lifestyle education; Health professionals’ knowledge, skills, and training; Limited time, resources, and money; Mode of delivery; Belief that patients do not want health professional-led lifestyle education; Understanding the importance and consequences of lifestyle change; Professional responsibility; Providing lifestyle intervention is rewarding.

In this mixed-methods systematic review, we found that, from the perspectives of people with infertility, barriers to achieving a healthy lifestyle included emotional eating and time constraints, while enablers included social support and awareness of lifestyle risk factors. Provision of lifestyle advice by health professionals was influenced by barriers including time constraints and by enablers including the use of online resources to support intervention delivery.

Themes relating to capability demonstrated the crucial role that perceptions of lifestyle and self-management strategies play in the behaviours of people with infertility and health professionals. Consistent with previous literature in preconception women ( Kandel et al. , 2021 ), knowledge was both an enabler and a barrier, depending on patients’ awareness of the effects of lifestyle risk factors (BMI, diet, and physical activity) on fertility. However, no studies specifically assessed the awareness of people with infertility regarding the impacts of underweight or excessive physical activity on fertility. Nevertheless, improving risk perception by addressing knowledge gaps on the relationship between lifestyle and fertility is essential to improve engagement with lifestyle interventions ( Ferrer and Klein, 2015 ). It is additionally crucial for interventions to extend beyond information provision, and the intervention components identified here encourage the use of goal setting, action planning, and problem solving. Self-management strategies are essential to build the capability to engage in meaningful behaviour change: self-monitoring of diet, physical activity, and/or weight via digital health is associated with improved diet and physical activity, or with weight loss, in adults with overweight or obesity ( Teasdale et al. , 2018 ; Patel et al. , 2021 ; Laranjo et al. , 2021 ). Of note, any focus on weight loss should be considered in light of the higher prevalence of disordered eating and eating disorders among people with infertility compared to the general population ( Hecht et al. , 2022 ), given concerns that self-monitoring of dietary intake in the general population could contribute to guilt and disordered eating in susceptible individuals ( Orji et al. , 2018 ). We acknowledge the need for an individualized approach to ensure that care provision is both safe and effective, and we highlight shared decision-making between the health professional and patient in our suggested intervention components to ensure a tailored and appropriate approach.

Themes relating to opportunity highlighted limited availability of time and resources as major barriers from the perspectives of both people with infertility and health professionals. To overcome this, both groups emphasized the value of digital health in supporting efficient and targeted care. Dietary interventions delivered via telehealth are effective at improving diet quality and clinical outcomes in adults with chronic conditions ( Kelly et al. , 2016 ), and could address the logistical challenges and anxiety induced by attending fertility clinics for lifestyle intervention. Of note, the opportunity for lifestyle management of infertility can evolve over time: while advancements in telehealth enhanced enablers, COVID-19 pandemic-related disruptions in the food supply exacerbated barriers ( Claydon et al. , 2022 ). Themes relating to opportunity also highlighted the complex interpersonal dynamics experienced by people with infertility. Women with infertility cited partner support as an important enabler of a healthy lifestyle. Although conception in heterosexual couples requires the contribution of both partners, women with infertility disproportionately bear the burdens of fertility treatment ( Inhorn and Patrizio, 2015 ). We identified less data on the perspectives of men than on those of women, which may reflect men’s reluctance to discuss infertility with researchers ( Lloyd, 1996 ) as well as the overall tendency for infertility research to focus on women ( Culley et al. , 2013 ). However, men view themselves as an important source of support for their partners during the infertility journey, even in cases of male-factor infertility ( Bell, 2016 ). Hence, our suggested interventions recommend collaboration between partners in lifestyle modification. Additionally, we highlighted the potential benefits of lifestyle interventions delivered in a group setting, with participants expressing appreciation for the opportunity to connect with people in similar situations. This sense of connection is particularly valuable during the infertility journey, which is often accompanied by feelings of loneliness and isolation ( Hinton et al. , 2010 ; O’Connell et al. , 2021 ). Patients can be reluctant to engage with group interventions because of discomfort with group dynamics ( Oron et al. , 2015 ) or programme content that is not relevant to their personal goals ( Uraz et al. , 2019 ); to overcome this, we recommend limiting participant numbers and ensuring that intervention personnel are trained to manage group interactions and align the programme with the goals of all participants.

Themes relating to motivation highlighted varying opinions on the importance of different lifestyle factors and the complex interplay between lifestyle and the emotions experienced during the infertility journey. Health professionals often considered weight management more important than lifestyle behaviours such as diet and exercise, and were more likely to discuss weight in consultations. This prioritization may reflect the BMI cut-offs that, in many countries, affect eligibility for publicly funded fertility treatment and that have been criticized for the insufficient evidence supporting their benefit ( Legro, 2016 ). When lifestyle discussions focus on weight rather than on diet and physical activity, they may be perceived as stigmatizing and lacking in practical strategies for change ( Koball et al. , 2018 ). Furthermore, it is essential to consider the complex and intertwined relationship between the stress experienced during the infertility journey and difficulty managing weight. Women with infertility who experienced unsuccessful weight loss attempts described feelings of shame and blamed themselves for their inability to fall pregnant, exacerbating emotional eating ( Porter and Bhattacharya, 2008 ; Wong and Qian, 2016 ). Referral for evidence-based psychological interventions such as cognitive behavioural therapy may be indicated in these situations ( Smith et al. , 2023 ). Additionally, to break the cycle of shame related to weight, providers of lifestyle interventions could consider shifting the focus away from weight by informing people with infertility of the benefits of healthy lifestyle behaviours independent of weight ( Salvia et al. , 2023 ), and by encouraging consumers to reframe healthy lifestyle behaviours as broader self-care strategies.

Strengths and limitations

This review has several strengths. We used a comprehensive search strategy, and the convergent integrated approach allowed for data triangulation. Credibility of findings was increased by including the perspectives of two important groups of stakeholders: people with infertility and health professionals. The use of established frameworks, including the COM-B and TDF, additionally supports the structured development of interventions to target the identified enablers and barriers. We also acknowledge limitations. Conducting a mixed-methods systematic review required us to synthesize highly heterogeneous data. While we used published guidelines to inform our methodology and ensure appropriate integration and analysis of data, the disproportionately large number of quantitative studies may limit interpretation of the findings. Most of the included surveys were not validated and used closed-ended questions that could not cover all possible answers; this may have led participants to answer inauthentically or to misinterpret questions. Surveys also cannot capture the rich perspectives gained through qualitative inquiry. For example, while our findings indicated that people with infertility understood the relationship of elevated BMI and physical inactivity with infertility, this was informed by survey data that did not ask about awareness of the detrimental effects of a low BMI or of excessive physical activity. Similarly, while our findings indicated that women identified picky eaters in the household, social obligations, and a lack of skills to cook healthy meals as obstacles to adopting healthy behaviours, rating a list of predefined barriers does not provide adequate context to understand how or why these obstacles affect behaviour. Hence, this review has highlighted the incomplete nature of the literature, indicating a need for more qualitative research. Additionally, none of the qualitative studies included in our review adequately addressed the researcher–participant relationship, which may contribute to bias in the results. Furthermore, a limited number of studies reported on the perspectives of health professionals, particularly allied health professionals, with no studies reporting on the perspectives of physiotherapists or exercise physiologists. Finally, the majority of the research was published in higher-income countries, limiting the generalizability of our findings and our ability to evaluate geographical or temporal trends in enablers and barriers.

Implications for practice and research

Our systematic review identified several interacting factors that influence lifestyle behaviours in people with infertility. We suggest intervention components that may be considered by stakeholders involved in intervention development and delivery when optimizing lifestyle interventions for people with infertility. We identified gaps in the existing literature, highlighting the need for further qualitative research to enhance understanding of barriers and enablers to lifestyle change from the perspectives of both people seeking fertility services and health professionals across diverse professions, cultural backgrounds, education levels, sexual orientations, and socioeconomic positions; it is therefore likely that there are additional enablers and barriers to lifestyle change that were not captured in this review. Additionally, there is a need for research synthesizing barriers and enablers to other lifestyle modifications, such as cessation of alcohol and tobacco use, which were beyond the scope of this review. As additional evidence on the perspectives of people with infertility and health professionals becomes available, the suggested interventions can be further refined to suit the needs of the specific populations and settings to which they are applied.

Several interacting factors influence lifestyle in people with infertility, including knowledge, use of behaviour change strategies, time and resource availability, interpersonal interactions, self-efficacy, and the emotions experienced during the infertility journey. Health professionals similarly experience several factors influencing their delivery of lifestyle interventions, including knowledge, time and resource availability, and varying perspectives about the role of lifestyle intervention and about their own professional role in delivering it. By considering the unique perspectives of people with infertility and of the health professionals involved in their care, lifestyle interventions can be designed around the needs of those who deliver and receive them. As such, we recommend that the information provision component of interventions focus on lifestyle changes beyond weight loss and include information on the beneficial effects of lifestyle changes beyond improvements in fertility. We also recommend that interventions extend beyond information provision, by enhancing capability via support for practical behaviour change strategies, enhancing opportunity via involvement of the partner, and enhancing motivation by encouraging the reframing of lifestyle changes as self-care strategies. The evidence synthesized in our review can thereby support the development of successful lifestyle interventions for infertility.

Supplementary data are available at Human Reproduction Update online.

The data underlying this article will be shared on reasonable request to the corresponding author.

S.T., L.Z., K.L., D.B., W.X., L.M., and S.C. contributed to eligibility assessment. S.T. and K.L. contributed to data extraction. S.T., L.M., and S.C. contributed to data analysis. S.T., L.Z., K.L., W.X., and L.M. contributed to quality assessment. S.T. wrote the first draft of the manuscript. R.W., R.J.N., L.M., and S.C. provided supervision to the review process. L.M. and S.C. contributed as senior authors. All authors provided intellectual input. All authors meet the International Committee of Medical Journal Editors (ICMJE) criteria for authorship and approved the final version for publication.

No funding was obtained for this specific review. S.T. is supported by a Centre of Research Excellence Health in Preconception & Pregnancy (CRE HiPP) PhD Scholarship. R.W. is supported by a NHMRC Emerging Leadership Investigator Grant (2009767). L.M. is supported by a Veski Fellowship.

The authors have no conflicts of interest.

Alesi S , Habibi N , Silva TR , Cheung N , Torkel S , Tay CT , Quinteros A , Winter H , Teede H , Mousa A et al.  Assessing the influence of preconception diet on female fertility: a systematic scoping review of observational studies . Hum Reprod Update 2023 ; 29 : 811 – 828 .

Atkins L , Francis J , Islam R , O’Connor D , Patey A , Ivers N , Foy R , Duncan EM , Colquhoun H , Grimshaw JM et al.  A guide to using the Theoretical Domains Framework of behaviour change to investigate implementation problems . Implement Sci 2017 ; 12 : 77 .

Augood C , Duckitt K , Templeton AA. Smoking and female infertility: a systematic review and meta-analysis . Hum Reprod 1998 ; 13 : 1532 – 1539 .

Bell AV. ‘I don’t consider a cup performance; I consider it a test’: masculinity and the medicalisation of infertility . Sociol Health Illn 2016 ; 38 : 706 – 720 .

Boedt T , Dancet E , Speelman N , Spiessens C , Matthys C , Lie Fong S. Belgian fertility clinic staff value healthy lifestyle promotion but lack access to a structured lifestyle modification programme: an observational study . Gynecol Obstet Invest 2023 ; 88 : 278 – 285 .

Boedt T , Matthys C , Lie Fong S , De Neubourg D , Vereeck S , Seghers J , Van Der Gucht K , Weyn B , Geerts D , Spiessens C et al.  Systematic development of a mobile preconception lifestyle programme for couples undergoing IVF: the PreLiFe-programme . Hum Reprod 2021 ; 36 : 2493 – 2505 .

Boivin J , Bunting L , Koert E , Ieng UC , Verhaak C. Perceived challenges of working in a fertility clinic: a qualitative analysis of work stressors and difficulties working with patients . Hum Reprod 2017 ; 32 : 403 – 408 .

Cardozo ER , Neff LM , Brocks ME , Ekpo GE , Dune TJ , Barnes RB , Marsh EE. Infertility patients’ knowledge of the effects of obesity on reproductive health outcomes . Am J Obstet Gynecol 2012 ; 207 : 509.e1 – 509.e10 .

Center for Evidence Based Management . Critical Appraisal Checklist for Cross-Sectional Study . 2014 . Available from https://cebma.org/assets/Uploads/Critical-Appraisal-Questions-for-a-Cross-Sectional-Study-July-2014-1-v2.pdf (18 January 2023, date last accessed).

Chachamovich JR , Chachamovich E , Ezer H , Fleck MP , Knauth D , Passos EP. Investigating quality of life and health-related quality of life in infertility: a systematic review . J Psychosom Obstet Gynaecol 2010 ; 31 : 101 – 110 .

Chambers GM , Dyer S , Zegers-Hochschild F , de Mouzon J , Ishihara O , Banker M , Mansour R , Kupka MS , Adamson GD. International Committee for Monitoring Assisted Reproductive Technologies world report: assisted reproductive technology, 2014† . Hum Reprod 2021 ; 36 : 2921 – 2934 .

Claydon EA , Davidson DL , McCarty KM , Wang J. I had to go in a bubble: investigating the effects of COVID-19 on fertility treatments and nutrition . Womens Health Rep (New Rochelle) 2022 ; 3 : 617 – 623 .

Critical Appraisal Skills Programme . CASP Qualitative Checklist . 2018 . Available from https://casp-uk.net/checklists/casp-qualitative-studies-checklist-fillable.pdf (18 January 2023, date last accessed).

Culley L , Hudson N , Lohan M. Where are all the men? The marginalization of men in social scientific research on infertility . Reprod Biomed Online 2013 ; 27 : 225 – 235 .

Dreischor F , Laan ETM , Peeters F , Peeraer K , Lambalk CB , Goddijn M , Custers IM , Dancet EAF. The needs of subfertile couples continuing to attempt natural conception: In-depth interviews . Hum Reprod Open 2022 ; 2022 : hoac037 .

Ferrer RA , Klein WM. Risk perceptions and health behavior . Curr Opin Psychol 2015 ; 5 : 85 – 89 .

Gameiro S , Boivin J , Peronace L , Verhaak CM. Why do patients discontinue fertility treatment? A systematic review of reasons and predictors of discontinuation in fertility treatment . Hum Reprod Update 2012 ; 18 : 652 – 669 .

Goossens J , De Roose M , Van Hecke A , Goemaes R , Verhaeghe S , Beeckman D. Barriers and facilitators to the provision of preconception care by healthcare providers: a systematic review . Int J Nurs Stud 2018 ; 87 : 113 – 130 .

Greil AL , McQuillan J , Lowry M , Shreffler KM. Infertility treatment and fertility-specific distress: a longitudinal analysis of a population-based sample of U.S. women . Soc Sci Med 2011 ; 73 : 87 – 94 .

Gundimi S , Krishna Rao B , Bailey A , Kumar P , Nair V , Velingkar KR , Ramachandra P. Knowledge, attitudes, and practices toward exercises among women visiting an infertility clinic: a cross-sectional study . Health Care Women Int 2022 ; 44 : 473 – 486 .

Hecht LM , Hadwiger A , Patel S , Hecht BR , Loree A , Ahmedani BK , Miller-Matero LR. Disordered eating and eating disorders among women seeking fertility treatment: a systematic review . Arch Womens Ment Health 2022 ; 25 : 21 – 32 .

Hinton L , Kurinczuk JJ , Ziebland S. Infertility; isolation and the internet: a qualitative interview study . Patient Educ Couns 2010 ; 81 : 436 – 441 .

Homan G , deLacey S , Tremellen K. Promoting healthy lifestyle in fertility clinics; an Australian perspective . Hum Reprod Open 2018 ; 2018 : hox028 .

Homan G , Litt J , Norman R. The FAST study: Fertility ASsessment and advice Targeting lifestyle choices and behaviours: a pilot study . Hum Reprod 2012 ; 27 : 2396 – 2404 .

Homan G , Norman RJ. Couples perception regarding how lifestyle might affect fertility: results of a pilot study . Aust J Adv Nurs 2009 ; 26 : 77 – 86 .

Inhorn MC , Patrizio P. Infertility around the globe: new thinking on gender, reproductive technologies and global movements in the 21st century . Hum Reprod Update 2015 ; 21 : 411 – 426 .

Jayasena CN , Sharma A , Abbara A , Luo R , White CJ , Hoskin SG , Khanjani S , Crawford MJ , Ramsay JW , Minhas S et al.  Burdens and awareness of adverse self‐reported lifestyle factors in men with sub‐fertility: a cross‐sectional study in 1149 men . Clin Endocrinol (Oxf) 2020 ; 93 : 312 – 321 .

Kandel P , Lim S , Pirotta S , Skouteris H , Moran LJ , Hill B. Enablers and barriers to women’s lifestyle behavior change during the preconception period: a systematic review . Obes Rev 2021 ; 22 : e13235 .

Karsten MDA , Van Oers AM , Groen H , Mutsaerts MAQ , Van Poppel MNM , Geelen A , Van De Beek C , Painter RC , Mol BWJ , Roseboom TJ et al. ; LIFEstyle Study Group . Determinants of successful lifestyle change during a 6-month preconception lifestyle intervention in women with obesity and infertility . Eur J Nutr 2019 ; 58 : 2463 – 2475 .

Kellow NJ , Cerf JL , Horta F , Dordevic AL , Bennett CJ. The effect of dietary patterns on clinical pregnancy and live birth outcomes in men and women receiving assisted reproductive technologies: a systematic review and meta-analysis . Adv Nutr 2022 ; 13 : 857 – 874 .

Kelly JT , Reidlinger DP , Hoffmann TC , Campbell KL. Telehealth methods to deliver dietary interventions in adults with chronic disease: a systematic review and meta-analysis . Am J Clin Nutr 2016 ; 104 : 1693 – 1702 .

Klobodu C , Deutsch J , Vitolins MZ , Fisher K , Nasser JA , Stott D , Milliron BJ. Optimizing fertility treatment with nutrition guidance: exploring barriers and facilitators to healthful nutrition among female cancer survivors with fertility challenges . Integr Cancer Ther 2023 ; 22 : 15347354231191984 .

Koball AM , Mueller PS , Craner J , Clark MM , Nanda S , Kebede EB , Grothe KB. Crucial conversations about weight management with healthcare providers: patients’ perspectives and experiences . Eat Weight Disord 2018 ; 23 : 87 – 94 .

Langarizadeh M , Fatemi Aghda SA , Nadjarzadeh A. Design and evaluation of a mobile-based nutrition education application for infertile women in Iran . BMC Med Inform Decis Mak 2022 ; 22 : 58 .

Laranjo L , Ding D , Heleno B , Kocaballi B , Quiroz JC , Tong HL , Chahwan B , Neves AL , Gabarron E , Dao KP et al.  Do smartphone applications and activity trackers increase physical activity in adults? Systematic review, meta-analysis and metaregression . Br J Sports Med 2021 ; 55 : 422 – 432 .

Latifi M , Allahbakhshian L , Eini F , Karami NA , Al-Suqri MN. Health information needs of couples undergoing assisted reproductive techniques . Iran J Nurs Midwifery Res 2022 ; 27 : 522 – 530 .

Legro RS. Mr. Fertility Authority, tear down that weight wall! Hum Reprod 2016 ; 31 : 2662 – 2664 .

Li X-Q , Sun C-F , Guo M. Benefits of nursing care service in the assisted reproduction clinic to self-cycle-management and self-efficiency of infertility patients . Zhonghua Nan Ke Xue = National Journal of Andrology 2017 ; 23 : 536 – 539 .

Liu J , Wang JG , Xu HX , Sun Y , Peng L , Sun J , Xie S. Application of the Green Model to lifestyle intervention in male sterility patients . Zhonghua Nan Ke Xue 2020 ; 26 : 441 – 445 .

Lloyd M. Condemned to be meaningful: non-response in studies of men and infertility . Sociol Health Illn 1996 ; 18 : 433 – 454 .

Michie S , Richardson M , Johnston M , Abraham C , Francis J , Hardeman W , Eccles MP , Cane J , Wood CE. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions . Ann Behav Med 2013 ; 46 : 81 – 95 .

Michie S , Van Stralen MM , West R. The behaviour change wheel: a new method for characterising and designing behaviour change interventions . Implement Sci 2011 ; 6 : 42 .

Njau B , Covin C , Lisasi E , Damian D , Mushi D , Boulle A , Mathews C. A systematic review of qualitative evidence on factors enabling and deterring uptake of HIV self-testing in Africa . BMC Public Health 2019 ; 19 : 1289 .

O’Connell SBL , Gelgoot EN , Grunberg PH , Schinazi J , Da Costa D , Dennis C-L , Rosberger Z , Zelkowitz P. ‘I felt less alone knowing I could contribute to the forum’: psychological distress and use of an online infertility peer support forum . Health Psychol Behav Med 2021 ; 9 : 128 – 148 .

Ockhuijsen HDL , Gamel CJ , Van Den Hoogen A , Macklon NS. Integrating preconceptional care into an IVF programme . J Adv Nurs 2012 ; 68 : 1156 – 1165 .

Oostingh EC , Hall J , Koster MPH , Grace B , Jauniaux E , Steegers-Theunissen RPM. The impact of maternal lifestyle factors on periconception outcomes: a systematic review of observational studies . Reprod Biomed Online 2019 ; 38 : 77 – 94 .

Orji R , Lomotey R , Oyibo K , Orji F , Blustein J , Shahid S. Tracking feels oppressive and ‘punishy’: exploring the costs and benefits of self-monitoring for health and wellness . Digit Health 2018 ; 4 : 2055207618797554 .

Oron G , Allnutt E , Lackman T , Sokal-Arnon T , Holzer H , Takefman J. A prospective study using Hatha Yoga for stress reduction among women waiting for IVF treatment . Reprod Biomed Online 2015 ; 30 : 542 – 548 .

Page MJ , McKenzie JE , Bossuyt PM , Boutron I , Hoffmann TC , Mulrow CD , Shamseer L , Tetzlaff JM , Akl EA , Brennan SE et al.  The PRISMA 2020 statement: an updated guideline for reporting systematic reviews . BMJ 2021 ; 372 : n71 .

Patel ML , Wakayama LN , Bennett GG. Self‐monitoring via digital health in weight loss interventions: a systematic review among adults with overweight or obesity . Obesity (Silver Spring) 2021 ; 29 : 478 – 499 .

Popay J , Roberts H , Sowden A , Petticrew M , Arai L , Rodgers M , Britten N , Roen K , Duffy S. Guidance on the Conduct of Narrative Synthesis in Systematic Reviews. A Product from the ESRC Methods Programme Version , Vol. 1, 2006 , 1 – 92 . https://www.lancaster.ac.uk/media/lancaster-university/content-assets/documents/fhm/dhr/chir/NSsynthesisguidanceVersion1-April2006.pdf (16 January 2024, date last accessed).

Porter M , Bhattacharya S. Helping themselves to get pregnant: a qualitative longitudinal study on the information-seeking behaviour of infertile couples . Hum Reprod 2008 ; 23 : 567 – 572 .

Practice Committee of the American Society for Reproductive Medicine and the Practice Committee of the Society for Reproductive Endocrinology and Infertility . Optimizing natural fertility: a committee opinion . Fertil Steril 2022 ; 117 : 53 – 63 .

Prochaska JO , Velicer WF. The transtheoretical model of health behavior change . Am J Health Promot 1997 ; 12 : 38 – 48 .

Purewal S , Chapman SCE , Van Den Akker OBA. A systematic review and meta-analysis of lifestyle and body mass index predictors of successful assisted reproductive technologies . J Psychosom Obstet Gynaecol 2019 ; 40 : 2 – 18 .

Rao M , Zeng Z , Tang L. Maternal physical activity before IVF/ICSI cycles improves clinical pregnancy rate and live birth rate: a systematic review and meta-analysis . Reprod Biol Endocrinol 2018 ; 16 : 1 – 8 .

Read SC , Carrier M-E , Whitley R , Gold I , Tulandi T , Zelkowitz P. Complementary and alternative medicine use in infertility: cultural and religious influences in a multicultural Canadian setting . J Altern Complement Med 2014 ; 20 : 686 – 692 .

Riggan KA , Rousseau AC , Dsouza KN , Woodward KT , Lue J , Phelan SM , Allyse MA , Shenoy CC. Patient perceptions of body mass index restrictions limiting fertility care for women with high body mass index . Reprod Biomed Online 2023 ; 47 : 103210 .

Roberts LM , Kudesia R , Zhao H , Dolan S , Rose M. A cross-sectional survey of fertility knowledge in obstetrics and gynecology residents . Fertil Res Pract 2020 ; 6 : 22 .

Romualdi D , Ata B , Bhattacharya S , Bosch E , Costello M , Gersak K , Homburg R , Mincheva M , Norman RJ , Piltonen T et al. ; Guideline Group on Unexplained Infertility . Evidence-based guideline: unexplained infertility† . Hum Reprod 2023 ; 38 : 1881 – 1890 .

Sacha C , Page C , Goldman R , Ginsburg E , Zera C. Are women with obesity and infertility willing to attempt weight loss prior to fertility treatment? Obes Res Clin Pract 2018 ; 12 : 125 – 128 .

Salvia MG , Ritholz MD , Craigen KLE , Quatromoni PA. Women’s perceptions of weight stigma and experiences of weight-neutral treatment for binge eating disorder: a qualitative study . eClinicalMedicine 2023 ; 56 : 101811 .

Sanderman EA , Willis SK , Wise LA. Female dietary patterns and outcomes of in vitro fertilization (IVF): a systematic literature review . Nutr J 2022 ; 21 : 5 .

Sermondade N , Huberlant S , Bourhis-Lefebvre V , Arbo E , Gallot V , Colombani M , Fréour T. Female obesity is negatively associated with live birth rate following IVF: a systematic review and meta-analysis . Hum Reprod Update 2019 ; 25 : 439 – 451 .

Simon JD , Dennett C , Thyer A. Assessing the effectiveness of “food for fertility” program . Fertil Steril 2013 ; 100 : S81 .

Smith J , Ang XQ , Giles EL , Traviss-Turner G. Emotional eating interventions for adults living with overweight or obesity: a systematic review and meta-analysis . Int J Environ Res Public Health 2023 ; 20 : 2722 .

Smith JF , Eisenberg ML , Millstein SG , Nachtigall RD , Shindel AW , Wing H , Cedars M , Pasch L , Katz PP ; Infertility Outcomes Program Project Group . The use of complementary and alternative fertility treatment in couples seeking fertility care: data from a prospective cohort in the United States . Fertil Steril 2010 ; 93 : 2169 – 2174 .

Stern C , Lizarondo L , Carrier J , Godfrey C , Rieger K , Salmond S , Apóstolo J , Kirkpatrick P , Loveday H. Methodological guidance for the conduct of mixed methods systematic reviews . JBI Evid Synth 2020 ; 18 : 2108 – 2118 .

Teasdale N , Elhussein A , Butcher F , Piernas C , Cowburn G , Hartmann-Boyce J , Saksena R , Scarborough P. Systematic review and meta-analysis of remotely delivered interventions using self-monitoring or tailored feedback to change dietary behavior . Am J Clin Nutr 2018 ; 107 : 247 – 256 .

Teede HJ , Tay CT , Laven J , Dokras A , Moran LJ , Piltonen TT , Costello MF , Boivin J , Redman LM , Boyle JA et al. ; International PCOS Network . Recommendations from the 2023 International Evidence-based Guideline for the Assessment and Management of Polycystic Ovary Syndrome† . Hum Reprod 2023 ; 38 : 1655 – 1679 .

Thomas J , Harden A. Methods for the thematic synthesis of qualitative research in systematic reviews . BMC Med Res Methodol 2008 ; 8 : 45 .

Tong A , Flemming K , McInnes E , Oliver S , Craig J. Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ . BMC Med Res Methodol 2012 ; 12 : 181 .

Uraz Z , Cooley K , Vu A , Sanders L , Hannam T , Raina P. Participant experiences with an evidence-based diet and lifestyle group educational and support program: a feasibility study . Hum Reprod 2019 ; 34(Suppl 1) : 368 .

Van Dijk MR , Koster MP , Rosman AN , Steegers-Theunissen RP. Opportunities of mHealth in preconception care: preferences and experiences of patients and health care providers and other involved professionals . JMIR Mhealth Uhealth 2017 ; 5 : e123 .

Vause TD , Jones L , Evans M , Wilkie V , Leader A. Pre-conception health awareness in infertility patients . J Obstet Gynaecol Can 2009 ; 31 : 717 – 720 .

Vincent C , Furnham A. Why do patients turn to complementary medicine? An empirical study . Br J Clin Psychol 1996 ; 35 : 37 – 48 .

Winter HG , Rolnik DL , Mol BWJ , Torkel S , Alesi S , Mousa A , Habibi N , Silva TR , Oi Cheung T , Thien Tay C et al.  Can dietary patterns impact fertility outcomes? A systematic review and meta-analysis . Nutrients 2023 ; 15 : 2589 .

Wong M , Qian M. The role of shame in emotional eating . Eat Behav 2016 ; 23 : 41 – 47 .

Wu AK , Elliott P , Katz PP , Smith JF. Time costs of fertility care: the hidden hardship of building a family . Fertil Steril 2013 ; 99 : 2025 – 2030 .

Zegers-Hochschild F , Adamson GD , Dyer S , Racowsky C , De Mouzon J , Sokol R , Rienzi L , Sunde A , Schmidt L , Cooke ID et al.  The international glossary on infertility and fertility care, 2017 . Fertil Steril 2017 ; 108 : 393 – 406 .


Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Abstract

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly [ 3 ], the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the exponential increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortia of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses

  • Cochrane (formerly Cochrane Collaboration)
  • JBI (formerly Joanna Briggs Institute)
  • National Institute for Health and Care Excellence (NICE), United Kingdom
  • Scottish Intercollegiate Guidelines Network (SIGN), Scotland
  • Agency for Healthcare Research and Quality (AHRQ), United States

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19 [ 2 , 42 ], but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.
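
Cochrane’s classifiers themselves are not reproduced here, but the underlying idea is straightforward supervised text classification. The following is a minimal sketch of that idea using scikit-learn; the labeled abstracts, model choice, and threshold are invented for illustration and do not represent Cochrane’s actual tool or training data.

```python
# Minimal sketch of a machine-learning screen for RCTs: TF-IDF features
# plus logistic regression, a common baseline for citation screening.
# Illustrative only -- not Cochrane's tool; the corpus is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

abstracts = [
    "Participants were randomly allocated to intervention or placebo...",
    "We retrospectively reviewed charts of 120 consecutive patients...",
    "A double-blind randomized trial of probiotic supplementation...",
    "This cross-sectional survey assessed dietary patterns...",
]
is_rct = [1, 0, 1, 0]  # 1 = RCT, 0 = not an RCT

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(abstracts, is_rct)

# In practice a recall-oriented probability threshold would be chosen so
# that few true RCTs are missed; a human still reviews the output.
new_abstract = ["Patients were randomized 1:1 to receive the diet program..."]
print(model.predict_proba(new_abstract)[0][1])  # estimated P(RCT)
```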

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2A.

Types of traditional systematic reviews

  • Intervention. Topic assessed: benefits and harms of interventions used in healthcare. Mnemonic: Population, Intervention, Comparator, Outcome (PICO).
  • Diagnostic test accuracy. Topic assessed: how well a diagnostic test performs in diagnosing and detecting a particular disease. Mnemonic: Population, Index test(s), and Target condition (PIT).
  • Qualitative (Cochrane). Topic assessed: questions designed to improve understanding of intervention complexity, contextual variations, implementation, and stakeholder preferences and experiences. Mnemonics: Setting, Perspective, Intervention or Phenomenon of Interest, Comparison, Evaluation (SPICE); Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER); Perspective, Setting, Phenomena of interest/Problem, Environment, Comparison (optional), Time/timing, Findings (PerSPEcTiF).
  • Qualitative (JBI). Topic assessed: questions that inform meaningfulness and appropriateness of care and the impact of illness through documentation of stakeholder experiences, preferences, and priorities. Mnemonic: Population, the Phenomena of Interest, and the Context (PICo).
  • Prognostic. Topic assessed: probable course or future outcome(s) of people with a health problem. Mnemonic: Population, Intervention (model), Comparator, Outcomes, Timing, Setting (PICOTS).
  • Etiology and risk. Topic assessed: the relationship (association) between certain factors (eg, genetic, environmental) and the development of a disease or condition or other health outcome. Mnemonic: Population or groups at risk, Exposure(s), associated Outcome(s) (disease, symptom, or health condition of interest), plus the context/location or the time period and the length of time when relevant (PEO).
  • Measurement properties. Topic assessed: what is the most suitable instrument to measure a construct of interest in a specific study population? Mnemonic: Population, Instrument, Construct, Outcomes (PICO).
  • Prevalence and incidence. Topic assessed: the frequency, distribution, and determinants of specific factors, health states, or conditions in a defined population (eg, how common is a particular disease or condition in a specific group of individuals?). Mnemonic: Condition of interest, the epidemiological indicator used to measure its frequency (prevalence, incidence), and the Population or groups at risk, as well as the Context/location and time period where relevant (CoCoPop).

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI

Cochrane a (total = 8900):
  • Intervention: 8572 (96.3%)
  • Diagnostic: 176 (1.9%)
  • Overview: 64 (0.7%)
  • Methodology: 41 (0.45%)
  • Qualitative: 17 (0.19%)
  • Prognostic: 11 (0.12%)
  • Rapid: 11 (0.12%)
  • Prototype c: 8 (0.08%)

JBI b (total = 707):
  • Effectiveness: 435 (61.5%)
  • Diagnostic Test Accuracy: 9 (1.3%)
  • Umbrella: 4 (0.6%)
  • Mixed Methods: 2 (0.3%)
  • Qualitative: 159 (22.5%)
  • Prevalence and Incidence: 6 (0.8%)
  • Etiology and Risk: 7 (1.0%)
  • Measurement Properties: 3 (0.4%)
  • Economic: 6 (0.6%)
  • Text and Opinion: 1 (0.14%)
  • Scoping: 43 (6.0%)
  • Comprehensive d: 32 (4.5%)

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.
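
For example, RoB instruments are design-specific, so a review team can guard against mismatches by routing each included study to a tool that matches its verified design. A minimal sketch, assuming the commonly recommended pairings of RoB 2 for randomized trials and ROBINS-I for non-randomized studies of interventions; the design labels used as keys are illustrative:

```python
# Sketch: route each study to a design-appropriate risk-of-bias tool.
# Misclassifying the design would silently select the wrong tool,
# which is the failure mode described above.
ROB_TOOL_BY_DESIGN = {
    "randomized trial": "RoB 2",
    "non-randomized study of intervention": "ROBINS-I",
}

def pick_rob_tool(design_label: str) -> str:
    try:
        return ROB_TOOL_BY_DESIGN[design_label]
    except KeyError:
        raise ValueError(
            f"no RoB tool mapped for design {design_label!r}; "
            "verify the study's design features before assessment"
        )

print(pick_rob_tool("randomized trial"))  # -> RoB 2
```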

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Fig. 1 Distinguishing types of research evidence
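
The distinctions in the scheme can be recorded as a simple structured checklist during screening, which may help teams apply them consistently. A minimal sketch; the field names and output format are our own illustration, not part of the published scheme:

```python
# Illustrative encoding of the basic distinctions in Fig. 1:
# primary vs secondary, qualitative vs quantitative data, and two
# defining design features (group vs single-case, randomized or not).
from dataclasses import dataclass

@dataclass
class StudyRecord:
    is_primary: bool   # primary study vs secondary (synthesis of studies)
    data_type: str     # "qualitative" or "quantitative"
    unit: str          # "group" or "single-case"
    randomized: bool   # was random allocation used?

def classify(study: StudyRecord) -> str:
    if not study.is_primary:
        return "secondary study (e.g., systematic review)"
    parts = [study.data_type, study.unit,
             "randomized" if study.randomized else "non-randomized"]
    return "primary study: " + ", ".join(parts)

print(classify(StudyRecord(True, "quantitative", "group", False)))
# -> primary study: quantitative, group, non-randomized
```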

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of interventions (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].
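
To make the last recommendation concrete, the sketch below performs fixed-effect inverse-variance pooling separately for RCTs and NRSI so the two summary estimates are reported side by side rather than blended; all effect sizes and standard errors are invented for illustration.

```python
# Fixed-effect inverse-variance pooling, computed separately for RCTs
# and NRSI so their summary estimates are reported side by side.
# Data are invented for illustration.
import math

def pooled_estimate(effects, std_errors):
    weights = [1 / se**2 for se in std_errors]                      # w_i = 1/SE_i^2
    est = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))                                # SE of pooled estimate
    return est, se

rct_effects, rct_se = [0.20, 0.35, 0.10], [0.10, 0.12, 0.15]
nrsi_effects, nrsi_se = [0.55, 0.40], [0.08, 0.20]

for label, (ys, ses) in {"RCTs": (rct_effects, rct_se),
                         "NRSI": (nrsi_effects, nrsi_se)}.items():
    est, se = pooled_estimate(ys, ses)
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"{label}: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```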

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with the developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis

Reporting standards:
  • Quality of Reporting of Meta-analyses (QUOROM) Statement: Moher 1999
  • Meta-analyses Of Observational Studies in Epidemiology (MOOSE): Stroup 2000
  • Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA): Moher 2009
  • PRISMA 2020 a: Page 2021

Methodological quality and risk of bias:
  • Overview Quality Assessment Questionnaire (OQAQ): Oxman and Guyatt 1991
  • Systematic Review Critical Appraisal Sheet: Centre for Evidence-based Medicine 2005
  • A Measurement Tool to Assess Systematic Reviews (AMSTAR): Shea 2007
  • AMSTAR-2 a: Shea 2017
  • Risk of Bias in Systematic Reviews (ROBIS) a: Whiting 2016

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported evidence synthesis may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1 but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake, impact, and limitations are also discussed.

Evaluation of conduct

Development

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Comparison of AMSTAR-2 and ROBIS

  • Guidance documents: AMSTAR-2, extensive; ROBIS, extensive
  • Review types covered: AMSTAR-2, intervention; ROBIS, intervention, diagnostic, etiology, prognostic a
  • Domains: AMSTAR-2, 16 items (7 critical, 9 non-critical); ROBIS, 4 domains
  • Total number of items: AMSTAR-2, 16; ROBIS, 29
  • Response options, AMSTAR-2: items 1, 3, 5, 6, 10, 13, 14, 16 are rated “yes” or “no”; items 2, 4, 7, 8, 9 b are rated “yes,” “partial yes,” or “no”; items 11 b, 12, 15 are rated “yes,” “no,” or “no meta-analysis conducted”
  • Response options, ROBIS: 24 assessment items are rated “yes,” “probably yes,” “probably no,” “no,” or “no information”; 5 items regarding level of concern are rated “low,” “high,” or “unclear”
  • Construct: AMSTAR-2, confidence based on weaknesses in critical domains; ROBIS, level of concern for risk of bias
  • Categories for overall rating: AMSTAR-2, high, moderate, low, critically low; ROBIS, low, high, unclear

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently, both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines” and “ROBIS AND clinical practice guidelines”; searched 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record, given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience is also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions

  • PRISMA for systematic reviews with a focus on health equity (PRISMA-E): 2012
  • Reporting systematic reviews in journal and conference abstracts (PRISMA for Abstracts) a: 2015; 2020
  • PRISMA for systematic review protocols (PRISMA-P): 2015
  • PRISMA for Network Meta-Analyses (PRISMA-NMA): 2015
  • PRISMA for Individual Participant Data (PRISMA-IPD): 2015
  • PRISMA for reviews including harms outcomes (PRISMA-Harms): 2016
  • PRISMA for diagnostic test accuracy (PRISMA-DTA): 2018
  • PRISMA for scoping reviews (PRISMA-ScR): 2018
  • PRISMA for acupuncture (PRISMA-A): 2019
  • PRISMA for reporting literature searches (PRISMA-S): 2021

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].
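
One inexpensive verification strategy a journal could pilot is an automated completeness check of the submitted self-reported checklist. A minimal sketch; the abbreviated item labels follow PRISMA 2020 numbering, but the blank-location check itself is our illustration, not an established editorial tool:

```python
# Sketch: flag PRISMA 2020 checklist rows submitted without a location
# (page/section) entry. This checks completeness of the self-report
# only; it says nothing about the accuracy of the review itself.
checklist = {
    "Title (item 1)": "p. 1",
    "Eligibility criteria (item 5)": "p. 4",
    "Search strategy (item 7)": "",          # left blank by authors
    "Risk of bias assessment (item 11)": "p. 6",
}

missing = [item for item, location in checklist.items() if not location.strip()]
if missing:
    print("Checklist rows without a reported location:")
    for item in missing:
        print(" -", item)
```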

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review, as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. PRISMA checklists evaluate how completely an element of review conduct was reported; they do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.
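
The published AMSTAR-2 guidance derives the overall rating from flaws in critical domains rather than from a sum of item scores. A simplified sketch of that logic follows (appraisers may downgrade further based on judgment; this is a paraphrase of the published scheme, not an official implementation):

```python
# Simplified sketch of AMSTAR-2's domain-based overall rating, per the
# scheme described by its developers. Note there is no summing of item
# scores: two critical flaws yield "critically low" regardless of how
# many other items are satisfied.
def amstar2_overall(critical_flaws: int, noncritical_weaknesses: int) -> str:
    if critical_flaws > 1:
        return "critically low"   # more than one critical flaw
    if critical_flaws == 1:
        return "low"              # one critical flaw, with or without weaknesses
    if noncritical_weaknesses > 1:
        return "moderate"         # multiple non-critical weaknesses only
    return "high"                 # no or one non-critical weakness

print(amstar2_overall(critical_flaws=2, noncritical_weaknesses=0))
# -> critically low, even though 14 of 16 items may be satisfied
```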

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

  • Methods for study selection (AMSTAR-2 #5; ROBIS #2.5), methods for data extraction (AMSTAR-2 #6; ROBIS #3.1), and methods for RoB assessment (AMSTAR-2 NA; ROBIS #3.5): all three components must be done in duplicate, and methods fully described. Rationale: helps to mitigate CoI and bias; also may improve accuracy.
  • Study description (AMSTAR-2 #8; ROBIS #3.2): research design features, components of the research question (eg, PICO), setting, and funding sources. Rationale: allows readers to understand the individual studies in detail.
  • Sources of funding (AMSTAR-2 #10; ROBIS NA): identified for all included studies. Rationale: can reveal CoI or bias.
  • Publication bias (AMSTAR-2 #15*; ROBIS #4.5): explored, diagrammed, and discussed. Rationale: publication and other selective reporting biases are major threats to the validity of systematic reviews.
  • Author CoI (AMSTAR-2 #16; ROBIS NA): disclosed, with management strategies described. Rationale: if CoI is identified, management strategies must be described to ensure confidence in the review.

CoI, conflict of interest; MA, meta-analysis; NA, not addressed; PICO, participant, intervention, comparison, outcome; PRISMA-P, Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols; RoB, risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved in developing different types of systematic reviews; however, while inclusion of the suggested elements makes a review compliant with a particular methodology, it does not necessarily make the research question appropriate. Table 4.2 lists acronyms that may aid in developing the research question (a small worked sketch follows the table). They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop a runaway scope that allows them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

  • FINER a: feasible, interesting, novel, ethical, and relevant
  • SMART b: specific, measurable, attainable, relevant, timely
  • TOPICS + M c: time, outcomes, population, intervention, context, study design, plus (effect) moderators

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51
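
As a practical complement, the elements of the research question can be fixed in a small structure before protocol registration, which makes the predefined comparisons and outcomes explicit. A minimal sketch using the PICO elements from Table 2.1; the field names and example values are illustrative only:

```python
# Sketch: recording the PICO elements of an intervention review
# question so the scope is fixed before the protocol is registered.
# Field names and example values are illustrative.
from dataclasses import dataclass, asdict

@dataclass
class PICOQuestion:
    population: str
    intervention: str
    comparator: str
    outcomes: list

q = PICOQuestion(
    population="adults with chronic low back pain",
    intervention="supervised exercise therapy",
    comparator="usual care",
    outcomes=["pain intensity", "function", "adverse events"],
)

# A prespecified, finite outcome list is one guard against the
# "runaway scope" described above.
for element, value in asdict(q).items():
    assert value, f"missing PICO element: {element}"
print(asdict(q))
```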

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

An association between prospectively published protocols and attainment of AMSTAR standards in systematic reviews has been reported [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

Journals that peer review and publish protocols a
  • BMJ Open
  • BioMed Central
  • JMIR Research Protocols
  • World Journal of Meta-analysis

Registries
  • Cochrane b
  • JBI c
  • PROSPERO d
  • Research Registry (Registry of Systematic Reviews/Meta-Analyses) d
  • International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY) d
  • Center for Open Science d
  • Protocols.io d

Repositories e
  • Figshare
  • Open Science Framework
  • Zenodo

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions, while non-randomized designs are often expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and less biased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may be no RCTs or NRSI at all. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses or to provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trial registries may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trial registries [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant even shorter delays; in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note that the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards for RoB assessment. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.
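To illustrate the kind of sensitivity analysis described above, the following minimal Python sketch (with invented effect estimates and RoB judgments, and a simple fixed-effect inverse-variance average standing in for a full meta-analysis) re-pools the estimate after excluding studies judged at high RoB:

```python
# Minimal sketch with invented data: a sensitivity analysis that re-pools
# the effect estimate after excluding studies judged at high risk of bias,
# using a fixed-effect inverse-variance average for illustration only.
import math

def pool(studies):
    """Fixed-effect inverse-variance pooled estimate and its SE."""
    weights = [1 / se**2 for _, se, _ in studies]
    est = sum(w * e for (e, _, _), w in zip(studies, weights)) / sum(weights)
    return est, math.sqrt(1 / sum(weights))

# (log effect estimate, SE, overall RoB judgment) -- hypothetical values
studies = [(-0.80, 0.20, "high"), (-0.30, 0.15, "low"),
           (-0.25, 0.18, "low"),  (-0.90, 0.25, "high"), (-0.35, 0.16, "some")]

all_est, all_se = pool(studies)
low_rob = [s for s in studies if s[2] != "high"]
sens_est, sens_se = pool(low_rob)
print(f"all studies:      {all_est:.2f} (SE {all_se:.2f})")
print(f"high RoB removed: {sens_est:.2f} (SE {sens_se:.2f})")
```

If the two pooled results diverge materially, as they do with these invented values, readers gain a more accurate picture of how much the summary depends on studies at high RoB.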

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].
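As a concrete illustration of such a transformation (and not a substitute for methodological advice), the sketch below uses hypothetical 2×2 event counts to derive a log risk ratio and its standard error, putting results on a scale that can be compared and combined across studies:

```python
# Minimal sketch, with invented counts: deriving a consistent effect
# measure -- here the log risk ratio -- from the 2x2 event counts a
# hypothetical primary study might report.
import math

def log_risk_ratio(events_tx, n_tx, events_ctl, n_ctl):
    """Return the log risk ratio and its standard error from 2x2 counts."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Delta-method standard error of the log risk ratio
    se = math.sqrt(1/events_tx - 1/n_tx + 1/events_ctl - 1/n_ctl)
    return math.log(rr), se

# Hypothetical study: 12/100 events with intervention vs 24/98 with control
log_rr, se = log_risk_ratio(12, 100, 24, 98)
print(f"log RR = {log_rr:.3f}, SE = {se:.3f}")
```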

Common methods for quantitative synthesis

Meta-analysis a
  • Aggregate data — Data: effect estimates from pairwise comparisons of an intervention and comparator, combined as a weighted average. Results: overall effect estimate, CI, P value; evaluation of heterogeneity. Presentation: forest plot b with summary statistic for average effect estimate.
  • Individual participant data c — Data: raw data on each participant from each eligible study. Results and presentation: similar to aggregate data meta-analysis; more complex analyses are possible d.
  • Network — Data: variable; the interventions, which are compared directly and indirectly. Results: comparisons of relative effects between any pair of interventions; effect estimates for intervention pairings; summary relative effects for pairwise comparisons with evaluations of inconsistency and heterogeneity; treatment rankings (ie, probability that an intervention is among the best options). Presentation: network diagram or graph, tabular presentations; forest plot b, other methods; rankogram plot.

Synthesis without meta-analysis e
  • Summarizing effect estimates from separate studies (without combination that would provide an average effect estimate) — Results: range and distribution of observed effects, such as median, interquartile range, range. Presentation: box-and-whisker plot, bubble plot, forest plot (without summary effect estimate) b.
  • Combining P values — Results: combined P value, number of studies. Presentation: albatross plot (study sample size against P values per outcome).
  • Vote counting by direction of effect (eg, favors intervention over the comparator) — Results: proportion of studies with an effect in the direction of interest, CI, P value. Presentation: harvest plot, effect direction plot.

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When applied appropriately, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity across estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].
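A minimal sketch of this core computation, using invented log risk ratios and standard errors, follows; real analyses typically also consider random-effects models and heterogeneity statistics, which this fixed-effect example deliberately omits:

```python
# Minimal sketch, with invented data: a fixed-effect, inverse-variance
# weighted average of study effect estimates (here, log risk ratios),
# the core computation behind a standard aggregate data meta-analysis.
import math

# Hypothetical (log effect estimate, standard error) pairs from five studies
studies = [(-0.71, 0.25), (-0.35, 0.18), (-0.52, 0.30),
           (-0.10, 0.22), (-0.44, 0.15)]

weights = [1 / se**2 for _, se in studies]            # inverse-variance weights
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))               # SE of the weighted average

lo, hi = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
print(f"pooled log RR = {pooled:.3f}  (95% CI {lo:.3f} to {hi:.3f})")
print(f"pooled RR     = {math.exp(pooled):.2f}  "
      f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```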

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).
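For illustration only, the following sketch (assuming matplotlib is available) draws a bare-bones forest plot of the hypothetical estimates used above; dedicated tools such as RevMan or R's metafor add study weights, labels, and heterogeneity statistics:

```python
# Minimal sketch, assuming matplotlib is installed: a bare-bones forest plot
# of hypothetical study estimates plus a pooled result (values reuse the
# earlier fixed-effect example; the pooled row is -0.40, SE 0.09).
import math
import matplotlib.pyplot as plt

labels = ["Study 1", "Study 2", "Study 3", "Study 4", "Study 5", "Pooled"]
ests   = [-0.71, -0.35, -0.52, -0.10, -0.44, -0.40]   # log risk ratios
ses    = [0.25, 0.18, 0.30, 0.22, 0.15, 0.09]

fig, ax = plt.subplots()
y = range(len(labels), 0, -1)                          # top-to-bottom order
ax.errorbar([math.exp(e) for e in ests], y,
            xerr=[[math.exp(e) - math.exp(e - 1.96*s) for e, s in zip(ests, ses)],
                  [math.exp(e + 1.96*s) - math.exp(e) for e, s in zip(ests, ses)]],
            fmt="s", color="black", capsize=3)
ax.axvline(1.0, linestyle="--", color="gray")          # line of no effect
ax.set_xscale("log")
ax.set_yticks(list(y)); ax.set_yticklabels(labels)
ax.set_xlabel("Risk ratio (log scale)")
plt.tight_layout(); plt.show()
```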

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed, as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases. Sufficient funding, technical expertise, and preparation are therefore needed to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in any tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].
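As one example of such a formal method, the sketch below (with invented risk ratios) summarizes the range and distribution of observed effects without computing an average effect estimate:

```python
# Minimal sketch with invented effect estimates: describing the observed
# distribution of effects (median, interquartile range, range) without
# computing an average -- one acceptable synthesis-without-meta-analysis
# method listed in Table 4.4.
import statistics

effects = [0.62, 0.71, 0.55, 1.10, 0.80, 0.95, 0.49]  # hypothetical risk ratios
effects.sort()
q1, med, q3 = statistics.quantiles(effects, n=4)      # quartile cut points
print(f"median RR = {med:.2f}, IQR = {q1:.2f}-{q3:.2f}, "
      f"range = {effects[0]:.2f}-{effects[-1]:.2f}")
```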

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
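A minimal sketch of the acceptable approach, using invented counts and SciPy's exact binomial test, reports the proportion of studies with effects in the direction of interest together with a CI and P value, rather than a tally of "significant" studies:

```python
# Minimal sketch of vote counting by direction of effect: the proportion
# of studies favoring the intervention, with a binomial (sign test) P value
# and CI. Counts are invented for illustration.
from scipy import stats

favoring = 9   # hypothetical studies with effects in the direction of interest
total = 12

prop = favoring / total
result = stats.binomtest(favoring, total, p=0.5)      # sign test vs no effect
ci = result.proportion_ci(confidence_level=0.95)
print(f"{prop:.0%} of studies favored the intervention "
      f"(95% CI {ci.low:.0%} to {ci.high:.0%}), P = {result.pvalue:.3f}")
```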

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability for different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently, more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality.” [ 191 ] Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

Criteria for rating down a: risk of bias; imprecision; inconsistency; indirectness; publication bias
Criteria for rating up b: large magnitude of effect; dose–response gradient; all residual confounding would decrease magnitude of effect (in situations with an effect)

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

⊕⊕⊕⊕ High: We are very confident that the true effect lies close to that of the estimate of the effect
⊕⊕⊕ Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
⊕⊕ Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect
⊕ Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
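Because GRADE is a structured judgment process rather than an algorithm, no code can reproduce it; the toy sketch below merely illustrates the bookkeeping implied by Table 5.1 (RCTs start at high certainty, NRSI at low, with movement down or up per criterion) and is entirely our own illustration:

```python
# Illustrative toy only -- GRADE judgments are qualitative and made on a
# continuum, not by formula. This sketch just encodes the starting points
# and the rate-down/rate-up movements named in Table 5.1.
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(study_type, downgrades=0, upgrades=0):
    """study_type: 'RCT' starts at high; 'NRSI' starts at low."""
    start = 3 if study_type == "RCT" else 1
    idx = max(0, min(3, start - downgrades + upgrades))
    return LEVELS[idx]

# Body of RCT evidence rated down once (eg, for imprecision)
print(grade_certainty("RCT", downgrades=1))            # -> moderate
# NRSI evidence rated up once (eg, large magnitude of effect)
print(grade_certainty("NRSI", upgrades=1))             # -> moderate
```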

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
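For readers unfamiliar with the statistic behind these figures, the sketch below computes Cohen's kappa, a chance-corrected measure of agreement, for two raters' hypothetical certainty ratings:

```python
# Minimal sketch, with invented ratings: Cohen's kappa quantifies two
# raters' chance-corrected agreement, the kind of statistic behind the
# ~0.3 vs ~0.7 interrater reliability figures cited above.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

a = ["high", "moderate", "low", "low", "moderate", "high"]      # hypothetical
b = ["high", "moderate", "moderate", "low", "moderate", "low"]
print(f"kappa = {cohens_kappa(a, b):.2f}")                       # -> 0.50
```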

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

1. The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.
2. Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).
3. The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.
4. Evidence summaries … should be used as the basis for judgments about the certainty in the evidence.

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included in Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

Conduct guidance
  • Cochrane and/or JBI manuals, according to review type b

Reporting guidance c
  • Protocol: PRISMA-P (all review types)
  • Completed review: PRISMA 2020 (intervention, prognostic, etiology/risk, prevalence/incidence, measurement properties); PRISMA-DTA (diagnostic test accuracy); eMERGe or ENTREQ (qualitative) d; PRIOR (umbrella/overview); PRISMA-ScR (scoping)
  • Synthesis without MA: SWiM e (intervention and other quantitative reviews); PRISMA-DTA (diagnostic test accuracy); eMERGe or ENTREQ (qualitative); PRIOR (umbrella/overview)

Risk of bias assessment of included studies
  • Intervention: Cochrane RoB 2 for RCTs; ROBINS-I for NRSI; design-specific tools for other primary research g
  • Diagnostic test accuracy: QUADAS-2
  • Prognostic: QUIPS (factor review); PROBAST (model review)
  • Qualitative: CASP Qualitative Checklist; JBI Critical Appraisal Checklist for Qualitative Research h
  • Etiology/risk: ROBINS-I for NRSI; design-specific tools for other primary research g
  • Prevalence/incidence: JBI checklist for studies reporting prevalence data f
  • Measurement properties: COSMIN RoB Checklist
  • Umbrella/overview: AMSTAR-2 or ROBIS for included systematic reviews f
  • Scoping: not required i

Certainty of evidence assessment
  • Intervention: GRADE
  • Diagnostic test accuracy: GRADE adaptation j
  • Prognostic: GRADE adaptation k
  • Qualitative: CERQual; ConQual l
  • Etiology/risk: GRADE adaptation for risk factors n
  • Prevalence/incidence: GRADE adaptation m
  • Measurement properties: GRADE adaptation o
  • Umbrella/overview: GRADE (for intervention reviews); GRADE adaptation for risk factors n
  • Scoping: not applicable i
AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM synthesis without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

Systematic review — A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.
Statistical synthesis — The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates and other methods, such as combining P values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect.
Meta-analysis of effect estimates — A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.
Outcome — An event or measurement collected for participants in a study (such as quality of life, mortality).
Result — The combination of a point estimate (such as a mean difference, risk ratio, or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.
Report — A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.
Record — The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.
Study — An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.

a Reproduced from Page and colleagues [ 93 ]

Terminology suggestions for health care–related evidence syntheses

Preferred — Potentially problematic
  • Evidence synthesis with meta-analysis; systematic review with meta-analysis — meta-analysis
  • Overview or umbrella review — systematic review of systematic reviews; review of reviews; meta-review
  • Randomized — experimental
  • Non-randomized — observational
  • Single case experimental design — single-subject research; N-of-1 design
  • Case report or case series — descriptive study
  • Methodological quality — quality
  • Certainty of evidence — quality of evidence; grade of evidence; level of evidence; strength of evidence
  • Qualitative systematic review — qualitative synthesis
  • Synthesis of qualitative data a — qualitative synthesis
  • Synthesis without meta-analysis — narrative synthesis b, narrative summary; qualitative synthesis; descriptive synthesis, descriptive summary

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence it summarizes is sparse, weak, and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined and less duplicative and, most importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews , and the Journal of Pediatric Rehabilitation Medicine .

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Advances in brain-computer interface for decoding speech imagery from EEG signals: a systematic review

  • Review Paper
  • Published: 04 September 2024

Cite this article

what is methodology in systematic review

  • Nimra Rahman 1 ,
  • Danish Mahmood Khan   ORCID: orcid.org/0000-0001-8815-6146 2 , 3 ,
  • Komal Masroor 1 ,
  • Mehak Arshad 1 ,
  • Amna Rafiq 1 &
  • Syeda Maham Fahim 1  

Numerous individuals encounter challenges in verbal communication due to various factors, including physical disabilities, neurological disorders, and strokes. In response to this pressing need, technology has actively pursued solutions to bridge the communication gap, recognizing the inherent difficulties faced in verbal communication, particularly in contexts where traditional methods may be inadequate. Electroencephalogram (EEG) has emerged as a primary non-invasive method for measuring brain activity, offering valuable insights from a cognitive neurodevelopmental perspective. It forms the basis for Brain-Computer Interfaces (BCIs) that provide a communication channel for individuals with neurological impairments, thereby empowering them to express themselves effectively. EEG-based BCIs, especially those adapted to decode imagined speech from EEG signals, represent a significant advancement in enabling individuals with speech disabilities to communicate through text or synthesized speech. By utilizing cognitive neurodevelopmental insights, researchers have been able to develop innovative approaches for interpreting EEG signals and translating them into meaningful communication outputs. To aid researchers in effectively addressing this complex challenge, this review article synthesizes key findings from state-of-the-art significant studies. It investigates into the methodologies employed by various researchers, including preprocessing techniques, feature extraction methods, and classification algorithms utilizing Deep Learning and Machine Learning approaches and their integration. Furthermore, the review outlines the potential avenues for future research, with the goal of advancing the practical implementation of EEG-based BCI systems for decoding imagined speech from a cognitive neurodevelopmental perspective.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save.

  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

what is methodology in systematic review

Explore related subjects

  • Artificial Intelligence

Data availability

Not applicable.

Abdulghani MM, Walters WL, Abed KH (2023) Imagined speech classification using EEG and deep learning. Bioengineering 10(6):649

Article   PubMed   PubMed Central   Google Scholar  

Abibullaev B, Keutayeva A, Zollanvari A (2023) Deep learning in EEG-based bcis: a comprehensive review of transformer models, advantages, challenges, and applications. IEEE Access

Agarwal P, Kumar S (2022) Electroencephalography based imagined alphabets classification using spatial and time-domain features. Int J Imaging Syst Technol 32(1):111–122

Article   Google Scholar  

Alizadeh D, Omranpour H (2023) Em-csp: an efficient multiclass common spatial pattern feature method for speech imagery EEG signals recognition. Biomed Signal Process Control 84:104933

Ali S, Mumtaz W, Maqsood A (2023) EEG based thought-to-text translation via deep learning. In: 2023 7th international multi-topic ICT conference (IMTIC). IEEE, pp 1–8

AlZoubi O, Koprinska I, Calvo RA (2008) Classification of brain-computer interface data. In: Proceedings of the 7th Australasian data mining conference. Vol 87, pp 123–131

Assenza G, Di Lazzaro V (2015) A useful electroencephalography (EEG) marker of brain plasticity: delta waves. Neural Regen Res 10(8):1216–1217

Basharpoor S, Heidari F, Molavi P (2021) EEG coherence in theta, alpha, and beta bands in frontal regions and executive functions. Appl Neuropsychol Adult 28(3):310–317

Article   PubMed   Google Scholar  

Cheng X, Huang K, Zou Y, Ma S (2024) Sleepegan: a gan-enhanced ensemble deep learning model for imbalanced classification of sleep stages. Biomed Signal Process Control 92:106020

Cheng C, Liu W, Fan Z, Feng L, Jia Z (2024) A novel transformer autoencoder for multi-modal emotion recognition with incomplete data. Neural Netw 172:106111

Chen X, Wang Y, Gao X (2008) High-speed steady-state visual evoked potential-based brain-computer interfaces. Smart wheelchairs and brain-computer interfaces. Elsevier, Amsterdam, pp 111–130

Chapter   Google Scholar  

Chen Z, Song Z, Zhang T, Wei Y (2024) Iot devices and data availability optimization by ann and knn. EURASIP J Inf Secur 2024(1):2

Clément C (2019) Brain-computer interface technologies, 1st edn. Springer, Cham

Book   Google Scholar  

Dhiman R et al (2023) Machine learning techniques for electroencephalogram based brain-computer interface: a systematic literature review. Meas Sens 28:100823

Garces Correa MA, Orosco LL (2018) EEG signal processing in brain–computer interface

Gasparini F, Cazzaniga E, Saibene A (2022) Inner speech recognition through electroencephalographic signals. arXiv preprint arXiv:2210.06472

Heting Q, Nuo G (2022) Research on the classification algorithm of imaginary speech EEG signals based on twin neural network. In: 2022 7th international conference on signal and image processing (ICSIP). IEEE, pp 211–216

He M, Wu Y, Li Z, Wang S, Li W, Zhou W, Rong H, Wang J (2023) Hmt: an EEG signal classification method based on cnn architecture. In: 2023 5th international conference on intelligent control, measurement and signal processing (ICMSP). IEEE, pp. 1015–1018

Hong N, Kim B, Lee J, Choe HK, Jin KH, Kang H (2024) Machine learning-based high-frequency neuronal spike reconstruction from low-frequency and low-sampling-rate recordings. Nat Commun 15(1):635

Article   CAS   PubMed   PubMed Central   Google Scholar  

Jafferson AJ, Ponnusamy V, Jovic J, Trajanovic M (2021) An iot based cloud EEG signal analytic framework for thought to text mapping. IEIE Trans Smart Process Comput 10(3):183–188

Jeong J-H, Cho J-H, Lee B-H, Lee S-W (2022) Real-time deep neurolinguistic learning enhances noninvasive neural language decoding for brain-machine interaction. IEEE Trans Cybern 53(12):7469–7482

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

Article   CAS   PubMed   Google Scholar  

Lopez-Bernal D, Balderas D, Ponce P, Molina A (2022) A state-of-the-art review of EEG-based imagined speech decoding. Front Hum Neurosci 16:867281

Luo S, Angrick M, Coogan C, Candrea DN, Wyse-Sookoo K, Shah S, Rabbani Q, Milsap GW, Weiss AR, Anderson WS et al (2023) Stable decoding from a speech bci enables control for an individual with als without recalibration for 3 months. Adv Sci 10(35):2304853

Mahapatra NC, Bhuyan P (2023) Decoding of imagined speech electroencephalography neural signals using transfer learning method. J Phys Commun 7(9):095002

Mandal SK, Naskar MNB (2023) Mi brain-computer interfaces: a concise overview. Biomed Signal Process Control 86:105293

Mohamed A-r, Dahl G, Hinton G, et al (2009) Deep belief networks for phone recognition. In: Nips workshop on deep learning for speech recognition and related applications. Vancouver, Canada vol. 1, p 39

Nagatani Y, Takazawa K, Maeda K, Kambara A, Soeta Y, Ogawa K (2023) Gamma-modulated human speech-originated sound evokes and entrains gamma wave in human brain. Appl Acoust 211:109518

Ng HW, Guan C (2024) Subject-independent meta-learning framework towards optimal training of EEG-based classifiers. Neural Netw 172:106108

Pan H, Wang Y, Li Z, Chu X, Teng B, Gao H (2024) A complete scheme for multi-character classification using EEG signals from speech imagery. IEEE Trans Biomed Eng

Park H-j, Lee B (2023) Multiclass classification of imagined speech EEG using noise-assisted multivariate empirical mode decomposition and multireceptive field convolutional neural network. Front Hum Neurosci 17:1186594

Ramadan RA, Altamimi AB (2024) Unraveling the potential of brain-computer interface technology in medical diagnostics and rehabilitation: a comprehensive literature review. Health Technol 14(2):263–276

Rusnac A-L, Grigore O (2021) EEG preprocessing methods for bci imagined speech signals. In: 2021 international conference on e-health and bioengineering (EHB). IEEE, pp 1–4

Saini M, Satija U (2023) State of art mental tasks classification based on electroencephalogram: a review. Physiol Meas

Schirrmeister RT, Springenberg JT, Fiederer LDJ, Glasstetter M, Eggensperger K, Tangermann M, Hutter F, Burgard W, Ball T (2017) Deep learning with convolutional neural networks for EEG decoding and visualization. Hum Brain Mapp 38(11):5391–5420

Segal E, Petrides M (2013) Functional activation during reading in relation to the sulci of the angular gyrus region. Eur J Neurosci 38(5):2793–2801

Sen O, Sheehan AM, Raman PR, Khara KS, Khalifa A, Chatterjee B (2023) Machine-learning methods for speech and handwriting detection using neural signals: a review. Sensors 23(12):5575

Shaila S, Anirudh B, Nair AS, Monish L, Murala P, Sanjana A (2023) EEG signal-based human emotion recognition using power spectrum density and discrete wavelet transform. In: International conference on emerging research in computing, information, communication and applications. Springer, pp 557–567

Tan EL, Tahedl M, Lope J, Hengeveld JC, Doherty MA, McLaughlin RL, Hardiman O, Chang KM, Finegan E, Bede P (2024) Language deficits in primary lateral sclerosis: cortical atrophy, white matter degeneration and functional disconnection between cerebral regions. J Neurol 271(1):431–445

Teo W-P, White D, Macpherson H (2018) Using noninvasive methods to drive brain-computer interface (BCI): the role of electroencephalography and functional near-infrared spectroscopy in BCI. In: Smart wheelchairs and brain-computer interfaces. Elsevier, Amsterdam, pp 33–63

Vorontsova D, Menshikov I, Zubov A, Orlov K, Rikunov P, Zvereva E, Flitman L, Lanikin A, Sokolova A, Markov S et al (2021) Silent EEG-speech recognition using convolutional and recurrent neural network with 85% accuracy of 9 words classification. Sensors 21(20):6744

Yadav H, Maini S (2023) Electroencephalogram based brain-computer interface: applications, challenges, and opportunities. Multimed Tools Appl 82(30):47003–47047

Yang J, Ning C, Deb C, Zhang F, Cheong D, Lee SE, Sekhar C, Tham KW (2017) k-shape clustering algorithm for building energy usage patterns analysis and forecasting model accuracy improvement. Energy Build 146:27–37

Yang Q, Cruz-Garza JG, Kalantari S (2023) Brain-computer interfaces as an architectural design tool: feasibility and usability study. Autom Constr 154:105011

Zhang X, Yao L, Sheng QZ, Kanhere SS, Gu T, Zhang D (2018) Converting your thoughts to texts: enabling brain typing via deep feature learning of EEG signals. In: 2018 IEEE international conference on pervasive computing and communications (PerCom). IEEE, pp 1–10

Zhao Y, Chen Y, Cheng K, Huang W (2023) Artificial intelligence based multimodal language decoding from brain activity: a review. Brain Res Bull 110713

Zolfaghari S, Sarbaz Y, Shafiee-Kandjani AR (2024) Analysing the behaviour change of brain regions of methamphetamine abusers using electroencephalogram signals: hope to design a decision support system. Addict Biol 29(2):13362

Funding

The authors have not disclosed any funding.

Author information

Authors and affiliations

Department of Telecommunications Engineering, NED University of Engineering and Technology, Karachi, Sindh, 75270, Pakistan

Nimra Rahman, Komal Masroor, Mehak Arshad, Amna Rafiq & Syeda Maham Fahim

Department of Electronic Engineering, NED University of Engineering and Technology, Karachi, Sindh, 75270, Pakistan

Danish Mahmood Khan

Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, Selangor, 47500, Petaling Jaya, Malaysia

Corresponding author

Correspondence to Danish Mahmood Khan.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Rahman, N., Khan, D.M., Masroor, K. et al. Advances in brain-computer interface for decoding speech imagery from EEG signals: a systematic review. Cogn Neurodyn (2024). https://doi.org/10.1007/s11571-024-10167-0

Received: 21 March 2024

Revised: 16 July 2024

Accepted: 17 August 2024

Published: 04 September 2024

DOI: https://doi.org/10.1007/s11571-024-10167-0

Keywords

  • Brain computer interface (BCI)
  • Speech imagery
  • Imagined speech
  • Electroencephalography (EEG)
  • Machine learning
  • Deep learning

COMMENTS

  1. An overview of methodological approaches in systematic reviews

    Learn how to conduct a systematic review of literature with different methodological approaches and examples in this comprehensive article.

  2. Systematic Review

    A systematic review is a type of review that uses repeatable methods to find, select, and synthesize the available evidence.

  3. How to write the methods section of a systematic review

    The methods section of your systematic review describes what you did, how you did it, and why. Covidence shares some tips on how to do it.

  4. Systematic reviews: Structure, form and content

    Systematic reviews: Structure, form and content. This article aims to provide an overview of the structure, form and content of systematic reviews. It focuses in particular on the literature searching component, and covers systematic database searching techniques, searching for grey literature and the importance of librarian involvement in the ...

  5. How to Do a Systematic Review: A Best Practice Guide for Conducting and

    Systematic reviews are characterized by a methodical and replicable methodology and presentation. They involve a comprehensive search to locate all relevant published and unpublished work on a subject; a systematic integration of search results; and a critique of the extent, nature, and quality of evidence in relation to a particular research question. The best reviews synthesize ...

  6. Guidelines for writing a systematic review

    Guidelines for writing a systematic review. 1. Introduction. A key feature of any academic activity is to have a sufficient understanding of the subject area under investigation and thus an awareness of previous research. Undertaking a literature review with an analysis of the results on a specific issue is required to demonstrate sufficient ...

  7. Systematic reviews: Brief overview of methods, limitations, and

    These criteria have withstood the test of time, appearing in earlier literature describing systematic review methods [4, 8]. This article offers a brief overview of the methodology for systematic reviews, discussion of potential limitations, and resources for authors and editors.

  8. Introduction to Systematic Reviews

    Introduction: What Is a Systematic Review? A systematic review identifies and synthesizes all relevant studies that fit prespecified criteria to answer a research question (Lasserson et al. 2019; IOM 2011). What sets a systematic review apart from a narrative review is that it follows consistent, rigorous, and transparent methods established in a protocol in order to minimize bias and errors ...

  9. Methodology

    The method described may either be completely new, or may offer a better version of an existing method. The article must describe a demonstrable advance on what is currently available. The method needs to have been well tested and ideally, but not necessarily, used in a way that proves its value. Systematic Reviews strongly encourages that all ...

  10. Systematic Review

    A systematic review is a type of review that uses repeatable methods to find, select, and synthesise all available evidence. It answers a clearly formulated research question and explicitly states the methods used to arrive at the answer. Example: Systematic review. In 2008, Dr Robert Boyle and his colleagues published a systematic review in ...

  11. PDF Conducting a Systematic Review: Methodology and Steps

    This section explains steps and methods for guiding reviewers on conducting a systematic review, using examples from published systematic reviews and different types of studies.

  12. How to do a systematic review

    High quality up-to-date systematic reviews are essential in order to help healthcare practitioners and researchers keep up-to-date with a large and rapidly growing body of evidence. Systematic reviews answer pre-defined research questions using explicit, reproducible methods to identify, critically appraise and combine results of primary research studies. Key stages in the production of ...

  13. Chapter 1: Starting a review

    Key Points: Systematic reviews address a need for health decision makers to be able to access high quality, relevant, accessible and up-to-date information. Systematic reviews aim to minimize bias through the use of pre-specified research questions and methods that are documented in protocols, and by basing their findings on reliable research. Systematic reviews should be conducted by a team ...

  14. Five steps to conducting a systematic review

    A review earns the adjective systematic if it is based on a clearly formulated question, identifies relevant studies, appraises their quality and summarizes the evidence by use of explicit methodology. It is the explicit and systematic approach that distinguishes systematic reviews from traditional reviews and commentaries.

  15. Methodology of a systematic review

    A systematic review involves a critical and reproducible summary of the results of the available publications on a particular topic or clinical question. To improve scientific writing, the methodology for implementing a systematic review is presented in a structured manner.

  16. 1.2.2 What is a systematic review?

    Many systematic reviews contain meta-analyses. Meta-analysis is the use of statistical methods to summarize the results of independent studies (Glass 1976). By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review (see Chapter 9, Section 9.1.3). They ... A worked fixed-effect pooling sketch appears after this list.

  17. Introduction

    What is a Systematic Review? Knowledge synthesis is a term used to describe the method of synthesizing results from individual studies and interpreting these results within the larger body of knowledge on the topic. It requires highly structured, transparent and reproducible methods using quantitative and/or qualitative evidence. Systematic reviews, meta-analyses, scoping reviews, rapid ...

  18. Guidance on Conducting a Systematic Literature Review

    Literature reviews establish the foundation of academic inquiries. However, in the planning field, we lack rigorous systematic reviews. In this article, through a systematic search on the methodology of literature review, we categorize a typology of literature reviews, discuss steps in conducting a systematic literature review, and provide suggestions on how to enhance rigor in ...

  19. Guidance to best tools and practices for systematic reviews

    Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and ...

  20. Meta-analysis and systematic review of the diagnostic value of contrast

    Data extraction and synthesis: Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) evaluated the methodological quality of all the included studies. The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses specification. In addition to sensitivity and specificity, other important parameters were explored in an analysis of CESM accuracy for breast ...

  21. Repetitive Behaviors In Autism And OCD: A Systematic Review

    Method. The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement guidelines. ... This systematic review provides a comprehensive analysis of the current literature on repetitive behaviors in autism and OCD, highlighting the considerable overlap in behavioral presentations between these ...

  22. Systematic Review Methods

    A systematic review is a protocol driven comprehensive review and synthesis of data focusing on a topic or on related key questions. It is typically performed by experienced methodologists with the input of domain experts.

  23. Barriers and enablers to a healthy lifestyle in people with infertility

    Conducting a mixed-methods systematic review required us to synthesize highly heterogeneous data. While we used published guidelines to inform our methodology and ensure appropriate integration and analysis of data, the disproportionately large number of quantitative studies may limit interpretation of findings. Most of the surveys included in ...

  24. What factors influence patient autonomy in healthcare decision-making

    We conducted a systematic review through a comprehensive search of three major electronic databases of biomedical sciences. ... with few quantitative, mixed-methods, and review studies included. Our synthesis of data identified nine key factors that influenced patients' autonomous decision-making through cultural, social, religious, or ...

  25. What is the optimal serum level for lithium in the maintenance

    Aims: To systematically review the existing trials on optimal serum levels for lithium for maintenance treatment of bipolar disorder and to develop clinical recommendations. Methods: Systematic literature search. Discussion of major characteristics, limitations, methodological quality, and results of selected trials. Delphi survey consisting of clinical questions and corresponding statements.

  26. Systematic Review of Single-Channel EEG-Based Drowsiness Detection Methods

    Drowsiness is characterized by reduced attentiveness, commonly experienced during the transition from wakefulness to sleepiness. It can decrease an individual's alertness, thereby increasing the risk of accidents during activities such as driving, crane operation, working in mining areas, and industrial machinery operation. The detection of drowsiness plays an important role in preventing ... A band-power feature sketch for this kind of single-channel analysis appears after this list.

  27. Is Clot Composition Associated With Cause of Stroke? A Systematic

    Our study was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines [18]. We searched MEDLINE (PubMed), Embase, and the Cochrane Library to identify studies between January 1, 2000 and March 20, 2024 that reported clot histology in adult patients who underwent MT for large vessel occlusion-AIS. Review and meta-analysis articles on the ...

  28. Advances in brain-computer interface for decoding speech ...

    An in-depth exploration of the existing literature becomes imperative as researchers investigate the utilization of DL methodologies in decoding speech imagery from EEG devices within this domain (Lopez-Bernal et al. 2022). The objective of this review is to guide readers through the rapid advancements in research and technology within EEG-based BCIs, specifically speech imagery, shedding light ...
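
Item 28 above points at the deep-learning strand of this literature, in which compact convolutional networks learn temporal and spatial filters directly from raw EEG epochs. As a minimal sketch only, the PyTorch model below shows the general shape of such a network; the channel count, epoch length, class count, and layer sizes are illustrative assumptions, not the architecture of any study covered by the review.

```python
import torch
import torch.nn as nn

class SpeechImageryCNN(nn.Module):
    """Toy CNN for imagined-speech EEG classification (illustrative only)."""

    def __init__(self, n_channels=64, n_samples=512, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            # Temporal convolution: acts like a learned band-pass filter bank
            nn.Conv2d(1, 16, kernel_size=(1, 25), padding=(0, 12)),
            # Spatial convolution: mixes all electrodes at each time step
            nn.Conv2d(16, 16, kernel_size=(n_channels, 1)),
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 4)),
            nn.Dropout(0.5),
        )
        self.classifier = nn.Linear(16 * (n_samples // 4), n_classes)

    def forward(self, x):  # x: (batch, 1, channels, samples)
        return self.classifier(self.features(x).flatten(1))

model = SpeechImageryCNN()
dummy = torch.randn(8, 1, 64, 512)  # one batch of synthetic EEG epochs
print(model(dummy).shape)           # torch.Size([8, 5])
```

The temporal-then-spatial factorization follows the shallow ConvNet pattern of Schirrmeister et al. (2017) in the reference list above; real systems add deeper stacks, recurrent layers, or transfer learning, as surveyed in the review.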
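As a companion to item 26, single-channel drowsiness pipelines usually reduce the EEG to spectral band powers before any classifier is applied. The sketch below, which assumes a 256 Hz sampling rate and substitutes synthetic noise for a real recording, computes one widely used hand-crafted index, the (theta + alpha) / beta ratio; the band edges are conventional choices, not values taken from the paper in item 26.

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

fs = 256  # assumed sampling rate in Hz; real headsets vary
rng = np.random.default_rng(0)
eeg = rng.standard_normal(fs * 30)  # synthetic stand-in for 30 s of one channel

# Welch power spectral density with 2 s segments
freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)

def band_power(lo, hi):
    """Integrate the PSD over the frequency band [lo, hi) in Hz."""
    mask = (freqs >= lo) & (freqs < hi)
    return trapezoid(psd[mask], freqs[mask])

# Slow (theta + alpha) over fast (beta) activity: this ratio tends to rise
# as alertness drops, which is why it is a common drowsiness feature.
ratio = (band_power(4, 8) + band_power(8, 13)) / band_power(13, 30)
print(f"(theta + alpha) / beta = {ratio:.2f}")
```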
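Finally, the pooling sketch promised at item 16: in the simplest fixed-effect meta-analysis, each study's effect estimate is weighted by the inverse of its variance, so the whole synthesis reduces to a few lines. The effect sizes and variances below are made up for illustration; they do not come from any study cited in this document.

```python
import numpy as np

# Hypothetical effect sizes (e.g., log odds ratios) and their variances
# from five imaginary studies; illustrative numbers only.
effects = np.array([0.30, 0.15, 0.45, 0.20, 0.35])
variances = np.array([0.04, 0.02, 0.09, 0.03, 0.05])

# Fixed-effect (inverse-variance) pooling
weights = 1.0 / variances
pooled = np.sum(weights * effects) / np.sum(weights)
se = np.sqrt(1.0 / np.sum(weights))

# 95% confidence interval for the pooled effect
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled effect {pooled:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A random-effects model would add a between-study variance term to each weight; in practice, review teams typically rely on dedicated tools such as RevMan or the R package metafor rather than hand-rolled code.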