We have found some common guidelines that help us determine whether studies and websites can be trusted. We share what we find valuable in a series of blog posts. Andy Jackson, ND, and Miki Scheidel reviewed and provided input on this post.

Evidence that a therapy “works” ranges from unreliable to trustworthy. We present an overview of some of the issues in determining whether evidence is reliable and appropriate. In this post, we consider the design of research studies.

Hierarchy of evidence

Research spans a hierarchy of strength of evidence. The list below starts with the least reliable and least credible evidence and works down through increasingly credible sources (all examples are fictional). Evidence toward the bottom of the list is generally regarded as stronger and more reliable than evidence toward the top.

Clinical or expert testimonial

A medical practitioner provides information from their personal experience treating patients. 

Example: “During 32 years in medical practice, I’ve seen this treatment help hundreds of patients with hot flashes.”

Cell or animal studies

These studies use human cells and tissues or living animals, and they give a good first indication of a therapy’s effects. However, isolated cells or tissues in a highly controlled lab may behave very differently from tumors and other cells in living beings, and animals differ from humans in metabolism, body chemistry, and many other ways. These studies can suggest whether a therapy may be safe and effective and whether studies in humans are worth pursuing, but they are only a first indication.

Case studies or uncontrolled studies

One patient or a group is treated and observed over time. The status after a treatment is compared to before treatment. This type of study doesn’t tell us what happened to people who didn’t receive the treatment, so we cannot know if any change was due to the treatment, to the natural course of the condition being treated, or to something else.

Examples:
Six patients given this therapy experienced less nausea and vomiting than before using the therapy. 

Ms. X was followed for seven months on this treatment, with these results…

Retrospective observational studies, or case-control studies

Researchers ask people to recall, or review records to determine, which therapies or practices people used in the past, then look for patterns and compare them with current health status. People with a disease, condition, or characteristic of interest are compared with people without it to identify differences between the groups. People do not always remember or report past practices accurately, which is a serious weakness of these studies.

Examples:
Physical activity levels for the last 10 years among five hundred people with breast cancer were compared to levels of similar people without breast cancer.

Two hundred people who said they used aspirin at least once a week for the last 5 years were compared to two hundred others who did not use aspirin.

The number of benzodiazepine prescriptions filled was used to compare exposure to benzodiazepines between people diagnosed with dementia and people without a dementia diagnosis.

Prospective observational studies

Researchers observe and record people’s treatments or practices, or people are asked to record the therapies they use in the present, and researchers look for differences in outcomes linked to differences in therapies. Observational studies may follow a specific group of people, called a cohort, over time to track outcomes. Observational studies can show an association between a therapy and an outcome, but they do not by themselves show that the therapy caused the outcome.[1: Thiese MS. Observational and interventional study design types; an overview. Biochemia Medica (Zagreb). 2014;24(2):199-210; Stürmer T, Brookhart MA. Chapter 2: Study Design Considerations. In: Dreyer NA, Nourjah P, Smith SR, Torchia MM, editors. Developing a Protocol for Observational Comparative Effectiveness Research: A User’s Guide. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jan. AHRQ Methods for Effective Health Care.] Setting criteria for selecting study participants and creating comparable comparison groups are key to a successful study.

Example: Five thousand people with breast cancer were asked to keep food journals for six years following their diagnosis. Those eating the most vegetables were compared to a similar set of people eating the fewest vegetables.

Small to mid-sized prospective, experimental clinical studies

People (anywhere from a handful to a few hundred) are divided into two or more groups, either matched to be as similar as possible or assigned at random. Studies that assign people to groups at random are called randomized controlled trials, or RCTs, and are considered the “gold standard” in research.[2: Misra S. Randomized double blind placebo control studies, the “Gold Standard” in intervention based studies. Indian Journal of Sexually Transmitted Diseases. 2012 Jul;33(2):131-4.] Researchers assume that randomly assigning people either to a group that gets a treatment or to a group that doesn’t (called the control group) evens out any factors, other than the treatment itself, that might influence outcomes. The larger the groups are, the more effective randomization is at creating comparable groups.

In this study design, one group receives the therapy or participates in the practice of interest to the researchers. The other group receives standard care or no treatment, perhaps in the form of a placebo (a pill, medicine, or procedure thought to be both harmless and ineffective, prescribed for the psychological benefit to the patient or as a sham treatment in a study to allow a comparison to a therapy of interest). The health outcomes of the groups are compared. Sometimes the groups switch treatments after a period to further determine whether the health outcome is due to the treatment or to differences in the people.

In what is called a “blinded” study, people do not know which treatment they are receiving. A blinded study is a stronger study design than an unblinded study, in which practitioners or even the people in the study know whether they are receiving treatment. Blinding is easy to accomplish if the treatment of interest is a pill, as placebo pills are easy to create. If you’re trying to study something like yoga or a vegetarian diet or surgery, blinding is much more difficult. People are pretty good at figuring out whether they are doing yoga or eating meat, and performing a sham surgery could be considered unethical. As a result, sometimes the comparison (control) treatment is not sufficiently well designed for researchers to draw sound conclusions about the effects of treatment.[3: Hilal T, Sonbol MB, Prasad V. Analysis of control arm quality in randomized clinical trials leading to anticancer drug approval by the US Food and Drug Administration. JAMA Oncology. 2019 May 2.]

Knowing whether you are getting a therapy in a study can affect your expectations, motivation, and even your behavior. One effect that has been studied and observed in clinical trials occurs when participants believe they were given the inactive (placebo) pill. They may give up, get angry, stop cooperating, or try to sabotage the study. Or they may try much harder to succeed, wanting to prove they don’t need a fancy pill to beat an opponent who already has a large advantage. A participant who simply dislikes the researcher may also misbehave. This cluster of behaviors is known as the “screw you” effect, and it can dramatically change study outcomes.[4: Inglis-Arkell E. The “screw you” effect and other perils of informed volunteers. Gizmodo. January 3, 2014. Viewed October 5, 2022.]

Blinding the health professionals interacting with the study participants can be important, too. Health professionals can knowingly or unknowingly treat people differently or communicate expectations about outcomes to study participants if they know which treatment the participants are getting. This can change the study outcomes. A study in which neither the participants nor the health professionals interacting with or assessing them know who is getting the active treatment is called a double-blind study and is considered a very strong study design.

Large experimental clinical studies 

These are the same as small studies, only with many more people. Remember, the larger the groups are, the more effective randomization is at creating comparable groups. With larger groups, we have more confidence that any differences in health outcomes between the treatment group and the control group aren’t due to some difference between groups that we don’t have control over. For example, in a large study in which people are randomly assigned to groups, we’d expect about the same number of people in each group to be left-handed or near-sighted. These characteristics probably don’t have much bearing on cancer outcomes, but having diabetes or being over 65 or having chronic inflammation can affect your risk of cancer, and so having groups that are similar on these characteristics is important if researchers are trying to figure out if a therapy or practice has an effect on cancer risk. There may be many, many more characteristics that we don’t know about yet that are also related to cancer, and randomization is assumed to even those out between groups.
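
To make the point about group size concrete, here is a small simulation sketch in Python. It is a hypothetical illustration, not taken from any study: it assumes a single trait (say, being over 65) that 30% of simulated people have, randomly splits the people into two equal groups, and reports how different the two groups end up on that trait, averaged over many repeated randomizations. The function name `average_imbalance` and all numbers are made up for illustration.

```python
import random
import statistics


def average_imbalance(n_people, prevalence, n_trials=300, seed=0):
    """Average absolute gap in a trait's share between two randomized groups.

    Hypothetical simulation: each person has the trait with probability
    `prevalence`; people are then randomly split into two equal halves.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    gaps = []
    for _ in range(n_trials):
        people = [rng.random() < prevalence for _ in range(n_people)]
        rng.shuffle(people)  # random assignment: shuffle, then split in half
        half = n_people // 2
        treatment, control = people[:half], people[half:]
        share_t = sum(treatment) / len(treatment)
        share_c = sum(control) / len(control)
        gaps.append(abs(share_t - share_c))
    return statistics.mean(gaps)


for n in (20, 200, 2000):
    print(f"{n:>5} people: average gap {average_imbalance(n, prevalence=0.3):.3f}")
```

Under these assumptions, the typical gap between groups shrinks roughly in proportion to one over the square root of the group size, so groups of 1,000 tend to be far more alike than groups of 10.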

Sometimes the effect of a treatment seems modest for each participant, yet a statistically significant effect is found for the group as a whole. Sometimes only a small number of participants show any effect, but the treatment is considered a success for those people. In medical literature, the success of treatments and therapies is most commonly judged at the population level, not the individual level: a therapy is judged effective if enough people show enough of an effect, even though many individuals in the study may have shown no effect at all.
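
The gap between group-level and individual-level results can also be sketched with a simulation. This hypothetical Python example assumes that only 20% of treated participants respond, each improving by 10 points on some outcome scale, while everyone else gets no benefit; random noise is added to every score. It then compares group averages with a simple large-sample z statistic (a normal approximation, chosen here for simplicity rather than a full statistical test). The function name `simulate_trial` and all parameter values are invented for illustration.

```python
import math
import random
import statistics


def simulate_trial(n_per_group=2000, responder_rate=0.2, benefit=10.0,
                   noise_sd=10.0, seed=1):
    """Hypothetical trial in which only some treated people benefit.

    Returns (difference in group means, z statistic, share of treated
    people who actually responded).
    """
    rng = random.Random(seed)
    # Control group: outcome is pure noise around zero
    control = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
    treated = []
    responders = 0
    for _ in range(n_per_group):
        responds = rng.random() < responder_rate
        responders += responds  # True counts as 1
        # Non-responders get no benefit at all
        treated.append(rng.gauss(0.0, noise_sd) + (benefit if responds else 0.0))
    diff = statistics.mean(treated) - statistics.mean(control)
    # Standard error of the difference in means (unequal-variance form)
    se = math.sqrt(statistics.variance(treated) / n_per_group
                   + statistics.variance(control) / n_per_group)
    return diff, diff / se, responders / n_per_group


diff, z, share = simulate_trial()
print(f"Mean difference: {diff:.2f}, z = {z:.1f}, responders: {share:.0%}")
```

Under these assumptions the z statistic typically lands well above 1.96 (the usual cutoff for significance at the 5% level), so the therapy looks effective for the group, even though about four in five treated participants received no benefit.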

Meta-analysis of several studies

Researchers review two or more studies and statistically combine their results. The pooled analysis, which typically gives more weight to larger and more precise studies, behaves much like one bigger study. This approach can often detect subtler effects that were overlooked or dismissed in the individual studies. A meta-analysis may also help identify why smaller studies reached conflicting results.
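
One common way to combine studies is fixed-effect, inverse-variance pooling, in which each study’s result is weighted by its precision. The Python sketch below uses three made-up studies; the numbers and the function name `fixed_effect_pool` are hypothetical, and real meta-analyses often use random-effects models and additional checks for differences between studies.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Fixed-effect, inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Three hypothetical small studies: estimated benefit and its standard error
estimates = [1.2, 0.8, 1.0]
std_errors = [0.7, 0.6, 0.65]

for est, se in zip(estimates, std_errors):
    print(f"Study: {est:.2f} (95% CI {est - 1.96 * se:.2f} to {est + 1.96 * se:.2f})")

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print(f"Pooled: {pooled:.2f} (95% CI {pooled - 1.96 * pooled_se:.2f} to "
      f"{pooled + 1.96 * pooled_se:.2f})")
```

With these made-up numbers, each study’s 95% confidence interval includes zero (no effect), but the pooled interval does not, which mirrors how a meta-analysis can detect an effect that individual studies missed.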

Meta-analyses are only as strong as the studies they include. Pooling several poorly designed studies won’t lead to good evidence.

Clinical practice guidelines

A panel of medical researchers reviews all the evidence to date and assigns a therapy to a category of recommendation for a specific medical condition. The following example categories are based on clinical practice guidelines from the Society for Integrative Oncology:[5: Deng GE, Frenkel M, et al. Evidence-based clinical practice guidelines for integrative oncology: complementary therapies and botanicals. Journal of the Society for Integrative Oncology. 2009 Summer;7(3):85-120.]

Strong recommendation in favor of use: High- or moderate-quality evidence shows that benefits clearly outweigh risks and burdens.
Weak recommendation in favor of use: High- or moderate-quality evidence shows that benefits are balanced with risks and burdens; or weak (inconclusive or conflicting) evidence leaves uncertainty in estimates of benefits, risks, and burdens, and no clear advantage is shown over other options.
Weak recommendation against use: High- or moderate-quality evidence shows that risks and burdens are probably greater than benefits; or evidence shows no advantage compared with other options, while risks and burdens may be greater; or weak (inconclusive or conflicting) evidence leaves uncertainty in estimates of benefits, while risks and burdens are established.
Strong recommendation against use: High- or moderate-quality evidence shows that risks and burdens clearly outweigh benefits.

Even expert groups reach varying conclusions. One group may consider a set of evidence sufficient to recommend a therapy, while another views the same evidence as inconclusive. Groups have different standards, just as individuals do: some are more cautious about making recommendations and want a higher level of evidence than others. We encourage you to consider all the evidence and find your own level of comfort.



About the Author

Nancy Hepp, MS


Ms. Hepp is a researcher and communicator who has been writing and editing educational content on varied health topics for more than 20 years. She serves as lead researcher and writer for CancerChoices and also served as the first program manager. Her graduate work in research and cognitive psychology, her master’s degree in instructional design, and her certificate in web design have all guided her in writing and presenting information for a wide variety of audiences and uses. Nancy’s service as faculty development coordinator in the Department of Family Medicine at Wright State University also provided experience in medical research, plus insights into medical education and medical care from the professional’s perspective.
