In research, the effectiveness of a treatment modality is often put in terms of whether or not it is statistically significant. What does statistically significant mean? Let’s take a look.
Most research design starts with what is called the “null hypothesis”. The null hypothesis states that the independent variable (for example, a treatment such as alcohol counseling) had no effect on the dependent variable (for example, the recidivism rate). Using legal terms, there is a presumption that the treatment will not help. If that presumption cannot be overcome by statistically significant data, the treatment is considered ineffective. The burden of proof is on the proponent of the treatment modality being studied. The proponent must prove, by statistically significant evidence, that the treatment modality is effective.
For a treatment modality to be considered effective, the data must show that the treatment (alcohol counseling) affected the condition being treated (recidivism rate) and that the effect was statistically significant. The recidivism rate for individuals with treatment minus the recidivism rate for individuals with no treatment must be negative (meaning the rate has declined), and the size of the decline must make it unlikely that it is due to chance alone.
In science, depending on the required rigor of the study, most effects are considered statistically significant if the probability of seeing a difference at least that large through chance alone is less than 5% or less than 1% (the p-level). For example, one could have a study that showed a reduction of the recidivism rate of 20%, but because of the size and variability of the sample (and other factors), the p-level may be 40%. In other words, if the treatment had no effect at all, chance alone would still produce a difference this large about 40% of the time. One would not have much confidence in the result of such a study.
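To see how sample size drives the p-level, here is a minimal sketch of a pooled two-proportion z-test. The numbers are hypothetical and not from any real study: a recidivism rate of 50% without treatment and 30% with treatment (a drop of 20 percentage points), measured first on groups of 10 and then on groups of 1,000. The function name is made up for illustration, and the normal approximation it relies on is rough for groups as small as 10.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test (normal approximation)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Under the null hypothesis both groups share one underlying rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Probability of a standard normal value at least as far from 0 as z.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: re-offenders out of group size, untreated vs. treated.
small_study = two_proportion_p_value(5, 10, 3, 10)
large_study = two_proportion_p_value(500, 1000, 300, 1000)

print(f"10 per group:    p = {small_study:.3f}")
print(f"1,000 per group: p = {large_study:.2e}")
```

With 10 people per group the p-level comes out well above 5% (roughly 0.36), so the same 20-point drop that looks impressive on paper would not count as statistically significant. With 1,000 per group the p-level falls far below 1%.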
Flip a coin 5 times and if you get 4 heads and 1 tail, there is a difference of 60 percentage points in the results (80% heads minus 20% tails). With so few flips, a difference that large can easily be the result of chance alone rather than the construction of the coin. The difference is not statistically significant. If you flipped the coin 1,000 times and got the same proportions (800 heads and 200 tails), it is far more likely that you have evidence of a trick coin. A larger sample size increases the chance that a given difference will be found statistically significant.
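The coin example can be checked directly. The sketch below computes an exact two-sided p-level for a fair coin: the probability of getting a split at least as lopsided as the one observed. The function name is hypothetical, and the 800-heads case simply assumes the same 80/20 split scaled up to 1,000 flips.

```python
from math import comb


def two_sided_binomial_p(heads, flips):
    """Exact probability that a fair coin gives a split at least this lopsided."""
    distance = abs(heads - flips / 2)
    # Count every outcome whose head count is at least as far from an even split.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips


p_five_flips = two_sided_binomial_p(4, 5)
p_thousand_flips = two_sided_binomial_p(800, 1000)

print(p_five_flips)       # 0.375
print(p_thousand_flips)   # vanishingly small
```

Four heads in five flips gives a p-level of 0.375 (12 of the 32 possible outcomes are at least that lopsided), nowhere near the 5% threshold. The same proportions over 1,000 flips give a p-level so small that chance alone is not a credible explanation.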
Something may be statistically significant, but practically insignificant. For example, imagine a study of a drug for alcoholism involving five million people. The study may show a reduction in alcohol consumption of one tenth of one percent. Because of the size of the sample, this reduction may be statistically significant. But in the practical world of people attempting to treat alcohol-dependent individuals, this reduction is of no practical value. It is statistically significant, but practically insignificant.
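A rough sketch of that situation, with made-up numbers: five million people split evenly into two groups, an average consumption of 100 units in the control group, 99.9 units in the treated group (one tenth of one percent lower), and a standard deviation of 20 units in both. None of these figures come from a real study, and the function name is hypothetical. The code reports the p-level alongside a standardized effect size (Cohen's d), which measures how big the difference is relative to ordinary person-to-person variation.

```python
import math


def z_test_and_effect_size(mean_control, mean_treated, std_dev, n_per_group):
    """Two-sided z-test p-value and Cohen's d for two equal-sized groups."""
    difference = mean_control - mean_treated
    # Standard error of the difference between two group means.
    std_err = std_dev * math.sqrt(2 / n_per_group)
    z = difference / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Effect size: the difference expressed in standard deviations.
    cohens_d = difference / std_dev
    return p_value, cohens_d


# Hypothetical study: 2.5 million people per group, 0.1% lower consumption.
p_value, cohens_d = z_test_and_effect_size(100.0, 99.9, 20.0, 2_500_000)

print(f"p-level:   {p_value:.2e}")
print(f"Cohen's d: {cohens_d:.4f}")
```

Under these assumptions the p-level is far below 1%, so the result is statistically significant, yet the effect size is about 0.005 standard deviations, which is far too small to matter in practice.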
You could write a book about the abuse of the phrase "significant difference" in the media. Corporations issue press releases touting the "significant improvement" of their medication over others treating gout, cancer, baldness, etc., and the press blithely passes it along to consumers without investigating. Had they looked, they would have discovered that the new meds are often only marginally better than existing treatments (perhaps two or three percent more effective at two or three times the cost), but the difference is "statistically significant", and therefore reportable. Good job, Steven, explaining the nuances.