Data dredging and data-snooping bias

Data dredging and data-snooping bias can occur when researchers either do not form a hypothesis in advance or narrow the data used in order to reduce the probability that the sample refutes a specific hypothesis. Although data-snooping bias can occur in any field that uses data mining, it is of particular concern in finance and medical research, both of which rely heavily on data mining. Data mining involves automatically testing huge numbers of hypotheses about a single data set by exhaustively searching for combinations of variables that might show a correlation. Conventional tests of statistical significance are based on the probability that an observation arose by chance, and necessarily accept some risk of a mistaken result; that accepted risk is called the significance level. When large numbers of tests are performed, some produce false positives of this kind: by chance alone, about 5% of hypotheses with no real effect behind them turn out to be significant at the 5% level, about 1% at the 1% level, and so on. A comic example (imgs.xkcd/comics/significant.png) illustrates this multiple comparisons hazard in data dredging: one jelly bean color out of many tested appears linked to acne, yet there is no overall effect of jelly beans on acne. Also, subgroups are sometimes explored without alerting the reader to the number of questions at issue, which can lead to misinformed conclusions.[1] en.wikipedia.org/wiki/Data_dredging
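The multiple comparisons hazard described above can be checked with a small simulation. The following is a minimal sketch, not taken from the quoted article: it assumes independent tests, two groups drawn from the same normal population with known unit variance, and a simple two-sided z-test. All function names and parameters (`null_test_p_value`, `false_positive_rate`, `family_wise_error_rate`, the group size `n`) are hypothetical and chosen only for illustration.

```python
import math
import random


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_test_p_value(rng, n=30):
    """Compare two groups drawn from the same N(0, 1) population.

    There is no real effect here, so any "significant" result is a
    false positive. Assumption: the variance is known to be 1.
    """
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2 / n)  # standard error of the difference in means
    return two_sided_p_value(z)


def false_positive_rate(num_tests, alpha, seed=0):
    """Fraction of no-effect tests that come out significant at level alpha."""
    rng = random.Random(seed)
    hits = sum(null_test_p_value(rng) < alpha for _ in range(num_tests))
    return hits / num_tests


def family_wise_error_rate(num_tests, alpha):
    """Chance of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    print("share significant at 5%:", false_positive_rate(4000, 0.05))
    print("P(at least one of 20):", family_wise_error_rate(20, 0.05))
```

Under these assumptions the simulated share of significant results should land near 5%, while the chance that at least one of 20 independent tests looks significant at the 5% level is about 0.64. That is why testing many hypotheses and reporting only the one that "worked" is misleading.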
Posted on: Fri, 07 Jun 2013 18:40:53 +0000
