Can "Bayesian" calculate "the probability that a studied hypothesis is true"?

I asked Prof. Wasserstein whether principle #2 in the ASA's p-value statement implicitly suggests that "Bayesian" methods can calculate "the probability that a studied hypothesis is true", and he kindly replied to my question.

 

This question came to me while I was watching debates on Japanese Twitter.

The following book by Hideki Toyoda was published in March 2020.

"Rescue Dying Statistics!: [Move] from Significance Tests to "Probability that a Hypothesis is True""

Asakura Shoten (朝倉書店) | 瀕死の統計学を救え! ―有意性検定から「仮説が正しい確率」へ―

Even before this book was published, Gen Kuroki (@genkuroki), Tokei-Chotto-wakaru-tan (@stattan), UnPainMan (@not_identified2), and others strongly criticized its main framework (and Toyoda's previous papers and books) on Japanese Twitter.

In this entry, I reproduce my question and Prof. Wasserstein's reply.

I thank Prof. Wasserstein for replying to my unsolicited e-mail and for giving me permission to publish his reply. I apologize in advance that I have not followed all of the discussions within the ASA or on Japanese Twitter. I also apologize for my poor English. Note that I sent the following question before the above book was published.

 

My e-mail to Prof. Wasserstein

Sent: Tuesday, February 4, 2020 9:04 AM
Subject: Question about p-value statement again: Can Bayesian get Pr(H0|D)?

 

Hello, Ron

I am Yusuke Ono at SAS Institute Japan. Last year I asked a question about "random chance" in the 2016 p-value statement. In this mail, I would like to ask your opinion or advice about "Bayesian" posterior probabilities for hypotheses. I am very sorry again for this long e-mail.


The following sentence appears in Wasserstein and Lazar (2016, p. 131):

"2. P-values do not measure the probability that the studied hypothesis is true, [...]"


Does this sentence implicitly suggest that a "Bayesian" posterior probability for the studied hypothesis is "the probability that the studied hypothesis is true"?

--- [Details and Background] ---
If I pick out explanations from textbooks, at least some of them say that "Bayesian" methods can calculate the probability that the studied hypothesis is true.

For example, Goodman (2008, p. 136) says the following:
"Let us suppose we flip a penny four times and observe four heads, two-sided P = .125. This does not mean that the probability of the coin being fair is only 12.5%. The only way we can calculate that probability is by Bayes' theorem, [...]"


Casella and Berger (2002, 2nd ed., p. 379) say the following (perhaps from a subjective Bayesian standpoint):
"In a hypothesis testing problem, the posterior distribution [in a Bayesian model] may be used to calculate the probabilities that H_0 and H_1 are true."


Casella and Berger (2002, 2nd ed., p. 436) also say the following:

"In contrast, the Bayesian setup allows us to say that \lambda is inside [.262, 1.184] with some probability, not 0 or 1."
(But they also warn on p. 436: "However, remember that nothing comes free. The ease of construction and interpretation comes with additional assumptions. The Bayesian model requires more input than the classical model.")
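
To illustrate what that "more input" looks like, here is a minimal sketch in Python of an equal-tailed Bayesian credible interval for a Poisson rate lambda. This is my own toy example, not Casella and Berger's; the Gamma(2, 2) prior and the data are entirely hypothetical.

```python
# A minimal sketch (toy example of my own): an equal-tailed 95% Bayesian
# credible interval for a Poisson rate lambda. The "additional input" is
# the prior; here a hypothetical Gamma(shape=2, rate=2) prior.

from scipy.stats import gamma

a, b = 2.0, 2.0          # hypothetical Gamma prior: shape a, rate b
data = [1, 0, 2, 1, 1]   # hypothetical Poisson observations

# The Gamma prior is conjugate to the Poisson likelihood:
# posterior is Gamma(a + sum(data), b + n)
a_post = a + sum(data)
b_post = b + len(data)

lo, hi = gamma.ppf([0.025, 0.975], a_post, scale=1.0 / b_post)
print(f"Pr({lo:.3f} <= lambda <= {hi:.3f} | data, prior) = 0.95")
```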


Good (1965, p. 8) says the following (perhaps from an empirical Bayesian standpoint):
"Several different kinds of Bayesians exist, but it seems to me that the essential defining property of a Bayesian is that he regards it as meaningful to talk about the probability P(H|E) of a hypothesis H, given evidence E."


Matthews (2019, pp. 205-206) says the following:
"However, they [CIs] are certainly not immune, with standard CIs often being interpreted as the range within which the true effect size lies with specified probability. This interpretation is valid only within a Bayesian framework, under which CIs become credible intervals with uninformative priors. By explicitly using the Bayesian framework, the use of AnCred obviates this interpretative issue."


And on the Internet, you can find even stronger claims if you search with the keywords ""the probability that H0 is true" Bayesian".


The background of this question is an argument on Twitter. A Japanese psychological statistician, Dr. Hideki Toyoda, will publish the following book in March.

"Rescue Dying Statistics!: [Move] from Significance Tests to "Probability that a Hypothesis is True""
Japanese title: 瀕死の統計学を救え! ―有意性検定から「仮説が正しい確率」へ―
[Publisher's page] http://www.asakura.co.jp/books/isbn/978-4-254-12255-8/

Although this book has not been published yet, several people (let me call them "Akaikians" just for convenience) are arguing on Twitter against this book and against his past books and articles. The "Akaikians" claim that "Bayesian" posterior probabilities (like p-values) are based on a specific model, and since we basically do not know whether the model is correct, we first need to admit that a "Bayesian" posterior probability (like a p-value) is just imaginary.
(The "Akaikians" also argue against subjective "Bayesian" perspectives, but let me skip that topic here.)

Your 2016 p-value statement clearly says that a p-value is Pr(T >= t | H0 and A), where A is a set of assumptions. But the statement makes no reference to "Bayesian" posterior probabilities, and it does not warn that a "Bayesian" posterior probability is not Pr(H0 | T = t) but rather Pr(H0 | T = t and A) (see the numerical sketch after the quotes below). This comment of mine may be the same as the following ones in Stark (2016, p. 1) and Greenland et al. (2016, p. 6), supplemental papers for the 2016 p-value statement.

---------------
"The "other approaches" section ignores the fact that the assumptions of some of those methods are identical to those p-values. Indeed, some of the methods use p-values as input (e.g., the False Discovery Rate)." (Stark, 2016, p.1)
-----------------

"It is possible to compute an interval that can be interpreted as having 95 % probability of containing the true value; nonetheless, such computations require not only the assumptions used to compute the confidence interval, but also further assumptions about the size of effects in the model. These further assumptions are summarized in what is called a prior distribution, and the resulting intervals are usually called Bayesian posterior (or credible) intervals to distinguish them from confidence intervals (e.g., see Rothman et al. 2008, Ch. 13 and 18)." (Greenland et al., 2016, p.6)
-----------------

Everyone must agree that <apples drawn in a picture aren't oranges>. But I think, to avoid being misleading, we should say that <apples drawn in a picture aren't oranges drawn in a picture>. (In this analogy, the "picture" is the assumed model A: the p-value and the posterior probability are both objects inside a model, not outside it.)
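
A small numerical sketch of this point, reusing the toy penny example from above (the priors and alternatives are again hypothetical choices of mine):

```python
# A minimal sketch: the same data (4 heads in 4 flips) yield different
# values of Pr(H0 | data, A) under different assumption sets A.

def posterior_fair(prior_fair: float, marg_lik_alt: float) -> float:
    """Pr(coin is fair | 4 heads in 4 flips), by Bayes' theorem."""
    lik_fair = 0.5 ** 4  # Pr(4 heads | theta = 0.5)
    return lik_fair * prior_fair / (
        lik_fair * prior_fair + marg_lik_alt * (1 - prior_fair)
    )

# A1: prior Pr(fair) = 0.5, alternative theta ~ Uniform(0, 1) => marginal lik. 1/5
print(posterior_fair(0.5, 1 / 5))  # ~0.238

# A2: prior Pr(fair) = 0.9, same alternative
print(posterior_fair(0.9, 1 / 5))  # ~0.738

# The data never changed; only the assumed model A did.
```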


--- [References] ---
Casella, G. and Berger, R. L. (2002). Statistical Inference (2nd ed.). Brooks/Cole.

Good, I. J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Press.

Goodman, S. (2008). A Dirty Dozen: Twelve P-Value Misconceptions. Seminars in Hematology, 45(3), 135-140.

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., and Altman, D. G. (2016). Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations. The American Statistician, 70(2), Online Supplement.

Matthews, R. A. J. (2019). Moving Towards the Post p<0.05 Era via the Analysis of Credibility. The American Statistician, 73(sup1), 202-212.

Stark, P. B. (2016). The Value of p-Values. The American Statistician, 70(2), Online Supplement.

Wasserstein, R. L. and Lazar, N. A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129-133.

Best Regards,

-
SAS Institute Japan
JMP Japan Group
Yusuke Ono (Mr.)
--- All opinions in this e-mail are my own; I alone am responsible for its contents.

 

Reply from Prof. Wasserstein

Date: 2020/3/20, Fri 05:26
Subject: RE: Question about p-value statement again: Can Bayesian get Pr(H0|D)?

 

I apologize for the lengthy delay in responding, Yusuke. We have been greatly distracted with other things.

My colleagues and I don't find anything to disagree with in what you have pointed out. We would add that principle 2 in the 2016 ASA Statement was put there NOT to drive people to Bayesian methods, but simply to point out that this P(H|D) interpretation is a common misconception, and we want people to stop making that error.

We are in complete agreement with you that it would be a shame for people to stop using NHST (which we are not fans of) but then make the same mistakes using Bayesian methods. Two mistakes involving p-values are (1) forgetting about all the other assumptions and (2) using thresholds (like p<0.05) to imbue powers to p-values (such as declaring things "significant") they don't really have. The same types of mistakes can be made using Bayesian methods or, I suppose, any similar framework. We've been criticized for not saying this explicitly in the 2016 ASA Statement, but our focus there was on the pervasive misuse of p-values. We do mention the concern about misusing other statistics in our 2019 editorial (https://amstat.tandfonline.com/doi/full/10.1080/00031305.2019.1583913#.XnPTpW5FxPY), section 2, sixth paragraph.

I hope this helps, and apologies again for the delay. And I hope you are well!
Ron