Bayesian statistics

Can someone redpill me on the practical uses of the Bayesian interpretation of probability, compared to the classical one?
How does a Bayesian solve a statistics problem that is usually handled by, say, hypothesis testing? Which problems if any are easier to tackle?

I can understand Bayes' theorem, and the idea of using evidence to update priors. I can grant that, for instance, you might want to view the probability of a coin landing on heads vs tails as a random variable parametrized by its history of past outcomes, as opposed to an intrinsic property (i.e. a fixed but unknown parameter) of the coin. But I don't quite feel I "get the point" of the Bayesian approach yet.

Well, the wikipedia page has a nice example. The Bayesian approach is used in medical differential diagnosis, as well as in some remote sensing applications and climate science. Basically, it applies whenever you come in with some prior information, because you're dealing with multiple measurements that each carry their own uncertainties.
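To make the diagnosis use case concrete, here's a toy calculation in python (all numbers made up for illustration, not taken from any wikipedia article):

```python
# Toy differential-diagnosis example (hypothetical numbers): disease
# prevalence 1%, test sensitivity 99%, false-positive rate 5%.
# Question: P(disease | positive test)?

prevalence = 0.01          # P(D), the prior
sensitivity = 0.99         # P(+ | D)
false_positive = 0.05      # P(+ | not D)

# Law of total probability: overall chance of a positive result
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' theorem: P(D | +) = P(+ | D) * P(D) / P(+)
posterior = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {posterior:.3f}")   # about 0.167
```

Despite the 99% sensitive test, a positive result only gets you to about a 1-in-6 chance of disease, because the prior (the low prevalence) dominates.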

Which wikipedia page? There's like six of them.

I wrote an undergraduate essay on bayesian inference versus frequentist (null-hypothesis significance testing) inference.

One of the fundamental differences between the two paradigms is that for the bayesian, hypotheses can have probabilities attached to them (for most bayesians, probability is equated with rational credence, though this philosophy of probability is independent of bayesian inference per se), whereas for the frequentist, hypotheses cannot be given probabilities, instead hypotheses are rejected or accepted. The latter is in line with a Popperian view of science: scientific theories can only ever be falsified, never accepted per se.

As such, one strength (it is claimed) is that bayesian inference allows us to attach credences to hypotheses themselves, whereas the frequentist cannot. That the bayesian can is precisely due to Bayes' theorem, which lets us calculate the posterior probability of a hypothesis given the evidence. In contrast, the frequentist only ever looks at the value "p = P(E, or something more extreme than E | H0)", that is, the probability of obtaining the evidence or something more extreme than it, under the assumption of the null hypothesis.

Another thing bayesians claim is that, unlike for the frequentist, non-actual outcomes do NOT factor into the decisions of a bayesian. That is, the bayesian is only concerned with their observations, and how those observations affect the probability of a hypothesis. A frequentist, on the other hand, IS concerned with non-actual outcomes: to calculate a p-value, one must consider situations that are "more extreme" than what actually happened under the null hypothesis. This requires that the experimenter decide on their outcome space before the experiment happens. It is claimed that this gives the frequentist problems, in that the "intentions of the experimenter affect the outcome" (see: optional stopping rules).

Bayesian probabilities update continuously. They're subjective, and they're iterative.

Classical probability assumes every chance mechanism operates like fair dice or the flip of a fair coin. It assumes there really is a 1/36 chance of snake eyes, for example. It fails when we do not fully understand the mechanism of the problem we're approaching and so cannot justify those assumed probabilities.

Frequentist probability runs a trial and then assigns classical probabilities based on the results of that trial as a whole.

Bayesian probability can (via the theorem) adjust its priors, and therefore its assumptions, after every single roll. This is tedious, but provides an ongoing statistical analysis that can adjust as situations change.
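For a coin, that roll-by-roll adjustment is especially tidy with a conjugate Beta prior; a minimal python sketch (uniform prior assumed, flip data made up):

```python
# Per-observation Bayesian updating for a coin, with a Beta(a, b) prior
# on the heads probability. Beta is conjugate to Bernoulli, so each flip
# is a one-line update: heads -> a += 1, tails -> b += 1.

def update(a, b, flip):
    """Return the Beta posterior parameters after one flip (1 = heads)."""
    return (a + 1, b) if flip == 1 else (a, b + 1)

a, b = 1, 1                                # Beta(1, 1): uniform prior
for flip in [1, 1, 0, 1, 1, 0, 1, 1]:      # 6 heads, 2 tails
    a, b = update(a, b, flip)              # posterior after each roll

# Posterior mean estimate of P(heads)
print(f"Beta({a}, {b}), posterior mean = {a / (a + b):.3f}")
```

After 6 heads and 2 tails the posterior is Beta(7, 3) with mean 0.7, and you could have stopped to read off a full distribution over P(heads) after any individual flip.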

cont.

To come to a slightly separate point however: it is not true that only bayesians can incorporate prior information. Frequentists incorporate prior information too, but it is done implicitly, in the choice of statistical model. So in practice, frequentist and bayesian methods borrow from each other (see Andrew Gelman and Cosma Shalizi's paper, "Philosophy and the practice of Bayesian statistics").

Frequentists have retorts to most of these objections, but to summarise: the fundamental difference between frequentist and bayesian inference is that the former operates under a Popperian philosophy of science, which allows only for the falsification of scientific theories rather than the assignment of affirmative probabilities to hypotheses. (Disclaimer: bayesians have ways of accommodating a Popperian framework, by employing so-called Bayes factors, the ratio of the likelihoods of the evidence under the null and alternative hypotheses.) The frequentist method, though, is designed outright to serve a falsificationist framework of science.

This is very informative, thank you. I will definitely check out that paper of Gelman's, if I can get my hands on it.

One point of contention I have is that you can perform multiple hypothesis tests with different null hypotheses and try to reject each in turn. We even have the result that 100p% confidence intervals are precisely the ranges of values for which an appropriate hypothesis test would fail to reject the null hypothesis using 1-p as the significance threshold, and vice versa.
This doesn't attach specific credences to hypotheses, sure, but it still gives you a wand you can wave over hypotheses to tell which ones are plausible and which aren't, much like checking which of your hypotheses have a sufficiently large probability. Which makes me wonder if there's some similar grander correspondence between frequentist and Bayesian statistics, akin to the CI--hypothesis testing correspondence.
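That CI/test correspondence is easy to check numerically; here's a sketch for a two-sided z-test with known sigma (the data values are illustrative, not from the thread):

```python
import math

# Numerical check of the CI/test duality: for a z-test with known sigma,
# the 95% CI is exactly the set of mu0 that a two-sided test at
# alpha = 0.05 fails to reject.

xbar, sigma, n = 10.3, 2.0, 25
z = 1.96                                  # ~97.5% standard-normal quantile
half_width = z * sigma / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)

def rejects(mu0):
    """Two-sided z-test of H0: mu = mu0 at alpha = 0.05."""
    return abs(xbar - mu0) / (sigma / math.sqrt(n)) > z

# mu0 inside the CI is never rejected; mu0 outside always is.
for mu0 in [9.0, 9.6, 10.3, 11.0, 11.2]:
    inside = ci[0] <= mu0 <= ci[1]
    assert inside == (not rejects(mu0))
```

Both the interval and the test compare the same standardized distance to the same quantile, which is why the duality is exact here.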

I am a little puzzled by the deciding-on-the-outcome-space comment, because I feel like that's essentially the same thing as coming up with initial Bayesian priors. Much like a different test statistic changes the p-value, so would a different 'space' of possible (i.e. nonzero probability) hypotheses change how evidence updates the priors.

>This doesn't attach specific credences to hypotheses, sure, but it still gives you a wand you can wave over hypotheses to tell which ones are plausible and which aren't
Sure, this is why frequentist inference as originally formalized by Neyman-Pearson, is more a decision procedure than it is an attachment of probabilities. We obtain our confidence interval, and then decide to reject the null hypothesis if our confidence interval does not contain the value for the null hypothesis.

Though I understand what your concern is: what you're getting at is that frequentist inference is a lot like a probabilistic version of the deduction rule modus tollens, that is:

1. If H0, then probably not-E
2. E
3. Therefore, probably not-H0

So line 1 corresponds to setting up the null hypothesis H0 and deciding the confidence interval, line 2 corresponds to observing the outcome, and line 3 corresponds to rejecting the null, which, as you've said, is sort of like waving a wand over the hypotheses and giving them some probability judgement. Now, a true frequentist might say that this is not what they explicitly endorse; instead we are just given that the null is rejected at significance level alpha. Though I agree that it seems to follow this intuition. As for whether there is a true duality at the heart of both paradigms: there could be, but I'm not sure myself.

>I am a little puzzled by the deciding-on-the-outcome-space comment, because I feel like that's essentially the same thing as coming up with initial Bayesian priors.
Yea, sort of, but the reason it's presented as an objection to frequentism is mostly that frequentists like to portray theirs as a purely objective statistical framework, as opposed to the bayesian needing "subjective priors" to do statistics. The claim, then, is that because frequentists rely on the experimenter to decide the outcome space, there is also a "subjective" element on their side (how subjective it really is, is of course up for debate).

I don't understand any of this. I saw Bayes theorem but it was just a theorem. Provable from the 3 axioms of probability. Why are you saying it is a philosophy?

There are two schools of thought in statistics, Bayesianism and frequentism. I honestly thought they were equivalent.

Isn't the p-value just like a conditional probability?

Okay, so Bayesian statistics has nothing to do with Bayes' theorem? It's just some philosophical spook that I'll never have to deal with in pure math, right?

The theorem itself is not a philosophy, of course; it is a mathematical fact that can be proven from the axioms. How we should use it in practice, however, is up for debate. For instance, suppose you have a hypothesis H and some evidence E. For the bayesian to calculate P(H | E), they must know the prior probability P(H). How is this determined? Is there an objective way to do it? If not, what other kinds of statistical inference can we do?

So a bayesian in philosophy of statistics/science is someone who believes that the application of bayes theorem (in so far as its ok to specify a prior probability) is rational (indeed rationally obligatory) in science. The challenges to it are that specifying the prior is necessarily subjective, and so on. The debate has since evolved quite a lot, though.

>So a bayesian in philosophy of statistics/science is someone who believes that the application of bayes theorem (in so far as its ok to specify a prior probability) is rational

If bayes theorem is a theorem, then why would someone debate using it? If someone debates the use of the theorem then aren't they literally debating the axioms? This makes no sense.

Honestly, I didn't even study the Bayesian side of things; I was just made aware that it exists. But I think it does have to do with Bayes' theorem extended to more variables.

The p-value is the conditional probability of the evidence (or something more extreme than it) under the null hypothesis. Bayesians do not use this, however; they focus on the conditional probability of the hypothesis given the evidence (and only the observed evidence). So the frequentists and bayesians are concerned with (almost) opposite conditional probabilities.
It's not a philosophical spook, it's to do with how statistics is done in practice. Bayesian statistics is a different method from frequentist statistics and sees popularity in computer science and other applications, whereas frequentist inference is still favoured in psychology and biology.
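To illustrate the "opposite conditionals" point with a toy coin example in python (all numbers made up; the bayesian side assumes a simple two-hypothesis comparison with a 50/50 prior, not a full prior over bias):

```python
from math import comb

# Same data, opposite conditionals: observe 8 heads in 10 tosses.
n, k = 10, 8

# Frequentist: two-sided p-value under H0 "fair coin" -- probability of
# a result at least as extreme as 8 heads (i.e. <= 2 or >= 8 heads).
p_value = sum(comb(n, i) for i in range(n + 1)
              if i <= n - k or i >= k) / 2**n

# Bayesian: posterior probability of H1 "biased, P(heads) = 0.8" versus
# H0 "fair", with equal 50/50 prior weight on the two hypotheses.
lik_h0 = comb(n, k) * 0.5**n                    # P(data | H0)
lik_h1 = comb(n, k) * 0.8**k * 0.2**(n - k)     # P(data | H1)
posterior_h1 = lik_h1 / (lik_h0 + lik_h1)       # P(H1 | data)

print(f"p-value = {p_value:.4f}, P(H1 | data) = {posterior_h1:.4f}")
```

The p-value conditions on H0 and integrates over unobserved extremes; the posterior conditions on exactly the observed data and assigns a probability to the hypothesis itself.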

But yes, none of this is relevant to the actual probability and measure theory behind it all; no one disagrees on that, so it doesn't affect pure mathematicians.

The question is if its possible to use this theorem effectively though. Again, to use bayes theorem you need to specify P(H), the probability of the hypothesis before you have ANY evidence. So, how do you do this?

Oh, ok I get it. When I used the theorem the problem already specified very specific probabilities. I guess that in the "real" "world" statistics is not just that. I think I understand you.

Yep, exactly right.

>frequentist inference as originally formalized by Neyman-Pearson, is more a decision procedure than it is an attachment of probabilities.
This. Bayesian statistics is a calculus of probability. Frequentist statistics is a set of procedures to make crisp decisions informed by probabilities.

>The question is if its possible to use this theorem effectively though,
Outside textbook examples, it usually is not feasible, which is why frequentist statistics exists. Bayesian statistics is the math of what is actually happening; frequentist statistics is a set of tools to approximate, in the real world, the reasoning that Bayesian statistics prescribes.

>Bayesian statistics is a calculus of probability. Frequentist statistics is a set of procedures to make crisp decisions informed by probabilities.
This is a very satisfying summary.

>redpill me

REDPILL.

It's kind of like saying, yeah, 95% of people who smoke cigarettes die of lung cancer, so if we set the bar at 5% we say it's statistically significant.

To be fair, both Bayesian and frequentist are basically the same thing written in a different way.

b-but muh priors

I'd just like to interject for a moment. What you’re referring to as Bayesian statistics, is in fact, nonsense, or as I’ve recently taken to calling it, the inverse probability.

The inverse probability is not a method unto itself, but rather another work of fiction made useful by the idiots who proclaim to know some prior probability; these idiots are in fact worse than those who reject the axiom of countable additivity and instead embrace the axiom of finite additivity.

You see, Bayes identified the problem, provided the solution and let the whole idea die with him as he too understood the inanity of the inverse probability. Laplace on the other hand was too arrogant and decided to pursue these trivialities.

The only question that you should ask yourself related to Bayesians is as follows. Bayesians: knaves or fools?

The frequentist approach is kind of not rigorous, but you can still get the same results using appropriate heuristics.

>instead hypotheses are rejected or accepted
You mean rejected or not rejected. Or was I lied to in Stat 101?

My mistake, you are correct: formally we do not accept hypotheses under a frequentist framework, as I allude to in the next sentence of that post.

Generally the probability of B keeps changing, so your model dynamically updates. Like searching for a crashed plane in the ocean: after you have searched a certain area and found nothing, the probability that the plane is in a specific region has changed, because you have eliminated an area. These models can be quite complex. Going back to the plane crash example, you start with a very complicated model incorporating ocean currents, and every time you search a certain area the whole model needs to be updated. I don't understand why people always use flowery language and try to make this into some philosophical statement. The probability of B keeps changing, so your model keeps being updated.
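The per-search update described in the plane example can be sketched in a few lines of python (toy numbers for the grid and the detection rate, not from any real search model):

```python
# Bayesian search sketch: a prior over 4 ocean cells, and a 90% chance
# of spotting the wreck if we search the cell it is actually in.

prior = [0.4, 0.3, 0.2, 0.1]      # P(plane in cell i)
p_detect = 0.9                    # P(find | searched the correct cell)

def search_and_miss(belief, cell):
    """Bayes update after searching `cell` and finding nothing."""
    # Likelihood of the miss: 1 - p_detect if the plane is in the
    # searched cell, 1 everywhere else; then renormalize.
    unnorm = [b * ((1 - p_detect) if i == cell else 1.0)
              for i, b in enumerate(belief)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

belief = search_and_miss(prior, 0)   # search the most likely cell, miss
print([round(b, 3) for b in belief])
```

After one failed search of cell 0, its probability collapses from 0.40 to about 0.06 and the remaining mass shifts to the other cells; a real model would additionally drift the belief with ocean currents between searches.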

>Can someone redpill me on the practical uses of the Bayesian interpretation of probability, compared to the classical one?

Speed.

Instead of calculation and measurement you use estimation and approximation.