P values are dumB

What are the alternatives to p values? Explain. I'm a brainlet.

Actually learning statistics like a man.

just report confidence intervals instead
they contain all the information a p-value does, are significantly less confusing to non-experts, and have the added benefit of exposing the fact that the variance of your estimator might be embarrassingly big.
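A minimal sketch of what that looks like in practice, assuming Python with numpy/scipy (the data are invented purely to show how an embarrassingly wide interval jumps out at you):

[code]
# minimal sketch: 95% t-based confidence interval for a sample mean
# (invented data; numpy/scipy assumed available)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=40, size=12)   # small, noisy sample

mean = x.mean()
sem = stats.sem(x)                          # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = [{lo:.1f}, {hi:.1f}]")  # a huge interval is hard to hide
[/code]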

>p values are dumB
no U
Lrn2probabilly, Billy

Confidence intervals are just as confusing. Tell me what a 95% confidence interval for the sample mean from 0.1 to 93.7 actually represents.
What you should actually be doing is using a linear model and reporting more than the most simplistic tests

Effect size.

95% likely that the population mean is between 0.1 - 93.7

easy shit, anon

MCMC hypothesis testing, apparently.
Just found out today.

>95% likely that the population mean is between 0.1 - 93.7
Only in a frequentist view

how the fuck are p-values confusing to non-experts?

>are significantly less confusing to non-experts
not by much. just about every explanation of confidence intervals i've seen outside statistics texts is wrong.

For an example of people getting it wrong, see:

only the left tail is interesting
t. risk management

CIs are way more confusing to me than p-values and I'm supposed to know this shit

They also don't really solve the underlying problems with leaning on p-values for interpretation and hypothesis testing. And they're prone to their own brand of wrong interpretations: two confidence intervals can overlap while the difference between the estimates is still statistically significant in a hypothesis test.

Incorrect, not that easy, anon
Bayesian confidence intervals have a different name

What’s wrong with what he said?

Bayes factor/likelihood ratio is the only correct answer.

he didn't say what a confidence interval means

>What’s wrong with what he said?
I'm not a "he".

Explains why you can't into a confidence interval

A 95% confidence interval means that if you carry out an experiment a theoretically infinite number of times and construct the interval around each experiment's sample mean in the same defined way, 95% of those intervals will contain the population mean.

user said that there's a 95% probability that the specific interval in this case overlaps the population mean, which is a different statement and is wrong. It either overlaps or it doesn't, assigning a probability is meaningless in this context.
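If it helps, here's a quick simulation of that "repeat the experiment many times" reading (normal data with a known µ; every number here is arbitrary):

[code]
# coverage check: ~95% of intervals built this way contain the fixed mean mu,
# but any single interval either contains it or it doesn't
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 47.0, 80.0, 10, 10_000

hits = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=stats.sem(x))
    hits += (lo <= mu <= hi)

print(hits / reps)   # comes out close to 0.95
[/code]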

Bait?

So, give a correct and intuitive explanation about what to do with a confidence interval of [0.1 ; 93.7] in practice. Is it nonsense to even calculate confidence intervals based on real data?

About >95% likely that the population mean is between 0.1 - 93.7:

The wording that he chose might at first sight imply that he hasn't understood the topic fully. There's a chance he actually hasn't. The population mean, let's call it µ, is a fixed, non-random number. It doesn't fluctuate. Looking narrowly at µ itself, it seems useless to apply the probability concept to a non-random number: µ is µ with 100% probability, full stop. On the other hand, the confidence interval (its lower and upper bound, as a vector) is a random variable. So the confidence interval is the thing that fluctuates, and the question should be "Does the ci cover µ? / Does the ci include µ?" and not "Does µ fall into the ci?", because µ doesn't fall anywhere, it's just a fixed number.

So much for the theory. What can I actually say about a ci of [0.1 ; 93.7] then? Well if you knew µ, you could say "[0.1 ; 93.7] does contain µ with 100% prob" or "[0.1 ; 93.7] does contain µ with 0% prob". But we don't know µ, that's why we are estimating it in the first place.

The concept of randomness is just there to model things we don't know for sure. Ignorance is in most cases based on missing information, either about the data or about the structure of the actual, physical model of the "random" experiment. If you believe the universe is deterministic, "randomness" is literally only there to model information that exists but that we just don't have. It might also be that we can't know some things in advance because they are genuinely random, which requires you to believe that we live in a probabilistic universe. Even if we do, a lot of things that us stupid humans treat as random are actually deterministic. (1/2)

(2/2)
You can roll a die, I am thinking about a number between 1 and 6 and I will keep this number in mind and won't change it. You roll the die and you roll a 4. Have you rolled the number that's in my mind?

In a world of complete information, you would know what all the neurons and electrons and so on in my brain are doing and you would know the number in my mind. You could say either "yes, I rolled the number that is in your mind" or "no, I didn't roll the number in your mind". But of course, you don't have all existing information available, you don't know the number in my mind, so the best you can do is to fall back to the concept of randomness and therefore the concept of probabilities. The best you can do in this situation is to say "There's a 1/6th chance that the number I rolled, 4, is the number in your mind".

It's the same with confidence intervals, that's why they are calculated in practice nevertheless. You can say "There's a 95% chance that the ci that I got here from this one random sample, so the interval [0.1 ; 93.7], contains µ".

So much about >It either overlaps or it doesn't, assigning a probability is meaningless in this context.

Generally, it is not even wrong to say
>95% likely that the population mean is between 0.1 - 93.7
because you can read it in a way that the 95% probability applies to the interval and not the population mean. But as I'm reading it again, I have to admit that this sentence would probably be interpreted wrongly by most and that he also most likely confused the fluctuation of µ with the fluctuation of the ci.

This is just silly. We each flip a coin but don’t look. What’s the probability that they match?

Similar questions are asked in thousands of classrooms and labs around the world each day. Nobody says it’s meaningless because we’ve already flipped.

>when the abstract has a p value but doesn't mention anything about effect size

Because non-experts rarely think about the fact that it's easy to just stop collecting data the moment the fluctuating p value dips below 5%.
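Rough sketch of how bad that gets, if anyone doubts it (peek after every new observation and stop the moment p dips under 0.05; H0 is true throughout and all the numbers are arbitrary):

[code]
# optional stopping: re-test after each observation, stop as soon as p < 0.05,
# even though the null (mean = 0) is true the whole time
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps, max_n = 1_000, 100
false_positives = 0

for _ in range(reps):
    x = rng.normal(0.0, 1.0, size=max_n)       # H0 is true: the mean really is 0
    for n in range(10, max_n + 1):
        if stats.ttest_1samp(x[:n], 0.0).pvalue < 0.05:   # "significant", stop and publish
            false_positives += 1
            break

print(false_positives / reps)   # well above the nominal 5%
[/code]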

You might as well throw away data points that don't fit your hypothesis or make up data points yourself. Either way, it's fraud

This is the problem with frequentist statistics. It does NOT represent probability, even though it really sounds like it should. It regularly confuses even experts. It was designed for a world without computers, where doing the mathematically ideal calculations was impossible.

The problem is you are just assuming an arbitrary prior. Usually a VERY convenient prior that gives your hypothesis a high probability to start with.

Let's say you have a hypothesis that "green jelly beans cause cancer". Let's give this hypothesis a prior probability of 1 in 100,000. Why? Because the vast majority of random assertions like that are wrong. If more than 1% of random foods caused cancer, we would all be dead by now. And if you really believed that the probability was higher than 1%, you would probably avoid eating them.

So a study is done and comes back with a p value of 0.05. Is there now a 5% chance that green jelly beans cause cancer? Let's do the calculation.

In 1 hypothetical world, green jelly beans cause cancer and in 99,999 hypothetical worlds they don't. Now in 5% of those 99,999 worlds, the same study comes back positive. And let's say it also does in the 1 jelly-beans-cause-cancer world. Now out of all the hypothetical worlds that have a positive study, 1 actually has jelly beans that cause cancer and about 5,000 don't. So the probability that you exist in the world where jelly beans actually cause cancer is about 1 in 5,000, i.e. 0.02%.

This is why possibly the majority of scientific research is wrong, and certainly the majority of research whose p values only just scrape under the threshold.

The correct thing to do is report Bayesian likelihood ratios. In my example above, that would be 20:1. That is, it increases the odds of the hypothesis by a factor of 20 (20 times 1:99,999 is 1:5,000, same as we calculated above.) It DOES NOT depend on any prior and does not pretend to be a probability. It also means you can chain together multiple studies easily just by multiplying their likelihood ratios together.
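The arithmetic in one place, if it helps (the 1-in-100,000 prior and the idealized "a real effect always shows up, 5% false positive rate" assumptions are just the ones from the post above):

[code]
# prior odds x likelihood ratio = posterior odds, with the numbers used above
worlds_true, worlds_false = 1, 99_999     # prior: 1 in 100,000 hypothetical worlds
p_pos_if_true = 1.0                       # idealized: a real effect always shows up
p_pos_if_false = 0.05                     # the 5% false-positive rate

likelihood_ratio = p_pos_if_true / p_pos_if_false           # 20.0
positives_true = worlds_true * p_pos_if_true                # 1 world
positives_false = worlds_false * p_pos_if_false             # ~5,000 worlds
posterior_prob = positives_true / (positives_true + positives_false)

print(likelihood_ratio)            # 20.0
print(positives_false)             # 4999.95, i.e. roughly 5,000
print(round(posterior_prob, 5))    # 0.0002, i.e. ~0.02%
[/code]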

Bayesian methods do not have this problem. You can stop whenever you want. Why should the math care about how you decide to stop an experiment? The same observations should lead to the same conclusions.
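For instance, with a conjugate Beta prior the posterior only ever sees the counts, not the stopping rule that produced them. A sketch, assuming a flat Beta(1,1) prior on the probability of heads:

[code]
# same data (7 heads, 3 tails) under two different stopping rules:
# "flip exactly 10 times" vs "flip until the 7th head". The binomial and
# negative-binomial likelihoods differ only by a constant factor in p,
# so with a Beta(1,1) prior the posterior is Beta(1+7, 1+3) either way.
from scipy import stats

heads, tails = 7, 3
posterior = stats.beta(1 + heads, 1 + tails)   # identical for both stopping rules

print(posterior.mean())    # ~0.667
print(posterior.sf(0.5))   # P(p_heads > 0.5 | data) ~ 0.887
[/code]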

>If more than 1% of random foods caused cancer, we would all be dead by now.
No, then we wouldn't eat those things and wouldn't call them food. Not coincidentally, we don't eat most things.

BUMP

OP, your question has already been answered

You wrote two posts without saying one goddamn thing beyond "he's right because I want to say he's right". It's impressive how bad you are at this.

There's quite a lot messed up here. I will just accept that we are talking about green jelly beans that give you real, obvious cancer with a probability of 100% in one universe (eat a single one and you immediately grow a giant tumor) and with an obvious 0% probability in the other universes. First of all, for the problem you created, we cannot do hypothesis testing. Just to repeat and slightly rephrase the given problem: you are saying that on average, if you take 100,000 parallel universes, there is expected to be only 1 where green jelly beans cause cancer. The problem is that we can't take measurements from parallel universes, we can just make the one observation in our universe of whether they cause cancer or not. So we have to work with n=1. It's impossible to estimate the variance of the random binary variable that gives a 1 when they cause cancer in a specific universe and a 0 when they don't. Without an estimated variance, no p-value.

But let us alter your problem to the hypothesis "1 in 100,000 green jelly beans cause cancer". Let Y be the random binary variable that is 1 if a random jelly bean we eat will give us cancer and is 0 if it doesn't.

H0: E(Y) = µ = 1/100,000
H1: µ =/= 1/100,000
Let's say our significance level is 10%, so we reject H0 whenever the two-sided p-value comes out below 10%.

The p-value comes out at 5%. That just means we can say that E(Y) is significantly different from 1/100,000 if we are working with a significance level of 10%. It means "we don't believe that E(Y) is 1/100,000, because if it actually were, there would only be a 5% chance of drawing a sample as extreme as the one we are seeing here". It doesn't say "there's a 5% chance that E(Y) is actually 1/100,000", and it certainly doesn't say "there's a 5% chance that green jelly beans cause cancer", which is what you are claiming.

> Now in 5% of those 99,999 worlds, the same study comes back positive.
is therefore wrong and everything that follows is wrong as well.
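For what it's worth, the test you would actually run on that reformulated H0 is just an exact binomial test. A sketch (the sample size and cancer count are invented purely to show the call):

[code]
# exact two-sided binomial test of H0: P(cancer | green jelly bean) = 1/100,000
# (counts below are made up for illustration)
from scipy.stats import binomtest

n_beans, n_cancers = 500_000, 12
result = binomtest(n_cancers, n_beans, p=1e-5, alternative='two-sided')
print(result.pvalue)   # compare against the chosen significance level
[/code]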

He said that the guy is wrong though. A confidence interval either contains the real mean or it doesn't.

If [math]r_{n,\alpha}[/math] is 1 if the [math]n[/math]-th [math]\alpha[/math]-confidence interval (e.g. [math]\alpha = 0.95[/math]) contains the true mean, and 0 otherwise, then [eqn]\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} r_{n,\alpha} = \alpha[/eqn]

So everything I was learning about statistics is garbage.
Where should I re-start?
Is Bayesian statistics the new paradigm in statistics?

No, it's not all garbage, but you should definitely learn Bayesian statistics, it's much more intuitive, and is quite interesting

>No, it's not all garbage, but you should definitely learn Bayesian statistics, it's much more intuitive, and is quite interesting

So how do you address the need for priors?

Something being intuitive doesn't make it more useful for actually doing science.

I feel like you're trying to read something from my post that just isn't there, but to answer your question, generally one presents itself, failing that I just make one up, or if I can't come up with a good prior I'll use a vague prior or two and perform a sensitivity analysis

You can't just stop as soon as you reach your desired conclusion, though. You'll still get biased results if you do that, even under a Bayesian analysis.

For example:
>Hypothesis: Outcomes of a coin flip are Bernoulli distributed with a probability of heads greater than 0.5
>Prior: Uniform(0,1)
>Flips coin twice
>Two heads
>We're good here guys, pack it up, no need to collect more data, hypothesis confirmed.
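For the record, here's what that update actually gives with a Uniform(0,1) = Beta(1,1) prior (nothing assumed beyond the example above):

[code]
# posterior after two heads, zero tails, starting from a Beta(1,1) prior:
# Beta(1+2, 1+0) = Beta(3, 1), which is still extremely spread out
from scipy import stats

posterior = stats.beta(1 + 2, 1 + 0)
print(posterior.sf(0.5))          # P(p_heads > 0.5 | data) = 0.875
print(posterior.interval(0.95))   # ~(0.29, 0.99): barely pins anything down
[/code]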

It's of particular importance to fields like medicine which will elicit changes in practice based on the interpretation of results, such as diagnostic testing. It is worth learning.

bait?
the population mean is not a random variable, the probability that it is between 0.1 and 93.7 is either 0 or 1.

>So how do you address the need for priors?
Usually you can make some good guesses based on your knowledge of the system and the type of data you're analyzing. Worst case you can just assume a uniform prior, like the Beta(1,1) parameterization of the beta distribution gives you.

Also, if you're using something like a beta prior that can look sort of like several different types of distributions, you can use some parameter-fitting on your control sample to find an ok prior.

Also parameter fitting is something you can/should do with any analysis regardless of the distribution in question, but it's especially useful if you have no clue what prior to use
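Something like this, as a rough sketch (the control-group proportions below are invented; it just shows the MLE fit):

[code]
# rough sketch: pick a Beta prior by fitting it to control-group rates
# (the control proportions here are made up)
import numpy as np
from scipy import stats

control_rates = np.array([0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.10, 0.14])

# fit only the two shape parameters; pin loc=0, scale=1 so the support stays (0, 1)
a, b, loc, scale = stats.beta.fit(control_rates, floc=0, fscale=1)
print(a, b)   # use Beta(a, b) as the prior for the new group's rate
[/code]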

>we are talking about green jelly beans that give you real, obvious cancer with a probability of 100%
I never stated that.

>we cannot do hypothesis testing
Yes, people do hypothesis testing on similar hypotheses every day. Look how many studies there are that are variations of "does coffee cause cancer" or "does diet coke cause diabetes". Ok fine, most of them are correlation studies, but they could be randomized experiments in principle.

>if you take 100,000 parallel universes
I never said anything about "parallel universes", which may or may not exist. I said "hypothetical worlds", which is a convenience that makes bayesian math easier to understand.

>"1 in 100,000 green jelly beans cause cancer"
No that is not the hypothesis that was being discussed. You've confused P(hypothesis) with P(cancer|jellybean). The percentage of jelly beans that cause cancer is irrelevant. All that's stated is some randomized experiment was done, and found a correlation between cancer and jellybeans. And it had a p value of 0.05. The exact correlation isn't stated and is irrelevant.

>The problem is that we can't take measurements from parallel universes... we have to work with n=1...
Alright I give up here. Your interpretation of my comment is so confused that I don't even know where to begin. I was just explaining the absolute basics of how to do a bayesian calculation. Please look up Bayes theorem and learn it.

>So how do you address the need for priors?

See . You can just calculate likelihood ratios which do not depend on a prior. In general bayesian methods often use an "uninformative prior", e.g. assuming a random value could be anything between 0 and a billion with uniform probability.

No that experiment is perfectly fine. With bayesian analysis of 2 data points, you'd get a very uncertain result as you should.

Bayesian methods don't care how you decide to stop. The observed data is all that matters.

That's exactly what a conf interval would tell you if you sampled and made one, brainlet

Why are bayesian statistics enthusiasts/evangelists always the biggest popsci pseudo mathematicians ever?

>the problem with frequentist statistics is that they assume arbitrary convenient priors.
>proceeds to pick the figure that 1 in 100k foods cause cancer out of nowhere

Why are these people such jokes? Every real statistician uses bayesian statistics, but they use it in avenues that make sense: time series, graph models. No one who does real work with bayesian statistics is such a dogmatic moron that they think pulling a prior figure out of nowhere is more authoritative than, or an improvement over, doing a hypothesis test.

These fucking popsci cretins have clearly never in their lives taken a mathematical statistics course, yet they have this total confidence, fueled by xkcd, in total strawman situations where a statistician would never behave that way.

How can they be so aggressively retarded?

thank you, I appreciate it

No , it isn't . Every statistician uses bayes when appropriate and frequentist statistics when appropriate. anyone telling you that bayes is always more appropriate and an improvement over frequentist statistic is an unbearable popsci faggot.

The fact that you bought into this nonsense so easily shows that you didn't really understand statistics at all.

Are you actually clinically retarded? I explained in my comment how to do bayesian statistics without priors. By reporting likelihood ratios. And then you complain about "hur dur muh priors". Fuck off back to frequentist land where you all use arbitrary priors anyway and pretend you don't.

>Every real statistician uses bayesian statistics,
The vast majority of science is not performed by "real statisticians". The number of actual scientists who misunderstand p values and confidence intervals is staggering. The Bayesian alternatives are strictly superior in every way. Likelihood ratios are vastly more intuitive and can't be confused for probability. Frequentism needs to die.

>faggot
Why the homophobia?

And I am giving up on you

>So a study is done and comes back with a p value of 0.05. Is there now a 5% chance that green jelly beans cause cancer?
No, because that's not how you interpret the p-value. Please look up hypothesis testing and the interpretation of the p-value and learn it (really just the basic concepts, doesn't matter if Bayesian or frequentist)

Brainlet here.
So I already did, and I still don't get what you mean by the real interpretation of p values

I understand the concepts. I was explaining them. P values do not represent probability, but evidence. And much weaker evidence than most people assume.

>P values do not represent probability, but evidence.
No. P values represent a portion of a probability distribution used to describe an experiment. They can be generated for any data and are agnostic to evidence.

Yes I'm aware of how it's calculated. The only reason anyone CARES about that calculation is because it happens to be a good estimate of the evidence for a hypothesis. If it wasn't, no one would use it and it would be pointless.

You are right, it's not a perfect estimate of bayesian evidence. But you are the one defending it for Christ's sake. I was just using it as an example of how weak it was even under ideal assumptions.

Not him but a p-value has a very specific meaning and I'm not sure that you understand it, one of the problems with p-values is that they are commonly misunderstood, even by educated folk

I bet you don't even know the difference between a p-value and the alpha of the significance level, because choosing p = 5% to make a point is quite odd

Just how much can someone know all the technical terms but get all the concepts wrong?

>Yes I'm aware of how it's calculated.
Then you should know your statement was wrong. P-values are entirely about probability. Evidence is something the researcher infers from data.

>But you are the one defending it for Christ's sake.
I'm also not that guy going hard on Bayesianism.

So, what is the difference?

P values indicate the significance of a relationship when data sets are compared. Below a certain threshold (I think .05) the result is deemed statistically significant, and any nonnegative value smaller than that is just stronger statistical evidence of correlation. So I don't really see the point in there being an alternative. You just have to understand what it means

It's .05 for real sciences. Higher for non-science fields like sociology.

>(I think .05)
why are you speaking authoritatively on something you admittedly know little about?

To be fair, he's got most of it

What exactly are you not understanding? Do you need me to walk you through it step by step again?

My problem is specifically with p value thresholds and the concept of "statistical significance". In a proper bayesian mindset there is no such thing as statistical significance. There are only varying degrees of evidence which can be stronger or weaker. Setting a threshold just leads to Bad Things: p hacking, extreme publication bias, and the widespread false belief that because a correlation or a study is "significant", it is likely to be real. We now have entire fields of science that are based on nothing but publication bias and random fluctuations in data.

Feel free to walk us through this tripe again

honestly does psychology have any value at all

>What exactly are you not understanding?
Why you don't realize that you are at Mount Stupid (pic related). Why you don't stop defending your current understanding of the topic for days on end and start questioning it instead. Why you aren't reading the definition of the p-value again carefully somewhere (e.g. in this thread: , ) to check whether your current understanding is correct.

And I'm not convinced YOU understand it. You've not made a single argument against me, just vague ad hominems about how I don't understand, with nothing to back them up. If I'm so wrong, show me where. Put me to shame.

P values calculate the probability of the null hypothesis producing a result as extreme or more extreme than the observed result. Is that not correct?
Converting p values into bayesian evidence is a bit of a rough process that requires some assumptions. But even if you make ideal assumptions it doesn't come out well.
That is, assume that in every world where the hypothesis being considered is true, the observed result is produced. And in every other world, it has a 5% chance of being produced just randomly.
Do the math and you get a likelihood ratio of 20:1.
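Concretely, that fixed 20:1 likelihood ratio moves different priors to very different places, which is the whole point (the priors below are arbitrary):

[code]
# one likelihood ratio, several priors: the evidence is fixed,
# the resulting posterior probability is not (priors chosen arbitrarily)
likelihood_ratio = 1.0 / 0.05    # = 20, under the idealized assumptions above

for prior_odds in (1.0, 1 / 100, 1 / 100_000):
    post_odds = prior_odds * likelihood_ratio
    post_prob = post_odds / (1 + post_odds)
    print(f"prior odds {prior_odds:g} -> posterior probability {post_prob:.4f}")
[/code]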

It's also funny you mention Dunning-Kruger, because it's one of the many studies that hasn't replicated well (and even the original study showed nothing like the graph you pulled out of your ass.)

One user has already given up on you. Your wording still isn't great, but you've finally got to a definition of a p-value that's mostly correct.
I'm tired of this thread so I'm giving up on you too

>There are only varying degrees of evidence which can be stronger or weaker.
"Stronger or weaker" is a judgement call made by humans in the interpretation phase of data analysis. P-values feed into that interpretation phase but contain no value information themselves. That's true for Bayesian and frequentist statistics alike.