as a PhD statistician, can confirm
we already can analyze Veeky Forums
I don't understand statistics
I feel like everyone doing a science can benefit from a double major in statistics.
Double major in math helps too, but probably more for theoretical aspects, which may not apply directly to your field (might be good trivia).
My b
>1000 people CAN be used to generalize. As long as the sample is randomly selected, and therefore, representative of the population.
This seems wrong on its face
If within the 325 million people there are (say) 2,000 "types" of people, then you physically can't get a representative sample out of only 1000.
Being representative is not a property of a single sample, but of the algorithm that carries out the sampling.
A single representative sample doesn't have to capture everything, but you need the sampling algorithm to capture everything if done repeatedly.
For a simple toy example, suppose I hand you a coin and tell you that it's loaded so it only turns up heads 10% of the time. To test this claim, you flip the coin 10 times.
This constitutes your sample: 10 observations from the infinite population consisting of all times that the coin is flipped.
Now, how many "ways" are there to flip that coin? Arguably infinitely many, depending on the number of revolutions it makes in the air, the amount of force applied, the height at which you flip it, etc. In this sense, no finite sample will ever "truly represent" the infinite population.
But whether you agree with this perspective or not, it's impractical to say that flipping a coin 10 times, or any finite number of times, will never tell you anything about whether the coin is loaded or not.
Analogously, no sample of the American population (short of a full census) will allow you to generalize with 100% certainty. But if you're willing to compromise on the 100%, statistics can quantify the level of certainty you're allowed to have, based on details of the sampling algorithm (e.g., number of observations, underlying statistical model, presence of sampling biases etc.).
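To make the coin example concrete: here's a quick sketch of how you'd quantify that certainty. Suppose you flip the coin 10 times and see 5 heads. If the "heads only 10% of the time" claim were true, how surprising would that be? The numbers (10 flips, 5 heads) are just illustrative:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k heads in n flips of a coin with heads-probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Claim under test: the coin shows heads only 10% of the time (p = 0.1).
# Observed: 5 heads in 10 flips. Probability of seeing 5 or more heads
# if the claim is true:
p_value = sum(binom_pmf(k, 10, 0.1) for k in range(5, 11))
print(f"P(>= 5 heads in 10 flips | p = 0.1) = {p_value:.6f}")  # ~0.0016
```

That probability is well under 1%, so 10 flips, while never proving anything with 100% certainty, are already enough to cast serious doubt on the claim.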
>random
>representative
here are your two spooks
>>As a statistician, you predict the future essentially and learn how to handle uncertainty and real numbers - essentially a magician.
only rationalists believe that statistics brings any kind of knowledge, or that this knowledge somehow corrects "what is sensed"
It's like this: if you randomly pick 1000 people, what are the chances that so many of the people you've chosen all come from one select group?
Not sure I ever saw a questionnaire that tried to sort the sample in to 2,000 possible categories per question, though.
Survey documents typically offer a few choices, and then may break some of them down into a few sub-choices.
But frankly, nobody cares about something only 1 out of 2,000 people thinks. If that few people think a certain way about a certain thing, who cares?
Of course, you're right that one sample can never "truly" capture the population. Although astronomically unlikely, it is still a logical possibility that all 324,999,000 people not included in the survey prefer apples to oranges. Same with confidence intervals. Sure, you can say that the true proportion is in the confidence interval 99.9% of the time (say), but you can never actually know whether the confidence interval of your original sample contains the true proportion unless you give the survey to all 325 million people. The reason statistics "works" is because we don't expect it to be 100% precise 100% of the time. It's not a crystal ball that can read the minds of millions of people or predict the future (remember the presidential election?).
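The "95% of the time" reading of a confidence interval is easy to check by simulation. The sketch below uses a made-up true proportion of 0.6 and the standard normal-approximation interval; the point is that the interval catches the truth in about 95% of repeated samples, but you never know whether any particular interval did:

```python
import random
from math import sqrt

random.seed(0)
TRUE_P = 0.6        # hypothetical true share preferring apples (assumption)
N, TRIALS = 1000, 2000

covered = 0
for _ in range(TRIALS):
    hits = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = hits / N
    # 95% normal-approximation confidence interval for a proportion
    half = 1.96 * sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - half <= TRUE_P <= p_hat + half:
        covered += 1

print(f"coverage over {TRIALS} samples: {covered / TRIALS:.3f}")
```

The printed coverage hovers around 0.95, not exactly at it, which is precisely the "works most of the time, guaranteed none of the time" trade-off described above.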
Flip a coin once: it may come up tails, a 100% tails rate.
Flip again: tails again, 2/2, still a 100% rate.
Flip a third time: it comes up heads, so the tails rate drops to 2/3, about 67%.
If you flipped the coin a thousand times, the observed rate would end up close to the true 50% (for a fair coin).
That's how it works with polling too.
A single poll might pin the answer down to within a few points.
Pooling a second poll tightens that margin, and a third tightens it further, with diminishing returns: the margin of error shrinks roughly like 1/sqrt(n).