Help me out Veeky Forums. Does this look like a negative binomial distribution, Poisson distribution, or some other distribution? Why?


Kinda looks like Boltzmann doesn't it?

Why don't you do some parameter fitting and see how well you can make each distribution match?

Besides, on a philosophical level the question is kind of pointless. There are always going to be deviations between empirical data and a hypothetical distribution, even if you've chosen the correct model. The real question is: how much deviance are you willing to tolerate? If you choose negative binomial, are you willing to tolerate the error in your analysis?
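To make "how much deviance are you willing to tolerate" concrete, here's a minimal sketch of quantifying the misfit with a chi-square statistic. The samples are simulated stand-ins (the thread's actual data isn't shown), and the Poisson choice is just for illustration:

```python
# Sketch: quantify how much a fitted Poisson deviates from the data.
# `samples` is simulated here as a stand-in for the real dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.poisson(12, size=10_000)           # stand-in for the real data

mu = samples.mean()                              # MLE for the Poisson rate
ks = np.arange(samples.max() + 1)
expected = stats.poisson.pmf(ks, mu) * len(samples)
observed = np.bincount(samples, minlength=len(ks))

# Chi-square statistic over bins with enough expected counts
mask = expected >= 5
chi2 = ((observed[mask] - expected[mask]) ** 2 / expected[mask]).sum()
print(chi2)
```

You'd then compare `chi2` against a chi-square reference distribution (degrees of freedom = number of bins minus fitted parameters minus one) to decide whether the deviation is tolerable.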

It's discrete tho.

...discrete approximation of an offset Gamma?

I think it can't be a Poisson or negative binomial distribution because it's 0 at 0.
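Quick sanity check of that claim: both families put strictly positive mass at 0 for any parameters (Poisson gives e^(-λ), negative binomial gives p^n), so an empirical pmf that is exactly 0 at 0 rules them out unless you shift the support:

```python
# Both distributions always have positive probability at k = 0,
# so data with zero mass at 0 can't come from either (unshifted).
from scipy.stats import nbinom, poisson

print(poisson.pmf(0, 12))        # e^(-12), small but positive
print(nbinom.pmf(0, 10, 0.3))    # 0.3^10, small but positive
```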

I'm just trying to find a plausible theoretical distribution that is generating this empirical distribution.

That's too broad. You should think about what kind of process is producing your data and try to match that to a distribution, otherwise you'll probably just overfit.

What are you recording?

It's the stopping time (en.wikipedia.org/wiki/Stopping_time) for a stochastic process.

Then why would it be a discrete distribution? Are you parameterizing a stochastic process by a natural number and taking the mean of a bunch of stopping times?

Oh nevermind it's probably a discrete time process.

It's a discrete-time stochastic process.

Some attempts at fitting.

Forgot to mention, I used scipy.optimize.curve_fit.
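For anyone following along, a curve_fit call on an empirical pmf might look like the sketch below. The (k, p_k) points are synthetic here — generated from a known negative binomial so there's a ground truth — since the thread's actual pmf isn't shown:

```python
# Sketch: least-squares fit of a negative-binomial pmf to (k, p_k) points
# with scipy.optimize.curve_fit. p_emp is a synthetic stand-in pmf.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import nbinom

ks = np.arange(100)
p_emp = nbinom.pmf(ks, 10, 0.3)                  # stand-in empirical pmf

def nb_pmf(k, n, p):
    # curve_fit passes k as floats; round back to integers for the pmf
    return nbinom.pmf(np.round(k).astype(int), n, p)

(n_hat, p_hat), _ = curve_fit(nb_pmf, ks, p_emp, p0=[5, 0.5],
                              bounds=([0.1, 0.01], [200, 0.999]))
print(n_hat, p_hat)
```

Note this minimizes squared error on the pmf values, which is not the same thing as maximum likelihood on the underlying samples — fine for eyeballing fits, less so for formal inference.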

If you have Mathematica there are functions like
reference.wolfram.com/language/ref/FindDistribution.html
reference.wolfram.com/language/ref/FindDistributionParameters.html

If not detail the process and I can try running them for you.

I have Mathematica. Is there a version of these functions that takes in an empirical pmf/pdf (in this case, list of probabilities for each time value) rather than the data itself? I'm using ~1 million data points or more.

>How much deviance are you willing to tolerate
Furries are where I draw the line, personally.

It is a lognormal distribution. Mathematica has an excellent fit function for it.

Not that I know of. A million isn't that many, I'd just bite the bullet or maybe take a random sample and try the functions on that.
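If you do want to fit from the binned pmf directly rather than the raw million samples, exact MLE still works: weight the log-pmf of each value by its count. A sketch with a geometric stand-in (the real distribution is whatever generated the data):

```python
# Sketch: exact MLE from (value, count) pairs instead of raw samples,
# by weighting the log-pmf with the counts. Geometric stand-in data.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import geom

values = np.arange(1, 50)
counts = (1_000_000 * geom.pmf(values, 0.2)).astype(int)   # stand-in histogram

def nll(p):
    # negative log-likelihood of the binned data under Geometric(p)
    return -(counts * geom.logpmf(values, p)).sum()

res = minimize_scalar(nll, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)   # recovers roughly 0.2
```

This gives the same answer as running MLE on the raw samples, at a tiny fraction of the cost, so the binned representation loses nothing for discrete data.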

Not sure why the lognormal fit looks like that.

Fixed typo (model3 to model4) and removed the location and scale parameters from GammaDistribution. Looks much better now.

Upload samples.dat somewhere.

ufile.io/8xcui

I ran the stochastic process up to a maximum of 99 steps; any sample of the process that continued past that is labeled as -1.

Inverse Gaussian distribution looks like a pretty tight fit. The plot thickens.

Have you tried a Landau distribution?

Doesn't the Landau distribution have support on negative values, tho?

no idea. It looks like a photopeak efficiency curve for a scintillator that I've used before.

You could also try a*exp(b*x^2 + c*x + d*x^-1 + f*x^-2 + ...)

Here's what FindDistribution suggests in case you haven't already tried that.

Let me say though that the number of runs that went past 99 seems excessive to simply delete them. As you can see at the bottom here, the best distribution Mathematica could find assigns only a tiny tiny fraction of the probability mass to values beyond 99, whereas the actual data suggests that as much as a quarter of the mass should be found there. I don't know if there's some way to constrain the search to account for this but it's something to think about. Maybe try looking for some heavy-tailed distributions.

Huh, you're right. The fitting process would naively think that values > 99 have zero probability. Not sure how to constrain it to account for that. MLE on the range [0..99]?

I guess I could go back to fitting the PDF of each distribution to the points we do have values for.

Try replacing all the '-1's with '100's and fitting the same distributions with this applied to them:
reference.wolfram.com/language/ref/CensoredDistribution.html
with xmin = 0, xmax = 100

Shouldn't it be TruncatedDistribution?

That's what you'd want to use if you just ignored all the -1s.
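The censored likelihood is also easy to write by hand: observed stopping times contribute log pmf(t), and each run that passed 99 steps contributes log P(T > 99) via the survival function. Sketch with a geometric stand-in, using -1 to mark censored runs as in the thread:

```python
# Sketch: right-censored MLE. Runs that passed 99 steps (marked -1)
# contribute the survival probability P(T > 99) instead of a pmf value.
# Geometric stand-in for whatever actually generated samples.dat.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import geom

rng = np.random.default_rng(1)
raw = rng.geometric(0.02, size=100_000)
times = np.where(raw <= 99, raw, -1)             # -1 marks censored runs

obs = times[times != -1]
n_cens = (times == -1).sum()

def nll(p):
    return -(geom.logpmf(obs, p).sum() + n_cens * geom.logsf(99, p))

res = minimize_scalar(nll, bounds=(1e-6, 0.5), method="bounded")
print(res.x)   # recovers roughly 0.02
```

Unlike truncation, this uses the information that a quarter of the runs went long, so the fitted tail actually carries that mass.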

I see. This is what it looks like now.

I forgot to mention that there is a nonzero probability that the stochastic process never stops, hence some of the samples don't stop at *any* time. Any ideas on how I can handle such a semimeasure/"deficient" distribution?

Something just occurred to me: Maybe a mixture distribution with one sub-distribution being one of the parametrized distributions and the other being a delta at 100.
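That mixture idea can be sketched directly as a two-parameter MLE: a weight w for the never-stopping mass (lumped into the censored bucket) plus (1-w) times a parametric stopping-time law. Geometric stand-in again; both parameters are hypothetical illustrations, not the thread's actual process:

```python
# Sketch: mixture of a "never stops" point mass (weight w, landing in the
# censored bucket) and a Geometric(p) stopping time, fit jointly by MLE.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import geom

rng = np.random.default_rng(2)
n = 100_000
never = rng.random(n) < 0.1                      # true never-stop prob 0.1
raw = rng.geometric(0.05, size=n)
times = np.where(~never & (raw <= 99), raw, -1)  # -1 marks censored runs

obs = times[times != -1]
n_cens = (times == -1).sum()

def nll(theta):
    w, p = theta
    # observed runs: (1 - w) * pmf(t); censored runs: w + (1 - w) * P(T > 99)
    return -(np.log1p(-w) * len(obs) + geom.logpmf(obs, p).sum()
             + n_cens * np.log(w + (1 - w) * geom.sf(99, p)))

res = minimize(nll, x0=[0.3, 0.02], bounds=[(1e-6, 0.9), (1e-6, 0.5)])
print(res.x)   # recovers roughly [0.1, 0.05]
```

The two components are identifiable here because the parametric tail beyond 99 is small; if the fitted distribution itself were very heavy-tailed, w and the tail mass would trade off against each other.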

Any chance you could upload something more broadly compatible? I was going to play around with this in R until I noticed it was a Mathematica file format.

readBin('~/Downloads/samples.dat', "int", n=1e5, size=8)

this works fine for me
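For anyone poking at it in Python instead, the equivalent read (assuming the file is raw 64-bit integers, as size=8 in that readBin call suggests) is np.fromfile. Demonstrated on a throwaway file rather than the actual download:

```python
# Python equivalent of the R readBin call: read raw 64-bit ints.
# A throwaway file stands in for samples.dat here.
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.gettempdir(), "samples_demo.dat")
data = np.array([3, 7, -1, 42], dtype=np.int64)
data.tofile(path)                                # stand-in for samples.dat

samples = np.fromfile(path, dtype=np.int64)
print(samples)
```

If the values look garbled, try the other byte order (`dtype=">i8"` vs `"<i8"`); readBin and np.fromfile both default to the machine's native endianness.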

ah, thanks, I didn't know about that function