Make Money from Machine Learning

hey Veeky Forums,

MLfag here. been thinking about different ways to make money off of Machine Learning (still in school, also pet projects are pretty fun).

Two that come to mind are predicting sports books/horse racing and online poker.

Anyone have experience with these kinds of things?

If i could get enough horse data, random forests look like a good option, since i don't think horses have long enough careers to make neural nets viable.

Poker seems like it would be saturated with people that got to it first.

>Poker seems like it would be saturated with people that got to it first.
There are already bots playing online, but I don't know how sophisticated they are, probably not very. They play winning poker though, so that's one way to earn money. I know heads-up limit hold'em is a solved game, but state-of-the-art bots can't beat top-level no-limit hold'em players heads up with deep stacks (see en.wikipedia.org/wiki/Claudico), so there's plenty of stuff left to accomplish.

Poker is not smart.

Your bot will basically just learn to play the odds of its hand, unless you do something really sophisticated.

A smart human player will be able to pick up on this after enough hands, and will bust your bot.
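
To make that concrete, here's roughly what a naive odds-based bot boils down to (a minimal sketch; `estimate_equity` is a hypothetical helper that in practice would come from Monte Carlo simulation against an assumed opponent range):

```python
# Minimal sketch of an "odds of its hand" bot. `estimate_equity` is a
# hypothetical helper; in practice it would come from Monte Carlo
# simulation of the hand against an assumed opponent range.

def pot_odds(call_amount: float, pot_size: float) -> float:
    """Fraction of the final pot you'd be contributing by calling."""
    return call_amount / (pot_size + call_amount)

def decide(hole_cards, board, call_amount, pot_size, estimate_equity):
    equity = estimate_equity(hole_cards, board)  # P(win at showdown)
    if equity > pot_odds(call_amount, pot_size):
        return "call"  # +EV in a vacuum, but deterministic -> exploitable
    return "fold"
```

A fixed deterministic rule like this is exactly what a human opponent learns to exploit after enough hands.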

Sports is doable, but machine learning may be overkill. More traditional statistical methods would probably be more appropriate.

well machine learning is how you implement the statistics

i've read that as well. some people seem to disagree that it was "solved".

and i wouldn't need to beat top-level players, just the average online poker player

NumerAI is a good way to practice applying machine learning while also getting compensated with bitcoin.

Reminds me of this article.
nytimes.com/2015/03/22/opinion/sunday/making-march-madness-easy.html?_r=0
tl;dr good features are more important than good models

>i don't think horses have long enough careers to make neural nets viable
Depends on the net. Yeah, they can require a lot of data, but only if you go crazy with the network architecture. Logistic regression is basically the simplest NN you can get.
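
To make that concrete, a minimal numpy sketch of logistic regression as a one-layer "network" (X and y here are placeholders, e.g. race features and 0/1 win labels):

```python
import numpy as np

# Logistic regression as a "network": one linear layer plus a sigmoid,
# trained by gradient descent on the log loss.
# X: (n_samples, n_features), y: 0/1 labels (e.g. win/lose).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=500):
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # forward pass
        w -= lr * (X.T @ (p - y) / n)   # gradient of log loss w.r.t. w
        b -= lr * np.mean(p - y)        # ... and w.r.t. b
    return w, b
```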

>NumerAI
very interesting. Aside from what's in the top 10 google searches, anything you'd like to note about it?

if i used a sigmoid (logistic) activation in the hidden layer and a softmax output activation, with only one hidden layer, i think i'd still need ~10k observations.

horses don't race that many times
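
fwiw, that architecture is only a couple of lines in scikit-learn. This is just a sketch with made-up placeholder data, so the sizes are illustrative, not a claim about real horse data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data: pretend each row is a race and each class a horse.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 8))
y = rng.integers(0, 6, size=10_000)

# One hidden layer with a logistic (sigmoid) activation; for multiclass
# targets MLPClassifier applies a softmax at the output layer.
clf = MLPClassifier(hidden_layer_sizes=(32,), activation="logistic",
                    alpha=1e-3, max_iter=300)
clf.fit(X, y)
probs = clf.predict_proba(X[:5])  # per-class win probabilities
```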

It's a good excuse to learn more ML techniques. Ultimately, the stuff I've learned trying to be as competitive as possible has been more valuable than the money I've made (about $70 in the past few months, so that's not really saying much).

ever do Kaggle? Those are pretty solid and have a good community as well.

Yeah, in fact I think I found out about NumerAI through Kaggle. The question of the thread was how to make money from machine learning, though, and I think it's much easier to do that through NumerAI.

I thought you were going to start a company using convolutional neural networks to solve some problem, which would then get bought by a big company for hundreds of millions. Guess you just want to gamble.

thanks dad.

ah, right. Yeah, all the Kaggle competition winners are teams. Berkeley's was pretty wizard level too

ML is great at interpolation, horrible at extrapolation

You'll find the problem is getting the data. People have known for ages how valuable data is, and they've figured out what kind of data is the most valuable, so they at least try to make it as difficult as possible to get. There won't be some database online with everything you need. Chances are those just don't exist. So you end up scraping for data with little chance of getting enough of it. And if you can find larger amounts of data, you'll quickly find that people have already tried what you're about to try.

>There won't be some database online with everything you need. Chances are those just don't exist.
If OP had wanted to do this a few years sooner, he could easily have bought hundreds of millions of hand histories. The site that offered them has since gone down, but you can probably still get them somewhere or just mine your own.

Sports like football probably wouldn't work, because the teams change nearly every year, and machine learning is slow to learn, so you'll probably never gather enough data before a team makes a major change.

Modular machine learning with neural networks is a thing though.

>unless you do something really sophisticated.
adding noise to its decisions is not sophisticated

undergrad wannabe MLfag here. I haven't got much substance to contribute - mostly replying to bump this post

iirc most lucrative sports betting is about predicting the point spread - by how many points does the winning team win? Your objective is to predict the point spread as a function of both teams' individual players' career stats, and put down money when you think the house is wrong.
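
a rough sketch of that idea (everything here is a placeholder: the synthetic data, the Ridge model choice, and the 3-point margin are all made up for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder data: each row concatenates both teams' aggregated player
# career stats; y is the actual margin (home score - away score).
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=7.0, size=2_000)

model = Ridge(alpha=1.0).fit(X, y)

def maybe_bet(features, vegas_line, margin=3.0):
    """Bet only when the model disagrees with the book's posted line
    by more than `margin` points; otherwise pass."""
    predicted = model.predict(features.reshape(1, -1))[0]
    if predicted - vegas_line > margin:
        return "bet home"
    if vegas_line - predicted > margin:
        return "bet away"
    return "pass"

print(maybe_bet(X[0], vegas_line=-2.5))
```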

Why does the horse's career length matter? You don't want to predict the winner as a function of the horses' names, you want to predict it as a function of their performance statistics.

sportsfags tell me that a football team's performance as a function of individual player stats is simpler to approximate than a basketball team's - the latter involves more teamwork, whereas the former is more additive.

You might want to just try and find some really fucking obscure sports. I remember hearing about a guy who would travel the world looking for obscure kinds of races on which to bet. He'd always use the same exact technique, and would get banned within a few months after cleaning them out, so he'd just find another sport.

hmm. interesting thought about the horses.
So you'd use past speeds on that course from similarly weighted horse/jockey combinations? or what? Random forests and neural networks seem like the obvious options.

but if a horse's career is less than 5 years, then you won't have that many statistics on it.

it seems like you've researched this a bit more. I have quite a bit of technical knowledge, but not much in the domain.

if you have any questions regarding algorithms to use etc., please ask.

sorry, that comment represents almost all of my domain knowledge - I'm just repeating what I overheard in a meeting at my school's sports analytics club. I was just surprised that so many people in this thread were talking about having little data on a candidate - it's not the name of the team that matters, it's the lineup, and there should be much more data on the lineup

But yeah, idk if I can think of anything better than what you just described. I'm sure there's some correlation between courses though - maybe cluster horses together based on how their performance differs between, say, rough tracks and smooth tracks. If you were doing linear regression, you'd have a different regression function for each cluster.
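
a sketch of that (the surface speeds, race features, and targets are all synthetic placeholders; KMeans and per-cluster linear regression just stand in for whatever you'd actually use):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Placeholder per-horse profile: (avg speed on rough, avg speed on smooth).
horse_profiles = rng.normal(size=(500, 2))

# Group horses by how their performance differs across surfaces.
cluster_of = KMeans(n_clusters=3, n_init=10).fit_predict(horse_profiles)

# Placeholder race features and targets, one row per horse.
race_X = rng.normal(size=(500, 5))
finish_time = rng.normal(loc=100.0, size=500)

# One regression function per cluster, as suggested above.
models = {c: LinearRegression().fit(race_X[cluster_of == c],
                                    finish_time[cluster_of == c])
          for c in np.unique(cluster_of)}
```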

I've never actually encountered random forest in a course before. Afaik from the wiki intro, it's an ensemble of decision trees, but why do I hear about it so much more than the usual information gain strategy for single trees?

i'm gonna get together with a friend of mine who does sports betting and try some stuff. maybe check back in january, i browse often.

if you really care about actually making money rather than just having a pet project, you should research the fees that the online services charge (it's not like they'll let you bet for free). also, i'm guessing that your winnings would be taxed as income. so consider whether the competitive advantage from your algorithm will be enough to make this a more lucrative strategy than just putting your money in a stock market index fund (which will most likely have lower fees, and its gains would be taxed at the lower capital gains rate rather than as income)

in a decision tree, the deeper you make the tree, the more likely you are to overfit (get good results on the training set and bad results on the validation set).

A random forest is an ensemble method that uses a bunch of decision trees. However, each tree sees only a certain percentage of the sample: say, make 10 trees, each with 15% of the data, overlapping a bit (which is okay).

Then you use the average of the posterior probabilities as your prediction (meaning you put in your test data point, each of the 10 trees outputs a class guess with a probability, you average the probabilities per class, then predict the class with the highest average).

It reduces overfitting really substantially, so you can have deeper trees and still get good accuracy.

It uses the same entropy calculation for maximizing information gain.
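
roughly, in code (a toy bagging loop over sklearn decision trees; note that a real random forest also subsamples features at each split, which this sketch skips):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(10):
    # Each tree trains on a random ~15% slice; slices may overlap.
    idx = rng.choice(len(X), size=150, replace=False)
    trees.append(DecisionTreeClassifier(criterion="entropy")
                 .fit(X[idx], y[idx]))

# Average the 10 posterior probability vectors, predict the argmax class.
avg_proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
pred = avg_proba.argmax(axis=1)
print("ensemble accuracy:", (pred == y).mean())
```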

there are also some other tricks you can do, like forcing certain trees to specialize on edge cases by feeding them more outliers, etc.

It also has a certain readability quality, as you can look back through a tree and see what steps led to the decision.

pretty cool imho.