Hi Veeky Forums, /sp/ here. I built a machine learning implementation for predicting point spreads that uses 2 different algorithms and identifies consistent findings by both methods as "safe" bets. Basically, I've been able to get near 60% correct picks for a small subset of games.

However, I have an idea that stock prices may be slightly more data driven, and therefore a better target for my algorithm, and I'd like to test that hypothesis.

I was wondering if there are any sources of recent stock price data for a huge number of stocks that I could use as inputs, and also whether there are any resources where I could buy mock stock with fake money to test the results without risking any actual money.

I know that I've gotten shit on /bet/ for thinking I can build a better algorithm than the sports books, but I know I'm not going to get rich, I just do this stuff as an autistic hobby.

no stock api is going to be free. my dad has a small hedge fund with his buddies (they're all retired devs playing with their money), they have a broker who grants them access to their api

Yahoo finance API is delayed but free

Would it be possible to scrape and build an API off of a site that posts current stock prices?

how delayed? That might be fine at least just for testing, especially since if I make "predictions" on delayed data, the true result may already be out.

quantopian.com/

>Quantopian provides a large set of financial data for free. The free data includes corporate fundamental data and minutely trade price and volume data from 2002 to present day for all major US exchanges. You can work with this data immediately.

Forgot text.

I was planning on using Google Finance; there is a way to get stock data as JSON from there, I believe. It is however delayed ~15 minutes, like Yahoo.
You can also make mock trades and deposits, but IDK if you can do that through HTTP.

I'm interested though, I've already got a Node.js app that queries cryptocurrency data for me, just need to get some analysis together and I'm good to go. Perhaps we share?

Google sheets actually has inbuilt functions to pull from google finance.

Although I was doing that 5 years ago and that functionality might be improved/removed.

15 minutes won't be a problem. The algorithm needs about an hour to train on a 4 processor computer, and I'm not interested in purchasing any cluster time.

The implementation right now only handles NBA data, and I have no indication beyond a hunch that this will work for stocks. But I'll give you the basic idea:

It would take stock history data, like price trends over the last several months, across a couple hundred stocks (the more the better). It then treats the price data as a high-dimensional vector on which I train an ISOTOP implementation (though any dimensionality reduction method will work). The resultant clustering then identifies historically correlated stocks. I can't find any published work on dimensionality reduction applied to stocks, but I haven't looked especially hard.
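A rough sketch of the "price history as a vector" step, in Python for brevity rather than the Java used in the thread. It is not ISOTOP: it swaps in plain Pearson correlation of period-over-period returns, just to show how historically correlated stocks could be picked out of raw price vectors. The tickers, prices, and the 0.8 threshold are all made up for illustration.

```python
from math import sqrt

def returns(prices):
    """Convert a price series into simple period-over-period returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def correlated_pairs(price_history, threshold=0.8):
    """Return (ticker_a, ticker_b, r) for pairs whose return vectors correlate at or above threshold."""
    vectors = {t: returns(p) for t, p in price_history.items()}
    tickers = sorted(vectors)
    pairs = []
    for i, a in enumerate(tickers):
        for b in tickers[i + 1:]:
            r = pearson(vectors[a], vectors[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs

# Made-up prices for three hypothetical tickers.
history = {
    "AAA": [10.0, 10.5, 10.2, 10.8, 11.0],
    "BBB": [20.0, 21.0, 20.4, 21.6, 22.0],  # moves in lockstep with AAA
    "CCC": [30.0, 29.0, 30.5, 29.5, 30.0],  # moves against AAA
}
pairs = correlated_pairs(history)
```

Here only the AAA/BBB pair clears the threshold. A real dimensionality reduction method would replace the pairwise loop with an embedding, then cluster in the reduced space.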

The second algorithm is a hidden Markov model (HMM), which will give an idea of which of the stocks are increasing and decreasing, and the likelihood of their switching between states. There are some published findings to suggest that HMM is a powerful tool for stock analysis.
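To make the HMM part concrete, here is a minimal two-state sketch in Python (not the poster's Java code). It assumes hidden states "rising" and "falling", observations "up" and "down", and hand-picked probabilities; a real model would learn those from data. It runs the standard forward (filtering) recursion and then computes the chance of being in a given state on the next step.

```python
def forward_filter(observations, start, trans, emit):
    """Return P(state | all observations so far), normalized at every step."""
    states = list(start)
    belief = {s: start[s] * emit[s][observations[0]] for s in states}
    total = sum(belief.values())
    belief = {s: v / total for s, v in belief.items()}
    for obs in observations[1:]:
        # Predict one step ahead, then weight by how well each state explains obs.
        predicted = {s: sum(belief[p] * trans[p][s] for p in states) for s in states}
        belief = {s: predicted[s] * emit[s][obs] for s in states}
        total = sum(belief.values())
        belief = {s: v / total for s, v in belief.items()}
    return belief

def prob_next(belief, trans, target):
    """Probability of being in `target` state on the next step."""
    return sum(belief[s] * trans[s][target] for s in belief)

# Hand-picked, illustrative parameters (assumptions, not fitted values).
start = {"rising": 0.5, "falling": 0.5}
trans = {"rising": {"rising": 0.9, "falling": 0.1},
         "falling": {"rising": 0.1, "falling": 0.9}}
emit = {"rising": {"up": 0.7, "down": 0.3},
        "falling": {"up": 0.3, "down": 0.7}}

belief = forward_filter(["down", "down", "up", "up", "up"], start, trans, emit)
p_rising_next = prob_next(belief, trans, "rising")
```

The `belief` dict is the "which state is this stock in" estimate, and `p_rising_next` is the switching likelihood the post talks about.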

On the trained reduced-dimensionality set, any stock that the HMM identifies as highly likely to switch to increasing, and that is clustered in a region with stocks currently in an increasing state, will be flagged as a stock to purchase. Any owned stock with the opposite trend should be sold.
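The buy rule in that paragraph could look roughly like this in Python. Everything about the data layout is hypothetical: each stock carries a cluster id, a current HMM state, and a probability of switching to rising, and the two thresholds are arbitrary placeholders rather than values from the thread.

```python
def buy_candidates(stocks, min_switch_prob=0.6, min_rising_share=0.5):
    """Flag falling stocks likely to turn up whose cluster neighbours are mostly rising."""
    picks = []
    for ticker, info in stocks.items():
        if info["state"] != "falling" or info["p_rise"] < min_switch_prob:
            continue
        neighbours = [other for t, other in stocks.items()
                      if t != ticker and other["cluster"] == info["cluster"]]
        if not neighbours:
            continue  # no cluster context, so no signal
        rising = sum(1 for other in neighbours if other["state"] == "rising")
        if rising / len(neighbours) >= min_rising_share:
            picks.append(ticker)
    return sorted(picks)

# Made-up example: AAA is falling but likely to flip, and its cluster is rising.
stocks = {
    "AAA": {"cluster": 0, "state": "falling", "p_rise": 0.8},
    "BBB": {"cluster": 0, "state": "rising", "p_rise": 0.9},
    "CCC": {"cluster": 0, "state": "rising", "p_rise": 0.9},
    "DDD": {"cluster": 1, "state": "falling", "p_rise": 0.8},
    "EEE": {"cluster": 1, "state": "falling", "p_rise": 0.2},
}
picks = buy_candidates(stocks)
```

DDD is skipped even though its switch probability is high, because nothing else in its cluster is rising. The sell side would be the mirror image of the same check.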

With basketball, the reason this wasn't so good wasn't that the algorithms didn't pick games well enough; it's just that the number of "safe bets" was too low. But there are significantly more stocks than there are basketball teams, and the data is more robust at that, so I am hopeful that this could work. If it does work, and works well, I'll make another thread over the next couple of months with more concrete information.

/blog

Nice. How are you going to build those methods and algorithms?

The ISOTOP algorithm is built already such that it takes any set of input vectors pretty robustly, and I should be able to code up a way to input historical data if I can figure out a good source, which this thread has helped with. Then the question will be how exactly to bin up the historical data into vectors.
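One possible answer to the "how to bin the history into vectors" question, sketched in Python: slide a fixed-length window over the price series and express each window relative to its first price, so stocks at different price levels become comparable. The window length and step are arbitrary choices here, not something the thread specifies.

```python
def window_vectors(prices, window, step=1):
    """Cut a price series into overlapping windows, each rebased to its first price."""
    vectors = []
    for start in range(0, len(prices) - window + 1, step):
        chunk = prices[start:start + window]
        base = chunk[0]
        # Each entry is the fractional change since the start of the window.
        vectors.append([p / base - 1.0 for p in chunk])
    return vectors

# Made-up closing prices for one hypothetical stock.
closes = [10.0, 11.0, 12.0, 13.0]
vectors = window_vectors(closes, window=3)
```

Each resulting vector is one input point for whatever dimensionality reduction method is used downstream.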

The HMM is a little trickier as it will need to be totally reworked. However, there are published examples of pretty much the exact implementation I need.

All in Java.

Forgot to post a qt

Out of curiosity, I want to ask: do you plan on only using OHLC data, or is there some other data you use? Do you think that you'd get better results if you also included market microstructure data?

Anything over 55% reliably for sports gambling is already fucking fantastic.

Teach me your ways master

It only worked because it relied on very small market vs large market games on back to backs, in which the averages favored the small market team w/o the point spread favoring the large market team. The "averages" were calculated using some webscraped scoring and opponent-scoring metrics that I estimated kind of ad hoc, by trial and error. So basically if the algorithm decided the small market team should be giving points, but the sports book had them getting points, it's a safe bet. The problem was, there are only a handful of these games a season, so a correct rate of around 60% on such a small sample size isn't a reliable way to make money.
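The small-sample worry can be put into numbers with a quick binomial tail calculation in Python. The 20-game sample below is a made-up stand-in for "a handful of games a season"; the thread gives no actual count. The question is how often pure coin-flipping would do at least as well as 60%.

```python
from math import comb

def chance_of_at_least(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p): the odds of doing this well by luck."""
    return sum(comb(games, k) * p ** k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

# Hypothetical season: 12 correct picks out of 20 "safe bet" games (60%).
p_luck = chance_of_at_least(12, 20)
```

With these assumed numbers, `p_luck` comes out to about 0.25, so a coin-flipper hits 60% or better roughly one season in four. The same hit rate over a few hundred picks would be much harder to explain by chance.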

I also never found any data that would allow me to see how the prediction would have worked over previous time periods, which would validate the success rate a little bit better.

I don't know yet. My plan initially is to see how things look with just the OHLC data. I can simulate a year of predictions by limiting the historical data available to the program and see how its predictions would have worked throughout the year of 2016. If I'm encouraged by the results I'll probably work on increasing how sophisticated the whole deal is, with more data sets, and fine tuning parameters and such.
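Simulating a year by limiting the data the program can see is basically a walk-forward test. A bare-bones Python sketch, assuming a single price series and a placeholder predictor (naive momentum) standing in for the real ISOTOP + HMM pipeline:

```python
def walk_forward(prices, predict, warmup):
    """Score up/down predictions, showing the predictor only data up to each day."""
    hits = total = 0
    for t in range(warmup, len(prices) - 1):
        visible = prices[:t + 1]              # nothing after day t leaks in
        guess_up = predict(visible)
        actual_up = prices[t + 1] > prices[t]
        hits += guess_up == actual_up
        total += 1
    return hits / total if total else 0.0

def momentum(visible):
    """Placeholder predictor: guess 'up' if the last move was up."""
    return visible[-1] > visible[-2]

# Made-up series: a steady trend and a choppy one.
trend_score = walk_forward([1.0, 2.0, 3.0, 4.0, 5.0], momentum, warmup=1)
chop_score = walk_forward([1.0, 2.0, 1.0, 2.0, 1.0], momentum, warmup=1)
```

The important part is the `prices[:t + 1]` slice: any retraining or parameter tuning has to happen inside the loop on `visible` only, otherwise the backtest sees the future.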

I think I would get better results with microstructure data. Thing is, I know next to nothing about markets or even stocks. Like I said, it's all just a hobby and a learning experience. Gonna see where it takes me.

Why not apply your algorithm to cryptos? It satisfies your criteria. And most exchanges have free apis.

I've been studying the markets as a hobby for the past few months or so.
Is what you plan on doing something like finding the different stocks that are correlated with each other, and then when one or more of them start a move before the others, you would buy the ones that haven't yet moved but that you can reasonably expect to, because they are usually correlated?
You might do better including data about the market microstructure, because it could give a lot richer information to provide clues about what is taking place. It's basically how the market operates; it's like the assembly language of the market, in a way.
You also might use market structure, not in the same sense as microstructure, but in the sense that, for example, you have a group of a few correlated stocks, and if a swing high is broken to the upside in a couple of the stocks in this group but the others haven't broken above a swing high, then this may be a sign that there are larger players accumulating positions.
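The swing-high idea can be defined numerically. A crude Python sketch, under one assumed definition: a swing high is a bar whose high is above the k bars on each side, and a breakout is the latest close finishing above the most recent swing high. The value of k, the tickers, and the prices are all illustrative.

```python
def last_swing_high(highs, k=2):
    """Index of the most recent bar whose high exceeds the k bars on each side, or None."""
    for i in range(len(highs) - k - 1, k - 1, -1):
        around = highs[i - k:i] + highs[i + 1:i + k + 1]
        if all(highs[i] > h for h in around):
            return i
    return None

def broke_out(highs, closes, k=2):
    """True if the latest close is above the most recent prior swing high."""
    i = last_swing_high(highs[:-1], k)  # ignore the latest bar when locating the swing high
    return i is not None and closes[-1] > highs[i]

def laggards(group, k=2):
    """Tickers that have not broken out while at least one other stock in the group has."""
    status = {t: broke_out(h, c, k) for t, (h, c) in group.items()}
    if not any(status.values()):
        return []
    return sorted(t for t, done in status.items() if not done)

# Made-up (highs, closes) for two hypothetical correlated stocks.
group = {
    "AAA": ([1, 2, 5, 2, 1, 2, 6], [1, 2, 4, 2, 1, 2, 5.5]),  # closes above its swing high of 5
    "BBB": ([1, 2, 5, 2, 1, 2, 3], [1, 2, 4, 2, 1, 2, 2.5]),  # still below its swing high
}
lagging = laggards(group)
```

Stocks returned by `laggards` are the ones that haven't moved yet while their correlated peers have, which is the situation described above.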

That's a good idea. Imma look into that. I found a pretty good stock API since making the thread tho.

>You might do better including data about the market microstructure, because it could give a lot richer information to provide clues about what is taking place. It's basically how the market operates; it's like the assembly language of the market, in a way
Is this data easily quantified? I already know how to get an algorithm to do what I want with OHLC data, but I am struggling to see how market structure can be defined numerically.

>You also might use market structure, not in the same sense as microstructure, but in the sense that, for example, you have a group of a few correlated stocks, and if a swing high is broken to the upside in a couple of the stocks in this group but the others haven't broken above a swing high, then this may be a sign that there are larger players accumulating positions

That's a good point. I'll see about trying to get the program to do some post processing based on this.

I forgot again, sorry folks

protip: chad fucks all those girls for free while being a NEET

Mirin dat bmw 1602 in the background

This is an idea I've been having as well, and I am currently majoring in AI to eventually be able to make this. Any advice on types of courses I could take to get to your position faster?

>tfw

>any advice on types of courses I could take to get to your position faster

I am pretty much self-taught as far as data systems, coding and algorithms go. I am a premed major, and I'm interviewing with med schools right now. I took a course on Java in high school, and then got a job in a genomics lab on campus. Most of my knowledge has come from developing tools for DNA big data analysis in the lab. Basically just practice + reading. But I don't wanna sound like some sort of master, this is just an idea, whether I can get it working is another question.

I know that my school offers lots of computer science and coding courses, but not a whole lot of big data/algorithms classes. If you can find anything relating to the latter, go with that.

If you're interested in doing something with cryptos hmu. [email protected]