I'm new to crypto, and I've made a good amount of money so far just listening to directions from this board for the past month. I hopped on Litecoin when i saw the bitcoin futures open and Charlie Lee's planned interview on CNBC.
But i'd like to try something more quantitative.
I work in the field of Machine Learning at a top 4 tech company (Amazon, Facebook, Google, Microsoft), and I studied and taught the subject in university. It's the study of tuning an offline learning agent based on historical data.
I can put things together quite quickly on the programming side, and i've seen APIs released for coins' data on GDAX.
So my question to you all, what features do you think would have predictive power in determining the direction of a coin's prices?
I will be tuning via cross-validation, and if i stumble onto anything worthwhile, i will post my good picks onto this board under this name.
You need to run a pajeet ratio. If too many pajeets are involved, the predicted price should be lower
Lucas Jackson
>what features do you think would have predictive power in determining the direction of a coin's prices?
Basically any positive news about the coin question. Most coins moon after a good announcement that most of the time you can easily see coming.
Juan Green
Twitter / plebbit / google trends.. The more talk there is about a coin, the more value it gets.
Logan Bailey
Everything?
cross-coin correlations for different periods. Sentiment on twitter and Veeky Forums google trends
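A minimal sketch of what "cross-coin correlations for different periods" could look like in code, assuming two aligned price series sampled at the same timestamps. The function names and the sample prices are made up for illustration, and nothing here says such a feature actually has predictive power.

```python
from math import sqrt

def returns(prices):
    """Simple period-over-period returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rolling_corr(prices_a, prices_b, window):
    """Correlation of the two return series over each trailing window."""
    ra, rb = returns(prices_a), returns(prices_b)
    return [
        pearson(ra[i - window:i], rb[i - window:i])
        for i in range(window, len(ra) + 1)
    ]

# Hypothetical prices for two coins (not real data).
coin_a = [100, 102, 101, 105, 107, 106]
coin_b = [50, 51, 49, 52, 55, 53]
for window in (3, 4):  # "different periods" = different window lengths
    print(window, rolling_corr(coin_a, coin_b, window))
```

Each window length yields its own feature column, so a model can see both short-lived and longer-lived co-movement.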
Jordan Brooks
Volume, wallet holders/addresses, news mentions
Austin Hall
don't give this guy free info, he's going to use it in his model to make millions at your expense
at least post a BTC address so he can pay you before you provide your expertise
Christian Perry
*in question Also this.
Owen Harris
as someone working in the same industry (but at a startup) i've thought about doing similar experiments on stocks
my thoughts were to try a CNN but the kernels would run across open, close, high, low, and volume at a certain interval (15m, 30m, 1d, etc.)
trivially, it could learn things like MACD, RSI, etc. but nontrivially I think it could learn internal indicators that humans have never thought of (as CNNs tend to do)
if you want to start a discord or something, id join
Jackson James
1Equ9TBAowaSPrU5CnNnjxTAa9P4tLE31g BTC
Robert Powell
yeah, i've used the Twitter API for a similar project for capturing public sentiments.
Any other sources to capture public sentiments?
Also, this only would deal with the normie population, it would not really tell us about the Whales.
I was toying with the idea of a recurrent-neural network reading transaction histories to identify whales.
not to sound harsh, but most Indians I run into in work are entirely disappointing in their practice. They don't learn much creativity in school. Much like the Chinese, their education is based-off regurgitation.
Any luck so far? What type of model?
Asher Nguyen
What model are you planning on using? I've been interested in applying recurrent neural networks to this but I haven't had the time to flesh it out.
Features are probably going to be things like trade volume across multiple time windows, price obviously, maybe frequency of mentions across social media. You're probably better off not asking Veeky Forums though and studying a bit of TA.
And if you're for real about where you work, are you hiring undergrads?
Jack Cruz
> id join
same
Ethan Mitchell
yo can you guys get me a job? about to finish my m.s. in statistics.
we can talk about crypto all day. thx.
Jack Bennett
ah, a friend of mine was using this on stocks (and wrote a thesis on it), it's entirely too difficult because of an imbalance of information.
ah, a deep-learning enthusiast. Hello friend, that is my area of specialty I suppose.
Why the convolutional layer?
Lucas Jenkins
From my experience filtering + order flow + order books works fairly well OP, I've been using wavenet DNNs with some degree of success in some toy models. RL models are also gaining momentum now see this > risk.net/awards/5368836/quant-hedge-fund-of-the-year-man-ahl
Still, if you are serious about this I wouldn't worry too much about forecasting prices directly, rather you should focus on forecasting derivative functions of the price. I trade momentum in my models and so far online filtering + regime forecasting has beaten every machine learning setup when it comes to crypto. Your experience might vary of course.
Brody Ortiz
oh consumers....
Jacob Ortiz
Convolution makes sense because you could (potentially) feed in raw data and treat the whole layer as a filter. Also, if you look at it from a signals processing point of view you are essentially doing noise decomposition ala wavelet transform.
Jordan Young
Multi-agent RL environments are becoming more popular these days. I've also been wondering if these could be used to model price behavior - has anything like this been done? Might be a fun project.
Cameron Rodriguez
I can tell that you lack practical ML experience... Friendly advice: Use DL and DO NOT CROSS VALIDATE TIME SERIES DATA
Daniel Parker
I was thinking about this today. You have basically two kinds of coins:
1 - Top 10 (maybe top 20): the ones that, thanks to BTC, the price rise, and the BTC futures announcement, now have a bigger chance than ever to penetrate normie acceptance and adoption. These coins used to be what SUB, ARK, SALT and a lot of others are today. In the beginning of 2017 everyone was talking about LTC, ETH, Ripple, and now they have their exposure. Consider Monero in this list because Coinbase is adding it.
2 - "Second Wave" coins: ARK, SALT, SUB are now in the phase of having secondary growth potential, plus the chance to gain a lot of adoption after the first group finishes its market penetration.
You could imagine some kind of money influx ratio between the top ten and the second wave, where the first group has the bigger piece of the cake. If you are interested in the topic, ask. Also sorry for my English.
Austin Reyes
I've been thinking about using numerical derivatives of different timesteps as the feature (and as the output!)
However, that might not even be necessary.
If you recall, we like neural networks for one reason. If the network is deep enough and the regularization parameter is tuned well-enough, the neural network can approximate any function, not necessarily smooth even!
Meaning, suppose we have some function
f: D -> R, going from D, the set of prices at our timesteps, to R, the derivative of the next timestep's price.
It may be that it would develop the features of numerical derivatives itself if they are helpful.
I have to go over the proof again to make sure, since these would be different functions every time, it may be much easier to just give the prices and the numerical derivatives prior to training.
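For what "give the prices and the numerical derivatives prior to training" could mean in practice, here is a rough sketch. It assumes evenly spaced prices and uses backward finite differences over a few lags as features, with the next step's change as the target. The name `make_dataset` and the choice of lags are illustrative assumptions, not anything from the thread.

```python
def make_dataset(prices, lags=(1, 2, 4)):
    """Pair each time t with slopes over several lags (features)
    and the next step's price change (target)."""
    max_lag = max(lags)
    rows = []
    # Stop one step early so the target p[t+1] - p[t] always exists.
    for t in range(max_lag, len(prices) - 1):
        features = [(prices[t] - prices[t - lag]) / lag for lag in lags]
        target = prices[t + 1] - prices[t]
        rows.append((features, target))
    return rows

# Hypothetical price series (not real data).
prices = [1, 2, 4, 7, 11, 16, 22]
for features, target in make_dataset(prices, lags=(1, 2)):
    print(features, "->", target)
```

Only past prices go into the features for time t, so the target never leaks into its own inputs.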
Nathan Robinson
I'm not sure about modelling price behaviour but it's been used under the assumption of existing arb opportunities. The network is then designed in a typical policy-reward structure and the agent figures out arbitrage spots by itself. See linked article, they are definitely doing it.
Brayden Morris
There have been many people before you that have tried this path. The common consensus is that the market is unstable as fuck, driven by whale randomness in their pump and dumps, and that the top 10 or so hold their values and grow well.
Pretty much any analysis you run is going to tell you hodling is the answer. If you're lucky/smart, you're going to be able to find extraneous factors to better predict whale movement, but none of that is going to be readily available when limited to what you can see on exchanges. You'd have to include outside semantic analyses on the news, and it'd probably be worthwhile to run through the blockchain and find publicly available ledger transactions for big transactions in BTC sales and exchanges, preferably tagging public keys known to hold a lot on the ledgers which are clearly making moves to split their funds into different accounts or spread into different altcoins. You could then build a bot to move as they move, but this would be a hefty project to take on and it'd require a lot of computing power as the money branches off into several different paths.
Oliver Rivera
traditional technical indicators are kernel-based. if they have any basis in reality, a CNN would do a better job than human hand-crafted ones
that was my line of thinking, at least, never got around to testing it. the imbalance of information makes sense. even professional traders use a mix of fundamental/technical analysis.
I always assumed you would have to feed in sentiment as well to balance the technical information and at the time sentiment was beyond my scope
Sebastian Kelly
I'm doing the same thing
Look into order book configurations and Ichimoku Clouds
Hudson Foster
In practice, I have gotten much worse results from CNN's on time-series data for that same reason. But if it worked for you, congrats.
interesting on the RL. Also, by DNNs you mean Deep Neural Networks? I haven't heard of that acronym
Ah, that was my original idea to forecast derivative functions of price. Did you use numerical derivative functions of price as inputs as well?
Levi Reed
Nobody has mentioned google trends for crypto sentiment. Use it as a contrarian indicator. Google trends is actually surprisingly accurate in reacting to price. Just look up in google trends "buy btc" or "sell btc". IDK about API though
Leo Barnes
CS background? I think you are approaching the problem from a slightly unfavorable angle. In your place I would focus on existing market anomalies and would try to design models that take advantage of those. Rather than predicting prices or price direction outright desu.
The problem with non-linear setups is you might just end up modelling a ton of noise and overfitting time series data is a HUGE problem in the industry. Also debugging, assessing performance, etc is quite difficult with what essentially is a blackbox.
I come from a pure math background, so maybe I'm looking at this the wrong way, but that's just been my experience in practice. If you want to go the non-linear way, I would model supplementary functions to price (like volatility) and use those to model future events.
Gabriel King
I think a good use would be using your machine learning tool to discover which shilling sites are the best predictors of impending coin moonage, and how much advertising a coin needs to get before it starts to moon.
Other ideas: - Percentage of coin that has been mined vs price action - Date from ICO vs price action - Date from listing on major exchanges vs price action
Josiah James
I hold math and CS degrees
Why should noise be a problem?
Ah, very nice. is there an API for it? have you used it programmatically at all?
If not, time to break out the beautiful soup
Levi Hughes
exactly, you need both technical and sentiment, and in a neural network, it doesn't matter so much where things go.
Why assume they are kernel-based? And you use that word in the sense of similarity functions? of what?
Brody Morgan
my bad, just noticed the "IDK about API though"
David Flores
> Why should noise be a problem?
Because a non-trivial model will give you a great in-sample Sharpe but will most likely have very little predictive power out of sample. In other words "shit in, shit out" (and nowhere else does this apply more than with financial data)
Ethan Davis
kernel-based in the sense of a weighted window scanning over the time-series data. Most technical indicators are a simple average over a specified timespan, the SMA (simple moving average) is the most basic example. the EMA (exponential moving average) weights recent data more heavily than older data. The MACD is the difference between, for example, a 26 day EMA and a 12 day EMA.
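The "weighted window scanning over the time series" idea can be sketched in a few lines. Some assumptions: the EMA here uses the common smoothing factor alpha = 2 / (span + 1) and is seeded with the first value (conventions vary between charting tools), and the MACD line follows the usual sign convention of fast EMA minus slow EMA.

```python
def sma(values, window):
    """Simple moving average: equal weights over the trailing window."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def ema(values, span):
    """Exponential moving average; recent values get larger weights.
    Assumes alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(values, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA, point by point."""
    return [f - s for f, s in zip(ema(values, fast), ema(values, slow))]

print(sma([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

The SMA is a fixed kernel with equal weights, which is the kind of filter a convolutional layer could in principle learn on its own.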
Angel Morris
> In practice, I have gotten much worse results from CNN's on time-series data for that same reason. But if it worked for you, congrats.
Just to clarify, only really used it in tests and toy models, never committed any money to those as I said. My life models rely on nothing more than trend, reversion and regime prediction.
Christopher Stewart
*live
Tyler Torres
hm never thought of those, very interesting. writing them down.
I have yet to run into that problem.
I'm using both financial and non-financial inputs, a bias for affine functions, and a fair regularization. My features are also derivatives of the inputs not the inputs themselves.
I also cross-validate, and compare loss per epoch in the training versus test set, to see when over fitting starts to occur.
Haven't encountered many noise-problems in the industry. Perhaps a PCA/SVD with the Semi-Circle Law might help? Used in semantic indexing a lot
You're right in saying that time series data cannot be cross validated in the usual sense, though.
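One common alternative to shuffled k-fold on time series is a walk-forward (expanding window) split, where every training index comes before every test index. A minimal sketch follows; the function name and parameters are made up for illustration, and this is not necessarily what anyone in the thread used.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.
    Training data always ends where the test block begins."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        yield list(range(0, start)), list(range(start, start + test_size))

for train, test in walk_forward_splits(n_samples=10, n_splits=3, test_size=2):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Because the model never trains on data from after its test block, each fold's score is closer to what would have been achievable live.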
Matthew Nguyen
Well, you ought to know when big whales move coins from wallets to somewhere else
Colton Taylor
You cannot measure the market relying only on information from inside the market.
Welcome to this industry, but hope you will not succeed.
Sebastian Reed
It is really not that hard to use ML to find an edge... Yes, I have done it, and no I did not use price data alone. Why don't I use it anymore? Too much work to update models compared to how easy it is to make bank on trading random alts.
Jose Gutierrez
Oh yes you mean deep learning probably.
What do you mean by anomalies? Are you talking about arbitrage?
Henry Thomas
(OP) Hello, I'm a developer, can we start a discord?
David Brown
BTC price movements are correlated to the number of pink wojacks on Veeky Forums
screencap this
Brandon Cox
AH i see what you're saying now. sorry, kernel is one of those words with 5 different common-usages.
and fair point you have there. Maybe I have a bias toward CNN's outside of image recognition.
Ah, well it helps to build the different NN variations by hand with arrays and play around with the differences in model and see the effects. That's how I learned them anyways. For a lot of these things, the proofs as to why they work in different situations don't exist yet, so a lot of it is intuition and this is how I developed mine. (and lots of Kaggle)
What are reversion and regime?
Henry Johnson
>Rosenblatt
A bit pretentious there op
Luke Ortiz
how?
Kevin Morales
join us at vectorspace.ai my friend
Ethan Gonzalez
Why not automate all of it?
Jason Lopez
what's wrong with my name?
Noah Brooks
I'm assuming he means 'deep learning'.
> My features are also derivatives of the inputs not the inputs themselves
That certainly helps. My comment re: just modelling noise, really was meant for raw data input. If you are doing some feature engineering that will help.
Still, not sure if you've run anything live yet but overfitting and various data snooping biases are especially hard to navigate in finance. Here's a good paper on this topic (and others if you are interested): papers.ssrn.com/sol3/papers.cfm?abstract_id=3031282
Christopher Walker
It's all public knowledge, user, you just need to know the address of a certain wallet.
Leo Perry
ah, sounds like a tough problem in itself since those can be created in an instant and transferred/split etc.
ah, I'll bookmark it for tonight.
And yes, raw inputs tend to produce quite a bit of noise in this sector. However, with some time and ingenuity you can mostly eliminate that problem.
Adam Cruz
Make a discord already
Luke Rogers
"Anomalies" is kind of an antiquated term sorry... Kind of a hold over from the early 80s still used in most quant literature. Anomaly simply means something that disproves the EMH. Keep in mind that the EMH itself is pretty much bunk, even Fama himself almost went as far as retracting his whole paper (just not far enough lol).
So "momentum" would be an anomaly (since there's no rational reason why a rising asset is more likely to rise in the future under the EMH assumption), "january effect", "price drift", etc, etc
Once behavioural finance got involved all of it found some theoretical base (see Hong & Stein on momentum for example) but the term "anomaly" remained.
"Mean reversion" is, as the name states, the tendency of something to revert back to its mean. Which is pretty much the base of many pairs trading/hedging strategies. "Regime" here means bear vs. bull, or trending vs. stagnating, etc., just the general market condition
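To make the two terms concrete, here is a toy sketch. It assumes a trailing-window z-score as the "distance from the mean" measure and a deliberately crude price-versus-trailing-average rule as the regime label. Both are illustrative simplifications, not the models described above, and the window sizes are arbitrary.

```python
from math import sqrt

def zscore(values, window):
    """How many trailing standard deviations the latest value
    sits from its trailing mean (0.0 if the window is flat)."""
    recent = values[-window:]
    mean = sum(recent) / len(recent)
    var = sum((v - mean) ** 2 for v in recent) / len(recent)
    if var == 0:
        return 0.0
    return (values[-1] - mean) / sqrt(var)

def regime(prices, window):
    """Crude bull/bear label: last price above or below its trailing average."""
    recent = prices[-window:]
    return "bull" if prices[-1] > sum(recent) / len(recent) else "bear"

# Hypothetical spread between two related assets (not real data).
spread = [10, 10, 10, 10, 14]
print(zscore(spread, 5))           # large positive: stretched above its mean
print(regime([1, 2, 3, 4, 5], 3))  # bull
```

A mean-reversion trader would read a large z-score on a spread as a bet that it drifts back, while the regime label is the sort of switch that decides whether to run a trend or a reversion strategy at all.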
Chase Baker
Ah, a thoughtful explanation, thank you.
Also, thank you for explaining Anomalies, I was thinking geometric anomalies in Euclidean space of the features. (like price doubling overnight or something of that sort)
I guess that goes to show my expertise lies in Math/CS and not in behavioral finance haha. Although you seem to be quite knowledgeable in the field. I hope it's working well for you
I'm off to do some feature scraping from Twitter and Google and whatnot.
If the thread dies and you'd like to continue our conversation, please make a new one and call for Rosenblatt. I'm usually on about noon PST
Jordan Ramirez
I would try to hook into blockchain explorers and just monitor every transaction above some threshold or keep count of addresses with lots of small inputs
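The filtering step described here is simple once the transactions are in hand. A sketch, assuming each transaction has already been fetched and flattened into a dict with `to` and `amount` keys. That record shape, the addresses, and the thresholds are all hypothetical, and fetching from a real blockchain explorer is left out entirely.

```python
from collections import Counter

def large_transfers(transactions, threshold):
    """Transactions whose amount is at or above the threshold."""
    return [tx for tx in transactions if tx["amount"] >= threshold]

def small_input_counts(transactions, small_limit):
    """Per receiving address, count incoming transactions below small_limit."""
    counts = Counter()
    for tx in transactions:
        if tx["amount"] < small_limit:
            counts[tx["to"]] += 1
    return counts

# Made-up records; addresses and amounts are placeholders.
txs = [
    {"to": "addr_A", "amount": 500.0},
    {"to": "addr_B", "amount": 0.01},
    {"to": "addr_B", "amount": 0.02},
    {"to": "addr_C", "amount": 3.0},
    {"to": "addr_B", "amount": 0.015},
]
print(large_transfers(txs, threshold=100))
print(small_input_counts(txs, small_limit=0.1).most_common(1))
```

The first function flags single big moves. The second surfaces addresses that accumulate through many small inputs, which a plain size threshold would miss.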
Aiden Evans
>features do you think would have predictive power
Certainly, as time increases so does the price of bitcoin. That's a really good indicator there
Cooper Stewart
lol, I'm no behavioural scientist myself. Fin/math background, just been algotrading for a while now. Good luck and drop me a message at "biznessanon [at] protonmail [dot] ch" if you wanna bounce around some ideas