Machine Learning

Hello all,

I'm new to crypto, and I've made a good amount of money so far just listening to directions from this board for the past month. I hopped on Litecoin when i saw the bitcoin futures open and Charlie Lee's planned interview on CNBC.

But i'd like to try something more quantitative.

I work in the field of Machine Learning at a top 4 tech company (Amazon, Facebook, Google, Microsoft), and I studied and taught the subject in university. It's the study of tuning an offline learning agent based on historical data.

I can put things together quite quickly on the programming side, and I've seen APIs released for coins' data on GDAX.

So my question to you all: what features do you think would have predictive power in determining the direction of a coin's price?

I will be tuning via cross-validation, and if i stumble onto anything worthwhile, i will post my good picks onto this board under this name.

Other urls found in this thread:

risk.net/awards/5368836/quant-hedge-fund-of-the-year-man-ahl
scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
papers.ssrn.com/sol3/papers.cfm?abstract_id=3031282

fuck you that's what i doing

(((Rosen)))blatt

You need to run a pajeet ratio. If too many pajeets are involved, the predicted price should be lower

>what features do you think would have predictive power in determining the direction of a coin's prices?
Basically any positive news about the coin question. Most coins moon after a good announcement that most of the time you can easily see coming.

Twitter / plebbit / google trends.. The more talk there is about a coin, the more value it gets.

Everything?

cross-coin correlations for different periods.
Sentiment on twitter and Veeky Forums
google trends
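For the cross-coin correlation idea, a minimal sketch of a rolling Pearson correlation between two return series, which you can rerun with different window lengths. The return values and the names `pearson` and `rolling_corr` are made up for illustration, not real market data or anyone's actual setup.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rolling_corr(a, b, window):
    """Correlation of a and b over each trailing window of the given length."""
    return [pearson(a[i - window:i], b[i - window:i])
            for i in range(window, len(a) + 1)]

# Hypothetical per-period returns; ltc is exactly 2x btc here,
# so every window should come out perfectly correlated.
btc = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
ltc = [0.02, -0.04, 0.06, 0.00, 0.04, -0.02]

for window in (3, 5):
    print(window, rolling_corr(btc, ltc, window))
```

Each window length gives a separate feature column, one value per time step once the window has filled.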

Volume, wallet holders/addresses, news mentions

don't give this guy free info, he's going to use it in his model to make millions at your expense

at least post a BTC address so he can pay you before you provide your expertise

*in question
Also this.

as someone working in the same industry (but at a startup) i've thought about doing similar experiments on stocks


my thoughts were to try a CNN but the kernels would run across open, close, high, low, and volume at a certain interval (15m, 30m, 1d, etc.)

trivially, it could learn things like MACD, RSI, etc. but nontrivially I think it could learn internal indicators that humans have never thought of (as CNNs tend to do)

if you want to start a discord or something, id join

1Equ9TBAowaSPrU5CnNnjxTAa9P4tLE31g
BTC

yeah, I've used the Twitter API for a similar project to capture public sentiment.

Any other sources to capture public sentiments?

Also, this only would deal with the normie population, it would not really tell us about the Whales.

I was toying with the idea of a recurrent-neural network reading transaction histories to identify whales.

not to sound harsh, but most Indians I run into in work are entirely disappointing in their practice. They don't learn much creativity in school. Much like the Chinese, their education is based-off regurgitation.

Any luck so far? What type of model?

What model are you planning on using? I've been interested in applying recurrent neural networks to this but I haven't had the time to flesh it out.

Features are probably going to be things like trade volume across multiple time windows, price obviously, maybe frequency of mentions across social media. You're probably better off not asking Veeky Forums though and studying a bit of TA.

And if you're for real about where you work, are you hiring undergrads?

> id join
same

yo can you guys get me a job? about to finish my m.s. in statistics.

we can talk about crypto all day. thx.

ah, a friend of mine was using this on stocks (and wrote a thesis on it), it's entirely too difficult because of an imbalance of information.

ah, a deep-learning enthusiast. Hello friend, that is my area of specialty I suppose.

Why the convolutional layer?

From my experience filtering + order flow + order books works fairly well OP, I've been using wavenet DNNs with some degree of success in some toy models. RL models are also gaining momentum now
see this > risk.net/awards/5368836/quant-hedge-fund-of-the-year-man-ahl

Still, if you are serious about this I wouldn't worry too much about forecasting prices directly, rather you should focus on forecasting derivative functions of the price. I trade momentum in my models and so far online filtering + regime forecasting has beaten every machine learning setup when it comes to crypto. Your experience might vary of course.

oh consumers....

Convolution makes sense because you could (potentially) feed in raw data and treat the whole layer as a filter. Also, if you look at it from a signals processing point of view you are essentially doing noise decomposition ala wavelet transform.

Multi-agent RL environments are becoming more popular these days. I've also been wondering if these could be used to model price behavior - has anything like this been done? Might be a fun project.

I can tell that you lack practical ML experience... Friendly advice: Use DL and DO NOT CROSS VALIDATE TIME SERIES DATA

I was thinking about this today:
You have basically two kinds of coins:
1- Top 10 (maybe top 20): the ones that, thanks to BTC, the price rise and the BTC futures announcement, now have a bigger chance than ever to gain normie acceptance and adoption. These coins used to be what SUB, ARK, SALT and a lot of others are today. In the beginning of 2017 everyone was talking about LTC, ETH and Ripple, and now they have their exposure. Consider Monero in this list because Coinbase is adding it.
2- "Second Wave" coins: ARK, SALT and SUB are now in the phase of having secondary growth potential plus the chance to gain a lot of adoption after the first group finishes its market penetration.
You could imagine some kind of money influx ratio between the top ten and the second wave, where the first group has the bigger piece of the cake.
If you are interested in the topic, ask. Also sorry for my english..

I've been thinking about using numerical derivatives of different timesteps as the feature (and as the output!)

However, that might not even be necessary.

If you recall, we like neural networks for one reason. If the network is deep enough and the regularization parameter is tuned well enough, the neural network can approximate any continuous function on a bounded domain, not even necessarily a smooth one!

Meaning suppose we have some function

f: D -> R, going from D, the set of our time-step prices, to R, the derivative of the next time-step's price.

It may be that it would develop the features of numerical derivatives itself if they are helpful.

I have to go over the proof again to make sure. Since these would be different functions every time, it may be much easier to just give it the prices and the numerical derivatives prior to training.
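A minimal sketch of what "numerical derivatives of different timesteps as the feature (and as the output)" could look like, using backward differences. The step sizes (1, 2, 4), the price series, and the function names are assumptions for illustration only.

```python
def finite_differences(prices, step=1):
    """Backward difference (p[t] - p[t-step]) / step for every t >= step."""
    return [(prices[t] - prices[t - step]) / step
            for t in range(step, len(prices))]

def build_dataset(prices, steps=(1, 2, 4)):
    """Build (features, target) rows.

    features: price at t plus one backward difference per step size
    target:   one-step forward difference p[t+1] - p[t]
    """
    max_step = max(steps)
    rows = []
    for t in range(max_step, len(prices) - 1):
        feats = [prices[t]] + [(prices[t] - prices[t - s]) / s for s in steps]
        target = prices[t + 1] - prices[t]
        rows.append((feats, target))
    return rows

# Hypothetical price series, not real data.
prices = [100.0, 101.0, 103.0, 102.0, 106.0, 110.0, 109.0]
dataset = build_dataset(prices)
for feats, target in dataset:
    print(feats, "->", target)
```

Note that every feature at time t only uses prices up to t, so nothing from the target leaks into the inputs.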

I'm not sure about modelling price behaviour but it's been used under the assumption of existing arb opportunities. The network is then signed in a typical policy-reward structure and the agent figures out arbitrage spots by itself. See linked article, they are definitely doing it.

There have been many people before you that have tried this path. The common consensus is that the market is unstable as fuck, driven by whale randomness in their pump and dumps, and that the top 10 or so hold their values and grow well.

Pretty much any analysis you run is going to tell you hodling is the answer. If you're lucky/smart, you're going to be able to find extraneous factors to better predict whale movement, but none of that is going to be readily available when limited to what you can see on exchanges. You'd have to include outside semantic analyses on the news, and it'd probably be worthwhile to run through the blockchain and find publicly available ledger transactions for big transactions in BTC sales and exchanges, preferably tagging public keys known to hold a lot on the ledgers which are clearly making moves to split their funds into different accounts or spread into different altcoins. You could then build a bot to move as they move, but this would be a hefty project to take on and it'd require a lot of computing power as the money branches off into several different paths.

traditional technical indicators are kernel-based
if they have any basis in reality, a CNN would do a better job than human hand-crafted ones

that was my line of thinking, at least, never got around to testing it. the imbalance of information makes sense. even professional traders use a mix of fundamental/technical analysis.

I always assumed you would have to feed in sentiment as well to balance the technical information and at the time sentiment was beyond my scope

I'm doing the same thing

Look into order book configurations and Ichimoku Clouds

In practice, I have gotten much worse results from CNN's on time-series data for that same reason. But if it worked for you, congrats.

interesting on the RL.
Also, by DNNs you mean Deep Neural Networks? I haven't heard of that acronym

Ah, that was my original idea to forecast derivative functions of price. Did you use numerical derivative functions of price as inputs as well?

Nobody has mentioned google trends for crypto sentiment. Use it as a contrarian indicator. Google trends is actually surprisingly accurate in reacting to price. Just look up in google trends "buy btc" or "sell btc" IDK about API though

CS background? I think you are approaching the problem from a slightly unfavorable angle. In your place I would focus on existing market anomalies and would try to design models that take advantage of those. Rather than predicting prices or price direction outright desu.

The problem with non-linear setups is you might just end up modelling a ton of noise and overfitting time series data is a HUGE problem in the industry. Also debugging, assessing performance, etc is quite difficult with what essentially is a blackbox.

I come from a pure math background, so maybe I'm looking at this the wrong way, but that's just been my experience in practice. If you want to go the non-linear way, I would model supplementary functions to price (like volatility) and use those to model future events.

I think a good use would be using your machine learning tool to discover which shilling sites are the best predictors of impending coin moonage, and how much advertising a coin needs to get before it starts to moon.

Other ideas:
- Percentage of coin that has been mined vs price action
- Date from ICO vs price action
- Date from listing on major exchanges vs price action

I hold math and CS degrees

Why should noise be a problem?

Ah, very nice. is there an API for it? have you used it programmatically at all?

If not, time to break out the beautiful soup

exactly, you need both technical and sentiment, and in a neural network, it doesn't matter so much where things go.

Why assume they are kernel-based? And you use that word in the sense of similarity functions? of what?

my bad, just noticed the "IDK about API though"

> Why should noise be a problem?
Because a non-trivial model will give you a great in-sample Sharpe but will most likely have very little predictive power out of sample. In other words "shit in, shit out" (and nowhere else does this apply more than with financial data)
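To make the in-sample vs out-of-sample point concrete, here is a small sketch: generate a pile of pure-noise "strategies", pick the one with the best in-sample Sharpe, then look at it out of sample. Everything here is an assumption for illustration (200 strategies, 200 days, Gaussian returns with zero mean, 365 periods per year for annualization); it is not anyone's actual backtest.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sharpe(returns, periods_per_year=365):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd * sqrt(periods_per_year)

rng = random.Random(0)
n_strategies, n_days = 200, 200

# Each "strategy" is pure noise with zero expected return.
strategies = [[rng.gauss(0.0, 0.02) for _ in range(n_days)]
              for _ in range(n_strategies)]

split = n_days // 2
in_sample = [sharpe(s[:split]) for s in strategies]

# Data snooping: choose the strategy that looked best in-sample.
best = max(range(n_strategies), key=lambda i: in_sample[i])
out_of_sample = sharpe(strategies[best][split:])

print("best in-sample Sharpe:", round(in_sample[best], 2))
print("same strategy out-of-sample:", round(out_of_sample, 2))
```

The winner looks great in-sample only because it was selected from many tries; since the underlying returns are noise, there is no reason for that to carry over to the second half.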

kernel-based in the sense of a weighted window scanning over the time-series data. Many technical indicators are just an average over a specified timespan; the SMA (simple moving average) is the most basic example. The EMA (exponential moving average) weights recent data more heavily than older data. The MACD is the difference between, for example, a 12 day EMA and a 26 day EMA.
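A rough sketch of the "weighted window" view: the SMA is a uniform kernel slid over the prices, the EMA is a recursive filter, and the MACD line is conventionally the fast EMA minus the slow EMA. The smoothing factor 2 / (span + 1) is the usual convention; seeding the EMA with the first price and the helper names are assumptions made here.

```python
def apply_kernel(prices, weights):
    """Slide a fixed weight window over the series (no padding)."""
    k = len(weights)
    return [sum(w * p for w, p in zip(weights, prices[i:i + k]))
            for i in range(len(prices) - k + 1)]

def sma(prices, window):
    """Simple moving average: a kernel with equal weights."""
    return apply_kernel(prices, [1.0 / window] * window)

def ema(prices, span):
    """Exponential moving average, seeded with the first price (an assumption)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA at each time step."""
    return [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]

# Hypothetical prices, not real data.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sma(prices, 3))
print(ema(prices, 3))
print(macd(prices, fast=2, slow=4))
```

A learned 1D convolution filter has the same shape as the `weights` list in `apply_kernel`, which is the sense in which a CNN could rediscover SMA-like indicators; the recursive EMA is only approximated by a finite window.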

> In practice, I have gotten much worse results from CNN's on time-series data for that same reason. But if it worked for you, congrats.
Just to clarify, only really used it in tests and toy models, never committed any money to those as I said. My life models rely on nothing more than trend, reversion and regime prediction.

*live

hm never thought of those, very interesting. writing them down.

I have yet to run into that problem.

I'm using both financial and non-financial inputs, a bias for affine functions, and a fair regularization. My features are also derivatives of the inputs not the inputs themselves.

I also cross-validate, and compare loss per epoch in the training versus test set, to see when over fitting starts to occur.

Haven't encountered many noise-problems in the industry. Perhaps a PCA/SVD with the Semi-Circle Law might help? Used in semantic indexing a lot
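On comparing loss per epoch to see when overfitting starts, a minimal sketch of the usual patience-style check: find the epoch where the held-out loss bottomed out and stopped improving. The loss numbers, the patience value, and the name `overfit_epoch` are made up for illustration.

```python
def overfit_epoch(val_losses, patience=2):
    """Return the epoch with the best held-out loss.

    Stops scanning once the loss has failed to improve for
    `patience` consecutive epochs after the best one.
    """
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return best_epoch

# Hypothetical curves: training loss keeps falling,
# held-out loss turns around after epoch 2.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25, 0.15]
val_losses = [1.00, 0.80, 0.70, 0.72, 0.75, 0.90]

print("held-out loss bottoms out at epoch", overfit_epoch(val_losses))
```

For time-ordered data the held-out set should come from a later period than the training set, otherwise the comparison is optimistic.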

What is DL?

To be honest cross validation for time series is an adequate name, because the standard technique is to use time windows, like:
scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html

You're right in saying that time series data cannot be cross validated in the usual sense, though.
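The time-window idea can be sketched by hand as an expanding-window split, where each fold trains on everything before its test block. This is a hand-rolled illustration in the spirit of the linked TimeSeriesSplit page, not scikit-learn's own code; the function name and parameters are assumptions.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Expanding-window splits: train on [0, test_start), test on the next block.

    Test blocks are the last n_splits * test_size samples, in time order.
    """
    splits = []
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        splits.append((train_idx, test_idx))
    return splits

for train_idx, test_idx in walk_forward_splits(n_samples=10, n_splits=3, test_size=2):
    print("train", train_idx, "test", test_idx)
```

Unlike shuffled k-fold, no fold ever trains on data that comes after its test block, which is the part that matters for price series.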

Well, you ought to know when big whales move coins from wallets to somewhere else

You cannot measure the market relying only on information from inside the market.

Welcome to this industry, but hope you will not succeed.

It is really not that hard to use ML to find an edge... Yes, I have done it, and no I did not use price data alone. Why don't I use it anymore? Too much work to update models compared to how easy it is to make bank on trading random alts.

Oh yes you mean deep learning probably.

What do you mean by anomalies? Are you talking about arbitrage?

(OP)
Hello, I'm a developer,
can we start a discord?

BTC price movements are correlated to the number of Pink wojacks on Veeky Forums

screencap this

AH i see what you're saying now. sorry, kernel is one of those words with 5 different common-usages.

and fair point you have there. Maybe I have a bias toward CNN's outside of image recognition.

Ah, well it helps to build the different NN variations by hand with arrays and play around with the differences in model and see the effects. That's how I learned them anyways. For a lot of these things, the proofs as to why they work in different situations don't exist yet, so a lot of it is intuition and this is how I developed mine. (and lots of Kaggle)

What are reversion and regime?

>Rosenblatt
A bit pretentious there op

how?

join us at vectorspace.ai my friend

Why not automate all of it?

what's wrong with my name?

I'm assuming he means 'deep learning'.

> My features are also derivatives of the inputs not the inputs themselves
That certainly helps. My comment re: just modelling noise, really was meant for raw data input. If you are doing some feature engineering that will help.

Still, not sure if you've run anything live yet but overfitting and various data snooping biases are especially hard to navigate in finance. Here's a good paper on this topic (and others if you are interested): papers.ssrn.com/sol3/papers.cfm?abstract_id=3031282

Its all public knowledge user, you just need to know the address of a certain wallet.

ah, sounds like a tough problem in itself since those can be created in an instant and transferred/split etc.

ah, I'll bookmark it for tonight.

And yes, raw inputs tend to produce quite a bit of noise in this sector. However, with some time and ingenuity you can mostly eliminate that problem.

Make a discord already

"Anomalies" is kind of an antiquated term, sorry... Kind of a holdover from the early 80s still used in most quant literature. Anomaly simply means something that disproves the EMH. Keep in mind that the EMH itself is pretty much bunk, even Fama himself almost went as far as retracting his whole paper (just not far enough lol).

So "momentum" would be an anomaly (since there's no rational reason why a rising asset is more likely to rise in the future under the EMH assumption), "january effect", "price drift", etc, etc

Once behavioural finance got involved all of it found some theoretical base (see Hong & Stein on momentum for example) but the term "anomaly" remained.

"Mean reversion" is, as the name states, the tendency of something to revert back to its mean. Which is pretty much the base of many pairs trading/hedging strategies. "Regime" here means bear vs. bull, or trending vs. stagnating, etc., just the general market condition
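As a toy illustration of mean reversion, here is a rolling z-score signal on a spread (say, the price difference of a pair): bet on a move back toward the mean when the spread sits far from its recent average. The window, the entry threshold of 2.0, the spread values, and the function name are all assumptions, not a tested strategy.

```python
from statistics import mean, stdev

def zscore_signal(series, window, entry=2.0):
    """For each t >= window, compare series[t] to the trailing window before it.

    Returns -1 (expect a fall back to the mean), +1 (expect a rise), or 0.
    """
    signals = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        mu, sd = mean(hist), stdev(hist)
        z = 0.0 if sd == 0 else (series[t] - mu) / sd
        if z > entry:
            signals.append(-1)
        elif z < -entry:
            signals.append(1)
        else:
            signals.append(0)
    return signals

# Hypothetical spread values, not real data.
spread = [0.0, 1.0, -1.0, 0.5, -0.5, 5.0, 0.0, -5.0]
print(zscore_signal(spread, window=5))
```

A regime filter would sit on top of something like this, for example switching the signal off when the market is trending rather than ranging.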

Ah, a thoughtful explanation, thank you.

Also, thank you for explaining Anomalies, I was thinking geometric anomalies in Euclidean space of the features. (like price doubling overnight or something of that sort)

I guess that goes to show my expertise lies in Math/CS and not in behavioral finance haha. Although you seem to be quite knowledgeable in the field. I hope it's working well for you

I'm off to do some feature scraping from Twitter and Google and whatnot.

If the thread dies and you'd like to continue our conversation, please make a new one and call for Rosenblatt. I'm usually on about noon PST

I would try to hook into blockchain explorers and just monitor every transaction above some threshold or keep count of addresses with lots of small inputs
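A minimal sketch of those two filters, run over transactions that have already been pulled from some explorer. The record layout (`from`, `to`, `amount`), the addresses, and the thresholds are hypothetical; no real explorer API is assumed here.

```python
from collections import Counter

def large_transfers(transactions, threshold):
    """Transactions whose amount is at or above the threshold."""
    return [tx for tx in transactions if tx["amount"] >= threshold]

def accumulating_addresses(transactions, small_max, min_count):
    """Addresses that received at least min_count small inputs."""
    counts = Counter(tx["to"] for tx in transactions
                     if tx["amount"] <= small_max)
    return sorted(addr for addr, c in counts.items() if c >= min_count)

# Hypothetical records; real ones would come from a blockchain explorer.
txs = [
    {"from": "addr_A", "to": "addr_X", "amount": 1200.0},
    {"from": "addr_B", "to": "addr_Y", "amount": 0.4},
    {"from": "addr_C", "to": "addr_Y", "amount": 0.3},
    {"from": "addr_D", "to": "addr_Y", "amount": 0.5},
    {"from": "addr_E", "to": "addr_Z", "amount": 0.2},
]

print(large_transfers(txs, threshold=1000.0))
print(accumulating_addresses(txs, small_max=1.0, min_count=3))
```

The counts and flagged transfers per time window could then be fed into the model as features alongside price and volume.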

>features do you think would have predictive power

Certainly, as time increases so does the price of bitcoin. That's a really good indicator there

lol, I'm no behavioural scientist myself. Fin/math background, just been algotrading for a while now. Good luck and drop me a message at "biznessanon [at] protonmail [dot] ch" if you wanna bounce around some ideas