Machine Learning General - good papers edition

Post good (no bs) papers in machine learning.

Not limited to """"deep learning""""

I'll post the first two

Auto-Encoding Variational Bayes
Kingma and Welling, 2014

arxiv.org/abs/1312.6114
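For anyone grabbing this one: the paper's core trick is reparameterizing the latent sample so the sampling step is differentiable, plus a closed-form KL term for a diagonal Gaussian posterior. A minimal numpy sketch (illustrative names, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I): moves the randomness
    # outside the parameters so gradients can flow through the sample
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    # (Appendix B of Kingma & Welling, 2014)
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

mu = np.zeros(4)
log_var = np.zeros(4)               # sigma = 1, i.e. q equals the prior
z = reparameterize(mu, log_var)
print(z.shape)                      # (4,)
print(kl_to_standard_normal(mu, log_var))  # 0.0 when q == prior
```

In the full model, mu and log_var come from the encoder network and the KL term is added to the reconstruction loss to form the negative ELBO.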

Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
Hinton and van Camp, 1993

cs.toronto.edu/~fritz/absps/colt93.html

Bump

Difference Target Propagation
Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Yoshua Bengio
arxiv.org/abs/1412.7525

Bamperoo

REEEEEEEEEEEEEEEEE

>2017
>still using tensorflow

Hinton has a good review paper about machine learning and the brain. Feature detection in the title, I think. 2010 paper.

Nice. So many machine learning people meme here without knowing Hinton. This variational method is now used to furnish a unifying brain theory. Very interesting. Seems to apply to evolution too. It's actually a very simple theory apart from the math.

drive.google.com/file/d/0B2A1tnmq5zQdcFNkWU1vdDJiT00/view

>unifying brain theory

Why do computer scientists think they can shoehorn their algorithms onto an immensely complex biological structure whose DNA has evolved over billions of years?

Have you even read it? Maybe a unifying theory is too strong but it is very good at describing neuroscientific and psychological phenomena.

All it is is formalizing and filling in the details of how organisms can be seen as models of their environment. Something that's already been mathematically proved by Conant and Ashby.

Approximate bayesian inference.

Not specifically, but I've read other comments Hinton has made about backpropagation and the brain, and I've read a lot of his work. The link I posted describes the difficulties of instantiating backpropagation in the brain. When you try doing BPTT (which you would probably need to produce e.g. reasoning), it becomes even more intractable. To reproduce the argument here: taking the derivative of spiking neurons is difficult; there isn't a strong case to be made that partial derivative information is communicated within the brain (or how neurons could do that); there is no evidence for forward and backward passes of information; the brain isn't doing supervised learning (it's more like unsupervised learning with nudging toward achieving multiple goals); the brain doesn't use negative values (e.g. prediction errors are encoded in dopamine spiking through periods of decreased activity); changes in neural connections occur over multiple time frames; cortical layers have feedback going in both directions all the time; etc.

Systems biology isn't very relevant when talking about the specifics of how the brain learns.

Not very descriptive -- that just says that the brain has a prior that it updates in a principled manner. What are the contents of the graphical model? How are things updated? How do you deal with multiple layers of abstraction? Etc.

No. 1: free energy IS unsupervised.
No. 2: there is strong evidence that the brain does pass partial derivatives in terms of prediction error, which Hinton has actually advocated in recent lectures.
No. 3: dopamine neurons don't actually encode prediction error per se; that's a common misconception. The cortex has been clearly shown to encode prediction error.
No. 4: Hinton misunderstands what the free energy idea is about. It's not necessarily about the brain learning by specific EM algorithms this way. It's just that the brain can be described in this way. And the novelty is that it can describe a lot of phenomena in different areas that were previously unconnected.
No. 5: there is a lot of literature from the last 5 to 10 years about the potential specifics, using graphical models, Kalman filters and several other methods, and it fully accounts for the brain's hierarchical nature. In fact there are also detailed models of neuronal microcircuits by Bastos that summarise these ideas at a microcircuit level. There are concrete ideas about how it works on a microcircuit level. Again, if you haven't read anything don't bother replying, because yes it's based on Hinton's free energy autoencoders but it's a lot more complex than that, so please...

Avoid generalisations.

When I said approximate Bayesian inference, do you think I was trying to give you the whole theory? Dumbass. Just an easy quick label: Bayesian inference bounded to take into account cost.

How can anyone in DL not know hinton

>No.2 there is strong evidence that the brain does pass partial derivatives in terms of prediction error

The only person I have ever heard say this is Hinton

There's strong evidence that feedforward connections pass prediction error in this sense. Different language, same thing.

Just seen so many college kids or high schoolers talk about it here, but only ever had one guy reply about Hinton.

1. Yea.
2. Such as?
3. Yes they do. Read Dayan (just won like a 1 mil prize for his work) or Schultz. Specifically reward prediction errors. They fit it to temporal difference learning.
4. Sure, I agree.
5. Graphical models isn't very specific. Kalman filters... Dayan has some work on that as well. Yes, some of those ideas are plausible and good candidates for building models of how the brain does certain things... yet nowhere near the full story. If they had anywhere near the full story, they could build recurrent networks that can deal with more than 50 time-steps of dependencies, not catastrophically mislabel images, use proactive reasoning and generalization across tasks, etc.
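The temporal-difference fit mentioned in point 3 is simple enough to sketch: the TD error is the "dopamine signal", positive when an outcome beats the prediction and negative when it falls short. A toy TD(0) loop (values and parameters illustrative, nothing fitted to data):

```python
# Reward-prediction-error account of dopamine firing, TD(0) style.
alpha, gamma = 0.1, 0.9
V = {"cue": 0.0, "reward_state": 0.0}

def td_error(s, r, s_next):
    # delta > 0: outcome better than predicted (phasic burst)
    # delta < 0: outcome worse than predicted (firing dip)
    return r + gamma * V[s_next] - V[s]

for _ in range(200):
    # cue -> reward_state, reward of 1 delivered on arrival
    delta = td_error("cue", 1.0, "reward_state")
    V["cue"] += alpha * delta

print(round(V["cue"], 3))  # prediction converges toward 1.0
```

Once V["cue"] matches the reward, delta goes to zero, which is the classic "dopamine response transfers from reward to cue" story.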

what you mean?

2. Literally read the neuroscience, e.g. Mumford and Rao & Ballard on prediction error carried by feedforward/feedback connections. Friston.

3. No, dopamine CORRELATES with prediction error. Not necessarily cause. Its correlates are too diverse for prediction error. Neurotransmitters are modulatory, not driving, and prediction error is seen all over the brain, not just the midbrain, and regardless of neurotransmission.

5. Literally just go on Google Scholar and search Karl Friston. Am I going to give you the whole fucking idea here?

And I never said it was the full story, but free energy minimization can model a whole lot of phenomena. Again: not specific algorithms, but a useful way of quantifying self-organizing phenomena. Not just the brain; life. And it's not the only idea; other contributing ideas exist. Metastability in coordination dynamics is also good and arguably more applicable, but free energy uniquely joins together ideas, including how to bring together predictive coding and homeostasis. It solves the dark room problem.

Try explaining how dopamine isn't necessary for learning if it encodes reward prediction error...

Yes, the idea is very important for the progression of neuroscience historically, but it's not the case.

My uni!

Well done.

2. Got any specific paper in mind? I don't doubt that a prediction error for one layer can be encoded (in fact I believe that), but sending information down several layers about what the partial derivative of weight x in layer 0 with respect to outcome y in layer 10 is seems prohibitive.

3. I didn't say it was the only correlate of prediction errors; the argument I made was that negative values (as often used in neural networks) don't occur in the brain, except against some baseline value. Also dopamine, as you surely know, increases LTP by invoking stronger and more sustained activity in already active neurons (and through inhibitory sheaves often inhibiting activation of less active ones). I would say that for reward prediction errors, dopamine is absolutely critical -- that's common sense if you look at people with dopaminergic insufficiency or excess (either endogenous or exogenous). I agree (well, I assume this is your position) that more general prediction errors are most likely encoded everywhere in the cortex as the balance between 'internal' and 'external' inputs to a given neuron.

5. I actually fully support the predictive coding approach to explaining cortical function (and have implemented models based on such schemes); my disagreement was with Hinton et al.'s (at least former) dogmatic refusal to see the problems with assuming vanilla backpropagation exists in the brain. I don't think the dark room problem is really a problem if you look at conditioning theory. It has been established for ages that sensory stimulation in and of itself has reinforcing qualities. The Pearce & Hall model (1980 paper) already laid the foundations for tying uncertainty to attention. There have been numerous elaborations on how a slight bit of uncertainty is reinforcing (such that animals/humans seek it out), while extremes produce avoidance behaviour through fear and boredom.
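The Pearce & Hall idea of tying uncertainty to attention fits in a few lines: a cue's associability tracks recent surprise, so well-predicted rewards stop grabbing attention. A toy sketch (parameter values illustrative, not from the 1980 paper):

```python
# Pearce-Hall style associability update:
# alpha_{n+1} = eta * |lambda - V| + (1 - eta) * alpha_n,
# with learning about the cue scaled by alpha.
eta, S = 0.5, 0.2
V, alpha = 0.0, 1.0
lam = 1.0                        # reward magnitude on each trial

alphas = []
for trial in range(30):
    surprise = abs(lam - V)      # unsigned prediction error
    alpha = eta * surprise + (1 - eta) * alpha
    V += S * alpha * lam         # learning scales with attention
    alphas.append(alpha)

# As the reward becomes predicted, surprise and attention decay
print(round(alphas[0], 3), round(alphas[-1], 3))
```

The decaying alpha is the formal version of "a slight bit of uncertainty holds attention, a fully predicted stimulus doesn't".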

2. How many neurons exist in the brain? I think Hinton said there are more in the brain than seconds you actually exist.

3. Not sure what you mean by negative values, but the lateral habenula correlates with a negative prediction error.

And no, it's not common sense. The thing is, we can only really test learning through performance, which makes it hard to see that dopamine affects performance, not learning. There's a wealth of literature saying that dopamine isn't necessary for learning. Incentive salience is actually a better theory. It's just that dopamine correlates with prediction error because incentive salience is updated with prediction error. A better term is precision. Your examples of dopamine deficiency don't explain why people have symptoms unrelated to reward, like parkinsonism.

Pearce and Hall's is big in my department but I'm not a massive fan. I don't think it solves it. Free energy by Friston does, by saying organisms are models of their environment with sensory expectations, but because learning about the world is intractable we use approximate inference.

Look at deep temporal models of the brain, maybe. Also he has one called something like graphical models of belief propagation. One on perceptual hierarchies using dynamical systems. Several coauthored with someone called Kiebel. He has a few. Life As We Know It is probably his most interesting paper. Only thing is, Friston is quite shit at explaining himself and by necessity also has to dumb down his math to be understood, leaving you wondering where he got it from, but a compsci person recently derived his whole principle to explain it. Her name is... I forget.
Ailannis? I'm on my phone so can't link, soz.

And the problem (which free energy elaborates on) is why associations have rewarding properties. See Barrett for her treatment of interoceptive and limbic hierarchies in predictive coding. Pandya for a purely neuroanatomical view.

Sorry, when I gave you a paper list I thought you meant general free energy papers, but literally just search Rao & Ballard and Mumford (1992-3) on Google Scholar.

Bump.

Bump for compsci faggot.

>CSlets that didn't even take control theory think they can explain how the brain works

>control theory engineers weep as CS fags do in the past 5 years what took them decades to accomplish

i hate both of you.

>lel we trained deep convolutional networks™ to classify images by taking derivatives and MAP inference, we're such geniuses we can explain all biology
>don't know what a dynamical system is, hasn't even heard about the principle of least action

>>don't know what a dynamical system is, hasn't even heard about the principle of least action
CHAD COMPUTER SCIENTIST

2. Fair enough, but cortical structures have been researched extensively and, to my knowledge, no one has come up with a plausible way of encoding such huge dependencies (aside from the dopamine neuron networks that span the entire brain from the midbrain). If you google Feynman Machines or look at the doc I posted earlier, there's a lot of work being done on how backpropagation can be approximated by 'local' learning rules, i.e. instantiating similar dependencies without assuming the backward pass of derivatives over multiple layers.

I agree dopamine isn't necessary for learning (acetylcholine seems more prominent in handling how strong the prior is, for instance), but it seems to be necessary for coordinated reinforcement learning. I don't see why (though that's a poor argument, I know) the brain would encode incentive salience in proportion to the prediction error separately from the prediction error itself. Wouldn't the dopamine spike suffice? Since it strengthens connections when the reward is there or anticipated, and weakens connections (due to a lull in spiking) when it's not. I don't think Parkinson's is completely disconnected, since it ties into the motor system that is intricately linked with the midbrain.

Yeah P&H is quite inadequate, but I'm saying that all these concepts 'introduced' by machine learning/physics people have existed in the field of learning theory since time immemorial. Traditional Pavlovian and other theories of learning are seen as fossils, so no one cares what has already been established through countless experiments before. I agree though that free energy, predictive-coding etc. are more encompassing in many ways.

Thanks, I don't think we have a fundamental disagreement -- my annoyance was just with machine learning people who have absolutely no knowledge of neuroscience/psychology who claim the brain is a random forest/autoencoder etc.

It isn't necessary for reinforcement learning either. It's necessary for the expression of behaviour. All learning, incl. reinforcement learning, is cortically mediated. You need a signal to update precision/confidence/uncertainty because behavioural cues elicit multiple behaviours, and that's what dopamine does. It explains why parkinsonism happens: lack of dopamine prevents behaviour being elicited in the striatum because 'policies' lack precision/certainty/salience/confidence.

Name a better machine learning library and why

>theano - terse as fuck, GPU support is shit, support in general is waning
>keras - basic as fuck, might as well use tflearn
>torch - lua and not properly supported
>caffe - basic as fuck

The one you roll on your own, because regressing pure functions like TensorFlow does is a dead end for ML.

Pytorch
>winfags get out

Take a look at these two:

pnas.org/content/108/Supplement_3/15647.full

and:

By carrot or by stick: cognitive reinforcement learning in parkinsonism
MJ Frank, LC Seeberger, RC O'reilly - Science, 2004 - science.sciencemag.org
(google gives a pdf)

Dopamine neurons correlate with reward prediction errors (first article); parkinsonism leads to more difficulty with learning through rewards than through negative reinforcement.

Second article: "Parkinson’s patients off medication are better at learning to avoid choices that lead to negative outcomes than they are at learning from positive outcomes. Dopamine medication reverses this bias, making patients more sensitive to positive than negative outcomes", which is in line with what the TD theory of dopamine firing predicts.

From first article: "positive prediction errors shift dopamine firing rates more than negative prediction errors suggests either that the relationship between this firing rate and actual learning is strongly nonlinear about the zero point or that dopamine codes positive and negative prediction errors in tandem with a second system specialized for the negative component."

"Wickens (45) and Wickens and Kotter (46) proposed the most relevant of these for our discussion, which is often known as the three-factor rule. What Wickens (45) and Wickens and Kotter (46) proposed was that synapses would be strengthened whenever presynaptic and postsynaptic activities co-occurred with dopamine, and these same synapses would be weakened when presynaptic and postsynaptic activities occurred in the absence of dopamine. Indeed, there is now growing understanding at the biophysical level of the many steps by which dopamine can alter synaptic strengths (47)."

So it's absolutely crucial to driving reinforcement learning.
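The three-factor rule quoted above (Wickens: co-active synapses strengthen with dopamine present, weaken without it) reduces to a one-line update. A sketch with illustrative values -- the function name and numbers are mine, not from the article:

```python
import numpy as np

def three_factor_update(w, pre, post, dopamine, lr=0.01, baseline=0.5):
    # Hebbian product pre*post gated by dopamine relative to baseline:
    # strengthen co-active synapses when dopamine is above baseline,
    # weaken them when it is below.
    return w + lr * np.outer(post, pre) * (dopamine - baseline)

pre = np.array([1.0, 0.0])      # only the first input is active
post = np.array([1.0])
w = np.zeros((1, 2))

w = three_factor_update(w, pre, post, dopamine=1.0)   # burst -> LTP
print(w)   # only the co-active synapse grew: [[0.005, 0.]]
w = three_factor_update(w, pre, post, dopamine=0.0)   # pause -> LTD
print(w)   # back to [[0., 0.]]
```

Note the inactive synapse never changes, whatever dopamine does -- that's the "three factors" part: pre, post, and the modulator all have to line up.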

PyTorch is the true patrician choice. Your failure to even mention it shows how pleb you are. Go back to watching your Andrew Ng videos and leave the real science to the big boys.