What is Veeky Forums's opinion on capsule networks proposed by Geoffrey Hinton?

Paper:
arxiv.org/pdf/1710.09829.pdf
Github code:
github.com/naturomics/CapsNet-Tensorflow

Just like everything else

>falling for the simplification meme

>implying it's necessarily tractable to provide guarantees for a model's accuracy
>implying regression validation isn't good enough for most practical applications
Randall sure is a retard

>What if the answers are wrong?
What does he mean by that? Is he talking about the known answers in the training sets? If you can't trust your training-set data then you shouldn't be doing supervised learning in the first place. And if he instead means the answers generated by the program, you deal with them being wrong by having an error function that measures the distance from the training-set answers; the whole point is that the model updates to move in the direction of decreasing error.
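
A minimal numpy sketch of what I mean (toy linear model, every name here is made up):

import numpy as np

# training set: inputs X and the "known answers" y
X = np.random.randn(100, 3)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(100)

w = np.zeros(3)   # model parameters
lr = 0.1          # step size

for step in range(200):
    pred = X @ w                      # answers generated by the program
    err = pred - y                    # distance from the training-set answers
    grad = 2 * X.T @ err / len(y)     # gradient of the mean squared error
    w -= lr * grad                    # update in the direction of decreasing error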

It seems like a weird ensembling trick

It reminds me of the architecture in 'Multi-column Deep Neural Networks for Image Classification': people.idsia.ch/~ciresan/data/cvpr2012.pdf

>It seems like a weird ensembling trick

it's sort of the opposite, from what i can tell.

What's the opposite of ensembling?

It's basically combining Hough transforms and neural nets: a coincidence detector.

Nose here, mouth here, eye here, other eye there = high probability of face.

The main difference is that the spatial relationships of those subcomponents are taken into account. In a normal convolutional net, simply the presence of a feature is what is used, whereas with a capsule the "pose" of a feature is corroborated with the "pose" of other features to do inference.
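
Toy numpy sketch of the coincidence idea (every number and name here is made up, this is NOT the routing algorithm from the paper): each part's pose is mapped by a learned transform into a vote for the face's pose, and the face only gets a high probability if the votes agree.

import numpy as np

# 2D toy "poses": positions of the detected parts
nose  = np.array([0.0,  0.0])
mouth = np.array([0.0, -1.0])
eye_l = np.array([-0.7, 1.0])

# learned part->whole transforms (here just offsets for simplicity)
to_face = {"nose": np.array([0.0, 0.5]),
           "mouth": np.array([0.0, 1.5]),
           "eye_l": np.array([0.7, -0.5])}

votes = np.stack([nose + to_face["nose"],
                  mouth + to_face["mouth"],
                  eye_l + to_face["eye_l"]])

# coincidence detection: tightly clustered votes => high probability of "face"
spread = np.linalg.norm(votes - votes.mean(axis=0), axis=1).mean()
print("face" if spread < 0.1 else "no face")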

Hinton is 69 years old and still producing original research. Truly original, as in no one else was even thinking about this type of shit. Dude is a legend.

user is 30 years old and still producing useless comments from his parents' basement. Truly un-insightful, as in no one else was even dumb enough to think about this type of shit. Dude is a loser.

Good one. Very funny!

Thanks. Nice trips, btw. :)

The fact that they report results on MNIST tells you everything you need to know--they couldn't make it scale up.

>day 1, mathematicians are unaware
>neurons are peaceful, scalar-powered beings

>day 17, mathematicians lurk in the distance
>a strange vector-valued fellow wandered into town

>day 44, mathematicians strike from the shadows
>here we define categorical neurons over more general fields and spaces, employing tools from classical algebraic geometry to allow for a more general notion of activation functions..

kek

ensembling consists of averaging many noisy information sources that are all supposed to provide the same info. the noise is averaged out.

selectively using information from data sources that provide different info would be the opposite.
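
quick toy numpy example of the averaging, nothing to do with capsules:

import numpy as np

truth = 3.0
estimates = truth + np.random.randn(10)    # ten noisy models guessing the same number

print(np.abs(estimates - truth).mean())    # typical error of a single model, ~0.8
print(abs(estimates.mean() - truth))       # error of the ensemble average, ~1/sqrt(10) of that on average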

pretty good, user

>tfw the mathematicians are already here
en.wikipedia.org/wiki/Topological_data_analysis

>The main difference is that the spatial relationships of those subcomponents are taken into account. In a normal convolutional net, simply the presence of a feature is what is used, whereas with a capsule the "pose" of a feature is corroborated with the "pose" of other features to do inference.

this assertion is kind of a leap. standard CNNs most definitely leverage spatial relationships. capsules might be a more elegant way to model those relationships though.

>standard CNNs most definitely leverage spatial relationships.

Test your CNN on rotated images and tell me what happens.

it will do just fine if it's trained accordingly.

And I should add... when you haven't trained on augmented data.

A CNN can learn spatial relationships that it has been trained on, i.e. you have to augment your dataset, or hope that your data naturally contains many examples of affine transformations.

A capsule can generalize spatial relationships without ever seeing augmented data.

>A capsule can generalize spatial relationships without ever seeing augmented data.

i don't see that claim anywhere in the paper.

>Section 5.2 Robustness to Affine Transformations

>Transformation matrices that learn to encode the intrinsic spatial relationship between a part and a whole constitute viewpoint invariant knowledge that automatically generalizes to novel viewpoints.

I can keep going... I'm just gonna guess you didn't actually read the paper.

it generalizes better. that doesn't mean CNNs do not take advantage of spatial relationships.

don't get me wrong, i like the paper, but i feel that your statement was misleading. i'll read more of it later.

was the CNN they trained for comparison also regularized using some reconstruction loss?

CNNs can implicitly use spatial relationships by having copies of feature detectors for various configurations of the sub-features. This is obviously brittle and breaks if you see an instance of a feature where the sub-features are in a pose you have never seen before.

Capsules should do significantly better in these situations, as they explicitly make use of spatial relationships. You don't need to learn clones of your feature detectors for every possible configuration; you just learn one and take its pose into account.

With scalar outputs per neuron, each neuron can just say "Feature present/not present"

With vector valued output, the capsule neuron can say "Feature present/not present, and if it is present this is its pose".

Significantly more powerful.
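
The way the paper gets that behavior is a "squash" nonlinearity: the output vector keeps its direction (the pose) but its length is squashed into [0, 1) so it can be read as the probability that the feature is present. Rough numpy paraphrase of eq. 1 (not the repo's code):

import numpy as np

def squash(s, eps=1e-8):
    # s: a capsule's raw output vector
    norm_sq = np.sum(s ** 2)
    scale = norm_sq / (1.0 + norm_sq)           # length -> (0, 1): present / not present
    return scale * s / np.sqrt(norm_sq + eps)   # direction unchanged: the pose

v = squash(np.array([3.0, 4.0]))
print(np.linalg.norm(v))   # ~0.96: a long input vector means "almost certainly present"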

Why not just do a fuckload of transformations to the input dataset and train it on that? I get that linear transformations are special because they're the only outcome of a perspective transform (I think). But what happens if you see, say, a picture on a pillar? That's a nonlinear transformation, but humans can still easily recognize whatever is on the pillar.

>Why not just do a fuckload of transformations to the input dataset and train it on that?

That is what people do in practice with CNNs, and it is effective.
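
e.g. the usual augmentation step looks something like this (scipy sketch with made-up parameters, not from the linked repo):

import numpy as np
from scipy.ndimage import rotate, shift

def augment(img, rng):
    # random rotation and translation so the net sees many poses of the same digit
    img = rotate(img, angle=rng.uniform(-25, 25), reshape=False, mode="nearest")
    img = shift(img, shift=rng.uniform(-2, 2, size=2), mode="nearest")
    return img

rng = np.random.default_rng(0)
digit = np.zeros((28, 28)); digit[8:20, 12:16] = 1.0    # stand-in for an MNIST image
extra_views = [augment(digit, rng) for _ in range(8)]   # extra training examples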

Capsules are an attempt to avoid this. The inspiration for capsules actually comes from experimental psychology: there are numerous experiments showing that humans impose hierarchical coordinate frames on visual objects. Capsules try to incorporate this phenomenon into neural nets.

still waiting for an answer on this one

>The baseline is a standard CNN with three convolutional layers of 256, 256, 128 channels

I'm going to assume no. That is a bit of an unfair comparison to be honest.

so what you're saying is that an under-regularized network with more parameters is going to over-fit more than a network with fewer parameters and better regularization? shocker.

I don't disagree with that

the paper also includes hinton's usual gripe about pooling. they didn't have to use pooling. they could have used a longer stride or fewer feature maps in the last few layers. they also could have taken groups of outputs from some hidden layer in the fully-connected network to backpropagate reconstruction error through.
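
e.g. in Keras the swap is roughly this (just a sketch, not what the paper's baseline actually does):

from tensorflow import keras
from tensorflow.keras import layers

# downsampling via pooling (the thing hinton dislikes)
pooled = keras.Sequential([layers.Conv2D(256, 9, activation="relu"),
                           layers.MaxPooling2D(pool_size=2)])

# same downsampling factor with a longer stride, no pooling layer
strided = keras.Sequential([layers.Conv2D(256, 9, strides=2, activation="relu")])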

it's a neat paper otherwise, so it's kind of unfortunate that they didn't consider this.

I would bet that if you threw a team of asian PhDs at this, you could beat capsules with traditional conv nets.

But I think the intention with this paper was basically a proof of concept. Here is this crazy idea, and it works. It will still take a few cycles of iterations before it is a usable technology.

Also, machine learning has this unfortunate problem where if you aren't competitive on common benchmarks, your paper will likely be ignored. It's a fast track to getting stuck in local research minima, but unfortunately it's the state of the field right now. Hinton is obviously conscious of this and had to frame his paper to look good to reviewers. Unfortunately there are political aspects to science.

apparently there is another paper about capsules
openreview.net/pdf?id=HJWLfGWRb

>randomly generating models with no rhyme or reason
>a good thing

This is why CS is a meme.