Data Science

Any data scientists around? I'm about 5 months into my self-education in the field. I have a BA in computer science and I'm trying to expand my skill set so I can move away from full-stack web dev and do more back-end ML work.

I took Andrew Ng's ML class on Coursera and loved it. If anyone in the field is around, I'd love to talk about what I should study next.

Are you struggling with the maths? Every time I see someone from CS talking about data science, they go on about programming languages and the latest frameworks but seem to utterly ignore statistics and optimization.

No, I'm quite comfortable with maths. I've been studying convex optimization and probabilistic graphical models for the mathematical side.

Anyone else studying Data Science? If so, what are you studying currently?

Yes, you could teach yourself statistics and algorithms, but I doubt you'll be hired as a data scientist unless you have an actual degree (MSc) in math/statistics or machine learning.

You sound like you're ready to enter the workforce. Maybe learn how to work with distributed databases and queries, and how to structure your work pipeline.
tech.marksblogg.com/ I find these blog posts to be of really good quality.
Most big businesses have or need analytics support, and analytics can consistently increase sales if you do it right. With most small-to-medium businesses, this means not only doing the exploratory analytics to find where you can optimize the sales process, but also being able to quickly hack together something that optimizes it, and then measuring the result.
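
To make the "measure that" step concrete, here is a minimal sketch assuming the change is evaluated as an A/B test on conversion rate (an assumption; the post doesn't name a method). It uses a pooled two-proportion z-test; the function name and the numbers are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B with a pooled two-proportion z-test.

    Returns (difference in rates B - A, z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical numbers: old checkout page (A) vs. a reworked one (B).
diff, z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"difference={diff:.4f} z={z:.2f} p={p:.4f}")
```

The z-test is just one reasonable choice for large samples; with small counts an exact test would be safer.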

I started an LLC with a friend. We were thinking of doing consulting in the field first: finding small businesses with analytical needs, providing a solution, then using those projects to build credibility for larger projects.

Honestly, getting a nice pipeline in place is really what I'm struggling with. I know TensorFlow pretty well, but I don't have a way of storing trained models, versioning them, and providing front ends. What do you suggest in terms of a pipeline?
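
One minimal, framework-agnostic way to handle the storing-and-versioning part, sketched under assumptions: models are picklable Python objects, and the directory layout, the `save_model`/`load_model` names and the `registry.json` index are all hypothetical. For TensorFlow models you would more likely save in the framework's own SavedModel format and keep only the index logic. Only unpickle files you produced yourself, since loading a pickle can execute code.

```python
import hashlib
import json
import pickle
import time
from pathlib import Path

# Hypothetical layout: models/<name>/<version>.pkl plus one models/<name>/registry.json index.
ROOT = Path("models")

def save_model(name, model, metrics, root=ROOT):
    """Pickle `model`, version it by content hash, and record metadata in a JSON index."""
    blob = pickle.dumps(model)
    version = hashlib.sha256(blob).hexdigest()[:12]  # same bytes -> same version id
    model_dir = root / name
    model_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / f"{version}.pkl").write_bytes(blob)

    index_path = model_dir / "registry.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    index.append({"version": version, "saved_at": time.time(), "metrics": metrics})
    index_path.write_text(json.dumps(index, indent=2))
    return version

def load_model(name, version=None, root=ROOT):
    """Load a specific version, or the most recently saved one if `version` is None."""
    model_dir = root / name
    index = json.loads((model_dir / "registry.json").read_text())
    if version is None:
        version = index[-1]["version"]
    return pickle.loads((model_dir / f"{version}.pkl").read_bytes())
```

Hashing the serialized bytes means retraining to identical weights doesn't create a "new" version, and the JSON index keeps the metrics next to each version so you can pick which one a front end should serve.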

literal meme major

I'm starting a two-year M.Sc. program in Data Science next month (got my BA in CS too). Got into it through some Kaggle competitions and blogs.

Tbh, I don't really want to work in typical business domains like consulting, business analytics or marketing. I'd rather try to get into pharma/biotech or health care projects.

>data science

I hate that phrase so much. Just call it statistics.

I'm tired of the phrase too, but data science is not simply statistics.

>simply statistics

You're underestimating the breadth of statistics.

amstat.org/careers/whatisstatistics.cfm

>Statistics is the science of learning from data

Data science implies statistics plus modern methods like machine learning and neural networks, i.e. stuff that's a bit outside the boundary of stats.

>machine learning

It's regression and classification, AKA statistics.
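
The "it's just statistics" point is easiest to see with plain regression. Below is a minimal sketch of the textbook closed-form least-squares fit for a single predictor; the function name is hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = a + b*x, using the closed-form estimates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b

# Points that lie exactly on y = 1 + 2x recover intercept 1 and slope 2.
a, b = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
```

This minimizes the same squared-error objective that many ML models are trained on; the difference is mostly that it has a closed-form solution.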

>neural networks

Literally just using the chain rule. Are you seriously suggesting we need an entirely new discipline just because you're doing statistics + writing programs to solve basic calculus optimization problems, by repeatedly computing the squared error between a known training set's answers and your network's answers and adjusting the weighted connections by the gradients of that error, computed through the activation function you're using?
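
The chain-rule description above can be sketched with a single sigmoid neuron trained by gradient descent on squared error. This is a toy under stated assumptions (one input, one neuron, hypothetical names and data), not how a framework like TensorFlow is implemented, but the gradient is the same product of local derivatives that backpropagation chains through deeper networks.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=0.5, epochs=2000, seed=0):
    """Fit one neuron y_hat = sigmoid(w*x + b) by gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            y_hat = sigmoid(w * x + b)
            # Chain rule with L = (y_hat - y)**2 / 2:
            # dL/dw = dL/dy_hat * dy_hat/dz * dz/dw, where z = w*x + b.
            dL_dyhat = y_hat - y
            dyhat_dz = y_hat * (1 - y_hat)  # derivative of the sigmoid activation
            grad_w += dL_dyhat * dyhat_dz * x    # dz/dw = x
            grad_b += dL_dyhat * dyhat_dz * 1.0  # dz/db = 1
        # Step against the average gradient over the training set.
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

def mse(data, w, b):
    return sum((sigmoid(w * x + b) - y) ** 2 for x, y in data) / len(data)

# Hypothetical training set: negative inputs labelled 0, positive inputs labelled 1.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train_neuron(data)
```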

You're absolutely on point with your response. Data science isn't really its own thing. It more implies the combination of statistics and more computer-science-type stuff.

I've heard similar arguments used against calling computer science "computer science" instead of mathematics.

Really, it's pedantic to argue about the purity of terms, in my opinion. People use the word to describe a particular application of statistics, and it's very clear what is meant when we say data science.

I've also heard people argue that everything is simply philosophy, but it should be clear why we use other words to describe subsets.

Not really a data scientist, since I have no idea about programming outside of R, but I am a survey statistician. If you have questions I might be able to help.

Not him, but what was your degree in? I'm a bio/math double major with a statistics minor, but I have no idea what I want to do after graduation.

I actually have a bachelor's degree in sociology (cause I was pretty fucking stupid), took lots of survey methodology classes, learned sample designs etc., and did an MSc in survey statistics.

It's pretty nice, actually. I design, conduct, and analyze surveys for a big company, with the focus on analysis, though.

Not that guy, but I just want to say that a bio/math double major with a stats minor will land you any cushy bioinformatics job you could ever want. If you know how to code or some computer-science-type stuff, you're especially golden, because that's one of the hottest fields to go into right now.

NYU has a Center for Data Science led by Yann LeCun. I think arguing against the term is hipster.

What statistical concepts have you found most useful in your line of work? What do you do on a daily basis?

>focus on analyzing tho
Sorry for the hijack, but is it common for people hired with an MSc in stats to do more analysis than groundwork?

>applies for job
>"Hi Mr user, have you got 5 years of experience in field X?"
>"No? Hmm, you're not exactly who we are looking for. Good luck."

I'm in an introductory course; we have covered PCA and association rules so far. I think we are going to cover decision trees next.
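
For the association-rule part of that course, a minimal sketch of support, confidence and lift. It brute-forces rules between single items rather than using Apriori pruning, and the function names, thresholds and baskets are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_for_pairs(transactions, min_support=0.3, min_confidence=0.6):
    """Brute-force rules X -> Y between single items (no Apriori pruning)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support(transactions, {a, b})
        if s_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support(transactions, {x})  # estimate of P(Y | X)
            lift = confidence / support(transactions, {y})  # > 1 means X and Y co-occur more than chance
            if confidence >= min_confidence:
                rules.append((x, y, s_ab, confidence, lift))
    return rules

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
for x, y, s, c, l in rules_for_pairs(baskets):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```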

If you have to present results to your superior, there isn't a lot you can do. I usually just put some graphs up cause they barely know what a t-test is (yes, this is common). Also very important are weighting techniques and linear and multiple regression. And you'd better fucking understand sampling theory and how to handle complex sample designs if you want to do a good job.
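
As a small illustration of why weighting matters, here is a minimal sketch of a design-weighted (Hájek) mean, assuming a simple two-stratum design with known inclusion probabilities; all numbers and names are hypothetical. It only covers the point estimate; variance estimation for complex designs needs more machinery (e.g. linearization or replicate weights).

```python
def weighted_mean(values, weights):
    """Design-weighted (Hajek) estimate of a population mean from a survey sample."""
    total_w = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total_w

# Hypothetical sample: stratum A sampled at 1 in 10, stratum B at 1 in 50.
responses = [4, 5, 4, 2, 1]                  # e.g. satisfaction scores
inclusion_prob = [0.1, 0.1, 0.1, 0.02, 0.02]
# Design weight = inverse of inclusion probability (about 10 and 50 here).
weights = [1 / p for p in inclusion_prob]

naive = sum(responses) / len(responses)      # treats every respondent equally
weighted = weighted_mean(responses, weights)  # each respondent stands for ~1/p people
```

The under-sampled stratum B pulls the weighted estimate well below the naive mean, which is exactly the kind of bias ignoring the sample design introduces.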

It's my first job so I don't really know but I guess not.

I poke around the TensorFlow library for machine learning. This is a good intro: libgen.io/book/index.php?md5=7C4333E23A6BF64508F7C848E13CE5FB It also gives you a mathematical crash course in machine learning.

If you like ML, try this:
cs.cmu.edu/~15150/lect.html (SML; lectures will be added as the semester goes on)

cs.cmu.edu/~rjsimmon/15411-f15/schedule.html (Compiler Design in ML)

Learning about compilers translates into interesting jobs, specifically in security research which hires data scientists all the time too.

Wait, you meant ML as in machine learning, not Standard ML. Disregard.