Alright Veeky Forums, I need some advice

Justin Foster

Alright Veeky Forums, I need some advice.

I'm finishing my masters in comp. sci. this summer. I've specialized in AI and Machine Learning and I got an interview as a junior Data Scientist at a pretty big company. They liked me at the interview and they've sent a business case which I'm working on ATM. This is the data I have:

Customer data about how much each individual spent during a year.
Data entries containing a pair of customers, a measurement of how much they shop together, and the ZIP code that they both share.

So I'm basically supposed to take a pretty crude document and attempt to blow them away by finding correlations in the data and explaining my line of 'business thinking'. The questions I'm supposed to answer are "What are the correlations between the attributes?" and "Are customers often paired with others who have a similar number of friends/money spent?", and I'm pretty lost actually.

I've created some covariance matrices to show that the variables (money spent, shopping together) are dependent. I've also done some clustering and attempted to show that the number of friends as well as the frequency of shopping correlate with groups that reside within the same ZIP code.

How would you guys use this data? I find it really hard to apply machine learning here. I'm mostly just falling back on basic statistics.

February 14, 2017 - 00:44

Justin Mitchell

The data is only for one year btw so no time series.

February 14, 2017 - 00:47

Asher Hill

PCA
t-SNE

> I'm mostly just falling back on basic statistics.

That's what data science is. People just call it machine learning as a buzzword.

February 14, 2017 - 01:02

Julian Scott

The data isn't really that high dimensional though. What exactly would I be using those algos for?

First table is just [user_id, money_spent]
Second is just the 'shopping together' entry: [first_id, second_id, ZIP_code, shopping_together_rating]
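
With the tables laid out like that, the "are customers paired with similar others?" question could be checked roughly as below. This is only a sketch on made-up toy rows, not the real data, and it assumes "number of friends" means the number of distinct partners a user shows up with in the second table. The `pearson` helper and all variable names are just for illustration.

```python
from collections import defaultdict
from math import sqrt

# Made-up toy rows in the two layouts described above (NOT the real data).
spend = {1: 1200.0, 2: 1100.0, 3: 300.0, 4: 250.0, 5: 900.0}  # user_id -> money_spent
pairs = [  # (first_id, second_id, ZIP_code, shopping_together_rating)
    (1, 2, "10001", 0.9),
    (3, 4, "10002", 0.7),
    (1, 5, "10001", 0.4),
    (2, 5, "10001", 0.5),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Assumption: "number of friends" = number of distinct partners per user.
partners = defaultdict(set)
for a, b, _zip, _rating in pairs:
    partners[a].add(b)
    partners[b].add(a)
n_friends = {user: len(p) for user, p in partners.items()}

# List every pair in both directions so the result does not depend on
# which id happens to be stored as first_id.
both = [(a, b) for a, b, *_ in pairs] + [(b, a) for a, b, *_ in pairs]

spend_similarity = pearson([spend[a] for a, _ in both], [spend[b] for _, b in both])
friend_similarity = pearson([n_friends[a] for a, _ in both], [n_friends[b] for _, b in both])

print(f"spend correlation across pairs:        {spend_similarity:.2f}")
print(f"friend-count correlation across pairs: {friend_similarity:.2f}")
```

A value near +1 would suggest that customers tend to be paired with others like themselves, and a value near 0 would suggest no such pattern. On a handful of toy rows the numbers mean nothing by themselves.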

February 14, 2017 - 01:19

Landon Myers

I think it's because most people can do algos. Most people love the fancy stuff. But you have to be able to extract information once you're done, present it in a business-friendly way and ADD VALUE to the company.

February 14, 2017 - 02:42

Jeremiah Barnes

>I fucking hate the words machine learning.
>If you need to explain away, try linear models with a small factor space.
>Look at the residuals of the linear models.
>If the residuals are normal, then the linear model is correct, and you can explain away like a dumb fuck economist.
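
A minimal sketch of that residual check, assuming a single predictor and using synthetic data made up for illustration (none of it comes from OP's tables). It fits a least-squares line and then looks at the skewness and excess kurtosis of the residuals, which should both be near 0 if they are roughly normal. This is a rough moment check, not a formal test; something like Shapiro-Wilk from SciPy would be the stricter option. Normal-looking residuals are consistent with the linear model's assumptions rather than proof that it is correct.

```python
import random
from math import sqrt

# Synthetic data, made up for illustration: y is linear in x plus Gaussian noise.
rng = random.Random(0)
x = [i / 10 for i in range(200)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# Ordinary least squares for y = intercept + slope * x.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx

# Residuals: what the line fails to explain.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Rough normality check via standardized moments (both near 0 for normal data).
mr = sum(residuals) / n
sd = sqrt(sum((r - mr) ** 2 for r in residuals) / n)
skew = sum(((r - mr) / sd) ** 3 for r in residuals) / n
excess_kurtosis = sum(((r - mr) / sd) ** 4 for r in residuals) / n - 3

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"residual skew={skew:.3f} excess kurtosis={excess_kurtosis:.3f}")
```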

February 14, 2017 - 02:48

Jeremiah Scott

how do you have a masters in comp sci emphasizing in machine learning and not be able to do this?

February 14, 2017 - 03:11

Carter Gomez

>masters in computer science
>specialized in AI

nice meme kiddo

it aint personal, but dont be a false flagging lame.

February 14, 2017 - 03:12

Ryan Miller

You dumb idiot
time series data being time series data has nothing to do with the duration. if it cycles it's time series data and if you're doing k-means on it you're fucking up.

February 14, 2017 - 03:20

Camden Anderson

Never learning stats or math... only using fancy frameworks and textbook algorithms.

February 14, 2017 - 03:28

Luis Gutierrez

sad day in higher education

February 14, 2017 - 03:32

Jose Torres

put your data on paste bin

February 14, 2017 - 03:36

Juan Cooper

>All these salty fucks
>Only one even remotely constructive reply

Stay golden Veeky Forums

February 14, 2017 - 03:54

Parker Moore

>A time series is a series of data points indexed (or listed or graphed) in time order.

It's ONE set of values compiled from observations made during one year.

February 14, 2017 - 03:58

Landon Powell

>A time series is a series of data points indexed (or listed or graphed) in time order.
>It's ONE set of values compiled from observations made during one year.

List the factors

February 14, 2017 - 04:13

Adrian Myers

>ONE set of values
my misunderstanding
only a single factor
try time series
or bayesian

February 14, 2017 - 04:23

Chase Gomez

This is the data:

Two tables of data, all collected as the sum of one year's observations
>[user_id, money_spent]
>[first_id, second_id, ZIP_code, shopping_together_rating]

February 14, 2017 - 05:12

Luke Williams

try multivariate time series
with [user_id, money_spent].

try using some google maps api to get a sense of the zip codes or visualize it with ratings.

the second may just need a prediction of ratings and zip code.

if there is enough data on specific users, they may want a prediction on where the users shop.

if you can join the tables by user_id

try to find out if higher rating correlates with money spent.

zip codes and money spent
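
The join-and-correlate idea above could look something like this. It is a sketch on made-up toy rows, assuming the pair table is joined to the spend table on both ids and that the combined spend of a pair is a reasonable thing to compare against the rating. The names `joined` and `pearson` are illustrative only.

```python
from math import sqrt

# Made-up toy rows (NOT the real data).
spend = {1: 1200.0, 2: 1100.0, 3: 300.0, 4: 250.0, 5: 900.0, 6: 400.0}
pairs = [  # (first_id, second_id, ZIP_code, shopping_together_rating)
    (1, 2, "10001", 0.9),
    (3, 4, "10002", 0.2),
    (1, 5, "10001", 0.6),
    (2, 5, "10001", 0.7),
    (3, 6, "10002", 0.3),
    (4, 7, "10002", 0.5),  # user 7 has no spend row, so this pair is dropped
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Inner join on user_id for both sides of each pair:
# keep only pairs where both customers appear in the spend table.
joined = [
    (rating, spend[a] + spend[b])
    for a, b, _zip, rating in pairs
    if a in spend and b in spend
]

ratings = [rating for rating, _ in joined]
combined_spend = [total for _, total in joined]
r = pearson(ratings, combined_spend)

print(f"pairs kept after join: {len(joined)}")
print(f"rating vs combined spend correlation: {r:.2f}")
```

Using the sum of both customers' spend is just one choice. The minimum or the difference between the two would each answer a slightly different question.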

February 14, 2017 - 06:00

Benjamin Martinez

Thanks mate!

There isn't any form of temporal data though. All the entries present are just the compilation (sum) of a year's spending.

February 14, 2017 - 06:14

Isaac Bell

that makes it easier.
get the means of money spent by zip code and ratings.
if you can create an interactive dashboard in D3.js more power to you.
people love visual shit.
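
For the per-ZIP means, a rough sketch on made-up toy rows. It assumes a customer's ZIP code can be read off the pair table (since each pair shares one) and that every customer in a pair also has a row in the spend table. `summary` and the other names are illustrative only.

```python
from collections import defaultdict

# Made-up toy rows (NOT the real data).
spend = {1: 1200.0, 2: 1100.0, 3: 300.0, 4: 250.0, 5: 900.0}  # user_id -> money_spent
pairs = [  # (first_id, second_id, ZIP_code, shopping_together_rating)
    (1, 2, "10001", 0.9),
    (1, 5, "10001", 0.6),
    (2, 5, "10001", 0.7),
    (3, 4, "10002", 0.2),
]

# Collect the distinct customers and all pair ratings seen in each ZIP code.
users_by_zip = defaultdict(set)
ratings_by_zip = defaultdict(list)
for first_id, second_id, zip_code, rating in pairs:
    users_by_zip[zip_code].update((first_id, second_id))
    ratings_by_zip[zip_code].append(rating)

# Mean money spent per customer and mean rating per pair, by ZIP code.
summary = {}
for zip_code, users in users_by_zip.items():
    amounts = [spend[u] for u in users]
    ratings = ratings_by_zip[zip_code]
    summary[zip_code] = {
        "customers": len(users),
        "mean_spend": sum(amounts) / len(amounts),
        "mean_rating": sum(ratings) / len(ratings),
    }

for zip_code, row in sorted(summary.items()):
    print(zip_code, row)
```

A table like `summary` is the kind of thing that could feed a chart or dashboard directly.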

February 14, 2017 - 06:29

Aaron Bailey

I'd write a bunch of queries and reports relating the data.

February 14, 2017 - 07:45

Brody Flores

Random forest classifier
You're welcome

February 14, 2017 - 07:49

Carson Rogers

The feature space is too small for random forest. This is why everyone hates CS.
Just talk talk talk but some of us see right through you.

February 14, 2017 - 07:56

Robert Sanchez

Too lazy to come up with a complete plan, but have you tried some 'association rule learning', OP?

February 14, 2017 - 08:00

Brandon Walker

>+1 for data visualization

Don't underestimate the lazy manager who doesn't give a shit about your ingenious plans if he can't understand them within 5 seconds.

February 14, 2017 - 08:06

Carter Murphy

I'm actually a mathfag, just started delving into ML a few days ago. Tell me, is it gonna be saturated? I can't imagine so, since you actually need half a brain for it, right?

February 14, 2017 - 08:09

William Allen

the people are very loud about their knowledge but can't read the literature, which makes it even more saturated.

February 14, 2017 - 08:16

Brody Reed

These are mostly CS people BTW. They're taking away from people who can actually read published literature on the topic.

February 14, 2017 - 08:24

Christian Nguyen

It's a trick test. The data is obviously lacking.

Call them right now in a furious tone and demand more sophisticated data to analyse. Tell them how shit the current data is and that your pseudorandom number generator can do better than what they gave you.

That job offer will surely be yours.

February 14, 2017 - 08:26

Nolan Parker

We are all going to make it

February 14, 2017 - 09:02

Brayden Reed

Isn't this too simplistic? I feel they'll love it if I cram in some algorithm that does some kinda prediction haphazardly.

February 14, 2017 - 15:50

Sebastian Rogers

If you can't do their interview question (with internet!), it means you are clearly not qualified for the fucking job. It's sad how I have to explain something so obvious.

February 14, 2017 - 16:12
