Alright Veeky Forums, I need some advice

I'm finishing my masters in comp. sci. this summer. I've specialized in AI and Machine Learning and I got an interview as a junior Data Scientist at a pretty big company. They liked me at the interview and they've sent a business case which I'm working on ATM. This is the data I have:

Customer data about how much an individual spent during a year.
Data entries containing two customers, a measurement of how much they shop together, and the ZIP code that they both share.

So I'm basically supposed to, from a pretty crude document, attempt to blow them away by finding correlations in the data and explaining my line of 'business thinking'. The questions I'm supposed to answer are "What are the correlations between the attributes?" and "Are customers often paired with others who have a similar number of friends/money spent?" and I'm pretty lost, actually.

I've created some covariance matrices to show that the variables (money spent, shopping together) are dependent. I've also done some clustering and attempted to show that the number of friends as well as the frequency of shopping correlate with groups that reside within the same ZIP code.

How would you guys use this data? I find it really hard to apply machine learning here. I'm mostly just falling back on basic statistics.

The data is only for one year btw so no time series.

PCA
t-SNE

> I'm mostly just falling back on basic statistics.

That's what data science is. People just call it machine learning as a buzzword.

The data isn't really that high dimensional though. What exactly would I be using those algos for?

First table is just [user_id, money_spent]
Second is just the 'shopping together' entry: [first_id, second_id, ZIP_code, shopping_together_rating]
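Not a full answer, but with those two tables the "paired with similar customers" question can be sketched in a few lines. Everything below is made up for illustration: the rows, the variable names, and the reading of "number of friends" as the number of distinct partners a customer shows up with. The idea is to correlate a value at one end of each pair with the same value at the other end, which is roughly what network people call assortativity.

```python
from collections import defaultdict
from math import sqrt

# Toy rows in the two layouts from the thread; all values are invented.
spend = {1: 1200.0, 2: 1100.0, 3: 300.0, 4: 250.0, 5: 900.0}
pairs = [  # (first_id, second_id, ZIP_code, shopping_together_rating)
    (1, 2, "11111", 0.9),
    (1, 5, "11111", 0.4),
    (2, 5, "11111", 0.5),
    (3, 4, "22222", 0.7),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Assumption: "number of friends" = number of distinct partners per customer.
partners = defaultdict(set)
for a, b, _zip, _rating in pairs:
    partners[a].add(b)
    partners[b].add(a)
n_friends = {uid: len(p) for uid, p in partners.items()}

# List every pair in both directions so the result does not depend on
# which customer happens to be stored as first_id.
directed = [(a, b) for a, b, *_ in pairs] + [(b, a) for a, b, *_ in pairs]

spend_similarity = pearson(
    [spend[a] for a, _ in directed], [spend[b] for _, b in directed]
)
friend_similarity = pearson(
    [n_friends[a] for a, _ in directed], [n_friends[b] for _, b in directed]
)
```

A value near +1 would mean big spenders (or well-connected customers) tend to be paired with each other, near 0 means no pattern, and negative means opposites pair up. On real data the numbers would need a significance check before going in front of anyone.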

I think it's because most people can do algos. Most people love the fancy stuff. But you have to be able to extract information once you're done, present it in a business-friendly way and ADD VALUE to the company.

>I fucking hate the words machine learning.
>If you need to explain away, try linear models with a small factor space.
>look at the residuals of the linear models.
>if the residuals are normal, then the linear model is correct, and you can explain away like a dumb fuck economist.
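For anyone who wants to see what that advice looks like in practice, here is a minimal sketch of a one-predictor linear fit followed by a look at the residuals. The numbers are invented, and picking "total shopping-together rating" as the predictor of money spent is only an assumption for the example. Normal-looking residuals are consistent with the linear model rather than proof of it, and with this few points the shape statistics are only a rough hint.

```python
from math import sqrt

# Invented per-customer numbers: total shopping-together rating (x)
# against money spent in the year (y).
x = [0.4, 0.9, 1.3, 1.4, 2.0, 2.2, 2.9, 3.1]
y = [310.0, 520.0, 640.0, 700.0, 980.0, 1010.0, 1390.0, 1400.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Ordinary least squares for a single predictor with an intercept.
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
    (a - mx) ** 2 for a in x
)
intercept = my - slope * mx

# Residual = observed value minus fitted value.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Rough shape checks. OLS residuals have mean zero when an intercept is
# fitted, so the standard deviation is computed around zero here.
# For a normal distribution both statistics would be close to 0.
sd = sqrt(sum(r ** 2 for r in residuals) / n)
skewness = sum((r / sd) ** 3 for r in residuals) / n
excess_kurtosis = sum((r / sd) ** 4 for r in residuals) / n - 3.0
```

On a real data set the usual next step would be a residual plot or a proper normality test (for example Shapiro-Wilk) instead of eyeballing two numbers.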

how do you have a masters in comp sci emphasizing in machine learning and not be able to do this?

>masters in computer science
>specialized in AI

nice meme kiddo

it aint personal, but dont be a false flagging lame.

You dumb idiot
time series data being time series data has nothing to do with the duration. if it cycles it's time series data and if you're doing k-means on it you're fucking up.

Never learn stats or math,... only using fancy frameworks and textbook algorithms.

sad day in higher education

put your data on paste bin

>All these salty fucks
>Only one even remotely constructive reply

Stay golden Veeky Forums

>A time series is a series of data points indexed (or listed or graphed) in time order.

It's ONE set of values compiled from observations made during one year.

>A time series is a series of data points indexed (or listed or graphed) in time order.
>It's ONE set of values compiled from observations made during one year.

List the factors

>ONE set of values
my misunderstanding
only a single factor
try time series
or bayesian

This is the data:

Two tables of data, all collected as the sum of one year's observations
>[user_id, money_spent]
>[first_id, second_id, ZIP_code, shopping_together_rating]

try multivariate time series
with [user_id, money_spent].

try using some google maps api to get a sense of the zip codes or visualize it with ratings.

the second may just need a prediction of ratings and zip code.

if there is enough data on specific users, they may want a prediction on where the users shop.

if you can join the tables by user_id

try to find out if higher rating correlates with money spent.

zip codes and money spent
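A quick sketch of that join, in case it helps. The rows are invented, and using the pair's combined spending as the thing to compare against the rating is just one assumed choice (the smaller or larger of the two spends would be equally defensible).

```python
from math import sqrt

# Invented rows. Customer 6 has no spending row on purpose.
spend = {1: 1200.0, 2: 1100.0, 3: 300.0, 4: 250.0, 5: 900.0}
pairs = [  # (first_id, second_id, ZIP_code, shopping_together_rating)
    (1, 2, "11111", 0.9),
    (1, 5, "11111", 0.4),
    (3, 4, "22222", 0.2),
    (2, 6, "11111", 0.8),
]

# Inner join of each pair row to the spending table through both ids:
# rows where either customer has no spending record are skipped.
joined = [
    (rating, spend[a] + spend[b])
    for a, b, _zip, rating in pairs
    if a in spend and b in spend
]
ratings = [r for r, _ in joined]
combined_spend = [s for _, s in joined]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


rating_vs_spend = pearson(ratings, combined_spend)
```

It is worth reporting how many pair rows were dropped by the join, since a lot of unmatched ids would itself be a finding about the data quality.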

Thanks mate!

There isn't any form of temporal data though. All the entries present are just the compilation (sum) of a year's spending.

that makes it easier.
get the means of money spent by zip code and ratings.
if you can create an interactive dashboard in D3.js more power to you.
people love visual shit.
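The per-ZIP means can be sketched like this before any dashboard work. The data is invented, and the code assumes each customer belongs to exactly one ZIP code, which has to be read off the pair table because the spending table has no ZIP column. Customers who never appear in a pair row have no known ZIP and are left out.

```python
from collections import defaultdict
from statistics import mean

# Invented rows in the two layouts from the thread.
spend = {1: 1200.0, 2: 1100.0, 3: 300.0, 4: 250.0, 5: 900.0}
pairs = [  # (first_id, second_id, ZIP_code, shopping_together_rating)
    (1, 2, "11111", 0.9),
    (1, 5, "11111", 0.4),
    (3, 4, "22222", 0.7),
]

# Assumption: one ZIP per customer, taken from any pair row they appear in.
zip_of = {}
for a, b, zip_code, _rating in pairs:
    zip_of[a] = zip_code
    zip_of[b] = zip_code

# Group yearly spending by the customer's ZIP code.
spend_by_zip = defaultdict(list)
for uid, amount in spend.items():
    if uid in zip_of:
        spend_by_zip[zip_of[uid]].append(amount)

# Group shopping-together ratings by the pair's shared ZIP code.
rating_by_zip = defaultdict(list)
for _a, _b, zip_code, rating in pairs:
    rating_by_zip[zip_code].append(rating)

mean_spend = {z: mean(v) for z, v in spend_by_zip.items()}
mean_rating = {z: mean(v) for z, v in rating_by_zip.items()}
```

These two dictionaries are about all a map or bar chart per ZIP code would need as input.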

I'd write a bunch of queries and reports relating the data.

Random forest classifier
You're welcome

The feature space is too small for random forest. This is why everyone hates CS.
Just talk talk talk but some of us see right through you.

Too lazy to come up with a complete plan, but have you tried some "association rule learning", OP?

>+1 for data visualization

Don't underestimate the lazy manager who doesn't give a shit about your ingenious plans if he can't understand them within 5 seconds.

I'm actually a mathfag, just started delving into ML a few days ago. Tell me, is it gonna be saturated? I can't imagine it will, since you actually need half a brain for it, right?

the people are very loud about their knowledge but can't read the literature, which makes it even more saturated.

These are mostly CS people BTW. They're taking away from people who can actually read published literature on the topic.

It's a trick test. The data is obviously lacking.

Call them right now in a furious tone and demand more sophisticated data to analyse. Tell them how shit the current data is and that your pseudorandom number generator can do better than what they gave you.

That job offer will surely be yours.

We are all going to make it

Isn't this too simplistic? I feel they'll love it if I cram in some algorithm that does some kinda prediction haphazardly.

If you can't do their interview question (with internet!), it means you are clearly not qualified for the fucking job. It's sad how I have to explain something so obvious.