Guys I have an interesting coding problem for you that I'm really stuck with


Ayden Barnes

Guys I have an interesting coding problem for you that I'm really stuck with.

Say I have 2 lists of names. Both are in random order, with a few names appearing in both. I need to find out which names are in group B but not in group A. How would I do this in Excel or MATLAB?

To complicate matters further, some entries aren't 100% identical but are very similar. For example, in list A a name might be Andrew B. Cosby and in group B just Andrew Cosby, but obviously this is a match and should not be in my answer list.

Thanks guys!

June 20, 2016 - 06:15

Tyler Davis

Do you know any other coding languages? I don't think Excel (and I don't know about MATLAB) is the best way to handle lists of strings.

June 20, 2016 - 06:21

Ryan Harris

>To complicate matters further some entries aren't 100% similar

I don't know what you mean by that, but you can use the Levenshtein distance with a tolerance level to find similar but not identical strings.
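For anyone who wants to try the Levenshtein idea, here is a minimal sketch in Python. The helper names `levenshtein` and `is_close` and the default `tolerance` of 3 are made up for illustration, not something from this thread; a sensible tolerance depends on your data.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or substitutions
    needed to turn string a into string b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def is_close(a: str, b: str, tolerance: int = 3) -> bool:
    """Hypothetical helper: treat two names as a match when their
    case-insensitive edit distance is within the tolerance."""
    return levenshtein(a.lower(), b.lower()) <= tolerance


print(levenshtein("Andrew B. Cosby", "Andrew Cosby"))  # 3
print(is_close("Andrew B. Cosby", "Andrew Cosby"))     # True
```

A loose tolerance will also match different people with short or similar names, so it is worth eyeballing the near-matches before trusting them.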

about the other thing,

repmat(A, N_b,1) - repmat(B,N_a,1)'

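The zeros are your duplicates (this assumes A and B are numeric row vectors, so names would have to be encoded as numbers first).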

June 20, 2016 - 06:23

Luis Hall

If you have data like that, you need to standardize it first.

June 20, 2016 - 06:24

James Young

I don't really, unfortunately. I'm a maths student, so my coding is limited to calculations in MATLAB. Do you not think MATLAB or Excel could handle something like this?

June 20, 2016 - 06:27

Adrian Reed

It would be slow going, but I think you're right. Any ideas how to do it with standardised data?

June 20, 2016 - 06:29

John Ross

What language is this?

June 20, 2016 - 06:30

Camden Russell

I mean if you've standardized the data you can just use sets. Or a terrible ugly for loop.

June 20, 2016 - 06:40

Christopher Rogers

matlab

June 20, 2016 - 06:45

Joseph Brown

What command would you use to compare an element of A with an element of B?
What are the inputs in this case?

June 20, 2016 - 06:49

Thomas Foster

A, your first list, B, your second list, N_a, length of A, N_b, length of B

you're a math student and you've never used repmat?

June 20, 2016 - 06:55

Jordan Long

Nope, I'll give it a browse

Thanks all for your help, if this works I'll share some of the £65k with you!

June 20, 2016 - 06:58

Ian Clark

>£65k

pfff yea right

June 20, 2016 - 07:08

Luke Gutierrez

uninteresting programming problem in a shit language

June 20, 2016 - 09:22

Elijah Powell

First of all, you should sort your fucking data.
After that it's pretty simple:
Compare the first letter of A[0] with the first letter of B[n].
If it's a match, compare the names (just first and last). If the names match, record the name / remove it from the list.
Else break the loop and move on to A[1].
This is probably the simplest, but it won't be terribly fast.

June 20, 2016 - 12:21

Joshua Hall

Sounds like a job for setdiff.

June 20, 2016 - 12:27

Sebastian Baker

It depends how your data is formatted. If you are using VBA for Excel, you can use the Left() function and compare the first n characters.

June 20, 2016 - 12:29

Julian Turner

Use sets in Python:
Set B - (Set A ∩ Set B)
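Spelled out as a rough sketch (exact matches only; the function name `only_in_b` and the sample names are invented for the example):

```python
def only_in_b(list_a, list_b):
    """Return the names that appear in list_b but not in list_a."""
    set_a, set_b = set(list_a), set(list_b)
    result = set_b - (set_a & set_b)  # B minus the overlap of A and B
    assert result == set_b - set_a    # same thing as the plain difference
    return sorted(result)             # sets are unordered, so sort for display


list_a = ["Andrew Cosby", "Maria Lopez", "Chen Wei"]
list_b = ["Maria Lopez", "Priya Patel", "Chen Wei", "Tom Baker"]
print(only_in_b(list_a, list_b))  # ['Priya Patel', 'Tom Baker']
```

Subtracting the intersection is the same as `set_b - set_a`, so the shorter form works too.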

June 20, 2016 - 15:07

Matthew Roberts

He needs to normalize the data first so that equivalent names are equal

June 20, 2016 - 15:34

Mason Price

sha4096

June 20, 2016 - 21:26

Justin Green

MATLAB is bad at this because it's a shit language (with shit string support), but you can do something like this:
First go through both lists of names and convert them to upper (or lower) case, while also removing things like the "B." in your Andrew Cosby example (a good way to do this would probably be to take the first and last word).
After that, use the appropriate set operations on the lists.
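A rough sketch of those two steps, in Python rather than MATLAB. The helpers `normalize` and `names_only_in_b` are hypothetical names, and keeping only the first and last word is the assumption from the post above, so it will also merge, say, "Andrew B. Cosby" and "Andrew X. Cosby".

```python
def normalize(name: str) -> str:
    """Lower-case the name and keep only its first and last word."""
    words = name.lower().split()
    if len(words) <= 1:
        return " ".join(words)  # empty or single-word name: nothing to drop
    return f"{words[0]} {words[-1]}"


def names_only_in_b(list_a, list_b):
    """Names from list_b whose normalized form never appears in list_a."""
    keys_a = {normalize(name) for name in list_a}  # set gives fast lookups
    return [name for name in list_b if normalize(name) not in keys_a]


list_a = ["Andrew B. Cosby", "Maria Lopez"]
list_b = ["Andrew Cosby", "MARIA LOPEZ", "Tom Baker"]
print(names_only_in_b(list_a, list_b))  # ['Tom Baker']
```

The result keeps the original spelling and order from list B, which is usually what you want in the answer list.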

June 20, 2016 - 21:43

Owen Diaz

post it to mechanical turk for peanuts, your time clearly is more valuable

Alternatively, if your sets are really big, make it into a captcha and let people do it for free.

June 20, 2016 - 21:54

James Taylor

FOR EACH X NOT LISTC()[] IN LISTA {
LISTD [] = X
}

Listc() {
For each X in LISTB[] {
LISTC [] = "*" & X & "*"
}
}

hisssss :^)

June 20, 2016 - 22:25

Caleb Gutierrez

[code]
void find_missing {
FOR EACH X NOT LISTC()[] IN LISTA {
LISTD[] = X
}
}

static array listc()[] {
For each Y in LISTB[] {
LISTC[] = "*" & Y & "*"
}
}
[/code]

There's some python for you.

June 20, 2016 - 22:30

Landon Davis

Dont do this it makes mustard gas

But really this will infinitely loop and segfault Windows. 9/10

June 20, 2016 - 22:31

Carter Diaz

Not sure I can think of a non-O(n^2) way to do it.

Just go one by one through list B, checking each value against list A. You should also write an isSimilar() method that takes two names, splits them on whitespace, and compares the first and last values (names).
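Roughly what that could look like in Python. The names `is_similar` and `missing_from_a` are just illustrative, and ignoring case is an extra assumption on top of what the post describes.

```python
def is_similar(name_a: str, name_b: str) -> bool:
    """Split both names on whitespace and compare first and last words."""
    parts_a = name_a.lower().split()
    parts_b = name_b.lower().split()
    if not parts_a or not parts_b:
        return False  # an empty name never matches anything
    return parts_a[0] == parts_b[0] and parts_a[-1] == parts_b[-1]


def missing_from_a(list_a, list_b):
    """O(len(A) * len(B)): check every name in B against every name in A."""
    return [b for b in list_b
            if not any(is_similar(a, b) for a in list_a)]


list_a = ["Andrew B. Cosby", "Maria Lopez"]
list_b = ["Andrew Cosby", "Tom Baker", "maria lopez"]
print(missing_from_a(list_a, list_b))  # ['Tom Baker']
```

For a couple of thousand names the quadratic loop is fine; it only starts to hurt on much larger lists.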

June 20, 2016 - 22:38

William Williams

>O(n^2) way to do it.

Concatenate the lists in 1
Sort the list in N log N
Run through the list and check neighbors in N.

There you go, N log N solution

If the lists are already sorted it's an N solution.
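A sketch of the concatenate, sort, and check-neighbors idea in Python. It only handles exact matches, so it assumes the names were already standardized; the function name `only_in_b_sorted` and the "A"/"B" tags are made up for the example.

```python
def only_in_b_sorted(list_a, list_b):
    """Find names in list_b that are not in list_a with one sort and one pass."""
    # Tag every name with its source list. Because "A" sorts before "B",
    # equal names end up adjacent with the list-A copy first.
    combined = [(name, "A") for name in list_a] + [(name, "B") for name in list_b]
    combined.sort()  # O(N log N)

    result = []
    last_from_a = None  # most recent name seen that came from list A
    for name, source in combined:  # O(N) pass over neighbors
        if source == "A":
            last_from_a = name
        elif name != last_from_a:
            result.append(name)  # no list-A entry directly before it
    return result


print(only_in_b_sorted(["c", "a"], ["b", "a", "d"]))  # ['b', 'd']
```

A name that occurs several times in list B and never in list A is reported once per occurrence.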

Fucking noobs

June 21, 2016 - 01:20

Caleb Ortiz

Perl has some lovely regular expressions and this amazing data structure known as a hash for just this sort of thing. I encourage you to look it up, even if it's the legacy of legacy.

Python has similar stuff going on, but regexp in Python is a little bit less intuitive for me (please don't ask me how /// is easier than regexp). And what Perl calls a hash is a dict in Python.

Matlab has very poor regexp support from what I understand, even though I like it.

You have yourself there a week 1 day 5 regexp problem in Perl
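The same regexp-plus-hash idea, sketched in Python instead of Perl, with a dict standing in for the hash. The pattern and the names `MIDDLE_INITIAL`, `key_for`, and `only_in_b` are assumptions for illustration; the pattern strips any single-letter initial followed by a period, which may be too aggressive for some data.

```python
import re

# A single letter followed by a period, plus any trailing whitespace ("B. ").
MIDDLE_INITIAL = re.compile(r"\b[A-Za-z]\.\s*")


def key_for(name: str) -> str:
    """Build a lookup key: drop initials, lower-case, collapse whitespace."""
    stripped = MIDDLE_INITIAL.sub("", name)
    return " ".join(stripped.lower().split())


def only_in_b(list_a, list_b):
    """Names from list_b whose key is not in the hash built from list_a."""
    seen = {key_for(name): name for name in list_a}  # key -> original spelling
    return [name for name in list_b if key_for(name) not in seen]


print(key_for("Andrew B. Cosby"))                                   # andrew cosby
print(only_in_b(["Andrew B. Cosby"], ["Andrew Cosby", "Tom Baker"]))  # ['Tom Baker']
```

Keeping the original spelling as the dict value makes it easy to print which list-A entry a name was matched against.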

June 21, 2016 - 02:55

Elijah Bennett

Seeing how you are thinking about Excel or MATLAB, you probably don't give a shit about time complexity.

Store both lists as simple arrays.

Take a name from list B and compare it to every member of list A. If there is no match (track this with a boolean), output this name.

Repeat this for every member of list B and there you have it.

Assuming lists of the same size this is just O(n squared), so it is not absolutely shit, but it is about as bad as you can reasonably do.

June 21, 2016 - 03:29

Cameron Ross

in R, only considering exact matches:

unique(B[! B %in% A])
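For comparison, a rough Python equivalent of that one-liner (exact matches only; the function name `unique_not_in` is made up):

```python
def unique_not_in(list_a, list_b):
    """Names in list_b that are not in list_a, de-duplicated, in original order."""
    set_a = set(list_a)  # fast membership test, like %in%
    # dict.fromkeys keeps the first occurrence of each key, like unique().
    return list(dict.fromkeys(name for name in list_b if name not in set_a))


print(unique_not_in(["x"], ["y", "x", "y", "z"]))  # ['y', 'z']
```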

June 21, 2016 - 05:37

Samuel Richardson

>please dont ask me how /// is easier than regexp
it's not the syntax that's shit in python's regex, but the implementation.

they recommend you pre-compile your patterns, but have it set up so you can just pass a pattern string instead of a pattern object, but it's caching behind the scenes so there's sometimes no difference in the behavior no matter how you set up the search

it's a great example of horribly planned premature optimization
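To make the two styles concrete, a small sketch using the standard `re` module. The pattern and the helper names are invented for the example; it grabs the first and last word of a name with at least two words.

```python
import re

# Hypothetical pattern: first word, anything in between, last word.
PATTERN_TEXT = r"^(\w+)\b.*\b(\w+)$"
COMPILED = re.compile(PATTERN_TEXT)  # explicit pre-compiled pattern object


def first_last_compiled(name: str):
    """Search with the pre-compiled pattern object."""
    match = COMPILED.search(name)
    return match.groups() if match else None


def first_last_string(name: str):
    """Pass the pattern string directly; re compiles and caches it internally."""
    match = re.search(PATTERN_TEXT, name)
    return match.groups() if match else None


print(first_last_compiled("Andrew B. Cosby"))  # ('Andrew', 'Cosby')
print(first_last_string("Andrew B. Cosby"))    # ('Andrew', 'Cosby')
```

Both calls return the same result; the difference is only in who holds on to the compiled pattern.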

June 21, 2016 - 05:50
