Guys I have an interesting coding problem for you that I'm really stuck with.
Say I have 2 lists of names. Both are in random order, but a few names appear in both. I need to find out which names are in list B but not in list A. How would I do this in Excel or MATLAB?
To complicate matters further some entries aren't 100% similar but are very similar. For example, in list A a name might be Andrew B. Cosby and in list B just Andrew Cosby, but obviously this is a match and should not be in my answer list.
Thanks guys!
Tyler Davis
Do you know any other coding languages? I don't think Excel (and I don't know about MATLAB) is the best way to handle lists of strings.
Ryan Harris
>To complicate matters further some entries aren't 100% similar
I don't know what you mean by that, but you can use the Levenshtein distance with a tolerance level to find similar but not identical strings.
about the other thing,
repmat(A, N_b,1) - repmat(B,N_a,1)'
the zeros are your duplicates.
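The Levenshtein suggestion above could look roughly like this in Python. This is only a sketch: `levenshtein`, `only_in_b_fuzzy` and the default `tolerance=3` are made-up names and values for illustration, and the tolerance would need tuning against real data (short names can match each other by accident).

```python
def levenshtein(s, t):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def only_in_b_fuzzy(list_a, list_b, tolerance=3):
    """Names in list_b with no name in list_a within `tolerance` edits.

    `tolerance` is an assumed value, not something from the thread.
    """
    return [b for b in list_b
            if all(levenshtein(a.lower(), b.lower()) > tolerance
                   for a in list_a)]
```

"Andrew B. Cosby" and "Andrew Cosby" are 3 edits apart, so with a tolerance of 3 they count as the same person.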
Luis Hall
If you have data like that you need to standardize it first
James Young
I don't really, unfortunately. I'm a maths student, so my coding is limited to calculations in MATLAB. Do you not think MATLAB or Excel could handle something like this?
Adrian Reed
Would make for slow going, but I think you're right. Any ideas how to do it with standardised data?
John Ross
What language is this?
Camden Russell
I mean if you've standardized the data you can just use sets. Or a terrible ugly for loop.
Christopher Rogers
matlab
Joseph Brown
What command would you use to compare an element of A with an element of B? What are the inputs in this case?
Thomas Foster
A is your first list, B is your second list, N_a is the length of A, N_b is the length of B
you're a math student and you've never used repmat?
Jordan Long
Nope, I'll give it a browse
Thanks all for your help, if this works I'll share some of the £65k with you!
Ian Clark
>£65k
pfff yea right
Luke Gutierrez
uninteresting programming problem in a shit language
Elijah Powell
First of all you should sort your fucking data. After that it's pretty simple:
Compare the first letter of A[0] with the first letter of B[n].
If it's a match, compare the names (just first and last).
If the names match, record the name / remove it from the list.
Else break the loop and move on to A[1].
This is probably the simplest, but it won't be terribly fast.
Joshua Hall
Sounds like a job for setdiff.
Sebastian Baker
it depends how your data is formatted. if you are using VBA for Excel you can use the Left() function and compare the first n characters.
Julian Turner
Use sets in Python: Set B - (Set A ∩ Set B)
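For exact matches, the set expression above is a couple of lines of Python. The sample names are invented for illustration; note that subtracting the intersection gives the same result as subtracting set A directly.

```python
list_a = ["Andrew Cosby", "Jane Roe", "Sam Lee"]
list_b = ["Jane Roe", "John Smith", "Andrew Cosby", "Maria Diaz"]

set_a, set_b = set(list_a), set(list_b)

# Names in B but not in A.
only_in_b = set_b - set_a

# The longer form from the post, B - (A ∩ B), is equivalent.
same_result = set_b - (set_a & set_b)
```

Sets drop duplicates and ordering, so this only works if that doesn't matter for the answer list.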
Matthew Roberts
He needs to normalize the data first so that equivalent names are equal
Mason Price
sha4096
Justin Green
Matlab is bad at this because it's a shit language (with shit string support), but you can do something like this: first go through both lists of names and convert them to upper (or lower) case, while also removing things like the B. in your Andrew Cosby example (a good way to do this would probably be to take the first and last word). After that, use the appropriate set operations on the lists.
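A rough Python sketch of that normalize-then-set-difference idea. `normalize` and `only_in_b_normalized` are hypothetical helper names, and "first and last word, lower-cased" is an assumption about what counts as the same person (it would also merge "Andrew B. Cosby" with "Andrew X. Cosby").

```python
def normalize(name):
    """Lower-case a name and keep only its first and last word."""
    words = name.lower().split()
    if len(words) <= 1:
        return " ".join(words)
    return f"{words[0]} {words[-1]}"


def only_in_b_normalized(list_a, list_b):
    """Original spellings from list_b whose normalized form is not in list_a."""
    keys_a = {normalize(name) for name in list_a}
    seen = set()   # normalized B names already reported
    result = []
    for name in list_b:
        key = normalize(name)
        if key not in keys_a and key not in seen:
            seen.add(key)
            result.append(name)
    return result
```

Building the set for list A once keeps the whole thing roughly linear in the size of the two lists.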
Owen Diaz
post it to mechanical turk for peanuts, your time clearly is more valuable
alternatively if your sets are really big make it into a captcha and let other people do it for free
James Taylor
FOR EACH X NOT LISTC()[] IN LISTA { LISTD [] = X }
Listc() { For each X in LISTB[] { LISTC [] = "*" & X & "*" } }
hisssss :^)
Caleb Gutierrez
[code] void diff { FOR EACH X NOT LISTC()[] IN LISTA { LISTD[] = X } }
static array listc()[] { For each Y in LISTB[] { LISTC[] = "*" & X & "*" } } [/code]
There's some python for you.
Landon Davis
Don't do this, it makes mustard gas
But really this will infinitely loop and segfault Windows. 9/10
Carter Diaz
not sure i can think of a non-O(n^2) way to do it.
just go one by one through list B, checking each value against list A. you should also write an isSimilar() method that takes two names, splits them on whitespace and compares the first and last values (names).
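A minimal Python take on the isSimilar() idea above. The snake_case names and the choice to ignore case are assumptions, not something the post specifies.

```python
def is_similar(name_1, name_2):
    """True if two names share the same first and last word (case-insensitive)."""
    parts_1 = name_1.lower().split()
    parts_2 = name_2.lower().split()
    if not parts_1 or not parts_2:
        return False  # treat empty names as never matching
    return parts_1[0] == parts_2[0] and parts_1[-1] == parts_2[-1]


def only_in_b(list_a, list_b):
    """O(n^2): keep each name in list_b that is similar to nothing in list_a."""
    return [b for b in list_b if not any(is_similar(a, b) for a in list_a)]
```

The pairwise check is slower than normalizing first, but it is the easiest place to swap in a fancier comparison later.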
William Williams
>O(n^2) way to do it.
Concatenate the lists in 1.
Sort the list in N log N.
Run through the list and check neighbors in N.
There you go, N log N solution
If the lists are already sorted it's an N solution.
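The concatenate, sort and scan steps above might be sketched like this in Python. Tagging each name with its source list (0 for A, 1 for B) is an added detail so the scan can tell which list a neighbor came from; it is not spelled out in the post, and the function name is made up. Exact matches only.

```python
def only_in_b_sorted(list_a, list_b):
    """Names in list_b but not list_a, via concatenate + sort + one pass."""
    # Tag 0 sorts before tag 1, so for equal names the A copies come first.
    tagged = [(name, 0) for name in list_a] + [(name, 1) for name in list_b]
    tagged.sort()                      # O(N log N)

    result = []
    last_a = None                      # most recent name seen from list A
    for name, source in tagged:        # O(N) scan over neighbors
        if source == 0:
            last_a = name
        elif name != last_a and (not result or result[-1] != name):
            result.append(name)        # B-only name, duplicates skipped
    return result
```

The result comes back in sorted order with duplicates removed.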
Fucking noobs
Caleb Ortiz
Perl has some lovely regular expressions and this amazing data structure known as a hash for just this sort of thing. I encourage you to look it up, even if it's the legacy of legacy.
Python has similar stuff going on, but regexp in Python is a little bit less intuitive for me (please don't ask me how /// is easier than regexp.) And the Python equivalent of a hash is a dict.
Matlab has very poor regexp support from what I understand, even though I like it.
You have yourself there a week 1 day 5 regexp problem in Perl
Elijah Bennett
Seeing how you are thinking about excel or matlab you probably don't give a shit about time complexity.
Store both lists as simple arrays.
Take a name from list B and compare it to literally every other member in list A. If there is no match (track this with a boolean) then you output this name.
Repeat this for every member in list B and there you have it.
Assuming lists of the same size this is just O(n squared) so it is not absolutely shit, but is literally as bad as you can do.
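Spelled out in Python, the brute-force version with a boolean flag would be something like this (exact matches only; the function name is hypothetical):

```python
def only_in_b_brute_force(list_a, list_b):
    """Compare every name in list_b against every name in list_a: O(n^2)."""
    result = []
    for b_name in list_b:
        found = False              # the boolean that tracks a match
        for a_name in list_a:
            if a_name == b_name:
                found = True
                break              # no need to keep looking
        if not found:
            result.append(b_name)
    return result
```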
Cameron Ross
in R, only considering exact matches:
unique(B[! B %in% A])
Samuel Richardson
>please don't ask me how /// is easier than regexp
it's not the syntax that's shit in python's regex, but the implementation.
they recommend you pre-compile your patterns, but have it set up so you can just pass a pattern string instead of a pattern object, but it's caching behind the scenes so there's sometimes no difference in the behavior no matter how you set up the search
it's a great example of horribly planned pre-optimization
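For reference, the two styles being compared look like this. Python's `re` module caches compiled versions of recently used patterns, so both forms behave the same here. The middle-initial pattern is just an illustrative assumption tied to the Andrew B. Cosby example, not a general name cleaner.

```python
import re

names = ["Andrew B. Cosby", "Andrew Cosby", "Jane Roe"]

# Assumed pattern: whitespace, one capital letter, a dot, then more whitespace.
middle_initial = r"\s+[A-Z]\.(?=\s)"

# Style 1: pre-compile the pattern and call methods on the pattern object.
compiled = re.compile(middle_initial)
with_compiled = [compiled.sub("", name) for name in names]

# Style 2: pass the pattern string straight to the module-level function.
with_string = [re.sub(middle_initial, "", name) for name in names]
```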