The Math Forum






Newsgroup: comp.soft-sys.math.mathematica

Topic: Database Challenge
Replies: 4   Last Post: Jan 2, 2010 5:08 AM


Nicholas Kormanik

Posts: 15
Registered: 12/8/09
Database Challenge
Posted: Jan 1, 2010 5:37 AM


There are 12 records in this mini database, in two columns: the first
column holds Social Security numbers and the second holds names.
Unfortunately, Jane Doe appears three times, under three different
versions of her name but with the same Social Security number.

Challenge: remove the duplicates (records sharing a Social Security
number), keeping any one of the names. The final result should be
whittled down to 10 records.

(The real-life problem has 6.5 million records and lots of duplicates,
with various versions of the names.)
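One possible approach, sketched here in Python rather than Mathematica: make a single pass over the records, remember which Social Security numbers have already been seen, and keep only the first record for each. This assumes one record per line, with the SSN first and separated from the name by whitespace; the function name `dedupe_by_ssn` is hypothetical. Because it streams line by line and stores only the set of SSNs, memory grows with the number of distinct SSNs rather than with the size of the file, which is what matters at 6.5 million records.

```python
import io
from typing import Iterable, Iterator


def dedupe_by_ssn(lines: Iterable[str]) -> Iterator[str]:
    """Yield each record whose SSN has not been seen before (hypothetical helper).

    Assumes one record per line: an SSN, whitespace, then the name.
    Only the set of SSNs seen so far is kept in memory, so the input
    can be streamed from a large file.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ssn = line.split(maxsplit=1)[0]
        if ssn not in seen:
            seen.add(ssn)
            yield line  # keep the first name encountered for this SSN


# Small in-memory sample standing in for the real file.
sample = io.StringIO(
    "004-16-4077 jane doe\n"
    "014-27-9076 mike smith\n"
    "004-16-4077 jane a. doe\n"
)
for record in dedupe_by_ssn(sample):
    print(record)
# prints:
# 004-16-4077 jane doe
# 014-27-9076 mike smith
```

For the real data one would pass an open file object instead of the `io.StringIO` sample and write each yielded record to an output file. Keeping the first occurrence is an arbitrary choice; the challenge allows any one of the names.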


025-60-4044 joe average
004-16-4077 jane doe
014-27-9076 mike smith
098-43-2098 rodolfo pilas
073-15-6005 gustavo boksar
004-16-4077 jane a. doe
147-79-9074 bea busaniche
165-63-0189 pablo medrano
124-96-7092 jeff aaron
004-16-4077 jane anne doe
172-30-6069 michael peters
059-85-1062 leroy baker
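As a check against the 12 records above, here is a different technique, sort-then-group: sorting by SSN brings duplicates next to each other, so keeping the first row of each group leaves one record per SSN. This is a sketch, not the poster's method; the helper names `parse` and `one_per_ssn` are hypothetical, and the output comes back ordered by SSN rather than in the original order.

```python
from itertools import groupby

# The 12 records from the post, one "SSN name" pair per line.
RECORDS = """\
025-60-4044 joe average
004-16-4077 jane doe
014-27-9076 mike smith
098-43-2098 rodolfo pilas
073-15-6005 gustavo boksar
004-16-4077 jane a. doe
147-79-9074 bea busaniche
165-63-0189 pablo medrano
124-96-7092 jeff aaron
004-16-4077 jane anne doe
172-30-6069 michael peters
059-85-1062 leroy baker
"""


def parse(text):
    """Split each non-blank line into an (ssn, name) tuple."""
    return [tuple(line.split(maxsplit=1)) for line in text.splitlines() if line.strip()]


def one_per_ssn(rows):
    """Keep one row per SSN: sort by SSN, then take the first row of each group."""
    # sorted() is stable, so rows sharing an SSN keep their original relative order.
    rows = sorted(rows, key=lambda r: r[0])
    return [next(group) for _, group in groupby(rows, key=lambda r: r[0])]


result = one_per_ssn(parse(RECORDS))
print(len(result))  # 10
print(result[0])    # ('004-16-4077', 'jane doe')
```

Sorting costs O(n log n) versus a single hashed pass, but it needs no set of seen keys, which can matter when the data is too large to index in memory.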









© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.