On Feb 11, 2013, at 2:23, Tim Uckun <timuckun@gmail.com> wrote:
> This works pretty good except for when the top 100 records have
> duplicated email address (two sales for the same email address).
>
> I am wondering what the best strategy is for dealing with this
> scenario. Doing the records one at a time would work but obviously it
> would be much slower. There are no other columns I can rely on to
> make the record more unique either.
The best strategy is fixing your data-model so that you have a unique
key. As you found out already, e-mail addresses aren't very suitable as
unique keys for people. For this particular case I'd suggest adding a
surrogate key.
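
To illustrate what I mean by a surrogate key, here is a minimal sketch using Python's built-in sqlite3 module. The table name (sales), its columns and the sample rows are all made up for the example, since I don't know your actual schema. The point is only that two sales for the same e-mail address each get their own id:

```python
import sqlite3

def build_sales_table(rows):
    """Load (email, first_name, amount) rows into a hypothetical sales table.

    The id column is a surrogate key: the database assigns it, and it
    carries no meaning beyond identifying one row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        " id INTEGER PRIMARY KEY,"  # surrogate key
        " email TEXT NOT NULL,"
        " first_name TEXT,"
        " amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (email, first_name, amount) VALUES (?, ?, ?)",
        rows,
    )
    return conn

# Invented sample data: two sales share one e-mail address.
rows = [
    ("pat@example.com", "Pat", 10.0),
    ("pat@example.com", "Pat", 25.0),
    ("sam@example.com", "Sam", 5.0),
]
conn = build_sales_table(rows)

# Every row is addressable on its own, even when the e-mail repeats.
ids = [r[0] for r in conn.execute("SELECT id FROM sales ORDER BY id")]
same_email = conn.execute(
    "SELECT id, amount FROM sales WHERE email = ? ORDER BY id",
    ("pat@example.com",),
).fetchall()
```

With an id like that, a batch of 100 rows can be matched on id instead of on e-mail, so duplicated addresses within the batch no longer collide.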
Alternatively, you might try using (first_name, email) as your key.
You'll probably still get some duplicates, but there should be fewer of
them, perhaps few enough for your case.
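
If you want to check how much the composite key would help before committing to it, something along these lines could count the collisions in a batch. This is only a sketch in Python: the duplicate_keys helper, the field names and the sample records are invented for the example.

```python
from collections import Counter

def duplicate_keys(records, key_fields):
    """Return the key tuples that occur more than once in records.

    records is a list of dicts; key_fields names the columns that
    together form the candidate key.
    """
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]

# Invented batch: one shared family address, one repeat customer.
batch = [
    {"first_name": "Pat", "email": "family@example.com"},
    {"first_name": "Alex", "email": "family@example.com"},
    {"first_name": "Sam", "email": "sam@example.com"},
    {"first_name": "Sam", "email": "sam@example.com"},
]

by_email = duplicate_keys(batch, ["email"])
by_name_and_email = duplicate_keys(batch, ["first_name", "email"])
```

In this made-up batch the e-mail alone collides twice, while (first_name, email) collides only for the repeat customer. That remaining case is exactly the kind of duplicate the composite key cannot resolve.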
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.