Thread: duplicate rows mystery

duplicate rows mystery

From
Gerry Jensen
Date:
I'm trying to remove duplicate records from a table.  The best way I could
think of was to select into a new table using DISTINCT.

I issued the following command:

select distinct on (symbol, date) * into price from pricebackup;

It did indeed remove several rows that had duplicate items.  I then tried
to build a unique index on the columns (symbol, date) in pricebackup with:

create unique index price_symbol_date on pricebackup (symbol, date);

and got the error:

ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.

How is it possible that I could have duplicate values in that table when I
just selected them into the table as distinct?  What am I missing?

Any help is appreciated.

Thanks,

Gerry

Re: duplicate rows mystery

From
Michael Fuhr
Date:
On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
>
> select distinct on (symbol, date) * into price from pricebackup;
>
> It did indeed remove several rows that had duplicate items.  I then tried
> to build a unique index on the columns (symbol, date) in pricebackup with:
>
> create unique index price_symbol_date on pricebackup (symbol, date);
>
> and got the error:
>
> ERROR:  could not create unique index
> DETAIL:  Table contains duplicated values.

If the SELECT and CREATE statements are what you really did, then
you tried to create an index on the original table (pricebackup),
not the table you SELECTed into (price).  Are you sure you have the
right table names in the right places?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: duplicate rows mystery

From
Tom Lane
Date:
Gerry Jensen <gerry@xmission.com> writes:
> I issued the following command[s]:
> select distinct on (symbol, date) * into price from pricebackup;
> create unique index price_symbol_date on pricebackup (symbol, date);
> and got the error:
> ERROR:  could not create unique index
> DETAIL:  Table contains duplicated values.

Um, isn't pricebackup the one that still has the dup rows?  You should
have created the index on the new table.

            regards, tom lane

Re: duplicate rows mystery

From
Gerry Jensen
Date:
Sorry.  That was a typo in my post.  I switched the table names in my
SELECT INTO line in my post (I tested it again to make sure that's not
what I did in psql and my results were the same).  price was the original
table with duplicate rows.  What I really did was:

select distinct on (symbol, date) * into pricebackup from price;

then:

create unique index price_symbol_date on pricebackup (symbol, date);

Yet it still gave me the duplicate value errors.  How is this possible?

Thanks,

Gerry

On Thu, 14 Apr 2005, Michael Fuhr wrote:

> On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
>>
>> select distinct on (symbol, date) * into price from pricebackup;
>>
>> It did indeed remove several rows that had duplicate items.  I then tried
>> to build a unique index on the columns (symbol, date) in pricebackup with:
>>
>> create unique index price_symbol_date on pricebackup (symbol, date);
>>
>> and got the error:
>>
>> ERROR:  could not create unique index
>> DETAIL:  Table contains duplicated values.
>
> If the SELECT and CREATE statements are what you really did, then
> you tried to create an index on the original table (pricebackup),
> not the table you SELECTed into (price).  Are you sure you have the
> right table names in the right places?
>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>

Re: duplicate rows mystery

From
Steve Tucknott
Date:
How about the simple case that pricebackup is not empty before the
select? You say it is a new table, but did you do a test run or anything
like that before the first select? How about dropping or truncating the
table and starting again?


On Thu, 2005-04-14 at 18:09, Gerry Jensen wrote:
> Sorry.  That was a typo in my post.  I switched the table names in my
> SELECT INTO line in my post (I tested it again to make sure that's not
> what I did in psql and my results were the same).  price was the original
> table with duplicate rows.  What I really did was:
>
> select distinct on (symbol, date) * into pricebackup from price;
>
> then:
>
> create unique index price_symbol_date on pricebackup (symbol, date);
>
> Yet it still gave me the duplicate value errors.  How is this possible?
>
> Thanks,
>
> Gerry
>
> On Thu, 14 Apr 2005, Michael Fuhr wrote:
>
> > On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
> >>
> >> select distinct on (symbol, date) * into price from pricebackup;
> >>
> >> It did indeed remove several rows that had duplicate items.  I then tried
> >> to build a unique index on the columns (symbol, date) in pricebackup with:
> >>
> >> create unique index price_symbol_date on pricebackup (symbol, date);
> >>
> >> and got the error:
> >>
> >> ERROR:  could not create unique index
> >> DETAIL:  Table contains duplicated values.
> >
> > If the SELECT and CREATE statements are what you really did, then
> > you tried to create an index on the original table (pricebackup),
> > not the table you SELECTed into (price).  Are you sure you have the
> > right table names in the right places?
> >
> > --
> > Michael Fuhr
> > http://www.fuhr.org/~mfuhr/
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
--


Regards,

Steve Tucknott
ReTSol Ltd

DDI    01903 828769
MOBILE    07736715772



Re: duplicate rows mystery

From
Gerry Jensen
Date:
Yes.  I tried it several times, dropping the newly created table each
time.  I'm wondering if my database is just corrupt.

My indexes in my original table were definitely corrupt (I originally had
a unique (symbol, date) index).  I discovered this when certain queries
failed to return certain rows (e.g. selecting on a specific date), but
different queries did return those rows (e.g. selecting a range of dates
that included the missing date).  When I dropped the indexes, I was able
to again see the missing rows with the specific date query.  Apparently
the index corruption had allowed the duplicate rows to be inserted in the
first place.

So perhaps there's something seriously wrong with my database and I need
to start over from scratch.  The main reason for my post is that I wanted
to make sure I was understanding "DISTINCT ON" correctly and there was
nothing wrong with my logic.  If those things look fine and logical to
everyone, then my posting here has been beneficial to me.

Anyway, thanks for the suggestion.

Gerry

On Thu, 14 Apr 2005, Steve Tucknott wrote:

> How about the simple case that pricebackup is not empty before the
> select? You say it is a new table, but did you do a test run or anything
> like that before the first select? How about dropping or truncating the
> table and starting again?
>
>
> On Thu, 2005-04-14 at 18:09, Gerry Jensen wrote:
>> Sorry.  That was a typo in my post.  I switched the table names in my
>> SELECT INTO line in my post (I tested it again to make sure that's not
>> what I did in psql and my results were the same).  price was the original
>> table with duplicate rows.  What I really did was:
>>
>> select distinct on (symbol, date) * into pricebackup from price;
>>
>> then:
>>
>> create unique index price_symbol_date on pricebackup (symbol, date);
>>
>> Yet it still gave me the duplicate value errors.  How is this possible?
>>
>> Thanks,
>>
>> Gerry
>>
>> On Thu, 14 Apr 2005, Michael Fuhr wrote:
>>
>>> On Thu, Apr 14, 2005 at 03:26:16AM -0600, Gerry Jensen wrote:
>>>>
>>>> select distinct on (symbol, date) * into price from pricebackup;
>>>>
>>>> It did indeed remove several rows that had duplicate items.  I then tried
>>>> to build a unique index on the columns (symbol, date) in pricebackup with:
>>>>
>>>> create unique index price_symbol_date on pricebackup (symbol, date);
>>>>
>>>> and got the error:
>>>>
>>>> ERROR:  could not create unique index
>>>> DETAIL:  Table contains duplicated values.
>>>
>>> If the SELECT and CREATE statements are what you really did, then
>>> you tried to create an index on the original table (pricebackup),
>>> not the table you SELECTed into (price).  Are you sure you have the
>>> right table names in the right places?
>>>
>>> --
>>> Michael Fuhr
>>> http://www.fuhr.org/~mfuhr/
>>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 6: Have you searched our list archives?
>>
>>                http://archives.postgresql.org
> --
>
>
> Regards,
>
> Steve Tucknott
> ReTSol Ltd
>
> DDI    01903 828769
> MOBILE    07736715772
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>