Thread: 8.0 -> 8.1 dump duplicate key problem?

8.0 -> 8.1 dump duplicate key problem?

From
"Merlin Moncure"
Date:
I have an interesting situation...am staging a mock upgrade from 8.0 to
8.1 on a bzip compressed backup image from one of our production
servers.  The dump image is big (270 mb compressed) and so is the file
in question ~ 440k rows.

Am piping the backup image to the server via
psql -cd backup.bz2 | psql zyx

When the dump gets to the point where the indexes/keys are built, the
primary key fails to build due to duplicate key constraint failure.
However, after dump is complete, I can create the p-key without any
modification to the table and everything is fine.

The error is reproducible...I have confirmed in two separate runs.  The
source 8.0 database is still up and does not report any dups in the
p-key fields, which of course it shouldn't.

Merlin


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Tom Lane
Date:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> When the dump gets to the point where the indexes/keys are built, the
> primary key fails to build due to duplicate key constraint failure.
> However, after dump is complete, I can create the p-key without any
> modification to the table and everything is fine.

That's pretty bizarre.  What's the datatype of the key column(s)?

Can you reduce it to a smaller test case, or perhaps send me the full
dump off-list?  (270m is a bit much for email, but web or ftp would
work ... also, presumably only the pkey column is needed to generate
the error ...)
        regards, tom lane


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Dennis Bjorklund
Date:
On Thu, 10 Nov 2005, Tom Lane wrote:

> > When the dump gets to the point where the indexes/keys are built, the
> > primary key fails to build due to duplicate key constraint failure.
> 
> That's pretty bizarre.  What's the datatype of the key column(s)?

There was one guy on IRC that had the same problem. We didn't get that
much info about it and suggested that he check the original database for
corruption (rebuilding indexes) to make sure it's in a proper state.

Either this is the same guy or/and we might have a bigger problem. The guy 
on irc also had an integer PK.

-- 
/Dennis Björklund



Re: 8.0 -> 8.1 dump duplicate key problem?

From
"Merlin Moncure"
Date:
> "Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
> > When the dump gets to the point where the indexes/keys are built,
the
> > primary key fails to build due to duplicate key constraint failure.
> > However, after dump is complete, I can create the p-key without any
> > modification to the table and everything is fine.
>
> That's pretty bizarre.  What's the datatype of the key column(s)?
>
> Can you reduce it to a smaller test case, or perhaps send me the full
> dump off-list?  (270m is a bit much for email, but web or ftp would
> work ... also, presumably only the pkey column is needed to generate
> the error ...)

I am working on pairing down a test case.  As you can imagine this is a
nasty business.  My best shot is to go to the source server and trying
to dump just that file, but that may mask the problem.

Come to think of it, I did make one configuration change: I bumped
work_mem after the dump was loaded after noticing a lot of activity in
pg_dump.

Table schema follows.  If the schema seems a bit odd, it is because it
was converted from an ISAM file.

[Dennis: I am not the guy from IRC]

esp=# \d data1.parts_order_line_file             Table "data1.parts_order_line_file"         Column          |
Type          | Modifiers 
--------------------------+-------------------------+-----------id                       | cuid
|lastmod                 | ts                      |prl_combined_key         | character(9)            | not
nullprl_seq_no              | pic_9_3                 | not nullprl_combined_key_2       | character(9)
|prl_item_no             | character varying(15)   |prl_comment_desc         | character varying(2500) |prl_location
        | character(4)            |prl_workstation          | character(4)            |prl_stock_loc            |
character(4)           |prl_qty                  | numeric(7,3)            |prl_adj_price            | numeric(8,2)
      |prl_cost                 | numeric(11,5)           |prl_weight               | numeric(7,2)            |prl_uom
               | character(2)            |prl_vendor_no            | character(6)            |prl_vendor_part_no
|character varying(15)   |prl_track_this_part      | character(1)            |prl_warranty_period      | character
varying(10)  |prl_comments_1           | character varying(30)   |prl_comments_2           | character varying(30)
|prl_qty_shipped         | numeric(6,2)[]          |prl_qty_still_on_bo      | numeric(6,2)[]
|prl_qty_credited        | numeric(6,2)[]          |prl_credit_reason        | character(2)[]
|prl_credit_reason_type  | character(1)[]          |prl_cancel_ship          | character(1)[]
|prl_exchange_part       | character(1)            |prl_authorization_code   | character varying(10)   |prl_item_status
        | character(1)            |prl_item_status_alpha    | character(1)            |prl_cancel_flag          |
character(1)           |prl_charge_type_flag     | character(1)            |prl_ct_taxable_flag      | character(1)
      |prl_account_cat_code     | character(6)            |prl_retail_price         | numeric(8,2)
|prl_line_needs_serials  | character(1)            |prl_chrg_type_ship_indx  | pic_9_1
|prl_claim_type_flag     | character(1)            |prl_attached_wc_seq_no   | pic_9_3
|prl_attached_claim_type | character varying(10)   |prl_already_issued       | character(1)
|prl_returned_part_flag  | character(1)            |prl_prev_qty_shipped     | numeric(6,2)[]
|prl_prev_qty_still_on_bo| numeric(6,2)[]          |prl_prev_qty_credited    | numeric(6,2)[]          | 
Indexes:   "parts_order_line_file_pkey" PRIMARY KEY, btree (prl_combined_key,
prl_seq_no)   "parts_order_line_file_prl_exchange_part_key" UNIQUE, btree
(prl_exchange_part, id)   "parts_order_line_file_prl_item_no_key" UNIQUE, btree (prl_item_no,
id)   "parts_order_line_file_prl_trx_type_2_key" UNIQUE, btree
(prl_combined_key_2, prl_item_no, id)

CREATE DOMAIN public.pic_9_3 AS int2 DEFAULT 0  CONSTRAINT valid_range CHECK (((VALUE >= 0) AND (VALUE <= 999)));


Re: 8.0 -> 8.1 dump duplicate key problem?

From
"Merlin Moncure"
Date:
> That's pretty bizarre.  What's the datatype of the key column(s)?
>
> Can you reduce it to a smaller test case, or perhaps send me the full
> dump off-list?  (270m is a bit much for email, but web or ftp would
> work ... also, presumably only the pkey column is needed to generate
> the error ...)

I just confirmed that there are duplicate p-keys in the source table :(.
Three currently but last week there were six.  Just FYI I am not 100%
sure pg rebuilt the p-key with dups in it...I need to double check this.

esp=# select * from
esp-# (
esp(#     select prl_combined_key, prl_seq_no, count(*) as c from
parts_order_line_file group by 1,2

esp(# ) q where q.c > 1;prl_combined_key | prl_seq_no | c
------------------+------------+--- 00136860        |         20 | 2 00136860        |         23 | 2 00137050        |
        1 | 2 
(3 rows)

esp=# \d parts_order_line_file             Table "data1.parts_order_line_file"         Column          |          Type
        | Modifiers 
--------------------------+-------------------------+-----------
[...]
Indexes:   "parts_order_line_file_pkey" PRIMARY KEY, btree (prl_combined_key,
prl_seq_no)   "parts_order_line_file_prl_exchange_part_key" UNIQUE, btree
(prl_exchange_part, id)   "parts_order_line_file_prl_item_no_key" UNIQUE, btree (prl_item_no,
id)   "parts_order_line_file_prl_trx_type_2_key" UNIQUE, btree
(prl_combined_key_2, prl_item_no, id)

I keep a timestamp on every row for last modified date:

esp=# select lastmod from parts_order_line_file where prl_combined_key =
' 00136860' and prl_seq_no
in (20, 23);        lastmod
-------------------------2005-09-15 11:17:17.0622005-09-15 11:17:17.187
(2 rows)


There have been no schema changes since 9/15...

Merlin


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Merlin Moncure
Date:
On 11/14/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
> > I just confirmed that there are duplicate p-keys in the source table :(.
>
> Well, that's not very good either, but at least it narrows down the
> problem.
>
> Do the "duplicate" rows appear to be independent insertions, or
> successive states of the same logical row?  Looking at their xmin/xmax
> might help determine this, and if the table has OIDs then looking at the
> oid would be pretty conclusive.

no oid.
esp=# select xmin, xmax, lastmod from parts_order_line_file where
prl_combined_key =
esp-# ' 00136860' and prl_seq_no in (20, 23);  xmin    | xmax |         lastmod
-----------+------+-------------------------584527952 |    0 | 2005-09-15 11:17:17.062584527961 |    0 | 2005-09-15
11:17:17.187
(2 rows)

the nature of the file, unfortunately is that a record may be
rewritten several times over it's lifespan, due to ISAM style
resequencing on the table.   the file is taken directly from user
entry app so high speec race condition type behavior is unlikely, save
for a order duplication aspect I need to check into.

Being a ported COBOL app, tranasactions are one-record wonders, save
for places where parts have been rewritten in pl/pgsql (does not apply
here).

Merlin


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> On 11/14/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Do the "duplicate" rows appear to be independent insertions, or
>> successive states of the same logical row?

> esp=# select xmin, xmax, lastmod from parts_order_line_file where
> prl_combined_key =
> esp-# ' 00136860' and prl_seq_no in (20, 23);
>    xmin    | xmax |         lastmod
> -----------+------+-------------------------
>  584527952 |    0 | 2005-09-15 11:17:17.062
>  584527961 |    0 | 2005-09-15 11:17:17.187
> (2 rows)

I think you need to try this with enable_indexscan = 0; it should be
showing us 4 rows according to your prior result, and it's only showing
2, which suggests that the indexscan is short-circuiting because it
"knows" there can only be 1 result row.  Also, since you're probing for
more than one primary key value, please include the pkey columns in the
query so we can tell what's what...
        regards, tom lane


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Merlin Moncure
Date:
On 11/14/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
> > esp=# select xmin, xmax, lastmod from parts_order_line_file where
> > prl_combined_key =
> > esp-# ' 00136860' and prl_seq_no in (20, 23);
> >    xmin    | xmax |         lastmod
> > -----------+------+-------------------------
> >  584527952 |    0 | 2005-09-15 11:17:17.062
> >  584527961 |    0 | 2005-09-15 11:17:17.187
> > (2 rows)
>
> I think you need to try this with enable_indexscan = 0; it should be

right, I missed that!

esp=# select prl_combined_key, prl_seq_no, xmin, xmax, lastmod from
parts_order_line_file where prl_combined_key = ' 00136860' and
prl_seq_no in (20, 23);prl_combined_key | prl_seq_no |   xmin    | xmax |         lastmod
------------------+------------+-----------+------+------------------------- 00136860        |         20 | 584527952 |
  0 | 2005-09-15 11:17:17.062 00136860        |         20 | 584412245 |    0 | 2005-09-15 09:31:35.381 00136860
|        23 | 584527961 |    0 | 2005-09-15 11:17:17.187 00136860        |         23 | 584415243 |    0 | 2005-09-15
09:32:18.898

merlin


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Merlin Moncure
Date:
On 11/14/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think you need to try this with enable_indexscan = 0; it should be
> showing us 4 rows according to your prior result, and it's only showing

one thing I forgot to mention....there is a sequence on the table.
Sequence is global for all tables hooked via default on a
domain..highly unlikely records were inserted back to back.

esp=# select id, prl_combined_key, prl_seq_no, xmin, xmax, lastmod
from parts_order_line_file where
prl_combined_key = ' 00136860' and prl_seq_no in (20, 23);   id    | prl_combined_key | prl_seq_no |   xmin    | xmax |
       lastmod 
----------+------------------+------------+-----------+------+-------------------------15077227 |  00136860        |
    20 | 584412245 |    0 | 
2005-09-15 09:31:35.38115077260 |  00136860        |         23 | 584415243 |    0 |
2005-09-15 09:32:18.89815082475 |  00136860        |         20 | 584527952 |    0 |
2005-09-15 11:17:17.06215082522 |  00136860        |         23 | 584527961 |    0 |
2005-09-15 11:17:17.187
(4 rows)


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> esp=# select prl_combined_key, prl_seq_no, xmin, xmax, lastmod from
> parts_order_line_file where prl_combined_key = ' 00136860' and
> prl_seq_no in (20, 23);
>  prl_combined_key | prl_seq_no |   xmin    | xmax |         lastmod
> ------------------+------------+-----------+------+-------------------------
>   00136860        |         20 | 584527952 |    0 | 2005-09-15 11:17:17.062
>   00136860        |         20 | 584412245 |    0 | 2005-09-15 09:31:35.381
>   00136860        |         23 | 584527961 |    0 | 2005-09-15 11:17:17.187
>   00136860        |         23 | 584415243 |    0 | 2005-09-15 09:32:18.898

OK, so the fact that they all have xmax=0 proves that none are UPDATEd
versions of others, which leaves us with the presumption that there was
an outright failure of duplicate-key detection during INSERT :-(

The pairing seems very odd though: judging from the proximity of xmin
and lastmod, the first and third rows were inserted at almost the same
time, and they do *not* have equal keys; the rows they should have
conflicted with were inserted some time earlier.

Can you think of anything special about the client-side logic that might
have created unusual behavior in this situation?  Not that it's not
clearly a backend-side bug, I'm just looking for a handle to attack it
with.  (If you can determine the exact SQL commands that were issued to
create these records, it'd be great.)

One question that seems interesting is whether there might have been a
VACUUM working on the pkey index concurrently with the later insertions.
If you can either positively confirm or positively deny that, it'd be
useful information.
        regards, tom lane


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Rod Taylor
Date:
On Mon, 2005-11-14 at 14:51 -0500, Tom Lane wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
> > esp=# select prl_combined_key, prl_seq_no, xmin, xmax, lastmod from
> > parts_order_line_file where prl_combined_key = ' 00136860' and
> > prl_seq_no in (20, 23);
> >  prl_combined_key | prl_seq_no |   xmin    | xmax |         lastmod
> > ------------------+------------+-----------+------+-------------------------
> >   00136860        |         20 | 584527952 |    0 | 2005-09-15 11:17:17.062
> >   00136860        |         20 | 584412245 |    0 | 2005-09-15 09:31:35.381
> >   00136860        |         23 | 584527961 |    0 | 2005-09-15 11:17:17.187
> >   00136860        |         23 | 584415243 |    0 | 2005-09-15 09:32:18.898
> 
> OK, so the fact that they all have xmax=0 proves that none are UPDATEd
> versions of others, which leaves us with the presumption that there was
> an outright failure of duplicate-key detection during INSERT :-(

Is there any chance it is related to the 8.0 problem I reported on
Wednesday?


http://groups.google.ca/group/pgsql.hackers/browse_frm/thread/a832e9919fb53f92/83e8f0bd272b8597?lnk=st&q=%22possible+savepoint+bug%22&rnum=1&hl=en#83e8f0bd272b8597

-- 



Re: 8.0 -> 8.1 dump duplicate key problem?

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
> Is there any chance it is related to the 8.0 problem I reported on
> Wednesday?

Too soon to tell ... though one would like to think we don't have more
than one bug in that area ;-).

If either of you can come up with even a low-probability test case,
it'd be a great help.
        regards, tom lane


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Merlin Moncure
Date:
On 11/14/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The pairing seems very odd though: judging from the proximity of xmin
> and lastmod, the first and third rows were inserted at almost the same
> time, and they do *not* have equal keys; the rows they should have
> conflicted with were inserted some time earlier.

> Can you think of anything special about the client-side logic that might
> have created unusual behavior in this situation?  Not that it's not
> clearly a backend-side bug, I'm just looking for a handle to attack it
> with.  (If you can determine the exact SQL commands that were issued to
> create these records, it'd be great.)

> One question that seems interesting is whether there might have been a
> VACUUM working on the pkey index concurrently with the later insertions.
> If you can either positively confirm or positively deny that, it'd be
> useful information.

VACUUM is scheuled to run weekly, so we can rul that out.  Can rule
out exotic transaction behavor (savepoint) and long running
transactions generally, save for pg_dump.  In my app, all I/O save for
rare pgadmin edit is with PQExecParams/PQExecPrepared through libpq.
Inserts and updates are currently not ever prepared.  Following is an
example of the insert logged from the ISAM drver logging utility.
Transaction load on the table is moderate to heavy but is 10:1
read/write ratio at least.

This is of course running windows...reading Rod's mail it seems we
might have similar problem (running 8.0.2).  The update would be
virtually the same and matching on the p-key.

0.0181023   sec:  insert into parts_order_line_file (prl_combined_key,
prl_seq_no, prl_combined_key_2, prl_item_no, prl_comment_desc,
prl_location, prl_workstation, prl_stock_loc, prl_qty, prl_adj_price,
prl_cost, prl_weight, prl_uom, prl_vendor_no, prl_vendor_part_no,
prl_track_this_part, prl_warranty_period, prl_comments_1,
prl_comments_2, prl_qty_shipped, prl_qty_still_on_bo,
prl_qty_credited, prl_credit_reason, prl_credit_reason_type,
prl_cancel_ship, prl_exchange_part, prl_authorization_code,
prl_item_status, prl_item_status_alpha, prl_cancel_flag,
prl_charge_type_flag, prl_ct_taxable_flag, prl_account_cat_code,
prl_retail_price, prl_line_needs_serials, prl_chrg_type_ship_indx,
prl_claim_type_flag, prl_attached_wc_seq_no, prl_attached_claim_type,
prl_already_issued, prl_returned_part_flag, prl_prev_qty_shipped,
prl_prev_qty_still_on_bo, prl_prev_qty_credited) values ($1, $2, $3,
$4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18,
$19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32,
$33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44) params:
$1= 00001126 $2=001 $3= 00001126 $4=BAR-000006 $5=BAR $6=PL-1 $7= $8=
$9= 0001.000 $10= 000016.50 $11= 000015.00000 $12=00050.00 $13=EA
$14=000006 $15=BAR-000006 $16=N $17= $18= $19= $20={" 0001.00","
0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00","
0000.00"," 0000.00"} $21={" 0000.00"," 0000.00"," 0000.00","
0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00"}
$22={" 0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00","
0000.00"," 0000.00"," 0000.00"," 0000.00"}
$23={"","","","","","","","",""} $24={"","","","","","","","",""}
$25={"","","","","","","","",""} $26=N $27= $28=2 $29=A $30= $31= $32=
$33= $34= 000019.80 $35= $36=0 $37= $38=000 $39= $40= $41= $42={"
0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00","
0000.00"," 0000.00"," 0000.00"} $43={" 0000.00"," 0000.00","
0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00","
0000.00"} $44={" 0000.00"," 0000.00"," 0000.00"," 0000.00","
0000.00"," 0000.00"," 0000.00"," 0000.00"," 0000.00"}


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Merlin Moncure
Date:
> Rod Taylor <pg@rbt.ca> writes:
> > Is there any chance it is related to the 8.0 problem I reported on
> > Wednesday?
>
> Too soon to tell ... though one would like to think we don't have more
> than one bug in that area ;-).
>
> If either of you can come up with even a low-probability test case,
> it'd be a great help.
>
>             regards, tom lane

I spend several hours yesterday trying all different manner of things
and came up with zero.  Eventually I got to the point where I felt
like I was wasting my time.  ISTM there is likely some type of trigger
which will be difficult to catch via simulation.  Our production
servers are running dual opterons and my dev box isn't, so it's
impossible for me to to a pefect simiulation :(.

While I was initially extremely concerned about this, an internal
audit confirmed there was no actual data loss...the proper row
appeared to be visible in all cases, so I classify this as more of
correctible annoyance, fwiw.

Talked with Rod a bit off-list and the best course of action is to
implement a nightly process   to check for things and work from there.I'll keep you posted.

Merlin


Re: 8.0 -> 8.1 dump duplicate key problem?

From
Merlin Moncure
Date:
On 11/16/05, Rod Taylor <pg@rbt.ca> wrote:
> On Mon, 2005-11-14 at 14:51 -0500, Tom Lane wrote:
> > Merlin Moncure <mmoncure@gmail.com> writes:
> > > esp=# select prl_combined_key, prl_seq_no, xmin, xmax, lastmod from
> > > parts_order_line_file where prl_combined_key = ' 00136860' and
> > > prl_seq_no in (20, 23);
> > >  prl_combined_key | prl_seq_no |   xmin    | xmax |         lastmod
> > > ------------------+------------+-----------+------+-------------------------
> > >   00136860        |         20 | 584527952 |    0 | 2005-09-15 11:17:17.062
> > >   00136860        |         20 | 584412245 |    0 | 2005-09-15 09:31:35.381
> > >   00136860        |         23 | 584527961 |    0 | 2005-09-15 11:17:17.187
> > >   00136860        |         23 | 584415243 |    0 | 2005-09-15 09:32:18.898
> >
> > OK, so the fact that they all have xmax=0 proves that none are UPDATEd
> > versions of others, which leaves us with the presumption that there was
> > an outright failure of duplicate-key detection during INSERT :-(
>
> I realize this doesn't help much but I have found some recently which
> are from updates. The duplicates were on a table which rarely has an
> insert or delete (maybe one of each per day, but tens of thousands of
> updates).
>
> Vacuum every 30 minutes.

I can't prove it (yet) but looking at the phantom rows strongly
indicates the same:  xmin    | xmax |    id    |         lastmod         |
prl_combined_key | prl_seq_no
-----------+------+----------+-------------------------+------------------+-----------584412869 |    0 | 15077217 |
2005-09-1509:31:48.163 |  00136860      |         19584527952 |    0 | 15082475 | 2005-09-15 11:17:17.062 |  00136860
  |         20584412245 |    0 | 15077227 | 2005-09-15 09:31:35.381 |  00136860      |         20584527955 |    0 |
15082479| 2005-09-15 11:17:17.109 |  00136860      |         21 

The way my application works, rows are only ever inserted at the end
of the sequence number range.  If a row is deleted, the rows are
resequenced in place by updating the enitre record minus the p-key.

Seq# 20 and 21 have xmin and ID very close suggesting phantom was
generated during resequencing operation.  This is all updates of
course, except for the highest seq# which is deleted.  Can confirm
that as of yet but am putting auditing controls in which will catch it
next time.

Merlin