Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} |
Date | |
Msg-id | CAM3SWZRdNXyo_wpdQJacjpBPTpN7ONJtS3nniHMAFsGoudU2YA@mail.gmail.com Whole thread Raw |
In response to | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-hackers |
On Mon, Dec 29, 2014 at 11:52 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > Correct, I haven't seen any problems with approach #1 That helps with debugging #2, then. That's very helpful. > Generally the problem will occur early on in the process, and if not then it > will not occur at all. I think that is because the table starts out empty, > and so a lot of insertions collide with each other. Once the table is more > thoroughly populated, most query takes the CONFLICT branch and therefore two > insertion-branches are unlikely to collide. > > At its simplest, I just use the count_upsert.pl script and your patch and > forget all the rest of the stuff from my test platform. I can reproduce this on my laptop now. I think that building at -O2 and without assertions helps. I'm starting to work through debugging it. I threw together a quick script for getting pg_xlogdump into a Postgres table (a nice use of the new pg_lsn type). It's here: https://github.com/petergeoghegan/jjanes_upsert/blob/master/pg_xlogdump2csv.py It tells a story. Looking at the last segment before shutdown when the problem occurred, I see: postgres=# select count(*), tx from my_xlogdump group by tx having count(*) > 4 order by 1;count | tx -------+--------- 5 | 1917836 5 | 1902576 5 | 1909746 5 | 1901586 5 | 1916971 6 | 1870077 39 | 1918004 119 | 1918003 2246 | 0 (9 rows) postgres=# select max(tx::text::int4) from my_xlogdump ; max ---------1918004 (1 row) So the last two transactions (1918003 and 1918004) get into some kind of live-lock situation, it looks like. Or at least something that causes them to produce significant more WAL records than other xacts due to some kind of persistent problem with conflicts. Here is where the earlier of the two problematic transactions has its first record: postgres=# select * from my_xlogdump where tx = '1918003' order by r_lsn asc limit 1;rmgr | len_rec | len_tot | tx | r_lsn | prev_lsn | descr ------+---------+---------+---------+------------+------------+----------------------------------------------------Heap | 3 | 203 | 1918003 | 0/1783BB70 | 0/1783BB48 | INSERT off 33 blkref #0: rel 1663/16471/12502 blk (1 row) After and including that record, until the trouble spot up to and including shutdown, here is the rmgr breakdown: postgres=# select count(*), rmgr from my_xlogdump where r_lsn >= '0/1783BB70' group by rmgr order by 1;count | rmgr -------+------------- 1 | XLOG -- 1 CHECKPOINT_SHUTDOWN record 2 | Transaction -- commit records for the twoxacts 20 | Heap2 -- all are CLEAN remxid records, tx is 0 76 | Heap -- All from our two xacts... 80 | Btree -- All from XID 1918003 only (5 rows) So looks like a bad interaction with VACUUM. Maybe it's a problem with VACUUM interlocking. That was my first suspicion, FWIW. I'll need to do more investigating, but I can provide a custom format dump of the table, in case anyone wants to look at what I have here in detail. I've uploaded it to: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/files/my_xlogdump.custom.dump.table -- Peter Geoghegan
pgsql-hackers by date: