Thread: BUG #14983: ERROR: duplicate key value violates unique constraint"oid_tbl_oid_key"

BUG #14983: ERROR: duplicate key value violates unique constraint"oid_tbl_oid_key"

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      14983
Logged by:          Tony Reix
Email address:      tony.reix@atos.net
PostgreSQL version: 10.1
Operating system:   AIX
Description:

This error randomly appears when PostgreSQL is compiled with -O2 .
With -O0, no error out of 100 tries.

This error appears with v9.6.6, v10.0, and v10.1 . With compiler XLC v13 or
v12 (last available versions). On AIX 6.1, 7.1, and 7.2 .

The test is part of the PostgreSQL regression tests: cd src/bin ; su pgstbf
; gmake -C pgbench check.

I'd like to know how to produce more detailed traces.
Since I port PostgreSQL, I do not know how to debug PostgreSQL. So, I'd like
to get help about how to find the root cause of this issue.

Errors appearing in logs:
1) src/bin/pgbench/tmp_check/log/001_pgbench_main.log :
   2017-12-18 16:07:39.268 CET [12976136] t/001_pgbench.pl LOG:  execute
P0_0: INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
   2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl ERROR:  duplicate
key value violates unique constraint "oid_tbl_oid_key"
   2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl DETAIL:  Key
(oid)=(100291) already exists.
   2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl STATEMENT:
INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
   2017-12-18 16:07:39.285 CET [12386468] t/001_pgbench.pl LOG:  execute
P0_0: INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);

2) src/bin/pgbench/tmp_check/log/regress_log_001_pgbench :

  Success. You can now start the database server using:
  
      pg_ctl -D
/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/tmp_check/data_main_2SQ6/pgdata
-l logfile start
  
  # Running:

/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/../../../src/test/regress/pg_regress
--config-auth
/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/tmp_check/data_main_2SQ6/pgdata
  ### Starting node "main"
  # Running: pg_ctl -D
/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/tmp_check/data_main_2SQ6/pgdata
-l

/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/tmp_check/log/001_pgbench_main.log
start
  waiting for server to start.... done
  server started
  # Postmaster PID for node "main" is 12845290
  # Running: pgbench --no-vacuum --client=5 --protocol=prepared
--transactions=25 --file

/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/tmp_check/data_main_2SQ6/pgbench_script
  ok 1 - concurrent OID generation: exit code 0
  not ok 2 - concurrent OID generation: no stderr
  
  #   Failed test 'concurrent OID generation: no stderr'
  #   at
/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/../../../src/test/perl/TestLib.pm
line 304.
  #          got: 'client 3 aborted in command 0 of script 0; ERROR:
duplicate key value violates unique constraint "oid_tbl_oid_key"
  # DETAIL:  Key (oid)=(100291) already exists.
  # 
  # '
  #     expected: 
  not ok 3 - concurrent OID generation: matches
  
  #   Failed test 'concurrent OID generation: matches'
  #   at
/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/../../../src/test/perl/TestLib.pm
line 305.
  #                   'transaction type:

/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/tmp_check/data_main_2SQ6/pgbench_script
  # scaling factor: 1
  # query mode: prepared
  # number of clients: 5
  # number of threads: 1
  # number of transactions per client: 25
  # number of transactions actually processed: 117/125
  # latency average = 37.404 ms
  # tps = 133.674197 (including connections establishing)
  # tps = 134.614762 (excluding connections establishing)
  # '
  #     doesn't match '(?^:processed: 125/125)'
  ### Stopping node "main" using mode immediate
  # Running: pg_ctl -D
/home2/freeware/src/packages/BUILD/postgresql-10.1-xlc13-AvecO2/32bit/src/bin/pgbench/tmp_check/data_main_2SQ6/pgdata
-m immediate stop
  waiting for server to shut down.... done
  server stopped
  # No postmaster PID for node "main"
  # Looks like you failed 2 tests of 3.



On Mon, Dec 18, 2017 at 03:36:18PM +0000, PG Bug reporting form wrote:
> Errors appearing in logs:
> 1) src/bin/pgbench/tmp_check/log/001_pgbench_main.log :
>    2017-12-18 16:07:39.268 CET [12976136] t/001_pgbench.pl LOG:  execute
> P0_0: INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
>    2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl ERROR:  duplicate
> key value violates unique constraint "oid_tbl_oid_key"
>    2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl DETAIL:  Key
> (oid)=(100291) already exists.
>    2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl STATEMENT:
> INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
>    2017-12-18 16:07:39.285 CET [12386468] t/001_pgbench.pl LOG:  execute
> P0_0: INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);

This comes from 001_pgbench.pl, which aims at testing LWLock and
spinlock concurrency. This visibly points to something going wrong with
what happens on AIX for XLC 13. The buildfarm has a couple of animals
owned by Noah though those are using XLC 12. My gut feeling is that if
you stress enough a Postgres instance in one run you should get a high
reproducibility rate. Could you test manually something close to what
the TAP test does?

1) Create a PG instance and initialize it with this schema:
CREATE UNLOGGED TABLE oid_tbl () WITH OIDS;
ALTER TABLE oid_tbl ADD UNIQUE (oid);
2) Create a pgbench script with this query:
INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
3) Run pgbench with --protocol=prepared, but with a high number of
clients in priority, a longer time, or a higher number of transactions.

If you are looking for better support on this platform with this
compiler combination, I would highly recommend setting up a buildfarm
animal. This would allow for faster bug fixes and detection of
problems.
--
Michael

Attachment
On Tue, Jan 09, 2018 at 02:37:35PM +0900, Michael Paquier wrote:
> On Mon, Dec 18, 2017 at 03:36:18PM +0000, PG Bug reporting form wrote:
> > Errors appearing in logs:
> > 1) src/bin/pgbench/tmp_check/log/001_pgbench_main.log :
> >    2017-12-18 16:07:39.268 CET [12976136] t/001_pgbench.pl LOG:  execute
> > P0_0: INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
> >    2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl ERROR:  duplicate
> > key value violates unique constraint "oid_tbl_oid_key"
> >    2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl DETAIL:  Key
> > (oid)=(100291) already exists.
> >    2017-12-18 16:07:39.268 CET [13041676] t/001_pgbench.pl STATEMENT: 
> > INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
> >    2017-12-18 16:07:39.285 CET [12386468] t/001_pgbench.pl LOG:  execute
> > P0_0: INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);
> 
> This comes from 001_pgbench.pl, which aims at testing LWLock and
> spinlock concurrency. This visibly points to something going wrong with
> what happens on AIX for XLC 13.

Known defect:
https://postgr.es/m/flat/a72cfcb0-37d0-de2f-b3ec-f38ad8d6a8cc@postgrespro.ru


On Mon, Jan 08, 2018 at 10:21:15PM -0800, Noah Misch wrote:
> On Tue, Jan 09, 2018 at 02:37:35PM +0900, Michael Paquier wrote:
> > This comes from 001_pgbench.pl, which aims at testing LWLock and
> > spinlock concurrency. This visibly points to something going wrong with
> > what happens on AIX for XLC 13.
>
> Known defect:
> https://postgr.es/m/flat/a72cfcb0-37d0-de2f-b3ec-f38ad8d6a8cc@postgrespro.ru

Thanks for the pointer, I forgot this one. We want to finish wrapping
the patches proposed there then.
--
Michael

Attachment
Hi Michael,

Sorry for answering so late. I missed your answer.
We are still trying to figure out what is the root cause of this issue with: AIX XLC -O2 32bit.
I see no issue with AIX XLC -O0 32bit or AIX XLC -O2 64bit or AIX GCC -O2 32bit and Linux/* 64bit. However, I have no
Linux32bit machine available. 

Recently, we have discovered that, in file:  src/include/port/atomics/arch-ppc.h , the way the pg_memory_barrier_impl
isimplemented may be differently implemented on AIX when using GCC or XLC. We have to experiment with a change: #if
defined(__GNUC__)|| defined(__xlc__) . 

We'll experiment with your suggestions.


About the buildfarm, I faced a blocking issue.


Sena will help me about PostgreSQL on AIX.


However, reading PostgreSQL documentation, it seems possible that this behavior appears ... by design.
Read lines begining by:  >>>

https://www.postgresql.org/docs/10/static/sql-createtable.html
  CREATE TABLE  WITH ( storage_parameter [= value] [, ... ] )
  This clause specifies optional storage parameters for a table or index;
         see Storage Parameters for more information.
  - The WITH clause for a table can also include OIDS=TRUE (or just OIDS) to specify that rows of the
        new table should have OIDs (object identifiers) assigned to them, or OIDS=FALSE to specify that
        the rows should not have OIDs.
  - If OIDS=FALSE is specified or implied, the new table does not store OIDs and no OID will be assigned
        for a row inserted into it.
    This is generally considered worthwhile, since it will reduce OID consumption and thereby postpone
>>>        the wraparound of the 32-bit OID counter.
>>>    Once the counter wraps around, OIDs can no longer be assumed to be unique,
     which makes them considerably less useful.



Cordialement,

Tony Reix

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net