Thread: BUG #15000: Cache lookup failure

BUG #15000: Cache lookup failure

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15000
Logged by:          Thomas Kellerer
Email address:      postgres@kellerer.name
PostgreSQL version: 10.1
Operating system:   CentOS release 6.6
Description:

We have a script that copies data from one database to another using pg_dump
(plain text format) and the pipes the output to psql 

The script consistently fails with:

pg_dump: [archiver (db)] query failed: ERROR:  cache lookup failed for
relation 16693
pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname,
pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated
FROM pg_catalog.pg_constraint WHERE conr                           elid =
'16693'::pg_catalog.oid    AND contype = 'c' ORDER BY conname

Postgres version is: 
PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313
(Red Hat 4.4.7-18), 64-bit

Apparently I can't attach the script here, so it can be downloaded from: 

https://sql-workbench.net/pg/localized_texts_cache_lookup_error.sh




Re: BUG #15000: Cache lookup failure

From
Tom Lane
Date:
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> We have a script that copies data from one database to another using pg_dump
> (plain text format) and the pipes the output to psql 
> The script consistently fails with:

> pg_dump: [archiver (db)] query failed: ERROR:  cache lookup failed for
> relation 16693
> pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname,
> pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated
> FROM pg_catalog.pg_constraint WHERE conr                           elid =
> '16693'::pg_catalog.oid    AND contype = 'c' ORDER BY conname

Hmph.  Is it always the same OID in the message and the reported query?
If so, what table does that correspond to (try select '16693'::regclass)
and is there anything odd about that table's declaration or usage?

It's hard to explain this as a race condition because pg_dump should
have a lock on the table by the time it issues this query.  So I'm
not sure what's going on.

            regards, tom lane


Re: BUG #15000: Cache lookup failure

From
Thomas Kellerer
Date:
Tom Lane schrieb am 08.01.2018 um 16:12:
>> We have a script that copies data from one database to another using pg_dump
>> (plain text format) and the pipes the output to psql 
>> The script consistently fails with:
> 
>> pg_dump: [archiver (db)] query failed: ERROR:  cache lookup failed for
>> relation 16693
>> pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname,
>> pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated
>> FROM pg_catalog.pg_constraint WHERE conr                           elid =
>> '16693'::pg_catalog.oid    AND contype = 'c' ORDER BY conname
> 
> Hmph.  Is it always the same OID in the message and the reported query?

No, it's not always the same OID. 

I just re-ran the script and got

    ERROR:  cache lookup failed for relation 16816

> If so, what table does that correspond to (try select '16693'::regclass)

When I do that right after the error occurred then of course, nothing is shown:

    prod=# select '16816'::regclass;
     regclass
    ----------
     16816
    (1 row)

Same result on the source database. 

The OIDs for everything that the script creates (in the target database) are

  oid  |      relname
-------+-------------------
 16790 | shop_pkey
 16792 | language_seq
 16794 | languages
 16797 | languages_pkey
 16799 | l10n_seq
 16787 | shop
 16847 | l10n_pkey
 16849 | l10n_unique
 16851 | l10n_value_unique
 16832 | l10n
 16838 | l10n_value

(For the above run, with "cache lookup failed for relation 16816")


> and is there anything odd about that table's declaration or usage?

No, nothing special.

For completeness, here is the DDL that is run by the shell script:

    CREATE TABLE shop
    (
       shop_id                      bigint          NOT NULL 
    );
     
    ALTER TABLE shop
       ADD CONSTRAINT shop_pkey
       PRIMARY KEY (shop_id);

    INSERT INTO shop VALUES (1), (2), (3);

    DROP SEQUENCE IF EXISTS language_seq;
     
    CREATE SEQUENCE language_seq
           INCREMENT BY 1
           MINVALUE 1
           MAXVALUE 2147483646
           CACHE 1
           NO CYCLE;
     
    DROP TABLE IF EXISTS languages CASCADE;
     
    CREATE TABLE languages
    (
       language_id  char(2)        NOT NULL,
       image_url    varchar(300)   NOT NULL,
       description  varchar(100)
    );
     
    ALTER TABLE languages
       ADD CONSTRAINT languages_pkey
       PRIMARY KEY (language_id);

    DROP SEQUENCE IF EXISTS l10n_seq;
     
    CREATE SEQUENCE l10n_seq
           INCREMENT BY 1
           MINVALUE 1
           MAXVALUE 2147483646
           CACHE 1
           NO CYCLE;
     
    DROP TABLE IF EXISTS l10n CASCADE;
     
    CREATE TABLE l10n
    (
       l10n_id      bigint          NOT NULL,
       l10n_key     varchar(80)     NOT NULL,
       shop_id      bigint          NOT NULL,
       description  varchar(4000),
       image_url    varchar(256)
    );
     
    ALTER TABLE l10n
       ADD CONSTRAINT l10n_pkey
       PRIMARY KEY (l10n_id);
     
    ALTER TABLE l10n
      ADD CONSTRAINT l10n_shop_fk FOREIGN KEY (shop_id)
      REFERENCES shop (shop_id)
      ON UPDATE NO ACTION
      ON DELETE NO ACTION;
     
    ALTER TABLE l10n
       ADD CONSTRAINT l10n_unique UNIQUE (l10n_key, shop_id);
     
    DROP TABLE IF EXISTS l10n_value CASCADE;
     
    CREATE TABLE l10n_value
    (
       l10n_id      bigint          NOT NULL,
       language_id  varchar(5)      NOT NULL,
       l10n_value   varchar(4000)   NOT NULL,
       dirty        char(1)         DEFAULT '0'::bpchar,
       cms_user     varchar(20),
       modified_at  timestamp       DEFAULT now(),
       CONSTRAINT l10n_value_check_dirty CHECK ((dirty = '0'::bpchar) OR (dirty = '1'::bpchar))
    );
     
    ALTER TABLE l10n_value
      ADD CONSTRAINT l10n_value_l10n_fk FOREIGN KEY (l10n_id)
      REFERENCES l10n (l10n_id)
      ON UPDATE NO ACTION
      ON DELETE CASCADE;
     
    ALTER TABLE l10n_value
       ADD CONSTRAINT l10n_value_unique UNIQUE (l10n_id, language_id);



Re: BUG #15000: Cache lookup failure

From
Tom Lane
Date:
Thomas Kellerer <thomas@kellerer.eu> writes:
> Tom Lane schrieb am 08.01.2018 um 16:12:
>> Hmph.  Is it always the same OID in the message and the reported query?

> No, it's not always the same OID. 

OK.  The explanation then presumably is that something is dropping that
table concurrently with the pg_dump run.  It remains unclear to me why
pg_dump is failing to protect itself against such a drop, but before we
can figure that out, we need to be able to reproduce the situation.
Can you identify anything that would be dropping tables concurrently
with this script?

            regards, tom lane


Re: BUG #15000: Cache lookup failure

From
Thomas Kellerer
Date:
Tom Lane schrieb am 08.01.2018 um 16:35:
> OK.  The explanation then presumably is that something is dropping that
> table concurrently with the pg_dump run.  It remains unclear to me why
> pg_dump is failing to protect itself against such a drop, but before we
> can figure that out, we need to be able to reproduce the situation.
>
>
> Can you identify anything that would be dropping tables concurrently
> with this script?

I feel ashamed :( 

There was a left over pipe symbol in the script that we didn't see:

  pg_dump --table=l10n --table=l10n_value --clean ${db_stag} |
  psql --single-transaction --dbname=${db_prod} --file=${backup_stag}

So pg_dump pipes a script with DROP statement into psql, which then 
runs essentially the same script again which was created in a previous step (also with --clean) 

Seeing my (stupid) mistake, I am not sure if this is supposed to work or not...

Thomas


Re: BUG #15000: Cache lookup failure

From
Tom Lane
Date:
Thomas Kellerer <thomas@kellerer.eu> writes:
> There was a left over pipe symbol in the script that we didn't see:

>   pg_dump --table=l10n --table=l10n_value --clean ${db_stag} |
>   psql --single-transaction --dbname=${db_prod} --file=${backup_stag}

> So pg_dump pipes a script with DROP statement into psql, which then
> runs essentially the same script again which was created in a previous step (also with --clean)

Hm, but if psql is restoring into a different database than pg_dump
is dumping from, I don't see how they'd interact?

            regards, tom lane


Re: BUG #15000: Cache lookup failure

From
Thomas Kellerer
Date:
Tom Lane schrieb am 08.01.2018 um 17:04:
>> There was a left over pipe symbol in the script that we didn't see:
> 
>>    pg_dump --table=l10n --table=l10n_value --clean ${db_stag} |
>>    psql --single-transaction --dbname=${db_prod} --file=${backup_stag}
> 
>> So pg_dump pipes a script with DROP statement into psql, which then
>> runs essentially the same script again which was created in a previous step (also with --clean)
> 
> Hm, but if psql is restoring into a different database than pg_dump
> is dumping from, I don't see how they'd interact?

Yes, put psql is running the restore script (created with --clean) *twice*:
once because of the pipe and a second time because of the --file argument