Thread: BUG #15000: Cache lookup failure
The following bug has been logged on the website: Bug reference: 15000 Logged by: Thomas Kellerer Email address: postgres@kellerer.name PostgreSQL version: 10.1 Operating system: CentOS release 6.6 Description: We have a script that copies data from one database to another using pg_dump (plain text format) and the pipes the output to psql The script consistently fails with: pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for relation 16693 pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname, pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated FROM pg_catalog.pg_constraint WHERE conr elid = '16693'::pg_catalog.oid AND contype = 'c' ORDER BY conname Postgres version is: PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit Apparently I can't attach the script here, so it can be downloaded from: https://sql-workbench.net/pg/localized_texts_cache_lookup_error.sh
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes: > We have a script that copies data from one database to another using pg_dump > (plain text format) and the pipes the output to psql > The script consistently fails with: > pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for > relation 16693 > pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname, > pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated > FROM pg_catalog.pg_constraint WHERE conr elid = > '16693'::pg_catalog.oid AND contype = 'c' ORDER BY conname Hmph. Is it always the same OID in the message and the reported query? If so, what table does that correspond to (try select '16693'::regclass) and is there anything odd about that table's declaration or usage? It's hard to explain this as a race condition because pg_dump should have a lock on the table by the time it issues this query. So I'm not sure what's going on. regards, tom lane
Tom Lane schrieb am 08.01.2018 um 16:12: >> We have a script that copies data from one database to another using pg_dump >> (plain text format) and the pipes the output to psql >> The script consistently fails with: > >> pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for >> relation 16693 >> pg_dump: [archiver (db)] query was: SELECT tableoid, oid, conname, >> pg_catalog.pg_get_constraintdef(oid) AS consrc, conislocal, convalidated >> FROM pg_catalog.pg_constraint WHERE conr elid = >> '16693'::pg_catalog.oid AND contype = 'c' ORDER BY conname > > Hmph. Is it always the same OID in the message and the reported query? No, it's not always the same OID. I just re-ran the script and got ERROR: cache lookup failed for relation 16816 > If so, what table does that correspond to (try select '16693'::regclass) When I do that right after the error occurred then of course, nothing is shown: prod=# select '16816'::regclass; regclass ---------- 16816 (1 row) Same result on the source database. The OIDs for everything that the script creates (in the target database) are oid | relname -------+------------------- 16790 | shop_pkey 16792 | language_seq 16794 | languages 16797 | languages_pkey 16799 | l10n_seq 16787 | shop 16847 | l10n_pkey 16849 | l10n_unique 16851 | l10n_value_unique 16832 | l10n 16838 | l10n_value (For the above run, with "cache lookup failed for relation 16816") > and is there anything odd about that table's declaration or usage? No, nothing special. For completeness, here is the DDL that is run by the shell script: CREATE TABLE shop ( shop_id bigint NOT NULL ); ALTER TABLE shop ADD CONSTRAINT shop_pkey PRIMARY KEY (shop_id); INSERT INTO shop VALUES (1), (2), (3); DROP SEQUENCE IF EXISTS language_seq; CREATE SEQUENCE language_seq INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483646 CACHE 1 NO CYCLE; DROP TABLE IF EXISTS languages CASCADE; CREATE TABLE languages ( language_id char(2) NOT NULL, image_url varchar(300) NOT NULL, description varchar(100) ); ALTER TABLE languages ADD CONSTRAINT languages_pkey PRIMARY KEY (language_id); DROP SEQUENCE IF EXISTS l10n_seq; CREATE SEQUENCE l10n_seq INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483646 CACHE 1 NO CYCLE; DROP TABLE IF EXISTS l10n CASCADE; CREATE TABLE l10n ( l10n_id bigint NOT NULL, l10n_key varchar(80) NOT NULL, shop_id bigint NOT NULL, description varchar(4000), image_url varchar(256) ); ALTER TABLE l10n ADD CONSTRAINT l10n_pkey PRIMARY KEY (l10n_id); ALTER TABLE l10n ADD CONSTRAINT l10n_shop_fk FOREIGN KEY (shop_id) REFERENCES shop (shop_id) ON UPDATE NO ACTION ON DELETE NO ACTION; ALTER TABLE l10n ADD CONSTRAINT l10n_unique UNIQUE (l10n_key, shop_id); DROP TABLE IF EXISTS l10n_value CASCADE; CREATE TABLE l10n_value ( l10n_id bigint NOT NULL, language_id varchar(5) NOT NULL, l10n_value varchar(4000) NOT NULL, dirty char(1) DEFAULT '0'::bpchar, cms_user varchar(20), modified_at timestamp DEFAULT now(), CONSTRAINT l10n_value_check_dirty CHECK ((dirty = '0'::bpchar) OR (dirty = '1'::bpchar)) ); ALTER TABLE l10n_value ADD CONSTRAINT l10n_value_l10n_fk FOREIGN KEY (l10n_id) REFERENCES l10n (l10n_id) ON UPDATE NO ACTION ON DELETE CASCADE; ALTER TABLE l10n_value ADD CONSTRAINT l10n_value_unique UNIQUE (l10n_id, language_id);
Thomas Kellerer <thomas@kellerer.eu> writes: > Tom Lane schrieb am 08.01.2018 um 16:12: >> Hmph. Is it always the same OID in the message and the reported query? > No, it's not always the same OID. OK. The explanation then presumably is that something is dropping that table concurrently with the pg_dump run. It remains unclear to me why pg_dump is failing to protect itself against such a drop, but before we can figure that out, we need to be able to reproduce the situation. Can you identify anything that would be dropping tables concurrently with this script? regards, tom lane
Tom Lane schrieb am 08.01.2018 um 16:35: > OK. The explanation then presumably is that something is dropping that > table concurrently with the pg_dump run. It remains unclear to me why > pg_dump is failing to protect itself against such a drop, but before we > can figure that out, we need to be able to reproduce the situation. > > > Can you identify anything that would be dropping tables concurrently > with this script? I feel ashamed :( There was a left over pipe symbol in the script that we didn't see: pg_dump --table=l10n --table=l10n_value --clean ${db_stag} | psql --single-transaction --dbname=${db_prod} --file=${backup_stag} So pg_dump pipes a script with DROP statement into psql, which then runs essentially the same script again which was created in a previous step (also with --clean) Seeing my (stupid) mistake, I am not sure if this is supposed to work or not... Thomas
Thomas Kellerer <thomas@kellerer.eu> writes: > There was a left over pipe symbol in the script that we didn't see: > pg_dump --table=l10n --table=l10n_value --clean ${db_stag} | > psql --single-transaction --dbname=${db_prod} --file=${backup_stag} > So pg_dump pipes a script with DROP statement into psql, which then > runs essentially the same script again which was created in a previous step (also with --clean) Hm, but if psql is restoring into a different database than pg_dump is dumping from, I don't see how they'd interact? regards, tom lane
Tom Lane schrieb am 08.01.2018 um 17:04: >> There was a left over pipe symbol in the script that we didn't see: > >> pg_dump --table=l10n --table=l10n_value --clean ${db_stag} | >> psql --single-transaction --dbname=${db_prod} --file=${backup_stag} > >> So pg_dump pipes a script with DROP statement into psql, which then >> runs essentially the same script again which was created in a previous step (also with --clean) > > Hm, but if psql is restoring into a different database than pg_dump > is dumping from, I don't see how they'd interact? Yes, put psql is running the restore script (created with --clean) *twice*: once because of the pipe and a second time because of the --file argument