Re: Cache lookup failure for index during pg_dump - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Cache lookup failure for index during pg_dump
Msg-id 14781.1266626391@sss.pgh.pa.us
In response to Cache lookup failure for index during pg_dump  (Bob Lunney <bob_lunney@yahoo.com>)
List pgsql-bugs
Bob Lunney <bob_lunney@yahoo.com> writes:
> I have a job that summarizes some data from a table, truncates the table,
> drops the indexes, writes the summarized data back into the table, then
> recreates the indexes.  The operations above are not in a single
> transaction, but separate statements executed by a script.  Easy, runs
> great, has for years.
> Recently the job takes a little longer to run and is still going when the
> database dump starts.  That's when I started getting this:

> ERROR:  cache lookup failed for index 70424

> My questions are: can making DDL changes during a dump cause this error?
> Are the queries used by pg_dump transactionally consistent, i.e. do they
> run in a transaction and get a single view of the database system
> catalogs?  Other than finer coordination of jobs, how can this situation
> be avoided?
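
For concreteness, I'd guess the job boils down to a sequence of separate
statements along these lines (all table, index, and column names here are
made up):

    TRUNCATE summary_table;
    DROP INDEX summary_idx;
    INSERT INTO summary_table
        SELECT region, sum(amount) FROM detail_table GROUP BY region;
    CREATE INDEX summary_idx ON summary_table (region);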

It's a bit messy.  pg_dump runs in a serializable transaction, so it
sees a consistent snapshot of the database including system catalogs.
However, it relies in part on various specialized backend functions like
pg_get_indexdef(), and those things tend to run on SnapshotNow time, i.e.
they look at the currently committed state.  So it is possible to get
this type of error if someone performs DDL changes while a dump is
happening: pg_dump sees index 70424 still listed in the catalogs,
so it asks about it, and the backend says "there is no such index",
which there isn't anymore because somebody dropped it since pg_dump's
transaction started.
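
You can watch it happen with two psql sessions (the index name is made up,
and the OID will of course differ):

    -- Session 1, standing in for pg_dump:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT indexrelid FROM pg_index;    -- snapshot still lists OID 70424

    -- Session 2, meanwhile (session 1 holds no lock yet):
    DROP INDEX summary_idx;             -- commits immediately

    -- Session 1 again:
    SELECT pg_get_indexdef(70424);
    ERROR:  cache lookup failed for index 70424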

The window for this sort of thing isn't very large, because the first
thing pg_dump does is acquire AccessShareLock on every table it intends
to dump, and past that point it won't be possible for anyone to modify
the table's DDL.  But it can happen.
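
The locking behavior is easy to demonstrate (names made up again):

    -- Session 1, doing what pg_dump does at startup:
    BEGIN;
    LOCK TABLE summary_table IN ACCESS SHARE MODE;

    -- Session 2: DROP INDEX wants ACCESS EXCLUSIVE lock on the table,
    -- which conflicts with ACCESS SHARE, so it just sits and waits
    -- until session 1 commits or rolls back:
    DROP INDEX summary_idx;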

The right fix for this is to make all those inquiry functions use the
calling query's snapshot; but duplicating a lot of backend
infrastructure is going to be a major pain in the rear, so the
discussion has kind of petered out every time it's come up in the past.

In practice, because pg_dump does lock out DDL changes for the bulk of
its run, it's not a great idea to be scheduling DDL-changing jobs during
your dumps anyhow.  Most of the time they'll just get blocked till the
dump finishes, and if they touch more than one table it's not at all
unlikely for them to end up deadlocked against pg_dump's locks.  A fix
for the snapshot-timing problem wouldn't do a thing for that problem.
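
Two tables are all it takes to construct the deadlock (a sketch; names
made up):

    -- Session 1 (pg_dump) takes its locks one table at a time:
    BEGIN;
    LOCK TABLE a IN ACCESS SHARE MODE;     -- has a, will want b shortly

    -- Session 2 (the maintenance job), meanwhile:
    BEGIN;
    TRUNCATE b;                            -- has exclusive lock on b
    TRUNCATE a;                            -- blocks behind session 1

    -- Session 1:
    LOCK TABLE b IN ACCESS SHARE MODE;     -- blocks behind session 2 ...
    ERROR:  deadlock detected              -- ... and one of them loses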

So in short, the path of least resistance is to reschedule your dumps.
Or reconsider whether you really need to drop and recreate those indexes
--- could you use REINDEX instead?
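
REINDEX rebuilds an index without ever removing its catalog entry, so there
is no window in which pg_dump can see it half-gone; e.g. (name made up):

    REINDEX TABLE summary_table;   -- rebuilds every index on the table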

            regards, tom lane
