Odd pg dump error: cache lookup failure

From: Wells Oliver
My pg_dump scripts have been bombing lately, and this is the error I see a bit into the backup process:

pg_dump: error: query failed: ERROR:  cache lookup failed for attribute 1 of relation 1152770777

pg_dump: error: query was: SELECT t.tableoid, t.oid, t.relname AS indexname, inh.inhparent AS parentidx, pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef, i.indnkeyatts AS indnkeyatts, i.indnatts AS indnatts, i.indkey, i.indisclustered, i.indisreplident, c.contype, c.conname, c.condeferrable, c.condeferred, c.tableoid AS contableoid, c.oid AS conoid, pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, (SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, t.reloptions AS indreloptions, (SELECT pg_catalog.array_agg(attnum ORDER BY attnum)   FROM pg_catalog.pg_attribute   WHERE attrelid = i.indexrelid AND     attstattarget >= 0) AS indstatcols,(SELECT pg_catalog.array_agg(attstattarget ORDER BY attnum)   FROM pg_catalog.pg_attribute   WHERE attrelid = i.indexrelid AND     attstattarget >= 0) AS indstatvals FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) JOIN pg_catalog.pg_class t2 ON (t2.oid = i.indrelid) LEFT JOIN pg_catalog.pg_constraint c ON (i.indrelid = c.conrelid AND i.indexrelid = c.conindid AND c.contype IN ('p','u','x')) LEFT JOIN pg_catalog.pg_inherits inh ON (inh.inhrelid = indexrelid) WHERE i.indrelid = '1152770777'::pg_catalog.oid AND (i.indisvalid OR t2.relkind = 'p') AND i.indisready ORDER BY indexname

Any ideas on what this might mean, or where to look?


Re: Odd pg dump error: cache lookup failure

From: Alvaro Herrera
On 2020-Aug-24, Wells Oliver wrote:

> My pg_dump scripts have been bombing lately, and this is the error I see a
> bit into the backup process:
> 
> pg_dump: error: query failed: ERROR:  cache lookup failed for attribute 1
> of relation 1152770777

What pg version is this, and can you run the query directly using psql
and have it throw the same error?  What do
select * from pg_class where oid = '1152770777'
and
select * from pg_attribute where attrelid = '1152770777'
show?  If the latter fails to show an entry for attnum=1, does it
magically appear if you do
SET enable_indexscan=off;
SET enable_bitmapscan=off;
prior to running the query?
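
(Putting those together, a quick psql session against the OID from the
error message might look like this:)

SELECT * FROM pg_class WHERE oid = '1152770777';
SELECT * FROM pg_attribute WHERE attrelid = '1152770777' ORDER BY attnum;
-- if the attnum = 1 row is missing, see whether it reappears with:
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT * FROM pg_attribute WHERE attrelid = '1152770777' ORDER BY attnum;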

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Odd pg dump error: cache lookup failure

From: Wells Oliver
12.4. Both of those queries return no rows, with and without those options set.

We do do some nightly processing of data during the same window, in which views are re-materialized: would that cause this issue? No non-temporary tables are dropped or re-created, though.


Re: Odd pg dump error: cache lookup failure

From: Alvaro Herrera
On 2020-Aug-24, Wells Oliver wrote:

> 12.4. Both of those queries return no rows, with and without those options
> set.
> 
> We do do some nightly processing of data during the same window, in which
> views are re-materialized: would that cause this issue? No non-temporary
> tables are dropped or re-created, though.

Oh, so this could be some kind of relation that exists when pg_dump
starts but is dropped before it reaches the point where it dumps the
data for that relation.

Hmm.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Odd pg dump error: cache lookup failure

From: Stephen Frost
Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> On 2020-Aug-24, Wells Oliver wrote:
> > 12.4. Both of those queries return no rows, with and without those options
> > set.
> >
> > We do do some nightly processing of data during the same window, in which
> > views are re-materialized: would that cause this issue? No non-temporary
> > tables are dropped or re-created, though.
>
> Oh, so this could be some kind of relation that exists when pg_dump
> starts but is dropped before it reaches the point where it dumps the
> data for that relation.
>
> Hmm.

That'd be cute since pg_dump should be locking all the relations that
it's going to export the data from...
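
(A quick sketch for watching those locks from another session while a
dump is running, relying on the default application_name that pg_dump
sets; adjust if your scripts override it:)

SELECT l.relation::regclass AS relation, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE a.application_name = 'pg_dump' AND l.locktype = 'relation';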

Thanks,

Stephen


Re: Odd pg dump error: cache lookup failure

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> Oh, so this could be some kind of relation that exists when pg_dump
>> starts but is dropped before it reaches the point where it dumps the
>> data for that relation.

> That'd be cute since pg_dump should be locking all the relations that
> it's going to export the data from...

The failure is evidently in getIndexes's query, and that will try to
acquire information about every table not satisfying

        if (!tbinfo->hasindex)
            continue;
        if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION) &&
            !tbinfo->interesting)
            continue;

However, getTables previously acquired locks on only the tables
satisfying

        if (tblinfo[i].dobj.dump &&
            (tblinfo[i].relkind == RELKIND_RELATION ||
             tblinfo->relkind == RELKIND_PARTITIONED_TABLE) &&
            (tblinfo[i].dobj.dump & DUMP_COMPONENTS_REQUIRING_LOCK))

(BTW, Stephen, the first line of that is clearly redundant now...)

AFAICS the tbinfo->interesting test is equivalent to dobj.dump being
nonzero.  Also, while the relkind restriction looks problematic,
I don't think hasindex could be true for any rels of other relkinds.
So the explanation has to lie in the fact that the former is an OR
condition, ie probe table if "DUMP_COMPONENT_DEFINITION" *or*
"interesting", while the latter is an AND condition, ie lock table
if "dump" *and* "DUMP_COMPONENTS_REQUIRING_LOCK".

This'd be irrelevant if pg_dump is just dumping everything,
which leads me to ask what are pg_dump's command line switches,
and is it possible that the switches are excluding partitioning
or inheritance parents of tables that are due to be dumped?

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Wells Oliver
Interesting: the only parameters I pass are a list of excluded schemas via -N arguments, as well as --format=c.

I don't think anything in the excluded schemas is being referenced by something that is set to be dumped: is that what you're thinking? It's possible, and I'll dig a bit, but things are generally walled off pretty well.


Re: Odd pg dump error: cache lookup failure

From: Tom Lane
Wells Oliver <wells.oliver@gmail.com> writes:
> Interesting: the only parameters I pass are a list of excluded schemas via
> -N arguments, as well as --format=c.

> I don't think anything in the excluded schemas is being referenced by
> something that is set to be dumped: is that what you're thinking? It's
> possible, and I'll dig a bit, but things are generally walled off pretty
> well.

I overlooked something in my previous quick code scan, which is that
flagInhTables() will set the "interesting" flag on parents of tables
that are due to be dumped, even if the parents are not.  This *will*
result in running getIndexes on tables that we have no lock on.

However, it's not quite clear how that leads to the observed failure,
because if we have a lock on some child partition, it shouldn't really
be possible for someone to drop an index on the partition root,
should it?  In any case you didn't admit to doing such things.
This theory would require both that a table-to-be-dumped is a
partition of some partitioned table that's in an excluded schema,
and that you're concurrently doing DDL on that partitioned table.
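
(If it helps, something like this would list partitions whose parents
live in one of the excluded schemas; the schema names here are just
placeholders:)

SELECT i.inhrelid::regclass AS child, i.inhparent::regclass AS parent
FROM pg_inherits i
JOIN pg_class p ON p.oid = i.inhparent
JOIN pg_namespace pn ON pn.oid = p.relnamespace
WHERE pn.nspname IN ('excluded_schema_a', 'excluded_schema_b');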

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Wells Oliver
And refreshing materialized views during the dump process wouldn't cause this?


Re: Odd pg dump error: cache lookup failure

From: Tom Lane
Wells Oliver <wells.oliver@gmail.com> writes:
> And refreshing materialized views during the dump process wouldn't cause
> this?

matviews can't be part of inheritance trees AFAIR, so we'd need some
other theory to explain it that way.

Did you try tracing down the table OID mentioned in the error to see
if it still exists, and if so what is it?

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Wells Oliver
It doesn't exist any longer, which led me to think of things that might be dropped during the dump process.


Re: Odd pg dump error: cache lookup failure

From: Tom Lane
I wrote:
> Wells Oliver <wells.oliver@gmail.com> writes:
>> And refreshing materialized views during the dump process wouldn't cause
>> this?

> matviews can't be part of inheritance trees AFAIR, so we'd need some
> other theory to explain it that way.

Oh wait a second.  Matviews can have indexes, and getTables doesn't
lock them (because we don't allow LOCK TABLE on views).  So it's fairly
clear that you could get here with no lock if 1152770777 is a matview.
However, it's still not entirely clear how a refresh could trigger
the observed error.  We certainly aren't changing the matview's OID
when we do that.  Do we rewrite the pg_attribute entries anyway?
And even if we do, why would there be a problem?
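
(To see which relations are candidates here, i.e. matviews carrying
indexes, a query along these lines would do:)

SELECT c.oid, c.oid::regclass AS matview, i.indexrelid::regclass AS idx
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.oid
WHERE c.relkind = 'm';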

Are your refreshes using CONCURRENTLY?

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Wells Oliver
Each of the refreshes would definitely use CONCURRENTLY.



Re: Odd pg dump error: cache lookup failure

From: Tom Lane
Wells Oliver <wells.oliver@gmail.com> writes:
> It doesn't exist any longer, which led me to think of things that
> might be dropped during the dump process.

Hm, if you're actually *dropping* matviews during the dump then it's
not so hard to explain this error.  They'd have to be ones that were
selected to be dumped though.

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Tom Lane
I wrote:
> Wells Oliver <wells.oliver@gmail.com> writes:
>> It doesn't exist any longer, which led me to think of things that
>> might be dropped during the dump process.

> Hm, if you're actually *dropping* matviews during the dump then it's
> not so hard to explain this error.  They'd have to be ones that were
> selected to be dumped though.

I experimented a bit to try to reproduce this problem.  I cannot get
any sort of error from REFRESH (with or without CONCURRENTLY) in
parallel with a pg_dump.  If I drop a view or matview, I can easily
get an error, but I've not managed to reproduce one that looks like
yours; it tends to be more like

pg_dump: error: query failed: ERROR:  could not open relation with OID 45698

What I found that *would* reproduce "cache lookup failed for attribute"
from pg_get_indexdef() is to explicitly drop a matview's index just
before pg_dump gets to it.  So I wonder if you are doing that in your
"refresh" procedure.  The timing is not terribly tight; the drop has to
happen between where pg_dump acquires its transaction snapshot and where
it tries to investigate the matview's indexes, which could be some while
in a database with many objects.  Also, if the transaction doing the index
drop also takes out any exclusive locks on regular tables, that could make
it much easier to send pg_dump down this rabbit hole, since it'd block
on those locks till the damage was done.
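
(For the archives, a two-session sketch of that window, using made-up
object names:)

-- session 1: take a lock that will stall pg_dump after it has its snapshot
BEGIN;
LOCK TABLE some_plain_table IN ACCESS EXCLUSIVE MODE;
-- session 2: start the dump; it blocks trying to lock some_plain_table
--   (e.g. pg_dump --format=c mydb)
-- session 1: drop the matview's index while pg_dump is stuck, then release
DROP INDEX some_matview_idx;
COMMIT;
-- pg_dump resumes and fails in getIndexes with
-- "cache lookup failed for attribute ..."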

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Wells Oliver
Thanks, Tom, this is intriguing. I've changed our backups to run pg_dump with --verbose, and if I see this issue again I'll dig a bit with the additional information.


Re: Odd pg dump error: cache lookup failure

From: Alvaro Herrera
On 2020-Aug-25, Tom Lane wrote:

> Oh wait a second.  Matviews can have indexes, and getTables doesn't
> lock them (because we don't allow LOCK TABLE on views).

Here's a quick patchset that makes pg_dump do LOCK TABLE on all
relations it dumps; patch 0001 allows server-side LOCK TABLE to run on
any relkind, and then 0002 makes pg_dump test for that capability at
connection start, and if it exists, then it's used for all relations to
dump.


Re: Odd pg dump error: cache lookup failure

From: Tom Lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Here's a quick patchset that makes pg_dump do LOCK TABLE on all
> relations it dumps; patch 0001 allows server-side LOCK TABLE to run on
> any relkind, and then 0002 makes pg_dump test for that capability at
> connection start, and if it exists, then it's used for all relations to
> dump.

I think the "test for that capability" bit needs more subtlety.

In the first place, demanding exclusive lock on a system view has a
pretty high probability of failing due to someone else accessing the
view at that moment, and/or causing failures in other transactions.
Since you only need to find out if the command can work at all, you
could make this a lot less fragile and unfriendly by just asking for
ACCESS SHARE lock.

Even with the lower lock level, you might get back a transient
can't-lock-it failure rather than the condition you are looking for.
So I think there needs to be a check on the errcode here.

Should there be a timeout on the LOCK?  Or even better, use NOWAIT
where available (which actually appears to be all versions of
interest).

I'd also advise skipping the whole thing if server version >= 14.
I see you have a lower-bound check which is good, but not nearly
as useful over time as the other will be.

(Also, personally I'd put the version check in IsLockTableGeneric,
not its caller.)

Why is it OK to drop the check for RELKIND_PARTITIONED_TABLE
when !hasGenericLockTable?  Surely that does the wrong thing
on released minor versions?

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Alvaro Herrera
On 2020-Oct-21, Tom Lane wrote:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Here's a quick patchset that makes pg_dump do LOCK TABLE on all
> > relations it dumps; patch 0001 allows server-side LOCK TABLE to run on
> > any relkind, and then 0002 makes pg_dump test for that capability at
> > connection start, and if it exists, then it's used for all relations to
> > dump.
> 
> I think the "test for that capability" bit needs more subtlety.

Great ideas, thanks -- all adopted in the attached version.  I didn't
test this with 9.5 but as you say NOWAIT is already supported there and
the command itself does work.

Since the wrong-object-type error is thrown before the lock-unavailable
error, and both come before the ACL check, testing for those two error
codes specifically works; it's not a problem if the lock is unavailable
or denied.
Right now if any other error occurs in the LOCK TABLE, pg_dump aborts
with an error.  I don't know what that could be.  I suppose if it dies
with OOM then it doesn't matter whether it's here or elsewhere.

Testing for a view as I was doing was not good, because PG12 does allow
locks on views.  I decided to use pg_aggregate_fnoid_index because it's
easier for manual tests: it doesn't prevent connecting to the database
when a strong lock is held.  This is in contrast with other, more obvious
candidates.  It's a pretty arbitrary choice but it shouldn't harm
anything since we don't actually hold the lock for more than an instant,
and that only if it's not contended.
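
(So the connection-start probe amounts to something like this; a sketch
of the idea, not the literal patch code:)

BEGIN;
LOCK TABLE pg_catalog.pg_aggregate_fnoid_index IN ACCESS SHARE MODE NOWAIT;
-- unpatched server: fails with ERRCODE_WRONG_OBJECT_TYPE (42809)
-- patched server: lock granted, or ERRCODE_LOCK_NOT_AVAILABLE (55P03)
-- if it's contended; either result proves the capability is there
ROLLBACK;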

I discovered once again that fallthrough comment handling in combination
with ccache is pretty obnoxious.


Re: Odd pg dump error: cache lookup failure

From: Tom Lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Oct-21, Tom Lane wrote:
>> I think the "test for that capability" bit needs more subtlety.

> Great ideas, thanks -- all adopted in the attached version.  I didn't
> test this with 9.5 but as you say NOWAIT is already supported there and
> the command itself does work.

OK, looking a bit harder this time:

1. I think this bit in LockViewRecurse_walker must be removed as well:

            /* Currently, we only allow plain tables or views to be locked. */
            if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE &&
                relkind != RELKIND_VIEW)
                continue;

(Mostly, stuff in views would be ordinary tables anyway ... but
foreign tables are a possible exception.)

2. Should we add anything to the LOCK TABLE reference page?  I see nothing
there now that bears on this change, but ...

3. Is hard-coding #define's for ERRCODEs actually the best we can do here?
I guess it is right now, but ugh.  Seems like we ought to find a way for
frontend code to make use of errcodes.h.  Not a matter for this patch
though.

4. You neglected to change the table name in the warn_or_exit_horribly
error message.

> Testing for a view as I was doing was not good, because PG12 does allow
> locks on views.  I decided to use pg_aggregate_fnoid_index because it's
> easier for manual tests: it doesn't prevent connecting to the database
> when a strong lock is held.  This in contrast with other more obvious
> candidates.  It's a pretty arbitrary choice but it shouldn't harm
> anything since we don't actually hold the lock for more than an instant,
> and that only if it's not contended.

There might be some value in using one of pg_class's indexes instead,
mainly because the parser will surely need to touch that on the way
to performing the LOCK, while pg_aggregate is far afield.  Given the
limited range of PG versions that we'll ever need to perform the check
for, I don't think there's much risk in using any well-known index
from a version-compatibility standpoint; but acquiring locks across
different catalogs could conceivably risk some issues vs a VACUUM
FULL or the like.
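
For reference, the candidates can be listed with something like:

SELECT indexrelid::regclass FROM pg_index
WHERE indrelid = 'pg_class'::regclass;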

Other than these points, it seems committable.

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Alvaro Herrera
On 2020-Oct-22, Tom Lane wrote:

> 1. I think this bit in LockViewRecurse_walker must be removed as well:
> 
>             /* Currently, we only allow plain tables or views to be locked. */
>             if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE &&
>                 relkind != RELKIND_VIEW)
>                 continue;
> 
> (Mostly, stuff in views would be ordinary tables anyway ... but
> foreign tables are a possible exception.)

Actually, foreign tables are not a problem: I guess they were hammered
out enough when somebody allowed them to be partitions, for partitioned
table recursive locking.  What *is* a problem with that change is any
RTE that's not RTE_RELATION.  It's not obvious from the test above, but
the most important case it leaves out is "relkind == '\0'", which is
what you get for other RTE kinds; so that case broke when the check
above was removed.  I added a "continue" for non-relation RTEs, and now
it all seems to work correctly as far as I can tell.

While on this: note that the lack of recursion into join RTEs here does
not break anything, since the relations being joined are separately part
of the rtable, so they'll be processed anyway.

BTW, I don't know what the possible usefulness of locking standalone
composite types is, but it's there.

> 2. Should we add anything to the LOCK TABLE reference page?  I see nothing
> there now that bears on this change, but ...

Eh.  There's a couple of places where it now seems more correct to talk
about a *relation* being locked.  I went as far as mentioning the set of
relkinds, for extra clarity (I skipped toast tables because it seemed
pointless).  The subject of the ONLY keyword not applying to views was a
bit surprising, so I added that too.  Not 100% wedded to the wording of
these changes.

> 3. Is hard-coding #define's for ERRCODEs actually the best we can do here?
> I guess it is right now, but ugh.  Seems like we ought to find a way for
> frontend code to make use of errcodes.h.  Not a matter for this patch
> though.

Yeah, I had the same thought -- didn't like this at all but evidently it
hasn't bothered anyone sufficiently.

> 4. You neglected to change the table name in the warn_or_exit_horribly
> error message.

Ugh, fixed.

> There might be some value in using one of pg_class's indexes instead,
> mainly because the parser will surely need to touch that on the way
> to performing the LOCK, while pg_aggregate is far afield.

Hmm.  I ended up using pg_class_tblspc_relfilenode_index, which I
suppose would be the least trafficked index of pg_class.  At least,
holding that one doesn't block other connections (so I could use
LOCK TABLE <that index>;
in another session and still launch pg_dump to verify it worked correctly).




Re: Odd pg dump error: cache lookup failure

From: Tom Lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> [ v3 patches ]

This looks committable to me, with one trivial nit:
in LockViewRecurse_walker, it'd be better to not do get_rel_name()
(or even fetch rte->relid, really) until we've verified the rtekind.
As-is, it's wasting cycles and presuming more than it ought to
about the error behavior of get_rel_name().  That's a pre-existing
problem not the fault of this patch, but we might as well fix it
while we're here.

            regards, tom lane



Re: Odd pg dump error: cache lookup failure

From: Alvaro Herrera
Both are pushed now.

Thanks for the reviews!