
Re: BUG #15309: ERROR: catalog is missing 1 attribute(s) for relid760676 when max_parallel_maintenance_workers > 0 - Mailing list pgsql-bugs

From	Peter Geoghegan
Subject	Re: BUG #15309: ERROR: catalog is missing 1 attribute(s) for relid760676 when max_parallel_maintenance_workers > 0
Date	August 6, 2018 23:31:16
Msg-id	CAH2-Wzn9eJMQYnxBmc4=VsGcK3tLk6Z1xO2s9nXhBRMBqHTJ3Q@mail.gmail.com
In response to	Re: BUG #15309: ERROR: catalog is missing 1 attribute(s) for relid760676 when max_parallel_maintenance_workers > 0 (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: BUG #15309: ERROR: catalog is missing 1 attribute(s) for relid760676 when max_parallel_maintenance_workers > 0
List	pgsql-bugs


On Mon, Aug 6, 2018 at 10:43 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> I'll work to isolate and diagnose the problem today. It likely has
> something to do with corrupting the state needed by a catalog parallel
> index build in the context of the VACUUM FULL. pg_attribute grows to
> several tens of megabytes here, which is enough to get a parallel
> index build.

The repro can be simplified further: a VACUUM FULL on pg_attribute
alone is enough. There is no index corruption before that point, and
there is afterwards. Both pg_attribute_relid_attnam_index and
pg_attribute_relid_attnum_index seem to become corrupt. All other
symptoms probably stem from this initial corruption, so I'm focusing
on it.

Looking at the corrupt pg_attribute_relid_attnum_index structure, I
see that the index does actually have an entry for the heap tuple that
amcheck reports as lacking one; at least, it has a key match. What
amcheck noticed was that the heap item pointer was not as it should be
(i.e. the index tuple points to the wrong heap tuple). I also noticed
that nearby index tuples had duplicate entries: the first points to
approximately the same place in the heap as the tuple amcheck flagged,
and the second points to approximately the place in the heap where
amcheck expected to find it (amcheck was complaining about an adjacent
entry, so it's only approximately the same place in the heap).
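
To make the shape of that corruption concrete, here is a toy model in
Python. It is only an illustration, not how amcheck is implemented:
the heap is a dict from a TID (block, offset) to an index key, the
index is a list of (key, TID) entries, and every value and function
name is made up. It shows how a key match with the wrong TID still
counts as a missing entry, and how the duplicate neighbour shows up.

```python
from collections import Counter

def heap_tuples_missing_from_index(heap, index_entries):
    """Return heap TIDs that have no index entry with the same key AND the same TID."""
    indexed = set(index_entries)
    return sorted(tid for tid, key in heap.items() if (key, tid) not in indexed)

def duplicate_keys(index_entries):
    """Return keys that appear more than once in what should be a unique index."""
    counts = Counter(key for key, _tid in index_entries)
    return sorted(key for key, n in counts.items() if n > 1)

# Keys stand in for (attrelid, attnum); TIDs are (block, offset). All made up.
heap = {
    (10, 1): (760676, 1),
    (10, 2): (760676, 2),
}
index_entries = [
    ((760676, 1), (3, 7)),   # duplicate: points near the stale location
    ((760676, 1), (10, 1)),  # duplicate: points where the heap tuple really is
    ((760676, 2), (3, 8)),   # key matches, but the TID is wrong
]

missing = heap_tuples_missing_from_index(heap, index_entries)
dupes = duplicate_keys(index_entries)
print(missing)  # heap tuples with no (key, TID) match in the index
print(dupes)    # keys with more than one index entry
```

In this model the tuple at (10, 2) is reported as unindexed even
though its key is present, which matches the symptom described above.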

I suspect that the problem is that parallel workers have a different
idea about which relfilenode they need to scan, or something along
those lines. Maybe cluster_rel() needs to be taught about parallel
CREATE INDEX. I must have missed some detail within cluster.c prior to
parallel CREATE INDEX going in.
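
A minimal sketch of that suspicion, again as a toy Python model rather
than anything taken from the PostgreSQL source: storage is keyed by
relfilenode, and each process resolves the relation through its own
mapping. The relfilenode numbers, the per-process maps, and the scan
helper are all hypothetical.

```python
# Hypothetical storage: one file per relfilenode, each holding TIDs of live tuples.
storage = {
    1000: [(3, 7), (3, 8)],    # file from before the rewrite
    2000: [(10, 1), (10, 2)],  # file written by the rewrite
}

def scan(relation, relmap):
    """Return the TIDs seen when resolving `relation` through `relmap`."""
    return list(storage[relmap[relation]])

# Assumption for illustration: the leader already sees the new
# relfilenode, while a worker still resolves to the old one.
leader_map = {"pg_attribute": 2000}
worker_map = {"pg_attribute": 1000}

leader_tids = scan("pg_attribute", leader_map)
worker_tids = scan("pg_attribute", worker_map)
workers_agree = leader_tids == worker_tids
print(leader_tids, worker_tids, workers_agree)
```

If something like this happened, a worker would feed the sort with
TIDs that are only valid for the other file, which would be consistent
with index tuples that have the right key but the wrong heap pointer.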

-- 
Peter Geoghegan
