Re: REINDEX INDEX results in a crash for an index of pg_class since9.6 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: REINDEX INDEX results in a crash for an index of pg_class since9.6
Date
Msg-id 20190426020248.iyev66bkpf2abxqd@alap3.anarazel.de
Whole thread Raw
In response to Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
List pgsql-hackers
Hi,

On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-25 16:02:03 -0400, Tom Lane wrote:
> >> That could work.  The important API spec is then that the relcache entry
> >> reflects the *previous* state of the relation, and is not to be modified
> >> by the tableam call.
>
> > Right.
>
> > I was wondering if we should just pass in the pg_class tuple as an "out"
> > argument, instead of pointers to relfilnode/relfrozenxid/relminmxid.
>
> Yeah, possibly.  The whole business with xids is perhaps heapam specific,
> so decoupling this function's signature from them would be good.

I've left that out in the attached. Currently VACUUM FULL / CLUSTER also
needs to handle those, and the callback for transactional rewrite
(table_relation_copy_for_cluster()), also returns those as output
parameter.  I think I can see a way how we could clean up the relevant
cluster.c code, but until that's done, I don't see much point in a
different interface (I'll probably write apatch .

The attached patch fixes the problem for me, and passes all existing
tests. It contains a few changes that are not strictly necessary, but
imo clear improvements.

We probably could split the tableam changes and related refactoring from
the fix to make backpatching simpler. I've not done that yet, but I
think we should before committing.

Questions:
- Should we move the the CommandCounterIncrement() from
  RelationSetNewRelfilenode() to the callers? That'd allow them to do
  other things to the new relation (e.g. fill it), before making the
  changes visible. Don't think it'd currently help, but it seems like it
  could make code more robust in the future.

- Should we introduce an assertion into CatalogIndexInsert()'s
  HeapTupleIsHeapOnly() path, that asserts that all the relevant indexes
  aren't ReindexIsProcessingIndex()? Otherwise it seems way too easy to
  re-introduce bugs like this one.  Dirty hack for that included.

- Wonder if we shouldn't introduce something akin to
  SetReindexProcessing() for table rewrites (e.g. VACUUM FULL), to
  prevent the related error of inserting/updating a catalog table that's
  currently being rewritten.

Taking this as a WIP, what do you think?

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: "Matsumura, Ryo"
Date:
Subject: RE: Patch: doc for pg_logical_emit_message()
Next
From: Alvaro Herrera
Date:
Subject: Re: Reducing the runtime of the core regression tests