Re: pg_reorg in core? - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pg_reorg in core?
Date
Msg-id CAB7nPqRQcYzp4aLG0+bU1g=SQpWBe5gLyAjRgcixzOJqJ8ts9g@mail.gmail.com
Whole thread Raw
In response to Re: pg_reorg in core?  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: pg_reorg in core?  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers


On Wed, Sep 26, 2012 at 8:13 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On Tuesday, September 25, 2012 01:48:34 PM Michael Paquier wrote:
> On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund <andres@2ndquadrant.com>wrote:
> > On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
> > > On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund <andres@2ndquadrant.com
> > >
> > >wrote:
> > > Could you clarify what do you mean here by cleanup?
> > > I am afraid I do not get your point here.
> >
> > Sorry, was a bit tired when writing the above.
> >
> > The point is that to work concurrent the CONCURRENT operations
> > commit/start multiple transactions internally. It can be interrupted
> > (user, shutdown, error,
> > crash) and leave transient state behind every time it does so. What I
> > wanted to
> > say is that we need to take care that each of those can easily be cleaned
> > up
> > afterwards.
>
> Sure, many errors may happen.
> But, in the case of CREATE INDEX CONCURRENTLY, there is no clean up method
> implemented as far as I know (might be missing something though). Isn't an
> index only considered as invalid in case of failure for concurrent creation?
Well, you can DROP or REINDEX the invalid index.

There are several scenarios where you can get invalid indexes. Unique
violations, postgres restarts, aborted index creation...

> In the case of REINDEX it would be essential to create such a cleanup
> mechanism as I cannot imagine a production database with an index that has
> been marked as invalid due to a concurrent reindex failure, by assuming here,
> of course, that REINDEX CONCURRENTLY would use the same level of process
> error as CREATE INDEX CONCURRENTLY.
Not sure what youre getting at?
I just meant that when CREATE INDEX CONCURRENTLY fails, the index created is
considered as invalid, so it cannot be used by planner.

Based on what you told before:
1) build new index with indisready = false
newindex.indisready = true
wait
2) newindex.indisvalid = true
wait
3) swap(oldindex.relfilenode, newindex.relfilenode)
oldindex.indisvalid = false
wait
4) oldindex.indisready = false
wait
drop new index with old relfilenode

If the reindex fails at step 1 or 2, the new index is not usable so the relation will finish
with an index which is not valid. If it fails at step 4, the old index is invalid. If it fails at step
3, both indexes are valid and both are usable for given relation.
Do you think it is acceptable to consider that the user has to do the cleanup of the old or new index
himself if there is a failure?


> One of the possible cleanup mechanisms I got on top of my head is a
> callback at transaction abort, each callback would need to be different for
> each subtransaction used at during the concurrent operation.
> In case the callback itself fails, well the old and/or new indexes become
> invalid.
Thats not going to work. E.g. the session might have been aborted or such.
Also, there is not much you can do from an callback at transaction end as you
cannot do catalog modifications.

I was thinking of REINDEX CONCURRENTLY CONTINUE or something vaguely similar.
You could also reissue the reindex command and avoid an additional command. When launching a
concurrent reindex, it could be possible to check if there is already an index that has been created to replace the
old one that failed previously. In order to control that, why not adding an additional field in pg_index?
When creating a new index concurrently, we register in its pg_index entry the oid of the index that it has to
replace. When reissuing the command after a failure, it is then possible to check if there is already an index that has
been issued by a previous REINDEX CONCURRENT command and based on the flag values of the old and new
indexes it is then possible to replay the command from the step where it previously failed.
--
Michael Paquier
http://michael.otacoo.com

pgsql-hackers by date:

Previous
From: Petr Chmelar
Date:
Subject: Re: Enum binary access
Next
From: Tomas Vondra
Date:
Subject: autovacuum stress-testing our system