Re: WAL Bypass for indexes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: WAL Bypass for indexes
Msg-id 7498.1144072541@sss.pgh.pa.us
In response to Re: WAL Bypass for indexes  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> Thinking about this some more, I ask myself: why is it we log index
> inserts at all? We log heap inserts, which contain all the information
> we need to replay all index inserts also, so why bother?

(1) We can't run user-defined functions during log replay.  Quite
aside from any risk of nondeterminism, the normal transaction
infrastructure isn't functioning in that environment.

(2) Some of the index code is itself deliberately nondeterministic.
I'm thinking in particular of the move-right-or-not choice in
_bt_insertonpg() when there are many equal keys, but randomization is
in general a useful algorithmic technique that we'd have to forswear.

(3) In the presence of concurrency, the sequence of heap-insert WAL
records isn't enough info, because it doesn't tell you what order the
index inserts occurred in.  The btree code, at least, is sufficiently
concurrent that even knowing the sequence of leaf-key insertions isn't
full information --- it's not hard to imagine cases where decisions
about where to split upper-level pages depend on which process
manages to obtain the lock on a page first.
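
To make point (3) concrete, here is a toy sketch in Python. It is not
PostgreSQL's nbtree code: the page capacity, the split-in-half rule, and
the names insert_key and build are all assumptions made up for
illustration. It only shows that the same set of keys, inserted in two
different orders, can leave different page layouts behind.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical, kept tiny so that splits happen quickly


def insert_key(pages, key):
    """Insert key into a list of sorted leaf pages, splitting an overfull page."""
    # Pick the first page whose highest key is >= key, else the last page.
    idx = len(pages) - 1
    for i, page in enumerate(pages):
        if page and key <= page[-1]:
            idx = i
            break
    bisect.insort(pages[idx], key)
    # Toy split rule: cut an overfull page in half.
    if len(pages[idx]) > PAGE_CAPACITY:
        page = pages[idx]
        mid = len(page) // 2
        pages[idx:idx + 1] = [page[:mid], page[mid:]]


def build(keys):
    """Build the leaf level by inserting keys in the given order."""
    pages = [[]]
    for key in keys:
        insert_key(pages, key)
    return pages


ascending = build([1, 2, 3, 4, 5, 6, 7, 8])
descending = build([8, 7, 6, 5, 4, 3, 2, 1])

print(ascending)   # [[1, 2], [3, 4], [5, 6, 7, 8]]
print(descending)  # [[1, 2], [3, 4, 5], [6, 7, 8]]
```

Both runs hold the same keys, but the page boundaries differ, so knowing
only which keys were inserted does not pin down the resulting index
state.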

There are probably some other reasons that I forgot.  Check the
archives; this point has been debated before.

Basically the problem here is that you can't mix logged and non-logged
operations --- if you're going to WAL-log any operations on an index
then you have to be sure that the replay will regenerate exactly the
same series of index states that happened the first time.  So none of
this is an argument against "rebuild the index at end of replay"; but
I don't see any workable half measures.
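
The mixing problem can be sketched with another toy model, again in
Python and again not PostgreSQL's actual WAL record format. The record
layout (page number, offset, key) and the names apply_record and
keys_in_order are hypothetical. The sketch assumes a logged index
operation that names a physical position, and replays it against two
index states that hold the same keys in different page layouts.

```python
import copy


def apply_record(pages, record):
    """Replay a toy physical record: put key at a fixed page and offset."""
    page_no, offset, key = record
    result = copy.deepcopy(pages)  # leave the input state untouched
    result[page_no].insert(offset, key)
    return result


def keys_in_order(pages):
    """Check that keys are sorted when the pages are read left to right."""
    flat = [key for page in pages for key in page]
    return flat == sorted(flat)


# State the index was in when the record was originally written.
ORIGINAL = [[10, 20], [30, 40], [50, 60, 70, 80]]
# Same keys, but a different layout, as if unlogged inserts had been
# regenerated in a different order during replay.
REGENERATED = [[10, 20], [30, 40, 50], [60, 70, 80]]

# "Insert key 55 on page 2 at offset 1", valid only against ORIGINAL.
RECORD = (2, 1, 55)

print(keys_in_order(apply_record(ORIGINAL, RECORD)))     # True
print(keys_in_order(apply_record(REGENERATED, RECORD)))  # False
```

Against the original state the record lands between 50 and 60; against
the regenerated state it lands after 60 and the page is no longer
sorted. In this model a logged operation is only safe if every earlier
index state is reproduced exactly.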
        regards, tom lane

