Re: Potential GIN vacuum bug - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Potential GIN vacuum bug
Date
Msg-id CAMkU=1zk+hE-MEggw3zCrUTSQPu9c8qZiogSbhH0n3Yzmx-S+A@mail.gmail.com
In response to Re: Potential GIN vacuum bug  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Potential GIN vacuum bug  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>
> Your earlier point about how the current design throttles insertions to
> keep the pending list from growing without bound seems like a bigger deal
> to worry about.  I think we'd like to have some substitute for that.
> Perhaps we could make the logic in insertion be something like
>
>         if (pending-list-size > threshold)
>         {
>                 if (conditional-lock-acquire(...))
>                 {
>                         do-pending-list-cleanup;
>                         lock-release;
>                 }
>                 else if (pending-list-size > threshold * 2)
>                 {
>                         unconditional-lock-acquire(...);
>                         if (pending-list-size > threshold)
>                                 do-pending-list-cleanup;
>                         lock-release;
>                 }
>         }
>
> so that once the pending list got too big, incoming insertions would wait
> for it to be cleared.  Whether to use a 2x safety margin or something else
> could be a subject for debate, of course.
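
For concreteness, here is a minimal standalone sketch of that throttling pattern, with a pthread mutex standing in for the cleanup lock and placeholder names (pending_list_size, threshold, do_pending_list_cleanup) rather than the real backend structures; it only illustrates the control flow, not the actual GIN code:

/*
 * Standalone illustration of the proposed throttling logic.  A pthread
 * mutex stands in for the cleanup lock and a plain counter for the
 * pending-list size; in the backend both would need proper locking.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cleanup_lock = PTHREAD_MUTEX_INITIALIZER;
static long pending_list_size = 0;              /* bytes queued so far */
static const long threshold = 4L * 1024 * 1024; /* cleanup trigger point */

static void
do_pending_list_cleanup(void)
{
    /* stand-in for moving pending entries into the main index */
    printf("cleaning %ld bytes\n", pending_list_size);
    pending_list_size = 0;
}

/* Called after a backend appends a tuple's keys to the pending list. */
static void
maybe_cleanup_pending_list(void)
{
    if (pending_list_size <= threshold)
        return;

    if (pthread_mutex_trylock(&cleanup_lock) == 0)
    {
        /* conditional acquire succeeded: do the cleanup ourselves */
        do_pending_list_cleanup();
        pthread_mutex_unlock(&cleanup_lock);
    }
    else if (pending_list_size > threshold * 2)
    {
        /*
         * Someone else is already cleaning, but the list has grown past
         * the safety margin: block until they finish, then recheck in
         * case they already emptied it.
         */
        pthread_mutex_lock(&cleanup_lock);
        if (pending_list_size > threshold)
            do_pending_list_cleanup();
        pthread_mutex_unlock(&cleanup_lock);
    }
    /* otherwise someone else is cleaning and we are under 2x: just go on */
}

int
main(void)
{
    pending_list_size = threshold + 1;
    maybe_cleanup_pending_list();
    return 0;
}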

If the goal is to not change existing behavior (e.g. for back-patching), the margin should be 1: always wait.  But we would still have to deal with the fact that an unconditional acquire attempt by the backends will cause a vacuum to cancel itself, which is undesirable.  If we define a new namespace for this lock (the way the relation extension lock has its own namespace), then perhaps the cancellation code could be made not to cancel on that condition.  But that too seems like a lot of work to back-patch.

Would we bother to back-patch a fix for a theoretical bug that there is no evidence of anyone hitting in the field?  Of course, if people are getting bitten by this, they probably wouldn't know it.  You search for "malevolent unicorns", get no hits, and just assume there are none, without scouring the table and discovering that it is really an index problem.  Or, if you do realize it is an index problem, you would probably never trace it back to this cause.  There are quite a few reports of mysterious index corruption which never get resolved.

If we want to improve the current behavior rather than just fix a bug, then I think that when the list is greater than threshold*2 and the cleanup lock is unavailable, the backend should instead insert the tuple's keys directly into the index itself, as if fastupdate were off.  That would require some major surgery to the existing code, though, since by the time the insertion invokes the cleanup it is too late not to have inserted into the pending list.
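
Roughly, that decision would have to be made before the keys go anywhere near the pending list, something like the following sketch (reusing the stand-ins from the sketch above; insert_keys_directly and append_to_pending_list are placeholders for the entry-tree and pending-list insertion paths, not the real entry points):

static void
insert_keys_directly(void)
{
    /* stand-in for inserting each key into the entry tree, fastupdate-off style */
}

static void
append_to_pending_list(void)
{
    pending_list_size += 100;       /* pretend the keys took 100 bytes */
}

static void
gin_insert_sketch(void)
{
    if (pending_list_size > threshold * 2)
    {
        if (pthread_mutex_trylock(&cleanup_lock) != 0)
        {
            /*
             * The list is far over the limit and someone else holds the
             * cleanup lock: rather than waiting or piling more onto the
             * list, pay the cost of a direct insertion for this tuple.
             */
            insert_keys_directly();
            return;
        }
        /* we got the lock cheaply: clean up, then queue as usual */
        do_pending_list_cleanup();
        pthread_mutex_unlock(&cleanup_lock);
    }
    append_to_pending_list();
    maybe_cleanup_pending_list();
}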


Cheers,

Jeff
