Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept
Date
Msg-id CAPpHfdu03sZ21U76vPncN3w_+N9jP4p5Svj5fb6FF9N58GYWYw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
On Tue, Aug 21, 2018 at 4:10 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> After reading [1] and [2] I got that there are at least 3 different
> issues with heap truncation:
> 1) Data corruption on file truncation error (explained in [1]).
> 2) Expensive scanning of the whole shared buffers before file truncation.
> 3) Cancel of read-only queries on standby even if hot_standby_feedback
> is on, caused by replication of AccessExclusiveLock.
>
> It seems that fixing any of those issues requires redesign of heap
> truncation.  So, ideally redesign of heap truncation should fix all
> the issues of above.  Or at least it should be understood how the rest
> of issues can be fixed later using the new design.
>
> I would like to share some my sketchy thoughts about new heap
> truncation design.  Let's imagine we introduced dirty_barrier buffer
> flag, which prevents dirty buffer from being written (and
> correspondingly evicted).  Then truncation algorithm could look like
> this:
>
> 1) Acquire ExclusiveLock on relation.
> 2) Calculate truncation point using count_nondeletable_pages(), while
> simultaneously placing dirty_barrier flag on dirty buffers and saving
> their numbers to array.  Assuming no writes are performing
> concurrently, no to-be-truncated-away pages should be written from
> this point.
> 3) Truncate data files.
> 4) Iterate past truncation point buffers and clean dirty and
> dirty_barrier flags from them (using numbers we saved to array on step
> #2).
> 5) Release relation lock.
> *) On exception happen after step #2, iterate past truncation point
> buffers and clean dirty_barrier flags from them (using numbers we
> saved to array on step #2)
>
> After heap truncation using this algorithm, shared buffers may contain
> past-OEF buffers.  But those buffers are empty (no used items) and
> clean.  So, real-only queries shouldn't hint those buffers dirty
> because there are no used items.  Normally, these buffers will be just
> evicted away from the shared buffer arena.  If relation extension will
> happen short after heap truncation then some of those buffers could be
> found after relation extension.  I think this situation could be
> handled.  For instance, we can teach vacuum to claim page as new once
> all the tuples were gone.
>
> We're taking only exclusive lock here.  And assuming we will teach our
> scans to treat page-past-OEF situation as no-visible-tuples-found,
> concurrent read-only queries will work concurrently with heap
> truncate.  Also we don't have to scan whole shared buffers, only past
> truncation point buffers are scanned at step #2.  Later flags are
> cleaned only from truncation point dirty buffers.  Data corruption on
> truncation error also shouldn't happen as well, because we don't
> forget to write any dirty buffers before insure that data files were
> successfully truncated.
>
> The problem I see with this approach so far is placing too many
> dirty_barrier flags can affect concurrent activity.  In order to cope
> that we may, for instance, truncate relation in multiple iterations
> when we find too many past truncation point dirty buffers.
>
> Any thoughts?

Given I've no feedback on this idea yet, I'll try to implement a PoC
patch for that.  It doesn't look to be difficult.  And we'll see how
does it work.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Flexible configuration for full-text search
Next
From: Peter Eisentraut
Date:
Subject: pg_verify_checksums -r option