Re: WIP: Access method extendability - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: WIP: Access method extendability
Date
Msg-id 56FAB125.4020401@2ndquadrant.com
In response to Re: WIP: Access method extendability  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: WIP: Access method extendability  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
On 29/03/16 18:25, Alvaro Herrera wrote:
>> + /*-------------------------------------------------------------------------
>> >+  * API for construction of generic xlog records
>> >+  *
>> >+  * This API allows users to construct generic xlog records that describe
>> >+  * the difference between pages in a generic way.  This is useful for
>> >+  * extensions which provide custom access methods because they can't
>> >+  * register their own WAL redo routines.
>> >+  *
>> >+  * Each record must be constructed by following these steps:
>> >+  * 1) GenericXLogStart(relation) - start construction of a generic xlog
>> >+  *      record for the given relation.
>> >+  * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
>> >+  *      for the record.  This function returns a copy of the page
>> >+  *      image where modifications can be performed.  The second argument
>> >+  *      indicates if the block is new (i.e. a full page image should be taken).
>> >+  * 3) Apply modifications to the page images obtained in the previous step.
>> >+  * 4) GenericXLogFinish() - finish construction of generic xlog record.
>> >+  *
>> >+  * The xlog record construction can be canceled at any step by calling
>> >+  * GenericXLogAbort().  All changes made to the page image copies will be
>> >+  * discarded.
>> >+  *
>> >+  * Please note the following points when constructing generic xlog records.
>> >+  * - No direct modifications of page images are allowed!  All modifications
>> >+  *     must be done in the copies returned by GenericXLogRegister().  In other
>> >+  *     words the code which makes generic xlog records must never call
>> >+  *     BufferGetPage().
>> >+  * - Registrations of buffers (step 2) and modifications of page images
>> >+  *     (step 3) can be mixed in any sequence.  The only restriction is that
>> >+  *     you can only modify a page image after registering the corresponding
>> >+  *     buffer.
>> >+  * - After registration, a buffer can also be unregistered by calling
>> >+  *     GenericXLogUnregister(buffer).  In this case the changes made in
>> >+  *     that particular page image copy will be discarded.
>> >+  * - Generic xlog assumes that pages are using standard layout, i.e., all
>> >+  *     data between pd_lower and pd_upper will be discarded.
>> >+  * - Maximum number of buffers simultaneously registered for a generic xlog
>> >+  *     record is MAX_GENERIC_XLOG_PAGES.  An error will be thrown if this limit
>> >+  *     is exceeded.
>> >+  * - Since you modify copies of page images, GenericXLogStart() doesn't
>> >+  *     start a critical section.  Thus, you can do memory allocation, error
>> >+  *     throwing etc between GenericXLogStart() and GenericXLogFinish().
>> >+  *     The actual critical section is present inside GenericXLogFinish().
>> >+  * - GenericXLogFinish() takes care of marking buffers dirty and setting their
>> >+  *     LSNs.  You don't need to do this explicitly.
>> >+  * - For unlogged relations, everything works the same except there is no
>> >+  *     WAL record produced.  Thus, you typically don't need to do any explicit
>> >+  *     checks for unlogged relations.
>> >+  * - If a registered buffer isn't new, the generic xlog record contains the
>> >+  *     delta between the old and new page images.  This delta is produced by
>> >+  *     byte-by-byte comparison.  The current delta mechanism is not effective
>> >+  *     for data shifts inside the page and may be improved in the future.
>> >+  * - Generic xlog redo function will acquire exclusive locks on buffers
>> >+  *     in the same order they were registered.  After redo of all changes,
>> >+  *     the locks will be released in the same order.
>> >+  *
>> >+  *
>> >+  * Internally, the delta between pages consists of a set of fragments.  Each
>> >+  * fragment represents the changes made in a given region of the page.  A
>> >+  * fragment is described as follows:
>> >+  *
>> >+  * - offset of page region (OffsetNumber)
>> >+  * - length of page region (OffsetNumber)
>> >+  * - data - the data to place into described region ('length' number of bytes)
>> >+  *
>> >+  * Unchanged regions of the page are not represented in the delta.  As a
>> >+  * result, the delta can be more compact than a full page image.  However, if
>> >+  * an unchanged region is smaller than a fragment header (offset and length),
>> >+  * starting a new fragment costs more than it saves, and the delta could end
>> >+  * up bigger than the full page image.  For this reason we start a new
>> >+  * fragment only if the unchanged region is bigger than MATCH_THRESHOLD.
>> >+  *
>> >+  * The worst case for delta size is when no unchanged region is found in
>> >+  * the page.  The delta size is then the page size plus the size of one
>> >+  * fragment header.
>> >+  */
>> >+ #define FRAGMENT_HEADER_SIZE    (2 * sizeof(OffsetNumber))
>> >+ #define MATCH_THRESHOLD            FRAGMENT_HEADER_SIZE
>> >+ #define MAX_DELTA_SIZE            (BLCKSZ + FRAGMENT_HEADER_SIZE)
>
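
For illustration only, the start / register / modify / finish sequence described in the quoted comment can be modelled as a toy in standalone C. This is a minimal sketch and not the patch's code: ToyXLogState, toy_start(), toy_register(), toy_finish(), toy_abort(), TOY_PAGE_SIZE and TOY_MAX_PAGES are all hypothetical names, there are no buffers, locks, or WAL, and the real API is described as throwing an error where this sketch returns NULL. It only shows that changes land in a private copy until the finish step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64    /* assumption: tiny page, for illustration only */
#define TOY_MAX_PAGES 4     /* assumption: stands in for MAX_GENERIC_XLOG_PAGES */

/* Hypothetical state: remembers each registered page and its working copy. */
typedef struct ToyXLogState
{
    int      npages;
    uint8_t *target[TOY_MAX_PAGES];                 /* the "real" pages */
    uint8_t  copy[TOY_MAX_PAGES][TOY_PAGE_SIZE];    /* private working copies */
} ToyXLogState;

/* Step 1: begin constructing a record. */
void
toy_start(ToyXLogState *state)
{
    state->npages = 0;
}

/*
 * Step 2: register a page and get back a copy to modify.
 * Returns NULL when the page limit is exceeded.
 */
uint8_t *
toy_register(ToyXLogState *state, uint8_t *page)
{
    int         slot;

    assert(page != NULL);
    if (state->npages >= TOY_MAX_PAGES)
        return NULL;

    slot = state->npages++;
    state->target[slot] = page;
    memcpy(state->copy[slot], page, TOY_PAGE_SIZE);
    return state->copy[slot];
}

/* Step 4: apply every working copy to its real page. */
void
toy_finish(ToyXLogState *state)
{
    for (int i = 0; i < state->npages; i++)
        memcpy(state->target[i], state->copy[i], TOY_PAGE_SIZE);
    state->npages = 0;
}

/* Cancel: forget the working copies; the real pages were never touched. */
void
toy_abort(ToyXLogState *state)
{
    state->npages = 0;
}
```

A caller would run toy_start(), modify only the pointer returned by toy_register() (step 3), and then call either toy_finish() or toy_abort(). Because the real page is untouched until the finish step, nothing in between needs to run inside a critical section, which is the point the quoted comment makes about GenericXLogStart().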

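The fragment scheme from the quoted comment (offset, length, data, with a new fragment started only after an unchanged run longer than MATCH_THRESHOLD) can likewise be sketched in standalone C. Again this is a hedged sketch under stated assumptions, not the code in the patch: PAGE_SIZE stands in for BLCKSZ, uint16_t stands in for OffsetNumber, and compute_delta(), apply_delta() and write_fragment() are hypothetical helper names. MAX_DELTA_SIZE is parenthesized here so that the macro expands safely inside larger expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE            8192   /* assumption: stands in for BLCKSZ */
#define FRAGMENT_HEADER_SIZE (2 * sizeof(uint16_t))
#define MATCH_THRESHOLD      FRAGMENT_HEADER_SIZE
#define MAX_DELTA_SIZE       (PAGE_SIZE + FRAGMENT_HEADER_SIZE)

/* Append one fragment (offset, length, data); returns the new delta length. */
static size_t
write_fragment(uint8_t *delta, size_t delta_len,
               const uint8_t *newpage, size_t start, size_t len)
{
    uint16_t    off16 = (uint16_t) start;
    uint16_t    len16 = (uint16_t) len;

    assert(delta_len + FRAGMENT_HEADER_SIZE + len <= MAX_DELTA_SIZE);
    memcpy(delta + delta_len, &off16, sizeof(off16));
    delta_len += sizeof(off16);
    memcpy(delta + delta_len, &len16, sizeof(len16));
    delta_len += sizeof(len16);
    memcpy(delta + delta_len, newpage + start, len);
    return delta_len + len;
}

/* Byte-by-byte comparison of two page images; returns the delta length. */
size_t
compute_delta(const uint8_t *oldpage, const uint8_t *newpage, uint8_t *delta)
{
    size_t      delta_len = 0;
    size_t      i = 0;

    while (i < PAGE_SIZE)
    {
        size_t      start, end, match;

        /* Skip bytes that are identical in both images. */
        if (oldpage[i] == newpage[i])
        {
            i++;
            continue;
        }

        /*
         * A changed region starts here.  Extend it until an unchanged run
         * longer than MATCH_THRESHOLD is seen, or the page ends.  Shorter
         * unchanged runs are absorbed into the current fragment.
         */
        start = i;
        end = i + 1;            /* one past the last changed byte */
        match = 0;
        for (i = end; i < PAGE_SIZE; i++)
        {
            if (oldpage[i] != newpage[i])
            {
                end = i + 1;
                match = 0;
            }
            else if (++match > MATCH_THRESHOLD)
                break;
        }
        delta_len = write_fragment(delta, delta_len, newpage,
                                   start, end - start);
    }
    return delta_len;
}

/* Replay a delta onto an old page image, fragment by fragment. */
void
apply_delta(uint8_t *page, const uint8_t *delta, size_t delta_len)
{
    size_t      pos = 0;

    while (pos < delta_len)
    {
        uint16_t    off16, len16;

        memcpy(&off16, delta + pos, sizeof(off16));
        pos += sizeof(off16);
        memcpy(&len16, delta + pos, sizeof(len16));
        pos += sizeof(len16);
        assert((size_t) off16 + len16 <= PAGE_SIZE);
        memcpy(page + off16, delta + pos, len16);
        pos += len16;
    }
}
```

In this sketch, two changed bytes separated by a single unchanged byte end up in one three-byte fragment, since a second header would cost more than the byte it saves. A page where every byte differs yields exactly one fragment of PAGE_SIZE bytes plus one header, which matches the worst case stated in the comment.
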
I incorporated your changes and made some additional refinements on top
of them.

Attached is a delta against v12, which should cause fewer issues for
Teodor when merging.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

