Re: [HACKERS] Write Ahead Logging for Hash Indexes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Write Ahead Logging for Hash Indexes
Date
Msg-id CA+TgmoYgQbhohoSHvMZrgAQWvyhxR2qGpKnid6wWbW4v6+hU1Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sat, Mar 11, 2017 at 12:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>                  /*
>> +                 * Change the shared buffer state in critical section,
>> +                 * otherwise any error could make it unrecoverable after
>> +                 * recovery.
>> +                 */
>> +                START_CRIT_SECTION();
>> +
>> +                /*
>>                   * Insert tuple on new page, using _hash_pgaddtup to ensure
>>                   * correct ordering by hashkey.  This is a tad inefficient
>>                   * since we may have to shuffle itempointers repeatedly.
>>                   * Possible future improvement: accumulate all the items for
>>                   * the new page and qsort them before insertion.
>>                   */
>>                  (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>>
>> +                END_CRIT_SECTION();
>>
>> No way.  You have to start the critical section before making any page
>> modifications and keep it alive until all changes have been logged.
>>
>
> I think what we need to do here is to accumulate all the tuples that
> need to be added to new bucket page till either that page has no more
> space or there are no more tuples remaining in an old bucket.  Then in
> a critical section, add them to the page using _hash_pgaddmultitup and
> log the entire new bucket page contents as is currently done in patch
> log_split_page().

I agree.

> Now, here we can choose to log the individual
> tuples as well instead of a complete page, however not sure if there
> is any benefit for doing the same because XLogRecordAssemble() will
> anyway remove the empty space from the page.  Let me know if you have
> something else in mind.

Well, if you have two pages that are 75% full, and you move a third of
the tuples from one of them into the other, it's going to be about
four times more efficient to log only the moved tuples than the whole
page.  But logging the whole filled page wouldn't be wrong, just
somewhat inefficient.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] WIP: Faster Expression Processing v4
Next
From: "Mengxing Liu"
Date:
Subject: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions