Re: [HACKERS] Write Ahead Logging for Hash Indexes - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Write Ahead Logging for Hash Indexes
Date
Msg-id CAA4eK1+TuLSo0k9DCFi8U4xTxCe6trUHauxTDp06Ng6U=Bs=Cg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Mar 12, 2017 at 8:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Mar 11, 2017 at 12:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>                  /*
>>> +                 * Change the shared buffer state in critical section,
>>> +                 * otherwise any error could make it unrecoverable after
>>> +                 * recovery.
>>> +                 */
>>> +                START_CRIT_SECTION();
>>> +
>>> +                /*
>>>                   * Insert tuple on new page, using _hash_pgaddtup to ensure
>>>                   * correct ordering by hashkey.  This is a tad inefficient
>>>                   * since we may have to shuffle itempointers repeatedly.
>>>                   * Possible future improvement: accumulate all the items for
>>>                   * the new page and qsort them before insertion.
>>>                   */
>>>                  (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>>>
>>> +                END_CRIT_SECTION();
>>>
>>> No way.  You have to start the critical section before making any page
>>> modifications and keep it alive until all changes have been logged.
>>>
>>
>> I think what we need to do here is to accumulate all the tuples that
>> need to be added to new bucket page till either that page has no more
>> space or there are no more tuples remaining in an old bucket.  Then in
>> a critical section, add them to the page using _hash_pgaddmultitup and
>> log the entire new bucket page contents as is currently done in patch
>> log_split_page().
>
> I agree.
>

Okay, I have changed like that in the patch.

>> Now, here we can choose to log the individual
>> tuples as well instead of a complete page, however not sure if there
>> is any benefit for doing the same because XLogRecordAssemble() will
>> anyway remove the empty space from the page.  Let me know if you have
>> something else in mind.
>
> Well, if you have two pages that are 75% full, and you move a third of
> the tuples from one of them into the other, it's going to be about
> four times more efficient to log only the moved tuples than the whole
> page.
>

I don't see how this could happen during split?  I mean if you are
moving 25% tuples from old bucket page to a new bucket page which is
75% full, it will log the new bucket page only after it gets full.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Write Ahead Logging for Hash Indexes
Next
From: Surafel Temesgen
Date:
Subject: Re: [HACKERS] New CORRESPONDING clause design