Re: [HACKERS] Write Ahead Logging for Hash Indexes - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Write Ahead Logging for Hash Indexes
Date
Msg-id CAA4eK1Jx4_=pbkLXZgCbeUuOhCrnugk+oRKVhM-X4y-w9=bFHQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Mar 9, 2017 at 3:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 7, 2017 at 6:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Great, thanks.  0001 looks good to me now, so committed.
>>>
>>> Committed 0002.
>>
>> Here are some initial review thoughts on 0003 based on a first read-through.
>
> More thoughts on the main patch:
>
>                  /*
> +                 * Change the shared buffer state in critical section,
> +                 * otherwise any error could make it unrecoverable after
> +                 * recovery.
> +                 */
> +                START_CRIT_SECTION();
> +
> +                /*
>                   * Insert tuple on new page, using _hash_pgaddtup to ensure
>                   * correct ordering by hashkey.  This is a tad inefficient
>                   * since we may have to shuffle itempointers repeatedly.
>                   * Possible future improvement: accumulate all the items for
>                   * the new page and qsort them before insertion.
>                   */
>                  (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>
> +                END_CRIT_SECTION();
>
> No way.  You have to start the critical section before making any page
> modifications and keep it alive until all changes have been logged.
>

I think what we need to do here is to accumulate all the tuples that
need to be added to new bucket page till either that page has no more
space or there are no more tuples remaining in an old bucket.  Then in
a critical section, add them to the page using _hash_pgaddmultitup and
log the entire new bucket page contents as is currently done in patch
log_split_page().  Now, here we can choose to log the individual
tuples as well instead of a complete page, however not sure if there
is any benefit for doing the same because XLogRecordAssemble() will
anyway remove the empty space from the page.  Let me know if you have
something else in mind.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Eric Ridge
Date:
Subject: Re: [HACKERS] How to get the 'ctid' from a record type?
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] New CORRESPONDING clause design