On Sun, Mar 12, 2017 at 8:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Mar 11, 2017 at 12:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> /*
>>> + * Change the shared buffer state in critical section,
>>> + * otherwise any error could make it unrecoverable after
>>> + * recovery.
>>> + */
>>> + START_CRIT_SECTION();
>>> +
>>> + /*
>>> * Insert tuple on new page, using _hash_pgaddtup to ensure
>>> * correct ordering by hashkey. This is a tad inefficient
>>> * since we may have to shuffle itempointers repeatedly.
>>> * Possible future improvement: accumulate all the items for
>>> * the new page and qsort them before insertion.
>>> */
>>> (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>>>
>>> + END_CRIT_SECTION();
>>>
>>> No way. You have to start the critical section before making any page
>>> modifications and keep it alive until all changes have been logged.
>>>
>>
>> I think what we need to do here is to accumulate all the tuples that
>> need to be added to new bucket page till either that page has no more
>> space or there are no more tuples remaining in an old bucket. Then in
>> a critical section, add them to the page using _hash_pgaddmultitup and
>> log the entire new bucket page contents as is currently done in patch
>> log_split_page().
>
> I agree.
>
Okay, I have changed like that in the patch.
>> Now, here we can choose to log the individual
>> tuples as well instead of a complete page, however not sure if there
>> is any benefit for doing the same because XLogRecordAssemble() will
>> anyway remove the empty space from the page. Let me know if you have
>> something else in mind.
>
> Well, if you have two pages that are 75% full, and you move a third of
> the tuples from one of them into the other, it's going to be about
> four times more efficient to log only the moved tuples than the whole
> page.
>
I don't see how this could happen during split? I mean if you are
moving 25% tuples from old bucket page to a new bucket page which is
75% full, it will log the new bucket page only after it gets full.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com