Re: Faster inserts with mostly-monotonically increasing values - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: Faster inserts with mostly-monotonically increasing values
Date
Msg-id CAGTBQpbZVR2HjdwpxroeocusL5+n=Hn0JbzJGbB8LPtGKxQpNA@mail.gmail.com
Whole thread Raw
In response to Re: Faster inserts with mostly-monotonically increasing values  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Faster inserts with mostly-monotonically increasing values  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On Wed, Mar 14, 2018 at 1:36 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee
>>
>> >
>> > Yes, I will try that next - it seems like a good idea. So the idea would
>> > be:
>> > check if the block is still the rightmost block and the insertion-key is
>> > greater than the first key in the page. If those conditions are
>> > satisfied,
>> > then we do a regular binary search within the page to find the correct
>> > location. This might add an overhead of binary search when keys are
>> > strictly
>> > ordered and a single client is inserting the data. If that becomes a
>> > concern, we might be able to look for that special case too and optimise
>> > for
>> > it too.
>>
>> Yeah, pretty much that's the idea. Beware, if the new item doesn't
>> fall in the rightmost place, you still need to check for serialization
>> conflicts.
>
>
> So I've been toying with this idea since yesterday and I am quite puzzled
> with the results. See the attached patch which compares the insertion key
> with the last key inserted by this backend, if the cached block is still the
> rightmost block in the tree. I initially only compared with the first key in
> the page, but I tried this version because of the strange performance
> regression which I still have no answers.
>
> For a small number of clients, the patched version does better. But as the
> number of clients go up, the patched version significantly underperforms
> master. I roughly counted the number of times the fastpath is taken and I
> noticed that almost 98% inserts take the fastpath. I first thought that the
> "firstkey" location in the page might be becoming a hot-spot for concurrent
> processes and hence changed that to track the per-backend last offset and
> compare against that the next time. But that did not help much.

+            _bt_compare(rel, natts, itup_scankey, page,
+                        RelationGetLastOffset(rel)) >= 0)

Won't this access garbage if the last offset is stale and beyond the
current rightmost page's last offset?

I'd suggest simply using P_FIRSTDATAKEY after checking that the page
isn't empty (see _bt_binsrch).

On Wed, Mar 14, 2018 at 11:12 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>> > Hmm. I can try that. It's quite puzzling though that slowing down things
>> > actually make them better.
>>
>> That's not what is happening though.
>>
>> The cache path is 1) try to lock cached block, 2) when got lock check
>> relevance, if stale 3) recheck from top
>>
>> The non-cached path is just 3) recheck from top
>>
>> The overall path length is longer in the cached case but provides
>> benefit if we can skip step 3 in high % of cases. The non-cached path
>> is more flexible because it locates the correct RHS block, even if it
>> changes dynamically between starting the recheck from top.
>>
>
> So as I noted in one of the previous emails, the revised patch still takes
> fast path in 98% cases. So it's not clear why the taking steps 1, 2 and 3 in
> just 2% cases should cause such dramatic slowdown.

Real-world workloads will probably take the slow path more often, so
it's probably worth keeping the failure path as contention-free as
possible.

Besides, even though it may be just 2% the times it lands there, it
could still block for a considerable amount of time for no benefit.

So I guess a conditional lock is not a bad idea.


pgsql-hackers by date:

Previous
From: Jeevan Chalke
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Next
From: Alvaro Herrera
Date:
Subject: Re: Failed to request an autovacuum work-item in silence