Re: Faster inserts with mostly-monotonically increasing values - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Faster inserts with mostly-monotonically increasing values
Date
Msg-id CAA8=A78bTsWX1Agm7UB6tSsVbbU-_towWdxggLb0JzqpHubp5g@mail.gmail.com
Whole thread Raw
In response to Re: Faster inserts with mostly-monotonically increasing values  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Faster inserts with mostly-monotonically increasing values  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Fri, Mar 23, 2018 at 10:20 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Mar 22, 2018 at 3:29 AM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
>>
>> On Thu, Mar 22, 2018 at 7:22 AM, Claudio Freire <klaussfreire@gmail.com>
>> wrote:
>>>
>>>
>>> If you're not planning on making any further changes, would you mind
>>> posting a coalesced patch?
>>>
>>> Notice that I removed the last offset thing only halfway, so it would
>>> need some cleanup.
>>
>>
>> Here is an updated patch. I removed the last offset caching thing completely
>> and integrated your changes for conditional lock access. Some other
>> improvements to test cases  too. I realised that we must truncate and
>> re-insert to test index fastpath correctly.


This patch looks in pretty good shape. I have been trying hard to
think of some failure mode but haven't been able to come up with one.

We can put off for another day consideration of if/when e can skip the
check for serialization conflict.

> Some comments
>
> +    /*
> +     * It's very common to have an index on an auto-incremented or
> +     * monotonically increasing value. In such cases, every insertion happens
> +     * towards the end of the index. We try to optimise that case by caching
> +     * the right-most block of the index. If our cached block is still the
> +     * rightmost block, has enough free space to accommodate a new entry and
> +     * the insertion key is greater or equal to the first key in this page,
> +     * then we can safely conclude that the new key will be inserted in the
> +     * cached block. So we simply search within the cached page and insert the
> +     * key at the appropriate location. We call it a fastpath.
>
> It should say "the insertion key is strictly greater than the first key"


right.

>
> Equal cases cannot be accepted since it would break uniqueness checks,
> so the comment should say that. The code already is correctly checking
> for strict inequality, it's just the comment that is out of sync.
>
> You might accept equal keys for non-unique indexes, but I don't
> believe making that distinction in the code is worth the hassle.
>
> Also, "rightmost block" != "rightmost leaf page" ("leaf" being the key
> difference). So it should say "rightmost leaf page".


right.

[...]
>
> Setting "offset = InvalidOffsetNumber" in that contorted way is
> unnecessary. You can remove the first assignment and instead
> initialize unconditionally right after the fastpath block (that
> doesn't make use of offset anyway):


Yes, I noticed that and it's confusing, Just set it at the top.



>
> In indexing.out:
>
> +explain select sum(a) from fastpath where a = 6456;
> +                                     QUERY PLAN
> +------------------------------------------------------------------------------------
> + Aggregate  (cost=4.31..4.32 rows=1 width=8)
> +   ->  Index Only Scan using fpindex1 on fastpath  (cost=0.29..4.30
> rows=1 width=4)
> +         Index Cond: (a = 6456)
> +(3 rows)
>
> Having costs in explain tests can be fragile. Better use "explain
> (costs off)". If you run "make check" continuously in a loop, you
> should get some failures related to that pretty quickly.
>



Agree about costs off, but I'm fairly dubious of the value of using
EXPLAIN at all here.Nothing in the patch should result in any change
in EXPLAIN output.

I would probably just have a few regression lines that should be sure
to exercise the code path and leave it at that.


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PoC PATCH] Parallel dump to /dev/null
Next
From: Pavel Stehule
Date:
Subject: Re: Re: csv format for psql