Home > mailing lists

Re: Index Skip Scan - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Index Skip Scan
Date	February 8, 2020 16:59:37
Msg-id	20200208165937.mtnpmawawfzgm4sb@development Whole thread Raw
In response to	Re: Index Skip Scan (Dmitry Dolgov <9erthalion6@gmail.com>)
List	pgsql-hackers

Tree view

On Sat, Feb 08, 2020 at 04:24:40PM +0100, Dmitry Dolgov wrote:
>> On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
>>
>> I've done some testing and benchmarking of the v31 patch, looking for
>> regressions, costing issues etc. Essentially, I've ran a bunch of SELECT
>> DISTINCT queries on data sets of various size, number of distinct values
>> etc. The results are fairly large, so I've uploaded them to github
>>
>>     https://github.com/tvondra/skip-scan-test
>
>Thanks a lot for testing!
>
>> There are a couple of regressions, where the plan with skipscan enables
>> is ~10x slower. But this seems to happen only in high-cardinality cases
>> where we misestimate the number of groups. Consider a table with two
>> independent columns
>>
>>   CREATE TABLE t (a text, b text);
>>   INSERT INTO t SELECT
>>     md5((10000*random())::int::text),
>>     md5((10000*random())::int::text)
>>   FROM generate_series(1,1000000) s(i);
>>
>>   CREATE INDEX ON t(a,b);
>>
>>   ANALYZE;
>>
>> which then behaves like this:
>>
>>   test=# select * from (select distinct a,b from t) foo offset 10000000;
>>   Time: 3138.222 ms (00:03.138)
>>   test=# set enable_indexskipscan = off;
>>   Time: 0.312 ms
>>   test=# select * from (select distinct a,b from t) foo offset 10000000;
>>   Time: 199.749 ms
>>
>> So in this case the skip scan is ~15x slower than the usual plan (index
>> only scan + unique). The reason why this happens is pretty simple - to
>> estimate the number of groups we multiply the ndistinct estimates for
>> the two columns (which both have n_distinct = 10000), but then we cap
>> the estimate to 10% of the table. But when the columns are independent
>> with high cardinalities that under-estimates the actual value, making
>> the cost for skip scan much lower than it should be.
>
>The current implementation checks if we can find the next value on the
>same page to do a shortcut instead of tree traversal and improve such
>kind of situations, but I can easily imagine that it's still not enough
>in some extreme situations.
>

Yeah. I'm not sure there's room for further improvements. The regressed
cases were subject to the 10% cap, and with ndistinct being more than
10% of the table, we probably can find many distinct keys on each index
page - we know that every ~10 rows the values change.

>> I don't think this is an issue the skipscan patch needs to fix, though.
>> Firstly, the regressed cases are a tiny minority. Secondly, we already
>> have a way to improve the root cause - creating extended stats with
>> ndistinct coefficients generally makes the problem go away.
>
>Yes, I agree.
>
>> One interesting observation however is that this regression only
>> happened with text columns but not with int or bigint. My assumption is
>> that this is due to text comparisons being much more expensive. Not sure
>> if there is something we could do to deal with this - reduce the number
>> of comparisons or something?
>
>Hm, interesting. I need to check that we do not do any unnecessary
>comparisons.
>
>> On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:
>> > Yes, I've mentioned that already in one of the previous emails :) The
>> > simplest way I see to achieve what we want is to do something like in
>> > attached modified version with a new hasDeclaredCursor field. It's not a
>> > final version though, but posted just for discussion, so feel free to
>> > suggest any improvements or alternatives.
>>
>> IMO the proper fix for this case (moving forward, reading backwards) is
>> simply making it work by properly checking deleted tuples etc. Not sure
>> why that would be so much complex (haven't tried implementing it)?
>
>It's probably not that complex by itself, but requires changing
>responsibilities isolation. At the moment current implementation leaves
>jumping over a tree fully to _bt_skip, and heap visibility checks only
>to IndexOnlyNext. To check deleted tuples properly we need to either
>verify a corresponding heap tuple visibility inside _bt_skip (as I've
>mentioned in one of the previous emails, checking if an index tuple is
>dead is not enough), or teach the code in IndexOnlyNext to understand
>that _bt_skip can lead to returning the same tuple while moving forward
>& reading backward. Do you think it's still makes sense to go this way?

Not sure. I have to think about this first.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Tom Lane
Date: 08 February 2020, 16:57:09
Subject: Re: ALTER TABLE rewrite to use clustered order

From: Tom Lane
Date: 08 February 2020, 17:07:57
Subject: Re: Internal key management system

Re: Index Skip Scan - Mailing list pgsql-hackers

Previous

Next