Re: Index Skip Scan - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Index Skip Scan
Msg-id: 20200208165937.mtnpmawawfzgm4sb@development
In response to: Re: Index Skip Scan (Dmitry Dolgov <9erthalion6@gmail.com>)
List: pgsql-hackers
On Sat, Feb 08, 2020 at 04:24:40PM +0100, Dmitry Dolgov wrote:
>> On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
>>
>> I've done some testing and benchmarking of the v31 patch, looking for
>> regressions, costing issues etc. Essentially, I've run a bunch of
>> SELECT DISTINCT queries on data sets of various sizes, numbers of
>> distinct values etc. The results are fairly large, so I've uploaded
>> them to github:
>>
>> https://github.com/tvondra/skip-scan-test
>
> Thanks a lot for testing!
>
>> There are a couple of regressions, where the plan with skipscan
>> enabled is ~10x slower. But this seems to happen only in
>> high-cardinality cases where we misestimate the number of groups.
>> Consider a table with two independent columns:
>>
>> CREATE TABLE t (a text, b text);
>> INSERT INTO t SELECT
>>     md5((10000*random())::int::text),
>>     md5((10000*random())::int::text)
>> FROM generate_series(1,1000000) s(i);
>>
>> CREATE INDEX ON t(a,b);
>>
>> ANALYZE;
>>
>> which then behaves like this:
>>
>> test=# select * from (select distinct a,b from t) foo offset 10000000;
>> Time: 3138.222 ms (00:03.138)
>> test=# set enable_indexskipscan = off;
>> Time: 0.312 ms
>> test=# select * from (select distinct a,b from t) foo offset 10000000;
>> Time: 199.749 ms
>>
>> So in this case the skip scan is ~15x slower than the usual plan
>> (index only scan + unique). The reason why this happens is pretty
>> simple - to estimate the number of groups we multiply the ndistinct
>> estimates for the two columns (which both have n_distinct = 10000),
>> but then we cap the estimate at 10% of the table. But when the
>> columns are independent with high cardinalities that under-estimates
>> the actual value, making the cost for skip scan much lower than it
>> should be.
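To make the misestimate concrete, here is a minimal Python sketch (not the actual PostgreSQL selectivity code) of the estimate described above - multiply the per-column ndistinct values, then cap at 10% of the table - compared against the expected number of distinct pairs when the columns really are independent:

```python
def estimate_groups(ndistincts, ntuples):
    """Sketch of the group estimate: product of the per-column
    ndistinct values, capped at 10% of the table size."""
    est = 1.0
    for nd in ndistincts:
        est *= nd
    return min(est, 0.1 * ntuples)

def expected_distinct(combinations, nrows):
    """Expected number of distinct combinations actually observed when
    drawing nrows uniformly from `combinations` possible values."""
    return combinations * (1.0 - (1.0 - 1.0 / combinations) ** nrows)

# Two independent columns with 10000 distinct values each, 1M rows:
# the raw product is 10^8, so the 10% cap yields 100000 groups ...
print(estimate_groups([10000, 10000], 1_000_000))   # 100000.0
# ... while roughly 995000 distinct pairs actually appear - about the
# 10x misestimate behind the regression in the test above.
print(expected_distinct(10000 * 10000, 1_000_000))
```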
> The current implementation checks if we can find the next value on
> the same page to do a shortcut instead of a tree traversal and
> improve such situations, but I can easily imagine that it's still not
> enough in some extreme cases.

Yeah. I'm not sure there's room for further improvements. The regressed
cases were subject to the 10% cap, and with ndistinct being more than
10% of the table, we probably can find many distinct keys on each index
page - we know the values change every ~10 rows.

>> I don't think this is an issue the skipscan patch needs to fix,
>> though. Firstly, the regressed cases are a tiny minority. Secondly,
>> we already have a way to improve the root cause - creating extended
>> stats with ndistinct coefficients generally makes the problem go
>> away.
>
> Yes, I agree.
>
>> One interesting observation however is that this regression only
>> happened with text columns, not with int or bigint. My assumption is
>> that this is due to text comparisons being much more expensive. Not
>> sure if there is something we could do to deal with this - reduce
>> the number of comparisons or something?
>
> Hm, interesting. I need to check that we do not do any unnecessary
> comparisons.
>
>> On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:
>> > Yes, I've mentioned that already in one of the previous emails :)
>> > The simplest way I see to achieve what we want is to do something
>> > like in the attached modified version with a new hasDeclaredCursor
>> > field. It's not a final version though, but posted just for
>> > discussion, so feel free to suggest any improvements or
>> > alternatives.
>>
>> IMO the proper fix for this case (moving forward, reading backwards)
>> is simply making it work by properly checking deleted tuples etc.
>> Not sure why that would be so much more complex (I haven't tried
>> implementing it).
>
> It's probably not that complex by itself, but it requires changing
> the separation of responsibilities.
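For reference, the extended-statistics fix mentioned above, applied to the test table from the example earlier in the thread, would look something like this (the statistics object name is arbitrary):

```sql
-- Teach the planner the combined ndistinct of (a, b), so the group
-- estimate no longer just multiplies the per-column estimates.
CREATE STATISTICS t_a_b_stats (ndistinct) ON a, b FROM t;
ANALYZE t;
```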
> At the moment the current implementation leaves jumping over the tree
> fully to _bt_skip, and heap visibility checks only to IndexOnlyNext.
> To check deleted tuples properly we need to either verify the
> corresponding heap tuple's visibility inside _bt_skip (as I've
> mentioned in one of the previous emails, checking if an index tuple
> is dead is not enough), or teach the code in IndexOnlyNext to
> understand that _bt_skip can lead to returning the same tuple while
> moving forward & reading backward. Do you think it still makes sense
> to go this way?

Not sure. I have to think about this first.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services