Re: Index Skip Scan - Mailing list pgsql-hackers
From | Rafia Sabih |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | CA+FpmFeKh9BAhDTAf-ZWQ=pvVaRy+Lnahb_8=OOFqcVXJDBWWQ@mail.gmail.com Whole thread Raw |
In response to | Re: Index Skip Scan (Floris Van Nee <florisvannee@Optiver.com>) |
Responses |
Re: Index Skip Scan
|
List | pgsql-hackers |
On Sat, 1 Jun 2019 at 06:10, Floris Van Nee <florisvannee@optiver.com> wrote: > > Actually I'd like to add something to this. I think I've found a bug in the current implementation. Would someone be ableto check? > I am willing to give it a try. > Given a table definition of (market text, feedcode text, updated_at timestamptz, value float8) and an index on (market,feedcode, updated_at desc) (note that this table slightly deviates from what I described in my previous mail) andfilling it with data. > > > The following query uses an index skip scan and returns just 1 row (incorrect!) > > select distinct on (market, feedcode) market, feedcode > from streams.base_price > where market='TEST' > > The following query still uses the regular index scan and returns many more rows (correct) > select distinct on (market, feedcode) * > from streams.base_price > where market='TEST' > Aren't those two queries different? select distinct on (market, feedcode) market, feedcode vs select distinct on (market, feedcode)* Anyhow, it's just the difference in projection so doesn't matter much. I verified this scenario at my end and you are right, there is a bug. Here is my repeatable test case, create table t (market text, feedcode text, updated_at timestamptz, value float8) ; create index on t (market, feedcode, updated_at desc); insert into t values('TEST', 'abcdef', (select timestamp '2019-01-10 20:00:00' + random() * (timestamp '2014-01-20 20:00:00' - timestamp '2019-01-20 20:00:00') ), generate_series(1,100)*9.88); insert into t values('TEST', 'jsgfhdfjd', (select timestamp '2019-01-10 20:00:00' + random() * (timestamp '2014-01-20 20:00:00' - timestamp '2019-01-20 20:00:00') ), generate_series(1,100)*9.88); Now, without the patch, select distinct on (market, feedcode) market, feedcode from t where market='TEST'; market | feedcode --------+----------- TEST | abcdef TEST | jsgfhdfjd (2 rows) explain select distinct on (market, feedcode) market, feedcode from t where market='TEST'; QUERY PLAN ---------------------------------------------------------------- Unique (cost=12.20..13.21 rows=2 width=13) -> Sort (cost=12.20..12.70 rows=201 width=13) Sort Key: feedcode -> Seq Scan on t (cost=0.00..4.51 rows=201 width=13) Filter: (market = 'TEST'::text) (5 rows) And with the patch, select distinct on (market, feedcode) market, feedcode from t where market='TEST'; market | feedcode --------+---------- TEST | abcdef (1 row) explain select distinct on (market, feedcode) market, feedcode from t where market='TEST'; QUERY PLAN ------------------------------------------------------------------------------------------------ Index Only Scan using t_market_feedcode_updated_at_idx on t (cost=0.14..0.29 rows=2 width=13) Scan mode: Skip scan Index Cond: (market = 'TEST'::text) (3 rows) Notice that in the explain statement it shows correct number of rows to be skipped. -- Regards, Rafia Sabih
pgsql-hackers by date: