Re: Index Skip Scan - Mailing list pgsql-hackers

From Rafia Sabih
Subject Re: Index Skip Scan
Date
Msg-id CA+FpmFeKh9BAhDTAf-ZWQ=pvVaRy+Lnahb_8=OOFqcVXJDBWWQ@mail.gmail.com
Whole thread Raw
In response to Re: Index Skip Scan  (Floris Van Nee <florisvannee@Optiver.com>)
Responses Re: Index Skip Scan
List pgsql-hackers
On Sat, 1 Jun 2019 at 06:10, Floris Van Nee <florisvannee@optiver.com> wrote:
>
> Actually I'd like to add something to this. I think I've found a bug in the current implementation. Would someone be
ableto check? 
>
I am willing to give it a try.
> Given a table definition of (market text, feedcode text, updated_at timestamptz, value float8) and an index on
(market,feedcode, updated_at desc) (note that this table slightly deviates from what I described in my previous mail)
andfilling it with data. 
>
>
> The following query uses an index skip scan and returns just 1 row (incorrect!)
>
> select distinct on (market, feedcode) market, feedcode
> from streams.base_price
> where market='TEST'
>
> The following query still uses the regular index scan and returns many more rows (correct)
> select distinct on (market, feedcode) *
> from streams.base_price
> where market='TEST'
>
Aren't those two queries different?
select distinct on (market, feedcode) market, feedcode vs select
distinct on (market, feedcode)*
Anyhow, it's just the difference in projection so doesn't matter much.
I verified this scenario at my end and you are right, there is a bug.
Here is my repeatable test case,

create table t (market text, feedcode text, updated_at timestamptz,
value float8) ;
create index on t (market, feedcode, updated_at desc);
insert into t values('TEST', 'abcdef', (select timestamp '2019-01-10
20:00:00' + random() * (timestamp '2014-01-20 20:00:00' - timestamp
'2019-01-20 20:00:00') ), generate_series(1,100)*9.88);
insert into t values('TEST', 'jsgfhdfjd', (select timestamp
'2019-01-10 20:00:00' + random() * (timestamp '2014-01-20 20:00:00' -
timestamp '2019-01-20 20:00:00') ), generate_series(1,100)*9.88);

Now, without the patch,
select distinct on (market, feedcode) market, feedcode from t  where
market='TEST';
 market | feedcode
--------+-----------
 TEST   | abcdef
 TEST   | jsgfhdfjd
(2 rows)
explain select distinct on (market, feedcode) market, feedcode from t
where market='TEST';
                           QUERY PLAN
----------------------------------------------------------------
 Unique  (cost=12.20..13.21 rows=2 width=13)
   ->  Sort  (cost=12.20..12.70 rows=201 width=13)
         Sort Key: feedcode
         ->  Seq Scan on t  (cost=0.00..4.51 rows=201 width=13)
               Filter: (market = 'TEST'::text)
(5 rows)

And with the patch,
select distinct on (market, feedcode) market, feedcode from t   where
market='TEST';
 market | feedcode
--------+----------
 TEST   | abcdef
(1 row)

explain select distinct on (market, feedcode) market, feedcode from t
where market='TEST';
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Index Only Scan using t_market_feedcode_updated_at_idx on t
(cost=0.14..0.29 rows=2 width=13)
   Scan mode: Skip scan
   Index Cond: (market = 'TEST'::text)
(3 rows)

Notice that in the explain statement it shows correct number of rows
to be skipped.

--
Regards,
Rafia Sabih



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: coverage additions
Next
From: Dmitry Dolgov
Date:
Subject: Re: Index Skip Scan