Re: Index Skip Scan - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Index Skip Scan
Date
Msg-id CA+q6zcU=8OEKw8bc2DHsf7xSFqaT7gOjc-_nbX8FO1N4eizQHg@mail.gmail.com
Whole thread Raw
In response to Re: Index Skip Scan  (Floris Van Nee <florisvannee@Optiver.com>)
Responses Re: Index Skip Scan
List pgsql-hackers
> On Sat, Jun 1, 2019 at 6:10 AM Floris Van Nee <florisvannee@optiver.com> wrote:
>
> After some talks with Jesper at PGCon about the Index Skip Scan, I started
> testing this patch, because it seems to have great potential in speeding up
> many of our queries (great conference by the way, really enjoyed my first
> time being there!). I haven't looked in depth to the code itself, but I
> focused on some testing with real data that we have.

Thanks!

> Actually I'd like to add something to this. I think I've found a bug in the
> current implementation. Would someone be able to check?
>
> The following query uses an index skip scan and returns just 1 row (incorrect!)
>
> select distinct on (market, feedcode) market, feedcode
> from streams.base_price
> where market='TEST'
>
> The following query still uses the regular index scan and returns many more
> rows (correct)
> select distinct on (market, feedcode) *
> from streams.base_price
> where market='TEST'

Yes, good catch, I'll investigate. Looks like in the current implementation
something is not quite right, when we have this order of columns in an index
and where clause (e.g. in the examples above everything seems fine if we create
index over (feedcode, market) and not the other way around).

> As far as I can see, there are two main problems with that at the moment.
>
> 1) Only support for Index-Only scan at the moment, not for regular index
>    scans. This was already mentioned upthread and I can understand that it
>    was left out until now to constrain the scope of this. However, if we were
>    to support 'distinct on' + selecting columns that are not part of the
>    index we need a regular index scan instead of the index only scan.

Sure, it's something I hope we can tackle as the next step.

> select distinct feedcode from prices -- approx 10ms
>
> select distinct feedcode from prices where updated_at <= '1999-01-01 00:00' -- approx 200ms
>
> Both use the index skip scan, but the first one is very fast, because it can
> skip large parts of the index. The second one scans the full index, because
> it never finds any row that matches the where condition so it can never skip
> anything.

Interesting, I'll take a closer look.



pgsql-hackers by date:

Previous
From: Rafia Sabih
Date:
Subject: Re: Index Skip Scan
Next
From: Chapman Flack
Date:
Subject: Re: PostgreSQL vs SQL/XML Standards