Re: Index Skip Scan - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: Index Skip Scan
Date	July 2, 2019 09:00:06
Msg-id	CA+hUKGKo30N5VNuRWhDuMGVZ3hTcv4J5RGXe286GRJZLk_jBYQ@mail.gmail.com
In response to	Re: Index Skip Scan (Jesper Pedersen <jesper.pedersen@redhat.com>)
Responses	Re: Index Skip Scan
List	pgsql-hackers

On Fri, Jun 21, 2019 at 1:20 AM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> Attached is v20, since the last patch should have been v19.

I took this for a quick spin today.  The DISTINCT ON support is nice
and I think it will be very useful.  I've signed up to review it and
will have more to say later.  But today I had a couple of thoughts
after looking into how src/backend/optimizer/plan/planagg.c works and
wondering how to do some more skipping tricks with the existing
machinery.

1.  SELECT COUNT(DISTINCT i) FROM t could benefit from this.  (Or
AVG(DISTINCT ...) or any other aggregate).  Right now you get a seq
scan, with the sort/unique logic inside the Aggregate node.  If you
write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get
a skip scan that is much faster in good cases.  I suppose you could
have a process_distinct_aggregates() in planagg.c that recognises
queries of the right form and generates extra paths a bit like
build_minmax_path() does.  Then again, it's probably better to consider
that in the grouping planner proper instead; I'm not sure.
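To illustrate why the skip scan wins in good cases, here is a minimal
sketch in Python, not PostgreSQL code.  It assumes the btree on t(i) can
be modelled as a sorted list, and the helper name count_distinct_skip is
hypothetical.  Each distinct value costs one binary search rather than a
visit to every duplicate.

```python
from bisect import bisect_right


def count_distinct_skip(index_keys):
    """Count distinct values in a sorted list, jumping over each run of duplicates.

    index_keys stands in for the leading column of a btree index (an assumption
    made for illustration only).
    """
    count = 0
    pos = 0
    n = len(index_keys)
    while pos < n:
        count += 1
        # "Skip": binary-search for the first entry greater than the current key,
        # instead of stepping through every duplicate.
        pos = bisect_right(index_keys, index_keys[pos], pos, n)
    return count


keys = sorted([3, 1, 1, 2, 3, 3, 1, 2])
print(count_distinct_skip(keys))  # 3
```

With few distinct values and many duplicates, the loop runs once per
distinct value, which is the good case described above.  With mostly
unique values, it does a search per row and is no better than a plain
scan.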

2.  SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if
you're allowed to go forwards.  Same for SELECT i, MAX(j) FROM t GROUP
BY i if you're allowed to go backwards.  Those queries are equivalent
to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC]
(though as Floris noted, the backwards version gives the wrong answers
with v20).  That does seem like a much more specific thing applicable
only to MIN and MAX, and I think preprocess_minmax_aggregates() could
be taught to handle that sort of query, building an index only scan
path with skip scan in build_minmax_path().
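The forwards/backwards distinction can be sketched the same way.  This
is a simulation under stated assumptions, not planner code: the index on
t(i, j) is modelled as a sorted list of (i, j) tuples, j is numeric,
NULLs are ignored, and the function names are hypothetical.  Going
forwards, the first entry of each group carries MIN(j); going backwards,
the last entry of each group carries MAX(j).

```python
from bisect import bisect_left, bisect_right

INF = float("inf")


def min_per_group(index):
    """Forward skip scan: emit the first (i, j) of each group, i.e. (i, MIN(j))."""
    out = []
    pos = 0
    while pos < len(index):
        i, j = index[pos]
        out.append((i, j))
        # Skip forwards to the first entry whose leading key is greater than i.
        pos = bisect_right(index, (i, INF), pos)
    return out


def max_per_group(index):
    """Backward skip scan: emit the last (i, j) of each group, i.e. (i, MAX(j)).

    Results come out in descending order of i, like ORDER BY i DESC, j DESC.
    """
    out = []
    pos = len(index)  # one past the last entry
    while pos > 0:
        i, j = index[pos - 1]
        out.append((i, j))
        # Skip backwards to the start of the current group; the entry just
        # before that position is the last entry of the previous group.
        pos = bisect_left(index, (i, -INF), 0, pos)
    return out


index = sorted([(1, 5), (1, 2), (2, 9), (2, 7), (2, 8), (3, 4)])
print(min_per_group(index))  # [(1, 2), (2, 7), (3, 4)]
print(max_per_group(index))  # [(3, 4), (2, 9), (1, 5)]
```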

-- 
Thomas Munro
https://enterprisedb.com
