[GENERAL] Loose indexscan and partial indexes - Mailing list pgsql-general

From Peter J. Holzer
Subject [GENERAL] Loose indexscan and partial indexes
Date
Msg-id 20170210130206.GA8667@hjp.at
Whole thread Raw
Responses Re: [GENERAL] Loose indexscan and partial indexes  (Thomas Kellerer <spam_eater@gmx.net>)
List pgsql-general
I was almost through writing a bug report when I figured out what I was
doing wrong, so I'll post it here in the hope that it prevents someone
from tearing their hair out.

I was trying to use a loose index scan as described on
https://wiki.postgresql.org/wiki/Loose_indexscan on a column with just 3
distinct values in a table with about 20 million rows. To my surprise it
was not faster than a simple “select distinct”, which was doing a
sequential scan.

Here is the plan:

wds=> explain analyze WITH RECURSIVE t AS (
   SELECT MIN(periodizitaet) AS periodizitaet FROM facttable_imf_ifs
   UNION ALL
   SELECT (SELECT MIN(periodizitaet) FROM facttable_imf_ifs WHERE periodizitaet > t.periodizitaet)
   FROM t WHERE t.periodizitaet IS NOT NULL
   )
SELECT periodizitaet FROM t WHERE periodizitaet IS NOT NULL
;
                                                                                                 QUERY PLAN
                                                                                    

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on t  (cost=993270.41..993272.43 rows=100 width=32) (actual time=8073.953..8074.042 rows=3 loops=1)
   Filter: (periodizitaet IS NOT NULL)
   Rows Removed by Filter: 1
   CTE t
     ->  Recursive Union  (cost=993206.56..993270.41 rows=101 width=32) (actual time=8073.952..8074.039 rows=4 loops=1)
           ->  Aggregate  (cost=993206.56..993206.57 rows=1 width=2) (actual time=8073.948..8073.948 rows=1 loops=1)
                 ->  Seq Scan on facttable_imf_ifs facttable_imf_ifs_1  (cost=0.00..940457.05 rows=21099805 width=2)
(actualtime=0.258..3567.893 rows=20649561 loops=1) 
           ->  WorkTable Scan on t t_1  (cost=0.00..6.18 rows=10 width=32) (actual time=0.021..0.021 rows=1 loops=4)
                 Filter: (periodizitaet IS NOT NULL)
                 Rows Removed by Filter: 0
                 SubPlan 2
                   ->  Result  (cost=0.59..0.60 rows=1 width=0) (actual time=0.025..0.025 rows=1 loops=3)
                         InitPlan 1 (returns $2)
                           ->  Limit  (cost=0.44..0.59 rows=1 width=2) (actual time=0.024..0.024 rows=1 loops=3)
                                 ->  Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs
(cost=0.44..1059854.76rows=7033268 width=2) (actual time=0.022..0.022 rows=1 loops=3) 
                                       Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet >
t_1.periodizitaet))
                                       Heap Fetches: 2
 Planning time: 9.261 ms
 Execution time: 8074.219 ms
(19 rows)

So it's doing a sequential scan on the initial select in the recursive
CTE, but using the index on the subsequent selects.

But why? If it uses the index on
    SELECT MIN(periodizitaet) FROM facttable_imf_ifs WHERE periodizitaet > 'x'
shouldn't it be able to use the same index on
    SELECT MIN(periodizitaet) FROM facttable_imf_ifs
?

I was playing around a bit with different values and not getting
anywhere, until it hit me:

facttable_imf_ifs_periodizitaet_idx is a partial index with a
“WHERE periodizitaet IS NOT NULL” clause. This is fine for a
“WHERE periodizitaet > 'x'” query, because that implies NOT NULL, but a
straight unadorned “SELECT MIN(periodizitaet)” could return NULL, so it
can't use the index.

Add “where periodizitaet is not null” to the initial query (I'm not
interested in null values, that index is partial for a reason) and all
is well:

wds=> explain analyze WITH RECURSIVE t AS (
   SELECT MIN(periodizitaet) AS periodizitaet FROM facttable_imf_ifs where periodizitaet is not null
   UNION ALL
   SELECT (SELECT MIN(periodizitaet) FROM facttable_imf_ifs WHERE periodizitaet > t.periodizitaet)
   FROM t WHERE t.periodizitaet IS NOT NULL
   )
SELECT periodizitaet FROM t WHERE periodizitaet IS NOT NULL
;
                                                                                                           QUERY PLAN
                                                                                                        

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on t  (cost=64.40..66.42 rows=100 width=32) (actual time=0.046..0.704 rows=3 loops=1)
   Filter: (periodizitaet IS NOT NULL)
   Rows Removed by Filter: 1
   CTE t
     ->  Recursive Union  (cost=0.51..64.40 rows=101 width=32) (actual time=0.043..0.698 rows=4 loops=1)
           ->  Result  (cost=0.51..0.52 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.44..0.51 rows=1 width=2) (actual time=0.039..0.040 rows=1 loops=1)
                         ->  Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs
(cost=0.44..1516760.47rows=21080284 width=2) (actual time=0.038..0.038 rows=1 loops=1) 
                               Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet IS NOT NULL))
                               Heap Fetches: 1
           ->  WorkTable Scan on t t_1  (cost=0.00..6.19 rows=10 width=32) (actual time=0.161..0.162 rows=1 loops=4)
                 Filter: (periodizitaet IS NOT NULL)
                 Rows Removed by Filter: 0
                 SubPlan 3
                   ->  Result  (cost=0.59..0.60 rows=1 width=0) (actual time=0.212..0.212 rows=1 loops=3)
                         InitPlan 2 (returns $3)
                           ->  Limit  (cost=0.44..0.59 rows=1 width=2) (actual time=0.211..0.211 rows=1 loops=3)
                                 ->  Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs
facttable_imf_ifs_1 (cost=0.44..1061729.65 rows=7026761 width=2) (actual time=0.208..0.208 rows=1 loops=3) 
                                       Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet >
t_1.periodizitaet))
                                       Heap Fetches: 2
 Planning time: 8.883 ms
 Execution time: 0.801 ms
(23 rows)

800 times faster :-).

        hp

--
   _  | Peter J. Holzer    | A coding theorist is someone who doesn't
|_|_) |                    | think Alice is crazy.
| |   | hjp@hjp.at         | -- John Gordon
__/   | http://www.hjp.at/ |    http://downlode.org/Etext/alicebob.html

Attachment

pgsql-general by date:

Previous
From: Steven Winfield
Date:
Subject: Re: [GENERAL] Build exclusion constraints USING INDEX
Next
From: Thomas Kellerer
Date:
Subject: Re: [GENERAL] Loose indexscan and partial indexes