Re: Index use with left join - Mailing list pgsql-general

From Julian Scarfe
Subject Re: Index use with left join
Date
Msg-id 001101c53c6a$a43035b0$0600a8c0@Wilbur
Whole thread Raw
In response to Index use with left join  ("Julian Scarfe" <julian@avbrief.com>)
Responses Re: Index use with left join  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
From: "Tom Lane" <tgl@sss.pgh.pa.us>

> The problem is that it's underestimating the number of rows pulled from
> the n table (1 vs actual 150), which makes a simple nestloop join look
> like the way to go.  That error comes from the fact that we don't really
> have any statistical estimation for geometric conditions :-(.  Some of
> the PostGIS hackers have been working on such, I believe, but I'm not
> sure how far they've gotten.

Thanks Tom.

I can see the poor estimation of the geometric filter is a significant
factor.

Reviewing it I wondered if the coarse/fine nature of the filters was also an
issue.  Query 4, with the filter on the index, selects 396 of the ~20,000
rows of the table (having estimated 22).  Query 5, with the filter requiring
a Seq scan, selects 150 of the ~20,000 rows of the table (having estimated
5), all of which were returned by Query 4.  Does the planner "realise" that
the intersection, Query 6, will still return 150 rows, or does it assume
independence of the filters in some way and estimate
20,000*(150/20,000)*(396/20,000)?

I guess I can test that by trying it with a non-geometric column and a btree
index:

7) Filtered requiring a sequntial scan

explain analyze
select  n.ref, n.code, a.ident, a.name
        from n left outer join a on (a.ident = n.code)
        where code ~~ 'EGT%';
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=10361.53..10441.90 rows=419 width=45) (actual
time=731.175..732.615 rows=248 loops=1)
   Merge Cond: ("outer".code = "inner"."?column3?")
   ->  Sort  (cost=9078.36..9078.49 rows=53 width=20) (actual
time=547.154..547.300 rows=248 loops=1)
         Sort Key: n.code
         ->  Seq Scan on n  (cost=0.00..9076.84 rows=53 width=20) (actual
time=260.558..543.587 rows=248 loops=1)
               Filter: (code ~~ 'EGT%'::text)
   ->  Sort  (cost=1283.17..1308.44 rows=10105 width=25) (actual
time=180.359..181.149 rows=1292 loops=1)
         Sort Key: (a.ident)::text
         ->  Seq Scan on a  (cost=0.00..611.05 rows=10105 width=25) (actual
time=0.125..83.844 rows=10105 loops=1)
 Total runtime: 735.613 ms

8) Filtered using the index, but returning a subset of the 419 rows from
Query 7

explain analyze
select  n.ref, n.code, a.ident, a.name
        from n left outer join a on (a.ident = n.code)
        where code = 'EGTT';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=1283.17..1572.15 rows=411 width=45) (actual
time=451.609..510.507 rows=226 loops=1)
   Merge Cond: ("outer".code = "inner"."?column3?")
   ->  Index Scan using n_pkey on n  (cost=0.00..208.82 rows=52 width=20)
(actual time=17.301..73.840 rows=226 loops=1)
         Index Cond: (code = 'EGTT'::text)
   ->  Sort  (cost=1283.17..1308.44 rows=10105 width=25) (actual
time=430.231..431.032 rows=1279 loops=1)
         Sort Key: (a.ident)::text
         ->  Seq Scan on a  (cost=0.00..611.05 rows=10105 width=25) (actual
time=5.743..321.326 rows=10105 loops=1)
 Total runtime: 514.103 ms


9) Filtered on both

explain analyze
select  n.ref, n.code, a.ident, a.name
        from n left outer join a on (a.ident = n.code)
        where code ~~ 'EGT%' and code = 'EGTT';
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..971.58 rows=8 width=45) (actual
time=53.634..12222.285 rows=226 loops=1)
   Join Filter: (("inner".ident)::text = "outer".code)
   ->  Index Scan using n_pkey on n  (cost=0.00..208.95 rows=1 width=20)
(actual time=0.288..21.137 rows=226 loops=1)
         Index Cond: (code = 'EGTT'::text)
         Filter: (code ~~ 'EGT%'::text)
   ->  Seq Scan on a  (cost=0.00..611.05 rows=10105 width=25) (actual
time=0.008..18.741 rows=10105 loops=226)
 Total runtime: 12223.328 ms


Similar problem.  Of course Query 9 is concocted and unrealistic, but it is
representative of the coarse/fine filter problem, where I select a set of
rows using an approximate filter (e.g. bounding box for the geometrical
case) with an index and then use a second, exact but computationally
expensive filter to keep only those rows that I really want.

Any general suggestions for workarounds?

Julian



pgsql-general by date:

Previous
From: Renato Cramer
Date:
Subject: Data Mart with PostgreSQL
Next
From: "Steve - DND"
Date:
Subject: Re: Can't install plpython on Windows 8.0