Re: Different plan chosen when in lateral subquery - Mailing list pgsql-performance

From Laurenz Albe
Subject Re: Different plan chosen when in lateral subquery
Date
Msg-id 1512548438.2691.27.camel@cybertec.at
Whole thread Raw
In response to Different plan chosen when in lateral subquery  (Alex Reece <awreece@gmail.com>)
List pgsql-performance
Alex Reece wrote:
> I get very different plan chosen when my query is in a lateral subquery vs standalone --
> it doesn't use a key when joining on a table, instead opting to do a hash join. Here is the query:
> 
>     select distinct on (sub.entity_id, sub.note_id, sub.series_id)
>            entity_id, note_id, series_id
>     from
>     (
>         select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount, inv.name
>         from public.portfolio_allocations alloc
>         JOIN contributions contrib on contrib.id = alloc.note_id
>         JOIN investments inv on inv.id = contrib.investment_id
>         where entity_id = '\x5787f132f50f7b03002cf835' and 
>         alloc.allocated_on <= dates.date
>     ) sub
> 
> And wrapped inside the lateral:
> 
>         explain analyze
>         select *
>         from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,            current_timestamp::TIMESTAMP +
INTERVAL'1 day', '24 hours') dates,
 
>         LATERAL (
>             ... <SUB QUERY HERE> ...
>         ) lat
> 
> Run by itself injecting a hard coded value for dates.date, I get the expected plan which uses a key index on
contributions:

[...]

>                      ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual time=0.022..0.028 rows=2 loops=1)
>                            ->  Index Scan using portfolio_allocations_entity_id_allocated_on_idx on
portfolio_allocationsalloc  (cost=0.09..6.05 rows=2 width=39) (actual time=0.012..0.014 
 
>                                  Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
'2017-03-1420:59:59.999+00'::timestamp with time zone))
 
>                            ->  Index Scan using contributions_id_accrue_from_idx on contributions contrib
(cost=0.08..4.09rows=1 width=26) (actual time=0.005..0.005 rows=1 loops=2)
 
>                                  Index Cond: (id = alloc.note_id)

[...]

> But run in the lateral, it doesn't use the index:

[...]

>                     ->  Hash Join  (cost=10775.83..20355.61 rows=5724 width=52) (actual time=1.657..5.980 rows=6713
loops=267)
>                            Hash Cond: (alloc.note_id = contrib.id)
>                            ->  Bitmap Heap Scan on portfolio_allocations alloc  (cost=69.82..9628.13 rows=5724
width=39)(actual time=1.010..2.278 rows=6713 loops=267)
 
>                                  Recheck Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on
<=date(dates.dates)))
 
>                                  Heap Blocks: exact=118074
>                                  ->  Bitmap Index Scan on portfolio_allocations_entity_id_allocated_on_idx
(cost=0.00..69.53rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo
 
>                                        Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND
(allocated_on<= date(dates.dates)))
 
>                            ->  Hash  (cost=9464.85..9464.85 rows=354617 width=26) (actual time=169.792..169.792
rows=354617loops=1)
 
>                                  Buckets: 524288  Batches: 1  Memory Usage: 24296kB
>                                  ->  Seq Scan on contributions contrib  (cost=0.00..9464.85 rows=354617 width=26)
(actualtime=0.007..83.246 rows=354617 loops=1)
 

[...]

> I have a few questions here:
>   - Why doesn't it use the primary key index in either case?

I don't know about the first query; perhaps the primary key index is fragmented.
Compare the size of the indexes on disk.
In the second query a sequential scan is used because PostgreSQL chooses a hash join.
That choice is made because the index scans returns 6713 rows rather than the 2
from the first query, probably because the date is different.

>   - Why isn't it choosing portfolio_allocations_pnsa, which seems like it would prevent it from having to sort?

In a bitmap index scan, the table is scanned in physical order, so the result
is not sorted in index order.
I don't know if PostgreSQL is smart enough to figure out that it could use an index
scan and preserve the order through the joins to obviate the sort.
You could try to set enable_bitmapscan=off and see if things are different then.
Perhaps the slower index scan would outweigh the advantage of avoiding the sort.

Yours,
Laurenz Albe


pgsql-performance by date:

Previous
From: Sergei Kornilov
Date:
Subject: Re: insert and query performance on big string table with pg_trgm
Next
From: Justin Pryzby
Date:
Subject: Re: Bitmap scan is undercosted? - overestimated correlation andcost_index