Thread: Different plan chosen when in lateral subquery

Different plan chosen when in lateral subquery

From
Alex Reece
Date:
I get very different plan chosen when my query is in a lateral subquery vs standalone -- it doesn't use a key when joining on a table, instead opting to do a hash join. Here is the query:

select distinct on (sub.entity_id, sub.note_id, sub.series_id)
       entity_id, note_id, series_id
from
(
select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount, inv.name
from public.portfolio_allocations alloc
JOIN contributions contrib on contrib.id = alloc.note_id
JOIN investments inv on inv.id = contrib.investment_id
where entity_id = '\x5787f132f50f7b03002cf835' and 
alloc.allocated_on <= dates.date
) sub

And wrapped inside the lateral:

        explain analyze
        select *
        from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,            current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
        LATERAL (
         ... <SUB QUERY HERE> ...
        ) lat

Run by itself injecting a hard coded value for dates.date, I get the expected plan which uses a key index on contributions:

      Unique  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053 rows=2 loops=1)
         ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.052 rows=2 loops=1)
               Sort Key: alloc.note_id, alloc.series_id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual time=0.030..0.042 rows=2      loops=1)
                     ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual time=0.022..0.028       rows=2 loops=1)
                           ->  Index Scan using portfolio_allocations_entity_id_allocated_on_idx on      portfolio_allocations alloc  (cost=0.09..6.05 rows=2 width=39) (actual     time=0.012..0.014 
                                 Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND      (allocated_on <= '2017-03-14 20:59:59.999+00'::timestamp with time   zone))
                           ->  Index Scan using contributions_id_accrue_from_idx on contributions     contrib  (cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005    rows=1 loops=2)
                                 Index Cond: (id = alloc.note_id)
                     ->  Index Only Scan using investments_pkey on investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual time=0.005..0.006 rows=1 loops=2)
                           Index Cond: (id = contrib.investment_id)
                           Heap Fetches: 2
       Planning time: 0.617 ms
       Execution time: 0.100 ms
      (15 rows)

But run in the lateral, it doesn't use the index:

       Nested Loop  (cost=14.54..24.55 rows=2000 width=47) (actual time=0.085..0.219 rows=534      loops=1)
         ->  Function Scan on generate_series dates  (cost=0.00..3.00 rows=1000 width=8) (actual      time=0.031..0.043 rows=267 loops=1)
         ->  Materialize  (cost=14.54..14.55 rows=2 width=39) (actual time=0.000..0.000 rows=2     loops=267)
               ->  Unique  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053 rows=2       loops=1)
                     ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual time=0.051..0.052 rows=2      loops=1)
                           Sort Key: alloc.note_id, alloc.series_id
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual       time=0.029..0.041 rows=2 loops=1)
                                 ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual       time=0.021..0.027 rows=2 loops=1)
                                       ->  Index Scan using       portfolio_allocations_entity_id_allocated_on_idx on      portfolio_allocations alloc  (cost=0.09..6.05 rows=2     width=39) (actual time=0
                                             Index Cond: ((entity_id =     '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on      <= '2017-03-14 20:59:59.999+00'::timestamp with time    zone))
                                       ->  Index Scan using contributions_id_accrue_from_idx on       contributions contrib  (cost=0.08..4.09 rows=1 width=26) (     actual time=0.005..0.005 rows=1 loo
                                             Index Cond: (id = alloc.note_id)
                                 ->  Index Only Scan using investments_pkey on investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual time=0.005..0.006 rows=1    loops=2)
                                       Index Cond: (id = contrib.investment_id)
                                       Heap Fetches: 2
       Planning time: 0.718 ms
       Execution time: 0.296 ms
      (18 rows)

For reference, here are the indexes on the relevant tables:

Indexes:
    "portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id, allocated_on DESC)
    "portfolio_allocations_note_id_allocated_on_idx" btree (note_id, allocated_on DESC)
    "portfolio_allocations_pnsa" btree (entity_id, note_id, series_id, allocated_on DESC)

Indexes:
    "contributions_pkey" PRIMARY KEY, btree (id)
    "contributions_id_accrue_from_idx" btree (id, events_earnings_accrue_from)

I have a few questions here:
  - Why doesn't it use the primary key index in either case?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it would prevent it from having to sort?

Best,
~Alex

Re: Different plan chosen when in lateral subquery

From
Alex Reece
Date:
Weird, when I deleted an erroneous index it started picking a reasonable plan. This now works as expected, for posterity here is the bad plan:

 Nested Loop  (cost=21281.50..21323812.82 rows=5621000 width=47) (actual time=171.648..7233.298 rows=85615 loops=1)

   ->  Function Scan on generate_series dates  (cost=0.00..3.00 rows=1000 width=8) (actual time=0.031..0.252 rows=267 loops=1)

   ->  Unique  (cost=21281.50..21290.08 rows=5621 width=39) (actual time=25.730..27.050 rows=321 loops=267)

         ->  Sort  (cost=21281.50..21284.36 rows=5724 width=39) (actual time=25.728..26.242 rows=6713 loops=267)

               Sort Key: alloc.note_id, alloc.series_id

               Sort Method: quicksort  Memory: 2220kB

               ->  Nested Loop  (cost=10775.92..21210.05 rows=5724 width=39) (actual time=1.663..21.938 rows=6713 loops=267)

                     ->  Hash Join  (cost=10775.83..20355.61 rows=5724 width=52) (actual time=1.657..5.980 rows=6713 loops=267)

                           Hash Cond: (alloc.note_id = contrib.id)

                           ->  Bitmap Heap Scan on portfolio_allocations alloc  (cost=69.82..9628.13 rows=5724 width=39) (actual time=1.010..2.278 rows=6713 loops=267)

                                 Recheck Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= date(dates.dates)))

                                 Heap Blocks: exact=118074

                                 ->  Bitmap Index Scan on portfolio_allocations_entity_id_allocated_on_idx  (cost=0.00..69.53 rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo

                                       Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= date(dates.dates)))

                           ->  Hash  (cost=9464.85..9464.85 rows=354617 width=26) (actual time=169.792..169.792 rows=354617 loops=1)

                                 Buckets: 524288  Batches: 1  Memory Usage: 24296kB

                                 ->  Seq Scan on contributions contrib  (cost=0.00..9464.85 rows=354617 width=26) (actual time=0.007..83.246 rows=354617 loops=1)

                     ->  Index Only Scan using investments_pkey on investments inv  (cost=0.08..0.15 rows=1 width=13) (actual time=0.002..0.002 rows=1 loops=1792457)

                           Index Cond: (id = contrib.investment_id)

                           Heap Fetches: 1792457

 Planning time: 0.721 ms

 Execution time: 7236.507 ms

On Tue, Dec 5, 2017 at 10:04 AM Alex Reece <awreece@gmail.com> wrote:
I get very different plan chosen when my query is in a lateral subquery vs standalone -- it doesn't use a key when joining on a table, instead opting to do a hash join. Here is the query:

select distinct on (sub.entity_id, sub.note_id, sub.series_id)
       entity_id, note_id, series_id
from
(
select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount, inv.name
from public.portfolio_allocations alloc
JOIN contributions contrib on contrib.id = alloc.note_id
JOIN investments inv on inv.id = contrib.investment_id
where entity_id = '\x5787f132f50f7b03002cf835' and 
alloc.allocated_on <= dates.date
) sub

And wrapped inside the lateral:

        explain analyze
        select *
        from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,            current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
        LATERAL (
         ... <SUB QUERY HERE> ...
        ) lat

Run by itself injecting a hard coded value for dates.date, I get the expected plan which uses a key index on contributions:

      Unique  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053 rows=2 loops=1)
         ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.052 rows=2 loops=1)
               Sort Key: alloc.note_id, alloc.series_id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual time=0.030..0.042 rows=2      loops=1)
                     ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual time=0.022..0.028       rows=2 loops=1)
                           ->  Index Scan using portfolio_allocations_entity_id_allocated_on_idx on      portfolio_allocations alloc  (cost=0.09..6.05 rows=2 width=39) (actual     time=0.012..0.014 
                                 Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND      (allocated_on <= '2017-03-14 20:59:59.999+00'::timestamp with time   zone))
                           ->  Index Scan using contributions_id_accrue_from_idx on contributions     contrib  (cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005    rows=1 loops=2)
                                 Index Cond: (id = alloc.note_id)
                     ->  Index Only Scan using investments_pkey on investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual time=0.005..0.006 rows=1 loops=2)
                           Index Cond: (id = contrib.investment_id)
                           Heap Fetches: 2
       Planning time: 0.617 ms
       Execution time: 0.100 ms
      (15 rows)

But run in the lateral, it doesn't use the index:

       Nested Loop  (cost=14.54..24.55 rows=2000 width=47) (actual time=0.085..0.219 rows=534      loops=1)
         ->  Function Scan on generate_series dates  (cost=0.00..3.00 rows=1000 width=8) (actual      time=0.031..0.043 rows=267 loops=1)
         ->  Materialize  (cost=14.54..14.55 rows=2 width=39) (actual time=0.000..0.000 rows=2     loops=267)
               ->  Unique  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053 rows=2       loops=1)
                     ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual time=0.051..0.052 rows=2      loops=1)
                           Sort Key: alloc.note_id, alloc.series_id
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual       time=0.029..0.041 rows=2 loops=1)
                                 ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual       time=0.021..0.027 rows=2 loops=1)
                                       ->  Index Scan using       portfolio_allocations_entity_id_allocated_on_idx on      portfolio_allocations alloc  (cost=0.09..6.05 rows=2     width=39) (actual time=0
                                             Index Cond: ((entity_id =     '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on      <= '2017-03-14 20:59:59.999+00'::timestamp with time    zone))
                                       ->  Index Scan using contributions_id_accrue_from_idx on       contributions contrib  (cost=0.08..4.09 rows=1 width=26) (     actual time=0.005..0.005 rows=1 loo
                                             Index Cond: (id = alloc.note_id)
                                 ->  Index Only Scan using investments_pkey on investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual time=0.005..0.006 rows=1    loops=2)
                                       Index Cond: (id = contrib.investment_id)
                                       Heap Fetches: 2
       Planning time: 0.718 ms
       Execution time: 0.296 ms
      (18 rows)

For reference, here are the indexes on the relevant tables:

Indexes:
    "portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id, allocated_on DESC)
    "portfolio_allocations_note_id_allocated_on_idx" btree (note_id, allocated_on DESC)
    "portfolio_allocations_pnsa" btree (entity_id, note_id, series_id, allocated_on DESC)

Indexes:
    "contributions_pkey" PRIMARY KEY, btree (id)
    "contributions_id_accrue_from_idx" btree (id, events_earnings_accrue_from)

I have a few questions here:
  - Why doesn't it use the primary key index in either case?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it would prevent it from having to sort?

Best,
~Alex

Re: Different plan chosen when in lateral subquery

From
Alex Reece
Date:
Argh, so sorry for repeated posts; I'll be very careful to review them before posting. The "good plan" was the result of me hard coding '2017-03-14 20:59:59.999+00'::timestamp of using dates.date inside the lateral subquery. When I correctly use dates.date, it takes 7000ms instead of 0.3ms. My questions still remain:

I have a few questions here:
  - Why doesn't it use the primary key on contributions in either case, preferring contributions_id_accrue_from_idx or none at all?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it would prevent it from having to sort?
  - What information can I gather to answer these questions on my own?

~Alex

On Tue, Dec 5, 2017 at 10:08 AM Alex Reece <awreece@gmail.com> wrote:
Weird, when I deleted an erroneous index it started picking a reasonable plan. This now works as expected, for posterity here is the bad plan:

 Nested Loop  (cost=21281.50..21323812.82 rows=5621000 width=47) (actual time=171.648..7233.298 rows=85615 loops=1)

   ->  Function Scan on generate_series dates  (cost=0.00..3.00 rows=1000 width=8) (actual time=0.031..0.252 rows=267 loops=1)

   ->  Unique  (cost=21281.50..21290.08 rows=5621 width=39) (actual time=25.730..27.050 rows=321 loops=267)

         ->  Sort  (cost=21281.50..21284.36 rows=5724 width=39) (actual time=25.728..26.242 rows=6713 loops=267)

               Sort Key: alloc.note_id, alloc.series_id

               Sort Method: quicksort  Memory: 2220kB

               ->  Nested Loop  (cost=10775.92..21210.05 rows=5724 width=39) (actual time=1.663..21.938 rows=6713 loops=267)

                     ->  Hash Join  (cost=10775.83..20355.61 rows=5724 width=52) (actual time=1.657..5.980 rows=6713 loops=267)

                           Hash Cond: (alloc.note_id = contrib.id)

                           ->  Bitmap Heap Scan on portfolio_allocations alloc  (cost=69.82..9628.13 rows=5724 width=39) (actual time=1.010..2.278 rows=6713 loops=267)

                                 Recheck Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= date(dates.dates)))

                                 Heap Blocks: exact=118074

                                 ->  Bitmap Index Scan on portfolio_allocations_entity_id_allocated_on_idx  (cost=0.00..69.53 rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo

                                       Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <= date(dates.dates)))

                           ->  Hash  (cost=9464.85..9464.85 rows=354617 width=26) (actual time=169.792..169.792 rows=354617 loops=1)

                                 Buckets: 524288  Batches: 1  Memory Usage: 24296kB

                                 ->  Seq Scan on contributions contrib  (cost=0.00..9464.85 rows=354617 width=26) (actual time=0.007..83.246 rows=354617 loops=1)

                     ->  Index Only Scan using investments_pkey on investments inv  (cost=0.08..0.15 rows=1 width=13) (actual time=0.002..0.002 rows=1 loops=1792457)

                           Index Cond: (id = contrib.investment_id)

                           Heap Fetches: 1792457

 Planning time: 0.721 ms

 Execution time: 7236.507 ms



On Tue, Dec 5, 2017 at 10:04 AM Alex Reece <awreece@gmail.com> wrote:
I get very different plan chosen when my query is in a lateral subquery vs standalone -- it doesn't use a key when joining on a table, instead opting to do a hash join. Here is the query:

select distinct on (sub.entity_id, sub.note_id, sub.series_id)
       entity_id, note_id, series_id
from
(
select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount, inv.name
from public.portfolio_allocations alloc
JOIN contributions contrib on contrib.id = alloc.note_id
JOIN investments inv on inv.id = contrib.investment_id
where entity_id = '\x5787f132f50f7b03002cf835' and 
alloc.allocated_on <= dates.date
) sub

And wrapped inside the lateral:

        explain analyze
        select *
        from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,            current_timestamp::TIMESTAMP + INTERVAL '1 day', '24 hours') dates,
        LATERAL (
         ... <SUB QUERY HERE> ...
        ) lat

Run by itself injecting a hard coded value for dates.date, I get the expected plan which uses a key index on contributions:

      Unique  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053 rows=2 loops=1)
         ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.052 rows=2 loops=1)
               Sort Key: alloc.note_id, alloc.series_id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual time=0.030..0.042 rows=2      loops=1)
                     ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual time=0.022..0.028       rows=2 loops=1)
                           ->  Index Scan using portfolio_allocations_entity_id_allocated_on_idx on      portfolio_allocations alloc  (cost=0.09..6.05 rows=2 width=39) (actual     time=0.012..0.014 
                                 Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND      (allocated_on <= '2017-03-14 20:59:59.999+00'::timestamp with time   zone))
                           ->  Index Scan using contributions_id_accrue_from_idx on contributions     contrib  (cost=0.08..4.09 rows=1 width=26) (actual time=0.005..0.005    rows=1 loops=2)
                                 Index Cond: (id = alloc.note_id)
                     ->  Index Only Scan using investments_pkey on investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual time=0.005..0.006 rows=1 loops=2)
                           Index Cond: (id = contrib.investment_id)
                           Heap Fetches: 2
       Planning time: 0.617 ms
       Execution time: 0.100 ms
      (15 rows)

But run in the lateral, it doesn't use the index:

       Nested Loop  (cost=14.54..24.55 rows=2000 width=47) (actual time=0.085..0.219 rows=534      loops=1)
         ->  Function Scan on generate_series dates  (cost=0.00..3.00 rows=1000 width=8) (actual      time=0.031..0.043 rows=267 loops=1)
         ->  Materialize  (cost=14.54..14.55 rows=2 width=39) (actual time=0.000..0.000 rows=2     loops=267)
               ->  Unique  (cost=14.54..14.54 rows=2 width=39) (actual time=0.052..0.053 rows=2       loops=1)
                     ->  Sort  (cost=14.54..14.54 rows=2 width=39) (actual time=0.051..0.052 rows=2      loops=1)
                           Sort Key: alloc.note_id, alloc.series_id
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=0.25..14.53 rows=2 width=39) (actual       time=0.029..0.041 rows=2 loops=1)
                                 ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual       time=0.021..0.027 rows=2 loops=1)
                                       ->  Index Scan using       portfolio_allocations_entity_id_allocated_on_idx on      portfolio_allocations alloc  (cost=0.09..6.05 rows=2     width=39) (actual time=0
                                             Index Cond: ((entity_id =     '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on      <= '2017-03-14 20:59:59.999+00'::timestamp with time    zone))
                                       ->  Index Scan using contributions_id_accrue_from_idx on       contributions contrib  (cost=0.08..4.09 rows=1 width=26) (     actual time=0.005..0.005 rows=1 loo
                                             Index Cond: (id = alloc.note_id)
                                 ->  Index Only Scan using investments_pkey on investments inv  (     cost=0.08..0.15 rows=1 width=13) (actual time=0.005..0.006 rows=1    loops=2)
                                       Index Cond: (id = contrib.investment_id)
                                       Heap Fetches: 2
       Planning time: 0.718 ms
       Execution time: 0.296 ms
      (18 rows)

For reference, here are the indexes on the relevant tables:

Indexes:
    "portfolio_allocations_entity_id_allocated_on_idx" btree (entity_id, allocated_on DESC)
    "portfolio_allocations_note_id_allocated_on_idx" btree (note_id, allocated_on DESC)
    "portfolio_allocations_pnsa" btree (entity_id, note_id, series_id, allocated_on DESC)

Indexes:
    "contributions_pkey" PRIMARY KEY, btree (id)
    "contributions_id_accrue_from_idx" btree (id, events_earnings_accrue_from)

I have a few questions here:
  - Why doesn't it use the primary key index in either case?
  - Why isn't it choosing portfolio_allocations_pnsa, which seems like it would prevent it from having to sort?

Best,
~Alex

Re: Different plan chosen when in lateral subquery

From
Laurenz Albe
Date:
Alex Reece wrote:
> I get very different plan chosen when my query is in a lateral subquery vs standalone --
> it doesn't use a key when joining on a table, instead opting to do a hash join. Here is the query:
> 
>     select distinct on (sub.entity_id, sub.note_id, sub.series_id)
>            entity_id, note_id, series_id
>     from
>     (
>         select alloc.entity_id, alloc.note_id, alloc.series_id, alloc.amount, inv.name
>         from public.portfolio_allocations alloc
>         JOIN contributions contrib on contrib.id = alloc.note_id
>         JOIN investments inv on inv.id = contrib.investment_id
>         where entity_id = '\x5787f132f50f7b03002cf835' and 
>         alloc.allocated_on <= dates.date
>     ) sub
> 
> And wrapped inside the lateral:
> 
>         explain analyze
>         select *
>         from generate_series('2017-03-14 20:59:59.999'::TIMESTAMPTZ,            current_timestamp::TIMESTAMP +
INTERVAL'1 day', '24 hours') dates,
 
>         LATERAL (
>             ... <SUB QUERY HERE> ...
>         ) lat
> 
> Run by itself injecting a hard coded value for dates.date, I get the expected plan which uses a key index on
contributions:

[...]

>                      ->  Nested Loop  (cost=0.17..14.23 rows=2 width=52) (actual time=0.022..0.028 rows=2 loops=1)
>                            ->  Index Scan using portfolio_allocations_entity_id_allocated_on_idx on
portfolio_allocationsalloc  (cost=0.09..6.05 rows=2 width=39) (actual time=0.012..0.014 
 
>                                  Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on <=
'2017-03-1420:59:59.999+00'::timestamp with time zone))
 
>                            ->  Index Scan using contributions_id_accrue_from_idx on contributions contrib
(cost=0.08..4.09rows=1 width=26) (actual time=0.005..0.005 rows=1 loops=2)
 
>                                  Index Cond: (id = alloc.note_id)

[...]

> But run in the lateral, it doesn't use the index:

[...]

>                     ->  Hash Join  (cost=10775.83..20355.61 rows=5724 width=52) (actual time=1.657..5.980 rows=6713
loops=267)
>                            Hash Cond: (alloc.note_id = contrib.id)
>                            ->  Bitmap Heap Scan on portfolio_allocations alloc  (cost=69.82..9628.13 rows=5724
width=39)(actual time=1.010..2.278 rows=6713 loops=267)
 
>                                  Recheck Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND (allocated_on
<=date(dates.dates)))
 
>                                  Heap Blocks: exact=118074
>                                  ->  Bitmap Index Scan on portfolio_allocations_entity_id_allocated_on_idx
(cost=0.00..69.53rows=5724 width=0) (actual time=0.956..0.956 rows=6713 lo
 
>                                        Index Cond: ((entity_id = '\x5787f132f50f7b03002cf835'::bytea) AND
(allocated_on<= date(dates.dates)))
 
>                            ->  Hash  (cost=9464.85..9464.85 rows=354617 width=26) (actual time=169.792..169.792
rows=354617loops=1)
 
>                                  Buckets: 524288  Batches: 1  Memory Usage: 24296kB
>                                  ->  Seq Scan on contributions contrib  (cost=0.00..9464.85 rows=354617 width=26)
(actualtime=0.007..83.246 rows=354617 loops=1)
 

[...]

> I have a few questions here:
>   - Why doesn't it use the primary key index in either case?

I don't know about the first query; perhaps the primary key index is fragmented.
Compare the size of the indexes on disk.
In the second query a sequential scan is used because PostgreSQL chooses a hash join.
That choice is made because the index scans returns 6713 rows rather than the 2
from the first query, probably because the date is different.

>   - Why isn't it choosing portfolio_allocations_pnsa, which seems like it would prevent it from having to sort?

In a bitmap index scan, the table is scanned in physical order, so the result
is not sorted in index order.
I don't know if PostgreSQL is smart enough to figure out that it could use an index
scan and preserve the order through the joins to obviate the sort.
You could try to set enable_bitmapscan=off and see if things are different then.
Perhaps the slower index scan would outweigh the advantage of avoiding the sort.

Yours,
Laurenz Albe