PG choosing nested loop for set membership? - Mailing list pgsql-general

From Brian Crowell
Subject PG choosing nested loop for set membership?
Date
Msg-id CAAQkdDriD3sc=1nzu3V8eORo9vHKUbwzmbSGWYFfpdM8oM3=iA@mail.gmail.com
Whole thread Raw
Responses Re: PG choosing nested loop for set membership?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: PG choosing nested loop for set membership?  (David Johnston <polobo@yahoo.com>)
Re: PG choosing nested loop for set membership?  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
Hello, it's me, a Postgres n00b again. I'm dealing with a query that
scans a rather large table (94,000,000 tuples or so) and just picks
out certain rows and sums them:

select dci.snapshot_time as "time", round(sum(dci.todays_pl)::numeric,0) as pl
from dbo._pl_data_cache_intraday dci
where dci.snapshot_time between '2014-03-25
11:32:40.004552-05'::timestamptz and '2014-03-25
12:02:40.015177-05'::timestamptz
    and dci.symbol in (select sec.symbol from dbo.security_underliers
sec where sec.ultimate_underlier = 'SPY')
    and dci.manager = 'BJC'
    and dci.account in (select account from pl2.visible_accounts where is_fund)
group by dci.snapshot_time
order by dci.snapshot_time;

For the most part, Postgres is doing the right thing: snapshot_time is
the lead column in all of the table's indexes, so it's able to pick up
the source rows fairly quickly in its index scan. It's also enforcing
"dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for
"dci.symbol in (...)".

The trouble comes when enforcing the "dci.account in (...)" search
condition: pl2.visible_accounts is a view that determines which
accounts the current user can see, which, depending on who you are,
can be several hundred or none at all. Postgres estimates the output
of this query as two rows, but in my case, it's actually 240.

Unfortunately, that leads the query planner to try to think a nested
loop is cheap enough to enforce this, when actually it's really
expensive.

If I hard-code the results from pl2.visible_accounts, Postgres will do
a hash semi join for me, which is much faster, but then I have to wrap
up this whole query as a function in order to preserve its security
properties. Not only is that the situation I was trying to avoid, it
means I can't use EXPLAIN for my query anymore.

I've noticed I can also do the really sneaky "dci.account in (select
unnest(array_agg(account)) from pl2.visible_accounts)", which tricks
the estimator into thinking there will be 100 rows. That _really_
feels like cheating.

Besides the above, is there anything I can do to get Postgres to do a
hash instead of a nested loop?

--Brian


pgsql-general by date:

Previous
From: Steven Schlansker
Date:
Subject: Trimming transaction logs after extended WAL archive failures
Next
From: Tom Lane
Date:
Subject: Re: PG choosing nested loop for set membership?