Thread: PG choosing nested loop for set membership?

PG choosing nested loop for set membership?

From
Brian Crowell
Date:
Hello, it's me, a Postgres n00b again. I'm dealing with a query that
scans a rather large table (94,000,000 tuples or so) and just picks
out certain rows and sums them:

select dci.snapshot_time as "time", round(sum(dci.todays_pl)::numeric,0) as pl
from dbo._pl_data_cache_intraday dci
where dci.snapshot_time between '2014-03-25
11:32:40.004552-05'::timestamptz and '2014-03-25
12:02:40.015177-05'::timestamptz
    and dci.symbol in (select sec.symbol from dbo.security_underliers
sec where sec.ultimate_underlier = 'SPY')
    and dci.manager = 'BJC'
    and dci.account in (select account from pl2.visible_accounts where is_fund)
group by dci.snapshot_time
order by dci.snapshot_time;

For the most part, Postgres is doing the right thing: snapshot_time is
the lead column in all of the table's indexes, so it's able to pick up
the source rows fairly quickly in its index scan. It's also enforcing
"dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for
"dci.symbol in (...)".

The trouble comes when enforcing the "dci.account in (...)" search
condition: pl2.visible_accounts is a view that determines which
accounts the current user can see, which, depending on who you are,
can be several hundred or none at all. Postgres estimates the output
of this query as two rows, but in my case, it's actually 240.

Unfortunately, that leads the query planner to try to think a nested
loop is cheap enough to enforce this, when actually it's really
expensive.

If I hard-code the results from pl2.visible_accounts, Postgres will do
a hash semi join for me, which is much faster, but then I have to wrap
up this whole query as a function in order to preserve its security
properties. Not only is that the situation I was trying to avoid, it
means I can't use EXPLAIN for my query anymore.

I've noticed I can also do the really sneaky "dci.account in (select
unnest(array_agg(account)) from pl2.visible_accounts)", which tricks
the estimator into thinking there will be 100 rows. That _really_
feels like cheating.

Besides the above, is there anything I can do to get Postgres to do a
hash instead of a nested loop?

--Brian


Re: PG choosing nested loop for set membership?

From
Tom Lane
Date:
Brian Crowell <brian@fluggo.com> writes:
> The trouble comes when enforcing the "dci.account in (...)" search
> condition: pl2.visible_accounts is a view that determines which
> accounts the current user can see, which, depending on who you are,
> can be several hundred or none at all. Postgres estimates the output
> of this query as two rows, but in my case, it's actually 240.

So the main estimation error is inside that view, which you didn't
show us :-(

            regards, tom lane


Re: PG choosing nested loop for set membership?

From
David Johnston
Date:
Brian Crowell wrote
> Hello, it's me, a Postgres n00b again. I'm dealing with a query that
> scans a rather large table (94,000,000 tuples or so) and just picks
> out certain rows and sums them:
>
> select dci.snapshot_time as "time", round(sum(dci.todays_pl)::numeric,0)
> as pl
> from dbo._pl_data_cache_intraday dci
> where dci.snapshot_time between '2014-03-25
> 11:32:40.004552-05'::timestamptz and '2014-03-25
> 12:02:40.015177-05'::timestamptz
>     and dci.symbol in (select sec.symbol from dbo.security_underliers
> sec where sec.ultimate_underlier = 'SPY')
>     and dci.manager = 'BJC'
>     and dci.account in (select account from pl2.visible_accounts where
> is_fund)
> group by dci.snapshot_time
> order by dci.snapshot_time;
>
> For the most part, Postgres is doing the right thing: snapshot_time is
> the lead column in all of the table's indexes, so it's able to pick up
> the source rows fairly quickly in its index scan. It's also enforcing
> "dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for
> "dci.symbol in (...)".
>
> The trouble comes when enforcing the "dci.account in (...)" search
> condition: pl2.visible_accounts is a view that determines which
> accounts the current user can see, which, depending on who you are,
> can be several hundred or none at all. Postgres estimates the output
> of this query as two rows, but in my case, it's actually 240.
>
> Unfortunately, that leads the query planner to try to think a nested
> loop is cheap enough to enforce this, when actually it's really
> expensive.
>
> If I hard-code the results from pl2.visible_accounts, Postgres will do
> a hash semi join for me, which is much faster, but then I have to wrap
> up this whole query as a function in order to preserve its security
> properties. Not only is that the situation I was trying to avoid, it
> means I can't use EXPLAIN for my query anymore.
>
> I've noticed I can also do the really sneaky "dci.account in (select
> unnest(array_agg(account)) from pl2.visible_accounts)", which tricks
> the estimator into thinking there will be 100 rows. That _really_
> feels like cheating.
>
> Besides the above, is there anything I can do to get Postgres to do a
> hash instead of a nested loop?

1) Try using EXISTS instead of IN
2 - and the one I'd use by default) Use an INNER JOIN

SELECT ...
FROM ... dci
JOIN (SELECT account FROM ... WHERE is_fund) accts USING (account)
JOIN (SELECT symbol FROM ... WHERE ... = 'SPY') sec USING (symbol)
WHERE ...

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/PG-choosing-nested-loop-for-set-membership-tp5797457p5797459.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: PG choosing nested loop for set membership?

From
Brian Crowell
Date:
On Tue, Mar 25, 2014 at 4:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So the main estimation error is inside that view, which you didn't
> show us :-(

I didn't know which direction you'd want to go with it.  :P

The view is like this:

===
create or replace view pl2.visible_accounts
as
select
    -- {{pk}} The account in question. This is the primary key.
    acc.account,

    -- The manager for this account.
    acc.manager,

    -- True if this account is in the fund, false otherwise.
    acc.is_fund
from pl2._visible_accounts_by_rule_set acc
inner join pl2.current_user u on acc.rule_set_id =
u.impersonated_user_permission_rule_set_id;
===

pl2._visible_accounts_by_rule_set has rule_set_id = 1 with 241
entries, rule_set_id = 3 with 76, and nothing else. Postgres correctly
assumes pl2_current_user will return one row. In my case, this will
return rule_set_id = 1.

Explaining just this view yields:

'Nested Loop  (cost=2.77..10.23 rows=2 width=10) (actual
time=0.086..0.222 rows=241 loops=1)'
'  Output: acc.account, acc.manager, acc.is_fund'
'  Buffers: shared hit=7'
'  ->  Hash Right Join  (cost=2.62..5.12 rows=1 width=8) (actual
time=0.064..0.068 rows=1 loops=1)'
'        Output: real_user.permission_rule_set_id,
impersonated_user.permission_rule_set_id'
'        Hash Cond: (impersonated_user.user_id = real_user.impersonating)'
'        Buffers: shared hit=4'
'        ->  Seq Scan on pl2._users impersonated_user
(cost=0.00..2.35 rows=35 width=8) (actual time=0.002..0.007 rows=35
loops=1)'
'              Output: impersonated_user.user_id,
impersonated_user.user_principal_name, impersonated_user.name,
impersonated_user.permission_rule_set_id,
impersonated_user.impersonating, impersonated_user.is_admin'
'              Buffers: shared hit=2'
'        ->  Hash  (cost=2.61..2.61 rows=1 width=8) (actual
time=0.041..0.041 rows=1 loops=1)'
'              Output: real_user.impersonating,
real_user.permission_rule_set_id'
'              Buckets: 1024  Batches: 1  Memory Usage: 1kB'
'              Buffers: shared hit=2'
'              ->  Seq Scan on pl2._users real_user  (cost=0.00..2.61
rows=1 width=8) (actual time=0.026..0.036 rows=1 loops=1)'
'                    Output: real_user.impersonating,
real_user.permission_rule_set_id'
'                    Filter: (real_user.user_principal_name =
("session_user"())::text)'
'                    Rows Removed by Filter: 34'
'                    Buffers: shared hit=2'
'  ->  Index Scan using _visible_accounts_by_rule_set_idx on
pl2._visible_accounts_by_rule_set acc  (cost=0.15..3.54 rows=158
width=14) (actual time=0.018..0.086 rows=241 loops=1)'
'        Output: acc.rule_set_id, acc.account, acc.manager, acc.is_fund'
'        Index Cond: (acc.rule_set_id =
COALESCE(impersonated_user.permission_rule_set_id,
real_user.permission_rule_set_id))'
'        Buffers: shared hit=3'
'Total runtime: 0.313 ms'

All of the estimates on this view are reasonable, except for that
nested loop at the top. The only thing I can think is that it's
uncertain which ID I will pick, and I can't help it there.

--Brian


Re: PG choosing nested loop for set membership?

From
Brian Crowell
Date:
On Tue, Mar 25, 2014 at 4:12 PM, David Johnston <polobo@yahoo.com> wrote:
> 2 - and the one I'd use by default) Use an INNER JOIN

That's where I started, but Postgres is smart enough to know that this
is equivalent to what I'm doing, and still picks the nested loop. I
went to IN in the hopes of tricking it.

I haven't tried EXISTS. I'll have to try that tomorrow.

--Brian


Re: PG choosing nested loop for set membership?

From
Tom Lane
Date:
Brian Crowell <brian@fluggo.com> writes:
> Explaining just this view yields:

> 'Nested Loop  (cost=2.77..10.23 rows=2 width=10) (actual time=0.086..0.222 rows=241 loops=1)'
> '  ->  Hash Right Join  (cost=2.62..5.12 rows=1 width=8) (actual time=0.064..0.068 rows=1 loops=1)'
> '  ->  Index Scan using _visible_accounts_by_rule_set_idx on pl2._visible_accounts_by_rule_set acc  (cost=0.15..3.54
rows=158width=14) (actual time=0.018..0.086 rows=241 loops=1)' 

> All of the estimates on this view are reasonable, except for that
> nested loop at the top.

Yeah.  The weird thing about that is that the nestloop rowcount estimate
isn't the product of the two input rowcounts --- you'd sort of expect an
estimate of 158 given the input-relation sizes.  While that's not ipso
facto evidence of a bug (because the estimates are arrived at in different
ways), I'm having a hard time replicating it here.  Are you using an
up-to-date PG release?

One thing that might help is to increase the statistics target for
pl2._visible_accounts_by_rule_set.  The other two tables are small enough
that you don't need to do that for them.  (Although come to think of it,
they are also small enough that maybe auto-analyze isn't triggering for
them ... does a manual ANALYZE improve matters?)

            regards, tom lane


Re: PG choosing nested loop for set membership?

From
Brian Crowell
Date:
On Tue, Mar 25, 2014 at 5:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  The weird thing about that is that the nestloop rowcount estimate
> isn't the product of the two input rowcounts --- you'd sort of expect an
> estimate of 158 given the input-relation sizes.  While that's not ipso
> facto evidence of a bug (because the estimates are arrived at in different
> ways), I'm having a hard time replicating it here.  Are you using an
> up-to-date PG release?

All right, I think I'm onto something. But first I'll answer your questions.

Version is 9.3.3 from the Postgres Debian archives.


> One thing that might help is to increase the statistics target for
> pl2._visible_accounts_by_rule_set.  The other two tables are small enough
> that you don't need to do that for them.  (Although come to think of it,
> they are also small enough that maybe auto-analyze isn't triggering for
> them ... does a manual ANALYZE improve matters?)

You were right that auto-analyze didn't go after them. Weird. But a
few manual analyzes later, and no change.

Here's what I did, though. I collapsed the pl2.current_user view into
pl2.visible_accounts:

===
select
    acc.account,
    acc.manager,
    acc.is_fund
from pl2._visible_accounts_by_rule_set acc
    inner join (pl2._users u
    left join pl2._users iu on u.impersonating = iu.user_id)
        on acc.rule_set_id = coalesce(iu.permission_rule_set_id,
u.permission_rule_set_id)
where u.user_principal_name = session_user
===

I noticed that join-on-coalesce pattern that gave us trouble in SQL
Server. The query planner can't do a thing with that. So I rewrote the
query so the last join would be solid:

===
select
    acc.account,
    acc.manager,
    acc.is_fund
from pl2._users lu
    inner join pl2._users u on u.user_id = coalesce(lu.impersonating,
lu.user_id)
    inner join pl2._visible_accounts_by_rule_set acc
        on acc.rule_set_id = u.permission_rule_set_id
where lu.user_principal_name = session_user
===

The join order is the same, and the indexes used are the same, but the
estimate is much better:

'Nested Loop  (cost=0.68..13.70 rows=133 width=10) (actual
time=0.073..0.211 rows=241 loops=1)'
'  Output: acc.account, acc.manager, acc.is_fund'
'  Buffers: shared hit=10'
'  ->  Nested Loop  (cost=0.54..8.58 rows=1 width=4) (actual
time=0.056..0.059 rows=1 loops=1)'
'        Output: u.permission_rule_set_id'
'        Buffers: shared hit=7'
'        ->  Index Scan using _pl2_users_user_principal_name_idx on
pl2._users lu  (cost=0.27..4.29 rows=1 width=8) (actual
time=0.045..0.047 rows=1 loops=1)'
'              Output: lu.user_id, lu.user_principal_name, lu.name,
lu.permission_rule_set_id, lu.impersonating, lu.is_admin'
'              Index Cond: (lu.user_principal_name = ("session_user"())::text)'
'              Buffers: shared hit=4'
'        ->  Index Scan using _users_pkey on pl2._users u
(cost=0.27..4.29 rows=1 width=8) (actual time=0.006..0.006 rows=1
loops=1)'
'              Output: u.user_id, u.user_principal_name, u.name,
u.permission_rule_set_id, u.impersonating, u.is_admin'
'              Index Cond: (u.user_id = COALESCE(lu.impersonating, lu.user_id))'
'              Buffers: shared hit=3'
'  ->  Index Scan using _visible_accounts_by_rule_set_idx on
pl2._visible_accounts_by_rule_set acc  (cost=0.15..3.54 rows=158
width=14) (actual time=0.015..0.089 rows=241 loops=1)'
'        Output: acc.rule_set_id, acc.account, acc.manager, acc.is_fund'
'        Index Cond: (acc.rule_set_id = u.permission_rule_set_id)'
'        Buffers: shared hit=3'
'Total runtime: 0.297 ms'

I'll see if I can write an isolated test case for the coalesce
misestimate. Or do you think the query planner will ever be able to do
anything with that form?

--Brian


Re: PG choosing nested loop for set membership?

From
Tom Lane
Date:
Brian Crowell <brian@fluggo.com> writes:
> Here's what I did, though. I collapsed the pl2.current_user view into
> pl2.visible_accounts:

> ===
> select
>     acc.account,
>     acc.manager,
>     acc.is_fund
> from pl2._visible_accounts_by_rule_set acc
>     inner join (pl2._users u
>     left join pl2._users iu on u.impersonating = iu.user_id)
>         on acc.rule_set_id = coalesce(iu.permission_rule_set_id,
> u.permission_rule_set_id)
> where u.user_principal_name = session_user
> ===

> I noticed that join-on-coalesce pattern that gave us trouble in SQL
> Server. The query planner can't do a thing with that. So I rewrote the
> query so the last join would be solid:

> ===
> select
>     acc.account,
>     acc.manager,
>     acc.is_fund
> from pl2._users lu
>     inner join pl2._users u on u.user_id = coalesce(lu.impersonating,
> lu.user_id)
>     inner join pl2._visible_accounts_by_rule_set acc
>         on acc.rule_set_id = u.permission_rule_set_id
> where lu.user_principal_name = session_user
> ===

Hm.  It's not obvious from here that those give the same results ---
but you probably understand your schema better than the rest of us.

> I'll see if I can write an isolated test case for the coalesce
> misestimate. Or do you think the query planner will ever be able to do
> anything with that form?

Probably not much.  I'd guess that the real benefit of this approach
is that it avoids the join-condition-using-three-input-relations,
which is a bear from any angle.

            regards, tom lane


Re: PG choosing nested loop for set membership?

From
Brian Crowell
Date:
On Wed, Mar 26, 2014 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm.  It's not obvious from here that those give the same results ---
> but you probably understand your schema better than the rest of us.

The _users table has a "user_id", and a nullable column
"impersonating" which refers to a user_id you want to impersonate. If
"impersonating" isn't null, you want the rule_set_id for that user. If
not, you want the rule_set_id of your own user. Hence the first
query's left join to the second, impersonated user. The final join
grabs the first rule_set_id it can find with a coalesce.

The second query does the same thing with an inner join; the second
_users reference will have the impersonated user if there is one, or
the original user if there isn't. Either way, there's a solid user to
join to, which I guess is enough for the query planner.

They're really equivalent, since there is still just one rule_set_id at the end.


> Probably not much.  I'd guess that the real benefit of this approach
> is that it avoids the join-condition-using-three-input-relations,
> which is a bear from any angle.

Well look what happens when I remove impersonation, and stick a
coalesce in the wrong place:

===
select
    acc.account,
    acc.manager,
    acc.is_fund
from pl2._users lu
inner join pl2._visible_accounts_by_rule_set acc
    on acc.rule_set_id = coalesce(lu.permission_rule_set_id, 0)
where lu.user_principal_name = session_user
===

'Hash Join  (cost=2.62..9.07 rows=9 width=10) (actual
time=0.066..0.239 rows=241 loops=1)'
'  Output: acc.account, acc.manager, acc.is_fund'
'  Hash Cond: (acc.rule_set_id = COALESCE(lu.permission_rule_set_id, 0))'
'  Buffers: shared hit=4'

Just removing the coalesce (acc.rule_set_id =
lu.permission_rule_set_id) does this:

'Hash Join  (cost=2.62..10.31 rows=133 width=10) (actual
time=0.063..0.257 rows=241 loops=1)'
'  Output: acc.account, acc.manager, acc.is_fund'
'  Hash Cond: (acc.rule_set_id = lu.permission_rule_set_id)'
'  Buffers: shared hit=4'

Which says to me coalesce has a selectivity.

--Brian


Re: PG choosing nested loop for set membership?

From
Tom Lane
Date:
Brian Crowell <brian@fluggo.com> writes:
> Which says to me coalesce has a selectivity.

Well, the point is you're just getting a default selectivity estimate
for the "acc.rule_set_id = coalesce(...anything...)" condition.  The
planner is smarter about plain "x = y" join conditions: it looks up
the column stats for x and y and determines the probability of equality.

In principle I guess we could somehow merge the stats of y and z
when looking at a "coalesce(y, z)" expression, but I'm not sure
how that would work exactly.

            regards, tom lane


Re: PG choosing nested loop for set membership?

From
Brian Crowell
Date:
On Wed, Mar 26, 2014 at 11:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In principle I guess we could somehow merge the stats of y and z
> when looking at a "coalesce(y, z)" expression, but I'm not sure
> how that would work exactly.

Yeah, I'm not sure there's anything to fix here, either. Just a
reminder that coalesces in joins are bad.

The only thing I could think was making an exception for the case when
all inputs to coalesce() have one row. I think that's what's happening
here; coalesce selectivity is estimated at less than one, even though
you can't get a cardinality any less than the inputs (they're already
one), so the nested loop sees an estimate that's less than the product
of its inputs. Or that's my guess anyhow.

Thanks for having a look!

--Brian


Re: PG choosing nested loop for set membership?

From
Jeff Janes
Date:
On Tue, Mar 25, 2014 at 2:00 PM, Brian Crowell <brian@fluggo.com> wrote:
Hello, it's me, a Postgres n00b again. I'm dealing with a query that
scans a rather large table (94,000,000 tuples or so) and just picks
out certain rows and sums them:

select dci.snapshot_time as "time", round(sum(dci.todays_pl)::numeric,0) as pl
from dbo._pl_data_cache_intraday dci
where dci.snapshot_time between '2014-03-25
11:32:40.004552-05'::timestamptz and '2014-03-25
12:02:40.015177-05'::timestamptz
    and dci.symbol in (select sec.symbol from dbo.security_underliers
sec where sec.ultimate_underlier = 'SPY')
    and dci.manager = 'BJC'
    and dci.account in (select account from pl2.visible_accounts where is_fund)
group by dci.snapshot_time
order by dci.snapshot_time;

For the most part, Postgres is doing the right thing: snapshot_time is
the lead column in all of the table's indexes, so it's able to pick up
the source rows fairly quickly in its index scan. It's also enforcing
"dci.manager = 'BJC'" in the same scan, and does a Hash Semi Join for
"dci.symbol in (...)".

The trouble comes when enforcing the "dci.account in (...)" search
condition: pl2.visible_accounts is a view that determines which
accounts the current user can see, which, depending on who you are,
can be several hundred or none at all. Postgres estimates the output
of this query as two rows, but in my case, it's actually 240.

Unfortunately, that leads the query planner to try to think a nested
loop is cheap enough to enforce this, when actually it's really
expensive.

Can you show the explain plan for that?  I can't get it to use anything but a hash join for this type of thing even when the estimated rows in the in-list are 2, unless I disable hash joins altogether.   So I'm curious how your plan differs from the ones I've dummied up.

Cheers,

Jeff