Re: Query slow again after adding an `OR` operation (was: SlowPostgreSQL 10.6 query) - Mailing list pgsql-performance

From David Rowley
Subject Re: Query slow again after adding an `OR` operation (was: SlowPostgreSQL 10.6 query)
Date
Msg-id CAKJS1f8M-TPB86=hqGspDA0mD_pP=LWGRJA0u393xtRzPzUeKQ@mail.gmail.com
Whole thread Raw
In response to Query slow again after adding an `OR` operation (was: Slow PostgreSQL10.6 query)  (Behrang Saeedzadeh <behrangsa@gmail.com>)
List pgsql-performance
On Thu, 10 Oct 2019 at 01:07, Behrang Saeedzadeh <behrangsa@gmail.com> wrote:
>
> This is a follow up to
https://www.postgresql.org/message-id/flat/CAERAJ%2B-1buiJ%2B_JWEo0a9Ao-CVMWpgp%3DEnFx1dJtnB3WmMi2zQ%40mail.gmail.com
>
> The query (generated by Hibernate) got a bit more complex and performance degraded again. I have uploaded all the
detailshere (with changed table names, etc.): https://github.com/behrangsa/slow-query
 
>
> In short, the new query is:

The query mostly appears slow due to the "Rows Removed By Filter" in
the OR condition. The only way to get around not scanning the entire
branch_invoices table would be to somehow write the way in such a way
that allows it to go on the inner side of the join.

You could do that if you ensure there's an index on branch_invoices
(branch_id) and format the query as:

SELECT inv.id           AS i_id,
       inv.invoice_date AS inv_d,
       inv.invoice_xid  AS inv_xid,
       inv.invoice_type AS inv_type,
       brs.branch_id    AS br_id,
       cinvs.company_id AS c_id
FROM invoices inv
         LEFT OUTER JOIN branch_invoices brs ON inv.id = brs.invoice_id
         LEFT OUTER JOIN company_invoices cinvs ON inv.id = cinvs.invoice_id
         INNER JOIN branches br ON brs.branch_id = br.id
WHERE brs.branch_id IN (SELECT br1.id
                        FROM branches br1
                                 INNER JOIN access_rights ar1 ON
br1.id = ar1.branch_id
                                 INNER JOIN users usr1 ON ar1.user_id = usr1.id
                                 INNER JOIN groups grp1 ON
ar1.group_id = grp1.id
                                 INNER JOIN group_permissions gpr1 ON
grp1.id = gpr1.group_id
                                 INNER JOIN permissions prm1 ON
gpr1.permission_id = prm1.id
                        WHERE usr1.id = 1636
                          AND prm1.code = 'C2'
                          AND ar1.access_type = 'T1')
UNION ALL
SELECT br3.id
                        FROM companies cmp
                                 INNER JOIN branches br3 ON cmp.id =
br3.company_id
                                 INNER JOIN access_rights ar2 ON
cmp.id = ar2.company_id
                                 INNER JOIN users usr2 ON ar2.user_id = usr2.id
                                 INNER JOIN groups g2 ON ar2.group_id = g2.id
                                 INNER JOIN group_permissions gpr2 ON
g2.id = gpr2.group_id
                                 INNER JOIN permissions prm2 ON
gpr2.permission_id = prm2.id
                        WHERE usr2.id = 1636
                          AND prm2.code = 'C2'
                          AND ar2.access_type = 'T1')
ORDER BY inv.invoice_date DESC, br.name ASC
LIMIT 12;

The planner may then choose to pullup the subquery and uniquify it
then put it on the outside of a nested loop join then lookup the
branch_invoices record using the index on branch_id.  I think this is
quite a likely plan since the planner estimates there's only going to
be 1 row from each of the subqueries.

Also note, that the LEFT JOIN you have to branch_invoices is not
really a left join since you're insisting that the branch_id must be
in the first or 2nd sub-plan. There's no room for it to be NULL. The
planner will just convert that to an INNER JOIN with the above query
since that'll give it the flexibility to put the subquery in the IN
clause on the outside of the join (after having uniquified it).
You'll need to decide what you actually want the behaviour to be here.
If you do need those NULL rows then you'd better move your WHERE quals
down into the join condition for branch_invoices table. I'd suggest
testing with some mock-up data if you're uncertain of what I mean.

If you find that is faster and you can't rewrite the query due to it
having been generated by Hibernate, then that sounds like a problem
with Hibernate.  PostgreSQL does not currently attempt to do any
rewrites which convert OR clauses to use UNION or UNION ALL. No amount
of tweaking the planner settings is going to change that fact.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Would SSD improve Index Only Scan performance by a lot?
Next
From: Jerry Sievers
Date:
Subject: Re: Modification of data in base folder and very large tables