Query much slower after upgrade to 9.6.1

From: Adam Brusselback
Date:
Hello all, I have a query that was running quickly enough on 9.5.5 and has slowed to a halt after upgrading to 9.6.1.

The server hardware is the same: a 2-core, 4GB RAM DigitalOcean virtual server running Debian 8.6.

The query is a pretty heavy reporting query to aggregate the dollar value of "claims" against a specific "contract".

The query is below for reference.

Query plan on 9.6.1: https://explain.depesz.com/s/NwmH
Query plan on 9.5.5: https://explain.depesz.com/s/ioI4

We just migrated over the weekend, and this issue was brought to my attention today.

SELECT 
  con.client_id
, 'product'::text AS type
, p.product_number AS identifier
, p.product_name AS description
, civ.rebate_direct_decimal_model
, civ.rebate_deviated_decimal_model
, civ.rebate_direct_value
, civ.rebate_direct_type
, civ.rebate_deviated_value
, civ.rebate_deviated_type
, actuals.claimant
, actuals.claimant_id
, civ.estimated_quantity AS prob_exp_volume
, COALESCE(actuals.rebate_allowed_quantity, 0) AS actual_volume
, CASE WHEN r.rate IS NULL OR (r.rate < 0) THEN NULL ELSE (r.rate * civ.estimated_quantity) END AS prob_exp_dollars 
, COALESCE(actuals.rebate_allowed_dollars, 0) AS actual_dollars 
, COALESCE(actuals.transaction_date,null) AS transaction_date
FROM contract con
INNER JOIN contract_item_view civ
ON con.contract_id = civ.contract_id
INNER JOIN product p
ON civ.product_id = p.product_id
INNER JOIN product_uom_conversion civuomc
ON civ.uom_type_id = civuomc.uom_type_id
AND civ.product_id = civuomc.product_id
LEFT JOIN LATERAL (
SELECT 
 claim_product.product_id
, company.company_name AS claimant
, company.company_id AS claimant_id
, MAX(COALESCE(transaction.transaction_date,null)) AS transaction_date
, SUM((civuomc.rate / cpuomc.rate) * claim_product.rebate_allowed_quantity) AS rebate_allowed_quantity
, SUM(claim_product.rebate_allowed_quantity * claim_product.rebate_allowed_rate) AS rebate_allowed_dollars
FROM contract
INNER JOIN contract_claim_bridge
USING (contract_id)
INNER JOIN claim
USING (claim_id)
INNER JOIN claim_product
USING (claim_id)
INNER JOIN product_uom_conversion cpuomc
ON claim_product.uom_type_id = cpuomc.uom_type_id
AND claim_product.product_id = cpuomc.product_id
INNER JOIN invoice
USING (invoice_id)
INNER JOIN company
ON company.company_id = invoice.claimant_company_id
LEFT JOIN LATERAL (
SELECT MAX(transaction_date) AS transaction_date
FROM claim_transaction
WHERE TRUE
AND claim_transaction.claim_id = claim.claim_id
AND claim_transaction.transaction_type IN ('PAYMENT'::enum.transaction_type,'DEDUCTION'::enum.transaction_type)
ORDER BY transaction_date DESC
LIMIT 1
)  transaction
ON TRUE
WHERE contract.contract_sequence = con.contract_sequence
AND contract.contract_renew_version = con.contract_renew_version
AND contract.client_id = con.client_id
AND claim.claim_state = 'COMPLETE'
GROUP BY claim_product.product_id, company.company_name, company.company_id
) actuals
ON actuals.product_id = civ.product_id
LEFT JOIN LATERAL gosimple.calculate_contract_item_probable_exposure_rate(
(
SELECT array_agg(row(x.contract_item_id, x.estimated_quantity, x.price)::gosimple.in_calculate_contract_item_probable_exposure_rate)
FROM
(
SELECT 
 civ2.contract_item_id
, civ2.estimated_quantity AS estimated_quantity
, AVG(pd2.price / puc2.rate) AS price
FROM contract con2
INNER JOIN contract_item_view civ2
ON con2.contract_id = civ2.contract_id
LEFT JOIN product_uom_conversion puc2
ON puc2.product_id = civ2.product_id
AND puc2.uom_type_id = civ2.uom_type_id
LEFT JOIN price_default pd2
ON civ2.product_id = pd2.product_id
AND pd2.active_range @> now()
WHERE TRUE
AND con2.contract_id = con.contract_id
GROUP BY civ2.contract_item_id, civ2.estimated_quantity
) AS x
)) r
ON civ.contract_item_id = r.contract_item_id
WHERE TRUE
AND con.contract_id = '54e28f3b-8f87-46fc-abf0-6fe86f528c0c'

Re: Query much slower after upgrade to 9.6.1

From: Adam Brusselback
Date:
As suggested in the Postgres Slack channel by lukasfittl, I disabled hashagg on my old server, and ran the query again. That changed one piece to a groupagg (as was used on the new server) and the performance was similar to the 9.6.1 box.

9.5.5 w/ hashagg disabled: https://explain.depesz.com/s/SBVt
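
For anyone reproducing the comparison, the hashagg test is just a session-level planner toggle before re-running the query, roughly:

SET enable_hashagg = off;  -- session only; pushes the planner toward GroupAggregate
EXPLAIN ANALYZE
SELECT ... ;               -- the full reporting query from the first message
RESET enable_hashagg;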

Re: Query much slower after upgrade to 9.6.1

From: Tom Lane
Date:
Adam Brusselback <adambrusselback@gmail.com> writes:
> As suggested in the Postgres Slack channel by lukasfittl, I disabled
> hashagg on my old server, and ran the query again. That changed one piece
> to a groupagg (as was used on the new server) and the performance was
> similar to the 9.6.1 box.

If the problem is "new server won't use hashagg", I'd wonder whether
the work_mem setting is the same, or whether maybe you need to bump
it up some (the planner's estimate of how big the hashtable would be
might have changed a bit).

            regards, tom lane


Re: Query much slower after upgrade to 9.6.1

From: Adam Brusselback
Date:
> If the problem is "new server won't use hashagg", I'd wonder whether
> the work_mem setting is the same, or whether maybe you need to bump
> it up some (the planner's estimate of how big the hashtable would be
> might have changed a bit).
I actually was speaking with Stephen Frost in the Slack channel, and tested both of those theories.

The work_mem was the same between the two servers (12MB), but he suggested I play around with it. I tried 4MB, 20MB, and 128MB. There was no difference from 12MB with any of them.
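
For reference, trying each value per session amounts to something like:

SHOW work_mem;           -- 12MB on both servers
SET work_mem = '128MB';  -- also tried '4MB' and '20MB'
EXPLAIN ANALYZE
SELECT ... ;             -- same reporting query; the plan did not change
RESET work_mem;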

I have my default_statistics_target set to 300, and ran a VACUUM ANALYZE right after the upgrade to 9.6.1. He suggested I lower it, so I put it back down to 100, ran a VACUUM ANALYZE, and observed no change in the query. I also tried going the other way and set it to 1000, ran VACUUM ANALYZE, and again, no difference in the query.
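
The statistics changes were along these lines (shown here at the session level; changing the value in postgresql.conf and reloading behaves the same for subsequent ANALYZE runs):

SET default_statistics_target = 1000;  -- also tried 100; 300 was the existing value
VACUUM ANALYZE;                        -- rebuild planner statistics with the new target
-- then re-run EXPLAIN ANALYZE on the reporting query and compare the estimates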

Re: Query much slower after upgrade to 9.6.1

From: Tom Lane
Date:
Adam Brusselback <adambrusselback@gmail.com> writes:
>> If the problem is "new server won't use hashagg", I'd wonder whether
>> the work_mem setting is the same, or whether maybe you need to bump
>> it up some (the planner's estimate of how big the hashtable would be
>> might have changed a bit).

> I actually was speaking with Stephen Frost in the Slack channel, and tested
> both of those theories.

> The work_mem was the same between the two servers (12MB), but he suggested
> I play around with it. I tried 4MB, 20MB, and 128MB. There was no
> difference from 12MB with any of them.

> I have my default_statistics_target set to 300, and ran a VACUUM ANALYZE
> right after the upgrade to 9.6.1. He suggested I lower it, so I put it
> back down to 100, ran a VACUUM ANALYZE, and observed no change in the
> query. I also tried going the other way and set it to 1000, ran VACUUM
> ANALYZE, and again, no difference in the query.

Did you pay attention to the estimated number of groups (ie, estimated
output rowcount for the aggregation plan node) while fooling around with
the statistics?  How does it compare to reality, and to 9.5's estimate?

There were several different changes in the planner's number-of-distinct-
values estimation code in 9.6, so maybe the cause of the difference is
somewhere around there.

            regards, tom lane


Re: Query much slower after upgrade to 9.6.1

From: Adam Brusselback
Date:
> Did you pay attention to the estimated number of groups (ie, estimated
> output rowcount for the aggregation plan node) while fooling around with
> the statistics?  How does it compare to reality, and to 9.5's estimate?

I'm re-doing the tests and paying attention to that now.

With statistics = 100, the under/over estimations change only slightly; nothing that drastically alters anything: https://explain.depesz.com/s/GEWy
With statistics = 1000, it's pretty much the same as above: https://explain.depesz.com/s/6CWM

So between 9.5.5 and 9.6.1, none of the stats changed in a noticeable way. Changing the statistics target on 9.6.1 slightly altered the estimates, but nothing to write home about.
All of the plans show some significant deviations from actual row counts in the part of the query that is making it slow.
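
The distinct-value estimates the planner is working from for the grouping columns in the LATERAL aggregate can also be pulled straight out of pg_stats for comparison, e.g.:

SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE (tablename, attname) IN (('claim_product', 'product_id'),
                               ('company', 'company_name'),
                               ('company', 'company_id'));
-- grouping columns of the LATERAL aggregate; a negative n_distinct is a
-- fraction of the table's row count rather than an absolute count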