Re: Bogus ANALYZE results for an otherwise-unique column with many nulls - Mailing list pgsql-hackers
From | Andreas Joseph Krogh |
---|---|
Subject | Re: Bogus ANALYZE results for an otherwise-unique column with many nulls |
Date | |
Msg-id | VisenaEmail.5b.208c70dd7d62e340.15659a64f41@tc7-visena Whole thread Raw |
In response to | Re: Bogus ANALYZE results for an otherwise-unique column with many nulls (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
På fredag 05. august 2016 kl. 01:01:06, skrev Tom Lane <tgl@sss.pgh.pa.us>:
I wrote:
> Looking around, there are a couple of places outside commands/analyze.c
> that are making the same mistake, so this patch isn't complete, but it
> illustrates what needs to be done.
Here's a more complete patch.
regards, tom lane
(Sorry for HTML-email, but we live in 2016, and I think it improves readability)
Great!
I have tested it (with PG-9.6 from HEAD, e7caacf733f3ee77a555aa715ab6fbf4434e6b52) and it sure looks like it fixes the problem for my query.
The query:
explain analyze
SELECT
log.relation_id as company_id , sum(log.duration) AS duration
FROM onp_crm_activity_log log
JOIN onp_crm_person logfor ON logfor.onp_user_id = log.logged_for AND logfor.is_resource = FALSE
WHERE 1 = 1
-- Filter out already invoiced
AND NOT EXISTS( SELECT * FROM onp_crm_calendarentry_invoice_membership cem JOIN onp_crm_invoice_line il ON cem.invoice_line_id = il.id JOIN onp_crm_invoice inv ON il.invoice_id = inv.entity_id WHERE cem.calendar_entry_id = log.id AND NOT EXISTS( SELECT * FROM onp_crm_invoice creditnote WHERE il.invoice_id = creditnote.credit_against )
)
GROUP BY
log.relation_id
;
Gave the following explain-plan (before the patch), with enable_nestloop=off (needed to prevent nest-loop-anti-join which caused this query to execute in ~26 min.):
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=25201.62..25220.06 rows=1475 width=36) (actual time=381.191..381.281 rows=404 loops=1) Group Key: log.relation_id -> Hash Anti Join (cost=4782.34..25148.28 rows=10667 width=12) (actual time=103.683..361.409 rows=148330 loops=1) Hash Cond: (log.id = cem.calendar_entry_id) -> Hash Join (cost=81.46..20312.73 rows=10667 width=20) (actual time=0.100..156.432 rows=380617 loops=1) Hash Cond: (log.logged_for = logfor.onp_user_id) -> Seq Scan on onp_crm_activity_log log (cost=0.00..18696.71 rows=380771 width=24) (actual time=0.005..49.195 rows=380774 loops=1) -> Hash (cost=25.02..25.02 rows=4515 width=8) (actual time=0.078..0.078 rows=119 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 69kB -> Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor (cost=0.14..25.02 rows=4515 width=8) (actual time=0.005..0.063 rows=119 loops=1) Filter: (NOT is_resource) Rows Removed by Filter: 8
-> Hash (cost=4700.87..4700.87 rows=1 width=4) (actual time=103.575..103.575 rows=232412 loops=1) Buckets: 262144 (originally 1024) Batches: 1 (originally 1) Memory Usage: 10219kB -> Hash Join (cost=474.41..4700.87 rows=1 width=4) (actual time=9.724..76.015 rows=232412 loops=1) Hash Cond: (cem.invoice_line_id = il.id) -> Seq Scan on onp_crm_calendarentry_invoice_membership cem (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.004..17.626 rows=232528 loops=1) -> Hash (cost=474.40..474.40 rows=1 width=4) (actual time=9.710..9.710 rows=11535 loops=1) Buckets: 16384 (originally 1024) Batches: 1 (originally 1) Memory Usage: 534kB -> Merge Join (cost=415.33..474.40 rows=1 width=4) (actual time=4.149..8.467 rows=11535 loops=1) Merge Cond: (il.invoice_id = inv.entity_id) -> Sort (cost=415.05..415.06 rows=1 width=8) (actual time=4.138..4.701 rows=11535 loops=1) Sort Key: il.invoice_id Sort Method: quicksort Memory: 925kB -> Hash Anti Join (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1) Hash Cond: (il.invoice_id = creditnote.credit_against) -> Seq Scan on onp_crm_invoice_line il (cost=0.00..294.30 rows=11630 width=8) (actual time=0.003..0.995 rows=11630 loops=1) -> Hash (cost=50.38..50.38 rows=2140 width=4) (actual time=0.247..0.247 rows=37 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 34kB -> Index Only Scan using origo_invoice_credit_against_idx on onp_crm_invoice creditnote (cost=0.28..50.38 rows=2140 width=4) (actual time=0.013..0.182 rows=2140 loops=1) Heap Fetches: 0
-> Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv (cost=0.28..53.98 rows=2140 width=8) (actual time=0.008..1.274 rows=11576 loops=1) Heap Fetches: 0
Planning time: 0.955 ms Execution time: 381.884 ms
(35 rows)
AFAIU, this is the problematic estimate: -> Hash Anti Join (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1)
Right?
now (after the patch, and without needing to disable nest_loop) gives the following explain-plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=44409.89..44428.47 rows=1486 width=36) (actual time=366.502..366.594 rows=404 loops=1) Group Key: log.relation_id -> Hash Anti Join (cost=10157.30..43650.11 rows=151956 width=12) (actual time=99.420..347.309 rows=148327 loops=1) Hash Cond: (log.id = cem.calendar_entry_id) -> Merge Join (cost=0.56..26729.34 rows=379984 width=20) (actual time=0.011..150.415 rows=380614 loops=1) Merge Cond: (logfor.onp_user_id = log.logged_for) -> Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor (cost=0.14..25.02 rows=4515 width=8) (actual time=0.006..0.030 rows=35 loops=1) Filter: (NOT is_resource) Rows Removed by Filter: 2
-> Index Scan using origo_activity_log_logged_for_idx on onp_crm_activity_log log (cost=0.42..21977.16 rows=380771 width=24) (actual time=0.003..69.406 rows=380623 loops=1) -> Hash (cost=7300.48..7300.48 rows=228500 width=4) (actual time=99.226..99.226 rows=232412 loops=1) Buckets: 262144 Batches: 1 Memory Usage: 10219kB -> Hash Join (cost=789.13..7300.48 rows=228500 width=4) (actual time=8.043..67.079 rows=232412 loops=1) Hash Cond: (cem.invoice_line_id = il.id) -> Seq Scan on onp_crm_calendarentry_invoice_membership cem (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.007..16.230 rows=232528 loops=1) -> Hash (cost=646.27..646.27 rows=11429 width=4) (actual time=8.003..8.003 rows=11535 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 534kB -> Merge Anti Join (cost=0.85..646.27 rows=11429 width=4) (actual time=0.021..6.393 rows=11535 loops=1) Merge Cond: (il.invoice_id = creditnote.credit_against) -> Merge Join (cost=0.57..613.99 rows=11630 width=8) (actual time=0.010..4.771 rows=11630 loops=1) Merge Cond: (inv.entity_id = il.invoice_id) -> Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv (cost=0.28..53.98 rows=2140 width=8) (actual time=0.006..0.318 rows=2140 loops=1) Heap Fetches: 0
-> Index Scan using onp_crm_invoice_line_invoice_id_lineno_key on onp_crm_invoice_line il (cost=0.29..409.28 rows=11630 width=8) (actual time=0.003..2.323 rows=11630 loops=1) -> Index Only Scan using onp_crm_invoice_credit_against_key on onp_crm_invoice creditnote (cost=0.28..47.98 rows=2140 width=4) (actual time=0.011..0.016 rows=38 loops=1) Heap Fetches: 0
Planning time: 0.931 ms Execution time: 366.710 ms
(28 rows)
And, for fun with max_parallel_workers_per_gather = 6
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=39788.73..39851.89 rows=1486 width=36) (actual time=234.289..234.747 rows=404 loops=1) Group Key: log.relation_id -> Sort (cost=39788.73..39799.88 rows=4458 width=36) (actual time=234.276..234.307 rows=1000 loops=1) Sort Key: log.relation_id Sort Method: quicksort Memory: 103kB -> Gather (cost=39054.15..39518.53 rows=4458 width=36) (actual time=233.012..234.110 rows=1000 loops=1) Workers Planned: 3
Workers Launched: 3
-> Partial HashAggregate (cost=38054.15..38072.73 rows=1486 width=36) (actual time=225.607..225.701 rows=250 loops=4) Group Key: log.relation_id -> Hash Join (cost=10238.19..37294.37 rows=151956 width=12) (actual time=163.060..219.652 rows=37082 loops=4) Hash Cond: (log.logged_for = logfor.onp_user_id) -> Hash Anti Join (cost=10156.73..28460.48 rows=152271 width=16) (actual time=162.618..212.351 rows=37121 loops=4) Hash Cond: (log.id = cem.calendar_entry_id) -> Parallel Seq Scan on onp_crm_activity_log log (cost=0.00..16117.29 rows=122829 width=24) (actual time=0.010..15.532 rows=95193 loops=4) -> Hash (cost=7300.48..7300.48 rows=228500 width=4) (actual time=161.270..161.270 rows=232412 loops=4) Buckets: 262144 Batches: 1 Memory Usage: 10219kB -> Hash Join (cost=789.13..7300.48 rows=228500 width=4) (actual time=18.276..112.034 rows=232412 loops=4) Hash Cond: (cem.invoice_line_id = il.id) -> Seq Scan on onp_crm_calendarentry_invoice_membership cem (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.045..26.061 rows=232528 loops=4) -> Hash (cost=646.27..646.27 rows=11429 width=4) (actual time=18.116..18.116 rows=11535 loops=4) Buckets: 16384 Batches: 1 Memory Usage: 534kB -> Merge Anti Join (cost=0.85..646.27 rows=11429 width=4) (actual time=0.179..14.505 rows=11535 loops=4) Merge Cond: (il.invoice_id = creditnote.credit_against) -> Merge Join (cost=0.57..613.99 rows=11630 width=8) (actual time=0.113..10.871 rows=11630 loops=4) Merge Cond: (inv.entity_id = il.invoice_id) -> Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv (cost=0.28..53.98 rows=2140 width=8) (actual time=0.063..0.811 rows=2140 loops=4) Heap Fetches: 0
-> Index Scan using onp_crm_invoice_line_invoice_id_lineno_key on onp_crm_invoice_line il (cost=0.29..409.28 rows=11630 width=8) (actual time=0.046..5.213 rows=11630 loops=4) -> Index Only Scan using onp_crm_invoice_credit_against_key on onp_crm_invoice creditnote (cost=0.28..47.98 rows=2140 width=4) (actual time=0.063..0.071 rows=38 loops=4) Heap Fetches: 0
-> Hash (cost=25.02..25.02 rows=4515 width=8) (actual time=0.325..0.325 rows=119 loops=4) Buckets: 8192 Batches: 1 Memory Usage: 69kB -> Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor (cost=0.14..25.02 rows=4515 width=8) (actual time=0.032..0.270 rows=119 loops=4) Filter: (NOT is_resource) Rows Removed by Filter: 8
Planning time: 2.443 ms Execution time: 239.979 ms
(38 rows)
All in all, thanks for looking into this, and +1 for backpatching to 9.5.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
pgsql-hackers by date: