Re: Bogus ANALYZE results for an otherwise-unique column with many nulls - Mailing list pgsql-hackers

From Andreas Joseph Krogh
Subject Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
Date
Msg-id VisenaEmail.5b.208c70dd7d62e340.15659a64f41@tc7-visena
Whole thread Raw
In response to Re: Bogus ANALYZE results for an otherwise-unique column with many nulls  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
På fredag 05. august 2016 kl. 01:01:06, skrev Tom Lane <tgl@sss.pgh.pa.us>:
I wrote:
> Looking around, there are a couple of places outside commands/analyze.c
> that are making the same mistake, so this patch isn't complete, but it
> illustrates what needs to be done.

Here's a more complete patch.

regards, tom lane
 
(Sorry for HTML-email, but we live in 2016, and I think it improves readability)
 
Great!
 
I have tested it (with PG-9.6 from HEAD, e7caacf733f3ee77a555aa715ab6fbf4434e6b52) and it sure looks like it fixes the problem for my query.
 
The query:
 
explain analyze
SELECT
      log.relation_id as company_id    , sum(log.duration) AS duration
FROM onp_crm_activity_log log
    JOIN onp_crm_person logfor ON logfor.onp_user_id = log.logged_for AND logfor.is_resource = FALSE
WHERE 1 = 1
      -- Filter out already invoiced
      AND NOT EXISTS(    SELECT * FROM onp_crm_calendarentry_invoice_membership cem        JOIN onp_crm_invoice_line il ON cem.invoice_line_id = il.id        JOIN onp_crm_invoice inv ON il.invoice_id = inv.entity_id    WHERE cem.calendar_entry_id = log.id          AND NOT EXISTS(        SELECT * FROM onp_crm_invoice creditnote        WHERE il.invoice_id = creditnote.credit_against    )
)
GROUP BY
    log.relation_id
;
 
Gave the following explain-plan (before the patch), with enable_nestloop=off (needed to prevent nest-loop-anti-join which caused this query to execute in ~26 min.):
 
                                                                                                          QUERY PLAN                                                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=25201.62..25220.06 rows=1475 width=36) (actual time=381.191..381.281 rows=404 loops=1)   Group Key: log.relation_id   ->  Hash Anti Join  (cost=4782.34..25148.28 rows=10667 width=12) (actual time=103.683..361.409 rows=148330 loops=1)         Hash Cond: (log.id = cem.calendar_entry_id)         ->  Hash Join  (cost=81.46..20312.73 rows=10667 width=20) (actual time=0.100..156.432 rows=380617 loops=1)               Hash Cond: (log.logged_for = logfor.onp_user_id)               ->  Seq Scan on onp_crm_activity_log log  (cost=0.00..18696.71 rows=380771 width=24) (actual time=0.005..49.195 rows=380774 loops=1)               ->  Hash  (cost=25.02..25.02 rows=4515 width=8) (actual time=0.078..0.078 rows=119 loops=1)                     Buckets: 8192  Batches: 1  Memory Usage: 69kB                     ->  Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor  (cost=0.14..25.02 rows=4515 width=8) (actual time=0.005..0.063 rows=119 loops=1)                           Filter: (NOT is_resource)                           Rows Removed by Filter: 8
         ->  Hash  (cost=4700.87..4700.87 rows=1 width=4) (actual time=103.575..103.575 rows=232412 loops=1)               Buckets: 262144 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 10219kB               ->  Hash Join  (cost=474.41..4700.87 rows=1 width=4) (actual time=9.724..76.015 rows=232412 loops=1)                     Hash Cond: (cem.invoice_line_id = il.id)                     ->  Seq Scan on onp_crm_calendarentry_invoice_membership cem  (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.004..17.626 rows=232528 loops=1)                     ->  Hash  (cost=474.40..474.40 rows=1 width=4) (actual time=9.710..9.710 rows=11535 loops=1)                           Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 534kB                           ->  Merge Join  (cost=415.33..474.40 rows=1 width=4) (actual time=4.149..8.467 rows=11535 loops=1)                                 Merge Cond: (il.invoice_id = inv.entity_id)                                 ->  Sort  (cost=415.05..415.06 rows=1 width=8) (actual time=4.138..4.701 rows=11535 loops=1)                                       Sort Key: il.invoice_id                                       Sort Method: quicksort  Memory: 925kB                                       ->  Hash Anti Join  (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1)                                             Hash Cond: (il.invoice_id = creditnote.credit_against)                                             ->  Seq Scan on onp_crm_invoice_line il  (cost=0.00..294.30 rows=11630 width=8) (actual time=0.003..0.995 rows=11630 loops=1)                                             ->  Hash  (cost=50.38..50.38 rows=2140 width=4) (actual time=0.247..0.247 rows=37 loops=1)                                                   Buckets: 4096  Batches: 1  Memory Usage: 34kB                                                   ->  Index Only Scan using origo_invoice_credit_against_idx on onp_crm_invoice creditnote  (cost=0.28..50.38 rows=2140 width=4) (actual time=0.013..0.182 rows=2140 loops=1)                                                         Heap Fetches: 0
                                 ->  Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv  (cost=0.28..53.98 rows=2140 width=8) (actual time=0.008..1.274 rows=11576 loops=1)                                       Heap Fetches: 0
 Planning time: 0.955 ms Execution time: 381.884 ms
(35 rows)
 
AFAIU, this is the problematic estimate: -> Hash Anti Join (cost=77.13..415.04 rows=1 width=8) (actual time=0.257..2.716 rows=11535 loops=1)
Right?
 
now (after the patch, and without needing to disable nest_loop) gives the following explain-plan:
                                                                                                       QUERY PLAN                                                                                                       
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=44409.89..44428.47 rows=1486 width=36) (actual time=366.502..366.594 rows=404 loops=1)   Group Key: log.relation_id   ->  Hash Anti Join  (cost=10157.30..43650.11 rows=151956 width=12) (actual time=99.420..347.309 rows=148327 loops=1)         Hash Cond: (log.id = cem.calendar_entry_id)         ->  Merge Join  (cost=0.56..26729.34 rows=379984 width=20) (actual time=0.011..150.415 rows=380614 loops=1)               Merge Cond: (logfor.onp_user_id = log.logged_for)               ->  Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor  (cost=0.14..25.02 rows=4515 width=8) (actual time=0.006..0.030 rows=35 loops=1)                     Filter: (NOT is_resource)                     Rows Removed by Filter: 2
               ->  Index Scan using origo_activity_log_logged_for_idx on onp_crm_activity_log log  (cost=0.42..21977.16 rows=380771 width=24) (actual time=0.003..69.406 rows=380623 loops=1)         ->  Hash  (cost=7300.48..7300.48 rows=228500 width=4) (actual time=99.226..99.226 rows=232412 loops=1)               Buckets: 262144  Batches: 1  Memory Usage: 10219kB               ->  Hash Join  (cost=789.13..7300.48 rows=228500 width=4) (actual time=8.043..67.079 rows=232412 loops=1)                     Hash Cond: (cem.invoice_line_id = il.id)                     ->  Seq Scan on onp_crm_calendarentry_invoice_membership cem  (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.007..16.230 rows=232528 loops=1)                     ->  Hash  (cost=646.27..646.27 rows=11429 width=4) (actual time=8.003..8.003 rows=11535 loops=1)                           Buckets: 16384  Batches: 1  Memory Usage: 534kB                           ->  Merge Anti Join  (cost=0.85..646.27 rows=11429 width=4) (actual time=0.021..6.393 rows=11535 loops=1)                                 Merge Cond: (il.invoice_id = creditnote.credit_against)                                 ->  Merge Join  (cost=0.57..613.99 rows=11630 width=8) (actual time=0.010..4.771 rows=11630 loops=1)                                       Merge Cond: (inv.entity_id = il.invoice_id)                                       ->  Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv  (cost=0.28..53.98 rows=2140 width=8) (actual time=0.006..0.318 rows=2140 loops=1)                                             Heap Fetches: 0
                                       ->  Index Scan using onp_crm_invoice_line_invoice_id_lineno_key on onp_crm_invoice_line il  (cost=0.29..409.28 rows=11630 width=8) (actual time=0.003..2.323 rows=11630 loops=1)                                 ->  Index Only Scan using onp_crm_invoice_credit_against_key on onp_crm_invoice creditnote  (cost=0.28..47.98 rows=2140 width=4) (actual time=0.011..0.016 rows=38 loops=1)                                       Heap Fetches: 0
 Planning time: 0.931 ms Execution time: 366.710 ms
(28 rows)
 
And, for fun with max_parallel_workers_per_gather = 6
                                                                                                                   QUERY PLAN                                                                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=39788.73..39851.89 rows=1486 width=36) (actual time=234.289..234.747 rows=404 loops=1)   Group Key: log.relation_id   ->  Sort  (cost=39788.73..39799.88 rows=4458 width=36) (actual time=234.276..234.307 rows=1000 loops=1)         Sort Key: log.relation_id         Sort Method: quicksort  Memory: 103kB         ->  Gather  (cost=39054.15..39518.53 rows=4458 width=36) (actual time=233.012..234.110 rows=1000 loops=1)               Workers Planned: 3
               Workers Launched: 3
               ->  Partial HashAggregate  (cost=38054.15..38072.73 rows=1486 width=36) (actual time=225.607..225.701 rows=250 loops=4)                     Group Key: log.relation_id                     ->  Hash Join  (cost=10238.19..37294.37 rows=151956 width=12) (actual time=163.060..219.652 rows=37082 loops=4)                           Hash Cond: (log.logged_for = logfor.onp_user_id)                           ->  Hash Anti Join  (cost=10156.73..28460.48 rows=152271 width=16) (actual time=162.618..212.351 rows=37121 loops=4)                                 Hash Cond: (log.id = cem.calendar_entry_id)                                 ->  Parallel Seq Scan on onp_crm_activity_log log  (cost=0.00..16117.29 rows=122829 width=24) (actual time=0.010..15.532 rows=95193 loops=4)                                 ->  Hash  (cost=7300.48..7300.48 rows=228500 width=4) (actual time=161.270..161.270 rows=232412 loops=4)                                       Buckets: 262144  Batches: 1  Memory Usage: 10219kB                                       ->  Hash Join  (cost=789.13..7300.48 rows=228500 width=4) (actual time=18.276..112.034 rows=232412 loops=4)                                             Hash Cond: (cem.invoice_line_id = il.id)                                             ->  Seq Scan on onp_crm_calendarentry_invoice_membership cem  (cost=0.00..3354.28 rows=232528 width=8) (actual time=0.045..26.061 rows=232528 loops=4)                                             ->  Hash  (cost=646.27..646.27 rows=11429 width=4) (actual time=18.116..18.116 rows=11535 loops=4)                                                   Buckets: 16384  Batches: 1  Memory Usage: 534kB                                                   ->  Merge Anti Join  (cost=0.85..646.27 rows=11429 width=4) (actual time=0.179..14.505 rows=11535 loops=4)                                                         Merge Cond: (il.invoice_id = creditnote.credit_against)                                                         ->  Merge Join  (cost=0.57..613.99 rows=11630 width=8) (actual time=0.113..10.871 rows=11630 loops=4)                                                               Merge Cond: (inv.entity_id = il.invoice_id)                                                               ->  Index Only Scan using origo_invoice_id_status_sent_idx on onp_crm_invoice inv  (cost=0.28..53.98 rows=2140 width=8) (actual time=0.063..0.811 rows=2140 loops=4)                                                                     Heap Fetches: 0
                                                               ->  Index Scan using onp_crm_invoice_line_invoice_id_lineno_key on onp_crm_invoice_line il  (cost=0.29..409.28 rows=11630 width=8) (actual time=0.046..5.213 rows=11630 loops=4)                                                         ->  Index Only Scan using onp_crm_invoice_credit_against_key on onp_crm_invoice creditnote  (cost=0.28..47.98 rows=2140 width=4) (actual time=0.063..0.071 rows=38 loops=4)                                                               Heap Fetches: 0
                           ->  Hash  (cost=25.02..25.02 rows=4515 width=8) (actual time=0.325..0.325 rows=119 loops=4)                                 Buckets: 8192  Batches: 1  Memory Usage: 69kB                                 ->  Index Scan using onp_crm_person_onp_id_idx on onp_crm_person logfor  (cost=0.14..25.02 rows=4515 width=8) (actual time=0.032..0.270 rows=119 loops=4)                                       Filter: (NOT is_resource)                                       Rows Removed by Filter: 8
 Planning time: 2.443 ms Execution time: 239.979 ms
(38 rows)
 
All in all, thanks for looking into this, and +1 for backpatching to 9.5.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

pgsql-hackers by date:

Previous
From: Murat Tuncer
Date:
Subject: truncate trigger for foreign data wrappers
Next
From: "Tsunakawa, Takayuki"
Date:
Subject: Re: [RFC] Change the default of update_process_title to off