Thread: More records after sort

More records after sort

From
Nicolas Seinlet
Date:
Hello,

I'm facing a strange behaviour and I don't understand why. From a wider query plan, https://explain.dalibo.com/plan/7yh ,a table is scanned and filter based on its primary key, returning 98 records. Then, those records are sorted on another field, and the output is 758,247,643 records.

What can make PostgreSQL to generate records while sorting? What can I do to understand why this is occuring?

                           ->  Sort  (cost=228.52..228.76 rows=98 width=16) (actual time=1.502..31540.651 rows=758247643 loops=1)
                                 Sort Key: i.currency_id
                                 Sort Method: quicksort  Memory: 29kB
                                 ->  Index Scan using account_invoice_pkey on account_invoice i  (cost=0.29..225.28 rows=98 width=16) (actual time=0.035..1.481 rows=98 loops=1)
                                       Index Cond: (id = ANY ('{3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,11736,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627}'::integer[]))


I can of course describe data model or full query.

Thanks again for reading me,

Nicolas.

Attachment

Re: More records after sort

From
Tom Lane
Date:
Nicolas Seinlet <nicolas@seinlet.com> writes:
> I'm facing a strange behaviour and I don't understand why. From a wider query
plan, https://explain.dalibo.com/plan/7yh ,atable is scanned and filter based on its primary key, returning 98 records.
Then,those records are sorted on another field, and the output is 758,247,643 records. 

You haven't showed us the whole plan; tsk tsk.

However, I'm going to bet that this sort is the inner input of a merge
join, and that the apparent "extra" tuples are caused by the merge backing
up and rescanning the sort result to deal with duplicate join keys in its
outer input.  There must be a heck of a lot of duplicates.  The planner
would typically avoid using merge join if it knew that, so I'm wondering
if you have up-to-date statistics for the outer side of the join.

            regards, tom lane



Re: More records after sort

From
Nicolas Seinlet
Date:
On Tuesday, July 20th, 2021 at 16:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Nicolas Seinlet nicolas@seinlet.com writes:
>

> > I'm facing a strange behaviour and I don't understand why. From a wider query
plan, https://explain.dalibo.com/plan/7yh ,atable is scanned and filter based on its primary key, returning 98 records.
Then,those records are sorted on another field, and the output is 758,247,643 records. 
>

> You haven't showed us the whole plan; tsk tsk.
>

> However, I'm going to bet that this sort is the inner input of a merge
>

> join, and that the apparent "extra" tuples are caused by the merge backing
>

> up and rescanning the sort result to deal with duplicate join keys in its
>

> outer input. There must be a heck of a lot of duplicates. The planner
>

> would typically avoid using merge join if it knew that, so I'm wondering
>

> if you have up-to-date statistics for the outer side of the join.
>

> regards, tom lane

Many thanks for the fast response. The full query plan is below, and the Dalibo link points to it as well. You're
right,it's a merge join. One table doesn't have up-to-date statistics, because the table is filled during the
transactioninvolving this query. 

A workaround we found, but we don't understand why and it's frustrating, is to remove a join from the query (the one
withcurrency table) and replace it with a new column in the table for which the sort generates records. the currency
tablehas up to date statistics, is vacuumed, ... 

The table with no up-to-date statistics is involved in a where clause:
AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id)
AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id)

Query:
INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond)
                SELECT il.id, ml.id, 48
                    FROM account_invoice_line il
                    JOIN account_invoice i ON i.id = il.invoice_id
                    JOIN account_move m ON m.id = i.move_id
                    JOIN account_move_line ml ON ml.move_id = m.id
                    JOIN res_company comp ON comp.id = i.company_id
                WHERE il.display_type IS NULL
                  AND ml.tax_line_id IS NULL
                  AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id)
                  AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id)
                  AND i.id =
ANY(ARRAY[3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,11736,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627])
                  AND il.account_id = ml.account_id AND (
             ARRAY(SELECT r.tax_id
                     FROM account_invoice_line_tax r
                    WHERE r.invoice_line_id = il.id
                 ORDER BY r.tax_id)
             =
             ARRAY(SELECT r.account_tax_id
                     FROM account_move_line_account_tax_rel r
                    WHERE r.account_move_line_id = ml.id
                 ORDER BY r.account_tax_id)
            ) AND
            ROUND(il.price_subtotal - ml._mig_124_precomputed_amount,
                  i.decimal_places) = 0.0
             AND il.product_id = ml.product_id;

Plan:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on invl_aml_mapping_temp  (cost=241.69..505.08 rows=1 width=12) (actual time=185635.849..185635.883 rows=0
loops=1)
   ->  Nested Loop Anti Join  (cost=241.69..505.08 rows=1 width=12) (actual time=185635.847..185635.880 rows=0 loops=1)
         ->  Nested Loop  (cost=241.27..504.61 rows=1 width=8) (actual time=185635.846..185635.873 rows=0 loops=1)
               ->  Nested Loop  (cost=241.15..504.43 rows=1 width=12) (actual time=185635.839..185635.866 rows=0
loops=1)
                     Join Filter: (i.move_id = m.id)
                     ->  Merge Join  (cost=240.73..503.98 rows=1 width=20) (actual time=185635.815..185635.836 rows=0
loops=1)
                           Merge Cond: (res_currency.id = i.currency_id)
                           Join Filter: ((il.invoice_id = i.id) AND (ml.move_id = i.move_id))
                           Rows Removed by Join Filter: 766064628
                           ->  Nested Loop  (cost=12.22..16793.68 rows=64 width=20) (actual time=7.908..91222.587
rows=39084931loops=1) 
                                 Join Filter: (round((il.price_subtotal - ml._mig_124_precomputed_amount),
res_currency.decimal_places)= 0.0) 
                                 Rows Removed by Join Filter: 2990678
                                 ->  Index Scan using res_currency_pkey on res_currency  (cost=0.14..14.27 rows=176
width=8)(actual time=0.015..0.254 rows=6 loops=1) 
                                 ->  Materialize  (cost=12.07..16522.64 rows=73 width=28) (actual time=0.386..13293.316
rows=7012602loops=6) 
                                       ->  Nested Loop  (cost=12.07..16522.27 rows=73 width=28) (actual
time=2.262..74119.980rows=8415119 loops=1) 
                                             ->  Nested Loop Anti Join  (cost=0.42..11037.09 rows=306 width=22) (actual
time=0.063..481.749rows=54864 loops=1) 
                                                   ->  Seq Scan on account_invoice_line il  (cost=0.00..9885.48
rows=752width=22) (actual time=0.049..121.238 rows=150448 loops=1) 
                                                         Filter: (display_type IS NULL)
                                                   ->  Index Only Scan using _upg_invl_aml_mapping_invl_id_17ac41c4868
oninvl_aml_mapping  (cost=0.42..2.26 rows=2 width=4) (actual time=0.002..0.002 rows=1 loops=150448) 
                                                         Index Cond: (invl_id = il.id)
                                                         Heap Fetches: 95944
                                             ->  Bitmap Heap Scan on account_move_line ml  (cost=11.65..17.92 rows=1
width=22)(actual time=0.565..1.322 rows=153 loops=54864) 
                                                   Recheck Cond: ((product_id = il.product_id) AND (account_id =
il.account_id))
                                                   Filter: ((tax_line_id IS NULL) AND ((SubPlan 1) = (SubPlan 2)))
                                                   Rows Removed by Filter: 0
                                                   Heap Blocks: exact=4597567
                                                   ->  BitmapAnd  (cost=11.65..11.65 rows=1 width=0) (actual
time=0.550..0.550rows=0 loops=54864) 
                                                         ->  Bitmap Index Scan on upgrade_fk_related_idx_42
(cost=0.00..1.33rows=15 width=0) (actual time=0.013..0.013 rows=161 loops=54864) 
                                                               Index Cond: (product_id = il.product_id)
                                                         ->  Bitmap Index Scan on account_move_line_account_id_index
(cost=0.00..10.07rows=1047 width=0) (actual time=0.968..0.968 rows=37775 loops=30031) 
                                                               Index Cond: (account_id = il.account_id)
                                                   SubPlan 1
                                                     ->  Index Only Scan using account_invoice_line_tax_pkey on
account_invoice_line_taxr  (cost=0.42..2.64 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=8425000) 
                                                           Index Cond: (invoice_line_id = il.id)
                                                           Heap Fetches: 0
                                                   SubPlan 2
                                                     ->  Index Only Scan using account_move_line_account_tax_rel_pkey
onaccount_move_line_account_tax_rel r_1  (cost=0.29..2.51 rows=1 width=4) (actual time=0.001..0.001 rows=0
loops=8425000)
                                                           Index Cond: (account_move_line_id = ml.id)
                                                           Heap Fetches: 1099
                           ->  Sort  (cost=228.52..228.76 rows=98 width=16) (actual time=1.502..31540.651
rows=758247643loops=1) 
                                 Sort Key: i.currency_id
                                 Sort Method: quicksort  Memory: 29kB
                                 ->  Index Scan using account_invoice_pkey on account_invoice i  (cost=0.29..225.28
rows=98width=16) (actual time=0.035..1.481 rows=98 loops=1) 
                                       Index Cond: (id = ANY
('{3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,1173

6,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627}'::integer[]))
                     ->  Index Only Scan using account_move_pkey on account_move m  (cost=0.41..0.44 rows=1 width=4)
(neverexecuted) 
                           Index Cond: (id = ml.move_id)
                           Heap Fetches: 0
               ->  Index Only Scan using res_company_pkey on res_company comp  (cost=0.12..0.18 rows=1 width=4) (never
executed)
                     Index Cond: (id = i.company_id)
                     Heap Fetches: 0
         ->  Index Only Scan using _upg_invl_aml_mapping_aml_id_17ac41c486f on invl_aml_mapping invl_aml_mapping_1
(cost=0.42..0.46rows=2 width=4) (never executed) 
               Index Cond: (aml_id = ml.id)
               Heap Fetches: 0
 Planning Time: 14.884 ms
 Execution Time: 185674.278 ms
(55 rows)

Thanks again,

Nicolas.
Attachment

Re: More records after sort

From
Tom Lane
Date:
Nicolas Seinlet <nicolas@seinlet.com> writes:
> Many thanks for the fast response. The full query plan is below, and the Dalibo link points to it as well. You're
right,it's a merge join. One table doesn't have up-to-date statistics, because the table is filled during the
transactioninvolving this query. 

Sure looks like incorporating an ANALYZE into the transaction would help
matters.  This misestimation:

>                                                    ->  Seq Scan on account_invoice_line il  (cost=0.00..9885.48
rows=752width=22) (actual time=0.049..121.238 rows=150448 loops=1) 
>                                                          Filter: (display_type IS NULL)

looks like it's really hurting you, and "ANALYZE account_invoice_line"
after loading that table ought to be enough to fix that.

            regards, tom lane



Re: More records after sort

From
Rob Sargent
Date:
Query:
INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond)
               SELECT il.id, ml.id, 48
                   FROM account_invoice_line il
                   JOIN account_invoice i ON i.id = il.invoice_id
                   JOIN account_move m ON m.id = i.move_id
                   JOIN account_move_line ml ON ml.move_id = m.id
                   JOIN res_company comp ON comp.id = i.company_id
               WHERE il.display_type IS NULL
                 AND ml.tax_line_id IS NULL
                 AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id)
                 AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id)
                 AND i.id

Out of curiosity, would combining those two NOT EXISTS in one help? Ever? 
i.e. WHERE invl_id=il.id or aml_id = ml.id


Re: More records after sort

From
Nicolas Seinlet
Date:
Thanks all, I'll try this on Monday when I go back to the office.

Nicolas.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, July 21st, 2021 at 02:55, Rob Sargent <robjsargent@gmail.com> wrote:

Query:
INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond)
               SELECT il.id, ml.id, 48
                   FROM account_invoice_line il
                   JOIN account_invoice i ON i.id = il.invoice_id
                   JOIN account_move m ON m.id = i.move_id
                   JOIN account_move_line ml ON ml.move_id = m.id
                   JOIN res_company comp ON comp.id = i.company_id
               WHERE il.display_type IS NULL
                 AND ml.tax_line_id IS NULL
                 AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id)
                 AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id)
                 AND i.id

Out of curiosity, would combining those two NOT EXISTS in one help? Ever? 
i.e. WHERE invl_id=il.id or aml_id = ml.id


Attachment