Thread: More records after sort
Hello,
I'm facing a strange behaviour and I don't understand why. From a wider query plan, https://explain.dalibo.com/plan/7yh ,a table is scanned and filter based on its primary key, returning 98 records. Then, those records are sorted on another field, and the output is 758,247,643 records.
What can make PostgreSQL to generate records while sorting? What can I do to understand why this is occuring?
-> Sort (cost=228.52..228.76 rows=98 width=16) (actual time=1.502..31540.651 rows=758247643 loops=1)
Sort Key: i.currency_id
Sort Method: quicksort Memory: 29kB
-> Index Scan using account_invoice_pkey on account_invoice i (cost=0.29..225.28 rows=98 width=16) (actual time=0.035..1.481 rows=98 loops=1)
Index Cond: (id = ANY ('{3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,11736,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627}'::integer[]))
I can of course describe data model or full query.
Thanks again for reading me,
Nicolas.
Attachment
Nicolas Seinlet <nicolas@seinlet.com> writes: > I'm facing a strange behaviour and I don't understand why. From a wider query plan, https://explain.dalibo.com/plan/7yh ,atable is scanned and filter based on its primary key, returning 98 records. Then,those records are sorted on another field, and the output is 758,247,643 records. You haven't showed us the whole plan; tsk tsk. However, I'm going to bet that this sort is the inner input of a merge join, and that the apparent "extra" tuples are caused by the merge backing up and rescanning the sort result to deal with duplicate join keys in its outer input. There must be a heck of a lot of duplicates. The planner would typically avoid using merge join if it knew that, so I'm wondering if you have up-to-date statistics for the outer side of the join. regards, tom lane
On Tuesday, July 20th, 2021 at 16:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nicolas Seinlet nicolas@seinlet.com writes: > > > I'm facing a strange behaviour and I don't understand why. From a wider query plan, https://explain.dalibo.com/plan/7yh ,atable is scanned and filter based on its primary key, returning 98 records. Then,those records are sorted on another field, and the output is 758,247,643 records. > > You haven't showed us the whole plan; tsk tsk. > > However, I'm going to bet that this sort is the inner input of a merge > > join, and that the apparent "extra" tuples are caused by the merge backing > > up and rescanning the sort result to deal with duplicate join keys in its > > outer input. There must be a heck of a lot of duplicates. The planner > > would typically avoid using merge join if it knew that, so I'm wondering > > if you have up-to-date statistics for the outer side of the join. > > regards, tom lane Many thanks for the fast response. The full query plan is below, and the Dalibo link points to it as well. You're right,it's a merge join. One table doesn't have up-to-date statistics, because the table is filled during the transactioninvolving this query. A workaround we found, but we don't understand why and it's frustrating, is to remove a join from the query (the one withcurrency table) and replace it with a new column in the table for which the sort generates records. the currency tablehas up to date statistics, is vacuumed, ... The table with no up-to-date statistics is involved in a where clause: AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id) AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id) Query: INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond) SELECT il.id, ml.id, 48 FROM account_invoice_line il JOIN account_invoice i ON i.id = il.invoice_id JOIN account_move m ON m.id = i.move_id JOIN account_move_line ml ON ml.move_id = m.id JOIN res_company comp ON comp.id = i.company_id WHERE il.display_type IS NULL AND ml.tax_line_id IS NULL AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id) AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id) AND i.id = ANY(ARRAY[3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,11736,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627]) AND il.account_id = ml.account_id AND ( ARRAY(SELECT r.tax_id FROM account_invoice_line_tax r WHERE r.invoice_line_id = il.id ORDER BY r.tax_id) = ARRAY(SELECT r.account_tax_id FROM account_move_line_account_tax_rel r WHERE r.account_move_line_id = ml.id ORDER BY r.account_tax_id) ) AND ROUND(il.price_subtotal - ml._mig_124_precomputed_amount, i.decimal_places) = 0.0 AND il.product_id = ml.product_id; Plan: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Insert on invl_aml_mapping_temp (cost=241.69..505.08 rows=1 width=12) (actual time=185635.849..185635.883 rows=0 loops=1) -> Nested Loop Anti Join (cost=241.69..505.08 rows=1 width=12) (actual time=185635.847..185635.880 rows=0 loops=1) -> Nested Loop (cost=241.27..504.61 rows=1 width=8) (actual time=185635.846..185635.873 rows=0 loops=1) -> Nested Loop (cost=241.15..504.43 rows=1 width=12) (actual time=185635.839..185635.866 rows=0 loops=1) Join Filter: (i.move_id = m.id) -> Merge Join (cost=240.73..503.98 rows=1 width=20) (actual time=185635.815..185635.836 rows=0 loops=1) Merge Cond: (res_currency.id = i.currency_id) Join Filter: ((il.invoice_id = i.id) AND (ml.move_id = i.move_id)) Rows Removed by Join Filter: 766064628 -> Nested Loop (cost=12.22..16793.68 rows=64 width=20) (actual time=7.908..91222.587 rows=39084931loops=1) Join Filter: (round((il.price_subtotal - ml._mig_124_precomputed_amount), res_currency.decimal_places)= 0.0) Rows Removed by Join Filter: 2990678 -> Index Scan using res_currency_pkey on res_currency (cost=0.14..14.27 rows=176 width=8)(actual time=0.015..0.254 rows=6 loops=1) -> Materialize (cost=12.07..16522.64 rows=73 width=28) (actual time=0.386..13293.316 rows=7012602loops=6) -> Nested Loop (cost=12.07..16522.27 rows=73 width=28) (actual time=2.262..74119.980rows=8415119 loops=1) -> Nested Loop Anti Join (cost=0.42..11037.09 rows=306 width=22) (actual time=0.063..481.749rows=54864 loops=1) -> Seq Scan on account_invoice_line il (cost=0.00..9885.48 rows=752width=22) (actual time=0.049..121.238 rows=150448 loops=1) Filter: (display_type IS NULL) -> Index Only Scan using _upg_invl_aml_mapping_invl_id_17ac41c4868 oninvl_aml_mapping (cost=0.42..2.26 rows=2 width=4) (actual time=0.002..0.002 rows=1 loops=150448) Index Cond: (invl_id = il.id) Heap Fetches: 95944 -> Bitmap Heap Scan on account_move_line ml (cost=11.65..17.92 rows=1 width=22)(actual time=0.565..1.322 rows=153 loops=54864) Recheck Cond: ((product_id = il.product_id) AND (account_id = il.account_id)) Filter: ((tax_line_id IS NULL) AND ((SubPlan 1) = (SubPlan 2))) Rows Removed by Filter: 0 Heap Blocks: exact=4597567 -> BitmapAnd (cost=11.65..11.65 rows=1 width=0) (actual time=0.550..0.550rows=0 loops=54864) -> Bitmap Index Scan on upgrade_fk_related_idx_42 (cost=0.00..1.33rows=15 width=0) (actual time=0.013..0.013 rows=161 loops=54864) Index Cond: (product_id = il.product_id) -> Bitmap Index Scan on account_move_line_account_id_index (cost=0.00..10.07rows=1047 width=0) (actual time=0.968..0.968 rows=37775 loops=30031) Index Cond: (account_id = il.account_id) SubPlan 1 -> Index Only Scan using account_invoice_line_tax_pkey on account_invoice_line_taxr (cost=0.42..2.64 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=8425000) Index Cond: (invoice_line_id = il.id) Heap Fetches: 0 SubPlan 2 -> Index Only Scan using account_move_line_account_tax_rel_pkey onaccount_move_line_account_tax_rel r_1 (cost=0.29..2.51 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=8425000) Index Cond: (account_move_line_id = ml.id) Heap Fetches: 1099 -> Sort (cost=228.52..228.76 rows=98 width=16) (actual time=1.502..31540.651 rows=758247643loops=1) Sort Key: i.currency_id Sort Method: quicksort Memory: 29kB -> Index Scan using account_invoice_pkey on account_invoice i (cost=0.29..225.28 rows=98width=16) (actual time=0.035..1.481 rows=98 loops=1) Index Cond: (id = ANY ('{3839,6912,7164,11026,6479,9973,3599,11306,4092,10652,2732,8625,198,10536,5876,7864,5498,13080,4660,8948,597,10762,12573,11613,12905,11483,11227,12876,4470,8954,5628,4386,9321,4398,11595,7125,8116,1231,1610,6828,2105,9658,1616,5735,9066,4631,51,1185,1173 6,9579,4303,9983,12759,3889,4667,11258,9423,11855,6555,3164,7597,5596,10396,620,10330,4082,926,3252,5375,8118,7141,3500,12065,2794,2367,12088,7454,7998,11439,8893,6304,10814,7723,4393,3901,3383,5770,5857,4746,1858,5320,4574,8314,11914,7930,7308,6228,3627}'::integer[])) -> Index Only Scan using account_move_pkey on account_move m (cost=0.41..0.44 rows=1 width=4) (neverexecuted) Index Cond: (id = ml.move_id) Heap Fetches: 0 -> Index Only Scan using res_company_pkey on res_company comp (cost=0.12..0.18 rows=1 width=4) (never executed) Index Cond: (id = i.company_id) Heap Fetches: 0 -> Index Only Scan using _upg_invl_aml_mapping_aml_id_17ac41c486f on invl_aml_mapping invl_aml_mapping_1 (cost=0.42..0.46rows=2 width=4) (never executed) Index Cond: (aml_id = ml.id) Heap Fetches: 0 Planning Time: 14.884 ms Execution Time: 185674.278 ms (55 rows) Thanks again, Nicolas.
Attachment
Nicolas Seinlet <nicolas@seinlet.com> writes: > Many thanks for the fast response. The full query plan is below, and the Dalibo link points to it as well. You're right,it's a merge join. One table doesn't have up-to-date statistics, because the table is filled during the transactioninvolving this query. Sure looks like incorporating an ANALYZE into the transaction would help matters. This misestimation: > -> Seq Scan on account_invoice_line il (cost=0.00..9885.48 rows=752width=22) (actual time=0.049..121.238 rows=150448 loops=1) > Filter: (display_type IS NULL) looks like it's really hurting you, and "ANALYZE account_invoice_line" after loading that table ought to be enough to fix that. regards, tom lane
Query:
INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond)
SELECT il.id, ml.id, 48
FROM account_invoice_line il
JOIN account_invoice i ON i.id = il.invoice_id
JOIN account_move m ON m.id = i.move_id
JOIN account_move_line ml ON ml.move_id = m.id
JOIN res_company comp ON comp.id = i.company_id
WHERE il.display_type IS NULL
AND ml.tax_line_id IS NULL
AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id)
AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id)
AND i.id =
Out of curiosity, would combining those two NOT EXISTS in one help? Ever?
Thanks all, I'll try this on Monday when I go back to the office.
Nicolas.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, July 21st, 2021 at 02:55, Rob Sargent <robjsargent@gmail.com> wrote:
Query:
INSERT INTO invl_aml_mapping_temp(invl_id, aml_id, cond)
SELECT il.id, ml.id, 48
FROM account_invoice_line il
JOIN account_invoice i ON i.id = il.invoice_id
JOIN account_move m ON m.id = i.move_id
JOIN account_move_line ml ON ml.move_id = m.id
JOIN res_company comp ON comp.id = i.company_id
WHERE il.display_type IS NULL
AND ml.tax_line_id IS NULL
AND NOT EXISTS (SELECT invl_id FROM invl_aml_mapping WHERE invl_id=il.id)
AND NOT EXISTS (SELECT aml_id FROM invl_aml_mapping WHERE aml_id=ml.id)
AND i.id =Out of curiosity, would combining those two NOT EXISTS in one help? Ever?