Thread: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk
UPDATE using 3 medium-sized tables causes runaway hash table and fills disk
From
Rowan Collins
Date:
Hi All,

I've come upon some very strange behaviour with an UPDATE query which causes Postgres to consume all the disk space on the server for no apparent reason.

Basically, I'm trying to run an UPDATE involving three medium-sized tables (~60k rows each), which generates a query plan with three Hash Joins. But when I try to execute it, Postgres appears to go into some kind of loop, gradually filling up the disk partition. After a long wait it responds with "ERROR: could not write to hash-join temporary file: No space left on device; SQL state: 53100"; the partition in question has 9.5GB free at the beginning of the query - that's a lot of hash file!

If I run "ANALYZE temp_fares_mappings;" - the table which is being Updated, and is the outermost in the query plan - the problem goes away *even though the Query Plan hasn't changed*.

I'm not entirely sure how to simplify the query and still reproduce the issue, but I've produced an anonymized data sample and SQL query at http://rwec.co.uk/x/break_postgres.zip - in this case, "Analyze" *does* change the query plan (as well as fixing the issue), but I can consistently reproduce the disk-filling behaviour using this sample on Postgres 9.0 and 9.2.

Note that it doesn't appear to matter if one or all of the tables are permanent, as I have been using various combinations for debugging, and always get the same behaviour. Trying to write a corresponding SELECT statement doesn't seem to generate the same issue, at least with the queries I've tried.

- The plan for the real query is here: http://explain.depesz.com/s/WTP
- Sample data and SQL query: http://rwec.co.uk/x/break_postgres.zip
- Query plan for sample data, without running Analyze, is here: http://explain.depesz.com/s/qsH
- Postgres version strings: "PostgreSQL 9.0.5 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.1, pie-0.4.5) 4.5.2, 64-bit" and "PostgreSQL 9.2.1 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (Gentoo Hardened 4.5.4 p1.0, pie-0.4.7) 4.5.4, 64-bit"

Does anyone have any idea what's going on here, and whether it is in fact a bug? It certainly feels like one...

Thanks for your time.
--
Rowan Collins [IMSoP]
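[The UPDATE itself is only in the zip, but from the plans quoted later in the thread it presumably has roughly the following shape. The SET target and the co.cob_offer_id source column are assumptions, not taken from the sample; everything else is reconstructed from the posted plans and Thom's SELECT below.]

    -- rough reconstruction of the problem query; SET target is assumed
    UPDATE test_mappings AS lfm
       SET cob_offer_id = co.cob_offer_id
      FROM test_low_fares  AS lf,
           test_cob_offers AS co
     WHERE lfm.low_fares_row_id     = lf.low_fares_row_id
       AND co.central_offer_id      = lfm.central_offer_id
       AND co.credential_set_id     = lfm.credential_set_id
       AND co.num_adults_searched   = lf.number_in_party
       AND co.num_children_searched = 0
       AND co.num_infants_searched  = 0
       AND co.supplier_code = 'AMA'
       AND lfm.cob_offer_id IS NULL;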
Re: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk
From
Thom Brown
Date:
On 26 April 2013 15:39, Rowan Collins <rowan.collins@gmail.com> wrote:
> Hi All,
>
> I've come upon some very strange behaviour with an UPDATE query which causes
> Postgres to consume all the disk space on the server for no apparent reason.
>
> Basically, I'm trying to run an UPDATE involving three medium-sized tables
> (~60k rows each), which generates a query plan with three Hash Joins. But
> when I try to execute it, Postgres appears to go into some kind of loop,
> gradually filling up the disk partition. After a long wait it responds with
> "ERROR: could not write to hash-join temporary file: No space left on
> device; SQL state: 53100"; the partition in question has 9.5GB free at the
> beginning of the query - that's a lot of hash file!
>
> If I run "ANALYZE temp_fares_mappings;" - the table which is being Updated,
> and is the outermost in the query plan - the problem goes away *even though
> the Query Plan hasn't changed*.
>
> I'm not entirely sure how to simplify the query and still reproduce the
> issue, but I've produced an anonymized data sample and SQL query at
> http://rwec.co.uk/x/break_postgres.zip In this case, "Analyze" *does* change
> the query plan (as well as fixing the issue), but I can consistently
> reproduce the disk-filling behaviour using this sample on Postgres 9.0 and 9.2.
>
> Note that it doesn't appear to matter if one or all of the tables are
> permanent, as I have been using various combinations for debugging, and
> always get the same behaviour. Trying to write a corresponding SELECT
> statement doesn't seem to generate the same issue, at least with the
> queries I've tried.
>
> - The plan for the real query is here: http://explain.depesz.com/s/WTP
> - Sample data and SQL query: http://rwec.co.uk/x/break_postgres.zip
> - Query plan for sample data, without running Analyze, is here:
>   http://explain.depesz.com/s/qsH
> - Postgres version strings: "PostgreSQL 9.0.5 on x86_64-pc-linux-gnu,
>   compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.1, pie-0.4.5)
>   4.5.2, 64-bit" and "PostgreSQL 9.2.1 on x86_64-pc-linux-gnu, compiled by
>   x86_64-pc-linux-gnu-gcc (Gentoo Hardened 4.5.4 p1.0, pie-0.4.7) 4.5.4,
>   64-bit"
>
> Does anyone have any idea what's going on here, and whether it is in fact a
> bug? It certainly feels like one...
> Thanks for your time.

You done gone broke Postgres.

The same occurs when converting your UPDATE query into a SELECT when I try it:

SELECT *
  FROM test_mappings AS LFM,
       test_low_fares AS LF
  JOIN test_cob_offers AS CO
    ON CO.num_adults_searched = LF.number_in_party
   AND CO.num_children_searched = 0
   AND CO.num_infants_searched = 0
 WHERE LFM.low_fares_row_id = LF.low_fares_row_id
   AND CO.central_offer_id = LFM.central_offer_id
   AND CO.credential_set_id = LFM.credential_set_id
   AND LFM.cob_offer_id IS NULL;

Well, something appears to be tangling up the executor.
--
Thom
Re: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk
From
Shaun Thomas
Date:
On 04/26/2013 09:39 AM, Rowan Collins wrote:
> If I run "ANALYZE temp_fares_mappings;" - the table which is being
> Updated, and is the outermost in the query plan - the problem goes
> away *even though the Query Plan hasn't changed*.

Oh, but it has. The query plan pre-analyze:

 Update on test_mappings lfm  (cost=1277.27..2482.22 rows=1 width=50)
   ->  Hash Join  (cost=1277.27..2482.22 rows=1 width=50)
         Hash Cond: ((lf.low_fares_row_id = lfm.low_fares_row_id) AND (co.central_offer_id = lfm.central_offer_id) AND (co.credential_set_id = lfm.credential_set_id))
         ->  Hash Join  (cost=335.68..1505.46 rows=293 width=44)
               Hash Cond: (lf.number_in_party = co.num_adults_searched)
               ->  Seq Scan on test_low_fares lf  (cost=0.00..946.80 rows=58680 width=22)
               ->  Hash  (cost=335.67..335.67 rows=1 width=30)
                     ->  Seq Scan on test_cob_offers co  (cost=0.00..335.67 rows=1 width=30)
                           Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))
         ->  Hash  (cost=936.39..936.39 rows=297 width=18)
               ->  Seq Scan on test_mappings lfm  (cost=0.00..936.39 rows=297 width=18)
                     Filter: (cob_offer_id IS NULL)

And post-analyze:

 Update on test_mappings lfm  (cost=2652.90..9219.07 rows=63362 width=50)
   ->  Hash Join  (cost=2652.90..9219.07 rows=63362 width=50)
         Hash Cond: ((lf.number_in_party = co.num_adults_searched) AND (lfm.central_offer_id = co.central_offer_id) AND (lfm.credential_set_id = co.credential_set_id))
         ->  Hash Join  (cost=1959.95..5358.02 rows=63362 width=36)
               Hash Cond: (lfm.low_fares_row_id = lf.low_fares_row_id)
               ->  Seq Scan on test_mappings lfm  (cost=0.00..976.62 rows=63362 width=18)
                     Filter: (cob_offer_id IS NULL)
               ->  Hash  (cost=923.98..923.98 rows=56398 width=22)
                     ->  Seq Scan on test_low_fares lf  (cost=0.00..923.98 rows=56398 width=22)
         ->  Hash  (cost=413.48..413.48 rows=15970 width=30)
               ->  Seq Scan on test_cob_offers co  (cost=0.00..413.48 rows=15970 width=30)
                     Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))

These are pretty drastically different. But... the first crippled my test system and started consuming vast resources, while the second executed in about 300ms. That's hard to believe with the row counts seen here unless it's turning it into some invisible Cartesian Product.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
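[A quick way to see whether the planner had any statistics at all to work with before and after the ANALYZE - these catalog queries are a suggested sanity check, not something posted in the thread:]

    -- per-column statistics; no rows at all for a never-analyzed table
    SELECT tablename, attname, n_distinct, null_frac
      FROM pg_stats
     WHERE tablename IN ('test_mappings', 'test_low_fares', 'test_cob_offers');

    -- last (auto)analyze times; last_autoanalyze stays NULL for temp
    -- tables because autovacuum never processes them
    SELECT relname, last_analyze, last_autoanalyze
      FROM pg_stat_user_tables
     WHERE relname IN ('test_mappings', 'test_low_fares', 'test_cob_offers');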
Re: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk
From
Tom Lane
Date:
Rowan Collins <rowan.collins@gmail.com> writes:
> I've come upon some very strange behaviour with an UPDATE query which
> causes Postgres to consume all the disk space on the server for no
> apparent reason.

The short answer to that is "analyze your data". Particularly when you're using temp tables, for which auto-analyze cannot help you.

The reason this plan is so disastrous is that the planner chooses to use lf.number_in_party = co.num_adults_searched as the inner join qual, after having used ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text)) as a scan qual. In the absence of any stats, neither choice is unreasonable. Given the actual data, however, *neither of those conditions has any selectivity whatsoever*, and so a join that's predicted to yield 293 rows is actually going to yield 900 million rows occupying something like 90GB. Even then you'd not have been seeing the data spilled to disk, except the planner also didn't know that cob_offer_id IS NULL is likewise totally without selectivity and thus underestimated the size of the upper hash table, so that (with default work_mem) the upper hash join ends up switching to batch mode and thus having to spill most of the 90GB to disk.

If you just stick in some ANALYZEs the results are way better:

 Hash Join  (cost=2777.90..9352.23 rows=63362 width=73) (actual time=38.902..183.116 rows=63362 loops=1)
   Hash Cond: ((lf.number_in_party = co.num_adults_searched) AND (lfm.central_offer_id = co.central_offer_id) AND (lfm.credential_set_id = co.credential_set_id))
   ->  Hash Join  (cost=1959.95..5358.02 rows=63362 width=37) (actual time=26.635..100.670 rows=63362 loops=1)
         Hash Cond: (lfm.low_fares_row_id = lf.low_fares_row_id)
         ->  Seq Scan on test_mappings lfm  (cost=0.00..976.62 rows=63362 width=17) (actual time=0.014..13.390 rows=63362 loops=1)
               Filter: (cob_offer_id IS NULL)
         ->  Hash  (cost=923.98..923.98 rows=56398 width=20) (actual time=26.590..26.590 rows=56398 loops=1)
               Buckets: 2048  Batches: 4  Memory Usage: 777kB
               ->  Seq Scan on test_low_fares lf  (cost=0.00..923.98 rows=56398 width=20) (actual time=0.008..11.484 rows=56398 loops=1)
   ->  Hash  (cost=413.48..413.48 rows=15970 width=36) (actual time=12.225..12.225 rows=15970 loops=1)
         Buckets: 2048  Batches: 2  Memory Usage: 566kB
         ->  Seq Scan on test_cob_offers co  (cost=0.00..413.48 rows=15970 width=36) (actual time=0.021..7.018 rows=15970 loops=1)
               Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))
 Total runtime: 187.090 ms

but there's really no way for the planner to select the right plan when some of the conditions in the query are selective and some are not and it has no stats to help it tell the difference.

			regards, tom lane
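[In practice that just means adding explicit ANALYZE steps to the import job after the tables are populated and before the UPDATE runs - a minimal sketch, using the sample-data table names:]

    -- after loading the three tables, and before the UPDATE:
    ANALYZE test_mappings;      -- temp tables are never auto-analyzed
    ANALYZE test_low_fares;
    ANALYZE test_cob_offers;

    -- with stats in place the planner can see that the join quals are
    -- nearly non-selective, and sizes its hash tables accordingly
    -- instead of spilling tens of gigabytes of hash-join temp files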
Re: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk
From
Rowan Collins
Date:
On 29/04/2013 17:04, Shaun Thomas wrote:
> On 04/26/2013 09:39 AM, Rowan Collins wrote:
>> If I run "ANALYZE temp_fares_mappings;" - the table which is being
>> Updated, and is the outermost in the query plan - the problem goes
>> away *even though the Query Plan hasn't changed*.
>
> Oh, but it has.

Hi Shaun,

Unfortunately the cut-down version of the data I put together to avoid showing confidential data doesn't show the same effect. (Basically, I removed most of the columns from the original tables that weren't referenced anywhere in the query.)

The actual import process I was working on when I first uncovered the issue resulted in this query plan:

 Update on temp_low_fares_mappings lfm  (cost=3780.90..10347.07 rows=63362 width=50)
   ->  Hash Join  (cost=3780.90..10347.07 rows=63362 width=50)
         Hash Cond: ((lf.number_in_party = co.num_adults_searched) AND (lfm.central_offer_id = co.central_offer_id) AND (lfm.credential_set_id = co.credential_set_id))
         ->  Hash Join  (cost=2809.95..6208.02 rows=63362 width=36)
               Hash Cond: (lfm.low_fares_row_id = lf.low_fares_row_id)
               ->  Seq Scan on temp_low_fares_mappings lfm  (cost=0.00..976.62 rows=63362 width=18)
                     Filter: (cob_offer_id IS NULL)
               ->  Hash  (cost=1773.98..1773.98 rows=56398 width=22)
                     ->  Seq Scan on csv_low_fares lf  (cost=0.00..1773.98 rows=56398 width=22)
         ->  Hash  (cost=691.48..691.48 rows=15970 width=30)
               ->  Seq Scan on offers co  (cost=0.00..691.48 rows=15970 width=30)
                     Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))

This is the same basic plan as the test case, but with the tables in a slightly different order (this has the offers table joined last, where the test data joins the mapping table last). Running "ANALYZE temp_low_fares_mappings" in this case led to no change at all in the query plan, but resolved the odd behaviour of the execution of that plan.

Frustratingly, it's quite hard to reproduce the exact circumstances of this due to the complex processing this is just one part of. It is possible that I messed something up during testing, but I am pretty sure the ANALYZE was fixing the execution itself, not just changing the query plan.

Regards,
Rowan Collins [IMSoP]
Re: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk
From
Shaun Thomas
Date:
On 04/29/2013 12:13 PM, Rowan Collins wrote:
> This is the same basic plan as the test case, but with the tables in a
> slightly different order (this has the offers table joined last, where
> the test data joins the mapping table last).

Tom already explained this much better than I could. But the key words here are "same basic plan" and "slightly different order." Slight differences in query plans can have a much larger impact than you might imagine, as your test case has clearly demonstrated. :)

What you basically created was a worst-case scenario for the planner. It takes certain shortcuts based on your WHERE clauses, and the data doesn't match them with anywhere near the estimated correlation. This often happens with nested loops. The thing that bit you is that this worst case is so large, it can't really be EXPLAIN ANALYZEd to see *where* it went wrong except by someone relatively intimately acquainted with the planner.

What would happen if the planner could calculate worst-case estimations for likely candidates, and if one could potentially be horrible (the equivalent of a Cartesian Product), use the next best? I'm not sure that's actually feasible, but you never know.

From my experience, it's not uncommon for WHERE clauses to actually do very little in regards to restricting the result set. In a couple of cases, I've had to use CTEs or our friend OFFSET 0 to insert an optimization fence (a sketch of both fences is below), because some WHERE clause is only removing 5 rows, but the planner is producing a drastically different plan that ends up being much slower. And that's on a fully analyzed table with stats cranked way up.

So yeah... analyze your temp tables. If you can find a case where that doesn't help, say something. The more examples we get of edge cases confusing the planner, the quicker they'll get addressed.

This is an awesome little test case, though. How dramatically it explodes really is something to behold.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
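[For anyone who hasn't met those fences, a minimal sketch of both forms on a made-up pair of tables - names here are illustrative only, not from the thread. On 9.0/9.2 a CTE is always materialized, so it acts as a fence too:]

    -- OFFSET 0 keeps the planner from flattening the subquery and
    -- reordering the join around the barely-selective WHERE clause
    SELECT o.*
      FROM (SELECT * FROM big_table WHERE barely_selective_flag OFFSET 0) AS b
      JOIN other_table AS o ON o.big_id = b.id;

    -- equivalent CTE form: the CTE is materialized first, then joined
    WITH b AS (
        SELECT * FROM big_table WHERE barely_selective_flag
    )
    SELECT o.*
      FROM b
      JOIN other_table AS o ON o.big_id = b.id;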