Thread: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk

UPDATE using 3 medium-sized tables causes runaway hash table and fills disk

From: Rowan Collins
Hi All,

I've come upon some very strange behaviour with an UPDATE query which
causes Postgres to consume all the disk space on the server for no
apparent reason.

Basically, I'm trying to run an UPDATE involving three medium-sized
tables (~60k rows each), which generates a query plan with three Hash
Joins. But when I try to execute it, Postgres appears to go into some
kind of loop, gradually filling up the disk partition. After a long wait
it responds with "ERROR: could not write to hash-join temporary file: No
space left on device; SQL state: 53100"; the partition in question has
9.5GB free at the beginning of the query - that's a lot of hash file!

If I run "ANALYZE temp_fares_mappings;" - the table which is being
Updated, and is the outermost in the query plan - the problem goes away
*even though the Query Plan hasn't changed*.

I'm not entirely sure how to simplify the query and still reproduce the
issue, but I've produced an anonymized data sample and SQL query at
http://rwec.co.uk/x/break_postgres.zip. In this case, "Analyze" *does*
change the query plan (as well as fixing the issue), but I can
consistently reproduce the disk-filling behaviour using this sample on
Postgres 9.0 and 9.2.

Note that it doesn't appear to matter if one or all of the tables are
permanent, as I have been using various combinations for debugging, and
always get the same behaviour. Trying to write a corresponding SELECT
statement doesn't seem to generate the same issue, at least with the
queries I've tried.
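
In case it's useful without downloading the sample: the UPDATE is
essentially of this shape (using the anonymized table and column names;
the column assigned in the SET clause is illustrative rather than the
exact one in the sample):

UPDATE test_mappings AS LFM
SET    cob_offer_id = CO.cob_offer_id   -- illustrative target column
FROM   test_low_fares  AS LF,
       test_cob_offers AS CO
WHERE  LFM.low_fares_row_id     = LF.low_fares_row_id
AND    CO.central_offer_id      = LFM.central_offer_id
AND    CO.credential_set_id     = LFM.credential_set_id
AND    CO.num_adults_searched   = LF.number_in_party
AND    CO.num_children_searched = 0
AND    CO.num_infants_searched  = 0
AND    CO.supplier_code         = 'AMA'
AND    LFM.cob_offer_id IS NULL;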

- The plan for the real query is here: http://explain.depesz.com/s/WTP
- Sample data and SQL query: http://rwec.co.uk/x/break_postgres.zip
- Query plan for sample data, without running Analyze, is here:
http://explain.depesz.com/s/qsH
- Postgres version strings: "PostgreSQL 9.0.5 on x86_64-pc-linux-gnu,
compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.1, pie-0.4.5)
4.5.2, 64-bit" and "PostgreSQL 9.2.1 on x86_64-pc-linux-gnu, compiled by
x86_64-pc-linux-gnu-gcc (Gentoo Hardened 4.5.4 p1.0, pie-0.4.7) 4.5.4,
64-bit"

Does anyone have any idea what's going on here, and whether it is in
fact a bug? It certainly feels like one...
Thanks for your time.
--
Rowan Collins
[IMSoP]


On 26 April 2013 15:39, Rowan Collins <rowan.collins@gmail.com> wrote:
> Hi All,
>
> I've come upon some very strange behaviour with an UPDATE query which causes
> Postgres to consume all the disk space on the server for no apparent reason.
>
> Basically, I'm trying to run an UPDATE involving three medium-sized tables
> (~60k rows each), which generates a query plan with three Hash Joins. But
> when I try to execute it, Postgres appears to go into some kind of loop,
> gradually filling up the disk partition. After a long wait it responds with
> "ERROR: could not write to hash-join temporary file: No space left on
> device; SQL state: 53100"; the partition in question has 9.5GB free at the
> beginning of the query - that's a lot of hash file!
>
> Note that it doesn't appear to matter if one or all of the tables are
> permanent, as I have been using various combinations for debugging, and
> always get the same behaviour. Trying to write a corresponding SELECT
> statement doesn't seem to generate the same issue, at least with the
> queries I've tried.

You done gone broke Postgres.  The same occurs for me when I convert
your UPDATE query into a SELECT:

SELECT
  *
FROM
  test_mappings AS LFM, test_low_fares AS LF
JOIN
  test_cob_offers AS CO
  ON
    CO.num_adults_searched   = LF.number_in_party
  AND
    CO.num_children_searched = 0
  AND
    CO.num_infants_searched  = 0
WHERE
  LFM.low_fares_row_id     = LF.low_fares_row_id
AND
  CO.central_offer_id  = LFM.central_offer_id
AND
  CO.credential_set_id = LFM.credential_set_id
AND
  LFM.cob_offer_id IS NULL;

Well, something appears to be tangling up the executor.
--
Thom


On 04/26/2013 09:39 AM, Rowan Collins wrote:

> If I run "ANALYZE temp_fares_mappings;" - the table which is being
> Updated, and is the outermost in the query plan - the problem goes
> away *even though the Query Plan hasn't changed*.

Oh, but it has.

The query plan pre-analyze:

  Update on test_mappings lfm  (cost=1277.27..2482.22 rows=1 width=50)
    ->  Hash Join  (cost=1277.27..2482.22 rows=1 width=50)
          Hash Cond: ((lf.low_fares_row_id = lfm.low_fares_row_id) AND (co.central_offer_id = lfm.central_offer_id) AND (co.credential_set_id = lfm.credential_set_id))
          ->  Hash Join  (cost=335.68..1505.46 rows=293 width=44)
                Hash Cond: (lf.number_in_party = co.num_adults_searched)
                ->  Seq Scan on test_low_fares lf  (cost=0.00..946.80 rows=58680 width=22)
                ->  Hash  (cost=335.67..335.67 rows=1 width=30)
                      ->  Seq Scan on test_cob_offers co  (cost=0.00..335.67 rows=1 width=30)
                            Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))
          ->  Hash  (cost=936.39..936.39 rows=297 width=18)
                ->  Seq Scan on test_mappings lfm  (cost=0.00..936.39 rows=297 width=18)
                      Filter: (cob_offer_id IS NULL)


And post-analyze:

  Update on test_mappings lfm  (cost=2652.90..9219.07 rows=63362 width=50)
    ->  Hash Join  (cost=2652.90..9219.07 rows=63362 width=50)
          Hash Cond: ((lf.number_in_party = co.num_adults_searched) AND (lfm.central_offer_id = co.central_offer_id) AND (lfm.credential_set_id = co.credential_set_id))
          ->  Hash Join  (cost=1959.95..5358.02 rows=63362 width=36)
                Hash Cond: (lfm.low_fares_row_id = lf.low_fares_row_id)
                ->  Seq Scan on test_mappings lfm  (cost=0.00..976.62 rows=63362 width=18)
                      Filter: (cob_offer_id IS NULL)
                ->  Hash  (cost=923.98..923.98 rows=56398 width=22)
                      ->  Seq Scan on test_low_fares lf  (cost=0.00..923.98 rows=56398 width=22)
          ->  Hash  (cost=413.48..413.48 rows=15970 width=30)
                ->  Seq Scan on test_cob_offers co  (cost=0.00..413.48 rows=15970 width=30)
                      Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))


These are pretty drastically different. But... the first crippled my
test system and started consuming vast resources, while the second
executed in about 300ms. That's hard to believe with the row counts seen
here unless it's turning it into some invisible Cartesian Product.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Rowan Collins <rowan.collins@gmail.com> writes:
> I've come upon some very strange behaviour with an UPDATE query which
> causes Postgres to consume all the disk space on the server for no
> apparent reason.

The short answer to that is "analyze your data".  Particularly when
you're using temp tables, for which auto-analyze cannot help you.
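
That is, something along these lines in whatever script builds and
loads the temp table, right before the big UPDATE runs:

-- autovacuum never analyzes temp tables, so collect stats by hand
-- once the table has been populated:
ANALYZE temp_fares_mappings;
-- and likewise for any other freshly loaded table in the join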

The reason this plan is so disastrous is that the planner chooses to use
lf.number_in_party = co.num_adults_searched as the inner join qual,
after having used ((num_children_searched = 0) AND (num_infants_searched
= 0) AND (supplier_code = 'AMA'::text)) as a scan qual.  In the absence
of any stats, neither choice is unreasonable.  Given the actual data,
however, *neither of those conditions has any selectivity whatsoever*,
and so a join that's predicted to yield 293 rows is actually going to
yield 900 million rows occupying something like 90GB.  Even then you'd
not have been seeing the data spilled to disk, except the planner also
didn't know that cob_offer_id IS NULL is likewise totally without
selectivity and thus underestimated the size of the upper hash table,
so that (with default work_mem) the upper hash join ends up switching
to batch mode and thus having to spill most of the 90GB to disk.
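
(For scale: with essentially no selectivity in either qual, that inner
join is close to a cross join of the two scans, i.e. roughly
56,398 x 15,970 ~= 9e8 rows, and at ~100 bytes per joined row that is
indeed on the order of 90GB.)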

If you just stick in some ANALYZEs the results are way better:

 Hash Join  (cost=2777.90..9352.23 rows=63362 width=73) (actual time=38.902..183.116 rows=63362 loops=1)
   Hash Cond: ((lf.number_in_party = co.num_adults_searched) AND (lfm.central_offer_id = co.central_offer_id) AND (lfm.credential_set_id = co.credential_set_id))
   ->  Hash Join  (cost=1959.95..5358.02 rows=63362 width=37) (actual time=26.635..100.670 rows=63362 loops=1)
         Hash Cond: (lfm.low_fares_row_id = lf.low_fares_row_id)
         ->  Seq Scan on test_mappings lfm  (cost=0.00..976.62 rows=63362 width=17) (actual time=0.014..13.390 rows=63362 loops=1)
               Filter: (cob_offer_id IS NULL)
         ->  Hash  (cost=923.98..923.98 rows=56398 width=20) (actual time=26.590..26.590 rows=56398 loops=1)
               Buckets: 2048  Batches: 4  Memory Usage: 777kB
               ->  Seq Scan on test_low_fares lf  (cost=0.00..923.98 rows=56398 width=20) (actual time=0.008..11.484 rows=56398 loops=1)
   ->  Hash  (cost=413.48..413.48 rows=15970 width=36) (actual time=12.225..12.225 rows=15970 loops=1)
         Buckets: 2048  Batches: 2  Memory Usage: 566kB
         ->  Seq Scan on test_cob_offers co  (cost=0.00..413.48 rows=15970 width=36) (actual time=0.021..7.018 rows=15970 loops=1)
               Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))
 Total runtime: 187.090 ms

but there's really no way for the planner to select the right plan
when some of the conditions in the query are selective and some are
not and it has no stats to help it tell the difference.
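
A quick way to see whether the planner has anything to go on is to look
at pg_stats; a table that has just been created and loaded, and never
analyzed, simply has no rows there:

SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename IN ('test_mappings', 'test_low_fares', 'test_cob_offers');
-- no rows back for a table means the planner is guessing from scratch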

            regards, tom lane


On 29/04/2013 17:04, Shaun Thomas wrote:
 > On 04/26/2013 09:39 AM, Rowan Collins wrote:
 >> If I run "ANALYZE temp_fares_mappings;" - the table which is being
 >> Updated, and is the outermost in the query plan - the problem goes
 >> away *even though the Query Plan hasn't changed*.
 >
 > Oh, but it has.

Hi Shaun,

Unfortunately the cut-down version of the data I put together to avoid
showing confidential data doesn't show the same effect. (Basically, I
removed most of the columns from the original tables that weren't
referenced anywhere in the query.) The actual import process I was
working on when I first uncovered the issue resulted in this query plan:

Update on temp_low_fares_mappings lfm  (cost=3780.90..10347.07 rows=63362 width=50)
   ->  Hash Join  (cost=3780.90..10347.07 rows=63362 width=50)
         Hash Cond: ((lf.number_in_party = co.num_adults_searched) AND (lfm.central_offer_id = co.central_offer_id) AND (lfm.credential_set_id = co.credential_set_id))
         ->  Hash Join  (cost=2809.95..6208.02 rows=63362 width=36)
               Hash Cond: (lfm.low_fares_row_id = lf.low_fares_row_id)
               ->  Seq Scan on temp_low_fares_mappings lfm  (cost=0.00..976.62 rows=63362 width=18)
                     Filter: (cob_offer_id IS NULL)
               ->  Hash  (cost=1773.98..1773.98 rows=56398 width=22)
                     ->  Seq Scan on csv_low_fares lf  (cost=0.00..1773.98 rows=56398 width=22)
         ->  Hash  (cost=691.48..691.48 rows=15970 width=30)
               ->  Seq Scan on offers co  (cost=0.00..691.48 rows=15970 width=30)
                     Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))


This is the same basic plan as the test case, but with the tables in a
slightly different order (this one joins the offers table last, whereas
the test data joins the mapping table last). Running ANALYZE
temp_low_fares_mappings in this case led to no change at all in the
query plan, but it did resolve the odd behaviour when executing that plan.

Frustratingly, it's quite hard to reproduce the exact circumstances of
this, because of the complex processing that it's just one part of. It is
possible that I messed something up during testing, but I am pretty sure
the ANALYZE was fixing the execution itself, not just changing the query
plan.

Regards,
Rowan Collins
[IMSoP]


On 04/29/2013 12:13 PM, Rowan Collins wrote:

> This is the same basic plan as the test case, but with the tables in a
> slightly different order (this has the offers table joined last, where
> the test data joins the mapping table last).

Tom already explained this much better than I could. But the key words
here are "same basic plan" and "slightly different order."

Slight differences in query plans can have a much larger impact than
you might imagine, as your test case has clearly demonstrated. :)

What you basically created was a worst-case scenario for the planner.
It takes certain shortcuts based on your WHERE clauses, and the data
doesn't match them with anywhere near the estimated correlation. This
often happens with nested loops. The thing that bit you is that this
worst case is so large it can't really be EXPLAIN ANALYZEd to see
*where* it went wrong, except by someone relatively intimately
acquainted with the planner.

What would happen if the planner could calculate worst-case estimates
for likely candidate plans and, if one could turn out to be horrible
(the equivalent of a Cartesian product), use the next best? I'm not sure
that's actually feasible, but you never know.

From my experience, it's not uncommon for WHERE clauses to actually do
very little to restrict the result set. In a couple of cases, I've had
to use CTEs or our friend OFFSET 0 to insert an optimization fence,
because some WHERE clause was only removing 5 rows but the planner was
producing a drastically different plan that ended up being much slower.
And that's on a fully analyzed table with stats cranked way up.
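
In case the OFFSET 0 trick is unfamiliar: a subquery containing a LIMIT
or OFFSET can't be flattened into the outer query, so wrapping part of
the query in "... OFFSET 0" pins that part down. A sketch, with made-up
table names:

SELECT o.*, l.*
FROM (
    SELECT *
    FROM   orders
    WHERE  status = 'open'
    OFFSET 0          -- fence: planner must evaluate this subquery as-is
) o
JOIN order_lines l ON l.order_id = o.order_id;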

So yeah... analyze your temp tables. If you can find a case where that
doesn't help, say something. The more examples we get of edge cases
confusing the planner, the quicker they'll get addressed.

This is an awesome little test case, though. How dramatically it
explodes really is something to behold.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com
