Re: [GENERAL] Query Using Massive Temp Space - Mailing list pgsql-general

From Tom Lane
Subject Re: [GENERAL] Query Using Massive Temp Space
Date
Msg-id 16625.1511199408@sss.pgh.pa.us
In response to [GENERAL] Query Using Massive Temp Space  (Cory Tucker <cory.tucker@gmail.com>)
Responses Re: [GENERAL] Query Using Massive Temp Space  (Cory Tucker <cory.tucker@gmail.com>)
List pgsql-general

Cory Tucker <cory.tucker@gmail.com> writes:
> I have a query that is using a tremendous amount of temp disk space given
> the overall size of the dataset.  I'd love for someone to try to explain
> what PG is doing and why it's using so much space for the query.

> First off, the system is PG 9.6 on Ubuntu with 4 cores and 28 GB of RAM.

9.6.what exactly?
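
If you're not sure, running

    SELECT version();

will show the exact minor release.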

> The query in question is a fairly large join of several tables (6)
> including some aggregations.  The overall dataset size of the 6 tables in
> question is about 20GB and the largest table is about 15M rows.  The query
> is essentially a dump of *most* of the data from these tables joined
> together to be used in another system.
> When the query runs it begins to use an aggressive amount of temp space on
> the volume over the course of many hours, until it reaches about 95%
> capacity and then tapers off.  Never completes though.  The temp space it
> uses is around 1.5TB out of a 2TB volume.  Again, the *total* size of the
> relations in question is only 20GB.

The only thing I can think of offhand that could create temp files far in
excess of the actual data volume is if a hash join repeatedly decides that
it needs to increase the number of hash batches.  We have seen that happen
in cases where an inner relation contains a vast number of identical hash
key values: they all go into the same bucket, and the executor keeps
splitting the batch trying to reduce its size, but being all identical
values it will never be the case that some rows go into a different batch
than others.  There is logic in there to give up splitting when this
happens, but maybe it's not firing in your case, or maybe somebody broke
it recently :-(.
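
To see why splitting can't help in that situation: the batch a row is assigned
to is a pure function of the hash of its join key, so rows with identical keys
stay together no matter how many batches we create.  A simplified illustration
(the executor really takes the batch number from bits of the hash value rather
than masking the low bits like this; hashint4() is the built-in hash function
for int4):

    SELECT nbatch, hashint4(42) & (nbatch - 1) AS batch_for_key_42
    FROM (VALUES (16), (32), (64), (128)) AS t(nbatch);

All rows with the same key get the same batch number at every split, so
doubling the batch count just relocates the whole pile without shrinking it.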

I find it suspicious that your plan involves two separate hash joins both
of which have a much larger table on the inside than the outside ---
offhand that does not look very sane.  So there's also a question of
why the planner did that.

What can you tell us about the distributions of the join keys for the two
hash joins --- are those keys unique, or a bit less than unique, or very
much not unique?
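
If you don't know offhand, a quick way to look (the table and column names here
are just placeholders for your actual inner-side relation and its join key):

    SELECT join_key, count(*) AS n
    FROM inner_table
    GROUP BY join_key
    ORDER BY n DESC
    LIMIT 20;

A handful of values accounting for millions of rows each would fit the
batch-splitting theory above.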

It would also be interesting to keep an eye on the temp files themselves.
How many are there, how large are they, how fast are new ones created?
If my theory above is right, it would likely manifest as there being a
lot of zero-size temp files from the wasted split attempts (and other
files being exactly the same size as the ones from the previous
iteration).
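
If you have superuser access, you can check that from SQL while the query is
running.  This assumes the default tablespace, whose temp files live under
base/pgsql_tmp; pg_ls_dir() and pg_stat_file() are superuser-only in 9.6, and
pg_ls_dir() will error out if the pgsql_tmp directory hasn't been created yet:

    SELECT name, (pg_stat_file('base/pgsql_tmp/' || name)).size AS bytes
    FROM pg_ls_dir('base/pgsql_tmp') AS name
    ORDER BY bytes DESC;
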
        regards, tom lane

