Re: Slow hash join performance with many batches - Mailing list pgsql-performance

From Tom Lane
Subject Re: Slow hash join performance with many batches
Date
Msg-id 28009.1433181492@sss.pgh.pa.us
Whole thread Raw
In response to Slow hash join performance with many batches  (Alex Adriaanse <alex@oseberg.io>)
List pgsql-performance
Alex Adriaanse <alex@oseberg.io> writes:
> I have several databases that have the same schema but different amounts of data in it (let's categorize these as
Small,Medium, and Large). We have a mammoth query with 13 CTEs that are LEFT JOINed against a main table. This query
takes<30 mins on the Small database, <2 hours to run on Large, but on the Medium database it takes in the vicinity of
14hours. 
> Running truss/strace on the backend process running this query on the Medium database reveals that for a big chunk of
thistime Postgres creates/reads/unlinks a very large quantity (millions?) of tiny files inside pgsql_tmp. I also ran an
EXPLAINANALYZE and am attaching the most time-consuming parts of the plan (with names redacted). Although I'm not too
familiarwith the internals of Postgres' Hash implementation, it seems that having over 4 million hash batches could be
what'scausing the problem. 

> I'm running PostgreSQL 9.3.5, and have work_mem set to 32MB.

I'd try using a significantly larger work_mem setting for this query,
so as to have fewer hash batches and more buckets per batch.

It might be unwise to raise your global work_mem setting, but perhaps
you could just do a "SET work_mem" within the session running the query.

Also, it looks like the planner is drastically overestimating the sizes
of the CTE outputs, which is contributing to selecting unreasonably large
numbers of batches.  If you could get those numbers closer to reality it'd
likely help.  Hard to opine further since no details about the CTEs were
provided.

            regards, tom lane


pgsql-performance by date:

Previous
From: Alex Adriaanse
Date:
Subject: Slow hash join performance with many batches
Next
From: Kevin Grittner
Date:
Subject: Re: Fastest way / best practice to calculate "next birthdays"