Thread: Techniques to Avoid Temp Files

Techniques to Avoid Temp Files

From
Duane Murphy
Date:
We are trying to improve performance by avoiding the temp file creation.

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
FROM installed_item__item_detail AS iiid
INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

Our hypothesis is that the temp file creation is caused by the high row count of the
installed_item__item_detail table.

installed_item__item_detail: 72916824 rows (27 GB)
item_detail:                 59212436 rows (40 GB)

The other two tables, item and model, are temporary tables created during this particular process. Unfortunately, I don't have those table sizes.

What are the causes of temp file creation? In general, temp files are created when the sort merge data will not fit in work_mem. What can I do to reduce the amount of data that is being merged? Is the simple fact that the tables have millions of rows going to cause a merge sort?

I noticed that this query selects from installed_item__item_detail instead of from item_detail which seems like it would also work. Would this change make a positive difference?

installed_item__item_detail is a simple join table. The installed_item__id side cannot be reduced.  Would reducing the number of item_detail rows using additional joins benefit?

What additional information can I gather in order have a better understanding of how to improve this query?

(Unfortunately we do not have (easy) access to this particular database in order to experiment.)

 ...Duane

Background information:

=> select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit

$uname -a
Linux host.name.com 2.6.32-358.6.2.el6.x86_64 #1 SMP Thu May 16 20:59:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

=> select name, current_setting(name), source from pg_settings where source not in ('default', 'override');
             name             |  current_setting   |        source
------------------------------+--------------------+----------------------
 application_name             | psql               | client
 checkpoint_completion_target | 0.9                | configuration file
 checkpoint_segments          | 128                | configuration file
 client_encoding              | UTF8               | client
 DateStyle                    | ISO, MDY           | configuration file
 default_statistics_target    | 100                | configuration file
 default_text_search_config   | pg_catalog.english | configuration file
 effective_cache_size         | 512MB              | configuration file
 lc_messages                  | en_US.UTF-8        | configuration file
 lc_monetary                  | en_US.UTF-8        | configuration file
 lc_numeric                   | en_US.UTF-8        | configuration file
 lc_time                      | en_US.UTF-8        | configuration file
 log_autovacuum_min_duration  | 1s                 | configuration file
 log_destination              | stderr,syslog      | configuration file
 log_line_prefix              | [%m]:              | configuration file
 log_min_duration_statement   | 5min               | configuration file
 log_min_error_statement      | notice             | configuration file
 log_rotation_age             | 1d                 | configuration file
 log_rotation_size            | 0                  | configuration file
 log_temp_files               | 1MB                | configuration file
 log_timezone                 | US/Pacific         | environment variable
 log_truncate_on_rotation     | on                 | configuration file
 logging_collector            | on                 | configuration file
 maintenance_work_mem         | 384MB              | configuration file
 max_connections              | 100                | configuration file
 max_stack_depth              | 2MB                | environment variable
 port                         | 5432               | command line
 shared_buffers               | 256MB              | configuration file
 syslog_facility              | local0             | configuration file
 TimeZone                     | US/Pacific         | environment variable
 wal_buffers                  | 1MB                | configuration file
 work_mem                     | 128MB              | configuration file
(32 rows)

Re: Techniques to Avoid Temp Files

From
Albe Laurenz
Date:
Duane Murphy wrote:
> We are trying to improve performance by avoiding the temp file creation.
> 
> LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
> STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
> FROM installed_item__item_detail AS iiid
> INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
> INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
> INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

> What are the causes of temp file creation?

Operations like hash and sort that need more space than work_mem promises.

> What additional information can I gather in order have a better understanding of how to improve this
> query?

It woul be really useful to see the result of "EXPLAIN (ANALYZE, BUFFERS) SELECT ..."
for your query.

But essentially the answer to avoid temporary files is always "increase work_mem".

Yours,
Laurenz Albe

Re: Techniques to Avoid Temp Files

From
Jeff Janes
Date:
On Thu, Jun 18, 2015 at 12:38 PM, Duane Murphy <duane.murphy@gmail.com> wrote:
We are trying to improve performance by avoiding the temp file creation.

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
FROM installed_item__item_detail AS iiid
INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

Our hypothesis is that the temp file creation is caused by the high row count of the
installed_item__item_detail table.

installed_item__item_detail: 72916824 rows (27 GB)
item_detail:                 59212436 rows (40 GB)

The other two tables, item and model, are temporary tables created during this particular process. Unfortunately, I don't have those table sizes.

Those temporary tables aren't providing any output to the query, so their only role must be to restrict the rows returned by the permanent tables.  If they restrict that by a lot, then it could do a nested loop over the temp tables, doing indexed queries against the permanent tables assuming you have the right indexes.

Temporary tables do not get analyzed automatically, so you should probably run ANALYZE on them explicitly before this big query.
 

What additional information can I gather in order have a better understanding of how to improve this query?

What indexes do the tables have?  What is the output of EXPLAIN, or better yet EXPLAIN (ANALYZE,BUFFERS), for the query?
 
Cheers,

Jeff