Home > mailing lists

Thread: Techniques to Avoid Temp Files

Techniques to Avoid Temp Files

From

Duane Murphy

Date:

18 June 2015, 19:39:07

We are trying to improve performance by avoiding the temp file creation.

LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604

STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type

FROM installed_item__item_detail AS iiid

INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id

INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )

INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

Our hypothesis is that the temp file creation is caused by the high row count of the

installed_item__item_detail table.

installed_item__item_detail: 72916824 rows (27 GB)

item_detail: 59212436 rows (40 GB)

The other two tables, item and model, are temporary tables created during this particular process. Unfortunately, I don't have those table sizes.

What are the causes of temp file creation? In general, temp files are created when the sort merge data will not fit in work_mem. What can I do to reduce the amount of data that is being merged? Is the simple fact that the tables have millions of rows going to cause a merge sort?

I noticed that this query selects from installed_item__item_detail instead of from item_detail which seems like it would also work. Would this change make a positive difference?

installed_item__item_detail is a simple join table. The installed_item__id side cannot be reduced. Would reducing the number of item_detail rows using additional joins benefit?

What additional information can I gather in order have a better understanding of how to improve this query?

(Unfortunately we do not have (easy) access to this particular database in order to experiment.)

...Duane

Background information:

=> select version();

version

--------------------------------------------------------------------------------------------------------------

PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit

$uname -a

Linux host.name.com 2.6.32-358.6.2.el6.x86_64 #1 SMP Thu May 16 20:59:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

=> select name, current_setting(name), source from pg_settings where source not in ('default', 'override');

name | current_setting | source

------------------------------+--------------------+----------------------

application_name | psql | client

checkpoint_completion_target | 0.9 | configuration file

checkpoint_segments | 128 | configuration file

client_encoding | UTF8 | client

DateStyle | ISO, MDY | configuration file

default_statistics_target | 100 | configuration file

default_text_search_config | pg_catalog.english | configuration file

effective_cache_size | 512MB | configuration file

lc_messages | en_US.UTF-8 | configuration file

lc_monetary | en_US.UTF-8 | configuration file

lc_numeric | en_US.UTF-8 | configuration file

lc_time | en_US.UTF-8 | configuration file

log_autovacuum_min_duration | 1s | configuration file

log_destination | stderr,syslog | configuration file

log_line_prefix | [%m]: | configuration file

log_min_duration_statement | 5min | configuration file

log_min_error_statement | notice | configuration file

log_rotation_age | 1d | configuration file

log_rotation_size | 0 | configuration file

log_temp_files | 1MB | configuration file

log_timezone | US/Pacific | environment variable

log_truncate_on_rotation | on | configuration file

logging_collector | on | configuration file

maintenance_work_mem | 384MB | configuration file

max_connections | 100 | configuration file

max_stack_depth | 2MB | environment variable

port | 5432 | command line

shared_buffers | 256MB | configuration file

syslog_facility | local0 | configuration file

TimeZone | US/Pacific | environment variable

wal_buffers | 1MB | configuration file

work_mem | 128MB | configuration file

(32 rows)

Re: Techniques to Avoid Temp Files

From

Albe Laurenz

Date:

19 June 2015, 08:17:22

Duane Murphy wrote:
> We are trying to improve performance by avoiding the temp file creation.
> 
> LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
> STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
> FROM installed_item__item_detail AS iiid
> INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
> INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
> INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

> What are the causes of temp file creation?

Operations like hash and sort that need more space than work_mem promises.

> What additional information can I gather in order have a better understanding of how to improve this
> query?

It woul be really useful to see the result of "EXPLAIN (ANALYZE, BUFFERS) SELECT ..."
for your query.

But essentially the answer to avoid temporary files is always "increase work_mem".

Yours,
Laurenz Albe

Re: Techniques to Avoid Temp Files

From

Jeff Janes

Date:

19 June 2015, 18:26:21

On Thu, Jun 18, 2015 at 12:38 PM, Duane Murphy <duane.murphy@gmail.com> wrote:

We are trying to improve performance by avoiding the temp file creation.

LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
FROM installed_item__item_detail AS iiid
INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
INNER JOIN model ON (item.id = model.item__id AND model.id = $1)

Our hypothesis is that the temp file creation is caused by the high row count of the
installed_item__item_detail table.

installed_item__item_detail: 72916824 rows (27 GB)
item_detail: 59212436 rows (40 GB)

The other two tables, item and model, are temporary tables created during this particular process. Unfortunately, I don't have those table sizes.

Those temporary tables aren't providing any output to the query, so their only role must be to restrict the rows returned by the permanent tables. If they restrict that by a lot, then it could do a nested loop over the temp tables, doing indexed queries against the permanent tables assuming you have the right indexes.

Temporary tables do not get analyzed automatically, so you should probably run ANALYZE on them explicitly before this big query.

What additional information can I gather in order have a better understanding of how to improve this query?

What indexes do the tables have? What is the output of EXPLAIN, or better yet EXPLAIN (ANALYZE,BUFFERS), for the query?

Cheers,

Jeff