Thread: Techniques to Avoid Temp Files
We are trying to improve performance by avoiding the temp file creation.
LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type
FROM installed_item__item_detail AS iiid
INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )
INNER JOIN model ON (item.id = model.item__id AND model.id = $1)
Our hypothesis is that the temp file creation is caused by the high row count of the
installed_item__item_detail table.
installed_item__item_detail: 72916824 rows (27 GB)
item_detail: 59212436 rows (40 GB)
The other two tables, item and model, are temporary tables created during this particular process. Unfortunately, I don't have those table sizes.
What are the causes of temp file creation? In general, temp files are created when the sort merge data will not fit in work_mem. What can I do to reduce the amount of data that is being merged? Is the simple fact that the tables have millions of rows going to cause a merge sort?
I noticed that this query selects from installed_item__item_detail instead of from item_detail which seems like it would also work. Would this change make a positive difference?
installed_item__item_detail is a simple join table. The installed_item__id side cannot be reduced. Would reducing the number of item_detail rows using additional joins benefit?
What additional information can I gather in order have a better understanding of how to improve this query?
(Unfortunately we do not have (easy) access to this particular database in order to experiment.)
...Duane
=> select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
$uname -a
Linux host.name.com 2.6.32-358.6.2.el6.x86_64 #1 SMP Thu May 16 20:59:36 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
=> select name, current_setting(name), source from pg_settings where source not in ('default', 'override');
name | current_setting | source
------------------------------+--------------------+----------------------
application_name | psql | client
checkpoint_completion_target | 0.9 | configuration file
checkpoint_segments | 128 | configuration file
client_encoding | UTF8 | client
DateStyle | ISO, MDY | configuration file
default_statistics_target | 100 | configuration file
default_text_search_config | pg_catalog.english | configuration file
effective_cache_size | 512MB | configuration file
lc_messages | en_US.UTF-8 | configuration file
lc_monetary | en_US.UTF-8 | configuration file
lc_numeric | en_US.UTF-8 | configuration file
lc_time | en_US.UTF-8 | configuration file
log_autovacuum_min_duration | 1s | configuration file
log_destination | stderr,syslog | configuration file
log_line_prefix | [%m]: | configuration file
log_min_duration_statement | 5min | configuration file
log_min_error_statement | notice | configuration file
log_rotation_age | 1d | configuration file
log_rotation_size | 0 | configuration file
log_temp_files | 1MB | configuration file
log_timezone | US/Pacific | environment variable
log_truncate_on_rotation | on | configuration file
logging_collector | on | configuration file
maintenance_work_mem | 384MB | configuration file
max_connections | 100 | configuration file
max_stack_depth | 2MB | environment variable
port | 5432 | command line
shared_buffers | 256MB | configuration file
syslog_facility | local0 | configuration file
TimeZone | US/Pacific | environment variable
wal_buffers | 1MB | configuration file
work_mem | 128MB | configuration file
(32 rows)
Duane Murphy wrote: > We are trying to improve performance by avoiding the temp file creation. > > LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604 > STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.type > FROM installed_item__item_detail AS iiid > INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id > INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id ) > INNER JOIN model ON (item.id = model.item__id AND model.id = $1) > What are the causes of temp file creation? Operations like hash and sort that need more space than work_mem promises. > What additional information can I gather in order have a better understanding of how to improve this > query? It woul be really useful to see the result of "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." for your query. But essentially the answer to avoid temporary files is always "increase work_mem". Yours, Laurenz Albe
On Thu, Jun 18, 2015 at 12:38 PM, Duane Murphy <duane.murphy@gmail.com> wrote:
We are trying to improve performance by avoiding the temp file creation.LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604STATEMENT: SELECT iiid.installed_item__id, item_detail.id, item_detail.model_id, item_detail.typeFROM installed_item__item_detail AS iiidINNER JOIN item_detail ON iiid.item_detail__id = item_detail.idINNER JOIN item ON (item.installed_item__id = iiid.installed_item__id )Our hypothesis is that the temp file creation is caused by the high row count of theinstalled_item__item_detail table.installed_item__item_detail: 72916824 rows (27 GB)item_detail: 59212436 rows (40 GB)The other two tables, item and model, are temporary tables created during this particular process. Unfortunately, I don't have those table sizes.
Those temporary tables aren't providing any output to the query, so their only role must be to restrict the rows returned by the permanent tables. If they restrict that by a lot, then it could do a nested loop over the temp tables, doing indexed queries against the permanent tables assuming you have the right indexes.
Temporary tables do not get analyzed automatically, so you should probably run ANALYZE on them explicitly before this big query.
What additional information can I gather in order have a better understanding of how to improve this query?
What indexes do the tables have? What is the output of EXPLAIN, or better yet EXPLAIN (ANALYZE,BUFFERS), for the query?
Cheers,
Jeff