Re: Single query uses all system memory - Mailing list pgsql-general
From | Fabio Pardi |
---|---|
Subject | Re: Single query uses all system memory |
Date | |
Msg-id | 5f160fdf-7fa3-f840-1d5d-b827141dbd45@portavita.eu Whole thread Raw |
In response to | Re: Single query uses all system memory (Magnus Hagander <magnus@hagander.net>) |
List | pgsql-general |
Hi Ivar,
I agree with Magnus. As addition, also i would change:
effective_cache_size
and
maintenance_work_mem
Then disable OOM killer, change logging in order to log files on disk to see if your work_mem is too low and spills on disk, rethink your data structure and, overall, I think that you can have a look to this page, which I find a good starting point for a proper tuning:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
regards,
fabio pardi
On Tue, Jun 26, 2018 at 11:40 AM, Ivar Fredriksen <ivar.fredriksen@gmail.com> wrote:A single large query is able to spend all the system memory (as reported by top), and the oom_reaper kills postgres. See bottom of email for an example query and logs.
Expected behavior would be that postgres is not killed and the query is limited to the shared_buffer memory, potentially failing, but not killing the postgres process.
Then your expectations are completely wrong. shared_buffers have nothing to do with limiting the memory of a query. shared_buffers set the size of the PostgreSQL cache, not the working set. That's controlled by work_mem (see below).
Do I have some fundamental flaw in my understanding of this, or is there some sort of configuration that should be in place that is missing? The offending sample query has been
rewritten to not use joins, and will no longer crash postgres. I am not asking anyone to spend a lot of time analyzing the query itself, it is just provided as an example for when the problem will occur.
My question is more in a broader sense why one query is eating through all system memory, and is there anything I can do to prevent this from happening?
We have set shared_buffers to 256MB on a test-system that has 1GB memory. Production machines have more resources, but the errors are happening in exactly the same way, so I assume (perhaps wrongly) that using the test
system specs and logs might give me the answers I need to figure out what is happening.
Technical details are provided below, a big thanks to anyone who can provide any help or insight to this.
Regards,
Ivar Fredriksen
PostgreSQL version number:
PostgreSQL 10.4 (Debian 10.4-2.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
Installed with the debian packages for postgres found at: deb http://apt.postgresql.org/pub/
repos/apt/
Changes made to the settings in the postgresql.conf file:
name | current_setting | source
------------------------------
+----------------------------- ------------+----------------- ----- application_name | psql
| client checkpoint_completion_target | 0.9 | configuration file
client_encoding | UTF8
| client cluster_name | 10/main
| configuration file DateStyle | ISO, MDY | configuration file
default_statistics_target | 100
| configuration file default_text_search_config | pg_catalog.english
| configuration file dynamic_shared_memory_type | posix
| configuration file effective_cache_size | 1536MB
| configuration file external_pid_file | /var/run/postgresql/10-main.
pid | configuration file lc_messages | C
| configuration file lc_monetary | C
| configuration file lc_numeric | en_US.UTF-8
| configuration file lc_time | C
| configuration file listen_addresses | *
| configuration file log_line_prefix | %m [%p] %q%u@%d
| configuration file log_timezone | UTC
| configuration file maintenance_work_mem | 128MB
| configuration file max_connections | 100
| configuration file max_stack_depth | 2MB
| environment variable max_wal_size | 2GB
| configuration file min_wal_size | 1GB
| configuration file port | 5432
| configuration file search_path | "$user", public, pg_catalog | configuration file
shared_buffers | 256MB
| configuration file ssl | on
| configuration file ssl_cert_file | /etc/ssl/certs/ssl-cert-
snakeoil.pem | configuration file ssl_key_file | /etc/ssl/private/ssl-cert-
snakeoil.key | configuration file standard_conforming_strings | off
| configuration file stats_temp_directory | /var/run/postgresql/10-main.
pg_stat_tmp | configuration file tcp_keepalives_idle | 0
| configuration file TimeZone | UTC
| configuration file unix_socket_directories | /var/run/postgresql
| configuration file wal_buffers | 16MB
| configuration file work_mem | 1310kB
| configuration file (35 rows)
Operating system and version:
Debian GNU/Linux 9.4 (stretch)
Linux ip-172-31-30-104 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1 (2018-04-29) x86_64 GNU/Linux
Machine specs:
CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
Memory: 1GB
HDD: General purpose SSD from amazon ec2, only one disk
With only 1GB of memory, your value for work_mem is probably too high for a query like yours.Even with 1MB, that's 1MB per join. You seem to have about 350 joins? That would be 350Mb right there. Plus a lot of more overhead.In general, what's up with 350 joins?! That seems to be, ahem, quite insane.For a query like that you probably have to reduce everything to bare minimums -- I'd try with shared_buffers at maybe 64MB and work_mem at 100kB and see if it passes. (it'll probably still be very slow, but that's to be expected from a query like that)But I'd sum it up as (1) don't do 350 joins. Just don't. And (2) if you have to, then a machine with just 1Gb is simply too small.//Magnus
pgsql-general by date: