Re: Single query uses all system memory - Mailing list pgsql-general

From Fabio Pardi
Subject Re: Single query uses all system memory
Date
Msg-id 5f160fdf-7fa3-f840-1d5d-b827141dbd45@portavita.eu
Whole thread Raw
In response to Re: Single query uses all system memory  (Magnus Hagander <magnus@hagander.net>)
List pgsql-general

Hi Ivar,

I agree with Magnus. As addition, also i would change:

 effective_cache_size
and

 maintenance_work_mem

Then disable OOM killer, change logging in order to log files on disk to see if your work_mem is too low and spills on disk, rethink your data structure and, overall, I think that you can have a look to this page, which I find a good starting point for a proper tuning:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server


regards,

fabio pardi


On 26/06/18 12:13, Magnus Hagander wrote:


On Tue, Jun 26, 2018 at 11:40 AM, Ivar Fredriksen <ivar.fredriksen@gmail.com> wrote:

A single large query is able to spend all the system memory (as reported by top), and the oom_reaper kills postgres. See bottom of email for an example query and logs.

 

Expected behavior would be that postgres is not killed and the query is limited to the shared_buffer memory, potentially failing, but not killing the postgres process.


Then your expectations are completely wrong. shared_buffers have nothing to do with limiting the memory of a query. shared_buffers set the size of the PostgreSQL cache, not the working set. That's controlled by work_mem (see below).

 

Do I have some fundamental flaw in my understanding of this, or is there some sort of configuration that should be in place that is missing? The offending sample query has been

rewritten to not use joins, and will no longer crash postgres. I am not asking anyone to spend a lot of time analyzing the query itself, it is just provided as an example for when the problem will occur.

My question is more in a broader sense why one query is eating through all system memory, and is there anything I can do to prevent this from happening? 

We have set shared_buffers to 256MB on a test-system that has 1GB memory. Production machines have more resources, but the errors are happening in exactly the same way, so I assume (perhaps wrongly) that using the test

system specs and logs might give me the answers I need to figure out what is happening.

 

Technical details are provided below, a big thanks to anyone who can provide any help or insight to this.

 

Regards,

Ivar Fredriksen

 

PostgreSQL version number:

PostgreSQL 10.4 (Debian 10.4-2.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

Installed with the debian packages for postgres found at: deb http://apt.postgresql.org/pub/repos/apt/

 

Changes made to the settings in the postgresql.conf file:

             name             |             current_setting             |        source

------------------------------+-----------------------------------------+----------------------

 application_name             | psql                                    | client

 checkpoint_completion_target | 0.9                                     | configuration file

 client_encoding              | UTF8                                    | client

 cluster_name                 | 10/main                                 | configuration file

 DateStyle                    | ISO, MDY                                | configuration file

 default_statistics_target    | 100                                     | configuration file

 default_text_search_config   | pg_catalog.english                      | configuration file

 dynamic_shared_memory_type   | posix                                   | configuration file

 effective_cache_size         | 1536MB                                  | configuration file

 external_pid_file            | /var/run/postgresql/10-main.pid         | configuration file

 lc_messages                  | C                                       | configuration file

 lc_monetary                  | C                                       | configuration file

 lc_numeric                   | en_US.UTF-8                             | configuration file

 lc_time                      | C                                       | configuration file

 listen_addresses             | *                                       | configuration file

 log_line_prefix              | %m [%p] %q%u@%d                         | configuration file

 log_timezone                 | UTC                                     | configuration file

 maintenance_work_mem         | 128MB                                   | configuration file

 max_connections              | 100                                     | configuration file

 max_stack_depth              | 2MB                                     | environment variable

 max_wal_size                 | 2GB                                     | configuration file

 min_wal_size                 | 1GB                                     | configuration file

 port                         | 5432                                    | configuration file

 search_path                  | "$user", public, pg_catalog             | configuration file

 shared_buffers               | 256MB                                   | configuration file

 ssl                          | on                                      | configuration file

 ssl_cert_file                | /etc/ssl/certs/ssl-cert-snakeoil.pem    | configuration file

 ssl_key_file                 | /etc/ssl/private/ssl-cert-snakeoil.key  | configuration file

 standard_conforming_strings  | off                                     | configuration file

 stats_temp_directory         | /var/run/postgresql/10-main.pg_stat_tmp | configuration file

 tcp_keepalives_idle          | 0                                       | configuration file

 TimeZone                     | UTC                                     | configuration file

 unix_socket_directories      | /var/run/postgresql                     | configuration file

 wal_buffers                  | 16MB                                    | configuration file

 work_mem                     | 1310kB                                  | configuration file

(35 rows)

 

Operating system and version:

Debian GNU/Linux 9.4 (stretch)

Linux ip-172-31-30-104 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1 (2018-04-29) x86_64 GNU/Linux

 

Machine specs:

CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz

Memory: 1GB

HDD: General purpose SSD from amazon ec2, only one disk


With only 1GB of memory, your value for work_mem is probably too high for a query like yours.

Even with 1MB, that's 1MB per join. You seem to have about 350 joins? That would be 350Mb right there. Plus a lot of more overhead.

In general, what's up with 350 joins?! That seems to be, ahem, quite insane.

For a query like that you probably have to reduce everything to bare minimums -- I'd try with shared_buffers at maybe 64MB and work_mem at 100kB and see if it passes. (it'll probably still be very slow, but that's to be expected from a query like that)

But I'd sum it up as (1) don't do 350 joins. Just don't. And (2) if you have to, then a machine with just 1Gb is simply too small.


//Magnus


pgsql-general by date:

Previous
From: Andreas Kretschmer
Date:
Subject: Re: We find few queries running three times simultaneously with sameparameters on postgres db
Next
From: Saurabh Agrawal
Date:
Subject: Re: We find few queries running three times simultaneously with sameparameters on postgres db