Thread: How to tune Postgres to take advantage of 256GB RAM hardware
I am trying to configure Postgres (version 9.5) to take advantage of a very large memory environment. The server has 256GB of RAM, 12 cores and 2 SSDs in RAID 0, and runs Ubuntu. Swap is set to 4GB.
The machine is used for data warehouse operations. Typically only one statement runs at a time, but these can be very complex: for example, rolling window functions partitioned by several fields, computing over 30 metrics across 40 million rows of data. The number of joins in a statement rarely exceeds 5 (for big tables it is smaller). Window functions on large tables are common.
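To give a flavour of the workload, here is a simplified, made-up sketch of the kind of statement involved (table and column names are invented, not the actual query; the real statements compute ~30 such metrics):

-- hypothetical example: rolling 7-row average per metric,
-- partitioned by a couple of fields, over the big table
SELECT client_id,
       channel,
       visit_date,
       avg(metric1) OVER (PARTITION BY client_id, channel
                          ORDER BY visit_date
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS metric1_rolling_7
FROM daily_metrics;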
No other resource intensive processes are running at the same time.
All tables get recreated from scratch every day. If the DB goes kaput, it's only a minor inconvenience to restore from backups synced to the cloud.
I started off with these settings which I compiled from a variety of sources including pgtune (not sure how good they are, I'm not a system administrator):
shared_buffers = 65024MB
work_mem = 1680MB
maintenance_work_mem = 10GB
fsync = off
wal_buffers = 16MB
max_wal_size = 8GB
min_wal_size = 4GB
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 192GB
default_statistics_target = 500
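For reference, the values the running server actually reports (as opposed to what is in postgresql.conf) can be checked from psql; the unit column shows what each setting is measured in (e.g. 8kB pages for shared_buffers):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'effective_cache_size', 'max_wal_size');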
The first time I spotted something wrong was with the 40-million-row table mentioned above. Watching the system resources on Ubuntu, as soon as the statement started, memory usage went up dramatically. Within a minute it hit 100% (yes, the whole 256GB!) and Postgres crashed with the message "FATAL: the database system is in recovery mode".
I've tried various different settings, most notably:
- I've reduced shared_buffers to 10GB but kept work_mem at 1600MB.
- I've added the following lines to /etc/sysctl.conf (pinched from google searches):
vm.swappiness = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 95
vm.dirty_ratio = 2
vm.dirty_background_ratio = 1
The query crashed again, this time with the message "out of memory DETAIL: Failed on request of size 112".
With these settings, this is the screenshot as memory usage approaches 100%: https://www.evernote.com/l/AJIE90HcZwVG_o2KqjYIOn72eQHQx2pc0QI
I've then tried different settings for work_mem, not changing anything else.
work_mem = 400MB -> query runs fine but memory usage in the system doesn't exceed 1.3%
work_mem = 500MB -> usage hits 100% and postgres crashes out of memory.
So it looks like work_mem is to blame. However, can someone explain why, at 400MB, Postgres does not seem to take advantage of the shedload of available memory in the system?!
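For what it's worth, work_mem can also be flipped per session from psql rather than editing postgresql.conf and reloading each time, e.g.:

SET work_mem = '400MB';   -- applies only to this session
SHOW work_mem;            -- confirm the active value
-- run the problem query here, then repeat with:
SET work_mem = '500MB';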
Looking for suggestions here. I'm not a DB system administrator, I'm just an analyst who wants to get their analysis done fast and efficiently hence the hardware spec! What combination of settings can I try to make sure postgres makes full use of the available memory (without blindly trying various combinations)? How can I investigate what's limiting postgres from doing so?
I've done some reading but it's hard to tell what advice might apply to 2016 hardware.
Is there something else I need to configure on the Ubuntu side?
Getting really desperate here so any help is greatly appreciated!
Thanks
Carmen
On 23 November 2016 23:15:25 CET, Carmen Mardiros <bluecorr@gmail.com> wrote:
> various combinations)? How can I investigate what's limiting postgres from doing so?
Why fsync=off?
Please run the queries with EXPLAIN ANALYSE and show us the output.
--
This message was sent from my Android phone with K-9 Mail.
Hi Andreas,

Thanks for the reply. fsync is off because I don't care if the data gets corrupted in this environment if it means a performance gain, as it's only a minor inconvenience to restore from backups.

Here is the query itself that pushes memory usage to 100%: http://pastebin.com/VzCAerwd . What's interesting is that if I execute this stand-alone, memory usage never exceeds 1.3% and the query completes in 7 minutes. But if I paste 2 other queries in the query window, followed by this one, and execute them all at the same time, by the time Postgres reaches the 3rd (this query), memory usage goes up to 100%. I don't understand why this is.

And the EXPLAIN output: https://explain.depesz.com/s/hwH5 (have to admit I don't know how to read this!). Any help is greatly appreciated.
Carmen Mardiros <bluecorr@gmail.com> writes:

> I've then tried different settings for work_mem, not changing anything else.
> work_mem = 400MB -> query runs fine but memory usage in the system doesn't exceed 1.3%
> work_mem = 500MB -> usage hits 100% and postgres crashes out of memory.

I suspect what may be happening is that when you push work_mem to >= 500MB, the planner decides it can replace the GroupAgg step with a HashAgg, which tries to form all the aggregate results at once in memory. Because of the drastic underestimate of the number of groups (2.7 mil vs 27 mil actual), the hash table is much bigger than the planner is expecting, causing memory consumption to bloat way beyond what it should be.

You could confirm this idea by seeing if the EXPLAIN output changes that way depending on work_mem. (Use plain EXPLAIN, not EXPLAIN ANALYZE, so you don't actually run out of memory while experimenting.) If it's true, you might be able to improve the group-count estimate by increasing the statistics target for ANALYZE.

However, the group-count underestimate only seems to be a factor of 10, so you'd still expect the memory usage to not be more than 5GB if the planner were getting it right otherwise. So there may be something else wrong, maybe a plain old memory leak. Can you generate a self-contained example that causes similar memory overconsumption? I'm guessing the behavior isn't very sensitive to the exact data you're using as long as the group counts are similar, so maybe you could post a script that generates junk test data that causes this, rather than needing 27M rows of real data.

regards, tom lane
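As a rough illustration of the suggestions above (comparing plain-EXPLAIN plans at the two work_mem settings, raising the statistics target on the grouping columns, and generating self-contained junk data with a similar group count), here is a sketch in psql. The table and column names (ga_daily, client_id, visit_date, metric1) are hypothetical stand-ins, not the actual schema from the thread:

-- 1. Compare plans (not execution) at the two work_mem settings;
--    watch for GroupAggregate switching to HashAggregate.
SET work_mem = '400MB';
EXPLAIN SELECT client_id, visit_date, sum(metric1)
        FROM ga_daily GROUP BY client_id, visit_date;
SET work_mem = '500MB';
EXPLAIN SELECT client_id, visit_date, sum(metric1)
        FROM ga_daily GROUP BY client_id, visit_date;

-- 2. If the 500MB plan is a HashAggregate with a badly underestimated
--    group count, raise the statistics target on the grouping columns
--    (more targeted than raising default_statistics_target globally):
ALTER TABLE ga_daily ALTER COLUMN client_id SET STATISTICS 1000;
ALTER TABLE ga_daily ALTER COLUMN visit_date SET STATISTICS 1000;
ANALYZE ga_daily;

-- 3. Self-contained junk data with roughly the group counts described
--    (~27M rows, ~27M distinct (client_id, visit_date) groups):
CREATE TABLE junk AS
SELECT (g / 30)                     AS client_id,   -- ~900k "clients"
       date '2016-01-01' + (g % 30) AS visit_date,  -- 30 days each
       random()                     AS metric1
FROM generate_series(1, 27000000) AS g;
ANALYZE junk;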