Thread: PostgreSQL on ZFS: performance tuning
Hi.

I have an OLAP-oriented DB (light occasional bulk writes and heavy aggregated selects over large periods of data) based on Postgres 9.5.3.

Server is a FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS, mirror).

The largest table is 13GB (with a 4GB index on it), other tables are 4, 2 and less than 1GB.

After reading a lot of articles and "howto-s" I've collected following set of tweaks and hints:

ZFS pools creation:
zfs create zroot/ara/sqldb
zfs create -o recordsize=8k -o primarycache=all zroot/ara/sqldb/pgsql

zfs get primarycache,recordsize,logbias,compression zroot/ara/sqldb/pgsql
NAME                   PROPERTY      VALUE    SOURCE
zroot/ara/sqldb/pgsql  primarycache  all      local
zroot/ara/sqldb/pgsql  recordsize    8K       local
zroot/ara/sqldb/pgsql  logbias       latency  local
zroot/ara/sqldb/pgsql  compression   lz4      inherited from zroot

L2ARC is disabled
VDEV cache is disabled

pgsql -c "mkdir /ara/sqldb/pgsql/data_ix"
pgsql -c "initdb --locale=en_US.UTF-8 -E UTF-8 -D /ara/sqldb/pgsql/data"

/etc/sysctl.conf
vfs.zfs.metaslab.lba_weighting_enabled=0

postgresql.conf:
listen_addresses = '*'
max_connections = 100
shared_buffers = 16GB
effective_cache_size = 48GB
work_mem = 500MB
maintenance_work_mem = 2GB
min_wal_size = 4GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1
log_lock_waits = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_destination = 'csvlog'
logging_collector = on
log_min_duration_statement = 10000
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 10000
track_io_timing = on

zfs-stats -A
------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Jul 28 21:58:46 2016
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0
ARC Misc:
        Deleted:                                14.92b
        Recycle Misses:                         7.01m
        Mutex Misses:                           4.72m
        Evict Skips:                            1.28b
ARC Size:                               53.27%  32.59 GiB
        Target Size: (Adaptive)         53.28%  32.60 GiB
        Min Size (Hard Limit):          12.50%  7.65 GiB
        Max Size (High Water):          8:1     61.18 GiB
ARC Size Breakdown:
        Recently Used Cache Size:       92.83%  30.26 GiB
        Frequently Used Cache Size:     7.17%   2.34 GiB
ARC Hash Breakdown:
        Elements Max:                           10.36m
        Elements Current:               78.09%  8.09m
        Collisions:                             9.63b
        Chain Max:                              26
        Chains:                                 1.49m
------------------------------------------------------------------------

zfs-stats -E
------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Jul 28 21:59:57 2016
------------------------------------------------------------------------
ARC Efficiency:                                 49.85b
        Cache Hit Ratio:                70.94%  35.36b
        Cache Miss Ratio:               29.06%  14.49b
        Actual Hit Ratio:               66.32%  33.06b
        Data Demand Efficiency:         84.85%  25.39b
        Data Prefetch Efficiency:       17.85%  12.90b
        CACHE HITS BY CACHE LIST:
          Anonymously Used:             4.10%   1.45b
          Most Recently Used:           37.82%  13.37b
          Most Frequently Used:         55.67%  19.68b
          Most Recently Used Ghost:     0.58%   203.42m
          Most Frequently Used Ghost:   1.84%   649.83m
        CACHE HITS BY DATA TYPE:
          Demand Data:                  60.92%  21.54b
          Prefetch Data:                6.51%   2.30b
          Demand Metadata:              32.56%  11.51b
          Prefetch Metadata:            0.00%   358.22k
        CACHE MISSES BY DATA TYPE:
          Demand Data:                  26.55%  3.85b
          Prefetch Data:                73.13%  10.59b
          Demand Metadata:              0.31%   44.95m
          Prefetch Metadata:            0.00%   350.48k

zfs-stats -Z
------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Jul 28 22:02:46 2016
------------------------------------------------------------------------
File-Level Prefetch: (HEALTHY)
DMU Efficiency:                                 49.97b
        Hit Ratio:                      55.85%  27.90b
        Miss Ratio:                     44.15%  22.06b
        Colinear:                               22.06b
          Hit Ratio:                    0.04%   7.93m
          Miss Ratio:                   99.96%  22.05b
        Stride:                                 17.85b
          Hit Ratio:                    99.61%  17.78b
          Miss Ratio:                   0.39%   69.46m
DMU Misc:
        Reclaim:                                22.05b
          Successes:                    0.05%   10.53m
          Failures:                     99.95%  22.04b
        Streams:                                10.14b
          +Resets:                      0.10%   9.97m
          -Resets:                      99.90%  10.13b
          Bogus:                                0

Notes\concerns:

- primarycache=metadata (recommended in most articles) produces a significant performance degradation (in SELECT queries);

- from what I can see, Postgres uses memory too carefully. I would like somehow to force it to keep accessed data in memory as long as possible. Instead I often see that even frequently accessed data is pushed out of memory cache for no apparent reasons.

Do I miss something important in my configs? Are there any double writes\reads somewhere because of OS\ZFS\Postgres caches? How to avoid them?

Please share your experience\tips. Thanks.
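Since pg_stat_statements and track_io_timing are already enabled in the config above, one rough way to see whether reads are being served from shared_buffers, the ARC or the disks is to compare Postgres' own buffer hit ratio with the ARC hit ratio. A sketch only; the database name "mydb" is a placeholder:

psql -d mydb -c "SELECT sum(blks_hit) * 100.0 / nullif(sum(blks_hit + blks_read), 0) AS buffer_hit_pct FROM pg_stat_database;"
psql -d mydb -c "SELECT left(query, 60) AS query, calls, blk_read_time, shared_blks_read FROM pg_stat_statements ORDER BY blk_read_time DESC LIMIT 5;"
zfs-stats -E | grep -E 'Cache (Hit|Miss) Ratio'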
On 07/29/2016 08:04 AM, trafdev wrote:
> Hi.
>
> I have an OLAP-oriented DB (light occasional bulk writes and heavy
> aggregated selects over large periods of data) based on Postgres 9.5.3.
>
> Server is a FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS,
> mirror).
>
> [snip]
>
> postgresql.conf:
> listen_addresses = '*'
> max_connections = 100
> shared_buffers = 16GB
> effective_cache_size = 48GB

It may not be a problem for your workload, but this effective_cache_size value is far too high.

> work_mem = 500MB
> maintenance_work_mem = 2GB
> min_wal_size = 4GB
> max_wal_size = 8GB
> checkpoint_completion_target = 0.9

You probably need to increase the checkpoint_timeout too.

> [snip]
>
> Notes\concerns:
>
> - primarycache=metadata (recommended in most articles) produces a
> significant performance degradation (in SELECT queries);

Those articles are wrong. PostgreSQL relies on the filesystem cache, so it needs primarycache=all.

> - from what I can see, Postgres uses memory too carefully. I would like
> somehow to force it to keep accessed data in memory as long as possible.
> Instead I often see that even frequently accessed data is pushed out of
> memory cache for no apparent reasons.

This is probably a consequence of the primarycache misconfiguration.

> Do I miss something important in my configs? Are there any double
> writes\reads somewhere because of OS\ZFS\Postgres caches? How to avoid
> them?
>
> Please share your experience\tips. Thanks.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> - from what I can see, Postgres uses memory too carefully. I would like
>> somehow to force it to keep accessed data in memory as long as possible.
>> Instead I often see that even frequently accessed data is pushed out of
>> memory cache for no apparent reasons.
>
> This is probably a consequence of the primarycache misconfiguration.

Thanks! And I'm using "primarycache=all" in my deployment...
On 29.07.2016 08:30, Tomas Vondra wrote:
> On 07/29/2016 08:04 AM, trafdev wrote:
>> [snip]
>>
>> postgresql.conf:
>> listen_addresses = '*'
>> max_connections = 100
>> shared_buffers = 16GB
>> effective_cache_size = 48GB
>
> It may not be a problem for your workload, but this effective_cache_size
> value is far too high.

May I ask why? ZFS by default caches the size of RAM minus 1 GB. Subtracting the shared buffers from the 64 GB of RAM, I would assume 47 GB would be a better value. But that would not be far too high. So please, can you explain this?

Greetings,
Torsten
On 09/27/2016 06:00 PM, Torsten Zuehlsdorff wrote:
> On 29.07.2016 08:30, Tomas Vondra wrote:
>> [snip]
>>
>> It may not be a problem for your workload, but this effective_cache_size
>> value is far too high.
>
> May I ask why? ZFS by default caches the size of RAM minus 1 GB.
> Subtracting the shared buffers from the 64 GB of RAM, I would assume
> 47 GB would be a better value. But that would not be far too high. So
> please, can you explain this?

Because it's not a global value, but an estimate of how much RAM is available as a cache for a single query. So if you're running 10 queries at the same time, they'll have to share the memory.

It's a bit trickier as there's often a fair amount of cross-backend sharing (backends accessing the same data, so it's likely one backend loads data into cache, and then other backends access it too).

It also ignores that memory may get allocated for other reasons - some queries may allocate quite a bit of memory for sorts/aggregations. So not only is

    effective_cache_size = RAM - shared_buffers

excessive because it ignores the per-query nature, but also because it neglects these other allocations.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
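To put rough, purely illustrative numbers on that reasoning (all figures below are assumptions, not recommendations): with 64GB of RAM, 16GB of shared_buffers and, say, 10 concurrent analytic queries each entitled to work_mem = 500MB:

    OS cache left over    ~= 64GB - 16GB shared_buffers - 10 x 0.5GB work_mem  ~= 43GB
    naive per-query share ~= (16GB shared_buffers + 43GB) / 10 queries         ~= 6GB
    with cross-backend sharing of hot data, the realistic figure sits
    somewhere between that ~6GB and the full ~59GB

Setting effective_cache_size = 48GB effectively assumes every query has almost the whole cache to itself.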
On 9/27/2016 16:38, Tomas Vondra wrote:
> On 09/27/2016 06:00 PM, Torsten Zuehlsdorff wrote:
>> On 29.07.2016 08:30, Tomas Vondra wrote:
>>> On 07/29/2016 08:04 AM, trafdev wrote:
>>>> zfs create -o recordsize=8k -o primarycache=all zroot/ara/sqldb/pgsql
>>>> [snip]

You may well find that with lz4 compression a 128kb record size on that filesystem is materially faster -- it is here for most workloads under Postgres.
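One low-risk way to test that on the setup from the first post (the new dataset name is just a placeholder, and the data has to be copied or restored into it, since a new recordsize only applies to data written after the change):

zfs create -o recordsize=128k -o compression=lz4 -o primarycache=all zroot/ara/sqldb/pgsql128
zfs get recordsize,compression,compressratio zroot/ara/sqldb/pgsql128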
+1
A larger record size can increase the compression ratio, and so reduce the I/O.
Did you set atime=off for ZFS?
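If not, it is a one-line change on the dataset from the first post, and it stops every read from generating an access-time metadata update:

zfs set atime=off zroot/ara/sqldb/pgsql
zfs get atime zroot/ara/sqldb/pgsql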
On Sep 28, 2016 at 6:16 AM, "Karl Denninger" <karl@denninger.net> wrote:
> You may well find that with lz4 compression a 128kb record size on that
> filesystem is materially faster -- it is here for most workloads under
> Postgres.
> [snip]
With ZFS you can turn full_page_writes off for Postgres, which can save WAL write I/O.
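That is a single postgresql.conf line; a sketch, and only safe because ZFS's copy-on-write model means a partially written 8kb page cannot survive a crash, so Postgres does not need its own torn-page protection:

full_page_writes = off    # default is 'on'; keep it on unless the filesystem guarantees atomic page writes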
On Jul 29, 2016 at 2:05 PM, "trafdev" <trafdev@mail.ru> wrote:
> Hi.
>
> I have an OLAP-oriented DB (light occasional bulk writes and heavy
> aggregated selects over large periods of data) based on Postgres 9.5.3.
>
> Server is a FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS,
> mirror).
> [snip]
On 9/27/2016 23:06, Jov wrote:
> Did you set atime=off for ZFS?

Yes.
Non-default stuff...
dbms/ticker-9.5 compressratio 1.88x -
dbms/ticker-9.5 mounted yes -
dbms/ticker-9.5 quota none default
dbms/ticker-9.5 reservation none default
dbms/ticker-9.5 recordsize 128K default
dbms/ticker-9.5 mountpoint /dbms/ticker-9.5 local
dbms/ticker-9.5 sharenfs off default
dbms/ticker-9.5 checksum on default
dbms/ticker-9.5 compression lz4 inherited from dbms
dbms/ticker-9.5 atime off inherited from dbms
dbms/ticker-9.5 logbias throughput inherited from dbms
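For reference, a blunt way to pull the same non-default view out of any dataset (the grep simply drops every line whose SOURCE column reads "default", so a stray unrelated line may disappear too):

zfs get all dbms/ticker-9.5 | grep -v default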
Thanks Jov and Karl!
What do you think about:
primarycache=all
for SELECT queries over same data sets?
On 9/28/2016 13:42, trafdev wrote:
> Thanks Jov and Karl!
>
> What do you think about:
> primarycache=all
> for SELECT queries over same data sets?

Primarycache=all is the default; changing it ought to be contemplated only under VERY specific circumstances. In the case of a database, if you shut off "all" then an 8kb data read with a 128kb blocksize will result in reading 128kb (the block size), returning the requested piece out of the 128kb, and then throwing away the rest of the data read, since you prohibited it from going into the ARC. That's almost certainly going to do bad things for throughput!

Note that having an L2ARC, which is the place where you might find setting primarycache to have a benefit, is itself something you need to instrument under your specific workload to see if it's worth it. If you want to know whether it *might* be worth it you can use (on FreeBSD) zfs-stats -E; if you're seeing materially more than 15% cache misses then it *might* help, assuming what you put it on is *very* fast (e.g. SSD).

In addition, if you're on FreeBSD (and you say you are) be aware that the vm system and ZFS interact in some "interesting" ways under certain load profiles. UMA is involved to a material degree in the issue. I have done quite a bit of work on the internal ZFS code in this regard; 11.x is better-behaved than 10.x to a quite-material degree. I have a patch set out against both 10.x and 11.x that addresses some (but not all) of the issues.
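If the miss ratio does justify trying it, adding an L2ARC device is a single command; the device name below is only a placeholder for whatever spare fast SSD would be dedicated to it:

zpool add zroot cache da2
zpool status zroot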