Thread: PostgreSQL on ZFS: performance tuning
Hi.

I have an OLAP-oriented DB (light occasional bulk writes and heavy aggregated selects over large periods of data) based on Postgres 9.5.3.

Server is a FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS, mirror).

The largest table is 13GB (with a 4GB index on it), other tables are 4, 2 and less than 1GB.

After reading a lot of articles and "howto-s" I've collected following set of tweaks and hints:

ZFS pools creation:
zfs create zroot/ara/sqldb
zfs create -o recordsize=8k -o primarycache=all zroot/ara/sqldb/pgsql

zfs get primarycache,recordsize,logbias,compression zroot/ara/sqldb/pgsql
NAME                   PROPERTY      VALUE    SOURCE
zroot/ara/sqldb/pgsql  primarycache  all      local
zroot/ara/sqldb/pgsql  recordsize    8K       local
zroot/ara/sqldb/pgsql  logbias       latency  local
zroot/ara/sqldb/pgsql  compression   lz4      inherited from zroot

L2ARC is disabled
VDEV cache is disabled

pgsql -c "mkdir /ara/sqldb/pgsql/data_ix"
pgsql -c "initdb --locale=en_US.UTF-8 -E UTF-8 -D /ara/sqldb/pgsql/data"

/etc/sysctl.conf
vfs.zfs.metaslab.lba_weighting_enabled=0

postgresql.conf:
listen_addresses = '*'
max_connections = 100
shared_buffers = 16GB
effective_cache_size = 48GB
work_mem = 500MB
maintenance_work_mem = 2GB
min_wal_size = 4GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1
log_lock_waits = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_destination = 'csvlog'
logging_collector = on
log_min_duration_statement = 10000
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 10000
track_io_timing = on

zfs-stats -A
------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Jul 28 21:58:46 2016
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0
ARC Misc:
        Deleted:                                14.92b
        Recycle Misses:                         7.01m
        Mutex Misses:                           4.72m
        Evict Skips:                            1.28b
ARC Size:                               53.27%  32.59 GiB
        Target Size: (Adaptive)         53.28%  32.60 GiB
        Min Size (Hard Limit):          12.50%  7.65 GiB
        Max Size (High Water):          8:1     61.18 GiB
ARC Size Breakdown:
        Recently Used Cache Size:       92.83%  30.26 GiB
        Frequently Used Cache Size:     7.17%   2.34 GiB
ARC Hash Breakdown:
        Elements Max:                           10.36m
        Elements Current:               78.09%  8.09m
        Collisions:                             9.63b
        Chain Max:                              26
        Chains:                                 1.49m
------------------------------------------------------------------------

zfs-stats -E
------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Jul 28 21:59:57 2016
------------------------------------------------------------------------
ARC Efficiency:                                 49.85b
        Cache Hit Ratio:                70.94%  35.36b
        Cache Miss Ratio:               29.06%  14.49b
        Actual Hit Ratio:               66.32%  33.06b
        Data Demand Efficiency:         84.85%  25.39b
        Data Prefetch Efficiency:       17.85%  12.90b
        CACHE HITS BY CACHE LIST:
          Anonymously Used:             4.10%   1.45b
          Most Recently Used:           37.82%  13.37b
          Most Frequently Used:         55.67%  19.68b
          Most Recently Used Ghost:     0.58%   203.42m
          Most Frequently Used Ghost:   1.84%   649.83m
        CACHE HITS BY DATA TYPE:
          Demand Data:                  60.92%  21.54b
          Prefetch Data:                6.51%   2.30b
          Demand Metadata:              32.56%  11.51b
          Prefetch Metadata:            0.00%   358.22k
        CACHE MISSES BY DATA TYPE:
          Demand Data:                  26.55%  3.85b
          Prefetch Data:                73.13%  10.59b
          Demand Metadata:              0.31%   44.95m
          Prefetch Metadata:            0.00%   350.48k

zfs-stats -Z
------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Jul 28 22:02:46 2016
------------------------------------------------------------------------
File-Level Prefetch: (HEALTHY)
DMU Efficiency:                                 49.97b
        Hit Ratio:                      55.85%  27.90b
        Miss Ratio:                     44.15%  22.06b
        Colinear:                               22.06b
          Hit Ratio:                    0.04%   7.93m
          Miss Ratio:                   99.96%  22.05b
        Stride:                                 17.85b
          Hit Ratio:                    99.61%  17.78b
          Miss Ratio:                   0.39%   69.46m
DMU Misc:
        Reclaim:                                22.05b
          Successes:                    0.05%   10.53m
          Failures:                     99.95%  22.04b
        Streams:                                10.14b
          +Resets:                      0.10%   9.97m
          -Resets:                      99.90%  10.13b
          Bogus:                                0

Notes\concerns:

- primarycache=metadata (recommended in most articles) produces a significant performance degradation (in SELECT queries);

- from what I can see, Postgres uses memory too carefully. I would like somehow to force it to keep accessed data in memory as long as possible. Instead I often see that even frequently accessed data is pushed out of memory cache for no apparent reasons.

Do I miss something important in my configs? Are there any double writes\reads somewhere because of OS\ZFS\Postgres caches? How to avoid them?

Please share your experience\tips. Thanks.
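Since pg_stat_statements and track_io_timing are already enabled in the config above, one rough way to see whether reads are being served from shared_buffers, the ARC or the disks is to compare Postgres' own buffer hit ratio with the ARC hit ratio. A sketch only; the database name "mydb" is a placeholder:

psql -d mydb -c "SELECT sum(blks_hit) * 100.0 / nullif(sum(blks_hit + blks_read), 0) AS buffer_hit_pct FROM pg_stat_database;"
psql -d mydb -c "SELECT left(query, 60) AS query, calls, blk_read_time, shared_blks_read FROM pg_stat_statements ORDER BY blk_read_time DESC LIMIT 5;"
zfs-stats -E | grep -E 'Cache (Hit|Miss) Ratio'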
On 07/29/2016 08:04 AM, trafdev wrote:
> Hi.
>
> I have an OLAP-oriented DB (light occasional bulk writes and heavy
> aggregated selects over large periods of data) based on Postgres 9.5.3.
>
> Server is a FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS,
> mirror).
>
> [snip]
>
> postgresql.conf:
> listen_addresses = '*'
> max_connections = 100
> shared_buffers = 16GB
> effective_cache_size = 48GB

It may not be a problem for your workload, but this effective_cache_size value is far too high.

> work_mem = 500MB
> maintenance_work_mem = 2GB
> min_wal_size = 4GB
> max_wal_size = 8GB
> checkpoint_completion_target = 0.9

You probably need to increase the checkpoint_timeout too.

> [snip]
>
> Notes\concerns:
>
> - primarycache=metadata (recommended in most articles) produces a
> significant performance degradation (in SELECT queries);

Those articles are wrong. PostgreSQL relies on the filesystem cache, so it needs primarycache=all.

> - from what I can see, Postgres uses memory too carefully. I would like
> somehow to force it to keep accessed data in memory as long as possible.
> Instead I often see that even frequently accessed data is pushed out of
> memory cache for no apparent reasons.

This is probably a consequence of the primarycache misconfiguration.

> Do I miss something important in my configs? Are there any double
> writes\reads somewhere because of OS\ZFS\Postgres caches? How to avoid
> them?
>
> Please share your experience\tips. Thanks.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> - from what I can see, Postgres uses memory too carefully. I would like
>> somehow to force it to keep accessed data in memory as long as possible.
>> Instead I often see that even frequently accessed data is pushed out of
>> memory cache for no apparent reasons.
>
> This is probably a consequence of the primarycache misconfiguration.

Thanks! And I'm using "primarycache=all" in my deployment...
On 29.07.2016 08:30, Tomas Vondra wrote:
> On 07/29/2016 08:04 AM, trafdev wrote:
>> [snip]
>>
>> postgresql.conf:
>> listen_addresses = '*'
>> max_connections = 100
>> shared_buffers = 16GB
>> effective_cache_size = 48GB
>
> It may not be a problem for your workload, but this effective_cache_size
> value is far too high.

May I ask why? ZFS by default caches the size of RAM minus 1 GB. Subtracting the shared buffers from the 64 GB of RAM, I would assume 47 GB would be a better value. But that would not be far too high. So please, can you explain this?

Greetings,
Torsten
On 09/27/2016 06:00 PM, Torsten Zuehlsdorff wrote:
> On 29.07.2016 08:30, Tomas Vondra wrote:
>> [snip]
>>
>> It may not be a problem for your workload, but this effective_cache_size
>> value is far too high.
>
> May I ask why? ZFS by default caches the size of RAM minus 1 GB.
> Subtracting the shared buffers from the 64 GB of RAM, I would assume
> 47 GB would be a better value. But that would not be far too high. So
> please, can you explain this?

Because it's not a global value, but an estimate of how much RAM is available as a cache for a single query. So if you're running 10 queries at the same time, they'll have to share the memory.

It's a bit trickier as there's often a fair amount of cross-backend sharing (backends accessing the same data, so it's likely one backend loads data into cache, and then other backends access it too).

It also ignores that memory may get allocated for other reasons - some queries may allocate quite a bit of memory for sorts/aggregations. So not only is

    effective_cache_size = RAM - shared_buffers

excessive because it ignores the per-query nature, but also because it neglects these other allocations.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
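To put rough, purely illustrative numbers on that reasoning (all figures below are assumptions, not recommendations): with 64GB of RAM, 16GB of shared_buffers and, say, 10 concurrent analytic queries each entitled to work_mem = 500MB:

    OS cache left over    ~= 64GB - 16GB shared_buffers - 10 x 0.5GB work_mem  ~= 43GB
    naive per-query share ~= (16GB shared_buffers + 43GB) / 10 queries         ~= 6GB
    with cross-backend sharing of hot data, the realistic figure sits
    somewhere between that ~6GB and the full ~59GB

Setting effective_cache_size = 48GB effectively assumes every query has almost the whole cache to itself.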
On 9/27/2016 16:38, Tomas Vondra wrote:
> On 09/27/2016 06:00 PM, Torsten Zuehlsdorff wrote:
>> On 29.07.2016 08:30, Tomas Vondra wrote:
>>> On 07/29/2016 08:04 AM, trafdev wrote:
>>>> zfs create -o recordsize=8k -o primarycache=all zroot/ara/sqldb/pgsql
>>>> [snip]

You may well find that with lz4 compression a 128kb record size on that filesystem is materially faster -- it is here for most workloads under Postgres.
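One low-risk way to test that on the setup from the first post (the new dataset name is just a placeholder, and the data has to be copied or restored into it, since a new recordsize only applies to data written after the change):

zfs create -o recordsize=128k -o compression=lz4 -o primarycache=all zroot/ara/sqldb/pgsql128
zfs get recordsize,compression,compressratio zroot/ara/sqldb/pgsql128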
+1
A larger record size can increase the compression ratio, and so reduce the I/O.
Did you set atime=off for ZFS?
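If not, it is a one-line change on the dataset from the first post, and it stops every read from generating an access-time metadata update:

zfs set atime=off zroot/ara/sqldb/pgsql
zfs get atime zroot/ara/sqldb/pgsql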
On Sep 28, 2016 at 6:16 AM, "Karl Denninger" <karl@denninger.net> wrote:
> You may well find that with lz4 compression a 128kb record size on that
> filesystem is materially faster -- it is here for most workloads under
> Postgres.
> [snip]
With ZFS you can turn full_page_writes off for Postgres, which can save WAL write I/O.
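That is a single postgresql.conf line; a sketch, and only safe because ZFS's copy-on-write model means a partially written 8kb page cannot survive a crash, so Postgres does not need its own torn-page protection:

full_page_writes = off    # default is 'on'; keep it on unless the filesystem guarantees atomic page writes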
On Jul 29, 2016 at 2:05 PM, "trafdev" <trafdev@mail.ru> wrote:
> Hi.
>
> I have an OLAP-oriented DB (light occasional bulk writes and heavy
> aggregated selects over large periods of data) based on Postgres 9.5.3.
>
> Server is a FreeBSD 10.3 with 64GB of RAM and 2x500GB SSD (root on ZFS,
> mirror).
> [snip]
On 9/27/2016 23:06, Jov wrote:
> Did you set atime=off for ZFS?

Yes.
Non-default stuff...
dbms/ticker-9.5 compressratio 1.88x -
dbms/ticker-9.5 mounted yes -
dbms/ticker-9.5 quota none default
dbms/ticker-9.5 reservation none default
dbms/ticker-9.5 recordsize 128K default
dbms/ticker-9.5 mountpoint /dbms/ticker-9.5 local
dbms/ticker-9.5 sharenfs off default
dbms/ticker-9.5 checksum on default
dbms/ticker-9.5 compression lz4 inherited from dbms
dbms/ticker-9.5 atime off inherited from dbms
dbms/ticker-9.5 logbias throughput inherited from dbms
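For reference, a blunt way to pull the same non-default view out of any dataset (the grep simply drops every line whose SOURCE column reads "default", so a stray unrelated line may disappear too):

zfs get all dbms/ticker-9.5 | grep -v default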
Thanks Jov and Karl!
What do you think about:
primarycache=all
for SELECT queries over same data sets?
On 9/28/2016 13:42, trafdev wrote:
> Thanks Jov and Karl!
>
> What do you think about:
> primarycache=all
> for SELECT queries over same data sets?

Primarycache=all is the default; changing it ought to be contemplated only under VERY specific circumstances. In the case of a database, if you shut off "all" then an 8kb data read with a 128kb blocksize will result in reading 128kb (the block size), returning the requested piece out of the 128kb, and then throwing away the rest of the data read, since you prohibited it from going into the ARC. That's almost certainly going to do bad things for throughput!

Note that having an L2ARC, which is the place where you might find setting primarycache to have a benefit, is itself something you need to instrument under your specific workload to see if it's worth it. If you want to know whether it *might* be worth it you can use (on FreeBSD) zfs-stats -E; if you're seeing materially more than 15% cache misses then it *might* help, assuming what you put it on is *very* fast (e.g. SSD).

In addition, if you're on FreeBSD (and you say you are) be aware that the vm system and ZFS interact in some "interesting" ways under certain load profiles. UMA is involved to a material degree in the issue. I have done quite a bit of work on the internal ZFS code in this regard; 11.x is better-behaved than 10.x to a quite-material degree. I have a patch set out against both 10.x and 11.x that addresses some (but not all) of the issues.
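If the miss ratio does justify trying it, adding an L2ARC device is a single command; the device name below is only a placeholder for whatever spare fast SSD would be dedicated to it:

zpool add zroot cache da2
zpool status zroot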