Home > mailing lists

Thread: Low cache hit ratio

Low cache hit ratio

From

Lucas

Date:

29 July 2021, 07:09:39

Hello,

I have recently deployed a new Slave (streaming replication) and have been monitoring its cache hit ratio.

At the moment, read-only queries are being sent to this slave but only 10% of the traffic.

The cache hit ratio is now at 82%. This database is around 1.4TB and is running on EC2 instances in AWS. PostgreSQL version is 9.2.24.

The instance is a r4.2xlarge, so it's got 8 vCPUs and 61GB of RAM. The postgresql.conf file can be seen below (will show only relevant parameters for the cache/performance):

hot_standby = on
random_page_cost = 1.1
max_connections = 500
shared_buffers = 15GB
statement_timeout = 0
work_mem = 31457kB
maintenance_work_mem = 2GB
wal_level = hot_standby
fsync = on
synchronous_commit = on
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
max_wal_senders = 20
wal_keep_segments = 1024
effective_cache_size = 45GB
logging_collector = on
autovacuum = on
log_autovacuum_min_duration = 1000
autovacuum_max_workers = 5
autovacuum_naptime = 40s
autovacuum_vacuum_threshold = 200
autovacuum_analyze_threshold = 150
autovacuum_vacuum_scale_factor = 0.02
autovacuum_analyze_scale_factor = 0.005
deadlock_timeout = 2s
max_files_per_process = 4096
effective_io_concurrency = 200
hot_standby_feedback = on
max_standby_streaming_delay = 120s
default_statistics_target = 100

I have tried reducing the shared_buffers parameter from 15GB to 8GB, but the cache hit ratio went down to 60%.

Do you guys have any suggestions, on what I could try to get this cache more hits?

Thanks in advance!

---

Regards,

Lucas

This message is encrypted. Both the Public Key and the GPG encrypted message are included in this email so that you can verify its origin.

Attachment

Re: Low cache hit ratio

From

Philip Semanchuk

Date:

29 July 2021, 13:33:16


> On Jul 29, 2021, at 3:09 AM, Lucas <root@sud0.nz> wrote:
>
> Hello,
>
> I have recently deployed a new Slave (streaming replication) and have been monitoring its cache hit ratio.
>
> At the moment, read-only queries are being sent to this slave but only 10% of the traffic.
> The cache hit ratio is now at 82%. This database is around 1.4TB and is running on EC2 instances in AWS. PostgreSQL
versionis 9.2.24. 
>
> The instance is a r4.2xlarge, so it's got 8 vCPUs and 61GB of RAM.


Hi Lucas,
Have you tried the pg_buffercache extension? It gives you some visibility into what’s in the cache so you can
understandwhat’s staying resident and not leaving room for other things. I wrote a view atop pg_buffercache that I use
forthis purpose. It’s pasted below; I hope you find it helpful. My only caveat is that I run this under Postgres 11. I
*think*I’ve used it under Postgres 9.6 but I’m not sure. It definitely hasn’t been tested on 9.2.  

Hope this helps,
Philip


/* A view of pg_buffercache which shows what's in the Postgres cache.
Access to pg_buffercache requires membership in the group pg_monitor.

It's OK to query this ad hoc, but don't query it aggressively (e.g. in a polling loop). The
Postgres doc says --

    > When the pg_buffercache view is accessed, internal buffer manager locks are taken for
    > long enough to copy all the buffer state data that the view will display. This ensures
    > that the view produces a consistent set of results, while not blocking normal buffer
    > activity longer than necessary. Nonetheless there could be some impact on database
    > performance if this view is read often.

https://www.postgresql.org/docs/11/pgbuffercache.html

*/

CREATE OR REPLACE VIEW
    vw_postgres_cache
AS
    SELECT
        c.relname,
        sum(usagecount) AS usage_count,
        /* count(*) is the number of buffers in use. Each buffer is 8Kb. (8Kb is the page size
        a.k.a. block size configured at compile time, confirm in psql with the command
        `show block_size`).

        I cast the count to float to introduce a float into calculations that are otherwise all int
        which would produce a result rounded to the nearest int.
        */
        -- cache_% shows the portion of the cache that this entity occupies
        ((count(*)::float    / pg_settings.setting::int) * 100)::numeric(3, 1) AS "cache_%",
        -- entity_% shows the portion of this entity that's in cache
        -- pg_table_size() is in bytes, multiply buffer count by (8 * 1024) so units match
        (((count(*)::float * 8192) / pg_table_size(c.relname::text)) * 100)::numeric(4,1)
            AS "entity_%",
        (count(*)::float * 8 / (1024 * 1024))::numeric(20, 1) AS Gb,
        (count(*)::float * 8 / 1024         )::numeric(20, 1) AS Mb
    FROM
        pg_buffercache b
    CROSS JOIN pg_settings
    INNER JOIN pg_class c
        ON b.relfilenode = pg_relation_filenode(c.oid)
        AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
    WHERE
        pg_settings.name = 'shared_buffers'
        -- If this is run on a system where shared_buffers is expressed in something other than 8kB
        -- blocks, it will calculate incorrect kB/Mb/Gb values. Asserting the value of the unit here
        -- ensures no results at all will be returned in that case.
        AND pg_settings.unit = '8kB'
    GROUP BY
        c.relname, pg_settings.setting::int
    HAVING
        -- Only include entries that occupy at least 0.1% of the cache
        ((count(*)::float / pg_settings.setting::int) * 100) >= 0.1
    ORDER BY 6 DESC
;