Proposing pg_hibernate - Mailing list pgsql-hackers
From: Gurjeet Singh
Subject: Proposing pg_hibernate
Msg-id: CABwTF4Ui_anAG+ybseFunAH5Z6DE9aw2NPdy4HryK+M5OdXCCA@mail.gmail.com
Responses:
  Re: Proposing pg_hibernate (Gurjeet Singh <gurjeet@singh.im>)
  Re: Proposing pg_hibernate (Cédric Villemain <cedric@2ndquadrant.com>)
List: pgsql-hackers
Please find attached the pg_hibernate extension. It is a set-it-and-forget-it solution to enable hibernation of Postgres shared-buffers. It can be thought of as an amalgam of pg_buffercache and pg_prewarm.
It uses the background-worker infrastructure. It registers one worker process (BufferSaver) to save the shared-buffer metadata when the server shuts down, and one worker per database (BlockReader) to restore the shared buffers at startup.
It stores the buffer metadata under $PGDATA/pg_database/, one file per database, plus a separate file for global objects. It sorts the list of buffers before storage, so that a run of consecutive blocks of a relation's fork can be stored as a single entry, reducing the storage and I/O overhead.
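For reference, registering the BufferSaver worker from the extension's _PG_init() would look roughly like the sketch below. This is an illustration against the PG 9.3 background-worker API, not the extension's actual code; BufferSaverMain stands in for the extension's worker entry point, and the exact struct layout should be treated as an assumption (it changed in 9.4).

```
/* Sketch only; assumes the PG 9.3 bgworker API (the struct changed in 9.4) */
BackgroundWorker worker;

MemSet(&worker, 0, sizeof(worker));
worker.bgw_name = "Buffer Saver";
worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
worker.bgw_start_time = BgWorkerStart_ConsistentState;
worker.bgw_restart_time = BGW_NEVER_RESTART;
worker.bgw_main = BufferSaverMain;  /* loops until shutdown, then saves buffer metadata */
RegisterBackgroundWorker(&worker);
```

The BlockReader workers would be registered the same way, one per database, with a start time early enough to warm the cache before regular backends generate load.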
The on-disk binary format, used to create the per-database save-files, is defined as:
1. One or more relation filenodes; stored as r<relfilenode>.
2. Each relation is followed by one or more fork numbers; stored as f<forknumber>.
3. Each fork number is followed by one or more block numbers; stored as b<blocknumber>.
4. Each block number is followed by zero or more range numbers; stored as N<number>.
{r {f {b N* }+ }+ }+
Possible enhancements:
- Ability to save/restore only specific databases.
- Control how many BlockReaders are active at a time, to avoid I/O storms.
- Be smart about a lowered shared_buffers setting across the restart.
- Different modes of reading like pg_prewarm does.
- Include PgFincore functionality, at least for Linux platforms.
The extension currently works with PG 9.3, and may work on 9.4 without any changes; I haven't tested that, though. If not, it should be easy to port to HEAD/PG 9.4. I see that 9.4 caps the maximum number of background workers via a GUC, and since my aim is to provide a non-intrusive, no-tuning-required extension, I'd like to use the new dynamic-background-worker infrastructure in 9.4, which doesn't seem to have any preset limit (I think it's limited by max_connections, but I may be wrong). I can work on a 9.4 port if there's interest in including this as a contrib/ module.
To see the extension in action:
.) Compile it.
.) Install it.
.) Add it to shared_preload_libraries.
.) Start/restart Postgres.
.) Install the pg_buffercache extension, to inspect the shared buffers.
.) Note the result of pg_buffercache view.
.) Work on your database to fill up the shared buffers.
.) Note the result of pg_buffercache view, again; there should be more blocks than last time we checked.
.) Stop and start the Postgres server.
.) Note the output of pg_buffercache view; it should contain the blocks seen just before the shutdown.
.) Future server restarts will automatically save and restore the blocks in shared-buffers.
The code is also available as a Git repository at https://github.com/gurjeet/pg_hibernate/
Demo:
$ make -C contrib/pg_hibernate/
$ make -C contrib/pg_hibernate/ install
$ vi $B/db/data/postgresql.conf
$ grep shared_preload_libraries $PGDATA/postgresql.conf
shared_preload_libraries = 'pg_hibernate' # (change requires restart)
$ pgstart
waiting for server to start.... done
server started
$ pgsql -c 'create extension pg_buffercache;'
CREATE EXTENSION
$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;'
count
-------
163
14
(2 rows)
$ pgsql -c 'create table test_hibernate as select s as a, s::char(1000) as b from generate_series(1, 100000) as s;'
SELECT 100000
$ pgsql -c 'create index on test_hibernate(a);'
CREATE INDEX
$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;'
count
-------
2254
14
(2 rows)
$ pgstop
waiting for server to shut down....... done
server stopped
$ pgstart
waiting for server to start.... done
server started
$ pgsql -c 'select count(*) from pg_buffercache where relblocknumber is not null group by reldatabase;'
count
-------
2264
17
(2 rows)
There are a few more blocks than at the time they were saved, but all the blocks from before the restart are present in shared buffers after the restart.
Best regards,
--