Re: WAL file utilization question - Mailing list pgsql-admin
From | Keaton Adams |
---|---|
Subject | Re: WAL file utilization question |
Date | |
Msg-id | 1179343892.22514.113.camel@MXLRMT-208.corp.mxlogic.com Whole thread Raw |
In response to | Re: WAL file utilization question ("Jim C. Nasby" <decibel@decibel.org>) |
List | pgsql-admin |
Given these postgresql.conf settings:
#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
# - Settings -
fsync = on # turns forced synchronization on or off
wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync
# fsync
# fsync_writethrough
# open_sync
full_page_writes = on # recover from partial page writes
wal_buffers = 32 # min 4, 8KB each
commit_delay = 100000 # range 0-100000, in microseconds
commit_siblings = 1000 # range 1-1000
# - Checkpoints -
checkpoint_segments = 500 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300 # range 30-3600, in seconds
checkpoint_warning = 120 # in seconds, 0 is off
# - Archiving -
archive_command = '/mnt/logship/scripts/archivemaster.sh %p %f'
# command to use to archive a logfile
# segment
And these tables to load data into:
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+----------
public | testload | table | postgres
public | testload2 | table | postgres
public | testload3 | table | postgres
(3 rows)
postgres=# \d testload
Table "public.testload"
Column | Type | Modifiers
--------+----------------+-----------
name | character(100) |
postgres=# \d testload2
Table "public.testload2"
Column | Type | Modifiers
--------+----------------+-----------
name | character(100) |
postgres=# \d testload3
Table "public.testload3"
Column | Type | Modifiers
--------+----------------+-----------
name | character(100) |
There are no indexes on the tables.
Using an 8K data page:
8K data page (8192 bytes)
Less page header and row overhead leaves ~8000 bytes
At 100 bytes per row = ~80 rows/page
Rows loaded: 250,000 / 80 = 3125 data pages * 8192 = 25,600,000 bytes / 1048576 = ~ 24.4 MB of data page space.
The test file is shown here (250,000 rows all the same):
-bash-3.1$ more datafile.txt
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
The load script:
-bash-3.1$ more loaddata.sql
copy testload from '/home/kadams/logship/datafile.txt' delimiter '|';
copy testload2 from '/home/kadams/logship/datafile.txt' delimiter '|';
copy testload3 from '/home/kadams/logship/datafile.txt' delimiter '|';
So the one load process does a COPY into the three tables. 24.4 MB * 3 tables = ~ 73.2 MB of data page space.
This is the only process running on the database. No other loads/users are on the system.
psql -f sql/loaddata.sql >/dev/null 2>&1 &
It seems that 112 MB of WAL file space (16 MB * 7) is required for 73.2 MB of loaded data, which is an extra 34.8% of disk space to log/archive the COPY commands:
First pass:
LOG: transaction ID wrap limit is 2147484146, limited by database "postgres"
LOG: archived transaction log file "00000001000000010000005E"
LOG: archived transaction log file "00000001000000010000005F"
LOG: archived transaction log file "000000010000000100000060"
LOG: archived transaction log file "000000010000000100000061"
LOG: archived transaction log file "000000010000000100000062"
LOG: archived transaction log file "000000010000000100000063"
LOG: archived transaction log file "000000010000000100000064"
# of logs in pg_xlog: 9
Second pass:
LOG: archived transaction log file "000000010000000100000065"
LOG: archived transaction log file "000000010000000100000066"
LOG: archived transaction log file "000000010000000100000067"
LOG: archived transaction log file "000000010000000100000068"
LOG: archived transaction log file "000000010000000100000069"
LOG: archived transaction log file "00000001000000010000006A"
LOG: archived transaction log file "00000001000000010000006B"
# of logs in pg_xlog: 15
Third pass:
LOG: archived transaction log file "00000001000000010000006C"
LOG: archived transaction log file "00000001000000010000006D"
LOG: archived transaction log file "00000001000000010000006E"
LOG: archived transaction log file "00000001000000010000006F"
LOG: archived transaction log file "000000010000000100000070"
LOG: archived transaction log file "000000010000000100000071"
LOG: archived transaction log file "000000010000000100000072"
# of logs in pg_xlog: 22
Fourth pass:
LOG: archived transaction log file "000000010000000100000073"
LOG: archived transaction log file "000000010000000100000074"
LOG: archived transaction log file "000000010000000100000075"
LOG: archived transaction log file "000000010000000100000076"
LOG: archived transaction log file "000000010000000100000077"
LOG: archived transaction log file "000000010000000100000078"
LOG: archived transaction log file "000000010000000100000079"
# of logs in pg_xlog: 29
PostgreSQL continued to add log files in pg_xlog, so my assumption is that checkpoints did not come into play during the load process, correct? (Frequent checkpoints would have added even more to the WAL file overhead, is my understanding.)
So is there anything I can do to reduce the 34.8% overhead in WAL file space when loading data? Do you see any glaring mistakes in the calculations themselves, and would you agree with this overhead figure? (I make no claim to perfection by any means.)
We are running on PostgreSQL 8.1.4 and are planning to move to 8.3 when it becomes available. Are there space utilization improvements in WAL logging in the upcoming release?
Thanks,
Keaton
On Wed, 2007-05-16 at 11:51 -0500, Jim C. Nasby wrote:
#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
# - Settings -
fsync = on # turns forced synchronization on or off
wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync
# fsync
# fsync_writethrough
# open_sync
full_page_writes = on # recover from partial page writes
wal_buffers = 32 # min 4, 8KB each
commit_delay = 100000 # range 0-100000, in microseconds
commit_siblings = 1000 # range 1-1000
# - Checkpoints -
checkpoint_segments = 500 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300 # range 30-3600, in seconds
checkpoint_warning = 120 # in seconds, 0 is off
# - Archiving -
archive_command = '/mnt/logship/scripts/archivemaster.sh %p %f'
# command to use to archive a logfile
# segment
And these tables to load data into:
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+----------
public | testload | table | postgres
public | testload2 | table | postgres
public | testload3 | table | postgres
(3 rows)
postgres=# \d testload
Table "public.testload"
Column | Type | Modifiers
--------+----------------+-----------
name | character(100) |
postgres=# \d testload2
Table "public.testload2"
Column | Type | Modifiers
--------+----------------+-----------
name | character(100) |
postgres=# \d testload3
Table "public.testload3"
Column | Type | Modifiers
--------+----------------+-----------
name | character(100) |
There are no indexes on the tables.
Using an 8K data page:
8K data page (8192 bytes)
Less page header and row overhead leaves ~8000 bytes
At 100 bytes per row = ~80 rows/page
Rows loaded: 250,000 / 80 = 3125 data pages * 8192 = 25,600,000 bytes / 1048576 = ~ 24.4 MB of data page space.
The test file is shown here (250,000 rows all the same):
-bash-3.1$ more datafile.txt
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
The load script:
-bash-3.1$ more loaddata.sql
copy testload from '/home/kadams/logship/datafile.txt' delimiter '|';
copy testload2 from '/home/kadams/logship/datafile.txt' delimiter '|';
copy testload3 from '/home/kadams/logship/datafile.txt' delimiter '|';
So the one load process does a COPY into the three tables. 24.4 MB * 3 tables = ~ 73.2 MB of data page space.
This is the only process running on the database. No other loads/users are on the system.
psql -f sql/loaddata.sql >/dev/null 2>&1 &
It seems that 112 MB of WAL file space (16 MB * 7) is required for 73.2 MB of loaded data, which is an extra 34.8% of disk space to log/archive the COPY commands:
First pass:
LOG: transaction ID wrap limit is 2147484146, limited by database "postgres"
LOG: archived transaction log file "00000001000000010000005E"
LOG: archived transaction log file "00000001000000010000005F"
LOG: archived transaction log file "000000010000000100000060"
LOG: archived transaction log file "000000010000000100000061"
LOG: archived transaction log file "000000010000000100000062"
LOG: archived transaction log file "000000010000000100000063"
LOG: archived transaction log file "000000010000000100000064"
# of logs in pg_xlog: 9
Second pass:
LOG: archived transaction log file "000000010000000100000065"
LOG: archived transaction log file "000000010000000100000066"
LOG: archived transaction log file "000000010000000100000067"
LOG: archived transaction log file "000000010000000100000068"
LOG: archived transaction log file "000000010000000100000069"
LOG: archived transaction log file "00000001000000010000006A"
LOG: archived transaction log file "00000001000000010000006B"
# of logs in pg_xlog: 15
Third pass:
LOG: archived transaction log file "00000001000000010000006C"
LOG: archived transaction log file "00000001000000010000006D"
LOG: archived transaction log file "00000001000000010000006E"
LOG: archived transaction log file "00000001000000010000006F"
LOG: archived transaction log file "000000010000000100000070"
LOG: archived transaction log file "000000010000000100000071"
LOG: archived transaction log file "000000010000000100000072"
# of logs in pg_xlog: 22
Fourth pass:
LOG: archived transaction log file "000000010000000100000073"
LOG: archived transaction log file "000000010000000100000074"
LOG: archived transaction log file "000000010000000100000075"
LOG: archived transaction log file "000000010000000100000076"
LOG: archived transaction log file "000000010000000100000077"
LOG: archived transaction log file "000000010000000100000078"
LOG: archived transaction log file "000000010000000100000079"
# of logs in pg_xlog: 29
PostgreSQL continued to add log files in pg_xlog, so my assumption is that checkpoints did not come into play during the load process, correct? (Frequent checkpoints would have added even more to the WAL file overhead, is my understanding.)
So is there anything I can do to reduce the 34.8% overhead in WAL file space when loading data? Do you see any glaring mistakes in the calculations themselves, and would you agree with this overhead figure? (I make no claim to perfection by any means.)
We are running on PostgreSQL 8.1.4 and are planning to move to 8.3 when it becomes available. Are there space utilization improvements in WAL logging in the upcoming release?
Thanks,
Keaton
On Wed, 2007-05-16 at 11:51 -0500, Jim C. Nasby wrote:
On Wed, May 16, 2007 at 10:49:04AM -0600, Keaton Adams wrote: > So, if I do set the checkpoint_segments parameter to 50, 75, or even 100 > I will have a considerable number of WAL files in data/pg_xlog, but they > will be used more efficiently, so I will be archiving less? That's what > I understand from the comments received so far. Yes. Let it create 500 or 1000 WAL files if it wants... it's much more important to limit the frequency of checkpoints than to reduce the number of WAL files (which has virtually no impact on performance).
pgsql-admin by date: