Thread: Checkpoints and slow queries
I also tried editing dirty_background_ratio and dirty_expire_centisecs in /etc/sysctl.conf. All dirty-related kernel settings:
# sysctl -a | grep dirty
vm.dirty_background_ratio = 1
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500
This also didn't improve the situation.
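For reference, such settings are normally made persistent in /etc/sysctl.conf and reloaded with 'sysctl -p'; a minimal sketch (the ratio and bytes variants of each knob are mutually exclusive, so only the ratio forms need to be listed):

# vi /etc/sysctl.conf     (excerpt; mirrors the values above)
vm.dirty_background_ratio = 1
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500

# sysctl -p               (apply without a reboot)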
- EXPLAIN ANALYZE:
- http://explain.depesz.com/s/BNva - An insert query inserting just 129 rows takes 20 seconds.
- http://explain.depesz.com/s/5hA - An update query updating 43926 rows takes 55 seconds.
- History: It gets slower after a few days of the system running.
- Table Metadata:
- The tables get updated every 5 minutes. At most 50000 rows in a table get updated every 5 minutes. About 50000 rows get inserted every 1 hour.
- There are 90 tables in the DB. 43 of these are updated every 5 minutes. 8/90 tables receive a high update traffic of 50000 updates/5min; the remaining tables receive an update traffic of 2000 updates/5min. 43/90 tables are updated every 1 hour. (These rates can be cross-checked as sketched below.)
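The per-table rates above can be cross-checked from the statistics collector by sampling between two write bursts; a sketch (database name taken from the psql prompt shown further down):

$ psql -d managed_target_stats -c "SELECT relname, n_tup_ins, n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables ORDER BY n_tup_upd DESC LIMIT 10;"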
- Changes made to the settings in the postgresql.conf file:
name | current_setting | source
------------------------------+------------------------+----------------------
application_name | psql | client
checkpoint_completion_target | 0.9 | configuration file
checkpoint_segments | 250 | configuration file
checkpoint_timeout | 1h | configuration file
client_encoding | SQL_ASCII | client
client_min_messages | error | configuration file
constraint_exclusion | on | configuration file
DateStyle | ISO, MDY | configuration file
default_statistics_target | 800 | configuration file
default_text_search_config | pg_catalog.english | configuration file
effective_cache_size | 4GB | configuration file
lc_messages | C | configuration file
lc_monetary | C | configuration file
lc_numeric | C | configuration file
lc_time | C | configuration file
listen_addresses | localhost | configuration file
log_autovacuum_min_duration | 20s | configuration file
log_checkpoints | on | configuration file
log_connections | on | configuration file
log_destination | syslog | configuration file
log_disconnections | on | configuration file
log_line_prefix | user=%u,db=%d | configuration file
log_lock_waits | on | configuration file
log_min_duration_statement | 1s | configuration file
log_min_messages | error | configuration file
log_temp_files | 0 | configuration file
log_timezone | PST8PDT,M3.2.0,M11.1.0 | environment variable
maintenance_work_mem | 64MB | configuration file
max_connections | 12 | configuration file
max_locks_per_transaction | 700 | configuration file
max_stack_depth | 2MB | environment variable
port | 5432 | configuration file
shared_buffers | 500MB | configuration file
ssl | off | configuration file
statement_timeout | 4min | configuration file
syslog_facility | local1 | configuration file
syslog_ident | postgres | configuration file
temp_buffers | 256MB | configuration file
TimeZone | PST8PDT,M3.2.0,M11.1.0 | environment variable
wal_buffers | 1MB | configuration file
work_mem | 128MB | configuration file
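Incidentally, a listing like the one above can be regenerated at any time from the pg_settings view; a sketch:

$ psql -d managed_target_stats -c "SELECT name, current_setting(name), source FROM pg_settings WHERE source <> 'default' ORDER BY name;"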
- Relevant Schema: All tables referenced in this question have this same schema
managed_target_stats=> \d stat_300_3_1
Table "public.stat_300_40110_1"
Column | Type | Modifiers
--------+---------+-----------
ts | integer |
target | bigint |
port | integer |
data | real[] |
Indexes:
"unique_stat_300_40110_1" UNIQUE CONSTRAINT, btree (ts, target, port)
"idx_port_stat_300_40110_1" btree (port)
"idx_target_stat_300_40110_1" btree (target)
"idx_ts_stat_300_40110_1" btree (ts)
- Hardware:
- CPU: Intel(R) Xeon(R) CPU E5205 @ 1.86GHz
- Memory: 6GB
- Storage Details: There are 2 500GB disks (/dev/sda, /dev/sdb) with the following 6 partitions on each disk.
Number Start End Size Type File system Flags
1 512B 24.7MB 24.7MB primary boot
2 24.7MB 6473MB 6449MB primary linux-swap(v1)
3 6473MB 40.8GB 34.4GB primary ext3
4 40.8GB 500GB 459GB extended lba
5 40.8GB 408GB 367GB logical ext3
6 408GB 472GB 64.4GB logical ext3
Disk model and details:
Model Family: Western Digital RE3 Serial ATA family
Device Model: WDC WD5002ABYS-02B1B0
Serial Number: WD-WCASYD132237
Firmware Version: 02.03B03
User Capacity: 500,107,862,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Apr 27 05:05:13 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The postgres data is stored on a software RAID10 on partition 5 of both these disks.
[admin@chief-cmc2 tmp]# mdadm --detail /dev/md3
/dev/md3:
Version : 0.90
Creation Time : Wed Mar 19 06:40:57 2014
Raid Level : raid10
Array Size : 358402048 (341.80 GiB 367.00 GB)
Used Dev Size : 358402048 (341.80 GiB 367.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Sun Apr 27 04:22:07 2014
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : far=2
Chunk Size : 64K
UUID : 79d04a1b:99461915:3d186b3c:53958f34
Events : 0.24
Number Major Minor RaidDevice State
0 8 5 0 active sync /dev/sda5
1 8 21 1 active sync /dev/sdb5
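For completeness, a quick way to confirm the array stays healthy and in sync while the problem is being reproduced:

# cat /proc/mdstat
# mdadm --detail /dev/md3 | grep State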
- Maintenance Setup: autovacuum is running with default settings. Old records are deleted every night. I also do 'vacuum full' on 12 tables that receive a large number of updates, every night at 1AM. I have noticed that these 'vacuum full' runs also time out. (I am planning to post a separate question regarding my vacuuming strategy.)
- WAL Configuration: The WAL is on the same disk as the data.
On 27.4.2014 23:01, Elanchezhiyan Elango wrote:
> (I am resending this question after waiting for several hours because
> my previous mail got stalled probably because I didn't confirm my
> email address after subscribing. So resending the mail. Sorry if this
> is causing a double post.)
>
> *Problem description:*
> After a few days of running in my test environment, a query timed out
> (query timeout=4mins). Also in general the queries were taking a lot
> longer than expected. The workload in my database is a write intensive
> workload. And the writes happen in a burst every 5 minutes. There are a
> whole bunch of insert and update queries that run every 5 minutes. When
> I analyzed the situation (by enabling more postgres logs), I noticed
> that postgres checkpoints were triggering approximately every 5 minutes
> and based on my online research I suspected the i/o overhead of
> checkpoints was affecting the query performance. The checkpoint related
> settings were:
> checkpoint_segments = 30
> checkpoint_timeout = 15min
>
> I modified these settings to the following:
> checkpoint_segments = 250
> checkpoint_timeout = 1h
> checkpoint_completion_target = 0.9

The problem is that while this makes the checkpoints less frequent, it
accumulates more changes that need to be written to disk during the
checkpoint, which makes the impact more severe.

The only case when this is not true is when repeatedly modifying a
subset of the data (say, a few data blocks), because the changes merge
into a single write during the checkpoint.

> After I tweaked these settings, checkpoints were happening only once in
> an hour and that improved the query performance. However, when the
> checkpoint happens every hour, the query performance is still very poor.
> This is still undesirable to my system.

So, can you share a few of the checkpoint log messages, so that we get
an idea of how much data needs to be synced to disk?
> I also tried editing dirty_background_ratio and dirty_expire_centisecs
> in /etc/sysctl.conf. All dirty related kernel settings:
>
>># sysctl -a | grep dirty
>
> vm.dirty_background_ratio = 1
> vm.dirty_background_bytes = 0
> vm.dirty_ratio = 20
> vm.dirty_bytes = 0
> vm.dirty_writeback_centisecs = 500
> vm.dirty_expire_centisecs = 500
>
> This also didn't improve the situation.
Can you monitor the amount of dirty data in the page cache, i.e. data that
needs to be written to disk? Wait for the checkpoint and sample
/proc/meminfo a few times:
$ cat /proc/meminfo | grep Dirty
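To capture the whole checkpoint window, a simple sampling loop works too; a sketch (the output path is an arbitrary choice):

$ while true; do echo "$(date +%H:%M:%S) $(grep -E 'Dirty|Writeback' /proc/meminfo | tr -s ' \n' ' ')"; sleep 5; done >> /tmp/dirty.log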
Also, watch "iostat -x -k 1" or something similar to see disk activity.

> My question is how to track down the reason for the poor performance
> during checkpoints and improve the query performance when the
> checkpoints happen?
>
> * *EXPLAIN ANALYZE:*
> o http://explain.depesz.com/s/BNva - An insert query inserting
> just 129 rows takes 20 seconds.
> o http://explain.depesz.com/s/5hA - An update query updating 43926
> rows takes 55 seconds.
> * *History:* It gets slower after a few days of the system running.
>
> *Table Metadata*:
>
> * The tables get updated every 5 minutes. At most 50000 rows in a table
> get updated every 5 minutes. About 50000 rows get inserted every 1 hour.
> * There are 90 tables in the DB. 43 of these are updated every 5
> minutes. 8/90 tables receive a high update traffic of 50000
> updates/5min. Remaining tables receive an update traffic of 2000
> updates/5min. 43/90 tables are updated every 1 hour.

So how much data in total are we talking about?
> *PostgreSQL version: *PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu,
> compiled by gcc (GCC) 4.6.x-google 20120601 (prerelease), 64-bit
>
> *How you installed PostgreSQL: *Compiled from source and installed.
>
> *Changes made to the settings in the postgresql.conf file:*

Seems fine to me, except for the following changes:
> name | current_setting | source
> ------------------------------+------------------------+----------------------
> maintenance_work_mem | 64MB | configuration file
> temp_buffers | 256MB | configuration file
> wal_buffers | 1MB | configuration file
> work_mem | 128MB | configuration file

Any particular reasons for setting work_mem > maintenance_work_mem? Why
have you modified wal_buffers and temp_buffers?

I doubt these are related to the issues you're seeing, though.
> * *Relevant Schema*: All tables referenced in this question have this
> same schema
>
> managed_target_stats=> \d stat_300_3_1
>
> Table "public.stat_300_40110_1"
>
> Column | Type | Modifiers
> --------+---------+-----------
> ts | integer |
> target | bigint |
> port | integer |
> data | real[] |
>
> Indexes:
> "unique_stat_300_40110_1" UNIQUE CONSTRAINT, btree (ts, target, port)
> "idx_port_stat_300_40110_1" btree (port)
> "idx_target_stat_300_40110_1" btree (target)
> "idx_ts_stat_300_40110_1" btree (ts)
OK, so there are multiple tables, and you're updating 50k rows in all
tables in total? Can you post \dt+ and \di+ so that we get an idea of
table/index sizes?
> * *Hardware*:
> o CPU: Intel(R) Xeon(R) CPU E5205 @ 1.86GHz
> o Memory: 6GB
> o Storage Details:
>
> There are 2 500GB disks (/dev/sda, /dev/sdb) with the following 6
> partitions on each disk.
>
> *Number Start End Size Type File system Flags*
> 1 512B 24.7MB 24.7MB primary boot
> 2 24.7MB 6473MB 6449MB primary linux-swap(v1)
> 3 6473MB 40.8GB 34.4GB primary ext3
> 4 40.8GB 500GB 459GB extended lba
> 5 40.8GB 408GB 367GB logical ext3
> 6 408GB 472GB 64.4GB logical ext3

The first problem here is ext3. Its behavior when performing fsync is
really terrible. See
http://blog.2ndquadrant.com/linux_filesystems_and_postgres/
for more details. So, the first thing you should do is switch to ext4
or xfs.
> *Disk model and details:*
>
> Model Family: Western Digital RE3 Serial ATA family

Regular 7.2k SATA disk, not the most powerful piece of hardware.

> The postgres data is stored on a software RAID10 on partition 5 of
> both these disks.

So essentially RAID0, as you only have 2 drives.
> * *Maintenance Setup*: autovacuum is running with default settings.
> Old records are deleted every night. I also do 'vacuum full' on 12
> tables that receive a large number of updates every night at 1AM. I
> have noticed that these 'vacuum full' also time out. (I am planning
> to post a separate question regarding my vacuuming strategy).

My bet is it's related. If the system is I/O bound, it's natural that
the vacuum full runs are performing badly too.
> * *WAL Configuration*: The WAL is in the same disk.
Which is not helping, because it interferes with the other I/O.
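For the record, the usual way to move the WAL on this release (9.1) is to relocate pg_xlog and symlink it back; a minimal sketch, assuming the default source-build data directory /usr/local/pgsql/data (an assumption) and a hypothetical separate mount /mnt/wal:

$ pg_ctl -D /usr/local/pgsql/data stop
$ mv /usr/local/pgsql/data/pg_xlog /mnt/wal/pg_xlog
$ ln -s /mnt/wal/pg_xlog /usr/local/pgsql/data/pg_xlog
$ pg_ctl -D /usr/local/pgsql/data start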
regards
Tomas
Elanchezhiyan Elango <elanelango@gmail.com> writes:
>> The problem is that while this makes the checkpoints less frequent, it
>> accumulates more changes that need to be written to disk during the
>> checkpoint, which makes the impact more severe.

> True. But the checkpoints finish in approximately 5-10 minutes every time
> (even with checkpoint_completion_target of 0.9).

There's something wrong with that. I wonder whether you need to kick
checkpoint_segments up some more to keep the checkpoint from being run
too fast.

Even so, though, a checkpoint spread over 5-10 minutes ought to provide
the kernel with enough breathing room to flush things. It sounds like
the kernel is just sitting on the dirty buffers until it gets hit with
fsyncs, and then it's dumping them as fast as it can. So you need some
more work on tuning the kernel parameters.

regards, tom lane
On 28.4.2014 16:07, Tom Lane wrote:
> Elanchezhiyan Elango <elanelango@gmail.com> writes:
>>> The problem is that while this makes the checkpoints less
>>> frequent, it accumulates more changes that need to be written to
>>> disk during the checkpoint, which makes the impact more severe.
>
>> True. But the checkpoints finish in approximately 5-10 minutes
>> every time (even with checkpoint_completion_target of 0.9).
>
> There's something wrong with that. I wonder whether you need to
> kick checkpoint_segments up some more to keep the checkpoint from
> being run too fast.
>
> Even so, though, a checkpoint spread over 5-10 minutes ought to
> provide the kernel with enough breathing room to flush things. It
> sounds like the kernel is just sitting on the dirty buffers until it
> gets hit with fsyncs, and then it's dumping them as fast as it can.
> So you need some more work on tuning the kernel parameters.
There's certainly something fishy, because although this is the supposed
configuration:

checkpoint_segments = 250
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9

the checkpoint logs typically finish in much shorter periods of time.
Like this, for example:
Sorry, hit "send" too early by accident.

On 28.4.2014 16:07, Tom Lane wrote:
> Elanchezhiyan Elango <elanelango@gmail.com> writes:
>>> The problem is that while this makes the checkpoints less
>>> frequent, it accumulates more changes that need to be written to
>>> disk during the checkpoint, which makes the impact more severe.
>
>> True. But the checkpoints finish in approximately 5-10 minutes
>> every time (even with checkpoint_completion_target of 0.9).
>
> There's something wrong with that. I wonder whether you need to
> kick checkpoint_segments up some more to keep the checkpoint from
> being run too fast.

Too fast? All the checkpoints listed in the log were "timed", pretty
much exactly in 1h intervals:

Apr 26 00:12:57 LOG: checkpoint starting: time
Apr 26 01:12:57 LOG: checkpoint starting: time
Apr 26 02:12:57 LOG: checkpoint starting: time
Apr 26 03:12:57 LOG: checkpoint starting: time
Apr 26 04:12:58 LOG: checkpoint starting: time
Apr 26 05:12:57 LOG: checkpoint starting: time
Apr 26 06:12:57 LOG: checkpoint starting: time

There's certainly something fishy, because although this is the
supposed configuration:

checkpoint_segments = 250
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9

the checkpoint logs typically finish in much shorter periods of time.
Like this, for example:

Apr 26 10:12:57 LOG: checkpoint starting: time
Apr 26 10:26:27 LOG: checkpoint complete: wrote 9777 buffers (15.3%);
0 transaction log file(s) added, 0 removed, 153 recycled;
write=800.377 s, sync=8.605 s, total=809.834 s; sync files=719,
longest=1.034 s, average=0.011 s

And that's one of the longer runs - most of the others run in ~5-6
minutes. Now, maybe I'm mistaken, but I'd expect the checkpoints to
finish in ~54 minutes, which is (0.9 * checkpoint_timeout).

> Even so, though, a checkpoint spread over 5-10 minutes ought to
> provide the kernel with enough breathing room to flush things. It
> sounds like the kernel is just sitting on the dirty buffers until it
> gets hit with fsyncs, and then it's dumping them as fast as it can.
> So you need some more work on tuning the kernel parameters.

I'm not sure about this - the /proc/meminfo snapshots sent in the
previous post show that the amount of "Dirty" memory is usually well
below ~20MB, with a max of ~36MB at 22:24:26, and within a matter of
seconds it drops down to ~10MB of dirty data.

Also, the kernel settings seem quite aggressive to me:

vm.dirty_background_ratio = 1
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500

regards
Tomas
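(Side note: with log_destination set to syslog and syslog_facility to local1, checkpoint lines like those quoted above can be extracted with something along these lines; the log file path is a guess, since it depends on the local syslog configuration:)

$ grep -E 'checkpoint (starting|complete)' /var/log/messages | tail -20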
On 28.4.2014 07:50, Elanchezhiyan Elango wrote:
> Every 5 minutes: 50K rows are updated in 4 tables. 2K rows are updated
> in 39 tables.
> Every 1 hour (on top of the hour): 50K rows are updated in 8 tables. 2K
> rows are updated in 78 tables.
> If every update will take up space equivalent to 1 row, then there are
> 278K rows updated across all tables every 5 minutes. And 556K (278 * 2)
> rows updated across all tables every 1 hour. All tables follow the same
> schema except some tables don't have the 'port' field. And the data[]
> column on each row could have maximum 48 values.

I wasn't really asking about the amount of updates (that's reasonably
well seen in the checkpoint logs), but about the size of the database.

> \dt+: http://pastebin.com/Dvg2vSeb
> \di+: http://pastebin.com/586MGn0U

According to the output, it seems you're dealing with ~20GB of data and
~30GB of indexes. Is that about right?

> Thanks for your input on ext3 filesystem and having WAL on a
> different disk. I'll see if these can be changed. I cannot change
> these in the short term.

What kernel version is this, actually?

Also, have you done some basic performance tests, to see how the disk
array behaves? I mean something like

dd if=/dev/zero of=/mnt/raid/test.file bs=1M count=16000
dd if=/mnt/raid/test.file of=/dev/null bs=1M count=16000

to test sequential performance, and pgbench to test something more
random. Right now we're trying to solve this as a checkpoint issue,
when in reality it might be something completely different.

Also, are you sure there's no other source of significant I/O activity?
Try to run iotop to watch what's happening there.

regards
Tomas
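(The suggested pgbench test could be as simple as the following sketch; scale, client count, and duration are arbitrary, and pgbench_test is a scratch database created just for this:)

$ createdb pgbench_test
$ pgbench -i -s 100 pgbench_test
$ pgbench -c 4 -j 2 -T 60 pgbench_test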
On 28.4.2014 22:54, Jeff Janes wrote:
> On Mon, Apr 28, 2014 at 1:41 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> There's certainly something fishy, because although this is the
>> supposed configuration:
>>
>> checkpoint_segments = 250
>> checkpoint_timeout = 1h
>> checkpoint_completion_target = 0.9
>>
>> the checkpoint logs typically finish in much shorter periods of time.
>
> That doesn't look fishy to me. The checkpointer will not take more than
> one nap between buffers, so it will always write at least 10 buffers per
> second (of napping time) even if that means it finishes early. Which
> seems to be the case here - the length of the write cycle seems to be
> about one tenth the number of buffers written.

Oh, makes sense I guess. Apparently I've only been tuning this on
systems doing a lot of I/O, so this behaviour somehow escaped me.

Tomas
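(A quick sanity check of that explanation against the log excerpt above: 9777 buffers over write=800.377 s is roughly 12 buffers/s, and at the ~10 buffers-per-second floor those 9777 buffers would take ~980 s - the same ballpark as the observed ~800 s write phase. So the numbers are consistent with a checkpointer that finishes well ahead of the 0.9 * 3600 s = 3240 s schedule simply because it never writes slower than its minimum pace.)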