Hi,
Sorry for the late response on this.
On Wed, Aug 31, 2011 at 09:40, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 31 Srpen 2011, 1:07, Dan Scott wrote:
>> On Tue, Aug 30, 2011 at 13:52, Daniel Verite <daniel@manitou-mail.org>
>> wrote:
>>> Dan Scott wrote:
>>>
>>>> the insert process is unable to insert new rows into the database
>>>
>>> You should probably provide the error message on insert or otherwise
>>> describe
>>> how it's not working. Normally reading does not unintentionally prevent
>>> writing in a concurrent session.
>>
>> I've investigated a little further and it seems that they don't really
>> 'fail'. Just start taking significantly longer to insert, and the
>> messages start backing up and eventually stop being sent because
>
> What messages are you talking about?
I guess I need to explain the system a little better. The purpose of
this database is to log HL7 messages received from another system. We
then parse these messages to extract the data. So the 'messages' in
this case are the messages that I'm receiving from the remote system,
they are my data which I'm inserting into the database.
>> they're not being acknowledged. I can see a few "WARNING: pgstat wait
>> timeout" messages around the time that this is happening in the
>> syslog.
>
> The pgstat messages are a typical symptom of I/O bottleneck - it just
> means you'ro doing a lot of writes, more than the drives can take. Enable
> checkpoint logging (log_checkpoints=on) and watch the system stats (e.g.
> using 'iostat -x' or vmstat), my bet is this is a checkpoint or pdflush
> issue.
Checkpoint logging is enabled and I've setup monitoring using munin.
iostat -x 10 shows the following (vdb is the data drive):
avg-cpu: %user %nice %system %iowait %steal %idle
0.15 0.00 0.10 0.10 0.00 99.65
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s
avgrq-sz avgqu-sz await svctm %util
vda 0.00 0.00 0.00 0.70 0.00 5.60
8.00 0.00 1.14 0.86 0.06
vdb 0.00 3.30 0.00 5.40 0.00 69.60
12.89 0.05 8.57 1.52 0.82
dm-0 0.00 0.00 0.00 0.70 0.00 5.60
8.00 0.00 1.14 0.86 0.06
dm-1 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.30 0.00 0.10 0.15 0.00 99.45
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s
avgrq-sz avgqu-sz await svctm %util
vda 0.00 0.00 0.00 1.50 0.00 12.00
8.00 0.00 1.67 0.47 0.07
vdb 0.00 3.50 0.00 4.60 0.00 64.80
14.09 0.02 3.96 1.50 0.69
dm-0 0.00 0.00 0.00 1.50 0.00 12.00
8.00 0.00 1.67 0.47 0.07
dm-1 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.05 0.00 0.05 0.15 0.00 99.75
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s
avgrq-sz avgqu-sz await svctm %util
vda 0.00 0.00 0.00 0.80 0.00 6.40
8.00 0.00 0.88 0.37 0.03
vdb 0.00 1.30 0.00 2.80 0.00 32.80
11.71 0.01 3.79 1.75 0.49
dm-0 0.00 0.00 0.00 0.80 0.00 6.40
8.00 0.00 0.88 0.37 0.03
dm-1 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.05 0.00 0.05 0.00 0.00 99.90
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s
avgrq-sz avgqu-sz await svctm %util
vda 0.00 0.00 0.00 0.40 0.00 3.20
8.00 0.00 1.75 0.75 0.03
vdb 0.00 0.10 0.00 0.70 0.00 6.40
9.14 0.00 0.71 0.71 0.05
dm-0 0.00 0.00 0.00 0.40 0.00 3.20
8.00 0.00 1.75 0.75 0.03
dm-1 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00
Which column should I be looking at?
> Anyway we need more info about your system - Pg version, amount of RAM,
> shared buffers, checkpoint settings (segments, completion) and page cache
> config (/proc/sys/vm/). A few lines of vmstat/iostat output would help
> too.
The server is running Scientific Linux 6.0, PostgreSQL 9.4.9, 2GB ram.
The PostgreSQL configuration is fairly standard:
[root@aorta ~]# grep shared /data/pgsql/data/postgresql.conf
# Note: Increasing max_connections costs ~400 bytes of shared memory per
shared_buffers = 24MB # min 128kB
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
#shared_preload_libraries = '' # (change requires restart)
# Note: Each lock table slot uses ~270 bytes of shared memory, and there are
[root@aorta ~]# grep check /data/pgsql/data/postgresql.conf
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables
log_checkpoints = on
#check_function_bodies = on
[root@aorta ~]# grep page /data/pgsql/data/postgresql.conf
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#full_page_writes = on # recover from partial page writes
#seq_page_cost = 1.0 # measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
Thanks for all your help,
Dan