Thread: PostgreSQL 8.0 occasionally slow down
Hi,
I am new for postgresql server. And now i work on a projects which requires postgreSQL 8.0 and Java. I don't know why the server occasionally slow down a bit for every 3 minutes.
I have changed the log configuration so that it logs all statement transaction > 1000 ms and the result shown below :
============================================================================
<elf2 2007-06-28 14:30:25 HKT 46835574.7a64> LOG: duration: 1494.109 ms statement: commit;begin;
<elf2 2007-06-28 14:33:34 HKT 468354a8.7415> LOG: duration: 1048.429 ms statement: commit;begin;
<elf2 2007-06-28 14:33:35 HKT 468354a9.7418> LOG: duration: 1580.120 ms statement: commit;begin;
<elf2 2007-06-28 14:33:37 HKT 468354a9.7418 > LOG: duration: 1453.620 ms statement: commit;begin;
<elf2 2007-06-28 14:36:51 HKT 468354a9.7419> LOG: duration: 1430.019 ms statement: commit;begin;
<elf2 2007-06-28 14:36:53 HKT 468354a9.7418> LOG: duration: 1243.886 ms statement: commit;begin;
<elf2 2007-06-28 14:36:54 HKT 468354a9.7419> LOG: duration: 1491.821 ms statement: commit;begin;
<elf2 2007-06-28 14:36:54 HKT 468354a9.7418> LOG: duration: 1266.516 ms statement: commit;begin;
...
...
<elf2 2007-06-28 14:40:54 HKT 468354a9.741b> LOG: duration: 1776.466 ms statement: commit;begin;
<elf2 2007-06-28 14:40:54 HKT 468357ec.d5a> LOG: duration: 1500.132 ms statement: commit;begin;
...
...
<elf2 2007-06-28 14:44:07 HKT 46835477.73b7> LOG: duration: 1011.216 ms statement: commit;begin;
<elf2 2007-06-28 14:44:12 HKT 46835477.73b7> LOG: duration: 1009.187 ms statement: commit;begin;
<elf2 2007-06-28 14:44:13 HKT 468352f9.7194> LOG: duration: 1086.769 ms statement: commit;begin;
<elf2 2007-06-28 14:44:14 HKT 46835477.73b7> LOG: duration: 1481.627 ms statement: commit;begin;
...
...
<elf2 2007-06-28 14:47:44 HKT 468354a9.7419> LOG: duration: 10513.208 ms statement: commit;begin;
<elf2 2007-06-28 14:48:22 HKT 468354a9.7419> LOG: duration: 38126.708 ms statement: commit;begin;
============================================================================
For each 3 ~ 4 minutes , there are many transactions which requires (>1 seconds) for execution. It is strange for me seems the tables size is quite small (~ 10K < 20K row). I can said the rate of incoming transactions is quite steady through our the testing. So i am quite confusing why the performance degrades for every 3 ~ 4 minutes. I am wondering if there is any default scheduled task in the postgreSQL 8.0
The configurations which i have amended in postgresql.conf.
max_fsm_pages = 100000
vacuum_cost_delay = 10
The machine using :
512 RAM
Gentoo Linux
Do anyone can help me about this ? or any resolution for a sudden performance degrade ( because the application i need to develop is quite time-critical).
Thank.
Twinsen
I am new for postgresql server. And now i work on a projects which requires postgreSQL 8.0 and Java. I don't know why the server occasionally slow down a bit for every 3 minutes.
I have changed the log configuration so that it logs all statement transaction > 1000 ms and the result shown below :
============================================================================
<elf2 2007-06-28 14:30:25 HKT 46835574.7a64> LOG: duration: 1494.109 ms statement: commit;begin;
<elf2 2007-06-28 14:33:34 HKT 468354a8.7415> LOG: duration: 1048.429 ms statement: commit;begin;
<elf2 2007-06-28 14:33:35 HKT 468354a9.7418> LOG: duration: 1580.120 ms statement: commit;begin;
<elf2 2007-06-28 14:33:37 HKT 468354a9.7418 > LOG: duration: 1453.620 ms statement: commit;begin;
<elf2 2007-06-28 14:36:51 HKT 468354a9.7419> LOG: duration: 1430.019 ms statement: commit;begin;
<elf2 2007-06-28 14:36:53 HKT 468354a9.7418> LOG: duration: 1243.886 ms statement: commit;begin;
<elf2 2007-06-28 14:36:54 HKT 468354a9.7419> LOG: duration: 1491.821 ms statement: commit;begin;
<elf2 2007-06-28 14:36:54 HKT 468354a9.7418> LOG: duration: 1266.516 ms statement: commit;begin;
...
...
<elf2 2007-06-28 14:40:54 HKT 468354a9.741b> LOG: duration: 1776.466 ms statement: commit;begin;
<elf2 2007-06-28 14:40:54 HKT 468357ec.d5a> LOG: duration: 1500.132 ms statement: commit;begin;
...
...
<elf2 2007-06-28 14:44:07 HKT 46835477.73b7> LOG: duration: 1011.216 ms statement: commit;begin;
<elf2 2007-06-28 14:44:12 HKT 46835477.73b7> LOG: duration: 1009.187 ms statement: commit;begin;
<elf2 2007-06-28 14:44:13 HKT 468352f9.7194> LOG: duration: 1086.769 ms statement: commit;begin;
<elf2 2007-06-28 14:44:14 HKT 46835477.73b7> LOG: duration: 1481.627 ms statement: commit;begin;
...
...
<elf2 2007-06-28 14:47:44 HKT 468354a9.7419> LOG: duration: 10513.208 ms statement: commit;begin;
<elf2 2007-06-28 14:48:22 HKT 468354a9.7419> LOG: duration: 38126.708 ms statement: commit;begin;
============================================================================
For each 3 ~ 4 minutes , there are many transactions which requires (>1 seconds) for execution. It is strange for me seems the tables size is quite small (~ 10K < 20K row). I can said the rate of incoming transactions is quite steady through our the testing. So i am quite confusing why the performance degrades for every 3 ~ 4 minutes. I am wondering if there is any default scheduled task in the postgreSQL 8.0
The configurations which i have amended in postgresql.conf.
max_fsm_pages = 100000
vacuum_cost_delay = 10
The machine using :
512 RAM
Gentoo Linux
Do anyone can help me about this ? or any resolution for a sudden performance degrade ( because the application i need to develop is quite time-critical).
Thank.
Twinsen
Two points: * need more information about the circumstances. * could it be that autovaccum hits you? Andreas -- Ursprüngl. Mitteil. -- Betreff: [PERFORM] PostgreSQL 8.0 occasionally slow down Von: "Ho Fat Tsang" <namiwf@gmail.com> Datum: 28.06.2007 06:56 Hi, I am new for postgresql server. And now i work on a projects which requires postgreSQL 8.0 and Java. I don't know why the server occasionally slow down a bit for every 3 minutes. I have changed the log configuration so that it logs all statement transaction > 1000 ms and the result shown below : ============================================================================ <elf2 2007-06-28 14:30:25 HKT 46835574.7a64> LOG: duration: 1494.109 ms statement: commit;begin; <elf2 2007-06-28 14:33:34 HKT 468354a8.7415> LOG: duration: 1048.429 ms statement: commit;begin; <elf2 2007-06-28 14:33:35 HKT 468354a9.7418> LOG: duration: 1580.120 ms statement: commit;begin; <elf2 2007-06-28 14:33:37 HKT 468354a9.7418> LOG: duration: 1453.620 ms statement: commit;begin; <elf2 2007-06-28 14:36:51 HKT 468354a9.7419> LOG: duration: 1430.019 ms statement: commit;begin; <elf2 2007-06-28 14:36:53 HKT 468354a9.7418> LOG: duration: 1243.886 ms statement: commit;begin; <elf2 2007-06-28 14:36:54 HKT 468354a9.7419> LOG: duration: 1491.821 ms statement: commit;begin; <elf2 2007-06-28 14:36:54 HKT 468354a9.7418> LOG: duration: 1266.516 ms statement: commit;begin; ... ... <elf2 2007-06-28 14:40:54 HKT 468354a9.741b> LOG: duration: 1776.466 ms statement: commit;begin; <elf2 2007-06-28 14:40:54 HKT 468357ec.d5a> LOG: duration: 1500.132 ms statement: commit;begin; ... ... <elf2 2007-06-28 14:44:07 HKT 46835477.73b7> LOG: duration: 1011.216 ms statement: commit;begin; <elf2 2007-06-28 14:44:12 HKT 46835477.73b7> LOG: duration: 1009.187 ms statement: commit;begin; <elf2 2007-06-28 14:44:13 HKT 468352f9.7194> LOG: duration: 1086.769 ms statement: commit;begin; <elf2 2007-06-28 14:44:14 HKT 46835477.73b7> LOG: duration: 1481.627 ms statement: commit;begin; ... ... <elf2 2007-06-28 14:47:44 HKT 468354a9.7419> LOG: duration: 10513.208 ms statement: commit;begin; <elf2 2007-06-28 14:48:22 HKT 468354a9.7419> LOG: duration: 38126.708 ms statement: commit;begin; ============================================================================ For each 3 ~ 4 minutes , there are many transactions which requires (>1 seconds) for execution. It is strange for me seems the tables size is quite small (~ 10K < 20K row). I can said the rate of incoming transactions is quite steady through our the testing. So i am quite confusing why the performance degrades for every 3 ~ 4 minutes. I am wondering if there is any default scheduled task in the postgreSQL 8.0 The configurations which i have amended in postgresql.conf. max_fsm_pages = 100000 vacuum_cost_delay = 10 The machine using : 512 RAM Gentoo Linux Do anyone can help me about this ? or any resolution for a sudden performance degrade ( because the application i need to develop is quite time-critical). Thank. Twinsen
Ho Fat Tsang wrote: > > I am new for postgresql server. And now i work on a projects which > requires postgreSQL 8.0 and Java. I don't know why the server occasionally > slow down a bit for every 3 minutes. > Do anyone can help me about this ? or any resolution for a sudden > performance degrade ( because the application i need to develop is quite > time-critical). It's probably checkpointing. PG will write updates to the transaction log (WAL) immediately and update the main data files later. Every so often it makes sure the data files are up-to-date and this is called checkpointing. You want checkpointing to happen more often, not less. That way the load will be less each time it happens. See the manual for details. -- Richard Huxton Archonet Ltd
Hi Richard,
I have tuned the checkpoint_timeout to 30 second which is ten times less than default and the issue is still reproduced. Do you have any recommended configuration for WAL ?
Thanks
Twinsen
I have tuned the checkpoint_timeout to 30 second which is ten times less than default and the issue is still reproduced. Do you have any recommended configuration for WAL ?
Thanks
Twinsen
2007/6/28, Richard Huxton <dev@archonet.com>:
Ho Fat Tsang wrote:
>
> I am new for postgresql server. And now i work on a projects which
> requires postgreSQL 8.0 and Java. I don't know why the server occasionally
> slow down a bit for every 3 minutes.
> Do anyone can help me about this ? or any resolution for a sudden
> performance degrade ( because the application i need to develop is quite
> time-critical).
It's probably checkpointing. PG will write updates to the transaction
log (WAL) immediately and update the main data files later. Every so
often it makes sure the data files are up-to-date and this is called
checkpointing.
You want checkpointing to happen more often, not less. That way the load
will be less each time it happens. See the manual for details.
--
Richard Huxton
Archonet Ltd
Ho Fat Tsang wrote: > Hi Richard, > > I have tuned the checkpoint_timeout to 30 second which is ten times less > than default and the issue is still reproduced. Do you have any recommended > configuration for WAL ? If you look at the output of "vmstat 10" and "iostat -m 10" (I'm assuming you're on Linux) does it show your I/O peaking every three minutes? I might have been wrong about the cause. -- Richard Huxton Archonet Ltd
Hi Richard,
Thank for your prompt reply. I have used the command "vmstat 10" to investigate the I/O issue and listed below :
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 26848 8376 2208 595796 0 0 16 16 14 13 5 2 91 2
1 0 26848 8024 2128 596324 0 0 1595 620 2006 3489 45 7 39 9
2 0 26848 8432 2024 595988 0 0 1399 163 1953 3830 38 8 47 7
2 0 26936 8488 2008 596092 0 0 1696 636 1973 7423 52 8 31 9
1 0 26936 8476 2008 596148 0 0 1237 660 1618 1863 34 6 50 11 <-- The starting time when the pgsql log transaction due to long execution duration.
0 0 26936 8024 1980 596756 0 0 1983 228 1985 2241 52 8 31 10
0 2 26936 8312 2040 595904 0 0 405 16674 1449 1675 17 6 1 76 <-- The intermediate time reaching I/O peak.
0 0 26936 8544 2088 594964 0 0 1191 8295 680 1038 30 4 13 53
2 0 26936 8368 2124 595032 0 0 517 935 866 985 14 3 79 4
0 0 26936 8368 2064 595228 0 0 1706 190 1979 2356 45 7 38 9
0 0 26936 8196 2132 595452 0 0 1713 642 1913 2238 44 8 37 11
1 1 26936 8164 2168 595512 0 0 1652 666 2011 2542 45 7 38 10
0 1 26936 8840 2160 594592 0 0 1524 228 1846 2116 42 8 43 7
0 0 26936 7384 2200 596304 0 0 1584 604 1972 2137 41 7 40 11
As you said, it seems for each 3~4 minutes, there is a I/O peak. But what is the problem indicating by it ?
Thanks for help.
Twinsen
Thank for your prompt reply. I have used the command "vmstat 10" to investigate the I/O issue and listed below :
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 26848 8376 2208 595796 0 0 16 16 14 13 5 2 91 2
1 0 26848 8024 2128 596324 0 0 1595 620 2006 3489 45 7 39 9
2 0 26848 8432 2024 595988 0 0 1399 163 1953 3830 38 8 47 7
2 0 26936 8488 2008 596092 0 0 1696 636 1973 7423 52 8 31 9
1 0 26936 8476 2008 596148 0 0 1237 660 1618 1863 34 6 50 11 <-- The starting time when the pgsql log transaction due to long execution duration.
0 0 26936 8024 1980 596756 0 0 1983 228 1985 2241 52 8 31 10
0 2 26936 8312 2040 595904 0 0 405 16674 1449 1675 17 6 1 76 <-- The intermediate time reaching I/O peak.
0 0 26936 8544 2088 594964 0 0 1191 8295 680 1038 30 4 13 53
2 0 26936 8368 2124 595032 0 0 517 935 866 985 14 3 79 4
0 0 26936 8368 2064 595228 0 0 1706 190 1979 2356 45 7 38 9
0 0 26936 8196 2132 595452 0 0 1713 642 1913 2238 44 8 37 11
1 1 26936 8164 2168 595512 0 0 1652 666 2011 2542 45 7 38 10
0 1 26936 8840 2160 594592 0 0 1524 228 1846 2116 42 8 43 7
0 0 26936 7384 2200 596304 0 0 1584 604 1972 2137 41 7 40 11
As you said, it seems for each 3~4 minutes, there is a I/O peak. But what is the problem indicating by it ?
Thanks for help.
Twinsen
2007/6/28, Richard Huxton < dev@archonet.com>:
Ho Fat Tsang wrote:
> Hi Richard,
>
> I have tuned the checkpoint_timeout to 30 second which is ten times less
> than default and the issue is still reproduced. Do you have any recommended
> configuration for WAL ?
If you look at the output of "vmstat 10" and "iostat -m 10" (I'm
assuming you're on Linux) does it show your I/O peaking every three
minutes? I might have been wrong about the cause.
--
Richard Huxton
Archonet Ltd
Ho Fat Tsang wrote: > Hi Richard, > > Thank for your prompt reply. I have used the command "vmstat 10" to > investigate the I/O issue and listed below : > > procs -----------memory---------- ---swap-- -----io---- --system-- > ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id > wa > 0 0 26848 8376 2208 595796 0 0 16 16 14 13 5 2 91 > 2 [etc] > 1 0 26936 8476 2008 596148 0 0 1237 660 1618 1863 34 6 50 > 11 <-- The starting time when the pgsql log transaction due to long > execution duration. > 0 0 26936 8024 1980 596756 0 0 1983 228 1985 2241 52 8 31 > 10 > 0 2 26936 8312 2040 595904 0 0 405 16674 1449 1675 17 6 1 > 76 <-- The intermediate time reaching I/O peak. [etc] > As you said, it seems for each 3~4 minutes, there is a I/O peak. But > what is > the problem indicating by it ? It's a burst of writing too (bo=blocks out for those who aren't familiar with vmstat). Well, there are four possibilities: 1. Something outside of PostgreSQL 2. An increase in update queries 3. Checkpoints 4. Vacuum If you keep an eye on "top" at the same time as vmstat, that should show whether it is another process. You would have mentioned if this co-incided with more queries, so we can probably rule that out. You've changed checkpointing timeouts and that's not affected this. We can see if it's autovacuum by disabling it in postgresql.conf and restarting PG. Try that and see if it alters things. It might be you need to vacuum more often (so you do less on each run) or it might be you need more/faster disks to keep up with your update activity. -- Richard Huxton Archonet Ltd
On Thu, 28 Jun 2007, Ho Fat Tsang wrote: > I have tuned the checkpoint_timeout to 30 second which is ten times less > than default and the issue is still reproduced. Doing a checkpoint every 30 seconds is crazy; no wonder your system is pausing so much. Put the timeout back to the default. What you should do here is edit your config file and set checkpoint_warning to its maximum of 3600. After that, take a look at the log files; you'll then get a warning message every time a checkpoint happens. If those line up with when you're getting the slowdowns, then at least you'll have narrowed the cause of your problem, and you can get some advice here on how to make the overhead of checkpoints less painful. The hint it will give is probably the first thing to try: increase checkpoint_segments from the default to something much larger (if it's at 3 now, try 10 instead to start), and see if the problem stops happening as frequently. Your problem looks exactly like a pause at every checkpoint, and I'm not sure what Richard was thinking when he suggested having them more often would improve things. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
"Greg Smith" <gsmith@gregsmith.com> writes: > On Thu, 28 Jun 2007, Ho Fat Tsang wrote: > >> I have tuned the checkpoint_timeout to 30 second which is ten times less than >> default and the issue is still reproduced. > > Your problem looks exactly like a pause at every checkpoint, and I'm not > sure what Richard was thinking when he suggested having them more often > would improve things. Having frequent checkpoints is bad for overall performance but should reduce the severity of the checkpoint impact. I interpreted his comment as saying he lowered it just as an experiment to test if it was checkpoint causing the problems not as a permanent measure. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
>>> On Thu, Jun 28, 2007 at 1:54 AM, in message <97e6e20f0706272354m5e42c91em177c8b80abb2ea52@mail.gmail.com>, "Ho Fat Tsang" <namiwf@gmail.com> wrote: > > I don't know why the server occasionally > slow down a bit for every 3 minutes. If the problem is checkpoints, try making your background writer more aggressive. This allows more of the pages to be writtento disk before the checkpoint starts. I'll show the settings which have eliminated similar problems for us, but yourbest settings will depend on hardware and are almost certainly going to be different. In particular, we have a batterybacked caching RAID controller, which seems to change the dynamics of these sorts of issues quite a bit. #bgwriter_delay = 200ms # 10-10000ms between rounds bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers scanned/round bgwriter_lru_maxpages = 200 # 0-1000 buffers max written/round bgwriter_all_percent = 10.0 # 0-100% of all buffers scanned/round bgwriter_all_maxpages = 600 # 0-1000 buffers max written/round We also adjust a couple other WAL-related settings: wal_buffers = 160kB # min 32kB # (change requires restart) checkpoint_segments = 10 # in logfile segments, min 1, 16MB each Since you're on 8.0 I think you'll need to specify wal-buffers as a number of 8KB pages. -Kevin
Hi Richard,
I've tested again according your suggestion. I noticed that for each time the pgsql slow down, there is a short period a process called "pdflush" eating up lot of I/O. I've goolgled and know it is a process for writing dirty pages back to the disk by the Linux kernel. I will have further investigation on this process with my limited knowledge on Linux kernel.
Correct me if i am wrong. It seems postgresql 8.0 does not bundle auto-vacuum by default. So all vacuum and analyse are done manually ? So what i have tested related to vaccuum is running auto-vacuum (a executeable located in /bin) parallel under normal production load but it seems won't help.
Thanks for help.
Twinsen
I've tested again according your suggestion. I noticed that for each time the pgsql slow down, there is a short period a process called "pdflush" eating up lot of I/O. I've goolgled and know it is a process for writing dirty pages back to the disk by the Linux kernel. I will have further investigation on this process with my limited knowledge on Linux kernel.
Correct me if i am wrong. It seems postgresql 8.0 does not bundle auto-vacuum by default. So all vacuum and analyse are done manually ? So what i have tested related to vaccuum is running auto-vacuum (a executeable located in /bin) parallel under normal production load but it seems won't help.
Thanks for help.
Twinsen
2007/6/28, Richard Huxton <dev@archonet.com>:
Ho Fat Tsang wrote:
> Hi Richard,
>
> Thank for your prompt reply. I have used the command "vmstat 10" to
> investigate the I/O issue and listed below :
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id
> wa
> 0 0 26848 8376 2208 595796 0 0 16 16 14 13 5 2 91
> 2
[etc]
> 1 0 26936 8476 2008 596148 0 0 1237 660 1618 1863 34 6 50
> 11 <-- The starting time when the pgsql log transaction due to long
> execution duration.
> 0 0 26936 8024 1980 596756 0 0 1983 228 1985 2241 52 8 31
> 10
> 0 2 26936 8312 2040 595904 0 0 405 16674 1449 1675 17 6 1
> 76 <-- The intermediate time reaching I/O peak.
[etc]
> As you said, it seems for each 3~4 minutes, there is a I/O peak. But
> what is
> the problem indicating by it ?
It's a burst of writing too (bo=blocks out for those who aren't familiar
with vmstat).
Well, there are four possibilities:
1. Something outside of PostgreSQL
2. An increase in update queries
3. Checkpoints
4. Vacuum
If you keep an eye on "top" at the same time as vmstat, that should show
whether it is another process.
You would have mentioned if this co-incided with more queries, so we can
probably rule that out.
You've changed checkpointing timeouts and that's not affected this.
We can see if it's autovacuum by disabling it in postgresql.conf and
restarting PG. Try that and see if it alters things.
It might be you need to vacuum more often (so you do less on each run)
or it might be you need more/faster disks to keep up with your update
activity.
--
Richard Huxton
Archonet Ltd
Ho Fat Tsang wrote: > Hi Richard, > > I've tested again according your suggestion. I noticed that for each > time the pgsql slow down, there is a short period a process called > "pdflush" > eating up lot of I/O. I've goolgled and know it is a process for writing > dirty pages back to the disk by the Linux kernel. I will have further > investigation on this process with my limited knowledge on Linux kernel. Well, pdflush is responsible for flushing dirty pages to disk on behalf of all processes. If it's doing it every 3 minutes while checkpoints are happening every 30 seconds then I don't see how it's PG that's responsible. There are three possibilities: 1. PG isn't actually checkpointing every 30 seconds. 2. There is a burst of query activity every 3 minutes that causes a lot of writing. 3. Some other process is responsible. > Correct me if i am wrong. It seems postgresql 8.0 does not bundle > auto-vacuum by default. So all vacuum and analyse are done manually ? So > what i have tested related to vaccuum is running auto-vacuum (a executeable > located in /bin) parallel under normal production load but it seems won't > help. Can't remember whether 8.0 had autovacuum bundled and turned off or not bundled at all. If it's not running it can't be causing this problem though. -- Richard Huxton Archonet Ltd
Richard Huxton escribió: > Ho Fat Tsang wrote: > > Correct me if i am wrong. It seems postgresql 8.0 does not bundle > >auto-vacuum by default. So all vacuum and analyse are done manually ? So > >what i have tested related to vaccuum is running auto-vacuum (a executeable > >located in /bin) parallel under normal production load but it seems won't > >help. > > Can't remember whether 8.0 had autovacuum bundled and turned off or not > bundled at all. If it's not running it can't be causing this problem though. The separate binary he found is the contrib pg_autovacuum. Integrated autovac got into 8.1. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Fri, 29 Jun 2007, Ho Fat Tsang wrote: > I noticed that for each time the pgsql slow down, there is a short > period a process called "pdflush" eating up lot of I/O. I've goolgled > and know it is a process for writing dirty pages back to the disk by the > Linux kernel. The pdflush documentation is really spread out, you may find my paper at http://www.westnet.com/~gsmith/content/linux-pdflush.htm a good place to start looking into that. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Hi Greg.
2007/6/28, Greg Smith <gsmith@gregsmith.com>:
Yes, Thank you for your suggestion. i have found that the slowdown time does not align to checkpoint after i turned on the warning. The issue is related what Richard has been mentioned - Something outsides PG doing many write operations to pages.
On Thu, 28 Jun 2007, Ho Fat Tsang wrote:
> I have tuned the checkpoint_timeout to 30 second which is ten times less
> than default and the issue is still reproduced.
Doing a checkpoint every 30 seconds is crazy; no wonder your system is
pausing so much. Put the timeout back to the default. What you should do
here is edit your config file and set checkpoint_warning to its maximum of
3600. After that, take a look at the log files; you'll then get a warning
message every time a checkpoint happens. If those line up with when
you're getting the slowdowns, then at least you'll have narrowed the cause
of your problem, and you can get some advice here on how to make the
overhead of checkpoints less painful.
The hint it will give is probably the first thing to try: increase
checkpoint_segments from the default to something much larger (if it's at
3 now, try 10 instead to start), and see if the problem stops happening as
frequently. Your problem looks exactly like a pause at every checkpoint,
and I'm not sure what Richard was thinking when he suggested having them
more often would improve things.
Yes, Thank you for your suggestion. i have found that the slowdown time does not align to checkpoint after i turned on the warning. The issue is related what Richard has been mentioned - Something outsides PG doing many write operations to pages.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Hi Kevin,
Thank for your configuration. I have tested with this configuration (amended a bit) and it helps a bit. But i have found the root cause is related to the application that using PG.
But yet i can learn much to tune the PG in my restricted environment !.
Regards,
Twinsen
Thank for your configuration. I have tested with this configuration (amended a bit) and it helps a bit. But i have found the root cause is related to the application that using PG.
But yet i can learn much to tune the PG in my restricted environment !.
Regards,
Twinsen
2007/6/28, Kevin Grittner <Kevin.Grittner@wicourts.gov>:
>>> On Thu, Jun 28, 2007 at 1:54 AM, in message
<97e6e20f0706272354m5e42c91em177c8b80abb2ea52@mail.gmail.com>, "Ho Fat Tsang"
<namiwf@gmail.com> wrote:
>
> I don't know why the server occasionally
> slow down a bit for every 3 minutes.
If the problem is checkpoints, try making your background writer more aggressive. This allows more of the pages to be written to disk before the checkpoint starts. I'll show the settings which have eliminated similar problems for us, but your best settings will depend on hardware and are almost certainly going to be different. In particular, we have a battery backed caching RAID controller, which seems to change the dynamics of these sorts of issues quite a bit.
#bgwriter_delay = 200ms # 10-10000ms between rounds
bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 200 # 0-1000 buffers max written/round
bgwriter_all_percent = 10.0 # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 600 # 0-1000 buffers max written/round
We also adjust a couple other WAL-related settings:
wal_buffers = 160kB # min 32kB
# (change requires restart)
checkpoint_segments = 10 # in logfile segments, min 1, 16MB each
Since you're on 8.0 I think you'll need to specify wal-buffers as a number of 8KB pages.
-Kevin
2007/6/29, Richard Huxton <dev@archonet.com>:
Exactly ! you are right, finally i have found that the root cause for this is the application that use PG. There is memory leak using MappedByteBuffer (writing in java), it leads high I/O loading and finally reaches the ratio that pdflush is being kicked start in the kernel.
Thank you for helping a lot in digging out this issue ! learn much for you guys !
Ho Fat Tsang wrote:
> Hi Richard,
>
> I've tested again according your suggestion. I noticed that for each
> time the pgsql slow down, there is a short period a process called
> "pdflush"
> eating up lot of I/O. I've goolgled and know it is a process for writing
> dirty pages back to the disk by the Linux kernel. I will have further
> investigation on this process with my limited knowledge on Linux kernel.
Well, pdflush is responsible for flushing dirty pages to disk on behalf
of all processes.
If it's doing it every 3 minutes while checkpoints are happening every
30 seconds then I don't see how it's PG that's responsible.
There are three possibilities:
1. PG isn't actually checkpointing every 30 seconds.
2. There is a burst of query activity every 3 minutes that causes a lot
of writing.
3. Some other process is responsible.
Exactly ! you are right, finally i have found that the root cause for this is the application that use PG. There is memory leak using MappedByteBuffer (writing in java), it leads high I/O loading and finally reaches the ratio that pdflush is being kicked start in the kernel.
Thank you for helping a lot in digging out this issue ! learn much for you guys !
> Correct me if i am wrong. It seems postgresql 8.0 does not bundle
> auto-vacuum by default. So all vacuum and analyse are done manually ? So
> what i have tested related to vaccuum is running auto-vacuum (a executeable
> located in /bin) parallel under normal production load but it seems won't
> help.
Can't remember whether 8.0 had autovacuum bundled and turned off or not
bundled at all. If it's not running it can't be causing this problem though.
--
Richard Huxton
Archonet Ltd
2007/7/3, Greg Smith <gsmith@gregsmith.com>:
When i found the "pdflush" process is the major clue of PG slow down, i googled and found this article !
it is a really good one for tuning pdflush ! Thank a lot !
On Fri, 29 Jun 2007, Ho Fat Tsang wrote:
> I noticed that for each time the pgsql slow down, there is a short
> period a process called "pdflush" eating up lot of I/O. I've goolgled
> and know it is a process for writing dirty pages back to the disk by the
> Linux kernel.
The pdflush documentation is really spread out, you may find my paper at
http://www.westnet.com/~gsmith/content/linux-pdflush.htm a good place to
start looking into that.
When i found the "pdflush" process is the major clue of PG slow down, i googled and found this article !
it is a really good one for tuning pdflush ! Thank a lot !
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match