Thread: vacuum slowed by syslogd
On many occasions, I've noticed that some PostgreSQL activity takes far longer than it previously did and that disablingsyslogd addresses the symptoms. Most recently, it took 20-60 seconds to VACUUM a small, heavily updated table, while it used to take a fraction of a second.I found syslog entries like these: 13:19:53 --Relation sometable-- 13:20:03 Removed 2 tuples in 1 pages. 13:20:23 ^ICPU 0.00s/0.00u sec elapsed 0.00 sec. 13:20:54 Pages 1: Changed 1, Empty 0; Tup 4: Vac 2, Keep 0, UnUsed 13. 13:20:54 ^ITotal CPU 0.00s/0.00u sec elapsed 60.12 sec. It took almost-exactly 60 seconds, but virtually no CPU time was used (and no disk IO). Many similar examples have "real"times that are near perfect multiples of 10 seconds (e.g. 50.09, 40.07). This is not every single VACUUM, but it isfrequent. The problem disappears when syslogd is stopped or when PostgreSQL disables syslog usage. This is very consistent and I canreproduce the problem in some installations by toggling these factors on and off. The other way to toggle the problem on/off is to disable syslogd's udp socket (though this is a feature we'd still like touse). This evidence normally indicates a name resolution issue, but I'm not sure how to test for that beyond using "hostname -v-i". This has happened at several installations. We use both 7.2.1 (Debian stable) and 7.3.2 (Debian testing). I would guess that this is not the result of a PostgreSQL deficiency, but it is most sharply affected by it. Perhaps someonecan offer insight? Thanks, Mike Adler
Michael Adler <adler@pobox.com> writes: > On many occasions, I've noticed that some PostgreSQL activity takes > far longer than it previously did and that disabling syslogd addresses > the symptoms. > ... > This evidence normally indicates a name resolution issue, but I'm not > sure how to test for that beyond using "hostname -v -i". I looked in Red Hat's bugzilla for similar issues, and found a mention that syslogd tries to reverse-lookup the address it gets each message from. A delay there could act as you describe. At least in recent RH releases, you can start syslogd with the "-x" switch to disable this lookup --- does that exist in Debian, and if so does it help? I would expect any comparable resolution delays on the client side to occur only once when a backend is started, not once per message, since we only openlog() once per backend. So a delay on the syslogd side seems a likely theory. Something else you could do to investigate is to test whether syslogd can log messages from other clients during these 10-second delays. If it's hung trying to resolve a name, presumably messages from other clients would get delayed too. regards, tom lane
On Tue, Jan 13, 2004 at 11:36:19PM -0500, Tom Lane wrote: > Michael Adler <adler@pobox.com> writes: > > On many occasions, I've noticed that some PostgreSQL activity takes > > far longer than it previously did and that disabling syslogd addresses > > the symptoms. > > ... > > This evidence normally indicates a name resolution issue, but I'm not > > sure how to test for that beyond using "hostname -v -i". > > I looked in Red Hat's bugzilla for similar issues, and found a mention > that syslogd tries to reverse-lookup the address it gets each message > from. A delay there could act as you describe. At least in recent RH > releases, you can start syslogd with the "-x" switch to disable this > lookup --- does that exist in Debian, and if so does it help? There is no such option within Debian or the syslogd from freshmeat. Sounds nice, though. I was able to track down a name resolution issue. The "nameserver" in /etc/resolv.conf pointed to a nonexistent host. Thisadded ten second delays to telnetd logins and netstat -a. This seemed to be the heart of the problem, but I'm gettingsome mixed info from the "field" and I'm still following up. Thanks, Mike
Michael Adler wrote: > On many occasions, I've noticed that some PostgreSQL activity takes far longer than it previously did and that disablingsyslogd addresses the symptoms. > > Most recently, it took 20-60 seconds to VACUUM a small, heavily updated table, while it used to take a fraction of a second.I found syslog entries like these: > > 13:19:53 --Relation sometable-- > 13:20:03 Removed 2 tuples in 1 pages. > 13:20:23 ^ICPU 0.00s/0.00u sec elapsed 0.00 sec. > 13:20:54 Pages 1: Changed 1, Empty 0; Tup 4: Vac 2, Keep 0, UnUsed 13. > 13:20:54 ^ITotal CPU 0.00s/0.00u sec elapsed 60.12 sec. > > It took almost-exactly 60 seconds, but virtually no CPU time was used (and no disk IO). Many similar examples have "real"times that are near perfect multiples of 10 seconds (e.g. 50.09, 40.07). This is not every single VACUUM, but it isfrequent. > > The problem disappears when syslogd is stopped or when PostgreSQL disables syslog usage. This is very consistent and Ican reproduce the problem in some installations by toggling these factors on and off. > Did you try to put a "-" before the file log name ? Example: LOCAL0.* -/var/log/postgresql.log Regards Gaetano Mendola