Thread: BUG #4801: Performance failure: 600 MB written to each WAL log

BUG #4801: Performance failure: 600 MB written to each WAL log

From
"Peter Much"
Date:
The following bug has been logged online:

Bug reference:      4801
Logged by:          Peter Much
Email address:      pmc@citylink.dinoex.sub.org
PostgreSQL version: 8.2.13
Operating system:   FreeBSD 7.2
Description:        Performance failure: 600 MB written to each WAL log
Details:

Server becomes very slow while tremendous amounts of data are written
to WAL logs - disk activity log shows 600 MB of effective write requests
before each 16 MB WAL log is archived and recycled.

The problem appears occasionally, all of a sudden, for no apparent reason.
I perceived it with 8.2.5(?), 8.2.11 and now 8.2.13, on FreeBSD Rel. 6.3
and 7.2.

So far I have only seen it during "vacuum full", but then a long-running
"vacuum full" on a large table is the only significant activity on the
server. (I know that "vacuum full" may not really be necessary, but
I think it should still work properly.)

When the "vacuum full" starts, the activity looks normal to me:
there is heavy activity on the database table files, accompanied by
a certain amount of access to the WAL logs. The amount of disk writes
to the WAL then corresponds (more or less) to the WAL switching frequency.
It may stay that way, and the vacuum may complete successfully.

But sometimes (same "vacuum full", same table) the server suddenly
changes into a very slow state: there is only little activity on the
database table disk, while the WAL disk is busy writing at 100%. Yet
the WAL logs fill up very slowly, too.
I can watch the progress of the vacuum by listing the open file
handle offsets on the database table files (with lsof), and the ratio
between the advancing offsets and the number of archived WAL logs
is the same in both states.
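
For illustration, roughly like this (the backend PID and the path pattern
below are only placeholders, not my real values):

    # Rough sketch: print the vacuum backend's open file offsets once a minute.
    # PID is a placeholder for the backend's process id; the table files live
    # under $PGDATA/base/<database-oid>/.
    while true; do
        lsof -o -p "$PID" | grep 'base/'   # -o makes lsof display the file offset
        sleep 60
    done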

Also, the server functions correctly in every respect when in this slow
state, and all requests are processed. There are no error messages.
The only difference is that in the normal state of operation the amount of
data written to one WAL log is just a bit more than the 16 MB it contains,
while in the slow state it is about 600 MB - and this slows down
operation. (The vacuum job must then be killed, because it would take
days instead of hours to complete.)

I have not really figured out the conditions that bring the server
into this slow mode of operation. It seems to happen when a certain
amount of other activity (on other databases within the same server)
occurs. That is, as long as there is not much other activity on the
system, the vacuum will usually complete with normal speed.

But after the server has fallen into the slow state, it will not get
out of it again. It has to be stopped and restarted to get back to
normal speed; killing and restarting the vacuum process will not
help.

The WAL handling configuration in the config file is at the defaults,
except that I do WAL archiving. I have tried to increase wal_buffers;
this had no effect.
I have moved the whole installation from FreeBSD UFS disk space to ZFS
and have switched off full_page_writes. This significantly reduces
the absolute amount of WAL produced, and therefore the slow state of
operation becomes faster than before, but apart from that the behaviour
did not change.

The command "show wal_sync_method" yields "fsync".

Overall, I do not think that the problem relates to WAL storage. It
seems to be mainly the question: what is this tremendous amount of
- obviously redundant - data that is written to the WAL, and how do I get
the server out of this mode again without disrupting operation?

One other thing should be remarked here: the server is a very small
system. It does not have much memory and does a lot of paging; the
disks are not fast either.
I occasionally find that under such conditions software shows
bugs that would never become visible on the usual oversized systems.

Configuration of the server is as follows (as changed from defaults):
> max_connections = 40
> shared_buffers = 28MB
> max_fsm_pages = 179200
> effective_cache_size = 12MB
> autovacuum = off
(autovacuum is off to reduce noise - the disks spin down when the
databases are not in use.)

Re: BUG #4801: Performance failure: 600 MB written to each WAL log

From
Rafael Martinez
Date:
Peter Much wrote:

>
> So far I have only seen it during "vacuum full", but then a long-running
> "vacuum full" on a large table is the only significant activity on the
> server. (I know that "vacuum full" may not really be necessary, but
> I think it should still work properly.)
>
[..........]

Hello

How large is that 'large table' you are talking about?

Our experience is that you don't need vacuum full at all when
administering a postgres database, especially if the database is large
and in production.

vacuum full on a large database takes so long and uses so many
resources (I/O, RAM, time) that it is almost useless. If your database
has such a problem that you need the functionality delivered by
vacuum full, it is better to export and import your database.
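
For example, roughly along these lines ("mydb" is only a placeholder for
the database name):

    # Rough sketch only -- "mydb" is a placeholder database name.
    pg_dump -Fc mydb > mydb.dump     # export in the custom archive format
    dropdb mydb
    createdb mydb
    pg_restore -d mydb mydb.dump     # import; tables and indexes come back compact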

This does not explain your problem, but maybe if you can avoid running
vacuum full unnecessarily, you will also avoid this problem you are
experiencing.

regards
--
 Rafael Martinez, <r.m.guerrero@usit.uio.no>
 Center for Information Technology Services
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/

Re: BUG #4801: Performance failure: 600 MB written to each WAL log

From
Peter Much
Date:
Dear Rafael,

I agree with you, and that is actually what I have done up to now,
although for a different reason: the amount of WAL produced by
VACUUM FULL is practically unpredictable and can be much more than
the database size, and since that database belongs to the backup system,
the backup system cannot archive the WAL during the VACUUM FULL.
I have solved this now with ZFS, which removes the need for
full_page_writes.

On Fri, May 08, 2009 at 02:59:04PM +0200, Rafael Martinez wrote:
! How large is that 'large table' you are talking about?

20 million tuples, which is quite a lot for a Pentium II mainly
used as a router with some attached tape drives.
OTOH, the machine does not have much else to do, so it can well run
VACUUM FULL once in a while.

! vacuum full on a large database takes so long and uses so many
! resources (I/O, RAM, time) that it is almost useless. If your database
! has such a problem that you need the functionality delivered by
! vacuum full, it is better to export and import your database.

Yes. That respective postgres installation used to contain (and
still does) a couple of databases for my needs, some of them
experimental, none of them big. I considered it very practical
to run VACUUM FULL (and Index rebuild) thru all instances from
cron every week or month, so the databases would reshape by
themselves if some mess was left behind.

Then I acquired the backup software; it needed a database, so I
placed it into the existing postgres installation - and after
noticing that it tends to get a few hundred times bigger than all
the others together, I gave it some extra disk space. And I
thought: PostgreSQL is extremely scalable, it can handle it.

So I should exclude that backup-system DB from the VACUUM FULL
cron job. Okay, well, if it works properly it runs for some hours and
does not hurt anybody - and, as it is done regularly at a fixed
time, I can look at my recorded "df" outputs later, and they
will show me exactly how much the DB grows over time.
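
Something like this, for illustration only (database names and times are
just placeholders):

    # Weekly VACUUM FULL for the small databases, the big backup DB excluded:
    0 3 * * 0   vacuumdb --full --analyze -d smalldb1
    0 4 * * 0   vacuumdb --full --analyze -d smalldb2
    # The backup database only gets a plain (non-FULL) daily vacuum:
    0 5 * * *   vacuumdb --analyze -d backupdb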

! This does not explain your problem, but maybe if you can avoid running
! vacuum full unnecessarily, you will also avoid this problem you are
! experiencing.

Yes. But this is really strange behaviour, and it does not at all
match the usual PostgreSQL style, which is very precise.
Or, in other words: it creeps like a bug, it smells like a bug,
it looks like a bug - maybe it is a bug. And I am not fond of letting
identified bugs creep under the carpet - they tend to come out at
unpleasant moments.
Or, put differently again: that server does something which seems
to make no sense at all. Let's hunt it down, or at least figure out
why it does this.

It might be possible to approach this by logical analysis, without
debugging gear, by asking: under which circumstances is WAL
written redundantly?
Or, if there is someone who would like to analyze the matter, I could
try to get FreeBSD GEOM or something similar to produce a trace log
of what is actually written to the WAL.
Maybe we would then find something that is not limited to VACUUM FULL,
or maybe something that avoids doing greater havoc only by good
luck. Or maybe it is just the way it should work...
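
As a first rough step (just I/O statistics, not a real trace log yet),
something like gstat could at least show the write pattern on the WAL
disk; the device name below is only a placeholder:

    # Illustrative only: live GEOM I/O statistics for the disk holding pg_xlog.
    # "da0" stands for whatever device the WAL actually lives on.
    gstat -f '^da0$'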

best regards,
Peter

Re: BUG #4801: Performance failure: 600 MB written to each WAL log

From
Simon Riggs
Date:
On Fri, 2009-05-08 at 12:26 +0000, Peter Much wrote:

> Server becomes very slow while tremendous amounts of data are written
> to WAL logs - disk activity log shows 600 MB of effective write
> requests before each 16 MB WAL log is archived and recycled.

VACUUM FULL does many things, most of them slowly. It rewrites normal
database blocks to set hint bits, which are not WAL-logged. This may
explain the figures you are seeing.

8.2 is also a fairly poor performer with VACUUM and will cause many
unnecessary WAL flushes per WAL file. Not sure why you are using an old
release of PostgreSQL on a new BSD version, but if you upgrade
PostgreSQL and use plain VACUUM instead you will see an improvement.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: BUG #4801: Performance failure: 600 MB written to each WAL log

From
Peter Much
Date:
Dear Simon,

On Fri, May 08, 2009 at 07:15:19PM +0100, Simon Riggs wrote:
! > Server becomes very slow while tremendous amounts of data are written
! > to WAL logs - disk activity log shows 600 MB of effective write
! > requests before each 16 MB WAL log is archived and recycled.
!
! VACUUM FULL does many things, most of them slowly. It rewrites normal
! database blocks to set hint bits, which are not WAL-logged. This may
! explain the figures you are seeing.

You did not quite get me right: the 600 MB are written ONTO EACH
16 MB WAL LOGFILE before it gets switched, and that is a lot more
than superfluous flushes should account for.

And since this does not always happen - only once the server falls
into this strange mode of operation does it stay there, and after a
restart the vacuum process will do practically the same things
again and may then work correctly - I consider this a remarkable
effect, because such things usually tend to point to some race
condition somewhere.

! 8.2 is also a fairly poor performer with VACUUM and will cause many
! unnecessary WAL flushes per WAL file.

Yes, I see that too.

! Not sure why you are using an old
! release of PostgreSQL on a new BSD version,

Hm. Because it works, and it is stable and reliable? I have been using
this database system more or less since it was called Ingres, and I think
it has already improved a lot. ;)

The only reason for upgrading the BSD was to get ZFS and get rid of
full_page_writes. Up to now I was practically drowning in WAL logs;
now they have been reduced by a factor of 8, and I am very happy with that.
(And for a backup system one MUST archive the WAL logs, too. It doesn't
make much sense to do offline backups and then not have current offline
backups of the very database that is needed to find something on those
backups...)
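
Just as an illustration, the archiving can be a small script that
archive_command calls with %p and %f (the script name and target
directory below are placeholders):

    #!/bin/sh
    # Illustrative WAL archive script; postgresql.conf would point at it with
    #   archive_command = '/usr/local/bin/archive_wal.sh "%p" "%f"'
    # $1 is the path of the finished WAL segment, $2 its file name.
    test ! -f "/backup/wal_archive/$2" && cp "$1" "/backup/wal_archive/$2"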

I am not very fond of major upgrades. Due to the heap of
various functionality that I have accumulated over time, each
major upgrade brings some problems, incompatibilities and new
features, and the first thing I notice is always that something
that used to work no longer does, or works differently. Then I have to
dig into the respective construct and figure out how it needs to be
adjusted. For instance, the interface between PostgreSQL and Ruby
on Rails Rel. 1.2 is still written for postgres-7 - it works with 8.2, but
there is an issue with escapes in strings ("standard_conforming_strings");
so I will probably have to go to RoR Rel. 2.x - which means working through
my RoR apps and fixing up things that are deprecated. Etc. etc., you get
the idea.

So, usually I try to upgrade at least two major levels at once.

! but if you upgrade
! PostgreSQL and use plain VACUUM instead you will see an improvement.

I do a daily VACUUM anyway. But out of good habit I also do a monthly
VACUUM FULL. I like self-cleaning systems (you should see the mess in
my rooms ;))


So, well, the bottom line I read from your message is: you have
redesigned the VACUUM process in 8.3, and therefore nobody is really
eager to dig into the old code and search for strange problems
there. That's an argument I can understand.

best regards,
Peter