Re: [HACKERS] Full page writes improvement, code update - Mailing list pgsql-patches

From Zeugswetter Andreas ADI SD
Subject Re: [HACKERS] Full page writes improvement, code update
Date
Msg-id E1539E0ED7043848906A8FF995BDA57901E7BD64@m0143.s-mxs.net
In response to Re: [HACKERS] Full page writes improvement, code update  (Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp>)
Responses Re: [HACKERS] Full page writes improvement, code update  (Koichi Suzuki <suzuki.koichi@oss.ntt.co.jp>)
List pgsql-patches
> I don't fully understand what "transaction log" means. If it means
> "archived WAL", the current (8.2) code handles WAL as follows:

Probably we can define "transaction log" to be the part of WAL that is
not full pages.

> 1) If full_page_writes=off, then no full page writes will be
> written to WAL, except for those during online backup (between
> pg_start_backup and pg_stop_backup). The WAL size will be
> considerably smaller, but it cannot recover from a
> partial/inconsistent write to the database files. We have to go
> back to the online backup and apply all the archive log.
>
> 2) If full_page_writes=on, then full page writes will be
> written at the first update of a page after each checkpoint,
> plus the full page writes described in 1). Because we have no
> means (in 8.2) to optimize the WAL so far, what we can do is
> to copy the WAL or gzip it at archive time.
>
> If we'd like to keep a good chance of recovery after a crash,
> 8.2 provides only method 2), leaving the archive log
> considerably large. My proposal keeps the chance of
> crash recovery the same as with full_page_writes=on
> and reduces the size of the archived log as with
> full_page_writes=off.

Yup, this is a good summary.
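
To make the two cases concrete, here is a toy model of how the WAL
volume differs. It is only a sketch: the event format, the function
name wal_bytes and the 100-byte tuple record size are assumptions made
for illustration, not the real WAL record layout. Only the 8192-byte
page size matches the PostgreSQL default.

```python
# Toy model only: event kinds and record sizes are illustrative,
# not PostgreSQL's actual WAL format.
PAGE_SIZE = 8192   # default PostgreSQL block size
TUPLE_REC = 100    # assumed size of an ordinary tuple-level record


def wal_bytes(events, full_page_writes, in_backup=False):
    """Return the WAL volume produced by a list of events.

    events: ("checkpoint",) or ("update", page_no) tuples.
    A full page image is written on the first update of a page after a
    checkpoint when full_page_writes is on, or during an online backup.
    """
    touched = set()  # pages already modified since the last checkpoint
    total = 0
    for ev in events:
        if ev[0] == "checkpoint":
            touched.clear()
            continue
        page = ev[1]
        first_touch = page not in touched
        touched.add(page)
        if first_touch and (full_page_writes or in_backup):
            total += PAGE_SIZE   # full page image instead of a tuple record
        else:
            total += TUPLE_REC
    return total


events = [("checkpoint",), ("update", 1), ("update", 1), ("update", 2),
          ("checkpoint",), ("update", 1)]
print(wal_bytes(events, full_page_writes=True))
print(wal_bytes(events, full_page_writes=False))
```

With these assumed sizes the full page images dominate the total, which
is the size problem the proposal is trying to address.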

You say you need to remove the optimization that avoids logging a new
tuple when the full page image already exists.
I think the WAL must already contain the information about which tuple
inside the full page image is new (the one whose WAL entry we avoided).

How about this:
Leave the current WAL as it is and only add the "not removable" flag to
full pages.
pg_compresslog then replaces the full page image with a record for the
one tuple that changed.
I tend to think, though, that it is not worth the increased complexity
only to save bytes in the uncompressed WAL.
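
A rough sketch of that idea, in Python rather than C to keep it short.
Everything here is hypothetical: the Record class, its fields and the
compress function are made up for illustration and do not correspond to
the real WAL structures or to the pg_compresslog code in the patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    kind: str                          # "tuple" or "full_page" (assumed kinds)
    page: int
    payload: bytes
    not_removable: bool = False        # e.g. image written during online backup
    new_tuple: Optional[bytes] = None  # tuple whose own WAL entry was skipped


def compress(records: List[Record]) -> List[Record]:
    """Replace removable full page images with a record for the one new tuple."""
    out = []
    for r in records:
        if r.kind == "full_page" and not r.not_removable and r.new_tuple is not None:
            # Keep only the tuple-level change; drop the page image.
            out.append(Record("tuple", r.page, r.new_tuple))
        else:
            # Ordinary records and flagged full pages pass through unchanged.
            out.append(r)
    return out


wal = [
    Record("full_page", 1, bytes(8192), new_tuple=b"row-a"),
    Record("tuple", 1, b"row-b"),
    Record("full_page", 2, bytes(8192), not_removable=True, new_tuple=b"row-c"),
]
print([(r.kind, len(r.payload)) for r in compress(wal)])
```

The point of the sketch is that the writer of the WAL does not change;
all the extra logic lives in the archive-time tool.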

Another point about pg_decompresslog:

Why do you need a pg_decompresslog? IMHO pg_compresslog should already
replace the full page with the dummy entry. Then pg_decompresslog could
be a simple gunzip (or whatever compression was used), with no logic of
its own.
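
What I have in mind is roughly the following sketch. It assumes the
dummy entry is a same-length run of zero bytes, so that the offsets of
all other records stay unchanged; that is my assumption, not something
the patch specifies. The function names and the (offset, length) span
list are also made up for illustration.

```python
import gzip

PAGE_SIZE = 8192  # default PostgreSQL block size


def compress_segment(segment: bytes, page_spans) -> bytes:
    """Blank out removable full page images, then gzip the segment.

    page_spans: (offset, length) pairs of the images to replace. How they
    are located in a real WAL segment is outside this sketch.
    """
    buf = bytearray(segment)
    for off, length in page_spans:
        buf[off:off + length] = bytes(length)  # same-length dummy of zeros
    return gzip.compress(bytes(buf))


def decompress_segment(blob: bytes) -> bytes:
    """The restore side needs no WAL knowledge at all: plain gunzip."""
    return gzip.decompress(blob)


segment = bytes(range(256)) * 64  # 16384 bytes of stand-in WAL data
blob = compress_segment(segment, [(1024, PAGE_SIZE)])
restored = decompress_segment(blob)
print(len(segment), len(blob), len(restored))
```

The restored segment has the same length as the original, and the zeroed
dummy costs almost nothing once gzipped.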

Andreas
