Hi,
On 05/04/2011 03:46 AM, Tian Luo wrote:
> Whether I turn "full_page_writes" on or off, I always
> observe 8192-byte writes of log data for simple write operations
> (write/update).
How did you measure that? A single transaction doing a single write, I
guess. Have you tried running multiple transactions with one simple
write each and checking how much WAL is generated per transaction?
As I understand it, dirty blocks are written to disk as soon as
feasible. After all, that helps crash recovery. With a basically idle
system, "as soon as feasible" might be pretty soon. However, put your
(disk sub-) system under load and "as soon as feasible" might take a while.
> But according to the document, when this is off, it could speed up
> operations but may cause problems during recovery. So, I guess this is
> because it writes less when the option is turned off. However, this
> contradicts my observations ....
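I think you didn't trigger the savings. full_page_writes only matters
for the first write to a block after a checkpoint: with the option on,
that first change puts the whole page into the WAL, while later changes
to the same block log only the change itself. Did you monitor the
checkpoint times of Postgres in your tests?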
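To make the "first write after a checkpoint" point concrete, here is a
toy model in C. It is only a sketch of the rule as I described it, not
PostgreSQL's actual logic: ModelPage, model_checkpoint and
model_log_update are made-up names, BLCKSZ is assumed to be the default
8192, and the model ignores record headers and any compression of the
page image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192 /* assumed default data page size */

/* Hypothetical per-page state: has this page been WAL-logged since
 * the last checkpoint? */
typedef struct
{
    bool logged_since_checkpoint;
} ModelPage;

/* A checkpoint makes every page "untouched" again. */
void model_checkpoint(ModelPage *pages, size_t npages)
{
    assert(pages != NULL);
    for (size_t i = 0; i < npages; i++)
        pages[i].logged_since_checkpoint = false;
}

/* Return the number of WAL bytes this toy model attributes to one
 * update of delta_bytes on the given page. */
size_t model_log_update(ModelPage *page, size_t delta_bytes,
                        bool full_page_writes)
{
    assert(page != NULL);
    size_t logged = delta_bytes;

    /* Only the first change after a checkpoint carries a page image. */
    if (full_page_writes && !page->logged_since_checkpoint)
        logged += BLCKSZ;

    page->logged_since_checkpoint = true;
    return logged;
}
```

In this model, a test that updates each block only once per checkpoint
cycle pays for the page image every time with the option on, while a
test that keeps hitting the same block pays for it only once.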
> If I am not missing anything, I find that the writes of log data go
> through function "XLogWrite" in source file
> "backend/access/transam/xlog.c".
>
> In this file, log data are written with the following code:
>
> from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
> nbytes = npages * (Size) XLOG_BLCKSZ;
> if (write(openLogFile, from, nbytes) != nbytes)
> {
> ...
> }
>
> So, "nbytes" should always be multiples of XLOG_BLCKSZ, which in the
> default case, is 8192.
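That observation seems correct. It also means that the size of a single
write() call cannot show the effect of full_page_writes: a WAL page
that is only partially filled is still written out as a whole
XLOG_BLCKSZ block.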
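The arithmetic behind that can be sketched as follows. This is a
simplification, assuming the default XLOG_BLCKSZ of 8192 and a record
that starts at the beginning of a WAL page; wal_pages_touched and
wal_write_size are hypothetical helpers, not functions from xlog.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192 /* assumed default WAL block size */

/* Number of whole WAL pages needed to hold record_bytes bytes,
 * assuming the data starts at a page boundary. */
size_t wal_pages_touched(size_t record_bytes)
{
    assert(record_bytes <= SIZE_MAX - XLOG_BLCKSZ); /* avoid overflow */
    return (record_bytes + XLOG_BLCKSZ - 1) / XLOG_BLCKSZ;
}

/* Size of the resulting write(), mirroring
 * nbytes = npages * (Size) XLOG_BLCKSZ from the quoted snippet. */
size_t wal_write_size(size_t record_bytes)
{
    size_t npages = wal_pages_touched(record_bytes);
    return npages * (size_t) XLOG_BLCKSZ;
}
```

Under these assumptions, a 100-byte record and an 8192-byte record both
produce one 8192-byte write, so the difference only shows up once you
add up the WAL volume over many transactions.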
Regards
Markus Wanner