Re: Walreceiver fsyncs excessively - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Walreceiver fsyncs excessively
Date
Msg-id AANLkTi=Huh7Eg=+MadSuRPeatJ_KHEJ-_Sc_8_DxvOue@mail.gmail.com
Whole thread Raw
In response to Walreceiver fsyncs excessively  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Walreceiver fsyncs excessively  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Thu, Jan 13, 2011 at 9:01 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> While testing Fujii-san's patch to flush any already-written WAL on error in
> walreceiver, I added a debugging elog to XLogWalRcvFlush() to print out how
> far it has written and flushed.
>
> I saw an unexpected pattern while the standby catches up with the master:
>
> LOG:  streaming replication successfully connected to primary
> LOG:   flushing flush=0/0 write=0/1E020000
> LOG:   flushing flush=0/1E020000 write=0/1E040000
> LOG:   flushing flush=0/1E040000 write=0/1E060000
> LOG:   flushing flush=0/1E060000 write=0/1E080000
> LOG:   flushing flush=0/1E080000 write=0/1E0A0000
> LOG:   flushing flush=0/1E0A0000 write=0/1E0C0000
> LOG:   flushing flush=0/1E0C0000 write=0/1E0E0000
> LOG:   flushing flush=0/1E0E0000 write=0/1E100000
> LOG:   flushing flush=0/1E100000 write=0/1E120000
> LOG:   flushing flush=0/1E120000 write=0/1E140000
>
> The master sends the WAL at full-speed, but walreceiver always fsyncs it in
> 128 kB chunks. That's excessive, it should be able to read and write to disk
> the whole WAL segment before flushing.
>
> There's a little flaw in the walreceiver logic that tries to read all the
> available WAL before flushing and sleeping. The way libpqrcv_receive is
> written, when it's called with timeout==0 it will not call PQconsumeInput.
> So what happens is that when walreceiver main loop calls libpqrcv_receive()
> in a loop to fetch all WAL that's available without blocking, it actually
> only reads the WAL that's in the libpq receive buffer - it will not read the
> WAL that's in the TCP read buffer.
>
> Attached patch fixes libpqrcv_receive() so that it calls PQconsumeInput()
> before concluding that there's no data available. The excessive fsyncing can
> lead to very bad performance, so this needs to be appled to 9.0 too.

Seems good.

Can we remove the "justconnected" flag, thanks to the patch?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: pg_ctl failover Re: Latches, signals, and waiting
Next
From: Pavel Golub
Date:
Subject: Warning compiling pg_dump (MinGW, Windows XP)