Re: incorrect handling of the timeout in pg_receivexlog - Mailing list pgsql-hackers
From | Magnus Hagander |
---|---|
Subject | Re: incorrect handling of the timeout in pg_receivexlog |
Date | |
Msg-id | CABUevEyNa7jr6hNT6e5A5-=3Qkhztiqn+3h3se3PLRpL-6f6zQ@mail.gmail.com Whole thread Raw |
In response to | Re: incorrect handling of the timeout in pg_receivexlog (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses |
Re: incorrect handling of the timeout in pg_receivexlog
(Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
|
List | pgsql-hackers |
On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 07.02.2012 09:03, Fujii Masao wrote: >> >> On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >>> >>> When I compiled HEAD with --disable-integer-datetimes and tested >>> >>> pg_receivexlog, I encountered unexpected replication timeout. As >>> far as I read the pg_receivexlog code, the cause of this problem is >>> that pg_receivexlog handles the standby message timeout incorrectly >>> in --disable-integer-datetimes. The attached patch fixes this problem. >>> Comments? >> >> >> receivelog.c >> ------- >> timeout.tv_sec = last_status + standby_message_timeout - now - 1; >> if (timeout.tv_sec<= 0) >> ------- >> >> Umm.. the above code also handles the timestamp incorrectly. ISTM that the >> root cause of these problems is that receivelog.c uses TimestampTz. > > > Yep. While localGetCurrentTimestamp() returns a TimestampTz and handles > float timestamps correctly, the caller just assigns the result to a int64 > variable, assuming --enable-integer-datetimes. Ugh. Indeed. >> What about changing receivelog.c so that it uses time_t instead of >> TimestampTz? Which would make the code simpler, I think. > > > Hmm, that would reduce granularity to seconds. The --statusint option is > given in seconds, but it would be good to have more precision in the > calculations to avoid rounding errors. > > But actually, if the purpose of the --statusint option is to avoid > disconnection because of exceeding the server's replication_timeout, one > second granularity just isn't enough to be begin with. replication_timeout > is given in milliseconds, so if you set replication_timeout=900ms in the > server, there is no way to make pg_basebackup/pg_receivexlog to reply in > time. > > So, --statusint needs to be in milliseconds. And while we're at it, how > difficult would be to ask the server for the current value of > replication_timeout, and set --statusint automatically based on that? Or > perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that > depending on a server setting, you need to pass an option in the client or > it will just silently fail with no indication of what the problem is. We can't really ask for it easily, since we're on a replication connection. Unless we add that to the walsender grammar, but that would make it impossible to use the receivexlog stuff against a 9.1 server (which I think still works, though I haven't tested it in a while). Do we have a facility to make it a GUC_REPORT but only for walsender connections? It seems like a very unnecessary thing to have it sent out over every single connection, since it would only be useful in a very small subset of them. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
pgsql-hackers by date: