On 2015-12-01 20:55:02 +0100, Peter J. Holzer wrote:
> On 2015-12-01 18:58:31 +0100, Peter J. Holzer wrote:
> > I suspect such an interaction because I cannot reproduce the problem
> > outside of a stored procedure. A standalone Perl script doing the same
> > requests doesn't get a timeout.
[...]
> The strace doesn't show a reason for the SIGALRM, though. No alarm(2) or
> setitimer(2) system call (I connected strace to a running postgres
> process just after I got the prompt from "psql" and before I typed
> "select * from mb_search('export');" (I used a different (but very
> similar) stored procedure for those tests because it is much easier to
> find a search which is slow enough to trigger a timeout at least
> sometimes than a data request (which normally finishes in
> milliseconds)).
>
> So I guess my next task will be to find out where that SIGALRM comes
> from and/or whether I can just restart the zmq_msg_recv if it happens.
Ok, I think I know where that SIGALRM comes from: It's the
AuthenticationTimeout. What I'm seeing in strace (if I attach it early
enough) is that during authentication the postgres worker process calls
setitimer with a 60 second timeout twice. This matches the comment in
backend/postmaster/postmaster.c:
* Note: AuthenticationTimeout is applied here while waiting for the
* startup packet, and then again in InitPostgres for the duration of any
* authentication operations. So a hostile client could tie up the
* process for nearly twice AuthenticationTimeout before we kick him off.
As explained in backend/utils/misc/timeout.c, the timers are never
cancelled: If a timeout is cancelled, postgres just sees that it has
nothing to do and resumes whatever it is doing.
This is also what I'm seeing: 60 seconds after start, the process
receives a SIGALRM.
If the process is idle or in a "normal" SQL statement at the time, thats
not a problem. But if it is in one of my stored procedures which is
currently calling a ØMQ function which is waiting for some I/O
(zmq_msg_recv(), most likely), that gets interrupted and returns an
error which my code doesn't know how to handle (yet). So the error gets
back to the user.
A strange interaction between postgres and ØMQ indeed. But now that I
know what's causing it I can handle that. Thanks for your patience.
hp
--
_ | Peter J. Holzer | I want to forget all about both belts and
|_|_) | | suspenders; instead, I want to buy pants
| | | hjp@hjp.at | that actually fit.
__/ | http://www.hjp.at/ | -- http://noncombatant.org/