Re: plperlu stored procedure seems to freeze for a minute - Mailing list pgsql-general

From Adrian Klaver
Subject Re: plperlu stored procedure seems to freeze for a minute
Date
Msg-id 565DE4D9.20604@aklaver.com
Whole thread Raw
In response to Re: plperlu stored procedure seems to freeze for a minute  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses Re: plperlu stored procedure seems to freeze for a minute  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-general
On 12/01/2015 09:58 AM, Peter J. Holzer wrote:
> On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote:
>> On 12/01/2015 06:51 AM, Peter J. Holzer wrote:
>>> A rather weird observation from the log files of our server (9.5 beta1):
>>>
>>> 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING:  Use of uninitialized value $success
inconcatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. 
>>> 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT:  PL/Perl function
"mb_timeseriesdata_zmq"
>>> [lots of other stuff from different connections]
>>> 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR:  impossible result '' (payload=) at
/usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pmline 36. 
>>>                  WDS::Macrobond::Utils::decode_result("") called at line 30
>>>                  main::__ANON__("gen_wqehur") called at -e line 0
>>>                  eval {...} called at -e line 0
>>>
>>> Two messages from the same line of the same plperlu stored procedure, 68
>>> seconds apart. So what is this line 36?
>>>
>>>      confess "impossible result '$success' (payload=$payload)";
>>>
>>> What? The first message clearly comes from interpolating $success
>>> (which is undef at that point) into the argument. The second from
>>> confess itself. What could cause a plperlu procedure to freeze for 68
>>> seconds between the call to confess and its output?
>>>
>>> Is it possible that only the writing of the log entry is delayed?
>>>
>>> Another weird thing: $success is undef because a ØMQ rpc call[1] timed
>>
>> And the call is?
>
> The sequence is:
>
> my $req_sck = zmq_socket($context, ZMQ_REQ);
> zmq_connect($req_sck, $url);
> my $qry_msg = join(...);
> zmq_send($req_sck, $qry_msg);
> my $res_msg = zmq_msg_init();
> my $rv = zmq_msg_recv($res_msg, $req_sck);
> # check rv here.
> my $data = zmq_msg_data($res_msg); # $data is "" here
> my $result = WDS::Macrobond::Utils::decode_result($data); # the error messages are from this function
>
> (Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but
> I'm not using it).
>
> I omitted that because I don't think it's terribly relevant here.
> Details of the usage of ØMQ are better discussed on the ØMQ mailing
> list.
>
> But there is something else which may be relevant: ØMQ uses threads
> internally, and I don't actually know whether zmq_msg_recv returning

Except I see this here:

http://api.zeromq.org/4-0:zmq-socket

Thread safety

ØMQ sockets are not thread safe. Applications MUST NOT use a socket from
multiple threads except after migrating a socket from one thread to
another with a "full fence" memory barrier.

> means that the read(2) call (or whatever) on the socket terminates.
> It may actually continue in another thread. But I still don't see how
> that could block the main thread (or wake it up again in a place which
> has nothing to do with ØMQ (confess is a standard Perl function to print
> a stack trace and die)). Or - just thinking aloud here - I fear I'm
> abusing you guys as support teddy bears[1] - maybe it's the other way
> round: confess dies, so maybe it frees some lock during cleanup which
> allows the message which should have been sent by zmq_send to finally go
> out on the wire. But that still doesn't explain the 68 seconds spent in
> confess ...
>
> Postgres worker processes are single-threaded, are they? Is there
> something else which could interact badly with a moderately complex
> multithreaded I/O library used from a stored procedure?
>
> I suspect such an interaction because I cannot reproduce the problem
> outside of a stored procedure. A standalone Perl script doing the same
> requests doesn't get a timeout.
>
> I guess Alvaro is right: I should strace the postgres worker process
> while it executes the stored procedure. The problem of course is that
> it happens often enough be annoying, but rarely enough that it's not
> easily reproducible.

 From here:

http://api.zeromq.org/4-0:zmq-connect

It seems something like(I am not a Perl programmer, so approach carefully):

my $rc = zmq_connect($req_sck, $url);

Then you will have an error code to examine.

Have you looked at the Notes at the bottom of this page:

http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html

>
>>> out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem
>>> to have a default timeout of 60 seconds, and I don't set one). But at
>>
>> Network timeout?
>
> That was my first guess, but I don't see where it would come from. Or
> why it only is there if I call the code from a stored procedure, not
> from a standalone script.
>
>>> 09:24:45 (i.e. the time of the error message) the answer for that RPC
>>> call arrived. So it kind of looks like confess waited for the message to
>>> arrive (which makes no sense at all) or maybe that confess waited for
>>> something which also blocked the sending of the request (because
>>> according to the server logs, the RPC request only arrived there at
>>> 09:24:45 and was answered within 1 second), but that doesn't make any
>>
>> So if the request timed out how did you get a reply, a second request?
>
> Nope. I don't really "get" the reply. I just see in the logs of the
> other server that it sent a reply at that time.
>
> The time line is like this
>
> time    postgres process        mb_dal process
> T       zmq_send()
>          zmq_msg_recv()
>
> T+60    zmq_msg_recv returns
>          an empty message;
>          decode() is called
>          which notices that the
>          message is empty and
>          calls confess();
>
> T+128   confess() prints a      receives the message sent at T;
>          stacktrace;             sends a reply;
>
>          hp
>
> [1] For those who don't know the story: Someone at a help desk noticed
> that people often started to explain their problem and somewhere in the
> middle said "never mind, I think I know what the problem is" and went
> away. So he put a big teddy bear on the desk and asked people to explain
> their problems to the bear. So a support teddy bear is someone who you
> talk to mostly to clear your own mind, not because you expect a
> solution.
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: plperlu stored procedure seems to freeze for a minute
Next
From: "Peter J. Holzer"
Date:
Subject: Re: plperlu stored procedure seems to freeze for a minute