Thread: statistics process shutting down
During load testing, I'm getting the following error:
FATAL: could not read from statistics collector pipe: No error
LOG: statistics collector process (PID 1108) was terminated by signal 1
After this, the statistics collector restarts. I have statement level and block level stats turned on. The server still hums along happily, but after the restart both pg_stat_database and pg_stat_activity lie about how many backends are connected.
Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > During load testing, I'm getting the following error: > FATAL: could not read from statistics collector pipe: No error > LOG: statistics collector process (PID 1108) was terminated by signal 1 Evidently coming from here: len = piperead(readPipe, ((char *) &msg) + nread, targetlen - nread); if (len < 0) { if (errno == EINTR) continue; ereport(ERROR, (errcode_for_socket_access(), errmsg("could not read from statistics collector pipe: %m"))); } So why is piperead() failing, and why doesn't it set errno to something useful? regards, tom lane
> Evidently coming from here:
>
>         len = piperead(readPipe, ((char *) &msg) + nread,
>                        targetlen - nread);
>         if (len < 0)
>         {
>             if (errno == EINTR)
>                 continue;
>             ereport(ERROR,
>                     (errcode_for_socket_access(),
>                      errmsg("could not read from statistics collector pipe: %m")));
>         }
>
> So why is piperead() failing, and why doesn't it set errno to something
> useful?
Well, the win32 piperead() is really just a call to recv() (vs. unix read()). Here is the win32 implementation:
    int
    piperead(int s, char *buf, int len)
    {
        int ret = recv(s, buf, len, 0);
        if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
            /* EOF on the pipe! (win32 socket based implementation) */
            ret = 0;
        return ret;
    }
I think the key to this puzzle is the return code from WSAGetLastError(). Also, the WSA call *might* be masking the value of errno. I did a quick search and came up with this:
http://archives.postgresql.org/pgsql-patches/2001-10/msg00160.php
I think maybe errno needs to get set to WSAGetLastError(). In pipe.c:
    if (ret < 0)
    {
        int wsa_errno;
        wsa_errno = WSAGetLastError();
        if (WSAECONNRESET == wsa_errno)
        {
            /* EOF on the pipe! (win32 socket based implementation) */
            ret = 0;
        }
        else
        {
            errno = wsa_errno;  /* this *might* be ok */
        }
    }
    return ret;
Maybe Magnus might comment here. This doesn't explain why the read call is failing, but I'm pretty sure the error code is not being properly returned.
Merlin
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > if (ret < 0) > { > int wsa_errno; > wsa_errno = WSAGetLastError(); > if (WSAECONNRESET == wsa_errno) > { /* EOF on the pipe! (win32 socket based implementation) */ > ret = 0; > } > else > { > errno = wsa_errno; /* this *might* be ok */ > } > } > Maybe Magnus might comment here. This doesn't explain why the read call > is failing but I'm pretty sure the error code is not being properly > returned. I recall a lot of angst about whether the encoding of WSAGetLastError() is compatible with errno values, but I forget what the conclusion was. Can we just assign to errno like that, or do we need a mapping function? regards, tom lane
> > Maybe Magnus might comment here. This doesn't explain why the read call
> > is failing but I'm pretty sure the error code is not being properly
> > returned.
>
> I recall a lot of angst about whether the encoding of WSAGetLastError()
> is compatible with errno values, but I forget what the conclusion was.
> Can we just assign to errno like that, or do we need a mapping function?
Ah. My bad. /backend/port/win32/socket.c has static void TranslateSocketError(void), etc. This puts the value in errno like it is supposed to be. MS recv() does not (I checked), so this is definitely a bug in pipe.c, since it is reasonable for the caller to expect errno to be set to something.
Also, recv() and read() return completely different error codes :-). So any caller has to be careful not to make assumptions about errno.
Merlin
Whoop! I missed something here. In src/include/port/win32.h, recv is #defined to pgwin32_recv(s, buf, len, flags). This version of the function appears to do all the errno mapping, etc. So pipe.c is correct, or at least I have no answer as to why the error code is not showing up in my log :(. Merlin
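The kind of errno mapping mentioned above can be pictured with a small sketch. It is not the actual pgwin32_recv() or TranslateSocketError() code: the function name sketch_wsa_to_errno and the SKETCH_ constants are hypothetical, the numeric values are the documented Winsock codes copied in by hand so the example builds without winsock2.h, and it assumes a POSIX-style errno.h that defines EMSGSIZE and ECONNRESET.

```c
#include <errno.h>

/*
 * Winsock error numbers live in the 10000+ range and do not line up with
 * errno values, so they cannot simply be assigned to errno.
 * Values copied from the Winsock documentation; SKETCH_ names are ours.
 */
enum
{
    SKETCH_WSAEINTR = 10004,
    SKETCH_WSAEWOULDBLOCK = 10035,
    SKETCH_WSAEMSGSIZE = 10040,
    SKETCH_WSAECONNRESET = 10054
};

/* Hypothetical mapping function: translate a Winsock code to an errno value. */
int
sketch_wsa_to_errno(int wsa_error)
{
    switch (wsa_error)
    {
        case SKETCH_WSAEINTR:
            return EINTR;
        case SKETCH_WSAEWOULDBLOCK:
            return EAGAIN;
        case SKETCH_WSAEMSGSIZE:
            return EMSGSIZE;
        case SKETCH_WSAECONNRESET:
            return ECONNRESET;
        default:
            return EINVAL;      /* unknown code: still leave errno non-zero */
    }
}
```

A recv() wrapper would apply this after a failed call, along the lines of errno = sketch_wsa_to_errno(WSAGetLastError()), so that a caller testing errno == EINTR or printing %m sees a meaningful value.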
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > Ah. My bad. > /backend/port/win32/socket.c > static void > TranslateSocketError(void) etc. This puts the value in errno like it is > suppoed to be. Hmm. I am strongly tempted to remove src/port/pipe.c and put it into src/backend/port/win32/socket.c, so it can share this function. > MS recv() does not (I checked) so this is definately a > bug in pipe.c, since it is reasonable for the caller to expect errno to > be set to something. Can you provide a patch? I'm hesitant to hack code I don't have the means to test. regards, tom lane
Is this an open item?
-- Bruce Momjian
Bruce Momjian wrote:
> Is this an open item?
I would say yes, referring to the statistics collector restarting under heavy load conditions. Unfortunately, the server where I was most frequently getting the error went into production on Wednesday (with the stats collector turned off). So testing the problem there is basically out until I can set up some kind of simulated test here at the office.
Merlin