Thread: Questions about connection clean-up and "invalid page header"
Hi Everybody.

I have two questions.

1. We have a system that is accessed by Crystal Reports, which is in turn controlled by another (3rd-party) system. Now, when a report takes too long or the user cancels it, it doesn't send a cancel request to Postgres. It just kills the Crystal process that works on it.

As a result, the query is left alive on the Postgres backend. Eventually I get the message "Unexpected end of file" and the query is cancelled. But this doesn't happen soon enough for me - these are usually very heavy queries, and I'd like them to be cleaned up as soon as possible once the client connection has ended.

Is there a parameter to set in the configuration, or some other means, to shorten the time before an abandoned backend's query is cancelled?

2. I get the following message in my development database:

vacuumdb: vacuuming of database "reports" failed: ERROR: invalid page header in block 6200 of relation "rb"

I already had this a couple of months ago. Looking around the web, I saw this error is supposed to indicate a hardware problem. I informed my sysadmin, but since this is just the dev system and the data was not important, I did a TRUNCATE TABLE on the "rb" relation, and the errors stopped...

But now the error is back, and I'm a bit suspicious. If this is a hardware issue, it's rather strange that it returned in the exact same relation after I did a TRUNCATE TABLE. I have many other relations in the system, ones that fill up a lot faster. So I suspect this might be a PostgreSQL issue after all. What can I do about this?

We are currently using PostgreSQL v. 8.3.1 on the server side.

TIA,
Herouth
On Sun, Jan 24, 2010 at 3:17 AM, Herouth Maoz <herouth@unicell.co.il> wrote:
> Hi Everybody.
>
> I have two questions.
>
> 1. We have a system that is accessed by Crystal Reports, which is in turn
> controlled by another (3rd-party) system. Now, when a report takes too long
> or the user cancels it, it doesn't send a cancel request to Postgres. It
> just kills the Crystal process that works on it.
>
> As a result, the query is left alive on the Postgres backend. Eventually I
> get the message "Unexpected end of file" and the query is cancelled. But
> this doesn't happen soon enough for me - these are usually very heavy
> queries, and I'd like them to be cleaned up as soon as possible once the
> client connection has ended.

The real solution is to fix the application. But I understand sometimes you can't do that.

> Is there a parameter to set in the configuration, or some other means, to
> shorten the time before an abandoned backend's query is cancelled?

You can shorten the tcp_keepalive settings so that dead connections get detected faster.

> 2. I get the following message in my development database:
>
> vacuumdb: vacuuming of database "reports" failed: ERROR: invalid page
> header in block 6200 of relation "rb"
>
> I already had this a couple of months ago. Looking around the web, I saw
> this error is supposed to indicate a hardware problem. I informed my
> sysadmin, but since this is just the dev system and the data was not
> important, I did a TRUNCATE TABLE on the "rb" relation, and the errors
> stopped...
>
> But now the error is back, and I'm a bit suspicious. If this is a hardware
> issue, it's rather strange that it returned in the exact same relation
> after I did a TRUNCATE TABLE. I have many other relations in the system,
> ones that fill up a lot faster. So I suspect this might be a PostgreSQL
> issue after all. What can I do about this?

Might be, but not very likely.
I and many others run pgsql in production environments where it handles thousands of updates / inserts per minute with no corruption. We run on server-class hardware with ECC memory and large RAID arrays with no corruption.

Have you run something as simple as memtest86+ on your machine to see if it's got bad memory?

> We are currently using PostgreSQL v. 8.3.1 on the server side.

You should really update to the latest 8.3.x version (around 8.3.8 or so). It's simple and easy, and it's possible you've hit a bug in an older version of 8.3.
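For reference, the tcp_keepalive settings mentioned above live in postgresql.conf (they can also be set per-session). A sketch with purely illustrative values - the right numbers depend on how quickly you need dead clients detected versus how much probe traffic you can tolerate:

```
# postgresql.conf -- illustrative values, not recommendations
tcp_keepalives_idle = 60       # seconds of inactivity before the first keepalive probe
tcp_keepalives_interval = 10   # seconds between unanswered probes
tcp_keepalives_count = 5       # unanswered probes before the connection is considered dead
```

With these settings a silently vanished peer would be declared dead after roughly 60 + 5×10 seconds. Note they only apply to TCP connections, not Unix-domain sockets, and a value of 0 means "use the operating system default".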
Scott Marlowe wrote:
> You can shorten the tcp_keepalive settings so that dead connections
> get detected faster.

Thanks, I'll ask my sysadmin to do that.

> Might be, but not very likely. I and many others run pgsql in production
> environments where it handles thousands of updates / inserts per minute
> with no corruption. We run on server class hardware with ECC memory and
> large RAID arrays with no corruption.

Someone pointed out to me, though, that comparing data warehouse systems to production systems is like apples and oranges - we also have a production system that, as you say, makes millions of inserts and updates per hour. It works very well with PostgreSQL - a lot better than with Sybase, with which we worked previously. But the reports system I work on makes bulk inserts using calculations based on complicated joins, and each transaction is long and memory-consuming, as opposed to the production system, where each transaction takes a few milliseconds and is cleared immediately.

So far this has only happened on the development server, and if it really is a matter of hardware, I'm not worried. What I am worried about is whether there really is some sort of bug that may carry over to our production reports system.

> Have you run something as simple as memtest86+ on your machine to see if
> it's got bad memory?

I'll tell my sysadmin to do that. Thank you.

>> We are currently using PostgreSQL v. 8.3.1 on the server side.
>
> You should really update to the latest 8.3.x version (around 8.3.8 or so).
> It's simple and easy, and it's possible you've hit a bug in an older
> version of 8.3.

OK, I'll also try to get that done.
Thanks for your help,
Herouth
On Mon, Jan 25, 2010 at 8:15 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> Is there a parameter to set in the configuration or some other means to
>> shorten the time before an abandoned backend's query is cancelled?
>
> You can shorten the tcp_keepalive settings so that dead connections
> get detected faster.

This won't help. The TCP connection is already being closed (or, I think, only half-closed). The problem is that in the Unix socket API you don't find out about that unless you check, or try to read or write to it. The tcp_keepalive setting would only come into play if the remote machine crashed or was disconnected from the network.

--
greg
Greg Stark wrote:
> On Mon, Jan 25, 2010 at 8:15 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>> Is there a parameter to set in the configuration or some other means to
>>> shorten the time before an abandoned backend's query is cancelled?
>>
>> You can shorten the tcp_keepalive settings so that dead connections
>> get detected faster.
>
> This won't help. The TCP connection is already being closed (or, I think,
> only half-closed). The problem is that in the Unix socket API you don't
> find out about that unless you check, or try to read or write to it. The
> tcp_keepalive setting would only come into play if the remote machine
> crashed or was disconnected from the network.

That's the situation I'm having, so it's OK. Crystal, being a Windows application, obviously runs on a different server than the database itself, so the connection between them is TCP/IP, not Unix-domain sockets. And furthermore, that was exactly the problem as I described it - the fact that the third-party software, instead of somehow instructing Crystal to send a cancel request to PostgreSQL, just kills the client process on the Windows side.
Herouth
On Mon, Jan 25, 2010 at 11:37 AM, Herouth Maoz <herouth@unicell.co.il> wrote:
>> The tcp_keepalive setting would only come into play if the remote
>> machine crashed or was disconnected from the network.
>
> That's the situation I'm having, so it's OK. Crystal, being a Windows
> application, obviously runs on a different server than the database
> itself, so the connection between them is TCP/IP, not Unix domain sockets.

The unix socket api is used for both unix domain sockets and internet domain sockets. The point is that in the api there's no way to find out about a connection the other side has closed except when you read or write to it, or when you explicitly check.

> And furthermore, that was exactly the problem as I described it - the fact
> that the third party software, instead of somehow instructing Crystal to
> send a cancel request to PostgreSQL, just kills the client process on the
> Windows side.

Killing the client process doesn't mean the machine has crashed or been disconnected from the network. I'm assuming Crystal isn't crashing the machine just to stop the report... And even if it did, and tcp_keepalives kicked in, the server *still* wouldn't notice until it checked or tried to read or write to that socket.

--
greg
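Until the backend next touches its socket, abandoned queries can also be dealt with by hand. A sketch against the 8.3 catalogs (where the columns are named procpid and current_query; the 10-minute threshold is an arbitrary example, and pg_cancel_backend requires superuser privileges):

```sql
-- List queries that have been running longer than 10 minutes
SELECT procpid, usename, now() - query_start AS runtime, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
  AND now() - query_start > interval '10 minutes';

-- Cancel the current query of one backend by its PID
-- (12345 is a placeholder taken from the query above)
SELECT pg_cancel_backend(12345);
```

This could be run periodically from cron as a stopgap, though it cancels by elapsed time, not by whether the client is actually gone.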
Greg Stark wrote:
> Killing the client process doesn't mean the machine has crashed or been
> disconnected from the network. I'm assuming Crystal isn't crashing the
> machine just to stop the report... And even if it did, and tcp_keepalives
> kicked in, the server *still* wouldn't notice until it checked or tried to
> read or write to that socket.

Well, I assume from the fact that I eventually get an "Unexpected end of file" message for those queries that something does go in and check them. Do you have any suggestion as to how to cause the PostgreSQL server to do so earlier?
Herouth
On Mon, Jan 25, 2010 at 1:16 PM, Herouth Maoz <herouth@unicell.co.il> wrote:
> Well, I assume from the fact that I eventually get an "Unexpected end of
> file" message for those queries that something does go in and check them.
> Do you have any suggestion as to how to cause the PostgreSQL server to do
> so earlier?

No, Postgres pretty intentionally doesn't check, because checking would be quite slow. If this is a plpgsql function looping, you can put a RAISE NOTICE in the loop periodically. I suppose you could write such a function and add it to your query, but whether it does what you want will depend on the query plan.

--
greg
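A sketch of the kind of function described above, assuming a wrapper around the "rb" table from earlier in the thread (the function name and the 100000-row interval are made up for illustration). The RAISE NOTICE forces a write to the client socket, so a dead connection should be noticed mid-query rather than only at the end:

```sql
-- Hypothetical heartbeat wrapper: scans rb and emits a NOTICE
-- every 100000 rows, forcing a socket write to the client.
CREATE OR REPLACE FUNCTION rb_with_heartbeat()
RETURNS SETOF rb AS $$
DECLARE
    r rb%ROWTYPE;
    n bigint := 0;
BEGIN
    FOR r IN SELECT * FROM rb LOOP
        n := n + 1;
        IF n % 100000 = 0 THEN
            RAISE NOTICE 'processed % rows', n;  -- write to the client socket
        END IF;
        RETURN NEXT r;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- e.g. SELECT ... FROM rb_with_heartbeat() WHERE ...
```

As Greg notes, whether this helps depends on the plan: the heavy work has to actually flow through the function's loop for the notices to fire while the query runs.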