Re: [HACKERS] ERROR: could not read block - Mailing list pgsql-admin
From | Kevin Grittner |
---|---|
Subject | Re: [HACKERS] ERROR: could not read block |
Date | |
Msg-id | 437C6BE802000025000007B7@gwmta.wicourts.gov |
Responses |
Re: [HACKERS] ERROR: could not read block
|
List | pgsql-admin |
1) We run a couple of Java applications on the same box to provide middle tier access. When the box is heavily loaded, I think I've seen about 80% PostgreSQL, 20% Java load.

2) I checked that no antivirus software was running, and had the techs pare down the services running on that box to the absolute minimum after the second failure, so that we could eliminate such issues as possible causes.

3) The aforementioned Java apps hold open 21 database connections. (One for a software publisher to query a list of jar files for access to the database, and 20 for a connection pool in the middle tier.) The way the pool is configured, six of those are used for queries of normal priority, so we rarely have more than six connections doing anything at any one moment. During the initial failure, the middle tier was under normal load, so 45,000 inserts were made to the table in question during the update. After we hit the problem, we removed that middle tier from the list of targets, so it was running, but totally idle during the remaining tests.

None of this seems material, however. It's pretty clear that the problem was exhaustion of the Windows page pool. Our Windows experts have reconfigured the machine (which had been tuned for Sybase ASE). Their changes have boosted the page pool from 20,000 entries to 180,000 entries. We're continuing to test to ensure that the problem is not showing up with this configuration; but, so far, it looks good.

If we don't want to tell Windows users to make highly technical changes to the Windows registry in order to use PostgreSQL, it does seem wise to use retries, as has already been discussed on this thread.

-Kevin

>>> "Magnus Hagander" <mha@sollentuna.net> >>>
[copying this one over to hackers]

> Our DBAs reviewed the Microsoft documentation you referenced,
> modified the registry, and rebooted the OS. We've been
> beating up on the database without seeing the error so far.
> We'll keep at it for a while.

Very interesting.
As this seems to be a resource error, a couple of questions. Sorry if you've already answered some of them; I couldn't find the answers in the archives.

1) Is this a dedicated pg server, or does it have something else on it?

2) We have to ask this - do you run any antivirus on it, that might not be releasing resources the right way? Anything else that might stick in a kernel driver?

3) Are you hitting the database with many connections, or is this a single/few connection scenario? Are the other connections typically active when this shows up?

Seems like we could just retry when we get this failure. The question is whether we need to sleep for a short time before we do. Also, we can't just retry forever, there has to be some kind of end to it...

(If you read the SQL kb, it can be read as saying that retrying is the correct thing, because the bug in sql was that it didn't retry)

//Magnus
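The retry idea discussed above (retry on failure, sleep briefly between attempts, and give up after a fixed number of tries) can be sketched roughly as follows. This is only an illustration in Python, not the backend's actual C code; the names `TransientReadError`, `read_block_with_retry`, `max_attempts`, and `delay_seconds` are hypothetical, and the default values are assumptions rather than anything proposed in the thread.

```python
import time


class TransientReadError(OSError):
    """Hypothetical stand-in for a read that failed because the OS was
    temporarily out of resources (an assumption for this sketch)."""


def read_block_with_retry(read_block, max_attempts=5, delay_seconds=0.1,
                          sleep=time.sleep):
    """Call read_block(); on a transient failure, sleep briefly and retry.

    After max_attempts failed attempts the last error is re-raised,
    so the loop can never retry forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read_block()
        except TransientReadError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the original error
            sleep(delay_seconds)  # short pause so the OS can free resources
```

Any other exception propagates immediately, so only the failure class assumed to be transient is retried. The `sleep` parameter exists just so the pause can be replaced in tests.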