Thread: fsync and semctl errors with 8.1.5/win32
I've been attempting to run PostgreSQL 8.1.5/win32 on a production deployment, but have started having many problems. McAfee Antivirus is installed and running, although I've excluded the entire drive where PostgreSQL is installed and where the data is installed.

I've received several errors in the past few days/weeks. They fall into three general categories 1) permission denied errors 2) semctl errors 3) fsync errors. I am not sure how to reproduce these errors locally - they seem to occur at unpredictable intervals.

The following posts seem related, although I don't see a resolution for any of the problems listed:
http://www.mail-archive.com/pgsql-bugs@postgresql.org/msg16097.html
http://www.mail-archive.com/pgsql-bugs@postgresql.org/msg14792.html
http://www.mail-archive.com/pgsql-bugs@postgresql.org/msg14916.html

I have run PostgreSQL on Linux in the past and not had any problems. Is the win32 build generally considered stable or unstable for production use? Any help would be greatly appreciated!

1) PERMISSION DENIED ERROR
This error occurred on the same day as the semctl started, but stopped occurring for a few hours before the semctl errors started.

The following is an example:
2006-11-25 00:46:04 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:05 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:06 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:07 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:08 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:09 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:10 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:11 ERROR: could not open relation 1663/16404/84855: Permission denied
2006-11-25 00:46:12 ERROR: could not open relation 1663/16404/84855: Permission denied

2) SEMCTL ERROR
This error occurred over and over one day with the same pattern - several semctl errors, then the unexpected EOF. This resulted in clients being unable to create database connections. The error occurred overnight and into the next day, and did not disappear until postgres was restarted.

The following is an example:
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL: semctl(167238064, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 LOG: could not receive data from client: No connection could be made because the target machine actively refused it.
2006-11-25 22:10:03 LOG: unexpected EOF on client connection

3) FSYNC ERROR
I've seen this error several times in the past - including today.

The following is an example:
2006-11-27 00:00:20 LOG: autovacuum: processing database "incommDashboard"
2006-11-27 00:00:20 LOG: could not fsync segment 0 of relation 1663/16404/89952: Permission denied
2006-11-27 00:00:20 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-27 00:00:24 LOG: could not fsync segment 0 of relation 1663/16404/89952: Permission denied
2006-11-27 00:00:24 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-27 00:00:26 LOG: could not fsync segment 0 of relation 1663/16404/89952: Permission denied
2006-11-27 00:00:26 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-27 00:00:29 LOG: could not fsync segment 0 of relation 1663/16404/89952: Permission denied
2006-11-27 00:00:29 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-27 00:00:32 LOG: could not fsync segment 0 of relation 1663/16404/89952: Permission denied
2006-11-27 00:00:32 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-27 00:00:42 LOG: could not fsync segment 0 of relation 1663/16404/89952: Permission denied
2006-11-27 00:00:42 ERROR: storage sync failed on magnetic disk: Permission denied
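
For anyone triaging a similar log, the three categories above can be counted with a short script. The following is a minimal sketch in Python, assuming the entry format shown in the excerpts (timestamp, then LOG, ERROR or FATAL, then the message). The category names and the functions `split_entries` and `tally` are illustrative only and are not part of PostgreSQL.

```python
import re
from collections import Counter

# Matches the timestamp + level prefix seen in the excerpts, e.g.
# "2006-11-25 00:46:04 ERROR: ". The format is assumed from the log above.
ENTRY_START = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?:LOG|ERROR|FATAL): "
)

# Hypothetical category names, keyed to substrings of the messages above.
CATEGORIES = {
    "permission_denied_open": "could not open relation",
    "semctl": "semctl(",
    "fsync": "could not fsync segment",
}

# Relation identifiers appear as three slash-separated numbers.
RELATION = re.compile(r"relation\s+(\d+/\d+/\d+)")


def split_entries(text):
    """Split raw log text into entries, one per timestamp prefix.

    Works whether entries are on separate lines, wrapped by a mail
    client, or run together on a single line.
    """
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    # Collapse internal whitespace so wrapped entries become one line.
    return [" ".join(text[s:e].split()) for s, e in zip(starts, ends)]


def tally(text):
    """Count entries per category and per (category, relation) pair."""
    by_category = Counter()
    by_relation = Counter()
    for entry in split_entries(text):
        for name, needle in CATEGORIES.items():
            if needle in entry:
                by_category[name] += 1
                rel = RELATION.search(entry)
                if rel:
                    by_relation[(name, rel.group(1))] += 1
                break
    return by_category, by_relation
```

Run against the excerpts in this message, `tally` groups the open failures under relation 1663/16404/84855 and the fsync failures under 1663/16404/89952, which makes it easier to see whether each burst of errors involves a single file.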
Per the FAQ, we suggest that you *uninstall* your antivirus, especially if it has firewall-like functionality (like I believe McAfee does). Just disabling the scan does *not* remove the filter drivers and does not stop the antivirus from affecting the database processes. So try this. If the problem doesn't go away, look for something else installed that might be interfering with the normal operation of your Windows install.

//Magnus

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Jeremy Haile
> Sent: 27 November 2006 15:21
> To: pgsql-bugs@postgresql.org
> Subject: [BUGS] fsync and semctl errors with 8.1.5/win32
>
> [...]
Thanks Magnus.

I will uninstall the AntiVirus and see if my problems persist. I have disabled all other non-essential services, indexing, etc. so I don't know of anything else that could be causing the problems. However, in some of the posts I referred to, the poster indicated that they were not running antivirus software and still experienced the problems I'm having.

I'll repost if I do or don't continue to experience problems after uninstalling the antivirus.

On Mon, 27 Nov 2006 15:58:33 +0100, "Magnus Hagander" <mha@sollentuna.net> said:
> [...]
I've gotten pushback from my organization on removing antivirus from the servers completely. Are there any antiviruses that are known to be compatible with PostgreSQL/win32?

On Mon, 27 Nov 2006 10:28:23 -0500, "Jeremy Haile" <jhaile@fastmail.fm> said:
> [...]
Jeremy Haile wrote:
> I've gotten pushback from my organization on removing antivirus from the
> servers completely. Are there any antiviruses that are known to be
> compatible with PostgreSQL/win32?

All my boxes (2 build farm members, 1 production server, and the laptop on which the official releases are built and tested) run Sophos Anti Virus (http://www.sophos.com/products/es/endpoint/sav.html), with no problems.

Regards, Dave
Thanks for the feedback. If you don't mind, what version of PostgreSQL are you running?

I'm trying to bring PostgreSQL into this company - they are primarily a Windows/SQL Server shop (although they do Java software development). I've already gotten comments similar to "Why don't you just switch to SQL Server?" - so I'm hoping to find a workaround before I get forced to switch DB platforms. As it is, my application seems unreliable because I haven't been able to resolve the PostgreSQL hanging problems in Windows. If I had my way, I'd switch the server to Linux - but alas, that hasn't been an option so far.

I know this may be the wrong list to ask this question on - but as I'm an outspoken PostgreSQL advocate, I'd like your opinions. If I am unable to resolve these PostgreSQL issues given my constraints, am I likely to have fewer problems running MySQL/InnoDB on Windows (since it has had a native Windows build for much longer)?

On Mon, 27 Nov 2006 16:40:57 +0000, "Dave Page" <dpage@postgresql.org> said:
> [...]
OK - after uninstalling the virus scanner (McAfee), I still get the same disk access errors.

Here's a few seconds of the log output (this has been going on for 10 mins as of this e-mail being sent):
2006-11-28 16:16:10 LOG: could not fsync segment 0 of relation 1663/16404/30267: Permission denied
2006-11-28 16:16:10 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-28 16:16:11 LOG: could not fsync segment 0 of relation 1663/16404/30267: Permission denied
2006-11-28 16:16:11 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-28 16:16:12 LOG: could not fsync segment 0 of relation 1663/16404/30267: Permission denied
2006-11-28 16:16:12 ERROR: storage sync failed on magnetic disk: Permission denied
2006-11-28 16:16:13 LOG: could not fsync segment 0 of relation 1663/16404/30267: Permission denied
2006-11-28 16:16:13 ERROR: storage sync failed on magnetic disk: Permission denied

Here's the FileMon output from the same seconds:
4:16:10 PM postgres.exe:3168 OPEN C:\Program Files\PostgreSQL\8.1\data\base\16404\30267 DELETE PEND Options: Open Access: 0012019F
4:16:11 PM postgres.exe:3168 OPEN C:\Program Files\PostgreSQL\8.1\data\base\16404\30267 DELETE PEND Options: Open Access: 0012019F
4:16:12 PM postgres.exe:3168 OPEN C:\Program Files\PostgreSQL\8.1\data\base\16404\30267 DELETE PEND Options: Open Access: 0012019F
4:16:13 PM postgres.exe:3168 OPEN C:\Program Files\PostgreSQL\8.1\data\base\16404\30267 DELETE PEND Options: Open Access: 0012019F

This is an incredibly bad problem for me. I'd really appreciate any help!

Jeremy

On Mon, 27 Nov 2006 12:14:00 -0500, "Jeremy Haile" <jhaile@fastmail.fm> said:
> [...]
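
The FileMon lines above carry two details that can be checked mechanically: the access mask and the file path. The sketch below is in Python and rests on two assumptions, both labeled in the comments: that the first number in the relation identifier (1663) is the default tablespace, whose files live under `base\` in the data directory, and that the `Access` value is a standard Windows access mask. The function names `decode_access` and `relation_file` are hypothetical.

```python
from pathlib import PureWindowsPath

# Standard Windows access-right bit values, as defined in the Windows
# SDK headers. Assumption: FileMon's "Access" field uses this encoding.
ACCESS_BITS = {
    0x000001: "FILE_READ_DATA",
    0x000002: "FILE_WRITE_DATA",
    0x000004: "FILE_APPEND_DATA",
    0x000008: "FILE_READ_EA",
    0x000010: "FILE_WRITE_EA",
    0x000020: "FILE_EXECUTE",
    0x000080: "FILE_READ_ATTRIBUTES",
    0x000100: "FILE_WRITE_ATTRIBUTES",
    0x010000: "DELETE",
    0x020000: "READ_CONTROL",
    0x040000: "WRITE_DAC",
    0x080000: "WRITE_OWNER",
    0x100000: "SYNCHRONIZE",
}

# Assumption: 1663 is the OID of the default tablespace (pg_default).
PG_DEFAULT_TABLESPACE_OID = 1663


def decode_access(mask_hex):
    """Return the names of the access rights set in a hex mask string."""
    mask = int(mask_hex, 16)
    return [name for bit, name in sorted(ACCESS_BITS.items()) if mask & bit]


def relation_file(data_dir, relation):
    """Map 'tablespace/database/relfilenode' to the segment-0 file path.

    Only the default tablespace is handled in this sketch.
    """
    tablespace, database, relfilenode = (int(p) for p in relation.split("/"))
    if tablespace != PG_DEFAULT_TABLESPACE_OID:
        raise ValueError("only the default tablespace is handled here")
    return PureWindowsPath(data_dir) / "base" / str(database) / str(relfilenode)
```

Decoding 0012019F this way gives ordinary read and write rights plus READ_CONTROL and SYNCHRONIZE, with no DELETE bit, and the relation 1663/16404/30267 from the server log maps to the same path that FileMon reports.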
I forgot to mention - this problem is occurring on multiple Windows machines. One of them is running Windows XP Professional. The other is running Windows Server 2003. I have disabled indexing, virus scanning, and all non-essential services on both of them.

The problem continues to show up even when no queries are being run (although it might always start while queries are running)

On Tue, 28 Nov 2006 16:18:56 -0500, "Jeremy Haile" <jhaile@fastmail.fm> said:
> OK - after uninstalling the virus scanner (McAfee), I still get the same
> disk access errors.
> [...]
>I forgot to mention - this problem is occurring on multiple Windows > machines. One of them is running Windows XP Professional. The other is > running Windows Server 2003. I have disabled indexing, virus scanning, > and all non-essential services on both of them. The problem continues > to show up even when no queries are being run (although it might always > start while queries are running) seems exactly what i'm noticing since 8.2x on windows 2003 as well - no disk services (backup, virus, ...) are running that would block files, and processmon/filemon always show that the files in question are locked by pgsql processes... under higher insert/update load, the errors appear more often here, do you experience the same finding when loading bulk data? - thomas
"Jeremy Haile" <jhaile@fastmail.fm> writes: > Here's a few seconds of the log output (this has been going on for 10 > mins as of this e-mail being sent): > 2006-11-28 16:16:10 LOG: could not fsync segment 0 of relation > 1663/16404/30267: Permission denied > 2006-11-28 16:16:10 ERROR: storage sync failed on magnetic disk: > Permission denied > Here's the FileMon output from the same seconds: > 4:16:10 PM postgres.exe:3168 OPEN C:\Program > Files\PostgreSQL\8.1\data\base\16404\30267 DELETE PEND Options: > Open Access: 0012019F I still don't want to make mdsync() treat EACCES as an ignorable error. However, in this situation we've got an infinite loop because the checkpoint will never succeed and thus the bgwriter will never reach smgrcloseall(), which seems to be what's needed to allow the deleted file to die the real death. Perhaps a suitable workaround would be to make the bgwriter do smgrcloseall in its error recovery path? That is /* * Sleep at least 1 second after any error. A write error is likely * to be repeated, and we don't want to be filling the error logs as * fast as we can. */ pg_usleep(1000000L); + + /* Drop open files to allow deleted files to really go away */ + smgrcloseall(); } /* We can now handle ereport(ERROR) */ PG_exception_stack = &local_sigjmp_buf; Perhaps this should be #ifdef WIN32, although there's probably no harm in doing it on Unixen too. Can someone test this idea? regards, tom lane
Yes - processmon always shows the files being locked by postgres.exe processes. My database is being used as a data warehouse, so about all I am doing is bulk insert/updates. I have a job that runs every 5 minutes and loads data into the database. I typically load between 10,000 and 100,000 rows every 5 minutes into my fact tables, although I also make use of transition tables, dimension tables, etc. that get inserts/updates as well. I see the problem occur 3-4 times a day on average, but I don't know how to reproduce it other than letting it run for a while. On Tue, 28 Nov 2006 22:31:42 +0100, "Thomas H." <me@alternize.com> said: > >I forgot to mention - this problem is occurring on multiple Windows > > machines. One of them is running Windows XP Professional. The other is > > running Windows Server 2003. I have disabled indexing, virus scanning, > > and all non-essential services on both of them. The problem continues > > to show up even when no queries are being run (although it might always > > start while queries are running) > > seems exactly what i'm noticing since 8.2x on windows 2003 as well - no > disk > services (backup, virus, ...) are running that would block files, and > processmon/filemon always show that the files in question are locked by > pgsql processes... > > under higher insert/update load, the errors appear more often here, do > you > experience the same finding when loading bulk data? > > - thomas > >
> Perhaps this should be #ifdef WIN32, although there's probably no harm > in doing it on Unixen too. Can someone test this idea? if magnus/dave could provide me a patched rc1 exe, i could run it in our semi-productive environment for some tests. - thomas
I am currently running 8.1.5, but I'm willing to upgrade to whatever version, use a patched exe, etc. Just let me know what I need to do. On Tue, 28 Nov 2006 23:39:00 +0100, "Thomas H." <me@alternize.com> said: > > Perhaps this should be #ifdef WIN32, although there's probably no harm > > in doing it on Unixen too. Can someone test this idea? > > if magnus/dave could provide me a patched rc1 exe, i could run it in our > semi-productive environment for some tests. > > - thomas > >
Last night I received another filesystem-related problem. Shortly after this problem occurred, I had a connection hang indefinitely, causing my software to go down all night. The log output that occurred shortly before the problem is below. After that, there was no log output by PostgreSQL until I came in this morning and killed the offending process.

Any ideas? Could the pg_xlog error below possibly result in a transaction hanging indefinitely? If so, would the solution Tom proposed possibly fix the error? Any update on getting a patched exe for Thomas and I to test?

2006-11-29 20:11:35 ERROR: tuple concurrently updated
2006-11-29 20:11:35 ERROR: tuple concurrently updated
2006-11-29 20:11:35 ERROR: tuple concurrently updated
2006-11-29 20:11:36 LOG: transaction ID wrap limit is 1090292093, limited by database "incommDashboard"
2006-11-29 20:11:36 LOG: transaction ID wrap limit is 1090292093, limited by database "incommDashboard"
2006-11-29 21:21:38 LOG: transaction ID wrap limit is 1090522044, limited by database "incommDashboard"
2006-11-29 21:21:38 LOG: transaction ID wrap limit is 1090522044, limited by database "incommDashboard"
2006-11-29 22:22:52 LOG: transaction ID wrap limit is 1090579373, limited by database "incommDashboard"
2006-11-29 22:22:52 LOG: transaction ID wrap limit is 1090579373, limited by database "incommDashboard"
2006-11-29 23:38:47 LOG: transaction ID wrap limit is 1090633937, limited by database "incommDashboard"
2006-11-29 23:38:47 LOG: transaction ID wrap limit is 1090633937, limited by database "incommDashboard"
2006-11-29 23:57:52 LOG: could not rename file "pg_xlog/00000001000000190000005E" to "pg_xlog/00000001000000190000007F", continuing to try

On Tue, 28 Nov 2006 17:49:22 -0500, "Jeremy Haile" <jhaile@fastmail.fm> said:
> I am currently running 8.1.5, but I'm willing to upgrade to whatever
> version, use a patched exe, etc. Just let me know what I need to do.
>
> On Tue, 28 Nov 2006 23:39:00 +0100, "Thomas H." <me@alternize.com> said:
> > > Perhaps this should be #ifdef WIN32, although there's probably no harm
> > > in doing it on Unixen too. Can someone test this idea?
> >
> > if magnus/dave could provide me a patched rc1 exe, i could run it in our
> > semi-productive environment for some tests.
> >
> > - thomas
> 2006-11-29 23:57:52 LOG: could not rename file > "pg_xlog/00000001000000190000005E" to > "pg_xlog/00000001000000190000007F", continuing to try i had this one as well. good news is: this bug is fixed in 8.2 - thomas
Really? That's great news. Maybe I should start testing with 8.2 today. Did you run into problems where transactions would hang? If so, did those disappear in 8.2? On Thu, 30 Nov 2006 15:14:46 +0100, "Thomas H." <me@alternize.com> said: > > 2006-11-29 23:57:52 LOG: could not rename file > > "pg_xlog/00000001000000190000005E" to > > "pg_xlog/00000001000000190000007F", continuing to try > > i had this one as well. good news is: this bug is fixed in 8.2 > > - thomas >
> Did you run into problems where transactions would hang? If so, did
> those disappear in 8.2?

well, i wasn't really able to exactly determine under what conditions that xlog bug appeared in our case. tho it always was when lots of data is imported at once within one transaction. under normal load i've never seen the xlog bug. as far as i know it was some sort of livelock: as with the other error messages, another postgres.exe kept a lock on the xlog file, which the bgwriter-process wanted to rename, which led to the complete halt of the db system, due to the importance of xlog/bgwriter. you can force an unload of the locked xlog file handle in processmon, and postgresql will resume "normally".

i had a transaction lately that created 7gb of xlog-files (vacuum full of a mid-sized table) without any xlog-lockup, so i guess this problem is really fixed in the latest 8.2 build :-)

if you have hanging transactions but other db activity works well, i would rather guess it's a side effect of the other file problems with the relation-files that can't be renamed. i've never been able to see any impact of that error message. even when it appears 10 times a second everything seems "ok". but on the other side, in our case, we use the database as a web backend and always have around 20-30 concurrent connections, so it's hard to debug.

- thomas
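The "could not rename file ..., continuing to try" log line suggests that the server wraps the rename in a retry loop, so a handle that is never released turns into an endless wait. A minimal sketch of such a loop in C follows, with a bounded number of attempts so that it cannot spin forever. `rename_with_retry` and `pause_fn` are hypothetical names for illustration and not PostgreSQL's actual routine; retrying only on EACCES is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: call rename() up to max_tries times.
 * Only EACCES (somebody else still has the file open, in the scenario
 * from this thread) is retried; any other error fails immediately.
 * pause_fn, if not NULL, is called between attempts, e.g. to sleep.
 * Returns the number of attempts used on success, or -1 on failure.
 */
int rename_with_retry(const char *from, const char *to,
                      int max_tries, void (*pause_fn)(void))
{
    assert(from != NULL && to != NULL);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (rename(from, to) == 0)
            return attempt;
        if (errno != EACCES)
            return -1;              /* not a sharing problem: give up */
        if (attempt < max_tries) {
            fprintf(stderr,
                    "could not rename \"%s\" to \"%s\", continuing to try\n",
                    from, to);
            if (pause_fn != NULL)
                pause_fn();
        }
    }
    return -1;                      /* still blocked after max_tries */
}
```

A caller that gets -1 can report the failure and carry on, rather than blocking every other backend behind the rename. Whether giving up is acceptable for a WAL segment is a separate question that this sketch does not answer.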
> We were also running it on Windows Server 2003. We ended up rolling > back service pack 1 and it seems to have taken care of the hanging > transactions and we haven't seen a semctl error in awhile. interesting. we're using sp1 & pgsql since day 1 and the problem only started when testing 8.2b1. but on the other hand, it might be that a hotfix is the cause for this error, as i haven't seen it before aug/sept 06. i sure would have noticed... - thomas
Jeremy,

My company runs a 200 gig data warehouse. We are running 8.1.2. We were seeing hanging transactions and occasional semctl errors. We were also running it on Windows Server 2003. We ended up rolling back service pack 1 and it seems to have taken care of the hanging transactions and we haven't seen a semctl error in a while.

Worth a shot if it applies to you.

Brad Russell
Programmer Analyst
NPC International

-----Original Message-----
From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Thomas H.
Sent: Thursday, November 30, 2006 9:11 AM
To: Jeremy Haile; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] fsync and semctl errors with 8.1.5/win32

> Did you run into problems where transactions would hang? If so, did
> those disappear in 8.2?

[...]
In our case the hanging transactions were killing our in-house merge process that merges the data from our stores to the warehouse.

When we rolled back sp1, there were no hanging transactions. When we applied sp1, the hanging started again. We took sp1 off and on a couple of times and it happened every time.

-----Original Message-----
From: Thomas H. [mailto:me@alternize.com]
Sent: Thursday, November 30, 2006 10:04 AM
To: Bradley Russell
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] fsync and semctl errors with 8.1.5/win32

> We were also running it on Windows Server 2003. We ended up rolling
> back service pack 1 and it seems to have taken care of the hanging
> transactions and we haven't seen a semctl error in awhile.

interesting. we're using sp1 & pgsql since day 1 and the problem only started when testing 8.2b1. but on the other hand, it might be that a hotfix is the cause for this error, as i haven't seen it before aug/sept 06. i sure would have noticed...

- thomas
I've upgraded to 8.2rc1. I still receive fsync errors all the time, but I haven't had it freeze yet. With 8.1.5 I was able to reproduce the freeze very quickly, so this is a good sign! Only time will tell if it's truly resolved - but I'll post updates to this list either way.

I'm still open to testing the fsync fix that Tom proposed if I can get a patched exe.

On Thu, 30 Nov 2006 10:15:50 -0600, "Bradley Russell" <bradley.russell@npcinternational.com> said:
> In our case the hanging transactions was killing our in house merge
> process that merges the data from our stores to the warehouse.
>
> When we rolled back sp1 no hanging transactions. When we applied sp1
> the hanging started again. We took sp1 off and on a couple times and it
> happened every time.
> [...]
>> Here's a few seconds of the log output (this has been going on for 10
>> mins as of this e-mail being sent):
>> 2006-11-28 16:16:10 LOG: could not fsync segment 0 of relation
>> 1663/16404/30267: Permission denied
>> 2006-11-28 16:16:10 ERROR: storage sync failed on magnetic disk:
>> Permission denied
>
>> Here's the FileMon output from the same seconds:
>> 4:16:10 PM postgres.exe:3168 OPEN C:\Program
>> Files\PostgreSQL\8.1\data\base\16404\30267 DELETE PEND Options:
>> Open Access: 0012019F
>
> I still don't want to make mdsync() treat EACCES as an ignorable error.
> However, in this situation we've got an infinite loop because the
> checkpoint will never succeed and thus the bgwriter will never reach
> smgrcloseall(), which seems to be what's needed to allow the deleted
> file to die the real death.
>
> Perhaps a suitable workaround would be to make the bgwriter do
> smgrcloseall in its error recovery path? That is
>
>           /*
>            * Sleep at least 1 second after any error. A write error is likely
>            * to be repeated, and we don't want to be filling the error logs as
>            * fast as we can.
>            */
>           pg_usleep(1000000L);
>   +
>   +       /* Drop open files to allow deleted files to really go away */
>   +       smgrcloseall();
>       }
>
>       /* We can now handle ereport(ERROR) */
>       PG_exception_stack = &local_sigjmp_buf;
>
>
> Perhaps this should be #ifdef WIN32, although there's probably no harm
> in doing it on Unixen too. Can someone test this idea?
>

in 8.2.0 the error messages changed a bit:

2006-12-05 03:47:12 [736] LOG: could not fsync segment 0 of relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:12 [736] ERROR: storage sync failed on magnetic disk: Permission denied
2006-12-05 03:47:13 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:14 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:15 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:16 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:17 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:18 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:19 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:20 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:21 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:22 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:23 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:24 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:25 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:26 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:27 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:28 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:29 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:30 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:31 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:32 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:47:33 [736] ERROR: could not open relation 1663/16692/2361629: Permission denied
2006-12-05 03:52:34 [736] LOG: could not fsync segment 0 of relation 1663/16692/2361668: Permission denied
2006-12-05 03:52:34 [736] ERROR: storage sync failed on magnetic disk: Permission denied
2006-12-05 03:52:35 [736] ERROR: could not open relation 1663/16692/2361668: Permission denied
2006-12-05 03:52:36 [736] ERROR: could not open relation 1663/16692/2361668: Permission denied
2006-12-05 03:52:37 [736] ERROR: could not open relation 1663/16692/2361668: Permission denied
2006-12-05 03:52:38 [736] ERROR: could not open relation 1663/16692/2361668: Permission denied
2006-12-05 03:52:39 [736] ERROR: could not open relation 1663/16692/2361668: Permission denied
2006-12-05 03:52:40 [736] ERROR: could not open relation 1663/16692/2361668: Permission denied
2006-12-05 03:52:41 [736] ERROR: could not open relation 1663/16692/2361668: Permission denied

... and so on.

- thomas
"Thomas H." <me@alternize.com> writes: > in 8.2.0 the error messages changed a bit: > 2006-12-05 03:47:12 [736] LOG: could not fsync segment 0 of relation > 1663/16692/2361629: Permission denied > 2006-12-05 03:47:12 [736] ERROR: storage sync failed on magnetic disk: > Permission denied > 2006-12-05 03:47:13 [736] ERROR: could not open relation > 1663/16692/2361629: Permission denied > 2006-12-05 03:47:14 [736] ERROR: could not open relation > 1663/16692/2361629: Permission denied So what's holding the file open now? It's evidently not the bgwriter. regards, tom lane
>> in 8.2.0 the error messages changed a bit:
>
>> 2006-12-05 03:47:12 [736] LOG: could not fsync segment 0 of relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:12 [736] ERROR: storage sync failed on magnetic disk:
>> Permission denied
>> 2006-12-05 03:47:13 [736] ERROR: could not open relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:14 [736] ERROR: could not open relation
>> 1663/16692/2361629: Permission denied
>
> So what's holding the file open now? It's evidently not the bgwriter.

one of the unnamed postgresql.exe processes from the connection pool:

postgres: db_outnow outnow 127.0.0.1(3384) idle

might be related: in addition to the above messages, the log is now also flooded by:

2006-12-05 04:16:29 [5196] LOG: could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was interrupted by a call to WSACancelBlockingCall.

there is no pgstat.tmp file in global...

- thomas
>> 2006-12-05 03:47:12 [736] LOG: could not fsync segment 0 of relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:12 [736] ERROR: storage sync failed on magnetic disk:
>> Permission denied
>> 2006-12-05 03:47:13 [736] ERROR: could not open relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:14 [736] ERROR: could not open relation
>> 1663/16692/2361629: Permission denied
>
> So what's holding the file open now? It's evidently not the bgwriter.

btw: FileMon reports every few seconds:

04:24:28 postgres.exe:736 OPEN D:\DB\postgreSQL.82\data\base\16692\2361629 DELETE PEND Options: Open Access: 0012019F

the time corresponds to the "could not open relation" logentries

i would interpret this as: postgresql pid 736 (bgwriter) is trying to open the file 2361629 which fails because it is marked as "to be deleted". the file system operation is pending because another process (from the pgsql connection pool) is still keeping a handle open. as it is a connection pool process, it will be "recycled" after a while and release open handles: everytime the error messages disappear after some minutes...

- thomas
"Thomas H." <me@alternize.com> writes: >> So what's holding the file open now? It's evidently not the bgwriter. > one of the unnamed postgresql.exe processes from the connection pool: > postgres: db_outnow outnow 127.0.0.1(3384) idle Hm. I would imagine that as soon as this process does something, the messages stop? (It should close its file handle in response to a relcache flush that it will read as soon as it becomes active.) regards, tom lane
"Thomas H." <me@alternize.com> writes: > ... in addition to the above messages, the log is now also > flooded by: > 2006-12-05 04:16:29 [5196] LOG: could not rename temporary statistics file > "global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was > interrupted by a call to WSACancelBlockingCall. Hm ... there simply isn't anything that holds pgstat.stat open for long, so this behavior seems independent of any other issues we might have. Can you find any evidence about what's wrong here? regards, tom lane
>> ... in addition to the above messages, the log is now also
>> flooded by:
>
>> 2006-12-05 04:16:29 [5196] LOG: could not rename temporary statistics
>> file
>> "global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was
>> interrupted by a call to WSACancelBlockingCall.
>
> Hm ... there simply isn't anything that holds pgstat.stat open for long,
> so this behavior seems independent of any other issues we might have.
> Can you find any evidence about what's wrong here?

hope this helps:

05:33:14 postgres.exe:5196 CREATE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Options: OverwriteIf Access: 00120196
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 0 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 4096 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 8192 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 12288 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 16384 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 20480 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 24576 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 28672 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 32768 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 36864 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 40960 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 45056 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 49152 Length: 1650
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Options: Open Access: 00110080
05:33:14 postgres.exe:5196 QUERY INFORMATION D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS FileAttributeTagInformation
05:33:14 postgres.exe:5196 QUERY INFORMATION D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Attributes: A
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.stat SUCCESS Options: Open Access: 00100002
05:33:14 postgres.exe:5196 SET INFORMATION D:\DB\postgreSQL.82\data\global\pgstat.tmp * 0xC0000123 FileRenameInformation
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.stat SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp NOT FOUND Options: Open Access: 00110080
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp NOT FOUND Options: Open Access: 00010080
05:33:14 postgres.exe:5196 CREATE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Options: OverwriteIf Access: 00120196
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 0 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 4096 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 8192 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 12288 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 16384 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 20480 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 24576 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 28672 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 32768 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 36864 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 40960 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 45056 Length: 4096
05:33:14 postgres.exe:5196 WRITE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Offset: 49152 Length: 1650
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Options: Open Access: 00110080
05:33:14 postgres.exe:5196 QUERY INFORMATION D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS FileAttributeTagInformation
05:33:14 postgres.exe:5196 QUERY INFORMATION D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Attributes: A
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.stat SUCCESS Options: Open Access: 00100002
05:33:14 postgres.exe:5196 SET INFORMATION D:\DB\postgreSQL.82\data\global\pgstat.tmp * 0xC0000123 FileRenameInformation
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.stat SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp NOT FOUND Options: Open Access: 00110080
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp NOT FOUND Options: Open Access: 00010080

those are the file actions that are happening at the same time as the error log message appears.

2006-12-05 05:33:14 [5196] LOG: could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was interrupted by a call to WSACancelBlockingCall.
2006-12-05 05:33:14 [5196] LOG: could not rename temporary statistics file "global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was interrupted by a call to WSACancelBlockingCall.

the next file actions with pgstat take place some seconds later. more or less the same thing is happening every 2-3 seconds when the log errors appear. there was a short pause of ~30min when this error didn't appear, but for over 40min now it is logged every few seconds.

the rc1 doesn't show this behaviour, i've just doublechecked.

- thomas
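The `Access:` values in the FileMon traces are Windows access masks, and decoding them shows what each open was asking for. The C sketch below does that. `describe_access` and `ACCESS_BITS` are hypothetical names, and the bit values are the standard file access rights as I understand them from the Windows SDK headers, so they are worth checking against `winnt.h` before relying on them. Separately, the `0xC0000123` result on the rename appears to be the NT status STATUS_FILE_DELETED, though that too should be verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned long bit;
    const char   *name;
} AccessBit;

/* Subset of Windows file access rights (assumed values, see lead-in). */
static const AccessBit ACCESS_BITS[] = {
    { 0x00000001UL, "FILE_READ_DATA" },
    { 0x00000002UL, "FILE_WRITE_DATA" },
    { 0x00000004UL, "FILE_APPEND_DATA" },
    { 0x00000008UL, "FILE_READ_EA" },
    { 0x00000010UL, "FILE_WRITE_EA" },
    { 0x00000080UL, "FILE_READ_ATTRIBUTES" },
    { 0x00000100UL, "FILE_WRITE_ATTRIBUTES" },
    { 0x00010000UL, "DELETE" },
    { 0x00020000UL, "READ_CONTROL" },
    { 0x00100000UL, "SYNCHRONIZE" },
};

/*
 * Write the names of all known bits set in mask into buf, joined by '|'.
 * Returns how many names were written; stops early if buf is too small.
 */
int describe_access(unsigned long mask, char *buf, size_t buflen)
{
    int    matched = 0;
    size_t used = 0;

    assert(buf != NULL);
    if (buflen == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof ACCESS_BITS / sizeof ACCESS_BITS[0]; i++) {
        if ((mask & ACCESS_BITS[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, buflen - used, "%s%s",
                         matched ? "|" : "", ACCESS_BITS[i].name);
        if (n < 0 || (size_t) n >= buflen - used) {
            buf[used] = '\0';       /* drop the partially written name */
            break;
        }
        used += (size_t) n;
        matched++;
    }
    return matched;
}
```

Under these assumptions, 00110080 (the open just before the rename) includes DELETE, which fits an open made in order to rename or remove the file, and 0012019F (the open that hits DELETE PEND) is an ordinary read/write open.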
>>> So what's holding the file open now? It's evidently not the bgwriter.
>
>> one of the unnamed postgresql.exe processes from the connection pool:
>> postgres: db_outnow outnow 127.0.0.1(3384) idle
>
> Hm. I would imagine that as soon as this process does something,
> the messages stop? (It should close its file handle in response
> to a relcache flush that it will read as soon as it becomes active.)

from what i observe i would say the process dies (timeouts?) and then bgwriter is "happy" again:

here's *all* the information i got from filemon when filtering for one of the relations that produced the error: http://rafb.net/paste/results/3uozHD77.html

it's pid 3772 that still has a handle open, while all the others have closed it properly after pid 2780 issued a DELETE. the process itself has 3 threads that are in:

- postgres.exe+0x1220
- postgres.exe!pg_queue_signal+0x120
- postgres.exe!shmctl+0x80

(i can get stacktraces for all of them if useful)

pid 3772 died at 05:55:22 (~20min after its last access to the file), and bgwriter could finally write, and the error messages are gone.

- thomas