Thread: fsync and semctl errors with 8.1.5/win32

fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
I've been attempting to run PostgreSQL 8.1.5/win32 on a production
deployment, but have started having many problems.  McAfee Antivirus is
installed and running, although I've excluded the entire drive where
PostgreSQL is installed and where the data is installed.

I've received several errors in the past few days/weeks.  They fall into
three general categories: 1) permission denied errors, 2) semctl errors,
and 3) fsync errors.  I am not sure how to reproduce these errors locally;
they seem to occur at unpredictable intervals.

The following posts seem related, although I don't see a resolution for
any of the problems listed:
http://www.mail-archive.com/pgsql-bugs@postgresql.org/msg16097.html
http://www.mail-archive.com/pgsql-bugs@postgresql.org/msg14792.html
http://www.mail-archive.com/pgsql-bugs@postgresql.org/msg14916.html

I have run PostgreSQL on Linux in the past and not had any problems.  Is
the win32 build generally considered stable or unstable for production
use?  Any help would be greatly appreciated!

1) PERMISSION DENIED ERROR
This error occurred on the same day the semctl errors started, but
stopped a few hours before they began.

The following is an example:
2006-11-25 00:46:04 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:05 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:06 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:07 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:08 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:09 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:10 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:11 ERROR:  could not open relation 1663/16404/84855:
Permission denied
2006-11-25 00:46:12 ERROR:  could not open relation 1663/16404/84855:
Permission denied


2) SEMCTL ERROR
This error occurred over and over one day with the same pattern -
several semctl errors, then the unexpected EOF.  This resulted in
clients being unable to create database connections.  The error occurred
overnight and into the next day, and did not disappear until postgres
was restarted.

The following is an example:
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 FATAL:  semctl(167238064, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.
2006-11-25 22:10:03 LOG:  could not receive data from client: No
connection could be made because the target machine actively refused it.
2006-11-25 22:10:03 LOG:  unexpected EOF on client connection


3) FSYNC ERROR
I've seen this error several times in the past - including today.

The following is an example:
2006-11-27 00:00:20 LOG:  autovacuum: processing database
"incommDashboard"
2006-11-27 00:00:20 LOG:  could not fsync segment 0 of relation
1663/16404/89952: Permission denied
2006-11-27 00:00:20 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-27 00:00:24 LOG:  could not fsync segment 0 of relation
1663/16404/89952: Permission denied
2006-11-27 00:00:24 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-27 00:00:26 LOG:  could not fsync segment 0 of relation
1663/16404/89952: Permission denied
2006-11-27 00:00:26 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-27 00:00:29 LOG:  could not fsync segment 0 of relation
1663/16404/89952: Permission denied
2006-11-27 00:00:29 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-27 00:00:32 LOG:  could not fsync segment 0 of relation
1663/16404/89952: Permission denied
2006-11-27 00:00:32 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-27 00:00:42 LOG:  could not fsync segment 0 of relation
1663/16404/89952: Permission denied
2006-11-27 00:00:42 ERROR:  storage sync failed on magnetic disk:
Permission denied
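
Since the three categories above are spread across a long log, a small script can help tally them. The following is only a sketch, assuming the log format shown in the examples above (a "YYYY-MM-DD HH:MM:SS" timestamp at the start of each entry, with long entries wrapped onto a second line). The function names are hypothetical and not part of PostgreSQL.

```python
import re
from collections import Counter

# A new log entry starts with a timestamp such as "2006-11-25 00:46:04".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")
# Relations are logged as tablespace/database/relfilenode.
RELATION = re.compile(r"relation (\d+/\d+/\d+)")


def join_wrapped(lines):
    """Re-join entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def categorize(entry):
    """Assign an entry to one of the three categories from the report."""
    if "semctl(" in entry:
        return "semctl"
    if "could not fsync" in entry or "storage sync failed" in entry:
        return "fsync"
    if "could not open relation" in entry and "Permission denied" in entry:
        return "permission denied"
    return "other"


def summarize(lines):
    """Count entries per (category, relation) pair; relation may be None."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = RELATION.search(entry)
        counts[(categorize(entry), match.group(1) if match else None)] += 1
    return counts
```

Calling summarize(open(path)) on a server log (path being wherever the log file lives) gives a quick view of which relations are affected and how often.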

Re: fsync and semctl errors with 8.1.5/win32

From
"Magnus Hagander"
Date:
Per the FAQ, we suggest that you *uninstall* your antivirus. Especially
if it has firewall-like functionality (like I believe McAfee does). Just
disabling the scan does *not* remove the filter drivers and does not
stop the antivirus from affecting the database processes. So try this. If
the problem doesn't go away, look for something else installed that
might be interfering with the normal operation of your Windows install.

//Magnus

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Jeremy Haile
> Sent: 27 November 2006 15:21
> To: pgsql-bugs@postgresql.org
> Subject: [BUGS] fsync and semctl errors with 8.1.5/win32

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
Thanks Magnus.

I will uninstall the AntiVirus and see if my problems persist.  I have
disabled all other non-essential services, indexing, etc. so I don't
know of anything else that could be causing the problems.  However, in
some of the posts I referred to, the poster indicated that they were not
running antivirus software and still experienced the problems I'm
having.

I'll repost if I do or don't continue to experience problems after
uninstalling the antivirus.

On Mon, 27 Nov 2006 15:58:33 +0100, "Magnus Hagander"
<mha@sollentuna.net> said:
> Per the FAQ, we suggest that you *uninstall* your antivirus. Especially
> if it has firewall-like functionality (like I believe McAfee does). Just
> disabling the scan does *not* remove the filter drivers and does not
> stop the antivirus from affecting the database processes. So try this. If
> the problem doesn't go away, look for something else installed that
> might be interfering with the normal operation of your Windows install.
>
> //Magnus

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
I've gotten pushback from my organization on removing antivirus from the
servers completely.  Are there any antiviruses that are known to be
compatible with PostgreSQL/win32?

On Mon, 27 Nov 2006 10:28:23 -0500, "Jeremy Haile" <jhaile@fastmail.fm>
said:
> Thanks Magnus.
>
> I will uninstall the AntiVirus and see if my problems persist.  I have
> disabled all other non-essential services, indexing, etc. so I don't
> know of anything else that could be causing the problems.  However, in
> some of the posts I referred to, the poster indicated that they were not
> running antivirus software and still experienced the problems I'm
> having.
>
> I'll repost if I do or don't continue to experience problems after
> uninstalling the antivirus.

Re: fsync and semctl errors with 8.1.5/win32

From
Dave Page
Date:
Jeremy Haile wrote:
> I've gotten pushback from my organization on removing antivirus from the
> servers completely.  Are there any antiviruses that are known to be
> compatible with PostgreSQL/win32?

All my boxes (2 build farm members, 1 production server, and the laptop
on which the official releases are built and tested) run Sophos Anti
Virus (http://www.sophos.com/products/es/endpoint/sav.html), with no
problems.

Regards, Dave

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
Thanks for the feedback.  If you don't mind, what version of PostgreSQL
are you running?

I'm trying to bring PostgreSQL into this company - they are primarily a
Windows/SQL Server shop (although they do Java software development).  I've
already gotten comments similar to "Why don't you just switch to SQL
Server?" -  so I'm hoping to find a workaround before I get forced to
switch DB platforms.  As it is, my application seems unreliable because
I haven't been able to resolve the PostgreSQL hanging problems in
Windows.  If I had my way, I'd switch the server to Linux - but alas,
that hasn't been an option so far.

I know this may be the wrong list to ask this question on - but as I'm
an outspoken PostgreSQL advocate, I'd like your opinions.  If I am
unable to resolve these PostgreSQL issues given my constraints, will I
likely have fewer problems running MySQL/InnoDB on Windows? (since it has
had a native Windows build for much longer)


On Mon, 27 Nov 2006 16:40:57 +0000, "Dave Page" <dpage@postgresql.org>
said:
> Jeremy Haile wrote:
> > I've gotten pushback from my organization on removing antivirus from the
> > servers completely.  Are there any antiviruses that are known to be
> > compatible with PostgreSQL/win32?
>
> All my boxes (2 build farm members, 1 production server, and the laptop
> on which the official releases are built and tested) run Sophos Anti
> Virus (http://www.sophos.com/products/es/endpoint/sav.html), with no
> problems.
>
> Regards, Dave

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
OK - after uninstalling the virus scanner (McAfee), I still get the same
disk access errors.

Here are a few seconds of the log output (this has been going on for 10
mins as of this e-mail being sent):
2006-11-28 16:16:10 LOG:  could not fsync segment 0 of relation
1663/16404/30267: Permission denied
2006-11-28 16:16:10 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-28 16:16:11 LOG:  could not fsync segment 0 of relation
1663/16404/30267: Permission denied
2006-11-28 16:16:11 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-28 16:16:12 LOG:  could not fsync segment 0 of relation
1663/16404/30267: Permission denied
2006-11-28 16:16:12 ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-11-28 16:16:13 LOG:  could not fsync segment 0 of relation
1663/16404/30267: Permission denied
2006-11-28 16:16:13 ERROR:  storage sync failed on magnetic disk:
Permission denied


Here's the FileMon output from the same seconds:
4:16:10 PM      postgres.exe:3168       OPEN    C:\Program
Files\PostgreSQL\8.1\data\base\16404\30267   DELETE PEND     Options:
Open  Access: 0012019F
4:16:11 PM      postgres.exe:3168       OPEN    C:\Program
Files\PostgreSQL\8.1\data\base\16404\30267   DELETE PEND     Options:
Open  Access: 0012019F
4:16:12 PM      postgres.exe:3168       OPEN    C:\Program
Files\PostgreSQL\8.1\data\base\16404\30267   DELETE PEND     Options:
Open  Access: 0012019F
4:16:13 PM      postgres.exe:3168       OPEN    C:\Program
Files\PostgreSQL\8.1\data\base\16404\30267   DELETE PEND     Options:
Open  Access: 0012019F
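
The relation triple in the server log and the path in the FileMon trace appear to name the same file. The following is a minimal sketch of that mapping, assuming the usual on-disk layout in which 1663 is the OID of the default tablespace (pg_default) and its relations live under base\<database OID>\<relfilenode>. The function names are hypothetical. The access-mask decoding uses the Windows SDK values for FILE_GENERIC_READ and FILE_GENERIC_WRITE.

```python
from pathlib import PureWindowsPath

# Assumption: 1663 is the OID of the pg_default tablespace.
PG_DEFAULT_TABLESPACE = 1663

# Windows SDK access-right constants.
FILE_GENERIC_READ = 0x00120089
FILE_GENERIC_WRITE = 0x00120116


def relation_path(data_dir, relation):
    """Map 'tablespace/database/relfilenode' to a file under data_dir."""
    tablespace, database, relfilenode = (int(p) for p in relation.split("/"))
    if tablespace != PG_DEFAULT_TABLESPACE:
        raise ValueError("this sketch only handles the default tablespace")
    return PureWindowsPath(data_dir) / "base" / str(database) / str(relfilenode)


def describe_access(mask):
    """List the generic file rights fully contained in an access mask."""
    names = []
    if mask & FILE_GENERIC_READ == FILE_GENERIC_READ:
        names.append("FILE_GENERIC_READ")
    if mask & FILE_GENERIC_WRITE == FILE_GENERIC_WRITE:
        names.append("FILE_GENERIC_WRITE")
    return names
```

With the values from the output above, relation 1663/16404/30267 maps to data\base\16404\30267, and access mask 0012019F decodes to a plain read/write open. The DELETE PEND result most likely corresponds to the NT status STATUS_DELETE_PENDING: the file has been marked for deletion while some handle to it is still open, and Windows reports new open attempts on such a file as access denied.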


This is an incredibly bad problem for me.  I'd really appreciate any
help!

Jeremy


On Mon, 27 Nov 2006 12:14:00 -0500, "Jeremy Haile" <jhaile@fastmail.fm>
said:
> Thanks for the feedback.  If you don't mind, what version of PostgreSQL
> are you running?
>
> I'm trying to bring PostgreSQL into this company - they are primarily a
> Windows/SQL Server shop (although they do Java software development).  I've
> already gotten comments similar to "Why don't you just switch to SQL
> Server?" -  so I'm hoping to find a workaround before I get forced to
> switch DB platforms.  As it is, my application seems unreliable because
> I haven't been able to resolve the PostgreSQL hanging problems in
> Windows.  If I had my way, I'd switch the server to Linux - but alas,
> that hasn't been an option so far.
>
> I know this may be the wrong list to ask this question on - but as I'm
> an outspoken PostgreSQL advocate, I'd like your opinions.  If I am
> unable to resolve these PostgreSQL issues given my constraints, will I
> likely have less problems running MySQL/InnoDB on Windows? (since it has
> had a native Windows build for much longer)
>
>
> On Mon, 27 Nov 2006 16:40:57 +0000, "Dave Page" <dpage@postgresql.org>
> said:
> > Jeremy Haile wrote:
> > > I've gotten pushback from my organization on removing antivirus from the
> > > servers completely.  Are there any antiviruses that are known to be
> > > compatible with PostgreSQL/win32?
> >
> > All my boxes (2 build farm members, 1 production server, and the laptop
> > on which the official releases are built and tested) run Sophos Anti
> > Virus (http://www.sophos.com/products/es/endpoint/sav.html), with no
> > problems.
> >
> > Regards, Dave
> >

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
I forgot to mention - this problem is occurring on multiple Windows
machines.  One of them is running Windows XP Professional.  The other is
running Windows Server 2003.  I have disabled indexing, virus scanning,
and all non-essential services on both of them.  The problem continues
to show up even when no queries are being run (although it might always
start while queries are running)





Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
>I forgot to mention - this problem is occurring on multiple Windows
> machines.  One of them is running Windows XP Professional.  The other is
> running Windows Server 2003.  I have disabled indexing, virus scanning,
> and all non-essential services on both of them.  The problem continues
> to show up even when no queries are being run (although it might always
> start while queries are running)

seems exactly what i'm noticing since 8.2x on windows 2003 as well - no disk
services (backup, virus, ...) are running that would block files, and
processmon/filemon always show that the files in question are locked by
pgsql processes...

under higher insert/update load, the errors appear more often here, do you
experience the same finding when loading bulk data?

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
Tom Lane
Date:
"Jeremy Haile" <jhaile@fastmail.fm> writes:
> Here's a few seconds of the log output (this has been going on for 10
> mins as of this e-mail being sent):
> 2006-11-28 16:16:10 LOG:  could not fsync segment 0 of relation
> 1663/16404/30267: Permission denied
> 2006-11-28 16:16:10 ERROR:  storage sync failed on magnetic disk:
> Permission denied

> Here's the FileMon output from the same seconds:
> 4:16:10 PM      postgres.exe:3168       OPEN    C:\Program
> Files\PostgreSQL\8.1\data\base\16404\30267   DELETE PEND     Options:
> Open  Access: 0012019F

I still don't want to make mdsync() treat EACCES as an ignorable error.
However, in this situation we've got an infinite loop because the
checkpoint will never succeed and thus the bgwriter will never reach
smgrcloseall(), which seems to be what's needed to allow the deleted
file to die the real death.

Perhaps a suitable workaround would be to make the bgwriter do
smgrcloseall in its error recovery path?  That is

        /*
         * Sleep at least 1 second after any error.  A write error is likely
         * to be repeated, and we don't want to be filling the error logs as
         * fast as we can.
         */
        pg_usleep(1000000L);
+
+        /* Drop open files to allow deleted files to really go away */
+        smgrcloseall();
    }

    /* We can now handle ereport(ERROR) */
    PG_exception_stack = &local_sigjmp_buf;


Perhaps this should be #ifdef WIN32, although there's probably no harm
in doing it on Unixen too.  Can someone test this idea?

            regards, tom lane

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
Yes - processmon always shows the files being locked by postgres.exe
processes.  My database is being used as a data warehouse, so about all
I am doing is bulk insert/updates.  I have a job that runs every 5
minutes and loads data into the database.

I typically load between 10,000 and 100,000 rows every 5 minutes into my
fact tables, although I also make use of transition tables, dimension
tables, etc. that get inserts/updates as well.  I see the problem occur
3-4 times a day on average, but I don't know how to reproduce it other
than letting it run for a while.




Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
> Perhaps this should be #ifdef WIN32, although there's probably no harm
> in doing it on Unixen too.  Can someone test this idea?

if magnus/dave could provide me a patched rc1 exe, i could run it in our
semi-productive environment for some tests.

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
I am currently running 8.1.5, but I'm willing to upgrade to whatever
version, use a patched exe, etc.  Just let me know what I need to do.


Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
Last night I received another filesystem-related problem.  Shortly after
this problem occurred, I had a connection hang indefinitely, causing my
software to go down all night.

The log output that occurred shortly before the problem is below.  After
that, there was no log output by PostgreSQL until I came in this morning
and killed the offending process.  Any ideas?  Could the pg_xlog error
below possibly result in a transaction hanging indefinitely?  If so,
would the solution Tom proposed possibly fix the error?

Any update on getting a patched exe for Thomas and me to test?

2006-11-29 20:11:35 ERROR:  tuple concurrently updated
2006-11-29 20:11:35 ERROR:  tuple concurrently updated
2006-11-29 20:11:35 ERROR:  tuple concurrently updated
2006-11-29 20:11:36 LOG:  transaction ID wrap limit is 1090292093,
limited by database "incommDashboard"
2006-11-29 20:11:36 LOG:  transaction ID wrap limit is 1090292093,
limited by database "incommDashboard"
2006-11-29 21:21:38 LOG:  transaction ID wrap limit is 1090522044,
limited by database "incommDashboard"
2006-11-29 21:21:38 LOG:  transaction ID wrap limit is 1090522044,
limited by database "incommDashboard"
2006-11-29 22:22:52 LOG:  transaction ID wrap limit is 1090579373,
limited by database "incommDashboard"
2006-11-29 22:22:52 LOG:  transaction ID wrap limit is 1090579373,
limited by database "incommDashboard"
2006-11-29 23:38:47 LOG:  transaction ID wrap limit is 1090633937,
limited by database "incommDashboard"
2006-11-29 23:38:47 LOG:  transaction ID wrap limit is 1090633937,
limited by database "incommDashboard"
2006-11-29 23:57:52 LOG:  could not rename file
"pg_xlog/00000001000000190000005E" to
"pg_xlog/00000001000000190000007F", continuing to try



Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
> 2006-11-29 23:57:52 LOG:  could not rename file
> "pg_xlog/00000001000000190000005E" to
> "pg_xlog/00000001000000190000007F", continuing to try

i had this one as well. good news is: this bug is fixed in 8.2

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
Really?  That's great news.  Maybe I should start testing with 8.2
today.

Did you run into problems where transactions would hang?  If so, did
those disappear in 8.2?


Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
> Did you run into problems where transactions would hang?  If so, did
> those disappear in 8.2?

well, i wasn't really able to exactly determine under what conditions that
xlog bug appeared in our case. tho it always was when lots of data is
imported at once within one transaction. under normal load i've never seen
the xlog bug. as far as i know it was some sort of livelock: as with the
other error messages, another postgres.exe kept a lock on the xlog file,
which the bgwriter process wanted to rename, which led to the complete halt
of the db system, due to the importance of xlog/bgwriter. you can force an
unload of the locked xlog file handle in processmon, and postgresql will
resume "normally".

i had a transaction lately that created 7gb of xlog-files (vacuum full of a
mid-sized table) without any xlog-lockup, so i guess this problem is really
fixed in the latest 8.2 build :-)

if you have hanging transactions but other db activity works well, i would
rather guess it's a side effect of the other file problems with the
relation-files that can't be renamed. i've never been able to see any impact
of that error message. even when it appears 10 times a second everything
seems "ok". but on the other side, in our case, we use the database as a web
backend and always have around 20-30 concurrent connections, so it's hard to
debug.

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
> We were also running it on Windows Server 2003.  We ended up rolling
> back service pack 1 and it seems to have taken care of the hanging
> transactions and we haven't seen a semctl error in awhile.

interesting. we're using sp1 & pgsql since day 1 and the problem only
started when testing 8.2b1. but on the other hand, it might be that a hotfix
is the cause for this error, as i haven't seen it before aug/sept 06. i sure
would have noticed...

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
"Bradley Russell"
Date:
Jeremy,

My company runs a 200 gig data warehouse. We are running 8.1.2. We were
seeing hanging transactions and occasional semctl errors.

We were also running it on Windows Server 2003.  We ended up rolling
back service pack 1 and it seems to have taken care of the hanging
transactions and we haven't seen a semctl error in a while.

Worth a shot if it applies to you.

Brad Russell
Programmer Analyst
NPC International


Re: fsync and semctl errors with 8.1.5/win32

From
"Bradley Russell"
Date:
In our case the hanging transactions were killing our in-house merge
process that merges the data from our stores to the warehouse.

When we rolled back sp1, there were no hanging transactions.  When we
applied sp1, the hanging started again.  We took sp1 off and on a couple
of times and it happened every time.


Re: fsync and semctl errors with 8.1.5/win32

From
"Jeremy Haile"
Date:
I've upgraded to 8.2rc1.  I still receive fsync errors all the time, but
I haven't had it freeze yet.  With 8.1.5 I was able to reproduce the
freeze very quickly, so this is a good sign!  Only time will tell if
it's truly resolved - but I'll post updates to this list either way.

I'm still open to testing the fsync fix that Tom proposed if I can get a
patched exe.



Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
>> Here's a few seconds of the log output (this has been going on for 10
>> mins as of this e-mail being sent):
>> 2006-11-28 16:16:10 LOG:  could not fsync segment 0 of relation
>> 1663/16404/30267: Permission denied
>> 2006-11-28 16:16:10 ERROR:  storage sync failed on magnetic disk:
>> Permission denied
>
>> Here's the FileMon output from the same seconds:
>> 4:16:10 PM      postgres.exe:3168       OPEN    C:\Program
>> Files\PostgreSQL\8.1\data\base\16404\30267   DELETE PEND     Options:
>> Open  Access: 0012019F
>
> I still don't want to make mdsync() treat EACCES as an ignorable error.
> However, in this situation we've got an infinite loop because the
> checkpoint will never succeed and thus the bgwriter will never reach
> smgrcloseall(), which seems to be what's needed to allow the deleted
> file to die the real death.
>
> Perhaps a suitable workaround would be to make the bgwriter do
> smgrcloseall in its error recovery path?  That is
>
> /*
> * Sleep at least 1 second after any error.  A write error is likely
> * to be repeated, and we don't want to be filling the error logs as
> * fast as we can.
> */
> pg_usleep(1000000L);
> +
> + /* Drop open files to allow deleted files to really go away */
> + smgrcloseall();
> }
>
> /* We can now handle ereport(ERROR) */
> PG_exception_stack = &local_sigjmp_buf;
>
>
> Perhaps this should be #ifdef WIN32, although there's probably no harm
> in doing it on Unixen too.  Can someone test this idea?
>

in 8.2.0 the error messages changed a bit:

2006-12-05 03:47:12 [736] LOG:  could not fsync segment 0 of relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:12 [736] ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-12-05 03:47:13 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:14 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:15 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:16 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:17 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:18 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:19 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:20 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:21 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:22 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:23 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:24 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:25 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:26 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:27 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:28 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:29 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:30 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:31 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:32 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:47:33 [736] ERROR:  could not open relation
1663/16692/2361629: Permission denied
2006-12-05 03:52:34 [736] LOG:  could not fsync segment 0 of relation
1663/16692/2361668: Permission denied
2006-12-05 03:52:34 [736] ERROR:  storage sync failed on magnetic disk:
Permission denied
2006-12-05 03:52:35 [736] ERROR:  could not open relation
1663/16692/2361668: Permission denied
2006-12-05 03:52:36 [736] ERROR:  could not open relation
1663/16692/2361668: Permission denied
2006-12-05 03:52:37 [736] ERROR:  could not open relation
1663/16692/2361668: Permission denied
2006-12-05 03:52:38 [736] ERROR:  could not open relation
1663/16692/2361668: Permission denied
2006-12-05 03:52:39 [736] ERROR:  could not open relation
1663/16692/2361668: Permission denied
2006-12-05 03:52:40 [736] ERROR:  could not open relation
1663/16692/2361668: Permission denied
2006-12-05 03:52:41 [736] ERROR:  could not open relation
1663/16692/2361668: Permission denied
... and so on.

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
Tom Lane
Date:
"Thomas H." <me@alternize.com> writes:
> in 8.2.0 the error messages changed a bit:

> 2006-12-05 03:47:12 [736] LOG:  could not fsync segment 0 of relation
> 1663/16692/2361629: Permission denied
> 2006-12-05 03:47:12 [736] ERROR:  storage sync failed on magnetic disk:
> Permission denied
> 2006-12-05 03:47:13 [736] ERROR:  could not open relation
> 1663/16692/2361629: Permission denied
> 2006-12-05 03:47:14 [736] ERROR:  could not open relation
> 1663/16692/2361629: Permission denied

So what's holding the file open now?  It's evidently not the bgwriter.

            regards, tom lane

Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
>> in 8.2.0 the error messages changed a bit:
>
>> 2006-12-05 03:47:12 [736] LOG:  could not fsync segment 0 of relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:12 [736] ERROR:  storage sync failed on magnetic disk:
>> Permission denied
>> 2006-12-05 03:47:13 [736] ERROR:  could not open relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:14 [736] ERROR:  could not open relation
>> 1663/16692/2361629: Permission denied
>
> So what's holding the file open now?  It's evidently not the bgwriter.

one of the unnamed postgresql.exe processes from the connection pool:
postgres: db_outnow outnow 127.0.0.1(3384) idle

might be related: in addition to the above messages, the log is now also
flooded by:

2006-12-05 04:16:29 [5196] LOG:  could not rename temporary statistics file
"global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was
interrupted by a call to WSACancelBlockingCall.

there is no pgstat.tmp file in global...

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
>> 2006-12-05 03:47:12 [736] LOG:  could not fsync segment 0 of relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:12 [736] ERROR:  storage sync failed on magnetic disk:
>> Permission denied
>> 2006-12-05 03:47:13 [736] ERROR:  could not open relation
>> 1663/16692/2361629: Permission denied
>> 2006-12-05 03:47:14 [736] ERROR:  could not open relation
>> 1663/16692/2361629: Permission denied
>
> So what's holding the file open now?  It's evidently not the bgwriter.


btw: FileMon reports every few seconds:
04:24:28 postgres.exe:736 OPEN D:\DB\postgreSQL.82\data\base\16692\2361629
DELETE PEND Options: Open  Access: 0012019F
the time corresponds to the "could not open relation" log entries


i would interpret this as: postgresql pid 736 (bgwriter) is trying to open
the file 2361629 which fails because it is marked as "to be deleted". the
file system operation is pending because another process (from the pgsql
connection pool) is still keeping a handle open.

as it is a connection pool process, it will be "recycled" after a while and
release open handles: every time, the error messages disappear after some
minutes...

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
Tom Lane
Date:
"Thomas H." <me@alternize.com> writes:
>> So what's holding the file open now?  It's evidently not the bgwriter.

> one of the unnamed postgresql.exe processes from the connection pool:
> postgres: db_outnow outnow 127.0.0.1(3384) idle

Hm.  I would imagine that as soon as this process does something,
the messages stop?  (It should close its file handle in response
to a relcache flush that it will read as soon as it becomes active.)

            regards, tom lane

Re: fsync and semctl errors with 8.1.5/win32

From
Tom Lane
Date:
"Thomas H." <me@alternize.com> writes:
> ... in addition to the above messages, the log is now also
> flooded by:

> 2006-12-05 04:16:29 [5196] LOG:  could not rename temporary statistics file
> "global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was
> interrupted by a call to WSACancelBlockingCall.

Hm ... there simply isn't anything that holds pgstat.stat open for long,
so this behavior seems independent of any other issues we might have.
Can you find any evidence about what's wrong here?

            regards, tom lane

Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
>> ... in addition to the above messages, the log is now also
>> flooded by:
>
>> 2006-12-05 04:16:29 [5196] LOG:  could not rename temporary statistics
>> file
>> "global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was
>> interrupted by a call to WSACancelBlockingCall.
>
> Hm ... there simply isn't anything that holds pgstat.stat open for long,
> so this behavior seems independent of any other issues we might have.
> Can you find any evidence about what's wrong here?

hope this helps:

05:33:14 postgres.exe:5196 CREATE D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Options: OverwriteIf  Access: 00120196
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 0 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 4096 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 8192 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 12288 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 16384 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 20480 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 24576 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 28672 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 32768 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 36864 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 40960 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 45056 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 49152 Length: 1650
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Options: Open  Access: 00110080
05:33:14 postgres.exe:5196 QUERY INFORMATION
D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS
FileAttributeTagInformation
05:33:14 postgres.exe:5196 QUERY INFORMATION
D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Attributes: A
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.stat
SUCCESS Options: Open  Access: 00100002
05:33:14 postgres.exe:5196 SET INFORMATION
D:\DB\postgreSQL.82\data\global\pgstat.tmp * 0xC0000123
FileRenameInformation
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.stat
SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp
NOT FOUND Options: Open  Access: 00110080
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp
NOT FOUND Options: Open  Access: 00010080
05:33:14 postgres.exe:5196 CREATE D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Options: OverwriteIf  Access: 00120196
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 0 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 4096 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 8192 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 12288 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 16384 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 20480 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 24576 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 28672 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 32768 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 36864 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 40960 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 45056 Length: 4096
05:33:14 postgres.exe:5196 WRITE  D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Offset: 49152 Length: 1650
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp
SUCCESS Options: Open  Access: 00110080
05:33:14 postgres.exe:5196 QUERY INFORMATION
D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS
FileAttributeTagInformation
05:33:14 postgres.exe:5196 QUERY INFORMATION
D:\DB\postgreSQL.82\data\global\pgstat.tmp SUCCESS Attributes: A
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.stat
SUCCESS Options: Open  Access: 00100002
05:33:14 postgres.exe:5196 SET INFORMATION
D:\DB\postgreSQL.82\data\global\pgstat.tmp * 0xC0000123
FileRenameInformation
05:33:14 postgres.exe:5196 CLOSE D:\DB\postgreSQL.82\data\global\pgstat.stat
SUCCESS
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp
NOT FOUND Options: Open  Access: 00110080
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp
NOT FOUND Options: Open  Access: 00010080

those are the file actions that are happening at the same time as the error
log message appears.
2006-12-05 05:33:14 [5196] LOG:  could not rename temporary statistics file
"global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was
interrupted by a call to WSACancelBlockingCall.
2006-12-05 05:33:14 [5196] LOG:  could not rename temporary statistics file
"global/pgstat.tmp" to "global/pgstat.stat": A blocking operation was
interrupted by a call to WSACancelBlockingCall.
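
To pick the failing operations out of a trace like the one above, a rough Python sketch along these lines could help. It assumes every record starts with an HH:MM:SS timestamp and that the lines after it are mail-wrapped continuations of the same record. The names `join_records` and `failed_records` are made up for this sketch; they are not part of filemon or PostgreSQL.

```python
import re

# Assumption: a record starts with an HH:MM:SS timestamp; any other
# non-empty line is a wrapped continuation of the previous record.
RECORD_START = re.compile(r"^\d{2}:\d{2}:\d{2} ")


def join_records(lines):
    """Merge wrapped continuation lines back into one string per record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def failed_records(records):
    """Return the records whose result field is anything other than SUCCESS."""
    return [r for r in records if " SUCCESS" not in r]


# Three records copied from the trace above.
SAMPLE = r"""
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.stat
SUCCESS Options: Open  Access: 00100002
05:33:14 postgres.exe:5196 SET INFORMATION
D:\DB\postgreSQL.82\data\global\pgstat.tmp * 0xC0000123
FileRenameInformation
05:33:14 postgres.exe:5196 OPEN D:\DB\postgreSQL.82\data\global\pgstat.tmp
NOT FOUND Options: Open  Access: 00110080
"""

for record in failed_records(join_records(SAMPLE.splitlines())):
    print(record)
```

Run on the sample, this prints the SET INFORMATION (rename) record that returned 0xC0000123 and the following OPEN that returned NOT FOUND, which are the two steps in each cycle that do not succeed.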

the next file actions with pgstat take place some seconds later.

more or less the same thing happens every 2-3 seconds while the log errors
appear. there was a short pause of ~30min when this error didn't appear, but
for over 40min now it has been logged every few seconds. rc1 doesn't show
this behaviour; i've just double-checked.

- thomas

Re: fsync and semctl errors with 8.1.5/win32

From
"Thomas H."
Date:
>>> So what's holding the file open now?  It's evidently not the bgwriter.
>
>> one of the unnamed postgresql.exe processes from the connection pool:
>> postgres: db_outnow outnow 127.0.0.1(3384) idle
>
> Hm.  I would imagine that as soon as this process does something,
> the messages stop?  (It should close its file handle in response
> to a relcache flush that it will read as soon as it becomes active.)

from what i observe i would say the process dies (timeouts?) and then
bgwriter is "happy" again:

here's *all* the information i got from filemon when filtering for one of
the relations that produced the error:

http://rafb.net/paste/results/3uozHD77.html

it's pid 3772 that still has a handle open, while all the others have closed
it properly after pid 2780 issued a DELETE.
the process itself has 3 threads that are in:
- postgres.exe+0x1220
- postgres.exe!pg_queue_signal+0x120
- postgres.exe!shmctl+0x80
(i can get stacktraces for all of them if useful)

pid 3772 died at 05:55:22 (~20min after its last access to the file), and
bgwriter could finally write, and the error messages are gone.
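
The "who still has a handle open" check can be sketched as a simple open/close balance per pid. This is only an illustration in Python: the function name `pids_with_open_handles` and the event tuples are hypothetical, and the sample sequence is made up to mirror the situation described here. It is not copied from the filemon paste.

```python
from collections import Counter


def pids_with_open_handles(events):
    """Return {pid: count} for pids with more successful opens than closes.

    `events` is an iterable of (pid, operation, result) tuples for one
    file, in chronological order.
    """
    balance = Counter()
    for pid, operation, result in events:
        if result != "SUCCESS":
            continue  # a failed OPEN never produced a handle
        if operation in ("OPEN", "CREATE"):
            balance[pid] += 1
        elif operation == "CLOSE":
            balance[pid] -= 1
    return {pid: n for pid, n in balance.items() if n > 0}


# Hypothetical event sequence, NOT taken from the real trace.
events = [
    (2780, "OPEN", "SUCCESS"),
    (3772, "OPEN", "SUCCESS"),
    (2780, "DELETE", "SUCCESS"),
    (2780, "CLOSE", "SUCCESS"),
]
print(pids_with_open_handles(events))  # {3772: 1}
```

A pid that shows up in the result opened the file and never closed it, which is the pattern seen with pid 3772 until it exited.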

- thomas