Thread: drop tablespace error: invalid argument

drop tablespace error: invalid argument

From
Jan Otto
Date:
hello hackers,

i have problems dropping an existing empty tablespace. here is a reduced example:

AscheMobil:~ asche$ cat test2.sql 
CREATE TABLESPACE testspace LOCATION '/opt/postgresql/data2';
CREATE SCHEMA testschema;
CREATE TABLE testschema.foobar (id int) TABLESPACE testspace;
DROP SCHEMA testschema CASCADE;
DROP TABLESPACE testspace;

AscheMobil:~ asche$ /opt/postgresql/bin/psql asche <test2.sql 
CREATE TABLESPACE
CREATE SCHEMA
CREATE TABLE
NOTICE:  drop cascades to table testschema.foobar
DROP SCHEMA
ERROR:  could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT:  DROP TABLESPACE testspace;
ERROR:  could not read directory "pg_tblspc/16464": Invalid argument

AscheMobil:~ asche$ ls -l /opt/postgresql/data/pg_tblspc/
total 8
lrwx------  1 asche  staff  21 Aug 16 13:08 16464 -> /opt/postgresql/data2

AscheMobil:~ asche$ ls -l /opt/postgresql/data2/
total 8
-rw-------  1 asche  staff  4 Aug 16 13:08 PG_VERSION

AscheMobil:~ asche$ id
uid=501(asche) gid=20(staff) groups=20(staff),204(_developer),100(_lpoperator),98(_lpadmin),81(_appserveradm),80(admin),79(_appserverusr),61(localaccounts),12(everyone),402(com.apple.sharepoint.group.1),401(com.apple.access_screensharing)

if i don't create the table testschema.foobar i can drop the tablespace without problems. there is another effect i wonder about: when i execute 'DROP TABLESPACE testspace;' twice at the end of the script, the first attempt fails as before but the second drop statement drops the tablespace correctly.

AscheMobil:~ asche$ echo 'DROP TABLESPACE testspace;'>>test2.sql 
AscheMobil:~ asche$ cat test2.sql 
CREATE TABLESPACE testspace LOCATION '/opt/postgresql/data2';
CREATE SCHEMA testschema;
CREATE TABLE testschema.foobar (id int) TABLESPACE testspace;
DROP SCHEMA testschema CASCADE;
DROP TABLESPACE testspace;
DROP TABLESPACE testspace;

AscheMobil:~ asche$ /opt/postgresql/bin/psql asche < test2.sql 
CREATE TABLESPACE
CREATE SCHEMA
CREATE TABLE
NOTICE:  drop cascades to table testschema.foobar
DROP SCHEMA
ERROR:  could not read directory "pg_tblspc/16469": Invalid argument
STATEMENT:  DROP TABLESPACE testspace;
ERROR:  could not read directory "pg_tblspc/16469": Invalid argument
DROP TABLESPACE

AscheMobil:~ asche$ ls -l /opt/postgresql/data2/
AscheMobil:~ asche$ ls -l /opt/postgresql/data/pg_tblspc/

AscheMobil:~ asche$ /opt/postgresql/bin/psql asche -c 'select version()'
                                                                version                                                                 
----------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.0 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646), 64-bit
(1 row)

this is the original postgresql-8.4.0 source package from http://www.postgresql.org/ftp/source/v8.4.0/ compiled with:
./configure --enable-debug --with-openssl --with-perl --with-python --with-tcl --with-libxml --with-libxslt --with-zlib --prefix=/opt/postgresql

it would be nice if somebody can take a look at this. 

regards, jan otto

Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
Jan Otto <asche@me.com> writes:
> ERROR:  could not read directory "pg_tblspc/16464": Invalid argument
> STATEMENT:  DROP TABLESPACE testspace;

Hmm ... can't reproduce this here, not even on OSX.  From the version
number I suspect you are using unreleased Snow Leopard.  I'd venture
it's a newly-introduced kernel bug and you need to talk to Apple about
it.
        regards, tom lane


Re: drop tablespace error: invalid argument

From
Jan Otto
Date:
>> ERROR:  could not read directory "pg_tblspc/16464": Invalid argument
>> STATEMENT:  DROP TABLESPACE testspace;
>
> Hmm ... can't reproduce this here, not even on OSX.  From the version
> number I suspect you are using unreleased Snow Leopard.  I'd venture
> it's a newly-introduced kernel bug and you need to talk to Apple about
> it.

Thank you Tom. I will file a bugreport at Apple.

regards, jan otto



Re: drop tablespace error: invalid argument

From
Jan Otto
Date:
On Aug 16, 2009, at 8:25 PM, Tom Lane wrote:
> Jan Otto <asche@me.com> writes:
>> ERROR:  could not read directory "pg_tblspc/16464": Invalid argument
>> STATEMENT:  DROP TABLESPACE testspace;
>
> Hmm ... can't reproduce this here, not even on OSX.  From the version
> number I suspect you are using unreleased Snow Leopard.  I'd venture
> it's a newly-introduced kernel bug and you need to talk to Apple about
> it.
>
> regards, tom lane

I have dug around a bit in the source code of postgresql to build a
self-contained test case for Apple and found that Apple's implementation
of readdir() is buggy: readdir() fails under some circumstances.
So I have built a patch against current pgsql's HEAD to work around
the issue. If the bug in readdir() makes it into the final release of
Snow Leopard, we have a solution.

This patch basically frees dirdesc and rereads the tablespace location
whenever a subdirectory has been deleted from the tablespace. This is the
place where Snow Leopard fails to read the next entry with readdir().

regards, jan otto

diff -c -r1.61 tablespace.c
*** pgsql/src/backend/commands/tablespace.c     22 Jan 2009 20:16:02 -0000      1.61
--- pgsql/src/backend/commands/tablespace.c     17 Aug 2009 22:36:01 -0000
***************
*** 611,616 ****
--- 611,623 ----
                                         errmsg("could not remove directory \"%s\": %m",
                                                        subfile)));
  
+               /*
+                * The following two lines work around a bug in Mac OS X Snow Leopard (Build 10A432)
+                * readdir() implementation. We free dirdesc and reread location from start. 
+                */
+               FreeDir(dirdesc);
+               dirdesc = AllocateDir(location);
                pfree(subfile);
        }
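
For readers without the PostgreSQL tree at hand, the idea of the patch can be sketched as a standalone function using plain POSIX calls in place of AllocateDir()/ReadDir()/FreeDir(). The function name remove_subdirs_reopening, the helper fail_and_close and the fixed 4096-byte path buffer are assumptions made for illustration; this is not the code from tablespace.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Close the stream but keep the errno value that described the failure. */
static int fail_and_close(DIR *dir)
{
    int saved = errno;

    closedir(dir);
    errno = saved;
    return -1;
}

/*
 * Remove every empty subdirectory of `location`. After each successful
 * rmdir() the directory stream is closed and reopened, so readdir() is
 * never asked to continue on a stream whose directory has just changed.
 * Plain files are left alone. Returns the number of subdirectories
 * removed, or -1 with errno set.
 */
int remove_subdirs_reopening(const char *location)
{
    DIR *dir;
    int removed = 0;

    assert(location != NULL);
    dir = opendir(location);
    if (dir == NULL)
        return -1;

    for (;;)
    {
        struct dirent *de;
        struct stat st;
        char path[4096];
        int len;

        errno = 0;              /* lets us tell end-of-directory from an error */
        de = readdir(dir);
        if (de == NULL)
            break;
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        len = snprintf(path, sizeof(path), "%s/%s", location, de->d_name);
        if (len < 0 || (size_t) len >= sizeof(path))
        {
            errno = ENAMETOOLONG;
            return fail_and_close(dir);
        }
        if (lstat(path, &st) != 0)
            return fail_and_close(dir);
        if (!S_ISDIR(st.st_mode))
            continue;           /* e.g. a PG_VERSION file */
        if (rmdir(path) != 0)
            return fail_and_close(dir);
        removed++;

        /* The workaround itself: discard the stream and start over. */
        closedir(dir);
        dir = opendir(location);
        if (dir == NULL)
            return -1;
    }

    if (errno != 0)
        return fail_and_close(dir);
    closedir(dir);
    return removed;
}
```

Reopening after every removal rescans the directory from the start each time, so the cost grows roughly quadratically with the number of entries. For a tablespace directory holding a handful of per-database subdirectories that is unlikely to matter.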

Re: drop tablespace error: invalid argument

From
Jan Otto
Date:
> Jan Otto <asche@me.com> writes:
> ERROR:  could not read directory "pg_tblspc/16464": Invalid argument
> STATEMENT:  DROP TABLESPACE testspace;
>
> I have digged a bit around in the source code of postgresql to build a
> self contained test-case for Apple and found that the implementation
> of Apples readdir() is buggy. readdir() fails under some circumstances.
> So i have build a patch against current pgsql's HEAD to work around
> the issue. If the bug in readdir() goes into the final release  snow leopard
> we have a solution.
>
> This patch basically frees dirdesc and rereads the tablespace location
> in case a subdirectory was deleted from the tablespace. this is the place
> where snow leopard fails to read the next entry with readdir().

The bug in readdir() is present in the final Snow Leopard release too. Anybody
with Snow Leopard installed can check this simply by running the regression
tests (make check): the tablespace regression test fails.

The patch i sent in works around the issue. if it is not acceptable to reread
the tablespace directory after every delete, i can rewrite the workaround.
Would it be preferable to write all entries of the directory into an array
first and then loop through that array, instead of looping with ReadDir()?

regards, jan otto 
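
The array-based alternative raised above could look roughly like the following standalone sketch, again with plain POSIX calls rather than PostgreSQL's ReadDir(). All names here (snapshot_names, free_names, remove_subdirs_snapshot) and the 4096-byte path buffer are hypothetical; no such rewrite appears in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Free a name list built by snapshot_names(). */
static void free_names(char **names, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(names[i]);
    free(names);
}

/* Phase 1: copy every entry name except "." and ".." into a heap array. */
static int snapshot_names(const char *location, char ***names_out, size_t *count_out)
{
    DIR *dir = opendir(location);
    char **names = NULL;
    size_t count = 0, cap = 0;
    int saved;

    if (dir == NULL)
        return -1;
    for (;;)
    {
        struct dirent *de;

        errno = 0;              /* lets us tell end-of-directory from an error */
        de = readdir(dir);
        if (de == NULL)
            break;
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (count == cap)
        {
            size_t newcap = cap ? cap * 2 : 16;
            char **grown = realloc(names, newcap * sizeof(*grown));

            if (grown == NULL)
            {
                errno = ENOMEM;
                break;
            }
            names = grown;
            cap = newcap;
        }
        names[count] = strdup(de->d_name);
        if (names[count] == NULL)
        {
            errno = ENOMEM;
            break;
        }
        count++;
    }
    saved = errno;
    closedir(dir);              /* the stream is closed before anything is deleted */
    if (saved != 0)
    {
        free_names(names, count);
        errno = saved;
        return -1;
    }
    *names_out = names;
    *count_out = count;
    return 0;
}

/* Phase 2: remove the subdirectories named in the snapshot. */
int remove_subdirs_snapshot(const char *location)
{
    char **names = NULL;
    size_t count = 0;
    int removed = 0, rc = 0, saved;

    assert(location != NULL);
    if (snapshot_names(location, &names, &count) != 0)
        return -1;
    for (size_t i = 0; i < count; i++)
    {
        char path[4096];
        struct stat st;
        int len = snprintf(path, sizeof(path), "%s/%s", location, names[i]);

        if (len < 0 || (size_t) len >= sizeof(path))
        {
            errno = ENAMETOOLONG;
            rc = -1;
            break;
        }
        if (lstat(path, &st) != 0 || (S_ISDIR(st.st_mode) && rmdir(path) != 0))
        {
            rc = -1;
            break;
        }
        if (S_ISDIR(st.st_mode))
            removed++;
    }
    saved = errno;
    free_names(names, count);
    errno = saved;
    return rc == 0 ? removed : -1;
}
```

Compared with reopening the directory after each delete, this reads the directory only once, at the price of holding all names in memory and of not noticing entries created after the snapshot was taken.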

Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
Jan Otto <asche@me.com> writes:
> The bug in readdir() appeared in the final snow leopard too. Anybody
> with Snow Leopard installed can check this, with simply doing the
> regression tests (make check). The tablespace regression test is
> failing.

> The patch i sent in works around the issue. if it is not acceptable to
> reread the tablespace-directory after every delete i can rewrite the
> workaround.  Probably it is preferred that we write all entries of the
> directory into an array and looping through that array after that
> instead of looping with ReadDir()?

I'm not really eager to put in a workaround for such a basic OS bug,
especially not when the odds are good that it'll be fixed in 10.6.1.
Let's wait a little bit for Apple to get their act together.
        regards, tom lane


Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
I wrote:
> Jan Otto <asche@me.com> writes:
>> The bug in readdir() appeared in the final snow leopard too. Anybody
>> with Snow Leopard installed can check this, with simply doing the
>> regression tests (make check). The tablespace regression test is
>> failing.

>> The patch i sent in works around the issue. if it is not acceptable to
>> reread the tablespace-directory after every delete i can rewrite the
>> workaround.  Probably it is preferred that we write all entries of the
>> directory into an array and looping through that array after that
>> instead of looping with ReadDir()?

> I'm not really eager to put in a workaround for such a basic OS bug,
> especially not when the odds are good that it'll be fixed in 10.6.1.
> Let's wait a little bit for Apple to get their act together.

Well, 10.6.1 is out and it's still got the readdir() bug :-(.

It's likely that there'll be a 10.6.2 before very long, but I wonder if
we should go ahead with some sort of hack; at least as a temporary fix
in CVS HEAD so that we can get more useful buildfarm reports from Snow
Leopard machines.

Comments?
        regards, tom lane


Re: drop tablespace error: invalid argument

From
"David E. Wheeler"
Date:
On Sep 11, 2009, at 12:42 PM, Tom Lane wrote:

> Well, 10.6.1 is out and it's still got the readdir() bug :-(.

Has someone filed a bug report about this with Apple?
    https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa

Best,

David



Re: drop tablespace error: invalid argument

From
Robert Creager
Date:
On Sep 11, 2009, at 2:35 PM, David E. Wheeler wrote:

> On Sep 11, 2009, at 12:42 PM, Tom Lane wrote:
>
>> Well, 10.6.1 is out and it's still got the readdir() bug :-(.
>
> Has someone filed a bug report about this with Apple?
>
>    https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa

If no one has (yet), I'll be happy to.  I just submitted one for an  
AirPort problem...  I guess I'll whip up an example program and just  
submit it anyway...  Anyone already written one?

Later,
Rob
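
A minimal sketch of what such an example program might check, assuming the failure pattern described earlier in the thread (readdir() returning an error on the call that follows an rmdir() in the same directory). This is not the test case that was sent to Apple, which is not reproduced in this thread; the function name probe_readdir_after_rmdir, the helper join_path and the "subN" naming scheme are made up for illustration, and the exact conditions that trigger the bug are not known from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "parent/name" in buf; returns 0 on success, -1 if it does not fit. */
static int join_path(char *buf, size_t size, const char *parent, const char *name)
{
    int len = snprintf(buf, size, "%s/%s", parent, name);

    return (len < 0 || (size_t) len >= size) ? -1 : 0;
}

/*
 * Create `count` empty subdirectories in the existing directory `parent`,
 * then remove them while walking `parent` with a single readdir() stream
 * (no reopen, no rewind). Returns 0 if readdir() reached the end of the
 * directory cleanly, the errno value readdir() reported otherwise, or -1
 * if creating or removing a subdirectory failed.
 */
int probe_readdir_after_rmdir(const char *parent, int count)
{
    char path[4096];
    char name[32];
    DIR *dir;
    int result = 0;

    assert(parent != NULL && count >= 0);

    for (int i = 0; i < count; i++)
    {
        snprintf(name, sizeof(name), "sub%d", i);
        if (join_path(path, sizeof(path), parent, name) != 0 ||
            mkdir(path, 0700) != 0)
            return -1;
    }

    dir = opendir(parent);
    if (dir == NULL)
        return -1;

    for (;;)
    {
        struct dirent *de;

        errno = 0;              /* lets us tell end-of-directory from an error */
        de = readdir(dir);
        if (de == NULL)
        {
            result = errno;     /* 0 at a clean end of directory */
            break;
        }
        if (strncmp(de->d_name, "sub", 3) != 0)
            continue;           /* skips ".", ".." and anything we did not create */
        if (join_path(path, sizeof(path), parent, de->d_name) != 0 ||
            rmdir(path) != 0)
        {
            result = -1;
            break;
        }
    }
    closedir(dir);
    return result;
}
```

Called from a small main() on a scratch directory, a positive return value equal to EINVAL would correspond to the "Invalid argument" error reported at the start of this thread, while 0 means readdir() kept working after the deletions.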

Re: drop tablespace error: invalid argument

From
Robert Creager
Date:
On Sep 11, 2009, at 2:35 PM, David E. Wheeler wrote:

> On Sep 11, 2009, at 12:42 PM, Tom Lane wrote:
>
>> Well, 10.6.1 is out and it's still got the readdir() bug :-(.
>
> Has someone filed a bug report about this with Apple?
>
>    https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa

Look at the history of this thread, and it's already submitted:
 http://www.nabble.com/drop-tablespace-error:-invalid-argument-td24992634.html

Later,
Rob

Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
Jan Otto <asche@me.com> writes:
> This patch basically frees dirdesc and rereads the tablespace location
> in case a subdirectory was deleted from the tablespace. this is the
> place where snow leopard fails to read the next entry with readdir().

I've applied this patch in HEAD only for the moment.  I hope that
Apple will have fixed their bug before the next set of PG back-branch
updates come out --- if not, we'll probably have to back-patch.
        regards, tom lane


Re: drop tablespace error: invalid argument

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I've applied this patch in HEAD only for the moment.  I hope that
> Apple will have fixed their bug before the next set of PG back-branch
> updates come out --- if not, we'll probably have to back-patch.

and on the flip side, I was hoping to see a new 8.4.2 soon due to the
recent commits I've seen against that branch... :/
Thanks,
    Stephen

Re: drop tablespace error: invalid argument

From
Jan Otto
Date:
>> Well, 10.6.1 is out and it's still got the readdir() bug :-(.
>
> Has someone filed a bug report about this with Apple?

yes, i have filed a bug report and will keep this list informed when
there is any news.

regards, jan otto



Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
Jan Otto <asche@me.com> writes:
>>> ERROR:  could not read directory "pg_tblspc/16464": Invalid argument
>>> STATEMENT:  DROP TABLESPACE testspace;
>> 
>> Hmm ... can't reproduce this here, not even on OSX.  From the version
>> number I suspect you are using unreleased Snow Leopard.  I'd venture
>> it's a newly-introduced kernel bug and you need to talk to Apple about
>> it.

> Thank you Tom. I will file a bugreport at Apple.

Hey Jan, did you get any response to that bug report?  Somebody else
dug up a document suggesting that this might be intentional on Apple's
part:
http://archives.postgresql.org/pgsql-bugs/2009-11/msg00040.php

If he's right, we have a nontrivial problem here :-(
        regards, tom lane


Re: drop tablespace error: invalid argument

From
Jan Otto
Date:
>>>> ERROR:  could not read directory "pg_tblspc/16464": Invalid argument
>>>> STATEMENT:  DROP TABLESPACE testspace;
>>>
>>> Hmm ... can't reproduce this here, not even on OSX.  From the version
>>> number I suspect you are using unreleased Snow Leopard.  I'd venture
>>> it's a newly-introduced kernel bug and you need to talk to Apple about
>>> it.
>
>> Thank you Tom. I will file a bugreport at Apple.
>
> Hey Jan, did you get any response to that bug report?  Somebody else
> dug up a document suggesting that this might be intentional on Apple's
> part:
> http://archives.postgresql.org/pgsql-bugs/2009-11/msg00040.php
>
> If he's right, we have a nontrivial problem here :-(

no, this is not intentional. i got late feedback (22 Oct 2009) from
apple that my reported bug was marked as a duplicate.

quoting apple:
"After further investigation it has been determined that this is a  
known issue, which is currently being investigated by engineering.  
This issue has been filed in our bug database under the original Bug  
ID# 6795764."

regards, jan otto



Re: drop tablespace error: invalid argument

From
Jan Otto
Date:
> Hey Jan, did you get any response to that bug report?  Somebody else
> dug up a document suggesting that this might be intentional on Apple's
> part:
> http://archives.postgresql.org/pgsql-bugs/2009-11/msg00040.php

i was not subscribed to the pgsql-bugs list. i have read this message now
and see he is referring to an article that was last modified on 22 April 2008
and was written for the first mac os x (10.0)! this article is very, very old
and was maybe modified when apple changed its knowledge base URLs.

a quick check on mac os x 10.4 and 10.5 confirmed that the behaviour/bug
described in this article is not present there. probably it was in 10.0.x...
i have no older version of mac os x available here to check.

regards, jan otto




Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
Jan Otto <asche@me.com> writes:
> a quick check on mac os x 10.4 und 10.5 confirmed that this behaviour/
> bug is not present like described in this article. probably it was in
> 10.0.x... i have no older version of mac os x available here to check.

Yeah, I thought we'd probably have heard about it before now if OSX
had acted like that all along.

My inclination is to continue assuming that the EINVAL is a new bug
introduced in Snow Leopard.  I sure hope they fix it in 10.6.2 though.
If they don't, we may have to think about a workaround, messy as that
will apparently be.
        regards, tom lane


Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
I wrote:
> My inclination is to continue assuming that the EINVAL is a new bug
> introduced in Snow Leopard.  I sure hope they fix it in 10.6.2 though.
> If they don't, we may have to think about a workaround, messy as that
> will apparently be.

10.6.2 is out, and it appears to fix the bug --- if I remove the hack
in tablespace.c, we still pass regression tests.

Someone else please confirm?  If so I'll revert that patch.
        regards, tom lane


Re: drop tablespace error: invalid argument

From
Jan Otto
Date:
>> My inclination is to continue assuming that the EINVAL is a new bug
>> introduced in Snow Leopard.  I sure hope they fix it in 10.6.2 though.
>> If they don't, we may have to think about a workaround, messy as that
>> will apparently be.
> 
> 10.6.2 is out, and it appears to fix the bug --- if I remove the hack
> in tablespace.c, we still pass regression tests.
> 
> Someone else please confirm?  If so I'll revert that patch.

Yes, I can confirm that this bug is fixed in Mac OS X 10.6.2. I have checked it twice:
with the workaround removed from tablespace.c, and with my self-written test case from
September.

regards, jan otto



Re: drop tablespace error: invalid argument

From
Stephen Tyler
Date:
On Tue, Nov 10, 2009 at 8:57 PM, Jan Otto <asche@me.com> wrote:
>> Someone else please confirm?  If so I'll revert that patch.
>
> Yes i can confirm that this bug is fixed in Mac OS X 10.6.2. I have checked it twice.
> With removed workaround in tablespace.c and with my self written testcase from
> september.

I can confirm that I am no longer able to trigger  "ERROR:  could not read directory "pg_xlog": Invalid argument" in Mac OS X 10.6.2 with "checkpoint_segments = 128".

I can also report that under 10.6.1, changing "checkpoint_segments = 128" to "checkpoint_segments = 64" made the pg_xlog errors disappear almost entirely. I could still easily trigger them with "VACUUM FULL", but could not trigger them on demand with regular db operations.

Stephen

PS: I am observing some kind of disk lock-up on my machine that I can't explain (and is present on both 10.6.1 and 10.6.2).  Huge operations (like "VACUUM FULL on a 50GB table") appear to run in brief spikes of activity interspersed with 30 second pauses when the disk appears to be both inactive and somewhat unresponsive and CPU is idle.  Perhaps fsync() is misbehaving (I have an SSD Raid 0 array).  Anyway I am mentioning this as a caution that although I can detect no readdir() errors on Mac OS X 10.6.2, perhaps all is not OK on my system.

Re: drop tablespace error: invalid argument

From
Tom Lane
Date:
Stephen Tyler <stephen@stephen-tyler.com> writes:
> On Tue, Nov 10, 2009 at 8:57 PM, Jan Otto <asche@me.com> wrote:
>>> Someone else please confirm?  If so I'll revert that patch.
>>
>> Yes i can confirm that this bug is fixed in Mac OS X 10.6.2. I have checked
>> it twice.
>> With removed workaround in tablespace.c and with my self written testcase
>> from september.

> I can confirm that I am no longer able to trigger  "ERROR:  could not read
> directory "pg_xlog": Invalid argument" in Mac OS X 10.6.2 with
> "checkpoint_segments = 128".

OK, I've reverted the hack in tablespace.c.  This is good, I was not
looking forward to providing our own implementation of readdir() :-(

> PS: I am observing some kind of disk lock-up on my machine that I can't
> explain (and is present on both 10.6.1 and 10.6.2).  Huge operations (like
> "VACUUM FULL on a 50GB table") appear to run in brief spikes of activity
> interspersed with 30 second pauses when the disk appears to be both inactive
> and somewhat unresponsive and CPU is idle.  Perhaps fsync() is misbehaving
> (I have an SSD Raid 0 array).

Maybe ktrace and/or dtrace would shed a bit of light on what's happening
there.
        regards, tom lane