Could not read directory "pg_xlog": Invalid argument (on SSD Raid) - Mailing list pgsql-general

From Data Growth Pty Ltd
Subject Could not read directory "pg_xlog": Invalid argument (on SSD Raid)
Date
Msg-id 51549ea20911032030n31ae047r227ce9b0a1c821aa@mail.gmail.com
Responses Re: Could not read directory "pg_xlog": Invalid argument (on SSD Raid)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I'm frequently getting these errors in my console:

4/11/09 2:25:04 PM    org.postgresql.postgres[192]    ERROR:  could not read directory "pg_xlog": Invalid argument
4/11/09 2:25:56 PM    org.postgresql.postgres[192]    ERROR:  could not read directory "pg_xlog": Invalid argument
4/11/09 2:36:03 PM    org.postgresql.postgres[192]    ERROR:  could not read directory "pg_xlog": Invalid argument

and rarely:

3/11/09 10:32:31 PM    org.postgresql.postgres[217]    ERROR:  could not read directory "pg_clog": Invalid argument

It is clearly not failing all the time, as the pg_xlog directory is full of files that keep being touched and updated.  I have not experienced data loss (yet), but large queries are taking orders of magnitude longer than I would like.


System:

Mac Pro Quad Nehalem 2.93GHz, 16GB RAM running Snow Leopard OS X 10.6.1 in 64-bit mode

Postgres 8.4.1 (Intel 64 bit) from http://www.kyngchaos.com/software:postgres
    (I have also tried compiling from source - I have the same problems plus a few extra installation issues.  The "official" postgresql binary from http://www.enterprisedb.com/ is not 64-bit.)

The postgres data directory is on an SSD RAID 0 array.  In other applications the array sustains around 10K random read I/Os per second, or 5K random write I/Os per second. pg_xlog and pg_clog are on the same SSD RAID array as the postgres DB.



Under postgres the array does several thousand I/Os per second for about 1-2 seconds, then drops back to only about 50 I/Os per second for about 10 seconds, before repeating the cycle.  CPU utilization is usually only a couple of percent.  The console often records the error message "pg_xlog": Invalid argument during those brief activity bursts.

I've looked at the source code in src/port/dirmod.c:

pgfnames(const char *path)
{
....
        while ((file = readdir(dir)) != NULL)
        {
....
                errno = 0;
        }
....
        if (errno)
        {
....
                fprintf(stderr, _("could not read directory \"%s\": %s\n"),
                                path, strerror(errno));
....
        }


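So it seems that readdir is occasionally failing with errno set to EINVAL ("Invalid argument").  But I do not understand how this error could possibly occur in this location, since the directory was opened successfully and errno is cleared on every pass through the loop.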
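
For reference, the convention the excerpt relies on is that readdir returns NULL both at end of directory and on error, and only errno tells the two apart.  Below is a minimal standalone sketch of that pattern, not the PostgreSQL code itself; count_dir_entries is a hypothetical name I made up for illustration.  Something like it could be pointed at the pg_xlog directory to see whether readdir reports the same error outside of postgres.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): count the entries in a
 * directory, skipping "." and "..".  Returns the count, or -1 if the
 * directory could not be opened or readdir reported an error.
 */
long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    long count = 0;
    int saved_errno;

    if (dir == NULL)
    {
        fprintf(stderr, "could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }

    for (;;)
    {
        /* Clear errno right before each call so a NULL return is unambiguous. */
        errno = 0;
        entry = readdir(dir);
        if (entry == NULL)
            break;              /* end of directory, or an error */

        if (strcmp(entry->d_name, ".") != 0 &&
            strcmp(entry->d_name, "..") != 0)
            count++;
    }

    /* Save errno before closedir, which may overwrite it. */
    saved_errno = errno;
    closedir(dir);

    if (saved_errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(saved_errno));
        return -1;
    }
    return count;
}
```

Calling count_dir_entries("pg_xlog") in a tight loop from inside the data directory while the server is under load would show whether the error can be reproduced independently of the backend.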

I've searched for "pg_xlog": Invalid argument, and the only other mention I have found was on Linux running on a RAM disk.

Could this be a race condition?  Suggestions?

Stephen
