Thread: Unhelpful initdb error message
Hi all, After building Postgres and trying an initdb, I'm getting the following: thom@swift:~/Development$ initdb The files belonging to this database system will be owned by user "thom". This user must also own the server process. The database cluster will be initialized with locale en_GB.UTF-8. The default database encoding has accordingly been set to UTF8. The default text search configuration will be set to "english". fixing permissions on existing directory /home/thom/Development/data ... ok creating subdirectories ... ok selecting default max_connections ... 10 selecting default shared_buffers ... 400kB creating configuration files ... ok creating template1 database in /home/thom/Development/data/base/1 ... FATAL: could not remove old lock file "postmaster.pid": No such file or directory HINT: The file seems accidentally left over, but it could not be removed. Please remove the file by hand and try again. child process exited with exit code 1 initdb: removing contents of data directory "/home/thom/Development/data" It can't remove an old lock file due to it not existing, but the hint says it was left over but couldn't be removed. The hint contradicts the error message. There is nothing in the data directory at all before trying this, and nothing after. Repeating initdb yields the same result. But, if I rename the data directory to something else and mkdir data again, all is well. I can make it break again by removing the new data directory and renaming the old one back to data, still completely empty. Note that throughout all of this, Postgres is running, but as a separate user and using completely separate directories, since it's the standard packaged version on Debian. Can anyone suggest what is wrong here? -- Thom
Thom Brown <thom@linux.com> writes: > thom@swift:~/Development$ initdb > The files belonging to this database system will be owned by user "thom". > This user must also own the server process. > The database cluster will be initialized with locale en_GB.UTF-8. > The default database encoding has accordingly been set to UTF8. > The default text search configuration will be set to "english". > fixing permissions on existing directory /home/thom/Development/data ... ok > creating subdirectories ... ok > selecting default max_connections ... 10 > selecting default shared_buffers ... 400kB > creating configuration files ... ok > creating template1 database in /home/thom/Development/data/base/1 ... > FATAL: could not remove old lock file "postmaster.pid": No such file > or directory > HINT: The file seems accidentally left over, but it could not be > removed. Please remove the file by hand and try again. > child process exited with exit code 1 > initdb: removing contents of data directory "/home/thom/Development/data" Um ... I assume this is some patched version rather than pristine sources? It's pretty hard to explain why it's falling over like that. I don't think there is anything wrong with the error message, because it's intended for the case where some previous postmaster failed and left a lock file behind. The question is how is it you're getting to that error, not whether we should change its text. One possible lead is that it looks like the postmaster-starting probes to select max_connections and shared_buffers all failed too, since those numbers came out as the minimums. regards, tom lane
On Tuesday, March 06, 2012 7:46:37 am Thom Brown wrote: > Hi all, > > After building Postgres and trying an initdb, I'm getting the following: > > > thom@swift:~/Development$ initdb > The files belonging to this database system will be owned by user "thom". > This user must also own the server process. > > The database cluster will be initialized with locale en_GB.UTF-8. > The default database encoding has accordingly been set to UTF8. > The default text search configuration will be set to "english". > > fixing permissions on existing directory /home/thom/Development/data ... ok > creating subdirectories ... ok > selecting default max_connections ... 10 > selecting default shared_buffers ... 400kB > creating configuration files ... ok > creating template1 database in /home/thom/Development/data/base/1 ... > FATAL: could not remove old lock file "postmaster.pid": No such file > or directory > HINT: The file seems accidentally left over, but it could not be > removed. Please remove the file by hand and try again. > child process exited with exit code 1 > initdb: removing contents of data directory "/home/thom/Development/data" > > > It can't remove an old lock file due to it not existing, but the hint > says it was left over but couldn't be removed. The hint contradicts > the error message. There is nothing in the data directory at all > before trying this, and nothing after. Repeating initdb yields the > same result. > > But, if I rename the data directory to something else and mkdir data > again, all is well. I can make it break again by removing the new > data directory and renaming the old one back to data, still completely > empty. Note that throughout all of this, Postgres is running, but as > a separate user and using completely separate directories, since it's > the standard packaged version on Debian. > > Can anyone suggest what is wrong here? The postmaster.pid is located outside the data directory, but points back to the data directory. 
Not sure where Debian puts it, though at a guess somewhere in /var. Anyway, search for postmaster.pid. -- Adrian Klaver adrian.klaver@gmail.com
On 6 March 2012 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thom Brown <thom@linux.com> writes: >> thom@swift:~/Development$ initdb >> The files belonging to this database system will be owned by user "thom". >> This user must also own the server process. > >> The database cluster will be initialized with locale en_GB.UTF-8. >> The default database encoding has accordingly been set to UTF8. >> The default text search configuration will be set to "english". > >> fixing permissions on existing directory /home/thom/Development/data ... ok >> creating subdirectories ... ok >> selecting default max_connections ... 10 >> selecting default shared_buffers ... 400kB >> creating configuration files ... ok >> creating template1 database in /home/thom/Development/data/base/1 ... >> FATAL: could not remove old lock file "postmaster.pid": No such file >> or directory >> HINT: The file seems accidentally left over, but it could not be >> removed. Please remove the file by hand and try again. >> child process exited with exit code 1 >> initdb: removing contents of data directory "/home/thom/Development/data" > > Um ... I assume this is some patched version rather than pristine > sources? It's pretty hard to explain why it's falling over like that. No, I did a "git stash", "git clean -f" and "git pull" before trying to build. -- Thom
On 6 March 2012 16:04, Adrian Klaver <adrian.klaver@gmail.com> wrote: > The postmaster.pid is located outside the data directory, but points back to the > data directory. Not sure where Debian, though at a guess somewhere in /var. > Any way search for postmaster.pid. I'm not sure, because if I use a new data directory, initdb it and start the service, the postmaster.pid appears in it, and not as a symbolic link. I did a search for postmaster.pid in the whole of /var and it only shows up "/var/lib/postgresql/9.1/main/postmaster.pid" -- Thom
On Tuesday, March 06, 2012 8:11:20 am Thom Brown wrote: > On 6 March 2012 16:04, Adrian Klaver <adrian.klaver@gmail.com> wrote: > > The postmaster.pid is located outside the data directory, but points back > > to the data directory. Not sure where Debian, though at a guess > > somewhere in /var. Any way search for postmaster.pid. > > I'm not sure, because if I use a new data directory, initdb it and > start the service, the postmaster.pid appears in it, and not as a > symbolic link. > > I did a search for postmaster.pid in the whole of /var and it only > shows up "/var/lib/postgresql/9.1/main/postmaster.pid" My guess is if you open that file you will find it points back to the old directory. So are you still running the Debian packaged version of Postgres? Or in other words does a ps show any other postmasters running other than the new one you built? -- Adrian Klaver adrian.klaver@gmail.com
On 6 March 2012 16:11, Thom Brown <thom@linux.com> wrote: > On 6 March 2012 16:04, Adrian Klaver <adrian.klaver@gmail.com> wrote: >> The postmaster.pid is located outside the data directory, but points back to the >> data directory. Not sure where Debian, though at a guess somewhere in /var. >> Any way search for postmaster.pid. > > I'm not sure, because if I use a new data directory, initdb it and > start the service, the postmaster.pid appears in it, and not as a > symbolic link. > > I did a search for postmaster.pid in the whole of /var and it only > shows up "/var/lib/postgresql/9.1/main/postmaster.pid" Correction, this is Ubuntu, not Debian. 11.10 if it's of any consequence. The file system is ext4 with rw,noatime,nodiratime,errors=remount-ro,commit=0 on a Crucial m4 SSD. ecryptfs is in use in the parent directory. -- Thom
On 6 March 2012 16:18, Adrian Klaver <adrian.klaver@gmail.com> wrote: > On Tuesday, March 06, 2012 8:11:20 am Thom Brown wrote: >> On 6 March 2012 16:04, Adrian Klaver <adrian.klaver@gmail.com> wrote: >> > The postmaster.pid is located outside the data directory, but points back >> > to the data directory. Not sure where Debian, though at a guess >> > somewhere in /var. Any way search for postmaster.pid. >> >> I'm not sure, because if I use a new data directory, initdb it and >> start the service, the postmaster.pid appears in it, and not as a >> symbolic link. >> >> I did a search for postmaster.pid in the whole of /var and it only >> shows up "/var/lib/postgresql/9.1/main/postmaster.pid" > > > My guess is if you open that file you will find it points back to the old > directory. So are you still running the Debian packaged version of Postgres? > Or in other words does a ps show any other postmasters running other than the > new one you built? No, only the ones running as the postgres user. Here's the contents of the pid file in /var/lib/postgresql/9.1/main/ 1199 /var/lib/postgresql/9.1/main 1330883367 5432 /var/run/postgresql localhost 5432001 0 And if I start my development copy, this is the content of its postmaster.pid: 27061 /home/thom/Development/data 1331050950 5488 /tmp localhost 5488001 191365126 -- Thom
Thom Brown <thom@linux.com> writes:
> On 6 March 2012 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Um ... I assume this is some patched version rather than pristine
>> sources? It's pretty hard to explain why it's falling over like that.

> No, I did a "git stash", "git clean -f" and "git pull" before trying to build.

[ scratches head... ] I can't reproduce it with current git tip.

regards, tom lane
On 6 March 2012 16:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> On 6 March 2012 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Um ... I assume this is some patched version rather than pristine
>>> sources? It's pretty hard to explain why it's falling over like that.
>
>> No, I did a "git stash", "git clean -f" and "git pull" before trying to build.
>
> [ scratches head... ] I can't reproduce it with current git tip.

And I don't think I can reproduce this if I remove that directory.
I've seen this issue about 3 or 4 times in the past, and fixed it by
ditching the old data dir completely. I'm just not sure what causes
this to happen.

Looking back through my terminal log, one thing might lend a clue from
before I tried rebuilding it:

thom@swift:~/Development$ pg_ctl stop
waiting for server to shut down....cd .postgre.s
.............
....
....^C
thom@swift:~/Development$ pg_ctl stop
pg_ctl: could not send stop signal (PID: 2807): No such process
thom@swift:~/Development$ ps -ef | grep postgres
postgres  1199     1  0 Mar04 ?        00:00:01 /usr/lib/postgresql/9.1/bin/postgres -D /var/lib/postgresql/9.1/main -c config_file=/etc/postgresql/9.1/main/postgresql.conf
postgres  1273  1199  0 Mar04 ?        00:00:18 postgres: writer process
postgres  1274  1199  0 Mar04 ?        00:00:14 postgres: wal writer process
postgres  1275  1199  0 Mar04 ?        00:00:03 postgres: autovacuum launcher process
postgres  1276  1199  0 Mar04 ?        00:00:02 postgres: stats collector process
thom     16476  4302  0 15:30 pts/1    00:00:00 grep --color=auto postgres

Postgres wouldn't shut down. I had no other terminal windows using
psql, no other database client apps open, yet it stayed shutting down,
so I CTRL+C'd it and tried again. A quick check of running processes
showed that it had stopped running. (It shows postgres running above,
but the dev copy runs as my user, not postgres.)

--
Thom
On Tuesday, March 06, 2012 8:24:20 am Thom Brown wrote:
> No, only the ones running as the postgres user.

In my original read, I missed the part where you had the Ubuntu/Debian
packaged version running.

> Here's the contents of the pid file in /var/lib/postgresql/9.1/main/
>
> 1199
> /var/lib/postgresql/9.1/main
> 1330883367
> 5432
> /var/run/postgresql
> localhost
> 5432001 0
>
> And if I start my development copy, this is the content of its
> postmaster.pid:
>
> 27061
> /home/thom/Development/data
> 1331050950
> 5488
> /tmp
> localhost
> 5488001 191365126

So how are you getting the file above? I thought initdb refused to init
the directory and that you could not find the pid file it was referring
to? Just on a hunch, what is in /tmp?

--
Adrian Klaver
adrian.klaver@gmail.com
On 6 March 2012 16:40, Adrian Klaver <adrian.klaver@gmail.com> wrote: > On Tuesday, March 06, 2012 8:24:20 am Thom Brown wrote: >> >> >> No, only the ones running as the postgres user. > > In my original read, I missed the part you had the Ubuntu/Debian packaged > version running. > >> >> Here's the contents of the pid file in /var/lib/postgresql/9.1/main/ >> >> 1199 >> /var/lib/postgresql/9.1/main >> 1330883367 >> 5432 >> /var/run/postgresql >> localhost >> 5432001 0 >> >> And if I start my development copy, this is the content of its >> postmaster.pid: >> >> 27061 >> /home/thom/Development/data >> 1331050950 >> 5488 >> /tmp >> localhost >> 5488001 191365126 > > So how are getting the file above? I thought initdb refused to init the directory > and that you could not find pid file it was referring to? Just on a hunch, what is > in /tmp? I got the above output when I created a new data directory and initdb'd it. /tmp shows: 4 -rw------- 1 thom thom 55 2012-03-06 16:22 .s.PGSQL.5488.lock 0 srwxrwxrwx 1 thom thom 0 2012-03-06 16:22 .s.PGSQL.5488 Once it's up and running. These disappear after though. When using the old data directory again, there's no evidence of anything like this in /tmp. -- Thom
On Tuesday, March 06, 2012 8:44:10 am Thom Brown wrote: > >> And if I start my development copy, this is the content of its > >> postmaster.pid: > >> > >> 27061 > >> /home/thom/Development/data > >> 1331050950 > >> 5488 > >> /tmp > >> localhost > >> 5488001 191365126 > > > > So how are getting the file above? I thought initdb refused to init the > > directory and that you could not find pid file it was referring to? Just > > on a hunch, what is in /tmp? > > I got the above output when I created a new data directory and initdb'd it. Still not understanding. In your original post you said /home/thom/Development/data was the original directory you could not initdb. How could it also be the new directory you can initdb as indicated by the postmaster.pid? From your previous post: thom@swift:~/Development$ pg_ctl stop pg_ctl: could not send stop signal (PID: 2807): No such process Doing the above without qualifying which version of pg_ctl you are using or what data directory you are pointing is dangerous. The combination of implied pathing and preset env variables could lead to all sorts of mischief. > > /tmp shows: > > 4 -rw------- 1 thom thom 55 2012-03-06 16:22 > .s.PGSQL.5488.lock > 0 srwxrwxrwx 1 thom thom 0 2012-03-06 16:22 > .s.PGSQL.5488 > > Once it's up and running. These disappear after though. When using > the old data directory again, there's no evidence of anything like > this in /tmp. -- Adrian Klaver adrian.klaver@gmail.com
On 6 March 2012 17:00, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On Tuesday, March 06, 2012 8:44:10 am Thom Brown wrote:
>
>> >> And if I start my development copy, this is the content of its
>> >> postmaster.pid:
>> >>
>> >> 27061
>> >> /home/thom/Development/data
>> >> 1331050950
>> >> 5488
>> >> /tmp
>> >> localhost
>> >> 5488001 191365126
>> >
>> > So how are getting the file above? I thought initdb refused to init the
>> > directory and that you could not find pid file it was referring to? Just
>> > on a hunch, what is in /tmp?
>>
>> I got the above output when I created a new data directory and initdb'd it.
>
> Still not understanding. In your original post you said
> /home/thom/Development/data was the original directory you could not initdb. How
> could it also be the new directory you can initdb as indicated by the
> postmaster.pid?

/home/thom/Development/data was causing problems so:

mv data databroken
mkdir data
initdb

... working fine again. I then used the postmaster.pid from this when
started up. But if I do:

pg_ctl stop
rm -rf data
mv databroken data
initdb

... error messages appear again.

> From your previous post:
> thom@swift:~/Development$ pg_ctl stop
> pg_ctl: could not send stop signal (PID: 2807): No such process
>
> Doing the above without qualifying which version of pg_ctl you are using or what
> data directory you are pointing is dangerous. The combination of implied
> pathing and preset env variables could lead to all sorts of mischief.

Unlikely since pg_ctl isn't available in my search path once I remove
my local development bin dir from it. All non-client tools for the
packaged version aren't available to normal users. Those are all in
/usr/lib/postgresql/9.1/bin. The only ones exposed to the search path
through symbolic links are:

clusterdb createdb createlang createuser dropdb droplang dropuser pg_dump pg_dumpall pg_restore psql reindexdb vacuumdb vacuumlo

--
Thom
Thom Brown <thom@linux.com> writes: > Looking back through my terminal log, one thing might lend a clue from > before I tried rebuliding it: > thom@swift:~/Development$ pg_ctl stop > waiting for server to shut down....cd .postgre.s > ............. > .... > ....^C > thom@swift:~/Development$ pg_ctl stop > pg_ctl: could not send stop signal (PID: 2807): No such process > thom@swift:~/Development$ ps -ef | grep postgres > postgres 1199 1 0 Mar04 ? 00:00:01 > /usr/lib/postgresql/9.1/bin/postgres -D /var/lib/postgresql/9.1/main > -c config_file=/etc/postgresql/9.1/main/postgresql.conf > postgres 1273 1199 0 Mar04 ? 00:00:18 postgres: writer > process > postgres 1274 1199 0 Mar04 ? 00:00:14 postgres: wal writer > process > postgres 1275 1199 0 Mar04 ? 00:00:03 postgres: autovacuum > launcher process > postgres 1276 1199 0 Mar04 ? 00:00:02 postgres: stats > collector process > thom 16476 4302 0 15:30 pts/1 00:00:00 grep --color=auto postgres Hm. It looks like pg_ctl found a PID file pointing to a non-existent process, which is a bit like what you're seeing initdb do. I wonder whether this is somehow caused by conflicting settings for PGDATA. Do you have a setting for that in your environment, or .bashrc or someplace, that is different from what you're trying to use? regards, tom lane
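Tom's observation, a PID file naming a process that no longer exists, is something that can be checked by hand. What follows is only a minimal sketch in Python, assuming a POSIX system and assuming the PID is the first line of postmaster.pid (as in the file contents posted earlier in the thread). The function names are made up for illustration, and this is not how pg_ctl or the postmaster perform their own checks.

```python
import os
from pathlib import Path


def read_lockfile_pid(data_dir):
    """Return the PID on the first line of postmaster.pid, or None.

    None is returned when the file is missing, empty, or does not start
    with an integer.
    """
    lock_path = Path(data_dir) / "postmaster.pid"
    try:
        first_line = lock_path.read_text().splitlines()[0]
        return int(first_line.strip())
    except (FileNotFoundError, IndexError, ValueError):
        return None


def process_exists(pid):
    """Probe a PID with signal 0 (POSIX only); nothing is actually sent."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # process exists but belongs to another user
    return True


def lockfile_is_stale(data_dir):
    """True when a lock file is present but names a process that is gone."""
    pid = read_lockfile_pid(data_dir)
    return pid is not None and not process_exists(pid)
```

Called as lockfile_is_stale("/home/thom/Development/data"), this would report True for the situation pg_ctl complained about (a lock file whose PID is dead) and False when there is no lock file at all.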
On Tuesday, March 06, 2012 9:09:41 am Thom Brown wrote: > On 6 March 2012 17:00, Adrian Klaver <adrian.klaver@gmail.com> wrote: > > On Tuesday, March 06, 2012 8:44:10 am Thom Brown wrote: > >> >> And if I start my development copy, this is the content of its > >> >> postmaster.pid: > >> >> > >> >> 27061 > >> >> /home/thom/Development/data > >> >> 1331050950 > >> >> 5488 > >> >> /tmp > >> >> localhost > >> >> 5488001 191365126 > >> > > >> > So how are getting the file above? I thought initdb refused to init > >> > the directory and that you could not find pid file it was referring > >> > to? Just on a hunch, what is in /tmp? > >> > >> I got the above output when I created a new data directory and initdb'd > >> it. > > > > Still not understanding. In your original post you said > > /home/thom/Development/data was the original directory you could not > > initdb. How could it also be the new directory you can initdb as > > indicated by the postmaster.pid? > > /home/thom/Development/data was causing problems so: > > mv data databroken > mkdir data > initdb > > ... working fine again. I then used the postmaster.pid from this when > started up. But if I do: > > pg_ctl stop > rm -rf data > mv databroken data > initdb > > ... error messages appear again. Humph, need more coffee. > > > From your previous post: > > thom@swift:~/Development$ pg_ctl stop > > pg_ctl: could not send stop signal (PID: 2807): No such process > > > > Doing the above without qualifying which version of pg_ctl you are using > > or what data directory you are pointing is dangerous. The combination > > of implied pathing and preset env variables could lead to all sorts of > > mischief. > > Unlikely since pg_ctl isn't available in my search path once I remove > my local development bin dir from it. All non-client tools for the > packaged version aren't available to normal users. Those are all in > /usr/lib/postgresql/9.1/bin. 
The only ones exposed to the search path > through symbolic links are: env variables? -- Adrian Klaver adrian.klaver@gmail.com
On 6 March 2012 17:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thom Brown <thom@linux.com> writes: >> Looking back through my terminal log, one thing might lend a clue from >> before I tried rebuliding it: > >> thom@swift:~/Development$ pg_ctl stop >> waiting for server to shut down....cd .postgre.s >> ............. >> .... > > > >> ....^C >> thom@swift:~/Development$ pg_ctl stop >> pg_ctl: could not send stop signal (PID: 2807): No such process >> thom@swift:~/Development$ ps -ef | grep postgres >> postgres 1199 1 0 Mar04 ? 00:00:01 >> /usr/lib/postgresql/9.1/bin/postgres -D /var/lib/postgresql/9.1/main >> -c config_file=/etc/postgresql/9.1/main/postgresql.conf >> postgres 1273 1199 0 Mar04 ? 00:00:18 postgres: writer >> process >> postgres 1274 1199 0 Mar04 ? 00:00:14 postgres: wal writer >> process >> postgres 1275 1199 0 Mar04 ? 00:00:03 postgres: autovacuum >> launcher process >> postgres 1276 1199 0 Mar04 ? 00:00:02 postgres: stats >> collector process >> thom 16476 4302 0 15:30 pts/1 00:00:00 grep --color=auto postgres > > Hm. It looks like pg_ctl found a PID file pointing to a non-existent > process, which is a bit like what you're seeing initdb do. > > I wonder whether this is somehow caused by conflicting settings for > PGDATA. Do you have a setting for that in your environment, or .bashrc > or someplace, that is different from what you're trying to use? These are in my env output: PATH=/home/thom/Development/psql/bin/:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games PGDATA=/home/thom/Development/data/ PGPORT=5488 This appears in my build script before configure: export PGDATA=$HOME/Development/data/ export PATH=$HOME/Development/psql/bin/:$PATH export PGPORT=5488 And those 3 lines also appear in my .bashrc file without any variation: export PGDATA=$HOME/Development/data/ export PATH=$HOME/Development/psql/bin/:$PATH export PGPORT=5488 -- Thom
Adrian Klaver <adrian.klaver@gmail.com> writes: > The postmaster.pid is located outside the data directory, but points back to the > data directory. Not sure where Debian, though at a guess somewhere in /var. > Any way search for postmaster.pid. Really? That seems like an extremely dangerous/stupid/unnecessary hack on the part of the Debian packagers. What's keeping users from accidentally starting two postmasters in the same data directory, if they can put their pidfiles in (different) other places? (This seems unrelated to Thom's issue, but it's still worrisome.) regards, tom lane
On Tuesday, March 06, 2012 9:25:17 am Thom Brown wrote: > > These are in my env output: > > PATH=/home/thom/Development/psql/bin/:/usr/lib/lightdm/lightdm:/usr/local/s > bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games > PGDATA=/home/thom/Development/data/ > PGPORT=5488 > > This appears in my build script before configure: > > export PGDATA=$HOME/Development/data/ > export PATH=$HOME/Development/psql/bin/:$PATH > export PGPORT=5488 > > And those 3 lines also appear in my .bashrc file without any variation: > > export PGDATA=$HOME/Development/data/ > export PATH=$HOME/Development/psql/bin/:$PATH > export PGPORT=5488 And you are sure there is no pg_ctl or initdb outside /usr/lib/postgresql/9.1/bin or /home/thom/Development/psql/bin and in your PATH? Just for grins what happens if you try an initdb using an explicit reference to the binary /home/thom/Development/psql/bin/initdb and the -D /home/thom/Development/data/ ? -- Adrian Klaver adrian.klaver@gmail.com
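Adrian's question, whether more than one pg_ctl or initdb is reachable through PATH, can be answered by walking PATH in order and listing every match rather than only the first. A rough sketch in Python; find_all_on_path is a hypothetical helper name, not part of any PostgreSQL tooling.

```python
import os


def find_all_on_path(program, path=None):
    """List every executable named `program` on PATH, in lookup order.

    The first entry is the one the shell would run; later entries are
    shadowed copies.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH components
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits
```

For example, find_all_on_path("initdb") returning more than one entry would mean a second copy is on the search path, and the first entry shows which one a bare "initdb" actually runs.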
Thom Brown <thom@linux.com> writes:
> On 6 March 2012 16:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [ scratches head... ] I can't reproduce it with current git tip.

> And I don't think I can reproduce this if I remove that directory.
> I've seen this issue about 3 or 4 times in the past, and fixed it by
> ditching the old data dir completely. I'm just not sure what causes
> this to happen.

I'm a bit confused here. Isn't the data directory totally empty before
initdb starts? It's supposed to refuse to proceed otherwise.

regards, tom lane
On Tuesday, March 06, 2012 9:43:00 am Tom Lane wrote: > Adrian Klaver <adrian.klaver@gmail.com> writes: > > The postmaster.pid is located outside the data directory, but points back > > to the data directory. Not sure where Debian, though at a guess > > somewhere in /var. Any way search for postmaster.pid. > > Really? That seems like an extremely dangerous/stupid/unnecessary hack > on the part of the Debian packagers. What's keeping users from > accidentally starting two postmasters in the same data directory, if > they can put their pidfiles in (different) other places? No, that was a mistake on my part. It is in the $DATA directory. > > (This seems unrelated to Thom's issue, but it's still worrisome.) > > regards, tom lane -- Adrian Klaver adrian.klaver@gmail.com
On 6 March 2012 17:45, Adrian Klaver <adrian.klaver@gmail.com> wrote: > On Tuesday, March 06, 2012 9:25:17 am Thom Brown wrote: > >> >> These are in my env output: >> >> PATH=/home/thom/Development/psql/bin/:/usr/lib/lightdm/lightdm:/usr/local/s >> bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games >> PGDATA=/home/thom/Development/data/ >> PGPORT=5488 >> >> This appears in my build script before configure: >> >> export PGDATA=$HOME/Development/data/ >> export PATH=$HOME/Development/psql/bin/:$PATH >> export PGPORT=5488 >> >> And those 3 lines also appear in my .bashrc file without any variation: >> >> export PGDATA=$HOME/Development/data/ >> export PATH=$HOME/Development/psql/bin/:$PATH >> export PGPORT=5488 > > And you are sure there is no pg_ctl or initdb outside > /usr/lib/postgresql/9.1/bin or /home/thom/Development/psql/bin and in your PATH? > > Just for grins what happens if you try an initdb using an explicit reference to > the binary /home/thom/Development/psql/bin/initdb and the -D > /home/thom/Development/data/ ? thom@swift:~/Development$ /home/thom/Development/psql/bin/initdb -E 'UTF8' -D /home/thom/Development/data/ The files belonging to this database system will be owned by user "thom". This user must also own the server process. The database cluster will be initialized with locale en_GB.UTF-8. The default text search configuration will be set to "english". fixing permissions on existing directory /home/thom/Development/data ... ok creating subdirectories ... ok selecting default max_connections ... 10 selecting default shared_buffers ... 400kB creating configuration files ... ok creating template1 database in /home/thom/Development/data/base/1 ... FATAL: could not remove old lock file "postmaster.pid": No such file or directory HINT: The file seems accidentally left over, but it could not be removed. Please remove the file by hand and try again. 
child process exited with exit code 1 initdb: removing contents of data directory "/home/thom/Development/data" -- Thom
On 6 March 2012 17:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> On 6 March 2012 16:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> [ scratches head... ] I can't reproduce it with current git tip.
>
>> And I don't think I can reproduce this if I remove that directory.
>> I've seen this issue about 3 or 4 times in the past, and fixed it by
>> ditching the old data dir completely. I'm just not sure what causes
>> this to happen.
>
> I'm a bit confused here. Isn't the data directory totally empty before
> initdb starts? It's supposed to refuse to proceed otherwise.

Yes, it is completely empty:

thom@swift:~/Development$ ls -la data
total 8
drwx------  2 thom thom 4096 2012-03-06 17:48 .
drwxrwxr-x 15 thom thom 4096 2012-03-06 17:46 ..

--
Thom
Thom Brown <thom@linux.com> writes: > /home/thom/Development/data was causing problems so: > mv data databroken > mkdir data > initdb > ... working fine again. I then used the postmaster.pid from this when > started up. But if I do: > pg_ctl stop > rm -rf data > mv databroken data > initdb > ... error messages appear again. Okay, so the question becomes: what is different between databroken and a freshly mkdir'd empty directory? If there is no visible difference in contents, ownership, or permissions, then it seems like this is evidence of a filesystem bug (ie, apparently empty directory acts nonempty for some operations). regards, tom lane
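The comparison Tom asks for (contents, ownership, permissions) can be made mechanical. Below is a small Python sketch; describe_dir and diff_dirs are illustrative names only. Note the limitation: it reports only differences the filesystem is willing to show, so an empty result between databroken and a fresh directory would be consistent with the filesystem-bug theory rather than disproving it.

```python
import os
import stat


def describe_dir(path):
    """Collect the visible attributes of a directory."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),   # e.g. 'drwx------'
        "uid": st.st_uid,
        "gid": st.st_gid,
        "nlink": st.st_nlink,
        "entries": sorted(os.listdir(path)),
    }


def diff_dirs(a, b):
    """Return {attribute: (value_in_a, value_in_b)} for attributes that differ."""
    da, db = describe_dir(a), describe_dir(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}
```

Usage would be along the lines of diff_dirs("databroken", "data") right after the mkdir, before initdb touches the new directory.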
On Tuesday, March 06, 2012 9:48:51 am Thom Brown wrote: > On 6 March 2012 17:45, Adrian Klaver <adrian.klaver@gmail.com> wrote: > > On Tuesday, March 06, 2012 9:25:17 am Thom Brown wrote: > >> These are in my env output: > >> > >> PATH=/home/thom/Development/psql/bin/:/usr/lib/lightdm/lightdm:/usr/loca > >> l/s bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games > >> PGDATA=/home/thom/Development/data/ > >> PGPORT=5488 > >> > >> This appears in my build script before configure: > >> > >> export PGDATA=$HOME/Development/data/ > >> export PATH=$HOME/Development/psql/bin/:$PATH > >> export PGPORT=5488 > >> > >> And those 3 lines also appear in my .bashrc file without any variation: > >> > >> export PGDATA=$HOME/Development/data/ > >> export PATH=$HOME/Development/psql/bin/:$PATH > >> export PGPORT=5488 > > > > And you are sure there is no pg_ctl or initdb outside > > /usr/lib/postgresql/9.1/bin or /home/thom/Development/psql/bin and in > > your PATH? So that would be no:)? > > > > Just for grins what happens if you try an initdb using an explicit > > reference to the binary /home/thom/Development/psql/bin/initdb and the > > -D > > /home/thom/Development/data/ ? > > thom@swift:~/Development$ /home/thom/Development/psql/bin/initdb -E > 'UTF8' -D /home/thom/Development/data/ > The files belonging to this database system will be owned by user "thom". > This user must also own the server process. > > The database cluster will be initialized with locale en_GB.UTF-8. > The default text search configuration will be set to "english". > > fixing permissions on existing directory /home/thom/Development/data ... ok > creating subdirectories ... ok > selecting default max_connections ... 10 > selecting default shared_buffers ... 400kB > creating configuration files ... ok > creating template1 database in /home/thom/Development/data/base/1 ... 
> FATAL: could not remove old lock file "postmaster.pid": No such file > or directory > HINT: The file seems accidentally left over, but it could not be > removed. Please remove the file by hand and try again. > child process exited with exit code 1 > initdb: removing contents of data directory "/home/thom/Development/data" It's official, I'm stumped. Information seems to be persisting between sessions, and absent some other cluster than the ones you have indicated, I don't know where that information is coming from. -- Adrian Klaver adrian.klaver@gmail.com
On 6 March 2012 17:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thom Brown <thom@linux.com> writes: >> /home/thom/Development/data was causing problems so: > >> mv data databroken >> mkdir data >> initdb > >> ... working fine again. I then used the postmaster.pid from this when >> started up. But if I do: > >> pg_ctl stop >> rm -rf data >> mv databroken data >> initdb > >> ... error messages appear again. > > Okay, so the question becomes: what is different between databroken and > a freshly mkdir'd empty directory? If there is no visible difference in > contents, ownership, or permissions, then it seems like this is evidence > of a filesystem bug (ie, apparently empty directory acts nonempty for > some operations). You may well be right. There appear to be dark forces at work here: thom@swift:~/Development/data$ touch postmaster.pid thom@swift:~/Development/data$ ls -l total 0 thom@swift:~/Development/data$ touch file.txt thom@swift:~/Development/data$ ls -l total 8 -rw-rw-r-- 1 thom thom 0 2012-03-06 17:59 file.txt -- Thom
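Thom's "touch" experiment compares two views of the same directory: finding a file by name and enumerating the directory's entries. A minimal sketch of that comparison as a reusable check follows; the function name check_dir_consistency and its output format are made up for illustration and are not part of any PostgreSQL tooling.

```shell
#!/bin/sh
# check_dir_consistency DIR NAME   (hypothetical helper, illustration only)
# Creates NAME in DIR, then asks two questions that should agree on a
# healthy filesystem:
#   1. can the file be found by name (a lookup, via `test -e`)?
#   2. does the name appear when the directory is enumerated (`ls -A`)?
check_dir_consistency() {
    dir=$1
    name=$2

    touch "$dir/$name" || { echo "create-failed"; return 1; }

    by_name=no
    [ -e "$dir/$name" ] && by_name=yes

    listed=no
    # -A includes dotfiles; grep -x -F matches the whole line literally.
    ls -A "$dir" | grep -x -F -- "$name" >/dev/null && listed=yes

    rm -f "$dir/$name"
    echo "lookup=$by_name listing=$listed"
}
```

Running it as `check_dir_consistency ~/Development/data postmaster.pid` and again with a name such as file.txt would show whether the two views disagree for one filename only, which is the kind of evidence a filesystem bug report would need.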
On Tuesday, March 06, 2012 9:53:52 am Tom Lane wrote: > Thom Brown <thom@linux.com> writes: > > /home/thom/Development/data was causing problems so: > > > > mv data databroken > > mkdir data > > initdb > > > > ... working fine again. I then used the postmaster.pid from this when > > started up. But if I do: > > > > pg_ctl stop > > rm -rf data > > mv databroken data > > initdb > > > > ... error messages appear again. > > Okay, so the question becomes: what is different between databroken and > a freshly mkdir'd empty directory? If there is no visible difference in > contents, ownership, or permissions, then it seems like this is evidence > of a filesystem bug (ie, apparently empty directory acts nonempty for > some operations). A thought, what if you do rm -rf * in the data directory? > > regards, tom lane -- Adrian Klaver adrian.klaver@gmail.com
On 6 March 2012 18:01, Adrian Klaver <adrian.klaver@gmail.com> wrote: > On Tuesday, March 06, 2012 9:53:52 am Tom Lane wrote: >> Thom Brown <thom@linux.com> writes: >> > /home/thom/Development/data was causing problems so: >> > >> > mv data databroken >> > mkdir data >> > initdb >> > >> > ... working fine again. I then used the postmaster.pid from this when >> > started up. But if I do: >> > >> > pg_ctl stop >> > rm -rf data >> > mv databroken data >> > initdb >> > >> > ... error messages appear again. >> >> Okay, so the question becomes: what is different between databroken and >> a freshly mkdir'd empty directory? If there is no visible difference in >> contents, ownership, or permissions, then it seems like this is evidence >> of a filesystem bug (ie, apparently empty directory acts nonempty for >> some operations). > > A thought, what if you do rm -rf * in the data directory? I've done that a couple times, but no effect. I think Tom's point about a filesystem bug is probably right. -- Thom
On Tue, Mar 6, 2012 at 19:03, Thom Brown <thom@linux.com> wrote: > On 6 March 2012 18:01, Adrian Klaver <adrian.klaver@gmail.com> wrote: >> On Tuesday, March 06, 2012 9:53:52 am Tom Lane wrote: >>> Thom Brown <thom@linux.com> writes: >>> > /home/thom/Development/data was causing problems so: >>> > >>> > mv data databroken >>> > mkdir data >>> > initdb >>> > >>> > ... working fine again. I then used the postmaster.pid from this when >>> > started up. But if I do: >>> > >>> > pg_ctl stop >>> > rm -rf data >>> > mv databroken data >>> > initdb >>> > >>> > ... error messages appear again. >>> >>> Okay, so the question becomes: what is different between databroken and >>> a freshly mkdir'd empty directory? If there is no visible difference in >>> contents, ownership, or permissions, then it seems like this is evidence >>> of a filesystem bug (ie, apparently empty directory acts nonempty for >>> some operations). >> >> A thought, what if you do rm -rf * in the data directory? > > I've done that a couple times, but no effect. I think Tom's point > about a filesystem bug is probably right. You mentioned encryptfs, right? That's where I'd be looking first :-O it wasn't obvious enough to throw something in your kernel dmesg log by any chance? :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Thom Brown <thom@linux.com> writes: > On 6 March 2012 18:01, Adrian Klaver <adrian.klaver@gmail.com> wrote: >> A thought, what if you do rm -rf * in the data directory? > I've done that a couple times, but no effect. I think Tom's point > about a filesystem bug is probably right. Yeah, given your "touch" experiment I think that you have more than enough ammunition to file a kernel bug. Apparently, the directory contents are corrupted in such a way that a file named "postmaster.pid" can be created but it's invisible to some (perhaps not all) operations. In some of the more complex directory data structures I could believe that this result is filename-sensitive (think corrupted hashtable...) regards, tom lane
Sorry, forgot to add the list. Thom Brown wrote: > > I've done that a couple times, but no effect. I think Tom's point > about a filesystem bug is probably right. Have you rebooted since this started? There may be a process that is holding the pid file 'deleted but present' until the process terminates. If you can't find the process to kill it, a reboot would remove all doubt. Just a thought. Bosco.
Bosco Rama <postgres@boscorama.com> writes: > Thom Brown wrote: >> I've done that a couple times, but no effect. I think Tom's point >> about a filesystem bug is probably right. > Have you rebooted since this started? There may be a process that is > holding the pid file 'deleted but present' until the process terminates. Even if something is holding the file open, that wouldn't prevent unlink from removing the directory entry for it; or even if we were talking about a badly-designed filesystem that failed to follow standard Unix semantics, that wouldn't explain why the directory entry is apparently visible to some operations but not others. Still, I agree with your point: Thom should reboot and see if the misbehavior is still there, because that would be useful info for his bug report. regards, tom lane
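Tom's point about unlink can be checked directly. The sketch below holds a file open, removes it, and reports whether the name is still in the directory; the function name unlink_while_open is hypothetical, and the behaviour described assumes a local filesystem with standard Unix semantics.

```shell
#!/bin/sh
# unlink_while_open DIR   (hypothetical helper, illustration only)
# Holds a file open on descriptor 3, removes it, and reports whether the
# name is still present in the directory afterwards.
unlink_while_open() {
    dir=$1
    f="$dir/postmaster.pid"

    exec 3>"$f"            # keep the file open in this shell
    rm "$f"                # unlink(2): drops the directory entry only

    if [ -e "$f" ]; then
        echo "entry-still-present"
    else
        echo "entry-gone"
    fi

    echo "late write" >&3  # the open descriptor remains usable
    exec 3>&-              # closing it lets the kernel free the inode
}
```

On a filesystem that follows standard semantics this prints "entry-gone": the open descriptor keeps the data alive, not the name. A process holding the old pid file open therefore would not, by itself, explain a name that some operations see and others do not.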
On Tue, Mar 6, 2012 at 10:11 AM, Thom Brown <thom@linux.com> wrote: > On 6 March 2012 16:04, Adrian Klaver <adrian.klaver@gmail.com> wrote: >> The postmaster.pid is located outside the data directory, but points back to the >> data directory. Not sure where Debian, though at a guess somewhere in /var. >> Any way search for postmaster.pid. > > I'm not sure, because if I use a new data directory, initdb it and > start the service, the postmaster.pid appears in it, and not as a > symbolic link. > > I did a search for postmaster.pid in the whole of /var and it only > shows up "/var/lib/postgresql/9.1/main/postmaster.pid" > > -- > Thom I know that I'm late to the party, but a small suggestion: Run "initdb" with "strace" (truss on Solaris) and examine the syscalls made. It should show you, conclusively, what files are being "open"ed, "unlink"ed, etc... Example:
strace -o /tmp/x initdb -D /tmp/data-1
grep -E '^(open|unlink)' /tmp/x
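One caveat on the strace suggestion: the FATAL message comes from a child process that initdb starts ("child process exited with exit code 1"), so the trace has to follow children with strace's -f option. With -f and -o, each line of the trace file starts with a process ID, so a pattern anchored at the start of the line no longer matches, and newer systems may report openat or unlinkat rather than open or unlink. A hedged variant that simply filters on the file name is sketched here; the helper name pidfile_calls is made up for illustration.

```shell
#!/bin/sh
# pidfile_calls TRACEFILE   (hypothetical helper, illustration only)
# Prints every traced system call that mentions postmaster.pid, whatever
# the call is named (open, openat, unlink, unlinkat, stat, ...).
pidfile_calls() {
    grep -F 'postmaster.pid' "$1"
}

# Intended use (not run here); -f makes strace follow child processes:
#   strace -f -o /tmp/x initdb -D /tmp/data-1
#   pidfile_calls /tmp/x
```

Filtering by file name instead of by call name keeps the output short while still showing the order in which the lock file is looked up, created, and removed, along with the error code of each call.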
On 6 March 2012 18:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bosco Rama <postgres@boscorama.com> writes: >> Thom Brown wrote: >>> I've done that a couple times, but no effect. I think Tom's point >>> about a filesystem bug is probably right. > >> Have you rebooted since this started? There may be a process that is >> holding the pid file 'deleted but present' until the process terminates. > > Even if something is holding the file open, that wouldn't prevent unlink > from removing the directory entry for it; or even if we were talking > about a badly-designed filesystem that failed to follow standard Unix > semantics, that wouldn't explain why the directory entry is apparently > visible to some operations but not others. > > Still, I agree with your point: Thom should reboot and see if the > misbehavior is still there, because that would be useful info for his > bug report. After a reboot, initdb completes successfully. I don't think it performed an fsck of any kind as I don't see it in the logs. -- Thom
On 6 March 2012 18:51, dennis jenkins <dennis.jenkins.75@gmail.com> wrote: > On Tue, Mar 6, 2012 at 10:11 AM, Thom Brown <thom@linux.com> wrote: >> On 6 March 2012 16:04, Adrian Klaver <adrian.klaver@gmail.com> wrote: >>> The postmaster.pid is located outside the data directory, but points back to the >>> data directory. Not sure where Debian, though at a guess somewhere in /var. >>> Any way search for postmaster.pid. >> >> I'm not sure, because if I use a new data directory, initdb it and >> start the service, the postmaster.pid appears in it, and not as a >> symbolic link. >> >> I did a search for postmaster.pid in the whole of /var and it only >> shows up "/var/lib/postgresql/9.1/main/postmaster.pid" >> >> -- >> Thom > > I know that I'm late to the party, but a small suggestion: Run > "initdb" with "strace" (truss on Solaris) and examine the syscalls > made. It should show you, conclusively, what files are being > "open"ed, "unlink"ed, etc... > > Example: > > strace -o /tmp/x initdb -D /tmp/data-1 > grep -E '^(open|unlink)' /tmp/x The reboot removed the opportunity to do this, unfortunately. I'll have to wait and see if it happens again, but if it does, I'll try the suggestion. -- Thom
Thom Brown <thom@linux.com> writes: > On 6 March 2012 18:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Still, I agree with your point: Thom should reboot and see if the >> misbehavior is still there, because that would be useful info for his >> bug report. > After a reboot, initdb completes successfully. I don't think it > performed an fsck of any kind as I don't see it in the logs. Fascinating. So maybe there is something to Bosco's theory of something holding open the old pidfile. But what would that be? The postmaster doesn't hold it open; it just writes it and closes it. regards, tom lane
On 6 March 2012 19:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thom Brown <thom@linux.com> writes: >> On 6 March 2012 18:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Still, I agree with your point: Thom should reboot and see if the >>> misbehavior is still there, because that would be useful info for his >>> bug report. > >> After a reboot, initdb completes successfully. I don't think it >> performed an fsck of any kind as I don't see it in the logs. > > Fascinating. So maybe there is something to Bosco's theory of something > holding open the old pidfile. But what would that be? The postmaster > doesn't hold it open, just write it and close it. No idea. I did run an lsof while the problem was still present and grep'd for the directory as I too suspected there may be some process thinking it still had a reference to the file, but there were no matches. -- Thom
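As a complement to grepping lsof output for the directory, a Linux-only sketch that looks specifically for files that are unlinked but still held open is given below. It assumes the /proc/PID/fd layout, where the link target of such a descriptor ends in " (deleted)"; the function name list_deleted_open is hypothetical. Without root it only sees processes owned by the current user.

```shell
#!/bin/sh
# list_deleted_open   (hypothetical helper, Linux /proc only)
# Prints every open descriptor whose target has been unlinked, in the
# form "/proc/PID/fd/N -> /path/to/file (deleted)".
list_deleted_open() {
    for link in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish while we scan; skip any we cannot read.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            *" (deleted)") echo "$link -> $target" ;;
        esac
    done
}
```

Piping the result through `grep postmaster.pid` would show whether any process still holds a removed pid file. An empty result, like Thom's lsof check, would count against the open-file theory and in favour of stale state inside the filesystem code.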
Tom Lane wrote: > > Fascinating. So maybe there is something to Bosco's theory of something > holding open the old pidfile. There could also have been a corrupt in-memory/cached descriptor in the filesystem code that never needed flushing to disk? That would help explain why it fully went away after the reboot and yet the on-disk stuff seems fine. > But what would that be? Possibly a 3rd party/home-grown monitoring program? Bosco.