Thread: can't start postgresql

can't start postgresql

From
Igor Shevchenko
Date:
Hi All,

I've got a problem w/ one of pgsql installations. It can't start:

[postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
PANIC:  failed to re-find parent key in "23724"
Aborted

This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4. Originally it
was 8.1.2, and the bug appeared on v8.1.2; I've compiled and installed v8.1.5
over it. Autovacuum is enabled.

How can I fix this ?

--
Best Regards,
Igor Shevchenko

Re: can't start postgresql

From
Enrico
Date:
On Tue, 14 Nov 2006 02:23:39 +0200
Igor Shevchenko <igor@carcass.ath.cx> wrote:

Try with:

> [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data

Bye Enrico
--
If Bill Gates had a penny for everytime Windows crashed,he'd be a multi-billionaire by now .......oh look, he already
is!!!! 
scotty@linuxtime.it - Skype:sscotty71
http://www.linuxtime.it/enricopirozzi

Re: can't start postgresql

From
Igor Shevchenko
Date:
On Tuesday 14 November 2006 10:52, Enrico wrote:
> Try with:
> > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data

Thanks, but this is not working -

[postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale C.

initdb: directory "/usr/pgsql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/usr/pgsql/data" or run initdb
with an argument other than "/usr/pgsql/data".

--
Best Regards,
Igor Shevchenko

Re: can't start postgresql

From
Enrico
Date:
> Thanks, but this is not working -
>
> [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
> The files belonging to this database system will be owned by user "postgres".
> This user must also own the server process.
>
> The database cluster will be initialized with locale C.
>
> initdb: directory "/usr/pgsql/data" exists but is not empty
> If you want to create a new database system, either remove or empty
> the directory "/usr/pgsql/data" or run initdb
> with an argument other than "/usr/pgsql/data".

of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data
directory /usr/pgsql/data must be empty and user postgres must be the owner,
but I don't understand, what is your problem?

Enrico

--
If Bill Gates had a penny for everytime Windows crashed,he'd be a multi-billionaire by now .......oh look, he already
is!!!! 
scotty@linuxtime.it - Skype:sscotty71
http://www.linuxtime.it/enricopirozzi

Re: can't start postgresql

From
Matthias.Pitzl@izb.de
Date:
He's just trying to do what you told him.
But the use of initdb won't correct the problem Igor got. There's something
wrong with his database so that he can't start the postmaster process:
> PANIC:  failed to re-find parent key in "23724"

Unfortunately i don't have any idea how to solve this issue :(

-- Matthias


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Enrico
> Sent: Tuesday, November 14, 2006 4:22 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] can't start postgresql
>
>
>
> > Thanks, but this is not working -
> >
> > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
> > The files belonging to this database system will be owned
> by user "postgres".
> > This user must also own the server process.
> >
> > The database cluster will be initialized with locale C.
> >
> > initdb: directory "/usr/pgsql/data" exists but is not empty
> > If you want to create a new database system, either remove or empty
> > the directory "/usr/pgsql/data" or run initdb
> > with an argument other than "/usr/pgsql/data".
>
> of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data
> directory /usr/pgsql/data must be empty and user postgres
> must be the owner,
> but I don't understand, what is your problem?
>
> Enrico
>

Re: can't start postgresql

From
Alvaro Herrera
Date:
Matthias.Pitzl@izb.de wrote:
> He's just trying to do what you told him.
> But the use of initdb won't correct the problem Igor got. There's something
> wrong with his database so that he can't start the postmaster process:
> > PANIC:  failed to re-find parent key in "23724"
>
> Unfortunately i don't have any idea how to solve this issue :(

I'd think starting a standalone backend and issuing a "reindex database"
should be enough to get him started.  Now, the problem is figuring _how_
the index got in that state; or even _what_ index is the problematic
one.  (I think it would be possible to find out by setting the
"log_error_verbosity" parameter to "verbose").

If it's a hardware problem, one would think it deserves some diagnosis.
It could be the btree bug Tom fixed last week, but I'm not sure if this
is really a consequence of it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: can't start postgresql

From
"Merlin Moncure"
Date:
On 11/13/06, Igor Shevchenko <igor@carcass.ath.cx> wrote:
> Hi All,
>
> I've got a problem w/ one of pgsql installations. It can't start:
>
> [postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
> PANIC:  failed to re-find parent key in "23724"
> Aborted
>
> This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4. Originally it
> was 8.1.2, and the bug appeared on v8.1.2; I've compiled and installed v8.1.5
> over it. Autovacuum is enabled.

is this a fresh installation? or do you have an existing database here
that crashed or had some other problem?

merlin

Re: can't start postgresql

From
Igor Shevchenko
Date:
Hi Merlin,

On Tuesday 14 November 2006 17:36, you wrote:
> On 11/13/06, Igor Shevchenko <igor@carcass.ath.cx> wrote:
> > I've got a problem w/ one of pgsql installations. It can't start:
> >
> > [postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
> > PANIC:  failed to re-find parent key in "23724"
> > Aborted
> >
> > This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4.
> > Originally it was 8.1.2, and the bug appeared on v8.1.2; I've compiled
> > and installed v8.1.5 over it. Autovacuum is enabled.
>
> is this a fresh installation? or do you have an existing database here
> that crashed or had some other problem?

Yes it was a working installation, 32g of data, on v8.1.2. Pgsql wouldn't come
up, so I've upgraded it to v8.1.5 (from sources), and tried to start it in a
single-process mode, with no luck - it was still failing with the same,
abovementioned error message.

We had a backup and had already restored from it, but I still have the crashed
db saved elsewhere, for testing/debugging.

--
Best Regards,
Igor Shevchenko

Re: can't start postgresql

From
Igor Shevchenko
Date:
On Tuesday 14 November 2006 17:34, Alvaro Herrera wrote:
> Matthias.Pitzl@izb.de wrote:
> > He's just trying to do what you told him.
> > But the use of initdb won't correct the problem Igor got. There's
> > something
> >
> > wrong with his database so that he can't start the postmaster process:
> > > PANIC:  failed to re-find parent key in "23724"
> >
> > Unfortunately i don't have any idea how to solve this issue :(
>
> I'd think starting a standalone backend and issuing a "reindex database"
> should be enough to get him started.  Now, the problem is figuring _how_
> the index got in that state; or even _what_ index is the problematic
> one.  (I think it would be possible to find out by setting the
> "log_error_verbosity" parameter to "verbose").

The problem is, it won't start even in the standalone mode. It prints the same
error message and dies. I was trying to start it connected to my main db,
template0, template1 and postgres, no change. The index corruption (assuming
this IS an index corruption) must've happened with system table(s). Is there
any way to diagose that ? Maybe reindex a specific table without starting
postmaster, or remove certain index files physically (e.g. for pg_class,
etc) ?

> If it's a hardware problem, one would think it deserves some diagnosis.
> It could be the btree bug Tom fixed last week, but I'm not sure if this
> is really a consequence of it.

Can't rule out an HW problem. I'd think this IS an HW problem, but we've
started using an updated version of our software on this server a week ago,
and it creates/drops alot of tables, temporary and not (kind of in-house
table partitioning), so this could be an (exceptionally rare) pgsql bug with
pg_class / pg_index / etc handling. A side question: is it safe to
create/drop tables from pl/pgsql functions, called directly or via a
trigger ?

Our previous version of the software was using a light version of what we're
trying to do now, and there was / maybe still a problem:

We're processing data in chunks, and each chunk is completely separate from
others, so using a special table per chunk seems to be a natural thing to do.
We keep these in a separate tablespace "w". There's one non-unique
(integer,varchar) index per each such table; both fields are non-null. Both
table and index are created via a pl/pgsql function. An index is created
after the data import, during the same transaction:

a) begin
b) select create_w_table(jobid,false) -- false means don't create index
c) insert into w.tableX select *....
d) select create_w_table(jobid,true) -- true means create index
e) commit

Both table and index are dropped at the same time, using another pl/pgsql
function.

create_w_table is rarely called from yet another pl/pgsql function.

Everything's working fine, except that sometimes an index can, um, "go off the
rails":

* not listed by "psql"'s \d w.tableX
* queries are not using it
* it's still listed in the pg_class (seen by "select * from pg_class where
relname='tblX_idx'")
* a corresponding "pg_index" row references a non-existing pg_class entry via
pg_index.indrelid

I'm seeing this several times a week, for over 5 months now. Here's my fix
procedure:

* fetch a list of probematic indexes:

select pc.relname,pc.oid,pc_idx.oid from pg_class pc left join pg_class pc_idx
on (pc_idx.relname=pc.relname || '_idx') where pc.relname ~ '^table\\d+$' and
(pc_idx.oid is NULL OR not exists (select 1 from pg_index where
pg_index.indexrelid=pc_idx.oid and indrelid=pc.oid))

* fix each with:

update pg_index set indrelid=${right one from pg_class} where
indexrelid=${index oid from pg_class};
update pg_class set relhasindex='t' where oid=${table oid}; -- AFAICR this is
redundant, as relhasindex is still true

now I can drop this index (usually it's out of sync), vacuum the table and
recreate index:

drop index w.tableX_idx;
vacuum full verbose w.tableX;
create index tableX_idx on w.tableX(...);
analyze w.tableX;


I've seen this on pgsql versions from v8.1.2 till v8.1.4, can't confirm with
v8.1.5 yet. Autovacuum's on; daily "vacuum verbose analyze" and "reindex" are
ran over pg_class, pg_index, a bunch of our own small tables, and recently
over pg_depend, pg_type and pg_statistic.

--
Best Regards,
Igor Shevchenko

Re: can't start postgresql

From
John Gray
Date:
On Tue, 14 Nov 2006 21:26:02 +0200, Igor Shevchenko wrote:


>> I'd think starting a standalone backend and issuing a "reindex database"
>> should be enough to get him started.  Now, the problem is figuring _how_
>> the index got in that state; or even _what_ index is the problematic
>> one.  (I think it would be possible to find out by setting the
>> "log_error_verbosity" parameter to "verbose").
>
> The problem is, it won't start even in the standalone mode. It prints the same
> error message and dies. I was trying to start it connected to my main db,
> template0, template1 and postgres, no change. The index corruption (assuming
> this IS an index corruption) must've happened with system table(s). Is there
> any way to diagose that ? Maybe reindex a specific table without starting
> postmaster, or remove certain index files physically (e.g. for pg_class,
> etc) ?

I think you want to use -P to ignore the indexes on startup:

postgres -D /usr/local/pgsql/data -P my_database

Does that help - then you might get the chance to do a REINDEX?

Regards

John



Re: can't start postgresql

From
Igor Shevchenko
Date:
On Wednesday 15 November 2006 01:41, you wrote:
> On Tue, 14 Nov 2006 21:26:02 +0200, Igor Shevchenko wrote:
> >> I'd think starting a standalone backend and issuing a "reindex database"
> >> should be enough to get him started.  Now, the problem is figuring _how_
> >> the index got in that state; or even _what_ index is the problematic
> >> one.  (I think it would be possible to find out by setting the
> >> "log_error_verbosity" parameter to "verbose").
> >
> > The problem is, it won't start even in the standalone mode. It prints the
> > same error message and dies. I was trying to start it connected to my
> > main db, template0, template1 and postgres, no change. The index
> > corruption (assuming this IS an index corruption) must've happened with
> > system table(s). Is there any way to diagose that ? Maybe reindex a
> > specific table without starting postmaster, or remove certain index files
> > physically (e.g. for pg_class, etc) ?
>
> I think you want to use -P to ignore the indexes on startup:
>
> postgres -D /usr/local/pgsql/data -P my_database
>
> Does that help - then you might get the chance to do a REINDEX?

Thanks for the pointer!

I thought it would help, but to my surprise, it didn't. Im still seeing this :

[postgres@db ~]$ /backup/pgsql/bin/postgres -D /backup/pgsql/data -P template1
PANIC:  failed to re-find parent key in "23724"
Aborted

--
Best Regards,
Igor Shevchenko

Re: can't start postgresql

From
Igor Shevchenko
Date:
On Tuesday 14 November 2006 17:21, Enrico wrote:
> > Thanks, but this is not working -
> >
> > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data
> > The files belonging to this database system will be owned by user
> > "postgres". This user must also own the server process.
> >
> > The database cluster will be initialized with locale C.
> >
> > initdb: directory "/usr/pgsql/data" exists but is not empty
> > If you want to create a new database system, either remove or empty
> > the directory "/usr/pgsql/data" or run initdb
> > with an argument other than "/usr/pgsql/data".
>
> of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data
> directory /usr/pgsql/data must be empty and user postgres must be the
> owner, but I don't understand, what is your problem?

The problem is that the db was not empty (32g of data).

--
Best Regards,
Igor Shevchenko

Re: can't start postgresql

From
Tom Lane
Date:
Igor Shevchenko <igor@carcass.ath.cx> writes:
> I've got a problem w/ one of pgsql installations. It can't start:

> [postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data
> PANIC:  failed to re-find parent key in "23724"
> Aborted

Try applying this patch:
http://archives.postgresql.org/pgsql-committers/2006-11/msg00004.php

            regards, tom lane