Thread: another pg database corruption?

another pg database corruption?

From

pginfo

Date:

27 October 2003, 21:34:53

Hi all,
I am using pg 7.3.4 on Linux Red Hat 7.3 and ReiserFS.
After ~30 days of error-free work my pg stopped.
In my log I found:
PANIC:  open of /mnt/diske/skladdb/pg_clog/0D02 failed: No such file or
directory
LOG:  statement: COPY public.a_sp (ids, ids_ma, ids_mp, ids_mr, ids_mpr,
ids_mpdt, ids_mpkt, smetkas, smetkaa, smetkad1, smetkad2, tip, vaji,
ime, imeeng, balans, opr, pp, prihrazh_razh, prihrazh_prih, ids_firma)
TO stdout;
LOG:  server process (pid 1252) was terminated by signal 6
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing shared memory and
semaphores
LOG:  database system was interrupted at 2003-10-27 14:34:01 EET
LOG:  checkpoint record is at 2/1C591C4C
LOG:  redo record is at 2/1C591C4C; undo record is at 0/0; shutdown TRUE

LOG:  next transaction id: 493496; next oid: 197060600
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  ReadRecord: record with zero length at 2/1C591C8C
LOG:  redo is not required
LOG:  database system is ready
PANIC:  open of /mnt/diske/skladdb/pg_clog/09FD failed: No such file or
directory
LOG:  statement: COPY public.a_acc (ids, ids_firma, ids_debit,
ids_kredit, date_op, ids_vid_doc, ids_otdel, ids_papka, ids_papka1,
ids_papka2, parv_doc, doc, nomer, val, voborot, kurs, oborotlv, zab1,
zab2, kod1, kod2, kod3, addkod, pp, instime, modtime, ids_slu) TO
stdout;
LOG:  server process (pid 1260) was terminated by signal 6
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing shared memory and
semaphores
LOG:  database system was interrupted at 2003-10-27 14:37:10 EET

How can I find the reason for this problem?
Also, we are trying to use pg in production to replace Oracle, but it looks
very unstable with this corruption.
Is there any way to protect pg from this error (for example, real-time
backup of changes or real-time replication)?
We have been using pg for 3 years and were very happy, but in production it
does not look so stable.

many thanks,
ivan.
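
The two PANIC lines in the log above name clog segment files 0D02 and 09FD, while recovery reports next transaction id 493496. One rough way to compare them is to convert each segment name into the range of transaction IDs it would cover. The sketch below assumes the default 8 kB block size, 2 status bits per transaction and 32 pages per segment file. Those constants and the function names are illustrative assumptions and are not stated anywhere in the thread.

```python
import re

# Assumed layout (not stated in the thread): 8 kB pages, 2 status bits per
# transaction (4 transactions per byte), 32 pages per clog segment file.
BLOCK_SIZE = 8192
XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT

# Matches the PANIC lines quoted in the log and captures the segment name.
PANIC_RE = re.compile(r"PANIC:\s+open of \S*/pg_clog/([0-9A-Fa-f]{4}) failed")


def missing_clog_segments(log_text):
    """Return the clog segment names mentioned in PANIC lines, in order."""
    return PANIC_RE.findall(log_text)


def xid_range(segment_name):
    """Return (first_xid, last_xid) covered by a hex-named clog segment."""
    segment_number = int(segment_name, 16)
    first = segment_number * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


log_excerpt = """PANIC:  open of /mnt/diske/skladdb/pg_clog/0D02 failed: No such file or
directory
LOG:  next transaction id: 493496; next oid: 197060600
PANIC:  open of /mnt/diske/skladdb/pg_clog/09FD failed: No such file or
directory
"""

for name in missing_clog_segments(log_excerpt):
    first, last = xid_range(name)
    print(f"segment {name}: xids {first}..{last}")
```

Under these assumptions each segment covers 1,048,576 transaction IDs, so a next transaction id of 493496 falls inside segment 0000, while 0D02 and 09FD would correspond to IDs in the billions. That would be consistent with the backend having read a damaged transaction ID from a row rather than with a legitimate clog file having gone missing, although the thread itself does not establish the cause.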

Re: another pg database corruption?

From

Andrew Sullivan

Date:

28 October 2003, 13:57:43

On Mon, Oct 27, 2003 at 06:21:40PM +0100, pginfo wrote:
> Hi all,
> I am using pg 7.3.4 on Linux Red Hat 7.3 and ReiserFS.

How sure are you that your patch level for your kernel is good?  ISTR
some issues with reiserfs on some versions of the recent kernels, but
there've been so many filesystem problems with Linux over the last
couple years that my recollection is probably not all it might be.

Anyway, I would start wondering about hardware.  Try badblocks on
your disk.  Note that to produce useful results, you may have to do
the destructive tests.  You'll be wanting to back up your data first.
There have been occasional reports of clog files being destroyed --
search the archives for some quick fixes, but note that you are
likely to have some inconsistent data.  So far, all the problems have
been attributable to hardware, but nobody is ruling out a bug, if it
is reproducible.  I think Tom Lane is especially interested in
looking at cases like this; or he was last time I talked to him about
it.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

Re: another pg database corruption?

From

pginfo

Date:

28 October 2003, 14:44:45

Hi Andrew,

Andrew Sullivan wrote:

> On Mon, Oct 27, 2003 at 06:21:40PM +0100, pginfo wrote:
> > Hi all,
> > I am using pg 7.3.4 on Linux Red Hat 7.3 and ReiserFS.
>
> How sure are you that your patch level for your kernel is good?  ISTR
> some issues with reiserfs on some versions of the recent kernels, but
> there've been so many filesystem problems with Linux over the last
> couple years that my recollection is probably not all it might be.
>

I have not applied any patches to my kernel. I use the standard reiserfs
that came with my Linux distro.
If a better filesystem exists, I am ready to use it.
I have also had bad results with ext3. In general I need a journaling file
system.

> Anyway, I would start wondering about hardware.  Try badblocks on
> your disk.  Note that to produce useful results, you may have to do
> the destructive tests.  You'll be wanting to back up your data first.

I cannot :). If I try pg_dump, pg crashes. One of my problems is to
restore this data if possible.
I have a cron script that runs pg_dump every 3 hours, but it is not possible
to recover the missing data, because we have many users.
That is why I said that I need to be sure that pg is very stable.

> There have been occasional reports of clog files being destroyed --
> search the archives for some quick fixes, but note that you are
> likely to have some inconsistent data.

Any idea for fixing this data is welcome. I will check the data for
inconsistencies.

>  So far, all the problems have
> been attributable to hardware, but nobody is ruling out a bug, if it
> is reproducible.

We used the system for 30 days without any reboot or stop. The last stop was for
a Java upgrade and caused no problems. We have not made any changes to the
hardware (I am not saying it is not hardware).

>  I think Tom Lane is especially interested in
> looking at cases like this; or he was last time I talked to him about
> it.

  regards,
ivan.

Re: another pg database corruption?

From

Andrew Sullivan

Date:

28 October 2003, 14:58:58

On Tue, Oct 28, 2003 at 03:12:37PM +0100, pginfo wrote:
> >
>
> I have not applied any patches to my kernel. I use the standard reiserfs
> that came with my Linux distro.

Yes, but have you kept up to date with new kernel releases from Red
Hat?  They're pretty good about releasing patched kernels if there is
a problem.

> > your disk.  Note that to produce useful results, you may have to do
> > the destructive tests.  You'll be wanting to back up your data first.
>
> I cannot :). If I try pg_dump, pg crashes. One of my problems is to
> restore this data if possible.

Shut down the database, I'm afraid, and copy the data directory
before you start doing things.

> That is why I said that I need to be sure that pg is very stable.

So far as I've heard, the cases of clog file corruption have all been
related to hardware.  You'd best think about replacing your disk or
your controller.  Have you had any crashes lately?  Anything unusual?
Are you using ECC RAM?  (If not, are you sure you don't have bad RAM?
If you got a bad bit written at the wrong time, you'd have a real
mess.)

> Any idea for fixing this data is welcome. I will check the data for
> inconsistencies.

As I understand it, you need to zero out the file with dummy data in
order to get going again.  You really need to plough the archives for
what to do, though.  I've never had to do this.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
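
The suggestion above to zero out the missing clog file can be sketched as follows. This is a minimal illustration and not a procedure confirmed in the thread: it assumes a clog segment is 32 pages of 8 kB (262,144 bytes), the helper name is hypothetical, and it should only ever be pointed at a copy of the data directory taken with the server shut down. The demonstration writes into a temporary directory rather than a real pg_clog directory.

```python
import os
import tempfile

# Assumed segment size (not stated in the thread): 32 pages of 8 kB each.
SEGMENT_BYTES = 8192 * 32


def write_zeroed_clog_segment(clog_dir, segment_name):
    """Create a zero-filled segment file and return its path.

    Mode "xb" makes the call fail with FileExistsError instead of
    overwriting a segment that is already present.
    """
    path = os.path.join(clog_dir, segment_name)
    with open(path, "xb") as segment:
        segment.write(b"\0" * SEGMENT_BYTES)
    return path


# Demonstration in a scratch directory standing in for a copied pg_clog.
with tempfile.TemporaryDirectory() as scratch:
    created = write_zeroed_clog_segment(scratch, "0D02")
    created_size = os.path.getsize(created)
    print(os.path.basename(created), created_size)
```

A zero-filled segment records no transaction in that range as committed, so rows written by those transactions would not be visible afterwards. That matches the warning above that some inconsistent data is likely, and it is a reason to check the restored tables carefully.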

Re: another pg database corruption?

From

pginfo

Date:

28 October 2003, 15:17:47

Andrew Sullivan wrote:

> On Tue, Oct 28, 2003 at 03:12:37PM +0100, pginfo wrote:
> > >
> >
> > I have not applied any patches to my kernel. I use the standard reiserfs
> > with my Linux distro.
>
> Yes, but have you kept up to date with new kernel releases from Red
> Hat?

Not really. But I have many installations with this version and the same config,
and all of them have been working well for long periods.
The problem is that the corrupted database was the biggest one.

>  They're pretty good about releasing patched kernels if there is
> a problem.
>
> > > your disk.  Note that to produce useful results, you may have to do
> > > the destructive tests.  You'll be wanting to back up your data first.
> >
> > I cannot :). If I try pg_dump, pg crashes. One of my problems is to
> > restore this data if possible.
>
> Shut down the database, I'm afraid, and copy the data directory
> before you start doing things.
>

I did it.

> > That is why I said that I need to be sure that pg is very stable.
>
> So far as I've heard, the cases of clog file corruption have all been
> related to hardware.  You'd best think about replacing your disk or
> your controller.

The system has 3 hard disks: one for my application server, one for pg and one
for archives. After the first problem I ran initdb -D on my archive disk, took
the last available backup and started to insert data. All was well, but after
10 hours of work pg stopped (that is the error I sent here).
Later I also tried to recreate the db on my last disk, and this time I had
problems restoring the data.
So I think I do not have a problem with my hard disks (unless I have a problem
with all 3 disks at the same time, or with the controller).

>  Have you had any crashes lately?  Anything unusual?
> Are you using ECC RAM?  (If not, are you sure you don't have bad RAM?
> If you got a bad bit written at the wrong time, you'd have a real
> mess.)

My RAM modules are 1 GB ECC each. I am using 2 GB of RAM on this box.

>
>
> > Any idea for fixing this data is welcome. I will check the data for
> > inconsistencies.
>
> As I understand it, you need to zero out the file with dummy data in
> order to get going again.  You really need to plough the archives for
> what to do, though.  I've never had to do this.

  regards,
ivan.

Re: another pg database corruption?

From

Andreas Schmitz

Date:

28 October 2003, 15:55:58

> Not really. But I have many installations with this version and the same
> config, and all of them have been working well for long periods.
> The problem is that the corrupted database was the biggest one.

The good thing about PC hardware is that it is "same same but different". The
same hardware (board, CPU etc.) might behave differently (chip rev. etc.). I
suggest you move the database for testing to another machine.  Try to
reproduce the problem there.

regards

-andreas

PS: FYI ext3 is an ext2 filesystem with journaling enabled (man tune2fs)


--
Andreas Schmitz - Phone +49 201 8501 318
Cityweb-Technik-Service-Gesellschaft mbH
Friedrichstr. 12 - Fax +49 201 8501 104
45128 Essen - email a.schmitz@cityweb.de