Thread: ehm...
Well, I will be a little verbose, but I am afraid this could be a big bug in PostgreSQL, so here goes. I've written a script in Python that, using the psycopg module, loads the freecddb archive into a relational DB. It does a few selects and a lot of inserts for each freecddb file (something like 500,000 files). The process is slow (most of the time is spent reading the small files, parsing them and generating the right queries), so I have only run it 8 times, and now I need to know whether I am doing something wrong or whether this is a real PostgreSQL problem (having seen the known problem with duplicated keys, and so on).

To make this a little more like a proper bug report:

os: Linux 2.4.18pre1 (also tried with 2.4.17)
distribution: Debian unstable
gcc: gcc version 3.0.3 (also tried with 3.0.2)
python: 2.1
psycopg: http://packages.debian.org/unstable/non-us/python2.1-psycopg.html
postgresql: various CVS versions over the last 8 days

What happens is simple: I run my script, and after a lot of inserts (the error below appeared after 76242 files parsed), running vacuumdb gives me various errors, e.g.:

vacuumdb -a -v -f -z
NOTICE: Analyzing author
FATAL 2: open of /home/postgres/db/pg_clog/0000 failed: No such file or directory
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
vacuumdb: vacuum freecddb failed

I get errors like this using vacuum both with and without '-f', with the DB in use and with the DB doing nothing other than the vacuum. Sometimes I get errors about duplicated keys (an already known problem), but at the time I didn't cut and paste the error, believing it was my fault due to some mistake in my schema.

I can investigate more deeply, but I need your opinion on whether it is worth pursuing. As I said, it takes a long time to reproduce this (and maybe I'm doing something wrong).

Thanks a lot for your work,
andrea

n.b.: if I run other DBs on the same postgres I have no problems, but they are a lot smaller and get only a few reads and writes per day (3-4 inserts, without foreign keys, and about a hundred selects).
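Just to give an idea of the load pattern, this is roughly what the script does (a simplified sketch, not the real code: the parsing, the table layout and the column names below are placeholders, and the real schema has a few more tables and foreign keys):

    import glob
    import psycopg                                  # psycopg 1.x, as packaged for Python 2.1

    def parse_cddb_file(path):
        # stand-in for the real parser: pull the "KEY=value" lines out of one freecddb file
        fields = {}
        for line in open(path).readlines():
            if not line.startswith('#') and '=' in line:
                key, value = line.split('=', 1)
                fields[key.strip()] = value.strip()
        return fields

    conn = psycopg.connect("dbname=freecddb")       # example connection string
    curs = conn.cursor()

    for path in glob.glob('/path/to/freecddb/*/*'): # something like 500,000 small files
        fields = parse_cddb_file(path)
        artist = fields.get('DTITLE', '')
        # a few selects per file (e.g. does this author already exist?)...
        curs.execute("SELECT id FROM author WHERE name = %s", (artist,))
        if curs.fetchone() is None:
            # ...and a lot of inserts
            curs.execute("INSERT INTO author (name) VALUES (%s)", (artist,))
        conn.commit()                               # one commit per file -> a lot of transactions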
andrea gelmini <andrea.gelmini@linux.it> writes:
> vacuumdb -a -v -f -z
> NOTICE: Analyzing author
> FATAL 2: open of /home/postgres/db/pg_clog/0000 failed: No such file or directory
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> connection to server was lost
> vacuumdb: vacuum freecddb failed
> I get errors like this using vacuum both with and without '-f', with the DB
> in use and with the DB doing nothing other than the vacuum. Sometimes I get
> errors about duplicated keys (an already known problem), but at the time I
> didn't cut and paste the error, believing it was my fault due to some
> mistake in my schema.
> I can investigate more deeply, but I need your opinion on whether it is
> worth pursuing. As I said, it takes a long time to reproduce this (and
> maybe I'm doing something wrong).

Yup, it looks like a bug to me. Apparently a CLOG segment has been recycled too soon. We just found a bug of that ilk in sequence processing, but VACUUM doesn't touch sequences, so apparently you have a different bug. Please submit details.

Since CLOG segments normally hold a million transactions each, it'll necessarily take a long time to reproduce any problem of this kind. If you don't mind doing an initdb, you could reduce the CLOG segment size to make it easier to try to reproduce the problem. In src/backend/access/transam/clog.c change

    #define CLOG_XACTS_PER_SEGMENT  0x100000

to 0x10000 (I think that's about as small as you can make it without breaking anything). That gives you a shot at a problem every 64K transactions.

regards, tom lane
andrea gelmini <andrea.gelmini@linux.it> writes:
> On Fri, Jan 11, 2002 at 01:56:12 -0500, Tom Lane wrote:
>> Yup, it looks like a bug to me. Apparently a CLOG segment has been
>> recycled too soon. We just found a bug of that ilk in sequence
>> processing, but VACUUM doesn't touch sequences, so apparently you have
>> a different bug. Please submit details.
> Yes, I will retry tonight. What kind of information do you need?
> By the way, I can give you the script and everything else you need (it will
> be very easy for you to reproduce the problem... the script is very simple,
> and the schema is nothing more than a few tables).

Yes, I would like a copy of your script.

I just found and fixed a problem that could lead to this kind of failure (reference to an already-deleted CLOG entry). However, I'm not sure that the problem I fixed could affect VACUUM itself. Please update from cvs (make sure you get src/backend/utils/time/tqual.c v 1.46) and then see if you still see the problem. If you do, a backtrace from the point where elog() is called would be really helpful.

regards, tom lane
andrea gelmini <andrea.gelmini@linux.it> writes:
> You were right, and there is no need to vacuum the DB to crash the
> postmaster... with a fresh installation and a fresh initdb, starting my
> script (and a second bash script that does nothing more than a simple
> insert), after about one million transactions I can see the problem (after
> changing the sources as you told me), without vacuumdb.

Hm, you never did a VACUUM at all? Don't see how that can be --- removal of CLOG segments is triggered by VACUUM.

Please let me know if you can still see the problem with today's snapshot.

regards, tom lane
On Fri, Jan 11, 2002 at 01:56:12 -0500, Tom Lane wrote:
> Yup, it looks like a bug to me. Apparently a CLOG segment has been
> recycled too soon. We just found a bug of that ilk in sequence
> processing, but VACUUM doesn't touch sequences, so apparently you have
> a different bug. Please submit details.

Yes, I will retry tonight. What kind of information do you need?

By the way, I can give you the script and everything else you need (it will be very easy for you to reproduce the problem... the script is very simple, and the schema is nothing more than a few tables).

> If you don't mind doing an initdb, you could reduce the CLOG segment
> size to make it easier to try to reproduce the problem. In
> src/backend/access/transam/clog.c change
> #define CLOG_XACTS_PER_SEGMENT 0x100000
> to 0x10000 (I think that's about as small as you can make it without
> breaking anything). That gives you a shot at a problem every 64K
> transactions.

No problem, I do an initdb every time I recompile from CVS...

Thanks for your time,
andrea
On Fri, Jan 11, 2002 at 03:16:12 -0500, Tom Lane wrote:
> Yes, I would like a copy of your script.

I will package the script and 'instructions' soon. I also have two snapshots of the DB after the crash (tar files of the postgres directory).

> I just found and fixed a problem that could lead to this kind of failure
> (reference to an already-deleted CLOG entry). However, I'm not sure
> that the problem I fixed could affect VACUUM itself.

Yesterday I spent the afternoon doing a lot of tests... now, I think you don't need the details, because you were right, and there is no need to vacuum the DB to crash the postmaster: with a fresh installation and a fresh initdb, starting my script (and a second bash script that does nothing more than a simple insert), after about one million transactions I can see the problem (after changing the sources as you told me), without vacuumdb.

> Please update from cvs (make sure you get src/backend/utils/time/tqual.c
> v 1.46) and then see if you still see the problem. If you do, a backtrace
> from the point where elog() is called would be really helpful.

Tomorrow I will redo all my tests, but I guess the bug is fixed now.

Thanks a lot for your work,
andrea

n.b.: when you come to Italy I'll be happy to be your host...
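p.s.: in case it helps, the second load was nothing special; in spirit it was just something like this (only a sketch: the real thing was a bash loop around psql, and the table used here is just an example):

    import psycopg                              # psycopg 1.x for Python 2.1

    conn = psycopg.connect("dbname=freecddb")   # example connection string
    curs = conn.cursor()

    i = 0
    while 1:
        # one trivial insert per transaction, just to keep burning through transaction IDs
        curs.execute("INSERT INTO dummy (n) VALUES (%s)", (i,))
        conn.commit()
        i = i + 1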
On Sun, Jan 13, 2002 at 11:30:08 -0500, Tom Lane wrote:
> Hm, you never did a VACUUM at all? Don't see how that can be ---
> removal of CLOG segments is triggered by VACUUM.

Uhm... I was sure I hadn't done a vacuum... but... well... so I ran some more tests.

> Please let me know if you can still see the problem with today's snapshot.

I've tried over the last few days, and it seems everything works right now...

Thanks a lot for your time,
andrea