Thread: PostgreSQL 8.1.0 catalog corruption

PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
I've been running 8.1.0 on a test box since the FreeBSD port has been  
available, and it appears that the catalog has become corrupted.   
There's plenty of free space on all volumes, so it hasn't run out of  
space or anything.

$ uname -a
FreeBSD shi.mochibot.com 6.0-RELEASE FreeBSD 6.0-RELEASE #2: Mon Nov   
7 14:34:52 PST 2005     root@shi.mochibot.com:/usr/obj/usr/src/sys/ 
CUSTOM  i386

$ psql mochi
Welcome to psql 8.1.0, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms       \h for help with SQL commands       \? for help with psql commands
\gor terminate with semicolon to execute query       \q to quit
 

mochi=# drop table ping_1132387200;
ERROR:  table "ping_1132387200" does not exist
mochi=# create table ping_1132387200();
ERROR:  type "ping_1132387200" already exists

I'm not sure what to do about this..

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 11:31 AM, Bob Ippolito wrote:

> I've been running 8.1.0 on a test box since the FreeBSD port has  
> been available, and it appears that the catalog has become  
> corrupted.  There's plenty of free space on all volumes, so it  
> hasn't run out of space or anything.
>
> $ uname -a
> FreeBSD shi.mochibot.com 6.0-RELEASE FreeBSD 6.0-RELEASE #2: Mon  
> Nov  7 14:34:52 PST 2005     root@shi.mochibot.com:/usr/obj/usr/src/ 
> sys/CUSTOM  i386
>
> $ psql mochi
> Welcome to psql 8.1.0, the PostgreSQL interactive terminal.
>
> Type:  \copyright for distribution terms
>        \h for help with SQL commands
>        \? for help with psql commands
>        \g or terminate with semicolon to execute query
>        \q to quit
>
> mochi=# drop table ping_1132387200;
> ERROR:  table "ping_1132387200" does not exist
> mochi=# create table ping_1132387200();
> ERROR:  type "ping_1132387200" already exists
>
> I'm not sure what to do about this..

Oh, and trying to drop the type doesn't work either:

mochi=# drop type ping_1132387200;
ERROR:  cache lookup failed for relation 211174567

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> mochi=# drop table ping_1132387200;
> ERROR:  table "ping_1132387200" does not exist
> mochi=# create table ping_1132387200();
> ERROR:  type "ping_1132387200" already exists

> I'm not sure what to do about this..

Try dropping the type.

We've seen at least one prior report of a table's rowtype not being
dropped with the table, but nothing solid enough to debug.  If you can
find a way to reproduce this, I'm all ears.
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 12:44 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> mochi=# drop table ping_1132387200;
>> ERROR:  table "ping_1132387200" does not exist
>> mochi=# create table ping_1132387200();
>> ERROR:  type "ping_1132387200" already exists
>
>> I'm not sure what to do about this..
>
> Try dropping the type.
>
> We've seen at least one prior report of a table's rowtype not being
> dropped with the table, but nothing solid enough to debug.  If you can
> find a way to reproduce this, I'm all ears.

I did try that, I guess it didn't make it to the list yet:

mochi=# drop type ping_1132387200;
ERROR:  cache lookup failed for relation 211174567

-bob






Re: PostgreSQL 8.1.0 catalog corruption

From
Alvaro Herrera
Date:
Bob Ippolito wrote:
> I've been running 8.1.0 on a test box since the FreeBSD port has been  
> available, and it appears that the catalog has become corrupted.   
> There's plenty of free space on all volumes, so it hasn't run out of  
> space or anything.
> 
> $ uname -a
> FreeBSD shi.mochibot.com 6.0-RELEASE FreeBSD 6.0-RELEASE #2: Mon Nov   
> 7 14:34:52 PST 2005     root@shi.mochibot.com:/usr/obj/usr/src/sys/ 
> CUSTOM  i386
> 
> $ psql mochi
> Welcome to psql 8.1.0, the PostgreSQL interactive terminal.
> 
> Type:  \copyright for distribution terms
>        \h for help with SQL commands
>        \? for help with psql commands
>        \g or terminate with semicolon to execute query
>        \q to quit
> 
> mochi=# drop table ping_1132387200;
> ERROR:  table "ping_1132387200" does not exist
> mochi=# create table ping_1132387200();
> ERROR:  type "ping_1132387200" already exists

Well, try

DROP TYPE ping_1132387200;

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> On Nov 21, 2005, at 12:44 PM, Tom Lane wrote:
>> Try dropping the type.

> I did try that, I guess it didn't make it to the list yet:

> mochi=# drop type ping_1132387200;
> ERROR:  cache lookup failed for relation 211174567

Hmm, apparently there are still entries in pg_depend for these objects.
Would you look to see what is in pg_depend with either objid or refobjid
equal to either 211174567 or the type's oid (possibly 211174568, but
check)?
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 1:14 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> On Nov 21, 2005, at 12:44 PM, Tom Lane wrote:
>>> Try dropping the type.
>
>> I did try that, I guess it didn't make it to the list yet:
>
>> mochi=# drop type ping_1132387200;
>> ERROR:  cache lookup failed for relation 211174567
>
> Hmm, apparently there are still entries in pg_depend for these  
> objects.
> Would you look to see what is in pg_depend with either objid or  
> refobjid
> equal to either 211174567 or the type's oid (possibly 211174568, but
> check)?

I don't know how to get the oid of a type.. but there are certainly  
entries in pg_depend with the other criteria:

mochi=# select * from pg_depend where objid=211174567;
classid |   objid   | objsubid | refclassid | refobjid  | refobjsubid  
| deptype
---------+-----------+----------+------------+----------- 
+-------------+---------    1259 | 211174567 |        0 |       2615 |      2200 |            
0 | n    1259 | 211174567 |        0 |       1259 | 103327322 |            
0 | n
(2 rows)

mochi=# select * from pg_depend where refobjid=211174567;
classid |   objid   | objsubid | refclassid | refobjid  | refobjsubid  
| deptype
---------+-----------+----------+------------+----------- 
+-------------+---------    2606 | 212119648 |        0 |       1259 | 211174567 |            
1 | a    2606 | 212119648 |        0 |       1259 | 211174567 |            
1 | n    2606 | 212119649 |        0 |       1259 | 211174567 |            
2 | a    2606 | 212119649 |        0 |       1259 | 211174567 |            
2 | n    1247 | 211174568 |        0 |       1259 | 211174567 |            
0 | i    2604 | 211174569 |        0 |       1259 | 211174567 |            
1 | a    1259 | 211174570 |        0 |       1259 | 211174567 |            
0 | i    2606 | 211174574 |        0 |       1259 | 211174567 |            
1 | a    1259 | 211174575 |        0 |       1259 | 211174567 |            
2 | a    1259 | 211174576 |        0 |       1259 | 211174567 |            
4 | a    1259 | 211174576 |        0 |       1259 | 211174567 |            
2 | a    1259 | 211174577 |        0 |       1259 | 211174567 |            
4 | a    1259 | 211174577 |        0 |       1259 | 211174567 |            
7 | a    1259 | 211174577 |        0 |       1259 | 211174567 |            
2 | a    1259 | 211174578 |        0 |       1259 | 211174567 |            
4 | a    1259 | 211174578 |        0 |       1259 | 211174567 |            
8 | a    1259 | 211174578 |        0 |       1259 | 211174567 |            
2 | a
(17 rows)

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> I don't know how to get the oid of a type.. but there are certainly  
> entries in pg_depend with the other criteria:

Hmph, looks like you still have a pretty full set of dependencies for
the table.  What about attributes --- tryselect attname from pg_attribute where attrelid = 211174567;

It might be interesting to try reindexing pg_class, too ... maybe the
pg_class row is still there but isn't being found because of a corrupt
index?
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 1:43 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> I don't know how to get the oid of a type.. but there are certainly
>> entries in pg_depend with the other criteria:
>
> Hmph, looks like you still have a pretty full set of dependencies for
> the table.  What about attributes --- try
>     select attname from pg_attribute where attrelid = 211174567;
>
> It might be interesting to try reindexing pg_class, too ... maybe the
> pg_class row is still there but isn't being found because of a corrupt
> index?

The attributes look like the names of all the columns in the table,  
and reindexing didn't help.

mochi=# select attname from pg_attribute where attrelid = 211174567;        attname
------------------------
tableoid
cmax
xmax
cmin
xmin
oid
ctid
id
timestamp
ip_address
mochiTag
mochiGUID
mochiVersion
movieURL
movieURLHost
movieReferrer
movieReferrerHost
movieUserAgent
movieSWFVersion
movieBytesLoaded
movieQuality
movieStageHeight
movieStageWidth
movieBytesTotal
movieStageAlign
movieScaleMode
movieShowMenu
userScreenReader
userCameras
userMicrophones
userSystemCapabilities
userTimeZoneOffset
userTicks
userUTCTime
(34 rows)

mochi=# reindex table pg_class;
REINDEX
mochi=# create table ping_1132387200();
ERROR:  type "ping_1132387200" already exists
mochi=# drop table ping_1132387200;
ERROR:  table "ping_1132387200" does not exist
mochi=# drop type ping_1132387200;
ERROR:  cache lookup failed for relation 211174567

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> The attributes look like the names of all the columns in the table,  
> and reindexing didn't help.

So at this point it seems that the pg_class row disappeared, but there
probably wasn't any actual DROP operation --- you'd think at least some
of those other entries would have been deleted by a DROP.

My next guess is that the pg_class row simply got clobbered somehow,
eg its xmin field got set to something ridiculous.  The only way I can
think of to investigate that is to dump out the contents of pg_class
with pg_filedump --- are you game for that?  If so, get the right
version of pg_filedump from 
http://sources.redhat.com/rhdb/utilities.html
and run it with the -i -f options (usually produces the most helpful
output, in my experience).
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 1:59 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> The attributes look like the names of all the columns in the table,
>> and reindexing didn't help.
>
> So at this point it seems that the pg_class row disappeared, but there
> probably wasn't any actual DROP operation --- you'd think at least  
> some
> of those other entries would have been deleted by a DROP.
>
> My next guess is that the pg_class row simply got clobbered somehow,
> eg its xmin field got set to something ridiculous.  The only way I can
> think of to investigate that is to dump out the contents of pg_class
> with pg_filedump --- are you game for that?  If so, get the right
> version of pg_filedump from
> http://sources.redhat.com/rhdb/utilities.html
> and run it with the -i -f options (usually produces the most helpful
> output, in my experience).

This is 8.1.0, can I use pg_dump 4.0 with that?  The entire database  
is 39GB, there's a way to just get pg_class, right?

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> Ok, so how do I figure out which file(s) are associated with pg_class  
> so I can feed this thing?

See contrib/oid2name and/or read
http://www.postgresql.org/docs/8.1/static/storage.html
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 2:42 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> Ok, so how do I figure out which file(s) are associated with pg_class
>> so I can feed this thing?
>
> See contrib/oid2name and/or read
> http://www.postgresql.org/docs/8.1/static/storage.html

Ok, here's the pg_filedump for the pg_class table in the mochi  
database that is having the issue:

(it's also there without gz, but the gz is 228KB versus 2352KB  
uncompressed)
http://undefined.org/mochi.pg_class.filedump.gz

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> Ok, here's the pg_filedump for the pg_class table in the mochi  
> database that is having the issue:

Thanks.  I don't see any live tuples that look like they could have been
the one we want, but there's a whole lot of deleted rows, which
pg_filedump won't show with those options.  Could you try a pg_filedump
with -d option too?
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 3:12 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> Ok, here's the pg_filedump for the pg_class table in the mochi
>> database that is having the issue:
>
> Thanks.  I don't see any live tuples that look like they could have  
> been
> the one we want, but there's a whole lot of deleted rows, which
> pg_filedump won't show with those options.  Could you try a  
> pg_filedump
> with -d option too?

Sure, here it is:

http://undefined.org/mochi.pg_class-1.filedump.gz

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> Sure, here it is:
> http://undefined.org/mochi.pg_class-1.filedump.gz

Well, I count at least a couple hundred deleted versions of that table
row :-(.  What the heck were you doing with it?  As far as I can think
offhand, only a schema modification would cause an update of the
pg_class row.
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 3:56 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> Sure, here it is:
>> http://undefined.org/mochi.pg_class-1.filedump.gz
>
> Well, I count at least a couple hundred deleted versions of that table
> row :-(.  What the heck were you doing with it?  As far as I can think
> offhand, only a schema modification would cause an update of the
> pg_class row.

There's an ETL process that does the following:

- Create a temp table
- COPY into the temp table
- do some transforms on the data
- create a dated table if it doesn't already exist
- fail miserably because the catalog is busted**

**: it normally does something else here, namely inserting a bunch of  
rows into the table

The ETL process keeps trying until it succeeds or someone stops it,  
so I guess that's why there's so much churn in there for that table.   
Kept trying to create it, and ran into the issue.  I'd estimate  
around 1700 to 1800 dead versions of that table, because it ran for  
some time before I noticed and stopped it... this is just a test box  
after all, I don't have 8.1 in production yet (thankfully!).

So what do I do now?  Kill the database, start over from a dump, and  
cross fingers that this doesn't pop up again?

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> On Nov 21, 2005, at 3:56 PM, Tom Lane wrote:
>> Well, I count at least a couple hundred deleted versions of that table
>> row :-(.  What the heck were you doing with it?

> The ETL process keeps trying until it succeeds or someone stops it,  
> so I guess that's why there's so much churn in there for that table.   
> Kept trying to create it, and ran into the issue.  I'd estimate  
> around 1700 to 1800 dead versions of that table, because it ran for  
> some time before I noticed and stopped it... this is just a test box  
> after all, I don't have 8.1 in production yet (thankfully!).

Um, no, that theory doesn't seem to explain the evidence.  A failed
insertion would result in a row with an uncommitted XMIN and no XMAX.
All of the entries I'm seeing have both XMIN and XMAX set.  A good-size
fraction have the same XMIN and XMAX (but different CMIN and CMAX), but
I see some that have different XMIN and XMAX.  It looks to me like the
table was definitely created successfully, and it survived across
multiple transactions ... but something was doing a lot of DDL changes
on it.  If we could find out what, maybe we could reproduce the problem.
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Bob Ippolito <bob@redivi.com> writes:
> > On Nov 21, 2005, at 3:56 PM, Tom Lane wrote:
> >> Well, I count at least a couple hundred deleted versions of that table
> >> row :-(.  What the heck were you doing with it?
> 
> > The ETL process keeps trying until it succeeds or someone stops it,  
> > so I guess that's why there's so much churn in there for that table.   
> > Kept trying to create it, and ran into the issue.  I'd estimate  
> > around 1700 to 1800 dead versions of that table, because it ran for  
> > some time before I noticed and stopped it... this is just a test box  
> > after all, I don't have 8.1 in production yet (thankfully!).
> 
> Um, no, that theory doesn't seem to explain the evidence.  A failed
> insertion would result in a row with an uncommitted XMIN and no XMAX.
> All of the entries I'm seeing have both XMIN and XMAX set.  A good-size
> fraction have the same XMIN and XMAX (but different CMIN and CMAX), but
> I see some that have different XMIN and XMAX.  It looks to me like the
> table was definitely created successfully, and it survived across
> multiple transactions ... but something was doing a lot of DDL changes
> on it.  If we could find out what, maybe we could reproduce the problem.

Maybe the UPDATE pg_class SET relhastriggers='f' that people is so fond
of doing to deactivate triggers?  Or something similar?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 4:33 PM, Alvaro Herrera wrote:

> Tom Lane wrote:
>> Bob Ippolito <bob@redivi.com> writes:
>>> On Nov 21, 2005, at 3:56 PM, Tom Lane wrote:
>>>> Well, I count at least a couple hundred deleted versions of that  
>>>> table
>>>> row :-(.  What the heck were you doing with it?
>>
>>> The ETL process keeps trying until it succeeds or someone stops it,
>>> so I guess that's why there's so much churn in there for that table.
>>> Kept trying to create it, and ran into the issue.  I'd estimate
>>> around 1700 to 1800 dead versions of that table, because it ran for
>>> some time before I noticed and stopped it... this is just a test box
>>> after all, I don't have 8.1 in production yet (thankfully!).
>>
>> Um, no, that theory doesn't seem to explain the evidence.  A failed
>> insertion would result in a row with an uncommitted XMIN and no XMAX.
>> All of the entries I'm seeing have both XMIN and XMAX set.  A good- 
>> size
>> fraction have the same XMIN and XMAX (but different CMIN and  
>> CMAX), but
>> I see some that have different XMIN and XMAX.  It looks to me like  
>> the
>> table was definitely created successfully, and it survived across
>> multiple transactions ... but something was doing a lot of DDL  
>> changes
>> on it.  If we could find out what, maybe we could reproduce the  
>> problem.
>
> Maybe the UPDATE pg_class SET relhastriggers='f' that people is so  
> fond
> of doing to deactivate triggers?  Or something similar?

I don't touch pg_class at all... this is what I'm doing (over and  
over again).
-- clone_table is almost always a no-op, but once a day it creates a  
new table        SELECT clone_table('ping', 'ping_%s', '')        SELECT drop_ping_constraints('ping_%s')-- stuff that
doesn'teffect DDLSELECT add_ping_constraints('ping_%s')
 

and the referenced UDFs are as follows:

CREATE OR REPLACE FUNCTION
clone_table(parent text, child text, extra text) RETURNS boolean
AS $$
DECLARE    tmprec record;    user_index record;    parent_constraint record;    user_index_column record;    indexname
text;   i integer;    columns text[];
 
BEGIN    -- are we done?    FOR tmprec IN            SELECT 1 FROM pg_sysviews.pg_user_tables WHERE  
table_name=child            LOOP
        RETURN FALSE;    END LOOP;
    -- inherit the table    EXECUTE 'CREATE TABLE '        || quote_ident(child)        || '('        || extra
||') INHERITS ('        || quote_ident(parent)        || ')';
 

    FOR parent_constraint IN            SELECT *            FROM pg_sysviews.pg_user_table_constraints A
WHEREA.table_name = parent            LOOP        EXECUTE 'ALTER TABLE '            || quote_ident(child)            ||
'ADD '            || parent_constraint.definition;    END LOOP;
 
    i := 0;    FOR user_index IN            SELECT *            FROM pg_sysviews.pg_user_indexes A            WHERE
          A.table_name = parent                AND A.index_name != (parent || '_pkey')            LOOP
 
        i := i + 1;        indexname := child;        columns := '{}'::text[];        FOR user_index_column IN
     SELECT B.column_name, quote_ident(B.column_name) AS col                FROM pg_sysviews.pg_user_index_columns B
           WHERE                    B.table_name = user_index.table_name                    AND B.index_name =
user_index.index_name               ORDER BY B.column_position                LOOP
 
            indexname := indexname || '_' ||  
user_index_column.column_name;            columns := array_append(columns, user_index_column.col);        END LOOP;
        IF user_index.predicate IS NOT NULL THEN            indexname := indexname || '_p' || i::text;        END IF;
        -- this is not complete, but works        -- missing tablespace, index_method, is_clustered,        EXECUTE
('CREATE'            || (CASE WHEN user_index.is_unique THEN 'UNIQUE ' ELSE  
 
'' END)            || 'INDEX '            || quote_ident(indexname)            || ' ON '            ||
quote_ident(child)           || ' USING '            || quote_ident(user_index.index_method)            || ' ('
  || array_to_string(columns, ',')            || ')'            || (CASE WHEN user_index.predicate IS NOT NULL
     THEN ' WHERE ' || user_index.predicate                ELSE '' END)        );
 

    END LOOP;
    RETURN TRUE;

END
$$ LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION
drop_ping_constraints(ping_table text) RETURNS void
AS $drop_ping_constraints$
DECLARE    next_sql text;    constraint_rec record;
BEGIN
    next_sql := $sql$        SELECT            "constraint_name"        FROM pg_sysviews.pg_user_table_constraints
 WHERE "constraint_name" IN ($sql$        || quote_literal(ping_table || '_timestamp_check')        || ', '        ||
quote_literal(ping_table|| '_id_check')        || ')';
 
    -- RAISE NOTICE 'SQL: %', next_sql;    FOR constraint_rec IN EXECUTE next_sql LOOP        next_sql := 'ALTER TABLE
'           || quote_ident(ping_table)            || ' DROP CONSTRAINT '            ||
quote_ident(constraint_rec.constraint_name);       -- RAISE NOTICE 'SQL: %', next_sql;        EXECUTE next_sql;    END
LOOP;
    RETURN;
END
$drop_ping_constraints$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION
add_ping_constraints(ping_table text) RETURNS void
AS $add_ping_constraints$
DECLARE    next_sql text;    extents_rec RECORD;
BEGIN
    next_sql := $sql$        SELECT            MIN("id") AS "min_id",            MAX("id") AS "max_id",
MIN("timestamp")AS "min_timestamp",            MAX("timestamp") AS "max_timestamp"        FROM $sql$        ||
quote_ident(ping_table);
    -- RAISE NOTICE 'SQL: %', next_sql;    FOR extents_rec IN EXECUTE next_sql LOOP    END LOOP;
    next_sql := 'ALTER TABLE '        || quote_ident(ping_table)        || ' ADD CHECK("id" BETWEEN '        ||
quote_literal(extents_rec.min_id)|| '::bigint'        || ' AND '        || quote_literal(extents_rec.max_id) ||
'::bigint'       || ')';    -- RAISE NOTICE 'SQL: %', next_sql;    EXECUTE next_sql;
 
    next_sql := 'ALTER TABLE '        || quote_ident(ping_table)        || ' ADD CHECK("timestamp" BETWEEN '        ||
quote_literal(extents_rec.min_timestamp)|| '::timestamptz'        || ' AND '        ||
quote_literal(extents_rec.max_timestamp)|| '::timestamptz'        || ')';
 
    -- RAISE NOTICE 'SQL: %', next_sql;    EXECUTE next_sql;
    RETURN;
END
$add_ping_constraints$ LANGUAGE plpgsql;


Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> I don't touch pg_class at all... this is what I'm doing (over and  
> over again).

>     -- clone_table is almost always a no-op, but once a day it creates a  
> new table
>          SELECT clone_table('ping', 'ping_%s', '')
>          SELECT drop_ping_constraints('ping_%s')
>     -- stuff that doesn't effect DDL
>     SELECT add_ping_constraints('ping_%s')

Hm, do the drop/add constraint functions get executed even when
clone_table decides not to make a new table?  If so, that would probably
explain the pattern I'm seeing in the dump of many updates of the
pg_class row.

This still doesn't give us a hint why the row disappeared, but maybe we
can try running these functions for awhile and see if anyone can
reproduce a failure.
        regards, tom lane


Re: PostgreSQL 8.1.0 catalog corruption

From
Bob Ippolito
Date:
On Nov 21, 2005, at 5:50 PM, Tom Lane wrote:

> Bob Ippolito <bob@redivi.com> writes:
>> I don't touch pg_class at all... this is what I'm doing (over and
>> over again).
>
>>     -- clone_table is almost always a no-op, but once a day it creates a
>> new table
>>          SELECT clone_table('ping', 'ping_%s', '')
>>          SELECT drop_ping_constraints('ping_%s')
>>     -- stuff that doesn't effect DDL
>>     SELECT add_ping_constraints('ping_%s')
>
> Hm, do the drop/add constraint functions get executed even when
> clone_table decides not to make a new table?  If so, that would  
> probably
> explain the pattern I'm seeing in the dump of many updates of the
> pg_class row.

Yes, they do.  The constraints are there for constraint exclusion.

> This still doesn't give us a hint why the row disappeared, but  
> maybe we
> can try running these functions for awhile and see if anyone can
> reproduce a failure.

If it matters, I have had the same code running on Bizgres 0.7.4 for  
quite some time with no issues at all.  I may just have to migrate  
the test server to Bizgres 0.8 if we can't figure out why PostgreSQL  
8.1.0 choked here.

-bob



Re: PostgreSQL 8.1.0 catalog corruption

From
Tom Lane
Date:
Bob Ippolito <bob@redivi.com> writes:
> On Nov 21, 2005, at 5:50 PM, Tom Lane wrote:
>> Hm, do the drop/add constraint functions get executed even when
>> clone_table decides not to make a new table?  If so, that would  
>> probably explain the pattern I'm seeing in the dump of many updates of the
>> pg_class row.

> Yes, they do.  The constraints are there for constraint exclusion.

I dug through the dump more closely and determined that the newest
remaining version of the ping_1132387200 row claims to have been
outdated by transaction 000d585f.  However, its ctid points to an item
slot that seems to have been reused by a much later transaction
(000fac5c).  So I'm afraid all the evidence is gone about what really
happened :-(.  If we had caught the problem earlier maybe we could have
learned more.  If you see it happen again, could you get dumps of
pg_class (in both dump formats) as quickly as possible?
        regards, tom lane