Re: Lost rows/data corruption? - Mailing list pgsql-general

From Andrew Hall
Subject Re: Lost rows/data corruption?
Date
Msg-id 008001c51313$a2d2d680$5001010a@bluereef.local
In response to Lost rows/data corruption?  ("Andrew Hall" <temp02@bluereef.com.au>)
Responses Re: Lost rows/data corruption?
List pgsql-general
Tom,

Here is the data you requested. It took a little while to gather, as this
kind of corruption doesn't happen all the time.

The first sign that something is wrong is our application freezing
when communicating with the DB.

If we then issue a vacuum, here's the output:

sonar=# VACUUM FULL;
WARNING:  index "user_session_pkey" contains 85613 row versions, but table
contains 85513 row versions
HINT:  Rebuild the index with REINDEX.
WARNING:  index "user_session_pkey" contains 85613 row versions, but table
contains 85513 row versions
HINT:  Rebuild the index with REINDEX.
VACUUM

If we then try a reindex, here's the output:

sonar=# REINDEX TABLE user_session;
ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.
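For what it's worth, the duplicated key values can be listed with a query
along these lines (a sketch; it assumes the heap itself is still readable):

```sql
-- List (user_id, session_id) pairs that appear more than once.
SELECT user_id, session_id, count(*) AS copies
FROM user_session
GROUP BY user_id, session_id
HAVING count(*) > 1;

-- Individual row versions can then be inspected (or deleted) by their
-- physical location via the system column ctid, e.g.:
-- DELETE FROM user_session WHERE ctid = '(0,1)';  -- hypothetical ctid
```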

There are duplicates and massive amounts of erroneous data in most columns
in the table. From here, no amount of row deletion seems to clear the issue;
the only remedy is to truncate the table and repopulate it. In fact, even a
simple SELECT produces this:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Here is the schema for the table:

sonar=# \d user_session
                                Table "public.user_session"
     Column     |            Type             |                 Modifiers
----------------+-----------------------------+----------------------------------------------------
 user_id        | integer                     | not null
 session_id     | integer                     | not null
 group_id       | integer                     |
 src            | character varying(22)       | not null
 hit_bytes_in   | bigint                      |
 miss_bytes_in  | bigint                      |
 logon_time     | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
 logoff_time    | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
 account_active | character(1)                | default 'Y'::bpchar
 hit_bytes_out  | bigint                      |
 miss_bytes_out | bigint                      |
 cost_bytes_in  | double precision            |
 cost_bytes_out | double precision            |
 time_cost      | double precision            |
Indexes:
    "user_session_pkey" primary key, btree (user_id, session_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (user_id) REFERENCES user_table(user_id) ON DELETE CASCADE
Triggers:
    delsessionusagetrigger AFTER DELETE ON user_session FOR EACH ROW EXECUTE PROCEDURE delsessionusagefunc()
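Once the duplicate rows have been removed, the damaged index alone can be
rebuilt rather than reindexing the whole table (a sketch, using the index
name shown in the schema above):

```sql
-- Rebuild only the corrupted primary key index.
REINDEX INDEX user_session_pkey;
```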

I'm most concerned that a primary key constraint is being actively violated,
and I don't understand how this is possible.

Any help appreciated.
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Andrew Hall" <temp02@bluereef.com.au>
Cc: <pgsql-general@postgresql.org>
Sent: Friday, February 04, 2005 10:12 AM
Subject: Re: [GENERAL] Lost rows/data corruption?


> "Andrew Hall" <temp02@bluereef.com.au> writes:
>> We have a long running DB application using PG7.4.6. We do a VACUUM FULL
>> every night and a normal 'maintenance' VACUUM every hour. We do nothing
>> with any indexes. Every now and then we get errors from the database
>> whereby an update will fail on a table saying that there is duplicate
>> violation on a primary keyed row. Theoretically this is impossible as the
>> constraint should prevent any duplication, ever. When the next vacuum is
>> run, we see an error saying that there are more rows than the database
>> can account for (or words to that effect) and we should either REINDEX
>> or TRUNCATE the table.
>
> Could we see the exact log trace, rather than handwaving?  Also the
> schemas of the table(s)/index(es) involved might be useful.
>
> regards, tom lane

