Re: Lost rows/data corruption? - Mailing list pgsql-general
From | Andrew Hall |
---|---|
Subject | Re: Lost rows/data corruption? |
Date | |
Msg-id | 008001c51313$a2d2d680$5001010a@bluereef.local Whole thread Raw |
In response to | Lost rows/data corruption? ("Andrew Hall" <temp02@bluereef.com.au>) |
Responses |
Re: Lost rows/data corruption?
|
List | pgsql-general |
Tom, Here is the data you requested. It took little while to gather it as this kind of corruption doesn't happen all the time. The first sign that we know something is wrong is our application freezing when communicating with the DB. If we then issue a vacuum, here's the output: sonar=# VACUUM FULL; WARNING: index "user_session_pkey" contains 85613 row versions, but table contains 85513 row versions HINT: Rebuild the index with REINDEX. WARNING: index "user_session_pkey" contains 85613 row versions, but table contains 85513 row versions HINT: Rebuild the index with REINDEX. VACUUM If we the try a reindex, here's the output: sonar=# REINDEX TABLE user_session; ERROR: could not create unique index DETAIL: Table contains duplicated values. There are duplicates and massive amounts of erroneous data in most columns in the table. From here, no amount of row deletion seems to clear the issue, the only remedy is a truncate on the table and repopulate. In fact if we try even a select we get this: WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. Here is the schema for the table: sonar=# \d user_session; Table "public.user_session" Column | Type | Modifiers ----------------+-----------------------------+--------------------------------- ------------------- user_id | integer | not null session_id | integer | not null group_id | integer | src | character varying(22) | not null hit_bytes_in | bigint | miss_bytes_in | bigint | logon_time | timestamp without time zone | default ('now'::text)::timestamp (6) with time zone logoff_time | timestamp without time zone | default ('now'::text)::timestamp (6) with time zone account_active | character(1) | default 'Y'::bpchar hit_bytes_out | bigint | miss_bytes_out | bigint | cost_bytes_in | double precision | cost_bytes_out | double precision | time_cost | double precision | Indexes: "user_session_pkey" primary key, btree (user_id, session_id) Foreign-key constraints: "$1" FOREIGN KEY (user_id) REFERENCES user_table(user_id) ON DELETE CASCADE Triggers: delsessionusagetrigger AFTER DELETE ON user_session FOR EACH ROW EXECUTE PRO CEDURE delsessionusagefunc() I'm most concerned that a primary constraint is being actively violated, and I don't understand how this is possible. Any help appreciated. ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Andrew Hall" <temp02@bluereef.com.au> Cc: <pgsql-general@postgresql.org> Sent: Friday, February 04, 2005 10:12 AM Subject: Re: [GENERAL] Lost rows/data corruption? > "Andrew Hall" <temp02@bluereef.com.au> writes: >> We have a long running DB application using PG7.4.6. We do a VACUUM FULL >> every night and a normal 'maintenance' VACUUM every hour. We do nothing >> with >> any indexes. Every now and then we get errors from the database whereby >> an >> update will fail on a table saying that there is duplicate violation on a >> primary keyed row. Theoretically this is impossible as the constraint >> should >> prevent any duplication, ever. When the next vacuum is run, we see an >> error >> saying that there are more rows than the database can account for (or >> words >> to that effect) and we should either REINDEX or TRUNCATE the table. > > Could we see the exact log trace, rather than handwaving? Also the > schemas of the table(s)/index(es) involved might be useful. > > regards, tom lane
pgsql-general by date: