Re: Mysterious DB reset - Mailing list pgsql-general

From Brent Wood
Subject Re: Mysterious DB reset
Date
Msg-id B30242D206AB9543A3406649674DB419C35423CF@welwexmb02.niwa.local
Whole thread Raw
In response to Re: Mysterious DB reset  (Thom Brown <thom@linux.com>)
List pgsql-general
One point - a serial datatype will not increment to infinity, as it is effectively a 4 byte integer with a sequence imposed, it can only store values upto MAXINT (2147483647) . Above this it may well wrap around where MAXINT + 1 = 1

You can delay the problem (significantly) by using bigserial (8 byte integer) instead of serial - this has MAXINT=9223372036854775807

http://www.postgresql.org/docs/9.2/static/datatype-numeric.html

Otherwise you might run a cron job or trigger to reset the serial values & the sequence when you think it timely.


I can't see how this would cause the missing records though.

Cheers

Brent Wood

Programme leader: Environmental Information Delivery
NIWA
DDI:  +64 (4) 3860529

From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Thom Brown [thom@linux.com]
Sent: Thursday, March 06, 2014 8:01 AM
To: Israel Brewster
Cc: PGSQL Mailing List
Subject: Re: [GENERAL] Mysterious DB reset

On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd.  Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.

So if all the data went missing, and the sequence reset, the only thing I can think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

or someone restored the table structure from a backup that deleted the original table.

Do you log DDL?

Was the table partitioned?

You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.

--
Thom
--
Please consider the environment before printing this email.
NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.

pgsql-general by date:

Previous
From: Susan Cassidy
Date:
Subject: Re: another trigger problem
Next
From: "Raschick, Hartmut"
Date:
Subject: Re: too-may-open-files log file entries when vauuming under solaris