Re: Mysterious DB reset - Mailing list pgsql-general

From Thom Brown
Subject Re: Mysterious DB reset
Date
Msg-id CAA-aLv5qD1ZgoWS8U3wT=6YAjS-p_3zDxRdXbz+WpJegxGnWWQ@mail.gmail.com
Whole thread Raw
In response to Mysterious DB reset  (Israel Brewster <israel@eraalaska.net>)
Responses Re: Mysterious DB reset  (Israel Brewster <israel@eraalaska.net>)
Re: Mysterious DB reset  (Brent Wood <Brent.Wood@niwa.co.nz>)
List pgsql-general
On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd.  Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.

So if all the data went missing, and the sequence reset, the only thing I can think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

or someone restored the table structure from a backup that deleted the original table.

Do you log DDL?

Was the table partitioned?

You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.

--
Thom

pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Mysterious DB reset
Next
From: Tom Lane
Date:
Subject: Re: too-may-open-files log file entries when vauuming under solaris