Thread: Mysterious DB reset

Mysterious DB reset

From
Israel Brewster
Date:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------


Attachment

Re: Mysterious DB reset

From
Alvaro Herrera
Date:
Israel Brewster wrote:

> So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly
causethis behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset?
Especiallywhile there is an active connection? Thanks for any ideas, however wild or off the wall :-) 

Is this running off a NFS mount or something?  I'm wondering about the
filesystem getting unmounted and an empty copy of the database being in
the mountpoint.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: Mysterious DB reset

From
Thom Brown
Date:
On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd.  Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.

So if all the data went missing, and the sequence reset, the only thing I can think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

or someone restored the table structure from a backup that deleted the original table.

Do you log DDL?

Was the table partitioned?

You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.

--
Thom

Re: Mysterious DB reset

From
Adrian Klaver
Date:
On 03/05/2014 10:22 AM, Israel Brewster wrote:
> I have a Postgresql 9.2.4 database containing real-time tracking data
> for our aircraft for the past week (theoretically). It is populated by
> two different processes: one that runs every few minutes, retrieving
> data from a number of sources and storing it in the DB, and one that has
> an "always on" connection to the DB streaming data into the database in
> realtime (often several records per second). To keep the database size
> manageable I have a cron job that runs every evening to delete all
> records that are more than a week old, after archiving a subset of them
> in permanent storage.
>
> This morning my boss e-mailed me, complaining that only a couple of
> aircraft were showing up in the list (SELECT distinct(tail) FROM data
> being the command that populates the list). Upon looking at the data I
> saw that it only went back to 4am this morning, rather than the week I
> was expecting. My first thought was "Oh, I must have a typo in my
> cleanup routine, such that it is deleting all records rather than only
> those a week old, and it's just that no one has noticed until now". So I
> looked at that, but changing the delete to a select appeared to produce
> the proper results, in that no records were selected:

Well it would, if the records only go back to 4 AM this morning. In
other words if no records exist before 4 AM today, no records exist
before 7 days ago also or am I missing something?

>
> DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';
>
> Then I noticed something even more odd. My database has an id column,
> which is defined as a SERIAL. As we all know, a serial is a
> monotonically increasing number that is not affected by deletes.
> However, the oldest record in my database, from 4am this morning, had an
> id of 1. Even though I KNOW there was data in the system yesterday. Even
> if my DELETE command was wrong and deleted ALL records, that shouldn't
> have reset the SERIAL column to 1! I also know that I have not been in
> the database mucking around with the sequence value - to be completely
> honest, I don't even know the exact command to reset it - I'd have to
> google it if I wanted to.

A sequence is just a special table.

So what does SELECT * from the sequence show?

>
> Also odd is that my cleanup script runs at 1am. I have records of there
> being new data in the database up to 3:51am, but the oldest record
> currently in the DB is from 4:45am (as specified by the default of now()
> on the column). So I know records were added after my delete command
> ran, but before this reset occurred.

I am not sure what you are calling the 'reset'?
Did something happen between 3:51 AM and 4:45 AM?
Also not sure why you call the 4:45 AM record the oldest, when you say
you can identify records from 3:51 AM?

>
> So my question is, aside from someone going in and mucking about in the
> wee hours of the morning, what could possibly cause this behavior? What
> sort of event could cause all data to be deleted from the table, and the
> sequence to be reset? Especially while there is an active connection?
> Thanks for any ideas, however wild or off the wall :-)

What is in the Postgres/system logs for the time period(s) you mention?

>
> -----------------------------------------------
> Israel Brewster
> Computer Support Technician II
> Era Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7250 x7293
> -----------------------------------------------
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Mysterious DB reset

From
Israel Brewster
Date:
For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on
appearsto be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 

On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 03/05/2014 10:22 AM, Israel Brewster wrote:
>> My first thought was "Oh, I must have a typo in my
>> cleanup routine, such that it is deleting all records rather than only
>> those a week old, and it's just that no one has noticed until now". So I
>> looked at that, but changing the delete to a select appeared to produce
>> the proper results, in that no records were selected:
>
> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM
today,no records exist before 7 days ago also or am I missing something? 

If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete
wasdeleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to
confirmthe delete is correct, and therefore not the problem. 

> A sequence is just a special table.
>
> So what does SELECT * from the sequence show?

tracking=> SELECT * FROM data_id_seq;
 sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt |
is_cycled| is_called  

---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
 data_id_seq   |       1184 |           1 |            1 | 9223372036854775807 |         1 |           1 |      16 | f
      | t 


>
>>
>> Also odd is that my cleanup script runs at 1am. I have records of there
>> being new data in the database up to 3:51am, but the oldest record
>> currently in the DB is from 4:45am (as specified by the default of now()
>> on the column). So I know records were added after my delete command
>> ran, but before this reset occurred.
>
> I am not sure what you are calling the 'reset'?
> Did something happen between 3:51 AM and 4:45 AM?

Yes: All my data was deleted and the sequence reset to 1.

> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM?

As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various
reasons).That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist
inthe local database any more. 

>
>>
>> So my question is, aside from someone going in and mucking about in the
>> wee hours of the morning, what could possibly cause this behavior? What
>> sort of event could cause all data to be deleted from the table, and the
>> sequence to be reset? Especially while there is an active connection?
>> Thanks for any ideas, however wild or off the wall :-)
>
> What is in the Postgres/system logs for the time period(s) you mention?

The postgres log has a lot of errors in it, some of which MAY explain the issue. For example:

cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp
pg_xlog/000000010000000400000094/mnt/pglogs/000000010000000400000094 
WARNING:  transaction log file "000000010000000400000094" could not be archived: too many failures
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down

However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current entries,
orfrom back before I got the mount for the logs working. At this time, the mount point IS working correctly, and from
whatI can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar 5, 9:21,
whenI was messing with it yesterday), however, so there are no entries from this morning when it happened again. I
don'tsee anything of interest in the syslog or messages log. 

>
>>
>> -----------------------------------------------
>> Israel Brewster
>> Computer Support Technician II
>> Era Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7250 x7293
>> -----------------------------------------------
>>
>>
>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com



Re: Mysterious DB reset

From
Israel Brewster
Date:
On Mar 5, 2014, at 10:01 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Israel Brewster wrote:
>
>> So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could
possiblycause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to
bereset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-) 
>
> Is this running off a NFS mount or something?  I'm wondering about the
> filesystem getting unmounted and an empty copy of the database being in
> the mount point.

Good thought - I hadn't thought of a file system level reset. Unfortunately the data folder is on the root partition,
whichis a standard SATA hard drive, and the date on the data folder shows February 27th, so apparently the folder
itselfat least hasn't been re-created. 

>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------


Attachment

Re: Mysterious DB reset

From
Israel Brewster
Date:
On Mar 5, 2014, at 10:01 AM, Thom Brown <thom@linux.com> wrote:

On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd.  Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.

Not famillar with a logged vs unlogged table (still learning all the features of PostgreSQL), but as you said the sequence resetting is rather odd.


So if all the data went missing, and the sequence reset, the only thing I can think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

Considering that I'm the only one in the company that knows SQL at all beyond a simple single-table select (I keep having to explain joins and how they aren't evil to the other programmer here), not likely :-)


or someone restored the table structure from a backup that deleted the original table.

Now that's a thought...Maybe my backup routine is working backwards (pushing from the backup archive to the primary machine rather than from the primary to the backup). I did switch primary and secondary machines a while ago, but I thought I had checked that. What would be the effect of overwriting the data files while the database is active?


Do you log DDL?

Not sure what that is, so I'll assume no :-)


Was the table partitioned?

Nope.


You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.

I'll look at updating. Thanks.


--
Thom


-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------

Attachment

Re: Mysterious DB reset

From
Michael Nolan
Date:
On 3/6/14, Israel Brewster <israel@eraalaska.net> wrote:



> LOG:  received smart shutdown request
> LOG:  autovacuum launcher shutting down
> LOG:  shutting down
> LOG:  database system is shut down
>
> However, there are no timestamps on any of the entries (can I fix that?)

Yes, change the log_line_prefix in the postgresql.conf file and reload it.

I use:

log_line_prefix = '%m %u '

You might also want to use this, at least temporarily:

log_statement = all

--
Mike Nolan


Re: Mysterious DB reset

From
Adrian Klaver
Date:
On 03/06/2014 09:33 AM, Israel Brewster wrote:
> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on
appearsto be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 
>
> On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
>> On 03/05/2014 10:22 AM, Israel Brewster wrote:
>>> My first thought was "Oh, I must have a typo in my
>>> cleanup routine, such that it is deleting all records rather than only
>>> those a week old, and it's just that no one has noticed until now". So I
>>> looked at that, but changing the delete to a select appeared to produce
>>> the proper results, in that no records were selected:
>>
>> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM
today,no records exist before 7 days ago also or am I missing something? 
>
> If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete
wasdeleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to
confirmthe delete is correct, and therefore not the problem. 
>
>> A sequence is just a special table.
>>
>> So what does SELECT * from the sequence show?
>
> tracking=> SELECT * FROM data_id_seq;
>   sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt |
is_cycled| is_called 
>
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
>   data_id_seq   |       1184 |           1 |            1 | 9223372036854775807 |         1 |           1 |      16 |
f        | t 
>
>
>>
>>>
>>> Also odd is that my cleanup script runs at 1am. I have records of there
>>> being new data in the database up to 3:51am, but the oldest record
>>> currently in the DB is from 4:45am (as specified by the default of now()
>>> on the column). So I know records were added after my delete command
>>> ran, but before this reset occurred.
>>
>> I am not sure what you are calling the 'reset'?
>> Did something happen between 3:51 AM and 4:45 AM?
>
> Yes: All my data was deleted and the sequence reset to 1.
>
>> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM?
>
> As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various
reasons).That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist
inthe local database any more. 

Well something is happening. See my notes on logging below to help track
down the cause.

>
>>
>>>
>>> So my question is, aside from someone going in and mucking about in the
>>> wee hours of the morning, what could possibly cause this behavior? What
>>> sort of event could cause all data to be deleted from the table, and the
>>> sequence to be reset? Especially while there is an active connection?
>>> Thanks for any ideas, however wild or off the wall :-)
>>
>> What is in the Postgres/system logs for the time period(s) you mention?
>
> The postgres log has a lot of errors in it, some of which MAY explain the issue. For example:
>
> cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied
> LOG:  archive command failed with exit code 1
> DETAIL:  The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp
pg_xlog/000000010000000400000094/mnt/pglogs/000000010000000400000094 
> WARNING:  transaction log file "000000010000000400000094" could not be archived: too many failures
> LOG:  received smart shutdown request
> LOG:  autovacuum launcher shutting down
> LOG:  shutting down
> LOG:  database system is shut down

Might be good to explain your archive setup.

>
> However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current
entries,or from back before I got the mount for the logs working. At this time, the mount point IS working correctly,
andfrom what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar
5,9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened
again.I don't see anything of interest in the syslog or messages log. 

Yes you can, timestamps and a lot more. For all the details go here:

http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html

At the least I would:

log_statement = 'mod'

log_connections = on
log_disconnections = on

log_line_prefix = '%u-%m-%x'

>
>>
>>>
>>> -----------------------------------------------
>>> Israel Brewster
>>> Computer Support Technician II
>>> Era Alaska
>>> 5245 Airport Industrial Rd
>>> Fairbanks, AK 99709
>>> (907) 450-7250 x7293
>>> -----------------------------------------------
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>
>
>



Re: Mysterious DB reset

From
Israel Brewster
Date:
On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 03/06/2014 09:33 AM, Israel Brewster wrote:
>> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going
onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 
>>
>> On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>>> On 03/05/2014 10:22 AM, Israel Brewster wrote:
>>>> My first thought was "Oh, I must have a typo in my
>>>> cleanup routine, such that it is deleting all records rather than only
>>>> those a week old, and it's just that no one has noticed until now". So I
>>>> looked at that, but changing the delete to a select appeared to produce
>>>> the proper results, in that no records were selected:
>>>
>>> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM
today,no records exist before 7 days ago also or am I missing something? 
>>
>> If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete
wasdeleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to
confirmthe delete is correct, and therefore not the problem. 
>>
>>> A sequence is just a special table.
>>>
>>> So what does SELECT * from the sequence show?
>>
>> tracking=> SELECT * FROM data_id_seq;
>>  sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt |
is_cycled| is_called 
>>
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
>>  data_id_seq   |       1184 |           1 |            1 | 9223372036854775807 |         1 |           1 |      16 |
f        | t 
>>
>>
>>>
>>>>
>>>> Also odd is that my cleanup script runs at 1am. I have records of there
>>>> being new data in the database up to 3:51am, but the oldest record
>>>> currently in the DB is from 4:45am (as specified by the default of now()
>>>> on the column). So I know records were added after my delete command
>>>> ran, but before this reset occurred.
>>>
>>> I am not sure what you are calling the 'reset'?
>>> Did something happen between 3:51 AM and 4:45 AM?
>>
>> Yes: All my data was deleted and the sequence reset to 1.
>>
>>> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM?
>>
>> As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various
reasons).That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist
inthe local database any more. 
>
> Well something is happening. See my notes on logging below to help track down the cause.

Yep.

>
>>
>>>
>>>>
>>>> So my question is, aside from someone going in and mucking about in the
>>>> wee hours of the morning, what could possibly cause this behavior? What
>>>> sort of event could cause all data to be deleted from the table, and the
>>>> sequence to be reset? Especially while there is an active connection?
>>>> Thanks for any ideas, however wild or off the wall :-)
>>>
>>> What is in the Postgres/system logs for the time period(s) you mention?
>>
>> The postgres log has a lot of errors in it, some of which MAY explain the issue. For example:
>>
>> cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied
>> LOG:  archive command failed with exit code 1
>> DETAIL:  The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp
pg_xlog/000000010000000400000094/mnt/pglogs/000000010000000400000094 
>> WARNING:  transaction log file "000000010000000400000094" could not be archived: too many failures
>> LOG:  received smart shutdown request
>> LOG:  autovacuum launcher shutting down
>> LOG:  shutting down
>> LOG:  database system is shut down
>
> Might be good to explain your archive setup.

Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is also
ahot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. 

Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of
themin our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE
idin (...) ) 

Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the
databaseto a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts
(suchas my data retrieval scripts) from the primary machine to a backup location on the secondary machine. 

None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data

Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather
thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) 

>
>>
>> However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current
entries,or from back before I got the mount for the logs working. At this time, the mount point IS working correctly,
andfrom what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar
5,9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened
again.I don't see anything of interest in the syslog or messages log. 
>
> Yes you can, timestamps and a lot more. For all the details go here:
>
> http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html
>
> At the least I would:
>
> log_statement = 'mod'
>
> log_connections = on
> log_disconnections = on
>
> log_line_prefix = '%u-%m-%x'

I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information.
Thanksfor the link and information! 

>
>>
>>>
>>>>
>>>> -----------------------------------------------
>>>> Israel Brewster
>>>> Computer Support Technician II
>>>> Era Alaska
>>>> 5245 Airport Industrial Rd
>>>> Fairbanks, AK 99709
>>>> (907) 450-7250 x7293
>>>> -----------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Adrian Klaver
>>> adrian.klaver@aklaver.com


-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------




Attachment

Re: Mysterious DB reset

From
Adrian Klaver
Date:
On 03/06/2014 10:43 AM, Israel Brewster wrote:
> On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>

>>
>> Well something is happening. See my notes on logging below to help track down the cause.
>
> Yep.
>

>>
>> Might be good to explain your archive setup.
>
> Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is
alsoa hot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. 
>
> Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of
themin our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE
idin (...) ) 
>
> Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the
databaseto a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts
(suchas my data retrieval scripts) from the primary machine to a backup location on the secondary machine. 
>
> None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data

Are all the scripts running from one machine?
If so, have you checked that the times are set correctly on the various
machines?


>
> Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather
thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) 

'Makes sense' is context sensitive. It really depends on what you want
to achieve. My procedure is to define the end result first and then work
backwards from there.

>
>>

>
> I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information.
Thanksfor the link and information! 
>
>



Re: Mysterious DB reset

From
Israel Brewster
Date:
On Mar 6, 2014, at 10:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 03/06/2014 10:43 AM, Israel Brewster wrote:
>> On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>
>>>
>>> Well something is happening. See my notes on logging below to help track down the cause.
>>
>> Yep.
>>
>
>>>
>>> Might be good to explain your archive setup.
>>
>> Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is
alsoa hot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. 
>>
>> Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of
themin our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE
idin (...) ) 
>>
>> Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the
databaseto a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts
(suchas my data retrieval scripts) from the primary machine to a backup location on the secondary machine. 
>>
>> None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data
>
> Are all the scripts running from one machine?
> If so, have you checked that the times are set correctly on the various machines?

Three different machines (due to OS requirements), but yeah the times all appear to be correct.

>
>
>>
>> Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather
thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) 
>
> 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the end
resultfirst and then work backwards from there. 

Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in
the"is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D 

>
>>
>>>
>
>>
>> I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more
information.Thanks for the link and information! 
>>
>>
>

-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------


Attachment

Re: Mysterious DB reset

From
Adrian Klaver
Date:
On 03/06/2014 11:09 AM, Israel Brewster wrote:
> On Mar 6, 2014, at 10:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>

>>
>> Are all the scripts running from one machine?
>> If so, have you checked that the times are set correctly on the various machines?
>
> Three different machines (due to OS requirements), but yeah the times all appear to be correct.

So it appears you will have to wait to see what the logging reports.
Should have mentioned that you need to make sure you do something like
pg_ctl reload on the Postgres server to get the postgressql.conf changes
to take effect.


Another thought. Might not be bad idea to grep your scripts for ALTER,
SEQUENCE or any other relevant keywords. Just in case something slipped
in you where not aware of.

>
>>
>>
>>>
>>> Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather
thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) 
>>
>> 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the end
resultfirst and then work backwards from there. 
>
> Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in
the"is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D 

I understood the basics of what you are doing. The details probably can
wait pending the log information. Hey, all of us are beginners/learning
in something.

>

>
> -----------------------------------------------
> Israel Brewster
> Computer Support Technician II
> Era Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7250 x7293
> -----------------------------------------------
>



Re: Mysterious DB reset

From
David Johnston
Date:
Israel Brewster-2 wrote
> So my question is, aside from someone going in and mucking about in the
> wee hours of the morning, what could possibly cause this behavior? What
> sort of event could cause all data to be deleted from the table, and the
> sequence to be reset? Especially while there is an active connection?
> Thanks for any ideas, however wild or off the wall :-)

Nothing so far makes this likely but any chance there is some kind of
Virtual Machine setup in place where all the changes from a given day are
being lost because the VM is resetting back to "factory defaults"?

Also, you say you perform daily pg_dumps.  Have you tried loading these up
and see what their contents are?

Ultimately the log files are going to be needed to do any meaningful
forensic work though.

Any chance you may been debugging the wrong box/database?  That is always
something worth verifying and usually not the difficult.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Mysterious-DB-reset-tp5794868p5795031.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Mysterious DB reset

From
Steve Crawford
Date:
On 03/06/2014 09:33 AM, Israel Brewster wrote:
> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on
appearsto be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 
>
>
> Also odd is that my cleanup script runs at 1am. I have records of there
> being new data in the database up to 3:51am, but the oldest record
> currently in the DB is from 4:45am (as specified by the default of now()
> on the column). So I know records were added after my delete command
> ran, but before this reset occurred.
>
A shot in the dark...

Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and
the crontabs of any automatic scripts that connect. I'm not sure about
Slackware but Red Hat and Centos run the cron.daily scripts at (wait for
it...) just after 4am.

Some of the default daily scripts like logrotate can have "side effects"
like restarting the service that writes to the log file being rotated.

Cheers,
Steve


Re: Mysterious DB reset

From
Israel Brewster
Date:
So the logs have given me a partial answer, specifically WHO and WHAT. I still have to track down WHY:

[unknown]-2014-03-07 04:40:02.407 AKST-0LOG:  connection received: host=[local]
postgres-2014-03-07 04:40:02.409 AKST-0LOG:  connection authorized: user=postgres database=tracking
postgres-2014-03-07 04:40:02.521 AKST-0LOG:  statement: DROP INDEX public.timerecp_idx;
postgres-2014-03-07 04:40:02.548 AKST-0LOG:  statement: DROP INDEX public.tail_idx;
postgres-2014-03-07 04:40:02.553 AKST-0LOG:  statement: DROP INDEX public.syncd_idx;
postgres-2014-03-07 04:40:02.562 AKST-0LOG:  statement: DROP INDEX public.pointtime_idx;
postgres-2014-03-07 04:40:02.571 AKST-0LOG:  statement: ALTER TABLE ONLY public.tails DROP CONSTRAINT tails_pkey;
postgres-2014-03-07 04:40:02.577 AKST-0LOG:  statement: ALTER TABLE ONLY public.data DROP CONSTRAINT data_pkey;
postgres-2014-03-07 04:40:02.588 AKST-0LOG:  statement: ALTER TABLE public.tails ALTER COLUMN id DROP DEFAULT;
postgres-2014-03-07 04:40:02.593 AKST-0LOG:  statement: ALTER TABLE public.data ALTER COLUMN id DROP DEFAULT;
postgres-2014-03-07 04:40:02.601 AKST-0LOG:  statement: DROP SEQUENCE public.tails_id_seq;
postgres-2014-03-07 04:40:02.611 AKST-0LOG:  statement: DROP TABLE public.tails;
postgres-2014-03-07 04:40:02.619 AKST-0LOG:  statement: DROP SEQUENCE public.data_id_seq;
postgres-2014-03-07 04:40:02.627 AKST-0LOG:  statement: DROP TABLE public.data;
postgres-2014-03-07 04:40:02.643 AKST-0LOG:  statement: DROP EXTENSION plpgsql;
postgres-2014-03-07 04:40:02.654 AKST-0LOG:  statement: DROP SCHEMA public;
postgres-2014-03-07 04:40:02.663 AKST-0LOG:  statement: CREATE SCHEMA public;
postgres-2014-03-07 04:40:02.671 AKST-0LOG:  statement: ALTER SCHEMA public OWNER TO postgres;
postgres-2014-03-07 04:40:02.671 AKST-0LOG:  statement: COMMENT ON SCHEMA public IS 'standard public schema';
postgres-2014-03-07 04:40:02.679 AKST-0LOG:  statement: CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
postgres-2014-03-07 04:40:02.689 AKST-0LOG:  statement: COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
postgres-2014-03-07 04:40:02.697 AKST-0LOG:  statement: CREATE TABLE data (
            id bigint NOT NULL,
            tail character varying(16) NOT NULL,
            timerecp timestamp without time zone DEFAULT now() NOT NULL,
            altitude integer,
            pointtime timestamp without time zone,
            lat numeric(7,5) NOT NULL,
            lng numeric(8,5) NOT NULL,
            speed integer,
            heading integer,
            source character varying(64),
            syncd boolean DEFAULT false
        );
postgres-2014-03-07 04:40:02.707 AKST-0LOG:  statement: ALTER TABLE public.data OWNER TO tracking;
postgres-2014-03-07 04:40:02.715 AKST-0LOG:  statement: CREATE SEQUENCE data_id_seq
            START WITH 1
            INCREMENT BY 1
            NO MINVALUE
            NO MAXVALUE
            CACHE 1;
postgres-2014-03-07 04:40:02.723 AKST-0LOG:  statement: ALTER TABLE public.data_id_seq OWNER TO tracking;
postgres-2014-03-07 04:40:02.732 AKST-0LOG:  statement: ALTER SEQUENCE data_id_seq OWNED BY data.id;
postgres-2014-03-07 04:40:02.740 AKST-0LOG:  statement: CREATE TABLE tails (
            id integer NOT NULL,
            tailnum character varying(8),
            trackingdevicenumber character varying(256)
        );
postgres-2014-03-07 04:40:02.749 AKST-0LOG:  statement: ALTER TABLE public.tails OWNER TO tracking;
postgres-2014-03-07 04:40:02.757 AKST-0LOG:  statement: CREATE SEQUENCE tails_id_seq
            START WITH 1
            INCREMENT BY 1
            NO MINVALUE
            NO MAXVALUE
            CACHE 1;
postgres-2014-03-07 04:40:02.766 AKST-0LOG:  statement: ALTER TABLE public.tails_id_seq OWNER TO tracking;
postgres-2014-03-07 04:40:02.774 AKST-0LOG:  statement: ALTER SEQUENCE tails_id_seq OWNED BY tails.id;
postgres-2014-03-07 04:40:02.782 AKST-0LOG:  statement: ALTER TABLE ONLY data ALTER COLUMN id SET DEFAULT
nextval('data_id_seq'::regclass);
postgres-2014-03-07 04:40:02.791 AKST-0LOG:  statement: ALTER TABLE ONLY tails ALTER COLUMN id SET DEFAULT
nextval('tails_id_seq'::regclass);
postgres-2014-03-07 04:40:02.799 AKST-0LOG:  statement: ALTER TABLE ONLY data
            ADD CONSTRAINT data_pkey PRIMARY KEY (id);
postgres-2014-03-07 04:40:03.066 AKST-0LOG:  statement: ALTER TABLE ONLY tails
            ADD CONSTRAINT tails_pkey PRIMARY KEY (id);
postgres-2014-03-07 04:40:03.116 AKST-0LOG:  statement: CREATE INDEX pointtime_idx ON data USING btree (pointtime);
postgres-2014-03-07 04:40:03.158 AKST-0LOG:  statement: CREATE INDEX syncd_idx ON data USING btree (syncd);
postgres-2014-03-07 04:40:03.208 AKST-0LOG:  statement: CREATE INDEX tail_idx ON data USING btree (tail);
postgres-2014-03-07 04:40:03.258 AKST-0LOG:  statement: CREATE INDEX timerecp_idx ON data USING btree (timerecp);
postgres-2014-03-07 04:40:03.309 AKST-0LOG:  statement: REVOKE ALL ON SCHEMA public FROM PUBLIC;
postgres-2014-03-07 04:40:03.317 AKST-0LOG:  statement: REVOKE ALL ON SCHEMA public FROM postgres;
postgres-2014-03-07 04:40:03.325 AKST-0LOG:  statement: GRANT ALL ON SCHEMA public TO postgres;
postgres-2014-03-07 04:40:03.333 AKST-0LOG:  statement: GRANT ALL ON SCHEMA public TO PUBLIC;
postgres-2014-03-07 04:40:03.342 AKST-0LOG:  disconnection: session time: 0:00:00.935 user=postgres database=tracking
host=[local]

So that definitely answers what is happening: Every morning at 4:40, the tables get dropped an re-created. Wow. I KNOW
Inever wrote a script to do that! I'm sort of thinking I somehow have a pg_dump with a -c flag (or perhaps the file
fromsuch) that is being loaded in, but I still don't know how or why. Closer though. Thanks for the suggestions! 
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------



On Mar 6, 2014, at 10:34 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 03/06/2014 11:09 AM, Israel Brewster wrote:
>> On Mar 6, 2014, at 10:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>
>>>
>>> Are all the scripts running from one machine?
>>> If so, have you checked that the times are set correctly on the various machines?
>>
>> Three different machines (due to OS requirements), but yeah the times all appear to be correct.
>
> So it appears you will have to wait to see what the logging reports. Should have mentioned that you need to make sure
youdo something like pg_ctl reload on the Postgres server to get the postgressql.conf changes to take effect. 
>
>
> Another thought. Might not be bad idea to grep your scripts for ALTER, SEQUENCE or any other relevant keywords. Just
incase something slipped in you where not aware of. 
>
>>
>>>
>>>
>>>>
>>>> Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself
ratherthan having formal training :-) I'm DEFINITELY open to suggestions :-) 
>>>
>>> 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the
endresult first and then work backwards from there. 
>>
>> Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in
the"is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D 
>
> I understood the basics of what you are doing. The details probably can wait pending the log information. Hey, all of
usare beginners/learning in something. 
>
>>
>
>>
>> -----------------------------------------------
>> Israel Brewster
>> Computer Support Technician II
>> Era Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7250 x7293
>> -----------------------------------------------
>>
>


Attachment

Re: Mysterious DB reset

From
Israel Brewster
Date:
On Mar 6, 2014, at 1:25 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:

> On 03/06/2014 09:33 AM, Israel Brewster wrote:
>> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going
onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 
>>
>>
>> Also odd is that my cleanup script runs at 1am. I have records of there
>> being new data in the database up to 3:51am, but the oldest record
>> currently in the DB is from 4:45am (as specified by the default of now()
>> on the column). So I know records were added after my delete command
>> ran, but before this reset occurred.
>>
> A shot in the dark...
>
> Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that
connect.I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after
4am.

Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that ran the following line at 4:40am:

/usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres
tracking

It must have been left over from before I got streaming replication up and working, when this box was still the backup
serverand not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped machines.
Apparentlywhen you do a pg_dump with the -c flag from a server to itself, it does the clean before reading the data,
resultingin a new empty database. Thanks again for all the suggestions! 

>
> Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to
thelog file being rotated. 
>
> Cheers,
> Steve
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



Re: Mysterious DB reset

From
Adrian Klaver
Date:
On 03/07/2014 11:08 AM, Israel Brewster wrote:
> On Mar 6, 2014, at 1:25 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
>
>> On 03/06/2014 09:33 AM, Israel Brewster wrote:
>>> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going
onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 
>>>
>>>
>>> Also odd is that my cleanup script runs at 1am. I have records of there
>>> being new data in the database up to 3:51am, but the oldest record
>>> currently in the DB is from 4:45am (as specified by the default of now()
>>> on the column). So I know records were added after my delete command
>>> ran, but before this reset occurred.
>>>
>> A shot in the dark...
>>
>> Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that
connect.I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after
4am.
>
> Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that ran the following line at 4:40am:
>
> /usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres
tracking
>
> It must have been left over from before I got streaming replication up and working, when this box was still the
backupserver and not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped
machines.Apparently when you do a pg_dump with the -c flag from a server to itself, it does the clean before reading
thedata, resulting in a new empty database. Thanks again for all the suggestions! 

No, -c just tells pg_dump to output clean commands and really only has
meaning when you do a plain text dump as above. pg_dump does not clean
the data from the running server on its own. What did the cleaning was
immediately piping to output of the dump to psql. So basically you
created a plain text dump file and fed it back to the server and the
dump file included commands to clean out objects. If you had not used -c
you would have gotten a bunch of duplicate <something> errors.


>
>>
>> Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to
thelog file being rotated. 
>>
>> Cheers,
>> Steve
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Mysterious DB reset

From
Israel Brewster
Date:
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------



On Mar 7, 2014, at 12:07 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 03/07/2014 11:08 AM, Israel Brewster wrote:
>> On Mar 6, 2014, at 1:25 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
>>
>>> On 03/06/2014 09:33 AM, Israel Brewster wrote:
>>>> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going
onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 
>>>>
>>>>
>>>> Also odd is that my cleanup script runs at 1am. I have records of there
>>>> being new data in the database up to 3:51am, but the oldest record
>>>> currently in the DB is from 4:45am (as specified by the default of now()
>>>> on the column). So I know records were added after my delete command
>>>> ran, but before this reset occurred.
>>>>
>>> A shot in the dark...
>>>
>>> Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that
connect.I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after
4am.
>>
>> Good shot - you nailed it! I found a rouge script in /etc/crontab.daily that ran the following line at 4:40am:
>>
>> /usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres
tracking
>>
>> It must have been left over from before I got streaming replication up and working, when this box was still the
backupserver and not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped
machines.Apparently when you do a pg_dump with the -c flag from a server to itself, it does the clean before reading
thedata, resulting in a new empty database. Thanks again for all the suggestions! 
>
> No, -c just tells pg_dump to output clean commands and really only has meaning when you do a plain text dump as
above.pg_dump does not clean the data from the running server on its own. What did the cleaning was immediately piping
tooutput of the dump to psql. So basically you created a plain text dump file and fed it back to the server and the
dumpfile included commands to clean out objects. If you had not used -c you would have gotten a bunch of duplicate
<something>errors. 

Right, but I ended up with no data. So what I was getting at was that because I had the -c, which output clean
commands,those clean commands were executed before pg_dump dumped the data. You are absolutely correct in saying that
itwas only executed at all because it was piped to psql, I wasn't claiming otherwise :-) However, the sequence of
eventsclearly is: 1) pg_dump outputs "clean" commands. The pipe to psql runs these, deleting the tables/data/etc. 2)
pg_dumpoutputs the rebuild commands, which are piped to psql, which rebuilds the tables/etc 3) pg_dump tries to dump
thedata, but there is none (other than the default starting data in the sequences) due to steps 1 and 2 already having
beenexecuted (by the pipe to psql) on the same database that pg_dump is running on. 

Sorry for not being clear :-)

>
>
>>
>>>
>>> Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to
thelog file being rotated. 
>>>
>>> Cheers,
>>> Steve
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Attachment

Re: Mysterious DB reset

From
Adrian Klaver
Date:
On 03/07/2014 03:10 PM, Israel Brewster wrote:

>>> /usr/local/pgsql/bin/pg_dump -h <domain name of localhost> -U tracking -cs | /usr/local/pgsql/bin/psql -U postgres
tracking
>>>
>>> It must have been left over from before I got streaming replication up and working, when this box was still the
backupserver and not primary -i.e. the domain name in the first half wasn't for the local machine until I swapped
machines.Apparently when you do a pg_dump with the -c flag from a server to itself, it does the clean before reading
thedata, resulting in a new empty database. Thanks again for all the suggestions! 
>>
>> No, -c just tells pg_dump to output clean commands and really only has meaning when you do a plain text dump as
above.pg_dump does not clean the data from the running server on its own. What did the cleaning was immediately piping
tooutput of the dump to psql. So basically you created a plain text dump file and fed it back to the server and the
dumpfile included commands to clean out objects. If you had not used -c you would have gotten a bunch of duplicate
<something>errors. 
>
> Right, but I ended up with no data. So what I was getting at was that because I had the -c, which output clean
commands,those clean commands were executed before pg_dump dumped the data. You are absolutely correct in saying that
itwas only executed at all because it was piped to psql, I wasn't claiming otherwise :-) However, the sequence of
eventsclearly is: 1) pg_dump outputs "clean" commands. The pipe to psql runs these, deleting the tables/data/etc. 2)
pg_dumpoutputs the rebuild commands, which are piped to psql, which rebuilds the tables/etc 3) pg_dump tries to dump
thedata, but there is none (other than the default starting data in the sequences) due to steps 1 and 2 already having
beenexecuted (by the pipe to psql) on the same database that pg_dump is running on. 
>
> Sorry for not being clear :-)

Yea, a snake eating its tail.

>
>>
>>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Mysterious DB reset

From
Brent Wood
Date:
One point - a serial datatype will not increment to infinity, as it is effectively a 4 byte integer with a sequence imposed, it can only store values upto MAXINT (2147483647) . Above this it may well wrap around where MAXINT + 1 = 1

You can delay the problem (significantly) by using bigserial (8 byte integer) instead of serial - this has MAXINT=9223372036854775807

http://www.postgresql.org/docs/9.2/static/datatype-numeric.html

Otherwise you might run a cron job or trigger to reset the serial values & the sequence when you think it timely.


I can't see how this would cause the missing records though.

Cheers

Brent Wood

Programme leader: Environmental Information Delivery
NIWA
DDI:  +64 (4) 3860529

From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Thom Brown [thom@linux.com]
Sent: Thursday, March 06, 2014 8:01 AM
To: Israel Brewster
Cc: PGSQL Mailing List
Subject: Re: [GENERAL] Mysterious DB reset

On 5 March 2014 18:22, Israel Brewster <israel@eraalaska.net> wrote:
I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtime<now() AT TIME ZONE 'UTC'-interval '7 days';

Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd.  Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.

So if all the data went missing, and the sequence reset, the only thing I can think of is:

Someone ran:

TRUNCATE data RESTART IDENTITY;

or someone restored the table structure from a backup that deleted the original table.

Do you log DDL?

Was the table partitioned?

You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.

--
Thom
--
Please consider the environment before printing this email.
NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.