Re: Mysterious DB reset - Mailing list pgsql-general

From Israel Brewster
Subject Re: Mysterious DB reset
Date
Msg-id 84AF018A-5F37-4314-877A-BA0CE114FEE9@eraalaska.net
Whole thread Raw
In response to Re: Mysterious DB reset  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Mysterious DB reset  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

> On 03/06/2014 09:33 AM, Israel Brewster wrote:
>> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going
onappears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 
>>
>> On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>>> On 03/05/2014 10:22 AM, Israel Brewster wrote:
>>>> My first thought was "Oh, I must have a typo in my
>>>> cleanup routine, such that it is deleting all records rather than only
>>>> those a week old, and it's just that no one has noticed until now". So I
>>>> looked at that, but changing the delete to a select appeared to produce
>>>> the proper results, in that no records were selected:
>>>
>>> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM
today,no records exist before 7 days ago also or am I missing something? 
>>
>> If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete
wasdeleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to
confirmthe delete is correct, and therefore not the problem. 
>>
>>> A sequence is just a special table.
>>>
>>> So what does SELECT * from the sequence show?
>>
>> tracking=> SELECT * FROM data_id_seq;
>>  sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt |
is_cycled| is_called 
>>
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
>>  data_id_seq   |       1184 |           1 |            1 | 9223372036854775807 |         1 |           1 |      16 |
f        | t 
>>
>>
>>>
>>>>
>>>> Also odd is that my cleanup script runs at 1am. I have records of there
>>>> being new data in the database up to 3:51am, but the oldest record
>>>> currently in the DB is from 4:45am (as specified by the default of now()
>>>> on the column). So I know records were added after my delete command
>>>> ran, but before this reset occurred.
>>>
>>> I am not sure what you are calling the 'reset'?
>>> Did something happen between 3:51 AM and 4:45 AM?
>>
>> Yes: All my data was deleted and the sequence reset to 1.
>>
>>> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM?
>>
>> As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various
reasons).That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist
inthe local database any more. 
>
> Well something is happening. See my notes on logging below to help track down the cause.

Yep.

>
>>
>>>
>>>>
>>>> So my question is, aside from someone going in and mucking about in the
>>>> wee hours of the morning, what could possibly cause this behavior? What
>>>> sort of event could cause all data to be deleted from the table, and the
>>>> sequence to be reset? Especially while there is an active connection?
>>>> Thanks for any ideas, however wild or off the wall :-)
>>>
>>> What is in the Postgres/system logs for the time period(s) you mention?
>>
>> The postgres log has a lot of errors in it, some of which MAY explain the issue. For example:
>>
>> cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied
>> LOG:  archive command failed with exit code 1
>> DETAIL:  The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp
pg_xlog/000000010000000400000094/mnt/pglogs/000000010000000400000094 
>> WARNING:  transaction log file "000000010000000400000094" could not be archived: too many failures
>> LOG:  received smart shutdown request
>> LOG:  autovacuum launcher shutting down
>> LOG:  shutting down
>> LOG:  database system is shut down
>
> Might be good to explain your archive setup.

Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is also
ahot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. 

Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of
themin our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE
idin (...) ) 

Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the
databaseto a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts
(suchas my data retrieval scripts) from the primary machine to a backup location on the secondary machine. 

None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data

Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather
thanhaving formal training :-) I'm DEFINITELY open to suggestions :-) 

>
>>
>> However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current
entries,or from back before I got the mount for the logs working. At this time, the mount point IS working correctly,
andfrom what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar
5,9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened
again.I don't see anything of interest in the syslog or messages log. 
>
> Yes you can, timestamps and a lot more. For all the details go here:
>
> http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html
>
> At the least I would:
>
> log_statement = 'mod'
>
> log_connections = on
> log_disconnections = on
>
> log_line_prefix = '%u-%m-%x'

I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information.
Thanksfor the link and information! 

>
>>
>>>
>>>>
>>>> -----------------------------------------------
>>>> Israel Brewster
>>>> Computer Support Technician II
>>>> Era Alaska
>>>> 5245 Airport Industrial Rd
>>>> Fairbanks, AK 99709
>>>> (907) 450-7250 x7293
>>>> -----------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Adrian Klaver
>>> adrian.klaver@aklaver.com


-----------------------------------------------
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
-----------------------------------------------




Attachment

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Mysterious DB reset
Next
From: Adrian Klaver
Date:
Subject: Re: Mysterious DB reset