Thread: Help me recovering data

Help me recovering data

From
"Kouber Saparev"
Date:
Hi folks,

I ran into big trouble - it seems that my DB is lost.

"select * from pg_database" gives me 0 rows, but I still can connect to
databases with \c and even select from tables there, although they're also
not visible with \dt.

After asking the guys in the #postgresql@irc.freenode.net channel they told
me that the reason is the "Transaction ID wraparound", because I have never
run VACUUM on the whole database.

So they proposed to ask here for help. I have stopped the server, but what
could I do in order to save the data if it's possible at all?

You could also take a look at my pg_controldata output:
http://rafb.net/paste/results/ghcIb880.html

Regards,
Kouber Saparev



Re: Help me recovering data

From
Tom Lane
Date:
"Kouber Saparev" <postgresql@saparev.com> writes:
> After asking the guys in the #postgresql@irc.freenode.net channel they told
> me that the reason is the "Transaction ID wraparound", because I have never
> run VACUUM on the whole database.

> So they proposed to ask here for help. I have stopped the server, but what
> could I do in order to save the data if it's possible at all?

I think you're pretty well screwed as far as getting it *all* back goes,
but you could use pg_resetxlog to back up the NextXID counter enough to
make your tables and databases reappear (and thereby lose the effects of
however many recent transactions you back up over).

Once you've found a NextXID setting you like, I'd suggest an immediate
pg_dumpall/initdb/reload to make sure you have a consistent set of data.
Don't VACUUM, or indeed modify the DB at all, until you have gotten a
satisfactory dump.

Then put in a cron job to do periodic vacuuming ;-)
        regards, tom lane
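
As a concrete sketch of the recovery sequence described above - the XID value, data directory, and dump file name here are placeholders, not taken from the thread; read the real NextXID off pg_controldata first:

    # back the NextXID counter up past the wrap point (the value is a placeholder)
    pg_resetxlog -x 1500000000 /usr/local/pgsql/data
    pg_ctl -D /usr/local/pgsql/data start
    # take a consistent dump before modifying anything
    pg_dumpall > all.sql
    pg_ctl -D /usr/local/pgsql/data stop
    # build a fresh cluster and reload the dump
    initdb -D /usr/local/pgsql/data.new
    pg_ctl -D /usr/local/pgsql/data.new start
    psql -d template1 -f all.sql
    # afterwards, a cron entry for periodic database-wide vacuuming, e.g. nightly at 03:00:
    # 0 3 * * * vacuumdb --all --quiet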


Re: Help me recovering data

From
"Kouber Saparev"
Date:
> Once you've found a NextXID setting you like, I'd suggest an immediate
> pg_dumpall/initdb/reload to make sure you have a consistent set of data.
> Don't VACUUM, or indeed modify the DB at all, until you have gotten a
> satisfactory dump.
>
> Then put in a cron job to do periodic vacuuming ;-)

Thank you. I just discovered in the mailing lists that I'm neither the first nor
the last guy to have forgotten to VACUUM the database. ;-)

Regards,
Kouber Saparev



Re: Help me recovering data

From
Christopher Kings-Lynne
Date:
> I think you're pretty well screwed as far as getting it *all* back goes,
> but you could use pg_resetxlog to back up the NextXID counter enough to
> make your tables and databases reappear (and thereby lose the effects of
> however many recent transactions you back up over).
> 
> Once you've found a NextXID setting you like, I'd suggest an immediate
> pg_dumpall/initdb/reload to make sure you have a consistent set of data.
> Don't VACUUM, or indeed modify the DB at all, until you have gotten a
> satisfactory dump.
> 
> Then put in a cron job to do periodic vacuuming ;-)

This might seem like a stupid question, but since this is a massive data
loss potential in PostgreSQL, what's so hard about having the
checkpointer or something check the transaction counter when it runs and
either issue a db-wide vacuum if it's about to wrap, or simply
disallow any new transactions?

I think people'd rather their db just stopped accepting new transactions 
rather than just losing data...

Chris


Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
It must be possible to create a tool based on the PostgreSQL sources that
can read all the tuples in a database and dump them to a file stream. All
the data remains in the file until overwritten with data after a vacuum.
It *should* be doable.

If the data in the table is worth anything, then it would be worth
extracting.

It would, of course, be a tool of last resort.



> "Kouber Saparev" <postgresql@saparev.com> writes:
>> After asking the guys in the #postgresql@irc.freenode.net channel they
>> told
>> me that the reason is the "Transaction ID wraparound", because I have
>> never
>> run VACUUM on the whole database.
>
>> So they proposed to ask here for help. I have stopped the server, but
>> what
>> could I do in order to save the data if it's possible at all?
>
> I think you're pretty well screwed as far as getting it *all* back goes,
> but you could use pg_resetxlog to back up the NextXID counter enough to
> make your tables and databases reappear (and thereby lose the effects of
> however many recent transactions you back up over).
>
> Once you've found a NextXID setting you like, I'd suggest an immediate
> pg_dumpall/initdb/reload to make sure you have a consistent set of data.
> Don't VACUUM, or indeed modify the DB at all, until you have gotten a
> satisfactory dump.
>
> Then put in a cron job to do periodic vacuuming ;-)
>
>             regards, tom lane
>



Re: Help me recovering data

From
Doug McNaught
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

> This might seem like a stupid question, but since this is a massive
> data loss potential in PostgreSQL, what's so hard about having the
> checkpointer or something check the transaction counter when it runs
> and either issue a db-wide vacuum if it's about to wrap, or simply
> disallow any new transactions?

I think autovac-in-backend is the preferred solution to this, and it's
definitely on the TODO list...

-Doug


Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
>> I think you're pretty well screwed as far as getting it *all* back goes,
>> but you could use pg_resetxlog to back up the NextXID counter enough to
>> make your tables and databases reappear (and thereby lose the effects of
>> however many recent transactions you back up over).
>>
>> Once you've found a NextXID setting you like, I'd suggest an immediate
>> pg_dumpall/initdb/reload to make sure you have a consistent set of data.
>> Don't VACUUM, or indeed modify the DB at all, until you have gotten a
>> satisfactory dump.
>>
>> Then put in a cron job to do periodic vacuuming ;-)
>
> This might seem like a stupid question, but since this is a massive data
> loss potential in PostgreSQL, what's so hard about having the
> checkpointer or something check the transaction counter when it runs and
>   either issue a db-wide vacuum if it's about to wrap, or simply
> disallow any new transactions?
>
> I think people'd rather their db just stopped accepting new transactions
> rather than just losing data...
>

I would certainly prefer the system to issue an error and stop working
rather than suffer complete data loss.



Re: Help me recovering data

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> This might seem like a stupid question, but since this is a massive data 
> loss potential in PostgreSQL, what's so hard about having the 
> checkpointer or something check the transaction counter when it runs and 
>   either issue a db-wide vacuum if it's about to wrap, or simply 
> disallow any new transactions?

The checkpointer is entirely incapable of either detecting the problem
(it doesn't have enough infrastructure to examine pg_database in a
reasonable way) or preventing backends from doing anything if it did
know there was a problem.

> I think people'd rather their db just stopped accepting new transactions 
> rather than just losing data...

Not being able to issue new transactions *is* data loss --- how are you
going to get the system out of that state?

autovacuum is the correct long-term solution to this, not some kind of
automatic hara-kiri.
        regards, tom lane


Re: Help me recovering data

From
"Kouber Saparev"
Date:
> Not being able to issue new transactions *is* data loss --- how are you
> going to get the system out of that state?

Yes, but I would also prefer the server to say something like "The database is
full, please vacuum." - the same as when the hard disk is full and you try
to write something to it - it's not exactly data loss, just an inability
to continue the job.

The thing is that a warning is issued only when you start the vacuum itself:

play=# VACUUM;
WARNING:  some databases have not been vacuumed in 1613770184 transactions
HINT:  Better vacuum them within 533713463 transactions, or you may have a
wraparound failure.
VACUUM

So it's something of a chicken-and-egg problem: you have to vacuum
in order to receive the message telling you that you should have done it
earlier, but sometimes it's just too late. As it was in my case - I just
discovered that almost all of my data is missing, without even a notice or a
warning message to let me know that the end of the world was approaching. :)

Regards,
Kouber Saparev
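
For what it's worth, on versions that track datfrozenxid in pg_database, a query along these lines shows how close each database is to wraparound without having to run VACUUM first (anything whose age is approaching two billion is in trouble), which sidesteps the chicken-and-egg problem described above:

    SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;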



Re: Help me recovering data

From
Christopher Kings-Lynne
Date:
> The checkpointer is entirely incapable of either detecting the problem
> (it doesn't have enough infrastructure to examine pg_database in a
> reasonable way) or preventing backends from doing anything if it did
> know there was a problem.

Well, I guess I meant 'some regularly running process'...

>>I think people'd rather their db just stopped accepting new transactions 
>>rather than just losing data...
> 
> Not being able to issue new transactions *is* data loss --- how are you
> going to get the system out of that state?

Not allowing any transactions except a vacuum...

> autovacuum is the correct long-term solution to this, not some kind of
> automatic hara-kiri.

Yeah, seems like it should really happen soon...

Chris


Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
>> The checkpointer is entirely incapable of either detecting the problem
>> (it doesn't have enough infrastructure to examine pg_database in a
>> reasonable way) or preventing backends from doing anything if it did
>> know there was a problem.
>
> Well, I guess I meant 'some regularly running process'...
>
>>>I think people'd rather their db just stopped accepting new transactions
>>>rather than just losing data...
>>
>> Not being able to issue new transactions *is* data loss --- how are you
>> going to get the system out of that state?
>
> Not allowing any transactions except a vacuum...
>
>> autovacuum is the correct long-term solution to this, not some kind of
>> automatic hara-kiri.
>
> Yeah, seems like it should really happen soon...
>
> Chris

Maybe I'm missing something, but shouldn't the prospect of data loss (even
in the presence of admin ignorance) be something that should be
unacceptable? Certainly within the realm "normal PostgreSQL" operation.




Re: Help me recovering data

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> Maybe I'm missing something, but shouldn't the prospect of data loss (even
> in the presence of admin ignorance) be something that should be
> unacceptable? Certainly within the realm "normal PostgreSQL" operation.

[ shrug... ]  The DBA will always be able to find a way to shoot himself
in the foot.  We've seen several instances of people blowing away
pg_xlog and pg_clog, for example, because they "don't need log files".
Or how about failing to keep adequate backups?  That's a sure way for an
ignorant admin to lose data too.

Once autovacuum gets to the point where it's used by default, this
particular failure mode should be a thing of the past, but in the
meantime I'm not going to panic about it.
        regards, tom lane


Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com writes:
>> Maybe I'm missing something, but shouldn't the prospect of data loss
>> (even
>> in the presence of admin ignorance) be something that should be
>> unacceptable? Certainly within the realm "normal PostgreSQL" operation.
>
> [ shrug... ]  The DBA will always be able to find a way to shoot himself
> in the foot.  We've seen several instances of people blowing away
> pg_xlog and pg_clog, for example, because they "don't need log files".
> Or how about failing to keep adequate backups?  That's a sure way for an
> ignorant admin to lose data too.

There is a difference between actively doing something stupid and failing
to realize a maintenance task is required.

PostgreSQL should stop working. When the admin tries to understand why,
they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum
thingy." That is a whole lot better than falling off a cliff you didn't
even know was there.

>
> Once autovacuum gets to the point where it's used by default, this
> particular failure mode should be a thing of the past, but in the
> meantime I'm not going to panic about it.

I don't know how to say this without sounding like a jerk, (I guess that's
my role sometimes) but would you go back and re-read this sentence?

To paraphrase: "I know this causes a catastrophic data loss, and we have
plans to fix it in the future, but for now, I'm not going to panic about it."

What would you do if the FreeBSD group or Linux kernel group said this
about a file system? If you failed to run fsck after 100 mounts, you lose
your data?

I thought PostgreSQL was about "protecting your data." How many times have
we smugly said, "yea, you can use MySQL if you don't care about your
data." Any data loss caused by PostgreSQL should be seen as unacceptable.
It's funny - I've known about this for a while, and it has always
seemed like a distant edge condition that is easily avoided. However,
with today's faster machines and disks, it is easier to reach this
limitation than ever before. All PostgreSQL needs is one or two VERY UPSET
mainstream users who lose data to completely reverse the momentum that it
is gaining.

No amount of engineering discussion about it not being the fault of
PostgreSQL will help, and rightfully so, IMHO.

Sorry.



Re: Help me recovering data

From
Stephan Szabo
Date:
On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:

> >
> > Once autovacuum gets to the point where it's used by default, this
> > particular failure mode should be a thing of the past, but in the
> > meantime I'm not going to panic about it.
>
> I don't know how to say this without sounding like a jerk, (I guess that's
> my role sometimes) but would you go back and re-read this sentence?
>
> To paraphrase: "I know this causes a catastrophic data loss, and we have
> plans to fix it in the future, but for now, I'm not going to panic about it."

Do you have a useful suggestion about how to fix it?  "Stop working" is
handwaving and merely basically saying, "one of you people should do
something about this" is not a solution to the problem, it's not even an
approach towards a solution to the problem.


Re: Help me recovering data

From
"Joshua D. Drake"
Date:
>>in the foot.  We've seen several instances of people blowing away
>>pg_xlog and pg_clog, for example, because they "don't need log files".
>>Or how about failing to keep adequate backups?  That's a sure way for an
>>ignorant admin to lose data too.
>>
>>
>
>There is a difference between actively doing something stupid and failing
>to realize a maintenance task is required.
>
>PostgreSQL should stop working. When the admin tries to understand why,
>they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum
>thingy." That is a whole lot better than falling off a cliff you didn't
>even know was there.
>
>
There is another way to look at this that lends itself to mohawksoft's
argument.

More often than not DBAs and Sysadmins are neither one. They are people
that get shoved into the job because they happen to mention around
the water cooler that they "once" installed linux/freebsd -- whatever.

Maybe it is an executive that has some of his brains left after
sitting behind a desk all day for the last 10 years. One day he/she
gets a thought in his head to create a new project named "foo".

He does not want to waste his internal resources so said executive
decides he will do it himself as a hobby. For some reason, the project
actually succeeds (I have seen this many times) and the company starts
using it.

Well guess what... it uses PostgreSQL. The guy isn't a DBA, heck he isn't
even really a programmer. He had no idea about this "vacuum" thing. He
had never heard of other databases having to do it.

So they run for a year, and then all of a sudden **BOOM** the world ends.

Do you think they are going to care that we "documented" the issue? Uhmmm,
no they won't. Chances are they will drop-kick PostgreSQL and bad-mouth it
to all their other executive friends.

In short, this whole argument has the mark of irresponsibility on both
parties, but it is the PostgreSQL project's responsibility to make a
reasonable effort to produce a piece of software that doesn't break.

We are not talking about a user who ran a query: delete from foo;

At this point we have a known critical bug. Usually the PostgreSQL community
is all over critical bugs. Why is this any different?

It sounds to me that people are just annoyed that users don't RTFM. Get
over it. Most won't. If users RTFM more often, it would put most support
companies out of business.

Sincerely,

Joshua D. Drake


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL



Re: Help me recovering data

From
"Joshua D. Drake"
Date:
>Do you have a useful suggestion about how to fix it?  "Stop working" is
>handwaving and merely basically saying, "one of you people should do
>something about this" is not a solution to the problem, it's not even an
>approach towards a solution to the problem.
>
>
I believe that the ability for PostgreSQL to stop accepting
queries and to log to the file or STDERR why it stopped working
and how to resolve it is appropriate.

Also it is probably appropriate to warn ahead of time...

WARNING: Only 50,000 transactions left before lock out

  or something like that.

J




--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL



Re: Help me recovering data

From
Christopher Kings-Lynne
Date:
> At this point we have a known critical bug. Usually the PostgreSQL 
> community
> is all over critical bugs. Why is this any different?
> 
> It sounds to me that people are just annoyed that users don't RTFM. Get 
> over it. Most won't. If users RTFM more often, it would put most support 
> companies out of business.

I wonder if I should point out that we just had 3 people suffering XID 
wraparound failure in 2 days in the IRC channel...

Chris


Re: Help me recovering data

From
"Joshua D. Drake"
Date:
Christopher Kings-Lynne wrote:

>> At this point we have a known critical bug. Usually the PostgreSQL
>> community
>> is all over critical bugs. Why is this any different?
>>
>> It sounds to me that people are just annoyed that users don't RTFM.
>> Get over it. Most won't. If users RTFM more often, it would put most
>> support companies out of business.
>
>
> I wonder if I should point out that we just had 3 people suffering XID
> wraparound failure in 2 days in the IRC channel...

I have had half a dozen new customers in the last six months that have
had the same problem. Nothing like the phone call:

Uhmmm I am a new customer, help I can't see my databases.

Sincerely,

Joshua D. Drake


>
> Chris



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL



Re: Help me recovering data

From
Stephan Szabo
Date:
On Wed, 16 Feb 2005, Joshua D. Drake wrote:

>
> >Do you have a useful suggestion about how to fix it?  "Stop working" is
> >handwaving and merely basically saying, "one of you people should do
> >something about this" is not a solution to the problem, it's not even an
> >approach towards a solution to the problem.
> >
> >
> I believe that the ability for PostgreSQL to stop accepting
> queries and to log to the file or STDERR why it stopped working
> and how to resolve it is appropriate.

Right, but since the how to resolve it currently involves executing a
query, simply stopping dead won't allow you to resolve it. Also, if we
stop at the exact wraparound point, can we run into problems actually
trying to do the vacuum if that's still the resolution technique?  If so,
how far in advance of wraparound must we stop to guarantee it will
succeed? It's not rocket science, but figuring such things out is part of
actually making a workable solution.




Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
> On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
>
>> >
>> > Once autovacuum gets to the point where it's used by default, this
>> > particular failure mode should be a thing of the past, but in the
>> > meantime I'm not going to panic about it.
>>
>> I don't know how to say this without sounding like a jerk, (I guess
>> that's
>> my role sometimes) but would you go back and re-read this sentence?
>>
>> To paraphrase: "I know this causes a catastrophic data loss, and we have
>> plans to fix it in the future, but for now, I'm not going to panic about
>> it."
>
> Do you have a useful suggestion about how to fix it?  "Stop working" is
> handwaving and merely basically saying, "one of you people should do
> something about this" is not a solution to the problem, it's not even an
> approach towards a solution to the problem.

Actually, it is not a solution to the problem of losing data. It is a drop
dead last ditch failsafe that EVERY PRODUCT should have before losing
data.


>



Re: Help me recovering data

From
Stephan Szabo
Date:
On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:

> > On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
> >
> >> >
> >> > Once autovacuum gets to the point where it's used by default, this
> >> > particular failure mode should be a thing of the past, but in the
> >> > meantime I'm not going to panic about it.
> >>
> >> I don't know how to say this without sounding like a jerk, (I guess
> >> that's
> >> my role sometimes) but would you go back and re-read this sentence?
> >>
> >> To paraphrase: "I know this causes a catastrophic data loss, and we have
> >> plans to fix it in the future, but for now, I'm not going to panic about
> >> it."
> >
> > Do you have a useful suggestion about how to fix it?  "Stop working" is
> > handwaving and merely basically saying, "one of you people should do
> > something about this" is not a solution to the problem, it's not even an
> > approach towards a solution to the problem.
>
> Actually, it is not a solution to the problem of losing data. It is a drop
> dead last ditch failsafe that EVERY PRODUCT should have before losing
> data.

Let's try again. Saying, "one of you people should do something about
this" is not a solution to the problem or an approach thereto.  "Stop
working" is handwaving since I see no approach therein that allows the
user to actually recover the data.



Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
>
> On Wed, 16 Feb 2005, Joshua D. Drake wrote:
>
>>
>> >Do you have a useful suggestion about how to fix it?  "Stop working" is
>> >handwaving and merely basically saying, "one of you people should do
>> >something about this" is not a solution to the problem, it's not even
>> an
>> >approach towards a solution to the problem.
>> >
>> >
>> I believe that the ability for PostgreSQL to stop accepting
>> queries and to log to the file or STDERR why it stopped working
>> and how to resolve it is appropriate.
>
> Right, but since the how to resolve it currently involves executing a
> query, simply stopping dead won't allow you to resolve it. Also, if we
> stop at the exact wraparound point, can we run into problems actually
> trying to do the vacuum if that's still the resolution technique?  If so,
> how far in advance of wraparound must we stop to guarantee it will
> succeed? It's not rocket science, but figuring such things out is part of

I would say, have a GUC parameter set at 1000 transactions. When fewer
than this number are available, the postmaster will refuse to run and will
issue a message:

"Transaction wrap-around error! You must run vacuum in single-user
postgres mode to correct it; to avoid this message, run the vacuum command
more frequently"

Hell, why not block all the PostgreSQL processes and run vacuum? But for
now, versions of PostgreSQL should stop before losing data.



Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> Right, but since the how to resolve it currently involves executing a
>> query, simply stopping dead won't allow you to resolve it. Also, if we
>> stop at the exact wraparound point, can we run into problems actually
>> trying to do the vacuum if that's still the resolution technique?
>
> We'd have to do something with a fair amount of slop.  The idea I was
> toying with just now involved a forcible shutdown once we get within
> say 100,000 transactions of a wrap failure; but apply this check only
> when in interactive operation.  This would allow the DBA to perform
> the needed VACUUMing manually in a standalone backend.
>
> The real question here is exactly how large a cluestick do you want to
> hit the DBA with.  I don't think we can "guarantee" no data loss with
> anything less than forced shutdown, but that's not so much a cluestick
> as a clue howitzer.

I think a DBA or accidental DBA would prefer stating in a meeting:

"Yea, the database shut down because I didn't perform normal maintenance,
it's fixed now and we have a script in place so it won't happen again"

over

"Yea, the database lost all its data and we have to restore from our last
backup because I didn't perform normal maintenance."

One gets a "boy are you lucky" over a "you're fired."

>
> Maybe
>
> (a) within 200,000 transactions of wrap, every transaction start
> delivers a WARNING message;
>
> (b) within 100,000 transactions, forced shutdown as above.

I agree.


Re: Help me recovering data

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Right, but since the how to resolve it currently involves executing a
> query, simply stopping dead won't allow you to resolve it. Also, if we
> stop at the exact wraparound point, can we run into problems actually
> trying to do the vacuum if that's still the resolution technique?

We'd have to do something with a fair amount of slop.  The idea I was
toying with just now involved a forcible shutdown once we get within
say 100,000 transactions of a wrap failure; but apply this check only
when in interactive operation.  This would allow the DBA to perform
the needed VACUUMing manually in a standalone backend.

The real question here is exactly how large a cluestick do you want to
hit the DBA with.  I don't think we can "guarantee" no data loss with
anything less than forced shutdown, but that's not so much a cluestick
as a clue howitzer.

Maybe

(a) within 200,000 transactions of wrap, every transaction start
delivers a WARNING message;

(b) within 100,000 transactions, forced shutdown as above.
        regards, tom lane


Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
>
> On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
>
>> > On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
>> >
>> >> >
>> >> > Once autovacuum gets to the point where it's used by default, this
>> >> > particular failure mode should be a thing of the past, but in the
>> >> > meantime I'm not going to panic about it.
>> >>
>> >> I don't know how to say this without sounding like a jerk, (I guess
>> >> that's
>> >> my role sometimes) but would you go back and re-read this sentence?
>> >>
>> >> To paraphrase: "I know this causes a catastrophic data loss, and we have
>> >> plans to fix it in the future, but for now, I'm not going to panic about
>> >> it."
>> >
>> > Do you have a useful suggestion about how to fix it?  "Stop working"
>> is
>> > handwaving and merely basically saying, "one of you people should do
>> > something about this" is not a solution to the problem, it's not even
>> an
>> > approach towards a solution to the problem.
>>
>> Actually, it is not a solution to the problem of losing data. It is a
>> drop
>> dead last ditch failsafe that EVERY PRODUCT should have before losing
>> data.
>
> Let's try again. Saying, "one of you people should do something about
> this" is not a solution to the problem or an approach thereto.  "Stop
> working" is handwaving since I see no approach therein that allows the
> user to actually recover the data.
>


Well, it is sort of like the hockey strike. Now that it seems like stopping
normal operation is better than losing billions of rows of data, we can
decide how to do it and how to correct it.




Re: Help me recovering data

From
Stephan Szabo
Date:
On Wed, 16 Feb 2005, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Right, but since the how to resolve it currently involves executing a
> > query, simply stopping dead won't allow you to resolve it. Also, if we
> > stop at the exact wraparound point, can we run into problems actually
> > trying to do the vacuum if that's still the resolution technique?
>
> We'd have to do something with a fair amount of slop.  The idea I was
> toying with just now involved a forcible shutdown once we get within
> say 100,000 transactions of a wrap failure; but apply this check only
> when in interactive operation.  This would allow the DBA to perform
> the needed VACUUMing manually in a standalone backend.
>
> The real question here is exactly how large a cluestick do you want to
> hit the DBA with.  I don't think we can "guarantee" no data loss with
> anything less than forced shutdown, but that's not so much a cluestick
> as a clue howitzer.
>
> Maybe
>
> (a) within 200,000 transactions of wrap, every transaction start
> delivers a WARNING message;
>
> (b) within 100,000 transactions, forced shutdown as above.

This seems reasonable, although perhaps the former could be something
configurable.  I'm not sure there's a good reason to allow the latter to
change unless there'd ever be a case where 100,000 transactions wasn't
enough to vacuum or something like that.

All in all, I figure that odds are very high that if someone isn't
vacuuming in the rest of the transaction id space, either the transaction
rate is high enough that 100,000 warning may not be enough or they aren't
going to pay attention anyway and the howitzer might not be bad.


Re: Help me recovering data

From
Bruno Wolff III
Date:
On Wed, Feb 16, 2005 at 09:38:31 -0800, Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> On Wed, 16 Feb 2005, Tom Lane wrote:
> 
> > (a) within 200,000 transactions of wrap, every transaction start
> > delivers a WARNING message;
> >
> > (b) within 100,000 transactions, forced shutdown as above.
> 
> This seems reasonable, although perhaps the former could be something
> configurable.  I'm not sure there's a good reason to allow the latter to
> change unless there'd ever be a case where 100,000 transactions wasn't
> enough to vacuum or something like that.

I don't think there is much point in making it configurable. If they knew
to do that they would most likely know to vacuum as well.

However, 100K out of 1G seems too small. Just to get wrap around there
must be a pretty high transaction rate, so 100K may not give much warning.
1M or 10M seem to be better.


Re: Help me recovering data

From
Richard Huxton
Date:
Stephan Szabo wrote:
> On Wed, 16 Feb 2005, Tom Lane wrote:
> 
>>Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>>
>>(a) within 200,000 transactions of wrap, every transaction start
>>delivers a WARNING message;
>>
>>(b) within 100,000 transactions, forced shutdown as above.
> 
> 
> This seems reasonable, although perhaps the former could be something
> configurable.  I'm not sure there's a good reason to allow the latter to
> change unless there'd ever be a case where 100,000 transactions wasn't
> enough to vacuum or something like that.
> 
> All in all, I figure that odds are very high that if someone isn't
> vacuuming in the rest of the transaction id space, either the transaction
> rate is high enough that 100,000 warning may not be enough or they aren't
> going to pay attention anyway and the howitzer might not be bad.

How would people feel about stopping after the first 100 transactions too?
Pro: Teaches the lesson straight away.
Con: Irritating
Con: Might not be enough time for automated installers

--
  Richard Huxton
  Archonet Ltd


Re: Help me recovering data

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Maybe
>
>(a) within 200,000 transactions of wrap, every transaction start
>delivers a WARNING message;
>
>(b) within 100,000 transactions, forced shutdown as above.
>
>
>  
>

This seems sound enough, but if the DBA and/or SA can't be bothered 
reading the docs where this topic features quite prominently, I suspect 
the warning messages won't have much effect either. Basically ISTM we're 
talking about people who *need* a clue howitzer.

This will possibly hit us more now we have the Windows port (or maybe 
not, if the Windows servers are regularly rebooted ;-) )

cheers

andrew


Re: Help me recovering data

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> All in all, I figure that odds are very high that if someone isn't
> vacuuming in the rest of the transaction id space, either the transaction
> rate is high enough that 100,000 warning may not be enough or they aren't
> going to pay attention anyway and the howitzer might not be bad.

Yeah.  It's easy to imagine scenarios where the majority of the warnings
go into the bit bucket (because they are going to noninteractive client
applications that just ignore NOTICE messages).  So I think it's
appropriate to be delivering the warnings for a good long time, in hopes
that someone at least occasionally fires up psql and happens to actually
see them.  Something like 100K or 1M transactions feels about right
to me.

Pulling the failure trigger with 100K transactions still to go is surely
overly conservative, but compared to the size of the ID space it is not
worth noticing.

As far as the actual implementation, I was envisioning adding a limiting
XID variable and a database name variable to shared memory (protected by
the same LWLock that protects the nextXID counter).  These would
be computed and loaded during the bootstrap process, right after we
finish WAL replay if any.  It would probably cost us one XID to do this
(though maybe it could be done without running a real transaction?  This
ties in with my thoughts about replacing GetRawDatabaseInfo with a flat
file...), but one XID per postmaster start attempt is hopefully not
gonna kill us.  Subsequently, any VACUUM that updates a datfrozenxid
entry in pg_database would update these variables to reflect the new
safe limit and the name of the database with the currently oldest
datfrozenxid.  This would allow a very cheap comparison during
GetNewTransactionId to see if we are near enough to generate a warning:
WARNING: database "foo" must be vacuumed within 58372 transactions
or past the limit and generate an error:
ERROR: database is shut down to avoid wraparound data loss in database "foo"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "foo".
In the error case, we could error out *without* advancing nextXID,
so that even automated clients continually retrying failed transactions
couldn't blow past the safety margin.
        regards, tom lane
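
To make the shape of that check concrete, here is a toy, self-contained C sketch of the warning/shutdown thresholds. The real thing would live in GetNewTransactionId and use ereport(); the function name, variable names, and limit values below are made up for illustration, and real XID arithmetic is modular rather than a plain comparison:

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    /* Toy version of the proposed check: warn near the limit, refuse past it. */
    static void check_wraparound(TransactionId nextXid, TransactionId xidWarnLimit,
                                 TransactionId xidStopLimit, TransactionId xidWrapLimit,
                                 const char *oldestDatName)
    {
        if (nextXid >= xidStopLimit)
        {
            printf("ERROR: database is shut down to avoid wraparound data loss in database \"%s\"\n",
                   oldestDatName);
            printf("HINT: Stop the postmaster and use a standalone backend to VACUUM in \"%s\".\n",
                   oldestDatName);
        }
        else if (nextXid >= xidWarnLimit)
            printf("WARNING: database \"%s\" must be vacuumed within %u transactions\n",
                   oldestDatName, (unsigned) (xidWrapLimit - nextXid));
    }

    int main(void)
    {
        TransactionId wrap = 2000000000u;   /* illustrative wrap limit */
        TransactionId warn = wrap - 200000; /* (a): warning margin */
        TransactionId stop = wrap - 100000; /* (b): forced shutdown margin */

        check_wraparound(wrap - 150000, warn, stop, wrap, "foo"); /* prints the WARNING */
        check_wraparound(wrap - 50000,  warn, stop, wrap, "foo"); /* prints the ERROR + HINT */
        return 0;
    }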


Re: Help me recovering data

From
Tom Lane
Date:
Bruno Wolff III <bruno@wolff.to> writes:
> I don't think there is much point in making it configurable. If they knew
> to do that they would most likely know to vacuum as well.

Agreed.

> However, 100K out of 1G seems too small. Just to get wrap around there
> must be a pretty high transaction rate, so 100K may not give much warning.
> 1M or 10M seem to be better.

Good point.  Even 10M is less than 1% of the ID space.  Dunno about you,
but the last couple cars I've owned start flashing warnings when the gas
tank is about 20% full, not 1% full...
        regards, tom lane


Re: Help me recovering data

From
Greg Stark
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:

> Christopher Kings-Lynne wrote:
> 
> > I wonder if I should point out that we just had 3 people suffering XID
> > wraparound failure in 2 days in the IRC channel...
> 
> I have had half a dozen new customers in the last six months that have
> had the same problem. Nothing like the phone call:

How are so many people doing so many transactions so soon after installing?

To hit wraparound you have to do a billion transactions? ("With a `B'") That
takes real work. If you did 1,000 txn/minute for every minute of every day it
would still take a couple years to get there.

And most databases get a mix of updates and selects. I would expect it would
be pretty hard to go that long with any significant level of update activity
and no vacuums and not notice the performance problems from the dead tuples.

What am I missing? Is there a significant percentage of the user base that's
doing nothing but loading huge static databases and then running massive
volumes (like thousands of queries per second) of purely read-only queries
against them?

-- 
greg



Re: Help me recovering data

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> How are so many people doing so many transactions so soon after installing?

> To hit wraparound you have to do a billion transactions? ("With a `B'") That
> takes real work. If you did 1,000 txn/minute for every minute of every day it
> would still take a couple years to get there.

> And most databases get a mix of updates and selects. I would expect it would
> be pretty hard to go that long with any significant level of update activity
> and no vacuums and not notice the performance problems from the dead tuples.

I think the people who've managed to shoot themselves in the foot this
way are those who decided to "optimize" their cron jobs to only vacuum
their user tables, and forgot about the system catalogs.  So it's
probably more of a case of "a little knowledge is a dangerous thing"
than never having heard of VACUUM at all.  I too don't see that you
could possibly get to 2 billion transactions without having found out
that Postgres requires regular VACUUMing.
        regards, tom lane
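
Concretely, the trap is a maintenance script that names tables explicitly; a plain database-wide VACUUM, or vacuumdb without a table list, also covers pg_class, pg_attribute and the rest of the catalogs. For example (the table and database names here are made up):

    # vacuums only the named table -- the system catalogs never get touched:
    psql -d mydb -c "VACUUM ANALYZE orders"
    # vacuums every table in the database, system catalogs included:
    vacuumdb mydb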


Re: Help me recovering data

From
"Joshua D. Drake"
Date:
> I think the people who've managed to shoot themselves in the foot this
> way are those who decided to "optimize" their cron jobs to only vacuum
> their user tables, and forgot about the system catalogs.  So it's
> probably more of a case of "a little knowledge is a dangerous thing"
> than never having heard of VACUUM at all.  I too don't see that you
> could possibly get to 2 billion transactions without having found out
> that Postgres requires regular VACUUMing.

I have had two new customers in the last year who didn't do any vacuums
and had a rollover. The database sat in a back room and did processing.
It just worked, so they didn't worry about it.

It took one of them almost two years to get there but it does happen.

Sincerely,

Joshua D. Drake



>
>             regards, tom lane
>


--
Command Prompt, Inc., your source for PostgreSQL replication,
professional support, programming, managed services, shared
and dedicated hosting. Home of the Open Source Projects plPHP,
plPerlNG, pgManage,  and pgPHPtoolkit.
Contact us now at: +1-503-667-4564 - http://www.commandprompt.com



Re: Help me recovering data

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:

>pgsql@mohawksoft.com writes:
>  
>
>>Maybe I'm missing something, but shouldn't the prospect of data loss (even
>>in the presence of admin ignorance) be something that should be
>>unacceptable? Certainly within the realm "normal PostgreSQL" operation.
>>    
>>
>
>Once autovacuum gets to the point where it's used by default, this
>particular failure mode should be a thing of the past, but in the
>meantime I'm not going to panic about it.
>
Which I hope will be soon.


Re: Help me recovering data

From
Gaetano Mendola
Date:
Greg Stark wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> 
> 
>>Christopher Kings-Lynne wrote:
>>
>>
>>>I wonder if I should point out that we just had 3 people suffering XID
>>>wraparound failure in 2 days in the IRC channel...
>>
>>I have had half a dozen new customers in the last six months that have
>>had the same problem. Nothing like the phone call:
> 
> 
> How are so many people doing so many transactions so soon after installing?
> 
> To hit wraparound you have to do a billion transactions? ("With a `B'") That
> takes real work. If you did 1,000 txn/minute for every minute of every day it
> would still take a couple years to get there.

We do ~4000 txn/minute so in 6 months you are screwed...



Regards
Gaetano Mendola




Re: Help me recovering data

From
Gaetano Mendola
Date:
Stephan Szabo wrote:
> On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
> 
> 
>>>Once autovacuum gets to the point where it's used by default, this
>>>particular failure mode should be a thing of the past, but in the
>>>meantime I'm not going to panic about it.
>>
>>I don't know how to say this without sounding like a jerk, (I guess that's
>>my role sometimes) but would you go back and re-read this sentence?
>>
>>To paraphrase: "I know this causes a catastrophic data loss, and we have
>>plans to fix it in the future, but for now, I'm not going to panic about it."
> 
> 
> Do you have a useful suggestion about how to fix it?  "Stop working" is
> handwaving and merely basically saying, "one of you people should do
> something about this" is not a solution to the problem, it's not even an
> approach towards a solution to the problem.

It's not a solution, but between losing data and shutting down the postmaster I
prefer the shutdown.

Regards
Gaetano Mendola




Re: Help me recovering data

From
Gaetano Mendola
Date:
Tom Lane wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
> 
>>I don't think there is much point in making it configurable. If they knew
>>to do that they would most likely know to vacuum as well.
> 
> 
> Agreed.
> 
> 
>>However, 100K out of 1G seems too small. Just to get wrap around there
>>must be a pretty high transaction rate, so 100K may not give much warning.
>>1M or 10M seem to be better.
> 
> 
> Good point.  Even 10M is less than 1% of the ID space.  Dunno about you,
> but the last couple cars I've owned start flashing warnings when the gas
> tank is about 20% full, not 1% full...

BTW, why not do an automatic vacuum instead of a shutdown? At least the
DB does not stop working until someone studies what the problem is and
how to solve it.


Regards
Gaetano Mendola



Re: Help me recovering data

From
Tom Lane
Date:
Gaetano Mendola <mendola@bigfoot.com> writes:
> BTW, why not do an automatic vacuum instead of a shutdown? At least the
> DB does not stop working until someone studies what the problem is and
> how to solve it.

No, the entire point of this discussion is to whup the DBA upside the
head with a big enough cluestick to get him to install autovacuum.

Once autovacuum is default, it won't matter anymore.
        regards, tom lane


Re: Help me recovering data

From
Greg Stark
Date:
Gaetano Mendola <mendola@bigfoot.com> writes:

> We do ~4000 txn/minute so in 6 months you are screwed...

Sure, but if you ran without vacuuming for 6 months, wouldn't you notice the
huge slowdowns from all those dead tuples before that?

-- 
greg



Re: Help me recovering data

From
Dennis Bjorklund
Date:
On 17 Feb 2005, Greg Stark wrote:

> Gaetano Mendola <mendola@bigfoot.com> writes:
> 
> > We do ~4000 txn/minute so in 6 months you are screwed...
> 
> Sure, but if you ran without vacuuming for 6 months, wouldn't you notice the
> huge slowdowns from all those dead tuples before that?

Most people that we have seen on irc that run into the problem do vacuum
some tables, but forget to vacuum them all. Then their tables work fine but
suddenly some system tables like pg_database and pg_shadow become empty
since they were never vacuumed...

-- 
/Dennis Björklund



Re: Help me recovering data

From
Christopher Kings-Lynne
Date:
>>And most databases get a mix of updates and selects. I would expect it would
>>be pretty hard to go that long with any significant level of update activity
>>and no vacuums and not notice the performance problems from the dead tuples.
> 
> 
> I think the people who've managed to shoot themselves in the foot this
> way are those who decided to "optimize" their cron jobs to only vacuum
> their user tables, and forgot about the system catalogs. 

That's certainly the case with one of the people we helped in IRC - 3 
user tables only being vacuumed.

Chris


Re: Help me recovering data

From
pgsql@mohawksoft.com
Date:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>
>> We do ~4000 txn/minute so in 6 months you are screwed...
>
> Sure, but if you ran without vacuuming for 6 months, wouldn't you notice
> the
> huge slowdowns from all those dead tuples before that?
>
>
I would think that only applies to databases where UPDATE and DELETE are
done often. What about databases that are 99.999% inserts? A DBA lightly
going over the docs may not even know that vacuum needs to be run.


Re: Help me recovering data

From
Gaetano Mendola
Date:
Greg Stark wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
> 
> 
>>We do ~4000 txn/minute so in 6 months you are screwed...
> 
> 
> Sure, but if you ran without vacuuming for 6 months, wouldn't you notice the
> huge slowdowns from all those dead tuples before that?
> 

In my applications, yes - for sure I see the huge slowdown after 2 days
without it. But given the fact that someone crossed the limit, I
imagine that it is possible without a noticeable performance loss.


Regards
Gaetano Mendola



Re: Help me recovering data

From
Kevin Brown
Date:
Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
> > BTW, why not do an automatic vacuum instead of a shutdown? At least the
> > DB does not stop working until someone studies what the problem is and
> > how to solve it.
> 
> No, the entire point of this discussion is to whup the DBA upside the
> head with a big enough cluestick to get him to install autovacuum.
> 
> Once autovacuum is default, it won't matter anymore.

I have a concern about this that I hope is just based on some
misunderstanding on my part.

My concern is: suppose that a database is modified extremely
infrequently?  So infrequently, in fact, that over a billion read
transactions occur before the next write transaction.  Once that write
transaction occurs, you're hosed, right?  Autovacuum won't catch this
because it takes action based on the write activity that occurs in the
tables.

So: will autovacuum be coded to explicitly look for transaction
wraparound, or to automatically vacuum every N number of transactions
(e.g., 500 million)?



-- 
Kevin Brown                          kevin@sysexperts.com


Re: Help me recovering data

From
Jürgen Cappel
Date:
Just wondering after this discussion:

Is transaction wraparound limited to a database or to an installation ?
i.e. can heavy traffic in one db affect another db in the same installation ?



Re: Help me recovering data

From
Russell Smith
Date:
On Fri, 18 Feb 2005 04:38 pm, Kevin Brown wrote:
> Tom Lane wrote:
> > Gaetano Mendola <mendola@bigfoot.com> writes:
> > > BTW, why not do an automatic vacuum instead of a shutdown? At least the
> > > DB does not stop working until someone studies what the problem is and
> > > how to solve it.
> > 
> > No, the entire point of this discussion is to whup the DBA upside the
> > head with a big enough cluestick to get him to install autovacuum.
> > 
> > Once autovacuum is default, it won't matter anymore.
> 
> I have a concern about this that I hope is just based on some
> misunderstanding on my part.
> 
> My concern is: suppose that a database is modified extremely
> infrequently?  So infrequently, in fact, that over a billion read
> transactions occur before the next write transaction.  Once that write
> transaction occurs, you're hosed, right?  Autovacuum won't catch this
> because it takes action based on the write activity that occurs in the
> tables.
> 
> So: will autovacuum be coded to explicitly look for transaction
> wraparound, or to automatically vacuum every N number of transactions
> (e.g., 500 million)?
> 
autovacuum already checks for both transaction wraparound and table updates.
It vacuums individual tables as they need it, from a free space/recovery point of view.

It also checks to ensure that no database is nearing transaction wraparound; if one
is, it initiates a database-wide vacuum to resolve that issue.

Regards

Russell Smith
> 
> 


Re: Help me recovering data

From
Russell Smith
Date:
On Fri, 18 Feb 2005 08:53 pm, Jürgen Cappel wrote:
> Just wondering after this discussion:
>
> Is transaction wraparound limited to a database or to an installation ?
> i.e. can heavy traffic in one db affect another db in the same installation ?
>
XIDs are global to the PG cluster, or installation.  So even a database you are not
using will still be subject to XID wraparound.

Regards

Russell Smith.


Re: Help me recovering data

From
Robert Treat
Date:
On Thursday 17 February 2005 07:47, pgsql@mohawksoft.com wrote:
> > Gaetano Mendola <mendola@bigfoot.com> writes:
> >> We do ~4000 txn/minute so in 6 month you are screewd up...
> >
> > Sure, but if you ran without vacuuming for 6 months, wouldn't you notice
> > the
> > huge slowdowns from all those dead tuples before that?
>
> I would think that only applies to databases where UPDATE and DELETE are
> done often. What about databases that are 99.999% inserts? A DBA lightly
> going over the docs may not even know that vacuum needs to be run.
>

Yup... I don't vacuum at least a 100 of the tables in my schema 'cause they are
continuous-insert with big deletions once every 6 months or so.  Generally
speaking it isn't worth the performance hit to vacuum these big tables
regularly, so I only do it when I have to... like every six months when
wraparound gets close.

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: Help me recovering data

From
"Matthew T. O'Connor"
Date:
Russell Smith wrote:

>On Fri, 18 Feb 2005 04:38 pm, Kevin Brown wrote:
>  
>
>>Tom Lane wrote:
>>    
>>
>>> No, the entire point of this discussion is to whup the DBA upside the
>>>
>>>head with a big enough cluestick to get him to install autovacuum.
>>>
>>>Once autovacuum is default, it won't matter anymore.
>>>      
>>>
>>I have a concern about this that I hope is just based on some
>>misunderstanding on my part.
>>
>>My concern is: suppose that a database is modified extremely
>>infrequently?  So infrequently, in fact, that over a billion read
>>transactions occur before the next write transaction.  Once that write
>>transaction occurs, you're hosed, right?  Autovacuum won't catch this
>>because it takes action based on the write activity that occurs in the
>>tables.
>>
>>So: will autovacuum be coded to explicitly look for transaction
>>wraparound, or to automatically vacuum every N number of transactions
>>(e.g., 500 million)?
>>    
>>
>autovacuum already checks for both transaction wraparound and table updates.
>It vacuums individual tables as they need it, from a free space/recovery point of view.
>
>It also checks to ensure that no database is nearing transaction wraparound; if one
>is, it initiates a database-wide vacuum to resolve that issue.
>
Right, the check that autovacuum does for wraparound is totally separate
from the monitoring of inserts, updates and deletes.


Re: Help me recovering data

From
Bruce Momjian
Date:
Matthew T. O'Connor wrote:
> Tom Lane wrote:
> 
> >pgsql@mohawksoft.com writes:
> >  
> >
> >>Maybe I'm missing something, but shouldn't the prospect of data loss (even
> >>in the presence of admin ignorance) be something that should be
> >>unacceptable? Certainly within the realm "normal PostgreSQL" operation.
> >>    
> >>
> >
> >Once autovacuum gets to the point where it's used by default, this
> >particular failure mode should be a thing of the past, but in the
> >meantime I'm not going to panic about it.
> >
> Which I hope will be soon.

I am ready to help you implement integrated autovacuum in 8.1.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Help me recovering data

From
Thomas F.O'Connell
Date:
Does auto_vacuum vacuum the system tables?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Feb 16, 2005, at 5:42 PM, Matthew T. O'Connor wrote:

> Tom Lane wrote:
>
>> pgsql@mohawksoft.com writes:
>>
>>> Maybe I'm missing something, but shouldn't the prospect of data loss 
>>> (even
>>> in the presence of admin ignorance) be something that should be
>>> unacceptable? Certainly within the realm "normal PostgreSQL" 
>>> operation.
>>>
>>
>> Once autovacuum gets to the point where it's used by default, this
>> particular failure mode should be a thing of the past, but in the
>> meantime I'm not going to panic about it.
>>
> Which I hope will be soon.



Re: Help me recovering data

From
"Matthew T. O'Connor"
Date:
Thomas F.O'Connell wrote:

> Does auto_vacuum vacuum the system tables?


Yes