Thread: quick review


From

"Molle Bestefich"

Date:

20 November 2006, 21:26:21

Hi

First time PostgreSQL user.

Here's a review of my experience so far.

1.) Reading the manual

Looks good feature-wise, but there's a suspicious lack of reference to
any kind of repair utility for damaged data files.  Hmm.  I've been
using computers for long enough that I know that in the real world
there's no such thing as "data corruption doesn't occur", so it's
rather suspicious.

2.) Downloading and running the software

Installer doesn't run over Terminal Services.  Sucks, ended up using
the no-installer zip instead.

Using what I assume is the server (postgres.exe - gee, a win32
service, or an icon or something would've been nice), I keep getting
"you are not permitted to run as administrator" messages.  Been
looking for the option to turn that check off for the past 30 minutes,
I do really know what I'm doing, network/security wise.  Spent so long
on this useless task now that I've started considering the PostgreSQL
developers pretentious f..ks for deciding what I'm permitted to run on
my own box.  I could understand a warning and a polite pointer at a
circumvention mechanism, but completely locking me out is unbelievably
arrogant.  Thinking that if the rest of the product is developed with
the same disdain for end users, I'll definitely be better off with
another product.

So, short review, but I haven't gotten any farther yet.
Feel free to send suggestions my way, too :-).

Re: quick review

From

"Thomas H."

Date:

21 November 2006, 00:11:11

> Installer doesn't run over Terminal Services.  Sucks, ended up using
> the no-installer zip instead.

http://pginstaller.projects.postgresql.org/faq/FAQ_windows.html#3.5

upgrade to windows 2003. installing pgsql in a w2k3 console session works just
fine.

> Using what I assume is the server (postgres.exe - gee, a win32
> service, or an icon or something would've been nice), I keep getting
> "you are not permitted to run as administrator" messages.  Been

http://pginstaller.projects.postgresql.org/faq/FAQ_windows.html#2.3

> looking for the option to turn that check off for the past 30 minutes,
> I do really know what I'm doing, network/security wise.  Spent so long

hear hear... another windows admin running all his services under the
administrator context.

> [just rant not really worth reading again]

use "pg_ctl.exe register" to set up your system service when using the zip
instead of the installer. you can also use pg_ctl to run pgsql from a command
prompt. as you obviously "know what you're doing", i assume you've checked
the howto for pgsql on windows, read pgsql's setup-section and already ran
initdb... ;-)

best regards
- just another arrogant pgsql user
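
For readers following the zip-based route, the steps mentioned above (initdb, then "pg_ctl register" or a plain pg_ctl start) can be sketched as follows. This is only an illustration: it builds the command lines rather than running them, and the data directory, log path and service name are made-up placeholders. Per the thread, the server itself refuses to start under an administrator account, so the commands would be run as an ordinary user.

```python
# Sketch only: builds (does not run) the command lines for a zip-based
# Windows install. All paths and the service name are hypothetical.
from typing import List


def initdb_cmd(data_dir: str) -> List[str]:
    """Command that creates a new database cluster in data_dir."""
    return ["initdb", "-D", data_dir]


def register_service_cmd(service_name: str, data_dir: str) -> List[str]:
    """Command that registers PostgreSQL as a Windows service."""
    return ["pg_ctl", "register", "-N", service_name, "-D", data_dir]


def start_cmd(data_dir: str, log_file: str) -> List[str]:
    """Command that starts the server from a command prompt."""
    return ["pg_ctl", "start", "-D", data_dir, "-l", log_file]


if __name__ == "__main__":
    data_dir = r"C:\pgsql\data"  # hypothetical location
    for cmd in (
        initdb_cmd(data_dir),
        register_service_cmd("pgsql-demo", data_dir),
        start_cmd(data_dir, r"C:\pgsql\server.log"),
    ):
        print(" ".join(cmd))
```

Keeping the commands as argument lists makes it easy to hand them to subprocess.run later without worrying about shell quoting.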

Re: quick review

From

"Merlin Moncure"

Date:

21 November 2006, 00:13:56

On 11/20/06, Molle Bestefich <molle.bestefich@gmail.com> wrote:
> Hi
>
> First time PostgreSQL user.
>
> Here's a review of my experience so far.
>
> 1.) Reading the manual
>
> Looks good feature-wise, but there's a suspicious lack of reference to
> any kind of repair utility for damaged data files.  Hmm.  I've been
> using computers for long enough that I know that in the real world
> there's no such thing as "data corruption doesn't occur", so it's
> rather suspicious.
>
> 2.) Downloading and running the software
>
> Installer doesn't run over Terminal Services.  Sucks, ended up using
> the no-installer zip instead.
>
> Using what I assume is the server (postgres.exe - gee, a win32
> service, or an icon or something would've been nice), I keep getting
> "you are not permitted to run as administrator" messages.  Been
> looking for the option to turn that check off for the past 30 minutes,
> I do really know what I'm doing, network/security wise.  Spent so long
> on this useless task now that I've started considering the PostgreSQL
> developers pretentious f..ks for deciding what I'm permitted to run on
> my own box.  I could understand a warning and a polite pointer at a
> circumvention mechanism, but completely locking me out is unbelievably
> arrogant.  Thinking that if the rest of the product is developed with
> the same disdain for end users, I'll definitely be better off with
> another product.
>
> So, short review, but I haven't gotten any farther yet.
> Feel free to send suggestions my way, too :-).

maybe try lincoln logs?

merlin

Re: quick review

From

Neil Conway

Date:

21 November 2006, 00:16:58

On Mon, 2006-11-20 at 17:09 +0100, Molle Bestefich wrote:
> Looks good feature-wise, but there's a suspicious lack of reference to
> any kind of repair utility for damaged data files.

There is indeed no included repair utility for damaged files. There are
some tools for examining the Postgres on-disk format (like
pg_filedump[1], and pgfsck[2]), which can be useful for crash recovery.
There is also the zero_damaged_pages configuration parameter, which can
be used to recover from page-level data corruption. Postgres could use
better tools for this sort of low-level crash recovery, I agree. I think
one reason for this is that such tools are rarely needed.

> Using what I assume is the server (postgres.exe - gee, a win32
> service, or an icon or something would've been nice), I keep getting
> "you are not permitted to run as administrator" messages.

Please see the list archives for exhaustive discussions of why Postgres
behaves this way -- I won't rehash them here. Name calling is unlikely
to convince many people.

-Neil

[1] http://sources.redhat.com/rhdb/utilities.html
[2] http://svana.org/kleptog/pgsql/pgfsck.html (seems not recently
updated)

Re: quick review

From

Bruce Momjian

Date:

21 November 2006, 00:17:55

Molle Bestefich wrote:
> Hi
> 
> First time PostgreSQL user.
> 
> Here's a review of my experience so far.
> 
> 1.) Reading the manual
> 
> Looks good feature-wise, but there's a suspicious lack of reference to
> any kind of repair utility for damaged data files.  Hmm.  I've been
> using computers for long enough that I know that in the real world
> there's no such thing as "data corruption doesn't occur", so it's
> rather suspicious.

Unless you have buggy hardware, there is no need for a repair tool. 
That is our position.

> 2.) Downloading and running the software
> 
> Installer doesn't run over Terminal Services.  Sucks, ended up using
> the no-installer zip instead.
> 
> Using what I assume is the server (postgres.exe - gee, a win32
> service, or an icon or something would've been nice), I keep getting
> "you are not permitted to run as administrator" messages.  Been
> looking for the option to turn that check off for the past 30 minutes,
> I do really know what I'm doing, network/security wise.  Spent so long
> on this useless task now that I've started considering the PostgreSQL
> developers pretentious f..ks for deciding what I'm permitted to run on
> my own box.  I could understand a warning and a polite pointer at a
> circumvention mechanism, but completely locking me out is unbelievably
> arrogant.  Thinking that if the rest of the product is developed with
> the same disdain for end users, I'll definitely be better off with
> another product.

Perhaps.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: quick review

From

Tom Lane

Date:

21 November 2006, 01:02:40

Neil Conway <neilc@samurai.com> writes:
> There is indeed no included repair utility for damaged files. There are
> some tools for examining the Postgres on-disk format (like
> pg_filedump[1], and pgfsck[2]), which can be useful for crash recovery.
> There is also the zero_damaged_pages configuration parameter, which can
> be used to recover from page-level data corruption. Postgres could use
> better tools for this sort of low-level crash recovery, I agree. I think
> one reason for this is that such tools are rarely needed.

In my mind, the existence of an automated repair utility is an admission
that the software it's for is insufficiently robust.  When we find a
repeatable data corruption scenario in Postgres, we *fix the bug*, we
don't make something to clean up after an unfixed bug.  Comparison
point: thirty years ago, people wrote "fsck" utilities for their
non-robust filesystems, and hoped they'd get all their data back;
now they run journaling filesystems instead.

This is certainly not to claim that we don't have corruption problems;
we do.  What we don't have are corruption problems that are predictable
enough to be repaired by automatic processes, nor ones widespread enough
to justify any great investment in such tools.

(But having said that, one use of REINDEX is as a repair utility for
broken indexes...)
        regards, tom lane

Re: quick review

From

"Simon Riggs"

Date:

21 November 2006, 09:59:55

On Mon, 2006-11-20 at 17:09 +0100, Molle Bestefich wrote:

> Looks good feature-wise, but there's a suspicious lack of reference to
> any kind of repair utility for damaged data files.  Hmm.  I've been
> using computers for long enough that I know that in the real world
> there's no such thing as "data corruption doesn't occur", so it's
> rather suspicious. 

There *is* a repair utility for damaged data files. It is the
*automatic* crash recovery feature of the database, hence the lack of a
separate repair utility.

PostgreSQL presumes that if the system crashes, you want your
database to come up in a consistent state because your data is extremely
valuable. There isn't a mode where the database comes up but some of
your files are suspect and we make you run a utility to get things back
to normal (maybe) - that is the way of doing things that you should be
somewhat skeptical of.

There is a parameter, zero_damaged_pages, which can be used in
conjunction with the VACUUM utility to recover a database that has some
very bad data in it, but that is usually a response to hardware error.
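
The zero_damaged_pages plus VACUUM approach mentioned above could look roughly like the statement sequence below. This is a minimal sketch, assuming a superuser session and a hypothetical table name; the helper names are invented for illustration. Zeroing a damaged page discards every row stored on it, so this is a last resort and not a routine repair step.

```python
# Sketch only: generates the SQL statements, it does not connect to a server.
from typing import List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def zero_damaged_pages_session(table: str) -> List[str]:
    """Statements for one pass over `table` with page zeroing enabled.

    With the setting on, a page with a damaged header is reported with a
    warning and replaced by zeroes instead of aborting the command.
    All rows on such a page are lost.
    """
    return [
        "SET zero_damaged_pages = on;",
        "VACUUM " + quote_ident(table) + ";",
        "SET zero_damaged_pages = off;",  # switch it back off right away
    ]


if __name__ == "__main__":
    # "corrupted_table" is a placeholder name.
    for statement in zero_damaged_pages_session("corrupted_table"):
        print(statement)
```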

Anyway, thanks very much for explaining your viewpoint. That helps us to
understand the mindset of non-users. We'll try to improve the docs to
explain why we don't need a separate tool to recover the database.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com

Re: quick review

From

Neil Conway

Date:

21 November 2006, 12:49:03

On Tue, 2006-11-21 at 00:00 -0500, Tom Lane wrote:
> In my mind, the existence of an automated repair utility is an admission
> that the software it's for is insufficiently robust.  When we find a
> repeatable data corruption scenario in Postgres, we *fix the bug*, we
> don't make something to clean up after an unfixed bug.

If it's a question of priorities, I completely agree: clearly the
primary focus should be on writing reliable software that doesn't need
repair utilities, and I think by that measure we've been doing pretty
well. But I don't think the need for these kinds of tools can be
discounted entirely: hardware problems frequently cause data corruption,
and the number of future Postgres data-loss bugs is likely to be
non-zero, despite our best efforts.

Having better tools is hardly a bad thing, and I don't think having
better tools would require making an "admission" about the reliability
of our software. I was just saying that there's room for improvement:
for instance, tools like pg_filedump and pgfsck could be a lot more
polished and feature-complete, and the whole process of recovering from
data corruption could be better documented. Again, I don't think it is
our top priority, but if someone wants to work on it, I wouldn't stop
them...

-Neil

Re: quick review

From

Tom Lane

Date:

21 November 2006, 13:01:07

Neil Conway <neilc@samurai.com> writes:
> Having better tools is hardly a bad thing, and I don't think having
> better tools would require making an "admission" about the reliability
> of our software. I was just saying that there's room for improvement:
> for instance, tools like pg_filedump and pgfsck could be a lot more
> polished and feature-complete, and the whole process of recovering from
> data corruption could be better documented.

The point I was trying to make is that recovery is never a cookbook
process --- it's never twice the same problem.  (If it were, we could
and should be doing something about the underlying problem.)  This makes
it difficult to provide either polished tools or polished documentation.
        regards, tom lane

Re: quick review

From

Alvaro Herrera

Date:

21 November 2006, 13:28:30

Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > Having better tools is hardly a bad thing, and I don't think having
> > better tools would require making an "admission" about the reliability
> > of our software. I was just saying that there's room for improvement:
> > for instance, tools like pg_filedump and pgfsck could be a lot more
> > polished and feature-complete, and the whole process of recovering from
> > data corruption could be better documented.
> 
> The point I was trying to make is that recovery is never a cookbook
> process --- it's never twice the same problem.

Well, TOAST pointer problems are very frequent, even though they are
typically hardware-related.  A heuristic-based tool to try to guess
values for invalid tuples is not impossible, I'd guess.

I've never even tried writing such a thing, though.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: quick review

From

Benny Amorsen

Date:

21 November 2006, 17:04:46

>>>>> "BM" == Bruce Momjian <bruce@momjian.us> writes:

BM> Unless you have buggy hardware, there is no need for a repair
BM> tool. That is our position.

Non-buggy hardware will be more difficult to come by as databases
become larger on commodity hardware. Of course you can define the
problem away by saying that Postgres is only good for enterprises.

Right now file systems are being changed to be more resilient. ZFS
and the Linux working group to replace/extend the ext* series are
examples. Databases will have to follow suit eventually.

There are probably other things with higher priority for Postgres
right now though.


/Benny

Re: quick review

From

"Joshua D. Drake"

Date:

21 November 2006, 17:11:10

On Mon, 2006-11-20 at 21:50 -0500, Bruce Momjian wrote:
> Molle Bestefich wrote:
> > Hi
> > 
> > First time PostgreSQL user.
> > 
> > Here's a review of my experience so far.
> > 
> > 1.) Reading the manual
> > 
> > Looks good feature-wise, but there's a suspicious lack of reference to
> > any kind of repair utility for damaged data files.  Hmm.  I've been
> > using computers for long enough that I know that in the real world
> > there's no such thing as "data corruption doesn't occur", so it's
> > rather suspicious.
> 
> Unless you have buggy hardware, there is no need for a repair tool. 
> That is our position.

And when I mentioned that to two of my largest customers, they both
looked at me like I had lost my mind.

Something to think about.

Joshua D. Drake


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: quick review

From

Bruce Momjian

Date:

21 November 2006, 17:15:53

Joshua D. Drake wrote:
> On Mon, 2006-11-20 at 21:50 -0500, Bruce Momjian wrote:
> > Molle Bestefich wrote:
> > > Hi
> > > 
> > > First time PostgreSQL user.
> > > 
> > > Here's a review of my experience so far.
> > > 
> > > 1.) Reading the manual
> > > 
> > > Looks good feature-wise, but there's a suspicious lack of reference to
> > > any kind of repair utility for damaged data files.  Hmm.  I've been
> > > using computers for long enough that I know that in the real world
> > > there's no such thing as "data corruption doesn't occur", so it's
> > > rather suspicious.
> > 
> > Unless you have buggy hardware, there is no need for a repair tool. 
> > That is our position.
> 
> And when I mentioned that to two of my largest customers, they both
> looked at me like I had lost my mind.
> 
> Something to think about.

My assumption is that they are used to buggy software that has bugs
unfixed for years, and the repair tools are the answer to that, no?


Re: quick review

From

Andrew Dunstan

Date:

21 November 2006, 17:18:35

Joshua D. Drake wrote:
>> Unless you have buggy hardware, there is no need for a repair tool. 
>> That is our position.
>>     
>
> And when I mentioned that to two of my largest customers, they both
> looked at me like I had lost my mind.
>
> Something to think about.
>
>   

Now ask your clients what errors they see that could be fixed by a 
repair tool. I think Bruce's formulation is unfortunate, and would look 
better like this: When we find that there is a bug that causes data 
corruption we fix the bug rather than supplying a workaround. Our 
position is that repair tools are mostly a bandaid, and we would rather 
fix the problem.

cheers

andrew

Re: quick review

From

"Joshua D. Drake"

Date:

21 November 2006, 17:32:41

> > And when I mentioned that to two of my largest customers, they both
> > looked at me like I had lost my mind.
> > 
> > Something to think about.
> 
> My assumption is that they are used to buggy software that has bugs
> unfixed for years, and the repair tools are the answer to that, no?
> 

Well I certainly can't argue with that :) but it is a really hard
argument to make to the customer. These are customers who have spent
many times your salary and mine on software and needed these
*repair* tools in the past.

Sincerely,

Joshua D. Drake



Re: quick review

From

Bruce Momjian

Date:

21 November 2006, 17:34:56

Joshua D. Drake wrote:
> 
> > > And when I mentioned that to two of my largest customers, they both
> > > looked at me like I had lost my mind.
> > > 
> > > Something to think about.
> > 
> > My assumption is that they are used to buggy software that has bugs
> > unfixed for years, and the repair tools are the answer to that, no?
> > 
> 
> Well I certainly can't argue with that :) but it is a really hard
> argument to make to the customer. These are customers who have spent
> many more times your and my salaries on software and needed these
> *repair* tools in the past.

OK, so do we give them a /bin/true and go away?  How do we address this?


Re: quick review

From

"Joshua D. Drake"

Date:

21 November 2006, 17:39:44

On Tue, 2006-11-21 at 16:18 -0500, Andrew Dunstan wrote:
> Joshua D. Drake wrote:
> >> Unless you have buggy hardware, there is no need for a repair tool. 
> >> That is our position.
> >>     
> >
> > And when I mentioned that to two of my largest customers, they both
> > looked at me like I had lost my mind.
> >
> > Something to think about.
> >
> >   
> 
> Now ask your clients what errors they see that could be fixed by a 
> repair tool. I think Bruce's formulation is unfortunate, and would look 
> better like this: When we find that there is a bug that causes data 
> corruption we fix the bug rather than supplying a workaround. Our 
> position is that repair tools are mostly a bandaid, and we would rather 
> fix the problem.

I have a customer right now, that has a corrupted table. The table would
be fixed by deleting a couple of rows. Now, I don't know if this is true
but I could find it useful to do something like:

table_check -U postgres -D foo -t corrupted_table

WARNING: 2 rows bad, writing out to disk
NOTICE: 100000 million rows cleaned
NOTICE: table corrupted_table clean

Then I could go to a file and get some semblance of information on what
rows were bad so I could find them from an old backup or something.

The reason this is important is that a single bad row in a 500GB
database prevents backups.
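
The table_check command above is hypothetical. One way such a check might narrow down the bad rows is by bisecting the key range: try to read a whole range, and if that fails, split it in half and retry. The sketch below assumes an integer key and a caller-supplied can_read function (in practice it might wrap a SELECT over an id range and catch the error); both are assumptions for illustration, not an existing PostgreSQL tool.

```python
# Sketch only: locates unreadable rows by bisecting an integer key range.
from typing import Callable, List


def find_bad_ids(lo: int, hi: int, can_read: Callable[[int, int], bool]) -> List[int]:
    """Return the ids in [lo, hi] whose rows cannot be read.

    can_read(a, b) must return True when every row with an id in [a, b]
    reads back without error.
    """
    if lo > hi or can_read(lo, hi):
        return []  # empty range, or the whole range is readable
    if lo == hi:
        return [lo]  # narrowed down to a single unreadable row
    mid = (lo + hi) // 2
    return find_bad_ids(lo, mid, can_read) + find_bad_ids(mid + 1, hi, can_read)


if __name__ == "__main__":
    corrupted = {17, 4242}  # stand-in for rows that raise errors when read

    def can_read(a: int, b: int) -> bool:
        return not any(a <= bad <= b for bad in corrupted)

    print(find_bad_ids(1, 100_000, can_read))
```

With only a couple of bad rows, the number of range reads grows with the logarithm of the table size, not with the row count, which matters on a table this large.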

Sincerely,

Joshua D. Drake




Re: quick review

From

"Joshua D. Drake"

Date:

21 November 2006, 17:41:31

> > Well I certainly can't argue with that :) but it is a really hard
> > argument to make to the customer. These are customers who have spent
> > many more times your and my salaries on software and needed these
> > *repair* tools in the past.
> 
> OK, so do we give them a /bin/true and go away?  How do we address this?
> 

Heh... Good question. We can't use the argument that we don't corrupt
stuff, cause it does happen. It may not be PostgreSQL's fault, it likely
isn't... but the end result is PostgreSQL is corrupted and needs to be
fixed.

The common way to do that is dump/reload whatever is bad or look for
specific rows etc...

What do you do when looking at a 300-million-row table? Outages are
just not that welcome in today's world.

Sincerely,

Joshua D. Drake




Re: quick review

From

Douglas McNaught

Date:

21 November 2006, 17:44:45

Andrew Dunstan <andrew@dunslane.net> writes:

> Now ask your clients what errors they see that could be fixed by a
> repair tool. I think Bruce's formulation is unfortunate, and would
> look better like this: When we find that there is a bug that causes
> data corruption we fix the bug rather than supplying a workaround. Our
> position is that repair tools are mostly a bandaid, and we would
> rather fix the problem.

From what Tom and some others have been saying, it sounds as though
there might be scope for a debugfs(8) sort of tool, to assist in
reconstructing hardware-damaged data.  I agree that any kind of
fsck(8)-style automatic de-scragger is probably not what we want.

The abovementioned tool would still require someone fairly
knowledgeable about the on-disk data structures to drive it.

-Doug

Re: quick review

From

"Molle Bestefich"

Date:

18 December 2006, 18:23:05

> another windows admin running all his service
> under the administrator context.

I needed the PostgreSQL setup to prove to a customer
that a working setup could be made using PGSQL.  It lived
on my system for a couple of days in total, so cooking
up a perfectly secure system was hardly worth it, in fact
it was a major waste of my time.

The machine in question has a firewall, so external
connections to the service would never occur.

PostgreSQL telling me how to run my system security-wise
is infinitely annoying.  Feels like being locked in a
cage, which is always an insulting way to treat a user.

I just tried the same approach with another popular
open source database.  It indeed also refuses to start,
but the user isn't locked out.  It kindly says "you
probably shouldn't be doing this, but if you really want
to, you can run as root with --user=root".

Anyway, I just enabled the guest account on the machine
in question and started a command prompt under those
credentials.  Things wouldn't work unless the guest
account was allowed sign-on privileges (?), so it has
those now.  I've actually forgotten to disable them again,
so arguably being forced by PostgreSQL to run as another
user has actually caused *worse* security on this machine
than would otherwise be the case :-).

Oh well, enough ranting about that already.
Everything worked very smoothly once I had wasted a couple
of hours getting it to start.

Thanks for all the feedback regarding "pg_ctl register"
etc. and thanks for the discussion!


> The point I was trying to make is that recovery is never
> a cookbook process --- it's never twice the same problem.
> (If it were, we could and should be doing something about
> the underlying problem.)  This makes it difficult to
> provide either polished tools or polished documentation.

I've seen a ton of wedged databases.  Missing a tuple here
or there after recovery because the process applied by the
tool used is imperfect has never been a problem for any
company I've been at.  Minimal time to recover has always
been far more critical.  And also the money paid for the
actual recovery in terms of paychecks matters a lot.

Simply put, a tool with just a single button named "recover
all the data that you can" is by far the best solution in so
many cases.  Minimal fuss, minimal downtime, minimal money
spent on recovery.  And perhaps there's even a good chance that
any missing data could be entered back into the system manually.

The only time I've ever heard of experts being brought in
to fix a database problem was when IBM's DB2 system crashed for
a major bank in Scandinavia.  But that's banking data, so that's
an entirely different story from everyday use by any other kind
of corporation.  It's .01% of the market, it's really not
that interesting if you ask me.

Ok, long rant short, convincing any company to use a database
system is much easier when that particular system has a one-click
recovery application.  Same reason why NTFS and FAT32 are
filesystems that people like to keep their data on - they know
that when things do go wrong, they can launch Tiramisu or
Easy Recovery Pro or what not and just tell it "recover as much
as you can onto this other disk".  People sleep safer if they
know that there's a backup plan, in particular one that doesn't
require two months of downtime while your DBA is learning the
innards of the database system file structure, posting to a
mailing list *hoping* that there's someone that can and wants
to talk to him (while he's at his most stressed) who has the
required expertise, etc.

Re: quick review

From

tomas@tuxteam.de

Date:

24 December 2006, 01:42:53


On Mon, Dec 18, 2006 at 03:47:42AM +0100, Molle Bestefich wrote:

[...]

> Simply put, a tool with just a single button named "recover
> all the data that you can" is by far the best solution in so
> many cases.  Minimal fuss, minimal downtime, minimal money
> spent on recovery.  And perhaps there's even a good chance that
> any missing data could be entered back into the system manually.

I think the point which has been made here was that the recovery tool
*is already there*: i.e. all that can be done as a "one-click" recovery
is done by the system at start-up. Beyond this no cookbook exists (and
thus no way to put it under a one-click procedure).

So this one-click thing would be mainly something to cater for the
"needs" of marketing.

Regards
-- tomás

Re: quick review

From

"Dawid Kuroczko"

Date:

24 December 2006, 18:30:50

On 12/24/06, tomas@tuxteam.de <tomas@tuxteam.de> wrote:
>
> On Mon, Dec 18, 2006 at 03:47:42AM +0100, Molle Bestefich wrote:
>
> [...]
>
> > Simply put, a tool with just a single button named "recover
> > all the data that you can" is by far the best solution in so
> > many cases.  Minimal fuss, minimal downtime, minimal money
> > spent on recovery.  And perhaps there's even a good chance that
> > any missing data could be entered back into the system manually.
>
> I think the point which has been made here was that the recovery tool
> *is already there*: i.e. all that can be done as a "one-click" recovery
> is done by the system at start-up. Beyond this no cookbook exists (and
> thus no way to put it under a one-click procedure).
>
> So this one-click thing would be mainly something to cater for the
> "needs" of marketing.

Well, start-up recovery is great and reliable.  The only problem is that
it won't help if you have some obscure hardware problem; then you really
have a problem.  If you want to sleep well, you should know what to
do when disaster happens.

I really like the approach of the XFS filesystem, which ships with fsck.xfs
which is essentially equivalent to /bin/true.  They write in their white
paper that they did so because journaling should recover from all
failures.  Yet they also wrote that some time later they learned that
hardware corruption is not as unlikely as one might assume, so they
provide the xfs_check and xfs_repair utilities.

I think there should be a documented way to recover from obscure
hardware failure, with even more detailed information on how this could
result only from using crappy hardware...  And I don't think this should
be a "one click" process -- some people might miss real (software)
corruption, and this is the biggest drawback.  Perhaps the disaster
recoverer should leave a detailed log which would be enough to
detect software-corruption even after the recovery [and users should
be advised to send them].
  Regards,     Dawid Kuroczko

Re: quick review

From

Christopher Browne

Date:

24 December 2006, 23:06:41

A long time ago, in a galaxy far, far away, qnex42@gmail.com ("Dawid Kuroczko") wrote:
> On 12/24/06, tomas@tuxteam.de <tomas@tuxteam.de> wrote:
>>
>> On Mon, Dec 18, 2006 at 03:47:42AM +0100, Molle Bestefich wrote:
>>
>> [...]
>>
>> > Simply put, a tool with just a single button named "recover
>> > all the data that you can" is by far the best solution in so
>> > many cases.  Minimal fuss, minimal downtime, minimal money
>> > spent on recovery.  And perhaps there's even a good chance that
>> > any missing data could be entered back into the system manually.
>>
>> I think the point which has been made here was that the recovery tool
>> *is already there*: i.e. all that can be done as a "one-click" recovery
>> is done by the system at start-up. Beyond this no cookbook exists (and
>> thus no way to put it under a one-click procedure).
>>
>> So this one-click thing would be mainly something to cater for the
>> "needs" of marketing.
>
> Well, start-up recovery is great and reliable.  The only problem is that
> it won't help if you have some obscure hardware problem; then you really
> have a problem.  If you want to sleep well, you should know what to
> do when disaster happens.
>
> I really like the approach of the XFS filesystem, which ships with fsck.xfs
> which is essentially equivalent to /bin/true.  They write in their white
> paper that they did so because journaling should recover from all
> failures.  Yet they also wrote that some time later they learned that
> hardware corruption is not as unlikely as one might assume, so they
> provide the xfs_check and xfs_repair utilities.
>
> I think there should be a documented way to recover from obscure
> hardware failure, with even more detailed information on how this could
> result only from using crappy hardware...  And I don't think this should
> be a "one click" process -- some people might miss real (software)
> corruption, and this is the biggest drawback.  Perhaps the disaster
> recoverer should leave a detailed log which would be enough to
> detect software-corruption even after the recovery [and users should
> be advised to send them].

The trouble is that it is often *impossible* to recover from the
"obscure hardware failure."

If the failure is that a bunch of vital bits have been lost or
scribbled on, there may be NO way to recover from this.

And in practice, this in fact seems to be a common form for "obscure
hardware failure" to take: those problems are, in fact, irretrievable.

There historically have been two main sorts of corruptions:

1.  Hardware corruptions where the only recovery is to have some sort
of replica of the data, whether via near-hardware mechanisms (e.g. -
RAID) or more 'logical' mechanisms (e.g. - replication systems).

2.  Software corruptions, where the answer is not to provide some
"recovery mechanism," but rather to FIX THE BUG that is leading to the
problem.  Once the bug is fixed, there is no more corruption (of this
sort).

Neither of these is amenable to there being some mechanism such as you
describe.  There are really only two possibilities:
a) The problem is one that the WAL recovery system can cope with, or
b) There has been True Data Loss, and there is NO recovery system
   short of recovering from backup/replica.
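
Case (a) can be illustrated with a toy write-ahead log. This is a deliberately simplified model, not PostgreSQL's actual WAL format or recovery algorithm; the record layout and function name are invented for illustration. It replays a log after a simulated crash and keeps only the changes of transactions that committed. Anything that never reached the log, or was scribbled over on disk, falls under case (b) and cannot be rebuilt this way.

```python
# Toy model only: redo-style replay of a made-up log format.
from typing import Dict, List, Tuple

# Each log record is (transaction id, operation, key, value).
Record = Tuple[int, str, str, str]


def replay(log: List[Record]) -> Dict[str, str]:
    """Rebuild state from the log, applying only committed transactions."""
    committed = {txid for txid, op, _, _ in log if op == "commit"}
    state: Dict[str, str] = {}
    for txid, op, key, value in log:
        if op == "set" and txid in committed:
            state[key] = value
    return state


if __name__ == "__main__":
    log = [
        (1, "set", "a", "1"),
        (1, "commit", "", ""),
        (2, "set", "a", "2"),  # crash happened before transaction 2 committed
    ]
    print(replay(log))
```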
-- 
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/slony.html
"Here I  am, brain the  size of a planet, and  they ask me to take you
down to the bridge.  Call that job satisfaction?  'Cos I don't."
-- Marvin the Paranoid Android