Thread: TODO list

TODO list

From

Peter Eisentraut

Date:

15 January 2000, 13:31:35

* User who can create databases can modify pg_database table

Not anymore.

* Interlock to prevent DROP DATABASE on a database with running backends

I think Tom wanted this listed as an achievement, because it's already
done.

* Better interface for adding to pg_group

Done.

* Allow array on int8[]

Done. (Credit to Thomas, though, he just forgot to apply the patch.)

* Make Absolutetime/Relativetime int4 because time_t can be int8 on some
ports

Does this mean the abstime/reltime types or all of them?  I thought the
former were deprecated anyway.

* Permissions on indexes, prevent them?

Done (prevented)

* Make postgres user have a password by default

Done. (--pwprompt option, enter it blind twice, echo ALTER USER |
postgres; probably as secure as it gets)

* Update table SET table.value = 3 fails(SQL standard says this is OK)

Not the standard I'm looking at. Someone please enlighten me.
        <update statement: searched> ::=
          UPDATE <table name>
            SET <set clause list>
              [ WHERE <search condition> ]

        <set clause list> ::=
             <set clause> [ { <comma> <set clause> }... ]

        <set clause> ::=
             <object column> <equals operator> <update source>

        <object column> ::= <column name>

        <column name> ::= <identifier>

        <identifier> ::=
             [ <introducer><character set specification> ] <actual identifier>

        <introducer> ::= <underscore>

        <character set specification> ::=
             { nothing of interest }

        <actual identifier> ::=
               <regular identifier>
             | <delimited identifier>

   { meaning a non-quoted identifier or a quoted one }
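
A minimal sketch of what the quoted productions imply, written in Python.
It models only the rules shown above: an <object column> is a single
<identifier>, either regular or delimited. The optional <introducer> and
character set prefix are ignored, and the helper name and the regular
expressions are assumptions made for illustration, not text from the
standard.

```python
import re

# Hypothetical patterns covering only the productions quoted above.
REGULAR = r"[A-Za-z][A-Za-z0-9_]*"      # <regular identifier>, simplified
DELIMITED = r'"(?:[^"]|"")+"'           # <delimited identifier>: double-quoted
OBJECT_COLUMN = re.compile(rf"(?:{REGULAR}|{DELIMITED})")


def is_object_column(text: str) -> bool:
    """Return True if text is a single <identifier>, as <object column> requires."""
    return OBJECT_COLUMN.fullmatch(text) is not None


print(is_object_column("value"))         # bare column name
print(is_object_column('"my column"'))   # delimited identifier
print(is_object_column("table.value"))   # qualified name, not derivable from the rules
```

Under these rules the qualified form on the left-hand side of SET is
rejected, which is the reading argued for here.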


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden

Re: [HACKERS] TODO list

From

Bruce Momjian

Date:

15 January 2000, 14:32:32

> * User who can create databases can modify pg_database table
> 
> Not anymore.

Done.  Good.

> 
> * Interlock to prevent DROP DATABASE on a database with running backends
> 
> I think Tom wanted this listed as an achievement, because it's already
> done.

Done.

> 
> * Better interface for adding to pg_group
> 
> Done.

Yes.  I can remove the pg_group FAQ item after 7.0 is out to most
people.

> 
> * Allow array on int8[]
> 
> Done. (Credit to Thomas, though, he just forgot to apply the patch.)

Good.

> 
> * Make Absolutetime/Relativetime int4 because time_t can be int8 on some
> ports
> 
> Does this mean the abstime/reltime types or all of them?  I thought the
> former were deprecated anyway.

I think the idea is that it can roll over the max int4 value.  Not sure
about which types are active.
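
A small sketch of the rollover being referred to, in Python. It assumes
the 4-byte value counts seconds since the Unix epoch, as time_t
conventionally does; the helper name wrap_int4 is made up for this
example.

```python
from datetime import datetime, timedelta, timezone

INT4_MAX = 2**31 - 1  # largest value a signed 4-byte integer can hold

# Last instant representable as int4 seconds since the Unix epoch.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last = epoch + timedelta(seconds=INT4_MAX)
print(last.isoformat())  # 2038-01-19T03:14:07+00:00


def wrap_int4(n: int) -> int:
    """Reduce n to the signed 32-bit range, as a wrapping int4 would."""
    return (n + 2**31) % 2**32 - 2**31


# One second later the value no longer fits and wraps to the minimum.
print(wrap_int4(INT4_MAX + 1))  # -2147483648
```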


> 
> * Permissions on indexes, prevent them?
> 
> Done (prevented)

Good.


> 
> * Make postgres user have a password by default
> 
> Done. (--pwprompt option, enter it blind twice, echo ALTER USER |
> postgres; probably as secure as it gets)

Perfect.  We don't want to do it by default.

> 
> * Update table SET table.value = 3 fails(SQL standard says this is OK)
> 
> Not the standard I'm looking at. Someone please enlighten me.
> 
>          <update statement: searched> ::=
>            UPDATE <table name>
>              SET <set clause list>
>                [ WHERE <search condition> ]
> 
>          <set clause list> ::=
>               <set clause> [ { <comma> <set clause> }... ]
> 
>          <set clause> ::=
>               <object column> <equals operator> <update source>
> 
>          <object column> ::= <column name>
> 
>          <column name> ::= <identifier>
> 
>          <identifier> ::=
>               [ <introducer><character set specification> ] <actual identifier>
>   
>          <introducer> ::= <underscore>
> 
>          <character set specification> ::=
>         { nothing of interest }
> 
>          <actual identifier> ::=
>                 <regular identifier>
>               | <delimited identifier>
> 
>     { meaning a non-quoted identifier or a quoted one }
> 

I don't see anything in the spec that says you can use table.column on
the left-hand side of the equals.  No?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [HACKERS] TODO list

From

Don Baccus

Date:

15 January 2000, 15:01:34

At 02:31 PM 1/15/00 -0500, Bruce Momjian wrote:
>> * Update table SET table.value = 3 fails(SQL standard says this is OK)
>> 
>> Not the standard I'm looking at. Someone please enlighten me.

If this is indeed the standard, it looks to me as though Bruce is 
reading it right.  It makes sense, too: only one table can be updated
at a time, so there's no opportunity for ambiguity in column names
on the left side.  What's the point?

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.

Re: [HACKERS] TODO list

From

Stephen Birch

Date:

15 January 2000, 18:31:39

Can we ALTER a table to drop a column yet, or is that still a TO DO item?

Steve


Peter Eisentraut wrote:

> * User who can create databases can modify pg_database table
>
> Not anymore.
>
> * Interlock to prevent DROP DATABASE on a database with running backends
>

Re: [HACKERS] TODO list

From

Bruce Momjian

Date:

15 January 2000, 21:31:35

> Can we ALTER a table to drop a column yet, or is that still a TO DO item?
> 

Still a TODO item.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [HACKERS] TODO list

From

Peter Eisentraut

Date:

16 January 2000, 12:12:45

I'm working on that, but pssst, don't tell anyone! ;)

On 2000-01-15, Stephen Birch mentioned:

> Can we ALTER a table to drop a column yet, or is that still a TO DO item?
> 
> Steve
> 
> 
> Peter Eisentraut wrote:
> 
> > * User who can create databases can modify pg_database table
> >
> > Not anymore.
> >
> > * Interlock to prevent DROP DATABASE on a database with running backends
> >
> 
> 

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden

Re: [HACKERS] TODO list

From

Thomas Lockhart

Date:

17 January 2000, 02:53:54

> * Allow array on int8[]
> Done. (Credit to Thomas, though, he just forgot to apply the patch.)

Thanks. btw, didn't forget, but wanted confirmation that it worked
(which I got a day or two later).

> * Make Absolutetime/Relativetime int4 because time_t can be int8 on some
> ports
> Does this mean the abstime/reltime types or all of them?  I thought the
> former were deprecated anyway.

abstime should probably be considered deprecated as a user type, but
it is still used extensively internally and within the tuple
structure. I'd be reluctant to wholesale replace it with
timestamp/datetime, since that will take 8 bytes per value rather than
4.
                      - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

Re: [HACKERS] TODO list

From

Tom Lane

Date:

17 January 2000, 03:09:56

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Does this mean the abstime/reltime types or all of them?  I thought the
>> former were deprecated anyway.

> abstime should probably be considered deprecated as a user type, but
> it is still used extensively internally and within the tuple
> structure. I'd be reluctant to wholesale replace it with
> timestamp/datetime, since that will take 8 bytes per value rather than
> 4.

I was meaning to ask you which of the date/time types are going to be
left standing when the dust settles.  (I know you've said, but the
archives are so messed up right now that I can't find it.)

Timestamp is the only remaining standard type without an array type,
and if it's not going to be deprecated then it ought to have one...
        regards, tom lane

Re: [HACKERS] TODO list

From

Thomas Lockhart

Date:

17 January 2000, 03:23:54

> I was meaning to ask you which of the date/time types are going to be
> left standing when the dust settles.  (I know you've said, but the
> archives are so messed up right now that I can't find it.)
> Timestamp is the only remaining standard type without an array type,
> and if it's not going to be deprecated then it ought to have one...

"timestamp" will continue, but *all* of the code will come from a
renamed "datetime". So don't bother adding anything for timestamp,
since it will magically appear when datetime gets renamed.

btw, I will make "datetime" a synonym for "timestamp", so existing
apps should work without change.
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

Re: [HACKERS] TODO list

From

Peter Eisentraut

Date:

17 January 2000, 06:13:56

On Mon, 17 Jan 2000, Thomas Lockhart wrote:

> > * Make Absolutetime/Relativetime int4 because time_t can be int8 on some
> > ports
> > Does this mean the abstime/reltime types or all of them?  I thought the
> > former were deprecated anyway.
> 
> abstime should probably be considered deprecated as a user type, but
> it is still used extensively internally and within the tuple
> structure. I'd be reluctant to wholesale replace it with
> timestamp/datetime, since that will take 8 bytes per value rather than
> 4.

Just so I understand this: The official SQL data types are "timestamp" and
"interval", right? Everything else will eventually be an alias or phased
out or whatever?

I've been itching to change the pg_shadow.valuntil column to timestamp
anyway, I suppose that would be a step in the right direction, or not?

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden

Re: [HACKERS] TODO list

From

Bruce Momjian

Date:

17 January 2000, 11:19:07

> > I was meaning to ask you which of the date/time types are going to be
> > left standing when the dust settles.  (I know you've said, but the
> > archives are so messed up right now that I can't find it.)
> > Timestamp is the only remaining standard type without an array type,
> > and if it's not going to be deprecated then it ought to have one...
> 
> "timestamp" will continue, but *all* of the code will come from a
> renamed "datetime". So don't bother adding anything for timestamp,
> since it will magically appear when datetime gets renamed.
> 
> btw, I will make "datetime" a synonym for "timestamp", so existing
> apps should work without change.

Got it.  Never mind.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [HACKERS] TODO list

From

Thomas Lockhart

Date:

17 January 2000, 11:26:05

> The official SQL data types are "timestamp" and
> "interval", right? Everything else will eventually be an alias or 
> phased out or whatever?

No (at least I haven't proposed that). abstime stays as a 4-byte
internal system time type. timestamp and interval become full-featured
date/time types, stealing all of the datetime and timespan code, and
the latter two become synonyms for timestamp and interval.

> I've been itching to change the pg_shadow.valuntil column to timestamp
> anyway, I suppose that would be a step in the right direction, or not?

At the moment, there are *no* 8-byte date/time types in the system
tables. This would be the first instance of that, and I'm not sure we
should introduce it in just one place.

Has abstime been a problem here?
                  - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

Re: [HACKERS] TODO list

From

Peter Eisentraut

Date:

18 January 2000, 18:22:29

On 2000-01-17, Thomas Lockhart mentioned:

> > The official SQL data types are "timestamp" and
> > "interval", right? Everything else will eventually be an alias or 
> > phased out or whatever?
> 
> No (at least I haven't proposed that). abstime stays as a 4-byte
> internal system time type. timestamp and interval become full-featured
> date/time types, stealing all of the datetime and timespan code, and
> the latter two become synonyms for timestamp and interval.

Okay, so we have "timestamp" and "interval" as official types, a few
"datetime" sort of things as aliases for backwards compatibility, and
"abstime" as a more or less internal type with less precision and storage
requirements. Sounds clear to me. This also puts the original TODO item
into a much clearer light.

> > I've been itching to change the pg_shadow.valuntil column to timestamp
> > anyway, I suppose that would be a step in the right direction, or not?
> 
> At the moment, there are *no* 8-byte date/time types in the system
> tables. This would be the first instance of that, and I'm not sure we
> should introduce it in just one place.
> 
> Has abstime been a problem here?

No. I just thought this could be done, but in view of your explanation I
am now wiser ...

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden

Re: TODO list

From

Bruce Momjian

Date:

04 April 2001, 16:59:31

> 
> Bruce,
> 
> Two changes for the TODO list.
> 
> 1. Under "RELIABILITY/MISC", add:
> 
>   Write out a CRC with each data block, and verify it on reading.
> 
> 2. Under SOURCE CODE, I believe Tom has already implemented:
> 
>   Correct CRC WAL code to be a real CRC64 algorithm 

TODO updated.  I know we did number 2, but did we agree on #1 and is it
done?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Re: TODO list

From

Tom Lane

Date:

04 April 2001, 17:40:18

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Two changes for the TODO list.
>> 
>> 1. Under "RELIABILITY/MISC", add:
>> 
>> Write out a CRC with each data block, and verify it on reading.
>> 
>> 2. Under SOURCE CODE, I believe Tom has already implemented:
>> 
>> Correct CRC WAL code to be a real CRC64 algorithm 

> TODO updated.  I know we did number 2, but did we agree on #1 and is it
> done?

#2 is indeed done.  #1 is not done, and possibly not agreed to ---
I think Vadim had doubts about its usefulness, though personally I'd
like to see it.
        regards, tom lane

Re: Re: TODO list

From

Bruce Momjian

Date:

04 April 2001, 17:40:19

> > TODO updated.  I know we did number 2, but did we agree on #1 and is it
> > done?
> 
> #2 is indeed done.  #1 is not done, and possibly not agreed to ---
> I think Vadim had doubts about its usefulness, though personally I'd
> like to see it.

That was my recollection too.  This was the discussion about testing the
disk hardware.  #1 removed.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Re: TODO list

From

"Ken Hirsch"

Date:

05 April 2001, 16:40:38

> > > TODO updated.  I know we did number 2, but did we agree on #1 and is it
> > > done?
> >
> > #2 is indeed done.  #1 is not done, and possibly not agreed to ---
> > I think Vadim had doubts about its usefulness, though personally I'd
> > like to see it.
>
> That was my recollection too.  This was the discussion about testing the
> disk hardware.  #1 removed.

What is recommended in the bible (Gray and Reuter), especially for larger
disk block sizes that may not be written atomically, is to have a word at
the end of the block that must match a word at the beginning of the block.  It
gets changed each time you write the block.

Ken Hirsch
All your database are belong to us.

Re: Re: TODO list

From

ncm@zembu.com (Nathan Myers)

Date:

05 April 2001, 17:01:22

On Thu, Apr 05, 2001 at 04:25:42PM -0400, Ken Hirsch wrote:
> > > > TODO updated.  I know we did number 2, but did we agree on #1 and is it
> > > > done?
> > >
> > > #2 is indeed done.  #1 is not done, and possibly not agreed to ---
> > > I think Vadim had doubts about its usefulness, though personally I'd
> > > like to see it.
> >
> > That was my recollection too.  This was the discussion about testing the
> > disk hardware.  #1 removed.
> 
> What is recommended in the bible (Gray and Reuter), especially for larger
> disk block sizes that may not be written atomically, is to have a word at
> the end of the block that must match a word at the beginning of the block.  It
> gets changed each time you write the block.

That only works if your blocks are atomic.  Even SCSI disks reorder
sector writes, and they are free to write the first and last sectors
of an 8k-32k block, and not have written the intermediate sectors 
before the power goes out.  On IDE disks it is of course far worse.
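
A minimal Python sketch of this failure mode. The block layout (a 4-byte
version word at both ends), the 512-byte sector size, and the 8k block
size are assumptions chosen for illustration; CRC-32 from zlib stands in
for whatever checksum a real implementation might use.

```python
import zlib

SECTOR = 512
SECTORS_PER_BLOCK = 16  # 8k block, an assumed size


def make_block(version: int, fill: bytes) -> bytes:
    """Hypothetical layout: 4-byte version word, payload, the same word again."""
    word = version.to_bytes(4, "big")
    payload = fill * (SECTOR * SECTORS_PER_BLOCK - 8)
    return word + payload + word


def words_match(block: bytes) -> bool:
    """The head/tail check: first and last words must be equal."""
    return block[:4] == block[-4:]


old = make_block(1, b"a")
new = make_block(2, b"b")
expected_crc = zlib.crc32(new)

# Power fails after the first and last sectors of the new block reached
# the platter, but the middle sectors still hold the old contents.
torn = new[:SECTOR] + old[SECTOR:-SECTOR] + new[-SECTOR:]

print(words_match(torn))                 # True: the tear goes unnoticed
print(zlib.crc32(torn) == expected_crc)  # False: a whole-block CRC notices
```

The matching words only prove that the two end sectors belong to the same
write; a checksum over the whole block covers the sectors in between.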

(On many (most?) IDE drives, even when they have been told to report 
write completion only after data is physically on the platter, they will 
"forget" if they see activity that looks like benchmarking.  Others just 
ignore the command, and in any case they all default to unsafe mode.)

If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects?  Maybe 
the objections could be dealt with, and everyone satisfied.

Nathan Myers
ncm@zembu.com

RE: Re: TODO list

From

"Mikheev, Vadim"

Date:

05 April 2001, 17:28:02

> If the reason that a block CRC isn't on the TODO list is that Vadim
> objects, maybe we should hear some reasons why he objects?  Maybe 
> the objections could be dealt with, and everyone satisfied.

Unordered disk writes are covered by backing up modified blocks
in the log. This allows us not only to catch such writes, as a CRC
would, but to *avoid* them.

So, what could a CRC be used for? To catch disk damage?
The disk has its own CRC for this.

Vadim

Re: Re: TODO list

From

ncm@zembu.com (Nathan Myers)

Date:

05 April 2001, 17:38:46

On Thu, Apr 05, 2001 at 02:27:48PM -0700, Mikheev, Vadim wrote:
> > If the reason that a block CRC isn't on the TODO list is that Vadim
> > objects, maybe we should hear some reasons why he objects?  Maybe 
> > the objections could be dealt with, and everyone satisfied.
> 
> Unordered disk writes are covered by backing up modified blocks
> in log. It allows not only catch such writes, as would CRC do,
> but *avoid* them.
> 
> So, for what CRC could be used? To catch disk damages?
> Disk has its own CRC for this.

OK, this was already discussed, maybe while Vadim was absent.  
Should I re-post the previous text?

Nathan Myers
ncm@zembu.com

RE: Re: TODO list

From

"Mikheev, Vadim"

Date:

05 April 2001, 17:47:55

> > So, for what CRC could be used? To catch disk damages?
> > Disk has its own CRC for this.
> 
> OK, this was already discussed, maybe while Vadim was absent.  
> Should I re-post the previous text?

Let's return to this discussion *after* 7.1 release.
My main objection was (and is) - no time to deal with
this issue for 7.1.

Vadim

Re: Re: TODO list

From

ncm@zembu.com (Nathan Myers)

Date:

05 April 2001, 18:06:42

On Thu, Apr 05, 2001 at 02:47:41PM -0700, Mikheev, Vadim wrote:
> > > So, for what CRC could be used? To catch disk damages?
> > > Disk has its own CRC for this.
> > 
> > OK, this was already discussed, maybe while Vadim was absent.  
> > Should I re-post the previous text?
> 
> Let's return to this discussion *after* 7.1 release.
> My main objection was (and is) - no time to deal with
> this issue for 7.1.

OK, everybody agreed on that before.  

This doesn't read like an objection to having it on the TODO list for
some future release.  

Nathan Myers
ncm@zembu.com

Re: Re: TODO list

From

Tom Lane

Date:

05 April 2001, 18:25:35

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> If the reason that a block CRC isn't on the TODO list is that Vadim
>> objects, maybe we should hear some reasons why he objects?  Maybe 
>> the objections could be dealt with, and everyone satisfied.

> Unordered disk writes are covered by backing up modified blocks
> in log. It allows not only catch such writes, as would CRC do,
> but *avoid* them.

> So, for what CRC could be used? To catch disk damages?
> Disk has its own CRC for this.

Oh, I see.  For anyone else who has trouble reading between the lines:

Blocks that have recently been written, but failed to make it down to
the disk platter intact, should be restorable from the WAL log.  So we
do not need a block-level CRC to guard against partial writes.

A block-level CRC might be useful to guard against long-term data
lossage, but Vadim thinks that the disk's own CRCs ought to be
sufficient for that (and I can't say I disagree).

So the only real benefit of a block-level CRC would be to guard against
bits dropped in transit from the disk surface to someplace else, ie,
during read or during a "cp -r" type copy of the database to another
location.  That's not a totally negligible risk, but is it worth the
overhead of updating and checking block CRCs?  Seems dubious at best.
        regards, tom lane
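
For reference, a rough Python sketch of what the proposed TODO item
("write out a CRC with each data block, and verify it on reading") could
look like, together with the kind of in-transit bit drop described above.
The block size, the trailing 4-byte CRC field, the function names, and
the use of zlib's CRC-32 are all assumptions for illustration, not a
description of any existing PostgreSQL code.

```python
import struct
import zlib

BLOCK_SIZE = 8192  # assumed block size
CRC_SIZE = 4       # assumed: CRC stored in the last 4 bytes of the block


class BlockCorrupt(Exception):
    """Raised when the stored CRC does not match the block contents."""


def seal_block(payload: bytes) -> bytes:
    """On write: append a CRC-32 of the payload to form a full block."""
    if len(payload) != BLOCK_SIZE - CRC_SIZE:
        raise ValueError("payload has the wrong length")
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify_block(block: bytes) -> bytes:
    """On read: recompute the CRC and return the payload if it matches."""
    payload = block[:-CRC_SIZE]
    (stored,) = struct.unpack("<I", block[-CRC_SIZE:])
    if zlib.crc32(payload) != stored:
        raise BlockCorrupt("block CRC mismatch")
    return payload


block = seal_block(b"\x00" * (BLOCK_SIZE - CRC_SIZE))
verify_block(block)  # an intact block passes

# Flip one bit, as if it were dropped somewhere between disk and reader.
damaged = bytearray(block)
damaged[100] ^= 0x01
try:
    verify_block(bytes(damaged))
except BlockCorrupt as exc:
    print(exc)  # block CRC mismatch
```

The cost in question is the two crc32 calls: one on every block write and
one on every block read.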

Re: Re: TODO list

From

Bruce Momjian

Date:

05 April 2001, 20:29:05

> > So, for what CRC could be used? To catch disk damages?
> > Disk has its own CRC for this.
> 
> Oh, I see.  For anyone else who has trouble reading between the lines:
> 
> Blocks that have recently been written, but failed to make it down to
> the disk platter intact, should be restorable from the WAL log.  So we
> do not need a block-level CRC to guard against partial writes.
> 
> A block-level CRC might be useful to guard against long-term data
> lossage, but Vadim thinks that the disk's own CRCs ought to be
> sufficient for that (and I can't say I disagree).
> 
> So the only real benefit of a block-level CRC would be to guard against
> bits dropped in transit from the disk surface to someplace else, ie,
> during read or during a "cp -r" type copy of the database to another
> location.  That's not a totally negligible risk, but is it worth the
> overhead of updating and checking block CRCs?  Seems dubious at best.

Agreed.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Re: TODO list

From

ncm@zembu.com (Nathan Myers)

Date:

05 April 2001, 21:39:34

On Thu, Apr 05, 2001 at 06:25:17PM -0400, Tom Lane wrote:
> "Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> >> If the reason that a block CRC isn't on the TODO list is that Vadim
> >> objects, maybe we should hear some reasons why he objects?  Maybe 
> >> the objections could be dealt with, and everyone satisfied.
> 
> > Unordered disk writes are covered by backing up modified blocks
> > in log. It allows not only catch such writes, as would CRC do,
> > but *avoid* them.
> 
> > So, for what CRC could be used? To catch disk damages?
> > Disk has its own CRC for this.
> 
> Blocks that have recently been written, but failed to make it down to
> the disk platter intact, should be restorable from the WAL log.  So we
> do not need a block-level CRC to guard against partial writes.

If a block is missing some sectors in the middle, how would you know
to reconstruct it from the WAL, without a block CRC telling you that
the block is corrupt?

> A block-level CRC might be useful to guard against long-term data
> lossage, but Vadim thinks that the disk's own CRCs ought to be
> sufficient for that (and I can't say I disagree).

The people who make the disks don't agree.  

They publish the error rate they guarantee, and they meet it, more 
or less.  They publish a rate that is _just_ low enough to satisfy 
noncritical requirements (on the correct assumption that they can't 
satisfy critical requirements in any case) and high enough not to 
interfere with benchmarks.  They assume that if you need better 
reliability you can and will provide it yourself, and rely on their 
CRC only as a performance optimization.

At the raw sector level, they get (and correct) errors very frequently; 
when they are not getting "enough" errors, they pack the bits more 
densely until they do, and sell a higher-density drive.

> So the only real benefit of a block-level CRC would be to guard against
> bits dropped in transit from the disk surface to someplace else, ie,
> during read or during a "cp -r" type copy of the database to another
> location.  That's not a totally negligible risk, but is it worth the
> overhead of updating and checking block CRCs?  Seems dubious at best.

Vadim didn't want to re-open this discussion until after 7.1 is out
the door, but that "dubious at best" demands an answer.  See the archive 
posting:

http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/msg00473.html

...

Incidentally, is the page at 
 http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/

the best place to find old messages?  It's never worked right for me.

Nathan Myers
ncm@zembu.com

Re: Re: TODO list

From

Philip Warner

Date:

05 April 2001, 22:07:47

At 18:25 5/04/01 -0400, Tom Lane wrote:
>
>A block-level CRC might be useful to guard against long-term data
>lossage, but Vadim thinks that the disk's own CRCs ought to be
>sufficient for that (and I can't say I disagree).
>
>So the only real benefit of a block-level CRC would be to guard against
>bits dropped in transit from the disk surface to someplace else

What about guarding against file system problems, like blocks of one
(non-PG) file erroneously writing to blocks of another (PG table) file?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

RE: Re: TODO list

From

"Mikheev, Vadim"

Date:

05 April 2001, 22:37:10

> > Blocks that have recently been written, but failed to make
> > it down to the disk platter intact, should be restorable from
> > the WAL log.  So we do not need a block-level CRC to guard
> > against partial writes.
> 
> If a block is missing some sectors in the middle, how would you know
> to reconstruct it from the WAL, without a block CRC telling you that
> the block is corrupt?

On recovery we unconditionally copy the *entire* block content from the log
for each block modified since the last checkpoint. And we do not write a new
checkpoint record (i.e. do not advance the recovery start point) until we know
that all data blocks are flushed to disk (including blocks modified before
the checkpointer started).
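
A toy Python sketch of the recovery rule described above. The record
layout (lsn, block number, full page image) and the function name are
hypothetical, and every record here carries a whole page image, which is
a simplification made to keep the example short.

```python
def recover(data_blocks: dict, wal: list, checkpoint_lsn: int) -> None:
    """Replay page images logged after the last checkpoint.

    Pages are overwritten unconditionally; the on-disk contents are never
    inspected, so a torn page does not need to be detected first.
    """
    for lsn, block_no, page_image in wal:
        if lsn > checkpoint_lsn:
            data_blocks[block_no] = page_image


# Block 1 was torn by a crash; blocks 0 and 2 were flushed before the checkpoint.
disk = {0: b"v5", 1: b"torn", 2: b"v3"}
wal = [(5, 0, b"v5"), (11, 1, b"v11"), (12, 1, b"v12")]

recover(disk, wal, checkpoint_lsn=10)
print(disk)  # {0: b'v5', 1: b'v12', 2: b'v3'}
```

Because the checkpoint is not advanced until all earlier blocks are known
to be on disk, anything that could have been torn is still covered by the
log records replayed here.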

Vadim

Re: Re: TODO list

From

Tom Lane

Date:

05 April 2001, 22:52:41

Philip Warner <pjw@rhyme.com.au> writes:
>> So the only real benefit of a block-level CRC would be to guard against
>> bits dropped in transit from the disk surface to someplace else

> What about guarding against file system problems, like blocks of one
> (non-PG) file erroneously writing to blocks of another (PG table) file?

Well, what about it?  Can you offer numbers demonstrating that this risk
is probable enough to justify the effort and runtime cost of a block
CRC?

If we're in the business of expending cycles to guard against
nil-probability risks, let's checksum our executables every time we
start up, to make sure they're not overwritten.  Actually, we'd better
re-checksum program text memory every few seconds, in case RAM dropped
a bit since we looked last.  And let's follow every memcpy by a memcmp
to make sure that didn't drop a bit.  Heck, let's keep a CRC on every
palloc'd memory block.  And so on and so forth.  Sooner or later you've
got to draw the line at diminishing returns, both for runtime costs
and for the programming effort you spent on this stuff (instead of on
finding/fixing bugs that might bite you with far greater frequency than
anything a CRC might catch for you).

To be perfectly clear: I have actually seen bug reports trace to
problems that I think a block-level CRC might have detected (not
corrected, of course, but at least the user might have realized he had
flaky hardware a little sooner).  So I do not say that the upside to
a block CRC is nil.  But I am unconvinced that it exceeds the downside,
in development effort, runtime, false failure reports (is that CRC error
really due to hardware trouble, or a software bug that failed to update
the CRC? and how do you get around the CRC error to get at your data??)
etc etc.
        regards, tom lane

Re: Re: TODO list

From

Philip Warner

Date:

06 April 2001, 00:09:43

At 22:52 5/04/01 -0400, Tom Lane wrote:
>
>> What about guarding against file system problems, like blocks of one
>> (non-PG) file erroneously writing to blocks of another (PG table) file?
>
>Well, what about it?  Can you offer numbers demonstrating that this risk
>is probable enough to justify the effort and runtime cost of a block
>CRC?

Rhetorical crap aside, I've had more file system failures (including badly
mapped file data) than I have had disk hardware failures. So, if you are
considering 'bits dropped in transit', you should also be considering data
corruption not related to the hardware.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: Re: TODO list

From

"Rod Taylor"

Date:

06 April 2001, 08:48:23

> If we're in the business of expending cycles to guard against
> nil-probability risks, let's checksum our executables every time we
> start up, to make sure they're not overwritten.  Actually, we'd better
> re-checksum program text memory every few seconds, in case RAM dropped
> a bit since we looked last.  And let's follow every memcpy by a memcmp
> to make sure that didn't drop a bit.  Heck, let's keep a CRC on every

Why does it sound like you have problems with radiation eating away at
your live memory for satellite operations?

RE: Re: TODO list

From

"Mikheev, Vadim"

Date:

06 April 2001, 13:25:38

> To be perfectly clear: I have actually seen bug reports trace to
> problems that I think a block-level CRC might have detected (not
> corrected, of course, but at least the user might have realized he had
> flaky hardware a little sooner).  So I do not say that the upside to
> a block CRC is nil.  But I am unconvinced that it exceeds the
> downside, in development effort, runtime, false failure reports
> (is that CRC error really due to hardware trouble, or a software bug
> that failed to update the CRC? and how do you get around the CRC error
> to get at your data??) etc etc.

Something to remember: currently we update t_infomask (set
HEAP_XMAX_COMMITTED etc) while holding share lock on buffer -
we have to change this before block CRC implementation.
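
A toy Python illustration of why this matters. The page is a plain byte
array, byte 0 plays the role of t_infomask, and HINT_BIT is a stand-in
flag whose value is illustrative only; none of this reflects the real
page layout.

```python
import zlib

HINT_BIT = 0x01  # stand-in for a flag such as HEAP_XMAX_COMMITTED; value is illustrative

page = bytearray(64)  # toy page; byte 0 plays the role of t_infomask

# A writer computes the page CRC while holding only a share lock.
crc_at_write_start = zlib.crc32(page)

# Another backend, also holding only a share lock, sets a hint bit.
page[0] |= HINT_BIT

# The bytes that reach disk no longer match the CRC computed earlier.
print(zlib.crc32(page) == crc_at_write_start)  # False
```

So either hint-bit updates would need a stronger lock, or the CRC would
have to be computed over a private copy of the page.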

Vadim

Re: Re: TODO list

From

Tom Lane

Date:

06 April 2001, 14:48:46

"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> Something to remember: currently we update t_infomask (set
> HEAP_XMAX_COMMITTED etc) while holding share lock on buffer -
> we have to change this before block CRC implementation.

Yeah, we'd lose some concurrency there.
        regards, tom lane