Thread: md.c should not call files "relations"

md.c should not call files "relations"

From
Tom Lane
Date:
There's an interesting thread over here
http://archives.postgresql.org/pgsql-sql/2009-08/msg00013.php
in which someone mistook a filesystem-level permissions problem
for a database permissions problem.  It wasn't exactly his fault,
I think, since the message he was presented with was

ERROR:  could not create relation "test": Permission denied

which is not all that obviously different from what you would
get for a SQL-permissions violation.

I am thinking that this message would be more correct and less
confusing if it looked something like

ERROR:  could not create file "12345/67890": Permission denied

ie, when reflecting an OS-level error we should call a file a file and
provide its filesystem name, not the name of the table that we were
hoping to map to it.  This would be more likely to lead the user's
mind in the right direction, and he'd need the filesystem pathname
for any detailed investigation anyway.

This would have the further advantage that we could make all the
errors in md.c consistent --- some of them provide filesystem names
rather than table names because that's all they have available.

Lastly, I'm wondering why someone seems to have removed the double
quotes around the filesystem name in some of these messages.
Surely that's not per style guide.

Comments?
        regards, tom lane


Re: md.c should not call files "relations"

From
"David E. Wheeler"
Date:
On Aug 4, 2009, at 6:30 PM, Tom Lane wrote:

> Comments?

+1 Seems like a no-brainer.

David


Re: md.c should not call files "relations"

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> There's an interesting thread over here
> http://archives.postgresql.org/pgsql-sql/2009-08/msg00013.php
> in which someone mistook a filesystem-level permissions problem
> for a database permissions problem.  It wasn't exactly his fault,
> I think, since the message he was presented with was
> 
> ERROR:  could not create relation "test": Permission denied
> 
> which is not all that obviously different from what you would
> get for a SQL-permissions violation.
> 
> I am thinking that this message would be more correct and less
> confusing if it looked something like
> 
> ERROR:  could not create file "12345/67890": Permission denied
> 
> ie, when reflecting an OS-level error we should call a file a file and
> provide its filesystem name, not the name of the table that we were
> hoping to map to it.  This would be more likely to lead the user's
> mind in the right direction, and he'd need the filesystem pathname
> for any detailed investigation anyway.

We already print the file name, not table name - since version 8.0. The
OP that saw the message was on 7.4.

I agree we should call file a file.

> This would have the further advantage that we could make all the
> errors in md.c consistent --- some of them provide filesystem names
> rather than table names because that's all they have available.
> 
> Lastly, I'm wondering why someone seems to have removed the double
> quotes around the filesystem name in some of these messages.
> Surely that's not per style guide.

When I replaced %u/%u/%u with %s containing the relpath() of the file,
it didn't occur to add quotes. Agreed, they should be quoted.

Want me to change those or are you on it already?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: md.c should not call files "relations"

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Want me to change those or are you on it already?

I'm going to bed --- if you wanna do it, have at it ...
        regards, tom lane


Re: md.c should not call files "relations"

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Want me to change those or are you on it already?
> 
> I'm going to bed --- if you wanna do it, have at it ...

Ok.

I note that many of the messages currently print the relpath() of the
relation, and don't include the affected segment suffix. For example:
could not read block 140000 of relation base/11566/24614: read only 1
of 8192 bytes

If we change them to point to the exactly right filename including
segment suffix, then the block number becomes confusing, since that
would still refer block number within the relation, not the segment.
Right now, the "relation xxx" is referring to the segmented virtual file
as whole, not to any specific segment. One option is to revert those
messages to 8.3 style:
 could not read block 140000 of relation 1663/11566/24614: read only 1
of 8192 bytes

We'd need to include the fork there, so at least for forks other than
the main one it would become something like
 could not read block 140000 of relation 1663/11566/24614/fsm: read
only 1 of 8192 bytes

Another option is to print the byte offset within segment file instead
of block number:
 could not read 8129 bytes at offset 73138176 of file
"base/11566/24614.1": read only 1 bytes

That feels more concise and describes accurately what the failing OS
call was. However, it doesn't fit these two messages:
 cannot extend relation %s beyond %u blocks could not truncate relation %s to %u blocks: it's only %u blocks now

since those genuinely don't refer to any particular segment. Also, if we
want to support RELSEG_SIZE > 4GB, we'd have to use INT64_FORMAT in the
format strings, and I don't think that works nicely with translations.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: md.c should not call files "relations"

From
"David E. Wheeler"
Date:
On Aug 4, 2009, at 11:10 PM, Tom Lane wrote:

>> Want me to change those or are you on it already?
>
> I'm going to bed --- if you wanna do it, have at it ...

Oh please. Everyone knows that you don't sleep, Tom. You just sit back  
in your chair and power nap for five minutes once in a while, perhaps  
between reading the -hackers and -general mail lists. ;-P

David


Re: md.c should not call files "relations"

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I note that many of the messages currently print the relpath() of the
> relation, and don't include the affected segment suffix. For example:

>  could not read block 140000 of relation base/11566/24614: read only 1
> of 8192 bytes

> If we change them to point to the exactly right filename including
> segment suffix, then the block number becomes confusing, since that
> would still refer block number within the relation, not the segment.

Hmm, good point.  I don't think the byte-offset solution is usable,
because of the INT64_FORMAT problem.  What I would vote for is just
continuing to show the block number relative to the whole relation,
while (as much as possible) showing the actual filesystem pathname of
the file being mentioned.  This would mean that anyone trying to
interpret the block number would have to be aware of what it meant
and do the appropriate modulo calculation, but frankly I doubt that all
that many people will care about exactly what offset is implied.

BTW, I wonder whether it would be worth adding an entry point to fd.c
to return the path name associated with a logical fd, rather than
sprinkling extra relpath() calls throughout these messages.
        regards, tom lane


Re: md.c should not call files "relations"

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I note that many of the messages currently print the relpath() of the
>> relation, and don't include the affected segment suffix. For example:
> 
>>  could not read block 140000 of relation base/11566/24614: read only 1
>> of 8192 bytes
> 
>> If we change them to point to the exactly right filename including
>> segment suffix, then the block number becomes confusing, since that
>> would still refer block number within the relation, not the segment.
> 
> Hmm, good point.  I don't think the byte-offset solution is usable,
> because of the INT64_FORMAT problem.  What I would vote for is just
> continuing to show the block number relative to the whole relation,
> while (as much as possible) showing the actual filesystem pathname of
> the file being mentioned.  This would mean that anyone trying to
> interpret the block number would have to be aware of what it meant
> and do the appropriate modulo calculation, but frankly I doubt that all
> that many people will care about exactly what offset is implied.

Ok. The most likely scenario where it would be confusing would be if you
get an error along the lines of "read error on block 200000 in file
XXX.1": you look at file XXX.1 and conclude that the file must be
truncated because the file is much shorter than 200000 blocks. Some
low-level knowledge is indeed needed to interpret that correctly, but
then again knowing to multiply by 8192 to get the offset is low-level
knowledge to begin with.

> BTW, I wonder whether it would be worth adding an entry point to fd.c
> to return the path name associated with a logical fd, rather than
> sprinkling extra relpath() calls throughout these messages.

Yes. I was going to add a function to md.c to construct the filename
from (SmgrRelation, ForkNumber, segment number), but that's an even
better idea.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com