Thread: [PATCH] Cleanup of GUC units code

[PATCH] Cleanup of GUC units code

From

Marko Kreen

Date:

31 August 2008, 19:17:45

Current GUC units code has 2 problems:

- It requires case-sensiteive representation of units ("kB").
  As the main point of units is to make configuring easier,
  requiring byte-exact typing makes them unnecessarily fragile.

- In attempt to preserve maximum range of values for INT64_IS_BUSTED
  systems, the code is written rather non-obvious way.

Attached patch replaces per-unit custom code with lookup table,
where each unit is represented as multiplier of base unit.
And it compares unit names case-insensitivaly.

It sacrifices some range on INT64_IS_BUSTED systems, but as the only promise
we offer them is that "Postgres may compile" I don't see it as problem.

In case people like the case-sensitivity, it can be restored and the patch
applied as plain cleanup.

Attachment

units-full-cleanup.patch

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

31 August 2008, 23:56:42

Marko Kreen <markokr@gmail.com> writes:
> - In attempt to preserve maximum range of values for INT64_IS_BUSTED
>   systems, the code is written rather non-obvious way.

I do not personally object a bit to making the units comparisons
case-insensitive (I think it's mainly Peter who wants to be strict
about it).  I don't think there are any other good ideas in this
patch, however, and exposing ourselves to intermediate overflows in
the name of simplicity is definitely not one.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

"Marko Kreen"

Date:

01 September 2008, 10:38:06

On 9/1/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>  > - In attempt to preserve maximum range of values for INT64_IS_BUSTED
>  >   systems, the code is written rather non-obvious way.
>
> I do not personally object a bit to making the units comparisons
>  case-insensitive (I think it's mainly Peter who wants to be strict
>  about it).  I don't think there are any other good ideas in this
>  patch, however, and exposing ourselves to intermediate overflows in
>  the name of simplicity is definitely not one.

For all practical purposes, the overflow is insignificant when int64
works.  I'll look if I can avoid it on INT64_IS_BUSTED case.

In the meantime, here is simple patch for case-insensivity.

--
marko

Attachment

units.nocase.diff

Re: [PATCH] Cleanup of GUC units code

From

Peter Eisentraut

Date:

02 September 2008, 09:01:52

Marko Kreen wrote:
> In the meantime, here is simple patch for case-insensivity.

You might be able to talk me into accepting various unambiguous, common 
alternative spellings of various units.  But for instance allowing MB 
and Mb to mean the same thing is insane.

Re: [PATCH] Cleanup of GUC units code

From

Gregory Stark

Date:

02 September 2008, 09:16:58

Peter Eisentraut <peter_e@gmx.net> writes:

> Marko Kreen wrote:
>> In the meantime, here is simple patch for case-insensivity.
>
> You might be able to talk me into accepting various unambiguous, common
> alternative spellings of various units.  But for instance allowing MB and Mb to
> mean the same thing is insane.

Because you think some user will be trying to specify their shared_buffers in
bits?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!

Re: [PATCH] Cleanup of GUC units code

From

Peter Eisentraut

Date:

02 September 2008, 10:45:46

Gregory Stark wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> 
>> Marko Kreen wrote:
>>> In the meantime, here is simple patch for case-insensivity.
>> You might be able to talk me into accepting various unambiguous, common
>> alternative spellings of various units.  But for instance allowing MB and Mb to
>> mean the same thing is insane.
> 
> Because you think some user will be trying to specify their shared_buffers in
> bits?

My concern is that this information does not stay in the configuration 
files.  It will invariably leak out into whitepapers, presentations, 
product documentation, and before long there will be confusion about why 
you can't stuff N Mb over an N Mb connection.  I am not making this up.

Mb does not add any typing ease (as "KB" might) or readability (as "sec"   might), and there is no respectable source
thatwill claim it is an 

acceptable alias for MB.

Re: [PATCH] Cleanup of GUC units code

From

"Marko Kreen"

Date:

02 September 2008, 10:45:57

On 9/2/08, Peter Eisentraut <peter_e@gmx.net> wrote:
> Marko Kreen wrote:
> > In the meantime, here is simple patch for case-insensivity.
>
>  You might be able to talk me into accepting various unambiguous, common
> alternative spellings of various units.  But for instance allowing MB and Mb
> to mean the same thing is insane.

How would the docs for that look like?  And anyway, what is wrong with
Mb for megabytes?  mB may be slightly weird but if some user likes it,
I see no reason to reject it.

You do realize that misspelling unit name can cause downtime of several
minutes instead of couple seconds?  We can easily do restart in couple of
seconds but the restart, look logs, launch editor, find value, change,
save, restart cycle will take quite a lot more.  Why should we increase
the chance that any config edit causes problems?

Secondly, humans don't have byte-exact memory, instead they generalize
and deduce (eg. from nearby parameters).  Thus remembering "KB, MB, GB"
or "kb, mb, gb" is easier than remembering "kB, MB, GB".  Also remembering
"ms, s, m, h, d" is easier than "ms, s, min, h, d". (I'm not proposing 'm'
for minutes, just noting, that as we shouldn't ever introduce 'month' unit
for core config values, the 'm' as minutes is preferable for 'min'.)

Thirdly, please don't use "standard units" argument, unless you plan to
propose use of "KiB, MiB, GiB" at the same moment.  The units will be
remembered as units-for-postgresql.conf.  It is good if they conform to
user expectations, but it's even more important they are easy to remember.

Fourth, for the same reason, it is preferable the amount of units
accepted stays small.  Again, to be easily rememberable.  This
also kills any chance "mb" can be confused with "milli-bits".

If any extension/module wants to use any other units or full-SI,
it can but that should not extend to core config values.  Again,
to reduce chance for confusion.

So, are there any other arguments for keeping current behaviour?

-- 
marko

Re: [PATCH] Cleanup of GUC units code

From

"Marko Kreen"

Date:

02 September 2008, 10:54:38

On 9/2/08, Peter Eisentraut <peter_e@gmx.net> wrote:
> Gregory Stark wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > > Marko Kreen wrote:
> > > > In the meantime, here is simple patch for case-insensivity.
> > > >
> > > You might be able to talk me into accepting various unambiguous, common
> > > alternative spellings of various units.  But for instance allowing MB
> and Mb to
> > > mean the same thing is insane.
> > >
> >
> > Because you think some user will be trying to specify their shared_buffers
> in
> > bits?
> >
>
>  My concern is that this information does not stay in the configuration
> files.  It will invariably leak out into whitepapers, presentations, product
> documentation, and before long there will be confusion about why you can't
> stuff N Mb over an N Mb connection.  I am not making this up.

Uh.  So you want force "proper" units in presentations at the price
of everyday admin operations?  Does not seem like a sensible tradeoff.

-- 
marko

Re: [PATCH] Cleanup of GUC units code

From

Gregory Stark

Date:

02 September 2008, 12:25:58

"Marko Kreen" <markokr@gmail.com> writes:

> Uh.  So you want force "proper" units in presentations at the price
> of everyday admin operations?  Does not seem like a sensible tradeoff.

It didn't to anyone else when Peter wrote the current version either, but as
the person willing to actually do the work and write the code Peter got to
make the decision. Nobody else stepped up to do the work to change it and we
can't exactly force Peter to do work he doesn't agree with. The current
version is unquestionably better than what came before where you had to
specify some things in units of Postgres blocks, for example.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

02 September 2008, 18:19:18

Gregory Stark <stark@enterprisedb.com> writes:
> "Marko Kreen" <markokr@gmail.com> writes:
>> Uh.  So you want force "proper" units in presentations at the price
>> of everyday admin operations?  Does not seem like a sensible tradeoff.

> It didn't to anyone else when Peter wrote the current version either, but as
> the person willing to actually do the work and write the code Peter got to
> make the decision. Nobody else stepped up to do the work to change it and we
> can't exactly force Peter to do work he doesn't agree with.

It's not that, in my mind: it's that Peter feels more strongly about it
than the rest of us.  This proposal has come up before and he's
successfully argued it down each time.  He does have a point about there
being some potential for confusion; and the arguments on the other side
are not much better than "I'm lazy".  Being lazy myself, I'd prefer a
case insensitive implementation; but I don't feel strongly enough about
it to want to override Peter's opinion.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

"Asko Oja"

Date:

02 September 2008, 18:48:52

Anything that will reduce potential downtime should be way to go.
To me it seems that Peter uses the loudest voice and others just don't care enough.
Using kB for kilobyte seems quite alien and confusing. I have not noticed that to be used in software i use in my everyday work and could not find any either with quick search.

On Wed, Sep 3, 2008 at 12:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Gregory Stark <stark@enterprisedb.com> writes:
> "Marko Kreen" <markokr@gmail.com> writes:
>> Uh. So you want force "proper" units in presentations at the price
>> of everyday admin operations? Does not seem like a sensible tradeoff.

> It didn't to anyone else when Peter wrote the current version either, but as
> the person willing to actually do the work and write the code Peter got to
> make the decision. Nobody else stepped up to do the work to change it and we
> can't exactly force Peter to do work he doesn't agree with.

It's not that, in my mind: it's that Peter feels more strongly about it
than the rest of us. This proposal has come up before and he's
successfully argued it down each time. He does have a point about there
being some potential for confusion; and the arguments on the other side
are not much better than "I'm lazy". Being lazy myself, I'd prefer a
case insensitive implementation; but I don't feel strongly enough about
it to want to override Peter's opinion.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

02 September 2008, 19:37:20

"Asko Oja" <ascoja@gmail.com> writes:
> Anything that will reduce potential downtime should be way to go.

That argument didn't seem to me to be worth the electrons to rebut,
but now that someone else has repeated it, maybe I should.  It's
ludicrous to claim that allowing case insensitivity here will make
any noticeable improvement in downtime, as there are far more ways
to screw up your config file than this one.  (A config file test mode,
such as was discussed a few weeks ago, *would* help, but this won't.)
In the light of Peter's argument about confusion, one could even
argue that admins putting in wrong values because they got confused
about the meaning of the units is a downtime risk in itself.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

"Marko Kreen"

Date:

02 September 2008, 19:40:50

On 9/3/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>  > "Marko Kreen" <markokr@gmail.com> writes:
>  >> Uh.  So you want force "proper" units in presentations at the price
>  >> of everyday admin operations?  Does not seem like a sensible tradeoff.
>
>  > It didn't to anyone else when Peter wrote the current version either, but as
>  > the person willing to actually do the work and write the code Peter got to
>  > make the decision. Nobody else stepped up to do the work to change it and we
>  > can't exactly force Peter to do work he doesn't agree with.
>
>
> It's not that, in my mind: it's that Peter feels more strongly about it
>  than the rest of us.  This proposal has come up before and he's
>  successfully argued it down each time.  He does have a point about there
>  being some potential for confusion; and the arguments on the other side
>  are not much better than "I'm lazy".  Being lazy myself, I'd prefer a
>  case insensitive implementation; but I don't feel strongly enough about
>  it to want to override Peter's opinion.

Um, being lazy is *the point* of units!?  And being friendly to
newbies.  Can there be any other reason for their existence?

-- 
marko

Re: [PATCH] Cleanup of GUC units code

From

"Marko Kreen"

Date:

02 September 2008, 19:44:10

On 9/3/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Asko Oja" <ascoja@gmail.com> writes:
>  > Anything that will reduce potential downtime should be way to go.
>
> That argument didn't seem to me to be worth the electrons to rebut,
>  but now that someone else has repeated it, maybe I should.  It's
>  ludicrous to claim that allowing case insensitivity here will make
>  any noticeable improvement in downtime, as there are far more ways
>  to screw up your config file than this one.  (A config file test mode,
>  such as was discussed a few weeks ago, *would* help, but this won't.)
>  In the light of Peter's argument about confusion, one could even
>  argue that admins putting in wrong values because they got confused
>  about the meaning of the units is a downtime risk in itself.

How can the meaning of "gb", "mb", "kb" be uncertain?

But currently writing "kb" *will* load database with wrong values.

-- 
marko

Re: [PATCH] Cleanup of GUC units code

From

Heikki Linnakangas

Date:

03 September 2008, 05:20:25

Marko Kreen wrote:
> On 9/2/08, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Marko Kreen wrote:
>>> In the meantime, here is simple patch for case-insensivity.
>>  You might be able to talk me into accepting various unambiguous, common
>> alternative spellings of various units.  But for instance allowing MB and Mb
>> to mean the same thing is insane.
> 
> How would the docs for that look like?  And anyway, what is wrong with
> Mb for megabytes?

I doesn't seem completely unreasonable to me that we'd want to express 
something in megabits/second in the future. For example, instead of 
vacuum_cost_delay, it would be cool to specify a bandwidth allowance. 
Megabits/second is a completely reasonable unit for that. Or a limit on 
network bandwidth.

FWIW, I don't feel very strongly either way. I'm more than happy with 
the status quo. The hint in the error message very clearly spells out 
what the valid values are, so it's immediately clear what you need to 
fix if you get that wrong.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: [PATCH] Cleanup of GUC units code

From

"Asko Oja"

Date:

03 September 2008, 05:58:29

On Wed, Sep 3, 2008 at 11:20 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

Marko Kreen wrote:
On 9/2/08, Peter Eisentraut <peter_e@gmx.net> wrote:
Marko Kreen wrote:
In the meantime, here is simple patch for case-insensivity.
You might be able to talk me into accepting various unambiguous, common
alternative spellings of various units. But for instance allowing MB and Mb
to mean the same thing is insane.

How would the docs for that look like? And anyway, what is wrong with
Mb for megabytes?

From infamous wikipedia: A megabit is a unit of information or computer storage, abbreviated Mbit (or Mb).
To me playing with case of acronyms and even depending on it seems more insane. It would make much more sense to have case insensitive set of acronyms and (thanks Tom for pointing out) some sanity checks when configuration is loaded to notify user when wrong ones are used for some context.

I doesn't seem completely unreasonable to me that we'd want to express something in megabits/second in the future. For example, instead of vacuum_cost_delay, it would be cool to specify a bandwidth allowance. Megabits/second is a completely reasonable unit for that. Or a limit on network bandwidth.

There are less confusing (better) acronyms kbit/s and mbit/s available for that.

FWIW, I don't feel very strongly either way. I'm more than happy with the status quo. The hint in the error message very clearly spells out what the valid values are, so it's immediately clear what you need to fix if you get that wrong.

Is the database down during that time?

--
Heikki Linnakangas

EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCH] Cleanup of GUC units code

From

"Marko Kreen"

Date:

03 September 2008, 06:18:44

On 9/3/08, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Marko Kreen wrote:
> > On 9/2/08, Peter Eisentraut <peter_e@gmx.net> wrote:
> > > Marko Kreen wrote:
> > > > In the meantime, here is simple patch for case-insensivity.
> > > >
> > >  You might be able to talk me into accepting various unambiguous, common
> > > alternative spellings of various units.  But for instance allowing MB
> and Mb
> > > to mean the same thing is insane.
> > >
> >
> > How would the docs for that look like?  And anyway, what is wrong with
> > Mb for megabytes?
> >
>
>  I doesn't seem completely unreasonable to me that we'd want to express
> something in megabits/second in the future. For example, instead of
> vacuum_cost_delay, it would be cool to specify a bandwidth allowance.
> Megabits/second is a completely reasonable unit for that. Or a limit on
> network bandwidth.

While it sounds theoretically useful, as a UI it's combines worst
from both worlds.  You now can confuse user with both case-sensitivity
and unit-mixup.  Even if we keep the current case-sensitivity, we should
set policy that units that differ only with case will never be accepted.

And the best way to set the policy in stone would be to make
units case-insensitive.

I like Asko's proposal of 'kbit/s' and 'mbit/s' for those - clear
and no chance of confusion.

>  FWIW, I don't feel very strongly either way. I'm more than happy with the
> status quo. The hint in the error message very clearly spells out what the
> valid values are, so it's immediately clear what you need to fix if you get
> that wrong.

Well, the problem is that while database may come up with wrong values,
it may be unusable for actual loads, until admin browses the logs,
re-edits the config file and finally restarts in again.

-- 
marko

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

03 September 2008, 06:24:16

On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:
> Gregory Stark wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > 
> >> Marko Kreen wrote:
> >>> In the meantime, here is simple patch for case-insensivity.
> >> You might be able to talk me into accepting various unambiguous, common
> >> alternative spellings of various units.  But for instance allowing MB and Mb to
> >> mean the same thing is insane.
> > 
> > Because you think some user will be trying to specify their shared_buffers in
> > bits?

If somebody wants then somebody can still claim that the units are
wrong, as SI specifies 1k = 1000 and 1M=1000000, etc

The binary "should" use IEC multipliers for kibi-, mebi-, tebi-, gibi-,
etc.

http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci825099,00.html

So we could beat "correctness" into DBA-s by forcing them to write KiB
MiB and so on for things that they have always written kb and mb (unless
they are communication engineers, in which case they may be used to
writing these as kB and MB)


> My concern is that this information does not stay in the configuration 
> files.  It will invariably leak out into whitepapers, presentations, 
> product documentation, and before long there will be confusion about why 
> you can't stuff N Mb over an N Mb connection.  

You can't even put N Mbits of data in one sec over N Mbits/sec
connection as there is some overhead from all protocol levels.

Clueless people will also be confused about why they can't fit 1Mbyte of
CVS data into a 1Mbyte database file or why you can't store 1024 MB of
data on a 1GB disk.

> I am not making this up.

Is there anything in conf files that can be reasonably specified in
bits ?

> Mb does not add any typing ease (as "KB" might) or readability (as "sec" 
>    might), and there is no respectable source that will claim it is an 
> acceptable alias for MB.

Are you really afraid that someone would want to use mb to mean
millibits ?

As SQL is generally case insensitive, it is quite surprising to most
people that GUC units are not.

-----------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

"Joshua D. Drake"

Date:

03 September 2008, 11:53:11

Hannu Krosing wrote:
> On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:
>> Gregory Stark wrote:
>>> Peter Eisentraut <peter_e@gmx.net> writes:

> Are you really afraid that someone would want to use mb to mean
> millibits ?
> 
> As SQL is generally case insensitive, it is quite surprising to most
> people that GUC units are not.

We have had this discussion before, I even submitted a patch to make 
them case insensitive. In retrospect I was wrong to submit that patch. 
SQL may be case insensitive but units are not. MB != Mb != mb , I don't 
think we should encourage in any way for users to do the wrong thing.

Sincerely,

Joshua D. Drake

Re: [PATCH] Cleanup of GUC units code

From

"Joshua D. Drake"

Date:

03 September 2008, 12:21:20

Greg Stark wrote:
> I don't think worrying about the message we send to users is reasonable. 
> We can take responsibilty for the messages we output but punishing our 
> users to teach them a lesson is being actively user-hostile

There is no arguing that MB != Mb; nor is there anything user-hostile 
behind the idea of doing it the right way.

Sincerely,

Joshua D. Drake

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

03 September 2008, 12:32:29

On Wed, 2008-09-03 at 07:52 -0700, Joshua D. Drake wrote:
> Hannu Krosing wrote:
> > On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:
> >> Gregory Stark wrote:
> >>> Peter Eisentraut <peter_e@gmx.net> writes:
> 
> > Are you really afraid that someone would want to use mb to mean
> > millibits ?
> > 
> > As SQL is generally case insensitive, it is quite surprising to most
> > people that GUC units are not.
> 
> We have had this discussion before, I even submitted a patch to make 
> them case insensitive. In retrospect I was wrong to submit that patch. 
> SQL may be case insensitive but units are not. MB != Mb != mb , 

For most people they are equal, and all mean MEGABYTE(S) though
http://en.wikipedia.org/wiki/MB has lots of other possible meanings for
each.

> I don't think we should encourage in any way for users to do the wrong thing.

Can you see any scenario where accepting case insensitive units does
more damage than just ignoring the conf line with "incorrect" casing ?

Or do you mean we should discourage people from editing postgresql.conf
manually and have them use some tool which prevents them entering "kb" ?

----------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

03 September 2008, 12:37:42

On Wed, 2008-09-03 at 08:20 -0700, Joshua D. Drake wrote:
> Greg Stark wrote:
> > I don't think worrying about the message we send to users is reasonable. 
> > We can take responsibilty for the messages we output but punishing our 
> > users to teach them a lesson is being actively user-hostile
> 
> There is no arguing that MB != Mb; 

The whole point of this discussion is, that mostly people expect 
MB == Mb = mb == mB, especially if they see weird constructs like kB
used (k for Kilo, or actually Kibi).

> nor is there anything user-hostile behind the idea of doing it the right way.

I was not trying to expose some sinister plan, just pointing out that
users seldom expect that kind of surprise.

--------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Joshua Drake

Date:

03 September 2008, 13:11:10

On Wed, 03 Sep 2008 18:32:16 +0300
Hannu Krosing <hannu@krosing.net> wrote:

> > We have had this discussion before, I even submitted a patch to
> > make them case insensitive. In retrospect I was wrong to submit
> > that patch. SQL may be case insensitive but units are not. MB !=
> > Mb != mb , 
> 
> For most people they are equal, and all mean MEGABYTE(S) though
> http://en.wikipedia.org/wiki/MB has lots of other possible meanings
> for each.

O.k. there is an interesting point being made here, "For most people".

Which people exactly? Are we talking about the general populous? If so
I would argue that "most people" don't have a clue what MB, Mb, or mb
is except to say I think that means some kind of speed or capacity.

The above is not our target.

If our definition of most people is, "those who are reasonably
technically adept and will be deploying PostgreSQL in production on
some level".

If someone doesn't know the difference between Mb and MB on a
production system, I would not want them anywhere near any instance of
a production system.

If we are going to make sweeping statements (anyone on this thread)
about user-hostile and most people, then we better define what those
mean. This whole argument about making something easier (and incorrect)
for someone who doesn't exist and has not been defined.

I would be hung on this list if I made a similar argument about any
other feature.

> 
> > I don't think we should encourage in any way for users to do the
> > wrong thing.
> 
> Can you see any scenario where accepting case insensitive units does
> more damage than just ignoring the conf line with "incorrect" casing ?

Generally speaking, no I can't think of any damage that could be done
from mixed casing. Especially since we would only accept certain
possibilities, e.g; Mb would equal MB.

It just strikes me as really bad that a project that prides itself on
"doing it right" is willing to make this type of sacrifice. This isn't
about usability. This is about doing it wrong and actively encouraging
our users that "wrong is ok". It could also misinform the user about
what the meaning of the value means.

> 
> Or do you mean we should discourage people from editing
> postgresql.conf manually and have them use some tool which prevents
> them entering "kb" ?

Well that is a whole other argument :P. I would be happy to have that
one on another thread.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: [PATCH] Cleanup of GUC units code

From

Andrew Sullivan

Date:

03 September 2008, 14:11:36

On Wed, Sep 03, 2008 at 06:37:29PM +0300, Hannu Krosing wrote:
> On Wed, 2008-09-03 at 08:20 -0700, Joshua D. Drake wrote:
> > There is no arguing that MB != Mb; 
> 
> The whole point of this discussion is, that mostly people expect 
> MB == Mb = mb == mB, especially if they see weird constructs like kB
> used (k for Kilo, or actually Kibi).

Note that in the networked computer world, MB and Mb are
importantly different.  The latter is relevant for the speed of your
network interface, for instance.  People often get this wrong when
speaking carelessly, but a mistake of this sort is a serious one,
given the orders of magnitude difference.

A

-- 
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

03 September 2008, 14:18:40

On Wed, 2008-09-03 at 09:10 -0700, Joshua Drake wrote:

> If someone doesn't know the difference between Mb and MB on a
> production system, I would not want them anywhere near any instance of
> a production system.

I for one can make the difference, once I can zen that we are in a
domain, where Mbit makes sense. For me it does not make any sense to
define shared_buffer in Mbit's.

> If we are going to make sweeping statements (anyone on this thread)
> about user-hostile and most people, then we better define what those
> mean. 

Not user-hostile but rather hostile to an overworked DBA, who tries to
change some .conf param from 1MB to 512KB at 3AM to save an overloaded
server and then suddenly the server won't start anymore.

> This whole argument about making something easier (and incorrect)
> for someone who doesn't exist and has not been defined.

I still don't get in what non-academic way it would be incorrect to
spell kilobytes as kb or KB or megabytes as mb ?

Do you have any standard source where it specifies one use and forbids
the other ?

We are not correct according to SI (where k = 1000) nor IEC which
specifies Ki for 1024.

> I would be hung on this list if I made a similar argument about any
> other feature.

Its all about making it easier, not incorrect. 

It may be perhaps "incorrect" in some strictly academic sense, maybe
some article submission guidelines or such.

> > > I don't think we should encourage in any way for users to do the
> > > wrong thing.
> > 
> > Can you see any scenario where accepting case insensitive units does
> > more damage than just ignoring the conf line with "incorrect" casing ?
> 
> Generally speaking, no I can't think of any damage that could be done
> from mixed casing. Especially since we would only accept certain
> possibilities, e.g; Mb would equal MB.
> 
> It just strikes me as really bad that a project that prides itself on
> "doing it right" is willing to make this type of sacrifice. This isn't
> about usability. 

It is all about usability. Not accepting popular spellings of common
units seems to me as unnecessary nit-picking , similar to if we
required _exactly_ one space on either side on = in SET statements.

And we are already "incorrect" by allowing both '' and "" around the
value, the latter would usually imply an identifier :P

hannu=# set effective_cache_size = '132MB';
SET
hannu=# set effective_cache_size = "132MB";
SET

Maybe we should change this too if we want to champion the "right way"?

> This is about doing it wrong and actively encouraging
> our users that "wrong is ok". It could also misinform the user about
> what the meaning of the value means.

You could issue a warning, something like 

"Warning: KB is not a spelling endorsed by PostgeSQL Global Development
Group, in future please use kB, a spelling that many of us feel to be
the only valid one and others don't think important enough to argue with
them."

-----------------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Alvaro Herrera

Date:

03 September 2008, 14:49:54

Hannu Krosing escribió:
> On Wed, 2008-09-03 at 09:10 -0700, Joshua Drake wrote:

> > If we are going to make sweeping statements (anyone on this thread)
> > about user-hostile and most people, then we better define what those
> > mean. 
> 
> Not user-hostile but rather hostile to an overworked DBA, who tries to
> change some .conf param from 1MB to 512KB at 3AM to save an overloaded
> server and then suddenly the server won't start anymore.

I think the energy wasted in this discussion would be better spent in
working a the check-the-config-file feature.  That would equally solve
this problem, as well as many others.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [PATCH] Cleanup of GUC units code

From

Andrew Sullivan

Date:

03 September 2008, 15:02:55

On Wed, Sep 03, 2008 at 01:48:18PM -0400, Alvaro Herrera wrote:

> I think the energy wasted in this discussion would be better spent in
> working a the check-the-config-file feature.  That would equally solve
> this problem, as well as many others.

This seems like a good idea to me.

A

-- 
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: [PATCH] Cleanup of GUC units code

From

Greg Stark

Date:

03 September 2008, 15:40:30

I don't think worrying about the message we send to users is  
reasonable. We can take responsibilty for the messages we output but  
punishing our users to teach them a lesson is being actively user- 
hostile

greg

On 3 Sep 2008, at 15:52, "Joshua D. Drake" <jd@commandprompt.com> wrote:

> Hannu Krosing wrote:
>> On Tue, 2008-09-02 at 16:50 +0300, Peter Eisentraut wrote:
>>> Gregory Stark wrote:
>>>> Peter Eisentraut <peter_e@gmx.net> writes:
>
>> Are you really afraid that someone would want to use mb to mean
>> millibits ?
>> As SQL is generally case insensitive, it is quite surprising to most
>> people that GUC units are not.
>
> We have had this discussion before, I even submitted a patch to make  
> them case insensitive. In retrospect I was wrong to submit that  
> patch. SQL may be case insensitive but units are not. MB != Mb !=  
> mb , I don't think we should encourage in any way for users to do  
> the wrong thing.
>
> Sincerely,
>
> Joshua D. Drake
>
>
>

Re: [PATCH] Cleanup of GUC units code

From

Greg Stark

Date:

03 September 2008, 15:41:04

Sure if people want to do it the right way more power to them. What  
you're talking about is punishing people when they don't live up to  
your standards.

greg

On 3 Sep 2008, at 16:20, "Joshua D. Drake" <jd@commandprompt.com> wrote:

> Greg Stark wrote:
>> I don't think worrying about the message we send to users is  
>> reasonable. We can take responsibilty for the messages we output  
>> but punishing our users to teach them a lesson is being actively  
>> user-hostile
>
> There is no arguing that MB != Mb; nor is there anything user- 
> hostile behind the idea of doing it the right way.
>
> Sincerely,
>
> Joshua D. Drake
>

Re: [PATCH] Cleanup of GUC units code

From

Joshua Drake

Date:

03 September 2008, 15:46:25

On Wed, 3 Sep 2008 19:36:19 +0100
Greg Stark <greg.stark@enterprisedb.com> wrote:

> Sure if people want to do it the right way more power to them. What  
> you're talking about is punishing people when they don't live up to  
> your standards.

I think I will defer to Andrew and Alvaro's opinion on the matter.

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: [PATCH] Cleanup of GUC units code

From

"Greg Sabino Mullane"

Date:

03 September 2008, 16:05:23

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> You do realize that misspelling unit name can cause downtime of several
> minutes instead of couple seconds?  We can easily do restart in couple of
> seconds but the restart, look logs, launch editor, find value, change,
> save, restart cycle will take quite a lot more.  Why should we increase
> the chance that any config edit causes problems?

Minutes? It's like any other changes made to postgresql.conf: make the change,
ctrl-z out of your editor, reload Postgres. If there's an error, foreground
to the editor, revert the change, and reload Postgres. Not ideal (the
config file checker is what we really want), but certainly quick.

> Secondly, humans don't have byte-exact memory, instead they generalize
> and deduce (eg. from nearby parameters).  Thus remembering "KB, MB, GB"
> or "kb, mb, gb" is easier than remembering "kB, MB, GB".

They are all listed at the top of the file, in case there is any confusion.
No need to worry about remembering things.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200809031501
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAki+39gACgkQvJuQZxSWSsiTfACgiNPP77YKGKgyBm36ckkKZhGM
b9sAn2JmrpbMkJ8dm0Wbz3TYdLo83h/S
=PQvy
-----END PGP SIGNATURE-----

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

03 September 2008, 17:10:46

On Wed, 2008-09-03 at 13:48 -0400, Alvaro Herrera wrote:
> Hannu Krosing escribió:
> > On Wed, 2008-09-03 at 09:10 -0700, Joshua Drake wrote:
> 
> > > If we are going to make sweeping statements (anyone on this thread)
> > > about user-hostile and most people, then we better define what those
> > > mean. 
> > 
> > Not user-hostile but rather hostile to an overworked DBA, who tries to
> > change some .conf param from 1MB to 512KB at 3AM to save an overloaded
> > server and then suddenly the server won't start anymore.
> 
> I think the energy wasted in this discussion would be better spent in
> working a the check-the-config-file feature.

What kind of checks do you have in mind. Would this be something that
works at restart, does the check and continues with old settings if new
file would not load ?

> That would equally solve
> this problem, as well as many others.

AFAIK the config file is checked now, and if the check fails, the
database won't start.

-------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Joshua Drake

Date:

03 September 2008, 17:13:17

On Wed, 03 Sep 2008 23:10:24 +0300
Hannu Krosing <hannu@2ndQuadrant.com> wrote:

> > That would equally solve
> > this problem, as well as many others.
> 
> AFAIK the config file is checked now, and if the check fails, the
> database won't start.

Like apachectl configcheck ... E.g; we have the ability to check if the
config file is valid "before" we restart and have an extended outage.

Joshua D. Drake

> 
> -------------
> Hannu
> 
> 


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: [PATCH] Cleanup of GUC units code

From

Alvaro Herrera

Date:

03 September 2008, 17:20:49

Hannu Krosing escribió:
> On Wed, 2008-09-03 at 13:48 -0400, Alvaro Herrera wrote:

> > I think the energy wasted in this discussion would be better spent in
> > working a the check-the-config-file feature.
> 
> What kind of checks do you have in mind. Would this be something that
> works at restart, does the check and continues with old settings if new
> file would not load ?

This was discussed in some other thread.  No, the idea is that before
you do a reload, you run some function or invoke postmaster with certain
arguments, and it checks the config file and it says "this is OK" or
"there are errors here and here".  The check can be run even if the
server is already running.  That way you can just run it just before a
reload or restart.

> > That would equally solve
> > this problem, as well as many others.
> 
> AFAIK the config file is checked now, and if the check fails, the
> database won't start.

... which is not ideal.  Obviously it doesn't make much sense to run the
check if the server is already down, because you'll immediately know
that it won't come up just by trying to start it up.

(However, maybe it would be better for the init script to run the check
anyway, and report the error to stderr where the user running the script
can read it directly instead of having to go check the postmaster log
which could be sitting somewhere else.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

03 September 2008, 19:27:39

On Wed, 2008-09-03 at 11:45 -0700, Joshua Drake wrote:
> On Wed, 3 Sep 2008 19:36:19 +0100
> Greg Stark <greg.stark@enterprisedb.com> wrote:
> 
> > Sure if people want to do it the right way more power to them. What  
> > you're talking about is punishing people when they don't live up to  
> > your standards.
> 
> I think I will defer to Andrew and Alvaro's opinion on the matter.

So Andrews opinion was that Mb (meaning Mbit) is different from MB (for
megabyte) and that if someone thinks that we define shared buffers in
megabits can get confused and order wrong kind of network card ?

I can understand Alvaros stance more readily - if we have irrational
constraints on what can go into conf file, and people wont listen to
reason, then build better tools for helping people to compli to these
irrational demands. It has the added benefit of helping to catch reall
conf file errors.

I did not realize earlier that KB vs kb vs kB vs Kb is a religious
issue .

I mean, there is no known written standard, which says that Mb is
megabit, not megabyte or that you can (or can't) write kilo as K, but
some people just believe that kB is "the Way" and allowing people to
write kilobytes as KB or kb is evil and should be punished.

To me this sounds stupid, but I understand that this is a thing that
can't be argued logically and I have better things to do than changing
peoples irrational beliefs.

Sorry.

---------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Joshua Drake

Date:

03 September 2008, 19:45:46

On Thu, 04 Sep 2008 01:26:44 +0300
Hannu Krosing <hannu@krosing.net> wrote:

> So Andrews opinion was that Mb (meaning Mbit) is different from MB
> (for megabyte) and that if someone thinks that we define shared
> buffers in megabits can get confused and order wrong kind of network
> card ?

I was actually referring to:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg00206.php

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: [PATCH] Cleanup of GUC units code

From

Alvaro Herrera

Date:

03 September 2008, 21:02:27

Hannu Krosing escribió:

> I mean, there is no known written standard, which says that Mb is
> megabit, not megabyte or that you can (or can't) write kilo as K, but
> some people just believe that kB is "the Way" and allowing people to
> write kilobytes as KB or kb is evil and should be punished.

Yes there is --- it's the SI.

http://en.wikipedia.org/wiki/SI#SI_writing_style

I don't know about it being "evil" and punishment, but it's wrong.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [PATCH] Cleanup of GUC units code

From

Robert Treat

Date:

04 September 2008, 00:15:37

On Wednesday 03 September 2008 16:12:29 Joshua Drake wrote:
> On Wed, 03 Sep 2008 23:10:24 +0300
>
> Hannu Krosing <hannu@2ndQuadrant.com> wrote:
> > > That would equally solve
> > > this problem, as well as many others.
> >
> > AFAIK the config file is checked now, and if the check fails, the
> > database won't start.
>
> Like apachectl configcheck ... E.g; we have the ability to check if the
> config file is valid "before" we restart and have an extended outage.
>

To paraphrase, "if you can't write a config file correctly before restarting, 
I do not want you anywhere near any instance of a production system" 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

04 September 2008, 02:52:19

On Wed, 2008-09-03 at 20:01 -0400, Alvaro Herrera wrote:
> Hannu Krosing escribió:
> 
> > I mean, there is no known written standard, which says that Mb is
> > megabit, not megabyte or that you can (or can't) write kilo as K, but
> > some people just believe that kB is "the Way" and allowing people to
> > write kilobytes as KB or kb is evil and should be punished.
> 
> Yes there is --- it's the SI.
> 
> http://en.wikipedia.org/wiki/SI#SI_writing_style
> 
> I don't know about it being "evil" and punishment, but it's wrong.

SI defines decimal-based prefixes, where k = kilo = 1000, so our current
conf use is also wrong.

http://en.wikipedia.org/wiki/Kilobyte
...
1024 bytes (2^10): This unit is used when expressing quantities which
are based on powers of two, such as memory chip capacities. Most
software also expresses storage capacity in units of 1024 bytes.
Although the use of kilobyte for this unit is common, this usage has
been expressly forbidden by the SI standard and other standards
organisations. To indicate a quantity of 1024 bytes, the term kibibyte
(KiB) has been recommended instead.

And no, I am not proposing switching conf units to KiB, nor am I
proposing to use MB to mean 1 000 000. I'm quite happy with us being
"wrong" in 1000 v. 1024 area.

---------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Dimitri Fontaine

Date:

04 September 2008, 04:57:26

Le jeudi 04 septembre 2008, Robert Treat a écrit :
> To paraphrase, "if you can't write a config file correctly before
> restarting, I do not want you anywhere near any instance of a production
> system"

Do you really want to TCO of PostgreSQL to raise that much when the software
could help lowering it?

If you're a shop where you can't have only experts in any given domain do
level 1 nightly "fix", maybe you'd still like them to be able to follow some
procedures involving server config change.
On what grounds are you wanting this not to be possible?

Regards,
--
dim

Re: [PATCH] Cleanup of GUC units code

From

Alvaro Herrera

Date:

04 September 2008, 10:21:05

Hannu Krosing escribió:
> On Wed, 2008-09-03 at 20:01 -0400, Alvaro Herrera wrote:

> > Yes there is --- it's the SI.
> > 
> > http://en.wikipedia.org/wiki/SI#SI_writing_style
> > 
> > I don't know about it being "evil" and punishment, but it's wrong.
> 
> SI defines decimal-based prefixes, where k = kilo = 1000, so our current
> conf use is also wrong.

Actually, this has been a moving target.  For a certain length of time,
some standards did accept that k meant 1024 "in computing context"; see

http://en.wikipedia.org/wiki/Binary_prefix

So we're not _absolutely_ wrong here; at least not until KiB are more
widely accepted and kB more widely refused to mean 1024 bytes.  The
relevant standard has been published just this year by ISO.

http://en.wikipedia.org/wiki/ISO/IEC_80000#Binary_prefixes

So this is new territory, whereas case-sensitivity of prefixes and unit
abbreviations has existed for decades.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [PATCH] Cleanup of GUC units code

From

Andrew Sullivan

Date:

04 September 2008, 10:30:20

On Thu, Sep 04, 2008 at 01:26:44AM +0300, Hannu Krosing wrote:

> So Andrews opinion was that Mb (meaning Mbit) is different from MB (for
> megabyte) and that if someone thinks that we define shared buffers in
> megabits can get confused and order wrong kind of network card ?

I know it's fun to point and laugh instead of giving an argument, but
the above is not what I said.  What I said is that there is a
technical difference between at least some of these units, and one
that is relevant in some contexts where we have good reason to believe
Postgres is used.  So it seems to me that there is at least a _prima
facie_ reason in favour of making case-based decisions.  Your argument
against that appears to be, "Well, people can be sloppy."

Alvaro's suggestion seems to me to be a better one.  It is customary,
in servers with large complicated configuration systems, for the
server to come with a tool that validates the configuration file
before you try to load it.  Postfix does this; apache does it; so does
BIND.  Heck, even NSD (which is way less configurable than BIND) does
this.  Offering such a tool provides considerable more benefit than
the questionable one of allowing people to type whatever they want
into the configuration file and suppose that the server will by magic
know what they meant.

> I can understand Alvaros stance more readily - if we have irrational
> constraints on what can go into conf file, and people wont listen to
> reason

Extending your current reasoning, it's irrational that all the names
of the parameters have to be spelled correctly.  Why can't we just
accept log_statement_duration_min?  It's _obvious_ that it's the same
thing as log_min_duration_statement!  It's silly to expect that
harried administrators have to spell these options correctly.  Why
can't we parse all the file, separating each label by "_".  Then if
any arrangements of those labels matches a "real" configuration
parameter, select that one as the thing to match and proceed from
there?

A

-- 
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: [PATCH] Cleanup of GUC units code

From

Steve Atkins

Date:

04 September 2008, 11:01:32

On Sep 4, 2008, at 6:29 AM, Andrew Sullivan wrote:

> On Thu, Sep 04, 2008 at 01:26:44AM +0300, Hannu Krosing wrote:
>
>> So Andrews opinion was that Mb (meaning Mbit) is different from MB  
>> (for
>> megabyte) and that if someone thinks that we define shared buffers in
>> megabits can get confused and order wrong kind of network card ?
>
> I know it's fun to point and laugh instead of giving an argument, but
> the above is not what I said.  What I said is that there is a
> technical difference between at least some of these units, and one
> that is relevant in some contexts where we have good reason to believe
> Postgres is used.  So it seems to me that there is at least a _prima
> facie_ reason in favour of making case-based decisions.  Your argument
> against that appears to be, "Well, people can be sloppy."

Settings in postgresql.conf are currently case-insensitive. Except
for the units.

> Alvaro's suggestion seems to me to be a better one.  It is customary,
> in servers with large complicated configuration systems, for the
> server to come with a tool that validates the configuration file
> before you try to load it.  Postfix does this; apache does it; so does
> BIND.  Heck, even NSD (which is way less configurable than BIND) does
> this.  Offering such a tool provides considerable more benefit than
> the questionable one of allowing people to type whatever they want
> into the configuration file and suppose that the server will by magic
> know what they meant.

How would such a tool cope with, for example, shared_buffers
being set to one eighth the size the DBA intended, due to their
use of Mb rather than MB? Both of which are perfectly valid
units to use to set shared buffers, even though we only support
one right now. If the answer to that is something along the lines
of we don't support megaabits for shared_buffers, and never will because
nobody in their right mind would ever intend to use megabits
to set their shared buffer size... that's a useful datapoint when
it comes to designing for usability.

Cheers,  Steve

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

04 September 2008, 11:29:11

On Thu, 2008-09-04 at 09:29 -0400, Andrew Sullivan wrote:
> On Thu, Sep 04, 2008 at 01:26:44AM +0300, Hannu Krosing wrote:
> 
> > So Andrews opinion was that Mb (meaning Mbit) is different from MB (for
> > megabyte) and that if someone thinks that we define shared buffers in
> > megabits can get confused and order wrong kind of network card ?
> 
> I know it's fun to point and laugh instead of giving an argument, but
> the above is not what I said.  What I said is that there is a
> technical difference between at least some of these units, and one
> that is relevant in some contexts where we have good reason to believe
> Postgres is used.  So it seems to me that there is at least a _prima
> facie_ reason in favour of making case-based decisions.  Your argument
> against that appears to be, "Well, people can be sloppy."
> 
> Alvaro's suggestion seems to me to be a better one. 

Agreed. maybe this can even be implemented as a special switch to
postmaster (maybe -n or --dry-run, similar to make), not a separate
command.

> > I can understand Alvaros stance more readily - if we have irrational
> > constraints on what can go into conf file, and people wont listen to
> > reason
> 
> Extending your current reasoning, it's irrational that all the names
> of the parameters have to be spelled correctly.

It would be irrational to allow all letters in parameter names to be
case-insensitive, except 'k' which has to be lowercase ;)

The main point of confusion comes from not accepting KB and this bites
you when you go down from MB, with reasoning like "ok, it seems that
units are in uppercase, so let's change 1MB to 768KB and see what
happens"

-------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Andrew Sullivan

Date:

04 September 2008, 12:52:02

On Thu, Sep 04, 2008 at 07:01:18AM -0700, Steve Atkins wrote:
> Settings in postgresql.conf are currently case-insensitive. Except
> for the units.

And, of course, filenames when you are using a case-sensitive
filesystem.  Because these are things that are defined by some
convention other than the ones the PGDG made up.  Since units fall
into that category, it seems to me that we're stuck with using
external conventions.

> one right now. If the answer to that is something along the lines
> of we don't support megaabits for shared_buffers, and never will because
> nobody in their right mind would ever intend to use megabits
> to set their shared buffer size... that's a useful datapoint when
> it comes to designing for usability.

And you are going to establish this worldwide convention on what
someone in right mind would do how, exactly?  For instance, I think
nobody in right mind would use "KB" to mean "kilobytes".  I suppose
you could get a random sample of all current Postgres users to decide
what makes sense, but then you'd have the problem of knowing whether
you had a random sample, since the population isn't obviously
identifiable.  Or, we could just stick with the convention that we
already have, and write a tool that captures this an other issues.
Maybe even one that could later form the basis for an automatic tuning
advisor, as well.

The problem with appeals to common sense always turns out to be that
different people's common sense leads them to different conclusions.
(We had a devastating government in Ontario some years ago that claimed
to be doing things that were just common sense; the Province is still
cleaning up the mess.)  

A

-- 
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: [PATCH] Cleanup of GUC units code

From

Simon Riggs

Date:

08 September 2008, 08:09:10

On Wed, 2008-09-03 at 11:58 +0300, Asko Oja wrote:
> On Wed, Sep 3, 2008 at 11:20 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>         Marko Kreen wrote:
>                 On 9/2/08, Peter Eisentraut <peter_e@gmx.net> wrote:
>                         Marko Kreen wrote:
>                                 In the meantime, here is simple patch
>                                 for case-insensivity.
>                          You might be able to talk me into accepting
>                         various unambiguous, common
>                         alternative spellings of various units.  But
>                         for instance allowing MB and Mb
>                         to mean the same thing is insane. 
>                 
>                 How would the docs for that look like?  And anyway,
>                 what is wrong with
>                 Mb for megabytes?
> From infamous wikipedia: A megabit is a unit of information or
> computer storage, abbreviated Mbit (or Mb).
> To me playing with case of acronyms and even depending on it seems
> more insane. It would make much more sense to have case insensitive
> set of acronyms and (thanks Tom for pointing out) some sanity checks
> when configuration is loaded to notify user when wrong ones are used
> for some context.

There is a patch on the CommitFest, so we must either accept it or
reject it (with/without comments).

Peter's objection is reasonable, as far as most people have replied.
Marko's proposal is also reasonable to most people, since they do not
wish fat fingers to cause any amount of downtime. ISTM that if you've
done this, you appreciate the feature, if not it seems less important.

I would point out that Marko's patch is about what values get accepted
in postgresql.conf, not how the units are represented when you perform
SHOW, look at pg_settings or read the docs. So Marko's proposal does not
ignore the important distinction between units in *all* places, just at
the time of input. With that in mind, the proposal seems to be
acceptable to the majority, based upon my assessment of the comments.

In terms of the patch, ISTM that relaxing the comparison logic also
appears to relax the warnings given and that is not acceptable, given
concerns raised.

So my recommendation to everybody is
* we allow case insensitive matches of units in postgresql.conf
* Marko should change patch to put WARNINGs in place so people know they
got it wrong
* we make sure the case is always shown correctly in all other aspects
of Postgres server and docs (no relaxation at all there)
* in the longer term, we look for the solution to be a config checker

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

08 September 2008, 09:40:15

Simon Riggs <simon@2ndQuadrant.com> writes:
> Peter's objection is reasonable, as far as most people have replied.
> Marko's proposal is also reasonable to most people, since they do not
> wish fat fingers to cause any amount of downtime. ISTM that if you've
> done this, you appreciate the feature, if not it seems less important.

I really think that the claim that this will "save downtime" is a
ridiculous argument.  On that basis we should, for example, be looking
for a nearest match for any misspelled variable name.  The fact of the
matter is that a configuration validator is a far better answer to any
such worries than trying to accept bad/questionable input.

> So my recommendation to everybody is
> * we allow case insensitive matches of units in postgresql.conf
> * Marko should change patch to put WARNINGs in place so people know they
> got it wrong
> * we make sure the case is always shown correctly in all other aspects
> of Postgres server and docs (no relaxation at all there)
> * in the longer term, we look for the solution to be a config checker

My vote is to reject the patch and work on a config checker.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

Gregory Stark

Date:

08 September 2008, 09:54:59

Simon Riggs <simon@2ndQuadrant.com> writes:

> Peter's objection is reasonable, as far as most people have replied.
> Marko's proposal is also reasonable to most people, since they do not
> wish fat fingers to cause any amount of downtime. ISTM that if you've
> done this, you appreciate the feature, if not it seems less important.

My objection isn't down-time at all, it's the insultingly user-hostile
attitude. I normally am setting work_mem by hand in a psql session and *every*
time I do it I swear at Postgres for being annoyingly pedantic here.

I'm all for using the correct acronyms in all messages and documentation. What
I find annoying is the:

postgres=# set work_mem = '1g';
ERROR: invalid value for parameter "work_mem": "1g"
HINT:  It's perfectly clear what you want but I'm going to refuse to do      it until you type it exactly as I say:
"GB"

> * Marko should change patch to put WARNINGs in place so people know they
> got it wrong

That's only slightly less insulting than an error.

> * we make sure the case is always shown correctly in all other aspects
> of Postgres server and docs (no relaxation at all there)

I believe we're already fairly stringent about this as we should be.

> * in the longer term, we look for the solution to be a config checker

I don't think a config checker directly addresses the same problem. I never
set work_mem in a config and it still annoys the hell out of me.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production
Tuning

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

08 September 2008, 10:11:50

Gregory Stark <stark@enterprisedb.com> writes:
> I'm all for using the correct acronyms in all messages and documentation. What
> I find annoying is the:

> postgres=# set work_mem = '1g';
> ERROR: invalid value for parameter "work_mem": "1g"

But of course case insensitivity isn't going to fix that example for you.
So we're right back at the question of where we should draw the line in
trying to accept variant input.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

"Greg Stark"

Date:

08 September 2008, 10:27:13

On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But of course case insensitivity isn't going to fix that example for you.
> So we're right back at the question of where we should draw the line in
> trying to accept variant input.

Well it's not a perfect precedent but for example, dd accepts:

G    (2^30)
M    (2^20)
k     (2^10)
K    (2^10)
Kb  (10^3)
MB (10^6)
GB (10^9)
b    (512)

I think we're all agreed we want to ignore the KiB crap and make all
our units base 2. And I don't think usin "b" for block makes sense for
us. But the point is that yes, people expect to type "100M" or "1G"
and have that work. Plenty of us do it all the time with dd or other
tools already.

-- 
greg

Re: [PATCH] Cleanup of GUC units code

From

tomas@tuxteam.de

Date:

08 September 2008, 10:35:47

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Sep 08, 2008 at 02:18:55PM +0100, Greg Stark wrote:
> On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > But of course case insensitivity isn't going to fix that example for you.
> > So we're right back at the question of where we should draw the line in
> > trying to accept variant input.
> 
> Well it's not a perfect precedent but for example, dd accepts:
> 
> G    (2^30)
> M    (2^20)
> k     (2^10)
> K    (2^10)
> Kb  (10^3)
> MB (10^6)
> GB (10^9)
> b    (512)

Heh. But it doesn't take Gb or gB or gb, which was one of your express
wishes. So -- hm.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIxSxlBcgs9XrR2kYRAkOkAJ9pp7nNjgJOb2NtHPwKIKZMcsNYlQCdE8wa
VQ/c+9Nan1V3d+/cdTm+Xn4=
=rkzg
-----END PGP SIGNATURE-----

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

08 September 2008, 11:11:32

"Greg Stark" <stark@enterprisedb.com> writes:
> On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But of course case insensitivity isn't going to fix that example for you.
>> So we're right back at the question of where we should draw the line in
>> trying to accept variant input.

> Well it's not a perfect precedent but for example, dd accepts:

> G    (2^30)
> M    (2^20)
> k     (2^10)
> K    (2^10)
> Kb  (10^3)
> MB (10^6)
> GB (10^9)
> b    (512)

Hmm.  I could get behind a proposal to allow single-letter abbreviations
if it could be made to work across the board, but what about the time
units?  "ms" vs "min" is the sticky part there...
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

"Greg Sabino Mullane"

Date:

08 September 2008, 11:15:26

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


Tom Lane wrote:
> My vote is to reject the patch and work on a config checker.

+1

> postgres=# set work_mem = '1g';
> ERROR: invalid value for parameter "work_mem": "1g"

Perhaps this would be a great place for a HINT listing all
valid inputs, if not there already?

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200809081014
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkjFM2AACgkQvJuQZxSWSsiDvACdE6Wlrnu3uQH4mOpuEMvX0VQe
rXoAoPLCR5jKTWQH4GsHDtz5NNZXq4vA
=nRMS
-----END PGP SIGNATURE-----

Re: [PATCH] Cleanup of GUC units code

From

Alvaro Herrera

Date:

08 September 2008, 11:18:08

Greg Sabino Mullane wrote:

> Tom Lane wrote:

> > postgres=# set work_mem = '1g';
> > ERROR: invalid value for parameter "work_mem": "1g"
> 
> Perhaps this would be a great place for a HINT listing all
> valid inputs, if not there already?

alvherre=# set work_mem = '1g';
ERROR:  invalid value for parameter "work_mem": "1g"
HINT:  Valid units for this parameter are "kB", "MB", and "GB".

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [PATCH] Cleanup of GUC units code

From

Gregory Stark

Date:

08 September 2008, 11:27:55

"Greg Sabino Mullane" <greg@turnstep.com> writes:

> Tom Lane wrote:
>> My vote is to reject the patch and work on a config checker.
>
> +1
>
>> postgres=# set work_mem = '1g';
>> ERROR: invalid value for parameter "work_mem": "1g"
>
> Perhaps this would be a great place for a HINT listing all
> valid inputs, if not there already?

It is, I paraphrased it on my original message as:

HINT:  It's perfectly clear what you want but I'm going to refuse to do      it until you type it exactly as I say:
"GB"


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about
EnterpriseDB'sPostgreSQL training!

Re: [PATCH] Cleanup of GUC units code

From

Alvaro Herrera

Date:

08 September 2008, 11:33:27

Gregory Stark wrote:
> "Greg Sabino Mullane" <greg@turnstep.com> writes:
> 
> > Tom Lane wrote:
> >> My vote is to reject the patch and work on a config checker.
> >
> > +1
> >
> >> postgres=# set work_mem = '1g';
> >> ERROR: invalid value for parameter "work_mem": "1g"
> >
> > Perhaps this would be a great place for a HINT listing all
> > valid inputs, if not there already?
> 
> It is, I paraphrased it on my original message as:
> 
> HINT:  It's perfectly clear what you want but I'm going to refuse to do
>        it until you type it exactly as I say: "GB"

It's good as a joke, but what if the user says '1024b'?  Does it mean
1024 blocks or one kilobyte?  If blocks, what size are we talking, the
usual 512 bytes, or our BLCKSZ?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [PATCH] Cleanup of GUC units code

From

Gregory Stark

Date:

08 September 2008, 11:47:08

Alvaro Herrera <alvherre@commandprompt.com> writes:

> It's good as a joke, but what if the user says '1024b'?  Does it mean
> 1024 blocks or one kilobyte?  If blocks, what size are we talking, the
> usual 512 bytes, or our BLCKSZ?

For what guc would you find a unit of posix-style "blocks" relevant?!

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

08 September 2008, 12:03:24

Gregory Stark <stark@enterprisedb.com> writes:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> It's good as a joke, but what if the user says '1024b'?  Does it mean
>> 1024 blocks or one kilobyte?  If blocks, what size are we talking, the
>> usual 512 bytes, or our BLCKSZ?

> For what guc would you find a unit of posix-style "blocks" relevant?!

The point isn't whether it's relevant, the point is that there's a
fairly serious doubt as to what the user thought it meant.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

Gregory Stark

Date:

08 September 2008, 12:17:15

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> It's good as a joke, but what if the user says '1024b'?  Does it mean
>>> 1024 blocks or one kilobyte?  If blocks, what size are we talking, the
>>> usual 512 bytes, or our BLCKSZ?
>
>> For what guc would you find a unit of posix-style "blocks" relevant?!
>
> The point isn't whether it's relevant, the point is that there's a
> fairly serious doubt as to what the user thought it meant.

My point was that we don't accept any form of "b" today and nobody has
suggested we should.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!

Re: [PATCH] Cleanup of GUC units code

From

Korry Douglas

Date:

08 September 2008, 16:10:48

>> Settings in postgresql.conf are currently case-insensitive. Except
>> for the units.
>
> And, of course, filenames when you are using a case-sensitive
> filesystem.  Because these are things that are defined by some
> convention other than the ones the PGDG made up.  Since units fall
> into that category, it seems to me that we're stuck with using
> external conventions.


Just a thought... since there are very few (if any) places where a  
user would specify a variable in terms of bits (kilobits, megabits,  
gigabits), we could make the units case-insensitive and assume that  
kb, gb, and mb (and all case variants) refer to some number of bytes.   
If a user wants to specify a variable in terms of bits, he would have  
to spell out the units completely, as in "4 gigabits", "20 kilobits",  
or "16 megabits".

        -- Korry

Re: [PATCH] Cleanup of GUC units code

From

Ron Mayer

Date:

09 September 2008, 01:12:58

Marko Kreen wrote:
> Thirdly, please don't use "standard units" argument, unless you plan to
> propose use of "KiB, MiB, GiB" at the same moment.  

In defense of standard units, if the postgres docs say
"Postgres will round up to the nearest power of 2"
kB and MB seem very clear to me.  If we want to silently
accept other abbreviations and acronyms too (mb, mIb, whatever)
that doesn't bother me, tho.

Re: [PATCH] Cleanup of GUC units code

From

Joshua Drake

Date:

09 September 2008, 01:18:59

On Mon, 8 Sep 2008 10:32:40 -0400
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Gregory Stark wrote:
> > "Greg Sabino Mullane" <greg@turnstep.com> writes:
> > 
> > > Tom Lane wrote:
> > >> My vote is to reject the patch and work on a config checker.
> > >
> > > +1

+1

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

Re: [PATCH] Cleanup of GUC units code

From

Peter Eisentraut

Date:

09 September 2008, 13:55:23

Tom Lane wrote:
> "Greg Stark" <stark@enterprisedb.com> writes:
>> On Mon, Sep 8, 2008 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> But of course case insensitivity isn't going to fix that example for you.
>>> So we're right back at the question of where we should draw the line in
>>> trying to accept variant input.
> 
>> Well it's not a perfect precedent but for example, dd accepts:
> 
>> G    (2^30)
>> M    (2^20)
>> k     (2^10)
>> K    (2^10)
>> Kb  (10^3)
>> MB (10^6)
>> GB (10^9)
>> b    (512)
> 
> Hmm.  I could get behind a proposal to allow single-letter abbreviations
> if it could be made to work across the board,

The SQL standard actually specifies something about that.  You can 
define the length of large object types (CLOB and BLOB) with multipliers 
K, M, and G, as in

CREATE TABLE foo ( bar BLOB(5 M) );

These multipliers are case insensitive, of course.  (And their are 
1024-based, FWIW.)

So I could imagine that we generalize this approach to make these 
multipliers available in other positions.

This would have definitional problems of its own, however.  If you 
interpret K, M, and G strictly as unit-less multipliers, then

SET shared_buffers = 2 G

would mean

SET shared_buffers = 2 * 1073741824

meaning

SET shared_buffers = 2147463648

which is not the same thing as the current

SET shared_buffer = '2GB'

This also affects the solution to another GUC units complaint that the 
quotes are annoying, which I support.

We could possibly settle some of these arguments if we could redefine 
all memory parameters to use one byte as base unit, and then allow some 
ambiguity and unit omission from there.  But that would probably cause 
much havoc, so we are stuck with a certain degree of inconsistency anyhow.

Re: [PATCH] Cleanup of GUC units code

From

"Robert Haas"

Date:

09 September 2008, 16:33:44

> This would have definitional problems of its own, however.  If you interpret
> K, M, and G strictly as unit-less multipliers, then
>
> SET shared_buffers = 2 G

I don't think it would be a good idea to make them unit-less, for
exactly the reasons you mention.

> We could possibly settle some of these arguments if we could redefine all
> memory parameters to use one byte as base unit, and then allow some
> ambiguity and unit omission from there.  But that would probably cause much
> havoc, so we are stuck with a certain degree of inconsistency anyhow.

A good start might be to always OUTPUT memory parameters using the
same base unit.

portal=# show shared_buffers;shared_buffers
----------------24MB
(1 row)

portal=# show temp_buffers;temp_buffers
--------------1024
(1 row)

Kilobytes seems like the most reasonable choice, because we definitely
have variables where you would want to set a value less than 1
megabyte, and I doubt we have (or will ever need) any where we the
granularity is finer than than 1 kilobyte.

Beyond that, how about moving in the direction of deprecating
unit-less settings altogether?  In other words, if you want 1024
shared buffers, you should be saying 8192kB or 8MB rather than 1024.
We could issue a WARNING for 8.4 and eventually move to rejecting that
syntax altogether.  That gets everything into the same base unit
without ever change the semantics of any particular value.

...Robert

Re: [PATCH] Cleanup of GUC units code

From

Hannu Krosing

Date:

09 September 2008, 17:47:40

On Thu, 2008-09-04 at 11:51 -0400, Andrew Sullivan wrote:
> On Thu, Sep 04, 2008 at 07:01:18AM -0700, Steve Atkins wrote:
> > Settings in postgresql.conf are currently case-insensitive. Except
> > for the units.
> 
> And, of course, filenames when you are using a case-sensitive
> filesystem.  Because these are things that are defined by some
> convention other than the ones the PGDG made up.  Since units fall
> into that category, it seems to me that we're stuck with using
> external conventions.
> 
> > one right now. If the answer to that is something along the lines
> > of we don't support megaabits for shared_buffers, and never will because
> > nobody in their right mind would ever intend to use megabits
> > to set their shared buffer size... that's a useful datapoint when
> > it comes to designing for usability.
> 
> And you are going to establish this worldwide convention on what
> someone in right mind would do how, exactly?  For instance, I think
> nobody in right mind would use "KB" to mean "kilobytes".  

Except those following JEDEC standards ;)

from http://en.wikipedia.org/wiki/JEDEC_memory_standards

-------------------
JEDEC Standard 100B.01[1] defines the "prefix to units of semiconductor
storage capacity" as follows:
     * kilo (K): A multiplier equal to 1024 (210).     * mega (M): A multiplier equal to 1 048 576 (220 or K2, where K
=      1024).     * giga (G): A multiplier equal to 1 073 741 824 (230 or K3, where       K = 1024).
 
-------------------

I'm not sure if this applies only to storage _capacity_ or also to stuff
stored using said capacity.


but in general I think it is easier to train millions of DBAs to use kB
than to achieve consensus about what "everybody" assumes on this list,
so I also give "+1" to working on config checker "instead" .

------------------
Hannu

Re: [PATCH] Cleanup of GUC units code

From

Greg Smith

Date:

09 September 2008, 19:56:35

On Tue, 9 Sep 2008, Robert Haas wrote:

> A good start might be to always OUTPUT memory parameters using the
> same base unit.

SHOW gives output that matches what you input.  If you want to see things 
with consistant units, look at pg_settings:

# select name,unit,setting,current_setting(name) from pg_settings where 
name='shared_buffers' or name='temp_buffers';
      name      | unit | setting | current_setting
----------------+------+---------+----------------- shared_buffers | 8kB  | 131072  | 1GB temp_buffers   | 8kB  | 1024
 | 1024

> Beyond that, how about moving in the direction of deprecating
> unit-less settings altogether?  In other words, if you want 1024
> shared buffers, you should be saying 8192kB or 8MB rather than 1024.
> We could issue a WARNING for 8.4 and eventually move to rejecting that
> syntax altogether.  That gets everything into the same base unit
> without ever change the semantics of any particular value.

Deprecating setting the value directly in its base unit, so that all the 
memory parameters are specified as a number of bytes, would sidestep a lot 
of the issues people complain about.  What I would like to see (but don't 
have nearly enough time to argue in support of considering the resistance 
to change here) is that this syntax:

shared_buffers=1024

Would assume the user meant 1024 *bytes*, with the server silently 
rounding that up to the nearest 8k block.  Then the whole issue of "do 
they mean bits or bytes?" goes away, because no one would ever have to 
include the "B".  That paves the way for making it easy to support 
case-insensitive values without pedantic confusion.  As I see it these 
should all give you the same thing:

shared_buffers=16777216
shared_buffers=16384k
shared_buffers=16384K
shared_buffers=16M
shared_buffers=16m

Because that's what would make life easiest for those configuring the 
software.

Since this suggestion will inevitably lead to cries of befowled backward 
compatibility, what I've been doing instead of participating in this 
thread is working on a tool that will make it easy to convert old 
postgresql.conf files to support a new version.  Then the tool can convert 
all the places someone uses the old syntax into the new.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

09 September 2008, 21:29:33

Greg Smith <gsmith@gregsmith.com> writes:
> What I would like to see (but don't 
> have nearly enough time to argue in support of considering the resistance 
> to change here) is that this syntax:

> shared_buffers=1024

> Would assume the user meant 1024 *bytes*, with the server silently 
> rounding that up to the nearest 8k block.  Then the whole issue of "do 
> they mean bits or bytes?" goes away, because no one would ever have to 
> include the "B".

How do you come to that conclusion?  Leaving off the unit entirely
certainly doesn't make the user's intent clearer.

There's also a pretty serious compatibility problem, which is that
settings that had always worked before would suddenly be completely
broken (off by a factor of 8192 qualifies as "broken" in my book).

I think that if we wanted to change anything here, we'd have to
*require* a unit spec on unit-affected parameters, at least for a period
of several releases.  Otherwise the confusion would be horrendous.

> That paves the way for making it easy to support 
> case-insensitive values without pedantic confusion.

Again, you're just making this up.  It doesn't make anything clearer.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

Greg Smith

Date:

09 September 2008, 23:07:14

On Tue, 9 Sep 2008, Tom Lane wrote:

> How do you come to that conclusion?  Leaving off the unit entirely
> certainly doesn't make the user's intent clearer.

Same way I do all my conclusions in this area--talking to people in the 
field regularly who've never configured a postgresql.conf before.  I 
highly recommend it for a fresh view.

Here's how this works every time I go throught it.  When you first 
encounter someone who is new to PostgreSQL, after they find out 
"shared_buffers" is a memory allocation they presume it's in bytes until 
they find out otherwise.  And then they're slightly annoyed that a) if 
they accidentally don't include a unit all the values get way bigger 
because of some backward compatibility nonsense they don't care about and 
b) that it's case sensitive.  Since some of the arguments against (b) ("Mb 
could mean megabits!") diminish if the recommended practice is to just 
keep the multiplier in there, so the byte part of the unit is optional and 
not used by the default postgresql.conf, that seems the most reasonable 
way to proceed to me.

Unlike things like network speed where it's more complicated, memory is 
measured in bytes unless there's a different multiplier attached by most 
people.  Greg Stark already made this same observation yesterday:  "But 
the point is that yes, people expect to type "100M" or "1G" and have that 
work. Plenty of us do it all the time with dd or other tools already."

I don't actually expect any adjustment here but was getting a little bored 
watching everyone replay 
http://archives.postgresql.org/pgsql-hackers/2006-07/msg01229.php with 
barely any changes from the first time.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: [PATCH] Cleanup of GUC units code

From

"Robert Haas"

Date:

09 September 2008, 23:30:47

>> A good start might be to always OUTPUT memory parameters using the
>> same base unit.
> SHOW gives output that matches what you input.

Not for me it doesn't.

portal=# show temp_buffers;temp_buffers
--------------1024
(1 row)

portal=# set temp_buffers = '16MB';
SET
portal=# show temp_buffers;temp_buffers
--------------2048
(1 row)

> Deprecating setting the value directly in its base unit, so that all the
> memory parameters are specified as a number of bytes, would sidestep a lot
> of the issues people complain about.  What I would like to see (but don't
> have nearly enough time to argue in support of considering the resistance to
> change here) is that this syntax:
>
> shared_buffers=1024

Silly me, perhaps, but in the absence of a unit, I would assume that a
number in this context refers to a number of BUFFERS.  I'd be willing
to bet money that if you showed this line to a bunch of people who
knew nothing about postgresql but were reasonably computer-savvy,
you'd get a lot more people who thought that was a number of buffers
than who thought it was a number of bytes.

Compounding the problem, of course, is the fact, that right now,
that's exactly what it does mean.  While it's probably acceptable to
change the semantics of postgresql.conf in such a way that this syntax
is no longer accepted, or generates a warning, it's almost certainly a
bad idea to change it to silently mean something that is four orders
of magnitude different what it currently means.

> Would assume the user meant 1024 *bytes*, with the server silently rounding
> that up to the nearest 8k block.  Then the whole issue of "do they mean bits
> or bytes?" goes away, because no one would ever have to include the "B".

I don't believe this issue exists in the first place, because anyone
who would specify the size of their shared buffer pool in bits should
have their head examined.  But if, hypothetically, someone were
confused, I can't imagine that taking away the "B" is somehow going to
be more clear.

...Robert

Re: [PATCH] Cleanup of GUC units code

From

Ron Mayer

Date:

10 September 2008, 11:38:41

Robert Haas wrote:
>bits...bytes...blocks...m...M
> 
> I can't imagine that taking away the "B" is somehow going to
> be more clear.

If clarity is the goal, I'd want the following:

a) Verbosely spelling out the units in the default config file
   temp_buffers = 16 megabytes   or   temp_buffers = 16 milliblocks :-)
   Naive users who favor cut&paste will use the verbose words   that should leave little room for confusion.
Power-users  who know the short forms from the docs will presumably have   read the descriptions.

b) having "show" show verbosely spelled out units.   db=# show temp_buffers;   temp_buffers   --------------   16000000
bytes  (1 row)

c) having "set" show a NOTICE with the verbose word for the units  db=# set temp_buffers = '16mb';  NOTICE: setting
temp_buffersto 16000000 bytes

Re: [PATCH] Cleanup of GUC units code

From

"Robert Haas"

Date:

10 September 2008, 12:48:46

> a) Verbosely spelling out the units in the default config file
>   temp_buffers = 16 megabytes
>   or
>   temp_buffers = 16 milliblocks :-)
>   Naive users who favor cut&paste will use the verbose words
>   that should leave little room for confusion.  Power-users
>   who know the short forms from the docs will presumably have
>   read the descriptions.

I think it would make a lot of sense to encourage adding the word
"buffers" or "blocks" when that is the unit in question.  This is all
religion at this point, but I find it difficult to believe that there
is any real need to spell out megabytes.

...Robert

Re: [PATCH] Cleanup of GUC units code

From

Peter Eisentraut

Date:

11 September 2008, 08:28:48

Robert Haas wrote:
>>> A good start might be to always OUTPUT memory parameters using the
>>> same base unit.
>> SHOW gives output that matches what you input.
> 
> Not for me it doesn't.
> 
> portal=# show temp_buffers;
>  temp_buffers
> --------------
>  1024
> (1 row)
> 
> portal=# set temp_buffers = '16MB';
> SET
> portal=# show temp_buffers;
>  temp_buffers
> --------------
>  2048
> (1 row)

temp_buffers is actually special-cased in the code because
   /*    * We show the GUC var until local buffers have been initialized, and    * NLocBuffer afterwards.    */

It is not clear to me right now why that is a good idea.  But it is only 
this one paramter.

The actual logic that SHOW uses in the general case is to reformat the 
value with the largest unit that allows for an integer value to be 
presented.  So this is neither your nor Greg's idea, but I think it 
gives useful behavior in practice.

Re: [PATCH] Cleanup of GUC units code

From

Tom Lane

Date:

11 September 2008, 08:58:53

Peter Eisentraut <peter_e@gmx.net> writes:
> temp_buffers is actually special-cased in the code because

>     /*
>      * We show the GUC var until local buffers have been initialized, and
>      * NLocBuffer afterwards.
>      */

> It is not clear to me right now why that is a good idea.

I think the reason for it is that you can change the setting within a
particular session, but only up until temp buffers have been initialized
(ie, when you first touch a temp table).  If we just made it act like
other GUCs then SHOW would be lying to you about the effective value
if you changed it after that point.

An easy workaround would be to make the variable PGC_BACKEND, but that
seems to me to lose useful flexibility.  Or we could invent a context
category specifically for this type of behavior, but is it worth that?

The narrowest fix would be to just teach the show hook to format its
result properly.  I seem to recall having looked at that and been
annoyed by the amount of copy-and-paste required.

[ thinks for a bit... ]  Or maybe we should get rid of the show hook
and instead put in an assign hook that prevents changes after the
initialization event.  The problem is that you can't throw error
while reading, eg, a new config file.  So while this might be the most
user-friendly approach, I think it would take some API change for assign
hooks.
        regards, tom lane

Re: [PATCH] Cleanup of GUC units code

From

"Robert Haas"

Date:

11 September 2008, 09:20:52

> temp_buffers is actually special-cased in the code because
>
>   /*
>    * We show the GUC var until local buffers have been initialized, and
>    * NLocBuffer afterwards.
>    */
>
> It is not clear to me right now why that is a good idea.  But it is only
> this one paramter.

OK, well that's not so bad then, although it would be nice to make it
consistent.

> The actual logic that SHOW uses in the general case is to reformat the value
> with the largest unit that allows for an integer value to be presented.  So
> this is neither your nor Greg's idea, but I think it gives useful behavior
> in practice.

Yes, that's a totally sensible choice as well.  What do you think of
the idea of always requiring an explicit unit, either by deprecating
blocks as a unit altogether, or pushing users to specify "blocks" in
that case?  It seemed to me from reading your previous response that
you thought this would make it more possible to be more flexible about
how MB, GB, etc. are specified, although I'm not exactly sure what the
relationship is.

...Robert