Re: WIP: a way forward on bootstrap data - Mailing list pgsql-hackers

From Tom Lane
Subject Re: WIP: a way forward on bootstrap data
Date
Msg-id 22280.1522880971@sss.pgh.pa.us
Whole thread Raw
In response to Re: WIP: a way forward on bootstrap data  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP: a way forward on bootstrap data  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I'm starting to look through v13 seriously, and one thing struck
me that could use some general discussion: what is our policy
going to be for choosing the default values for catalog columns?
In particular, I noticed that you have for pg_proc

    bool        proisstrict BKI_DEFAULT(f);

    char        provolatile BKI_DEFAULT(v);

    char        proparallel BKI_DEFAULT(u);

which do not comport at all with the most common values in those
columns.  As of HEAD, I see

postgres=# select proisstrict, count(*) from pg_proc group by 1;
 proisstrict | count 
-------------+-------
 f           |   312
 t           |  2640
(2 rows)

postgres=# select provolatile, count(*) from pg_proc group by 1;
 provolatile | count 
-------------+-------
 i           |  2080
 s           |   570
 v           |   302
(3 rows)

postgres=# select proparallel, count(*) from pg_proc group by 1;
 proparallel | count 
-------------+-------
 r           |   139
 s           |  2722
 u           |    91
(3 rows)

(Since this is from the final initdb state, this overstates the number
of .bki entries for pg_proc a bit, but not by much.)

I think there's no question that the default for proisstrict ought
to be "true" --- not only is that by far the more common choice,
but it's actually the safer choice.  A C function that needs to be
marked strict and isn't will at best do the wrong thing, and quite
likely will crash, if passed a NULL value.

The defaults for provolatile and proparallel maybe require more thought
though.  What you've chosen corresponds to the default assumptions of
CREATE FUNCTION, which are what we need for user-defined functions that
we don't know anything about; but I'm not sure that makes them the best
defaults for built-in functions.  I'm inclined to go with the majority
values here, in part because that will make the outliers stand out when
looking at pg_proc.dat.  I don't think it's great that we'll have 2800+
entries explicitly marked with proparallel 'i' or 's', but the less-than-
100 with proparallel 'u' will be so only implicitly because the rewrite
script will strip out any field entries that match the default.  That's
really the worst of all worlds: it'd be better to have no default
in this column at all, I think, than to behave like that.

In short, I'm tempted to say that when there's a clear majority of
entries that would use a particular default, that's the default we
should use, whether or not it's "surprising" or "unsafe" according
to the semantics.  It's clearly not "surprising" for a C function
to be marked proparallel 's'; the other cases are more so.

I'm not seeing any other BKI_DEFAULT choices that I'm inclined to
question, so maybe it's a mistake to try to derive any general
policy choices from such a small number of cases.  But anyway
I'm inclined to change these cases.

Comments anyone?

            regards, tom lane


pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Next
From: Andres Freund
Date:
Subject: Re: WIP: a way forward on bootstrap data