Thread: [PATCH] Make various variables read-only (const)

[PATCH] Make various variables read-only (const)

From
Oskari Saarenmaa
Date:
This allows the variables to be moved from .data to .rodata section which
means that more data can be shared by processes and makes sure that nothing
can accidentally overwrite the read-only definitions.  On a x86-64 Linux
system this moves roughly 9 kilobytes of previously writable data to the
read-only data segment in the backend and 4 kilobytes in libpq.

https://github.com/saaros/postgres/compare/constify

24 files changed, 108 insertions(+), 137 deletions(-)

/ Oskari

Attachment

Re: [PATCH] Make various variables read-only (const)

From
Robert Haas
Date:
On Fri, Dec 20, 2013 at 12:01 PM, Oskari Saarenmaa <os@ohmu.fi> wrote:
> This allows the variables to be moved from .data to .rodata section which
> means that more data can be shared by processes and makes sure that nothing
> can accidentally overwrite the read-only definitions.  On a x86-64 Linux
> system this moves roughly 9 kilobytes of previously writable data to the
> read-only data segment in the backend and 4 kilobytes in libpq.
>
> https://github.com/saaros/postgres/compare/constify
>
> 24 files changed, 108 insertions(+), 137 deletions(-)

This sounds like a broadly good thing, but I've had enough painful
experiences with const to be a little wary.  And how much does this
really affect data sharing?  Doesn't copy-on-write do the same thing
for writable data?  Could we get most of the benefit by const-ifying
one or two large data structures and forget the rest?

Other comments:

- The first hunk of the patch mutilates the comment it modifies for no
apparent reason.  Please revert.

- Why change the API of transformRelOptions()?

-#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
+/* The extra NUL-terminator will make sure a warning is raised if the
+ * storage space for name is too small, otherwise when strlen(name) ==
+ * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped.
+ */
+#define DEF_ENC2NAME(name, codepage) { #name "\0", PG_##name }

- The above hunk is not related to the primary purpose of this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Make various variables read-only (const)

From
Wim Lewis
Date:
By an odd coincidence, I also decided to try to const-ify libpq
recently, and noticed this thread as I was cleaning up my patch for
submission. For what it's worth, I've attached my patch to this message.
It doesn't move as much data into the text segment as Oskari Saarenmaa's
patch does, but it is less intrusive (simply adding const modifiers here
and there). I just went after the low-hanging fruit in libpq.

As an aside, it might make sense to make pg_encname_tbl and
pg_encname_tbl_sz static, since as far as I can tell those symbols are
never used outside of encnames.c nor are they likely to be. I didn't
make that change in this patch though.

On Mon, Dec 23, 2013, Robert Haas <robertmhaas@gmail.com> wrote:
> And how much does this really affect data sharing?  Doesn't copy-on-write do the same thing for writable data?

It can have a surprisingly large effect if read-only data gets
intermixed on pages with actual read-write data and can get COWd
unnecessarily.

My motivation, though, was more about code correctness than memory
sharing, though sharing is a nice benefit--- I was examining unexpected
symbols in .data/.bss in case they represented a thread-safety problem
for my program linked against libpq. (They didn't, FWIW, except for the
known and documented issue with PQoidStatus(). Not that I really
expected to find a problem in libpq, but marking those structures const
makes it nice and clear that they're not mutable.)


Attachment

Re: [PATCH] Make various variables read-only (const)

From
Oskari Saarenmaa
Date:
On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
> On Fri, Dec 20, 2013 at 12:01 PM, Oskari Saarenmaa <os@ohmu.fi> wrote:
> > This allows the variables to be moved from .data to .rodata section which
> > means that more data can be shared by processes and makes sure that nothing
> > can accidentally overwrite the read-only definitions.  On a x86-64 Linux
> > system this moves roughly 9 kilobytes of previously writable data to the
> > read-only data segment in the backend and 4 kilobytes in libpq.
> >
> > https://github.com/saaros/postgres/compare/constify
> >
> > 24 files changed, 108 insertions(+), 137 deletions(-)
> 
> This sounds like a broadly good thing, but I've had enough painful
> experiences with const to be a little wary.  And how much does this
> really affect data sharing?  Doesn't copy-on-write do the same thing
> for writable data?  Could we get most of the benefit by const-ifying
> one or two large data structures and forget the rest?

Thanks for the review and sorry for the late reply, I was offline for a
while.

As Wim Lewis pointed out in his mail the const data is most likely
mixed with non-const data and copy-on-write won't help with all of it. 
Also, some of the const data includes duplicates and thus .data actually
shrinks more than .rodata grows.

We'd probably get most of the space-saving benefits by just constifying the
biggest variables, but I think applying const to more things will also make
things more correct.

> Other comments:
> 
> - The first hunk of the patch mutilates the comment it modifies for no
> apparent reason.  Please revert.
> 
> - Why change the API of transformRelOptions()?

The comment was changed to reflect the new API, I modified
transformRelOptions to only accept a single valid namespace to make things
simpler in the calling code.  Nothing used more than one valid namespace
anyway, and it allows us to just use a constant "toast" without having to
create a 2 char* array with a NULL.

> -#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
> +/* The extra NUL-terminator will make sure a warning is raised if the
> + * storage space for name is too small, otherwise when strlen(name) ==
> + * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped.
> + */
> +#define DEF_ENC2NAME(name, codepage) { #name "\0", PG_##name }
> 
> - The above hunk is not related to the primary purpose of this patch.

It sort-of is.  Without fixed size char-arrays it's not possible to move
everything to .rodata, but fixed size char-arrays come with the drawback of
silently dropping the NUL-terminator when strlen(str) == sizeof(array), by
forcing a NUL-terminator in we always get a warning if it would've been
dropped and the size of the array can then be increased.

Thanks,
Oskari



Re: [PATCH] Make various variables read-only (const)

From
Tom Lane
Date:
Oskari Saarenmaa <os@ohmu.fi> writes:
> On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
>> - Why change the API of transformRelOptions()?

> The comment was changed to reflect the new API, I modified
> transformRelOptions to only accept a single valid namespace to make things
> simpler in the calling code.  Nothing used more than one valid namespace
> anyway, and it allows us to just use a constant "toast" without having to
> create a 2 char* array with a NULL.

That may be true today, but the code was obviously designed to allow for
more than one namespace in the future.  I'm inclined to reject this part
of the patch, or at least rework it to const-ify the existing data
structure rather than editorialize on what's needed.

However, I believe this code was Alvaro's to begin with, so I'd be
interested in his opinion on how important it is for transformRelOptions
to allow more than one namespace per call.

Actually, I'm kind of wondering why the code has a concept of namespaces
at all, given the fact that it fails to store them in the resulting array.
It seems impossible to verify after the fact that an option was given with
the right namespace, so isn't the feature pretty much pointless?
        regards, tom lane



Re: [PATCH] Make various variables read-only (const)

From
Tom Lane
Date:
I wrote:
> However, I believe this code was Alvaro's to begin with, so I'd be
> interested in his opinion on how important it is for transformRelOptions
> to allow more than one namespace per call.

> Actually, I'm kind of wondering why the code has a concept of namespaces
> at all, given the fact that it fails to store them in the resulting array.
> It seems impossible to verify after the fact that an option was given with
> the right namespace, so isn't the feature pretty much pointless?

After thinking about it some more, I realize that the intended usage
pattern is to call transformRelOptions once for each allowed namespace.
Since each call would ignore options outside its namespace, that would
result in any options with invalid namespace names being silently ignored;
so the fix was to add a parameter saying which namespaces are valid,
and then each call checks that, independently of extracting the options
it cares about.

This design seems a bit odd to me; it's certainly wasting effort, since
each namespace check after the first one is redundant.  I'm inclined to
propose that we factor out the namespace validity checking into a separate
function, such that you do something like

void checkOptionNamespaces(List *defList, const char * const validnsps[])

just once, and then call transformRelOptions for each of the desired
namespaces; transformRelOptions's validnsps argument goes away.  In at
least some places this looks like it would net out cleaner; for instance,
there is no good reason why callers that are trying to extract "toast"
options should have to know which other namespaces are allowed.
        regards, tom lane



Re: [PATCH] Make various variables read-only (const)

From
Tom Lane
Date:
I wrote:
> [ speculation about refactoring the API of transformRelOptions ]

This morning I remembered that there's another patch in the queue with
an interest in the API and behavior of transformRelOptions:
https://commitfest.postgresql.org/action/patch_view?id=1318

While I think that one has no chance of getting committed in exactly
the submitted form, some descendant patch may well make it in; if we do
anything to transformRelOptions right now it'll create merge issues for
any work in that area.

Moreover, the amount of .data space that'd be saved by twiddling
transformRelOptions' API is negligible, so it's not really that
interesting for the purposes of this patch.  So I think the right
thing to do for now is just drop the relevant parts of this patch.
We can revisit the issue, if still needed, after the extension-options
dust settles.

I'll keep looking at the rest of this patch.
        regards, tom lane



Re: [PATCH] Make various variables read-only (const)

From
Tom Lane
Date:
Oskari Saarenmaa <os@ohmu.fi> writes:
> On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
>> -#define DEF_ENC2NAME(name, codepage) { #name, PG_##name }
>> +/* The extra NUL-terminator will make sure a warning is raised if the
>> + * storage space for name is too small, otherwise when strlen(name) ==
>> + * sizeof(pg_enc2name.name) the NUL-terminator would be silently dropped.
>> + */
>> +#define DEF_ENC2NAME(name, codepage) { #name "\0", PG_##name }
>> 
>> - The above hunk is not related to the primary purpose of this patch.

> It sort-of is.  Without fixed size char-arrays it's not possible to move
> everything to .rodata, but fixed size char-arrays come with the drawback of
> silently dropping the NUL-terminator when strlen(str) == sizeof(array), by
> forcing a NUL-terminator in we always get a warning if it would've been
> dropped and the size of the array can then be increased.

AFAICT, this change and some similar ones are based on a false assumption.
It is *not* necessary to replace pointers by fixed-length arrays in order
to get things into .rodata.  For example, in chklocale.c where we already
had this coding:

struct encoding_match
{enum pg_enc pg_enc_code;const char *system_enc_name;
};

static const struct encoding_match encoding_match_list[] = {{PG_EUC_JP, "EUC-JP"},{PG_EUC_JP, "eucJP"},...

what I see in gcc -S (on x86_64 Linux) is:
.section    .rodata.align 32.type    encoding_match_list, @object.size    encoding_match_list, 1776
encoding_match_list:.long    1.zero    4.quad    .LC5.long    1.zero    4.quad    .LC6
....section    .rodata.str1.1
.LC5:.string    "EUC-JP"
.LC6:.string    "eucJP"
...

So it's all read-only data anyway, as can be confirmed with "size"
which shows that the .o file contains *no* data outside the text
segment.  There might be some platforms where this isn't the case,
but I don't think they're mainstream.

Converting the pointer arrangement to a fixed-length struct member
as per the patch has the following implications:

* We save the pointer, and possibly some alignment padding, at the cost
that now every string has to be padded to the maximal length.  This
might produce a net space savings if all the strings in the table
are of similar length, but it's hardly a guaranteed win.  Note also
that we no longer get to share storage with any other uses of the
same strings as plain literals.

* Possibly there's some savings in executable startup time as a
consequence of eliminating pointer-relocation work.  This would be
platform dependent, and in any case I've never heard any complaints
suggesting that this is a metric we need to care about at all.

* We now have a risk of possible silent breakage if any string reaches
the maximum length.  This is exacerbated by the temptation to not leave
a lot of daylight in the chosen maximum length, so as to minimize the
amount of bloat created.

The hunk Robert is complaining about is attempting to deal with that last
drawback; but it seems like a mighty ugly technique to me, and there are
other places where you've replaced pointers by fixed-length arrays without
adding any such protection (because it's notationally impractical).

TBH I don't find this to be an improvement, especially in cases where
it forces code changes beyond just const-ifying the data declarations.
I don't have a lot of confidence for instance that you found all the
places that were checking for a NULL pointer as an array terminator.
Having to change existing code greatly weakens the argument that this
is a correctness-improving patch, because now you have to factor in a
risk of actively breaking something.

So I think the fixed-length-array changes in this patch are a bad
idea; it's sufficient to make sure we've const-ified both the pointers
and the containing structs, as in the existing chklocale.c coding.
        regards, tom lane



Re: [PATCH] Make various variables read-only (const)

From
Tom Lane
Date:
I wrote:
> AFAICT, this change and some similar ones are based on a false assumption.
> It is *not* necessary to replace pointers by fixed-length arrays in order
> to get things into .rodata.

After further experimentation I find that this claim is true when
compiling "normally", but apparently not when using -fpic, at least
not on RHEL6 x86_64.  Even when const-ified, the tables in encnames.o
and wchar.o end up in the data segment (though the underlying strings
are not).  So if we want a meaningful shrinkage in the size of the data
segment in libpq.so, we'd have to do something similar to what Oskari
proposes.

However, I'm still against doing so; the other points I made before
still apply, and I think on balance fixed-length arrays are still a
bad idea.  It seems like a particularly bad idea to expose a limit
on the length of an encoding name as part of the ABI defined by
pg_wchar.h, as the submitted patch did.  I'm on board with making
changes like this where they can be argued to improve correctness and
maintainability, but surely moving to fixed-length arrays is the
opposite of that.
        regards, tom lane



Re: [PATCH] Make various variables read-only (const)

From
Tom Lane
Date:
Oskari Saarenmaa <os@ohmu.fi> writes:
> This allows the variables to be moved from .data to .rodata section which
> means that more data can be shared by processes and makes sure that nothing
> can accidentally overwrite the read-only definitions.  On a x86-64 Linux
> system this moves roughly 9 kilobytes of previously writable data to the
> read-only data segment in the backend and 4 kilobytes in libpq.

Committed, with the changes mentioned upthread and some other minor
editorialization.  I also adopted Wim Lewis' suggestion to not export
pg_encname_tbl[] at all anymore, since it's hard to see what the point
is of doing anything besides pg_char_to_encoding() with it.
        regards, tom lane