Thread: pg_verify_checksums and -fno-strict-aliasing

pg_verify_checksums and -fno-strict-aliasing

From
Michael Banck
Date:
Hi,

I noticed that pg_verify_checksums computes bogus checksums if I compile
it with '-O2 -Wall' but without -fno-strict-aliasing. Also I am getting
a compile warning then:

|src/bin/pg_verify_checksums$ make CFLAGS='-g -O2 -Wall'
[...]
|gcc -g -O2 -Wall -I../../../src/include 
|  -I/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/include
|  -D_GNU_SOURCE   -c -o pg_verify_checksums.o
|  /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums/pg_verify_checksums.c
|  -MMD -MP -MF .deps/pg_verify_checksums.Po
|/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums/pg_verify_checksums.c: 
|  In function ‘scan_file’:
|/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums/pg_verify_checksums.c:112:3: 
|  warning: dereferencing type-punned pointer will break strict-aliasing
|  rules [-Wstrict-aliasing]
|   if (PageIsNew(buf))
|   ^~
[...]
|src/bin/pg_verify_checksums$ ./pg_verify_checksums -D ../../test/regress/tmp_check/data
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/global/1233", block 0:
calculatedchecksum FC8A but expected F857
 
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/global/1233", block 1:
calculatedchecksum F340 but expected A68D
 
[...]
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block
5:calculated checksum 954D but expected D2ED
 
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block
6:calculated checksum 361 but expected E7F7
 
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block
7:calculated checksum 4C0D but expected 4B10
 
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block
8:calculated checksum 2B9A but expected 71E3
 
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block
9:calculated checksum 19CD but expected 541C
 
|Checksum scan completed
|Data checksum version: 1
|Files scanned:  932
|Blocks scanned: 2789
|Bad checksums:  2789

If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.

Is this something to worry about, or just pilot error cause I am not
using the same $CFLAGS as for the rest of the build? I originally
noticed this problem with my external fork of pg_verify_checksums.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: pg_verify_checksums and -fno-strict-aliasing

From
Fabien COELHO
Date:
> I noticed that pg_verify_checksums computes bogus checksums if I compile
> it with '-O2 -Wall' but without -fno-strict-aliasing. Also I am getting
> a compile warning then:
>
> [...]
>
> If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
>
> Is this something to worry about, or just pilot error cause I am not
> using the same $CFLAGS as for the rest of the build? I originally
> noticed this problem with my external fork of pg_verify_checksums.

It looks like the code is using aliasing where the standard says it should 
not, which breaks compiler optimization, and the added options tells the 
compiler to not assume that the code conforms to the standard...

As PostgreSQL source is expected to conform to some C standard (unsure 
which one right now, possibly c89 but maybe it is beginning to switch to 
c99, a young 19 years old standard), I'd suggest that the right fix is 
rather to actually remove the aliasing issue.

-- 
Fabien.


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
>> Is this something to worry about, or just pilot error cause I am not
>> using the same $CFLAGS as for the rest of the build? I originally
>> noticed this problem with my external fork of pg_verify_checksums.

> It looks like the code is using aliasing where the standard says it should 
> not, which breaks compiler optimization, and the added options tells the 
> compiler to not assume that the code conforms to the standard...

Actually, this code is just plain broken:

    char        buf[BLCKSZ];
    PageHeader    header = (PageHeader) buf;

There is no guarantee that a local char[] array is aligned on anything
stronger than a byte boundary, and if it isn't, word-sized accesses to
*header are going to fail on machines with strict alignment rules.
I suppose that that is really what Michael's compiler is moaning about.

I rather suspect that this hasn't been tested on anything but Intel
hardware, which is famously misalignment-tolerant.  The lack of any
apparent regression test infrastructure for it isn't leaving a warm
feeling about how much the buildfarm is testing it.

(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Alvaro Herrera
Date:
On 2018-Aug-30, Fabien COELHO wrote:

> As PostgreSQL source is expected to conform to some C standard (unsure which
> one right now, possibly c89 but maybe it is beginning to switch to c99, a
> young 19 years old standard), I'd suggest that the right fix is rather to
> actually remove the aliasing issue.

Yeah, type aliasing is quite a rampant issue in our code and I don't
think there's an easy fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Paquier
Date:
On Thu, Aug 30, 2018 at 10:39:26AM -0400, Tom Lane wrote:
> I rather suspect that this hasn't been tested on anything but Intel
> hardware, which is famously misalignment-tolerant.  The lack of any
> apparent regression test infrastructure for it isn't leaving a warm
> feeling about how much the buildfarm is testing it.
>
> (The right fix, of course, is to malloc the work buffer rather than
> put it on the stack.)

pg_upgrade/file.c is careful about that (5afcd2a), and has a comment on
the matter, as does pg_standby.c.

Now, grepping around for "BLCKSZ]", some garbage may have accumulated?
- entrySplitPage in ginentrypage.c
- rewind_copy_file_range in file.c
- _hash_alloc_buckets in hashpage.c
- pg_prewarm.c
- blinsert.c
- pg_waldump.c
- logtape.c
--
Michael

Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Magnus Hagander
Date:
On Thu, Aug 30, 2018 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
>> Is this something to worry about, or just pilot error cause I am not
>> using the same $CFLAGS as for the rest of the build? I originally
>> noticed this problem with my external fork of pg_verify_checksums.

> It looks like the code is using aliasing where the standard says it should
> not, which breaks compiler optimization, and the added options tells the
> compiler to not assume that the code conforms to the standard...

Actually, this code is just plain broken:

        char            buf[BLCKSZ];
        PageHeader      header = (PageHeader) buf;

There is no guarantee that a local char[] array is aligned on anything
stronger than a byte boundary, and if it isn't, word-sized accesses to
*header are going to fail on machines with strict alignment rules.
I suppose that that is really what Michael's compiler is moaning about.

I rather suspect that this hasn't been tested on anything but Intel
hardware, which is famously misalignment-tolerant.  The lack of any
apparent regression test infrastructure for it isn't leaving a warm
feeling about how much the buildfarm is testing it.

I know I certainly didn't test it on non-intel.

We did have that in the online verify checksum patch, but it came out as part of the revert of that part.

Given that we also recently found bugs in the actual hash index implementation that would've gotten caught by this, perhaps we should add a TAP test for this.

Should we make it a separate test in pg_verify_checksums, or should we piggyback on the pg_basebackup tests (which AFAICT is the only ones that create a cluster with checksums enabled at all, and thus is the only codepath that actually uses the backend checksum code at all, which I think is an even worse thing to have no tests of)


(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)

So if I get you  right, you're saying the attached patch should be all that's needed? 

--
Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Magnus Hagander
Date:
On Thu, Aug 30, 2018 at 9:35 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Aug 30, 2018 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
>> Is this something to worry about, or just pilot error cause I am not
>> using the same $CFLAGS as for the rest of the build? I originally
>> noticed this problem with my external fork of pg_verify_checksums.

> It looks like the code is using aliasing where the standard says it should
> not, which breaks compiler optimization, and the added options tells the
> compiler to not assume that the code conforms to the standard...

Actually, this code is just plain broken:

        char            buf[BLCKSZ];
        PageHeader      header = (PageHeader) buf;

There is no guarantee that a local char[] array is aligned on anything
stronger than a byte boundary, and if it isn't, word-sized accesses to
*header are going to fail on machines with strict alignment rules.
I suppose that that is really what Michael's compiler is moaning about.

I rather suspect that this hasn't been tested on anything but Intel
hardware, which is famously misalignment-tolerant.  The lack of any
apparent regression test infrastructure for it isn't leaving a warm
feeling about how much the buildfarm is testing it.

I know I certainly didn't test it on non-intel.

We did have that in the online verify checksum patch, but it came out as part of the revert of that part.

Given that we also recently found bugs in the actual hash index implementation that would've gotten caught by this, perhaps we should add a TAP test for this.

Should we make it a separate test in pg_verify_checksums, or should we piggyback on the pg_basebackup tests (which AFAICT is the only ones that create a cluster with checksums enabled at all, and thus is the only codepath that actually uses the backend checksum code at all, which I think is an even worse thing to have no tests of)

PFA some *very* basic tests for pg_verify_checksums, which should at least be enough to catch the kind of errors we had now in the tool itself.

Does not at this point try to solve the fact that the backend checksum code is not tested.


(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)

So if I get you  right, you're saying the attached patch should be all that's needed? 


--
Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Paquier
Date:
On Thu, Aug 30, 2018 at 09:35:33PM +0200, Magnus Hagander wrote:
> Should we make it a separate test in pg_verify_checksums, or should we
> piggyback on the pg_basebackup tests (which AFAICT is the only ones that
> create a cluster with checksums enabled at all, and thus is the only
> codepath that actually uses the backend checksum code at all, which I think
> is an even worse thing to have no tests of)

This should be a separate suite.  And FWIW, we only use pg_regress to
make sure that an already-initialized data folder has the correct,
secure authentication set.  So creating a node with checksums enabled is
just that:
$node->init(extra => ['--data-checksums']);

[... 20 minutes later ...]
Attached is a basic test suite ;)
--
Michael

Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Magnus Hagander
Date:
On Thu, Aug 30, 2018 at 10:02 PM, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Aug 30, 2018 at 09:35:33PM +0200, Magnus Hagander wrote:
> Should we make it a separate test in pg_verify_checksums, or should we
> piggyback on the pg_basebackup tests (which AFAICT is the only ones that
> create a cluster with checksums enabled at all, and thus is the only
> codepath that actually uses the backend checksum code at all, which I think
> is an even worse thing to have no tests of)

This should be a separate suite.  And FWIW, we only use pg_regress to
make sure that an already-initialized data folder has the correct,
secure authentication set.  So creating a node with checksums enabled is
just that:
$node->init(extra => ['--data-checksums']);

[... 20 minutes later ...]
Attached is a basic test suite ;)

Haha, nice timing :) 

I wonder if your tests that pg_control has picked things up belong more in the tests of initdb itself?

Do you think there is value in testing against a non-checksum cluster? I guess there's some point to it. I think testing actual corruption (like my version of the tests) is more valuable, but perhaps we should just do both?

--

Re: pg_verify_checksums and -fno-strict-aliasing

From
Andres Freund
Date:
On 2018-08-30 10:39:26 -0400, Tom Lane wrote:
>     char        buf[BLCKSZ];
>     PageHeader    header = (PageHeader) buf;

> (The right fix, of course, is to malloc the work buffer rather than
> put it on the stack.)

Or alternatively, for places where such allocations could be a problem
for performance, using a union can be used to force the required
alignment.  I'm fairly certain that just allocating is more than OK
here, to be clear.

Greetings,

Andres Freund


Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Banck
Date:
Hi,

Am Donnerstag, den 30.08.2018, 22:02 +0200 schrieb Magnus Hagander:
> PFA some *very* basic tests for pg_verify_checksums, which should at
> least be enough to catch the kind of errors we had now in the tool
> itself.

I proposed something similar for pg_basebackup back then and IIRC Peter
(rightfully) complained that using a system catalog for that wasn't so
great - also, are you sure the relfilenode will always be stable? Then
again, this is obviously a throw-away data directory so it should be no
general issue, just a matter of taste.

https://github.com/credativ/pg_checksums/blob/master/t/001_checksums.pl
has some similar tests at the end but creates a table for corruption,
adds some safeguards for block size and also uses  command_checks_all()
to parse the output. It's PGDG licensed so you're welcome to take some
or all of that.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Banck
Date:
Hi,

Am Donnerstag, den 30.08.2018, 21:35 +0200 schrieb Magnus Hagander:
> So if I get you  right, you're saying the attached patch should be all
> that's needed? 

I tried to do some similar changes but neither what you proposed nor
what I came up with actually fixes the checksum failures, though they
silence the warning at -Wall level.

Could well be I'm doing something wrong, so it would be cool if somebody
could reproduce this first. In principle, it should be enough to run
'make clean && make CLFAGS=-O2' in the src/bin/pg_verify_checksums
subdirectory in order to get a broken executable.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Michael Banck <michael.banck@credativ.de> writes:
> Could well be I'm doing something wrong, so it would be cool if somebody
> could reproduce this first. In principle, it should be enough to run
> 'make clean && make CLFAGS=-O2' in the src/bin/pg_verify_checksums
> subdirectory in order to get a broken executable.

I can reproduce it on my Fedora 28 machine (gcc 8.1.1) by compiling
pg_verify_checksums with all the normal flags except -fno-strict-aliasing.
I don't see any warning, not even with -Wstrict-aliasing=2 :-(, but the
generated code malfunctions as Michael describes.  As best I can tell,
the problem is in the inlined code from checksum_impl.h rather than
in pg_verify_checksums.c itself.  I think what is happening is gcc
is concluding that it can optimize away the code that temporarily
sets the page's checksum field to zero.

Although, as Alvaro mentioned upthread, there's basically no hope of
getting away from -fno-strict-aliasing for Postgres-at-large, I think it'd
be a good thing if we could get checksum_impl.h to not depend on it.  The
whole point of that header, according to its own self-description, is to
export the checksum calculation for use by external programs --- and we
can't really control what compiler flags will be used by those.  The fact
that Michael even ran into this problem seems to be an instance of that.

So, I've been fooling around trying to get it to work without
-fno-strict-aliasing, but with little luck so far.

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Andres Freund
Date:
Hi,

On 2018-08-30 17:19:28 -0400, Tom Lane wrote:
> So, I've been fooling around trying to get it to work without
> -fno-strict-aliasing, but with little luck so far.

The problem presumably is that pg_checksum_block() accesses the relevant
fields as an uint32, whereas pg_checksum_page() accesses it as a
PageHeader. That's an aliasing violation.  *One* cast from char* to
either type is fine, it's accessing under both those types that's
problematic.

One way to fix it would be to memcpy in/out the modified PageHeader, or
just do offset math and memcpy to that offset.

Greetings,

Andres Freund


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Aug 30, 2018 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (The right fix, of course, is to malloc the work buffer rather than
>> put it on the stack.)

> So if I get you  right, you're saying the attached patch should be all
> that's needed?

Well, that's some of what's needed.  I notice this code is also being
sloppy about whether block numbers are signed or unsigned, which means
it'd probably misbehave on relations exceeding 2^31 blocks.  I have
a patch in progress to clean all that up, though I'm still working
on the strict-aliasing part.

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Andres Freund
Date:
On 2018-08-30 14:46:06 -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-08-30 17:19:28 -0400, Tom Lane wrote:
> > So, I've been fooling around trying to get it to work without
> > -fno-strict-aliasing, but with little luck so far.
> 
> The problem presumably is that pg_checksum_block() accesses the relevant
> fields as an uint32, whereas pg_checksum_page() accesses it as a
> PageHeader. That's an aliasing violation.  *One* cast from char* to
> either type is fine, it's accessing under both those types that's
> problematic.
> 
> One way to fix it would be to memcpy in/out the modified PageHeader, or
> just do offset math and memcpy to that offset.

It took me a bit to reproduce the issue (due to sheer stupidity on my
part: no, changing the flags passed to gcc to link pg_verify_checksums
doesn't do the trick), but the above indeed fixes the issue for me.

The attached is just for demonstration that the approach works.

Greetings,

Andres Freund

Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-08-30 14:46:06 -0700, Andres Freund wrote:
>> One way to fix it would be to memcpy in/out the modified PageHeader, or
>> just do offset math and memcpy to that offset.

> It took me a bit to reproduce the issue (due to sheer stupidity on my
> part: no, changing the flags passed to gcc to link pg_verify_checksums
> doesn't do the trick), but the above indeed fixes the issue for me.

I suspect people will complain about the added cost of doing that.

I've been AFK all afternoon, but what I was intending to try next was
the union approach, specifically union'ing PageHeaderData with the uint32
array representation needed by pg_checksum_block().  That might also
end up giving us code less unreadable than this:

    uint32        (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;


BTW, not to mention the elephant in the room, but: is it *really* OK
that pg_checksum_page scribbles on the page buffer, even temporarily?
It's certainly quite horrid that there aren't large warnings about
that in the function's API comment.

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Andres Freund
Date:
On 2018-08-30 18:11:40 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-08-30 14:46:06 -0700, Andres Freund wrote:
> >> One way to fix it would be to memcpy in/out the modified PageHeader, or
> >> just do offset math and memcpy to that offset.
> 
> > It took me a bit to reproduce the issue (due to sheer stupidity on my
> > part: no, changing the flags passed to gcc to link pg_verify_checksums
> > doesn't do the trick), but the above indeed fixes the issue for me.
> 
> I suspect people will complain about the added cost of doing that.

I think the compiler will just optimize it away.  But if we're concerned
we could write it as

    memcpy(&save_checksum, page + offsetof(PageHeaderData, pd_checksum), sizeof(save_checksum));
    memset(page + offsetof(PageHeaderData, pd_checksum), 0, sizeof(save_checksum));

    checksum = pg_checksum_block(page, BLCKSZ);

    memcpy(page + offsetof(PageHeaderData, pd_checksum), &save_checksum, sizeof(save_checksum));

works, but still not exceedingly pretty :/.  The code generated looks
reasonable:

194        memcpy(&save_checksum, page + offsetof(PageHeaderData, pd_checksum), sizeof(save_checksum));
   0x00000000000035d0 <+0>:    push   %r12
   0x00000000000035d2 <+2>:    xor    %eax,%eax
   0x00000000000035d4 <+4>:    movzwl 0x8(%rdi),%r12d

195        memset(page + offsetof(PageHeaderData, pd_checksum), 0, sizeof(save_checksum));
   0x00000000000035d9 <+9>:    push   %rbp
   0x00000000000035da <+10>:    mov    %ax,0x8(%rdi)

(the pushes are just interspersed stuff, yay latency aware instruction
scheduling)


> I've been AFK all afternoon, but what I was intending to try next was
> the union approach, specifically union'ing PageHeaderData with the uint32
> array representation needed by pg_checksum_block().  That might also
> end up giving us code less unreadable than this:
> 
>     uint32        (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;

Hm.


> BTW, not to mention the elephant in the room, but: is it *really* OK
> that pg_checksum_page scribbles on the page buffer, even temporarily?
> It's certainly quite horrid that there aren't large warnings about
> that in the function's API comment.

It certainly should be warned about.  Practically I don't think it's a
problem, because we pretty much always operate on a copy of the page
when writing out, as otherwise concurrently set hint bits would be
troublesome.

Greetings,

Andres Freund


Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Paquier
Date:
On Thu, Aug 30, 2018 at 10:07:38PM +0200, Magnus Hagander wrote:
> I wonder if your tests that pg_control has picked things up belong more in
> the tests of initdb itself?

For the case where checksums are disabled, moving there the check on
control data makes sense.

> Do you think there is value in testing against a non-checksum cluster? I
> guess there's some point to it. I think testing actual corruption (like my
> version of the tests) is more valuable, but perhaps we should just do both?

Yeah, let's do stuff on a single cluster which has them only enabled,
as initializing a node is one of the most costly operations in TAP
tests.  Checking that the server is stopped is definitely a must in my
opinion, and your addition about emulating corrupted blocks is a good
idea.  I would personally vote for keeping a control file check within
the tests of pg_verify_checksums as that's cheap.
--
Michael

Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-08-30 18:11:40 -0400, Tom Lane wrote:
>> I suspect people will complain about the added cost of doing that.

> I think the compiler will just optimize it away.

Maybe.  In any case, the attached version avoids any need for memcpy
and is, I think, cleaner code anyhow.  It fixes the problem for me
with Fedora's gcc, though I'm not sure that it'd be enough to get rid
of the warning Michael sees (I wonder what compiler he's using).

>> BTW, not to mention the elephant in the room, but: is it *really* OK
>> that pg_checksum_page scribbles on the page buffer, even temporarily?
>> It's certainly quite horrid that there aren't large warnings about
>> that in the function's API comment.

> It certainly should be warned about.  Practically I don't think it's a
> problem, because we pretty much always operate on a copy of the page
> when writing out, as otherwise concurrently set hint bits would be
> troublesome.

The write end of things is not a problem IMO, since presumably the caller
would be about to overwrite the checksum field anyway.  The issue is for
reading and/or verifying, where it's much less obvious that clobbering
the page image is safe.

            regards, tom lane

diff --git a/src/include/storage/checksum_impl.h b/src/include/storage/checksum_impl.h
index 64d7622..a49d27f 100644
--- a/src/include/storage/checksum_impl.h
+++ b/src/include/storage/checksum_impl.h
@@ -107,6 +107,13 @@
 /* prime multiplier of FNV-1a hash */
 #define FNV_PRIME 16777619

+/* Use a union so that this code is valid under strict aliasing */
+typedef union
+{
+    PageHeaderData phdr;
+    uint32        data[BLCKSZ / (sizeof(uint32) * N_SUMS)][N_SUMS];
+} PGChecksummablePage;
+
 /*
  * Base offsets to initialize each of the parallel FNV hashes into a
  * different initial state.
@@ -132,28 +139,27 @@ do { \
 } while (0)

 /*
- * Block checksum algorithm.  The data argument must be aligned on a 4-byte
- * boundary.
+ * Block checksum algorithm.  The page must be adequately aligned
+ * (at least on 4-byte boundary).
  */
 static uint32
-pg_checksum_block(char *data, uint32 size)
+pg_checksum_block(const PGChecksummablePage *page)
 {
     uint32        sums[N_SUMS];
-    uint32        (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;
     uint32        result = 0;
     uint32        i,
                 j;

     /* ensure that the size is compatible with the algorithm */
-    Assert((size % (sizeof(uint32) * N_SUMS)) == 0);
+    Assert(sizeof(PGChecksummablePage) == BLCKSZ);

     /* initialize partial checksums to their corresponding offsets */
     memcpy(sums, checksumBaseOffsets, sizeof(checksumBaseOffsets));

     /* main checksum calculation */
-    for (i = 0; i < size / sizeof(uint32) / N_SUMS; i++)
+    for (i = 0; i < (uint32) (BLCKSZ / (sizeof(uint32) * N_SUMS)); i++)
         for (j = 0; j < N_SUMS; j++)
-            CHECKSUM_COMP(sums[j], dataArr[i][j]);
+            CHECKSUM_COMP(sums[j], page->data[i][j]);

     /* finally add in two rounds of zeroes for additional mixing */
     for (i = 0; i < 2; i++)
@@ -168,8 +174,10 @@ pg_checksum_block(char *data, uint32 size)
 }

 /*
- * Compute the checksum for a Postgres page.  The page must be aligned on a
- * 4-byte boundary.
+ * Compute the checksum for a Postgres page.
+ *
+ * The page must be adequately aligned (at least on a 4-byte boundary).
+ * Beware also that the checksum field of the page is transiently zeroed.
  *
  * The checksum includes the block number (to detect the case where a page is
  * somehow moved to a different location), the page header (excluding the
@@ -178,12 +186,12 @@ pg_checksum_block(char *data, uint32 size)
 uint16
 pg_checksum_page(char *page, BlockNumber blkno)
 {
-    PageHeader    phdr = (PageHeader) page;
+    PGChecksummablePage *cpage = (PGChecksummablePage *) page;
     uint16        save_checksum;
     uint32        checksum;

     /* We only calculate the checksum for properly-initialized pages */
-    Assert(!PageIsNew(page));
+    Assert(!PageIsNew(&cpage->phdr));

     /*
      * Save pd_checksum and temporarily set it to zero, so that the checksum
@@ -191,10 +199,10 @@ pg_checksum_page(char *page, BlockNumber blkno)
      * Restore it after, because actually updating the checksum is NOT part of
      * the API of this function.
      */
-    save_checksum = phdr->pd_checksum;
-    phdr->pd_checksum = 0;
-    checksum = pg_checksum_block(page, BLCKSZ);
-    phdr->pd_checksum = save_checksum;
+    save_checksum = cpage->phdr.pd_checksum;
+    cpage->phdr.pd_checksum = 0;
+    checksum = pg_checksum_block(cpage);
+    cpage->phdr.pd_checksum = save_checksum;

     /* Mix in the block number to detect transposed pages */
     checksum ^= blkno;

Re: pg_verify_checksums and -fno-strict-aliasing

From
Andres Freund
Date:
On 2018-08-30 19:02:15 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It certainly should be warned about.  Practically I don't think it's a
> > problem, because we pretty much always operate on a copy of the page
> > when writing out, as otherwise concurrently set hint bits would be
> > troublesome.
> 
> The write end of things is not a problem IMO, since presumably the caller
> would be about to overwrite the checksum field anyway.  The issue is for
> reading and/or verifying, where it's much less obvious that clobbering
> the page image is safe.

With "reading" you mean putting the page into the buffercache? If so,
that should be ok, the page isn't yet accessible (BM_VALID isn't set).

I don't think it's possible to do verification from the page in s_b,
because we don't keep the checksum current, so there should never be
calls to do so.

But we probably should still add a comment making clear that the
function modifies the buffer temporarily.

Greetings,

Andres Freund


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Aug 30, 2018 at 10:39:26AM -0400, Tom Lane wrote:
>> (The right fix, of course, is to malloc the work buffer rather than
>> put it on the stack.)

> pg_upgrade/file.c is careful about that (5afcd2a), and has a comment on
> the matter, as does pg_standby.c.

> Now, grepping around for "BLCKSZ]", some garbage may have accumulated?

Ah, that's a cute idea for finding trouble spots.

> - entrySplitPage in ginentrypage.c
> - rewind_copy_file_range in file.c
> - _hash_alloc_buckets in hashpage.c
> - pg_prewarm.c
> - blinsert.c
> - pg_waldump.c
> - logtape.c

Some of these are safe, I think, because the buffers are only used as
targets for read() and write().  But some are definitely broken.
My own list of files that seem to have issues is

blinsert.c
generic_xlog.c
ginentrypage.c
hashpage.c
pg_verify_checksums.c
pg_waldump.c
xloginsert.c

The fact that some of these are pretty old and we've not noticed is
not good.  It suggests that we don't currently have any compilers in the
buildfarm that under-align char[] arrays on the stack, which seems like
a gotcha waiting to bite us.  I wonder if there is any way to persuade
some compiler on a non-Intel box to do that.

Anyway, I'll work on a patch for that, unless you were on it already?

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Paquier
Date:
On Thu, Aug 30, 2018 at 07:37:37PM -0400, Tom Lane wrote:
> Some of these are safe, I think, because the buffers are only used as
> targets for read() and write().  But some are definitely broken.

Yes, I have not spent more than a couple of minutes on this issue.  I
noticed some of them easily though.

> My own list of files that seem to have issues is
>
> blinsert.c
> generic_xlog.c
> ginentrypage.c
> hashpage.c
> pg_verify_checksums.c
> pg_waldump.c
> xloginsert.c
>
> The fact that some of these are pretty old and we've not noticed is
> not good.  It suggests that we don't currently have any compilers in the
> buildfarm that under-align char[] arrays on the stack, which seems like
> a gotcha waiting to bite us.  I wonder if there is any way to persuade
> some compiler on a non-Intel box to do that.

Agreed, that's annoying.

> Anyway, I'll work on a patch for that, unless you were on it already?

I have begun working on that and am less than halfway through it as I
needed a fresh problem, however I am not sure I would be able to finish
it today, perhaps tomorrow...  If you have time and want to press on as
11 would release soon, of course feel free to wrap up more quickly than
I can.
--
Michael

Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Banck
Date:
Hi,

Am Donnerstag, den 30.08.2018, 19:02 -0400 schrieb Tom Lane:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-08-30 18:11:40 -0400, Tom Lane wrote:
> > > I suspect people will complain about the added cost of doing that.
> > I think the compiler will just optimize it away.
> 
> Maybe.  In any case, the attached version avoids any need for memcpy
> and is, I think, cleaner code anyhow.  It fixes the problem for me
> with Fedora's gcc, though I'm not sure that it'd be enough to get rid
> of the warning Michael sees (I wonder what compiler he's using).

This fixes the bogus checksums, thanks!

I am on Debian stable, i.e. 'gcc version 6.3.0 20170516 (Debian 6.3.0-
18+deb9u1)'. 

The warning shows up only if I add -Wall *and* -O2 to CFLAGS, I think I
did not mention that explictly before.

As it is in pg_verify_checksums.c I need Magnus' patch as well to make
it go away. But the warning is independent of the runtime issue so less
of an issue (but probably worth fixing).


Michael 

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: pg_verify_checksums and -fno-strict-aliasing

From
Peter Eisentraut
Date:
On 31/08/2018 01:37, Tom Lane wrote:
> The fact that some of these are pretty old and we've not noticed is
> not good.  It suggests that we don't currently have any compilers in the
> buildfarm that under-align char[] arrays on the stack, which seems like
> a gotcha waiting to bite us.  I wonder if there is any way to persuade
> some compiler on a non-Intel box to do that.

-Wcast-align=strict (requires gcc-8) warns about the issue in
pg_verify_checksums.c, but also about a handful^W^W15211 other things,
so it wouldn't be immediately useful.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Aug 30, 2018 at 07:37:37PM -0400, Tom Lane wrote:
>> Anyway, I'll work on a patch for that, unless you were on it already?

> I have begun working on that and am less than halfway through it as I
> needed a fresh problem, however I am not sure I would be able to finish
> it today, perhaps tomorrow...  If you have time and want to press on as
> 11 would release soon, of course feel free to wrap up more quickly than
> I can.

Some of these places might be performance-critical enough that adding
a palloc/pfree cycle would not be nice.  What I was considering doing
was inventing something like

typedef union PGAlignedBuffer
{
    char    data[BLCKSZ];
    double    force_align;
} PGAlignedBuffer;

and replacing plain char[] variables with that as appropriate.
We might need one for XLOG_BLCKSZ too.

Since this'd be used in frontend as well as backend, it'd likely
have to end up in c.h, which is slightly annoying but not really
a big deal as long as we pick a nonconflicting type name.

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Michael Banck <michael.banck@credativ.de> writes:
> Am Donnerstag, den 30.08.2018, 19:02 -0400 schrieb Tom Lane:
>> Maybe.  In any case, the attached version avoids any need for memcpy
>> and is, I think, cleaner code anyhow.  It fixes the problem for me
>> with Fedora's gcc, though I'm not sure that it'd be enough to get rid
>> of the warning Michael sees (I wonder what compiler he's using).

> This fixes the bogus checksums, thanks!
> I am on Debian stable, i.e. 'gcc version 6.3.0 20170516 (Debian 6.3.0-
> 18+deb9u1)'.

Ah-hah.  I still can't replicate any warning with gcc 8.1.1, but
I do have a Raspbian installation at hand, with
gcc version 6.3.0 20170516 (Raspbian 6.3.0-18+rpi1+deb9u1)
and on that I get

$ gcc  -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fwrapv -fexcess-precision=standard -g -O2 -I../../../src/include
-D_GNU_SOURCE  -c -o pg_verify_checksums.o pg_verify_checksums.c 
pg_verify_checksums.c: In function 'scan_file':
pg_verify_checksums.c:112:3: warning: dereferencing type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
   if (PageIsNew(buf))
   ^~

Adding -Wstrict-aliasing=2 makes it slightly more verbose:

$ gcc -Wstrict-aliasing=2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla
-Wendif-labels-Wmissing-format-attribute -Wformat-security -fwrapv -fexcess-precision=standard -g -O2
-I../../../src/include -D_GNU_SOURCE   -c -o pg_verify_checksums.o pg_verify_checksums.c 
pg_verify_checksums.c: In function 'scan_file':
pg_verify_checksums.c:82:2: warning: dereferencing type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
  PageHeader header = (PageHeader) buf;
  ^~~~~~~~~~
pg_verify_checksums.c:112:3: warning: dereferencing type-punned pointer will break strict-aliasing rules
[-Wstrict-aliasing]
   if (PageIsNew(buf))
   ^~

Compiling this way also causes pg_verify_checksums to compute wrong
checksums.  Installing the patched version of checksum_impl.h fixes
that, but doesn't change the warnings, meaning they have absolutely
nothing to do with the actual problem :-(

So, even aside from the difficulty of getting rid of aliasing-unsafe
code in Postgres, the state of the compiler warning technology is
still years short of where we could sanely consider dispensing with
-fno-strict-aliasing in general.  Still, as I said before, it'd be
a good idea for checksum_impl.h to not depend on that.  I'll go
ahead and push the fix.

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
I wrote:
> Some of these places might be performance-critical enough that adding
> a palloc/pfree cycle would not be nice.  What I was considering doing
> was inventing something like
>
> typedef union PGAlignedBuffer
> {
>     char    data[BLCKSZ];
>     double    force_align;
> } PGAlignedBuffer;

Here's a proposed patch using that approach.

Although some of the places that were using "char buf[BLCKSZ]" variables
weren't doing anything that really requires better alignment, I made
almost all of them use PGAlignedBuffer variables anyway, on the grounds
that you typically get better memcpy speed for aligned than unaligned
transfers.

I also fixed a few places that were using the palloc solution, and
one that was actually doing hand-alignment of the pointer (ugh).
I didn't try to be thorough about getting rid of such pallocs,
just fix places that seemed likely to be performance-critical.

This needs to be back-patched as relevant, of course.

            regards, tom lane

diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index 4afdea7..5ae1dda 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -36,7 +36,7 @@ typedef struct
     int64        indtuples;        /* total number of tuples indexed */
     MemoryContext tmpCtx;        /* temporary memory context reset after each
                                  * tuple */
-    char        data[BLCKSZ];    /* cached page */
+    PGAlignedBuffer data;        /* cached page */
     int            count;            /* number of tuples in cached page */
 } BloomBuildState;

@@ -52,7 +52,7 @@ flushCachedPage(Relation index, BloomBuildState *buildstate)

     state = GenericXLogStart(index);
     page = GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);
-    memcpy(page, buildstate->data, BLCKSZ);
+    memcpy(page, buildstate->data.data, BLCKSZ);
     GenericXLogFinish(state);
     UnlockReleaseBuffer(buffer);
 }
@@ -63,8 +63,8 @@ flushCachedPage(Relation index, BloomBuildState *buildstate)
 static void
 initCachedPage(BloomBuildState *buildstate)
 {
-    memset(buildstate->data, 0, BLCKSZ);
-    BloomInitPage(buildstate->data, 0);
+    memset(buildstate->data.data, 0, BLCKSZ);
+    BloomInitPage(buildstate->data.data, 0);
     buildstate->count = 0;
 }

@@ -84,7 +84,7 @@ bloomBuildCallback(Relation index, HeapTuple htup, Datum *values,
     itup = BloomFormTuple(&buildstate->blstate, &htup->t_self, values, isnull);

     /* Try to add next item to cached page */
-    if (BloomPageAddItem(&buildstate->blstate, buildstate->data, itup))
+    if (BloomPageAddItem(&buildstate->blstate, buildstate->data.data, itup))
     {
         /* Next item was added successfully */
         buildstate->count++;
@@ -98,7 +98,7 @@ bloomBuildCallback(Relation index, HeapTuple htup, Datum *values,

         initCachedPage(buildstate);

-        if (!BloomPageAddItem(&buildstate->blstate, buildstate->data, itup))
+        if (!BloomPageAddItem(&buildstate->blstate, buildstate->data.data, itup))
         {
             /* We shouldn't be here since we're inserting to the empty page */
             elog(ERROR, "could not add new bloom tuple to empty page");
diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 3cbb7c2..1a00c59 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -36,7 +36,7 @@ typedef enum
     PREWARM_BUFFER
 } PrewarmType;

-static char blockbuffer[BLCKSZ];
+static PGAlignedBuffer blockbuffer;

 /*
  * pg_prewarm(regclass, mode text, fork text,
@@ -178,7 +178,7 @@ pg_prewarm(PG_FUNCTION_ARGS)
         for (block = first_block; block <= last_block; ++block)
         {
             CHECK_FOR_INTERRUPTS();
-            smgrread(rel->rd_smgr, forkNumber, block, blockbuffer);
+            smgrread(rel->rd_smgr, forkNumber, block, blockbuffer.data);
             ++blocks_done;
         }
     }
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 8107697..ef5c367 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -616,7 +616,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
     Page        lpage = PageGetTempPageCopy(BufferGetPage(origbuf));
     Page        rpage = PageGetTempPageCopy(BufferGetPage(origbuf));
     Size        pageSize = PageGetPageSize(lpage);
-    char        tupstore[2 * BLCKSZ];
+    PGAlignedBuffer tupstore[2];    /* could need 2 pages' worth of tuples */

     entryPreparePage(btree, lpage, off, insertData, updateblkno);

@@ -625,7 +625,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
      * one after another in a temporary workspace.
      */
     maxoff = PageGetMaxOffsetNumber(lpage);
-    ptr = tupstore;
+    ptr = tupstore[0].data;
     for (i = FirstOffsetNumber; i <= maxoff; i++)
     {
         if (i == off)
@@ -658,7 +658,7 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
     GinInitPage(rpage, GinPageGetOpaque(lpage)->flags, pageSize);
     GinInitPage(lpage, GinPageGetOpaque(rpage)->flags, pageSize);

-    ptr = tupstore;
+    ptr = tupstore[0].data;
     maxoff++;
     lsize = 0;

diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index e32807e..915b9ca 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -64,18 +64,15 @@ writeListPage(Relation index, Buffer buffer,
                 size = 0;
     OffsetNumber l,
                 off;
-    char       *workspace;
+    PGAlignedBuffer workspace;
     char       *ptr;

-    /* workspace could be a local array; we use palloc for alignment */
-    workspace = palloc(BLCKSZ);
-
     START_CRIT_SECTION();

     GinInitBuffer(buffer, GIN_LIST);

     off = FirstOffsetNumber;
-    ptr = workspace;
+    ptr = workspace.data;

     for (i = 0; i < ntuples; i++)
     {
@@ -127,7 +124,7 @@ writeListPage(Relation index, Buffer buffer,
         XLogRegisterData((char *) &data, sizeof(ginxlogInsertListPage));

         XLogRegisterBuffer(0, buffer, REGBUF_WILL_INIT);
-        XLogRegisterBufData(0, workspace, size);
+        XLogRegisterBufData(0, workspace.data, size);

         recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT_LISTPAGE);
         PageSetLSN(page, recptr);
@@ -140,8 +137,6 @@ writeListPage(Relation index, Buffer buffer,

     END_CRIT_SECTION();

-    pfree(workspace);
-
     return freesize;
 }

diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 3ec29a5..27d86b9 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1000,7 +1000,7 @@ static bool
 _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
 {
     BlockNumber lastblock;
-    char        zerobuf[BLCKSZ];
+    PGAlignedBuffer zerobuf;
     Page        page;
     HashPageOpaque ovflopaque;

@@ -1013,7 +1013,7 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
     if (lastblock < firstblock || lastblock == InvalidBlockNumber)
         return false;

-    page = (Page) zerobuf;
+    page = (Page) zerobuf.data;

     /*
      * Initialize the page.  Just zeroing the page won't work; see
@@ -1034,11 +1034,11 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
         log_newpage(&rel->rd_node,
                     MAIN_FORKNUM,
                     lastblock,
-                    zerobuf,
+                    zerobuf.data,
                     true);

     RelationOpenSmgr(rel);
-    smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf, false);
+    smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf.data, false);

     return true;
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b8bfe23..27e8b01 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2709,7 +2709,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
     HeapTuple  *heaptuples;
     int            i;
     int            ndone;
-    char       *scratch = NULL;
+    PGAlignedBuffer scratch;
     Page        page;
     bool        needwal;
     Size        saveFreeSpace;
@@ -2727,14 +2727,6 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                                             xid, cid, options);

     /*
-     * Allocate some memory to use for constructing the WAL record. Using
-     * palloc() within a critical section is not safe, so we allocate this
-     * beforehand.
-     */
-    if (needwal)
-        scratch = palloc(BLCKSZ);
-
-    /*
      * We're about to do the actual inserts -- but check for conflict first,
      * to minimize the possibility of having to roll back work we've just
      * done.
@@ -2826,7 +2818,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
             uint8        info = XLOG_HEAP2_MULTI_INSERT;
             char       *tupledata;
             int            totaldatalen;
-            char       *scratchptr = scratch;
+            char       *scratchptr = scratch.data;
             bool        init;
             int            bufflags = 0;

@@ -2885,7 +2877,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                 scratchptr += datalen;
             }
             totaldatalen = scratchptr - tupledata;
-            Assert((scratchptr - scratch) < BLCKSZ);
+            Assert((scratchptr - scratch.data) < BLCKSZ);

             if (need_tuple_data)
                 xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
@@ -2912,7 +2904,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                 bufflags |= REGBUF_KEEP_DATA;

             XLogBeginInsert();
-            XLogRegisterData((char *) xlrec, tupledata - scratch);
+            XLogRegisterData((char *) xlrec, tupledata - scratch.data);
             XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);

             XLogRegisterBufData(0, tupledata, totaldatalen);
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index ce02354..e421f29 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -61,14 +61,11 @@ typedef struct
 /* State of generic xlog record construction */
 struct GenericXLogState
 {
-    /*
-     * page's images. Should be first in this struct to have MAXALIGN'ed
-     * images addresses, because some code working with pages directly aligns
-     * addresses, not offsets from beginning of page
-     */
-    char        images[MAX_GENERIC_XLOG_PAGES * BLCKSZ];
+    /* Info about each page, see above */
     PageData    pages[MAX_GENERIC_XLOG_PAGES];
     bool        isLogged;
+    /* Page images (properly aligned) */
+    PGAlignedBuffer images[MAX_GENERIC_XLOG_PAGES];
 };

 static void writeFragment(PageData *pageData, OffsetNumber offset,
@@ -251,12 +248,12 @@ computeDelta(PageData *pageData, Page curpage, Page targetpage)
 #ifdef WAL_DEBUG
     if (XLOG_DEBUG)
     {
-        char        tmp[BLCKSZ];
+        PGAlignedBuffer tmp;

-        memcpy(tmp, curpage, BLCKSZ);
-        applyPageRedo(tmp, pageData->delta, pageData->deltaLen);
-        if (memcmp(tmp, targetpage, targetLower) != 0 ||
-            memcmp(tmp + targetUpper, targetpage + targetUpper,
+        memcpy(tmp.data, curpage, BLCKSZ);
+        applyPageRedo(tmp.data, pageData->delta, pageData->deltaLen);
+        if (memcmp(tmp.data, targetpage, targetLower) != 0 ||
+            memcmp(tmp.data + targetUpper, targetpage + targetUpper,
                    BLCKSZ - targetUpper) != 0)
             elog(ERROR, "result of generic xlog apply does not match");
     }
@@ -277,7 +274,7 @@ GenericXLogStart(Relation relation)

     for (i = 0; i < MAX_GENERIC_XLOG_PAGES; i++)
     {
-        state->pages[i].image = state->images + BLCKSZ * i;
+        state->pages[i].image = state->images[i].data;
         state->pages[i].buffer = InvalidBuffer;
     }

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 65db2e4..29d792f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3210,8 +3210,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 {
     char        path[MAXPGPATH];
     char        tmppath[MAXPGPATH];
-    char        zbuffer_raw[XLOG_BLCKSZ + MAXIMUM_ALIGNOF];
-    char       *zbuffer;
+    PGAlignedXLogBuffer zbuffer;
     XLogSegNo    installed_segno;
     XLogSegNo    max_segno;
     int            fd;
@@ -3263,17 +3262,13 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
      * fsync below) that all the indirect blocks are down on disk.  Therefore,
      * fdatasync(2) or O_DSYNC will be sufficient to sync future writes to the
      * log file.
-     *
-     * Note: ensure the buffer is reasonably well-aligned; this may save a few
-     * cycles transferring data to the kernel.
      */
-    zbuffer = (char *) MAXALIGN(zbuffer_raw);
-    memset(zbuffer, 0, XLOG_BLCKSZ);
+    memset(zbuffer.data, 0, XLOG_BLCKSZ);
     for (nbytes = 0; nbytes < wal_segment_size; nbytes += XLOG_BLCKSZ)
     {
         errno = 0;
         pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
-        if ((int) write(fd, zbuffer, XLOG_BLCKSZ) != (int) XLOG_BLCKSZ)
+        if ((int) write(fd, zbuffer.data, XLOG_BLCKSZ) != (int) XLOG_BLCKSZ)
         {
             int            save_errno = errno;

@@ -3380,7 +3375,7 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
 {
     char        path[MAXPGPATH];
     char        tmppath[MAXPGPATH];
-    char        buffer[XLOG_BLCKSZ];
+    PGAlignedXLogBuffer buffer;
     int            srcfd;
     int            fd;
     int            nbytes;
@@ -3423,7 +3418,7 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
          * zeros.
          */
         if (nread < sizeof(buffer))
-            memset(buffer, 0, sizeof(buffer));
+            memset(buffer.data, 0, sizeof(buffer));

         if (nread > 0)
         {
@@ -3432,7 +3427,7 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
             if (nread > sizeof(buffer))
                 nread = sizeof(buffer);
             pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_READ);
-            r = read(srcfd, buffer, nread);
+            r = read(srcfd, buffer.data, nread);
             if (r != nread)
             {
                 if (r < 0)
@@ -3450,7 +3445,7 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
         }
         errno = 0;
         pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_WRITE);
-        if ((int) write(fd, buffer, sizeof(buffer)) != (int) sizeof(buffer))
+        if ((int) write(fd, buffer.data, sizeof(buffer)) != (int) sizeof(buffer))
         {
             int            save_errno = errno;

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 5bea073..c819204 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -809,12 +809,12 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
     int32        len;
     int32        extra_bytes = 0;
     char       *source;
-    char        tmp[BLCKSZ];
+    PGAlignedBuffer tmp;

     if (hole_length != 0)
     {
         /* must skip the hole */
-        source = tmp;
+        source = tmp.data;
         memcpy(source, page, hole_offset);
         memcpy(source + hole_offset,
                page + (hole_offset + hole_length),
@@ -917,7 +917,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
     if (lsn <= RedoRecPtr)
     {
         int            flags;
-        char        copied_buffer[BLCKSZ];
+        PGAlignedBuffer copied_buffer;
         char       *origdata = (char *) BufferGetBlock(buffer);
         RelFileNode rnode;
         ForkNumber    forkno;
@@ -935,11 +935,11 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
             uint16        lower = ((PageHeader) page)->pd_lower;
             uint16        upper = ((PageHeader) page)->pd_upper;

-            memcpy(copied_buffer, origdata, lower);
-            memcpy(copied_buffer + upper, origdata + upper, BLCKSZ - upper);
+            memcpy(copied_buffer.data, origdata, lower);
+            memcpy(copied_buffer.data + upper, origdata + upper, BLCKSZ - upper);
         }
         else
-            memcpy(copied_buffer, origdata, BLCKSZ);
+            memcpy(copied_buffer.data, origdata, BLCKSZ);

         XLogBeginInsert();

@@ -948,7 +948,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
             flags |= REGBUF_STANDARD;

         BufferGetTag(buffer, &rnode, &forkno, &blkno);
-        XLogRegisterBlock(0, &rnode, forkno, blkno, copied_buffer, flags);
+        XLogRegisterBlock(0, &rnode, forkno, blkno, copied_buffer.data, flags);

         recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI_FOR_HINT);
     }
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 4c633c6..d5daafd 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1412,7 +1412,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 {
     DecodedBkpBlock *bkpb;
     char       *ptr;
-    char        tmp[BLCKSZ];
+    PGAlignedBuffer tmp;

     if (!record->blocks[block_id].in_use)
         return false;
@@ -1425,7 +1425,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
     if (bkpb->bimg_info & BKPIMAGE_IS_COMPRESSED)
     {
         /* If a backup block image is compressed, decompress it */
-        if (pglz_decompress(ptr, bkpb->bimg_len, tmp,
+        if (pglz_decompress(ptr, bkpb->bimg_len, tmp.data,
                             BLCKSZ - bkpb->hole_length) < 0)
         {
             report_invalid_record(record, "invalid compressed image at %X/%X, block %d",
@@ -1434,7 +1434,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
                                   block_id);
             return false;
         }
-        ptr = tmp;
+        ptr = tmp.data;
     }

     /* generate page, taking into account hole if necessary */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 48743db..4d60787 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11276,21 +11276,14 @@ static void
 copy_relation_data(SMgrRelation src, SMgrRelation dst,
                    ForkNumber forkNum, char relpersistence)
 {
-    char       *buf;
+    PGAlignedBuffer buf;
     Page        page;
     bool        use_wal;
     bool        copying_initfork;
     BlockNumber nblocks;
     BlockNumber blkno;

-    /*
-     * palloc the buffer so that it's MAXALIGN'd.  If it were just a local
-     * char[] array, the compiler might align it on any byte boundary, which
-     * can seriously hurt transfer speed to and from the kernel; not to
-     * mention possibly making log_newpage's accesses to the page header fail.
-     */
-    buf = (char *) palloc(BLCKSZ);
-    page = (Page) buf;
+    page = (Page) buf.data;

     /*
      * The init fork for an unlogged relation in many respects has to be
@@ -11314,7 +11307,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
         /* If we got a cancel signal during the copy of the data, quit */
         CHECK_FOR_INTERRUPTS();

-        smgrread(src, forkNum, blkno, buf);
+        smgrread(src, forkNum, blkno, buf.data);

         if (!PageIsVerified(page, blkno))
             ereport(ERROR,
@@ -11340,11 +11333,9 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
          * rel, because there's no need for smgr to schedule an fsync for this
          * write; we'll do it ourselves below.
          */
-        smgrextend(dst, forkNum, blkno, buf, true);
+        smgrextend(dst, forkNum, blkno, buf.data, true);
     }

-    pfree(buf);
-
     /*
      * If the rel is WAL-logged, must fsync before commit.  We use heap_sync
      * to ensure that the toast table gets fsync'd too.  (For a temp or
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index c83ff3b..8005f11 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -496,11 +496,11 @@ SendTimeLineHistory(TimeLineHistoryCmd *cmd)
     bytesleft = histfilelen;
     while (bytesleft > 0)
     {
-        char        rbuf[BLCKSZ];
+        PGAlignedBuffer rbuf;
         int            nread;

         pgstat_report_wait_start(WAIT_EVENT_WALSENDER_TIMELINE_HISTORY_READ);
-        nread = read(fd, rbuf, sizeof(rbuf));
+        nread = read(fd, rbuf.data, sizeof(rbuf));
         pgstat_report_wait_end();
         if (nread < 0)
             ereport(ERROR,
@@ -513,7 +513,7 @@ SendTimeLineHistory(TimeLineHistoryCmd *cmd)
                      errmsg("could not read file \"%s\": read %d of %zu",
                             path, nread, (Size) bytesleft)));

-        pq_sendbytes(&buf, rbuf, nread);
+        pq_sendbytes(&buf, rbuf.data, nread);
         bytesleft -= nread;
     }
     CloseTransientFile(fd);
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index efbede7..4a52284 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -96,7 +96,7 @@ struct BufFile
     off_t        curOffset;        /* offset part of current pos */
     int            pos;            /* next read/write position in buffer */
     int            nbytes;            /* total # of valid bytes in buffer */
-    char        buffer[BLCKSZ];
+    PGAlignedBuffer buffer;
 };

 static BufFile *makeBufFileCommon(int nfiles);
@@ -437,7 +437,7 @@ BufFileLoadBuffer(BufFile *file)
      * Read whatever we can get, up to a full bufferload.
      */
     file->nbytes = FileRead(thisfile,
-                            file->buffer,
+                            file->buffer.data,
                             sizeof(file->buffer),
                             WAIT_EVENT_BUFFILE_READ);
     if (file->nbytes < 0)
@@ -502,7 +502,7 @@ BufFileDumpBuffer(BufFile *file)
             file->offsets[file->curFile] = file->curOffset;
         }
         bytestowrite = FileWrite(thisfile,
-                                 file->buffer + wpos,
+                                 file->buffer.data + wpos,
                                  bytestowrite,
                                  WAIT_EVENT_BUFFILE_WRITE);
         if (bytestowrite <= 0)
@@ -572,7 +572,7 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
             nthistime = size;
         Assert(nthistime > 0);

-        memcpy(ptr, file->buffer + file->pos, nthistime);
+        memcpy(ptr, file->buffer.data + file->pos, nthistime);

         file->pos += nthistime;
         ptr = (void *) ((char *) ptr + nthistime);
@@ -621,7 +621,7 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
             nthistime = size;
         Assert(nthistime > 0);

-        memcpy(file->buffer + file->pos, ptr, nthistime);
+        memcpy(file->buffer.data + file->pos, ptr, nthistime);

         file->dirty = true;
         file->pos += nthistime;
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 62d7797..756c1ff 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -240,11 +240,11 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
      */
     while (blocknum > lts->nBlocksWritten)
     {
-        char        zerobuf[BLCKSZ];
+        PGAlignedBuffer zerobuf;

-        MemSet(zerobuf, 0, sizeof(zerobuf));
+        MemSet(zerobuf.data, 0, sizeof(zerobuf));

-        ltsWriteBlock(lts, lts->nBlocksWritten, zerobuf);
+        ltsWriteBlock(lts, lts->nBlocksWritten, zerobuf.data);
     }

     /* Write the requested block */
diff --git a/src/bin/pg_rewind/copy_fetch.c b/src/bin/pg_rewind/copy_fetch.c
index 160a912..6a3c03a 100644
--- a/src/bin/pg_rewind/copy_fetch.c
+++ b/src/bin/pg_rewind/copy_fetch.c
@@ -156,7 +156,7 @@ recurse_dir(const char *datadir, const char *parentpath,
 static void
 rewind_copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
 {
-    char        buf[BLCKSZ];
+    PGAlignedBuffer buf;
     char        srcpath[MAXPGPATH];
     int            srcfd;

@@ -182,7 +182,7 @@ rewind_copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
         else
             len = end - begin;

-        readlen = read(srcfd, buf, len);
+        readlen = read(srcfd, buf.data, len);

         if (readlen < 0)
             pg_fatal("could not read file \"%s\": %s\n",
@@ -190,7 +190,7 @@ rewind_copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
         else if (readlen == 0)
             pg_fatal("unexpected EOF while reading file \"%s\"\n", srcpath);

-        write_target_range(buf, begin, readlen);
+        write_target_range(buf.data, begin, readlen);
         begin += readlen;
     }

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index bf7feed..922d4dd 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -75,8 +75,8 @@ skipfile(const char *fn)
 static void
 scan_file(const char *fn, BlockNumber segmentno)
 {
-    char        buf[BLCKSZ];
-    PageHeader    header = (PageHeader) buf;
+    PGAlignedBuffer buf;
+    PageHeader    header = (PageHeader) buf.data;
     int            f;
     BlockNumber blockno;

@@ -93,7 +93,7 @@ scan_file(const char *fn, BlockNumber segmentno)
     for (blockno = 0;; blockno++)
     {
         uint16        csum;
-        int            r = read(f, buf, BLCKSZ);
+        int            r = read(f, buf.data, BLCKSZ);

         if (r == 0)
             break;
@@ -109,7 +109,7 @@ scan_file(const char *fn, BlockNumber segmentno)
         if (PageIsNew(header))
             continue;

-        csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+        csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
         if (csum != header->pd_checksum)
         {
             if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 2a557b3..4affbb9 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -209,13 +209,13 @@ search_directory(const char *directory, const char *fname)
     /* set WalSegSz if file is successfully opened */
     if (fd >= 0)
     {
-        char        buf[XLOG_BLCKSZ];
+        PGAlignedXLogBuffer buf;
         int            r;

-        r = read(fd, buf, XLOG_BLCKSZ);
+        r = read(fd, buf.data, XLOG_BLCKSZ);
         if (r == XLOG_BLCKSZ)
         {
-            XLogLongPageHeader longhdr = (XLogLongPageHeader) buf;
+            XLogLongPageHeader longhdr = (XLogLongPageHeader) buf.data;

             WalSegSz = longhdr->xlp_seg_size;

diff --git a/src/include/c.h b/src/include/c.h
index 1e50103..193e9ac 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -989,6 +989,32 @@ extern void ExceptionalCondition(const char *conditionName,
  * ----------------------------------------------------------------
  */

+/*
+ * Use this, not "char buf[BLCKSZ]", to declare a field or local variable
+ * holding a page buffer, if that page might be accessed as a page and not
+ * just a string of bytes.  Otherwise the variable might be under-aligned,
+ * causing problems on alignment-picky hardware.  (In some places, we use
+ * this to declare buffers even though we only pass them to read() and
+ * write(), because copying to/from aligned buffers is usually faster than
+ * using unaligned buffers.)  We include both "double" and "int64" in the
+ * union to ensure that the compiler knows the value must be MAXALIGN'ed
+ * (cf. configure's computation of MAXIMUM_ALIGNOF).
+ */
+typedef union PGAlignedBuffer
+{
+    char        data[BLCKSZ];
+    double        force_align_d;
+    int64        force_align_i64;
+} PGAlignedBuffer;
+
+/* Same, but for an XLOG_BLCKSZ-sized buffer */
+typedef union PGAlignedXLogBuffer
+{
+    char        data[XLOG_BLCKSZ];
+    double        force_align_d;
+    int64        force_align_i64;
+} PGAlignedXLogBuffer;
+
 /* msb for char */
 #define HIGHBIT                    (0x80)
 #define IS_HIGHBIT_SET(ch)        ((unsigned char)(ch) & HIGHBIT)

Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Paquier
Date:
On Fri, Aug 31, 2018 at 03:55:51PM -0400, Tom Lane wrote:
> I wrote:
>> Some of these places might be performance-critical enough that adding
>> a palloc/pfree cycle would not be nice.  What I was considering doing
>> was inventing something like
>>
>> typedef union PGAlignedBuffer
>> {
>>     char    data[BLCKSZ];
>>     double    force_align;
>> } PGAlignedBuffer;
>
> Here's a proposed patch using that approach.

This solution is smarter than using malloc/palloc to enforce alignment.
I was digging into the GIN and bloom code with this pattern, but except
if I used TopTransactionContext for a simple approach, which is not
acceptable by the way, I was finishing with ugly memory contexts
used...  I am still not sure if that was completely correct either, this
needed more time ;)

> Although some of the places that were using "char buf[BLCKSZ]" variables
> weren't doing anything that really requires better alignment, I made
> almost all of them use PGAlignedBuffer variables anyway, on the grounds
> that you typically get better memcpy speed for aligned than unaligned
> transfers.

Makes sense.

> I also fixed a few places that were using the palloc solution, and
> one that was actually doing hand-alignment of the pointer (ugh).
> I didn't try to be thorough about getting rid of such pallocs,
> just fix places that seemed likely to be performance-critical.

Okay, for the memo replay_image_masked and master_image_masked
in xlog.c could make use of the new structure.  SetWALSegSize in
pg_standby.c and WriteEmptyXLOG in pg_resetwal.c, and pg_upgrade's
file.c could also be patched.

walmethods.c could also use some static buffers, not worth the
complication perhaps..

> +typedef union PGAlignedBuffer
> +{
> +    char        data[BLCKSZ];
> +    double        force_align_d;
> +    int64        force_align_i64;
> +} PGAlignedBuffer;
> +
> +/* Same, but for an XLOG_BLCKSZ-sized buffer */
> +typedef union PGAlignedXLogBuffer
> +{
> +    char        data[XLOG_BLCKSZ];
> +    double        force_align_d;
> +    int64        force_align_i64;
> +} PGAlignedXLogBuffer;

One complain I have is about the name of those structures.  Perhaps
PGAlignedBlock and PGAlignedXlogBlock make more sense?
--
Michael

Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Aug 31, 2018 at 03:55:51PM -0400, Tom Lane wrote:
>> I also fixed a few places that were using the palloc solution, and
>> one that was actually doing hand-alignment of the pointer (ugh).
>> I didn't try to be thorough about getting rid of such pallocs,
>> just fix places that seemed likely to be performance-critical.

> Okay, for the memo replay_image_masked and master_image_masked
> in xlog.c could make use of the new structure.  SetWALSegSize in
> pg_standby.c and WriteEmptyXLOG in pg_resetwal.c, and pg_upgrade's
> file.c could also be patched.

I intentionally didn't change replay_image_masked/master_image_masked
to use statically-allocated buffers.  Since, AFAICS, those aren't
needed in most backend processes, they'd just be eating 16KB of
per-process data space to no purpose.

The others you mention could be changed, probably, but I didn't
bother as they didn't seem performance-critical.

>> +typedef union PGAlignedBuffer

> One complain I have is about the name of those structures.  Perhaps
> PGAlignedBlock and PGAlignedXlogBlock make more sense?

Don't have a strong preference, anybody else have an opinion?

(I also wondered whether to use "WAL" instead of "XLog" in that
struct name, but it seems like we've mostly stuck with "xlog"
in internal C names.)

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Paquier
Date:
On Fri, Aug 31, 2018 at 07:59:58PM -0400, Tom Lane wrote:
> The others you mention could be changed, probably, but I didn't
> bother as they didn't seem performance-critical.

It is not really critical indeed.  There is an argument to change them
so as other folks get used to it though.

> (I also wondered whether to use "WAL" instead of "XLog" in that
> struct name, but it seems like we've mostly stuck with "xlog"
> in internal C names.)

XLOG_BLCKSZ is used, which makes me think that XLog is better than WAL
here.  A matter of taste of course.
--
Michael

Attachment

Re: pg_verify_checksums and -fno-strict-aliasing

From
Fabien COELHO
Date:
Hello,

>> Okay, for the memo replay_image_masked and master_image_masked
>> in xlog.c could make use of the new structure.  SetWALSegSize in
>> pg_standby.c and WriteEmptyXLOG in pg_resetwal.c, and pg_upgrade's
>> file.c could also be patched.
>
> I intentionally didn't change replay_image_masked/master_image_masked
> to use statically-allocated buffers.  Since, AFAICS, those aren't
> needed in most backend processes, they'd just be eating 16KB of
> per-process data space to no purpose.
>
> The others you mention could be changed, probably, but I didn't
> bother as they didn't seem performance-critical.

I'd go for having just one same approach everywhere, for code base 
homogeneity.

>>> +typedef union PGAlignedBuffer
>
>> One complain I have is about the name of those structures.  Perhaps
>> PGAlignedBlock and PGAlignedXlogBlock make more sense?
>
> Don't have a strong preference, anybody else have an opinion?

I like "Block" better, because it's more precise.

> (I also wondered whether to use "WAL" instead of "XLog" in that
> struct name, but it seems like we've mostly stuck with "xlog"
> in internal C names.)

Best to blend with the surrounding code in the header file?

-- 
Fabien.


Re: pg_verify_checksums and -fno-strict-aliasing

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Aug 31, 2018 at 07:59:58PM -0400, Tom Lane wrote:
>> The others you mention could be changed, probably, but I didn't
>> bother as they didn't seem performance-critical.

> It is not really critical indeed.  There is an argument to change them
> so as other folks get used to it though.

Fair enough.  I renamed the types as suggested, changed a few more
places for consistency's sake, and pushed.

There still remain some places where palloc(BLCKSZ) or equivalent is used,
but there's no matching pfree.  In a lot of them the buffer is returned
to the caller so there's no choice.  It's likely that some are just
leaking the storage transiently and we could convert them to using a
PGAlignedBlock local variable, but I didn't bother trying to do the
analysis.

            regards, tom lane


Re: pg_verify_checksums and -fno-strict-aliasing

From
Michael Paquier
Date:
On Sat, Sep 01, 2018 at 03:32:10PM -0400, Tom Lane wrote:
> Fair enough.  I renamed the types as suggested, changed a few more
> places for consistency's sake, and pushed.

Thanks!

> There still remain some places where palloc(BLCKSZ) or equivalent is used,
> but there's no matching pfree.  In a lot of them the buffer is returned
> to the caller so there's no choice.  It's likely that some are just
> leaking the storage transiently and we could convert them to using a
> PGAlignedBlock local variable, but I didn't bother trying to do the
> analysis.

At quick glance, I am not seeing anything critical.  So the result looks
good to me.
--
Michael

Attachment