Thread: [HACKERS] SerializedSnapshotData alignment

[HACKERS] SerializedSnapshotData alignment

From
Noah Misch
Date:
Dear 7b4ac19 authors,

Field ps_snapshot_data usually receives four-byte alignment within
ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
s10s_u11wos_24a SPARC", building with gcc 4.9.2.  Some credible fixes:

1. Move the SerializedSnapshotData declaration from snapmgr.c to snapmgr.h and
   declare the ps_snapshot_data field to be of type SerializedSnapshotData.
   Probably also add a field "TransactionId xids[FLEXIBLE_ARRAY_MEMBER]" to
   SerializedSnapshotData, to assert the variable-length nature.

2. Change "char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]" to "int64 ...".  I
   have attached this in SerializedSnapshot-int64-v1.patch.

3. Change no declarations, and make snapmgr.c memcpy() the
   SerializedSnapshotData through a local buffer.  I have attached this as
   SerializedSnapshot-memcpy-v1.patch.

I like (2) well enough, but I don't see that technique used elsewhere in the
tree.  (1) is more typical of PostgreSQL, though I personally like it when
structs can stay private to a file.  (3) is also well-attested, particularly
in xlog replay code.  I am leaning toward (2).  Other opinions?

Field phs_snapshot_data happens to receive eight-byte alignment within
ParallelHeapScanDescData; otherwise, such scans would fail the same way.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] SerializedSnapshotData alignment

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> Dear 7b4ac19 authors,
> Field ps_snapshot_data usually receives four-byte alignment within
> ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> s10s_u11wos_24a SPARC", building with gcc 4.9.2.

It's a little distressing that the buildfarm didn't find this already.
Is there some reason why it's specific to that particular compiler,
rather than generic to alignment-picky 64-bit machines?

In general, though, I agree that using a char[] member to represent
anything that has any alignment requirement at all is seriously bad
coding style that is almost certain to fail eventually.

A solution you didn't mention is to change the ParallelIndexScanDescData
field to be a pointer, perhaps "struct SerializedSnapshotData *", while
leaving that struct opaque so far as relscan.h is concerned.  This could
avoid the need to use the unsafe blind casts that I'm sure must be
involved in accesses to that field at present.
        regards, tom lane



Re: [HACKERS] SerializedSnapshotData alignment

From
Noah Misch
Date:
On Sun, Feb 26, 2017 at 07:53:15PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Dear 7b4ac19 authors,
> > Field ps_snapshot_data usually receives four-byte alignment within
> > ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> > The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> > s10s_u11wos_24a SPARC", building with gcc 4.9.2.
> 
> It's a little distressing that the buildfarm didn't find this already.
> Is there some reason why it's specific to that particular compiler,
> rather than generic to alignment-picky 64-bit machines?

I wondered the same thing; if nothing else, why don't protosciurus and
castoroides fail the same way?  They do use older compilers, "Sun C 5.10
SunOS_sparc 2009/06/03" and gcc 3.4.3.  I have "Sun C 5.12 SunOS_sparc
2011/11/16" and gcc 4.9.2, both of which are alignment-sensitive in several
configurations, according to the attached test program.  However, in a 32-bit
build with this Sun C, I don't get alignment-related bus errors.  (Those
animals build 64-bit, so this isn't the full story.)

> In general, though, I agree that using a char[] member to represent
> anything that has any alignment requirement at all is seriously bad
> coding style that is almost certain to fail eventually.
> 
> A solution you didn't mention is to change the ParallelIndexScanDescData
> field to be a pointer, perhaps "struct SerializedSnapshotData *", while
> leaving that struct opaque so far as relscan.h is concerned.  This could
> avoid the need to use the unsafe blind casts that I'm sure must be
> involved in accesses to that field at present.

That, too, would be reasonable.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] SerializedSnapshotData alignment

From
Thomas Munro
Date:
On Mon, Feb 27, 2017 at 2:18 PM, Noah Misch <noah@leadboat.com> wrote:
> I wondered the same thing; if nothing else, why don't protosciurus and
> castoroides fail the same way?  They do use older compilers, "Sun C 5.10
> SunOS_sparc 2009/06/03" and gcc 3.4.3.  I have "Sun C 5.12 SunOS_sparc
> 2011/11/16" and gcc 4.9.2, both of which are alignment-sensitive in several
> configurations, according to the attached test program.  However, in a 32-bit
> build with this Sun C, I don't get alignment-related bus errors.  (Those
> animals build 64-bit, so this isn't the full story.)

It's been a while but I seem to recall that Sun C defaulted to a
-xmemalign setting that tolerated misaligned reads in 32 bit builds,
but used a different default on 64 bit builds.  "Solaris Application
Programming" 5.8.5 seems to confirm this:  "For 32-bit applications,
since Sun Studio 9, the default is for the compiler to assume 8-byte
alignment and to trap and correct any misaligned data accesses.  For
64-bit applications, the compiler assumes 8-byte alignment, but the
application will SIGBUS on a misaligned access."

Yet castoroides seems to be building with -m64, so that's not the
explanation.  Could it be correctly aligned by coincidence?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] SerializedSnapshotData alignment

From
Amit Kapila
Date:
On Mon, Feb 27, 2017 at 5:13 AM, Noah Misch <noah@leadboat.com> wrote:
> Dear 7b4ac19 authors,
>
> Field ps_snapshot_data usually receives four-byte alignment within
> ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> s10s_u11wos_24a SPARC", building with gcc 4.9.2.  Some credible fixes:
>
> 1. Move the SerializedSnapshotData declaration from snapmgr.c to snapmgr.h and
>    declare the ps_snapshot_data field to be of type SerializedSnapshotData.
>    Probably also add a field "TransactionId xids[FLEXIBLE_ARRAY_MEMBER]" to
>    SerializedSnapshotData, to assert the variable-length nature.
>
> 2. Change "char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]" to "int64 ...".  I
>    have attached this in SerializedSnapshot-int64-v1.patch.
>
> 3. Change no declarations, and make snapmgr.c memcpy() the
>    SerializedSnapshotData through a local buffer.  I have attached this as
>    SerializedSnapshot-memcpy-v1.patch.
>
> I like (2) well enough, but I don't see that technique used elsewhere in the
> tree.
>

You seem to have forgotten to change all the callers of
Serialize/RestoreSnapshot to int64* which is causing below warnings on
my m/c:

1>src/backend/access/transam/parallel.c(334): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'
1>src/backend/access/transam/parallel.c(338): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'
1>src/backend/access/transam/parallel.c(1072): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'
1>src/backend/access/transam/parallel.c(1078): warning C4133:
'function' : incompatible types - from 'char *' to 'int64 *'


>  (1) is more typical of PostgreSQL, though I personally like it when
> structs can stay private to a file.  (3) is also well-attested, particularly
> in xlog replay code.  I am leaning toward (2).  Other opinions?
>

Your Approach-2 patch looks like a sane idea to me, if we decide to do
it some other way, I can write a patch unless you prefer to do it by
yourself.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] SerializedSnapshotData alignment

From
Robert Haas
Date:
On Mon, Feb 27, 2017 at 5:13 AM, Noah Misch <noah@leadboat.com> wrote:
> Dear 7b4ac19 authors,
>
> Field ps_snapshot_data usually receives four-byte alignment within
> ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> s10s_u11wos_24a SPARC", building with gcc 4.9.2.  Some credible fixes:
>
> 1. Move the SerializedSnapshotData declaration from snapmgr.c to snapmgr.h and
>    declare the ps_snapshot_data field to be of type SerializedSnapshotData.
>    Probably also add a field "TransactionId xids[FLEXIBLE_ARRAY_MEMBER]" to
>    SerializedSnapshotData, to assert the variable-length nature.
>
> 2. Change "char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]" to "int64 ...".  I
>    have attached this in SerializedSnapshot-int64-v1.patch.
>
> 3. Change no declarations, and make snapmgr.c memcpy() the
>    SerializedSnapshotData through a local buffer.  I have attached this as
>    SerializedSnapshot-memcpy-v1.patch.
>
> I like (2) well enough, but I don't see that technique used elsewhere in the
> tree.  (1) is more typical of PostgreSQL, though I personally like it when
> structs can stay private to a file.  (3) is also well-attested, particularly
> in xlog replay code.  I am leaning toward (2).  Other opinions?

In my imagination, we were already doing #3, so I'd probably favor
that approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SerializedSnapshotData alignment

From
Noah Misch
Date:
On Mon, Feb 27, 2017 at 03:08:35PM +1300, Thomas Munro wrote:
> On Mon, Feb 27, 2017 at 2:18 PM, Noah Misch <noah@leadboat.com> wrote:
> > I wondered the same thing; if nothing else, why don't protosciurus and
> > castoroides fail the same way?  They do use older compilers, "Sun C 5.10
> > SunOS_sparc 2009/06/03" and gcc 3.4.3.  I have "Sun C 5.12 SunOS_sparc
> > 2011/11/16" and gcc 4.9.2, both of which are alignment-sensitive in several
> > configurations, according to the attached test program.  However, in a 32-bit
> > build with this Sun C, I don't get alignment-related bus errors.  (Those
> > animals build 64-bit, so this isn't the full story.)
> 
> It's been a while but I seem to recall that Sun C defaulted to a
> -xmemalign setting that tolerated misaligned reads in 32 bit builds,
> but used a different default on 64 bit builds.  "Solaris Application
> Programming" 5.8.5 seems to confirm this:  "For 32-bit applications,
> since Sun Studio 9, the default is for the compiler to assume 8-byte
> alignment and to trap and correct any misaligned data accesses.  For
> 64-bit applications, the compiler assumes 8-byte alignment, but the
> application will SIGBUS on a misaligned access."
> 
> Yet castoroides seems to be building with -m64, so that's not the
> explanation.  Could it be correctly aligned by coincidence?

Coincidental alignment was it.  ps_snapshot_data has offset 16 in a 64-bit
build and offset 12 in a 32-bit build, so it is well-aligned in 64-bit only.
I was building a 32-bit PostgreSQL; when I build a 64-bit PostgreSQL, I no
longer get the SIGBUS.



Re: [HACKERS] SerializedSnapshotData alignment

From
Noah Misch
Date:
On Mon, Feb 27, 2017 at 12:11:54PM +0530, Robert Haas wrote:
> On Mon, Feb 27, 2017 at 5:13 AM, Noah Misch <noah@leadboat.com> wrote:
> > Dear 7b4ac19 authors,
> >
> > Field ps_snapshot_data usually receives four-byte alignment within
> > ParallelIndexScanDescData, but it contains the eight-byte whenTaken field.
> > The select_parallel test dies with SIGBUS on "Oracle Solaris 10 1/13
> > s10s_u11wos_24a SPARC", building with gcc 4.9.2.  Some credible fixes:
> >
> > 1. Move the SerializedSnapshotData declaration from snapmgr.c to snapmgr.h and
> >    declare the ps_snapshot_data field to be of type SerializedSnapshotData.
> >    Probably also add a field "TransactionId xids[FLEXIBLE_ARRAY_MEMBER]" to
> >    SerializedSnapshotData, to assert the variable-length nature.
> >
> > 2. Change "char ps_snapshot_data[FLEXIBLE_ARRAY_MEMBER]" to "int64 ...".  I
> >    have attached this in SerializedSnapshot-int64-v1.patch.
> >
> > 3. Change no declarations, and make snapmgr.c memcpy() the
> >    SerializedSnapshotData through a local buffer.  I have attached this as
> >    SerializedSnapshot-memcpy-v1.patch.
> >
> > I like (2) well enough, but I don't see that technique used elsewhere in the
> > tree.  (1) is more typical of PostgreSQL, though I personally like it when
> > structs can stay private to a file.  (3) is also well-attested, particularly
> > in xlog replay code.  I am leaning toward (2).  Other opinions?
> 
> In my imagination, we were already doing #3, so I'd probably favor
> that approach.

Pushed that way.