Thread: pgsql: Further code review for pg_lsn data type.

pgsql: Further code review for pg_lsn data type.

From
Robert Haas
Date:
Further code review for pg_lsn data type.

Change input function error messages to be more consistent with what is
done elsewhere.  Remove a bunch of redundant type casts, so that the
compiler will warn us if we screw up.  Don't pass LSNs by value on
platforms where a Datum is only 32 bytes, per buildfarm.  Move macros
for packing and unpacking LSNs to pg_lsn.h so that we can include
access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/694e3d139a9d090c58494428bebfadad216419da

Modified Files
--------------
src/backend/utils/adt/pg_lsn.c       |   38 +++++++++++++++++-----------------
src/include/catalog/pg_type.h        |    2 +-
src/include/fmgr.h                   |    2 --
src/include/postgres.h               |   14 -------------
src/include/utils/pg_lsn.h           |    7 +++++++
src/test/regress/expected/pg_lsn.out |   10 ++++-----
6 files changed, 32 insertions(+), 41 deletions(-)


Re: pgsql: Further code review for pg_lsn data type.

From
Andres Freund
Date:
Hi,

On 2014-02-19 15:10:52 +0000, Robert Haas wrote:
> Change input function error messages to be more consistent with what is
> done elsewhere.  Remove a bunch of redundant type casts, so that the
> compiler will warn us if we screw up.  Don't pass LSNs by value on
> platforms where a Datum is only 32 bytes, per buildfarm.  Move macros
> for packing and unpacking LSNs to pg_lsn.h so that we can include
> access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.

Hm, won't
#define DatumGetLSN(X) ((XLogRecPtr) DatumGetInt64(X))
#define LSNGetDatum(X) (Int64GetDatum((int64) (X)))
possibly truncate the value if it's larger than 2^(63-1) as int is
signed but XLogRecPtr is unsigned?

I am afraid you're going to have to copy the whole #ifdef
USE_FLOAT8_BYVAL dance.

Not that it's too likely to have that big values in reality atm...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Further code review for pg_lsn data type.

From
Heikki Linnakangas
Date:
On 02/20/2014 02:56 AM, Andres Freund wrote:
> On 2014-02-19 15:10:52 +0000, Robert Haas wrote:
>> Change input function error messages to be more consistent with what is
>> done elsewhere.  Remove a bunch of redundant type casts, so that the
>> compiler will warn us if we screw up.  Don't pass LSNs by value on
>> platforms where a Datum is only 32 bytes, per buildfarm.  Move macros
>> for packing and unpacking LSNs to pg_lsn.h so that we can include
>> access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.
>
> Hm, won't
> #define DatumGetLSN(X) ((XLogRecPtr) DatumGetInt64(X))
> #define LSNGetDatum(X) (Int64GetDatum((int64) (X)))
> possibly truncate the value if it's larger than 2^(63-1) as int is
> signed but XLogRecPtr is unsigned?

No. Casting between unsigned and signed integers of same width doesn't
lose information. For example with 16-bit integers, casting unsigned
40000 to signed gives -25536. Casting signed -25536 back to unsigned
gives back 40000.

- Heikki


Re: pgsql: Further code review for pg_lsn data type.

From
Andres Freund
Date:
On 2014-02-20 08:25:01 +0200, Heikki Linnakangas wrote:
> On 02/20/2014 02:56 AM, Andres Freund wrote:
> >On 2014-02-19 15:10:52 +0000, Robert Haas wrote:
> >>Change input function error messages to be more consistent with what is
> >>done elsewhere.  Remove a bunch of redundant type casts, so that the
> >>compiler will warn us if we screw up.  Don't pass LSNs by value on
> >>platforms where a Datum is only 32 bytes, per buildfarm.  Move macros
> >>for packing and unpacking LSNs to pg_lsn.h so that we can include
> >>access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.
> >
> >Hm, won't
> >#define DatumGetLSN(X) ((XLogRecPtr) DatumGetInt64(X))
> >#define LSNGetDatum(X) (Int64GetDatum((int64) (X)))
> >possibly truncate the value if it's larger than 2^(63-1) as int is
> >signed but XLogRecPtr is unsigned?
>
> No. Casting between unsigned and signed integers of same width doesn't lose
> information. For example with 16-bit integers, casting unsigned 40000 to
> signed gives -25536. Casting signed -25536 back to unsigned gives back
> 40000.

Are you sure?

6.3.1.3 Signed and unsigned integers, paragraph 3:
"Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised."

Afaik unsigned to signed always safe, but not the other way round?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Further code review for pg_lsn data type.

From
Heikki Linnakangas
Date:
On 02/20/2014 09:47 AM, Andres Freund wrote:
> On 2014-02-20 08:25:01 +0200, Heikki Linnakangas wrote:
>> On 02/20/2014 02:56 AM, Andres Freund wrote:
>>> On 2014-02-19 15:10:52 +0000, Robert Haas wrote:
>>>> Change input function error messages to be more consistent with what is
>>>> done elsewhere.  Remove a bunch of redundant type casts, so that the
>>>> compiler will warn us if we screw up.  Don't pass LSNs by value on
>>>> platforms where a Datum is only 32 bytes, per buildfarm.  Move macros
>>>> for packing and unpacking LSNs to pg_lsn.h so that we can include
>>>> access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.
>>>
>>> Hm, won't
>>> #define DatumGetLSN(X) ((XLogRecPtr) DatumGetInt64(X))
>>> #define LSNGetDatum(X) (Int64GetDatum((int64) (X)))
>>> possibly truncate the value if it's larger than 2^(63-1) as int is
>>> signed but XLogRecPtr is unsigned?
>>
>> No. Casting between unsigned and signed integers of same width doesn't lose
>> information. For example with 16-bit integers, casting unsigned 40000 to
>> signed gives -25536. Casting signed -25536 back to unsigned gives back
>> 40000.
>
> Are you sure?
>
> 6.3.1.3 Signed and unsigned integers, paragraph 3:
> "Otherwise, the new type is signed and the value cannot be represented
> in it; either the result is implementation-defined or an
> implementation-defined signal is raised."
>
> Afaik unsigned to signed always safe, but not the other way round?

Oh, that's interesting, I didn't know that. We do signed to unsigned
conversions in a few places:

$ grep -r -I PG_GETARG_INT . | grep uint
./src/backend/access/hash/hashfunc.c:    return hash_uint32((int32)
PG_GETARG_INT16(0));
./src/backend/access/hash/hashfunc.c:    return
hash_uint32(PG_GETARG_INT32(0));
./src/backend/utils/adt/varlena.c:    uint32        value = (uint32)
PG_GETARG_INT32(0);
./src/backend/utils/adt/varlena.c:    uint64        value = (uint64)
PG_GETARG_INT64(0);

And in fact, the SET_X_BYTES macros also work by casting the value to an
unsigned integer. So if signed -> unsigned is undefined, then the
behavior of IntXGetDatum macros is also undefined.

- Heikki


Re: pgsql: Further code review for pg_lsn data type.

From
Andres Freund
Date:
On 2014-02-20 10:22:30 +0200, Heikki Linnakangas wrote:
> On 02/20/2014 09:47 AM, Andres Freund wrote:
> >On 2014-02-20 08:25:01 +0200, Heikki Linnakangas wrote:
> >>On 02/20/2014 02:56 AM, Andres Freund wrote:
> >>>Hm, won't
> >>>#define DatumGetLSN(X) ((XLogRecPtr) DatumGetInt64(X))
> >>>#define LSNGetDatum(X) (Int64GetDatum((int64) (X)))
> >>>possibly truncate the value if it's larger than 2^(63-1) as int is
> >>>signed but XLogRecPtr is unsigned?
> >>
> >>No. Casting between unsigned and signed integers of same width doesn't lose
> >>information. For example with 16-bit integers, casting unsigned 40000 to
> >>signed gives -25536. Casting signed -25536 back to unsigned gives back
> >>40000.
> >
> >Are you sure?
> >
> >6.3.1.3 Signed and unsigned integers, paragraph 3:
> >"Otherwise, the new type is signed and the value cannot be represented
> >in it; either the result is implementation-defined or an
> >implementation-defined signal is raised."
> >
> >Afaik unsigned to signed always safe, but not the other way round?

Bollocks, as visible in the quoted paragraph above, it's other way
round: signed to unsigned is well defined, unsigned to signed is
not. Don't type while waiting for coffee. Sorry for that.

> And in fact, the SET_X_BYTES macros also work by casting the value to an
> unsigned integer. So if signed -> unsigned is undefined, then the behavior
> of IntXGetDatum macros is also undefined.

Our whole usage of Datum is a pretty damned big mess of undefined
behaviour. One day I want to see how big the penalty of making Datum a
proper union with all the basetypes is. That's well defined.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Further code review for pg_lsn data type.

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> 6.3.1.3 Signed and unsigned integers, paragraph 3:
> "Otherwise, the new type is signed and the value cannot be represented
> in it; either the result is implementation-defined or an
> implementation-defined signal is raised."

"Implementation-defined" is entirely different from "undefined".
In practice, every two's-complement machine in the world is going
to define this behavior the same way.  The standard is written the
way it is to avoid assuming that the underlying hardware is
two's-complement ... but there are no such machines outside museums.

I think you're making a problem out of nothing.  We have considerably
more-real portability issues to worry about, like memory ordering.

            regards, tom lane


Re: pgsql: Further code review for pg_lsn data type.

From
Andres Freund
Date:
On 2014-02-20 09:59:51 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > 6.3.1.3 Signed and unsigned integers, paragraph 3:
> > "Otherwise, the new type is signed and the value cannot be represented
> > in it; either the result is implementation-defined or an
> > implementation-defined signal is raised."
>
> "Implementation-defined" is entirely different from "undefined".

Yea, I don't think I talked about undefined behaviour in the context of
this.

The undefined behaviour bit was more about the aliasing and such. I *do*
think it might be worth fixing that someday, but it's certainly nothing
presssing.

> I think you're making a problem out of nothing.  We have considerably
> more-real portability issues to worry about, like memory ordering.

I don't think it's a huge problem, but it's pretty easy to avoid, so why
not avoid it?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Further code review for pg_lsn data type.

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-02-20 09:59:51 -0500, Tom Lane wrote:
>> I think you're making a problem out of nothing.  We have considerably
>> more-real portability issues to worry about, like memory ordering.

> I don't think it's a huge problem, but it's pretty easy to avoid, so why
> not avoid it?

It's *not* that easy to avoid.  If we turn Datum into a struct we face a
very significant risk of performance problems: on many machines the rules
for passing structs to functions differ from those for passing scalars,
and not in a good way.

If there were one shred of evidence that there is a real problem here,
it might be worth looking into; but the fact is that you're wasting
our time by even bringing it up.  Consider for example that the printf
family of functions don't have any problem if you pass an int and ask for
it to be printed with %u, or vice versa.  That behavior involves exactly
the same type of casting you're complaining about, and in practice it
is perfectly portable.

            regards, tom lane


Re: pgsql: Further code review for pg_lsn data type.

From
Andres Freund
Date:
On 2014-02-20 10:25:20 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-02-20 09:59:51 -0500, Tom Lane wrote:
> >> I think you're making a problem out of nothing.  We have considerably
> >> more-real portability issues to worry about, like memory ordering.
>
> > I don't think it's a huge problem, but it's pretty easy to avoid, so why
> > not avoid it?
>
> It's *not* that easy to avoid.  If we turn Datum into a struct we face a
> very significant risk of performance problems

All that's needed in this case is to copy DatumGetInt64/Int64GetDatum's definition and
avoid the intermediatry casts to int64.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services