Re: [GENERAL] C++ port of Postgres - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [GENERAL] C++ port of Postgres
Date
Msg-id CAEepm=17=83AzSL7yUe4DQXRD4zC9_cN_9sW6yTpCuatmx4xQg@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] C++ port of Postgres  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [GENERAL] C++ port of Postgres
List pgsql-hackers
On Mon, Sep 26, 2016 at 10:57 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Sep 1, 2016 at 1:41 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>>
>> [trimmed cc list because of big attachments]
>>
>> On 8/16/16 4:22 PM, Jim Nasby wrote:
>> > Joy, do you have an idea what a *minimally invasive* patch for C++
>> > support would look like? That's certainly the first step here.
>>
>> I developed a minimally invasive patch for C++ support a few years ago
>> shortly after I wrote that blog post.  Since there appears to have been
>> some interest here now, I have updated that and split it up into logical
>> chunks.
>>
>> So here you go.
>
>
> I looked at a random selection of these patches this morning.

And this morning I looked at the rest of them.

> 0004-Fix-LDFLAGS-test-for-C.patch

Makes sense.

> 0005-Add-test-for-Wmissing-prototypes.patch

This does seem to follow the example of how we test for support for
other warning flags.

> 0006-Remove-unnecessary-prototypes.patch

Looks OK.

> 0007-Fix-incorrect-type-cast.patch
 /* array of check flags, reported to consistentFn */
- bool   *entryRes;
+ GinTernaryValue *entryRes;

Right.  That would be pretty dodgy even in C if we ever use stdbool.h,
because sizeof(_Bool) is implementation defined.  The
interchangeability relies on bool and GinTernaryValue both being
typedefs for 'char'.  (Not to mention the dangerous contradictions
possible with bools obtained that way: 'b == false || b == true' can
be false, which I guess has been thought about already and is off
topic here.)

I wonder if the following bit of gin.h should be more nuanced: maybe
it's OK to convert between bool and GinTernaryValue, but it's
definitely not OK to cast between pointers types?  Or maybe we should
have a function/macro to convert between the types explicitly and not
encourage people to consider them convertible.
 /*  * A ternary value used by tri-consistent functions.  *  * For convenience, this is compatible with booleans. A
booleancan be  * safely cast to a GinTernaryValue.  */ typedef char GinTernaryValue;
 

> 0008-Add-necessary-type-cast.patch

Maybe instead of this:

- gcv.check = check;
+ gcv.check = (GinTernaryValue *) check;

... it would be better to do this?

-        bool       *check = (bool *) PG_GETARG_POINTER(0);
+        GinTernaryValue *check = (GinTernaryValue *) PG_GETARG_POINTER(0);

> 0009-Rename-some-typedefs-to-avoid-name-conflicts.patch

I don't know if it's a relevant precedent or not, but I noticed that
fdwapi.h, amapi.h and tsmapi.h used the convention that function
pointer types are named XXX_function, and then the members of a struct
behaving as a kind of vtable are named XXX.

> 0010-Reorder-some-things.patch

> Forward declarations of static variables are not possible in C++, so
> move the full definition before its use.

Right.

> 0011-Add-missing-fields-in-struct-initializations.patch

I don't undestand why this is necessary, unless you're explicitly
choosing to enable a warning like missing-field-initializers for C++
but not for C.  Implicit zero-initialisation of trailing missing
initialisers is a feature, not a bug.  Also I noticed that 0013 (or a
proper solution to the keyword collision problem) is needed before
this one.

> 0012-Separate-enum-from-struct.patch

Right.

> 0013-Avoid-C-key-words.patch

> This is not a long-term solution, because C header files that are
> included somewhere might have C++ awareness and will break if the key
> word is defined away.  But this shows the list of words that would have
> to be renamed around the code.

Right, let's rename them all directly.

> 0015-Fix-function-prototypes-for-C.patch

I wonder if (perhaps in some later later patch) walkers should take
const pointers and mutators non-const.  That may require propagating
constness around some more places.

> 0017-Don-t-define-bool-in-C.patch

Check.

> 0018-Change-TimeoutId-from-enum-to-integer.patch

This works, but I feel like we're losing something valuable if we
convert all our enums to ints just because some tiny bit of code
somewhere wants to loop over them.  Maybe we should we keep enums like
this, and do the necessary casting in the small number of places that
do int-like-stuff with them?  Like so:

diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index 7171a7c..cc5b2c4 100644
--- a/src/backend/utils/misc/timeout.c
+++ b/src/backend/utils/misc/timeout.c
@@ -348,7 +348,7 @@ InitializeTimeouts(void)
       for (i = 0; i < MAX_TIMEOUTS; i++)       {
-               all_timeouts[i].index = i;
+               all_timeouts[i].index = (TimeoutId) i;               all_timeouts[i].indicator = false;
all_timeouts[i].timeout_handler= NULL;               all_timeouts[i].start_time = 0;
 
@@ -379,7 +379,8 @@ RegisterTimeout(TimeoutId id, timeout_handler_proc handler)       if (id >= USER_TIMEOUT)       {
           /* Allocate a user-defined timeout reason */
 
-               for (id = USER_TIMEOUT; id < MAX_TIMEOUTS; id++)
+               for (id = USER_TIMEOUT; id < MAX_TIMEOUTS;
+                        id = (TimeoutId) ((int) id + 1))                       if (all_timeouts[id].timeout_handler ==
NULL)                              break;               if (id >= MAX_TIMEOUTS)
 

> 0019-Add-explicit-cast-for-casting-away-volatile.patch

- tmp = e->counters;
+ tmp = const_cast<pgssEntry *>(e)->counters;

Obviously we can't just drop C++ casts in here like that.  Casting
away volatile might in theory produce the wrong result in some cases,
but we can see that there is a compiler barrier right there already
(included in the preceding SpinLockAcquire) so I guess we could do a
C-style cast to lose volatile so it remains a valid C program:

-                       tmp = e->counters;
+                       tmp = ((pgssEntry *) e)->counters;


> 0020-Add-more-extern-key-words.patch

Don't we just need to put declarations in a header included by guc.c
and the defining transation unit, or explicitly duplicate the
declarations in both (including the one that defines it immediately
after)?  Then we would have valid C and C++ with the right linkage.
That is, instead of this:

-const struct config_enum_entry wal_level_options[] = {
+extern const struct config_enum_entry wal_level_options[] = { {"minimal", WAL_LEVEL_MINIMAL, false}, {"replica",
WAL_LEVEL_REPLICA,false}, {"archive", WAL_LEVEL_REPLICA, true}, /* deprecated */
 

... we would do this:

+extern const struct config_enum_entry wal_level_options[];

const struct config_enum_entry wal_level_options[] = { {"minimal", WAL_LEVEL_MINIMAL, false}, {"replica",
WAL_LEVEL_REPLICA,false}, {"archive", WAL_LEVEL_REPLICA, true}, /* deprecated */
 

> 0022-Disable-conflicting-function.patch

> This function conflicts with a function in the backend.  FIXME

Indeed:

$ git grep ^xml_is_well_formed\(
contrib/xml2/xpath.c:xml_is_well_formed(PG_FUNCTION_ARGS)
src/backend/utils/adt/xml.c:xml_is_well_formed(PG_FUNCTION_ARGS)

The contrib one says:

/** Returns true if document is well-formed** Note: this has been superseded by a core function.  We still have to*
haveit in the contrib module so that existing SQL-level references* to the function won't fail; but in normal usage
withup-to-date SQL* definitions for the contrib module, this won't be called.*/
 

It's been in core since 9.1, ie in every supported version.  I'm not
sure what the comment means exactly but I wonder: can we just delete
the contrib version now?

As for the commitfest entry: this thread discusses two different
people's efforts to compile PostgreSQL as C++.  Joy Arulraj's github
branch derives in some way from Peter Eisentraut's work, but I have
provided feedback on Peter's patches, because (1) they were posted
here in patch format and (2) there is a commitfest entry listening
Peter as the author.  I think several of these patches are
committable, and many obviously are not.  Heikki already set the CF
item to 'Ready for Committer' based on an inspection of a few of the
patches, but invited others to continue looking, so I did.

-- 
Thomas Munro
http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Set log_line_prefix and application name in test drivers
Next
From: Craig Ringer
Date:
Subject: Re: Add support for restrictive RLS policies