Re: [HACKERS] [GENERAL] C++ port of Postgres - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] [GENERAL] C++ port of Postgres
Date
Msg-id 20170301063016.o57uv2hxgxysoif4@alap3.anarazel.de
Whole thread Raw
In response to Re: [GENERAL] C++ port of Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [GENERAL] C++ port of Postgres
List pgsql-hackers
Hi,

On 2017-02-28 23:42:45 -0500, Peter Eisentraut wrote:
> On 1/26/17 22:46, Andres Freund wrote:
> > On 2016-09-30 15:24:09 -0400, Peter Eisentraut wrote:
> >> Yeah, I have committed a few of the patches now and I'll close the CF
> >> entry now.  Thanks for your research.
> > 
> > Are you planning to push more of these at some point?
> 
> Sure, let's take a look.

Cool.


> Getting rid of the C++ keywords would open up the possibility of using
> -Wc++-compat, which you have expressed interest in, I think.

Indeed. And if we go down this path, I'm going to argue that we should
add that by default.

> Subject: [PATCH v2 01/23] Fix mixup of bool and ternary value
> 
> ---
>  src/backend/access/gin/ginscan.c | 2 +-
>  src/include/access/gin_private.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/backend/access/gin/ginscan.c b/src/backend/access/gin/ginscan.c
> index c3ce0479c5..c83375d6b4 100644
> --- a/src/backend/access/gin/ginscan.c
> +++ b/src/backend/access/gin/ginscan.c
> @@ -147,7 +147,7 @@ ginFillScanKey(GinScanOpaque so, OffsetNumber attnum,
>      key->nuserentries = nUserQueryValues;
>  
>      key->scanEntry = (GinScanEntry *) palloc(sizeof(GinScanEntry) * nQueryValues);
> -    key->entryRes = (bool *) palloc0(sizeof(bool) * nQueryValues);
> +    key->entryRes = (GinTernaryValue *) palloc0(sizeof(GinTernaryValue) * nQueryValues);
>  
>      key->query = query;
>      key->queryValues = queryValues;
> diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
> index 34e7339f05..f1f395ac85 100644
> --- a/src/include/access/gin_private.h
> +++ b/src/include/access/gin_private.h
> @@ -281,7 +281,7 @@ typedef struct GinScanKeyData
>      int            nadditional;
>  
>      /* array of check flags, reported to consistentFn */
> -    bool       *entryRes;
> +    GinTernaryValue *entryRes;
>      bool        (*boolConsistentFn) (GinScanKey key);
>      GinTernaryValue (*triConsistentFn) (GinScanKey key);
>      FmgrInfo   *consistentFmgrInfo;

That seems like a pretty clear-cut case given the usage of entryRes[i].


> From 7c2f4f31df0eec816d6bb17aa6df2e7f3c6da03e Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 02/23] Fix LDFLAGS test for C++
> 
> The test looks to link to particular function, so we need to make that
> function have C linkage.
> ---
>  config/c-compiler.m4 |  9 ++++++++-
>  configure            | 27 ++++++++++++++++++++++++---
>  2 files changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
> index 7d901e1f1a..21fb4645c4 100644
> --- a/config/c-compiler.m4
> +++ b/config/c-compiler.m4
> @@ -353,7 +353,14 @@ AC_DEFUN([PGAC_PROG_CC_LDFLAGS_OPT],
>  AC_CACHE_CHECK([whether $CC supports $1], [Ac_cachevar],
>  [pgac_save_LDFLAGS=$LDFLAGS
>  LDFLAGS="$pgac_save_LDFLAGS $1"
> -AC_RUN_IFELSE([AC_LANG_PROGRAM([extern void $2 (); void (*fptr) () = $2;],[])],
> +AC_RUN_IFELSE([AC_LANG_PROGRAM([#ifdef __cplusplus
> +extern "C" {
> +#endif
> +extern void $2 ();
> +#ifdef __cplusplus
> +}
> +#endif
> +void (*fptr) () = $2;],[])],
>                [Ac_cachevar=yes],
>                [Ac_cachevar=no],
>                [Ac_cachevar="assuming no"])

Hm. I'm a bit confused here.


The description of PGAC_PROG_CC_LDFLAGS_OPT isn't exactly great:

# PGAC_PROG_CC_LDFLAGS_OPT
# ------------------------
# Given a string, check if the compiler supports the string as a
# command-line option. If it does, add the string to LDFLAGS.

doesn't even mention that the second parameter is something kind of
required, and what it could be used for.

Nor is:
# For reasons you'd really rather not know about, this checks whether
# you can link to a particular function, not just whether you can link.
# In fact, we must actually check that the resulting program runs :-(

exactly helpful...


But if we accept the premise of the current way to do things, this
mostly makes sense.  I do wonder if we really should use CC to run a C++
compiler, but ...


> From fd5248ca63ce3cf35f22cebd2088c4dfd3d994a4 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 03/23] Add test for -Wmissing-prototypes
> 
> This was part of the hard-coded default warnings set in C, but the
> option is not valid for C++ (since prototypes are always required).

>  
>  if test "$GCC" = yes -a "$ICC" = no; then
> -  CFLAGS="-Wall -Wmissing-prototypes -Wpointer-arith"
> +  CFLAGS="-Wall -Wpointer-arith"
> +  # not valid for C++
> +  PGAC_PROG_CC_CFLAGS_OPT([-Wmissing-prototypes])
>    # These work in some but not all gcc versions
>    PGAC_PROG_CC_CFLAGS_OPT([-Wdeclaration-after-statement])
>    PGAC_PROG_CC_CFLAGS_OPT([-Wendif-labels])

Makes sense.


> From 3447e7cc85c1f2836d882e19952e6db1950a2607 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 04/23] Add necessary type cast
> 
> ---
>  src/backend/utils/adt/tsginidx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
> index 83a939dfd5..8f38972676 100644
> --- a/src/backend/utils/adt/tsginidx.c
> +++ b/src/backend/utils/adt/tsginidx.c
> @@ -309,7 +309,7 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
>           * query.
>           */
>          gcv.first_item = GETQUERY(query);
> -        gcv.check = check;
> +        gcv.check = (GinTernaryValue *) check;
>          gcv.map_item_operand = (int *) (extra_data[0]);
>          gcv.need_recheck = recheck;
>  
> -- 
> 2.12.0

k.


> From db1807e62dd6ccf358132a7c56ede2531d33336f Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 05/23] Rename some typedefs to avoid name conflicts
> 
> C++ does not like that in struct _archiveHandle some of the fields have
> the same name as a typedef.  This changes the meaning of the typedef in
> the scope of the struct, causing warnings and possibly other problems.
> 
> Rename the types so they have names distinct from the struct fields.

k.


> From 997b09fca65a139af8e73d7e6356739383bd8df1 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 06/23] Reorder some things
> 
> Forward declarations of static variables are not possible in C++, so
> move the full definition before its use.

k.


> From e3393f8f5718b31c4ffe0346edabd8b08a2c7a3f Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 07/23] Eliminate C++ keywords
> 
> Remove use of these keywords: class constexpr delete friend namespace
> new operator private public template this typeid typename

Phew, this is going to be fun.

As indicated by the size of this patch, such violations are frequent. I
don't think we should commit this without shortly afterwards adding
-Wc++-compat or even -Werror=c++-compat, otherwise we'll just
continually end up breaking this.


> diff --git a/contrib/hstore/hstore.h b/contrib/hstore/hstore.h
> index 6bab08b7de..fbedc971c0 100644
> --- a/contrib/hstore/hstore.h
> +++ b/contrib/hstore/hstore.h
> @@ -188,7 +188,7 @@ extern Pairs *hstoreArrayToPairs(ArrayType *a, int *npairs);
>   * for now, we default to on for the benefit of people restoring old dumps
>   */
>  #ifndef HSTORE_POLLUTE_NAMESPACE
> -#define HSTORE_POLLUTE_NAMESPACE 1
> +#define HSTORE_POLLUTE_NAMESPACE 0
>  #endif

Doesn't seem like something that should be folded into a patch like this.


> diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
> index a38da3047f..302a55a0bf 100644
> --- a/src/backend/catalog/namespace.c
> +++ b/src/backend/catalog/namespace.c
> @@ -2153,7 +2153,7 @@ TSParserIsVisible(Oid prsId)
>  {
>      HeapTuple    tup;
>      Form_pg_ts_parser form;
> -    Oid            namespace;
> +    Oid            prsnamespace;
>      bool        visible;

Not yours, but I really do hate the table table prefixes in catalog
tables :(



> --- a/src/backend/optimizer/geqo/geqo_erx.c
> +++ b/src/backend/optimizer/geqo/geqo_erx.c
> @@ -281,7 +281,7 @@ static Gene
>  gimme_gene(PlannerInfo *root, Edge edge, Edge *edge_table)
>  {
>      int            i;
> -    Gene        friend;
> +    Gene        friend_;

friend_gene? Not a fan of underscore ending name.


> @@ -869,7 +869,7 @@ boolexpr_startup_fn(Node *clause, PredIterInfo info)
>  typedef struct
>  {
>      OpExpr        opexpr;
> -    Const        constexpr;
> +    Const        const_expr;
>      int            next_elem;
>      int            num_elems;
>      Datum       *elem_values;
> @@ -913,13 +913,13 @@ arrayconst_startup_fn(Node *clause, PredIterInfo info)
>      state->opexpr.args = list_copy(saop->args);
>  
>      /* Set up a dummy Const node to hold the per-element values */
> -    state->constexpr.xpr.type = T_Const;
> -    state->constexpr.consttype = ARR_ELEMTYPE(arrayval);
> -    state->constexpr.consttypmod = -1;
> -    state->constexpr.constcollid = arrayconst->constcollid;
> -    state->constexpr.constlen = elmlen;
> -    state->constexpr.constbyval = elmbyval;
> -    lsecond(state->opexpr.args) = &state->constexpr;
> +    state->const_expr.xpr.type = T_Const;
> +    state->const_expr.consttype = ARR_ELEMTYPE(arrayval);
> +    state->const_expr.consttypmod = -1;
> +    state->const_expr.constcollid = arrayconst->constcollid;
> +    state->const_expr.constlen = elmlen;
> +    state->const_expr.constbyval = elmbyval;
> +    lsecond(state->opexpr.args) = &state->const_expr;

.oO(C++ needs more classes of keywords/col_name_keyword)


> diff --git a/src/backend/utils/adt/rangetypes_gist.c b/src/backend/utils/adt/rangetypes_gist.c
> index f81b16c236..9f5390f34b 100644
> --- a/src/backend/utils/adt/rangetypes_gist.c
> +++ b/src/backend/utils/adt/rangetypes_gist.c
> @@ -257,8 +257,8 @@ range_gist_penalty(PG_FUNCTION_ARGS)
>      GISTENTRY  *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
>      GISTENTRY  *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
>      float       *penalty = (float *) PG_GETARG_POINTER(2);
> -    RangeType  *orig = DatumGetRangeType(origentry->key);
> -    RangeType  *new = DatumGetRangeType(newentry->key);
> +    RangeType  *origkey = DatumGetRangeType(origentry->key);
> +    RangeType  *newkey = DatumGetRangeType(newentry->key);

I guess you renamed orig just for symmetry? Because I don't think it's a
keyword?


> @@ -663,9 +663,9 @@ RestoreArchive(Archive *AHX)
>          if ((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0)
>          {
>              /* Show namespace if available */
> -            if (te->namespace)
> +            if (te->schema)

Arguable the 'namespace' in the comments should be changed to. But
whatever.

> diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
> index 81f0db8d3c..ac2ebb7eea 100644
> --- a/src/bin/pg_dump/pg_backup_archiver.h
> +++ b/src/bin/pg_dump/pg_backup_archiver.h
> @@ -212,7 +212,7 @@ typedef enum
>  
>  struct _archiveHandle
>  {
> -    Archive        public;            /* Public part of archive */
> +    Archive        archive;        /* Public part of archive */
>      int            version;        /* Version of file */
>  
>      char       *archiveRemoteVersion;    /* When reading an archive, the
> @@ -342,7 +342,7 @@ struct _tocEntry
>      bool        hadDumper;        /* Archiver was passed a dumper routine (used
>                                   * in restore) */
>      char       *tag;            /* index tag */
> -    char       *namespace;        /* null or empty string if not in a schema */
> +    char       *schema;            /* null or empty string if not in a schema */
>      char       *tablespace;        /* null if not in a tablespace; empty string
>                                   * means use database default */
>      char       *owner;
> @@ -393,10 +393,10 @@ TocEntry   *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
>  extern bool checkSeek(FILE *fp);

Phew, these are painful due to the widespread impact. Will make
backpatching even less fun. But I don't really see an alternative, so we
better just get it over.


> @@ -265,16 +265,16 @@ SimpleXLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
>           * be done both forward and backward, consider also switching timeline
>           * accordingly.
>           */
> -        while (private->tliIndex < targetNentries - 1 &&
> -               targetHistory[private->tliIndex].end < targetSegEnd)
> -            private->tliIndex++;
> -        while (private->tliIndex > 0 &&
> -               targetHistory[private->tliIndex].begin >= targetSegEnd)
> -            private->tliIndex--;
> +        while (private_data->tliIndex < targetNentries - 1 &&
> +               targetHistory[private_data->tliIndex].end < targetSegEnd)
> +            private_data->tliIndex++;
> +        while (private_data->tliIndex > 0 &&
> +               targetHistory[private_data->tliIndex].begin >= targetSegEnd)
> +            private_data->tliIndex--;

Hm, isn't this causing a bunch of overlong lines? Which'll then be
awkwardly "fixed" by pgindent?  Maybe just go for "priv"?




> diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
> index 3421eed74f..2193a95f71 100644
> --- a/src/pl/plpgsql/src/plpgsql.h
> +++ b/src/pl/plpgsql/src/plpgsql.h
> @@ -1078,7 +1078,7 @@ extern PLpgSQL_rec *plpgsql_build_record(const char *refname, int lineno,
>  extern int plpgsql_recognize_err_condition(const char *condname,
>                                  bool allow_sqlstate);
>  extern PLpgSQL_condition *plpgsql_parse_err_condition(char *condname);
> -extern void plpgsql_adddatum(PLpgSQL_datum *new);
> +extern void plpgsql_adddatum(PLpgSQL_datum *newdatum);
>  extern int    plpgsql_add_initdatums(int **varnos);
>  extern void plpgsql_HashTableInit(void);
>  
> @@ -1104,7 +1104,7 @@ extern Oid plpgsql_exec_get_datum_type(PLpgSQL_execstate *estate,
>                              PLpgSQL_datum *datum);
>  extern void plpgsql_exec_get_datum_type_info(PLpgSQL_execstate *estate,
>                                   PLpgSQL_datum *datum,
> -                                 Oid *typeid, int32 *typmod, Oid *collation);
> +                                 Oid *typid, int32 *typmod, Oid *collation);

Hm, this one (and a bunch of others)  should have been caught earlier,
but cpluspluscheck was too centered in src/include... Admittedly those
are probably more important, given that one is more likely to include
them directly.


Wee, through that one.  That must have been a fun one to write


> From 7c18f989d0dc430059cce70e5f0e634e18705b91 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 08/23] Set up for static asserts in C++
> 
> ---
>  src/include/c.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/src/include/c.h b/src/include/c.h
> index 947bd98067..7f303a4f39 100644
> --- a/src/include/c.h
> +++ b/src/include/c.h
> @@ -744,14 +744,26 @@ typedef NameData *Name;
>   * about a negative width for a struct bit-field.  This will not include a
>   * helpful error message, but it beats not getting an error at all.
>   */
> +#ifdef __cplusplus
> +#if __cpp_static_assert >= 200410
> +#define _Static_assert(condition, errmessage) static_assert(condition, errmessage)
> +#define HAVE__STATIC_ASSERT 1
> +#endif
> +#endif

A comment about why we're doing this seems appropriate.


> From dadbc472a1e06f2b17b758f24827ba431d499cee Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 09/23] Fix function prototypes for C++
> 
> C++ needs to have the right number of arguments in function signatures.

Ok.


> From 454f463597d4732883c163b92f62494a282b9dc8 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 10/23] Fix fmgr_oldstyle for C++
> 
> C++ needs the function pointer to have the right number of arguments, so
> some casts are needed here.

I've a lingering patch to just remove fmgr_oldstyle, that seems
preferrable to this.


> From 47a7faf131bbee1e56f31220253798f955b05d65 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 11/23] Don't define bool in C++
> 
> ---
>  src/test/thread/thread_test.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/test/thread/thread_test.c b/src/test/thread/thread_test.c
> index cb93bcc5ab..7ee0991988 100644
> --- a/src/test/thread/thread_test.c
> +++ b/src/test/thread/thread_test.c
> @@ -24,6 +24,8 @@
>  #include "postgres.h"
>  #else
>  /* From src/include/c.h" */
> +#ifndef __cplusplus
> +
>  #ifndef bool
>  typedef char bool;
>  #endif
> @@ -37,6 +39,8 @@ typedef char bool;
>  #endif
>  #endif
>  
> +#endif
> +

I wonder if we shouldn't also use stdbool.h type booleans if available.


> From 6c22f8b7f692d4fe26f807453a2b97993db4cae0 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Tue, 30 Aug 2016 12:00:00 -0400
> Subject: [PATCH v2 12/23] Change TimeoutId from enum to integer
> 
> RegisterTimeout() treats it like an integer.  C++ doesn't like enums to
> be treated that way.

> -typedef enum TimeoutId
> -{
> +typedef int TimeoutId;
> +
>      /* Predefined timeout reasons */
> -    STARTUP_PACKET_TIMEOUT,
> -    DEADLOCK_TIMEOUT,
> -    LOCK_TIMEOUT,
> -    STATEMENT_TIMEOUT,
> -    STANDBY_DEADLOCK_TIMEOUT,
> -    STANDBY_TIMEOUT,
> -    STANDBY_LOCK_TIMEOUT,
> -    IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
> +const int STARTUP_PACKET_TIMEOUT = 1;
> +const int DEADLOCK_TIMEOUT = 2;
> +const int LOCK_TIMEOUT = 3;
> +const int STATEMENT_TIMEOUT = 4;
> +const int STANDBY_DEADLOCK_TIMEOUT = 5;
> +const int STANDBY_TIMEOUT = 6;
> +const int STANDBY_LOCK_TIMEOUT = 7;
> +const int IDLE_IN_TRANSACTION_SESSION_TIMEOUT = 8;
>      /* First user-definable timeout reason */
> -    USER_TIMEOUT,
> +const int USER_TIMEOUT = 9;
>      /* Maximum number of timeout reasons */
> -    MAX_TIMEOUTS = 16
> -} TimeoutId;
> +const int MAX_TIMEOUTS = 16;

I dislike this one a fair bit. On some stupid compilers you're going to
emit actual symbols with this.  We could just cast in RegisterTimeout()?


(skipping to 23 here)


> From 07dbc4ef099c70af672764bc84e1f7ad9a8c7917 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter_e@gmx.net>
> Date: Wed, 5 Oct 2016 12:00:00 -0400
> Subject: [PATCH v2 23/23] Fix issue with enums and va_arg()

Seems reasonable.


- Andres



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: [HACKERS] Proposal : Parallel Merge Join
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Proposal : Parallel Merge Join