Thread: server-side extension in c++

server-side extension in c++

From
Igor
Date:
Hi All,

Is there an easy way to add c++ files to my simple pgsql module ? My Makefile
is as follows -

===
MODULES = pg_uservars
DATA_built = pg_uservars.sql
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
===

I've got "pg_uservars.c" and "hv.cc" and I'd like to compile hv.cc via g++.
I'm aware of c++ name [de]mangling, just looking if there's a standard way of
using C++ when it comes to pgxs.

--
Best Regards,
Igor Shevchenko

Re: server-side extension in c++

From
Craig Ringer
Date:
Igor wrote:
> Hi All,
>
> Is there an easy way to add c++ files to my simple pgsql module ? My Makefile
> is as follows -
>
> ===
> MODULES = pg_uservars
> DATA_built = pg_uservars.sql
> PGXS := $(shell pg_config --pgxs)
> include $(PGXS)
> ===
>
> I've got "pg_uservars.c" and "hv.cc" and I'd like to compile hv.cc via g++.
> I'm aware of c++ name [de]mangling, just looking if there's a standard way of
> using C++ when it comes to pgxs.

It should "just work". Simply make sure to follow the usual rules for
calling into C++ code from C and vice versa:

- Use extern "C" linkage for all functions that must be accessible by
  dlopen(), and preferably also for any functions that you might take
  a function pointer to and pass to C code

- Never return memory allocated with new that might be free()'d by the
  C code; use malloc()

- Never delete memory that was malloc()'d by the C code; use free()

- Never let an exception propagate into the C code; use a catch-all
  block at the top level of all extern "C" functions

... and probably other things I've missed.
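
As a rough illustration of the first and last rules above, here is a
minimal sketch of a C-callable entry point that hides a C++ function
behind an error-code interface. The names hv_count_vowels and
count_vowels and the error codes are made up for this example; nothing
here comes from a real module or from the Pg headers.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical C++ implementation detail; it may throw.
static std::size_t count_vowels(const std::string &s)
{
    if (s.empty())
        throw std::invalid_argument("empty input");

    const std::string vowels("aeiouAEIOU");
    std::size_t n = 0;
    for (char c : s)
        if (vowels.find(c) != std::string::npos)
            ++n;
    return n;
}

// C-callable entry point: unmangled name, and no exception can escape.
// Returns 0 on success and a nonzero (illustrative) error code otherwise.
extern "C" int hv_count_vowels(const char *input, std::size_t *result)
{
    try
    {
        if (input == nullptr || result == nullptr)
            return 1;               // bad arguments
        *result = count_vowels(input);
        return 0;
    }
    catch (const std::bad_alloc &)
    {
        return 2;                   // out of memory
    }
    catch (...)
    {
        return 3;                   // anything else
    }
}
```

The C side then only ever sees an int return value and a plain
pointer, so it can translate the code into whatever error report it
likes.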

--
Craig Ringer

Re: server-side extension in c++

From
Bruce Momjian
Date:
Craig Ringer wrote:
> Igor wrote:
> > Hi All,
> >
> > Is there an easy way to add c++ files to my simple pgsql module ? My Makefile
> > is as follows -
> >
> > ===
> > MODULES = pg_uservars
> > DATA_built = pg_uservars.sql
> > PGXS := $(shell pg_config --pgxs)
> > include $(PGXS)
> > ===
> >
> > I've got "pg_uservars.c" and "hv.cc" and I'd like to compile hv.cc via g++.
> > I'm aware of c++ name [de]mangling, just looking if there's a standard way of
> > using C++ when it comes to pgxs.
>
> It should "just work". Simply make sure to follow the usual rules for
> calling into C++ code from C and vice versa:
>
> - Use "extern C" linkage for all functions that must be accessible by
>   dlopen(), and preferably also for any functions that you might take
>   a function pointer to and pass to C code
>
> - Never return new()'d memory that might be free()'d by the C code; use
>   malloc()
>
> - Never delete() memory that was malloc()'d by the C code; use free()
>
> - Never let an exception propagate into the C code; use a catch-all
>   block at the top level of all "extern C" functions
>
> ... and probably other things I've missed.

That is great new information.  I have created a new documentation
section called "Using C++ for Extensibility", and listed you as the
author in the CVS commit;  patch attached.  Thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

Index: doc/src/sgml/extend.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v
retrieving revision 1.38
diff -c -c -r1.38 extend.sgml
*** doc/src/sgml/extend.sgml    3 Apr 2010 07:22:53 -0000    1.38
--- doc/src/sgml/extend.sgml    1 Jun 2010 02:29:31 -0000
***************
*** 273,276 ****
--- 273,322 ----
    &xoper;
    &xindex;

+   <sect1 id="extend-how">
+    <title>Using C++ for Extensibility</title>
+
+    <indexterm zone="extend-Cpp">
+     <primary>C++</primary>
+    </indexterm>
+
+    <para>
+     It is possible to use a compiler in C++ mode to build
+     <productname>PostgreSQL</productname> extensions;  you must simply
+     follow the standard methods for dynamically linking to C executables:
+
+     <itemizedlist>
+      <listitem>
+       <para>
+         Use <literal>extern C</> linkage for all functions that must
+         be accessible by <function>dlopen()</>.  This is also necessary
+         for any functions that might be passed as pointers between
+         the backend and C++ code.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Use <function>malloc()</> to allocate any memory that might be
+        freed by the backend C code (don't pass <function>new()</>-allocated
+        memory).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Use <function>free()</> to free memory allocated by the backend
+        C code (do not use <function>delete()</> for such cases).
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Prevent exceptions from propagating into the C code (use a
+        catch-all block at the top level of all <literal>extern C</>
+        functions).
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+
+   </sect1>
+
   </chapter>

Re: server-side extension in c++

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> That is great new information.  I have created a new documentation
> section called "Using C++ for Extensibility", and listed you as the
> author in the CVS commit;  patch attached.  Thanks.

Too bad two out of the four pieces of advice are wrong (how many pieces
of memory managed by the backend are allocated directly with malloc?).
The other two are not wrong as far as they go, but they're certainly
woefully inadequate, because no interesting backend extension is going
to be able to get along without calling back into the core code.

Personally I would reduce this section to

    <para>
     Don't.
    </para>

I don't think it is worth our time to try to support people who run into
the inevitable memory management and error handling incompatibilities.
Nor are they likely to be happy at the end of the experience, if we
blithely tell them up front that it'll work.

            regards, tom lane

Re: server-side extension in c++

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > That is great new information.  I have created a new documentation
> > section called "Using C++ for Extensibility", and listed you as the
> > author in the CVS commit;  patch attached.  Thanks.
>
> Too bad two out of the four pieces of advice are wrong (how many pieces
> of memory managed by the backend are allocated directly with malloc?).
> The other two are not wrong as far as they go, but they're certainly
> woefully inadequate, because no interesting backend extension is going
> to be able to get along without calling back into the core code.

Good point.  I assumed others would chime in to improve this.

> Personally I would reduce this section to
>
>     <para>
>      Don't.
>     </para>
>
> I don't think it is worth our time to try to support people who run into
> the inevitable memory management and error handling incompatibilities.
> Nor are they likely to be happy at the end of the experience, if we
> blithely tell them up front that it'll work.

Well, I would have avoided this mine-trap except we have this 9.0
release note item:

       Allow use of <productname>C++</> functions in backend code (Kurt
       Harriman, Peter Eisentraut)

I figure if we don't provide some guidance, things will be even worse.

I have updated the docs to mention palloc/pfree instead;  applied patch
attached.

Index: doc/src/sgml/extend.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v
retrieving revision 1.40
diff -c -c -r1.40 extend.sgml
*** doc/src/sgml/extend.sgml    1 Jun 2010 02:35:37 -0000    1.40
--- doc/src/sgml/extend.sgml    1 Jun 2010 02:53:30 -0000
***************
*** 296,309 ****
       </listitem>
       <listitem>
        <para>
!        Use <function>malloc()</> to allocate any memory that might be
         freed by the backend C code (don't pass <function>new()</>-allocated
         memory).
        </para>
       </listitem>
       <listitem>
        <para>
!        Use <function>free()</> to free memory allocated by the backend
         C code (do not use <function>delete()</> for such cases).
        </para>
       </listitem>
--- 296,309 ----
       </listitem>
       <listitem>
        <para>
!        Use <function>palloc()</> to allocate any memory that might be
         freed by the backend C code (don't pass <function>new()</>-allocated
         memory).
        </para>
       </listitem>
       <listitem>
        <para>
!        Use <function>pfree()</> to free memory allocated by the backend
         C code (do not use <function>delete()</> for such cases).
        </para>
       </listitem>

Re: server-side extension in c++

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Personally I would reduce this section to
>>    Don't.

> Well, I would have avoided this mine-trap except we have this 9.0
> release note item:
>        Allow use of <productname>C++</> functions in backend code (Kurt
>        Harriman, Peter Eisentraut)

I'd be interested to see a section like this written by someone who'd
actually done a nontrivial C++ extension and lived to tell the tale.
As is, this is so incomplete that my opinion is it's worse than useless.
It gives people the impression that writing an extension in C++ will
be easy.  When they find out it isn't, we'll get the blame.

            regards, tom lane

Re: server-side extension in c++

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> Personally I would reduce this section to
> >>    Don't.
>
> > Well, I would have avoided this mine-trap except we have this 9.0
> > release note item:
> >        Allow use of <productname>C++</> functions in backend code (Kurt
> >        Harriman, Peter Eisentraut)
>
> I'd be interested to see a section like this written by someone who'd
> actually done a nontrivial C++ extension and lived to tell the tale.
> As is, this is so incomplete that my opinion is it's worse than useless.
> It gives people the impression that writing an extension in C++ will
> be easy.  When they find out it isn't, we'll get the blame.

So should I just comment it out and then when someone gets serious we
can use it as a starting point for them?


Re: server-side extension in c++

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
>>> Well, I would have avoided this mine-trap except we have this 9.0
>>> release note item:
>>>    Allow use of <productname>C++</> functions in backend code (Kurt
>>>    Harriman, Peter Eisentraut)

> So should I just comment it out and then when someone gets serious we
> can use it as a starting point for them?

Sure.  While you're at it, tone down the release-note item.  It should
read more like "Take some steps towards allowing use ...", because C++
keywords in the header files surely were not the only stumbling block.

            regards, tom lane

Re: server-side extension in c++

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> >>> Well, I would have avoided this mine-trap except we have this 9.0
> >>> release note item:
> >>>    Allow use of <productname>C++</> functions in backend code (Kurt
> >>>    Harriman, Peter Eisentraut)
>
> > So should I just comment it out and then when someone gets serious we
> > can use it as a starting point for them?
>
> Sure.  While you're at it, tone down the release-note item.  It should
> read more like "Take some steps towards allowing use ...", because C++
> keywords in the header files surely were not the only stumbling block.

OK, done with attached, applied patch.

Index: doc/src/sgml/extend.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v
retrieving revision 1.41
diff -c -c -r1.41 extend.sgml
*** doc/src/sgml/extend.sgml    1 Jun 2010 02:54:37 -0000    1.41
--- doc/src/sgml/extend.sgml    1 Jun 2010 03:17:46 -0000
***************
*** 273,278 ****
--- 273,280 ----
    &xoper;
    &xindex;

+ <!-- Use this someday when C++ is easier to use. bjm 2010-05-31
+
    <sect1 id="extend-Cpp">
     <title>Using C++ for Extensibility</title>

***************
*** 318,322 ****
--- 320,325 ----
     </para>

    </sect1>
+ -->

   </chapter>
Index: doc/src/sgml/release-9.0.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/release-9.0.sgml,v
retrieving revision 2.22
diff -c -c -r2.22 release-9.0.sgml
*** doc/src/sgml/release-9.0.sgml    17 May 2010 17:46:13 -0000    2.22
--- doc/src/sgml/release-9.0.sgml    1 Jun 2010 03:17:50 -0000
***************
*** 2519,2532 ****

       <listitem>
        <para>
!        Allow use of <productname>C++</> functions in backend code (Kurt
         Harriman, Peter Eisentraut)
        </para>

        <para>
!        This removes keyword conflicts that previously made <productname>C++</>
!        usage difficult in backend code. <literal>extern "C" { }</> might still
!        be necessary.
        </para>
       </listitem>

--- 2519,2534 ----

       <listitem>
        <para>
!        Simplify use of <productname>C++</> functions in backend code (Kurt
         Harriman, Peter Eisentraut)
        </para>

        <para>
!        While this removes keyword conflicts that previously made
!        <productname>C++</> usage difficult in backend code, there are
!        still other complexities when using <productname>C++</> for backend
!        functions. <literal>extern "C" { }</> is still necessary in
!        some cases.
        </para>
       </listitem>


Re: server-side extension in c++

From
Craig Ringer
Date:
On 01/06/10 10:48, Tom Lane wrote:

> Too bad two out of the four pieces of advice are wrong (how many pieces
> of memory managed by the backend are allocated directly with malloc?).
> The other two are not wrong as far as they go, but they're certainly
> woefully inadequate, because no interesting backend extension is going
> to be able to get along without calling back into the core code.

It's a lot like mixing C++ with Symbian's longjmp-based error handling.
It's possible, just ugly, and requires error-handling boundaries to be
carefully thought out.

Rather than saying "don't mix new/delete and malloc/free" I should've
said "always be sure to release memory with the matching function to
that which allocated it", thus covering palloc too. Not that you
generally need to worry too much about palloc'd memory.
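
To make the "matching function" point concrete, here is a small
sketch, assuming a hypothetical helper hv_upper_copy that hands a
buffer to C code. The buffer comes from malloc(), so free() is the
matching release, and the module also exports its own release function
(hv_free, also hypothetical) so callers never have to guess.

```cpp
#include <cctype>
#include <cstdlib>
#include <cstring>

// Returns a malloc()-allocated upper-cased copy of s, or NULL on failure.
// The caller releases it with hv_free() (which simply calls free()).
extern "C" char *hv_upper_copy(const char *s)
{
    if (s == nullptr)
        return nullptr;

    std::size_t len = std::strlen(s);
    char *out = static_cast<char *>(std::malloc(len + 1));
    if (out == nullptr)
        return nullptr;

    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(s[i])));
    out[len] = '\0';
    return out;
}

// Matching release function for buffers returned by hv_upper_copy().
extern "C" void hv_free(char *p)
{
    std::free(p);
}
```

The same idea applies to palloc'd memory: whoever allocated with
palloc() should see it released with pfree() (or by its memory
context), never with delete or free().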

> Personally I would reduce this section to
>
>     <para>
>     Don't.
>     </para>

Sometimes you need or want to expose capabilities of a C++ library. So
long as you do so with proper encapsulation of the C++ functionality, so
that the only interfaces Pg sees are C, there's really no problem.

> Nor are they likely to be happy at the end of the experience, if we
> blithely tell them up front that it'll work.

I've had no issues using C++ libraries in Pg server-side code. It *does*
work. You just need to be careful where your error-handling and memory
management style boundaries lie.

--
Craig Ringer

Re: server-side extension in c++

From
Craig Ringer
Date:
On 01/06/10 11:05, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> Tom Lane wrote:
>>> Personally I would reduce this section to
>>>     Don't.
>
>> Well, I would have avoided this mine-trap except we have this 9.0
>> release note item:
>>        Allow use of <productname>C++</> functions in backend code (Kurt
>>        Harriman, Peter Eisentraut)
>
> I'd be interested to see a section like this written by someone who'd
> actually done a nontrivial C++ extension and lived to tell the tale.

I can't speak up there - my own C++/Pg backend stuff has been fairly
trivial, and has been where I can maintain a fairly clean separation of
the C++-exposed and the Pg-backend-exposed parts. I was able to keep
things separate enough that my C++ compilation units didn't include the
Pg backend headers; they just exposed a pure C public interface. The Pg
backend-using compilation units were written in C, and talked to the C++
part over its exposed pure C interfaces.

This was very much pain-free, but I certainly wouldn't want to try to
use C++ code tightly intermixed with Pg backend-using code. It'd be a
nightmare.
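
A sketch of that kind of separation, under the assumption of a
made-up key/value store (hv_store and its functions are hypothetical
names, not the actual module): the C++ side exposes only an opaque
handle and plain C functions, and includes no Pg backend headers.

```cpp
#include <map>
#include <string>

// ---- Declarations that would live in a C-compatible header. ----
// (A real header would wrap extern "C" in #ifdef __cplusplus guards.)
extern "C" {
typedef struct hv_store hv_store;   // opaque to C callers

hv_store   *hv_store_create(void);
int         hv_store_set(hv_store *st, const char *key, const char *value);
const char *hv_store_get(const hv_store *st, const char *key);
void        hv_store_destroy(hv_store *st);
}

// ---- C++ implementation, compiled with the C++ compiler only. ----
struct hv_store
{
    std::map<std::string, std::string> vars;
};

extern "C" hv_store *hv_store_create(void)
{
    try { return new hv_store(); }
    catch (...) { return nullptr; }
}

// Returns 0 on success, nonzero on bad arguments or failure.
extern "C" int hv_store_set(hv_store *st, const char *key, const char *value)
{
    if (!st || !key || !value)
        return 1;
    try { st->vars[key] = value; return 0; }
    catch (...) { return 2; }
}

// Returned pointer stays valid until the key is overwritten or the
// store is destroyed; NULL means "not found" or failure.
extern "C" const char *hv_store_get(const hv_store *st, const char *key)
{
    if (!st || !key)
        return nullptr;
    try
    {
        auto it = st->vars.find(key);
        return it == st->vars.end() ? nullptr : it->second.c_str();
    }
    catch (...) { return nullptr; }
}

extern "C" void hv_store_destroy(hv_store *st)
{
    delete st;      // allocated with new in hv_store_create()
}
```

The C compilation unit that includes the Pg headers would call only
these four functions and copy any returned string into palloc'd memory
itself.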

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: server-side extension in c++

From
David Fetter
Date:
On Tue, Jun 01, 2010 at 02:13:02PM +0800, Craig Ringer wrote:
> On 01/06/10 11:05, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> >> Tom Lane wrote:
> >>> Personally I would reduce this section to
> >>>     Don't.
> >
> >> Well, I would have avoided this mine-trap except we have this 9.0
> >> release note item:
> >>        Allow use of <productname>C++</> functions in backend code (Kurt
> >>        Harriman, Peter Eisentraut)
> >
> > I'd be interested to see a section like this written by someone
> > who'd actually done a nontrivial C++ extension and lived to tell
> > the tale.
>
> I can't speak up there - my own C++/Pg backend stuff has been fairly
> trivial, and has been where I can maintain a fairly clean separation
> of the C++-exposed and the Pg-backend-exposed parts. I was able to
> keep things separate enough that my C++ compilation units didn't
> include the Pg backend headers; they just exposed a pure C public
> interface. The Pg backend-using compilation units were written in C,
> and talked to the C++ part over its exposed pure C interfaces.
>
> This was very much pain-free, but I certainly wouldn't want to try
> to use C++ code tightly intermixed with Pg backend-using code. It'd
> be a nightmare.

These two paragraphs, suitably changed to be more like the rest of the
docs, would be a great start for people interested in using C++.

Would some short bits of sample code help?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/

Re: server-side extension in c++

From
Tom Lane
Date:
Craig Ringer <craig@postnewspapers.com.au> writes:
> On 01/06/10 11:05, Tom Lane wrote:
>> I'd be interested to see a section like this written by someone who'd
>> actually done a nontrivial C++ extension and lived to tell the tale.

> I can't speak up there - my own C++/Pg backend stuff has been fairly
> trivial, and has been where I can maintain a fairly clean separation of
> the C++-exposed and the Pg-backend-exposed parts. I was able to keep
> things separate enough that my C++ compilation units didn't include the
> Pg backend headers; they just exposed a pure C public interface. The Pg
> backend-using compilation units were written in C, and talked to the C++
> part over its exposed pure C interfaces.

Yeah, if you can design your code so that C++ never has to call back
into the core backend, that eliminates a large chunk of the pain.
Should we be documenting design ideas like this one?

            regards, tom lane

Re: server-side extension in c++

From
Bruce Momjian
Date:
Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
> > On 01/06/10 11:05, Tom Lane wrote:
> >> I'd be interested to see a section like this written by someone who'd
> >> actually done a nontrivial C++ extension and lived to tell the tale.
>
> > I can't speak up there - my own C++/Pg backend stuff has been fairly
> > trivial, and has been where I can maintain a fairly clean separation of
> > the C++-exposed and the Pg-backend-exposed parts. I was able to keep
> > things separate enough that my C++ compilation units didn't include the
> > Pg backend headers; they just exposed a pure C public interface. The Pg
> > backend-using compilation units were written in C, and talked to the C++
> > part over its exposed pure C interfaces.
>
> Yeah, if you can design your code so that C++ never has to call back
> into the core backend, that eliminates a large chunk of the pain.
> Should we be documenting design ideas like this one?

I have incorporated the new ideas into the C++ documentation section,
and removed the comment block in the attached patch.

Index: doc/src/sgml/extend.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v
retrieving revision 1.42
diff -c -c -r1.42 extend.sgml
*** doc/src/sgml/extend.sgml    1 Jun 2010 03:19:36 -0000    1.42
--- doc/src/sgml/extend.sgml    2 Jun 2010 01:20:23 -0000
***************
*** 273,280 ****
    &xoper;
    &xindex;

- <!-- Use this someday when C++ is easier to use. bjm 2010-05-31
-
    <sect1 id="extend-Cpp">
     <title>Using C++ for Extensibility</title>

--- 273,278 ----
***************
*** 284,313 ****

     <para>
      It is possible to use a compiler in C++ mode to build
!     <productname>PostgreSQL</productname> extensions;  you must simply
!     follow the standard methods for dynamically linking to C executables:

      <itemizedlist>
       <listitem>
        <para>
!         Use <literal>extern C</> linkage for all functions that must
!         be accessible by <function>dlopen()</>.  This is also necessary
!         for any functions that might be passed as pointers between
!         the backend and C++ code.
!       </para>
!      </listitem>
!      <listitem>
!       <para>
!        Use <function>palloc()</> to allocate any memory that might be
!        freed by the backend C code (don't pass <function>new()</>-allocated
!        memory).
        </para>
       </listitem>
       <listitem>
        <para>
!        Use <function>pfree()</> to free memory allocated by the backend
!        C code (do not use <function>delete()</> for such cases).
!       </para>
       </listitem>
       <listitem>
        <para>
--- 282,307 ----

     <para>
      It is possible to use a compiler in C++ mode to build
!     <productname>PostgreSQL</productname> extensions by following these
!     guidelines:

      <itemizedlist>
       <listitem>
        <para>
!         All functions accessed by the backend must present a C interface
!         to the backend;  these C functions can then call C++ functions.
!         For example, <literal>extern C</> linkage is required for
!         backend-accessed functions.  This is also necessary for any
!         functions that are passed as pointers between the backend and
!         C++ code.
        </para>
       </listitem>
       <listitem>
        <para>
!        Free memory using the appropriate deallocation method.  For example,
!        most backend memory is allocated using <function>palloc()</>, so use
!        <function>pfree()</> to free it, i.e. using C++
!        <function>delete()</> in such cases will fail.
       </listitem>
       <listitem>
        <para>
***************
*** 320,325 ****
     </para>

    </sect1>
- -->

   </chapter>
--- 314,318 ----

Re: server-side extension in c++

From
Craig Ringer
Date:
On 02/06/10 09:23, Bruce Momjian wrote:
> Tom Lane wrote:
>> Craig Ringer <craig@postnewspapers.com.au> writes:
>>> On 01/06/10 11:05, Tom Lane wrote:
>>>> I'd be interested to see a section like this written by someone who'd
>>>> actually done a nontrivial C++ extension and lived to tell the tale.
>>
>>> I can't speak up there - my own C++/Pg backend stuff has been fairly
>>> trivial, and has been where I can maintain a fairly clean separation of
>>> the C++-exposed and the Pg-backend-exposed parts. I was able to keep
>>> things separate enough that my C++ compilation units didn't include the
>>> Pg backend headers; they just exposed a pure C public interface. The Pg
>>> backend-using compilation units were written in C, and talked to the C++
>>> part over its exposed pure C interfaces.
>>
>> Yeah, if you can design your code so that C++ never has to call back
>> into the core backend, that eliminates a large chunk of the pain.
>> Should we be documenting design ideas like this one?
>
> I have incorporated the new ideas into the C++ documentation section,
> and removed the comment block in the attached patch.

If you're going to include that much, I'd still really want to warn
people about exception/error handling too. It's important. I made brief
mention of it before, but perhaps some more detail would help if people
really want to do this.

( BTW, all in all, I agree with Tom Lane - the best answer is "don't".
Sometimes you need to access functionality from C++ libraries, but
unless that's your reason I wouldn't ever consider doing it. )

Here's a rough outline of the rules I follow when mixing C/C++ code,
plus some info on the longjmp error handling related complexities added
by Pg:



Letting an exception thrown from C++ code cross into C code will be
EXTREMELY ugly. The C++-to-C boundaries *must* have unconditional catch
blocks to convert thrown exceptions into appropriate error codes, even
if the C++ code in question never knowingly throws an exception. C++ may
throw std::bad_alloc on failure of operator new(), among other things,
so the user must _always_ have an unconditional catch. Letting an
exception propagate out to the C-based Pg backend is rather likely to
result in a backend crash.

If the C++ libraries you are using will put up with it, compile your C++
code with -fno-exceptions to make your life much, much easier, as you
can avoid worrying about this entirely. OTOH, you must then check for
NULL return from operator new().
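
One caveat on the NULL check: the form of new that is specified to
return a null pointer on failure is the nothrow form. Whether plain
new can ever return NULL when exceptions are disabled depends on the
compiler and runtime, so the sketch below does not rely on it. The
helper names hv_alloc_ints and hv_free_ints are made up.

```cpp
#include <cstddef>
#include <new>

// Allocates a zero-initialized array of n ints.
// Returns NULL if n is 0, if n is too large, or if allocation fails.
extern "C" int *hv_alloc_ints(std::size_t n)
{
    const std::size_t max_n = static_cast<std::size_t>(-1) / sizeof(int);
    if (n == 0 || n > max_n)
        return nullptr;

    // The nothrow form reports failure by returning a null pointer
    // instead of throwing std::bad_alloc.
    return new (std::nothrow) int[n]();
}

// Matching release function for arrays from hv_alloc_ints().
extern "C" void hv_free_ints(int *p)
{
    delete[] p;
}
```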

If you can't do that: My usual rule is that any extern "C" function
*must* have an unconditional catch. I also require that any function
that may be passed as a function pointer to C code must be extern "C"
and thus must obey the previous rule, so that covers function pointers
and dlopen()ed access to functions.




Similarly, calling Pg code that may use Pg's error handling from within
C++ is unsafe. It should be OK if you know for absolute certain that the
C++ call tree in question only has plain-old-data (POD) structs and
simple variables on the stack, but even then it requires caution. C++
code that uses Pg calls can't do anything it couldn't do if you were
using 'goto' and labels in each involved function, but additionally has
to worry about returning and passing non-POD objects between functions
in a call chain by value, as a longjmp may result in dtors not being
properly called.

The best way to get around this issue is not to call into the Pg backend
from C++ code at all, instead encapsulating your C++ functionality into
cleanly separated modules with pure C interfaces. If you don't #include
any Pg backend headers into any compilation units compiled with the C++
compiler, that should do the trick.

If you must mix Pg calls and C++, restrict your C++ objects to the heap
(i.e. use pointers to them, managed with new and delete) and limit your
stack to POD variables (simple structs and built-in types). Note that
this means you can't use std::auto_ptr, std::tr1::shared_ptr, RAII lock
management, etc. in C++ code that may call into the Pg backend.
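
A minimal sketch of that last rule, using plain setjmp/longjmp as a
stand-in for Pg's error handling. Nothing here uses the real backend
API: backend_call, error_env, Tracker and hv_do_work are all
hypothetical names. The only thing on the stack across the jump is a
raw pointer, and the heap object is deleted by hand on both paths.

```cpp
#include <csetjmp>
#include <string>

static std::jmp_buf error_env;  // stand-in for the backend's error context
static int live_objects = 0;    // counts Tracker objects currently alive

// Stand-in for a backend call that reports an error via longjmp.
static void backend_call(int fail)
{
    if (fail)
        std::longjmp(error_env, 1);
}

// A non-POD type: it must not sit on the stack across the longjmp.
struct Tracker
{
    std::string name;
    Tracker() : name("work") { ++live_objects; }
    ~Tracker() { --live_objects; }
};

// Returns 0 on success, 1 if the "backend" raised an error,
// 2 if the object could not be created.
extern "C" int hv_do_work(int fail)
{
    // Raw pointer only; volatile keeps its value well-defined after
    // control returns through setjmp.
    Tracker *volatile t = nullptr;
    try { t = new Tracker(); }
    catch (...) { return 2; }

    if (setjmp(error_env) != 0)
    {
        // Arrived here via longjmp: no destructor ran, so clean up by hand.
        delete t;
        return 1;
    }

    backend_call(fail);

    delete t;
    return 0;
}
```

Had Tracker been a local variable instead, the jump would have skipped
its destructor, which is exactly the problem described above.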



--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: server-side extension in c++

From
Bruce Momjian
Date:
Craig Ringer wrote:
> ( BTW, all in all, I agree with Tom Lane - the best answer is "don't".
> Sometimes you need to access functionality from C++ libraries, but
> unless that's your reason I wouldn't ever consider doing it. )
>
> Here's a rough outline of the rules I follow when mixing C/C++ code,
> plus some info on the longjmp error handling related complexities added
> by Pg:

This was very helpful.  I have condensed your ideas into the attached
patch that contains the potential C++ documentation section.

Index: doc/src/sgml/extend.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v
retrieving revision 1.42
diff -c -c -r1.42 extend.sgml
*** doc/src/sgml/extend.sgml    1 Jun 2010 03:19:36 -0000    1.42
--- doc/src/sgml/extend.sgml    2 Jun 2010 03:25:30 -0000
***************
*** 273,280 ****
    &xoper;
    &xindex;

- <!-- Use this someday when C++ is easier to use. bjm 2010-05-31
-
    <sect1 id="extend-Cpp">
     <title>Using C++ for Extensibility</title>

--- 273,278 ----
***************
*** 284,325 ****

     <para>
      It is possible to use a compiler in C++ mode to build
!     <productname>PostgreSQL</productname> extensions;  you must simply
!     follow the standard methods for dynamically linking to C executables:

      <itemizedlist>
       <listitem>
        <para>
!         Use <literal>extern C</> linkage for all functions that must
!         be accessible by <function>dlopen()</>.  This is also necessary
!         for any functions that might be passed as pointers between
!         the backend and C++ code.
        </para>
       </listitem>
       <listitem>
        <para>
!        Use <function>palloc()</> to allocate any memory that might be
!        freed by the backend C code (don't pass <function>new()</>-allocated
!        memory).
!       </para>
       </listitem>
       <listitem>
        <para>
!        Use <function>pfree()</> to free memory allocated by the backend
!        C code (do not use <function>delete()</> for such cases).
        </para>
       </listitem>
       <listitem>
        <para>
!        Prevent exceptions from propagating into the C code (use a
!        catch-all block at the top level of all <literal>extern C</>
!        functions).
        </para>
       </listitem>
      </itemizedlist>
     </para>

    </sect1>
- -->

   </chapter>
--- 282,338 ----

     <para>
      It is possible to use a compiler in C++ mode to build
!     <productname>PostgreSQL</productname> extensions by following these
!     guidelines:

      <itemizedlist>
       <listitem>
        <para>
!         All functions accessed by the backend must present a C interface
!         to the backend;  these C functions can then call C++ functions.
!         For example, <literal>extern C</> linkage is required for
!         backend-accessed functions.  This is also necessary for any
!         functions that are passed as pointers between the backend and
!         C++ code.
        </para>
       </listitem>
       <listitem>
        <para>
!        Free memory using the appropriate deallocation method.  For example,
!        most backend memory is allocated using <function>palloc()</>, so use
!        <function>pfree()</> to free it, i.e. using C++
!        <function>delete()</> in such cases will fail.
       </listitem>
       <listitem>
        <para>
!        Prevent exceptions from propagating into the C code (use a
!        catch-all block at the top level of all <literal>extern C</>
!        functions).  This is necessary even if the C++ code does not
!        throw any exceptions because events like out-of-memory still
!        throw exceptions.  Any exceptions must be caught and appropriate
!        errors passed back to the C interface.  If possible, compile C++
!        with <option>-fno-exceptions</> to eliminate exceptions entirely;
!        in such cases, you must check for failures in your C++ code, e.g.
!        check for NULL returned by <function>new()</>.
        </para>
       </listitem>
       <listitem>
        <para>
!        If calling backend functions from C++ code, be sure that the
!        C++ call stack contains only plain old data structure
!        (<acronym>POD</>).  This is necessary because backend errors
!        generate a <function>longjmp()</> that does not properly unroll
!        a C++ call stack with non-POD objects.
        </para>
       </listitem>
      </itemizedlist>
     </para>

+    <para>
+     In summary, it is best to place C++ code behind a wall of
+     <literal>extern C</> functions that interface to the backend,
+     and avoid exception, memory, and call stack leakage.
+    </para>
    </sect1>

   </chapter>

Re: server-side extension in c++

From
Peter Geoghegan
Date:
> Letting an exception thrown from C++ code cross into C code will be
> EXTREMELY ugly. The C++-to-C boundaries *must* have unconditional catch
> blocks to convert thrown exceptions into appropriate error codes, even
> if the C++ code in question never knowingly throws an exception. C++ may
> throw std::bad_alloc on failure of operator new(), among other things,
> so the user must _always_ have an unconditional catch. Letting an
> exception propagate out to the C-based Pg backend is rather likely to
> result in a backend crash.

Right, but I don't think that this differs from the general C++ case.
Allowing exceptions to propagate across module boundaries was always a
bad idea, as was managing memory across module boundaries. Aside from
being very messy, they had to have exactly compatible runtimes and so
on.

> If the C++ libraries you are using will put up with it, compile your C++
> code with -fno-exceptions to make your life much, much easier, as you
> can avoid worrying about this entirely. OTOH, you must then check for
> NULL return from operator new().

That's the pre-standard behaviour of operator new(), and operator
new() continues to behave that way on some platforms, typically
embedded systems.

> If you can't do that: My usual rule is that any "extern C" function
> *must* have an unconditional catch. I also require that any function
> that may be passed as a function pointer to C code must be "extern C"
> and thus must obey the previous rule, so that covers function pointers
> and dlopen()ed access to functions.
>

Seems reasonable, and not overly difficult.
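
A minimal sketch of that rule, assuming a made-up C-style driver
(c_for_each) that stands in for any C code taking a function pointer.
None of these names come from Pg; they are only for illustration. The
callback is declared extern "C" and turns every exception into a plain
status code before control returns to the C caller:

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

extern "C" {
// Hypothetical C-side callback type: 0 means success, non-zero is an error.
typedef int (*row_callback)(int value, void *state);

// Stand-in for C code that calls back through a function pointer.
static int c_for_each(const int *values, std::size_t n,
                      row_callback cb, void *state)
{
    for (std::size_t i = 0; i < n; ++i) {
        int rc = cb(values[i], state);
        if (rc != 0)
            return rc;          // stop at the first reported error
    }
    return 0;
}
}

// C++ callback handed to the C driver. Nothing may escape it as an
// exception, so the whole body sits inside try/catch with a catch-all.
extern "C" int collect_positive(int value, void *state)
{
    try {
        std::vector<int> *out = static_cast<std::vector<int> *>(state);
        if (value < 0)
            throw std::invalid_argument("negative value");
        out->push_back(value);  // may throw std::bad_alloc
        return 0;
    } catch (const std::bad_alloc &) {
        return 2;               // out of memory
    } catch (const std::exception &) {
        return 1;               // any other standard exception
    } catch (...) {
        return 3;               // something not derived from std::exception
    }
}
```

The status codes are arbitrary; the point is only that the C caller sees
an int and never an in-flight exception.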

>
> Similarly, calling Pg code that may use Pg's error handling from within
> C++ is unsafe. It should be OK if you know for absolute certain that the
> C++ call tree in question only has plain-old-data (POD) structs and
> simple variables on the stack, but even then it requires caution. C++
> code that uses Pg calls can't do anything it couldn't do if you were
> using 'goto' and labels in each involved function, but additionally has
> to worry about returning and passing non-POD objects between functions
> in a call chain by value, as a longjmp may result in dtors not being
> properly called.


Really? That seems like an *incredibly* arduous requirement.
Intuitively, I find it difficult to believe. After all, even though
using longjmp in C++ code is a fast track to undefined behaviour, I
would have imagined that doing so in an isolated C module with a well
defined interface, called from C++ would be safe. I would have
imagined that ultimately, the call to the Pg C function must return,
and therefore cannot affect stack unwinding within the C++ part of the
program.

To invoke a reductio ad absurdum argument, if this were the case,
calling C functions from C++ would be widely considered a dangerous
thing to do, which it is not. After all, setjmp()/longjmp() are part
of the C standard library...in general, it's difficult to know whether
or not a third party module may use them. Have you ever seen a C
library marked as C++ safe or C++ unsafe? No, me neither.

Perhaps I'm missing something though...does the error handling portion
of the pg code potentially need a hook into the C++ code, from where
the longjmp() must be performed? I don't know what you mean by "a call
chain by value".

The bottom line is that *I think* you're fine as long as you don't do
setjmp()/longjmp() from within C++. You may even be okay if you just
do setjmp() from within C++. The longjmp() will hopefully only affect
stack unwinding before we get down to the C++ part of the stack, where
that matters.

--
Regards,
Peter Geoghegan

Re: server-side extension in c++

From
Craig Ringer
Date:
On 02/06/10 19:17, Peter Geoghegan wrote:

>> Similarly, calling Pg code that may use Pg's error handling from within
>> C++ is unsafe. It should be OK if you know for absolute certain that the
>> C++ call tree in question only has plain-old-data (POD) structs and
>> simple variables on the stack, but even then it requires caution. C++
>> code that uses Pg calls can't do anything it couldn't do if you were
>> using 'goto' and labels in each involved function, but additionally has
>> to worry about returning and passing non-POD objects between functions
>> in a call chain by value, as a longjmp may result in dtors not being
>> properly called.
>
>
> Really? That seems like an *incredibly* arduous requirement.
> Intuitively, I find it difficult to believe. After all, even though
> using longjmp in C++ code is a fast track to undefined behaviour, I
> would have imagined that doing so in an isolated C module with a well
> defined interface, called from C++ would be safe.

Not necessarily. It's only safe if setjmp/longjmp calls occur only
within the C code without "breaking" call paths involving C++.

This is ok:

   [ C ]
   entrypoint()
   callIntoCppCode()
   [ C++ ]
   someCalls()
   callIntoCCode()
   [ C ]
   setjmp()
   doSomeStuff()
   longjmp()

This is really, really not:

   [ C ]
   entrypoint()
   setjmp()        <----
   callIntoCppCode()
   [ C++ ]
   someCalls()
   callIntoCCode()
   [ C ]
   doSomeStuff()
   longjmp()

See the attached demo (pop all files in the same directory then run "make").
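
For readers without the attachment, here is a rough sketch of the "ok"
case only (it is not the attached demo, and all names are invented for
illustration). The setjmp() and the longjmp() both sit in the C-style
code below the C++ frame, so the jump never crosses a frame that holds a
non-POD object and the destructor still runs on the normal return:

```cpp
#include <csetjmp>
#include <string>

static int g_destructor_calls = 0;

// A non-POD object whose destructor must run when its scope ends.
struct Tracker {
    std::string name;
    explicit Tracker(const char *n) : name(n) {}
    ~Tracker() { ++g_destructor_calls; }
};

// --- C-style part: the jump starts and ends below the C++ frame ---
static std::jmp_buf g_env;

static void do_some_stuff(int fail)
{
    if (fail)
        std::longjmp(g_env, 1);   // lands in call_into_c_code(), no further
}

extern "C" int call_into_c_code(int fail)
{
    if (setjmp(g_env) != 0)
        return -1;                // error path, reached via longjmp()
    do_some_stuff(fail);
    return 0;
}

// --- C++ part: owns a non-POD local, but no jump ever skips this frame ---
int some_calls(int fail)
{
    Tracker t("local");
    return call_into_c_code(fail);  // always returns normally
}
```

Moving the setjmp() up into a caller of some_calls() would give the
second layout above, where the jump skips Tracker's destructor; that
case is undefined behaviour in C++ and is deliberately not shown.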


> I would have
> imagined that ultimately, the call to the Pg C function must return,
> and therefore cannot affect stack unwinding within the C++ part of the
> program.

That's the whole point; a longjmp breaks the call chain, and the
guarantee that eventually the stack will unwind as functions return.

It's OK if you setjmp(a), do some work, setjmp(b), longjmp(a), do some
work, longjmp(b), return.

My understanding, which is likely imperfect, is that Pg's error handling
does NOT guarantee that, ie it's quite possible that a function may call
longjmp() without preparing any jmp_env to "jump back to" and therefore
will never return.



> To invoke a reductio ad absurdum argument, if this were the case,
> calling C functions from C++ would be widely considered a dangerous
> thing to do, which it is not.

If those C functions use setjmp/longjmp, it *is* a dangerous thing to
do. Most libraries that use setjmp/longjmp in ways that may affect
calling code DO document this, and it's expected that the user of the
library will know what that entails.

If the library uses setjmp/longjmp entirely internally, so that it never

http://stackoverflow.com/questions/1376085/c-safe-to-use-longjmp-and-setjmp

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Attachment

Re: server-side extension in c++

From
David Fetter
Date:
On Wed, Jun 02, 2010 at 10:11:37AM +0800, Craig Ringer wrote:
> On 02/06/10 09:23, Bruce Momjian wrote:
> > Tom Lane wrote:
> >> Craig Ringer <craig@postnewspapers.com.au> writes:
> >>> On 01/06/10 11:05, Tom Lane wrote:
> >>>> I'd be interested to see a section like this written by someone who'd
> >>>> actually done a nontrivial C++ extension and lived to tell the tale.
> >>
> >>> I can't speak up there - my own C++/Pg backend stuff has been fairly
> >>> trivial, and has been where I can maintain a fairly clean separation of
> >>> the C++-exposed and the Pg-backend-exposed parts. I was able to keep
> >>> things separate enough that my C++ compilation units didn't include the
> >>> Pg backend headers; they just exposed a pure C public interface. The Pg
> >>> backend-using compilation units were written in C, and talked to the C++
> >>> part over its exposed pure C interfaces.
> >>
> >> Yeah, if you can design your code so that C++ never has to call back
> >> into the core backend, that eliminates a large chunk of the pain.
> >> Should we be documenting design ideas like this one?
> >
> > I have incorporated the new ideas into the C++ documentation section,
> > and removed the comment block in the attached patch.
>
> If you're going to include that much, I'd still really want to warn
> people about exception/error handling too. It's important. I made brief
> mention of it before, but perhaps some more detail would help if people
> really want to do this.
>
> ( BTW, all in all, I agree with Tom Lane - the best answer is "don't".
> Sometimes you need to access functionality from C++ libraries, but
> unless that's your reason I wouldn't ever consider doing it. )
>
> Here's a rough outline of the rules I follow when mixing C/C++ code,
> plus some info on the longjmp error handling related complexities added
> by Pg:
>
>
>
> Letting an exception thrown from C++ code cross into C code will be
> EXTREMELY ugly. The C++-to-C boundaries *must* have unconditional catch
> blocks to convert thrown exceptions into appropriate error codes, even
> if the C++ code in question never knowingly throws an exception. C++ may
> throw std::bad_alloc on failure of operator new(), among other things,
> so the user must _always_ have an unconditional catch. Letting an
> exception propagate out to the C-based Pg backend is rather likely to
> result in a backend crash.
>
> If the C++ libraries you are using will put up with it, compile your C++
> code with -fno-exceptions to make your life much, much easier, as you
> can avoid worrying about this entirely. OTOH, you must then check for
> NULL return from operator new().
>
> If you can't do that: My usual rule is that any "extern C" function
> *must* have an unconditional catch. I also require that any function
> that may be passed as a function pointer to C code must be "extern C"
> and thus must obey the previous rule, so that covers function pointers
> and dlopen()ed access to functions.
>
>
>
>
> Similarly, calling Pg code that may use Pg's error handling from within
> C++ is unsafe. It should be OK if you know for absolute certain that the
> C++ call tree in question only has plain-old-data (POD) structs and
> simple variables on the stack, but even then it requires caution. C++
> code that uses Pg calls can't do anything it couldn't do if you were
> using 'goto' and labels in each involved function, but additionally has
> to worry about returning and passing non-POD objects between functions
> in a call chain by value, as a longjmp may result in dtors not being
> properly called.
>
> The best way to get around this issue is not to call into the Pg backend
> from C++ code at all, instead encapsulating your C++ functionality into
> cleanly separated modules with pure C interfaces. If you don't #include
> any Pg backend headers into any compilation units compiled with the C++
> compiler, that should do the trick.
>
> If you must mix Pg calls and C++, restrict your C++ objects to the heap
> (ie use pointers to them, managed with new and delete) and limit your
> stack to POD variables (simple structs and built-in types). Note that
> this means you can't use std::auto_ptr, std::tr1::shared_ptr, RAII lock
> management, etc in C++ code that may call into the Pg backend.

Is PostGIS following these guidelines?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: server-side extension in c++

From
Bruce Momjian
Date:
Craig Ringer wrote:
> See the attached demo (pop all files in the same directory then run "make").
>
>
> > I would have
> > imagined that ultimately, the call to the Pg C function must return,
> > and therefore cannot affect stack unwinding within the C++ part of the
> > program.
>
> That's the whole point; a longjmp breaks the call chain, and the
> guarantee that eventually the stack will unwind as functions return.
>
> It's OK if you setjmp(a), do some work, setjmp(b), longjmp(a), do some
> work, longjmp(b), return.
>
> My understanding, which is likely imperfect, is that Pg's error handling
> does NOT guarantee that, ie it's quite possible that a function may call
> longjmp() without preparing any jmp_env to "jump back to" and therefore
> will never return.

You are correct that a longjmp() jumps back to the query entry loop,
hopping over any user-defined C or C++ functions in the call stack, and
you are right that if we were just using longjmp() without unwinding
C++ calls, we would be OK using non-POD structures.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

Re: server-side extension in c++

From
Peter Geoghegan
Date:
On 2 June 2010 13:36, Craig Ringer <craig@postnewspapers.com.au> wrote:
>>
>> Really? That seems like an *incredibly* arduous requirement.
>> Intuitively, I find it difficult to believe. After all, even though
>> using longjmp in C++ code is a fast track to undefined behaviour, I
>> would have imagined that doing so in an isolated C module with a well
>> defined interface, called from C++ would be safe.
>
> Not necessarily. It's only safe if setjmp/longjmp calls occur only
> within the C code without "breaking" call paths involving C++.

It isn't obvious to me that your suggestion that C++ functions that
invoke jumping pg code use only POD types, but manipulate C++ types
through pointers helps much, or at all. RAII/SBRM is just another
memory management strategy (albeit a very effective, intuitive one).
It's basically equivalent to the compiler generating calls to a
constructor when an object is instantiated, and to a destructor when
the object goes out of scope. So, how your concern fundamentally
differs from the general case where we're managing resources (but not
through memory contexts/palloc) explicitly, and risk being cut off
before control flow reaches our (implicit or explicit) destructor call
isn't clear, except perhaps that RAII gives clients what may be a
false sense of security. Sure, one is technically undefined behaviour
while the other isn't, but the end result is probably identical - a
memory leak.


>> I would have
>> imagined that ultimately, the call to the Pg C function must return,
>> and therefore cannot affect stack unwinding within the C++ part of the
>> program.
>
> That's the whole point; a longjmp breaks the call chain, and the
> guarantee that eventually the stack will unwind as functions return.

Yes, but my point was that if that occurs above the C++ code, it will
never be affected by it. We have to longjmp() *over* C++ code before
we have a problem. However, Bruce has answered the question of whether
or not that happens - it does, so I guess it doesn't matter.

Here's a radical idea that has no problems that immediately occur to
me, apart from the two massive problems that you just lost your
ability to reliably manage non-memory resources with RAII, and that
you're now in the realm of undefined behaviour:

Re-implement global operator new() and friends in terms of palloc and
pfree. This sort of thing is often done for C++ application
frameworks.

It makes me queasy that by doing this, we're resorting to undefined
behaviour in terms of the C++ standard (destructors are never called)
as a matter of routine. What do you think? I suppose that such
undefined behaviour is absolutely intolerable. It's not a serious
suggestion, just something that I think is worth pointing out.

--
Regards,
Peter Geoghegan

Re: server-side extension in c++

From
Craig Ringer
Date:
On 2/06/2010 11:49 PM, Peter Geoghegan wrote:
> On 2 June 2010 13:36, Craig Ringer <craig@postnewspapers.com.au> wrote:
>>>
>>> Really? That seems like an *incredibly* arduous requirement.
>>> Intuitively, I find it difficult to believe. After all, even though
>>> using longjmp in C++ code is a fast track to undefined behaviour, I
>>> would have imagined that doing so in an isolated C module with a well
>>> defined interface, called from C++ would be safe.
>>
>> Not necessarily. It's only safe if setjmp/longjmp calls occur only
>> within the C code without "breaking" call paths involving C++.
>
> It isn't obvious to me that your suggestion that C++ functions that
> invoke jumping pg code use only POD types, but manipulate C++ types
> through pointers helps much, or at all. RAII/SBRM is just another
> memory management strategy (albeit a very effective, intuitive one).
> It's basically equivalent to the compiler generating calls to a
> constructor when an object is instantiated, and to a destructor when
> the object goes out of scope.

... and use of longjmp completely breaks scoping rules, but doesn't
inherently violate other program flow expectations.

> So, how your concern fundamentally
> differs from the general case where we're managing resources (but not
> through memory contexts/palloc) explicitly, and risk being cut off
> before control flow reaches our (implicit or explicit) destructor call
> isn't clear, except perhaps that RAII gives clients what may be a
> false sense of security. Sure, one is technically undefined behaviour
> while the other isn't, but the end result is probably identical - a
> memory leak.

Except that Pg, via palloc, offers a way to clean up a whole memory
context. Ensuring you delete your C++ object graph (probably via a few
opaque pointers you pass around in the C code) when a MemoryContext is
deleted isn't hard. palloc's MemoryContextMethods->delete_context
provides just what's required. It's no different to what you do in a
normal extension written in C, except that your deleteMyObject(somePtr)
call happens to be an "extern C" function written in C++ that delete()s
the ptr. No biggie.
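
A rough sketch of the opaque-pointer arrangement, with invented names
(MyObject, createMyObject, addEntry) apart from the deleteMyObject
mentioned above. How the delete function gets hooked up to the memory
context is left out; the sketch only shows the C-visible surface, with
deleteMyObject taking a void * so it could serve as a generic cleanup
callback:

```cpp
#include <map>
#include <string>

// The C side sees only an opaque handle and these three functions.
extern "C" {
typedef struct MyObject MyObject;
MyObject *createMyObject(void);
int addEntry(MyObject *obj, const char *key, int value);
void deleteMyObject(void *ptr);
}

// Full definition, visible to the C++ compilation unit only.
struct MyObject {
    std::map<std::string, int> entries;
};

static int g_live_objects = 0;   // bookkeeping for the example only

extern "C" MyObject *createMyObject(void)
{
    try {
        MyObject *obj = new MyObject();
        ++g_live_objects;
        return obj;
    } catch (...) {
        return nullptr;           // report failure as NULL, never throw
    }
}

extern "C" int addEntry(MyObject *obj, const char *key, int value)
{
    try {
        obj->entries[key] = value;
        return 0;
    } catch (...) {
        return -1;
    }
}

// Called from C when the owning context goes away; runs the destructor.
extern "C" void deleteMyObject(void *ptr)
{
    MyObject *obj = static_cast<MyObject *>(ptr);
    if (obj != nullptr) {
        delete obj;
        --g_live_objects;
    }
}
```

The C code passes the handle around as a plain pointer and never needs
to know what is behind it.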

You can't do that if you're relying on smart pointers, refcounting,
std::auto_ptr, etc because they're broken by longjmp, dtors won't get
called when they should, you'll think objects are still referenced when
they aren't, and things generally fail.

It's even worse if you're relying on stack-based objects with dtors for
lock management or the like.

> Yes, but my point was that if that occurs above the C++ code, it will
> never be affected by it. We have to longjmp() *over* C++ code before
> we have a problem.

Sure, as per the example I posted.

> Re-implement global operator new() and friends in terms of palloc and
> pfree. This sort of thing is often done for C++ application
> frameworks.

... and regularly causes headaches :S

> It makes me queasy that by doing this, we're resorting to undefined
> behaviour in terms of the C++ standard (destructors are never called)
> as a matter of routine.

Well, if it was done. I really, really wouldn't want to do it for just
those reasons - I've never liked placement new, overriding operator
new(), etc for those reasons.

It's not too tricky to just free your C++ object graph when a
MemoryContext goes out of scope, as MemoryContexts have their own
dtor-equivalents that're reliably called by Pg irrespective of
setjmp/longjmp-based program flow. Why make it more complicated than it
has to be? This way your dtors get called reliably at destruction.

That said, if I was to do that in code I was writing, I'd build a pool
allocator based on a memory context that handed out palloc'd chunks...
and I'd just give up on destructors for those objects.

http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.10
http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.14
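
Roughly what I have in mind, as a sketch only. Pool here is a made-up
stand-in for a memory context (one malloc'd slab that is freed in one
go), and none of these names are Pg APIs. Objects are built with
placement new and no destructor is ever run, so the sketch restricts
itself to a trivially destructible type:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical stand-in for a memory context: one slab, released at once.
struct Pool {
    unsigned char *base;
    std::size_t    size;
    std::size_t    used;
};

static bool pool_init(Pool *p, std::size_t size)
{
    p->base = static_cast<unsigned char *>(std::malloc(size));
    p->size = p->base ? size : 0;
    p->used = 0;
    return p->base != nullptr;
}

// Hands out suitably aligned chunks; returns NULL when the slab is full.
static void *pool_alloc(Pool *p, std::size_t n)
{
    const std::size_t align = alignof(std::max_align_t);
    std::size_t start = (p->used + align - 1) / align * align;
    if (start > p->size || n > p->size - start)
        return nullptr;
    p->used = start + n;
    return p->base + start;
}

// Frees every object in the pool at once, without calling destructors.
static void pool_destroy(Pool *p)
{
    std::free(p->base);
    p->base = nullptr;
    p->size = p->used = 0;
}

struct Point {
    double x, y;
    Point(double x_, double y_) : x(x_), y(y_) {}
};

static Point *make_point(Pool *p, double x, double y)
{
    // Nothing will ever run ~Point(), so insist that it has nothing to do.
    static_assert(std::is_trivially_destructible<Point>::value,
                  "pool objects must not need destructors");
    void *mem = pool_alloc(p, sizeof(Point));
    return mem ? new (mem) Point(x, y) : nullptr;   // placement new
}
```

A class that owns anything (a std::string member, say) would trip the
static_assert, which is the "give up on destructors" trade-off made
explicit.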

> What do you think? I suppose that such
> undefined behaviour is absolutely intolerable. It's not a serious
> suggestion, just something that I think is worth pointing out.

That stuff is cool, but rarely worth the complexity because it breaks
pretty basic assumptions about how things work. I prefer to just keep my
C and C++ code cleanly separated where possible, and stick to a very
simple subset of C++ where I can't keep them separate.

--
Craig Ringer

Re: server-side extension in c++

From
Mark Cave-Ayland
Date:
David Fetter wrote:

> Is PostGIS following these guidelines?

In short, no. Due to various problems in the early days with C++
exceptions generated by the GEOS library causing problems in C (and also
ABI changes forcing a recompile of any GEOS linked library), a thin
intermediate C++ layer called libgeos_c was added to GEOS.

For each public C++ function, libgeos_c declares a similarly-named
wrapper with extern "C" that just executes the underlying C++ function.
If an underlying error such as an exception occurs, the libgeos_c
wrapper returns false, and a simple handler allows the C caller to
retrieve the related error string.
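
As an illustration of the pattern only (this is not the actual libgeos_c
source, and demo_ratio, demo_last_error and checked_ratio are invented
names), a wrapper of that general shape might look like the following.
The error text goes into a fixed buffer so that the handler itself does
not need to allocate:

```cpp
#include <cstring>
#include <exception>
#include <stdexcept>

// Hypothetical C++ function standing in for a library call that may throw.
static double checked_ratio(double num, double den)
{
    if (den == 0.0)
        throw std::domain_error("division by zero");
    return num / den;
}

// Last error message, kept in a fixed buffer (no allocation on failure).
static char g_last_error[256] = "";

static void remember_error(const char *msg) noexcept
{
    std::strncpy(g_last_error, msg, sizeof(g_last_error) - 1);
    g_last_error[sizeof(g_last_error) - 1] = '\0';
}

// C-callable wrapper: returns 1 on success, 0 if anything was thrown.
extern "C" int demo_ratio(double num, double den, double *result)
{
    try {
        *result = checked_ratio(num, den);
        return 1;
    } catch (const std::exception &e) {
        remember_error(e.what());
    } catch (...) {
        remember_error("unknown C++ exception");
    }
    return 0;
}

// Simple handler the C caller uses to fetch the related error string.
extern "C" const char *demo_last_error(void)
{
    return g_last_error;
}
```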

While it does seem quite inelegant, I don't believe any problems linking
between C/C++ have been reported on any compiler/platform since this was
put into place.


HTH,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

Re: server-side extension in c++

From
David Fetter
Date:
On Wed, Jun 02, 2010 at 05:41:10PM +0100, Mark Cave-Ayland wrote:
> David Fetter wrote:
>
> >Is PostGIS following these guidelines?
>
> In short, no. Due to various problems in the early days with C++
> exceptions generated by the GEOS library causing problems in C (and
> also ABI changes forcing a recompile of any GEOS linked library), a
> thin intermediate C++ layer called libgeos_c was added to GEOS.
>
> For each public C++ function, libgeos_c declares a similarly-named
> wrapper with extern "C" that just executes the underlying C++
> function. If an underlying error such as an exception occurs, the
> libgeos_c wrapper returns false, and a simple handler allows the C
> caller to retrieve the related error string.
>
> While it does seem quite inelegant, I don't believe any problems
> linking between C/C++ have been reported on any compiler/platform
> since this was put into place.

It's good to have actual working code in production to bolster the
case that the design is sound.

How much work would it be to refactor libgeos_c to use a catch-all
exception handler?

Cheers,
David.

Re: server-side extension in c++

From
Bruce Momjian
Date:
Peter Geoghegan wrote:
> >> I would have
> >> imagined that ultimately, the call to the Pg C function must return,
> >> and therefore cannot affect stack unwinding within the C++ part of the
> >> program.
> >
> > That's the whole point; a longjmp breaks the call chain, and the
> > guarantee that eventually the stack will unwind as functions return.
>
> Yes, but my point was that if that occurs above the C++ code, it will
> never be affected by it. We have to longjmp() *over* C++ code before
> we have a problem. However, Bruce has answered the question of whether
> or not that happens - it does, so I guess it doesn't matter.

Yes.  I have updated the C++ doc patch to call it a "distant"
longjmp().

Index: doc/src/sgml/extend.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/extend.sgml,v
retrieving revision 1.42
diff -c -c -r1.42 extend.sgml
*** doc/src/sgml/extend.sgml    1 Jun 2010 03:19:36 -0000    1.42
--- doc/src/sgml/extend.sgml    2 Jun 2010 17:34:36 -0000
***************
*** 273,280 ****
    &xoper;
    &xindex;

- <!-- Use this someday when C++ is easier to use. bjm 2010-05-31
-
    <sect1 id="extend-Cpp">
     <title>Using C++ for Extensibility</title>

--- 273,278 ----
***************
*** 284,325 ****

     <para>
      It is possible to use a compiler in C++ mode to build
!     <productname>PostgreSQL</productname> extensions;  you must simply
!     follow the standard methods for dynamically linking to C executables:

      <itemizedlist>
       <listitem>
        <para>
!         Use <literal>extern C</> linkage for all functions that must
!         be accessible by <function>dlopen()</>.  This is also necessary
!         for any functions that might be passed as pointers between
!         the backend and C++ code.
        </para>
       </listitem>
       <listitem>
        <para>
!        Use <function>palloc()</> to allocate any memory that might be
!        freed by the backend C code (don't pass <function>new()</>-allocated
!        memory).
!       </para>
       </listitem>
       <listitem>
        <para>
!        Use <function>pfree()</> to free memory allocated by the backend
!        C code (do not use <function>delete()</> for such cases).
        </para>
       </listitem>
       <listitem>
        <para>
!        Prevent exceptions from propagating into the C code (use a
!        catch-all block at the top level of all <literal>extern C</>
!        functions).
        </para>
       </listitem>
      </itemizedlist>
     </para>

    </sect1>
- -->

   </chapter>
--- 282,338 ----

     <para>
      It is possible to use a compiler in C++ mode to build
!     <productname>PostgreSQL</productname> extensions by following these
!     guidelines:

      <itemizedlist>
       <listitem>
        <para>
!         All functions accessed by the backend must present a C interface
!         to the backend;  these C functions can then call C++ functions.
!         For example, <literal>extern C</> linkage is required for
!         backend-accessed functions.  This is also necessary for any
!         functions that are passed as pointers between the backend and
!         C++ code.
        </para>
       </listitem>
       <listitem>
        <para>
!        Free memory using the appropriate deallocation method.  For example,
!        most backend memory is allocated using <function>palloc()</>, so use
!        <function>pfree()</> to free it, i.e. using C++
!        <function>delete()</> in such cases will fail.
       </listitem>
       <listitem>
        <para>
!        Prevent exceptions from propagating into the C code (use a
!        catch-all block at the top level of all <literal>extern C</>
!        functions).  This is necessary even if the C++ code does not
!        throw any exceptions because events like out-of-memory still
!        throw exceptions.  Any exceptions must be caught and appropriate
!        errors passed back to the C interface.  If possible, compile C++
!        with <option>-fno-exceptions</> to eliminate exceptions entirely;
!        in such cases, you must check for failures in your C++ code, e.g.
!        check for NULL returned by <function>new()</>.
        </para>
       </listitem>
       <listitem>
        <para>
!        If calling backend functions from C++ code, be sure that the
!        C++ call stack contains only plain old data structure
!        (<acronym>POD</>).  This is necessary because backend errors
!        generate a distant <function>longjmp()</> that does not properly
!        unroll a C++ call stack with non-POD objects.
        </para>
       </listitem>
      </itemizedlist>
     </para>

+    <para>
+     In summary, it is best to place C++ code behind a wall of
+     <literal>extern C</> functions that interface to the backend,
+     and avoid exception, memory, and call stack leakage.
+    </para>
    </sect1>

   </chapter>

Re: server-side extension in c++

From
Peter Geoghegan
Date:
Hi Mark. You'll recall that I talked with you for quite a while at
Pg-day 2009 in Paris. Nice of you to chime in here.


> Except that Pg, via palloc, offers a way to clean up a whole memory context.
> Ensuring you delete your C++ object graph (probably via a few opaque
> pointers you pass around in the C code) when a MemoryContext is deleted
> isn't hard. palloc's MemoryContextMethods->delete_context provides just
> what's required. It's no different to what you do in a normal extension
> written in C, except that your deleteMyObject(somePtr) call happens to be an
> "extern C" function written in C++ that delete()s the ptr. No biggie.
>
> You can't do that if you're relying on smart pointers, refcounting,
> std::auto_ptr, etc because they're broken by longjmp, dtors won't get called
> when they should, you'll think objects are still referenced when they
> aren't, and things generally fail.
>
> It's even worse if you're relying on stack-based objects with dtors for lock
> management or the like.
>

That all seems very convoluted. Any non-trivial C++ class is a
resource-managing class, either directly or indirectly. Therefore, all
non-trivial C++ classes on the stack are broken by longjmp(). You
can't use simple things like strings.

I just wish that it wasn't such a mess.


> It's not too tricky to just free your C++ object graph when a MemoryContext
> goes out of scope, as MemoryContexts have their own dtor-equivalents that're
> reliably called by Pg irrespective of setjmp/longjmp-based program flow. Why
> make it more complicated than it has to be? This way your dtors get called
> reliably at destruction.

I guess that's the least worst option at this time. You'll have to
pass ptrs to somewhere where they'll subsequently be deleted.
They'll have to be typed. You'll have to write a bunch of utility
functions, one per class used, to preserve typing.

> That said, if I was to do that in code I was writing, I'd build a pool
> allocator based on a memory context that handed out palloc'd chunks... and
> I'd just give up on destructors for those objects.
>
> http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.10
> http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.14
>

Well, you still have the undefined behaviour problem, obviously. I
don't think that that's something that's ever going to be acceptable.

--
Regards,
Peter Geoghegan

Re: server-side extension in c++

From
Peter Geoghegan
Date:
> It's good to have actual working code in production to bolster the
> case that the design is sound.
>
> How much work would it be to refactor libgeos_c to use a catch-all
> exception handler?
>

Well, they'd have to be using specific exception handlers to get the
error message. Perhaps they just have one per wrapper function like
this (assuming their exception objects ultimately inherit from
std::exception, which they all ought to):

...
char* error_string;
...
catch(const std::exception& e)
{
    // malloc() returns void*, which C++ will not convert implicitly
    error_string = static_cast<char*>(malloc(strlen(e.what()) + 1));
    if(error_string != NULL)
         strcpy(error_string, e.what());
    return false;
}
// They could add this catch-all, which we could fall back on to be on
// the safe side
catch(...)
{
   // We can't do anything except swallow - this could be anything
   // that doesn't inherit from std::exception
}

--
Regards,
Peter Geoghegan

Re: server-side extension in c++

From
Mark Cave-Ayland
Date:
David Fetter wrote:

> It's good to have actual working code in production to bolster the
> case that the design is sound.
>
> How much work would it be to refactor libgeos_c to use a catch-all
> exception handler?
>
> Cheers,
> David.

Given that GEOS is not used exclusively by PostGIS but also by quite a
few other open source GIS packages, I'd say the chances are quite small
unless it was a
very minimal change. I also should point out that I have very little C++
experience, and so would be the wrong person to ask ;)


ATB,

Mark.


Re: server-side extension in c++

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Peter Geoghegan wrote:
> > >> I would have
> > >> imagined that ultimately, the call to the Pg C function must return,
> > >> and therefore cannot affect stack unwinding within the C++ part of the
> > >> program.
> > >
> > > That's the whole point; a longjmp breaks the call chain, and the
> > > guarantee that eventually the stack will unwind as functions return.
> >
> > Yes, but my point was that if that occurs above the C++ code, it will
> > never be affected by it. We have to longjmp() *over* C++ code before
> > we have a problem. However, Bruce has answered the question of whether
> > or not that happens - it does, so I guess it doesn't matter.
>
> Yes.  I have updated the C++ doc patch to call it a "distant"
> longjmp().

Applied.
