Re: Reconnect a single connection used by multiple threads in embedded SQL in C application causes error. - Mailing list pgsql-bugs

From Noah Misch
Subject Re: Reconnect a single connection used by multiple threads in embedded SQL in C application causes error.
Date
Msg-id 20220314070807.GA2317912@rfd.leadboat.com
Whole thread Raw
In response to RE: Reconnect a single connection used by multiple threads in embedded SQL in C application causes error.  ("egashira.yusuke@fujitsu.com" <egashira.yusuke@fujitsu.com>)
Responses RE: Reconnect a single connection used by multiple threads in embedded SQL in C application causes error.  ("egashira.yusuke@fujitsu.com" <egashira.yusuke@fujitsu.com>)
List pgsql-bugs
On Fri, Feb 25, 2022 at 11:29:55AM +0000, egashira.yusuke@fujitsu.com wrote:
> > Would you like to propose a patch?
> 
> Sure. I will work on creating a document patch.
> 
> However, I found another problem while testing to make sure that the use cases were valid.
> 
> >      - Each thread does its own CONNECT, DISCONNECT, and other commands. 
> >        Each thread has its own default connection. No sharing at all."
> 
> In that case, if I execute CONNECT with same connection-string on each threads without connection-name, the later
CONNECTwas skipped.
 
> Then, ECPGdebug shows following message.
> 
>   ECPGconnect: connection identifier (null) is already in use
> 
> This seems to indicate that CONNECT need to specify connection-name even if we use the default connection in each
thread.
> This also seems to me to be an undesirable behavior in multiple thread.

Agreed.  It might have been a nice feature for every thread to have its own
nameless connection.

> Or do we need to name all connections if we use multiple connections, even in multiple thread?

Yes.

On Fri, Mar 04, 2022 at 11:49:39AM +0000, egashira.yusuke@fujitsu.com wrote:
> > It looks like too verbose.  Couldn't it be summarized as something
> > like this?
> > 
> > > Current connection is managed in a per-thread manner. For the first
> > > use on a thread, it is the most recently used one in the process.

There's no behavior specific to "the first use on a thread", and "most
recently used" is not a factor.  Avoid those phrases.  (In a thread that has
never opened a connection, each command that retrieves the current connection
will retrieve the non-thread-specific current connection.)

> Yes.
> I thought that I needed a detailed explanation of the current connection for the note below, but this certainly seems
tobe summable.
 

For this particular paragraph, I liked your v1's level of verbosity more than
this compact version.

> > This doesn't look like a API documentation but a procedual
> > documentation.  IMO there's no particular reason that we should
> > officially concretely suggest how mulitple connectinos are managed on
> > somebody's application.  And part of the section is already in the
> > documentation.

I'm okay with your removal of that paragraph.  As a general matter of
documentation strategy, it's valid for our manual to contain both API docs and
higher-level advice about known strategies for combining the APIs into an
effective program.

> > |  <para>
> > |   If your application uses multiple threads of execution, they cannot share a
> > |   connection concurrently. You must either explicitly control access to the connection
> > |   (using mutexes) or use a connection for each thread.
> > |  </para>
> > 
> > So, it would boil down to the caution about disconnection.
> > 
> > > Note that connections are shared among all threads.  Be careful not
> > > to DISCONNECT a connection that is to be used in another thread.
> 
> I wanted to make a document that could avoid the use cases that I reported, so I tried to write a note according to
theoccurrence conditions.
 
> However, this was also too verbose, so I rewrote the note as you suggested.
> And since there is no mention of the current connection, I changed the section to add it to "36.2.3. Closing a
Connection".
> 
> I think I might be better to add "In the worst case, this may cause an application crash." to this note so that users
canavoid the problem more strongly.
 
> However, this sentence can also be read as claiming that there is an implementation problem.
> Should not I add this sentence?

It's okay to admit implementation problems.  I might write "Behavior is
undefined if the current thread is not the thread that opened the connection."
Referring to undefined behavior is a customary way for API docs to imply that
a crash is possible.  If one wanted to be even more precise, "Behavior is
undefined if the connection is the thread-specific current connection of any
other thread."  However, the more-precise wording is harder to understand.
Also, being more precise now could limit our freedom to change behavior later.

> --- a/doc/src/sgml/ecpg.sgml
> +++ b/doc/src/sgml/ecpg.sgml
> @@ -222,10 +222,15 @@ EXEC SQL CONNECT TO <replaceable>target</replaceable> <optional>AS <replaceable>
>    <para>
>     The <replaceable>connection-name</replaceable> is used to handle
>     multiple connections in one program.  It can be omitted if a
> -   program uses only one connection.  The most recently opened
> -   connection becomes the current connection, which is used by default
> -   when an SQL statement is to be executed (see later in this
> -   chapter).
> +   program uses only one connection.
> +  </para>
> +
> +  <para>
> +   The most recently opened connection becomes the current connection.

This is true.  Disconnecting also changes the current connection, to another
one of the open connections.  Please document that somewhere appropriate.

> +   The current connection is managed in a per-thread manner.  For the
> +   first use on a thread, it is the most recently opened one in the
> +   process.  The current connection is used by default when an SQL
> +   statement is to be executed (see later in this chapter).
>    </para>
>  
>    <para>
> @@ -276,8 +281,8 @@ EXEC SQL CONNECT TO :target USER :user USING :passwd;
>  
>    <para>
>     SQL statements in embedded SQL programs are by default executed on
> -   the current connection, that is, the most recently opened one.  If
> -   an application needs to manage multiple connections, then there are
> +   the current connection, that is, the most recently opened one in each thread.

Likewise, this is not accurate if a disconnect of the current connection
happened after the most recent open.  You could simply delete this claim,
which would avoid defining the current connection in two places.

> +   If an application needs to manage multiple connections, then there are
>     three ways to handle this.
>    </para>



pgsql-bugs by date:

Previous
From: "David G. Johnston"
Date:
Subject: Docs for psql regarding default database name are incorrect
Next
From: Michael Paquier
Date:
Subject: Re: BUG #17409: Unable to alter data type of clustered column which is referenced by foreign key