Thread: parse mistake in ecpg connect string

parse mistake in ecpg connect string

From
"Wang, Shenhao"
Date:
Hi, hacker

I found in function ECPGconnect, the connect string in comment is written as:

/*------
 * new style:
 *    <tcp|unix>:postgresql://server[:port|:/unixsocket/path:]
 *    [/db-name][?options]
 *------
*/

But, the parse logical seems wrong, like:

tmp = strrchr(dbname + offset, ':');
tmp2 = strchr(tmp + 1, ':')

the value tmp2 will always be NULL, the unix-socket path will be ignored.

I have fixed this problem, the patch attached.
However, since this usage is not recorded in manual[1](maybe this is why this problem is not found for a long time), so
howabout delete this source directly instead? 
Thoughts?

This patch only fix the problem when using a character variable to store the connect string like:

EXEC SQL BEGIN DECLARE SECTION;
    char constr[] = "unix:postgresql://localhost:/tmp/a:?port=5435&dbname=postgres";
EXEC SQL END DECLARE SECTION;

If I write a source like:
EXEC SQL CONNECT TO unix:postgresql://localhost:/tmp/a:/postgres?port=5435
EXEC SQL CONNECT TO unix:postgresql://localhost/postgres?host=/tmp/a&port=5435
The program ecpg will report some error when parse .pgc file

I will try to fix this problem later, but it seems a little difficult to add some lex/bison file rules

[1] https://www.postgresql.org/docs/13/ecpg-connect.html#ECPG-CONNECTING

Best regards
Shenhao Wang



Attachment

Re: parse mistake in ecpg connect string

From
Kyotaro Horiguchi
Date:
At Thu, 4 Feb 2021 09:25:00 +0000, "Wang, Shenhao" <wangsh.fnst@cn.fujitsu.com> wrote in 
> Hi, hacker
> 
> I found in function ECPGconnect, the connect string in comment is written as:
> 
> /*------
>  * new style:
>  *    <tcp|unix>:postgresql://server[:port|:/unixsocket/path:]
>  *    [/db-name][?options]
>  *------
> */
> 
> But, the parse logical seems wrong, like:

Actually it looks like broken, but..
  
> [1] https://www.postgresql.org/docs/13/ecpg-connect.html#ECPG-CONNECTING

The comment and related code seem to be remnants of an ancient syntax
of hostname/socket-path style, which should have been cleaned up in
2000. I guess that the tcp: and unix: style target remains just for
backward compatibility, I'm not sure, though.  Nowadays you can do
that by using the "dbname[@hostname][:port]" style target.

EXEC SQL CONNECT TO 'postgres@/tmp:5432';
EXEC SQL CONNECT TO 'unix:postgresql://localhost:5432/postgres?host=/tmp';

FWIW, directly embedding /unixsocket/path syntax in a URL is broken in
the view of URI. It is the reason why the current connection URI takes
the way shown above. So I think we want to remove that code rather
than to fix it.

And, since the documentation is saying that the bare target
specification is somewhat unstable, I'm not sure we dare to *fix* the
ecpg syntax.

In [1]
> In practice, it is probably less error-prone to use a (single-quoted)
> string literal or a variable reference.

That being said, we might need a description about how we can specify
a unix socket directory in ecpg-connect.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
index 6b0a3067e6..f45892304d 100644
--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -360,8 +360,7 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 
                 /*------
                  * new style:
-                 *    <tcp|unix>:postgresql://server[:port|:/unixsocket/path:]
-                 *    [/db-name][?options]
+                 *    <tcp|unix>:postgresql://server[:port][/db-name][?options]
                  *------
                  */
                 offset += strlen("postgresql://");
@@ -385,41 +384,11 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
                 }
 
                 tmp = strrchr(dbname + offset, ':');
-                if (tmp != NULL)    /* port number or Unix socket path given */
+                if (tmp != NULL)    /* port number given */
                 {
-                    char       *tmp2;
-
                     *tmp = '\0';
-                    if ((tmp2 = strchr(tmp + 1, ':')) != NULL)
-                    {
-                        *tmp2 = '\0';
-                        host = ecpg_strdup(tmp + 1, lineno);
-                        connect_params++;
-                        if (strncmp(dbname, "unix:", 5) != 0)
-                        {
-                            ecpg_log("ECPGconnect: socketname %s given for TCP connection on line %d\n", host,
lineno);
-                            ecpg_raise(lineno, ECPG_CONNECT,
ECPG_SQLSTATE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION,realname ? realname : ecpg_gettext("<DEFAULT>"));
 
-                            if (host)
-                                ecpg_free(host);
-
-                            /*
-                             * port not set yet if (port) ecpg_free(port);
-                             */
-                            if (options)
-                                ecpg_free(options);
-                            if (realname)
-                                ecpg_free(realname);
-                            if (dbname)
-                                ecpg_free(dbname);
-                            free(this);
-                            return false;
-                        }
-                    }
-                    else
-                    {
-                        port = ecpg_strdup(tmp + 1, lineno);
-                        connect_params++;
-                    }
+                    port = ecpg_strdup(tmp + 1, lineno);
+                    connect_params++;
                 }
 
                 if (strncmp(dbname, "unix:", 5) == 0)

RE: parse mistake in ecpg connect string

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Wang,

> the value tmp2 will always be NULL, the unix-socket path will be ignored.

I confirmed it, you're right.

> I have fixed this problem, the patch attached.

It looks good to me:-).

> I will try to fix this problem later, but it seems a little difficult to add some lex/bison file rules

I think rule `connection_target` must be fixed.
Either port or unix_socket_directory can be accepted now(I followed the comment),
but we should discuss about it.
According to the doc, libpq's connection-URI accept both.

The attached patch contains all fixes, and pass test in my environment.
And the following line:

  EXEC SQL CONNECT TO unix:postgresql://localhost:/a:/postgres;

is precompiled to:

  { ECPGconnect(__LINE__, 0, "unix:postgresql://localhost:/a:/postgres" , NULL, NULL , NULL, 0); }

Is it OK?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

RE: parse mistake in ecpg connect string

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Horiguchi-san,

My response crossed in the e-mail with yours. Sorry.

> FWIW, directly embedding /unixsocket/path syntax in a URL is broken in
> the view of URI. It is the reason why the current connection URI takes
> the way shown above. So I think we want to remove that code rather
> than to fix it.

I didn't know such a phenomenon. If other codes follow the rule,
I agree yours.

Digress from the main topic, but the following cannot be accepted for the precompiler.
This should be fixed, isn't it?

EXEC SQL CONNECT TO postgres@/tmp:5432;

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




RE: parse mistake in ecpg connect string

From
"Wang, Shenhao"
Date:
Hi, Horiguchi-san, Kuroda-san:

Thank you for reviewing.

Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

>FWIW, directly embedding /unixsocket/path syntax in a URL is broken in
>the view of URI. It is the reason why the current connection URI takes
>the way shown above. So I think we want to remove that code rather
>than to fix it.

It seems that remove that code is better.

>That being said, we might need a description about how we can specify
>a unix socket directory in ecpg-connect.

After remove the code, if target is:
1. dbname@/unixsocket/path:port
2. unix:postgresql://localhost:port/dbname?host=/unixsocket/path
The ecpg will report an error.

But, if target is:
3. a (single-quoted) string literal of 1 or 2 listed above.
4. a variable reference of 1 or 2 listed above.
The ecpg will precompile successfully. That means if we want to use a unix socket directory in ecpg-connect.
We can only use No.3 and No.4 listed above.

I think we can add some description on docs, but I don't have ability to write description in English,
Can someone help me write a description?

Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:

>Digress from the main topic, but the following cannot be accepted for the precompiler.
>This should be fixed, isn't it?
>
>EXEC SQL CONNECT TO postgres@/tmp:5432;

First, thank you for adding a bison rule.

I think add the bison rule is a little difficult because in PG13 windows can also support unix-socket,
In your patch:
> dir_name: '/' dir_name        { $$ = make2_str(mm_strdup("/"), $2); }
>         | ecpg_ident        { $$ = $1; }
>        ;
Windows will remains wrong(I'm not sure ecpg on windows can use unix socket connection).

And if we add the rules in bison files, both ecpg and ecpglib will both parse the host in different ways.
Ecpg parse the host by bison rules, and ecpglib parse the host by splitting the connect string use char '@' or char
'='.
I think it's not a good action.

But If we add some description on docs, these problem can be solved in an easy way.
Therefore, I prefer to add some description on docs.


Best regards
Shenhao Wang





Re: parse mistake in ecpg connect string

From
Tom Lane
Date:
"Wang, Shenhao" <wangsh.fnst@cn.fujitsu.com> writes:
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>> FWIW, directly embedding /unixsocket/path syntax in a URL is broken in
>> the view of URI. It is the reason why the current connection URI takes
>> the way shown above. So I think we want to remove that code rather
>> than to fix it.

> It seems that remove that code is better.

FWIW, I agree with Horiguchi-san that we should just take out the dead
code in ECPGconnect().  Some checking in our git history shows that it's
never worked since it was added (in a4f25b6a9c2).  If nobody's noticed
in 18 years, and the documentation doesn't say that it should work,
then that's not a feature we need to support.

I do agree that it'd be a good idea to extend the documentation to
point out how to specify a non-default socket path; but I'm content
to say that a "?host=" option is the only way to do that.

I also got a bit of a laugh out of

                    if (strcmp(dbname + offset, "localhost") != 0 && strcmp(dbname + offset, "127.0.0.1") != 0)

Should we allow "::1" here as well?  On the other hand, colons are
already overloaded in this syntax, so maybe allowing them in the
host part is a bad idea.

            regards, tom lane



RE: parse mistake in ecpg connect string

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Wang, Tom

> I think add the bison rule is a little difficult because in PG13 windows can also support unix-socket,
> In your patch:
> > dir_name: '/' dir_name{ $$ = make2_str(mm_strdup("/"), $2); }
> > | ecpg_ident{ $$ = $1; }
> >;
> Windows will remains wrong(I'm not sure ecpg on windows can use unix socket connection).
>
> And if we add the rules in bison files, both ecpg and ecpglib will both parse the host in different ways.
> Ecpg parse the host by bison rules, and ecpglib parse the host by splitting the connect string use char '@' or char
'='.
> I think it's not a good action.
>
> But If we add some description on docs, these problem can be solved in an easy way.
> Therefore, I prefer to add some description on docs.

I didn't care about the windows environment.
Somewhat WIN32 directive can be used for switching code, but I agree your claims.

> I think we can add some description on docs, but I don't have ability to write description in English,
> Can someone help me write a description?

I'm also not a native English speaker, but I put a draft.
Please review it and combine them if it's OK.

> Should we allow "::1" here as well?  On the other hand, colons are
> already overloaded in this syntax, so maybe allowing them in the
> host part is a bad idea.

I have no idea how to fix it now, so I added notice that IPv6 should not be used
in the host part...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

Re: parse mistake in ecpg connect string

From
Kyotaro Horiguchi
Date:
At Tue, 9 Feb 2021 02:12:37 +0000, "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote in 
> Dear Wang, Tom
> 
> > I think add the bison rule is a little difficult because in PG13 windows can also support unix-socket, 
> > In your patch:
> > > dir_name: '/' dir_name{ $$ = make2_str(mm_strdup("/"), $2); }
> > > | ecpg_ident{ $$ = $1; }
> > >;
> > Windows will remains wrong(I'm not sure ecpg on windows can use unix socket connection).
> > 
> > And if we add the rules in bison files, both ecpg and ecpglib will both parse the host in different ways.
> > Ecpg parse the host by bison rules, and ecpglib parse the host by splitting the connect string use char '@' or char
'='.
> > I think it's not a good action.
> > 
> > But If we add some description on docs, these problem can be solved in an easy way.
> > Therefore, I prefer to add some description on docs.
> 
> I didn't care about the windows environment.
> Somewhat WIN32 directive can be used for switching code, but I agree your claims.

This thread looks like discussing about unix-domain socket on
Windows. (I'll look into it.)

> > I think we can add some description on docs, but I don't have ability to write description in English,
> > Can someone help me write a description?
> 
> I'm also not a native English speaker, but I put a draft.
> Please review it and combine them if it's OK.
> 
> > Should we allow "::1" here as well?  On the other hand, colons are
> > already overloaded in this syntax, so maybe allowing them in the
> > host part is a bad idea.

Yeah, that made me smile for the same reason:p

> I have no idea how to fix it now, so I added notice that IPv6 should not be used
> in the host part...

Anyway the host part for the unix: method is just for
spelling. Although I think we can further remove ipv4 address, we
don't even need to bother that. (However, I don't object to add "::1"
either.)

I think replacing "hostname" in the unix: method to "localhost" works.

> dbname[@hostname][:port]
> tcp:postgresql://hostname[:port][/dbname][?options]
- unix:postgresql://<italic>hostname</>[:<i>port</>]..
+ unix:postgresql://<nonitalic>localhost</>[:<i>port</>]..

@@ -199,6 +199,13 @@ EXEC SQL CONNECT TO <replaceable>target</replaceable> <optional>AS <replaceable>
    any <replaceable>keyword</replaceable> or <replaceable>value</replaceable>,
    though not within or after one.  Note that there is no way to
    write <literal>&</literal> within a <replaceable>value</replaceable>.
+
+   Also note that if you want to specify the socket directory
+   for Unix-domain communications, an option <replaceable>host=</replaceable>
+   and single-quoted string must be used.
+   The notation rule is almost the same as libpq's one,
+   but the IPv6 address cannot be used here.
+

<replaceable> is not the tag to use for "host". <varname> is that.

If we change the "hostname" for the unix: method to be fixed, no need
to mention IP address. (Even if someone tries using IP addresses
instead of localhost, that case is not our business:p)

How about the attached?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index 0fef9bfcbe..cd34f6a69e 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -120,7 +120,7 @@ EXEC SQL CONNECT TO <replaceable>target</replaceable> <optional>AS <replaceable>
 
     <listitem>
      <simpara>
-
<literal>unix:postgresql://<replaceable>hostname</replaceable><optional>:<replaceable>port</replaceable></optional><optional>/<replaceable>dbname</replaceable></optional><optional>?<replaceable>options</replaceable></optional></literal>
+
<literal>unix:postgresql://localhost<optional>:<replaceable>port</replaceable></optional><optional>/<replaceable>dbname</replaceable></optional><optional>?<replaceable>options</replaceable></optional></literal>
      </simpara>
     </listitem>
 
@@ -199,6 +199,10 @@ EXEC SQL CONNECT TO <replaceable>target</replaceable> <optional>AS <replaceable>
    any <replaceable>keyword</replaceable> or <replaceable>value</replaceable>,
    though not within or after one.  Note that there is no way to
    write <literal>&</literal> within a <replaceable>value</replaceable>.
+   The host part of the unix: method URI is fixed, use
+   the <varname>host</varname> option in case of using a non-default socket
+   directory.  Since a directory path contains slashes, the whole target
+   should be an SQL string literal.
   </para>
 
   <para>
diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
index 6b0a3067e6..f45892304d 100644
--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -360,8 +360,7 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 
                 /*------
                  * new style:
-                 *    <tcp|unix>:postgresql://server[:port|:/unixsocket/path:]
-                 *    [/db-name][?options]
+                 *    <tcp|unix>:postgresql://server[:port][/db-name][?options]
                  *------
                  */
                 offset += strlen("postgresql://");
@@ -385,41 +384,11 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
                 }
 
                 tmp = strrchr(dbname + offset, ':');
-                if (tmp != NULL)    /* port number or Unix socket path given */
+                if (tmp != NULL)    /* port number given */
                 {
-                    char       *tmp2;
-
                     *tmp = '\0';
-                    if ((tmp2 = strchr(tmp + 1, ':')) != NULL)
-                    {
-                        *tmp2 = '\0';
-                        host = ecpg_strdup(tmp + 1, lineno);
-                        connect_params++;
-                        if (strncmp(dbname, "unix:", 5) != 0)
-                        {
-                            ecpg_log("ECPGconnect: socketname %s given for TCP connection on line %d\n", host,
lineno);
-                            ecpg_raise(lineno, ECPG_CONNECT,
ECPG_SQLSTATE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION,realname ? realname : ecpg_gettext("<DEFAULT>"));
 
-                            if (host)
-                                ecpg_free(host);
-
-                            /*
-                             * port not set yet if (port) ecpg_free(port);
-                             */
-                            if (options)
-                                ecpg_free(options);
-                            if (realname)
-                                ecpg_free(realname);
-                            if (dbname)
-                                ecpg_free(dbname);
-                            free(this);
-                            return false;
-                        }
-                    }
-                    else
-                    {
-                        port = ecpg_strdup(tmp + 1, lineno);
-                        connect_params++;
-                    }
+                    port = ecpg_strdup(tmp + 1, lineno);
+                    connect_params++;
                 }
 
                 if (strncmp(dbname, "unix:", 5) == 0)

Re: parse mistake in ecpg connect string

From
Kyotaro Horiguchi
Date:
At Tue, 09 Feb 2021 13:58:14 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > I didn't care about the windows environment.
> > Somewhat WIN32 directive can be used for switching code, but I agree your claims.
> 
> This thread looks like discussing about unix-domain socket on
> Windows. (I'll look into it.)

Yes, I forgot to past the URL as ususal:(

https://www.postgresql.org/message-id/CAA4eK1KaTu3_CTAdKON_P4FB=-uvNkviJpqYkhLFcmb8xZkk_Q@mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: parse mistake in ecpg connect string

From
"Wang, Shenhao"
Date:
Hi, Horiguchi-san

Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> How about the attached?

I think, this patch is good.

> > > Should we allow "::1" here as well?  On the other hand, colons are
> > > already overloaded in this syntax, so maybe allowing them in the
> > > host part is a bad idea.

> Yeah, that made me smile for the same reason:p

It seems that ecpg cannot parse the connect str with ipv6 correctly.

Such as:
EXEC SQL CONNECT TO 'tcp:postgresql://::1:5432/postgres'
connect to the server successfully, but
EXEC SQL CONNECT TO 'tcp:postgresql://::1/postgres'
failed to connect to server.

And ecpg will always wrong when parse a connect str
EXEC SQL CONNECT TO tcp:postgresql://::1:5432/postgres;
Ecpg error :
    a.pgc:16: ERROR: syntax error at or near "::"

Maybe we should support ipv6 like libpq.
In [1],
>  The host part may be either host name or an IP address. To specify an IPv6 host address, enclose it in square
brackets:

How about using square brackets like libpq, such as:
EXEC SQL CONNECT TO 'tcp:postgresql://[::1]/postgres'

Maybe we can create a new thread to talk about how ecpg support ipv6

[1]  https://www.postgresql.org/docs/13/libpq-connect.html#LIBPQ-CONNSTRING

Best regards
Shenhao Wang






RE: parse mistake in ecpg connect string

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Wang, Horiguchi-san,

> > How about the attached?
>
> I think, this patch is good.

I agree. The backward compatibility is violated in the doc, but maybe no one take care.

> Maybe we can create a new thread to talk about how ecpg support ipv6

+1

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: parse mistake in ecpg connect string

From
Tom Lane
Date:
"kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> writes:
> Dear Wang, Horiguchi-san,
>>> How about the attached?

>> I think, this patch is good.

> I agree. The backward compatibility is violated in the doc, but maybe no one take care.

Pushed with a little more work on the documentation.

            regards, tom lane



Re: parse mistake in ecpg connect string

From
Kyotaro Horiguchi
Date:
At Thu, 11 Feb 2021 15:23:31 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> writes:
> > Dear Wang, Horiguchi-san,
> >>> How about the attached?
> 
> >> I think, this patch is good.
> 
> > I agree. The backward compatibility is violated in the doc, but maybe no one take care.
> 
> Pushed with a little more work on the documentation.

Thanks for committing this (and further update of the document).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center