Thread: libpq+MB/putenv(), getenv() clean up

libpq+MB/putenv(), getenv() clean up

From
Tatsuo Ishii
Date:
Hi,

I have a plan to clean up the usage of putenv(), getenv() in libpq+MB
configuration. This needs some interface changes with libpq in the
frontend side. I'm not sure this is visible to end users or not, and I
would like to hear from hackes.

First of all, I would like to explain the current implementation.

(1) While establishing a connection, if the environment variable
PGCLIENTENCODING is not set, libpq asks the backend what the encoding
for the database is. This is done by sending a query "select
getdatabaseencoding()". In this case, both the backend and the
frontend uses same encoding and its name is set to the
PGCLIENTENCODING environment variable for the later use.
(fe-connect.c: PQsetenvPoll())

(2) When libpq prints the result of a query, it needs to determine the
length of a multi-byte letter. For this purpose, getenv() is called to
know the encoding name. (fe-print.c)

Above implementation has a design flaw since it is not multithread-safe.
To fix the problem, I would like to make changes as follows:

(1) Add a new member "int client_encoding" to struct pg_conn.

(2) Add an argument which is a pointer to PGconn to PQsetenvPoll() so
that the client encoding can be set in (1) above.

(3) Add a new function PQclientencoding() to extract client_encoding
from PGconn.

(4) Change PQmblen() so that it extracts encoding info using
PQclientencoding() rather than calling getenv(). This also requires
add an argument which is a pointer to PGconn.

(5) Change fe-print.c:do_filed() to add an argument which is a pointer to
PGconn.

Comments and suggestions are welcome.
--
Tatsuo Ishii


Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I have a plan to clean up the usage of putenv(), getenv() in libpq+MB
> configuration. This needs some interface changes with libpq in the
> frontend side. I'm not sure this is visible to end users or not, and I
> would like to hear from hackes.

I think it is a very good idea to remove getenv() from PQmblen().
getenv() is rather slow, at least on the machines I use, and having
to do it for each character processed is a huge performance hit.

PQmblen is exported by libpq (psql is an example of an application
that uses it).  So very possibly, changing it would break a few client
applications.  A possible answer is to leave PQmblen alone, and invent
a new routine with a different name that looks at PGconn.  We could
deprecate PQmblen and delete it after a few releases.  I'm not sure
if this is worth the trouble or not --- maybe it's OK to make a non-
compatible change to PQmblen.

> (1) While establishing a connection, if the environment variable
> PGCLIENTENCODING is not set, libpq asks the backend what the encoding
> for the database is.

> Above implementation has a design flaw since it is not multithread-safe.

You would still do one getenv() during connection setup, right, to see
if PGCLIENTENCODING is set?  If you don't, that would be a significant
change in behavior that might make a lot of people unhappy.
        regards, tom lane


Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
Bruce Momjian
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I have a plan to clean up the usage of putenv(), getenv() in libpq+MB
> > configuration. This needs some interface changes with libpq in the
> > frontend side. I'm not sure this is visible to end users or not, and I
> > would like to hear from hackes.
> 
> I think it is a very good idea to remove getenv() from PQmblen().
> getenv() is rather slow, at least on the machines I use, and having
> to do it for each character processed is a huge performance hit.

Yikes, we are calling it for every character.  I think it munges through
the entire process environment space looking for a value.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
Tatsuo Ishii
Date:
> I think it is a very good idea to remove getenv() from PQmblen().
> getenv() is rather slow, at least on the machines I use, and having
> to do it for each character processed is a huge performance hit.

I didn't notice that. Thanks for the point.

> PQmblen is exported by libpq (psql is an example of an application
> that uses it).  So very possibly, changing it would break a few client
> applications.  A possible answer is to leave PQmblen alone, and invent
> a new routine with a different name that looks at PGconn.  We could
> deprecate PQmblen and delete it after a few releases.  I'm not sure
> if this is worth the trouble or not --- maybe it's OK to make a non-
> compatible change to PQmblen.

With the changes I propose, PQmblen() would not work anymore
anyway. I'll post to interfaces list about the incompatible changes.

> You would still do one getenv() during connection setup, right, to see
> if PGCLIENTENCODING is set?

Yes. 

>If you don't, that would be a significant
> change in behavior that might make a lot of people unhappy.

So the behavior won't be changed.
--
Tatsuo Ishii


Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
SAKAIDA
Date:
Hi,

Tatsuo Ishii <t-ishii@sra.co.jp> wrote:
> 
> I have a plan to clean up the usage of putenv(), getenv() in libpq+MB
> configuration. This needs some interface changes with libpq in the
> frontend side. I'm not sure this is visible to end users or not, and I
> would like to hear from hackes.
 This plan is welcome !


> Above implementation has a design flaw since it is not multithread-safe.
> To fix the problem, I would like to make changes as follows:
> 
> (1) Add a new member "int client_encoding" to struct pg_conn.
> 
> (2) Add an argument which is a pointer to PGconn to PQsetenvPoll() so
> that the client encoding can be set in (1) above.
> 
> (3) Add a new function PQclientencoding() to extract client_encoding
> from PGconn.
> 
> (4) Change PQmblen() so that it extracts encoding info using
> PQclientencoding() rather than calling getenv(). This also requires
> add an argument which is a pointer to PGconn.
> 
> (5) Change fe-print.c:do_filed() to add an argument which is a pointer to
> PGconn.
> 
> Comments and suggestions are welcome.
 Do those changes mean that libpq(or psql) always has the MULTIBYTE
function?
 Now, libpq's MULTIBYTE function is a compile option(--with-mb).
But, I always want the MULTIBYTE function, even if PostgreSQL have 
*not* been made with "--with-mb" option.  Because, I want to access
two kind of PostgreSQL servers(named A and B) by using the B's psql.
(Here, A server is "--with-mb" and B server is not "--with-mb".)

Regards,
SAKAIDA Masaaki  -- Osaka, Japan




Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
Peter Eisentraut
Date:
On Thu, 13 Jan 2000, Tatsuo Ishii wrote:

> Hi,
> 
> I have a plan to clean up the usage of putenv(), getenv() in libpq+MB
> configuration. This needs some interface changes with libpq in the
> frontend side. I'm not sure this is visible to end users or not, and I
> would like to hear from hackes.
> 
> First of all, I would like to explain the current implementation.
> 
> (1) While establishing a connection, if the environment variable
> PGCLIENTENCODING is not set, libpq asks the backend what the encoding
> for the database is. This is done by sending a query "select
> getdatabaseencoding()". In this case, both the backend and the
> frontend uses same encoding and its name is set to the
> PGCLIENTENCODING environment variable for the later use.
> (fe-connect.c: PQsetenvPoll())

Whatever you do, please do not set any environment variables from within
programs. It's evil. Consider the user leaving the database and connecting
to another, but then PGCLIENTENCODING is already set to what would be
interpreted as his "preference", but maybe he wants the backend to decide.
I saw you had some hacks in psql for working around this, but psql is not
every application. I think what you are suggesting below would incorporate
this change, I just want to express it explicitly.

> 
> (2) When libpq prints the result of a query, it needs to determine the
> length of a multi-byte letter. For this purpose, getenv() is called to
> know the encoding name. (fe-print.c)
> 
> Above implementation has a design flaw since it is not multithread-safe.
> To fix the problem, I would like to make changes as follows:
> 
> (1) Add a new member "int client_encoding" to struct pg_conn.
> 
> (2) Add an argument which is a pointer to PGconn to PQsetenvPoll() so
> that the client encoding can be set in (1) above.
> 
> (3) Add a new function PQclientencoding() to extract client_encoding
> from PGconn.

How about PQencoding()?

> 
> (4) Change PQmblen() so that it extracts encoding info using
> PQclientencoding() rather than calling getenv(). This also requires
> add an argument which is a pointer to PGconn.

How about PQmblen(character, encoding)? Then you could call it PQmblen(c,
PQclientencoding(conn)) or PQmblen(c, other_encoding). That makes it more
general.

> 
> (5) Change fe-print.c:do_filed() to add an argument which is a pointer to
> PGconn.
> 
> Comments and suggestions are welcome.
> --
> Tatsuo Ishii
> 
> ************
> 
> 

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
Peter Eisentraut
Date:
On Thu, 13 Jan 2000, SAKAIDA wrote:

>   Do those changes mean that libpq(or psql) always has the MULTIBYTE
> function?
> 
>   Now, libpq's MULTIBYTE function is a compile option(--with-mb).
> But, I always want the MULTIBYTE function, even if PostgreSQL have 
> *not* been made with "--with-mb" option.  Because, I want to access
> two kind of PostgreSQL servers(named A and B) by using the B's psql.
> (Here, A server is "--with-mb" and B server is not "--with-mb".)

Aah, that's a good point, but an always-on psql and libpq is much slower.
But you could use the psql that stems from the multibyte compile, or not?


-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
Tatsuo Ishii
Date:
> Whatever you do, please do not set any environment variables from within
> programs. It's evil. 

Of course it's in my plan. I will eliminate them.

> > (3) Add a new function PQclientencoding() to extract client_encoding
> > from PGconn.
> 
> How about PQencoding()?

Humm... someday we may have PQdatabasencoding().  So I think it's
better to have "client" on it not to be confused.

> How about PQmblen(character, encoding)? Then you could call it PQmblen(c,
> PQclientencoding(conn)) or PQmblen(c, other_encoding). That makes it more
> general.

Good idea. Agreed.
--
Tatsuo Ishii


Re: [HACKERS] libpq+MB/putenv(), getenv() clean up

From
Tatsuo Ishii
Date:
I have committed changes below.

> (1) Add a new member "int client_encoding" to struct pg_conn.

done.

> (2) Add an argument which is a pointer to PGconn to PQsetenvPoll() so
> that the client encoding can be set in (1) above.

Rather than adding new parameter, I changed the argument to PGconn *.

> (3) Add a new function PQclientencoding() to extract client_encoding
> from PGconn.

done.

> (4) Change PQmblen() so that it extracts encoding info using
> PQclientencoding() rather than calling getenv(). This also requires
> add an argument which is a pointer to PGconn.

Now,

extern int    PQmblen(const unsigned char *s, int encoding);

(Thanks goes to Peter for the suggestion)

> (5) Change fe-print.c:do_filed() to add an argument which is a pointer to
> PGconn.

I found the arugument PGresult *res of do_field() has a pointer to
PGconn. So I did not need to change the interface.

(6) lots of changes have been made to psql to adapt the changes above.

Though I have run the regression test with/without multibyte and did
not find particular problem, please let me know if you find anything
wrong with those changes.
--
Tatsuo Ishii