Thread: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Hackers, Tom, (I changed subject because this is no longer postgres_fdw's patch) > > What would be better to think about is how to let users specify this > > kind of behavior for themselves. I think it's possible to set > > application_name as part of a foreign server's connection options, > > but at present the result would only be a constant string. Somebody > > who wished the PID to be in there would like to have some sort of > > formatting escape, say "%p" for PID. Extrapolating wildly, maybe we > > could make all the %-codes known to log_line_prefix available here. > > I think your argument is better than mine. I will try to implement this approach. I made a patch based on your advice. I add parse and rewrite function in connectOptions2(). I implemented in libpq layer, hence any other application can be used. Here is an example: ``` $ env PGAPPNAME=000%p%utest000 psql -d postgres -c "show application_name" application_name ----------------------- 00025632hayatotest000 (1 row) ``` > > I don't think this is a great idea as-is. People who need to do this > > sort of thing will all have their own ideas of what they need to track > > --- most obviously, it might be appropriate to include the originating > > server's name, else you don't know what machine the PID is for. > > I thought this is not big problem because hostname (or IP address) can be > added to log_line_prefix. I added only local-pid because this is the only thing > that cannot be set in the parameter. Currently network hostname (or IP address) cannot be set because of the above reason. Some backends' code must be modified if we want to get and embed such information. Of cause we can get system hostname by gethostname(), but I cannot judge whether it is meaningful. How do you think? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/08/05 12:18, kuroda.hayato@fujitsu.com wrote: > Dear Hackers, Tom, > > (I changed subject because this is no longer postgres_fdw's patch) > >>> What would be better to think about is how to let users specify this >>> kind of behavior for themselves. I think it's possible to set >>> application_name as part of a foreign server's connection options, >>> but at present the result would only be a constant string. Somebody >>> who wished the PID to be in there would like to have some sort of >>> formatting escape, say "%p" for PID. Extrapolating wildly, maybe we >>> could make all the %-codes known to log_line_prefix available here. >> >> I think your argument is better than mine. I will try to implement this approach. > > I made a patch based on your advice. I add parse and rewrite function in connectOptions2(). > I implemented in libpq layer, hence any other application can be used. > Here is an example: > > ``` > $ env PGAPPNAME=000%p%utest000 psql -d postgres -c "show application_name" > application_name > ----------------------- > 00025632hayatotest000 > (1 row) > ``` Why did you make even %u and %d available in application_name? With the patch, they are replaced with the user name to connect as and the database name to connect to, respectively. Unlike %p (i.e., PID in *client* side), those information can be easily viewed via log_line_prefix and pg_stat_activity, etc in the server side. So I'm not sure how much helpful to expose them also in application_name. > >>> I don't think this is a great idea as-is. People who need to do this >>> sort of thing will all have their own ideas of what they need to track >>> --- most obviously, it might be appropriate to include the originating >>> server's name, else you don't know what machine the PID is for. So some people may want to specify their own ID in application_name when connecting to the foreign server. For now they can do that by setting application_name per foreign server object. But this means that every session connecting to the same foreign server basically should use the same application_name. If they want to change the setting, they need to execute "ALTER SERAVER ... OPTIONS ...". Isn't this inconvenient? If yes, what about allowing us to specify foreign server's application_name per session? For example, we can add postgres_fdw.application_name GUC, and then if it's set postgres_fdw always uses it instead of the server-level option when connecting to the foreign server. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, Thank you for replying! I attached new version. > Why did you make even %u and %d available in application_name? Actually no particular reason. I added them because they can easily add... And I agree what you say, so removed. > So some people may want to specify their own ID in application_name > when connecting to the foreign server. For now they can do that by > setting application_name per foreign server object. But this means that > every session connecting to the same foreign server basically should > use the same application_name. If they want to change the setting, > they need to execute "ALTER SERAVER ... OPTIONS ...". Isn't this inconvenient? Yeah, currently such users must execute ALTER command for each time. > If yes, what about allowing us to specify foreign server's application_name > per session? For example, we can add postgres_fdw.application_name GUC, > and then if it's set postgres_fdw always uses it instead of the server-level > option when connecting to the foreign server. Thought? OK, I added the GUC parameter. My patch Defines new GUC at _PG_init(). The value is used when backends started to connect to remote server. Normal users can modify the value by SET commands but that change does not propagate to the established connections. I'm not sure about the versioning policy about contrib. Do we have to make new version? Any other objects are not changed. Finally, I and Fujii-san were now discussing locally whether this replacement should be in libpq or postgres_fdw layer. I started to think that it should be postgres_fdw, because: * previously aplications could distinguish by using current applicaiton_name, * libpq is used almost all uses so some modifications might affect someone, and * libpq might be just a tool, and should not be intelligent I will make and attach another version that new codes move to postgres_fdw. So could you vote which version is better? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, I attached new version, that almost all codes moved from libpq to postgres_fdw. Now we can accept four types of escapes. All escapes will be rewritten to connection souce's information: * application_name, * user name, * database name, and * backend's pid. These are cannot be set by log_line_prefix, so I think it is useful. We can use escape sequences in two ways: * Sets postgres_fdw.application_name GUC paramter, or * Sets application_name option in CREATE SERVER command. Which version do you like? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/08/31 16:11, kuroda.hayato@fujitsu.com wrote: > Dear Fujii-san, > > I attached new version, that almost all codes > moved from libpq to postgres_fdw. Thanks for updating the patch! Can we split the patch into two as follows? If so, we can review and commit them one by one. #1. Add application_name GUC into postgres_fdw #2. Allow to specify special escape characters in application_name that postgres_fdw uses. > > Now we can accept four types of escapes. > All escapes will be rewritten to connection souce's information: > > * application_name, > * user name, > * database name, and > * backend's pid. +1 > > These are cannot be set by log_line_prefix, so I think it is useful. > > > We can use escape sequences in two ways: > * Sets postgres_fdw.application_name GUC paramter, or > * Sets application_name option in CREATE SERVER command. OK. > > Which version do you like? For now I've not found real use case that implementing the feature in libpq would introduce. So IMO postgres_fdw version is better. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, > Can we split the patch into two as follows? If so, we can review > and commit them one by one. > > #1. Add application_name GUC into postgres_fdw > #2. Allow to specify special escape characters in application_name that > postgres_fdw uses. OK, I split and attached like that. 0001 adds new GUC, and 0002 allows to accept escapes. > For now I've not found real use case that implementing the feature > in libpq would introduce. So IMO postgres_fdw version is better. +1, so I'll focus on the this version. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/09/01 19:04, kuroda.hayato@fujitsu.com wrote: > OK, I split and attached like that. 0001 adds new GUC, and > 0002 allows to accept escapes. Thanks for splitting and updating the patches! Here are the comments for 0001 patch. - /* Use "postgres_fdw" as fallback_application_name. */ + /* Use GUC paramter if set */ + if (pgfdw_application_name && *pgfdw_application_name != '\0') This GUC parameter should be set after the options of foreign server are set so that its setting can override the server-level ones. Isn't it better to comment this? +static bool +check_pgfdw_application_name(char **newval, void **extra, GucSource source) +{ + /* Only allow clean ASCII chars in the application name */ + if (*newval) + pg_clean_ascii(*newval); + return true; Do we really need this check hook? Even without that, any non-ASCII characters in application_name would be replaced with "?" in the foreign PostgreSQL server when it's passed to that. On the other hand, if it's really necessary, application_name set in foreign server object also needs to be processed in the same way. + NULL, + PGC_USERSET, + GUC_IS_NAME, Why is GUC_IS_NAME flag necessary? postgres_fdw.application_name overrides application_name set in foreign server object. Shouldn't this information be documented? Isn't it better to have the regression test for this feature? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, Thank you for reviewing! > This GUC parameter should be set after the options of foreign server > are set so that its setting can override the server-level ones. > Isn't it better to comment this? I added the following comments: ```diff - /* Use "postgres_fdw" as fallback_application_name. */ + /* + * Use pgfdw_application_name as application_name. + * + * Note that this check must be behind constructing generic options + * because pgfdw_application_name has higher priority. + */ ``` > Do we really need this check hook? Even without that, any non-ASCII characters > in application_name would be replaced with "?" in the foreign PostgreSQL server > when it's passed to that. > > On the other hand, if it's really necessary, application_name set in foreign > server object also needs to be processed in the same way. I added check_hook because I want to make sure that input for parse function has only ascii characters. But in this phase we don't have such a function, so removed. (Actually I'm not sure it is really needed, so I will check until next phase) > Why is GUC_IS_NAME flag necessary? I thought again and I noticed even if extremely long string is specified it will be truncated at server-side. This check is duplicated so removed. Maybe such a check is a user-responsibility. > postgres_fdw.application_name overrides application_name set in foreign server > object. > Shouldn't this information be documented? I added descriptions to postgres_fdw.sgml. > Isn't it better to have the regression test for this feature? +1, added. In that test SET the parameter and check pg_stat_activity. Attached is the fixed version. 0002 will be rebased later. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/09/02 18:27, kuroda.hayato@fujitsu.com wrote: > I added the following comments: > > ```diff > - /* Use "postgres_fdw" as fallback_application_name. */ > + /* > + * Use pgfdw_application_name as application_name. > + * > + * Note that this check must be behind constructing generic options > + * because pgfdw_application_name has higher priority. > + */ > ``` Thanks! What about updating the comments furthermore as follows? --------------------------------- Use pgfdw_application_name as application_name if set. PQconnectdbParams() processes the parameter arrays from start to end. If any key word is repeated, the last value is used. Therefore note that pgfdw_application_name must be added to the arrays after options of ForeignServer are, so that it can override application_name set in ForeignServer. --------------------------------- > Attached is the fixed version. 0002 will be rebased later. Thanks for updating the patch! + } + /* Use "postgres_fdw" as fallback_application_name */ It's better to add new empty line between these two lines. +-- Disconnect once because the value is used only when establishing connections +DO $$ + BEGIN + PERFORM postgres_fdw_disconnect_all(); + END +$$; Why does DO command need to be used here to execute postgres_fdw_disconnect_all()? Instead, we can just execute "SELECT 1 FROM postgres_fdw_disconnect_all();"? For test coverage, it's better to test at least the following three cases? (1) appname is set in neither GUC nor foreign server -> "postgres_fdw" set in fallback_application_name is used (2) appname is set in foreign server, but not in GUC -> appname in foreign server is used (3) appname is set both in GUC and foreign server -> appname in GUC is used +SELECT FROM ft1 LIMIT 1; "1" should be added just after SELECT in the above statement? Because postgres_fdw regression test basically uses "SELECT 1 FROM ..." in other places. + DefineCustomStringVariable("postgres_fdw.application_name", + "Sets the application name. This is used when connects to the remote server.", What about simplifying this description as follows? --------------- Sets the application name to be used on the remote server. --------------- + <title> Configuration Parameters </title> + <variablelist> The empty characters just after <title> and before </title> should be removed? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, Thank you for your great works. Attached is the latest version. > Thanks! What about updating the comments furthermore as follows? > > --------------------------------- > Use pgfdw_application_name as application_name if set. > > PQconnectdbParams() processes the parameter arrays from start to end. > If any key word is repeated, the last value is used. Therefore note that > pgfdw_application_name must be added to the arrays after options of > ForeignServer are, so that it can override application_name set in > ForeignServer. > --------------------------------- It's more friendly than mine because it mentions about specification about PQconnectdbParams(). Fixed like yours. > + } > + /* Use "postgres_fdw" as fallback_application_name */ > > It's better to add new empty line between these two lines. Fixed. > +-- Disconnect once because the value is used only when establishing > connections > +DO $$ > + BEGIN > + PERFORM postgres_fdw_disconnect_all(); > + END > +$$; > > Why does DO command need to be used here to execute > postgres_fdw_disconnect_all()? Instead, we can just execute > "SELECT 1 FROM postgres_fdw_disconnect_all();"? DO command was used because I want to ignore the returning value of postgres_fdw_disconnect_all(). Currently this function retruns false, but if other tests are modified, some connection may remain and the function may become to return true. I seeked sql file and I found that this function was called by your way. Hence I fixed. > For test coverage, it's better to test at least the following three cases? > > (1) appname is set in neither GUC nor foreign server > -> "postgres_fdw" set in fallback_application_name is used > (2) appname is set in foreign server, but not in GUC > -> appname in foreign server is used > (3) appname is set both in GUC and foreign server > -> appname in GUC is used I set four testcases: (1) Sets neither GUC nor server option (2) Sets server option, but not GUC (3) Sets GUC but not server option (4) Sets both GUC and server option I confirmed it almost works fine, but I found that fallback_application_name will be never used in our test enviroment. It is caused because our test runner pg_regress sets PGAPPNAME to "pg_regress" and libpq prefer the environment variable to fallback_appname. (I tried to control it by \setenv, but failed...) > +SELECT FROM ft1 LIMIT 1; > > "1" should be added just after SELECT in the above statement? > Because postgres_fdw regression test basically uses "SELECT 1 FROM ..." > in other places. Fixed. > + DefineCustomStringVariable("postgres_fdw.application_name", > + "Sets the application name. This is used when connects to the remote server.", > > What about simplifying this description as follows? > > --------------- > Sets the application name to be used on the remote server. > --------------- +1. > + <title> Configuration Parameters </title> > + <variablelist> > > The empty characters just after <title> and before </title> should be removed? I checked other sgml file and agreed. Fixed. And I found that including string.h is no more needed. Hence it is removed. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/09/03 14:56, kuroda.hayato@fujitsu.com wrote: > Dear Fujii-san, > > Thank you for your great works. Attached is the latest version. Thanks a lot! > I set four testcases: > > (1) Sets neither GUC nor server option > (2) Sets server option, but not GUC > (3) Sets GUC but not server option > (4) Sets both GUC and server option OK. > I confirmed it almost works fine, but I found that > fallback_application_name will be never used in our test enviroment. > It is caused because our test runner pg_regress sets PGAPPNAME to "pg_regress" and > libpq prefer the environment variable to fallback_appname. > (I tried to control it by \setenv, but failed...) make check uses "pg_regress", but I guess that make installcheck uses "postgres_fdw". So the patch would cause make installcheck to fail. I think that the case (1) is not so important, so can be removed. Thought? Attached is the updated version of the patch. I removed the test for case (1). And I arranged the regression tests so that they are based on debug_discard_caches, to simplify them. Also I added and updated some comments and docs. Could you review this version? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, Thank you for updating! Your modification is very interesting and I learn something new. > Attached is the updated version of the patch. I removed the test > for case (1). And I arranged the regression tests so that they are based > on debug_discard_caches, to simplify them. Also I added and updated > some comments and docs. Could you review this version? I agree removing (1) because it is obvious case. ```diff +-- If appname is set both as GUC and as options of server object, +-- the GUC setting overrides appname of server object and is used. +SET postgres_fdw.application_name TO 'fdw_guc_appname'; +SELECT 1 FROM postgres_fdw_disconnect_all(); + ?column? +---------- + 1 +(1 row) + +SELECT 1 FROM ft1 LIMIT 1; + ?column? +---------- + 1 +(1 row) + +SELECT application_name FROM pg_stat_activity + WHERE application_name IN ('loopback2', 'fdw_guc_appname'); + application_name +------------------ + fdw_guc_appname +(1 row) ``` I think we should SELECT ft6 because foreign server 'loopback' doesn't have application_name server option. I have no comments anymore. Best Regards, Hayato Kuroda FUJITSU LIMITED
Hi, Kuroda-san, Fujii-san I agree that this feature is useful. Thanks for working this. I've tried to use the patches and read them. I have some questions. 1) Why does postgres_fdw.application_name override the server's option? > + establishes a connection to a foreign server. This overrides > + <varname>application_name</varname> option of the server object. > + Note that change of this parameter doesn't affect any existing > + connections until they are re-established. My expected behavior was that application_name option of server object overrides the GUC because other GUCs seems behave so. For example, log_statement in postgresql.conf can be overrided by ALTER ROLE for only specific user. Should the narrower scope setting takes precedence? 2) Is it better to specify that we can use the escaped format for application_name option of the server object in the document? I think it's new feature too. 3) Is the following expected behavior? * 1. [coorinator] configure "postgres_fdw.application_name" as "[%d]". * 2. [coodinator] connect *Non-ASCII* database. * 3. [coodinator] execute queries for remote server. * 4. [remote] execute the following query. ``` postgres(2654193)=# SELECT application_name FROM pg_stat_activity WHERE backend_type = 'client backend' ; application_name --------------------------- psql [?????????] ``` One of the difference between log_line_prefix and application_name is that whether it only allow clean ASCII chars or not. So, the above behavior is expected. I think there are three choices for handling the above case. 1) Accept the above case because Non-ASCII rarely be used(?), and describe the limitation in the document. 2) Add a new characters for oid of database. 3) Lease the limitation of application_name and make it accept Non-ASCII. As a user, 3) is best for me. But it seems be out of scope this threads, so will you select 1)? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Dear Ikeda-san, > I agree that this feature is useful. Thanks for working this. Thanks :-). > 1) > > Why does postgres_fdw.application_name override the server's option? > > > + establishes a connection to a foreign server. This overrides > > + <varname>application_name</varname> option of the server object. > > + Note that change of this parameter doesn't affect any existing > > + connections until they are re-established. > > My expected behavior was that application_name option of server object > overrides the GUC because other GUCs seems behave so. For example, > log_statement in postgresql.conf can be overrided by ALTER ROLE for > only specific user. Should the narrower scope setting takes precedence? Hmm... I didn't know the policy in other options, thank you. I set higher priority because of the following reason. When users set app name as server option and create remote connection from same backend process, they will use same even if GUC sets. In this case users must execute ALTER SERVERS every session, and I think it is inconvenient. I think both approaches are reasonable, so I also want to ask committer. Fujii-san, how do you think? > 2) > > Is it better to specify that we can use the escaped format > for application_name option of the server object in the document? > I think it's new feature too. Yeah, I agree. I will add at `Connection Options` section in 0002 patch. Is it OK? Do we have more good place? > 3) > > Is the following expected behavior? > > * 1. [coorinator] configure "postgres_fdw.application_name" as "[%d]". > * 2. [coodinator] connect *Non-ASCII* database. > * 3. [coodinator] execute queries for remote server. > * 4. [remote] execute the following query. > > ``` > postgres(2654193)=# SELECT application_name FROM pg_stat_activity WHERE > backend_type = 'client backend' ; > application_name > --------------------------- > psql > [?????????] > ``` > > One of the difference between log_line_prefix and application_name > is that whether it only allow clean ASCII chars or not. So, the above > behavior is expected. > > I think there are three choices for handling the above case. > > 1) Accept the above case because Non-ASCII rarely be used(?), and > describe the limitation in the document. > 2) Add a new characters for oid of database. > 3) Lease the limitation of application_name and make it accept > Non-ASCII. > > As a user, 3) is best for me. > But it seems be out of scope this threads, so will you select 1)? Actually I did not considering about such a situation... But I want to choose (1) because the limitation is caused by libpq layer and the modification is almost unrelated to this patch. I'm not sure why appname have such a limitation so at first we should investigate it. How do you think? Best Regards, Hayato Kuroda FUJITSU LIMITED
On 2021-09-06 13:21, kuroda.hayato@fujitsu.com wrote: > Dear Ikeda-san, > >> I agree that this feature is useful. Thanks for working this. > > Thanks :-). > >> 1) >> >> Why does postgres_fdw.application_name override the server's option? >> >> > + establishes a connection to a foreign server. This overrides >> > + <varname>application_name</varname> option of the server object. >> > + Note that change of this parameter doesn't affect any existing >> > + connections until they are re-established. >> >> My expected behavior was that application_name option of server object >> overrides the GUC because other GUCs seems behave so. For example, >> log_statement in postgresql.conf can be overrided by ALTER ROLE for >> only specific user. Should the narrower scope setting takes >> precedence? > > Hmm... I didn't know the policy in other options, thank you. > I set higher priority because of the following reason. > When users set app name as server option and > create remote connection from same backend process, > they will use same even if GUC sets. > In this case users must execute ALTER SERVERS every session, > and I think it is inconvenient. Thanks for explaining what you thought. I understood your idea. IIUC, we can execute "ALTER SERVERS" commands automatically using scripts? If so, to have finer control seems to be more important than to reduce the effort of overwriting every configuration. But this is just my feeling. I agree to hear comments from Fujii-san and so on. > I think both approaches are reasonable, so I also want to ask > committer. > Fujii-san, how do you think? >> 2) >> >> Is it better to specify that we can use the escaped format >> for application_name option of the server object in the document? >> I think it's new feature too. > > Yeah, I agree. I will add at `Connection Options` section in 0002 > patch. > Is it OK? Do we have more good place? +1 I found that '%%' need to be described in the documents of postgres_fdw.application_name. What do you think? >> 3) >> >> Is the following expected behavior? >> >> * 1. [coorinator] configure "postgres_fdw.application_name" as "[%d]". >> * 2. [coodinator] connect *Non-ASCII* database. >> * 3. [coodinator] execute queries for remote server. >> * 4. [remote] execute the following query. >> >> ``` >> postgres(2654193)=# SELECT application_name FROM pg_stat_activity >> WHERE >> backend_type = 'client backend' ; >> application_name >> --------------------------- >> psql >> [?????????] >> ``` >> >> One of the difference between log_line_prefix and application_name >> is that whether it only allow clean ASCII chars or not. So, the above >> behavior is expected. >> >> I think there are three choices for handling the above case. >> >> 1) Accept the above case because Non-ASCII rarely be used(?), and >> describe the limitation in the document. >> 2) Add a new characters for oid of database. >> 3) Lease the limitation of application_name and make it accept >> Non-ASCII. >> >> As a user, 3) is best for me. >> But it seems be out of scope this threads, so will you select 1)? > > Actually I did not considering about such a situation... > But I want to choose (1) because the limitation is caused by libpq > layer > and the modification is almost unrelated to this patch. > I'm not sure why appname have such a limitation > so at first we should investigate it. How do you think? OK. I agree that (1) is enough for the first step. If I have any time, I'll investigate it too. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/09/06 10:32, kuroda.hayato@fujitsu.com wrote: > I think we should SELECT ft6 because foreign server 'loopback' > doesn't have application_name server option. Yes. I forgot to update that... Thanks for the review! Attached is the updated version of the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On 2021/09/06 16:48, Masahiro Ikeda wrote: > On 2021-09-06 13:21, kuroda.hayato@fujitsu.com wrote: >> Dear Ikeda-san, >> >>> I agree that this feature is useful. Thanks for working this. >> >> Thanks :-). >> >>> 1) >>> >>> Why does postgres_fdw.application_name override the server's option? >>> >>> > + establishes a connection to a foreign server. This overrides >>> > + <varname>application_name</varname> option of the server object. >>> > + Note that change of this parameter doesn't affect any existing >>> > + connections until they are re-established. >>> >>> My expected behavior was that application_name option of server object >>> overrides the GUC because other GUCs seems behave so. For example, >>> log_statement in postgresql.conf can be overrided by ALTER ROLE for >>> only specific user. Should the narrower scope setting takes precedence? An option of a role doesn't always override a GUC setting. For example, GUC with PGC_USERSET like work_mem, work_mem setting of a role overrides that set in postgresql.conf, as you told. But if work_mem is set by SET command, its value overrides the option of a role. So if we should treat an option of foreign server object like an option of a role, we should handle applicaion_name option for postgres_fdw in that way. If we want to implement this, we need to check the context of postgres_fdw.application_name when using it. If its context is PGC_SIGHUP or PGC_POSTMASTER, it needs to be added the connection arrays that PQconnectParams() uses before options of server object are processed, i.e., application_name of server object overrides the other in this case. On the other hand, if its context is higher than PGC_SIGHUP, it needs to be added to the arrays after options of server object are processed. I'm OK with the current proposal for now (i.e., postgres_fdw.application_name always overrides that of server object) at least as the first step beucase it's simple. But if you want that feature and can simply implement it, I don't object to that. > IIUC, we can execute "ALTER SERVERS" commands automatically using scripts? > If so, to have finer control seems to be more important than to reduce the effort of > overwriting every configuration. You mean to execute ALTER SERVER automatically via script whenever a client connects to the server (i.e., a client passes its application_name to the server, the server automatically sets it to the foreign server object, then the server uses it as application_name of the remote connection)? >>> Is it better to specify that we can use the escaped format >>> for application_name option of the server object in the document? >>> I think it's new feature too. >> >> Yeah, I agree. I will add at `Connection Options` section in 0002 patch. >> Is it OK? Do we have more good place? > > +1 +1 >> Actually I did not considering about such a situation... >> But I want to choose (1) because the limitation is caused by libpq layer >> and the modification is almost unrelated to this patch. >> I'm not sure why appname have such a limitation >> so at first we should investigate it. How do you think? > > OK. I agree that (1) is enough for the first step. +1 > If I have any time, I'll investigate it too. Maybe we need to consider what happens if different clients use application_name with different encodings. How should pg_stat_activity.application_name and log_line_prefix, etc handle such case? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/09/06 21:28, Fujii Masao wrote: > > > On 2021/09/06 16:48, Masahiro Ikeda wrote: >> On 2021-09-06 13:21, kuroda.hayato@fujitsu.com wrote: >>> Dear Ikeda-san, >>> >>>> I agree that this feature is useful. Thanks for working this. >>> >>> Thanks . >>> >>>> 1) >>>> >>>> Why does postgres_fdw.application_name override the server's option? >>>> >>>>> + establishes a connection to a foreign server. This overrides >>>>> + <varname>application_name</varname> option of the server object. >>>>> + Note that change of this parameter doesn't affect any existing >>>>> + connections until they are re-established. >>>> >>>> My expected behavior was that application_name option of server object >>>> overrides the GUC because other GUCs seems behave so. For example, >>>> log_statement in postgresql.conf can be overrided by ALTER ROLE for >>>> only specific user. Should the narrower scope setting takes precedence? > > An option of a role doesn't always override a GUC setting. For example, > GUC with PGC_USERSET like work_mem, work_mem setting of a role overrides > that set in postgresql.conf, as you told. But if work_mem is set by > SET command, its value overrides the option of a role. So if we should treat > an option of foreign server object like an option of a role, we should handle > applicaion_name option for postgres_fdw in that way. > > If we want to implement this, we need to check the context of > postgres_fdw.application_name when using it. If its context is PGC_SIGHUP > or PGC_POSTMASTER, it needs to be added the connection arrays that > PQconnectParams() uses before options of server object are processed, > i.e., application_name of server object overrides the other in this case. > On the other hand, if its context is higher than PGC_SIGHUP, it needs to > be added to the arrays after options of server object are processed. > > I'm OK with the current proposal for now (i.e., postgres_fdw.application_name > always overrides that of server object) at least as the first step beucase > it's simple. But if you want that feature and can simply implement it, I don't > object to that. Thanks for explaining. I forgot to consider about SET command. I understood that SET command should override the old value, and it's difficult to manage the contexts for server options and the GUC. Sorry for my impractical proposal. >> IIUC, we can execute "ALTER SERVERS" commands automatically using scripts? >> If so, to have finer control seems to be more important than to reduce the >> effort of >> overwriting every configuration. > > You mean to execute ALTER SERVER automatically via script whenever > a client connects to the server (i.e., a client passes its application_name > to the server, the server automatically sets it to the foreign server object, > then the server uses it as application_name of the remote connection)? Sorry, Kuroda-san and Fujii-san are right. If SET command is used, ALTER SERVER doesn't work. >>>> Is it better to specify that we can use the escaped format >>>> for application_name option of the server object in the document? >>>> I think it's new feature too. >>> >>> Yeah, I agree. I will add at `Connection Options` section in 0002 patch. >>> Is it OK? Do we have more good place? >> >> +1 > > +1 > >>> Actually I did not considering about such a situation... >>> But I want to choose (1) because the limitation is caused by libpq layer >>> and the modification is almost unrelated to this patch. >>> I'm not sure why appname have such a limitation >>> so at first we should investigate it. How do you think? >> >> OK. I agree that (1) is enough for the first step. > > +1 > >> If I have any time, I'll investigate it too. > > Maybe we need to consider what happens if different clients use > application_name with different encodings. How should > pg_stat_activity.application_name and log_line_prefix, etc handle such case? Thanks. That makes sense. I'll check them. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, I confirmed it and I think it's OK. Other comments should be included in 0002. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/09/07 10:32, kuroda.hayato@fujitsu.com wrote: > Dear Fujii-san, > > I confirmed it and I think it's OK. > Other comments should be included in 0002. Pushed 0001 patch. Thanks! Could you rebase 0002 patch? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fujii-san, Ikeda-san, > Pushed 0001 patch. Thanks! I confirmed your commit. Thanks! I attached the rebased version. Tests and descriptions were added. In my understanding Ikeda-san's indication is included. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Tom Lane
Date:
Fujii Masao <masao.fujii@oss.nttdata.com> writes: > Pushed 0001 patch. Thanks! The buildfarm shows that this test case isn't terribly reliable. TBH, I think you should just remove the test case altogether. It does not look useful enough to justify a permanent expenditure of testing cycles, never mind the amount of effort that would be needed to stabilize it. regards, tom lane
Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)
From
Fujii Masao
Date:
On 2021/09/08 7:46, Tom Lane wrote: > Fujii Masao <masao.fujii@oss.nttdata.com> writes: >> Pushed 0001 patch. Thanks! > > The buildfarm shows that this test case isn't terribly reliable. Yes... Thanks for reporting this! > TBH, I think you should just remove the test case altogether. > It does not look useful enough to justify a permanent expenditure of > testing cycles, never mind the amount of effort that would be needed to > stabilize it. Ok, I will remove the tests. Kuroda-san is proposing the additional feature which allows postgres_fdw.application_name to include escape characters. If we commit the feature, it's worth adding the similar tests checking that the escape characters there are replaced with expected values. So it's better to investigate what made the tests fail, not to cause the same tests failure later. The tests use postgres_fdw_disconnect_all() to close the existing remote connections before establishing new remote connection with new application_name setting. Then they check that the expected application_name appears in pg_stat_activity. The cause of the issue seems to be that it can take a bit time for the existing remote connection (i.e., its corresponding backend) to exit after postgres_fdw_disconect_all() is called. So application_name of the existing remote connection also could appear in pg_stat_activity when it's checked next time. If this analysis is right, the tests that the upcoming patch adds should handle the case where the existing remote connection can appear in pg_stat_activity after postgres_fdw_disconect_all(). Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Tue, 7 Sep 2021 11:30:27 +0000, "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote in > I attached the rebased version. Tests and descriptions were added. > In my understanding Ikeda-san's indication is included. I have some comments by a quick look. + * one have higher priority. See also the previous comment. Is "the previous comment" "the comment above"? + for (i = n -1; i >= 0; i--) You might want a space between - and 1. +parse_application_name(StringInfo buf, const char *name) The name seems a bit too generic as it is a function only for postgres_fdw. + /* must be a '%', so skip to the next char */ + p++; + if (*p == '\0') + break; /* format error - ignore it */ I'm surprised by finding that undefined %-escapes and stray % behave differently between archive_command and log_line_prefix. I understand this behaves like the latter. + const char *username = MyProcPort->user_name; I'm not sure but even if user_name doesn't seem to be NULL, don't we want to do the same thing with %u of log_line_prefix for safety? Namely, setting [unknown] if user_name is NULL or "". The same can be said for %d. + * process_padding --- helper function for processing the format + * string in log_line_prefix Since this is no longer a static helper function for a specific function, the name and the comment should be more descriptive. That being said, in the first place the function seems reducible almost to a call to atol. By a quick measurement the function is about 38% faster (0.024us/call(the function) vs 0.039us/call(strtol) so I'm not sure we want to replace process_log_prefix_padding(), but we don't need to reuse the function in parse_application_name since it is called only once per connection. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Dear Horiguchi-san, Thank you for reviewing! I attached the fixed version. > Is "the previous comment" "the comment above"? Yeah, fixed. > + for (i = n -1; i >= 0; i--) > > You might want a space between - and 1. Fixed. > +parse_application_name(StringInfo buf, const char *name) > > The name seems a bit too generic as it is a function only for > postgres_fdw. Indeed. I renamed to parse_pgfdw_appname(). > + /* must be a '%', so skip to the next char */ > + p++; > + if (*p == '\0') > + break; /* format error - > ignore it */ > > I'm surprised by finding that undefined %-escapes and stray % behave > differently between archive_command and log_line_prefix. I understand > this behaves like the latter. Indeed. pgarch_archiveXlog() treats undefined escapes as nothing special, but log_line_prefix() stop parsing immediately. They have no description about it in docs. I will not treat it in this thread and follow log_line_prefix(), but I agree it is strange. > + const char *username = > MyProcPort->user_name; > > I'm not sure but even if user_name doesn't seem to be NULL, don't we > want to do the same thing with %u of log_line_prefix for safety? > Namely, setting [unknown] if user_name is NULL or "". The same can be > said for %d. I think they will be never NULL in current implementation, but your suggestion is better. Checks were added in %a, %u and %d. > + * process_padding --- helper function for processing the format > + * string in log_line_prefix > > Since this is no longer a static helper function for a specific > function, the name and the comment should be more descriptive. > > That being said, in the first place the function seems reducible > almost to a call to atol. By a quick measurement the function is about > 38% faster (0.024us/call(the function) vs 0.039us/call(strtol) so I'm > not sure we want to replace process_log_prefix_padding(), but we don't > need to reuse the function in parse_application_name since it is > called only once per connection. My first impression is that they use the function because they want to know the end of sequence and do error-handling, but I agree this can be replaced by strtol(). I changed the function to strtol() and restored process_log_prefix_padding(). How do you think? Does it follow your suggestion? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/09/08 21:32, kuroda.hayato@fujitsu.com wrote: > Dear Horiguchi-san, > > Thank you for reviewing! I attached the fixed version. Thanks for updating the patch! + for (i = n - 1; i >= 0; i--) + { + if (strcmp(keywords[i], "application_name") == 0) + { + parse_pgfdw_appname(&buf, values[i]); + values[i] = buf.data; + break; + } postgres_fdw gets out of the loop after processing appname even when buf.data is '\0'. Is this expected behavior? Because of this, when postgres_fdw.application_name = '%b', unprocessed appname of the server object is used. +CREATE FUNCTION for_escapes() RETURNS bool AS $$ + DECLARE + appname text; + c bool; + BEGIN + SHOW application_name INTO appname; + EXECUTE 'SELECT COUNT(*) FROM pg_stat_activity WHERE application_name LIKE ''' Could you tell me why the statement checking application_name should be wrapped in a function? Instead, we can just execute something like the following? SELECT COUNT(*) FROM pg_stat_activity WHERE application_name = current_setting('application_name') || current_user || current_database()|| pg_backend_pid() || '%'; + char *endptr = NULL; + padding = (int)strtol(p, &endptr, 10); strtol() seems to work differently from process_log_prefix_padding(), for example, when the input string is "%-p". + case 'a': + { + const char *appname = application_name; When log_line_prefix() processes "%a", "%u" or "%d" characters in log_line_prefix, it checks whether MyProcPort is NULL or not. Likewise shouldn't parse_pgfdw_appname() do the same thing? For now it's ok not to check that because only process having MyProcPort can use postgres_fdw. But in the future the process not having that may use postgres_fdw, for example, 2PC resolver process that Sawada-san is proposing to add to support automatic 2PC among multiple remote servers. + You can use escape sequences for <literal>application_name</literal> even if + it is set as a connection or a user mapping option. + Please refer the later section. I was thinking that application_name cannot be set in a user mapping. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Dear Fujii-san, Thank you for reviewing! > postgres_fdw gets out of the loop after processing appname even > when buf.data is '\0'. Is this expected behavior? Because of this, > when postgres_fdw.application_name = '%b', unprocessed appname > of the server object is used. In this case, I expected to use fallback_application_name instead of appname, but I missed the case where appname was set as a server option. Maybe we should do continue when buf.data is \0. > + char *endptr = NULL; > + padding = (int)strtol(p, &endptr, 10); > > strtol() seems to work differently from process_log_prefix_padding(), > for example, when the input string is "%-p". Yeah, and I found that "%+5p" is another example. Our padding function does not understand plus sign (and maybe this is the specification of printf()), but strtol() reads. I did not fix here because I lost my way. Do we treats these cases and continue to use strtol(), or use process_padding()? > Could you tell me why the statement checking > application_name should be wrapped in a function? > Instead, we can just execute something like the following? > > SELECT COUNT(*) FROM pg_stat_activity WHERE application_name = > current_setting('application_name') || current_user || current_database() || > pg_backend_pid() || '%'; I did not know current_setting() function... The function was replaced as you expected. > When log_line_prefix() processes "%a", "%u" or "%d" characters in > log_line_prefix, it checks whether MyProcPort is NULL or not. > Likewise shouldn't parse_pgfdw_appname() do the same thing? > For now it's ok not to check that because only process having > MyProcPort can use postgres_fdw. But in the future the process > not having that may use postgres_fdw, for example, 2PC resolver > process that Sawada-san is proposing to add to support automatic > 2PC among multiple remote servers. Sure, I did not consider about other patches. I added checks. > + You can use escape sequences for <literal>application_name</literal> > even if > + it is set as a connection or a user mapping option. > + Please refer the later section. > > I was thinking that application_name cannot be set in a user mapping. I confirmed codes and found that is_valid_option() returns false if libpq options are set. So removed. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/09/09 21:36, kuroda.hayato@fujitsu.com wrote: > In this case, I expected to use fallback_application_name instead of appname, > but I missed the case where appname was set as a server option. > Maybe we should do continue when buf.data is \0. Yes. + /* + * If the input format is wrong and the string becomes '\0', + * this parameter is no longer used. + * We conitnue searching application_name. + */ + if (values[i] == '\0') + continue; + else + break; We can simplify the code as follows. if (values[i] != '\0') break; > Yeah, and I found that "%+5p" is another example. > Our padding function does not understand plus sign > (and maybe this is the specification of printf()), but strtol() reads. > > I did not fix here because I lost my way. Do we treats these cases > and continue to use strtol(), or use process_padding()? IMO it's better to use process_padding() to process log_line_prefix and postgres_fdw.application in the same way as possible. Which would be less confusing. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Dear Fujii-san, Thank you for quick reviewing! > We can simplify the code as follows. > > if (values[i] != '\0') > break; Fixed. And type mismatching was also fixed. > IMO it's better to use process_padding() to process log_line_prefix and > postgres_fdw.application in the same way as possible. > Which would be less confusing. OK, I followed that. Some comments were added above the function. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
From Friday, September 10, 2021 11:24 AM kuroda.hayato@fujitsu.com <kuroda.hayato@fujitsu.com> > > We can simplify the code as follows. > > > > if (values[i] != '\0') > > break; > > Fixed. And type mismatching was also fixed. > > > IMO it's better to use process_padding() to process log_line_prefix > > and postgres_fdw.application in the same way as possible. > > Which would be less confusing. > > OK, I followed that. Some comments were added above the function. Hi Kuroda-san, I found one minor thing in the patch. + appendStringInfoSpaces(buf, + padding > 0 ? padding : -padding); Is it better to use Abs(padding) here ? Although some existing codes are using " padding > 0 ? padding : -padding ". Best regards, Hou zj
Dear Hou, > I found one minor thing in the patch. Thanks! > Is it better to use Abs(padding) here ? > Although some existing codes are using " padding > 0 ? padding : -padding ". +1, fixed. And I found that some comments were not added above padding calculation, so added. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
At Fri, 10 Sep 2021 04:33:53 +0000, "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote in > Dear Hou, > > > I found one minor thing in the patch. > > Thanks! > > > Is it better to use Abs(padding) here ? > > Although some existing codes are using " padding > 0 ? padding : -padding ". > > +1, fixed. > > And I found that some comments were not added above padding calculation, > so added. Thanks for the new version. I don't object for reusing process_log_prefix_padding, but I still find it strange that the function with the name 'process_padding' is in string.c. If we move it to string.c, I'd like to name it "pg_fast_strtoi" or something and change the interface to int pg_fast_strtoi(const char *p, char **endptr) that is (roughly) compatible to strtol. What do (both) you think about this? + /* + * Search application_name and replace it if found. + * + * We search paramters from the end because the later + * one have higher priority. See also the above comment. + */ + for (i = n - 1; i >= 0; i--) + { + if (strcmp(keywords[i], "application_name") == 0) + { + parse_pgfdw_appname(&buf, values[i]); + values[i] = buf.data; + + /* + * If the input format is wrong and the string becomes '\0', + * this parameter is no longer used. + * We conitnue searching application_name. + */ + if (*values[i] != '\0') + break; + } + } I didn't fully checked in what case parse_pgfdw_appname gives "" as result, I feel that we should use the original value in that case. That is, > parse_pgfdw_appname(&buf, vaues[i]); > > /* use the result if any, otherwise use the original string */ > if (buf.data[0] != 0) > values[i] = buf.data; > > /* break if it results in non-empty string */ > if (values[0][0] != 0) > break; regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Dear Horiguchi-san, Thank you for giving comments! > Thanks for the new version. I don't object for reusing > process_log_prefix_padding, but I still find it strange that the > function with the name 'process_padding' is in string.c. If we move > it to string.c, I'd like to name it "pg_fast_strtoi" or something and > change the interface to int pg_fast_strtoi(const char *p, char > **endptr) that is (roughly) compatible to strtol. What do (both) you > think about this? I agree that this interface might be confused. I changed its name and interface. How do you think? Actually I cannot distinguish the name is good or not, but I could not think of anything else... > I didn't fully checked in what case parse_pgfdw_appname gives "" as > result, I feel that we should use the original value in that > case. That is, > > > parse_pgfdw_appname(&buf, vaues[i]); > > > > /* use the result if any, otherwise use the original string */ > > if (buf.data[0] != 0) > > values[i] = buf.data; > > > > /* break if it results in non-empty string */ > > if (values[0][0] != 0) > > break; Hmm. I tested log_line_prefix() by setting it as '%z' and I found that any prefixes were not output. I think we should follow it, so currently not fixed. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/09/14 13:42, kuroda.hayato@fujitsu.com wrote: > Dear Horiguchi-san, > > Thank you for giving comments! > >> Thanks for the new version. I don't object for reusing >> process_log_prefix_padding, but I still find it strange that the >> function with the name 'process_padding' is in string.c. If we move >> it to string.c, I'd like to name it "pg_fast_strtoi" or something and >> change the interface to int pg_fast_strtoi(const char *p, char >> **endptr) that is (roughly) compatible to strtol. What do (both) you >> think about this? > > I agree that this interface might be confused. > I changed its name and interface. How do you think? > Actually I cannot distinguish the name is good or not, > but I could not think of anything else... The name using the word "strtoint" sounds confusing to me because the behavior of the function is different from strtoint() or pg_strtoint32(), etc. Otherwise we can easily misunderstand that pg_fast_strtoint() can be used as alternative of strtoint() or pg_strtoint32(). I have no better idea for the name, though.. > >> I didn't fully checked in what case parse_pgfdw_appname gives "" as >> result, I feel that we should use the original value in that >> case. That is, >> >>> parse_pgfdw_appname(&buf, vaues[i]); >>> >>> /* use the result if any, otherwise use the original string */ >>> if (buf.data[0] != 0) >>> values[i] = buf.data; >>> >>> /* break if it results in non-empty string */ >>> if (values[0][0] != 0) >>> break; Agreed. It's strange to use the application_name of the server object in that case. There seems to be four options: (1) Use the original postgres_fdw.application_name like "%b" (2) Use the application_name of the server object (if set) (3) Use fallback_application_name (4) Use empty string as application_name (2) and (3) look strange to me because we expect that postgres_fdw.application_name should override application_name of the server object and fallback_application_mame. Also reporting invalid escape sequence in application_name as it is, i.e., (1), looks strange to me. So (4) looks most intuitive and similar behavior to log_line_prefix. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Dear Fujii-san, Thanks for comments! > >> Thanks for the new version. I don't object for reusing > >> process_log_prefix_padding, but I still find it strange that the > >> function with the name 'process_padding' is in string.c. If we move > >> it to string.c, I'd like to name it "pg_fast_strtoi" or something and > >> change the interface to int pg_fast_strtoi(const char *p, char > >> **endptr) that is (roughly) compatible to strtol. What do (both) you > >> think about this? > > > > I agree that this interface might be confused. > > I changed its name and interface. How do you think? > > Actually I cannot distinguish the name is good or not, > > but I could not think of anything else... > > The name using the word "strtoint" sounds confusing to me > because the behavior of the function is different from strtoint() or > pg_strtoint32(), etc. Otherwise we can easily misunderstand that > pg_fast_strtoint() can be used as alternative of strtoint() or > pg_strtoint32(). I have no better idea for the name, though.. you mean that this is not strtoXXX, right? If so we should go back to process_padding().... Horiguchi-san, do you have any idea? And I added pg_restrict keywords for compiler optimizations. > >> I didn't fully checked in what case parse_pgfdw_appname gives "" as > >> result, I feel that we should use the original value in that > >> case. That is, > >> > >>> parse_pgfdw_appname(&buf, vaues[i]); > >>> > >>> /* use the result if any, otherwise use the original string */ > >>> if (buf.data[0] != 0) > >>> values[i] = buf.data; > >>> > >>> /* break if it results in non-empty string */ > >>> if (values[0][0] != 0) > >>> break; > > Agreed. It's strange to use the application_name of the server > object in that case. There seems to be four options: > > (1) Use the original postgres_fdw.application_name like "%b" > (2) Use the application_name of the server object (if set) > (3) Use fallback_application_name > (4) Use empty string as application_name > > (2) and (3) look strange to me because we expect that > postgres_fdw.application_name should override application_name > of the server object and fallback_application_mame. > > Also reporting invalid escape sequence in application_name as it is, > i.e., (1), looks strange to me. > > So (4) looks most intuitive and similar behavior to log_line_prefix. > Thought? I agreed that (2) and (3) breaks the rule which should override server option. Hence I chose (4), values[i] substituted to buf.data in any case. Attached is the latest version. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/09/16 12:36, kuroda.hayato@fujitsu.com wrote: > you mean that this is not strtoXXX, right? Yes, the behavior of process_log_prefix_padding() is different from strtoint() or pg_strtoint32(). For example, how the heading space characters are handled is different. If we use the name like pg_fast_strtoint(), we might be likely to replace the existing strtoint() or pg_strtoint32() with pg_fast_strtoint() because its name contains "fast", for better performance. But we should not do that because their behavior is different. > If so we should > go back to process_padding().... Horiguchi-san, do you have any idea? At least for me process_padding() sounds like the function is actually doing the padding operation. This is not what the function does. So how about something like parse_padding(), parse_padding_format()? > And I added pg_restrict keywords for compiler optimizations. I'm not sure how much it's worth doing this here. If this change is helpful for better performance or something, IMO it's better to propose this separately. > I agreed that (2) and (3) breaks the rule which should override server option. > Hence I chose (4), values[i] substituted to buf.data in any case. > > Attached is the latest version. Thanks! Will review the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Dear Fujii-san, Horiguchi-san, Based on your advice, I made a patch that communize two parsing functions into one. new internal function parse_format_string() was added. (This name may be too generic...) log_line_prefix() and parse_pgfdw_appname() become just the wrapper function. My prerpimary checking was passed for server and postgres_fdw, but could you review that? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/09/21 15:08, kuroda.hayato@fujitsu.com wrote: > Dear Fujii-san, Horiguchi-san, > > Based on your advice, I made a patch > that communize two parsing functions into one. > new internal function parse_format_string() was added. > (This name may be too generic...) > log_line_prefix() and parse_pgfdw_appname() become just the wrapper function. > My prerpimary checking was passed for server and postgres_fdw, > but could you review that? Yes. Thanks for updating the patch! ------------------- elog.c:2785:14: warning: expression which evaluates to zero treated as a null pointer constant of type 'char *' [-Wnon-literal-null-conversion] *endptr = '\0'; ^~~~ 1 warning generated. ------------------- I got the above compiler warning. + * Note: StringInfo datatype cannot be accepted + * because elog.h should not include postgres-original header file. How about moving the function to guc.c from elog.c because it's for the parameters, i.e., log_line_prefix and postgres_fdw.application_name? This allows us to use StringInfo in the function? + parse_pgfdw_appname(buf, values[i]); + /* + * Note that appname may becomes an empty string + * if an input string has wrong format. + */ + values[i] = *buf; If postgres_fdw.application_name contains only invalid escape characters like "%b", parse_pgfdw_appname() returns an empty string. We discussed there are four options to handle this case and we concluded (4) is better. Right? But ISTM that the patch uses (2). > (1) Use the original postgres_fdw.application_name like "%b" > (2) Use the application_name of the server object (if set) > (3) Use fallback_application_name > (4) Use empty string as application_name Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Dear Fujii-san, Horiguchi-san I'm sorry for sending a bad patch... > ------------------- > elog.c:2785:14: warning: expression which evaluates to zero treated as a null > pointer constant of type 'char *' [-Wnon-literal-null-conversion] > *endptr = '\0'; > ^~~~ > 1 warning generated. > ------------------- > > I got the above compiler warning. Fixed. *endptr points (char *)p in any cases, that means this parameter is invalid. > + * Note: StringInfo datatype cannot be accepted > + * because elog.h should not include postgres-original header file. > > How about moving the function to guc.c from elog.c because it's for > the parameters, i.e., log_line_prefix and postgres_fdw.application_name? > This allows us to use StringInfo in the function? Yeah, StringInfo can be used in guc.c. Hence moved it. Some variables and functions for timestamp became non-static function, because they were used both normal logging and log_line_prefix. I think their name is not too generic so I did not fix them. > + parse_pgfdw_appname(buf, values[i]); > + /* > + * Note that appname may becomes an empty > string > + * if an input string has wrong format. > + */ > + values[i] = *buf; > > If postgres_fdw.application_name contains only invalid escape characters like > "%b", parse_pgfdw_appname() returns an empty string. We discussed > there are four options to handle this case and we concluded (4) is better. > Right? But ISTM that the patch uses (2). > > > (1) Use the original postgres_fdw.application_name like "%b" > > (2) Use the application_name of the server object (if set) > > (3) Use fallback_application_name > > (4) Use empty string as application_name Yeah, currently my patch uses case (2). I tried to implement (4), but I found that libpq function(may be conninfo_array_parse()) must be modified in order to that. We moved the functionality to libpq layer because we want to avoid some side effects, so we started to think case (4) might be wrong. Now we propose the following senario: 1. Use postgres_fdw.application_name when it is set and the parsing result is not empty 2. If not, use the foreign-server option when it is set and the parsing result is not empty 3. If not, use fallback_application_name How do you think? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
At Mon, 27 Sep 2021 04:10:50 +0000, "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote in > I'm sorry for sending a bad patch... Thank you for the new version, and sorry for making the discussion go back and forth:p > > + * Note: StringInfo datatype cannot be accepted > > + * because elog.h should not include postgres-original header file. > > > > How about moving the function to guc.c from elog.c because it's for > > the parameters, i.e., log_line_prefix and postgres_fdw.application_name? > > This allows us to use StringInfo in the function? > > Yeah, StringInfo can be used in guc.c. Hence moved it. > Some variables and functions for timestamp became non-static function, > because they were used both normal logging and log_line_prefix. > I think their name is not too generic so I did not fix them. > > > + parse_pgfdw_appname(buf, values[i]); > > + /* > > + * Note that appname may becomes an empty > > string > > + * if an input string has wrong format. > > + */ > > + values[i] = *buf; > > > > If postgres_fdw.application_name contains only invalid escape characters like > > "%b", parse_pgfdw_appname() returns an empty string. We discussed > > there are four options to handle this case and we concluded (4) is better. > > Right? But ISTM that the patch uses (2). > > > > > (1) Use the original postgres_fdw.application_name like "%b" > > > (2) Use the application_name of the server object (if set) > > > (3) Use fallback_application_name > > > (4) Use empty string as application_name > > Yeah, currently my patch uses case (2). I tried to implement (4), > but I found that libpq function(may be conninfo_array_parse()) must be modified in order to that. > We moved the functionality to libpq layer because we want to avoid some side effects, > so we started to think case (4) might be wrong. > > Now we propose the following senario: > 1. Use postgres_fdw.application_name when it is set and the parsing result is not empty > 2. If not, use the foreign-server option when it is set and the parsing result is not empty > 3. If not, use fallback_application_name > > How do you think? I think we don't have a predecessor of the case like this where a behavior is decided from object option and GUC. I'm a bit uncomfortable with .conf configuration overrides server options, but I also think in-session-set GUC should override server options. So, it's slightly against POLA but from the standpoint of usability +0.5 to that prioritization since I cannot come up with a better idea. I thought it is nice to share process_format_string but the function is too tightly coupled with ErrorData detail as you pointed off-list. So I came to think we cannot use the function from outside. It could be a possibility to make the function be just a skeleton that takes a list of pairs of an escaped character and the associated callback function but that is apparently too-much complex. (It would be worth doing if we share the same backbone processing with archive_command, restore_command, recover_end_command and so on, but that is necessarily accompanied by the need to change the behavior either log_line_prefix or others.) I personally don't care even if this feature implements padding-reading logic differently from process_format_string, but if we don't like that, I would return to suggest using strtol in both functions. As Fujii-san pointed upthread, strtol behaves a bit different way than we expect here, but that can be handled by checking the starting characters. > if (*p == '-' ? isdigit(p[1]) : isdigit(p[0])) > { > char *endptr; > padding = strtol(p, &endptr, 10); > if (p == endptr) > break; > p = endptr; > } > else > padding = 0; The code gets a bit more complex but simplification by removing the helper function wins. strtol is slower than the original function but it can be thought in noise-level? isdigit on some platforms seems following locale, but it is already widely used for example while numeric parsing so I don't think that matters. (Of course we can get rid of isdigit by using bare comparisons.) I think it can be a separate preparatory patch of this patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 01 Oct 2021 11:23:33 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > function but that is apparently too-much complex. (It would be worth > doing if we share the same backbone processing with archive_command, > restore_command, recover_end_command and so on, but that is > necessarily accompanied by the need to change the behavior either > log_line_prefix or others.) So I was daydreaming over the idea and concluded as it is no-go. I said that the fdw-application name is close to archive_command rather than log_line_prefix, but that is totally missing the point. log_line_prefix and the fdw-application name share the concept of formatting, which is not a part of archive_command and friends. The latter is made very light weight and simple due to not only the formatless-ness but also the narrow choices of escape characters. Also they work on fixed-length buffer instead of StringInfo. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Dear Horiguchi-san, Thank you for giving many comments! I attached new patches. I'm sorry for the late reply. > I think we don't have a predecessor of the case like this where a > behavior is decided from object option and GUC. > > I'm a bit uncomfortable with .conf configuration overrides server > options, but I also think in-session-set GUC should override server > options. So, it's slightly against POLA but from the standpoint of > usability +0.5 to that prioritization since I cannot come up with a > better idea. OK, I keep the current spec. Please tell me if you come up with something. > I thought it is nice to share process_format_string but the function > is too tightly coupled with ErrorData detail as you pointed off-list. > So I came to think we cannot use the function from outside. It could > be a possibility to make the function be just a skeleton that takes a > list of pairs of an escaped character and the associated callback > function but that is apparently too-much complex. (It would be worth > doing if we share the same backbone processing with archive_command, > restore_command, recover_end_command and so on, but that is > necessarily accompanied by the need to change the behavior either > log_line_prefix or others.) I agree that combining them makes source too complex. And we have an another problem about native language support. Sometimes espaces becomes "[unkown]" string in log_line_prefix(), and they are gettext()'s target. So if we combine log_line_prefix() and parse_pgfdw_appname(), some non-ascii characters may be set to postgres_fdw.applicaiton_name. Of cause we can implement callback function, but I think it is the next step. > I personally don't care even if this feature implements > padding-reading logic differently from process_format_string, but if > we don't like that, I would return to suggest using strtol in both > functions. > > As Fujii-san pointed upthread, strtol behaves a bit different way than > we expect here, but that can be handled by checking the starting > characters. > if (*p == '-' ? isdigit(p[1]) : isdigit(p[0])) > { > char *endptr; > padding = strtol(p, &endptr, 10); > if (p == endptr) > break; > p = endptr; > } > else > padding = 0; I removed the support function based on your suggestion, but added some if-statement in order to treats the special case: '%-p'. And I found and fixed another bug. If users set application_name both in server option and GUC, concatenated string was set as appname. This is caused because we reused same StringInfo and parsing all appname string even if we sucsess it. Now the array search will stop if sucseed, and in failure case do resetStringInfo() and re-search. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/10/04 21:53, kuroda.hayato@fujitsu.com wrote: >> if (*p == '-' ? isdigit(p[1]) : isdigit(p[0])) >> { >> char *endptr; >> padding = strtol(p, &endptr, 10); >> if (p == endptr) >> break; >> p = endptr; >> } >> else >> padding = 0; > > I removed the support function based on your suggestion, > but added some if-statement in order to treats the special case: '%-p'. + else if (*p == '-' ? isdigit(p[1]) : isdigit(p[0])) + { + char *endptr; + padding = strtol(p, &endptr, 10); Maybe isdigit() and strtol() work differently depending on locale setting? If so, I'm afraid that the behavior of this new code would be different from process_log_prefix_padding(). Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Dear Fujii-san, Thank you for reviewing! > + else if (*p == '-' ? isdigit(p[1]) : isdigit(p[0])) > + { > + char *endptr; > + padding = strtol(p, &endptr, 10); > > Maybe isdigit() and strtol() work differently depending on locale setting? > If so, I'm afraid that the behavior of this new code would be different from > process_log_prefix_padding(). (Maybe strtol() should be strtoint(). This is my fault... but anyway) You afraid that these functions may return non-zero value if locale is not "C" and inputs are non-ascii characters? I read the POSIX specification(isdigit: [1], strtol: [2]) and I found that it suggests such functions are affected by locale settings. For isdigit(): > The isdigit() and isdigit_l() functions shall test whether c is a character of class digit in the current locale, > or in the locale represented by locale, respectively; see XBD Locale. For strtol() > In other than the C or POSIX locale, additional locale-specific subject sequence forms may be accepted. I seeked other sources, but I thought that they did not consider about locale settings. For example, functions in numutils.c are use such functions without any locale consideration. And I considered executing setlocale(LC_CTYPE, "C"), but I found the following comment pg_locale.c and I did not want to violate the policy. > * Here is how the locale stuff is handled: LC_COLLATE and LC_CTYPE > * are fixed at CREATE DATABASE time, stored in pg_database, and cannot > * be changed. Thus, the effects of strcoll(), strxfrm(), isupper(), > * toupper(), etc. are always in the same fixed locale. But isdigit() and strtol() cannot accept multi byte character, so I also thought we don't have to consider that some unexpected character is set in LC_CTYPE digit category. So now we can choose from followings: * ignore such differences and use isdigit() and strtol() * give up using them and implement two static support functions How do you think? Someone knows any other knowledge about locale? [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/isdigit.html [2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/strtol.html Best Regards, Hayato Kuroda FUJITSU LIMITED
On 2021/10/07 11:46, kuroda.hayato@fujitsu.com wrote: > So now we can choose from followings: > > * ignore such differences and use isdigit() and strtol() > * give up using them and implement two static support functions > > How do you think? Someone knows any other knowledge about locale? Replacing process_log_prefix_padding() with isdigit()+strtol() is just refactoring and doesn't provide any new feature. So they basically should work in the same way. If the behavior of isdigit()+strtol() can be different from process_log_prefix_padding(), I'd prefer to the latter option you suggested, i.e., give up using isdigit()+strtol(). OTOH, of course if the behaviors of them are the same, I'm ok to use isdigit()+strtol(), though. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Tue, 12 Oct 2021 13:25:01 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2021/10/07 11:46, kuroda.hayato@fujitsu.com wrote: > > So now we can choose from followings: > >* ignore such differences and use isdigit() and strtol() > > * give up using them and implement two static support functions > >How do you think? Someone knows any other knowledge about locale? > > Replacing process_log_prefix_padding() with isdigit()+strtol() is > just refactoring and doesn't provide any new feature. So they > basically should work in the same way. If the behavior of > isdigit()+strtol() > can be different from process_log_prefix_padding(), I'd prefer to > the latter option you suggested, i.e., give up using > isdigit()+strtol(). > > OTOH, of course if the behaviors of them are the same, > I'm ok to use isdigit()+strtol(), though. Hmm. It look like behaving a bit xdifferently. At least for example, for "%-X", current code treats it as 0 padding but the patch treats it as -1. By the way, the current code is already a kind of buggy. I think I showed an example like: "%4%5%6%7p" is converted to "57p". Do we need to imitate that bug with this patch? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 12 Oct 2021 15:06:11 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > "%4%5%6%7p" is converted to "57p". Do we need to imitate that bug > with this patch? So.. I try to describe the behavior for exhaustive patterns.. current: A. "^-?[0-9]+.*" : returns valid padding. p goes after the last digit. B. "^[^0-9-].*" : padding = 0, p doesn't advance. C. "^-[^0-9].*" : padding = 0, p advances by 1 byte. D. "^-" : padding = 0, p advances by 1 byte. (if *p == 0 then breaks) I think the patterns covers the all possibilities. If we code as the following: if (*p == '-' ? isdigit(p[1]) : isdigit(p[0])) { char *endptr; padding = strtol(p, &endptr, 10); Assert(endptr > p); if (*endptr == '\0') break; p = endptr; } else padding = 0; A. "^-?[0-9]+.*" : same to the current B. "^[^0-9-].*" : same to the current C. "^-[^0-9].*" : padding = 0, p doesn't advance. D. "^-" : padding = 0, p doesn't advance. If we wan to make the behaviors C and D same with the current, the else clause should be like the follows, but I don't think we need to do that. else { padding = 0; if (*p == '-') p++; } regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 12 Oct 2021 15:42:23 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > If we code as the following: ... > A. "^-?[0-9]+.*" : same to the current > B. "^[^0-9-].*" : same to the current > C. "^-[^0-9].*" : padding = 0, p doesn't advance. > D. "^-" : padding = 0, p doesn't advance. One possible cause of a difference in behavior is character class handling including multibyte characters of isdigit and strtol. If isdigit accepts '一' as a digit (some platforms might do this) , and strtol doesn't (I believe it is universal behavior), '%一0p' is converted to '%' and the pointer moves onto '一'. But I don't think we need to do something for such a crazy specification. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Dear Horiguchi-san, Fujii-san, Perfect work... Thank you for replying and analyzing! > A. "^-?[0-9]+.*" : returns valid padding. p goes after the last digit. > B. "^[^0-9-].*" : padding = 0, p doesn't advance. > C. "^-[^0-9].*" : padding = 0, p advances by 1 byte. > D. "^-" : padding = 0, p advances by 1 byte. > (if *p == 0 then breaks) I confirmed them and your patterns are correct. > If we wan to make the behaviors C and D same with the current, the > else clause should be like the follows, but I don't think we need to > do that. > else > { > padding = 0; > if (*p == '-') > p++; > } This treatments is not complex so I want to add them if possible. > One possible cause of a difference in behavior is character class > handling including multibyte characters of isdigit and strtol. If > isdigit accepts '一' as a digit (some platforms might do this) , and > strtol doesn't (I believe it is universal behavior), '%一0p' is > converted to '%' and the pointer moves onto '一'. But I don't think we > need to do something for such a crazy specification. Does isdigit() understand multi-byte character correctly? The arguments of isdigit() is just a unsigned char, and this is 1byte. Hence I thought that they cannot distinguish 'ー'. Actually I considered about another thing. Maybe isdigit() just checks whether the value of the argument is in (int)48 and (int)57, and that means that the first part of some multi-byte characters may be accepted as digit in some locales. But, of cause I agreed this is the crazy case. Best Regards, Hayato Kuroda FUJITSU LIMITED
> Does isdigit() understand multi-byte character correctly? The arguments > of isdigit() is just a unsigned char, and this is 1byte. > Hence I thought that they cannot distinguish 'ー'. Sorry, but I referred some wrong doc. Please ignore here... Best Regards, Hayato Kuroda FUJITSU LIMITED
At Wed, 13 Oct 2021 11:05:19 +0000, "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote in > Dear Horiguchi-san, Fujii-san, > > Perfect work... Thank you for replying and analyzing! > > > A. "^-?[0-9]+.*" : returns valid padding. p goes after the last digit. > > B. "^[^0-9-].*" : padding = 0, p doesn't advance. > > C. "^-[^0-9].*" : padding = 0, p advances by 1 byte. > > D. "^-" : padding = 0, p advances by 1 byte. > > (if *p == 0 then breaks) > > I confirmed them and your patterns are correct. Thanks for the confirmation. > > If we wan to make the behaviors C and D same with the current, the > > else clause should be like the follows, but I don't think we need to > > do that. > > else > > { > > padding = 0; > > if (*p == '-') > > p++; > > } > > This treatments is not complex so I want to add them if possible. Also I don't oppose to do that. > > One possible cause of a difference in behavior is character class > > handling including multibyte characters of isdigit and strtol. If > > isdigit accepts '一' as a digit (some platforms might do this) , and > > strtol doesn't (I believe it is universal behavior), '%一0p' is > > converted to '%' and the pointer moves onto '一'. But I don't think we > > need to do something for such a crazy specification. > > Does isdigit() understand multi-byte character correctly? The arguments > of isdigit() is just a unsigned char, and this is 1byte. > Hence I thought that they cannot distinguish 'ー'. .. > Sorry, but I referred some wrong doc. Please ignore here... I'm not sure. All of it is a if-story. z/OS's isdigit is defined as "Test for a decimal digit, as defined in the digit locale source file and in the digit class of the LC_CTYPE category of the current locale.", which I read it as "it accepts multibyte strings" but of course I'm not sure that's correct. On CentOS8, and for Japanese letters, both iswdigit(L'ー') and iswdigit(L'0') return false so, assuming the same character classfication scheme that is implementation dependent, I guess z/OS's isdigit would behave the same way with CentOS's isdigit. Hoever, regardless of the precise behavior, > But, of cause I agreed this is the crazy case. Yes, I meant that that doesn't matter. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Dear Horiguchi-san, > I'm not sure. All of it is a if-story. z/OS's isdigit is defined as > "Test for a decimal digit, as defined in the digit locale source file > and in the digit class of the LC_CTYPE category of the current > locale.", which I read it as "it accepts multibyte strings" but of > course I'm not sure that's correct. > > On CentOS8, and for Japanese letters, both iswdigit(L'ー') and > iswdigit(L'0') return false so, assuming the same character > classfication scheme that is implementation dependent, I guess z/OS's > isdigit would behave the same way with CentOS's isdigit. > > Hoever, regardless of the precise behavior, > > > But, of cause I agreed this is the crazy case. > > Yes, I meant that that doesn't matter. Hmm.. of cause I want to do because I believe this is corner case, but we should avoid regression... If no one can say it completely some existing methods probably should be used. Therefore, I would like to suggest using the export and support functions again. The name is converted to parse_XXX and it locates in elog.c. How is it? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/10/15 17:45, kuroda.hayato@fujitsu.com wrote: > Dear Horiguchi-san, > >> I'm not sure. All of it is a if-story. z/OS's isdigit is defined as >> "Test for a decimal digit, as defined in the digit locale source file >> and in the digit class of the LC_CTYPE category of the current >> locale.", which I read it as "it accepts multibyte strings" but of >> course I'm not sure that's correct. >> >> On CentOS8, and for Japanese letters, both iswdigit(L'ー') and >> iswdigit(L'0') return false so, assuming the same character >> classfication scheme that is implementation dependent, I guess z/OS's >> isdigit would behave the same way with CentOS's isdigit. >> >> Hoever, regardless of the precise behavior, >> >>> But, of cause I agreed this is the crazy case. >> >> Yes, I meant that that doesn't matter. > > Hmm.. of cause I want to do because I believe this is corner case, > but we should avoid regression... > If no one can say it completely some existing methods probably should be used. > Therefore, I would like to suggest using the export and support functions again. On second thought, do we really need padding support for application_name in postgres_fdw? I just thought that when I read the description "Padding can be useful to aid human readability in log files." in the docs about log_line_prefix. + case 'a': + if (MyProcPort) + { I commented that the check of MyProcPort is necessary because background worker not having MyProcPort may access to the remote server. The example of such process is the resolver process that Sawada-san was proposing for atomic commit feature. But the proposal was withdrawn and for now there seems no such process. If this is true, we can safely remove the check of MyProcPort? If so, something like Assert(MyProcPort != NULL) may need to be added, instead. + if (username == NULL || *username == '\0') + username = "[unknown]"; If user name or database name is not set, the name is replaced with "[unknown]". log_line_prefix needs this because log message may be output when they have not been set yet, e.g., at early stage of backend startup. But I wonder if application_name in postgres_fdw actually need that.. Thought? + if (appname == NULL || *appname == '\0') + appname = "[unknown]"; Do we really want to replace the application name with "[unknown]" when application_name in the local server is not set? At least for me, it's intuitive to replace it with empty string in that case, in postgres_fdw application_name. The patch now fails to be applied to the master. Could you rebase it? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Dear Fujii-san, Thank you for giving comments! I attached new patches. > On second thought, do we really need padding support for application_name > in postgres_fdw? I just thought that when I read the description > "Padding can be useful to aid human readability in log files." in the docs > about log_line_prefix. My feelings was that we don't have any reasons not to support, but I cannot found some good use-cases. Now I decided to keep supporting that, but if we face another problem I will cut that. > + case 'a': > + if (MyProcPort) > + { > > I commented that the check of MyProcPort is necessary because background > worker not having MyProcPort may access to the remote server. The example > of such process is the resolver process that Sawada-san was proposing for > atomic commit feature. But the proposal was withdrawn and for now > there seems no such process. If this is true, we can safely remove the check > of MyProcPort? If so, something like Assert(MyProcPort != NULL) may need > to be added, instead. My understating was that we don't have to assume committing the Sawada-san's patch. I think that FDW is only available from backend processes in the current implementation, and MyProcPort will be substituted when processes are initialized() - in BackendInitialize(). Since the backend will execute BackendInitialize() after forking() from the postmaster, we can assume that everyone who operates FDW has a valid value for MyProcPort. I removed if statement and added assertion. We can force parse_pgfdw_appname() not to be called if MyProcPort does not exist, but I don't think it is needed now. > If user name or database name is not set, the name is replaced with > "[unknown]". log_line_prefix needs this because log message may be > output when they have not been set yet, e.g., at early stage of backend > startup. But I wonder if application_name in postgres_fdw actually > need that.. Thought? Hmm, I think all of backend processes have username and database, but here has been followed from Horiguchi-san's suggestion: ``` I'm not sure but even if user_name doesn't seem to be NULL, don't we want to do the same thing with %u of log_line_prefix for safety? Namely, setting [unknown] if user_name is NULL or "". The same can be said for %d. ``` But actually I don't have strong opinions. > + if (appname == NULL || *appname > == '\0') > + appname = "[unknown]"; > > Do we really want to replace the application name with "[unknown]" > when application_name in the local server is not set? At least for me, > it's intuitive to replace it with empty string in that case, > in postgres_fdw application_name. Yeah, I agreed that empty string should be keep here. Currently I kept NULL case because of the above reason. > The patch now fails to be applied to the master. Could you rebase it? Thanks, rebased. I just moved test to the end of the sql file. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
On 2021/11/04 20:42, kuroda.hayato@fujitsu.com wrote: > Dear Fujii-san, > > Thank you for giving comments! I attached new patches. Thanks for updating the patch! + <para> + Note that if embedded strings have Non-ASCII, + these characters will be replaced with the question marks (<literal>?</literal>). + This limitation is caused by <literal>application_name</literal>. + </para> Isn't this descripton confusing because postgres_fdw actually doesn't do this? postgres_fdw just passses the application_name as it is to the remote server. >> On second thought, do we really need padding support for application_name >> in postgres_fdw? I just thought that when I read the description >> "Padding can be useful to aid human readability in log files." in the docs >> about log_line_prefix. > > My feelings was that we don't have any reasons not to support, > but I cannot found some good use-cases. > Now I decided to keep supporting that, > but if we face another problem I will cut that. I'd like to hear more opinions about this from others. But *if* there is actually no use case, I'd like not to add the feature, to make the code simpler. >> + case 'a': >> + if (MyProcPort) >> + { >> >> I commented that the check of MyProcPort is necessary because background >> worker not having MyProcPort may access to the remote server. The example >> of such process is the resolver process that Sawada-san was proposing for >> atomic commit feature. But the proposal was withdrawn and for now >> there seems no such process. If this is true, we can safely remove the check >> of MyProcPort? If so, something like Assert(MyProcPort != NULL) may need >> to be added, instead. > > My understating was that we don't have to assume committing the Sawada-san's patch. > I think that FDW is only available from backend processes in the current implementation, > and MyProcPort will be substituted when processes are initialized() - in BackendInitialize(). > Since the backend will execute BackendInitialize() after forking() from the postmaster, > we can assume that everyone who operates FDW has a valid value for MyProcPort. > I removed if statement and added assertion. + case 'u': + Assert(MyProcPort != NULL); Isn't it enough to perform this assertion check only once at the top of parse_pgfdw_appname()? > We can force parse_pgfdw_appname() not to be called if MyProcPort does not exist, > but I don't think it is needed now. Yes. >> If user name or database name is not set, the name is replaced with >> "[unknown]". log_line_prefix needs this because log message may be >> output when they have not been set yet, e.g., at early stage of backend >> startup. But I wonder if application_name in postgres_fdw actually >> need that.. Thought? > > Hmm, I think all of backend processes have username and database, but > here has been followed from Horiguchi-san's suggestion: > > ``` > I'm not sure but even if user_name doesn't seem to be NULL, don't we > want to do the same thing with %u of log_line_prefix for safety? > Namely, setting [unknown] if user_name is NULL or "". The same can be > said for %d. > ``` > > But actually I don't have strong opinions. Ok, we can wait for more opinions about this to come. But if user name and database name should NOT be NULL in postgres_fdw, I think that it's better to add the assertion check for those conditions and get rid of "[unknown]" part. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Fri, 5 Nov 2021 03:14:00 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2021/11/04 20:42, kuroda.hayato@fujitsu.com wrote: > > Dear Fujii-san, > >Thank you for giving comments! I attached new patches. > > Thanks for updating the patch! > > + <para> > + Note that if embedded strings have Non-ASCII, > + these characters will be replaced with the question marks > (<literal>?</literal>). > + This limitation is caused by <literal>application_name</literal>. > + </para> > > Isn't this descripton confusing because postgres_fdw actually doesn't > do this? > postgres_fdw just passses the application_name as it is to the remote > server. I'm not sure that that distinction is so clear for users. So I feel we want a notice something like this. But it doesn't seem correct as it is. When the user name of the session consists of non-ascii characters, %u is finally seen as a sequence of '?'. It is not a embedded strings in pgfdw_application_name. I didn't notice this behavior. "pgfdw_application_name is set to application_name of a pgfdw connection after placeholder conversion, thus the resulting string is subject to the same restrictions to application_names.". Maybe followed by "If session user name or database name contains non-ascii characters, they are replaced by question marks "?"". Sidetracking a bit, considering this restriction, we might need to reconsider about %u and %d. session-id might be useful as it is ascii-string that can identify a session exactly. > >> On second thought, do we really need padding support for > >> application_name > >> in postgres_fdw? I just thought that when I read the description > >> "Padding can be useful to aid human readability in log files." in the > >> docs > >> about log_line_prefix. > >My feelings was that we don't have any reasons not to support, > > but I cannot found some good use-cases. > > Now I decided to keep supporting that, > > but if we face another problem I will cut that. > > I'd like to hear more opinions about this from others. > But *if* there is actually no use case, I'd like not to add > the feature, to make the code simpler. I think padding is useful because it alingns the significant content of log lines by equating the length of the leading fixed components. application_name doesn't make any other visible components unaligned even when shown in the pg_stat_activity view. It is not useful in that sense. Padding make the string itself make look maybe nicer, but gives no other advantage. I doubt people complain to the lack of the padding feature. Addition to that, since pgfdw_application_name and log_line_prefix work different way in regard to multibyte characters, we don't need to struggle so hard to consilidate them in their behavior. # I noticed that the paddig feature doesn't consider multibyte # characters. (I'm not suggesting them ought to be handled # "prpoerly"). In short, I'm for to removing it by +0.7. > >> + case 'a': > >> + if (MyProcPort) > >> + { > >> > >> I commented that the check of MyProcPort is necessary because > >> background > >> worker not having MyProcPort may access to the remote server. The > >> example > >> of such process is the resolver process that Sawada-san was proposing > >> for > >> atomic commit feature. But the proposal was withdrawn and for now > >> there seems no such process. If this is true, we can safely remove the > >> check > >> of MyProcPort? If so, something like Assert(MyProcPort != NULL) may > >> need > >> to be added, instead. > >My understating was that we don't have to assume committing the > >Sawada-san's patch. > > I think that FDW is only available from backend processes in the > > current implementation, > > and MyProcPort will be substituted when processes are initialized() - > > in BackendInitialize(). > > Since the backend will execute BackendInitialize() after forking() > > from the postmaster, > > we can assume that everyone who operates FDW has a valid value for > > MyProcPort. > > I removed if statement and added assertion. I think it is right. > + case 'u': > + Assert(MyProcPort != NULL); > > Isn't it enough to perform this assertion check only once > at the top of parse_pgfdw_appname()? Yeah, in either way, we should treat them in the same way. > > We can force parse_pgfdw_appname() not to be called if MyProcPort does > > not exist, > > but I don't think it is needed now. > > Yes. (I assume you said "it is needed now".) I'm not sure how to force that but if it means a NULL MyProcPort cuases a crash, I think crashing server is needlessly too aggressive as the penatly. > >> If user name or database name is not set, the name is replaced with > >> "[unknown]". log_line_prefix needs this because log message may be > >> output when they have not been set yet, e.g., at early stage of > >> backend > >> startup. But I wonder if application_name in postgres_fdw actually > >> need that.. Thought? > >Hmm, I think all of backend processes have username and database, but > > here has been followed from Horiguchi-san's suggestion: > >``` > > I'm not sure but even if user_name doesn't seem to be NULL, don't we > > want to do the same thing with %u of log_line_prefix for safety? > > Namely, setting [unknown] if user_name is NULL or "". The same can be > > said for %d. > > ``` > >But actually I don't have strong opinions. > > Ok, we can wait for more opinions about this to come. > But if user name and database name should NOT be NULL > in postgres_fdw, I think that it's better to add the assertion > check for those conditions and get rid of "[unknown]" part. It seems to me that postgres-fdw asumes a valid user id, but doesn't make no use of databsae, server port, and process id. What I thought here is that making it an assertion is too much. So just ignoring the replacement is also fine to me. That being said, if we are eager not to have unused code paths, it is reasonable enough. I don't object strongly to replace it with an assertion. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2021/11/05 12:17, Kyotaro Horiguchi wrote: > I'm not sure that that distinction is so clear for users. So I feel we > want a notice something like this. But it doesn't seem correct as it > is. When the user name of the session consists of non-ascii > characters, %u is finally seen as a sequence of '?'. It is not a > embedded strings in pgfdw_application_name. I didn't notice this > behavior. > > "pgfdw_application_name is set to application_name of a pgfdw > connection after placeholder conversion, thus the resulting string is > subject to the same restrictions to application_names.". Maybe > followed by "If session user name or database name contains non-ascii > characters, they are replaced by question marks "?"". If we add something to the docs about this, we should clearly describe what postgres_fdw.application_name does and what postgres_fdw in a foreign server does. BTW, this kind of information was already commented in option.c. Probably it's better to mention the limitations on not only ASCII characters but also the length of application name. * Unlike application_name GUC, don't set GUC_IS_NAME flag nor check_hook * to allow postgres_fdw.application_name to be any string more than * NAMEDATALEN characters and to include non-ASCII characters. Instead, * remote server truncates application_name of remote connection to less * than NAMEDATALEN and replaces any non-ASCII characters in it with a '?' * character. If possible, I'd like to see this change as a separate patch and commt it first because this is the description for the existing parameter postgres_fdw.application_name. >> I'd like to hear more opinions about this from others. >> But *if* there is actually no use case, I'd like not to add >> the feature, to make the code simpler. > > I think padding is useful because it alingns the significant content > of log lines by equating the length of the leading fixed > components. application_name doesn't make any other visible components > unaligned even when shown in the pg_stat_activity view. It is not > useful in that sense. > > Padding make the string itself make look maybe nicer, but gives no > other advantage. > > I doubt people complain to the lack of the padding feature. Addition > to that, since pgfdw_application_name and log_line_prefix work > different way in regard to multibyte characters, we don't need to > struggle so hard to consilidate them in their behavior. > > # I noticed that the paddig feature doesn't consider multibyte > # characters. (I'm not suggesting them ought to be handled > # "prpoerly"). > > In short, I'm for to removing it by +0.7. So our current consensus is to remove the padding part from postgres_fdw.application_name. >> + case 'u': >> + Assert(MyProcPort != NULL); >> >> Isn't it enough to perform this assertion check only once >> at the top of parse_pgfdw_appname()? > > Yeah, in either way, we should treat them in the same way. > >>> We can force parse_pgfdw_appname() not to be called if MyProcPort does >>> not exist, >>> but I don't think it is needed now. >> >> Yes. > > (I assume you said "it is needed now".) I'm not sure how to force > that but if it means a NULL MyProcPort cuases a crash, I think > crashing server is needlessly too aggressive as the penatly. I said "Yes" for Kuroda-san's comment "I don't think it is needed now". So I meant that "it is NOT needed now". Sorry for unclear comment.. His idea was to skip calling parse_pgfdw_appname() if MyProcPort is NULL, so as to prevent parse_pgfdw_appname() from seeing NULL pointer of MyProcPort. But he thought it's not necessary now, and I agree with him because the idea would cause more confusing behavior. > It seems to me that postgres-fdw asumes a valid user id, but doesn't > make no use of databsae, server port, and process id. What I thought > here is that making it an assertion is too much. So just ignoring the > replacement is also fine to me. > > That being said, if we are eager not to have unused code paths, it is > reasonable enough. I don't object strongly to replace it with an > assertion. So no one strongly objects to the addition of assertion? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Sun, 7 Nov 2021 13:35:39 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2021/11/05 12:17, Kyotaro Horiguchi wrote: > If possible, I'd like to see this change as a separate patch > and commt it first because this is the description for > the existing parameter postgres_fdw.application_name. Fair enough. > >> I'd like to hear more opinions about this from others. > >> But *if* there is actually no use case, I'd like not to add > >> the feature, to make the code simpler. > > I think padding is useful because it alingns the significant content > > of log lines by equating the length of the leading fixed > > In short, I'm for to removing it by +0.7. > > So our current consensus is to remove the padding part > from postgres_fdw.application_name. I think so. > >> + case 'u': > >> + Assert(MyProcPort != NULL); > >> > >> Isn't it enough to perform this assertion check only once > >> at the top of parse_pgfdw_appname()? > > Yeah, in either way, we should treat them in the same way. > > > >>> We can force parse_pgfdw_appname() not to be called if MyProcPort does > >>> not exist, > >>> but I don't think it is needed now. > >> > >> Yes. > > (I assume you said "it is needed now".) I'm not sure how to force > > that but if it means a NULL MyProcPort cuases a crash, I think > > crashing server is needlessly too aggressive as the penatly. > > I said "Yes" for Kuroda-san's comment "I don't think it is > needed now". So I meant that "it is NOT needed now". > Sorry for unclear comment.. > > His idea was to skip calling parse_pgfdw_appname() if > MyProcPort is NULL, so as to prevent parse_pgfdw_appname() > from seeing NULL pointer of MyProcPort. But he thought > it's not necessary now, and I agree with him because > the idea would cause more confusing behavior. > > > > It seems to me that postgres-fdw asumes a valid user id, but doesn't > > make no use of databsae, server port, and process id. What I thought > > here is that making it an assertion is too much. So just ignoring the > > replacement is also fine to me. > > That being said, if we are eager not to have unused code paths, it is > > reasonable enough. I don't object strongly to replace it with an > > assertion. > > So no one strongly objects to the addition of assertion? It seems to me so. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Dear Horiguchi-san, Fujii-san Thank you for discussing! > Isn't it enough to perform this assertion check only once > at the top of parse_pgfdw_appname()? Fixed. > If possible, I'd like to see this change as a separate patch > and commt it first because this is the description for > the existing parameter postgres_fdw.application_name. I attached a small doc patch for this. > "pgfdw_application_name is set to application_name of a pgfdw > connection after placeholder conversion, thus the resulting string is > subject to the same restrictions to application_names.". Maybe > followed by "If session user name or database name contains non-ascii > characters, they are replaced by question marks "?"". This part was also added. Thanks! > So our current consensus is to remove the padding part > from postgres_fdw.application_name. Yeah I removed. Attached patches are the latest version. Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Dear Fujii-san, Thank you for reviewing! I'll post the latest version. > Currently StringInfo variable is defined in connect_pg_server() and passed to > parse_pgfdw_appname(). But there is no strong reason why the variable needs to > be defined connect_pg_server(). Right? If so how about refactoring > parse_pgfdw_appname() so that it defines StringInfoData variable and returns the > processed character string? You can see such coding at, for example, > escape_param_str() in dblink.c. > If this refactoring is done, probably we can get rid of "#include "lib/stringinfo.h"" > line from connection.c because connect_pg_server() no longer needs to use > StringInfo. I refactored that. Thanks! > + int i; > <snip> > + for (i = n - 1; i >= 0; i--) > > I'm tempted to simplify these into "for (int i = n - 1; i >= 0; i--)". > > + /* > + * Search application_name and replace it if found. > + * > + * We search paramters from the end because the later > + * one have higher priority. See also the above comment. > + */ > > How about updating these comments into the following? > > ----------------------- > Search the parameter arrays to find application_name setting, > and replace escape sequences in it with status information if found. > The arrays are searched backwards because the last value > is used if application_name is repeatedly set. Prefer yours. Fixed. > + if (strcmp(buf->data, "") != 0) > > Is this condition the same as "data[0] == '\0'"? Maybe "data[0] != '\0'", but fixed. > + * parse postgres_fdw.application_name and set escaped string. > + * This function is almost same as log_line_prefix(), but > + * accepted escape sequence is different and this cannot understand > + * padding integer. > + * > + * Note that argument buf must be initialized. > > How about updating these comments into the following? > I thought that using the term "parse" was a bit confusing because what the > function actually does is to replace escape seequences in application_name. > Probably it's also better to rename the function, e.g., to process_pgfdw_appname . > > ----------------------------- > Replace escape sequences beginning with % character in the given > application_name with status information, and return it. > ----------------------------- Thank you for suggestions. I confused the word "parse." Fixed and renamed. > + Same as <xref linkend="guc-log-line-prefix"/>, this is a > + <function>printf</function>-style string. Accepted escapes are > + bit different from <xref linkend="guc-log-line-prefix"/>, > + but padding can be used like as it. > > This description needs to be updated. I missed there. Fixed. Best Regards, Hayato Kuroda FUJITSU LIMITED