Thread: [PATCH] COPY command's data format option allows only lowercase csv,text or binary
[PATCH] COPY command's data format option allows only lowercase csv,text or binary
From
Bharath Rupireddy
Date:
Hi, COPY command's FORMAT option allows only all lowercase csv, text or binary, this is true because strcmp is being used while parsing these values. It would be nice if the uppercase or combination of lower and upper case format options such as CSV, TEXT, BINARY, Csv, Text, Binary so on. is also allowed. To achieve this pg_strcasecmp() is used instead of strcmp. Attached is a patch having above changes. Request the community to review the patch, if it makes sense. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary
From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > COPY command's FORMAT option allows only all lowercase csv, text or > binary, this is true because strcmp is being used while parsing these > values. This is nonsense, actually: regression=# create table foo (f1 int); CREATE TABLE regression=# copy foo from stdin (format CSV); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. As that shows, there's already a round of lowercasing done by the parser. The only way that strcasecmp in copy.c would be useful is if you wanted to accept things like copy foo from stdin (format "CSV"); I don't find that to be a terribly good idea. The normal implication of quoting is that it *prevents* case folding, so why should that happen anyway? More generally, though, why would we want to change this policy only here? I believe we're reasonably consistent about letting the parser do any required down-casing and then just checking keyword matches with strcmp. regards, tom lane
Re: [PATCH] COPY command's data format option allows only lowercasecsv, text or binary
From
Robert Haas
Date:
On Wed, Jun 24, 2020 at 10:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > More generally, though, why would we want to change this policy only > here? I believe we're reasonably consistent about letting the parser > do any required down-casing and then just checking keyword matches > with strcmp. I've had the feeling in the past that our use of pg_strcasecmp() was a bit random. Looking through the output of 'git grep pg_strcasecmp', it seems like we don't typically don't use it on option names, but sometimes we use it on option values. For instance, DefineCollation() uses pg_strcasecmp() on the collproviderstr, and DefineType() uses it on storageEl; and also, not to be overlooked, defGetBoolean() uses it when matching true/false/on/off, which probably affects a lot of places. On the other hand, ExplainQuery() matches the format using plain old strcmp(), despite indirectly using pg_strcasecmp() for the Boolean parameters. So I guess I'm not really convinced that there is all that much consistency here. Mind you, I'm not sure whether or not anything really needs to be changed, or exactly what ought to be done. I'm just making the observation that it might not be as consistent as you may think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jun 24, 2020 at 10:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> More generally, though, why would we want to change this policy only >> here? I believe we're reasonably consistent about letting the parser >> do any required down-casing and then just checking keyword matches >> with strcmp. > ... Mind you, I'm not sure whether or not anything really needs to be > changed, or exactly what ought to be done. I'm just making the > observation that it might not be as consistent as you may think. Yeah, I'm sure there are a few inconsistencies. We previously made a pass to get rid of pg_strcasecmp for anything that had been through the parser's downcasing (commit fb8697b31) but I wouldn't be surprised if that missed a few cases, or if new ones have snuck in. Anyway, "don't use pg_strcasecmp unnecessarily" was definitely the agreed-to policy as of Jan 2018. My vague recollection is that there are a few exceptions (defGetBoolean may well be one of them) where pg_strcasecmp still seemed necessary because the input might not have come through the parser in some usages. regards, tom lane
Re: [PATCH] COPY command's data format option allows only lowercasecsv, text or binary
From
Michael Paquier
Date:
On Wed, Jun 24, 2020 at 12:55:22PM -0400, Tom Lane wrote: > Yeah, I'm sure there are a few inconsistencies. We previously made a > pass to get rid of pg_strcasecmp for anything that had been through > the parser's downcasing (commit fb8697b31) but I wouldn't be surprised > if that missed a few cases, or if new ones have snuck in. Anyway, > "don't use pg_strcasecmp unnecessarily" was definitely the agreed-to > policy as of Jan 2018. 0d8c9c1 has introduced some in parse_basebackup_options() for the new manifest option, and fe30e7e for AlterType(), no? > My vague recollection is that there are a few exceptions (defGetBoolean > may well be one of them) where pg_strcasecmp still seemed necessary > because the input might not have come through the parser in some usages. Yep, there were a couple of exceptions. What was done at this time was a case-by-case lookup to check what came only from the parser. -- Michael
Attachment
Re: [PATCH] COPY command's data format option allows only lowercasecsv, text or binary
From
Bharath Rupireddy
Date:
> As that shows, there's already a round of lowercasing done by the parser. > The only way that strcasecmp in copy.c would be useful is if you wanted to > accept things like > copy foo from stdin (format "CSV"); > I don't find that to be a terribly good idea. The normal implication > of quoting is that it *prevents* case folding, so why should that > happen anyway? > I was able to know that the parser does the lowercasing for other parts of the query, what I missed in my understanding is that about the proper usage of quoting. Thanks for letting me know this point. I agree with the above understanding to not change that behavior. Please ignore this patch. Thank you all for your responses. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] COPY command's data format option allows only lowercasecsv, text or binary
From
Michael Paquier
Date:
On Thu, Jun 25, 2020 at 11:07:33AM +0900, Michael Paquier wrote: > 0d8c9c1 has introduced some in parse_basebackup_options() for the > new manifest option, and fe30e7e for AlterType(), no? Please forget this one. I had a moment of brain fade. Those have been added for the option values, and on the option names we use directly strcmp(), so I am not actually seeing a code path on HEAD where we use pg_strcasecmp for something coming only from the parser. -- Michael