Thread: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
David HJ
Date:
Hello PostgreSQL Community,
I am writing to propose an alternative compilation of PostgreSQL that allows for a 256-byte identifier length limit, alongside the existing 64-byte version.
Problem:
The current limit of 63 bytes can be quite restrictive, especially for databases that use multi-byte character sets like UTF-8. In such cases, a Chinese character takes up 3 bytes, limiting the name to just 21 characters.
Use-Case:
In specific use-cases, table names like "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and "气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed the current limit.
Proposed Solution:
I propose that we offer an alternative compilation of PostgreSQL that increases the NAMEDATALEN constant to allow for 256-byte identifiers. This would be particularly useful for databases that make extensive use of multi-byte character sets like UTF-8.
Dual Compilation:
To maintain backward compatibility and to offer more flexibility, I suggest compiling this 256-byte version alongside the existing 64-byte version. Users could then choose which version best suits their needs.
Backward Compatibility:
I understand that this change could affect backward compatibility. Offering this as an alternative compilation could be a way to mitigate those concerns. I am open to suggestions on how this could be implemented to minimize disruptions.
I look forward to your feedback and hope we can make PostgreSQL even more adaptable to various needs.
Best regards,
Chuxiong
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
Laurenz Albe
Date:
On Tue, 2023-10-10 at 11:49 +0800, David HJ wrote: > I am writing to propose an alternative compilation of PostgreSQL that allows > for a 256-byte identifier length limit, alongside the existing 64-byte version. > > Problem: > The current limit of 63 bytes can be quite restrictive, especially for databases > that use multi-byte character sets like UTF-8. In such cases, a Chinese character > takes up 3 bytes, limiting the name to just 21 characters. > > Use-Case: > In specific use-cases, table names like > "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and > "气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed > the current limit. I have little understanding for people who name their tables energy_consumption_statistics_oil_and_natural_gas_usage_in_major_countries_around_the_world_annual_carbon_emissions_and_renewable_energy_ratio_table. The table could just as well be named "能源统计排放比". The exact description could go in the table comment. Apart from that, it is a good idea to use table names that are standard SQL identifiers, so that you don't have to double quote them all the time. Anyway, you are not the first person to hit the limit, so there is clearly a pain that some people feel. > Proposed Solution: > I propose that we offer an alternative compilation of PostgreSQL that increases > the NAMEDATALEN constant to allow for 256-byte identifiers. This would be > particularly useful for databases that make extensive use of multi-byte character > sets like UTF-8. Wouldn't it be a good solution if we promote the #define to a configure option, like "./configure --identifier-length-limit=256"? Yours, Laurenz Albe
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
Julien Rouhaud
Date:
On Tue, Oct 10, 2023 at 08:22:55AM +0200, Laurenz Albe wrote: > On Tue, 2023-10-10 at 11:49 +0800, David HJ wrote: > > I am writing to propose an alternative compilation of PostgreSQL that allows > > for a 256-byte identifier length limit, alongside the existing 64-byte version. > > > > Problem: > > The current limit of 63 bytes can be quite restrictive, especially for databases > > that use multi-byte character sets like UTF-8. In such cases, a Chinese character > > takes up 3 bytes, limiting the name to just 21 characters. > > Anyway, you are not the first person to hit the limit, so there is clearly a > pain that some people feel. > > > Proposed Solution: > > I propose that we offer an alternative compilation of PostgreSQL that increases > > the NAMEDATALEN constant to allow for 256-byte identifiers. This would be > > particularly useful for databases that make extensive use of multi-byte character > > sets like UTF-8. > > Wouldn't it be a good solution if we promote the #define to a configure option, > like "./configure --identifier-length-limit=256"? Note that there was some thread recently [1] where the possibility of having some kind of compilation matrix to generate multiple set of binaries with various compile-time values was discussed, so I guess it could fit well with that approach. [1] https://www.postgresql.org/message-id/20230630211153.kbysulcjedxa5ii6@awork3.anarazel.de and following messages
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
Tatsuo Ishii
Date:
> Hello PostgreSQL Community, > > I am writing to propose an alternative compilation of PostgreSQL that > allows for a 256-byte identifier length limit, alongside the existing > 64-byte version. > > *Problem:* > The current limit of 63 bytes can be quite restrictive, especially for > databases that use multi-byte character sets like UTF-8. In such cases, a > Chinese character takes up 3 bytes, limiting the name to just 21 characters. > > *Use-Case:* > In specific use-cases, table names like > "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and > "气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed the current limit. > > *Proposed Solution:* > I propose that we offer an alternative compilation of PostgreSQL that > increases the NAMEDATALEN constant to allow for 256-byte identifiers. This > would be particularly useful for databases that make extensive use of > multi-byte character sets like UTF-8. Another solution would be, letting the meaning of NAMEDATALEN to be number of *characters*, not the number of bytes. This way, you can use up to 64 UTF-8 characters. In my understanding MySQL already does this way. I know this requires non trivial code modifications to PostgreSQL but would be better than to make binaries with random NAMEDATALEN values. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
Laurenz Albe
Date:
On Tue, 2023-10-10 at 15:53 +0900, Tatsuo Ishii wrote: > Another solution would be, letting the meaning of NAMEDATALEN to be > number of *characters*, not the number of bytes. This way, you can use > up to 64 UTF-8 characters. In my understanding MySQL already does this > way. I know this requires non trivial code modifications to PostgreSQL > but would be better than to make binaries with random NAMEDATALEN > values. Since "name" is a fixed-length data type, that would require the stored size to increase to accomodate the extra bytes. Wouldn't that change the storage format and break pg_upgrade? Yours, Laurenz Albe
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
Tom Lane
Date:
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Tue, 2023-10-10 at 15:53 +0900, Tatsuo Ishii wrote: >> Another solution would be, letting the meaning of NAMEDATALEN to be >> number of *characters*, not the number of bytes. This way, you can use >> up to 64 UTF-8 characters. In my understanding MySQL already does this >> way. I know this requires non trivial code modifications to PostgreSQL >> but would be better than to make binaries with random NAMEDATALEN >> values. > Since "name" is a fixed-length data type, that would require the stored > size to increase to accomodate the extra bytes. Wouldn't that change the > storage format and break pg_upgrade? Yeah, the real reason this is unlikely to happen is precisely that "name" is fixed-length. Increasing the standard NAMEDATALEN by 4x, or even 2x, has been proposed and rejected many times before because of the bloat it would cause in places like pg_attribute, pg_proc, in-memory tuple descriptors, etc. The real way forward IMO is to find a way to make "name" variable-length, thus both satisfying people who need a few long names and reducing overhead for everybody. This is difficult to do without breaking mountains of backend code, but there's been some discussions about ways to accomplish that. The most recent thread I could find is https://www.postgresql.org/message-id/flat/CALSd-crdmj9PGdvdioU%3Da5W7P%3DTgNmEB2QP9wiF6DTUbBuMXrQ%40mail.gmail.com regards, tom lane
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
Peter Eisentraut
Date:
On 10.10.23 08:22, Laurenz Albe wrote: > Apart from that, it is a good idea to use table names that are standard > SQL identifiers, so that you don't have to double quote them all the time. FWIW, the Chinese character sequences posted here would be valid unquoted identifiers if PostgreSQL implemented standard SQL correctly/completely.
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes: > On 10.10.23 08:22, Laurenz Albe wrote: >> Apart from that, it is a good idea to use table names that are standard >> SQL identifiers, so that you don't have to double quote them all the time. > FWIW, the Chinese character sequences posted here would be valid > unquoted identifiers if PostgreSQL implemented standard SQL > correctly/completely. I'm pretty sure they're valid unquoted identifiers today, because by and large we'll take any non-ASCII as identifier characters. Conforming to the letter of the spec would reduce, not increase, the set of strings we'll call identifiers. regards, tom lane