Thread: Data loss when using "json_populate_recordset" with a long column name

Data loss when using "json_populate_recordset" with a long column name

From
Денис Романенко
Date:
If we create a column name longer than 64 bytes, it will be truncated in PostgreSQL to the maximum (NAMEDATALEN) length.

For example: "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName" will be truncated in the database to "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVer"

But in application code we can still work with the full column name - SQL statements like INSERT/UPDATE handle long names without problems and automatically match them to the truncated column (thank you for that).
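For instance, a quick sketch of that (plain INSERT with the full, over-long name; PostgreSQL truncates the identifier itself and emits a notice along these lines):

insert into wow ("VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName")
values ('haha');
-- NOTICE:  identifier "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName"
--          will be truncated to "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVer"
-- The value ends up in the truncated column as expected.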

But if we try to update it with "json_populate_recordset" using the full name, it does not just ignore the column with the long name - the data in that record is nulled.

How to reproduce:
1. create table wow("VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName" text);
2. select * from json_populate_recordset(null::wow,'[{"VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName": "haha"}]');
3. "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVer" becomes null.


P.S. Why do I need column names longer than 64 bytes: because I use non-Latin characters in column and table names, so in fact I have only about 32 characters because of multi-byte Unicode encoding. (PostgreSQL: NAMEDATALEN increase because of non-Latin languages)
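For illustration (a toy query, assuming a UTF-8 database; "ДлинноеИмяКолонки" is just a made-up Cyrillic name):

select char_length('ДлинноеИмяКолонки') as chars,  -- 17 characters
       octet_length('ДлинноеИмяКолонки') as bytes; -- 34 bytes
-- Each Cyrillic character takes 2 bytes in UTF-8, and NAMEDATALEN counts bytes,
-- so only about 31 such characters fit in the 63 usable bytes of an identifier.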



Re: Data loss when using "json_populate_recordset" with a long column name

From
Julien Rouhaud
Date:
On Tue, Sep 7, 2021 at 11:27 AM Денис Романенко <deromanenko@gmail.com> wrote:
>
> If we create a column name longer than 64 bytes, it will be truncated in PostgreSQL to the maximum (NAMEDATALEN) length.
>
> For example: "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName" will be truncated in the
> database to "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVer"
>
> But in application code we can still work with the full column name - SQL statements like INSERT/UPDATE handle long
> names without problems and automatically match them to the truncated column (thank you for that).
>
> But if we try to update it with "json_populate_recordset" using the full name, it does not just ignore the column
> with the long name - the data in that record is nulled.
>
> How to reproduce:
> 1. create table wow("VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName" text);
> 2. select * from json_populate_recordset(null::wow,
>    '[{"VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongName": "haha"}]');
> 3. "VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVer" becomes null.

Yes, that's because json identifiers have different rules from
relation identifiers.  Your only option here is to use the real /
truncated identifier.  Also I don't think it would be a good thing to
add a way to truncate identifiers in json objects using the
NAMEDATALEN limit, as this could easily turn a json object that
should be valid into an invalid one.
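
For example, reusing the table from the report, a minimal sketch of that workaround (spelling the JSON key with the already-truncated, 63-byte name):

select * from json_populate_recordset(null::wow,
  '[{"VeryLongNameVeryLongNameVeryLongNameVeryLongNameVeryLongNameVer": "haha"}]');
-- The key now matches the stored column name exactly, so 'haha' is populated
-- instead of NULL.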



Re: Data loss when using "json_populate_recordset" with a long column name

From
Michael Paquier
Date:
On Tue, Sep 07, 2021 at 01:11:49PM +0800, Julien Rouhaud wrote:
> Yes, that's because json identifiers have different rules from
> relation identifiers.  Your only option here is to use the real /
> truncated identifier.  Also I don't think it would be a good thing to
> add a way to truncate identifiers in json objects using the
> NAMEDATALEN limit, as this could easily turn a json object that
> should be valid into an invalid one.

Yeah.  We should try to work toward removing the limits on NAMEDATALEN
for the attribute names.  Easier said than done :)
--
Michael


Re: Data loss when using "json_populate_recordset" with a long column name

From
Julien Rouhaud
Date:
On Tue, Sep 7, 2021 at 1:31 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> Yeah.  We should try to work toward removing the limits on NAMEDATALEN
> for the attribute names.  Easier said than done :)

Yes, but even if we eventually fix that my impression is that we would
still enforce a limit of 128 characters (or bytes) as this is the SQL
specification.  So trying to rely on json identifiers having the same
rules as SQL identifiers sounds like the wrong approach.



Re: Data loss when using "json_populate_recordset" with a long column name

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Tue, Sep 7, 2021 at 1:31 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Yeah.  We should try to work toward removing the limits on NAMEDATALEN
>> for the attribute names.  Easier said than done :)

> Yes, but even if we eventually fix that my impression is that we would
> still enforce a limit of 128 characters (or bytes) as this is the SQL
> specification.

Probably not.  I think SQL says that's the minimum expectation; and
even if they say it should be that exactly, there is no reason we'd
suddenly start slavishly obeying that part of the spec after ignoring
it for years ;-).

There would still be a limit of course, but it would stem from the max
tuple width in the associated catalog, so on the order of 7kB or so.
(Hmm ... perhaps it'd be wise to set a limit of say a couple of kB,
just so that the implementation limit is crisp rather than being
a little bit different in each catalog and each release.)

            regards, tom lane



Re: Data loss when using "json_populate_recordset" with a long column name

From
Gavin Flower
Date:
On 8/09/21 2:08 am, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
>> On Tue, Sep 7, 2021 at 1:31 PM Michael Paquier <michael@paquier.xyz> wrote:
>>> Yeah.  We should try to work toward removing the limits on NAMEDATALEN
>>> for the attribute names.  Easier said than done :)
>> Yes, but even if we eventually fix that my impression is that we would
>> still enforce a limit of 128 characters (or bytes) as this is the SQL
>> specification.
> Probably not.  I think SQL says that's the minimum expectation; and
> even if they say it should be that exactly, there is no reason we'd
> suddenly start slavishly obeying that part of the spec after ignoring
> it for years ;-).
>
> There would still be a limit of course, but it would stem from the max
> tuple width in the associated catalog, so on the order of 7kB or so.
> (Hmm ... perhaps it'd be wise to set a limit of say a couple of kB,
> just so that the implementation limit is crisp rather than being
> a little bit different in each catalog and each release.)
>
>             regards, tom lane
>
>
How about 4kB (unless there are systems for which this is too large)?

That should be easy to remember.


Cheers,
Gavin




Re: Data loss when using "json_populate_recordset" with a long column name

From
Julien Rouhaud
Date:
On Tue, Sep 7, 2021 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
>
> > Yes, but even if we eventually fix that my impression is that we would
> > still enforce a limit of 128 characters (or bytes) as this is the SQL
> > specification.
>
> Probably not.  I think SQL says that's the minimum expectation;

Ah, I didn't know that.

> and
> even if they say it should be that exactly, there is no reason we'd
> suddenly start slavishly obeying that part of the spec after ignoring
> it for years ;-).

Well, yes, but we ignored it for years due to a technical limitation,
and the result is that it makes migration to postgres harder.

If we somehow find a way to remove this limitation, ignoring the spec
again (assuming that the spec gives a hard limit) will make migration
from postgres harder and will also probably bring other problems
(allowing identifiers kilobytes long will lead to bigger caches, for
instance, which can definitely bite hard).  Is it really worth it?