Thread: Sorting regression of text function result since commit 586b98fdf1aae
Sorting regression of text function result since commit 586b98fdf1aae
From
Jehan-Guillaume de Rorthais
Date:
Hi, A customer found what looks like a sort regression while testing his code from v11 on a higher version. We hunt this regression down to commit 586b98fdf1aae, introduced in v12. Consider the following test case: createdb -l fr_FR.utf8 -T template0 reg psql reg <<<" BEGIN; CREATE TABLE IF NOT EXISTS reg ( id bigint NOT NULL, reg bytea NOT NULL ); INSERT INTO reg VALUES (1, convert_to( 'aaa', 'UTF8')), (2, convert_to( 'aa}', 'UTF8')); SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8');" In parent commit 68f6f2b7395fe, it results: id ──── 2 1 And in 586b98fdf1aae: id ──── 1 2 Looking at the plan, the sort node are different: * 68f6f2b7395fe: Sort Key: (convert_from(reg, 'UTF8'::name)) * 586b98fdf1aae: Sort Key: (convert_from(reg, 'UTF8'::name)) COLLATE "C" It looks like since 586b98fdf1aae, the result type collation of "convert_from" is forced to "C", like the patch does for type "name", instead of the "default" collation for type "text". Looking at hints in the header comment of function "exprCollation", I poked around and found that the result collation wrongly follow the input collation in this case. With 586b98fdf1aae: -- 2nd parameter type resolved as "name" so collation forced to "C" SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8'); -- 1 -- 2 -- Collation of 2nd parameter is forced to something else SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8' COLLATE \"default\"); -- 2 -- 1 -- Sort -- Sort Key: (convert_from(reg, 'UTF8'::name COLLATE "default")) -- -> Seq Scan on reg It seems because the second parameter type is "name", the result collation become "C" instead of being the collation associated with "text" type: "default". I couldn't find anything explaining this behavior in the changelog. It looks like a regression to me, but if this is actually expected, maybe this deserve some documentation patch? Regards,
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> writes: > It looks like since 586b98fdf1aae, the result type collation of "convert_from" > is forced to "C", like the patch does for type "name", instead of the "default" > collation for type "text". Well, convert_from() inherits its result collation from the input, per the normal rules for collation assignment [1]. > Looking at hints in the header comment of function "exprCollation", I poked > around and found that the result collation wrongly follow the input collation > in this case. It's not "wrong", it's what the SQL standard requires. > I couldn't find anything explaining this behavior in the changelog. It looks > like a regression to me, but if this is actually expected, maybe this deserve > some documentation patch? The v12 release notes do say Type name now behaves much like a domain over type text that has default collation “C”. You'd have similar results from an expression involving such a domain, I believe. I'm less than excited about patching the v12 release notes four years later. Maybe, if this point had come up in a more timely fashion, we'd have mentioned it --- but it's hardly possible to cover every potential implication of such a change in the release notes. regards, tom lane [1] https://www.postgresql.org/docs/current/collation.html#COLLATION-CONCEPTS
Re: Sorting regression of text function result since commit 586b98fdf1aae
From
Jehan-Guillaume de Rorthais
Date:
On Mon, 11 Dec 2023 15:43:12 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jehan-Guillaume de Rorthais <jgdr@dalibo.com> writes: > > It looks like since 586b98fdf1aae, the result type collation of > > "convert_from" is forced to "C", like the patch does for type "name", > > instead of the "default" collation for type "text". > > Well, convert_from() inherits its result collation from the input, > per the normal rules for collation assignment [1]. > > > Looking at hints in the header comment of function "exprCollation", I poked > > around and found that the result collation wrongly follow the input > > collation in this case. > > It's not "wrong", it's what the SQL standard requires. Mh, OK. This is at least a surprising behavior. Having a non-data related argument impacting the result collation seems counter-intuitive. But I understand this is by standard, no need to discuss it. > > I couldn't find anything explaining this behavior in the changelog. It looks > > like a regression to me, but if this is actually expected, maybe this > > deserve some documentation patch? > > The v12 release notes do say > > Type name now behaves much like a domain over type text that has > default collation “C”. Sure, and I saw it, but reading at this entry, I couldn't guess this could have such implication on text result from a function call. That's why I hunt for the precise commit and was surprise to find this was the actual change. > You'd have similar results from an expression involving such a domain, > I believe. > > I'm less than excited about patching the v12 release notes four > years later. Maybe, if this point had come up in a more timely > fashion, we'd have mentioned it --- but it's hardly possible to > cover every potential implication of such a change in the > release notes. This could have been documented in the collation concept page, as a trap to be aware of. A link from the release note to such a small paragraph would have been enough to warn devs this might have implications when mixed with other collatable types. But I understand we can not document all the traps paving the way to the standard anyway. Thank you for your explanation! Regards,