Home > mailing lists

Sorting regression of text function result since commit 586b98fdf1aae - Mailing list pgsql-hackers

From	Jehan-Guillaume de Rorthais
Subject	Sorting regression of text function result since commit 586b98fdf1aae
Date	December 11, 2023 20:09:51
Msg-id	20231211210951.5020a5d6@karst Whole thread
Responses	Re: Sorting regression of text function result since commit 586b98fdf1aae
List	pgsql-hackers

Tree view

Hi,

A customer found what looks like a sort regression while testing his code from
v11 on a higher version. We hunt this regression down to commit 586b98fdf1aae,
introduced in v12.

Consider the following test case:

  createdb -l fr_FR.utf8 -T template0 reg
  psql reg <<<"
  BEGIN;
  CREATE TABLE IF NOT EXISTS reg
  (
    id bigint NOT NULL,
    reg bytea NOT NULL
  );

  INSERT INTO reg VALUES
    (1, convert_to( 'aaa', 'UTF8')),
    (2, convert_to( 'aa}', 'UTF8'));

  SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8');"

In parent commit 68f6f2b7395fe, it results:

   id
  ────
    2
    1

And in 586b98fdf1aae:

   id
  ────
    1
    2

Looking at the plan, the sort node are different:

* 68f6f2b7395fe: Sort Key: (convert_from(reg, 'UTF8'::name))
* 586b98fdf1aae: Sort Key: (convert_from(reg, 'UTF8'::name)) COLLATE "C"

It looks like since 586b98fdf1aae, the result type collation of "convert_from"
is forced to "C", like the patch does for type "name", instead of the "default"
collation for type "text".

Looking at hints in the header comment of function "exprCollation", I poked
around and found that the result collation wrongly follow the input collation
in this case. With 586b98fdf1aae:

  -- 2nd parameter type resolved as "name" so collation forced to "C"
  SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8');
  --   1
  --   2

  -- Collation of 2nd parameter is forced to something else
  SELECT id FROM reg ORDER BY convert_from(reg, 'UTF8' COLLATE \"default\");
  --  2
  --  1
  -- Sort
  --   Sort Key: (convert_from(reg, 'UTF8'::name COLLATE "default"))
  --   ->  Seq Scan on reg

It seems because the second parameter type is "name", the result collation
become "C" instead of being the collation associated with "text" type:
"default".

I couldn't find anything explaining this behavior in the changelog. It looks
like a regression to me, but if this is actually expected, maybe this deserve
some documentation patch?

Regards,

pgsql-hackers by date:

From: James Coleman
Date: 11 December 2023, 19:59:46
Subject: Teach predtest about IS [NOT] proofs

From: Tom Lane
Date: 11 December 2023, 20:43:12
Subject: Re: Sorting regression of text function result since commit 586b98fdf1aae

Sorting regression of text function result since commit 586b98fdf1aae - Mailing list pgsql-hackers

Previous

Next