Thread: Re: general purpose array_sort

Re: general purpose array_sort

From
Junwang Zhao
Date:
On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:
>
> On Sat, Sep 28, 2024 at 7:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > PFA v2, use COLLATE keyword to supply the collation suggested by
> > Andreas offlist.
> >
> this is better. otherwise we need extra care to handle case like:
> SELECT array_sort('{1,3,5,2,4,6}'::int[] COLLATE "pg_c_utf8");
>
>
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>array_sort</primary>
> +        </indexterm>
> +        <function>array_sort</function> ( <type>anyarray</type>
> <optional>, <parameter>dir</parameter> </optional>)
> +        <returnvalue>anyarray</returnvalue>
> +       </para>
> +       <para>
> +        Sorts the array in either ascending or descending order.
> +        <parameter>dir</parameter> must be <literal>asc</literal>
> +        or <literal>desc</literal>. The array must be empty or one-dimensional.
> +       </para>
> +       <para>
> +        <literal>array_sort(ARRAY[1,2,5,6,3,4])</literal>
> +        <returnvalue>{1,2,3,4,5,6}</returnvalue>
> +       </para></entry>
> +      </row>
> I am confused with <parameter>dir</parameter>. I guess you want to say
> "direction"
> But here, I think <parameter>sort_asc</parameter> would be more appropriate?

This doc is mostly copied and edited from intarray.sgml sort part.

And the logic is basically the same, you can check the intarray module.

>
>
> <parameter>dir</parameter> can have only two potential values, make it
> as a boolean would be more easier?
> you didn't mention information:  "by default, it will sort by
> ascending order; the sort collation by default is using the array
> element type's collation"
>
> tuplesort_begin_datum can do null-first, null-last, so the
> one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

>
> Based on the above and others, I did some refactoring, feel free to take it.
> my changes, changed the function signature, so you need to pay
> attention to sql test file.

Thanks for your refactor, I will take some in the next version.


--
Regards
Junwang Zhao



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Sat, Sep 28, 2024 at 7:05 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:
>
> <parameter>dir</parameter> can have only two potential values, make it
> as a boolean would be more easier?
> you didn't mention information:  "by default, it will sort by
> ascending order; the sort collation by default is using the array
> element type's collation"
>
> tuplesort_begin_datum can do null-first, null-last, so the
> one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');


I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

Re: general purpose array_sort

From
jian he
Date:
On Mon, Sep 30, 2024 at 1:01 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > I would suggest accepting:
> > asc
> > desc
> > asc nulls first
> > asc nulls last *
> > desc nulls first *
> > desc nulls last
> >
> > As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.
> >
> > In short, mimic create index.
> >
> > David J.
> >
>
> PFA v3 with David's suggestion addressed.
>

I think just adding 2 bool arguments (asc/desc, nulls last/not nulls
last) would be easier.
but either way, (i don't have a huge opinion)
but document the second argument, imagine case
SELECT array_sort('{a,B}'::text[] , E'aSc NulLs  LaST \t\r\n');
would be tricky?


errmsg("multidimensional arrays sorting are not supported")));
write a sql test to trigger the error message that would be great.

you can add two or one example to collate.icu.utf8.sql to demo that it
actually works with COLLATE  collation_name
like:
SELECT array_sort('{a,B}'::text[] COLLATE case_insensitive);
SELECT array_sort('{a,B}'::text[] COLLATE "C");


#define WHITESPACE " \t\n\r"
you may also check function scanner_isspace


+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
            /* Disallow data types without a less-than operator */
            type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
            if (type->lt_opr == InvalidOid)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
                                attname, format_type_be(attForm->atttypid))));



Re: general purpose array_sort

From
jian he
Date:
> >
> > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > + if (typentry == NULL || typentry->type_id != elmtyp)
> > + {
> > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > TYPECACHE_GT_OPR);
> > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > + }
> > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > see CreateStatistics.
> >             /* Disallow data types without a less-than operator */
> >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> >             if (type->lt_opr == InvalidOid)
> >                 ereport(ERROR,
> >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >                          errmsg("column \"%s\" cannot be used in
> > statistics because its type %s has no default btree operator class",
> >                                 attname, format_type_be(attForm->atttypid))));
>
> I added an Assert for this part, not sure if that is enough.
>

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
 typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
 fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault  with your new Assert.


[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c



Re: general purpose array_sort

From
Amit Langote
Date:
Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
> >
> > > >
> > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > > > + if (typentry == NULL || typentry->type_id != elmtyp)
> > > > + {
> > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > TYPECACHE_GT_OPR);
> > > > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > + }
> > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > > > see CreateStatistics.
> > > >             /* Disallow data types without a less-than operator */
> > > >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> > > >             if (type->lt_opr == InvalidOid)
> > > >                 ereport(ERROR,
> > > >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > >                          errmsg("column \"%s\" cannot be used in
> > > > statistics because its type %s has no default btree operator class",
> > > >                                 attname, format_type_be(attForm->atttypid))));
> > >
> > > I added an Assert for this part, not sure if that is enough.
> > >
> >
> > i think it really should be:
> >
> > if (typentry == NULL || typentry->type_id != elmtyp)
> > {
> >  typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > TYPECACHE_GT_OPR);
> >  fcinfo->flinfo->fn_extra = (void *) typentry;
> > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
> > OidIsValid(typentry->gt_opr));
> > ereport(ERROR,....)
> > }
> >
> > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
> > then we cannot do the sort, we should just error out.
> >
> > I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
> > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);
> >
> > select array_sort('{#FF0000, #FF0000}'::colour[]);
> > of course it will segfault  with your new Assert.
> >
> >
> > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
>
> Make sense, PFA v5 with Jian's suggestion.

Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

--
Thanks, Amit Langote



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> Hi Amit,
>
> On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Hi Junwang,
> >
> > On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
> > > >
> > > > > >
> > > > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > > > > > + if (typentry == NULL || typentry->type_id != elmtyp)
> > > > > > + {
> > > > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > > > TYPECACHE_GT_OPR);
> > > > > > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > > > + }
> > > > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > > > > > see CreateStatistics.
> > > > > >             /* Disallow data types without a less-than operator */
> > > > > >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> > > > > >             if (type->lt_opr == InvalidOid)
> > > > > >                 ereport(ERROR,
> > > > > >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > > > >                          errmsg("column \"%s\" cannot be used in
> > > > > > statistics because its type %s has no default btree operator class",
> > > > > >                                 attname, format_type_be(attForm->atttypid))));
> > > > >
> > > > > I added an Assert for this part, not sure if that is enough.
> > > > >
> > > >
> > > > i think it really should be:
> > > >
> > > > if (typentry == NULL || typentry->type_id != elmtyp)
> > > > {
> > > >  typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > TYPECACHE_GT_OPR);
> > > >  fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
> > > > OidIsValid(typentry->gt_opr));
> > > > ereport(ERROR,....)
> > > > }
> > > >
> > > > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
> > > > then we cannot do the sort, we should just error out.
> > > >
> > > > I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
> > > > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);
> > > >
> > > > select array_sort('{#FF0000, #FF0000}'::colour[]);
> > > > of course it will segfault  with your new Assert.
> > > >
> > > >
> > > > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
> > >
> > > Make sense, PFA v5 with Jian's suggestion.
> >
> > Have you noticed that the tests have failed on Cirrus CI runs of this patch?
> >
> > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277
>
> Sorry for the late reply due to my vacation. I should have paid
> more attention to Cirrus CI earlier ;)
>
> >
> > It might be related to the test machines having a different *default*
> > locale than your local environment, which could result in a different
> > sort order for the test data. You may need to add an explicit COLLATE
> > clause to the tests to ensure consistent sorting across systems.
>
> I've changed the tests to use just ASCII characters, then added
> *COLLATE "C"* to the tests and CI passed, PFA v6.

Sadly the CI only passed on my own github repo, it failed on
cfbot[1], will dig into the reason later because I can not open the cirrus
ci page right now ;(

[1] https://cirrus-ci.com/task/5815925960605696

>
> >
> > --
> > Thanks, Amit Langote
>
>
>
> --
> Regards
> Junwang Zhao



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Wed, Oct 9, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > Hi Amit,
> >
> > On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > Hi Junwang,
> > >
> > > On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > > On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
> > > > >
> > > > > > >
> > > > > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > > > > > > + if (typentry == NULL || typentry->type_id != elmtyp)
> > > > > > > + {
> > > > > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > > > > TYPECACHE_GT_OPR);
> > > > > > > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > > > > + }
> > > > > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > > > > > > see CreateStatistics.
> > > > > > >             /* Disallow data types without a less-than operator */
> > > > > > >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> > > > > > >             if (type->lt_opr == InvalidOid)
> > > > > > >                 ereport(ERROR,
> > > > > > >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > > > > >                          errmsg("column \"%s\" cannot be used in
> > > > > > > statistics because its type %s has no default btree operator class",
> > > > > > >                                 attname, format_type_be(attForm->atttypid))));
> > > > > >
> > > > > > I added an Assert for this part, not sure if that is enough.
> > > > > >
> > > > >
> > > > > i think it really should be:
> > > > >
> > > > > if (typentry == NULL || typentry->type_id != elmtyp)
> > > > > {
> > > > >  typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > > TYPECACHE_GT_OPR);
> > > > >  fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
> > > > > OidIsValid(typentry->gt_opr));
> > > > > ereport(ERROR,....)
> > > > > }
> > > > >
> > > > > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
> > > > > then we cannot do the sort, we should just error out.
> > > > >
> > > > > I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
> > > > > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);
> > > > >
> > > > > select array_sort('{#FF0000, #FF0000}'::colour[]);
> > > > > of course it will segfault  with your new Assert.
> > > > >
> > > > >
> > > > > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
> > > >
> > > > Make sense, PFA v5 with Jian's suggestion.
> > >
> > > Have you noticed that the tests have failed on Cirrus CI runs of this patch?
> > >
> > > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277
> >
> > Sorry for the late reply due to my vacation. I should have paid
> > more attention to Cirrus CI earlier ;)
> >
> > >
> > > It might be related to the test machines having a different *default*
> > > locale than your local environment, which could result in a different
> > > sort order for the test data. You may need to add an explicit COLLATE
> > > clause to the tests to ensure consistent sorting across systems.
> >
> > I've changed the tests to use just ASCII characters, then added
> > *COLLATE "C"* to the tests and CI passed, PFA v6.
>
> Sadly the CI only passed on my own github repo, it failed on
> cfbot[1], will dig into the reason later because I can not open the cirrus
> ci page right now ;(
>
> [1] https://cirrus-ci.com/task/5815925960605696
>

Seems the failure is not related to this patch, I guess the reason for
this is the stop phase doesn't
release the port properly?

2024-10-09 14:53:10.079 UTC [43052][checkpointer] LOG:  checkpoint
complete: wrote 5617 buffers (34.3%), wrote 3 SLRU buffers; 0 WAL
file(s) added, 0 removed, 3 recycled; write=0.107 s, sync=0.001 s,
total=0.107 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=45239 kB, estimate=45239 kB; lsn=0/414A138, redo
lsn=0/414A138
2024-10-09 14:53:10.084 UTC [43050][postmaster] LOG:  database system
is shut down
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG:  starting
PostgreSQL 18devel on x86_64-freebsd, compiled by clang-17.0.6, 64-bit
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG:  could not bind
IPv4 address "127.0.0.1": Address already in use
2024-10-09 14:53:10.215 UTC [43270][postmaster] HINT:  Is another
postmaster already running on port 11643? If not, wait a few seconds
and retry.
2024-10-09 14:53:10.215 UTC [43270][postmaster] WARNING:  could not
create listen socket for "127.0.0.1"
2024-10-09 14:53:10.218 UTC [43270][postmaster] FATAL:  could not
create any TCP/IP sockets
2024-10-09 14:53:10.218 UTC [43270][postmaster] LOG:  database system
is shut down



https://api.cirrus-ci.com/v1/artifact/task/5815925960605696/testrun/build/testrun/ssl/001_ssltests/log/001_ssltests_primary.log

> >
> > >
> > > --
> > > Thanks, Amit Langote
> >
> >
> >
> > --
> > Regards
> > Junwang Zhao
>
>
>
> --
> Regards
> Junwang Zhao



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
jian he
Date:
tricky case:
should we allow array element type to be composite/domain?
currently seems to work fine.


create table t(b int[]);
insert into t values ('{{1,3}}'), ('{{1,2}}');

 select array_sort((select array_agg(t) from t), 'desc');
            array_sort
-----------------------------------
 {"(\"{{1,3}}\")","(\"{{1,2}}\")"}


select array_sort((t.b)) from t;
ERROR:  multidimensional arrays sorting are not supported


select array_sort((select array_agg(t.b) from t));
ERROR:  multidimensional arrays sorting are not supported



Re: general purpose array_sort

From
Junwang Zhao
Date:
Hi Jian,

On Fri, Oct 11, 2024 at 1:12 PM jian he <jian.universality@gmail.com> wrote:
>
> tricky case:
> should we allow array element type to be composite/domain?
> currently seems to work fine.
>
>
> create table t(b int[]);
> insert into t values ('{{1,3}}'), ('{{1,2}}');
>
>  select array_sort((select array_agg(t) from t), 'desc');
>             array_sort
> -----------------------------------
>  {"(\"{{1,3}}\")","(\"{{1,2}}\")"}
>
>
> select array_sort((t.b)) from t;
> ERROR:  multidimensional arrays sorting are not supported
>
>
> select array_sort((select array_agg(t.b) from t));
> ERROR:  multidimensional arrays sorting are not supported

I tried the above cases, and the first one works because it's
a one dim array of composite type, the other two fails because
they are multidimensional.

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Tom Lane
Date:
Junwang Zhao <zhjwpku@gmail.com> writes:
> It seems there is not much meaning to sort composite type,
> so are you proposing we should error on that?

It's hardly "general purpose" if it randomly refuses to
sort certain types.  I would say it should be able to sort
anything that ORDER BY will handle --- and that certainly
includes the cases shown here.

            regards, tom lane



Re: general purpose array_sort

From
jian he
Date:
On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> PFA v7 with multi-array support.
>

if (ARR_NDIM(array) == 1)
{
}
else
{
}
can be simplified.
i think beginning part of array_sort can be like the following:
(newline emitted)

---------------------------------------------------------------------
    if (ARR_NDIM(array) < 1)
        PG_RETURN_ARRAYTYPE_P(array);
    if (dirstr != NULL)
    {
        if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("second parameter must be a valid sort
direction")));
    }
    elmtyp = ARR_ELEMTYPE(array);
    if (ARR_NDIM(array) > 1)
        elmtyp = get_array_type(elmtyp);
    typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
    if (typentry == NULL || typentry->type_id != elmtyp)
    {
        typentry = lookup_type_cache(elmtyp, sort_asc ?
TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
        if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
            (!sort_asc && !OidIsValid(typentry->gt_opr)))
            ereport(ERROR,
                    (errcode(ERRCODE_UNDEFINED_FUNCTION),
                    errmsg("could not identify an ordering operator
for type %s",
                            format_type_be(elmtyp))));
        fcinfo->flinfo->fn_extra = (void *) typentry;
    }
---------------------------------------------------------------------
/*
 * array_sort
 *
 * Sorts the array in either ascending or descending order.
 * The array must be empty or one-dimensional.
 */
comments need to be updated.


typedef enum
    PARSE_SORT_ORDER_DONE
} ParseSortOrderState;

last one, should have comma, like
"PARSE_SORT_ORDER_DONE, "



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Thu, Oct 24, 2024 at 8:40 PM jian he <jian.universality@gmail.com> wrote:
>
> On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > PFA v7 with multi-array support.
> >
>
> if (ARR_NDIM(array) == 1)
> {
> }
> else
> {
> }
> can be simplified.
> i think beginning part of array_sort can be like the following:
> (newline emitted)
>
> ---------------------------------------------------------------------
>     if (ARR_NDIM(array) < 1)
>         PG_RETURN_ARRAYTYPE_P(array);
>     if (dirstr != NULL)
>     {
>         if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
>             ereport(ERROR,
>                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                      errmsg("second parameter must be a valid sort
> direction")));
>     }
>     elmtyp = ARR_ELEMTYPE(array);
>     if (ARR_NDIM(array) > 1)
>         elmtyp = get_array_type(elmtyp);

I'm wondering should we cache the type entry for both element type and
the corresponding array type, for example if we have a table:

create table t(b int[]);
insert into t values ('{1,3}'),('{{2,3}}'),('{{1,2},{0,2}}');

with 1-d array and m-d array interleaved, then the following query will
call lookup_type_cache multiple times.

select array_sort((t.b)) from t;

>     typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
>     if (typentry == NULL || typentry->type_id != elmtyp)
>     {
>         typentry = lookup_type_cache(elmtyp, sort_asc ?
> TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
>         if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
>             (!sort_asc && !OidIsValid(typentry->gt_opr)))
>             ereport(ERROR,
>                     (errcode(ERRCODE_UNDEFINED_FUNCTION),
>                     errmsg("could not identify an ordering operator
> for type %s",
>                             format_type_be(elmtyp))));
>         fcinfo->flinfo->fn_extra = (void *) typentry;
>     }
> ---------------------------------------------------------------------
> /*
>  * array_sort
>  *
>  * Sorts the array in either ascending or descending order.
>  * The array must be empty or one-dimensional.
>  */
> comments need to be updated.

will fix it in the next version of patch.

>
>
> typedef enum
>     PARSE_SORT_ORDER_DONE
> } ParseSortOrderState;
>
> last one, should have comma, like
> "PARSE_SORT_ORDER_DONE, "

will fix it.


--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

> It's hardly "general purpose" if it randomly refuses to
> sort certain types.  I would say it should be able to sort
> anything that ORDER BY will handle --- and that certainly
> includes the cases shown here.

I wonder how useful / convenient the new function will be considering
that we already have CTEs and can do:

SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x;

Perhaps there are use cases I didn't consider?

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Thu, Oct 24, 2024 at 7:58 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi,

> It's hardly "general purpose" if it randomly refuses to
> sort certain types.  I would say it should be able to sort
> anything that ORDER BY will handle --- and that certainly
> includes the cases shown here.

I wonder how useful / convenient the new function will be considering
that we already have CTEs and can do:

SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x;

Perhaps there are use cases I didn't consider?


Succinctness of expression.  Plus I'm under the impression that a function doing this is going to be somewhat faster than composing two functions together within a multi-node subtree.

I feel like the same observation could have been made for array_shuffle but we added that.  This function being added feels to me like just completing the set.

David J.

Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi David,

>> > It's hardly "general purpose" if it randomly refuses to
>> > sort certain types.  I would say it should be able to sort
>> > anything that ORDER BY will handle --- and that certainly
>> > includes the cases shown here.
>>
>> I wonder how useful / convenient the new function will be considering
>> that we already have CTEs and can do:
>>
>> SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x;
>>
>> Perhaps there are use cases I didn't consider?
>>
>
> Succinctness of expression.  Plus I'm under the impression that a function doing this is going to be somewhat faster
thancomposing two functions together within a multi-node subtree.
 
>
> I feel like the same observation could have been made for array_shuffle but we added that.  This function being added
feelsto me like just completing the set.
 

Right. To clarify, I'm not opposed to array_sort(). In fact personally
I would prefer using it instead of array_agg() + unnest().

Just making an observation / thinking out loud that the requirement to
support everything ORDER BY handles / supports (and will handle /
support?) might make this function impractical to maintain.
array_shuffle() or a recently proposed array_reverse() [1] are rather
simple since they just move the array elements without looking at
them. array_sort() does look at the elements and thus is very
different.

Particularly, does the function really need to support dir => asc |
desc | asc nulls first | etc... ? Maybe array_sort() + array_reverse(
array_sort() ) will handle most of the practical cases? I don't know.

[1]: https://commitfest.postgresql.org/50/5314/
--
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Thu, Oct 24, 2024 at 8:27 AM Aleksander Alekseev <aleksander@timescale.com> wrote:

Just making an observation / thinking out loud that the requirement to
support everything ORDER BY handles / supports (and will handle /
support?) might make this function impractical to maintain.

Particularly, does the function really need to support dir => asc |
desc | asc nulls first | etc... ? Maybe array_sort() + array_reverse(
array_sort() ) will handle most of the practical cases? I don't know.

[1]: https://commitfest.postgresql.org/50/5314/


Composing function calls here seems quite undesirable from a performance standpoint, but maybe I over-estimate the cost of exploding-manipulating-freezing an array datum.  Also, while I'm not in a good position to judge the challenge of implementing the sort parameters I would rather have them than not since the order by API has them (plus performance).  I also, maybe unreasonably, believe that our extensible type system has already limited the maintenance burden.

David J.

Re: general purpose array_sort

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Composing function calls here seems quite undesirable from a performance
> standpoint, but maybe I over-estimate the cost of
> exploding-manipulating-freezing an array datum.  Also, while I'm not in a
> good position to judge the challenge of implementing the sort parameters I
> would rather have them than not since the order by API has them (plus
> performance).  I also, maybe unreasonably, believe that our extensible type
> system has already limited the maintenance burden.

Oh!  I had not actually looked at the details of what was being
proposed here.  I imagined "array_sort(anyarray)" and the sort would
use the default sort order for the array's element type.  This
business with a textual representation of a sort clause seems like
over-engineering ... except that it's also under-engineered, because
the parsing is lame and incomplete.  (No USING option, and the fact
that collation comes from somewhere else seems impossibly confusing.)
Let's drop that.  As already noted, if you need a non-default sort
order you can build what you want with a sub-select.  The ambition
here should be to make easy cases easy.

            regards, tom lane



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Thu, Oct 24, 2024 at 9:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This business with a textual representation of a sort clause seems like
over-engineering ... except that it's also under-engineered, because
the parsing is lame and incomplete.  (No USING option, and the fact
that collation comes from somewhere else seems impossibly confusing.)
Let's drop that.

I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd argue is worth keeping.  null value placement too I guess, three-valued boolean (nulls_first).

David J.

Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

> I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd
argueis worth keeping.  null value placement too I guess, three-valued boolean (nulls_first).
 

Perhaps these optional arguments deserve separate discussions. I
suggest merging something everyone agrees on first. This will simplify
the review process and allow us to deliver value to the users quickly.
Arguments like `reverse => true` and `nulls_first => true` can always
be implemented and added as separate patches.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Fri, Oct 25, 2024 at 1:19 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> > I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd
argueis worth keeping.  null value placement too I guess, three-valued boolean (nulls_first). 
>
> Perhaps these optional arguments deserve separate discussions. I
> suggest merging something everyone agrees on first. This will simplify
> the review process and allow us to deliver value to the users quickly.
> Arguments like `reverse => true` and `nulls_first => true` can always
> be implemented and added as separate patches.

As this patch uses the tuplesort infrastructure, we need to supply the
sortOperator, sortCollation and nullsFirstFlag, I tend to agree with
David. I admit that the parsing part is not good, so I will remove it
by using two boolean parameters Jian suggested earlier.

Will send out another version by tomorrow.

>
> --
> Best regards,
> Aleksander Alekseev



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

> Based on the previous discussion, I split it into two patches in V8.
>
> 0001 is the general sort part without `is_ascending` or `nulls_first`,
> the sort order is determined by the "<" operator of the element type.
> It also cached the type entry of both eletyp and the corresponding
> array type.
>
> 0002 adds the `is_ascending` and `nulls_first` part, it now uses
> two boolean parameters instead of parsing one text parameter.

Thanks for the update patch set. Here are some comments.

0001:

> +{ oid => '8810', descr => 'sort array',
> +  proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray',
> +  proargtypes => 'anyarray', prosrc => 'array_sort'},

I would expect that array_sort() should be IMMUTABLE. Is there a
reason for it to be VOLATILE?

> +        <function>array_sort</function> ( <type>anyarray</type> <optional> COLLATE
<replaceable>collation_name</replaceable></optional>)
 
> +        <returnvalue>anyarray</returnvalue>

It seems to me that the part about using COLLATE should be moved
below, to the description / examples section, since it's not part of
the function signature.

Also the description should be more specific about how NULLs are
sorted. NULLs also should be covered by tests.

0002:

> <parameter>is_ascending</parameter>

I really believe this name is not the best one. I suggest using
`reverse => true`. `nulls_first` is OK.

> +Datum
> +array_sort_order(PG_FUNCTION_ARGS)
> +{
> +    return array_sort(fcinfo);
> +}
> +
> +Datum
> +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> +{
> +    return array_sort(fcinfo);
> +}

Any reason not to specify array_sort in pg_proc.dat?

The tests cover is_ascending => true | false, which is OK, but only
(is_ascending = true, nulls_first => true) and (is_ascending => false,
nulls_fist => false). For the case when both optional arguments are
specified you have to test at least 4 combinations.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
jian he
Date:
On Tue, Oct 29, 2024 at 12:48 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:.
>
> 0001:
>
> > +{ oid => '8810', descr => 'sort array',
> > +  proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray',
> > +  proargtypes => 'anyarray', prosrc => 'array_sort'},
>
> I would expect that array_sort() should be IMMUTABLE. Is there a
> reason for it to be VOLATILE?
>

https://www.postgresql.org/docs/current/sql-createfunction.html says:

IMMUTABLE indicates that the function cannot modify the database and always
returns the same result when given the same argument values; that is, it does
not do database lookups or otherwise use information not directly present in its
argument list. If this option is given, any call of the function with
all-constant arguments can be immediately replaced with the function value.


+ {
+ typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR);
+ if (!OidIsValid(typentry->lt_opr))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify ordering operator for type %s",
+ format_type_be(elmtyp))));

This error can happen. I think this conflicts with the doc IMMUTABLE
description.



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi Jian,

> IMMUTABLE indicates that the function cannot modify the database and always
> returns the same result when given the same argument values; that is, it does
> not do database lookups or otherwise use information not directly present in its
> argument list. If this option is given, any call of the function with
> all-constant arguments can be immediately replaced with the function value.
>
>
> + {
> + typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR);
> + if (!OidIsValid(typentry->lt_opr))
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_FUNCTION),
> + errmsg("could not identify ordering operator for type %s",
> + format_type_be(elmtyp))));
>
> This error can happen. I think this conflicts with the doc IMMUTABLE
> description.

lookup_type_cache() is used at least by array_position() which is
marked as IMMUTABLE, so I believe this is fine. Similarly functions
dealing with timezones can return different results between the DBMS
restarts / updates, but we don't care and mark them IMMUTABLE anyway.
Otherwise we couldn't use these functions in functional indexes which
will make them rather useless.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

Thanks for the updated patch set.

> > > +Datum
> > > +array_sort_order(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> > > +
> > > +Datum
> > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> >
> > Any reason not to specify array_sort in pg_proc.dat?
>
> It is specified in 0001 (see oid => '8810').

What I meant was that I don't think these wrapper functions are
needed. I think you can just do:

```
+{ oid => '8811', descr => 'sort array',
+  proname => 'array_sort', prorettype => 'anyarray',
+  proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <--
array_sort is used directly in `prosrc`
```

... unless I'm missing something.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
Junwang Zhao
Date:
Hi,

On Wed, Oct 30, 2024 at 9:29 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> Thanks for the updated patch set.
>
> > > > +Datum
> > > > +array_sort_order(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > > > +
> > > > +Datum
> > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > >
> > > Any reason not to specify array_sort in pg_proc.dat?
> >
> > It is specified in 0001 (see oid => '8810').
>
> What I meant was that I don't think these wrapper functions are
> needed. I think you can just do:
>
> ```
> +{ oid => '8811', descr => 'sort array',
> +  proname => 'array_sort', prorettype => 'anyarray',
> +  proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <--
> array_sort is used directly in `prosrc`
> ```
>
> ... unless I'm missing something.

There is a opr sanity check for this[1], if we remove these wrapper functions,
regression test will fail with:

- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  |  proname   | oid  |  proname
+------+------------+------+------------
+ 8811 | array_sort | 8812 | array_sort
+ 8810 | array_sort | 8811 | array_sort
+ 8810 | array_sort | 8812 | array_sort
+(3 rows)


[1]:

-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields). It's OK to
-- have several entries with different pronames for the same internal function,
-- but conflicts in the number of arguments and other critical items should
-- be complained of. (We don't check data types here; see next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.

SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid < p2.oid AND
p1.prosrc = p2.prosrc AND
p1.prolang = 12 AND p2.prolang = 12 AND
(p1.prokind != 'a' OR p2.prokind != 'a') AND
(p1.prolang != p2.prolang OR
p1.prokind != p2.prokind OR
p1.prosecdef != p2.prosecdef OR
p1.proleakproof != p2.proleakproof OR
p1.proisstrict != p2.proisstrict OR
p1.proretset != p2.proretset OR
p1.provolatile != p2.provolatile OR
p1.pronargs != p2.pronargs);

>
> --
> Best regards,
> Aleksander Alekseev



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
jian he
Date:
On Mon, Nov 4, 2024 at 1:46 PM Michael Paquier <michael@paquier.xyz> wrote:
>
>
> +    typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR);
> +    if (!OidIsValid(typentry->lt_opr))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_UNDEFINED_FUNCTION),
> +                 errmsg("could not identify ordering operator for type %s",
> +                    format_type_be(elmtyp))));
>
> The patch introduces two error paths based on the fact that ordering
> operators could not be found depending on a data type that lacks the
> ordering operator and the array ordering operator part.  It is right
> to issue an error if these are lacking, like the various stats paths.
> Should we have some regression tests with specific data types for
> these errors, though?  The stats paths don't care much about these
> error cases, but it does not mean that we should not care about them.
> In short, let's have negative test coverage if we can.
>
select distinct oprleft::regtype from pg_operator where oprname = '='
and oprleft = oprright
except all
select distinct oprleft::regtype from pg_operator where oprname = '<'
and oprleft = oprright;
returns

hstore
cid
aclitem
xid
line

simple tests case using xid data type would be

SELECT array_sort('{{1,2,3}}'::xid[]);


> +typedef struct ArraySortCachedInfo
> +{
> +    TypeCacheEntry *typentry;
> +    TypeCacheEntry *array_typentry;
> +} ArraySortCachedInfo;
>
> Let's put that at the top of the file, with a comment about how it
> links to array_sort() for the caching with fn_extra.  Let's also
> document the meaning of the fields.
>
> FWIW, I am confused by this implementation, where you have to allocate
> the two TypeCacheEntry because of the fact that you have to deal with
> the 1-dimension case and the multi-dimension case.  In the context of
> a single function call, why do you need both typentry and
> array_typentry, actually?  Wouldn't it be enough to use one typentry
> that points to the typcache, meaning that you don't really need to use
> the extra business with fn_mcxt, no?  If you require both (because I
> may be wrong), perhaps you should have a regression test that's able
> to break when removing array_typentry, changing the code to only rely
> on typentry.   Note: I have just removed array_typentry in a quick
> test, current coverage was happy about it.  Feel free to prove me
> wrong.
>

drop table if exists t;
CREATE TABLE t (a int[]);
insert into t values ('{1,3}'),('{1,2,3}'),('{11}');
insert into t values ('{{1,12}}'), ('{{4,3}}');
SELECT array_sort(a) from t;

In the above case,
tuplesort_begin_datum needs the int type information and int[] type information.
otherwise the cached TypeCacheEntry is being used to sort mult-dimension array,
which will make the result false.



Re: general purpose array_sort

From
Dean Rasheed
Date:
On Sun, 3 Nov 2024 at 03:33, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> PFA v11.
>

Testing this with an array with non-default lower bounds, it fails to
preserve the array bounds, which I think it should (note:
array_reverse() and array_shuffle() do preserve the bounds):

SELECT array_reverse(a), array_shuffle(a), array_sort(a)
  FROM (VALUES ('[10:12][20:21]={{1,2},{10,20},{3,4}}'::int[])) v(a);

-[ RECORD 1 ]-+-------------------------------------
array_reverse | [10:12][20:21]={{3,4},{10,20},{1,2}}
array_shuffle | [10:12][20:21]={{10,20},{3,4},{1,2}}
array_sort    | [1:3][20:21]={{1,2},{3,4},{10,20}}

Regards,
Dean



Re: general purpose array_sort

From
Junwang Zhao
Date:
Hi jian,

On Tue, Nov 5, 2024 at 3:13 PM jian he <jian.universality@gmail.com> wrote:
>
> On Mon, Nov 4, 2024 at 7:34 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > Testing this with an array with non-default lower bounds, it fails to
> > preserve the array bounds, which I think it should (note:
> > array_reverse() and array_shuffle() do preserve the bounds):
> >
> > SELECT array_reverse(a), array_shuffle(a), array_sort(a)
> >   FROM (VALUES ('[10:12][20:21]={{1,2},{10,20},{3,4}}'::int[])) v(a);
> >
> > -[ RECORD 1 ]-+-------------------------------------
> > array_reverse | [10:12][20:21]={{3,4},{10,20},{1,2}}
> > array_shuffle | [10:12][20:21]={{10,20},{3,4},{1,2}}
> > array_sort    | [1:3][20:21]={{1,2},{3,4},{10,20}}
> >
>
> if i understand it correctly,
> array_create_iterator cannot cope with top dimension bound information.
> since input array arguments already have dims, lbs information.
> so at the end of array_sort directly copy
> from the input array argument to astate.
>
> tuplesort_performsort won't need array bounds, we should be safe?
>
>
>
> v12-0001 same as v11-0001-general-purpose-array_sort.patch, only
> resolve git conflict
> v12-0002 preserve array bound information.
> v12-0003 cache ArrayMetaState.
>
> after v12-0003 now
> typedef struct ArraySortCachedInfo
> {
>     TypeCacheEntry *typentry;
>     TypeCacheEntry *array_typentry;
>     ArrayMetaState array_meta;
> } ArraySortCachedInfo;
>
> function array_create_iterator, get_typlenbyvalalign
> will do cache search, we can cache ArrayMetaState.
> so multiple array_create_iterator calls won't need to call get_typlenbyvalalign.
> every time.
>
>
> 0002, I also have a 3 dimensional array test.
> create table t(a int[]);
> insert into t values ('[-1:-0]={7,1}'::int[]),
> ('[-2:-0][20:21]={{1,2},{10,20},{1,-4}}'),
> ('[-2:-0][20:22]={{-11,2,-1},{-11,2, 1},{-11,-4, 10}}'),
> ('[-13:-10][0:1][20:22]={
> {{1,2,112},{1,2,-123}},
> {{10,-20,1},{11,123,3}},
> {{10,-20,1},{11,-123,-9}},
> {{1,2,-11},{1,2,211}}}'::int[]);
> SELECT array_sort(t.a) from t;
> SELECT array_sort((t.a) [-13:-10][0:1][21:22]) from t where array_ndims(a) = 3;
> SELECT array_sort((t.a) [-13:-11][0:1][21:22]) from t where array_ndims(a) = 3;
> SELECT array_sort((t.a) [-13:-11][0:0][20:21]) from t where array_ndims(a) = 3;
>
> The test output is ok to me.

Thanks for the bounds preserve solution, I just looked at 0002,

+ if (astate->arraystate != NULL)
+ {
+ memcpy(astate->arraystate->dims, dims, ndim * sizeof(int));
+ memcpy(astate->arraystate->lbs, lbs, ndim * sizeof(int));
+ Assert(ndim == astate->arraystate->ndims);
+ }

It seems to me we only need to set astate->arraystate->lbs[0] = lbs[0] ?

--
Regards
Junwang Zhao



Re: general purpose array_sort

From
jian he
Date:
On Tue, Nov 5, 2024 at 8:30 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
>
> Thanks for the bounds preserve solution, I just looked at 0002,
>
> + if (astate->arraystate != NULL)
> + {
> + memcpy(astate->arraystate->dims, dims, ndim * sizeof(int));
> + memcpy(astate->arraystate->lbs, lbs, ndim * sizeof(int));
> + Assert(ndim == astate->arraystate->ndims);
> + }
>
> It seems to me we only need to set astate->arraystate->lbs[0] = lbs[0] ?
>
yes.

> + memcpy(astate->arraystate->dims, dims, ndim * sizeof(int));
thinking about it, this is wrong. we should just do Assert
        for(int i = 0; i < ndim; i++)
        {
            Assert(astate->arraystate->dims[i] == dims[i]);
        }

or just remove
 memcpy(astate->arraystate->dims, dims, ndim * sizeof(int));



Re: general purpose array_sort

From
Junwang Zhao
Date:
Hi Michael,

On Mon, Nov 4, 2024 at 1:46 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sun, Nov 03, 2024 at 11:33:05AM +0800, Junwang Zhao wrote:
> > Rebase needed due to array_reverse committed, PFA v11.
>
> There has been another conflict since you have posted this version
> (noticed that after my business in 027124a872d7).  I have looked at
> 0001.
>
> +    if (ARR_NDIM(array) < 1)
> +        PG_RETURN_ARRAYTYPE_P(array);
> There is no point in doing a sort if the array has only one element.
> You can add a check based on "ARR_DIMS(array)[0] < 2" to achieve that.

Yeah, this is reasonable but one case I can't be sure:

SELECT array_sort('{{2,3,4}}'::xid[]);

This will return the array as is, but xid doesn't have a LT_OPR, should
I error out in this case? like:

could not identify ordering operator for type xid[]

>
> +    typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR);
> +    if (!OidIsValid(typentry->lt_opr))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_UNDEFINED_FUNCTION),
> +                 errmsg("could not identify ordering operator for type %s",
> +                    format_type_be(elmtyp))));
>
> The patch introduces two error paths based on the fact that ordering
> operators could not be found depending on a data type that lacks the
> ordering operator and the array ordering operator part.  It is right
> to issue an error if these are lacking, like the various stats paths.
> Should we have some regression tests with specific data types for
> these errors, though?  The stats paths don't care much about these
> error cases, but it does not mean that we should not care about them.
> In short, let's have negative test coverage if we can.
>
> +typedef struct ArraySortCachedInfo
> +{
> +    TypeCacheEntry *typentry;
> +    TypeCacheEntry *array_typentry;
> +} ArraySortCachedInfo;
>
> Let's put that at the top of the file, with a comment about how it
> links to array_sort() for the caching with fn_extra.  Let's also
> document the meaning of the fields.

Will fix it in the following patch set.

>
> FWIW, I am confused by this implementation, where you have to allocate
> the two TypeCacheEntry because of the fact that you have to deal with
> the 1-dimension case and the multi-dimension case.  In the context of
> a single function call, why do you need both typentry and
> array_typentry, actually?  Wouldn't it be enough to use one typentry
> that points to the typcache, meaning that you don't really need to use
> the extra business with fn_mcxt, no?  If you require both (because I
> may be wrong), perhaps you should have a regression test that's able
> to break when removing array_typentry, changing the code to only rely
> on typentry.   Note: I have just removed array_typentry in a quick
> test, current coverage was happy about it.  Feel free to prove me
> wrong.
>
> Agreed that the function should be immutable.  The results are fixed
> depending on the input even with the COLLATE clauses appended.
>
> Let's add something when there is only one element in the first
> dimension of the array, say two cases one with an int and one with an
> array of ints like:
> SELECT array_sort('{1}'::int[]);
> SELECT array_sort('{{1}}'::int[]);

Will add.

> --
> Michael



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Tue, Nov 5, 2024 at 9:13 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Nov 04, 2024 at 03:16:35PM +0800, jian he wrote:
> > drop table if exists t;
> > CREATE TABLE t (a int[]);
> > insert into t values ('{1,3}'),('{1,2,3}'),('{11}');
> > insert into t values ('{{1,12}}'), ('{{4,3}}');
> > SELECT array_sort(a) from t;
> >
> > In the above case,
> > tuplesort_begin_datum needs the int type information and int[] type information.
> > otherwise the cached TypeCacheEntry is being used to sort mult-dimension array,
> > which will make the result false.
>
> All these behaviors need more extensive testing.
>
> This brings me an extra question around the caching.  Would the
> sorting be able to behave correctly when feeding to a single
> array_sort() context array values that have multiple COLLATE clauses?
> Or merge_collation_state() would be smart enough to make sure that
> collation conflicts never happen to begin with?  I am wondering if we
> should worry about multiple VALUES, CTEs, or PL functions where
> array_sort() could be fed into its cache values that lead to
> unpredictible results for some values.  This stuff should perhaps have
> more testing around such behaviors, stressing what kind of
> interactions we have between the sorting of multiple values and the
> caching, in the context of a single array_sort() call.

I'm afraid this can not be achieved in my current implementation, a simple
case is:

SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[]);
{Abc,bar,bbc,CCC,foo,NULL}
SELECT array_sort('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C");
{Abc,CCC,bar,bbc,foo,NULL}

SELECT array_sort(a COLLATE "C") FROM (VALUES
('{foo,bar,null,CCC,Abc,bbc}'::text[] COLLATE "C"),
('{foo,bar,null,CCC,Abc,bbc}'::text[])) v(a);
{Abc,CCC,bar,bbc,foo,NULL}
{Abc,CCC,bar,bbc,foo,NULL}

Maybe add some documents to specify this?

> --
> Michael



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Robert Haas
Date:
On Thu, Nov 7, 2024 at 8:56 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
> Yeah, this is reasonable but one case I can't be sure:
>
> SELECT array_sort('{{2,3,4}}'::xid[]);
>
> This will return the array as is, but xid doesn't have a LT_OPR, should
> I error out in this case? like:
>
> could not identify ordering operator for type xid[]

Yes, I think that case needs to error out. It seems best to identify
the ordering operator before you decide whether or not you have >1
element.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Thu, Nov 7, 2024 at 10:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 8:56 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > Yeah, this is reasonable but one case I can't be sure:
> >
> > SELECT array_sort('{{2,3,4}}'::xid[]);
> >
> > This will return the array as is, but xid doesn't have a LT_OPR, should
> > I error out in this case? like:
> >
> > could not identify ordering operator for type xid[]
>
> Yes, I think that case needs to error out. It seems best to identify
> the ordering operator before you decide whether or not you have >1
> element.

Got it, will do this in the next version.

>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
jian he
Date:
On Fri, Nov 8, 2024 at 8:52 AM Michael Paquier <michael@paquier.xyz> wrote:
>

> I am wondering if there are more fancy cases where the saved cache
> could force a state that would lead to puzzling results, say with
> different collations that should be applied.  I'd recommend to
> research that more, to reflect that in the docs and to add tests that
> show what we should expect in these cases within 0001 because this
> new function is mimicking in the context of a function execution
> multiple query clauses where restrictions are applied when analyzing
> the query, close to the parser.
>
> For example, UNION and UNION ALL require a common collation when
> processing a set of expressions related to them, which would be OK.
> Perhaps I lack some imagination to be able to break things.
> --


We had 3 error occurrences of
ERROR:  could not determine which collation to use for string comparison
in collate.linux.utf8.out.
one is UNION ALL, another two is do comparison with two text arguments.

here array_sort only takes one argument, there is not that much place
to go wrong?
potential misbehavior would be only about UNION ALL?

UNION ALL for two tables, for collation, we can both implicit;  both
explicit' one implicit,one explicit.
if both explicit, then it will error out quite easily.
if one side is explicit, another side explicitly, then we use
explicitly, which is what we expected.

the trick is that both are implicit.

drop table t1,t2;
create table t1(a int, b text[] COLLATE "C");
create table t2(a int, b text[] COLLATE case_sensitive);
insert into t1 values (1, '{foo,bar,null,CCC,Abc,bbc}'::text[]);
insert into t2 values (2, '{foo,bar,null,CCC,Abc,bbc}'::text[]);

create domain dtxt as text[] collate case_insensitive;
CREATE OR REPLACE FUNCTION mytxt_coll(x text[]) RETURNS dtxt LANGUAGE
plpgsql AS $$
declare
  xx text[] COLLATE case_insensitive;
begin
    xx := x;
    return xx collate case_insensitive;
end
$$;

--these three fail.
select array_sort(b) from (select b from t1 union all select b from t2) sub;
select array_sort(b) from (select mytxt_coll(b) as b from t2 union all
select b from t1 ) sub;
select array_sort(b) from (select b from
mytxt_coll('{foo,bar,null,CCC,Abc,bbc}'::text[] collate
case_insensitive) f(b) union all select b from t1) sub;

-----
select array_sort(b) from (select b from t1 union all select b from
mytxt_coll('{foo,bar,null,CCC,Abc,bbc}'::text[]) f(b)) sub;
select array_sort(b) from (select b from
mytxt_coll('{foo,bar,null,CCC,Abc,bbc}'::text[]) f(b) union all select
b from t1 ) sub;

these two query outputs are the same, which is what we expected per
quote from manual:
<<>>
otherwise, all input expressions must have the same implicit collation
derivation or the default collation.
If any non-default collation is present, that is the result of the
collation combination.
Otherwise, the result is the default collation.
<<>>
https://www.postgresql.org/docs/current/collation.html#COLLATION-CONCEPTS

also we have varstr_sortsupport->check_collation_set to make sure we
have a single valid collation for array_sort.

overall, I think the current implementation works fine.