Thread: Re: general purpose array_sort

Re: general purpose array_sort

From
Junwang Zhao
Date:
On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:
>
> On Sat, Sep 28, 2024 at 7:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > PFA v2, use COLLATE keyword to supply the collation suggested by
> > Andreas offlist.
> >
> this is better. otherwise we need extra care to handle case like:
> SELECT array_sort('{1,3,5,2,4,6}'::int[] COLLATE "pg_c_utf8");
>
>
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>array_sort</primary>
> +        </indexterm>
> +        <function>array_sort</function> ( <type>anyarray</type>
> <optional>, <parameter>dir</parameter> </optional>)
> +        <returnvalue>anyarray</returnvalue>
> +       </para>
> +       <para>
> +        Sorts the array in either ascending or descending order.
> +        <parameter>dir</parameter> must be <literal>asc</literal>
> +        or <literal>desc</literal>. The array must be empty or one-dimensional.
> +       </para>
> +       <para>
> +        <literal>array_sort(ARRAY[1,2,5,6,3,4])</literal>
> +        <returnvalue>{1,2,3,4,5,6}</returnvalue>
> +       </para></entry>
> +      </row>
> I am confused with <parameter>dir</parameter>. I guess you want to say
> "direction"
> But here, I think <parameter>sort_asc</parameter> would be more appropriate?

This doc is mostly copied and edited from intarray.sgml sort part.

And the logic is basically the same, you can check the intarray module.

>
>
> <parameter>dir</parameter> can have only two potential values, make it
> as a boolean would be more easier?
> you didn't mention information:  "by default, it will sort by
> ascending order; the sort collation by default is using the array
> element type's collation"
>
> tuplesort_begin_datum can do null-first, null-last, so the
> one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');

>
> Based on the above and others, I did some refactoring, feel free to take it.
> my changes, changed the function signature, so you need to pay
> attention to sql test file.

Thanks for your refactor, I will take some in the next version.


--
Regards
Junwang Zhao



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Sat, Sep 28, 2024 at 7:05 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:
>
> <parameter>dir</parameter> can have only two potential values, make it
> as a boolean would be more easier?
> you didn't mention information:  "by default, it will sort by
> ascending order; the sort collation by default is using the array
> element type's collation"
>
> tuplesort_begin_datum can do null-first, null-last, so the
> one-dimension array can allow null values.

The following(create extension intarry first) will give an error, I
keep the same for array_sort.

SELECT sort('{1234234,-30,234234, null}');


I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last

As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.

In short, mimic create index.

David J.

Re: general purpose array_sort

From
jian he
Date:
On Mon, Sep 30, 2024 at 1:01 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > I would suggest accepting:
> > asc
> > desc
> > asc nulls first
> > asc nulls last *
> > desc nulls first *
> > desc nulls last
> >
> > As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.
> >
> > In short, mimic create index.
> >
> > David J.
> >
>
> PFA v3 with David's suggestion addressed.
>

I think just adding 2 bool arguments (asc/desc, nulls last/not nulls
last) would be easier.
but either way, (i don't have a huge opinion)
but document the second argument, imagine case
SELECT array_sort('{a,B}'::text[] , E'aSc NulLs  LaST \t\r\n');
would be tricky?


errmsg("multidimensional arrays sorting are not supported")));
write a sql test to trigger the error message that would be great.

you can add two or one example to collate.icu.utf8.sql to demo that it
actually works with COLLATE  collation_name
like:
SELECT array_sort('{a,B}'::text[] COLLATE case_insensitive);
SELECT array_sort('{a,B}'::text[] COLLATE "C");


#define WHITESPACE " \t\n\r"
you may also check function scanner_isspace


+ typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (typentry == NULL || typentry->type_id != elmtyp)
+ {
+ typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
+ fcinfo->flinfo->fn_extra = (void *) typentry;
+ }
you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
see CreateStatistics.
            /* Disallow data types without a less-than operator */
            type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
            if (type->lt_opr == InvalidOid)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("column \"%s\" cannot be used in
statistics because its type %s has no default btree operator class",
                                attname, format_type_be(attForm->atttypid))));



Re: general purpose array_sort

From
jian he
Date:
> >
> > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > + if (typentry == NULL || typentry->type_id != elmtyp)
> > + {
> > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > TYPECACHE_GT_OPR);
> > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > + }
> > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > see CreateStatistics.
> >             /* Disallow data types without a less-than operator */
> >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> >             if (type->lt_opr == InvalidOid)
> >                 ereport(ERROR,
> >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >                          errmsg("column \"%s\" cannot be used in
> > statistics because its type %s has no default btree operator class",
> >                                 attname, format_type_be(attForm->atttypid))));
>
> I added an Assert for this part, not sure if that is enough.
>

i think it really should be:

if (typentry == NULL || typentry->type_id != elmtyp)
{
 typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
TYPECACHE_GT_OPR);
 fcinfo->flinfo->fn_extra = (void *) typentry;
if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
OidIsValid(typentry->gt_opr));
ereport(ERROR,....)
}

Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
then we cannot do the sort, we should just error out.

I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);

select array_sort('{#FF0000, #FF0000}'::colour[]);
of course it will segfault  with your new Assert.


[1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c



Re: general purpose array_sort

From
Amit Langote
Date:
Hi Junwang,

On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
> >
> > > >
> > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > > > + if (typentry == NULL || typentry->type_id != elmtyp)
> > > > + {
> > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > TYPECACHE_GT_OPR);
> > > > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > + }
> > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > > > see CreateStatistics.
> > > >             /* Disallow data types without a less-than operator */
> > > >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> > > >             if (type->lt_opr == InvalidOid)
> > > >                 ereport(ERROR,
> > > >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > >                          errmsg("column \"%s\" cannot be used in
> > > > statistics because its type %s has no default btree operator class",
> > > >                                 attname, format_type_be(attForm->atttypid))));
> > >
> > > I added an Assert for this part, not sure if that is enough.
> > >
> >
> > i think it really should be:
> >
> > if (typentry == NULL || typentry->type_id != elmtyp)
> > {
> >  typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > TYPECACHE_GT_OPR);
> >  fcinfo->flinfo->fn_extra = (void *) typentry;
> > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
> > OidIsValid(typentry->gt_opr));
> > ereport(ERROR,....)
> > }
> >
> > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
> > then we cannot do the sort, we should just error out.
> >
> > I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
> > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);
> >
> > select array_sort('{#FF0000, #FF0000}'::colour[]);
> > of course it will segfault  with your new Assert.
> >
> >
> > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
>
> Make sense, PFA v5 with Jian's suggestion.

Have you noticed that the tests have failed on Cirrus CI runs of this patch?

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277

It might be related to the test machines having a different *default*
locale than your local environment, which could result in a different
sort order for the test data. You may need to add an explicit COLLATE
clause to the tests to ensure consistent sorting across systems.

--
Thanks, Amit Langote



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> Hi Amit,
>
> On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Hi Junwang,
> >
> > On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
> > > >
> > > > > >
> > > > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > > > > > + if (typentry == NULL || typentry->type_id != elmtyp)
> > > > > > + {
> > > > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > > > TYPECACHE_GT_OPR);
> > > > > > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > > > + }
> > > > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > > > > > see CreateStatistics.
> > > > > >             /* Disallow data types without a less-than operator */
> > > > > >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> > > > > >             if (type->lt_opr == InvalidOid)
> > > > > >                 ereport(ERROR,
> > > > > >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > > > >                          errmsg("column \"%s\" cannot be used in
> > > > > > statistics because its type %s has no default btree operator class",
> > > > > >                                 attname, format_type_be(attForm->atttypid))));
> > > > >
> > > > > I added an Assert for this part, not sure if that is enough.
> > > > >
> > > >
> > > > i think it really should be:
> > > >
> > > > if (typentry == NULL || typentry->type_id != elmtyp)
> > > > {
> > > >  typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > TYPECACHE_GT_OPR);
> > > >  fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
> > > > OidIsValid(typentry->gt_opr));
> > > > ereport(ERROR,....)
> > > > }
> > > >
> > > > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
> > > > then we cannot do the sort, we should just error out.
> > > >
> > > > I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
> > > > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);
> > > >
> > > > select array_sort('{#FF0000, #FF0000}'::colour[]);
> > > > of course it will segfault  with your new Assert.
> > > >
> > > >
> > > > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
> > >
> > > Make sense, PFA v5 with Jian's suggestion.
> >
> > Have you noticed that the tests have failed on Cirrus CI runs of this patch?
> >
> > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277
>
> Sorry for the late reply due to my vacation. I should have paid
> more attention to Cirrus CI earlier ;)
>
> >
> > It might be related to the test machines having a different *default*
> > locale than your local environment, which could result in a different
> > sort order for the test data. You may need to add an explicit COLLATE
> > clause to the tests to ensure consistent sorting across systems.
>
> I've changed the tests to use just ASCII characters, then added
> *COLLATE "C"* to the tests and CI passed, PFA v6.

Sadly the CI only passed on my own github repo, it failed on
cfbot[1], will dig into the reason later because I can not open the cirrus
ci page right now ;(

[1] https://cirrus-ci.com/task/5815925960605696

>
> >
> > --
> > Thanks, Amit Langote
>
>
>
> --
> Regards
> Junwang Zhao



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Wed, Oct 9, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > Hi Amit,
> >
> > On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >
> > > Hi Junwang,
> > >
> > > On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > > On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote:
> > > > >
> > > > > > >
> > > > > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
> > > > > > > + if (typentry == NULL || typentry->type_id != elmtyp)
> > > > > > > + {
> > > > > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > > > > TYPECACHE_GT_OPR);
> > > > > > > + fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > > > > + }
> > > > > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists?
> > > > > > > see CreateStatistics.
> > > > > > >             /* Disallow data types without a less-than operator */
> > > > > > >             type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
> > > > > > >             if (type->lt_opr == InvalidOid)
> > > > > > >                 ereport(ERROR,
> > > > > > >                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > > > > > >                          errmsg("column \"%s\" cannot be used in
> > > > > > > statistics because its type %s has no default btree operator class",
> > > > > > >                                 attname, format_type_be(attForm->atttypid))));
> > > > > >
> > > > > > I added an Assert for this part, not sure if that is enough.
> > > > > >
> > > > >
> > > > > i think it really should be:
> > > > >
> > > > > if (typentry == NULL || typentry->type_id != elmtyp)
> > > > > {
> > > > >  typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR :
> > > > > TYPECACHE_GT_OPR);
> > > > >  fcinfo->flinfo->fn_extra = (void *) typentry;
> > > > > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as &&
> > > > > OidIsValid(typentry->gt_opr));
> > > > > ereport(ERROR,....)
> > > > > }
> > > > >
> > > > > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR
> > > > > then we cannot do the sort, we should just error out.
> > > > >
> > > > > I just tried this colour type [1] with (CREATE TYPE colour (INPUT =
> > > > > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4);
> > > > >
> > > > > select array_sort('{#FF0000, #FF0000}'::colour[]);
> > > > > of course it will segfault  with your new Assert.
> > > > >
> > > > >
> > > > > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
> > > >
> > > > Make sense, PFA v5 with Jian's suggestion.
> > >
> > > Have you noticed that the tests have failed on Cirrus CI runs of this patch?
> > >
> > > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277
> >
> > Sorry for the late reply due to my vacation. I should have paid
> > more attention to Cirrus CI earlier ;)
> >
> > >
> > > It might be related to the test machines having a different *default*
> > > locale than your local environment, which could result in a different
> > > sort order for the test data. You may need to add an explicit COLLATE
> > > clause to the tests to ensure consistent sorting across systems.
> >
> > I've changed the tests to use just ASCII characters, then added
> > *COLLATE "C"* to the tests and CI passed, PFA v6.
>
> Sadly the CI only passed on my own github repo, it failed on
> cfbot[1], will dig into the reason later because I can not open the cirrus
> ci page right now ;(
>
> [1] https://cirrus-ci.com/task/5815925960605696
>

Seems the failure is not related to this patch, I guess the reason for
this is the stop phase doesn't
release the port properly?

2024-10-09 14:53:10.079 UTC [43052][checkpointer] LOG:  checkpoint
complete: wrote 5617 buffers (34.3%), wrote 3 SLRU buffers; 0 WAL
file(s) added, 0 removed, 3 recycled; write=0.107 s, sync=0.001 s,
total=0.107 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=45239 kB, estimate=45239 kB; lsn=0/414A138, redo
lsn=0/414A138
2024-10-09 14:53:10.084 UTC [43050][postmaster] LOG:  database system
is shut down
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG:  starting
PostgreSQL 18devel on x86_64-freebsd, compiled by clang-17.0.6, 64-bit
2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG:  could not bind
IPv4 address "127.0.0.1": Address already in use
2024-10-09 14:53:10.215 UTC [43270][postmaster] HINT:  Is another
postmaster already running on port 11643? If not, wait a few seconds
and retry.
2024-10-09 14:53:10.215 UTC [43270][postmaster] WARNING:  could not
create listen socket for "127.0.0.1"
2024-10-09 14:53:10.218 UTC [43270][postmaster] FATAL:  could not
create any TCP/IP sockets
2024-10-09 14:53:10.218 UTC [43270][postmaster] LOG:  database system
is shut down



https://api.cirrus-ci.com/v1/artifact/task/5815925960605696/testrun/build/testrun/ssl/001_ssltests/log/001_ssltests_primary.log

> >
> > >
> > > --
> > > Thanks, Amit Langote
> >
> >
> >
> > --
> > Regards
> > Junwang Zhao
>
>
>
> --
> Regards
> Junwang Zhao



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
jian he
Date:
tricky case:
should we allow array element type to be composite/domain?
currently seems to work fine.


create table t(b int[]);
insert into t values ('{{1,3}}'), ('{{1,2}}');

 select array_sort((select array_agg(t) from t), 'desc');
            array_sort
-----------------------------------
 {"(\"{{1,3}}\")","(\"{{1,2}}\")"}


select array_sort((t.b)) from t;
ERROR:  multidimensional arrays sorting are not supported


select array_sort((select array_agg(t.b) from t));
ERROR:  multidimensional arrays sorting are not supported



Re: general purpose array_sort

From
Junwang Zhao
Date:
Hi Jian,

On Fri, Oct 11, 2024 at 1:12 PM jian he <jian.universality@gmail.com> wrote:
>
> tricky case:
> should we allow array element type to be composite/domain?
> currently seems to work fine.
>
>
> create table t(b int[]);
> insert into t values ('{{1,3}}'), ('{{1,2}}');
>
>  select array_sort((select array_agg(t) from t), 'desc');
>             array_sort
> -----------------------------------
>  {"(\"{{1,3}}\")","(\"{{1,2}}\")"}
>
>
> select array_sort((t.b)) from t;
> ERROR:  multidimensional arrays sorting are not supported
>
>
> select array_sort((select array_agg(t.b) from t));
> ERROR:  multidimensional arrays sorting are not supported

I tried the above cases, and the first one works because it's
a one dim array of composite type, the other two fails because
they are multidimensional.

It seems there is not much meaning to sort composite type,
so are you proposing we should error on that?

--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Tom Lane
Date:
Junwang Zhao <zhjwpku@gmail.com> writes:
> It seems there is not much meaning to sort composite type,
> so are you proposing we should error on that?

It's hardly "general purpose" if it randomly refuses to
sort certain types.  I would say it should be able to sort
anything that ORDER BY will handle --- and that certainly
includes the cases shown here.

            regards, tom lane



Re: general purpose array_sort

From
jian he
Date:
On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> PFA v7 with multi-array support.
>

if (ARR_NDIM(array) == 1)
{
}
else
{
}
can be simplified.
i think beginning part of array_sort can be like the following:
(newline emitted)

---------------------------------------------------------------------
    if (ARR_NDIM(array) < 1)
        PG_RETURN_ARRAYTYPE_P(array);
    if (dirstr != NULL)
    {
        if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("second parameter must be a valid sort
direction")));
    }
    elmtyp = ARR_ELEMTYPE(array);
    if (ARR_NDIM(array) > 1)
        elmtyp = get_array_type(elmtyp);
    typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
    if (typentry == NULL || typentry->type_id != elmtyp)
    {
        typentry = lookup_type_cache(elmtyp, sort_asc ?
TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
        if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
            (!sort_asc && !OidIsValid(typentry->gt_opr)))
            ereport(ERROR,
                    (errcode(ERRCODE_UNDEFINED_FUNCTION),
                    errmsg("could not identify an ordering operator
for type %s",
                            format_type_be(elmtyp))));
        fcinfo->flinfo->fn_extra = (void *) typentry;
    }
---------------------------------------------------------------------
/*
 * array_sort
 *
 * Sorts the array in either ascending or descending order.
 * The array must be empty or one-dimensional.
 */
comments need to be updated.


typedef enum
    PARSE_SORT_ORDER_DONE
} ParseSortOrderState;

last one, should have comma, like
"PARSE_SORT_ORDER_DONE, "



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Thu, Oct 24, 2024 at 8:40 PM jian he <jian.universality@gmail.com> wrote:
>
> On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > PFA v7 with multi-array support.
> >
>
> if (ARR_NDIM(array) == 1)
> {
> }
> else
> {
> }
> can be simplified.
> i think beginning part of array_sort can be like the following:
> (newline emitted)
>
> ---------------------------------------------------------------------
>     if (ARR_NDIM(array) < 1)
>         PG_RETURN_ARRAYTYPE_P(array);
>     if (dirstr != NULL)
>     {
>         if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first))
>             ereport(ERROR,
>                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                      errmsg("second parameter must be a valid sort
> direction")));
>     }
>     elmtyp = ARR_ELEMTYPE(array);
>     if (ARR_NDIM(array) > 1)
>         elmtyp = get_array_type(elmtyp);

I'm wondering should we cache the type entry for both element type and
the corresponding array type, for example if we have a table:

create table t(b int[]);
insert into t values ('{1,3}'),('{{2,3}}'),('{{1,2},{0,2}}');

with 1-d array and m-d array interleaved, then the following query will
call lookup_type_cache multiple times.

select array_sort((t.b)) from t;

>     typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
>     if (typentry == NULL || typentry->type_id != elmtyp)
>     {
>         typentry = lookup_type_cache(elmtyp, sort_asc ?
> TYPECACHE_LT_OPR : TYPECACHE_GT_OPR);
>         if ((sort_asc && !OidIsValid(typentry->lt_opr)) ||
>             (!sort_asc && !OidIsValid(typentry->gt_opr)))
>             ereport(ERROR,
>                     (errcode(ERRCODE_UNDEFINED_FUNCTION),
>                     errmsg("could not identify an ordering operator
> for type %s",
>                             format_type_be(elmtyp))));
>         fcinfo->flinfo->fn_extra = (void *) typentry;
>     }
> ---------------------------------------------------------------------
> /*
>  * array_sort
>  *
>  * Sorts the array in either ascending or descending order.
>  * The array must be empty or one-dimensional.
>  */
> comments need to be updated.

will fix it in the next version of patch.

>
>
> typedef enum
>     PARSE_SORT_ORDER_DONE
> } ParseSortOrderState;
>
> last one, should have comma, like
> "PARSE_SORT_ORDER_DONE, "

will fix it.


--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

> It's hardly "general purpose" if it randomly refuses to
> sort certain types.  I would say it should be able to sort
> anything that ORDER BY will handle --- and that certainly
> includes the cases shown here.

I wonder how useful / convenient the new function will be considering
that we already have CTEs and can do:

SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x;

Perhaps there are use cases I didn't consider?

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Thu, Oct 24, 2024 at 7:58 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi,

> It's hardly "general purpose" if it randomly refuses to
> sort certain types.  I would say it should be able to sort
> anything that ORDER BY will handle --- and that certainly
> includes the cases shown here.

I wonder how useful / convenient the new function will be considering
that we already have CTEs and can do:

SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x;

Perhaps there are use cases I didn't consider?


Succinctness of expression.  Plus I'm under the impression that a function doing this is going to be somewhat faster than composing two functions together within a multi-node subtree.

I feel like the same observation could have been made for array_shuffle but we added that.  This function being added feels to me like just completing the set.

David J.

Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi David,

>> > It's hardly "general purpose" if it randomly refuses to
>> > sort certain types.  I would say it should be able to sort
>> > anything that ORDER BY will handle --- and that certainly
>> > includes the cases shown here.
>>
>> I wonder how useful / convenient the new function will be considering
>> that we already have CTEs and can do:
>>
>> SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x;
>>
>> Perhaps there are use cases I didn't consider?
>>
>
> Succinctness of expression.  Plus I'm under the impression that a function doing this is going to be somewhat faster
thancomposing two functions together within a multi-node subtree.
 
>
> I feel like the same observation could have been made for array_shuffle but we added that.  This function being added
feelsto me like just completing the set.
 

Right. To clarify, I'm not opposed to array_sort(). In fact personally
I would prefer using it instead of array_agg() + unnest().

Just making an observation / thinking out loud that the requirement to
support everything ORDER BY handles / supports (and will handle /
support?) might make this function impractical to maintain.
array_shuffle() or a recently proposed array_reverse() [1] are rather
simple since they just move the array elements without looking at
them. array_sort() does look at the elements and thus is very
different.

Particularly, does the function really need to support dir => asc |
desc | asc nulls first | etc... ? Maybe array_sort() + array_reverse(
array_sort() ) will handle most of the practical cases? I don't know.

[1]: https://commitfest.postgresql.org/50/5314/
--
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Thu, Oct 24, 2024 at 8:27 AM Aleksander Alekseev <aleksander@timescale.com> wrote:

Just making an observation / thinking out loud that the requirement to
support everything ORDER BY handles / supports (and will handle /
support?) might make this function impractical to maintain.

Particularly, does the function really need to support dir => asc |
desc | asc nulls first | etc... ? Maybe array_sort() + array_reverse(
array_sort() ) will handle most of the practical cases? I don't know.

[1]: https://commitfest.postgresql.org/50/5314/


Composing function calls here seems quite undesirable from a performance standpoint, but maybe I over-estimate the cost of exploding-manipulating-freezing an array datum.  Also, while I'm not in a good position to judge the challenge of implementing the sort parameters I would rather have them than not since the order by API has them (plus performance).  I also, maybe unreasonably, believe that our extensible type system has already limited the maintenance burden.

David J.

Re: general purpose array_sort

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Composing function calls here seems quite undesirable from a performance
> standpoint, but maybe I over-estimate the cost of
> exploding-manipulating-freezing an array datum.  Also, while I'm not in a
> good position to judge the challenge of implementing the sort parameters I
> would rather have them than not since the order by API has them (plus
> performance).  I also, maybe unreasonably, believe that our extensible type
> system has already limited the maintenance burden.

Oh!  I had not actually looked at the details of what was being
proposed here.  I imagined "array_sort(anyarray)" and the sort would
use the default sort order for the array's element type.  This
business with a textual representation of a sort clause seems like
over-engineering ... except that it's also under-engineered, because
the parsing is lame and incomplete.  (No USING option, and the fact
that collation comes from somewhere else seems impossibly confusing.)
Let's drop that.  As already noted, if you need a non-default sort
order you can build what you want with a sub-select.  The ambition
here should be to make easy cases easy.

            regards, tom lane



Re: general purpose array_sort

From
"David G. Johnston"
Date:
On Thu, Oct 24, 2024 at 9:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This business with a textual representation of a sort clause seems like
over-engineering ... except that it's also under-engineered, because
the parsing is lame and incomplete.  (No USING option, and the fact
that collation comes from somewhere else seems impossibly confusing.)
Let's drop that.

I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd argue is worth keeping.  null value placement too I guess, three-valued boolean (nulls_first).

David J.

Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

> I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd
argueis worth keeping.  null value placement too I guess, three-valued boolean (nulls_first).
 

Perhaps these optional arguments deserve separate discussions. I
suggest merging something everyone agrees on first. This will simplify
the review process and allow us to deliver value to the users quickly.
Arguments like `reverse => true` and `nulls_first => true` can always
be implemented and added as separate patches.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
Junwang Zhao
Date:
On Fri, Oct 25, 2024 at 1:19 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> > I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd
argueis worth keeping.  null value placement too I guess, three-valued boolean (nulls_first). 
>
> Perhaps these optional arguments deserve separate discussions. I
> suggest merging something everyone agrees on first. This will simplify
> the review process and allow us to deliver value to the users quickly.
> Arguments like `reverse => true` and `nulls_first => true` can always
> be implemented and added as separate patches.

As this patch uses the tuplesort infrastructure, we need to supply the
sortOperator, sortCollation and nullsFirstFlag, I tend to agree with
David. I admit that the parsing part is not good, so I will remove it
by using two boolean parameters Jian suggested earlier.

Will send out another version by tomorrow.

>
> --
> Best regards,
> Aleksander Alekseev



--
Regards
Junwang Zhao



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

> Based on the previous discussion, I split it into two patches in V8.
>
> 0001 is the general sort part without `is_ascending` or `nulls_first`,
> the sort order is determined by the "<" operator of the element type.
> It also cached the type entry of both eletyp and the corresponding
> array type.
>
> 0002 adds the `is_ascending` and `nulls_first` part, it now uses
> two boolean parameters instead of parsing one text parameter.

Thanks for the update patch set. Here are some comments.

0001:

> +{ oid => '8810', descr => 'sort array',
> +  proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray',
> +  proargtypes => 'anyarray', prosrc => 'array_sort'},

I would expect that array_sort() should be IMMUTABLE. Is there a
reason for it to be VOLATILE?

> +        <function>array_sort</function> ( <type>anyarray</type> <optional> COLLATE
<replaceable>collation_name</replaceable></optional>)
 
> +        <returnvalue>anyarray</returnvalue>

It seems to me that the part about using COLLATE should be moved
below, to the description / examples section, since it's not part of
the function signature.

Also the description should be more specific about how NULLs are
sorted. NULLs also should be covered by tests.

0002:

> <parameter>is_ascending</parameter>

I really believe this name is not the best one. I suggest using
`reverse => true`. `nulls_first` is OK.

> +Datum
> +array_sort_order(PG_FUNCTION_ARGS)
> +{
> +    return array_sort(fcinfo);
> +}
> +
> +Datum
> +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> +{
> +    return array_sort(fcinfo);
> +}

Any reason not to specify array_sort in pg_proc.dat?

The tests cover is_ascending => true | false, which is OK, but only
(is_ascending = true, nulls_first => true) and (is_ascending => false,
nulls_fist => false). For the case when both optional arguments are
specified you have to test at least 4 combinations.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
jian he
Date:
On Tue, Oct 29, 2024 at 12:48 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:.
>
> 0001:
>
> > +{ oid => '8810', descr => 'sort array',
> > +  proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray',
> > +  proargtypes => 'anyarray', prosrc => 'array_sort'},
>
> I would expect that array_sort() should be IMMUTABLE. Is there a
> reason for it to be VOLATILE?
>

https://www.postgresql.org/docs/current/sql-createfunction.html says:

IMMUTABLE indicates that the function cannot modify the database and always
returns the same result when given the same argument values; that is, it does
not do database lookups or otherwise use information not directly present in its
argument list. If this option is given, any call of the function with
all-constant arguments can be immediately replaced with the function value.


+ {
+ typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR);
+ if (!OidIsValid(typentry->lt_opr))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("could not identify ordering operator for type %s",
+ format_type_be(elmtyp))));

This error can happen. I think this conflicts with the doc IMMUTABLE
description.



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi Jian,

> IMMUTABLE indicates that the function cannot modify the database and always
> returns the same result when given the same argument values; that is, it does
> not do database lookups or otherwise use information not directly present in its
> argument list. If this option is given, any call of the function with
> all-constant arguments can be immediately replaced with the function value.
>
>
> + {
> + typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR);
> + if (!OidIsValid(typentry->lt_opr))
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_FUNCTION),
> + errmsg("could not identify ordering operator for type %s",
> + format_type_be(elmtyp))));
>
> This error can happen. I think this conflicts with the doc IMMUTABLE
> description.

lookup_type_cache() is used at least by array_position() which is
marked as IMMUTABLE, so I believe this is fine. Similarly functions
dealing with timezones can return different results between the DBMS
restarts / updates, but we don't care and mark them IMMUTABLE anyway.
Otherwise we couldn't use these functions in functional indexes which
will make them rather useless.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
Aleksander Alekseev
Date:
Hi,

Thanks for the updated patch set.

> > > +Datum
> > > +array_sort_order(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> > > +
> > > +Datum
> > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > +{
> > > +    return array_sort(fcinfo);
> > > +}
> >
> > Any reason not to specify array_sort in pg_proc.dat?
>
> It is specified in 0001 (see oid => '8810').

What I meant was that I don't think these wrapper functions are
needed. I think you can just do:

```
+{ oid => '8811', descr => 'sort array',
+  proname => 'array_sort', prorettype => 'anyarray',
+  proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <--
array_sort is used directly in `prosrc`
```

... unless I'm missing something.

-- 
Best regards,
Aleksander Alekseev



Re: general purpose array_sort

From
Junwang Zhao
Date:
Hi,

On Wed, Oct 30, 2024 at 9:29 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> Thanks for the updated patch set.
>
> > > > +Datum
> > > > +array_sort_order(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > > > +
> > > > +Datum
> > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS)
> > > > +{
> > > > +    return array_sort(fcinfo);
> > > > +}
> > >
> > > Any reason not to specify array_sort in pg_proc.dat?
> >
> > It is specified in 0001 (see oid => '8810').
>
> What I meant was that I don't think these wrapper functions are
> needed. I think you can just do:
>
> ```
> +{ oid => '8811', descr => 'sort array',
> +  proname => 'array_sort', prorettype => 'anyarray',
> +  proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <--
> array_sort is used directly in `prosrc`
> ```
>
> ... unless I'm missing something.

There is a opr sanity check for this[1], if we remove these wrapper functions,
regression test will fail with:

- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  |  proname   | oid  |  proname
+------+------------+------+------------
+ 8811 | array_sort | 8812 | array_sort
+ 8810 | array_sort | 8811 | array_sort
+ 8810 | array_sort | 8812 | array_sort
+(3 rows)


[1]:

-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields). It's OK to
-- have several entries with different pronames for the same internal function,
-- but conflicts in the number of arguments and other critical items should
-- be complained of. (We don't check data types here; see next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.

SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid < p2.oid AND
p1.prosrc = p2.prosrc AND
p1.prolang = 12 AND p2.prolang = 12 AND
(p1.prokind != 'a' OR p2.prokind != 'a') AND
(p1.prolang != p2.prolang OR
p1.prokind != p2.prokind OR
p1.prosecdef != p2.prosecdef OR
p1.proleakproof != p2.proleakproof OR
p1.proisstrict != p2.proisstrict OR
p1.proretset != p2.proretset OR
p1.provolatile != p2.provolatile OR
p1.pronargs != p2.pronargs);

>
> --
> Best regards,
> Aleksander Alekseev



--
Regards
Junwang Zhao