Thread: Re: general purpose array_sort
On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote: > > On Sat, Sep 28, 2024 at 7:52 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > > > PFA v2, use COLLATE keyword to supply the collation suggested by > > Andreas offlist. > > > this is better. otherwise we need extra care to handle case like: > SELECT array_sort('{1,3,5,2,4,6}'::int[] COLLATE "pg_c_utf8"); > > > + <row> > + <entry role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>array_sort</primary> > + </indexterm> > + <function>array_sort</function> ( <type>anyarray</type> > <optional>, <parameter>dir</parameter> </optional>) > + <returnvalue>anyarray</returnvalue> > + </para> > + <para> > + Sorts the array in either ascending or descending order. > + <parameter>dir</parameter> must be <literal>asc</literal> > + or <literal>desc</literal>. The array must be empty or one-dimensional. > + </para> > + <para> > + <literal>array_sort(ARRAY[1,2,5,6,3,4])</literal> > + <returnvalue>{1,2,3,4,5,6}</returnvalue> > + </para></entry> > + </row> > I am confused with <parameter>dir</parameter>. I guess you want to say > "direction" > But here, I think <parameter>sort_asc</parameter> would be more appropriate? This doc is mostly copied and edited from intarray.sgml sort part. And the logic is basically the same, you can check the intarray module. > > > <parameter>dir</parameter> can have only two potential values, make it > as a boolean would be more easier? > you didn't mention information: "by default, it will sort by > ascending order; the sort collation by default is using the array > element type's collation" > > tuplesort_begin_datum can do null-first, null-last, so the > one-dimension array can allow null values. The following(create extension intarry first) will give an error, I keep the same for array_sort. SELECT sort('{1234234,-30,234234, null}'); > > Based on the above and others, I did some refactoring, feel free to take it. > my changes, changed the function signature, so you need to pay > attention to sql test file. Thanks for your refactor, I will take some in the next version. -- Regards Junwang Zhao
On Sat, Sep 28, 2024 at 7:05 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
On Sat, Sep 28, 2024 at 10:41 PM jian he <jian.universality@gmail.com> wrote:
>
> <parameter>dir</parameter> can have only two potential values, make it
> as a boolean would be more easier?
> you didn't mention information: "by default, it will sort by
> ascending order; the sort collation by default is using the array
> element type's collation"
>
> tuplesort_begin_datum can do null-first, null-last, so the
> one-dimension array can allow null values.
The following(create extension intarry first) will give an error, I
keep the same for array_sort.
SELECT sort('{1234234,-30,234234, null}');
I would suggest accepting:
asc
desc
asc nulls first
asc nulls last *
desc nulls first *
desc nulls last
As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted.
In short, mimic create index.
David J.
On Mon, Sep 30, 2024 at 1:01 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > > I would suggest accepting: > > asc > > desc > > asc nulls first > > asc nulls last * > > desc nulls first * > > desc nulls last > > > > As valid inputs for "dir" - and that the starred options are the defaults when null position is omitted. > > > > In short, mimic create index. > > > > David J. > > > > PFA v3 with David's suggestion addressed. > I think just adding 2 bool arguments (asc/desc, nulls last/not nulls last) would be easier. but either way, (i don't have a huge opinion) but document the second argument, imagine case SELECT array_sort('{a,B}'::text[] , E'aSc NulLs LaST \t\r\n'); would be tricky? errmsg("multidimensional arrays sorting are not supported"))); write a sql test to trigger the error message that would be great. you can add two or one example to collate.icu.utf8.sql to demo that it actually works with COLLATE collation_name like: SELECT array_sort('{a,B}'::text[] COLLATE case_insensitive); SELECT array_sort('{a,B}'::text[] COLLATE "C"); #define WHITESPACE " \t\n\r" you may also check function scanner_isspace + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; + if (typentry == NULL || typentry->type_id != elmtyp) + { + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : TYPECACHE_GT_OPR); + fcinfo->flinfo->fn_extra = (void *) typentry; + } you need to one-time check typentry->lt_opr or typentry->gt_opr exists? see CreateStatistics. /* Disallow data types without a less-than operator */ type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR); if (type->lt_opr == InvalidOid) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class", attname, format_type_be(attForm->atttypid))));
> > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; > > + if (typentry == NULL || typentry->type_id != elmtyp) > > + { > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : > > TYPECACHE_GT_OPR); > > + fcinfo->flinfo->fn_extra = (void *) typentry; > > + } > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists? > > see CreateStatistics. > > /* Disallow data types without a less-than operator */ > > type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR); > > if (type->lt_opr == InvalidOid) > > ereport(ERROR, > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > errmsg("column \"%s\" cannot be used in > > statistics because its type %s has no default btree operator class", > > attname, format_type_be(attForm->atttypid)))); > > I added an Assert for this part, not sure if that is enough. > i think it really should be: if (typentry == NULL || typentry->type_id != elmtyp) { typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : TYPECACHE_GT_OPR); fcinfo->flinfo->fn_extra = (void *) typentry; if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as && OidIsValid(typentry->gt_opr)); ereport(ERROR,....) } Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR then we cannot do the sort, we should just error out. I just tried this colour type [1] with (CREATE TYPE colour (INPUT = colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4); select array_sort('{#FF0000, #FF0000}'::colour[]); of course it will segfault with your new Assert. [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c
Hi Junwang, On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote: > > > > > > > > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; > > > > + if (typentry == NULL || typentry->type_id != elmtyp) > > > > + { > > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : > > > > TYPECACHE_GT_OPR); > > > > + fcinfo->flinfo->fn_extra = (void *) typentry; > > > > + } > > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists? > > > > see CreateStatistics. > > > > /* Disallow data types without a less-than operator */ > > > > type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR); > > > > if (type->lt_opr == InvalidOid) > > > > ereport(ERROR, > > > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > > > errmsg("column \"%s\" cannot be used in > > > > statistics because its type %s has no default btree operator class", > > > > attname, format_type_be(attForm->atttypid)))); > > > > > > I added an Assert for this part, not sure if that is enough. > > > > > > > i think it really should be: > > > > if (typentry == NULL || typentry->type_id != elmtyp) > > { > > typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : > > TYPECACHE_GT_OPR); > > fcinfo->flinfo->fn_extra = (void *) typentry; > > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as && > > OidIsValid(typentry->gt_opr)); > > ereport(ERROR,....) > > } > > > > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR > > then we cannot do the sort, we should just error out. > > > > I just tried this colour type [1] with (CREATE TYPE colour (INPUT = > > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4); > > > > select array_sort('{#FF0000, #FF0000}'::colour[]); > > of course it will segfault with your new Assert. > > > > > > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c > > Make sense, PFA v5 with Jian's suggestion. Have you noticed that the tests have failed on Cirrus CI runs of this patch? https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277 It might be related to the test machines having a different *default* locale than your local environment, which could result in a different sort order for the test data. You may need to add an explicit COLLATE clause to the tests to ensure consistent sorting across systems. -- Thanks, Amit Langote
On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > Hi Amit, > > On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > Hi Junwang, > > > > On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > > On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote: > > > > > > > > > > > > > > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; > > > > > > + if (typentry == NULL || typentry->type_id != elmtyp) > > > > > > + { > > > > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : > > > > > > TYPECACHE_GT_OPR); > > > > > > + fcinfo->flinfo->fn_extra = (void *) typentry; > > > > > > + } > > > > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists? > > > > > > see CreateStatistics. > > > > > > /* Disallow data types without a less-than operator */ > > > > > > type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR); > > > > > > if (type->lt_opr == InvalidOid) > > > > > > ereport(ERROR, > > > > > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > > > > > errmsg("column \"%s\" cannot be used in > > > > > > statistics because its type %s has no default btree operator class", > > > > > > attname, format_type_be(attForm->atttypid)))); > > > > > > > > > > I added an Assert for this part, not sure if that is enough. > > > > > > > > > > > > > i think it really should be: > > > > > > > > if (typentry == NULL || typentry->type_id != elmtyp) > > > > { > > > > typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : > > > > TYPECACHE_GT_OPR); > > > > fcinfo->flinfo->fn_extra = (void *) typentry; > > > > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as && > > > > OidIsValid(typentry->gt_opr)); > > > > ereport(ERROR,....) > > > > } > > > > > > > > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR > > > > then we cannot do the sort, we should just error out. > > > > > > > > I just tried this colour type [1] with (CREATE TYPE colour (INPUT = > > > > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4); > > > > > > > > select array_sort('{#FF0000, #FF0000}'::colour[]); > > > > of course it will segfault with your new Assert. > > > > > > > > > > > > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c > > > > > > Make sense, PFA v5 with Jian's suggestion. > > > > Have you noticed that the tests have failed on Cirrus CI runs of this patch? > > > > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277 > > Sorry for the late reply due to my vacation. I should have paid > more attention to Cirrus CI earlier ;) > > > > > It might be related to the test machines having a different *default* > > locale than your local environment, which could result in a different > > sort order for the test data. You may need to add an explicit COLLATE > > clause to the tests to ensure consistent sorting across systems. > > I've changed the tests to use just ASCII characters, then added > *COLLATE "C"* to the tests and CI passed, PFA v6. Sadly the CI only passed on my own github repo, it failed on cfbot[1], will dig into the reason later because I can not open the cirrus ci page right now ;( [1] https://cirrus-ci.com/task/5815925960605696 > > > > > -- > > Thanks, Amit Langote > > > > -- > Regards > Junwang Zhao -- Regards Junwang Zhao
On Wed, Oct 9, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > On Wed, Oct 9, 2024 at 10:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > > > Hi Amit, > > > > On Thu, Oct 3, 2024 at 2:22 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > > > Hi Junwang, > > > > > > On Wed, Oct 2, 2024 at 11:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > > > On Wed, Oct 2, 2024 at 9:51 AM jian he <jian.universality@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; > > > > > > > + if (typentry == NULL || typentry->type_id != elmtyp) > > > > > > > + { > > > > > > > + typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : > > > > > > > TYPECACHE_GT_OPR); > > > > > > > + fcinfo->flinfo->fn_extra = (void *) typentry; > > > > > > > + } > > > > > > > you need to one-time check typentry->lt_opr or typentry->gt_opr exists? > > > > > > > see CreateStatistics. > > > > > > > /* Disallow data types without a less-than operator */ > > > > > > > type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR); > > > > > > > if (type->lt_opr == InvalidOid) > > > > > > > ereport(ERROR, > > > > > > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > > > > > > errmsg("column \"%s\" cannot be used in > > > > > > > statistics because its type %s has no default btree operator class", > > > > > > > attname, format_type_be(attForm->atttypid)))); > > > > > > > > > > > > I added an Assert for this part, not sure if that is enough. > > > > > > > > > > > > > > > > i think it really should be: > > > > > > > > > > if (typentry == NULL || typentry->type_id != elmtyp) > > > > > { > > > > > typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : > > > > > TYPECACHE_GT_OPR); > > > > > fcinfo->flinfo->fn_extra = (void *) typentry; > > > > > if ((sort_asc && !OidIsValid(typentry->lt_opr) || (!sort_as && > > > > > OidIsValid(typentry->gt_opr)); > > > > > ereport(ERROR,....) > > > > > } > > > > > > > > > > Imagine a type that doesn't have TYPECACHE_LT_OPR or TYPECACHE_GT_OPR > > > > > then we cannot do the sort, we should just error out. > > > > > > > > > > I just tried this colour type [1] with (CREATE TYPE colour (INPUT = > > > > > colour_in, OUTPUT = colour_out, LIKE = pg_catalog.int4); > > > > > > > > > > select array_sort('{#FF0000, #FF0000}'::colour[]); > > > > > of course it will segfault with your new Assert. > > > > > > > > > > > > > > > [1] https://github.com/hlinnaka/colour-datatype/blob/master/colour.c > > > > > > > > Make sense, PFA v5 with Jian's suggestion. > > > > > > Have you noticed that the tests have failed on Cirrus CI runs of this patch? > > > > > > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5277 > > > > Sorry for the late reply due to my vacation. I should have paid > > more attention to Cirrus CI earlier ;) > > > > > > > > It might be related to the test machines having a different *default* > > > locale than your local environment, which could result in a different > > > sort order for the test data. You may need to add an explicit COLLATE > > > clause to the tests to ensure consistent sorting across systems. > > > > I've changed the tests to use just ASCII characters, then added > > *COLLATE "C"* to the tests and CI passed, PFA v6. > > Sadly the CI only passed on my own github repo, it failed on > cfbot[1], will dig into the reason later because I can not open the cirrus > ci page right now ;( > > [1] https://cirrus-ci.com/task/5815925960605696 > Seems the failure is not related to this patch, I guess the reason for this is the stop phase doesn't release the port properly? 2024-10-09 14:53:10.079 UTC [43052][checkpointer] LOG: checkpoint complete: wrote 5617 buffers (34.3%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 3 recycled; write=0.107 s, sync=0.001 s, total=0.107 s; sync files=0, longest=0.000 s, average=0.000 s; distance=45239 kB, estimate=45239 kB; lsn=0/414A138, redo lsn=0/414A138 2024-10-09 14:53:10.084 UTC [43050][postmaster] LOG: database system is shut down 2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG: starting PostgreSQL 18devel on x86_64-freebsd, compiled by clang-17.0.6, 64-bit 2024-10-09 14:53:10.215 UTC [43270][postmaster] LOG: could not bind IPv4 address "127.0.0.1": Address already in use 2024-10-09 14:53:10.215 UTC [43270][postmaster] HINT: Is another postmaster already running on port 11643? If not, wait a few seconds and retry. 2024-10-09 14:53:10.215 UTC [43270][postmaster] WARNING: could not create listen socket for "127.0.0.1" 2024-10-09 14:53:10.218 UTC [43270][postmaster] FATAL: could not create any TCP/IP sockets 2024-10-09 14:53:10.218 UTC [43270][postmaster] LOG: database system is shut down https://api.cirrus-ci.com/v1/artifact/task/5815925960605696/testrun/build/testrun/ssl/001_ssltests/log/001_ssltests_primary.log > > > > > > > > -- > > > Thanks, Amit Langote > > > > > > > > -- > > Regards > > Junwang Zhao > > > > -- > Regards > Junwang Zhao -- Regards Junwang Zhao
tricky case: should we allow array element type to be composite/domain? currently seems to work fine. create table t(b int[]); insert into t values ('{{1,3}}'), ('{{1,2}}'); select array_sort((select array_agg(t) from t), 'desc'); array_sort ----------------------------------- {"(\"{{1,3}}\")","(\"{{1,2}}\")"} select array_sort((t.b)) from t; ERROR: multidimensional arrays sorting are not supported select array_sort((select array_agg(t.b) from t)); ERROR: multidimensional arrays sorting are not supported
Hi Jian, On Fri, Oct 11, 2024 at 1:12 PM jian he <jian.universality@gmail.com> wrote: > > tricky case: > should we allow array element type to be composite/domain? > currently seems to work fine. > > > create table t(b int[]); > insert into t values ('{{1,3}}'), ('{{1,2}}'); > > select array_sort((select array_agg(t) from t), 'desc'); > array_sort > ----------------------------------- > {"(\"{{1,3}}\")","(\"{{1,2}}\")"} > > > select array_sort((t.b)) from t; > ERROR: multidimensional arrays sorting are not supported > > > select array_sort((select array_agg(t.b) from t)); > ERROR: multidimensional arrays sorting are not supported I tried the above cases, and the first one works because it's a one dim array of composite type, the other two fails because they are multidimensional. It seems there is not much meaning to sort composite type, so are you proposing we should error on that? -- Regards Junwang Zhao
Junwang Zhao <zhjwpku@gmail.com> writes: > It seems there is not much meaning to sort composite type, > so are you proposing we should error on that? It's hardly "general purpose" if it randomly refuses to sort certain types. I would say it should be able to sort anything that ORDER BY will handle --- and that certainly includes the cases shown here. regards, tom lane
On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > PFA v7 with multi-array support. > if (ARR_NDIM(array) == 1) { } else { } can be simplified. i think beginning part of array_sort can be like the following: (newline emitted) --------------------------------------------------------------------- if (ARR_NDIM(array) < 1) PG_RETURN_ARRAYTYPE_P(array); if (dirstr != NULL) { if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("second parameter must be a valid sort direction"))); } elmtyp = ARR_ELEMTYPE(array); if (ARR_NDIM(array) > 1) elmtyp = get_array_type(elmtyp); typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; if (typentry == NULL || typentry->type_id != elmtyp) { typentry = lookup_type_cache(elmtyp, sort_asc ? TYPECACHE_LT_OPR : TYPECACHE_GT_OPR); if ((sort_asc && !OidIsValid(typentry->lt_opr)) || (!sort_asc && !OidIsValid(typentry->gt_opr))) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_FUNCTION), errmsg("could not identify an ordering operator for type %s", format_type_be(elmtyp)))); fcinfo->flinfo->fn_extra = (void *) typentry; } --------------------------------------------------------------------- /* * array_sort * * Sorts the array in either ascending or descending order. * The array must be empty or one-dimensional. */ comments need to be updated. typedef enum PARSE_SORT_ORDER_DONE } ParseSortOrderState; last one, should have comma, like "PARSE_SORT_ORDER_DONE, "
On Thu, Oct 24, 2024 at 8:40 PM jian he <jian.universality@gmail.com> wrote: > > On Wed, Oct 23, 2024 at 10:28 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > PFA v7 with multi-array support. > > > > if (ARR_NDIM(array) == 1) > { > } > else > { > } > can be simplified. > i think beginning part of array_sort can be like the following: > (newline emitted) > > --------------------------------------------------------------------- > if (ARR_NDIM(array) < 1) > PG_RETURN_ARRAYTYPE_P(array); > if (dirstr != NULL) > { > if (!parse_sort_order(text_to_cstring(dirstr), &sort_asc, &nulls_first)) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("second parameter must be a valid sort > direction"))); > } > elmtyp = ARR_ELEMTYPE(array); > if (ARR_NDIM(array) > 1) > elmtyp = get_array_type(elmtyp); I'm wondering should we cache the type entry for both element type and the corresponding array type, for example if we have a table: create table t(b int[]); insert into t values ('{1,3}'),('{{2,3}}'),('{{1,2},{0,2}}'); with 1-d array and m-d array interleaved, then the following query will call lookup_type_cache multiple times. select array_sort((t.b)) from t; > typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; > if (typentry == NULL || typentry->type_id != elmtyp) > { > typentry = lookup_type_cache(elmtyp, sort_asc ? > TYPECACHE_LT_OPR : TYPECACHE_GT_OPR); > if ((sort_asc && !OidIsValid(typentry->lt_opr)) || > (!sort_asc && !OidIsValid(typentry->gt_opr))) > ereport(ERROR, > (errcode(ERRCODE_UNDEFINED_FUNCTION), > errmsg("could not identify an ordering operator > for type %s", > format_type_be(elmtyp)))); > fcinfo->flinfo->fn_extra = (void *) typentry; > } > --------------------------------------------------------------------- > /* > * array_sort > * > * Sorts the array in either ascending or descending order. > * The array must be empty or one-dimensional. > */ > comments need to be updated. will fix it in the next version of patch. > > > typedef enum > PARSE_SORT_ORDER_DONE > } ParseSortOrderState; > > last one, should have comma, like > "PARSE_SORT_ORDER_DONE, " will fix it. -- Regards Junwang Zhao
Hi, > It's hardly "general purpose" if it randomly refuses to > sort certain types. I would say it should be able to sort > anything that ORDER BY will handle --- and that certainly > includes the cases shown here. I wonder how useful / convenient the new function will be considering that we already have CTEs and can do: SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x; Perhaps there are use cases I didn't consider? -- Best regards, Aleksander Alekseev
On Thu, Oct 24, 2024 at 7:58 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi,
> It's hardly "general purpose" if it randomly refuses to
> sort certain types. I would say it should be able to sort
> anything that ORDER BY will handle --- and that certainly
> includes the cases shown here.
I wonder how useful / convenient the new function will be considering
that we already have CTEs and can do:
SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x;
Perhaps there are use cases I didn't consider?
Succinctness of expression. Plus I'm under the impression that a function doing this is going to be somewhat faster than composing two functions together within a multi-node subtree.
I feel like the same observation could have been made for array_shuffle but we added that. This function being added feels to me like just completing the set.
David J.
Hi David, >> > It's hardly "general purpose" if it randomly refuses to >> > sort certain types. I would say it should be able to sort >> > anything that ORDER BY will handle --- and that certainly >> > includes the cases shown here. >> >> I wonder how useful / convenient the new function will be considering >> that we already have CTEs and can do: >> >> SELECT array_agg(x ORDER BY x) FROM unnest(ARRAY[5,1,3,2,4]) AS x; >> >> Perhaps there are use cases I didn't consider? >> > > Succinctness of expression. Plus I'm under the impression that a function doing this is going to be somewhat faster thancomposing two functions together within a multi-node subtree. > > I feel like the same observation could have been made for array_shuffle but we added that. This function being added feelsto me like just completing the set. Right. To clarify, I'm not opposed to array_sort(). In fact personally I would prefer using it instead of array_agg() + unnest(). Just making an observation / thinking out loud that the requirement to support everything ORDER BY handles / supports (and will handle / support?) might make this function impractical to maintain. array_shuffle() or a recently proposed array_reverse() [1] are rather simple since they just move the array elements without looking at them. array_sort() does look at the elements and thus is very different. Particularly, does the function really need to support dir => asc | desc | asc nulls first | etc... ? Maybe array_sort() + array_reverse( array_sort() ) will handle most of the practical cases? I don't know. [1]: https://commitfest.postgresql.org/50/5314/ -- Best regards, Aleksander Alekseev
On Thu, Oct 24, 2024 at 8:27 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
Just making an observation / thinking out loud that the requirement to
support everything ORDER BY handles / supports (and will handle /
support?) might make this function impractical to maintain.
Particularly, does the function really need to support dir => asc |
desc | asc nulls first | etc... ? Maybe array_sort() + array_reverse(
array_sort() ) will handle most of the practical cases? I don't know.
[1]: https://commitfest.postgresql.org/50/5314/
Composing function calls here seems quite undesirable from a performance standpoint, but maybe I over-estimate the cost of exploding-manipulating-freezing an array datum. Also, while I'm not in a good position to judge the challenge of implementing the sort parameters I would rather have them than not since the order by API has them (plus performance). I also, maybe unreasonably, believe that our extensible type system has already limited the maintenance burden.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > Composing function calls here seems quite undesirable from a performance > standpoint, but maybe I over-estimate the cost of > exploding-manipulating-freezing an array datum. Also, while I'm not in a > good position to judge the challenge of implementing the sort parameters I > would rather have them than not since the order by API has them (plus > performance). I also, maybe unreasonably, believe that our extensible type > system has already limited the maintenance burden. Oh! I had not actually looked at the details of what was being proposed here. I imagined "array_sort(anyarray)" and the sort would use the default sort order for the array's element type. This business with a textual representation of a sort clause seems like over-engineering ... except that it's also under-engineered, because the parsing is lame and incomplete. (No USING option, and the fact that collation comes from somewhere else seems impossibly confusing.) Let's drop that. As already noted, if you need a non-default sort order you can build what you want with a sub-select. The ambition here should be to make easy cases easy. regards, tom lane
On Thu, Oct 24, 2024 at 9:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This business with a textual representation of a sort clause seems like
over-engineering ... except that it's also under-engineered, because
the parsing is lame and incomplete. (No USING option, and the fact
that collation comes from somewhere else seems impossibly confusing.)
Let's drop that.
I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd argue is worth keeping. null value placement too I guess, three-valued boolean (nulls_first).
David J.
Hi, > I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd argueis worth keeping. null value placement too I guess, three-valued boolean (nulls_first). Perhaps these optional arguments deserve separate discussions. I suggest merging something everyone agrees on first. This will simplify the review process and allow us to deliver value to the users quickly. Arguments like `reverse => true` and `nulls_first => true` can always be implemented and added as separate patches. -- Best regards, Aleksander Alekseev
On Fri, Oct 25, 2024 at 1:19 AM Aleksander Alekseev <aleksander@timescale.com> wrote: > > Hi, > > > I can accept this outcome though an optional three-valued boolean sort order (ascending and descending only) I'd argueis worth keeping. null value placement too I guess, three-valued boolean (nulls_first). > > Perhaps these optional arguments deserve separate discussions. I > suggest merging something everyone agrees on first. This will simplify > the review process and allow us to deliver value to the users quickly. > Arguments like `reverse => true` and `nulls_first => true` can always > be implemented and added as separate patches. As this patch uses the tuplesort infrastructure, we need to supply the sortOperator, sortCollation and nullsFirstFlag, I tend to agree with David. I admit that the parsing part is not good, so I will remove it by using two boolean parameters Jian suggested earlier. Will send out another version by tomorrow. > > -- > Best regards, > Aleksander Alekseev -- Regards Junwang Zhao
Hi, > Based on the previous discussion, I split it into two patches in V8. > > 0001 is the general sort part without `is_ascending` or `nulls_first`, > the sort order is determined by the "<" operator of the element type. > It also cached the type entry of both eletyp and the corresponding > array type. > > 0002 adds the `is_ascending` and `nulls_first` part, it now uses > two boolean parameters instead of parsing one text parameter. Thanks for the update patch set. Here are some comments. 0001: > +{ oid => '8810', descr => 'sort array', > + proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray', > + proargtypes => 'anyarray', prosrc => 'array_sort'}, I would expect that array_sort() should be IMMUTABLE. Is there a reason for it to be VOLATILE? > + <function>array_sort</function> ( <type>anyarray</type> <optional> COLLATE <replaceable>collation_name</replaceable></optional>) > + <returnvalue>anyarray</returnvalue> It seems to me that the part about using COLLATE should be moved below, to the description / examples section, since it's not part of the function signature. Also the description should be more specific about how NULLs are sorted. NULLs also should be covered by tests. 0002: > <parameter>is_ascending</parameter> I really believe this name is not the best one. I suggest using `reverse => true`. `nulls_first` is OK. > +Datum > +array_sort_order(PG_FUNCTION_ARGS) > +{ > + return array_sort(fcinfo); > +} > + > +Datum > +array_sort_order_nulls_first(PG_FUNCTION_ARGS) > +{ > + return array_sort(fcinfo); > +} Any reason not to specify array_sort in pg_proc.dat? The tests cover is_ascending => true | false, which is OK, but only (is_ascending = true, nulls_first => true) and (is_ascending => false, nulls_fist => false). For the case when both optional arguments are specified you have to test at least 4 combinations. -- Best regards, Aleksander Alekseev
On Tue, Oct 29, 2024 at 12:48 AM Aleksander Alekseev <aleksander@timescale.com> wrote:. > > 0001: > > > +{ oid => '8810', descr => 'sort array', > > + proname => 'array_sort', provolatile => 'v', prorettype => 'anyarray', > > + proargtypes => 'anyarray', prosrc => 'array_sort'}, > > I would expect that array_sort() should be IMMUTABLE. Is there a > reason for it to be VOLATILE? > https://www.postgresql.org/docs/current/sql-createfunction.html says: IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value. + { + typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR); + if (!OidIsValid(typentry->lt_opr)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_FUNCTION), + errmsg("could not identify ordering operator for type %s", + format_type_be(elmtyp)))); This error can happen. I think this conflicts with the doc IMMUTABLE description.
Hi Jian, > IMMUTABLE indicates that the function cannot modify the database and always > returns the same result when given the same argument values; that is, it does > not do database lookups or otherwise use information not directly present in its > argument list. If this option is given, any call of the function with > all-constant arguments can be immediately replaced with the function value. > > > + { > + typentry = lookup_type_cache(elmtyp, TYPECACHE_LT_OPR); > + if (!OidIsValid(typentry->lt_opr)) > + ereport(ERROR, > + (errcode(ERRCODE_UNDEFINED_FUNCTION), > + errmsg("could not identify ordering operator for type %s", > + format_type_be(elmtyp)))); > > This error can happen. I think this conflicts with the doc IMMUTABLE > description. lookup_type_cache() is used at least by array_position() which is marked as IMMUTABLE, so I believe this is fine. Similarly functions dealing with timezones can return different results between the DBMS restarts / updates, but we don't care and mark them IMMUTABLE anyway. Otherwise we couldn't use these functions in functional indexes which will make them rather useless. -- Best regards, Aleksander Alekseev
Hi, Thanks for the updated patch set. > > > +Datum > > > +array_sort_order(PG_FUNCTION_ARGS) > > > +{ > > > + return array_sort(fcinfo); > > > +} > > > + > > > +Datum > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS) > > > +{ > > > + return array_sort(fcinfo); > > > +} > > > > Any reason not to specify array_sort in pg_proc.dat? > > It is specified in 0001 (see oid => '8810'). What I meant was that I don't think these wrapper functions are needed. I think you can just do: ``` +{ oid => '8811', descr => 'sort array', + proname => 'array_sort', prorettype => 'anyarray', + proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <-- array_sort is used directly in `prosrc` ``` ... unless I'm missing something. -- Best regards, Aleksander Alekseev
Hi, On Wed, Oct 30, 2024 at 9:29 PM Aleksander Alekseev <aleksander@timescale.com> wrote: > > Hi, > > Thanks for the updated patch set. > > > > > +Datum > > > > +array_sort_order(PG_FUNCTION_ARGS) > > > > +{ > > > > + return array_sort(fcinfo); > > > > +} > > > > + > > > > +Datum > > > > +array_sort_order_nulls_first(PG_FUNCTION_ARGS) > > > > +{ > > > > + return array_sort(fcinfo); > > > > +} > > > > > > Any reason not to specify array_sort in pg_proc.dat? > > > > It is specified in 0001 (see oid => '8810'). > > What I meant was that I don't think these wrapper functions are > needed. I think you can just do: > > ``` > +{ oid => '8811', descr => 'sort array', > + proname => 'array_sort', prorettype => 'anyarray', > + proargtypes => 'anyarray bool', prosrc => 'array_sort'}, <-- > array_sort is used directly in `prosrc` > ``` > > ... unless I'm missing something. There is a opr sanity check for this[1], if we remove these wrapper functions, regression test will fail with: - oid | proname | oid | proname ------+---------+-----+--------- -(0 rows) + oid | proname | oid | proname +------+------------+------+------------ + 8811 | array_sort | 8812 | array_sort + 8810 | array_sort | 8811 | array_sort + 8810 | array_sort | 8812 | array_sort +(3 rows) [1]: -- Considering only built-in procs (prolang = 12), look for multiple uses -- of the same internal function (ie, matching prosrc fields). It's OK to -- have several entries with different pronames for the same internal function, -- but conflicts in the number of arguments and other critical items should -- be complained of. (We don't check data types here; see next query.) -- Note: ignore aggregate functions here, since they all point to the same -- dummy built-in function. SELECT p1.oid, p1.proname, p2.oid, p2.proname FROM pg_proc AS p1, pg_proc AS p2 WHERE p1.oid < p2.oid AND p1.prosrc = p2.prosrc AND p1.prolang = 12 AND p2.prolang = 12 AND (p1.prokind != 'a' OR p2.prokind != 'a') AND (p1.prolang != p2.prolang OR p1.prokind != p2.prokind OR p1.prosecdef != p2.prosecdef OR p1.proleakproof != p2.proleakproof OR p1.proisstrict != p2.proisstrict OR p1.proretset != p2.proretset OR p1.provolatile != p2.provolatile OR p1.pronargs != p2.pronargs); > > -- > Best regards, > Aleksander Alekseev -- Regards Junwang Zhao