Re: Sort functions with specialized comparators - Mailing list pgsql-hackers

From John Naylor
Subject Re: Sort functions with specialized comparators
Date
Msg-id CANWCAZa+O3QiNKW1YYTOusHL-vEFcSz2VH+-qPv1EfvCpc1b9w@mail.gmail.com
Whole thread Raw
In response to Re: Sort functions with specialized comparators  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Responses Re: Sort functions with specialized comparators
List pgsql-hackers
On Mon, Jan 6, 2025 at 10:51 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> > On 6 Jan 2025, at 15:54, John Naylor <johncnaylorls@gmail.com> wrote:

> > argument. Like some other patches in this series, this does have the
> > side effect of removing the ability to skip quinique(), so that should
> > be benchmarked (I can do that this week unless you beat me to it).
>
> With the same setup as in the first message of this thread we can do:
>
> postgres=# SELECT _int_contains(arr,ARRAY[1]) FROM arrays_to_sort;
>
> before patch patch
> Time: 567.928 ms
> after patch
> Time: 890.297 ms
> timing of this function is dominated by PREPAREARR(a);
>
> What bothers me is that PREPAREARR(a); is returning new array in case of empty input. That's why I considered little
refactoringof resize_intArrayType(): reorder cases so that if (num == ARRNELEMS(a)) was first. 

Hmm, I'm confused. First, none of the arrays are empty that I can see
-- am I missing something?

Then, the first message setup is

CREATE TABLE arrays_to_sort AS
   SELECT array_shuffle(a) arr
   FROM
       (SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
       generate_series(1, 10);

...so most of the time is in sorting the big array, and I don't see a
regression, the opposite in fact:

SELECT _int_contains(arr,ARRAY[1]) FROM arrays_to_sort;

master:
1492.552 ms

v9:
873.697 ms

The case I was concerned about was if the big array was already sorted
and unique. Then, it's conceivable that unnecessarily running qunique
would make things slower, but I don't see that either:

--ordered
CREATE TABLE arrays_sorted AS
   SELECT a arr
   FROM
       (SELECT ARRAY(SELECT generate_series(1, 1000000)) a),
       generate_series(1, 10);

SELECT _int_contains(arr,ARRAY[1]) FROM arrays_sorted;

master:
31.388

v9:
28.247

--
John Naylor
Amazon Web Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: allow changing autovacuum_max_workers without restarting
Next
From: Nisha Moond
Date:
Subject: Re: Conflict detection for update_deleted in logical replication