Re: Proposal to introduce a shuffle function to intarray extension - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Proposal to introduce a shuffle function to intarray extension
Date
Msg-id CA+hUKG+TPcsR-OmioTdtTHBs9k6dS0fOcgkw4YSdp_=RJhCxoQ@mail.gmail.com
In response to Re: Proposal to introduce a shuffle function to intarray extension  (Martin Kalcher <martin.kalcher@aboutsource.net>)
Responses Re: Proposal to introduce a shuffle function to intarray extension
Re: Proposal to introduce a shuffle function to intarray extension
List pgsql-hackers
On Mon, Jul 18, 2022 at 4:15 AM Martin Kalcher
<martin.kalcher@aboutsource.net> wrote:
> On 17.07.22 at 08:00, Thomas Munro wrote:
> >> Actually ... is there a reason to bother with an intarray version
> >> at all, rather than going straight for an in-core anyarray function?
> >> It's not obvious to me that an int4-only version would have
> >> major performance advantages.
> >
> > Yeah, that seems like a good direction.  If there is a performance
> > advantage to specialising, then perhaps we only have to specialise on
> > size, not type.  Perhaps there could be a general function that
> > internally looks out for typbyval && typlen == 4, and dispatches to a
> > specialised 4-byte, and likewise for 8, if it can, and that'd already
> > be enough to cover int, bigint, float etc, without needing
> > specialisations for each type.
>
> I played around with the idea of an anyarray shuffle(). The hard part
> was to deal with arrays with variable length elements, as they cannot
> be swapped easily in place. I solved it by creating an intermediate
> array of references to the elements. I'll attach a patch with the proof
> of concept. Unfortunately it is already about 5 times slower than the
> specialised version and I am not sure if it is worth going down that road.

Seems OK for a worst case.  It must still be a lot faster than doing
it in SQL.  Now I wonder what the exact requirements would be to
dispatch to a faster version that would handle int4.  I haven't
studied this in detail but perhaps to dispatch to a fast shuffle for
objects of size X, the requirement would be something like typlen == X
&& align_bytes <= typlen && typlen % align_bytes == 0, where
align_bytes is typalign converted to ALIGNOF_{CHAR,SHORT,INT,DOUBLE}?
Or in English, 'the data consists of densely packed objects of fixed
size X, no padding'.  Or perhaps you can work out the padded size and
use that, to catch a few more types.  Then you call
array_shuffle_{2,4,8}() as appropriate, which should be as fast as
your original int[] proposal, but also work for float, date, ...?
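
To make the dispatch idea concrete, here is a rough standalone sketch,
not taken from any patch.  The names array_shuffle_fast() and
DEFINE_ARRAY_SHUFFLE are made up for illustration, align_bytes is
assumed to be already converted from typalign, rand() is only a
stand-in for whatever PRNG the real function would use, and the array
is assumed to contain no NULLs (so the data area really is a dense run
of elements).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Generate a Fisher-Yates shuffle for densely packed elements of W bytes.
 * The constant-width memcpy calls let the compiler emit plain loads/stores.
 * rand() % i is a placeholder PRNG (slightly biased), an assumption here.
 */
#define DEFINE_ARRAY_SHUFFLE(W) \
static void \
array_shuffle_##W(void *data, size_t nelems) \
{ \
	unsigned char *p = data; \
	unsigned char tmp[W]; \
	for (size_t i = nelems; i > 1; i--) \
	{ \
		size_t j = (size_t) rand() % i; \
		if (j == i - 1) \
			continue;			/* element stays in place */ \
		memcpy(tmp, p + j * (W), (W)); \
		memcpy(p + j * (W), p + (i - 1) * (W), (W)); \
		memcpy(p + (i - 1) * (W), tmp, (W)); \
	} \
}

DEFINE_ARRAY_SHUFFLE(2)
DEFINE_ARRAY_SHUFFLE(4)
DEFINE_ARRAY_SHUFFLE(8)

/*
 * Hypothetical dispatcher: returns true if a fast path was used, false if
 * the caller must fall back to the generic (slower) shuffle.
 */
static bool
array_shuffle_fast(void *data, size_t nelems, int typlen, int align_bytes)
{
	assert(data != NULL || nelems == 0);

	/* fixed size, no padding between elements */
	if (typlen <= 0 || align_bytes <= 0 ||
		align_bytes > typlen || typlen % align_bytes != 0)
		return false;

	switch (typlen)
	{
		case 2:
			array_shuffle_2(data, nelems);
			return true;
		case 4:
			array_shuffle_4(data, nelems);
			return true;
		case 8:
			array_shuffle_8(data, nelems);
			return true;
		default:
			return false;
	}
}
```

For an int4 array one would call array_shuffle_fast(data, n, 4, 4); a
false return (say, typlen == -1 for a varlena type) means "use the
generic path".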

About your experimental patch, I haven't reviewed it properly or tried
it, but I wonder if uint32 dat_offset, uint32 size (= half-size
elements) would be enough, given the size limits on varlenas.


