Re: Review Report: propose to include 3 new functions into intarray and intagg - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Review Report: propose to include 3 new functions into intarray and intagg
Date
Msg-id 48CE4C6B.6050908@enterprisedb.com
Whole thread Raw
In response to Re: Review Report: propose to include 3 new functions into intarray and intagg  (Markus Wanner <markus@bluegap.ch>)
Responses Re: Review Report: propose to include 3 new functions into intarray and intagg  (Markus Wanner <markus@bluegap.ch>)
List pgsql-hackers
Markus Wanner wrote:
> Dmitry Koterov wrote:
>> But, what about intarray patch? Does somebody plan to review it? I'd 
>> prefer to include it too. If you approve, I'll correct the code style 
>> in intarray contrib patch too.
> 
> I've already volunteered for reviewing it as well. I just felt like 
> splitting things up...

Looking at the intarray patch:

I find it a bit unfriendly to have a function that depends on sorted 
input, but doesn't check it. But that's probably not a good enough 
reason to reject an otherwise simple and useful function. Also, we 
already have uniq, which doesn't strictly speaking require sorted input, 
but it does if you want to eliminate all duplicates from the array.

_int_group_count_sort seems a bit special purpose. Why does it bother to 
sort the output? That's wasted time if you don't need sorted output, or 
if you want the array sorted by the integer value instead of frequency. 
If you want sorted output, you can just sort it afterwards.

Also, it's requiring sorted input for a small performance gain, but 
there's a lot more precedence in the existing intarray functions to not 
require sorted input, but to sort the input instead (union, intersect, 
same, overlap).

I realize that the current implementation is faster for the use case 
where the input is sorted, and output needs to be sorted, but if we go 
down that path we'll soon have dozens of different variants of various 
functions, with different ordering requirements of inputs and outputs.

So, I'd suggest changing _int_group_count_sort so that it doesn't 
require sorted input, and doesn't sort the output. The binary search 
function looks good to me (I think I'd prefer naming it bsearch(), 
though, though I can see that it was named bidx in reference to the 
existing idx function). Also, as Markus pointed out, the SGML docs need 
to be updated.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Transaction Snapshots and Hot Standby
Next
From: Andrew Dunstan
Date:
Subject: Re: no XLOG during COPY?