Re: "pivot aggregation" with a patched intarray - Mailing list pgsql-hackers

From Marc Mamin
Subject Re: "pivot aggregation" with a patched intarray
Date
Msg-id B6F6FD62F2624C4C9916AC0175D56D8828A822BB@jenmbs01.ad.intershop.net
In response to Re: "pivot aggregation" with a patched intarray  (Ali Akbar <the.apaan@gmail.com>)
Responses Re: "pivot aggregation" with a patched intarray  (Ali Akbar <the.apaan@gmail.com>)
List pgsql-hackers
> -----Original Message-----
> From: Ali Akbar [mailto:the.apaan@gmail.com]
> Sent: Thursday, 5 June 2014 01:12
> To: Marc Mamin
> Cc: Michael Paquier; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] "pivot aggregation" with a patched intarray
> 
> 2014-06-01 20:48 GMT+07:00 Marc Mamin <M.Mamin@intershop.de>:
> >
> > >On Sat, May 31, 2014 at 12:31 AM, Marc Mamin <M.Mamin@intershop.de>
> wrote:
> > >> I have patched intarray with 3 additional functions in order to
> > >> count [distinct] event IDs into arrays, where the array position
> > >> corresponds to the integer value. (mimic column oriented storage)
> > >
> > >I didn't look at the feature itself, but here are some comments
> about
> > >the format of the patch:
> > >- Be careful the newlines on the file you posted use \r\n, which is
> > >purely Windows stuff... This will generate unnecessary diffs with
> the
> > >source code
> > I don't mean to suggest this directly as a patch; I'm first
> > interested to see if there is some interest for such an aggregation
> > type.
> 
> From what I see, icount_to_array is complementary to the standard
> count() aggregate, but it produces an array. If the values are not
> sparse, I think the performance and memory/storage benefit you
> mentioned will hold. But if the values are sparse, there will be
> many 0's; how will it perform?

I'm thinking about adding a final function to my aggregate that would replace zero values with nulls,
hence transforming the intarray into a standard int[], possibly with a null bitmap and a lower bound that can be > 1.
This will probably degrade the performance considerably, but may reduce the size of the end result for sparse data and
not too small integers...
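
To illustrate the idea, here is a minimal Python sketch, not the patched C code. The name icount_to_array comes from the thread; its exact semantics here (1-based positions, positive integers only) and the helper zeros_to_nulls are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def icount_to_array(values: Iterable[int]) -> List[int]:
    """Count occurrences so that position v (1-based) holds the count of value v.

    Assumed semantics for illustration; the real aggregate lives in the patch.
    """
    counts: List[int] = []
    for v in values:
        if v < 1:
            raise ValueError("this sketch only handles positive integers")
        if v > len(counts):
            # Grow the array with zeros up to the new largest value.
            counts.extend([0] * (v - len(counts)))
        counts[v - 1] += 1
    return counts


def zeros_to_nulls(counts: List[int]) -> Tuple[int, List[Optional[int]]]:
    """Hypothetical final function: strip leading zeros and null out the rest.

    Returns (lower_bound, elements), where lower_bound is the 1-based index
    of the first non-zero count, mimicking an int[] whose lower bound is > 1.
    """
    first = next((i for i, c in enumerate(counts) if c != 0), None)
    if first is None:
        return 1, []
    return first + 1, [c if c != 0 else None for c in counts[first:]]


counts = icount_to_array([5, 7, 5, 9])
print(counts)                  # [0, 0, 0, 0, 2, 0, 1, 0, 1]
print(zeros_to_nulls(counts))  # (5, [2, None, 1, None, 1])
```

With sparse input, the leading zeros disappear into the lower bound and the remaining gaps become nulls, which is where the size reduction would come from.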
 


> 
> I'm interested to benchmark it with some use cases, to confirm the
> performance benefits of it.


Performance should depend heavily on the data distribution and order, as they influence the number of palloc calls.
My first tests showed both better and poorer results.
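
As a rough illustration of the ordering effect, the Python sketch below counts how often a counts array would have to be enlarged. It assumes the array is grown to exactly fit the largest value seen so far; the actual patch may use a different growth strategy, and count_reallocations is a hypothetical helper, not part of intarray.

```python
from typing import Iterable


def count_reallocations(values: Iterable[int]) -> int:
    """Count how many times the array must grow to hold the largest value seen.

    Assumption: the array is enlarged to exactly the new maximum each time,
    so every new maximum costs one (re)allocation.
    """
    size = 0
    reallocs = 0
    for v in values:
        if v > size:
            size = v
            reallocs += 1
    return reallocs


ascending = list(range(1, 1001))
descending = ascending[::-1]
print(count_reallocations(ascending))   # 1000
print(count_reallocations(descending))  # 1
```

Under this assumption, the same set of values costs one allocation when the largest value arrives first, and one per row when the values arrive in ascending order.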

My target is not to get better performance in the first place, but to get a pivot structure at an early aggregation
stage.


regards,

Marc Mamin


