
Re: "pivot aggregation" with a patched intarray - Mailing list pgsql-hackers

From	Marc Mamin
Subject	Re: "pivot aggregation" with a patched intarray
Date	June 5, 2014 13:18:38
Msg-id	B6F6FD62F2624C4C9916AC0175D56D8828A822BB@jenmbs01.ad.intershop.net
In response to	Re: "pivot aggregation" with a patched intarray (Ali Akbar <the.apaan@gmail.com>)
Responses	Re: "pivot aggregation" with a patched intarray
List	pgsql-hackers


> -----Original Message-----
> From: Ali Akbar [mailto:the.apaan@gmail.com]
> Sent: Thursday, June 5, 2014 01:12
> To: Marc Mamin
> Cc: Michael Paquier; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] "pivot aggregation" with a patched intarray
> 
> 2014-06-01 20:48 GMT+07:00 Marc Mamin <M.Mamin@intershop.de>:
> >
> > >On Sat, May 31, 2014 at 12:31 AM, Marc Mamin <M.Mamin@intershop.de> wrote:
> > >> I have patched intarray with 3 additional functions in order to
> > >> count[distinct] event IDs into arrays, whereas the array position
> > >> correspond to the integer values. (mimic column oriented storage)
> > >
> > >I didn't look at the feature itself, but here are some comments about
> > >the format of the patch:
> > >- Be careful the newlines on the file you posted use \r\n, which is
> > >purely Windows stuff... This will generate unnecessary diffs with the
> > >source code
> > I don't mean to suggests this directly as a patch, I'm first
> > interested to see if there are some interest for such an aggregation
> > type.
> 
> From what i see, the icount_to_array is complementary to standard
> count() aggregates, but it produces array. If the values are not
> sparse, i think the performance and memory/storage benefit you
> mentioned will be true. But if the values are sparse, there will be
> many 0's, how it will perform?

I'm thinking about adding a final function to my aggregate that would replace zero values with nulls,
hence transforming the intarray into a standard int[], possibly with a null bitmap and a lower bound that can be > 1.
This will probably degrade performance considerably, but may reduce the size of the end result for sparse data and
not too small integers...
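
To make the idea concrete, here is a minimal Python sketch of the two steps (it is not the patched C code). The names count_to_array and zeros_to_nulls are made up for illustration, and the sketch assumes positive integers only and a 1-based lower bound as in PostgreSQL arrays:

```python
from typing import Iterable, List, Optional, Tuple


def count_to_array(values: Iterable[int]) -> List[int]:
    """Count occurrences so that slot v-1 holds the count for value v."""
    counts: List[int] = []
    for v in values:
        if v < 1:
            raise ValueError("this sketch only handles positive integers")
        if v > len(counts):
            # Grow the array with zeros up to the new largest value.
            counts.extend([0] * (v - len(counts)))
        counts[v - 1] += 1
    return counts


def zeros_to_nulls(counts: List[int]) -> Tuple[int, List[Optional[int]]]:
    """Hypothetical final step: drop leading/trailing zeros and turn the
    remaining zeros into None. Returns (1-based lower bound, elements)."""
    nonzero = [i for i, c in enumerate(counts) if c != 0]
    if not nonzero:
        return 1, []
    first, last = nonzero[0], nonzero[-1]
    trimmed = [c if c != 0 else None for c in counts[first:last + 1]]
    return first + 1, trimmed
```

For the input values 5, 7, 5, 9 the dense array has nine slots, while the trimmed form has lower bound 5 and five elements, which in PostgreSQL array notation would read [5:9]={2,NULL,1,NULL,1}.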

> 
> I'm interested to benchmark it with some use cases, to confirm the
> performance benefits of it.

Performance should depend greatly on the data distribution and order, as they influence the number of palloc calls.
My first tests showed both better and worse results.
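
As a rough illustration of why the order matters, the following Python sketch counts how often the array would have to be enlarged. It assumes, purely for illustration, that the array is grown exactly to the largest value seen so far; the actual growth strategy may differ, and count_growths is a made-up name:

```python
from typing import Iterable


def count_growths(values: Iterable[int]) -> int:
    """Count enlargements, assuming the array is resized to exactly the
    largest value seen so far (an assumption of this sketch)."""
    size = 0
    growths = 0
    for v in values:
        if v > size:
            size = v
            growths += 1
    return growths
```

Under that assumption, the values 1..1000 in ascending order need 1000 enlargements, while the same values in descending order need only one.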

My goal is not better performance in the first place, but to get a pivot structure at an early aggregation
stage.

regards,

Marc Mamin

Attachment

intarray_mod.tar.gz
