crosstab/repivot...any interest? - Mailing list pgsql-hackers

From Merlin Moncure
Subject crosstab/repivot...any interest?
Date
Msg-id CAHyXU0yE_bqF9kuyfbSpVdpiEkH0CtmTJGugJ8Ag6jftGd+QVA@mail.gmail.com
Whole thread Raw
In response to Re: crosstab/repivot...any interest?  (Morris de Oryx <morrisdeoryx@gmail.com>)
Responses Re: crosstab/repivot...any interest?
List pgsql-hackers
On Fri, Jan 25, 2019 at 9:14 PM Morris de Oryx <morrisdeoryx@gmail.com> wrote:
>
> Hello, I'm not a C coder and can't help....but I love cross-tab/pivot tables. They're the best, and just fantastic for preparing data to feed into various analysis tools. The tablefunc module is helpful, but a bit awkward to use (as I remember it.)
>
> From a user's point of view, I high-performance cross-tab generator would be just fantastic.
>
> As I understand it, this is what's involved in a pivot:
>
> 1. Identify rows that should be grouped (consolidated.)
> 2. Distinguish the value that identifies each derived column.
> 3. Distinguish the value that identifies each row-column value.
> 4. Collapse the rows, build the columns, and populate the 'cells' with data.
>
> In an ideal world, you would be able to perform different grouping operations. Such as count, sum, avg, etc.
>
> If there's a way to do this in a system-wide and standards-pointing way, so much the better.
>
> Apologies if I'm violating list etiquette by jumping in here. I've been lurking on several Postgres lists for a bit and picking up interesting details every day. If I've been Unintentionally and Cluelessly Off, I'm find with being told.

No worries, sir! Apologies on the late reply.  I've made some headway on this item.  Waiting for postgres to implement the SQL standard pivoting (regardless if it implements the cases I need) is not an option for my personal case. I can't use the SQL approach either as it's very slow and imposing some scaling limits that need to work around in the short run.

My strategy is to borrow [steal] from crosstab_hash and make a new version called repivot which takes an arleady pivoted data set and repivots it against an identified column.   Most of the code can be shared with tablefunc so ideally this could live as an amendment to that extension.  That might not happen though, so I'm going to package it as a separate extension (removing the majority of tablefunc that I don't need) and submit it to this group for consideration.

If we punt, it'll end up as a private extension or live the life of an Orphan in pgxn.  If there's some interest here, we can consider a new contrib extension (which I personally rate very unlikely) or recasting as an extra routine to tablefunc.  Any way we slice it, huge thanks to Joe Conway&co for giving us such an awesome function to work with all these years (not to mention the strategic plr language).  SRF crosstab() is still somewhat baroque, but it still fills a niche that nothing else implements.

The interface I'm looking at is:
SELECT repivot(
  query TEXT,
  static_attributes INT,  /* number of static attributes that are unchanged around key; we need this in our usages */
  attribute_query  TEXT); /* query that draws up the pivoted attribute list */

The driving query is expected to return 0+ static attributes which are essentially simply pasted to the output. The next two columns are the key column and the pivoting column.   So if you had three attributes, the input set would be:

a1, a2, a3, k1, p, v1...vn

Where the coordinates v and p would exchange.  I need to get this done quickly and so am trying to avoid more abstracted designs (maybe multi part keys should be supported through...this is big limitation of crosstab albeit with some obnoxious work arounds).

merlin

pgsql-hackers by date:

Previous
From: Haribabu Kommi
Date:
Subject: Re: current_logfiles not following group access and instead followslog_file_mode permissions
Next
From: "Nagaura, Ryohei"
Date:
Subject: RE: Timeout parameters