Re: crosstab/repivot...any interest? - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: crosstab/repivot...any interest? |
Date | |
Msg-id | CAHyXU0zYpYLpUuTJdHvDmPXJQy3TdMdVjZ=dZVXzoMTYtjEfFw@mail.gmail.com Whole thread Raw |
In response to | Re: crosstab/repivot...any interest? (Joe Conway <mail@joeconway.com>) |
List | pgsql-hackers |
On Tue, Feb 26, 2019 at 8:31 AM Joe Conway <mail@joeconway.com> wrote: > On 2/25/19 8:34 PM, Merlin Moncure wrote: > > The interface I'm looking at is: > > SELECT repivot( > > query TEXT, > > static_attributes INT, /* number of static attributes that are > > unchanged around key; we need this in our usages */ > > attribute_query TEXT); /* query that draws up the pivoted attribute > > list */ > > > > The driving query is expected to return 0+ static attributes which are > > essentially simply pasted to the output. The next two columns are the > > key column and the pivoting column. So if you had three attributes, > > the input set would be: > > > > a1, a2, a3, k1, p, v1...vn > > > > Where the coordinates v and p would exchange. I need to get this done > > quickly and so am trying to avoid more abstracted designs (maybe multi > > part keys should be supported through...this is big limitation of > > crosstab albeit with some obnoxious work arounds). > > Perhaps not enough coffee yet, but I am not sure I fully grok the design > here. A fully fleshed out example would be useful. Thanks for your interest. A lot of our application data is organized in columnar type formats. Typically, the analytical reports on the back of the database are organized withe output columns being a KPI or a unit of time. KPI Columns: Country,Make,Model,TimeSlice,Sales,Units US,Ford,Fusion,Y2018Q1,165MM$,250k US,Ford,Fusion,Y2018Q2,172MM$,261k US,Ford,Fusion,Y2018Q3,183MM$,268k ... Time Columns: Country,Make,Mode, KPI,Y2018Q1,Y2018Q2,Y2018Q3 US,Ford,Fusion,Y2018Q1,Sales,165MM$,172MM$,183MM$ US,Ford,Fusion,Y2018Q2,Units,250k,261k,268k SELECT repivot( <query pulling KPI format>, 1, /* only one static attribute */ 2, /* number of key columns, only needed if multi part keys are supported */ <query developing list of time columns>); In this example supporting multi-part, the repivot column is the 4th, assumed to be attributes first, keys second, pivot column third, data block last. Multi column pivots might be considered but I don't need them and that's a significant expansion in scope, so I'm avoiding that consideration. What we need to do is convert from the first format above (which is how data is materialized on table) to the second format. Just like as within crosstab, if the key column(s) are ordered into the function we can exploit that structure for an efficient implementation. 'Ford' and 'Fusion' are key columns; 'US' is uninteresting attribute column (it is unambiguously represented by 'Ford', 'Fusion') and so would be simply be copied from input to output set. Our data is materialized in KPI style (which is pretty typical) but we have large analytical reports that want it with columns representing units of time. This is farily common in the industry IMO. There are various pure SQL approaches to do this but they tend to force you to build out the X,Y columns and then join it all back together; this can spiral out of control quite quickly from a performance standpoint. A crosstab style crawling cursor over ordered set ought to get the job done very efficiently. tablefunc doesn't support multi part keys today, and the workaround is that you have so stuff a key into a formatted string and than parse it out for downstream joining, some needs of having to do that joining might hopefully be eliminated by allowing the attribute columns to be copied through. merlin
pgsql-hackers by date: