Thread: generic crosstab ?

generic crosstab ?

From
Andreas
Date:
Hi,

is there a generic solution to dump the result of a query as a crosstab, 
when I can't know how many columns I will need?

E.g. I get something like this:

id, x
1,  a
1,  b
1,  c
2,  l
2,  m


and I'd like to see it as:

id,  x1,  x2,  x3,  .....  xn
1,   a,   b,   c,    null, .... null
2,   l,    m,  ....

I fear the problem is I dont know n.


Re: generic crosstab ?

From
Samuel Gendler
Date:


On Tue, Apr 24, 2012 at 1:01 PM, Andreas <maps.on@gmx.net> wrote:
Hi,

is there a generic solution to dump the result of a query as a crosstab, when I can't know how many columns I will need?

E.g. I get something like this:

id, x
1,  a
1,  b
1,  c
2,  l
2,  m



Yes.  You can provide a query which returns the columns to the version of the crosstab function which looks like this:

crosstab(text source_sql, text category_sql)
It does exactly what you are looking for. The second query returns the set of values that act as columns in the final result (the pivot for each row in the result returned by the first query).  This allows the function to correctly insert a null for any column for which there is no row in the first query results.


Re: generic crosstab ?

From
Andreas
Date:
Am 24.04.2012 22:08, schrieb Samuel Gendler:


On Tue, Apr 24, 2012 at 1:01 PM, Andreas <maps.on@gmx.net> wrote:
Hi,

is there a generic solution to dump the result of a query as a crosstab, when I can't know how many columns I will need?

E.g. I get something like this:

id, x
1,  a
1,  b
1,  c
2,  l
2,  m



Yes.  You can provide a query which returns the columns to the version of the crosstab function which looks like this:

crosstab(text source_sql, text category_sql)
It does exactly what you are looking for. The second query returns the set of values that act as columns in the final result (the pivot for each row in the result returned by the first query).  This allows the function to correctly insert a null for any column for which there is no row in the first query results.



I got stuck with an error that translates to "Materialisation mode is needed but is not allowed in this context."
I couldn't figure out what this materialisation mode is, yet.

Could you please have a look at my query sample?
Both queries work for themselves but crosstab() fails.   :(
I checked and there are never more than 20 child_ids per parent_id so there should be enough columns.

select
    crosstab (
$$
    select
        parent_id                    as  row_name,
        'x' || row_number() over ( partition by parent_id order by child_id )  as  category,
        child_id          as  value
    from
        children
    order by 1
$$,
$$
    select 'x' || generate_series(1, 20) as  cat  order by 1
$$
);

Re: generic crosstab ?

From
"David Johnston"
Date:

From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Andreas
Sent: Tuesday, April 24, 2012 5:35 PM
To: Samuel Gendler
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] generic crosstab ?

 

Am 24.04.2012 22:08, schrieb Samuel Gendler:

 

On Tue, Apr 24, 2012 at 1:01 PM, Andreas <maps.on@gmx.net> wrote:

Hi,

is there a generic solution to dump the result of a query as a crosstab, when I can't know how many columns I will need?

E.g. I get something like this:

id, x
1,  a
1,  b
1,  c
2,  l
2,  m

 

Yes.  You can provide a query which returns the columns to the version of the crosstab function which looks like this:

 

crosstab(text source_sql, text category_sql)

It does exactly what you are looking for. The second query returns the set of values that act as columns in the final result (the pivot for each row in the result returned by the first query).  This allows the function to correctly insert a null for any column for which there is no row in the first query results.

 

 


I got stuck with an error that translates to "Materialisation mode is needed but is not allowed in this context."
I couldn't figure out what this materialisation mode is, yet.

Could you please have a look at my query sample?
Both queries work for themselves but crosstab() fails.   :(
I checked and there are never more than 20 child_ids per parent_id so there should be enough columns.

select
    crosstab (
$$
    select
        parent_id                    as  row_name,
        'x' || row_number() over ( partition by parent_id order by child_id )  as  category,
        child_id          as  value
    from
        children
    order by 1
$$,
$$
    select 'x' || generate_series(1, 20) as  cat  order by 1
$$
);

 

 

You must specify the output record structure:

 

SELECT crosstab(text, text) AS ( col1_name col1_type [, colN_name colN_type]* )

 

See: http://www.postgresql.org/docs/9.0/interactive/tablefunc.html for official usage and examples

 

Whether this relates to the “materialization node” message you are receiving I have no idea.

 

Dave

 

Re: generic crosstab ?

From
Joe Conway
Date:
On 04/24/2012 02:42 PM, David Johnston wrote:
> You must specify the output record structure:
> 
> SELECT crosstab(text, text) AS ( col1_name col1_type [, colN_name
> colN_type]* )
> 
> Whether this relates to the “materialization node” message you are
> receiving I have no idea.

The error is because you are selecting from a set returning function in
the target list rather than the from clause. It should be more like:

SELECT * FROM crosstab(text, text) AS ( col1_name col1_type [, colN_name
> colN_type]* )

HTH,

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


Re: generic crosstab ?

From
Andreas
Date:
Am 25.04.2012 00:04, schrieb Joe Conway:
> On 04/24/2012 02:42 PM, David Johnston wrote:
>> You must specify the output record structure:
>>
>> SELECT crosstab(text, text) AS ( col1_name col1_type [, colN_name
>> colN_type]* )
>>
>> Whether this relates to the “materialization node” message you are
>> receiving I have no idea.
> The error is because you are selecting from a set returning function in
> the target list rather than the from clause. It should be more like:
>
> SELECT * FROM crosstab(text, text) AS ( col1_name col1_type [, colN_name
>> colN_type]* )
>

OK now i get at least some result.
But
1) I need to know how many categories will apear to construct the target 
list. This is a wee bit of a problem as this number is actually dynamic.

2) There are some rows in the resulting list with empty columns within 
the row.
When I execute the first query for a parent ID that has gaps in the 
crosstab I see it shows no gaps in the categories when called outside 
crosstab().
E.g. it dumps  x1, x2, x3, x4, x5 when called seperately but crosstab() 
shows
x1, x2, null, null, x5, null, x6, x7

How does this make sense ?


Thanks for the answers so far   :)


select  *
from    crosstab (
$$    select        parent_id                    as  row_name,        'x' || row_number() over ( partition by parent_id
orderby 
 
child_id )  as  category,        child_id          as  value    from        children    order by 1
$$,
$$    select 'x' || generate_series(1, 15) as  category  order by 1
$$
)
as result (    row_name    integer,    x1          integer,    x2          integer,    x3          integer,    x4
  integer,    x5          integer,    x6          integer,    x7          integer,    x8          integer,    x9
 integer,    x10         integer,    x11         integer,    x12         integer,    x13         integer,    x14
integer,    x15         integer
 
)


Re: generic crosstab ?

From
Pavel Stehule
Date:
Hello

try to use cursors

http://okbob.blogspot.com/2008/08/using-cursors-for-generating-cross.html

Regards

Pavel Stehule

2012/4/24 Andreas <maps.on@gmx.net>:
> Hi,
>
> is there a generic solution to dump the result of a query as a crosstab,
> when I can't know how many columns I will need?
>
> E.g. I get something like this:
>
> id, x
> 1,  a
> 1,  b
> 1,  c
> 2,  l
> 2,  m
>
>
> and I'd like to see it as:
>
> id,  x1,  x2,  x3,  .....  xn
> 1,   a,   b,   c,    null, .... null
> 2,   l,    m,  ....
>
> I fear the problem is I dont know n.
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql


Re: generic crosstab ?

From
Samuel Gendler
Date:


On Tue, Apr 24, 2012 at 3:37 PM, Andreas <maps.on@gmx.net> wrote:
Am 25.04.2012 00:04, schrieb Joe Conway:

On 04/24/2012 02:42 PM, David Johnston wrote:
You must specify the output record structure:

SELECT crosstab(text, text) AS ( col1_name col1_type [, colN_name
colN_type]* )

Whether this relates to the “materialization node” message you are
receiving I have no idea.
The error is because you are selecting from a set returning function in
the target list rather than the from clause. It should be more like:

SELECT * FROM crosstab(text, text) AS ( col1_name col1_type [, colN_name
colN_type]* )


OK now i get at least some result.
But
1) I need to know how many categories will apear to construct the target list. This is a wee bit of a problem as this number is actually dynamic.

2) There are some rows in the resulting list with empty columns within the row.
When I execute the first query for a parent ID that has gaps in the crosstab I see it shows no gaps in the categories when called outside crosstab().
E.g. it dumps  x1, x2, x3, x4, x5 when called seperately but crosstab() shows
x1, x2, null, null, x5, null, x6, x7

How does this make sense ?

I believe that the crosstab function is fairly naive about constructing the crosstab and requires that each row in the first query be in row-order for the crosstab output.  It is possible that it even requires the rows in the first query to be in column order in the crosstba output.

In other words, if the first query returns results like this:

row1, col3, value1
row1, col1, value2
row2, col1, value3
row2, col3, value4
row1, col2, value5

I believe that the last row1 entry will be dropped, or else maybe you'll get two rows in the crosstab output, each representing row1, and I am hypothesizing (without re-reading the documentation) that the fact that col3 comes before col1 in row1 will also result in a hole in the output.

Basically, your first query should include "order by 1,2" rather than just "order by 1"

Have you gone through the documentation for the crosstab functions?  I believe that will answer most of your questions.  It has been a long time since I've read them, but I remember them being fairly complete.

As for your question about knowing how many columns you will have, I've always solved that by opening a serialized transaction and running a query to get the possible set of columns, then issue the sql statement that includes the crosstab(sql,sql) call, using the first query results to build up the return result type.  Depending on the nature of your data set, it is possible you can forgo the serialized transaction, since there may be little to no risk of new column names appearing between the two queries.  You could also just structure the queries in the second statement so that it is guaranteed to return only the results returned by your initial query - include a where clause which is guaranteed to return only rows that match the columns returned by your first query.  You may wind up missing a value that was added between the first query and the second query, but odds are good that if your data is that volatile, it doesn't matter if you miss a value which happened to be inserted in those few milliseconds.

--sam