Re: generic crosstab ? - Mailing list pgsql-sql

From Samuel Gendler
Subject Re: generic crosstab ?
Date
Msg-id CAEV0TzB8bD8Wnqdtp89=p2kFxB0Xnc1r-9ekjO6oWvuXxfwX6w@mail.gmail.com
Whole thread Raw
In response to Re: generic crosstab ?  (Andreas <maps.on@gmx.net>)
List pgsql-sql


On Tue, Apr 24, 2012 at 3:37 PM, Andreas <maps.on@gmx.net> wrote:
Am 25.04.2012 00:04, schrieb Joe Conway:

On 04/24/2012 02:42 PM, David Johnston wrote:
You must specify the output record structure:

SELECT crosstab(text, text) AS ( col1_name col1_type [, colN_name
colN_type]* )

Whether this relates to the “materialization node” message you are
receiving I have no idea.
The error is because you are selecting from a set returning function in
the target list rather than the from clause. It should be more like:

SELECT * FROM crosstab(text, text) AS ( col1_name col1_type [, colN_name
colN_type]* )


OK now i get at least some result.
But
1) I need to know how many categories will apear to construct the target list. This is a wee bit of a problem as this number is actually dynamic.

2) There are some rows in the resulting list with empty columns within the row.
When I execute the first query for a parent ID that has gaps in the crosstab I see it shows no gaps in the categories when called outside crosstab().
E.g. it dumps  x1, x2, x3, x4, x5 when called seperately but crosstab() shows
x1, x2, null, null, x5, null, x6, x7

How does this make sense ?

I believe that the crosstab function is fairly naive about constructing the crosstab and requires that each row in the first query be in row-order for the crosstab output.  It is possible that it even requires the rows in the first query to be in column order in the crosstba output.

In other words, if the first query returns results like this:

row1, col3, value1
row1, col1, value2
row2, col1, value3
row2, col3, value4
row1, col2, value5

I believe that the last row1 entry will be dropped, or else maybe you'll get two rows in the crosstab output, each representing row1, and I am hypothesizing (without re-reading the documentation) that the fact that col3 comes before col1 in row1 will also result in a hole in the output.

Basically, your first query should include "order by 1,2" rather than just "order by 1"

Have you gone through the documentation for the crosstab functions?  I believe that will answer most of your questions.  It has been a long time since I've read them, but I remember them being fairly complete.

As for your question about knowing how many columns you will have, I've always solved that by opening a serialized transaction and running a query to get the possible set of columns, then issue the sql statement that includes the crosstab(sql,sql) call, using the first query results to build up the return result type.  Depending on the nature of your data set, it is possible you can forgo the serialized transaction, since there may be little to no risk of new column names appearing between the two queries.  You could also just structure the queries in the second statement so that it is guaranteed to return only the results returned by your initial query - include a where clause which is guaranteed to return only rows that match the columns returned by your first query.  You may wind up missing a value that was added between the first query and the second query, but odds are good that if your data is that volatile, it doesn't matter if you miss a value which happened to be inserted in those few milliseconds.

--sam

pgsql-sql by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: generic crosstab ?
Next
From: Venkateswara Rao Dokku
Date:
Subject: Regarding report generated by the pgFouine