Thread: Question(s) about crosstab

Question(s) about crosstab

From
Ken Tanzer
Date:
Hi.  I've got a simple table unit_hold, with grant numbers, buildings and counts of unit types, which I need to summarize, along with a table listing unit types:

\d unit_hold
                 Table "public.unit_hold"
        Column        |         Type          | Modifiers 
----------------------+-----------------------+-----------
 grant_number_code    | character varying(10) | 
 housing_project_code | character varying(10) | 
 unit_type_code       | character varying(10) | 
 count                | bigint                | 

SELECT * FROM unit_hold limit 3;
 grant_number_code | housing_project_code | unit_type_code | count 
-------------------+----------------------+----------------+-------
 1                 | AAAA                 | 4BR            |     1
 1                 | BBBB                 | 1BR            |     1
 1                 | CCCC                 | 1BR            |     1

SELECT unit_type_code,description FROM l_unit_type;
 unit_type_code | description 
----------------+-------------
 5BR            | 5 Bedroom
 4BR            | 4 Bedroom
 3BR            | 3 Bedroom
 6BR            | 6 Bedroom
 UNKNOWN        | Unknown
 GROUP          | Group Home
 2BR            | 2 Bedroom
 1BR            | 1 Bedroom
 0BR            | Studio
 SRO            | SRO


I thought this would be a good candidate for crosstab.  After wrestling with the documentation, this is the best I could come up with:

SELECT * FROM crosstab(
  'SELECT housing_project_code||''_''||grant_number_code AS project_and_grant,grant_number_code,housing_project_code,unit_type_code,count FROM unit_hold ORDER BY 1,2',
  'SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code'
) AS ct(project_and_grant varchar, grant_number_code varchar, housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR" bigint, "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP" bigint, "SRO" bigint, "UNKNOWN" bigint)

So here are my questions:

1)  Is there a simpler way?  I'm hoping I made this unnecessarily cumbersome and complicated.
2)  AFAICT, if a new unit type were to be added, I'd have to rewrite this query.  Is there any way to avoid that?
3)  It seems like everything after the first query, except for the category field, is redundant information, and that in theory you should be able to say crosstab('query','category_field').  Is there any inherent reason this simpler form couldn't work, or is it just that no one has wanted to do it, or gotten to it yet?

Thanks in advance!

Ken


--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Question(s) about crosstab

From
rob stone
Date:
O
n Tue, 2013-12-17 at 15:31 -0800, Ken Tanzer wrote:


>
> \d unit_hold
>                  Table "public.unit_hold"
>         Column        |         Type          | Modifiers
> ----------------------+-----------------------+-----------
>  grant_number_code    | character varying(10) |
>  housing_project_code | character varying(10) |
>  unit_type_code       | character varying(10) |
>  count                | bigint                |
>
>
> SELECT * FROM unit_hold limit 3;
>  grant_number_code | housing_project_code | unit_type_code | count
> -------------------+----------------------+----------------+-------
>  1                 | AAAA                 | 4BR            |     1
>  1                 | BBBB                 | 1BR            |     1
>  1                 | CCCC                 | 1BR            |     1
>
>
> SELECT unit_type_code,description FROM l_unit_type;
>  unit_type_code | description
> ----------------+-------------
>  5BR            | 5 Bedroom
>  4BR            | 4 Bedroom
>  3BR            | 3 Bedroom
>  6BR            | 6 Bedroom
>  UNKNOWN        | Unknown
>  GROUP          | Group Home
>  2BR            | 2 Bedroom
>  1BR            | 1 Bedroom
>  0BR            | Studio
>  SRO            | SRO
>
>
>
>
> I thought this would be a good candidate for crosstab.  After
> wrestling with the documentation, this is the best I could come up
> with:
>
>
> SELECT * FROM crosstab(
>   'SELECT housing_project_code||''_''||grant_number_code AS
> project_and_grant,grant_number_code,housing_project_code,unit_type_code,count FROM unit_hold ORDER BY 1,2',
>   'SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo
> ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code'
> ) AS ct(project_and_grant varchar, grant_number_code varchar,
> housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR"
> bigint, "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP"
> bigint, "SRO" bigint, "UNKNOWN" bigint)
>
>
>
> So here are my questions:
>
>
> 1)  Is there a simpler way?  I'm hoping I made this unnecessarily
> cumbersome and complicated.
> 2)  AFAICT, if a new unit type were to be added, I'd have to rewrite
> this query.  Is there any way to avoid that?
> 3)  It seems like everything after the first query, except for the
> category field, is redundant information, and that in theory you
> should be able to say crosstab('query','category_field').  Is there
> any inherent reason this simpler form couldn't work, or is it just
> that no one has wanted to do it, or gotten to it yet?
>
>
> Thanks in advance!
>
>
> Ken
>
>
>
>
> --
>
> AGENCY Software
> A data system that puts you in control
> 100% Free Software
> http://agency-software.org/
> ken.tanzer@agency-software.org
> (253) 245-3801
>
>
> Subscribe to the mailing list to
> learn more about AGENCY or
> follow the discussion.


SELECT UH.grant_number_code, UH.housing_project_code, UH. count,
UT.description
FROM l_unit_type UT, unit_hold UH
WHERE UH.unit_type_code = UT.unit_type_code;


Easier to create a view.



Re: Question(s) about crosstab

From
Ken Tanzer
Date:
On Tue, Dec 17, 2013 at 3:47 PM, rob stone <floriparob@gmail.com> wrote:


SELECT UH.grant_number_code, UH.housing_project_code, UH. count,
UT.description
FROM l_unit_type UT, unit_hold UH
WHERE UH.unit_type_code = UT.unit_type_code;


Easier to create a view.


Thanks Rob, but that doesn't get the data into a 1 line per grant/building format, with the unit types as columns.  That's why I was looking at crosstab in the first place!

Cheers,
Ken

--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Question(s) about crosstab

From
David Johnston
Date:
Ken Tanzer wrote
> 1)  Is there a simpler way?  I'm hoping I made this unnecessarily
> cumbersome and complicated.
> 2)  AFAICT, if a new unit type were to be added, I'd have to rewrite this
> query.  Is there any way to avoid that?
> 3)  It seems like everything after the first query, except for the
> category
> field, is redundant information, and that in theory you should be able to
> say crosstab('query','category_field').  Is there any inherent reason this
> simpler form couldn't work, or is it just that no one has wanted to do it,
> or gotten to it yet?

1) Pivot Tables...(not a PostgreSQL feature I'm afraid)

2) Not that I am aware of.  I would suggest writing the query so that
"Other" is a valid group and any unmapped types get aliased to "Other" so at
least the query counts everything and you know that if "Other" is non-zero
you have some alterations to make.

3) Limitation of SQL - explained below:

The function call string that you pass in is just that, a string, the SQL
construct within which it resides has no knowledge of its contents.

SQL has the hard requirement that at the time you submit a query all columns
must be known.  If a function is polymorphic (in the sense it can output
different columns/row-types) then when you call that function you must
indicate which columns (and types) are going to be output by the function
during this specific execution.

As an aside you should consider dollar-quoting:

SELECT function_call(
$arg1$ SELECT '1' AS one $arg1$, $arg2$ SELECT '2two' AS "22" $arg2"
);

That way you do not need to deal with escaping the embedded quotes in the
query and can fairly easily extract the query text and run it standalone,
modify it, then copy it back without modification.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Question-s-about-crosstab-tp5783810p5783818.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Question(s) about crosstab

From
Ken Tanzer
Date:
On Tue, Dec 17, 2013 at 5:11 PM, David Johnston <polobo@yahoo.com> wrote:
3) Limitation of SQL - explained below:

The function call string that you pass in is just that, a string, the SQL
construct within which it resides has no knowledge of its contents.

SQL has the hard requirement that at the time you submit a query all columns
must be known.  If a function is polymorphic (in the sense it can output
different columns/row-types) then when you call that function you must
indicate which columns (and types) are going to be output by the function
during this specific execution.

I guess crosstabs were not all that I hoped they were (basically pivot tables), but thanks for the clear explanation.  

Cheers,
Ken

--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Question(s) about crosstab

From
Scott Marlowe
Date:
On Tue, Dec 17, 2013 at 10:42 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
>
> On Tue, Dec 17, 2013 at 5:11 PM, David Johnston <polobo@yahoo.com> wrote:
>>
>> 3) Limitation of SQL - explained below:
>>
>> The function call string that you pass in is just that, a string, the SQL
>> construct within which it resides has no knowledge of its contents.
>>
>> SQL has the hard requirement that at the time you submit a query all columns
>> must be known.  If a function is polymorphic (in the sense it can output
>> different columns/row-types) then when you call that function you must
>> indicate which columns (and types) are going to be output by the function
>> during this specific execution.
>
>
> I guess crosstabs were not all that I hoped they were (basically pivot tables), but thanks for the clear explanation.

In the past I've written simple bash, perl, php etc scripts that
interrogated catalogs and then built my crosstab queries for me. You
could do it in a pl language, tho probably not easily in plpgsql.
plpython or plperl etc would proabably be a good place to start.


Re: Question(s) about crosstab

From
John Abraham
Date:
Regarding crosstab, yes it's basically too complicated to use directly.  Here are the options:

1) write code (in another language, perhaps) to create your cross tab queries by
inspecting the tables, which then submits those queries to create views.  We have a web-app in
django/python that will create crosstab views in this way.  (We use it to attach the values to spatial shapes
in PostGIS, so that other GIS programs, such as mapserver, can use it.  GIS programs always
seem to expect things to be in crosstab format.)

2) Export to CSV, then pivottable in Excel, or AWK, or Perl, or whatever.  The problem with this
is that the results are NOT in your database, they are in the external table.

3) Can't "someone" write a pl language routine that does it better?  I'd be willing to work on the core
functionality in python if someone else would be willing to embed it in plpython (I've never used plpython.)



> ----------------------------------------------------------------------
>
> Date: Tue, 17 Dec 2013 15:31:54 -0800
> From: Ken Tanzer <ken.tanzer@gmail.com>
> Hi.  I've got a simple table unit_hold, with grant numbers, buildings and
> counts of unit types, which I need to summarize, along with a table listing
> unit types:

<snip>
>
> I thought this would be a good candidate for crosstab.  After wrestling
> with the documentation, this is the best I could come up with:
>
> SELECT * FROM crosstab(
>  'SELECT housing_project_code||''_''||grant_number_code AS
> project_and_grant,grant_number_code,housing_project_code,unit_type_code,count
> FROM unit_hold ORDER BY 1,2',
>  'SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo
> ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code'
> ) AS ct(project_and_grant varchar, grant_number_code varchar,
> housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR" bigint,
> "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP" bigint,
> "SRO" bigint, "UNKNOWN" bigint)
>
> So here are my questions:
>
> 1)  Is there a simpler way?  I'm hoping I made this unnecessarily
> cumbersome and complicated.
> 2)  AFAICT, if a new unit type were to be added, I'd have to rewrite this
> query.  Is there any way to avoid that?
> 3)  It seems like everything after the first query, except for the category
> field, is redundant information, and that in theory you should be able to
> say crosstab('query','category_field').  Is there any inherent reason this
> simpler form couldn't work, or is it just that no one has wanted to do it,
> or gotten to it yet?
>

And from David Johnston:

> 1) Pivot Tables...(not a PostgreSQL feature I'm afraid)
>
> 2) Not that I am aware of.  I would suggest writing the query so that
> "Other" is a valid group and any unmapped types get aliased to "Other" so at
> least the query counts everything and you know that if "Other" is non-zero
> you have some alterations to make.
>
> 3) Limitation of SQL - explained below:
>
>
>
>
And from Scott Marlowe:

> In the past I've written simple bash, perl, php etc scripts that
> interrogated catalogs and then built my crosstab queries for me. You
> could do it in a pl language, tho probably not easily in plpgsql.
> plpython or plperl etc would proabably be a good place to start.
>




Re: Question(s) about crosstab

From
Joe Conway
Date:
On 12/18/2013 03:32 PM, John Abraham wrote:
> Regarding crosstab, yes it's basically too complicated to use directly.  Here are the options:
>
> 1) write code (in another language, perhaps) to create your cross tab queries by
> inspecting the tables, which then submits those queries to create views.  We have a web-app in
> django/python that will create crosstab views in this way.  (We use it to attach the values to spatial shapes
> in PostGIS, so that other GIS programs, such as mapserver, can use it.  GIS programs always
> seem to expect things to be in crosstab format.)

1) This is the best option. I've done it with plpgsql in the past. You
don't need to inspect your tables so much as determine how many result
columns to expect based on the categories SQL string. Once you know how
many categories there are, you can define the column definition list
which allows you to write the crosstab query. So basically your app
calls the plpgsql function and then executes the resulting returned
query string.

> 3) Can't "someone" write a pl language routine that does it better?  I'd be willing to work on the core
> functionality in python if someone else would be willing to embed it in plpython (I've never used plpython.)

3) Not possible -- reason was given down thread. Column definition must
be known/determinable by the parser prior to query execution.

Basically to improve this you would have to hack the postgres backend in
such a way that it didn't need the column definition list until query
execution time, which I also doubt is possible.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


Re: Question(s) about crosstab

From
Sergey Konoplev
Date:
On Tue, Dec 17, 2013 at 3:31 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
> 1)  Is there a simpler way?  I'm hoping I made this unnecessarily cumbersome and complicated.
> 2)  AFAICT, if a new unit type were to be added, I'd have to rewrite this query.  Is there any way to avoid that?
> 3)  It seems like everything after the first query, except for the category field, is redundant information, and that
intheory you should be able to say crosstab('query','category_field').  Is there any inherent reason this simpler form
couldn'twork, or is it just that no one has wanted to do it, or gotten to it yet? 

Try to look at this article [1]. The guy has made some plpgsql
automation so it generate the resulting crostab query kind of like you
described it in 3, and it looks like is solves 1 and 2. For complex
queries you can make views and use them with the tablename argument.

[1] http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com


Re: Question(s) about crosstab

From
John Abraham
Date:
On Dec 18, 2013, at 3:52 PM, Joe Conway <mail@joeconway.com> wrote:
>
>> 3) Can't "someone" write a pl language routine that does it better?  I'd be willing to work on the core
>> functionality in python if someone else would be willing to embed it in plpython (I've never used plpython.)
>
> 3) Not possible -- reason was given down thread. Column definition must be known/determinable by the parser prior to
queryexecution. 

No, wait, I think you misunderstood my idea.  Can’t we have a python function that crosstabs the data in python, then
createsa new table, then inserts that data into the new table by looping through a bunch of inserts? 

The parser wouldn’t’ need to know a thing.  There would be no output from the function itself, it would just create a
tablewhile it ran.  (I suppose it could return the name of the table, or a boolean success/fail flag, or return the
numberof columns that were created, but all of these are simple things knowable to the parser in advance.) 

I’ve written functions before in plpgsql that create tables behind the scenes, and basically return no output
themselves.

—
John

Re: Question(s) about crosstab

From
AI Rumman
Date:
Hi,

Once I faced the same problem of adding new type and reqriting the query working with crosstab function. Then I created a dynamic crosstab function.
You may have a look at it if it work out for you:

Thanks.


On Tue, Dec 17, 2013 at 3:31 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
Hi.  I've got a simple table unit_hold, with grant numbers, buildings and counts of unit types, which I need to summarize, along with a table listing unit types:

\d unit_hold
                 Table "public.unit_hold"
        Column        |         Type          | Modifiers 
----------------------+-----------------------+-----------
 grant_number_code    | character varying(10) | 
 housing_project_code | character varying(10) | 
 unit_type_code       | character varying(10) | 
 count                | bigint                | 

SELECT * FROM unit_hold limit 3;
 grant_number_code | housing_project_code | unit_type_code | count 
-------------------+----------------------+----------------+-------
 1                 | AAAA                 | 4BR            |     1
 1                 | BBBB                 | 1BR            |     1
 1                 | CCCC                 | 1BR            |     1

SELECT unit_type_code,description FROM l_unit_type;
 unit_type_code | description 
----------------+-------------
 5BR            | 5 Bedroom
 4BR            | 4 Bedroom
 3BR            | 3 Bedroom
 6BR            | 6 Bedroom
 UNKNOWN        | Unknown
 GROUP          | Group Home
 2BR            | 2 Bedroom
 1BR            | 1 Bedroom
 0BR            | Studio
 SRO            | SRO


I thought this would be a good candidate for crosstab.  After wrestling with the documentation, this is the best I could come up with:

SELECT * FROM crosstab(
  'SELECT housing_project_code||''_''||grant_number_code AS project_and_grant,grant_number_code,housing_project_code,unit_type_code,count FROM unit_hold ORDER BY 1,2',
  'SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code'
) AS ct(project_and_grant varchar, grant_number_code varchar, housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR" bigint, "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP" bigint, "SRO" bigint, "UNKNOWN" bigint)

So here are my questions:

1)  Is there a simpler way?  I'm hoping I made this unnecessarily cumbersome and complicated.
2)  AFAICT, if a new unit type were to be added, I'd have to rewrite this query.  Is there any way to avoid that?
3)  It seems like everything after the first query, except for the category field, is redundant information, and that in theory you should be able to say crosstab('query','category_field').  Is there any inherent reason this simpler form couldn't work, or is it just that no one has wanted to do it, or gotten to it yet?

Thanks in advance!

Ken


--
AGENCY Software  
A data system that puts you in control
100% Free Software
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Question(s) about crosstab

From
Joe Conway
Date:
On 12/18/2013 05:14 PM, John Abraham wrote:
> On Dec 18, 2013, at 3:52 PM, Joe Conway <mail@joeconway.com> wrote:
>> 3) Not possible -- reason was given down thread. Column definition
>> must be known/determinable by the parser prior to query execution.
>
> No, wait, I think you misunderstood my idea.  Can’t we have a python
> function that crosstabs the data in python, then creates a new table,
> then inserts that data into the new table by looping through a bunch
> of inserts?

Oh, well I believe you could do that today in plpgsql (build the "create
table as select * from crosstab..." sql as described earlier in this
thread, then execute it), so I'm sure you could do it in plpython as
well. But this isn't generally what people are looking for.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support