Thread: Question(s) about crosstab
Hi. I've got a simple table unit_hold, with grant numbers, buildings and counts of unit types, which I need to summarize, along with a table listing unit types:
\d unit_hold
Table "public.unit_hold"
Column | Type | Modifiers
----------------------+-----------------------+-----------
grant_number_code | character varying(10) |
housing_project_code | character varying(10) |
unit_type_code | character varying(10) |
count | bigint |
SELECT * FROM unit_hold limit 3;
grant_number_code | housing_project_code | unit_type_code | count
-------------------+----------------------+----------------+-------
1 | AAAA | 4BR | 1
1 | BBBB | 1BR | 1
1 | CCCC | 1BR | 1
SELECT unit_type_code,description FROM l_unit_type;
unit_type_code | description
----------------+-------------
5BR | 5 Bedroom
4BR | 4 Bedroom
3BR | 3 Bedroom
6BR | 6 Bedroom
UNKNOWN | Unknown
GROUP | Group Home
2BR | 2 Bedroom
1BR | 1 Bedroom
0BR | Studio
SRO | SRO
I thought this would be a good candidate for crosstab. After wrestling with the documentation, this is the best I could come up with:
SELECT * FROM crosstab(
'SELECT housing_project_code||''_''||grant_number_code AS project_and_grant,grant_number_code,housing_project_code,unit_type_code,count FROM unit_hold ORDER BY 1,2',
'SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code'
) AS ct(project_and_grant varchar, grant_number_code varchar, housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR" bigint, "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP" bigint, "SRO" bigint, "UNKNOWN" bigint)
So here are my questions:
1) Is there a simpler way? I'm hoping I made this unnecessarily cumbersome and complicated.
2) AFAICT, if a new unit type were to be added, I'd have to rewrite this query. Is there any way to avoid that?
3) It seems like everything after the first query, except for the category field, is redundant information, and that in theory you should be able to say crosstab('query','category_field'). Is there any inherent reason this simpler form couldn't work, or is it just that no one has wanted to do it, or gotten to it yet?
Thanks in advance!
Ken
AGENCY Software
A data system that puts you in control
100% Free Software
(253) 245-3801
learn more about AGENCY or
follow the discussion.
O n Tue, 2013-12-17 at 15:31 -0800, Ken Tanzer wrote: > > \d unit_hold > Table "public.unit_hold" > Column | Type | Modifiers > ----------------------+-----------------------+----------- > grant_number_code | character varying(10) | > housing_project_code | character varying(10) | > unit_type_code | character varying(10) | > count | bigint | > > > SELECT * FROM unit_hold limit 3; > grant_number_code | housing_project_code | unit_type_code | count > -------------------+----------------------+----------------+------- > 1 | AAAA | 4BR | 1 > 1 | BBBB | 1BR | 1 > 1 | CCCC | 1BR | 1 > > > SELECT unit_type_code,description FROM l_unit_type; > unit_type_code | description > ----------------+------------- > 5BR | 5 Bedroom > 4BR | 4 Bedroom > 3BR | 3 Bedroom > 6BR | 6 Bedroom > UNKNOWN | Unknown > GROUP | Group Home > 2BR | 2 Bedroom > 1BR | 1 Bedroom > 0BR | Studio > SRO | SRO > > > > > I thought this would be a good candidate for crosstab. After > wrestling with the documentation, this is the best I could come up > with: > > > SELECT * FROM crosstab( > 'SELECT housing_project_code||''_''||grant_number_code AS > project_and_grant,grant_number_code,housing_project_code,unit_type_code,count FROM unit_hold ORDER BY 1,2', > 'SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo > ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code' > ) AS ct(project_and_grant varchar, grant_number_code varchar, > housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR" > bigint, "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP" > bigint, "SRO" bigint, "UNKNOWN" bigint) > > > > So here are my questions: > > > 1) Is there a simpler way? I'm hoping I made this unnecessarily > cumbersome and complicated. > 2) AFAICT, if a new unit type were to be added, I'd have to rewrite > this query. Is there any way to avoid that? > 3) It seems like everything after the first query, except for the > category field, is redundant information, and that in theory you > should be able to say crosstab('query','category_field'). Is there > any inherent reason this simpler form couldn't work, or is it just > that no one has wanted to do it, or gotten to it yet? > > > Thanks in advance! > > > Ken > > > > > -- > > AGENCY Software > A data system that puts you in control > 100% Free Software > http://agency-software.org/ > ken.tanzer@agency-software.org > (253) 245-3801 > > > Subscribe to the mailing list to > learn more about AGENCY or > follow the discussion. SELECT UH.grant_number_code, UH.housing_project_code, UH. count, UT.description FROM l_unit_type UT, unit_hold UH WHERE UH.unit_type_code = UT.unit_type_code; Easier to create a view.
On Tue, Dec 17, 2013 at 3:47 PM, rob stone <floriparob@gmail.com> wrote:
SELECT UH.grant_number_code, UH.housing_project_code, UH. count,
UT.description
FROM l_unit_type UT, unit_hold UH
WHERE UH.unit_type_code = UT.unit_type_code;
Easier to create a view.
Thanks Rob, but that doesn't get the data into a 1 line per grant/building format, with the unit types as columns. That's why I was looking at crosstab in the first place!
Cheers,
Ken
--
AGENCY Software
A data system that puts you in control
100% Free Software
(253) 245-3801
learn more about AGENCY or
follow the discussion.
Ken Tanzer wrote > 1) Is there a simpler way? I'm hoping I made this unnecessarily > cumbersome and complicated. > 2) AFAICT, if a new unit type were to be added, I'd have to rewrite this > query. Is there any way to avoid that? > 3) It seems like everything after the first query, except for the > category > field, is redundant information, and that in theory you should be able to > say crosstab('query','category_field'). Is there any inherent reason this > simpler form couldn't work, or is it just that no one has wanted to do it, > or gotten to it yet? 1) Pivot Tables...(not a PostgreSQL feature I'm afraid) 2) Not that I am aware of. I would suggest writing the query so that "Other" is a valid group and any unmapped types get aliased to "Other" so at least the query counts everything and you know that if "Other" is non-zero you have some alterations to make. 3) Limitation of SQL - explained below: The function call string that you pass in is just that, a string, the SQL construct within which it resides has no knowledge of its contents. SQL has the hard requirement that at the time you submit a query all columns must be known. If a function is polymorphic (in the sense it can output different columns/row-types) then when you call that function you must indicate which columns (and types) are going to be output by the function during this specific execution. As an aside you should consider dollar-quoting: SELECT function_call( $arg1$ SELECT '1' AS one $arg1$, $arg2$ SELECT '2two' AS "22" $arg2" ); That way you do not need to deal with escaping the embedded quotes in the query and can fairly easily extract the query text and run it standalone, modify it, then copy it back without modification. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Question-s-about-crosstab-tp5783810p5783818.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Tue, Dec 17, 2013 at 5:11 PM, David Johnston <polobo@yahoo.com> wrote:
-- 3) Limitation of SQL - explained below:
The function call string that you pass in is just that, a string, the SQL
construct within which it resides has no knowledge of its contents.
SQL has the hard requirement that at the time you submit a query all columns
must be known. If a function is polymorphic (in the sense it can output
different columns/row-types) then when you call that function you must
indicate which columns (and types) are going to be output by the function
during this specific execution.
I guess crosstabs were not all that I hoped they were (basically pivot tables), but thanks for the clear explanation.
Cheers,
Ken
AGENCY Software
A data system that puts you in control
100% Free Software
(253) 245-3801
learn more about AGENCY or
follow the discussion.
On Tue, Dec 17, 2013 at 10:42 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote: > > On Tue, Dec 17, 2013 at 5:11 PM, David Johnston <polobo@yahoo.com> wrote: >> >> 3) Limitation of SQL - explained below: >> >> The function call string that you pass in is just that, a string, the SQL >> construct within which it resides has no knowledge of its contents. >> >> SQL has the hard requirement that at the time you submit a query all columns >> must be known. If a function is polymorphic (in the sense it can output >> different columns/row-types) then when you call that function you must >> indicate which columns (and types) are going to be output by the function >> during this specific execution. > > > I guess crosstabs were not all that I hoped they were (basically pivot tables), but thanks for the clear explanation. In the past I've written simple bash, perl, php etc scripts that interrogated catalogs and then built my crosstab queries for me. You could do it in a pl language, tho probably not easily in plpgsql. plpython or plperl etc would proabably be a good place to start.
Regarding crosstab, yes it's basically too complicated to use directly. Here are the options: 1) write code (in another language, perhaps) to create your cross tab queries by inspecting the tables, which then submits those queries to create views. We have a web-app in django/python that will create crosstab views in this way. (We use it to attach the values to spatial shapes in PostGIS, so that other GIS programs, such as mapserver, can use it. GIS programs always seem to expect things to be in crosstab format.) 2) Export to CSV, then pivottable in Excel, or AWK, or Perl, or whatever. The problem with this is that the results are NOT in your database, they are in the external table. 3) Can't "someone" write a pl language routine that does it better? I'd be willing to work on the core functionality in python if someone else would be willing to embed it in plpython (I've never used plpython.) > ---------------------------------------------------------------------- > > Date: Tue, 17 Dec 2013 15:31:54 -0800 > From: Ken Tanzer <ken.tanzer@gmail.com> > Hi. I've got a simple table unit_hold, with grant numbers, buildings and > counts of unit types, which I need to summarize, along with a table listing > unit types: <snip> > > I thought this would be a good candidate for crosstab. After wrestling > with the documentation, this is the best I could come up with: > > SELECT * FROM crosstab( > 'SELECT housing_project_code||''_''||grant_number_code AS > project_and_grant,grant_number_code,housing_project_code,unit_type_code,count > FROM unit_hold ORDER BY 1,2', > 'SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo > ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code' > ) AS ct(project_and_grant varchar, grant_number_code varchar, > housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR" bigint, > "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP" bigint, > "SRO" bigint, "UNKNOWN" bigint) > > So here are my questions: > > 1) Is there a simpler way? I'm hoping I made this unnecessarily > cumbersome and complicated. > 2) AFAICT, if a new unit type were to be added, I'd have to rewrite this > query. Is there any way to avoid that? > 3) It seems like everything after the first query, except for the category > field, is redundant information, and that in theory you should be able to > say crosstab('query','category_field'). Is there any inherent reason this > simpler form couldn't work, or is it just that no one has wanted to do it, > or gotten to it yet? > And from David Johnston: > 1) Pivot Tables...(not a PostgreSQL feature I'm afraid) > > 2) Not that I am aware of. I would suggest writing the query so that > "Other" is a valid group and any unmapped types get aliased to "Other" so at > least the query counts everything and you know that if "Other" is non-zero > you have some alterations to make. > > 3) Limitation of SQL - explained below: > > > > And from Scott Marlowe: > In the past I've written simple bash, perl, php etc scripts that > interrogated catalogs and then built my crosstab queries for me. You > could do it in a pl language, tho probably not easily in plpgsql. > plpython or plperl etc would proabably be a good place to start. >
On 12/18/2013 03:32 PM, John Abraham wrote: > Regarding crosstab, yes it's basically too complicated to use directly. Here are the options: > > 1) write code (in another language, perhaps) to create your cross tab queries by > inspecting the tables, which then submits those queries to create views. We have a web-app in > django/python that will create crosstab views in this way. (We use it to attach the values to spatial shapes > in PostGIS, so that other GIS programs, such as mapserver, can use it. GIS programs always > seem to expect things to be in crosstab format.) 1) This is the best option. I've done it with plpgsql in the past. You don't need to inspect your tables so much as determine how many result columns to expect based on the categories SQL string. Once you know how many categories there are, you can define the column definition list which allows you to write the crosstab query. So basically your app calls the plpgsql function and then executes the resulting returned query string. > 3) Can't "someone" write a pl language routine that does it better? I'd be willing to work on the core > functionality in python if someone else would be willing to embed it in plpython (I've never used plpython.) 3) Not possible -- reason was given down thread. Column definition must be known/determinable by the parser prior to query execution. Basically to improve this you would have to hack the postgres backend in such a way that it didn't need the column definition list until query execution time, which I also doubt is possible. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
On Tue, Dec 17, 2013 at 3:31 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote: > 1) Is there a simpler way? I'm hoping I made this unnecessarily cumbersome and complicated. > 2) AFAICT, if a new unit type were to be added, I'd have to rewrite this query. Is there any way to avoid that? > 3) It seems like everything after the first query, except for the category field, is redundant information, and that intheory you should be able to say crosstab('query','category_field'). Is there any inherent reason this simpler form couldn'twork, or is it just that no one has wanted to do it, or gotten to it yet? Try to look at this article [1]. The guy has made some plpgsql automation so it generate the resulting crostab query kind of like you described it in 3, and it looks like is solves 1 and 2. For complex queries you can make views and use them with the tablename argument. [1] http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/ -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray.ru@gmail.com
On Dec 18, 2013, at 3:52 PM, Joe Conway <mail@joeconway.com> wrote: > >> 3) Can't "someone" write a pl language routine that does it better? I'd be willing to work on the core >> functionality in python if someone else would be willing to embed it in plpython (I've never used plpython.) > > 3) Not possible -- reason was given down thread. Column definition must be known/determinable by the parser prior to queryexecution. No, wait, I think you misunderstood my idea. Can’t we have a python function that crosstabs the data in python, then createsa new table, then inserts that data into the new table by looping through a bunch of inserts? The parser wouldn’t’ need to know a thing. There would be no output from the function itself, it would just create a tablewhile it ran. (I suppose it could return the name of the table, or a boolean success/fail flag, or return the numberof columns that were created, but all of these are simple things knowable to the parser in advance.) I’ve written functions before in plpgsql that create tables behind the scenes, and basically return no output themselves. — John
Hi,
Once I faced the same problem of adding new type and reqriting the query working with crosstab function. Then I created a dynamic crosstab function.
You may have a look at it if it work out for you:
Thanks.
On Tue, Dec 17, 2013 at 3:31 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
Hi. I've got a simple table unit_hold, with grant numbers, buildings and counts of unit types, which I need to summarize, along with a table listing unit types:\d unit_holdTable "public.unit_hold"Column | Type | Modifiers----------------------+-----------------------+-----------grant_number_code | character varying(10) |housing_project_code | character varying(10) |unit_type_code | character varying(10) |count | bigint |SELECT * FROM unit_hold limit 3;grant_number_code | housing_project_code | unit_type_code | count-------------------+----------------------+----------------+-------1 | AAAA | 4BR | 11 | BBBB | 1BR | 11 | CCCC | 1BR | 1SELECT unit_type_code,description FROM l_unit_type;unit_type_code | description----------------+-------------5BR | 5 Bedroom4BR | 4 Bedroom3BR | 3 Bedroom6BR | 6 BedroomUNKNOWN | UnknownGROUP | Group Home2BR | 2 Bedroom1BR | 1 Bedroom0BR | StudioSRO | SROI thought this would be a good candidate for crosstab. After wrestling with the documentation, this is the best I could come up with:SELECT * FROM crosstab('SELECT housing_project_code||''_''||grant_number_code AS project_and_grant,grant_number_code,housing_project_code,unit_type_code,count FROM unit_hold ORDER BY 1,2','SELECT * FROM (SELECT DISTINCT unit_type_code FROM l_unit_type) foo ORDER BY unit_type_code ~ ''^[0-9]'' DESC, unit_type_code') AS ct(project_and_grant varchar, grant_number_code varchar, housing_project_code varchar, "0BR" bigint, "1BR" bigint, "2BR" bigint, "3BR" bigint, "4BR" bigint, "5BR" bigint, "6BR" bigint,"GROUP" bigint, "SRO" bigint, "UNKNOWN" bigint)So here are my questions:1) Is there a simpler way? I'm hoping I made this unnecessarily cumbersome and complicated.2) AFAICT, if a new unit type were to be added, I'd have to rewrite this query. Is there any way to avoid that?3) It seems like everything after the first query, except for the category field, is redundant information, and that in theory you should be able to say crosstab('query','category_field'). Is there any inherent reason this simpler form couldn't work, or is it just that no one has wanted to do it, or gotten to it yet?Thanks in advance!Ken--AGENCY SoftwareA data system that puts you in control100% Free Software(253) 245-3801learn more about AGENCY orfollow the discussion.
On 12/18/2013 05:14 PM, John Abraham wrote: > On Dec 18, 2013, at 3:52 PM, Joe Conway <mail@joeconway.com> wrote: >> 3) Not possible -- reason was given down thread. Column definition >> must be known/determinable by the parser prior to query execution. > > No, wait, I think you misunderstood my idea. Can’t we have a python > function that crosstabs the data in python, then creates a new table, > then inserts that data into the new table by looping through a bunch > of inserts? Oh, well I believe you could do that today in plpgsql (build the "create table as select * from crosstab..." sql as described earlier in this thread, then execute it), so I'm sure you could do it in plpython as well. But this isn't generally what people are looking for. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support