Thread: Generating a cross tab (pivot table)
Regarding to Terry's request on multiple aggregates and Shahbaz's request for generating a cross tab ( pivot table ) in September, I've found an excellent example on a german inet page http://www.itrain.de/ I've translated it and think it's useful for many who subscribed (I hope so, maybe it was told before, but I couldn't find anything about this topic in the techdecs). Objective: There is a relation "sales", holding the sales of different products of different vendors. The task is to generate a report which shows the sales of every vendor and every product. Consider the following table populated with some data: CREATE TABLE sales (product TEXT,vendor TEXT,sales INTEGER ); INSERT INTO sales VALUES ( 'milk' , 'mr. pink' , 12 ) ; INSERT INTO sales VALUES ( 'milk' , 'mr. brown' , 8 ) ; INSERT INTO sales VALUES ( 'honey' , 'mr. green' , 2 ) ; INSERT INTO sales VALUES ( 'milk' , 'mr. green' , 34 ) ; INSERT INTO sales VALUES ( 'butter', 'mr. pink' , 17 ) ; INSERT INTO sales VALUES ( 'butter', 'mr. brown' , 2 ) ; INSERT INTO sales VALUES ( 'honey' , 'mr. pink' , 19 ) ; The following query generates the report: SELECT product, SUM(CASE vendor WHEN 'mr. pink' THEN sales ELSE 0 END) AS "mr. pink ", SUM(CASE vendor WHEN 'mr. brown' THEN sales ELSE 0 END) AS "mr. brown", SUM(CASE vendor WHEN 'mr. green' THEN sales ELSE 0 END) AS "mr. green", SUM(sales) AS "sum of sales" FROM sales GROUP BY product ; product | mr. pink | mr. brown | mr. green | sum of sales ---------+-----------+-----------+-----------+--------------butter | 17 | 2 | 0 | 19honey | 19 | 0 | 2 | 21milk | 12 | 8 | 34 | 54 (3 rows) The example is based on MS SQL Server 7.0 and it appears to be there is a valuable feature called CUBE which completes the report. SELECT CASE WHEN GROUPING(product) = 1 THEN 'sum of sales' ELSE product END, SUM(CASE vendor WHEN 'mr. pink' THEN sales ELSE 0 END) AS "mr. pink ", SUM(CASE vendor WHEN 'mr. brown' THEN sales ELSE 0 END) AS "mr. brown", SUM(CASE vendor WHEN 'mr. green' THEN sales ELSE 0 END) AS "mr. green", SUM(sales) AS "sum of sales" FROM sales GROUP BY product WITH CUBE ; product | mr. pink | mr. brown | mr. green | sum of sales --------------+-----------+-----------+-----------+--------------butter | 17 | 2 | 0 | 19honey | 19 | 0 | 2 | 21milk | 12 | 8 | 34 | 54sum of sales | 48 | 10 | 36 | 94 (4 rows) I would like to hear from the core team whether they think this feature is worthy to be implemented, or even better, is there a similar one or an easy workaround already. It's obvious this approach is most inflexible. As soon as there is a new vendor, one has to re-write the query and add SUM(CASE vendor WHEN 'mr. new' THEN ... , In an advanced example it is shown how to deal with cross tabs in general using a stored procedure. I am going to translate this and re-write it for postgres, too (ok, I will try). Regards, Christoph
On Thursday 07 Nov 2002 11:47 am, Christoph Haller wrote: > Regarding to Terry's request on multiple aggregates and > Shahbaz's request for generating a cross tab ( pivot table ) > in September, I've found an excellent example on a german inet page > http://www.itrain.de/ > I've translated it and think it's useful for many who subscribed > (I hope so, maybe it was told before, but I couldn't find > anything about this topic in the techdecs). Very useful. Also note there are some examples of how to produce crosstab results in the table-functions contrib directory in 7.3 -- Richard Huxton
Richard Huxton wrote: > On Thursday 07 Nov 2002 11:47 am, Christoph Haller wrote: > >>Regarding to Terry's request on multiple aggregates and >>Shahbaz's request for generating a cross tab ( pivot table ) >>in September, I've found an excellent example on a german inet page >>http://www.itrain.de/ >>I've translated it and think it's useful for many who subscribed >>(I hope so, maybe it was told before, but I couldn't find >>anything about this topic in the techdecs). > > Very useful. Also note there are some examples of how to produce crosstab > results in the table-functions contrib directory in 7.3 > Just to amplify a bit, in contrib/tablefunc there is a family of functions called crosstabN(), where N is 2, 3 and 4. These are meant as examples -- you could, for example, create a function crosstab5() if you need it. There is also a function called crosstab(), which returns type RECORD and thus requires the column definition to be specified in the query. See contrib/tablefunc/README.tablefunc for more details and examples. These were done as relatively crude examples and therefore have some limitations which may or may not be a problem for you. If people find the functions useful and provide suggestions for improvement in functionality I'll try to upgrade them for 7.4. Thanks, Joe
--- Christoph Haller <ch@rodos.fzk.de> wrote: > It's obvious this approach is most inflexible. > As soon as there is a new vendor, one has to > re-write the query and add > SUM(CASE vendor WHEN 'mr. new' THEN ... , > > In an advanced example it is shown how to deal with > cross tabs in > general > using a stored procedure. I am going to translate > this and re-write it > for postgres, too (ok, I will try). I saw something that might somewhat a bit more flexible solution using SQL. I don't know it it work in PostgreSQL. I saw it at the MySQL site. The following is the URL: http://www.mysql.com/articles/wizard/index.html Has anyone tried this on a PostgreSQL database ? regards, ludwig __________________________________________________ Do you Yahoo!? New DSL Internet Access from SBC & Yahoo! http://sbc.yahoo.com
> I saw something that might somewhat a bit more > flexible solution using SQL. I don't know if it works > in PostgreSQL. I saw it at the MySQL site. > > The following is the URL: > http://www.mysql.com/articles/wizard/index.html > > Has anyone tried this on a PostgreSQL database ? No, not me. But as far as I can tell the SQL statements can quite easily be re-written in PostgreSQL: e. g. mysql> SELECT location, SUM(IF(gender='M',1,0)) AS M, SUM(IF(gender='F',1,0)) AS F -> FROM locations INNER JOIN employees USING (loc_code) GROUP BY location; becomes SELECT location, SUM(CASE WHEN gender='M' THEN 1 ELSE 0 END) AS "M", SUM(CASE WHEN gender='F' THEN 1 ELSE 0 END) AS "F", FROM locations LEFT JOIN employees ON (locations.loc_code=employees.loc_code) GROUP BY location; And this goes for the perl script as well. Regards, Christoph
Christoph, Lud, > > I saw something that might somewhat a bit more > > flexible solution using SQL. I don't know if it works > > in PostgreSQL. I saw it at the MySQL site. > > > > The following is the URL: > > http://www.mysql.com/articles/wizard/index.html > > > > Has anyone tried this on a PostgreSQL database ? Actually, I'm rather annoyed with the article author. He first claims that Joe Celko reccommends expensive add-on software for crosstabs (Joe does not) and then goes on to use one of Joe's own solutions. However, given the author's thouroughness otherwise, I'm sure the innaccuracy is due to some kind of misunderstanding. There are, in fact, 3 simple SQL-based solutions to the crosstab problem. Which one you use depends on the shape of your data. I am not going to cover them in detail here (I'll save that for an article) but to sum up: 1) The SUM(CASE()) statement method, as outlined in the article, which is good for crosstabs expecting small numbers of columns. Or, in the case of this article, good for RDBMS which do not support subselects. 2) The LEFT JOIN + Sub-Select method, which is good for crosstabs with lots of columns but not that many rows in the crosstabbed table. 3) The "crosstab reference grid" method, which is good for large tables and crosstabs with lots of columns, but requires setup and maintainence by trigger. Joe Celko covers these 3 types in "SQL for Smarties". I will write a PostgreSQL implementation in a later article. -Josh Berkus