Re: Creating a query to structure results grouped by two columns, pivoting only a third colum - Mailing list pgsql-sql

From Iuri Sampaio
Subject Re: Creating a query to structure results grouped by two columns, pivoting only a third colum
Date
Msg-id 66CD7D72-62D0-4ED1-BA98-3C8B9ED738C1@gmail.com
Whole thread Raw
In response to Re: Creating a query to structure results grouped by two columns, pivoting only a third colum  (Steve Midgley <science@misuse.org>)
List pgsql-sql
Using PySpark API, the solution would be 

````
df = df_table1.groupBy("area", "canal").pivot("date").sum("peso")
````

In oracle 11g,  it would be sort of 

````
SELECT * FROM table1 PIVOT (sum(peso) FOR date IN (
   '2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01’ 
 ) P
````

Oracle has this strange way of dealing with PIVOT, without even need to group columns. 


If I were to write the query, ut would be somehtinglike this (i.e. similar to Oracle but more explicit).

SELECT * FROM (
  SELECT 
    date, 
    area, 
    sub_canal,
    SUM(peso) AS peso
  FROM table1
  GROUP BY 1, 2, 3
  ORDER BY 1, 2, 3 
) t1 
PIVOT (
  peso FOR dat_referencia_mes IN ('2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01' )
) p



another thing that I would have done is to replace the static set of dates to a dynamic one. (i.e. SELECT DISTINCT date FROM table1)



However, I’m dealing with syntax errors now, plus once  I fix them I’m not sure if that will work ( I mean  



On Sep 25, 2021, at 21:53, Steve Midgley <science@misuse.org> wrote:



On Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
 [FROM] 
###########################
date area  | canal | peso
2021-03-01 area1 can1 45.6768
2021-03-01 area1 can2 54.6768
2021-03-01 area1 can3 87.6
2021-03-01 area2 can1 1.87
2021-03-01 area2 can2 12.7687
2021-03-01 area2 can3 965.568
2021-03-01 area3 can1 968.95
2021-03-01 area3 can2 1.6867
2021-03-01 area3 can3 8.897
. ...
2021-06-01 area1 can1
2021-06-01 area1 can2
2021-06-01 area1 can3
2021-06-01 area2 can1
...

2021-12-01 area1
2021-12-01 area1
2021-12-01 area1
###########################


[TO] 

###########################
area  canal  2021-03-01  |  2021-04-01 2021-05-01 | 2021-06-01 | ...
peso peso peso peso

area1 can1 45.6768 875.98 1.232
area1 can2 54.6768 665.8 2.43 ...
area1 can3 87.6 65.8 4.76 ...
area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...

area2 can1 1.87 ...
area2 can2 12.7687
area2 can3 965.568
area2 total SUM(1.87+12.7687+965.568) ...


area3 can1 968.95
area3 can2 1.6867
area3 can3 8.897

...
###########################


I think you'd want to use the "crosstab" function in the module "tablefunc" https://www.postgresql.org/docs/current/tablefunc.html

This post goes into quite a bit of detail on how to implement this type of query: https://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905

Does that get you close enough?
Steve 

On Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Hello there,
In order to achieve such structure "pivoting" table and "grouping by" multiple columns. (i.e. as illustrated below), what would be the SQL implementation?
The source query is:

````
SELECT 
  t1.date, 
  t1.area, 
  t1.canal,
  SUM(t1.peso) AS peso
FROM table1 t1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3 

````

and source query generates a initial structure as in:

###########################
date area  | canal | peso
2021-03-01 area1 can1 45.6768
2021-03-01 area1 can2 54.6768
2021-03-01 area1 can3 87.6
2021-03-01 area2 can1 1.87
2021-03-01 area2 can2 12.7687
2021-03-01 area2 can3 965.568
2021-03-01 area3 can1 968.95
2021-03-01 area3 can2 1.6867
2021-03-01 area3 can3 8.897
. ...
2021-06-01 area1 can1
2021-06-01 area1 can2
2021-06-01 area1 can3
2021-06-01 area2 can1
...

2021-12-01 area1
2021-12-01 area1
2021-12-01 area1
###########################




Then, the goal's to achieve a final structure grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso".
Plus, a partial total of each area, named as "total” .

I tried to write a query to support such a display, however, I got stuck at pivoting date only to the column peso

````
SELECT  
  t2.area,
  t2.canal,
  ( SELECT 
month,
peso_valor
FROM (
    SELECT 
      month(t1.date) month, 
      t1.area, 
      t1.canal,
      SUM(t1.peso_valor) AS peso_valor
    FROM tbl_data t1
    GROUP BY 1, 2, 3
    ORDER BY 1, 2, 3
) source_table 
) PIVOT (  
  peso
  FOR month IN (
    1 Jan, 2 Feb, 3 Mar, 4 Apr, 5 Mai, 6 Jun, 7 Jul, 8 Aug, 9 Sep, 10 Oct, 11 Nov, 12 Dec
  )  
) as pivot_table
ORDER BY month
) as t2.peso 
FROM tbl_data t2
GROUP BY 1, 2
````


what would be the SQL implementation to achieve a structure as such  ?  
(.i.e.  grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso”.)

 illustrated bellow?


Best wishes,
I








###########################
area  canal  2021-03-01  |  2021-04-01 2021-05-01 | 2021-06-01 | ...
peso peso peso peso

area1 can1 45.6768 875.98 1.232
area1 can2 54.6768 665.8 2.43 ...
area1 can3 87.6 65.8 4.76 ...
area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...

area2 can1 1.87 ...
area2 can2 12.7687
area2 can3 965.568
area2 total SUM(1.87+12.7687+965.568) ...


area3 can1 968.95
area3 can2 1.6867
area3 can3 8.897

...
###########################



pgsql-sql by date:

Previous
From: Steve Midgley
Date:
Subject: Re: Creating a query to structure results grouped by two columns, pivoting only a third colum
Next
From: Shaozhong SHI
Date:
Subject: PostgreSQL CHECK Constraint