Re: Creating a query to structure results grouped by two columns, pivoting only a third colum - Mailing list pgsql-sql
From | Iuri Sampaio |
---|---|
Subject | Re: Creating a query to structure results grouped by two columns, pivoting only a third colum |
Date | |
Msg-id | 66CD7D72-62D0-4ED1-BA98-3C8B9ED738C1@gmail.com Whole thread Raw |
In response to | Re: Creating a query to structure results grouped by two columns, pivoting only a third colum (Steve Midgley <science@misuse.org>) |
List | pgsql-sql |
Using PySpark API, the solution would be
````
df = df_table1.groupBy("area", "canal").pivot("date").sum("peso")
````
In oracle 11g, it would be sort of
````
SELECT * FROM table1 PIVOT (sum(peso) FOR date IN (
'2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01’
) P
````
Oracle has this strange way of dealing with PIVOT, without even need to group columns.
If I were to write the query, ut would be somehtinglike this (i.e. similar to Oracle but more explicit).
SELECT * FROM (
SELECT
date,
area,
sub_canal,
SUM(peso) AS peso
FROM table1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
) t1
PIVOT (
peso FOR dat_referencia_mes IN ('2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01' )
) p
another thing that I would have done is to replace the static set of dates to a dynamic one. (i.e. SELECT DISTINCT date FROM table1)
However, I’m dealing with syntax errors now, plus once I fix them I’m not sure if that will work ( I mean
On Sep 25, 2021, at 21:53, Steve Midgley <science@misuse.org> wrote:On Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:[FROM]###########################date | area | canal | peso2021-03-01 area1 can1 45.67682021-03-01 area1 can2 54.67682021-03-01 area1 can3 87.62021-03-01 area2 can1 1.872021-03-01 area2 can2 12.76872021-03-01 area2 can3 965.5682021-03-01 area3 can1 968.952021-03-01 area3 can2 1.68672021-03-01 area3 can3 8.897…. … … ...2021-06-01 area1 can12021-06-01 area1 can22021-06-01 area1 can32021-06-01 area2 can1… … ...2021-12-01 area12021-12-01 area12021-12-01 area1###########################[TO]###########################area | canal | 2021-03-01 | 2021-04-01 | 2021-05-01 | 2021-06-01 | ...peso peso peso pesoarea1 can1 45.6768 875.98 1.232 …area1 can2 54.6768 665.8 2.43 ...area1 can3 87.6 65.8 4.76 ...area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...area2 can1 1.87 … ...area2 can2 12.7687area2 can3 965.568area2 total SUM(1.87+12.7687+965.568) … ...area3 can1 968.95area3 can2 1.6867area3 can3 8.897...###########################I think you'd want to use the "crosstab" function in the module "tablefunc" https://www.postgresql.org/docs/current/tablefunc.htmlThis post goes into quite a bit of detail on how to implement this type of query: https://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905Does that get you close enough?SteveOn Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:Hello there,In order to achieve such structure "pivoting" table and "grouping by" multiple columns. (i.e. as illustrated below), what would be the SQL implementation?The source query is:````SELECTt1.date,t1.area,t1.canal,SUM(t1.peso) AS pesoFROM table1 t1GROUP BY 1, 2, 3ORDER BY 1, 2, 3````and source query generates a initial structure as in:###########################date | area | canal | peso2021-03-01 area1 can1 45.67682021-03-01 area1 can2 54.67682021-03-01 area1 can3 87.62021-03-01 area2 can1 1.872021-03-01 area2 can2 12.76872021-03-01 area2 can3 965.5682021-03-01 area3 can1 968.952021-03-01 area3 can2 1.68672021-03-01 area3 can3 8.897…. … … ...2021-06-01 area1 can12021-06-01 area1 can22021-06-01 area1 can32021-06-01 area2 can1… … ...2021-12-01 area12021-12-01 area12021-12-01 area1###########################Then, the goal's to achieve a final structure grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso".Plus, a partial total of each area, named as "total” .I tried to write a query to support such a display, however, I got stuck at pivoting date only to the column peso````SELECTt2.area,t2.canal,( SELECTmonth,peso_valorFROM (SELECTmonth(t1.date) month,t1.area,t1.canal,SUM(t1.peso_valor) AS peso_valorFROM tbl_data t1GROUP BY 1, 2, 3ORDER BY 1, 2, 3) source_table) PIVOT (pesoFOR month IN (1 Jan, 2 Feb, 3 Mar, 4 Apr, 5 Mai, 6 Jun, 7 Jul, 8 Aug, 9 Sep, 10 Oct, 11 Nov, 12 Dec)) as pivot_tableORDER BY month) as t2.pesoFROM tbl_data t2GROUP BY 1, 2````what would be the SQL implementation to achieve a structure as such ?(.i.e. grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso”.)illustrated bellow?Best wishes,I###########################area | canal | 2021-03-01 | 2021-04-01 | 2021-05-01 | 2021-06-01 | ...peso peso peso pesoarea1 can1 45.6768 875.98 1.232 …area1 can2 54.6768 665.8 2.43 ...area1 can3 87.6 65.8 4.76 ...area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...area2 can1 1.87 … ...area2 can2 12.7687area2 can3 965.568area2 total SUM(1.87+12.7687+965.568) … ...area3 can1 968.95area3 can2 1.6867area3 can3 8.897...###########################