Home > mailing lists

Re: Creating a query to structure results grouped by two columns, pivoting only a third colum - Mailing list pgsql-sql

From	Iuri Sampaio
Subject	Re: Creating a query to structure results grouped by two columns, pivoting only a third colum
Date	September 27, 2021 17:06:47
Msg-id	66CD7D72-62D0-4ED1-BA98-3C8B9ED738C1@gmail.com Whole thread Raw
In response to	Re: Creating a query to structure results grouped by two columns, pivoting only a third colum (Steve Midgley <science@misuse.org>)
List	pgsql-sql

Tree view

Using PySpark API, the solution would be

````

df = df_table1.groupBy("area", "canal").pivot("date").sum("peso")

````

In oracle 11g, it would be sort of

````

SELECT * FROM table1 PIVOT (sum(peso) FOR date IN (

'2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01’

) P

````

Oracle has this strange way of dealing with PIVOT, without even need to group columns.

If I were to write the query, ut would be somehtinglike this (i.e. similar to Oracle but more explicit).

SELECT * FROM (

SELECT

date,

area,

sub_canal,

SUM(peso) AS peso

FROM table1

GROUP BY 1, 2, 3

ORDER BY 1, 2, 3

) t1

PIVOT (

peso FOR dat_referencia_mes IN ('2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01', '2021-0-01' )

) p

another thing that I would have done is to replace the static set of dates to a dynamic one. (i.e. SELECT DISTINCT date FROM table1)

However, I’m dealing with syntax errors now, plus once I fix them I’m not sure if that will work ( I mean

On Sep 25, 2021, at 21:53, Steve Midgley <science@misuse.org> wrote:

On Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
[FROM]
###########################
date | area | canal | peso
2021-03-01 area1 can1 45.6768
2021-03-01 area1 can2 54.6768
2021-03-01 area1 can3 87.6
2021-03-01 area2 can1 1.87
2021-03-01 area2 can2 12.7687
2021-03-01 area2 can3 965.568
2021-03-01 area3 can1 968.95
2021-03-01 area3 can2 1.6867
2021-03-01 area3 can3 8.897
…. … … ...
2021-06-01 area1 can1
2021-06-01 area1 can2
2021-06-01 area1 can3
2021-06-01 area2 can1
… … ...

2021-12-01 area1
2021-12-01 area1
2021-12-01 area1
###########################

[TO]

###########################
area | canal | 2021-03-01 | 2021-04-01 | 2021-05-01 | 2021-06-01 | ...
peso peso peso peso

area1 can1 45.6768 875.98 1.232 …
area1 can2 54.6768 665.8 2.43 ...
area1 can3 87.6 65.8 4.76 ...
area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...

area2 can1 1.87 … ...
area2 can2 12.7687
area2 can3 965.568
area2 total SUM(1.87+12.7687+965.568) … ...

area3 can1 968.95
area3 can2 1.6867
area3 can3 8.897

...
###########################

I think you'd want to use the "crosstab" function in the module "tablefunc" https://www.postgresql.org/docs/current/tablefunc.html

This post goes into quite a bit of detail on how to implement this type of query: https://stackoverflow.com/questions/3002499/postgresql-crosstab-query/11751905#11751905

Does that get you close enough?
Steve

On Sat, Sep 25, 2021 at 9:49 AM Iuri Sampaio <iuri.sampaio@gmail.com> wrote:
Hello there,
In order to achieve such structure "pivoting" table and "grouping by" multiple columns. (i.e. as illustrated below), what would be the SQL implementation?
The source query is:

````
SELECT
t1.date,
t1.area,
t1.canal,
SUM(t1.peso) AS peso
FROM table1 t1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3

````

and source query generates a initial structure as in:

###########################
date | area | canal | peso
2021-03-01 area1 can1 45.6768
2021-03-01 area1 can2 54.6768
2021-03-01 area1 can3 87.6
2021-03-01 area2 can1 1.87
2021-03-01 area2 can2 12.7687
2021-03-01 area2 can3 965.568
2021-03-01 area3 can1 968.95
2021-03-01 area3 can2 1.6867
2021-03-01 area3 can3 8.897
…. … … ...
2021-06-01 area1 can1
2021-06-01 area1 can2
2021-06-01 area1 can3
2021-06-01 area2 can1
… … ...

2021-12-01 area1
2021-12-01 area1
2021-12-01 area1
###########################

Then, the goal's to achieve a final structure grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso".
Plus, a partial total of each area, named as "total” .

I tried to write a query to support such a display, however, I got stuck at pivoting date only to the column peso

````
SELECT
t2.area,
t2.canal,
( SELECT
month,
peso_valor
FROM (
SELECT
month(t1.date) month,
t1.area,
t1.canal,
SUM(t1.peso_valor) AS peso_valor
FROM tbl_data t1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
) source_table
) PIVOT (
peso
FOR month IN (
1 Jan, 2 Feb, 3 Mar, 4 Apr, 5 Mai, 6 Jun, 7 Jul, 8 Aug, 9 Sep, 10 Oct, 11 Nov, 12 Dec
)
) as pivot_table
ORDER BY month
) as t2.peso
FROM tbl_data t2
GROUP BY 1, 2
````

what would be the SQL implementation to achieve a structure as such ?
(.i.e. grouped by columns "area" and "canal", pivoting column “date", but only to the column "peso”.)

illustrated bellow?

Best wishes,
I

###########################
area | canal | 2021-03-01 | 2021-04-01 | 2021-05-01 | 2021-06-01 | ...
peso peso peso peso

area1 can1 45.6768 875.98 1.232 …
area1 can2 54.6768 665.8 2.43 ...
area1 can3 87.6 65.8 4.76 ...
area1 total SUM(45.6768+54.6768+87.6) SUM(875.98+665.8+65.8) SUM(1.232+2.43+4.76) ...

area2 can1 1.87 … ...
area2 can2 12.7687
area2 can3 965.568
area2 total SUM(1.87+12.7687+965.568) … ...

area3 can1 968.95
area3 can2 1.6867
area3 can3 8.897

...
###########################

pgsql-sql by date:

From: Steve Midgley
Date: 26 September 2021, 03:53:39
Subject: Re: Creating a query to structure results grouped by two columns, pivoting only a third colum

From: Shaozhong SHI
Date: 03 October 2021, 10:30:40
Subject: PostgreSQL CHECK Constraint

Re: Creating a query to structure results grouped by two columns, pivoting only a third colum - Mailing list pgsql-sql

Previous

Next