Thread: Inner Join of the same table
Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a serious problem:
Table: APORTES - Rows: 9,000,000 (9 million)
*cuiT (char 11)
*cuiL (char 11)
*PERI (char 6)
FAMI (numeric 6)
I need all the cuiLs whose max(PERI) are from a cuiT, and the Max(FAMI) of those cuiLs, so the sentence is:
SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
FROM APORTES T
INNER JOIN
(SELECT cuiL, MAX(PERI) AS MAXPERI,
MAX(FAMI) AS MAXFAMI
FROM APORTES
GROUP BY cuiL) AS U
ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
WHERE T.cuiT='12345678901'
In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.
Do you know if there is any way to tune the server or optimize this sentence?
Thanks
Sebastián Baioni
Sebastián Baioni
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
Probalo ya!
Table: APORTES - Rows: 9,000,000 (9 million)
*cuiT (char 11)
*cuiL (char 11)
*PERI (char 6)
FAMI (numeric 6)
I need all the cuiLs whose max(PERI) are from a cuiT, and the Max(FAMI) of those cuiLs, so the sentence is:
SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
FROM APORTES T
INNER JOIN
(SELECT cuiL, MAX(PERI) AS MAXPERI,
MAX(FAMI) AS MAXFAMI
FROM APORTES
GROUP BY cuiL) AS U
ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
WHERE T.cuiT='12345678901'
In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.
Do you know if there is any way to tune the server or optimize this sentence?
Thanks
Sebastián Baioni
Sebastián Baioni
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
Probalo ya!
Can you provide an EXPLAIN ANALYZE of the query in PG? Have you analyzed the PG database? How many rows is this query expected to return? Which version of PG are you running? What indexes have you defined? -- Mark On Tue, 2006-08-15 at 14:38 +0000, Sebastián Baioni wrote: > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a > serious problem: > Table: APORTES - Rows: 9,000,000 (9 million) > *cuiT (char 11) > *cuiL (char 11) > *PERI (char 6) > FAMI (numeric 6) > > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max > (FAMI) of those cuiLs, so the sentence is: > > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI > FROM APORTES T > INNER JOIN > (SELECT cuiL, MAX(PERI) AS MAXPERI, > MAX(FAMI) AS MAXFAMI > FROM APORTES > GROUP BY cuiL) AS U > ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI > WHERE T.cuiT='12345678901' > > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts. > > Do you know if there is any way to tune the server or optimize this > sentence? > > Thanks > Sebastián Baioni > > Instrumentos musicalesSebastián Baioni Ofertas náuticas > > > ______________________________________________________________________ > Preguntá. Respondé. Descubrí. > Todo lo que querías saber, y lo que ni imaginabas, > está en Yahoo! Respuestas (Beta). > Probalo ya!
Hi Nark, thanks for your answer. It's expected to return 1,720 rows (of 80,471 that match with condition WHERE T.cuiT='12345678901') We have indexes by : uesapt000: cuiT, cuiL, PERI; uesapt001: cuiL, PERI; uesapt002: cuiT, PERI; We usually make a vacuum analyze and reindex of every table, and we are running 8.0 and 8.1 for windows and 7.4 for Linux. Here is the EXPLAIN: QUERY PLAN 1 Unique (cost=37478647.41..37478650.53 rows=312 width=62) 2 -> Sort (cost=37478647.41..37478648.19 rows=312 width=62) 3 Sort Key: t.cuiT, t.cuiL, u.maxperi 4 -> Merge Join (cost=128944.78..37478634.48 rows=312 width=62) 5 Merge Cond: ("outer".cuiL = "inner".cuiL) 6 Join Filter: (("inner".PERI)::text = "outer".maxperi) 7 -> Subquery Scan u (cost=0.00..37348434.56 rows=3951 width=47) 8 -> GroupAggregate (cost=0.00..37348395.05 rows=3951 width=25) 9 -> Index Scan using uesapt001 on APORTES (cost=0.00..37301678.64 rows=9339331 width=25) 10 -> Sort (cost=128944.78..129100.44 rows=62263 width=40) 11 Sort Key: t.cuiL 12 -> Index Scan using uesapt002 on APORTES t (cost=0.00..122643.90 rows=62263 width=40) 13 Index Cond: (cuiT = '30701965554'::bpchar) Thanks Sebastián Baioni --- Mark Lewis <mark.lewis@mir3.com> escribió: > Can you provide an EXPLAIN ANALYZE of the query in PG? Have you > analyzed the PG database? How many rows is this query expected to > return? Which version of PG are you running? What indexes have you > defined? > > -- Mark > > On Tue, 2006-08-15 at 14:38 +0000, Sebastián Baioni wrote: > > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a > > serious problem: > > Table: APORTES - Rows: 9,000,000 (9 million) > > *cuiT (char 11) > > *cuiL (char 11) > > *PERI (char 6) > > FAMI (numeric 6) > > > > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max > > (FAMI) of those cuiLs, so the sentence is: > > > > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI > > FROM APORTES T > > INNER JOIN > > (SELECT cuiL, MAX(PERI) AS MAXPERI, > > MAX(FAMI) AS MAXFAMI > > FROM APORTES > > GROUP BY cuiL) AS U > > ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI > > WHERE T.cuiT='12345678901' > > > > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts > > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts. > > > > Do you know if there is any way to tune the server or optimize this > > sentence? > > > > Thanks > > Sebastián Baioni __________________________________________________ Preguntá. Respondé. Descubrí. Todo lo que querías saber, y lo que ni imaginabas, está en Yahoo! Respuestas (Beta). ¡Probalo ya! http://www.yahoo.com.ar/respuestas
On Tue, Aug 15, 2006 at 03:43:29PM +0000, Sebasti?n Baioni wrote: > Hi Nark, thanks for your answer. > > It's expected to return 1,720 rows (of 80,471 that match with condition WHERE > T.cuiT='12345678901') > > We have indexes by : > uesapt000: cuiT, cuiL, PERI; > uesapt001: cuiL, PERI; > uesapt002: cuiT, PERI; > > We usually make a vacuum analyze and reindex of every table, and we are running 8.0 and 8.1 for > windows and 7.4 for Linux. > > Here is the EXPLAIN: > QUERY PLAN > 1 Unique (cost=37478647.41..37478650.53 rows=312 width=62) > 2 -> Sort (cost=37478647.41..37478648.19 rows=312 width=62) > 3 Sort Key: t.cuiT, t.cuiL, u.maxperi > 4 -> Merge Join (cost=128944.78..37478634.48 rows=312 width=62) > 5 Merge Cond: ("outer".cuiL = "inner".cuiL) > 6 Join Filter: (("inner".PERI)::text = "outer".maxperi) > 7 -> Subquery Scan u (cost=0.00..37348434.56 rows=3951 width=47) > 8 -> GroupAggregate (cost=0.00..37348395.05 rows=3951 width=25) > 9 -> Index Scan using uesapt001 on APORTES (cost=0.00..37301678.64 > rows=9339331 width=25) > 10 -> Sort (cost=128944.78..129100.44 rows=62263 width=40) > 11 Sort Key: t.cuiL > 12 -> Index Scan using uesapt002 on APORTES t (cost=0.00..122643.90 > rows=62263 width=40) > 13 Index Cond: (cuiT = '30701965554'::bpchar) That's EXPLAIN, not EXPLAIN ANALYZE, which doesn't help us much. Best bet would be an EXPLAIN ANALYZE from 8.1.x. It would also be useful to know how MSSQL is executing this query. If it would serve your purposes, copying the WHERE clause into the subquery would really help things. I think it might also mean you could combine everything into one query. > Thanks > Sebasti?n Baioni > > --- Mark Lewis <mark.lewis@mir3.com> escribi?: > > > Can you provide an EXPLAIN ANALYZE of the query in PG? Have you > > analyzed the PG database? How many rows is this query expected to > > return? Which version of PG are you running? What indexes have you > > defined? > > > > -- Mark > > > > On Tue, 2006-08-15 at 14:38 +0000, Sebasti?n Baioni wrote: > > > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a > > > serious problem: > > > Table: APORTES - Rows: 9,000,000 (9 million) > > > *cuiT (char 11) > > > *cuiL (char 11) > > > *PERI (char 6) > > > FAMI (numeric 6) > > > > > > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max > > > (FAMI) of those cuiLs, so the sentence is: > > > > > > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI > > > FROM APORTES T > > > INNER JOIN > > > (SELECT cuiL, MAX(PERI) AS MAXPERI, > > > MAX(FAMI) AS MAXFAMI > > > FROM APORTES > > > GROUP BY cuiL) AS U > > > ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI > > > WHERE T.cuiT='12345678901' > > > > > > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts > > > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts. > > > > > > Do you know if there is any way to tune the server or optimize this > > > sentence? > > > > > > Thanks > > > Sebasti?n Baioni > > > > > > __________________________________________________ > Pregunt?. Respond?. Descubr?. > Todo lo que quer?as saber, y lo que ni imaginabas, > est? en Yahoo! Respuestas (Beta). > ?Probalo ya! > http://www.yahoo.com.ar/respuestas > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Hello Jim, we can't use the Where cuiT='12345678901' in the subquery because we need max(cuiL) independently of that cuiT: cuiT cuiL PERI FAMI 1 a 200608 0 1 a 200601 2 1 b 200607 3 1 c 200605 4 2 a 200605 9 2 c 200604 4 2 b 200608 1 We need: where cuiT = '1' cuiT cuiL PERI FAMI 1 a 200608 9 1 c 200605 4 If we place the Where cuiT = '1' in the subquery we couldn't get the max(FAMI) of cuiL a = 9 and we couldn't know if that PERI is the max(PERI) of that cuiL independently of that cuiT. Here is the explain analyze with PG 8.0 for Windows: Explain Analyze SELECT DISTINCT T.cuiT,T.cuiL, U.MAXPERI AS ULT_APORTE_O_DDJJ FROM APORTES AS T INNER JOIN ( SELECT cuiL, MAX(PERI) AS MAXPERI FROM APORTES GROUP BY cuiL ) AS U ON T.cuiL=U.cuiL AND T.PERI=U.MAXPERI WHERE T.cuiT='12345678901' order by T.cuiT, T.cuiL, U.MAXPERI; QUERY PLAN 1 Unique (cost=37478647.41..37478650.53 rows=312 width=62) (actual time=2677209.000..2677520.000 rows=1720 loops=1) 2 -> Sort (cost=37478647.41..37478648.19 rows=312 width=62) (actual time=2677209.000..2677260.000 rows=3394 loops=1) 3 Sort Key: t.cuiT, t.cuiL, u.maxperi 4 -> Merge Join (cost=128944.78..37478634.48 rows=312 width=62) (actual time=74978.000..2677009.000 rows=3394 loops=1) 5 Merge Cond: ("outer".cuiL = "inner".cuiL) 6 Join Filter: (("inner".peri)::text = "outer".maxperi) 7 -> Subquery Scan u (cost=0.00..37348434.56 rows=3951 width=47) (actual time=130.000..2634923.000 rows=254576 loops=1) 8 -> GroupAggregate (cost=0.00..37348395.05 rows=3951 width=25) (actual time=130.000..2629617.000 rows=254576 loops=1) 9 -> Index Scan using uesapt001 on APORTES (cost=0.00..37301678.64 rows=9339331 width=25) (actual time=110.000..2520690.000 rows=9335892 loops=1) 10 -> Sort (cost=128944.78..129100.44 rows=62263 width=40) (actual time=30684.000..36838.000 rows=80471 loops=1) 11 Sort Key: t.cuiL 12 -> Index Scan using uesapt002 on APORTES t (cost=0.00..122643.90 rows=62263 width=40) (actual time=170.000..25566.000 rows=80471 loops=1) 13 Index Cond: (cuiT = '12345678901'::bpchar) Total runtime: 2677640.000 ms Thanks Sebastián Baioni --- "Jim C. Nasby" <jnasby@pervasive.com> escribió: > On Tue, Aug 15, 2006 at 03:43:29PM +0000, Sebasti?n Baioni wrote: > > Hi Nark, thanks for your answer. > > > > It's expected to return 1,720 rows (of 80,471 that match with condition WHERE > > T.cuiT='12345678901') > > > > We have indexes by : > > uesapt000: cuiT, cuiL, PERI; > > uesapt001: cuiL, PERI; > > uesapt002: cuiT, PERI; > > > > We usually make a vacuum analyze and reindex of every table, and we are running 8.0 and 8.1 for windows and 7.4 for Linux. > > That's EXPLAIN, not EXPLAIN ANALYZE, which doesn't help us much. Best > bet would be an EXPLAIN ANALYZE from 8.1.x. It would also be useful to > know how MSSQL is executing this query. > > If it would serve your purposes, copying the WHERE clause into the > subquery would really help things. I think it might also mean you could > combine everything into one query. > > > Thanks > > Sebasti?n Baioni > > > > --- Mark Lewis <mark.lewis@mir3.com> escribi?: > > > > > Can you provide an EXPLAIN ANALYZE of the query in PG? Have you > > > analyzed the PG database? How many rows is this query expected to > > > return? Which version of PG are you running? What indexes have you > > > defined? > > > > > > -- Mark > > > > > > On Tue, 2006-08-15 at 14:38 +0000, Sebasti?n Baioni wrote: > > > > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a > > > > serious problem: > > > > Table: APORTES - Rows: 9,000,000 (9 million) > > > > *cuiT (char 11) > > > > *cuiL (char 11) > > > > *PERI (char 6) > > > > FAMI (numeric 6) > > > > > > > > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max > > > > (FAMI) of those cuiLs, so the sentence is: > > > > > > > > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI > > > > FROM APORTES T > > > > INNER JOIN > > > > (SELECT cuiL, MAX(PERI) AS MAXPERI, > > > > MAX(FAMI) AS MAXFAMI > > > > FROM APORTES > > > > GROUP BY cuiL) AS U > > > > ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI > > > > WHERE T.cuiT='12345678901' > > > > > > > > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts > > > > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts. > > > > > > > > Do you know if there is any way to tune the server or optimize this > > > > sentence? > > > > > > > > Thanks > > > > Sebastián Baioni __________________________________________________ Preguntá. Respondé. Descubrí. Todo lo que querías saber, y lo que ni imaginabas, está en Yahoo! Respuestas (Beta). ¡Probalo ya! http://www.yahoo.com.ar/respuestas
On Aug 15, 2006, at 1:53 PM, Sebastián Baioni wrote: > 9 -> Index Scan using uesapt001 on > APORTES (cost=0.00..37301678.64 > rows=9339331 width=25) (actual time=110.000..2520690.000 > rows=9335892 loops=1) It's taking 2520 seconds to scan an index with 9M rows, which sounds way, way too slow. I suspect that index got bloated badly at some point by not vacuuming frequently enough (autovacuum is your friend). Try reindexing and see if that fixes the problem. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
=?iso-8859-1?q?Sebasti=E1n=20Baioni?= <sebaioni-postgresql@yahoo.com.ar> writes: > 8 -> GroupAggregate (cost=0.00..37348395.05 rows=3951 width=25) (actual > time=130.000..2629617.000 rows=254576 loops=1) > 9 -> Index Scan using uesapt001 on APORTES (cost=0.00..37301678.64 > rows=9339331 width=25) (actual time=110.000..2520690.000 rows=9335892 loops=1) Given the relatively small estimated number of group rows, I'd have expected the thing to use a seqscan and HashAggregate for this part. Do you have enable_hashagg turned off for some reason? Or enable_seqscan? regards, tom lane
I had enable_seqscan turned OFF; With enable_seqscan turned ON it takes only 6 minutes to complete the query and not 44minuts like it did with enable_seqscan turned OFF. THANKS A LOT! It's still much more slower than MS SQL server but now it has acceptable times. Sebastián Baioni --- Tom Lane <tgl@sss.pgh.pa.us> escribió: > Given the relatively small estimated number of group rows, I'd have > expected the thing to use a seqscan and HashAggregate for this part. > Do you have enable_hashagg turned off for some reason? Or enable_seqscan? > > regards, tom lane > > Hello Jim, we can't use the Where cuiT='12345678901' in the subquery because we need max(cuiL) independently of that cuiT: > > cuiT cuiL PERI FAMI > > 1 a 200608 0 > > 1 a 200601 2 > > 1 b 200607 3 > > 1 c 200605 4 > > 2 a 200605 9 > > 2 c 200604 4 > > 2 b 200608 1 > > We need: > > where cuiT = '1' > > cuiT cuiL PERI FAMI > > 1 a 200608 9 > > 1 c 200605 4 > > If we place the Where cuiT = '1' in the subquery we couldn't get the max(FAMI) of cuiL a = 9 and we couldn't know if that PERI is the max(PERI) of that cuiL independently of that cuiT. > > > > Here is the explain analyze with PG 8.0 for Windows: > > Explain Analyze > > SELECT DISTINCT T.cuiT,T.cuiL, U.MAXPERI AS ULT_APORTE_O_DDJJ > > FROM APORTES AS T > > INNER JOIN > > ( > > SELECT cuiL, MAX(PERI) AS MAXPERI > > FROM APORTES > > GROUP BY cuiL > > ) AS U ON T.cuiL=U.cuiL AND T.PERI=U.MAXPERI > > WHERE T.cuiT='12345678901' > > order by T.cuiT, T.cuiL, U.MAXPERI; > > > > QUERY PLAN > > 1 Unique (cost=37478647.41..37478650.53 rows=312 width=62) (actual time=2677209.000..2677520.000 > > rows=1720 loops=1) > > 2 -> Sort (cost=37478647.41..37478648.19 rows=312 width=62) (actual time=2677209.000..2677260.000 rows=3394 loops=1) > > 3 Sort Key: t.cuiT, t.cuiL, u.maxperi > > 4 -> Merge Join (cost=128944.78..37478634.48 rows=312 width=62) (actual time=74978.000..2677009.000 rows=3394 loops=1) > > 5 Merge Cond: ("outer".cuiL = "inner".cuiL) > > 6 Join Filter: (("inner".peri)::text = "outer".maxperi) > > 7 -> Subquery Scan u (cost=0.00..37348434.56 rows=3951 width=47) (actual time=130.000..2634923.000 rows=254576 loops=1) > > 8 -> GroupAggregate (cost=0.00..37348395.05 rows=3951 width=25) (actual time=130.000..2629617.000 rows=254576 loops=1) > > 9 -> Index Scan using uesapt001 on APORTES (cost=0.00..37301678.64 rows=9339331 width=25) (actual time=110.000..2520690.000 rows=9335892 loops=1) > > 10 -> Sort (cost=128944.78..129100.44 rows=62263 width=40) (actual time=30684.000..36838.000 rows=80471 loops=1) > > 11 Sort Key: t.cuiL > > 12 -> Index Scan using uesapt002 on APORTES t (cost=0.00..122643.90 rows=62263 width=40) (actual time=170.000..25566.000 rows=80471 loops=1) > > 13 Index Cond: (cuiT = '12345678901'::bpchar) > > Total runtime: 2677640.000 ms > > > > Thanks > > Sebastián Baioni __________________________________________________ Preguntá. Respondé. Descubrí. Todo lo que querías saber, y lo que ni imaginabas, está en Yahoo! Respuestas (Beta). ¡Probalo ya! http://www.yahoo.com.ar/respuestas