Thread: Inner Join of the same table

Inner Join of the same table

From
Sebastián Baioni
Date:
Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a serious problem:
Table: APORTES - Rows: 9,000,000 (9 million)
*cuiT (char 11)
*cuiL (char 11)
*PERI (char 6)
FAMI (numeric 6)

I need all the cuiLs whose max(PERI) are from a cuiT, and the Max(FAMI) of those cuiLs, so the sentence is:

SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
       FROM APORTES T
       INNER JOIN
       (SELECT cuiL, MAX(PERI) AS MAXPERI,
               MAX(FAMI) AS MAXFAMI

        FROM APORTES
        GROUP BY cuiL) AS U
       ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
WHERE T.cuiT='12345678901'

In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.

Do you know if there is any way to tune the server or optimize this sentence?

Thanks
     Sebastián Baioni

Instrumentos musicalesSebastián Baioni Ofertas náuticas


Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
Probalo ya!

Re: Inner Join of the same table

From
Mark Lewis
Date:
Can you provide an EXPLAIN ANALYZE of the query in PG?  Have you
analyzed the PG database?  How many rows is this query expected to
return?  Which version of PG are you running?  What indexes have you
defined?

-- Mark

On Tue, 2006-08-15 at 14:38 +0000, Sebastián Baioni wrote:
> Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a
> serious problem:
> Table: APORTES - Rows: 9,000,000 (9 million)
> *cuiT (char 11)
> *cuiL (char 11)
> *PERI (char 6)
> FAMI (numeric 6)
>
> I need all the cuiLs whose max(PERI) are from a cuiT, and the Max
> (FAMI) of those cuiLs, so the sentence is:
>
> SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
>        FROM APORTES T
>        INNER JOIN
>        (SELECT cuiL, MAX(PERI) AS MAXPERI,
>                MAX(FAMI) AS MAXFAMI
>         FROM APORTES
>         GROUP BY cuiL) AS U
>        ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
> WHERE T.cuiT='12345678901'
>
> In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts
> 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.
>
> Do you know if there is any way to tune the server or optimize this
> sentence?
>
> Thanks
>      Sebastián Baioni
>
> Instrumentos musicalesSebastián Baioni Ofertas náuticas
>
>
> ______________________________________________________________________
> Preguntá. Respondé. Descubrí.
> Todo lo que querías saber, y lo que ni imaginabas,
> está en Yahoo! Respuestas (Beta).
> Probalo ya!

Re: Inner Join of the same table

From
Sebastián Baioni
Date:
Hi Nark, thanks for your answer.

It's expected to return 1,720 rows (of 80,471 that match with condition WHERE
T.cuiT='12345678901')

We have indexes by :
uesapt000: cuiT, cuiL, PERI;
uesapt001: cuiL, PERI;
uesapt002: cuiT, PERI;

We usually make a vacuum analyze and reindex of every table, and we are running 8.0 and 8.1 for
windows and 7.4 for Linux.

Here is the EXPLAIN:
QUERY PLAN
 1 Unique  (cost=37478647.41..37478650.53 rows=312 width=62)
 2  ->  Sort  (cost=37478647.41..37478648.19 rows=312 width=62)
 3        Sort Key: t.cuiT, t.cuiL, u.maxperi
 4        ->  Merge Join  (cost=128944.78..37478634.48 rows=312 width=62)
 5             Merge Cond: ("outer".cuiL = "inner".cuiL)
 6              Join Filter: (("inner".PERI)::text = "outer".maxperi)
 7              ->  Subquery Scan u  (cost=0.00..37348434.56 rows=3951 width=47)
 8                    ->  GroupAggregate  (cost=0.00..37348395.05 rows=3951 width=25)
 9                          ->  Index Scan using uesapt001 on APORTES  (cost=0.00..37301678.64
rows=9339331 width=25)
10              ->  Sort  (cost=128944.78..129100.44 rows=62263 width=40)
11                    Sort Key: t.cuiL
12                    ->  Index Scan using uesapt002 on APORTES t  (cost=0.00..122643.90
rows=62263 width=40)
13                          Index Cond: (cuiT = '30701965554'::bpchar)

Thanks
     Sebastián Baioni

 --- Mark Lewis <mark.lewis@mir3.com> escribió:

> Can you provide an EXPLAIN ANALYZE of the query in PG?  Have you
> analyzed the PG database?  How many rows is this query expected to
> return?  Which version of PG are you running?  What indexes have you
> defined?
>
> -- Mark
>
> On Tue, 2006-08-15 at 14:38 +0000, Sebastián Baioni wrote:
> > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a
> > serious problem:
> > Table: APORTES - Rows: 9,000,000 (9 million)
> > *cuiT (char 11)
> > *cuiL (char 11)
> > *PERI (char 6)
> > FAMI (numeric 6)
> >
> > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max
> > (FAMI) of those cuiLs, so the sentence is:
> >
> > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
> >        FROM APORTES T
> >        INNER JOIN
> >        (SELECT cuiL, MAX(PERI) AS MAXPERI,
> >                MAX(FAMI) AS MAXFAMI
> >         FROM APORTES
> >         GROUP BY cuiL) AS U
> >        ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
> > WHERE T.cuiT='12345678901'
> >
> > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts
> > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.
> >
> > Do you know if there is any way to tune the server or optimize this
> > sentence?
> >
> > Thanks
> >      Sebastián Baioni





__________________________________________________
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
¡Probalo ya!
http://www.yahoo.com.ar/respuestas


Re: Inner Join of the same table

From
"Jim C. Nasby"
Date:
On Tue, Aug 15, 2006 at 03:43:29PM +0000, Sebasti?n Baioni wrote:
> Hi Nark, thanks for your answer.
>
> It's expected to return 1,720 rows (of 80,471 that match with condition WHERE
> T.cuiT='12345678901')
>
> We have indexes by :
> uesapt000: cuiT, cuiL, PERI;
> uesapt001: cuiL, PERI;
> uesapt002: cuiT, PERI;
>
> We usually make a vacuum analyze and reindex of every table, and we are running 8.0 and 8.1 for
> windows and 7.4 for Linux.
>
> Here is the EXPLAIN:
> QUERY PLAN
>  1 Unique  (cost=37478647.41..37478650.53 rows=312 width=62)
>  2  ->  Sort  (cost=37478647.41..37478648.19 rows=312 width=62)
>  3        Sort Key: t.cuiT, t.cuiL, u.maxperi
>  4        ->  Merge Join  (cost=128944.78..37478634.48 rows=312 width=62)
>  5             Merge Cond: ("outer".cuiL = "inner".cuiL)
>  6              Join Filter: (("inner".PERI)::text = "outer".maxperi)
>  7              ->  Subquery Scan u  (cost=0.00..37348434.56 rows=3951 width=47)
>  8                    ->  GroupAggregate  (cost=0.00..37348395.05 rows=3951 width=25)
>  9                          ->  Index Scan using uesapt001 on APORTES  (cost=0.00..37301678.64
> rows=9339331 width=25)
> 10              ->  Sort  (cost=128944.78..129100.44 rows=62263 width=40)
> 11                    Sort Key: t.cuiL
> 12                    ->  Index Scan using uesapt002 on APORTES t  (cost=0.00..122643.90
> rows=62263 width=40)
> 13                          Index Cond: (cuiT = '30701965554'::bpchar)

That's EXPLAIN, not EXPLAIN ANALYZE, which doesn't help us much. Best
bet would be an EXPLAIN ANALYZE from 8.1.x. It would also be useful to
know how MSSQL is executing this query.

If it would serve your purposes, copying the WHERE clause into the
subquery would really help things. I think it might also mean you could
combine everything into one query.

> Thanks
>      Sebasti?n Baioni
>
>  --- Mark Lewis <mark.lewis@mir3.com> escribi?:
>
> > Can you provide an EXPLAIN ANALYZE of the query in PG?  Have you
> > analyzed the PG database?  How many rows is this query expected to
> > return?  Which version of PG are you running?  What indexes have you
> > defined?
> >
> > -- Mark
> >
> > On Tue, 2006-08-15 at 14:38 +0000, Sebasti?n Baioni wrote:
> > > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a
> > > serious problem:
> > > Table: APORTES - Rows: 9,000,000 (9 million)
> > > *cuiT (char 11)
> > > *cuiL (char 11)
> > > *PERI (char 6)
> > > FAMI (numeric 6)
> > >
> > > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max
> > > (FAMI) of those cuiLs, so the sentence is:
> > >
> > > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
> > >        FROM APORTES T
> > >        INNER JOIN
> > >        (SELECT cuiL, MAX(PERI) AS MAXPERI,
> > >                MAX(FAMI) AS MAXFAMI
> > >         FROM APORTES
> > >         GROUP BY cuiL) AS U
> > >        ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
> > > WHERE T.cuiT='12345678901'
> > >
> > > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts
> > > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.
> > >
> > > Do you know if there is any way to tune the server or optimize this
> > > sentence?
> > >
> > > Thanks
> > >      Sebasti?n Baioni
>
>
>
>
>
> __________________________________________________
> Pregunt?. Respond?. Descubr?.
> Todo lo que quer?as saber, y lo que ni imaginabas,
> est? en Yahoo! Respuestas (Beta).
> ?Probalo ya!
> http://www.yahoo.com.ar/respuestas
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Inner Join of the same table

From
Sebastián Baioni
Date:
Hello Jim, we can't use the Where cuiT='12345678901' in the subquery because we need max(cuiL)
independently of that cuiT:
cuiT cuiL PERI    FAMI
1    a    200608  0
1    a    200601  2
1    b    200607  3
1    c    200605  4
2    a    200605  9
2    c    200604  4
2    b    200608  1
We need:
where cuiT = '1'
cuiT cuiL PERI    FAMI
1    a    200608  9
1    c    200605  4
If we place the Where cuiT = '1' in the subquery we couldn't get the max(FAMI) of cuiL a = 9 and
we couldn't know if that PERI is the max(PERI) of that cuiL independently of that cuiT.

Here is the explain analyze with PG 8.0 for Windows:
Explain Analyze
SELECT DISTINCT T.cuiT,T.cuiL, U.MAXPERI AS  ULT_APORTE_O_DDJJ
        FROM APORTES AS T
        INNER JOIN
        (
        SELECT cuiL, MAX(PERI) AS MAXPERI
        FROM APORTES
        GROUP BY cuiL
        ) AS U ON T.cuiL=U.cuiL AND T.PERI=U.MAXPERI
WHERE T.cuiT='12345678901'
order by T.cuiT, T.cuiL, U.MAXPERI;

QUERY PLAN
 1 Unique  (cost=37478647.41..37478650.53 rows=312 width=62) (actual time=2677209.000..2677520.000
rows=1720 loops=1)
 2  ->  Sort  (cost=37478647.41..37478648.19 rows=312 width=62) (actual
time=2677209.000..2677260.000 rows=3394 loops=1)
 3        Sort Key: t.cuiT, t.cuiL, u.maxperi
 4        ->  Merge Join  (cost=128944.78..37478634.48 rows=312 width=62) (actual
time=74978.000..2677009.000 rows=3394 loops=1)
 5              Merge Cond: ("outer".cuiL = "inner".cuiL)
 6              Join Filter: (("inner".peri)::text = "outer".maxperi)
 7              ->  Subquery Scan u  (cost=0.00..37348434.56 rows=3951 width=47) (actual
time=130.000..2634923.000 rows=254576 loops=1)
 8                    ->  GroupAggregate  (cost=0.00..37348395.05 rows=3951 width=25) (actual
time=130.000..2629617.000 rows=254576 loops=1)
 9                          ->  Index Scan using uesapt001 on APORTES  (cost=0.00..37301678.64
rows=9339331 width=25) (actual time=110.000..2520690.000 rows=9335892 loops=1)
10              ->  Sort  (cost=128944.78..129100.44 rows=62263 width=40) (actual
time=30684.000..36838.000 rows=80471 loops=1)
11                    Sort Key: t.cuiL
12                    ->  Index Scan using uesapt002 on APORTES t  (cost=0.00..122643.90
rows=62263 width=40) (actual time=170.000..25566.000 rows=80471 loops=1)
13                          Index Cond: (cuiT = '12345678901'::bpchar)
Total runtime: 2677640.000 ms

Thanks
     Sebastián Baioni

 --- "Jim C. Nasby" <jnasby@pervasive.com> escribió:

> On Tue, Aug 15, 2006 at 03:43:29PM +0000, Sebasti?n Baioni wrote:
> > Hi Nark, thanks for your answer.
> >
> > It's expected to return 1,720 rows (of 80,471 that match with condition WHERE
> > T.cuiT='12345678901')
> >
> > We have indexes by :
> > uesapt000: cuiT, cuiL, PERI;
> > uesapt001: cuiL, PERI;
> > uesapt002: cuiT, PERI;
> >
> > We usually make a vacuum analyze and reindex of every table, and we are running 8.0 and 8.1
for windows and 7.4 for Linux.
>
> That's EXPLAIN, not EXPLAIN ANALYZE, which doesn't help us much. Best
> bet would be an EXPLAIN ANALYZE from 8.1.x. It would also be useful to
> know how MSSQL is executing this query.
>
> If it would serve your purposes, copying the WHERE clause into the
> subquery would really help things. I think it might also mean you could
> combine everything into one query.
>
> > Thanks
> >      Sebasti?n Baioni
> >
> >  --- Mark Lewis <mark.lewis@mir3.com> escribi?:
> >
> > > Can you provide an EXPLAIN ANALYZE of the query in PG?  Have you
> > > analyzed the PG database?  How many rows is this query expected to
> > > return?  Which version of PG are you running?  What indexes have you
> > > defined?
> > >
> > > -- Mark
> > >
> > > On Tue, 2006-08-15 at 14:38 +0000, Sebasti?n Baioni wrote:
> > > > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a
> > > > serious problem:
> > > > Table: APORTES - Rows: 9,000,000 (9 million)
> > > > *cuiT (char 11)
> > > > *cuiL (char 11)
> > > > *PERI (char 6)
> > > > FAMI (numeric 6)
> > > >
> > > > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max
> > > > (FAMI) of those cuiLs, so the sentence is:
> > > >
> > > > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
> > > >        FROM APORTES T
> > > >        INNER JOIN
> > > >        (SELECT cuiL, MAX(PERI) AS MAXPERI,
> > > >                MAX(FAMI) AS MAXFAMI
> > > >         FROM APORTES
> > > >         GROUP BY cuiL) AS U
> > > >        ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
> > > > WHERE T.cuiT='12345678901'
> > > >
> > > > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts
> > > > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.
> > > >
> > > > Do you know if there is any way to tune the server or optimize this
> > > > sentence?
> > > >
> > > > Thanks
> > > >      Sebastián Baioni





__________________________________________________
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
¡Probalo ya!
http://www.yahoo.com.ar/respuestas


Re: Inner Join of the same table

From
Jim Nasby
Date:
On Aug 15, 2006, at 1:53 PM, Sebastián Baioni wrote:
>  9                          ->  Index Scan using uesapt001 on
> APORTES  (cost=0.00..37301678.64
> rows=9339331 width=25) (actual time=110.000..2520690.000
> rows=9335892 loops=1)

It's taking 2520 seconds to scan an index with 9M rows, which sounds
way, way too slow. I suspect that index got bloated badly at some
point by not vacuuming frequently enough (autovacuum is your friend).
Try reindexing and see if that fixes the problem.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461



Re: Inner Join of the same table

From
Tom Lane
Date:
=?iso-8859-1?q?Sebasti=E1n=20Baioni?= <sebaioni-postgresql@yahoo.com.ar> writes:
>  8                    ->  GroupAggregate  (cost=0.00..37348395.05 rows=3951 width=25) (actual
> time=130.000..2629617.000 rows=254576 loops=1)
>  9                          ->  Index Scan using uesapt001 on APORTES  (cost=0.00..37301678.64
> rows=9339331 width=25) (actual time=110.000..2520690.000 rows=9335892 loops=1)

Given the relatively small estimated number of group rows, I'd have
expected the thing to use a seqscan and HashAggregate for this part.
Do you have enable_hashagg turned off for some reason?  Or enable_seqscan?

            regards, tom lane

Re: Inner Join of the same table

From
Sebastián Baioni
Date:
I had enable_seqscan turned OFF; With enable_seqscan turned ON it takes only 6 minutes to complete
the query and not 44minuts like it did with enable_seqscan turned OFF.  THANKS A LOT!
It's still much more slower than MS SQL server but now it has acceptable times.

     Sebastián Baioni

 --- Tom Lane <tgl@sss.pgh.pa.us> escribió:
> Given the relatively small estimated number of group rows, I'd have
> expected the thing to use a seqscan and HashAggregate for this part.
> Do you have enable_hashagg turned off for some reason?  Or enable_seqscan?
>
>             regards, tom lane
> > Hello Jim, we can't use the Where cuiT='12345678901' in the subquery because we need max(cuiL)
independently of that cuiT:
> > cuiT cuiL PERI    FAMI
> > 1    a    200608  0
> > 1    a    200601  2
> > 1    b    200607  3
> > 1    c    200605  4
> > 2    a    200605  9
> > 2    c    200604  4
> > 2    b    200608  1
> > We need:
> > where cuiT = '1'
> > cuiT cuiL PERI    FAMI
> > 1    a    200608  9
> > 1    c    200605  4
> > If we place the Where cuiT = '1' in the subquery we couldn't get the max(FAMI) of cuiL a = 9
and we couldn't know if that PERI is the max(PERI) of that cuiL independently of that cuiT.
> >
> > Here is the explain analyze with PG 8.0 for Windows:
> > Explain Analyze
> > SELECT DISTINCT T.cuiT,T.cuiL, U.MAXPERI AS  ULT_APORTE_O_DDJJ
> >         FROM APORTES AS T
> >         INNER JOIN
> >         (
> >         SELECT cuiL, MAX(PERI) AS MAXPERI
> >         FROM APORTES
> >         GROUP BY cuiL
> >         ) AS U ON T.cuiL=U.cuiL AND T.PERI=U.MAXPERI
> > WHERE T.cuiT='12345678901'
> > order by T.cuiT, T.cuiL, U.MAXPERI;
> >
> > QUERY PLAN
> > 1 Unique  (cost=37478647.41..37478650.53 rows=312 width=62) (actual
time=2677209.000..2677520.000
> > rows=1720 loops=1)
> >  2  ->  Sort  (cost=37478647.41..37478648.19 rows=312 width=62) (actual
time=2677209.000..2677260.000 rows=3394 loops=1)
> >  3        Sort Key: t.cuiT, t.cuiL, u.maxperi
> >  4        ->  Merge Join  (cost=128944.78..37478634.48 rows=312 width=62) (actual
time=74978.000..2677009.000 rows=3394 loops=1)
> >  5              Merge Cond: ("outer".cuiL = "inner".cuiL)
> >  6              Join Filter: (("inner".peri)::text = "outer".maxperi)
> >  7              ->  Subquery Scan u  (cost=0.00..37348434.56 rows=3951 width=47) (actual
time=130.000..2634923.000 rows=254576 loops=1)
> >  8                    ->  GroupAggregate  (cost=0.00..37348395.05 rows=3951 width=25) (actual
time=130.000..2629617.000 rows=254576 loops=1)
> >  9                          ->  Index Scan using uesapt001 on APORTES  (cost=0.00..37301678.64
rows=9339331 width=25) (actual time=110.000..2520690.000 rows=9335892 loops=1)
> > 10              ->  Sort  (cost=128944.78..129100.44 rows=62263 width=40) (actual
time=30684.000..36838.000 rows=80471 loops=1)
> > 11                    Sort Key: t.cuiL
> > 12                    ->  Index Scan using uesapt002 on APORTES t  (cost=0.00..122643.90
rows=62263 width=40) (actual time=170.000..25566.000 rows=80471 loops=1)
> > 13                          Index Cond: (cuiT = '12345678901'::bpchar)
> > Total runtime: 2677640.000 ms
> >
> > Thanks
> >      Sebastián Baioni





__________________________________________________
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
¡Probalo ya!
http://www.yahoo.com.ar/respuestas