Thread: Improving Inner Join Performance

Improving Inner Join Performance

From
"Andy"
Date:
Hi to all,
 
I have the following query:
 
SELECT count(*) FROM orders o
      INNER JOIN report r ON r.id_order=o.id
      WHERE o.id_status>3
 
Explaing analyze:
Aggregate  (cost=8941.82..8941.82 rows=1 width=0) (actual time=1003.297..1003.298 rows=1 loops=1)
  ->  Hash Join  (cost=3946.28..8881.72 rows=24041 width=0) (actual time=211.985..951.545 rows=72121 loops=1)
        Hash Cond: ("outer".id_order = "inner".id)
        ->  Seq Scan on report r  (cost=0.00..2952.21 rows=72121 width=4) (actual time=0.005..73.869 rows=72121 loops=1)
        ->  Hash  (cost=3787.57..3787.57 rows=24682 width=4) (actual time=211.855..211.855 rows=0 loops=1)
              ->  Seq Scan on orders o  (cost=0.00..3787.57 rows=24682 width=4) (actual time=0.047..147.170 rows=72121 loops=1)
                    Filter: (id_status > 3)
Total runtime: 1003.671 ms
 
 
I could use it in the following format, because I have to the moment only the 4,6 values for the id_status.
 
SELECT count(*) FROM orders o
      INNER JOIN report r ON r.id_order=o.id
      WHERE o.id_status IN (4,6)
 
Explain analyze:
Aggregate  (cost=5430.04..5430.04 rows=1 width=0) (actual time=1472.877..1472.877 rows=1 loops=1)
  ->  Hash Join  (cost=2108.22..5428.23 rows=720 width=0) (actual time=342.080..1419.775 rows=72121 loops=1)
        Hash Cond: ("outer".id_order = "inner".id)
        ->  Seq Scan on report r  (cost=0.00..2952.21 rows=72121 width=4) (actual time=0.036..106.217 rows=72121 loops=1)
        ->  Hash  (cost=2106.37..2106.37 rows=739 width=4) (actual time=342.011..342.011 rows=0 loops=1)
              ->  Index Scan using orders_id_status_idx, orders_id_status_idx on orders o  (cost=0.00..2106.37 rows=739 width=4) (actual time=0.131..268.397 rows=72121 loops=1)
                    Index Cond: ((id_status = 4) OR (id_status = 6))
Total runtime: 1474.356 ms
How can I improve this query's performace?? The ideea is to count all the values that I have in the database for the following conditions. If the users puts in some other search fields on the where then the query runs faster but in this format sometimes it takes a lot lot of time(sometimes even 2,3 seconds).
 
Can this be tuned somehow???
 
Regards,
Andy.
 
 

Re: Improving Inner Join Performance

From
Frank Wiles
Date:
On Thu, 5 Jan 2006 17:16:47 +0200
"Andy" <frum@ar-sd.net> wrote:

> Hi to all,
>
> I have the following query:
>
> SELECT count(*) FROM orders o
>       INNER JOIN report r ON r.id_order=o.id
>       WHERE o.id_status>3

> How can I improve this query's performace?? The ideea is to count all
> the values that I have in the database for the following conditions.
> If the users puts in some other search fields on the where then the
> query runs faster but in this format sometimes it takes a lot lot of
> time(sometimes even 2,3 seconds).
>
> Can this be tuned somehow???

  Do you have an index on report.id_order ? Try creating an index for
  it if not and run a vacuum analyze on the table to see if it gets
  rid of the sequence scan in the plan.

 ---------------------------------
   Frank Wiles <frank@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: Improving Inner Join Performance

From
"Andy"
Date:
Yes I have indexes an all join fields.
The tables have around 30 columns each and around 100k rows.
The database is vacuumed every hour.

Andy.
----- Original Message -----
From: "Frank Wiles" <frank@wiles.org>
To: "Andy" <frum@ar-sd.net>
Cc: <pgsql-performance@postgresql.org>
Sent: Thursday, January 05, 2006 9:20 PM
Subject: Re: [PERFORM] Improving Inner Join Performance


> On Thu, 5 Jan 2006 17:16:47 +0200
> "Andy" <frum@ar-sd.net> wrote:
>
>> Hi to all,
>>
>> I have the following query:
>>
>> SELECT count(*) FROM orders o
>>       INNER JOIN report r ON r.id_order=o.id
>>       WHERE o.id_status>3
>
>> How can I improve this query's performace?? The ideea is to count all
>> the values that I have in the database for the following conditions.
>> If the users puts in some other search fields on the where then the
>> query runs faster but in this format sometimes it takes a lot lot of
>> time(sometimes even 2,3 seconds).
>>
>> Can this be tuned somehow???
>
>  Do you have an index on report.id_order ? Try creating an index for
>  it if not and run a vacuum analyze on the table to see if it gets
>  rid of the sequence scan in the plan.
>
> ---------------------------------
>   Frank Wiles <frank@wiles.org>
>   http://www.wiles.org
> ---------------------------------
>
>
>

Re: Improving Inner Join Performance

From
Michael Glaesemann
Date:
On Jan 6, 2006, at 18:21 , Andy wrote:

> Yes I have indexes an all join fields. The tables have around 30
> columns each and around 100k rows. The database is vacuumed every
> hour.

Just to chime in, VACUUM != VACUUM ANALYZE. ANALYZE is what updates
database statistics and affects query planning. VACUUM alone does not
do this.

>>  Do you have an index on report.id_order ? Try creating an index for
>>  it if not and run a vacuum analyze on the table to see if it gets
>>  rid of the sequence scan in the plan.

Michael Glaesemann
grzm myrealbox com




Re: Improving Inner Join Performance

From
"Andy"
Date:
Sorry, I had to be more specific.
VACUUM ANALYZE is performed every hour.

Regards,
Andy.

----- Original Message -----
From: "Michael Glaesemann" <grzm@myrealbox.com>
To: "Andy" <frum@ar-sd.net>
Cc: <pgsql-performance@postgresql.org>
Sent: Friday, January 06, 2006 11:45 AM
Subject: Re: [PERFORM] Improving Inner Join Performance


>
> On Jan 6, 2006, at 18:21 , Andy wrote:
>
>> Yes I have indexes an all join fields. The tables have around 30
>> columns each and around 100k rows. The database is vacuumed every
>> hour.
>
> Just to chime in, VACUUM != VACUUM ANALYZE. ANALYZE is what updates
> database statistics and affects query planning. VACUUM alone does not
> do this.
>
>>>  Do you have an index on report.id_order ? Try creating an index for
>>>  it if not and run a vacuum analyze on the table to see if it gets
>>>  rid of the sequence scan in the plan.
>
> Michael Glaesemann
> grzm myrealbox com
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
>

Re: Improving Inner Join Performance

From
Pandurangan R S
Date:
> If the users puts in some other search fields on the where then the query runs faster but > in this format sometimes
ittakes a lot lot of time(sometimes even 2,3 seconds). 

Can you eloborate under what conditions which query is slower?

On 1/5/06, Andy <frum@ar-sd.net> wrote:
>
> Hi to all,
>
> I have the following query:
>
> SELECT count(*) FROM orders o
>       INNER JOIN report r ON r.id_order=o.id
>       WHERE o.id_status>3
>
> Explaing analyze:
> Aggregate  (cost=8941.82..8941.82 rows=1 width=0) (actual
> time=1003.297..1003.298 rows=1 loops=1)
>   ->  Hash Join  (cost=3946.28..8881.72 rows=24041 width=0) (actual
> time=211.985..951.545 rows=72121 loops=1)
>         Hash Cond: ("outer".id_order = "inner".id)
>         ->  Seq Scan on report r  (cost=0.00..2952.21 rows=72121 width=4)
> (actual time=0.005..73.869 rows=72121 loops=1)
>         ->  Hash  (cost=3787.57..3787.57 rows=24682 width=4) (actual
> time=211.855..211.855 rows=0 loops=1)
>               ->  Seq Scan on orders o  (cost=0.00..3787.57 rows=24682
> width=4) (actual time=0.047..147.170 rows=72121 loops=1)
>                     Filter: (id_status > 3)
> Total runtime: 1003.671 ms
>
>
> I could use it in the following format, because I have to the moment only
> the 4,6 values for the id_status.
>
> SELECT count(*) FROM orders o
>       INNER JOIN report r ON r.id_order=o.id
>       WHERE o.id_status IN (4,6)
>
> Explain analyze:
> Aggregate  (cost=5430.04..5430.04 rows=1 width=0) (actual
> time=1472.877..1472.877 rows=1 loops=1)
>   ->  Hash Join  (cost=2108.22..5428.23 rows=720 width=0) (actual
> time=342.080..1419.775 rows=72121 loops=1)
>         Hash Cond: ("outer".id_order = "inner".id)
>         ->  Seq Scan on report r  (cost=0.00..2952.21 rows=72121 width=4)
> (actual time=0.036..106.217 rows=72121 loops=1)
>         ->  Hash  (cost=2106.37..2106.37 rows=739 width=4) (actual
> time=342.011..342.011 rows=0 loops=1)
>               ->  Index Scan using orders_id_status_idx,
> orders_id_status_idx on orders o  (cost=0.00..2106.37 rows=739 width=4)
> (actual time=0.131..268.397 rows=72121 loops=1)
>                     Index Cond: ((id_status = 4) OR (id_status = 6))
> Total runtime: 1474.356 ms
>
> How can I improve this query's performace?? The ideea is to count all the
> values that I have in the database for the following conditions. If the
> users puts in some other search fields on the where then the query runs
> faster but in this format sometimes it takes a lot lot of time(sometimes
> even 2,3 seconds).
>
> Can this be tuned somehow???
>
> Regards,
> Andy.
>
>

Re: Improving Inner Join Performance

From
"Andy"
Date:
At the moment: o.id_status>3 can have values only 4 and 6. The 6 is around
90% from the whole table. This is why seq scan is made.

Now, depending on the user input the query can have more where fields. For
example:
SELECT count(*) FROM orders o
       INNER JOIN report r ON r.id_order=o.id
       WHERE o.id_status > 3 AND r.id_zufriden=7

Aggregate  (cost=7317.15..7317.15 rows=1 width=0) (actual
time=213.418..213.419 rows=1 loops=1)
  ->  Hash Join  (cost=3139.00..7310.80 rows=2540 width=0) (actual
time=57.554..212.215 rows=1308 loops=1)
        Hash Cond: ("outer".id = "inner".id_order)
        ->  Seq Scan on orders o  (cost=0.00..3785.31 rows=72216 width=4)
(actual time=0.014..103.292 rows=72121 loops=1)
              Filter: (id_status > 3)
        ->  Hash  (cost=3132.51..3132.51 rows=2597 width=4) (actual
time=57.392..57.392 rows=0 loops=1)
              ->  Seq Scan on report r  (cost=0.00..3132.51 rows=2597
width=4) (actual time=0.019..56.220 rows=1308 loops=1)
                    Filter: (id_zufriden = 7)
Total runtime: 213.514 ms

These examples can go on and on.

If I run this query
SELECT count(*) FROM orders o
INNER JOIN report r ON r.id_order=o.id
WHERE o.id_status>3
under normal system load the average response time is between 1.3 > 2.5
seconds. Sometimes even more. If I run it rapidly a few times then it
respondes faster(that is normal I supose).

The ideea of this query is to count all the possible results that the user
can have. I use this to build pages of results.

Andy.

----- Original Message -----
From: "Pandurangan R S" <pandurangan.r.s@gmail.com>
To: "Andy" <frum@ar-sd.net>
Cc: <pgsql-performance@postgresql.org>
Sent: Friday, January 06, 2006 11:56 AM
Subject: Re: [PERFORM] Improving Inner Join Performance


> If the users puts in some other search fields on the where then the query
> runs faster but > in this format sometimes it takes a lot lot of
> time(sometimes even 2,3 seconds).

Can you eloborate under what conditions which query is slower?

On 1/5/06, Andy <frum@ar-sd.net> wrote:
>
> Hi to all,
>
> I have the following query:
>
> SELECT count(*) FROM orders o
>       INNER JOIN report r ON r.id_order=o.id
>       WHERE o.id_status>3
>
> Explaing analyze:
> Aggregate  (cost=8941.82..8941.82 rows=1 width=0) (actual
> time=1003.297..1003.298 rows=1 loops=1)
>   ->  Hash Join  (cost=3946.28..8881.72 rows=24041 width=0) (actual
> time=211.985..951.545 rows=72121 loops=1)
>         Hash Cond: ("outer".id_order = "inner".id)
>         ->  Seq Scan on report r  (cost=0.00..2952.21 rows=72121 width=4)
> (actual time=0.005..73.869 rows=72121 loops=1)
>         ->  Hash  (cost=3787.57..3787.57 rows=24682 width=4) (actual
> time=211.855..211.855 rows=0 loops=1)
>               ->  Seq Scan on orders o  (cost=0.00..3787.57 rows=24682
> width=4) (actual time=0.047..147.170 rows=72121 loops=1)
>                     Filter: (id_status > 3)
> Total runtime: 1003.671 ms
>
>
> I could use it in the following format, because I have to the moment only
> the 4,6 values for the id_status.
>
> SELECT count(*) FROM orders o
>       INNER JOIN report r ON r.id_order=o.id
>       WHERE o.id_status IN (4,6)
>
> Explain analyze:
> Aggregate  (cost=5430.04..5430.04 rows=1 width=0) (actual
> time=1472.877..1472.877 rows=1 loops=1)
>   ->  Hash Join  (cost=2108.22..5428.23 rows=720 width=0) (actual
> time=342.080..1419.775 rows=72121 loops=1)
>         Hash Cond: ("outer".id_order = "inner".id)
>         ->  Seq Scan on report r  (cost=0.00..2952.21 rows=72121 width=4)
> (actual time=0.036..106.217 rows=72121 loops=1)
>         ->  Hash  (cost=2106.37..2106.37 rows=739 width=4) (actual
> time=342.011..342.011 rows=0 loops=1)
>               ->  Index Scan using orders_id_status_idx,
> orders_id_status_idx on orders o  (cost=0.00..2106.37 rows=739 width=4)
> (actual time=0.131..268.397 rows=72121 loops=1)
>                     Index Cond: ((id_status = 4) OR (id_status = 6))
> Total runtime: 1474.356 ms
>
> How can I improve this query's performace?? The ideea is to count all the
> values that I have in the database for the following conditions. If the
> users puts in some other search fields on the where then the query runs
> faster but in this format sometimes it takes a lot lot of time(sometimes
> even 2,3 seconds).
>
> Can this be tuned somehow???
>
> Regards,
> Andy.
>
>



Re: Improving Inner Join Performance

From
Jaime Casanova
Date:
On 1/6/06, Andy <frum@ar-sd.net> wrote:
> At the moment: o.id_status>3 can have values only 4 and 6. The 6 is around
> 90% from the whole table. This is why seq scan is made.
>
given this if you make id_status > 3 you will never use an index
because you will be scanning 4 and 6 the only values in this field as
you say, and even if there were any other value 6 is 90% of whole
table, so an index for this will not be used...

> Now, depending on the user input the query can have more where fields. For
> example:
> SELECT count(*) FROM orders o
>       INNER JOIN report r ON r.id_order=o.id
>       WHERE o.id_status > 3 AND r.id_zufriden=7
>
here the planner can be more selective, and of course the query is
faster... if you will be loading data load it all then make tests...

but because your actual data the planner will always choose to scan
the entire orders table for o.id_status > 3...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

Re: Improving Inner Join Performance

From
"Andy"
Date:

shared_buffers = 10240
effective_cache_size = 64000
RAM on server: 1Gb.

Andy.

----- Original Message -----

From: "Frank Wiles" <frank@wiles.org>
To: "Andy" <frum@ar-sd.net>
Sent: Friday, January 06, 2006 7:12 PM
Subject: Re: [PERFORM] Improving Inner Join Performance

> On Fri, 6 Jan 2006 09:59:30 +0200
> "Andy" <
frum@ar-sd.net> wrote:
>
>> Yes I have indexes an all join fields.
>> The tables have around 30 columns each and around 100k rows.
>> The database is vacuumed every hour. 
>
>  What are you settings for:
>
>  shared_buffers
>  effective_cache_size
>
>  And how much RAM do you have in the server?
>
> ---------------------------------
>   Frank Wiles <
frank@wiles.org>
>  
http://www.wiles.org
> ---------------------------------
>
>
>

Re: Improving Inner Join Performance

From
"Jim C. Nasby"
Date:
Did you originally post some problem queries? The settings look OK,
though 1G of memory isn't very much now-a-days.

On Mon, Jan 09, 2006 at 09:56:52AM +0200, Andy wrote:
> shared_buffers = 10240
> effective_cache_size = 64000
> RAM on server: 1Gb.
>
> Andy.
>
> ----- Original Message -----
>
> From: "Frank Wiles" <frank@wiles.org>
> To: "Andy" <frum@ar-sd.net>
> Sent: Friday, January 06, 2006 7:12 PM
> Subject: Re: [PERFORM] Improving Inner Join Performance
>
>
> > On Fri, 6 Jan 2006 09:59:30 +0200
> > "Andy" <frum@ar-sd.net> wrote:
> >
> >> Yes I have indexes an all join fields.
> >> The tables have around 30 columns each and around 100k rows.
> >> The database is vacuumed every hour.
> >
> >  What are you settings for:
> >
> >  shared_buffers
> >  effective_cache_size
> >
> >  And how much RAM do you have in the server?
> >
> > ---------------------------------
> >   Frank Wiles <frank@wiles.org>
> >   http://www.wiles.org
> > ---------------------------------
> >
> >
> >

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461