Thread: Improving Inner Join Performance
Hi to all,
I have the following query:
SELECT count(*) FROM orders o
INNER JOIN report r ON r.id_order=o.id
WHERE o.id_status>3
INNER JOIN report r ON r.id_order=o.id
WHERE o.id_status>3
Explaing analyze:
Aggregate (cost=8941.82..8941.82 rows=1 width=0) (actual time=1003.297..1003.298 rows=1 loops=1)
-> Hash Join (cost=3946.28..8881.72 rows=24041 width=0) (actual time=211.985..951.545 rows=72121 loops=1)
Hash Cond: ("outer".id_order = "inner".id)
-> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) (actual time=0.005..73.869 rows=72121 loops=1)
-> Hash (cost=3787.57..3787.57 rows=24682 width=4) (actual time=211.855..211.855 rows=0 loops=1)
-> Seq Scan on orders o (cost=0.00..3787.57 rows=24682 width=4) (actual time=0.047..147.170 rows=72121 loops=1)
Filter: (id_status > 3)
Total runtime: 1003.671 ms
-> Hash Join (cost=3946.28..8881.72 rows=24041 width=0) (actual time=211.985..951.545 rows=72121 loops=1)
Hash Cond: ("outer".id_order = "inner".id)
-> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) (actual time=0.005..73.869 rows=72121 loops=1)
-> Hash (cost=3787.57..3787.57 rows=24682 width=4) (actual time=211.855..211.855 rows=0 loops=1)
-> Seq Scan on orders o (cost=0.00..3787.57 rows=24682 width=4) (actual time=0.047..147.170 rows=72121 loops=1)
Filter: (id_status > 3)
Total runtime: 1003.671 ms
I could use it in the following format, because I have to the moment only the 4,6 values for the id_status.
SELECT count(*) FROM orders o
INNER JOIN report r ON r.id_order=o.id
WHERE o.id_status IN (4,6)
INNER JOIN report r ON r.id_order=o.id
WHERE o.id_status IN (4,6)
Explain analyze:
Aggregate (cost=5430.04..5430.04 rows=1 width=0) (actual time=1472.877..1472.877 rows=1 loops=1)
-> Hash Join (cost=2108.22..5428.23 rows=720 width=0) (actual time=342.080..1419.775 rows=72121 loops=1)
Hash Cond: ("outer".id_order = "inner".id)
-> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) (actual time=0.036..106.217 rows=72121 loops=1)
-> Hash (cost=2106.37..2106.37 rows=739 width=4) (actual time=342.011..342.011 rows=0 loops=1)
-> Index Scan using orders_id_status_idx, orders_id_status_idx on orders o (cost=0.00..2106.37 rows=739 width=4) (actual time=0.131..268.397 rows=72121 loops=1)
Index Cond: ((id_status = 4) OR (id_status = 6))
Total runtime: 1474.356 ms
Aggregate (cost=5430.04..5430.04 rows=1 width=0) (actual time=1472.877..1472.877 rows=1 loops=1)
-> Hash Join (cost=2108.22..5428.23 rows=720 width=0) (actual time=342.080..1419.775 rows=72121 loops=1)
Hash Cond: ("outer".id_order = "inner".id)
-> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) (actual time=0.036..106.217 rows=72121 loops=1)
-> Hash (cost=2106.37..2106.37 rows=739 width=4) (actual time=342.011..342.011 rows=0 loops=1)
-> Index Scan using orders_id_status_idx, orders_id_status_idx on orders o (cost=0.00..2106.37 rows=739 width=4) (actual time=0.131..268.397 rows=72121 loops=1)
Index Cond: ((id_status = 4) OR (id_status = 6))
Total runtime: 1474.356 ms
How can I improve this query's performace?? The ideea is to count all the values that I have in the database for the following conditions. If the users puts in some other search fields on the where then the query runs faster but in this format sometimes it takes a lot lot of time(sometimes even 2,3 seconds).
Can this be tuned somehow???
Regards,
Andy.
On Thu, 5 Jan 2006 17:16:47 +0200 "Andy" <frum@ar-sd.net> wrote: > Hi to all, > > I have the following query: > > SELECT count(*) FROM orders o > INNER JOIN report r ON r.id_order=o.id > WHERE o.id_status>3 > How can I improve this query's performace?? The ideea is to count all > the values that I have in the database for the following conditions. > If the users puts in some other search fields on the where then the > query runs faster but in this format sometimes it takes a lot lot of > time(sometimes even 2,3 seconds). > > Can this be tuned somehow??? Do you have an index on report.id_order ? Try creating an index for it if not and run a vacuum analyze on the table to see if it gets rid of the sequence scan in the plan. --------------------------------- Frank Wiles <frank@wiles.org> http://www.wiles.org ---------------------------------
Yes I have indexes an all join fields. The tables have around 30 columns each and around 100k rows. The database is vacuumed every hour. Andy. ----- Original Message ----- From: "Frank Wiles" <frank@wiles.org> To: "Andy" <frum@ar-sd.net> Cc: <pgsql-performance@postgresql.org> Sent: Thursday, January 05, 2006 9:20 PM Subject: Re: [PERFORM] Improving Inner Join Performance > On Thu, 5 Jan 2006 17:16:47 +0200 > "Andy" <frum@ar-sd.net> wrote: > >> Hi to all, >> >> I have the following query: >> >> SELECT count(*) FROM orders o >> INNER JOIN report r ON r.id_order=o.id >> WHERE o.id_status>3 > >> How can I improve this query's performace?? The ideea is to count all >> the values that I have in the database for the following conditions. >> If the users puts in some other search fields on the where then the >> query runs faster but in this format sometimes it takes a lot lot of >> time(sometimes even 2,3 seconds). >> >> Can this be tuned somehow??? > > Do you have an index on report.id_order ? Try creating an index for > it if not and run a vacuum analyze on the table to see if it gets > rid of the sequence scan in the plan. > > --------------------------------- > Frank Wiles <frank@wiles.org> > http://www.wiles.org > --------------------------------- > > >
On Jan 6, 2006, at 18:21 , Andy wrote: > Yes I have indexes an all join fields. The tables have around 30 > columns each and around 100k rows. The database is vacuumed every > hour. Just to chime in, VACUUM != VACUUM ANALYZE. ANALYZE is what updates database statistics and affects query planning. VACUUM alone does not do this. >> Do you have an index on report.id_order ? Try creating an index for >> it if not and run a vacuum analyze on the table to see if it gets >> rid of the sequence scan in the plan. Michael Glaesemann grzm myrealbox com
Sorry, I had to be more specific. VACUUM ANALYZE is performed every hour. Regards, Andy. ----- Original Message ----- From: "Michael Glaesemann" <grzm@myrealbox.com> To: "Andy" <frum@ar-sd.net> Cc: <pgsql-performance@postgresql.org> Sent: Friday, January 06, 2006 11:45 AM Subject: Re: [PERFORM] Improving Inner Join Performance > > On Jan 6, 2006, at 18:21 , Andy wrote: > >> Yes I have indexes an all join fields. The tables have around 30 >> columns each and around 100k rows. The database is vacuumed every >> hour. > > Just to chime in, VACUUM != VACUUM ANALYZE. ANALYZE is what updates > database statistics and affects query planning. VACUUM alone does not > do this. > >>> Do you have an index on report.id_order ? Try creating an index for >>> it if not and run a vacuum analyze on the table to see if it gets >>> rid of the sequence scan in the plan. > > Michael Glaesemann > grzm myrealbox com > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > >
> If the users puts in some other search fields on the where then the query runs faster but > in this format sometimes ittakes a lot lot of time(sometimes even 2,3 seconds). Can you eloborate under what conditions which query is slower? On 1/5/06, Andy <frum@ar-sd.net> wrote: > > Hi to all, > > I have the following query: > > SELECT count(*) FROM orders o > INNER JOIN report r ON r.id_order=o.id > WHERE o.id_status>3 > > Explaing analyze: > Aggregate (cost=8941.82..8941.82 rows=1 width=0) (actual > time=1003.297..1003.298 rows=1 loops=1) > -> Hash Join (cost=3946.28..8881.72 rows=24041 width=0) (actual > time=211.985..951.545 rows=72121 loops=1) > Hash Cond: ("outer".id_order = "inner".id) > -> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) > (actual time=0.005..73.869 rows=72121 loops=1) > -> Hash (cost=3787.57..3787.57 rows=24682 width=4) (actual > time=211.855..211.855 rows=0 loops=1) > -> Seq Scan on orders o (cost=0.00..3787.57 rows=24682 > width=4) (actual time=0.047..147.170 rows=72121 loops=1) > Filter: (id_status > 3) > Total runtime: 1003.671 ms > > > I could use it in the following format, because I have to the moment only > the 4,6 values for the id_status. > > SELECT count(*) FROM orders o > INNER JOIN report r ON r.id_order=o.id > WHERE o.id_status IN (4,6) > > Explain analyze: > Aggregate (cost=5430.04..5430.04 rows=1 width=0) (actual > time=1472.877..1472.877 rows=1 loops=1) > -> Hash Join (cost=2108.22..5428.23 rows=720 width=0) (actual > time=342.080..1419.775 rows=72121 loops=1) > Hash Cond: ("outer".id_order = "inner".id) > -> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) > (actual time=0.036..106.217 rows=72121 loops=1) > -> Hash (cost=2106.37..2106.37 rows=739 width=4) (actual > time=342.011..342.011 rows=0 loops=1) > -> Index Scan using orders_id_status_idx, > orders_id_status_idx on orders o (cost=0.00..2106.37 rows=739 width=4) > (actual time=0.131..268.397 rows=72121 loops=1) > Index Cond: ((id_status = 4) OR (id_status = 6)) > Total runtime: 1474.356 ms > > How can I improve this query's performace?? The ideea is to count all the > values that I have in the database for the following conditions. If the > users puts in some other search fields on the where then the query runs > faster but in this format sometimes it takes a lot lot of time(sometimes > even 2,3 seconds). > > Can this be tuned somehow??? > > Regards, > Andy. > >
At the moment: o.id_status>3 can have values only 4 and 6. The 6 is around 90% from the whole table. This is why seq scan is made. Now, depending on the user input the query can have more where fields. For example: SELECT count(*) FROM orders o INNER JOIN report r ON r.id_order=o.id WHERE o.id_status > 3 AND r.id_zufriden=7 Aggregate (cost=7317.15..7317.15 rows=1 width=0) (actual time=213.418..213.419 rows=1 loops=1) -> Hash Join (cost=3139.00..7310.80 rows=2540 width=0) (actual time=57.554..212.215 rows=1308 loops=1) Hash Cond: ("outer".id = "inner".id_order) -> Seq Scan on orders o (cost=0.00..3785.31 rows=72216 width=4) (actual time=0.014..103.292 rows=72121 loops=1) Filter: (id_status > 3) -> Hash (cost=3132.51..3132.51 rows=2597 width=4) (actual time=57.392..57.392 rows=0 loops=1) -> Seq Scan on report r (cost=0.00..3132.51 rows=2597 width=4) (actual time=0.019..56.220 rows=1308 loops=1) Filter: (id_zufriden = 7) Total runtime: 213.514 ms These examples can go on and on. If I run this query SELECT count(*) FROM orders o INNER JOIN report r ON r.id_order=o.id WHERE o.id_status>3 under normal system load the average response time is between 1.3 > 2.5 seconds. Sometimes even more. If I run it rapidly a few times then it respondes faster(that is normal I supose). The ideea of this query is to count all the possible results that the user can have. I use this to build pages of results. Andy. ----- Original Message ----- From: "Pandurangan R S" <pandurangan.r.s@gmail.com> To: "Andy" <frum@ar-sd.net> Cc: <pgsql-performance@postgresql.org> Sent: Friday, January 06, 2006 11:56 AM Subject: Re: [PERFORM] Improving Inner Join Performance > If the users puts in some other search fields on the where then the query > runs faster but > in this format sometimes it takes a lot lot of > time(sometimes even 2,3 seconds). Can you eloborate under what conditions which query is slower? On 1/5/06, Andy <frum@ar-sd.net> wrote: > > Hi to all, > > I have the following query: > > SELECT count(*) FROM orders o > INNER JOIN report r ON r.id_order=o.id > WHERE o.id_status>3 > > Explaing analyze: > Aggregate (cost=8941.82..8941.82 rows=1 width=0) (actual > time=1003.297..1003.298 rows=1 loops=1) > -> Hash Join (cost=3946.28..8881.72 rows=24041 width=0) (actual > time=211.985..951.545 rows=72121 loops=1) > Hash Cond: ("outer".id_order = "inner".id) > -> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) > (actual time=0.005..73.869 rows=72121 loops=1) > -> Hash (cost=3787.57..3787.57 rows=24682 width=4) (actual > time=211.855..211.855 rows=0 loops=1) > -> Seq Scan on orders o (cost=0.00..3787.57 rows=24682 > width=4) (actual time=0.047..147.170 rows=72121 loops=1) > Filter: (id_status > 3) > Total runtime: 1003.671 ms > > > I could use it in the following format, because I have to the moment only > the 4,6 values for the id_status. > > SELECT count(*) FROM orders o > INNER JOIN report r ON r.id_order=o.id > WHERE o.id_status IN (4,6) > > Explain analyze: > Aggregate (cost=5430.04..5430.04 rows=1 width=0) (actual > time=1472.877..1472.877 rows=1 loops=1) > -> Hash Join (cost=2108.22..5428.23 rows=720 width=0) (actual > time=342.080..1419.775 rows=72121 loops=1) > Hash Cond: ("outer".id_order = "inner".id) > -> Seq Scan on report r (cost=0.00..2952.21 rows=72121 width=4) > (actual time=0.036..106.217 rows=72121 loops=1) > -> Hash (cost=2106.37..2106.37 rows=739 width=4) (actual > time=342.011..342.011 rows=0 loops=1) > -> Index Scan using orders_id_status_idx, > orders_id_status_idx on orders o (cost=0.00..2106.37 rows=739 width=4) > (actual time=0.131..268.397 rows=72121 loops=1) > Index Cond: ((id_status = 4) OR (id_status = 6)) > Total runtime: 1474.356 ms > > How can I improve this query's performace?? The ideea is to count all the > values that I have in the database for the following conditions. If the > users puts in some other search fields on the where then the query runs > faster but in this format sometimes it takes a lot lot of time(sometimes > even 2,3 seconds). > > Can this be tuned somehow??? > > Regards, > Andy. > >
On 1/6/06, Andy <frum@ar-sd.net> wrote: > At the moment: o.id_status>3 can have values only 4 and 6. The 6 is around > 90% from the whole table. This is why seq scan is made. > given this if you make id_status > 3 you will never use an index because you will be scanning 4 and 6 the only values in this field as you say, and even if there were any other value 6 is 90% of whole table, so an index for this will not be used... > Now, depending on the user input the query can have more where fields. For > example: > SELECT count(*) FROM orders o > INNER JOIN report r ON r.id_order=o.id > WHERE o.id_status > 3 AND r.id_zufriden=7 > here the planner can be more selective, and of course the query is faster... if you will be loading data load it all then make tests... but because your actual data the planner will always choose to scan the entire orders table for o.id_status > 3... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
shared_buffers = 10240
effective_cache_size = 64000
RAM on server: 1Gb.
Andy.
----- Original Message -----
From: "Frank Wiles" <frank@wiles.org>
To: "Andy" <frum@ar-sd.net>
Sent: Friday, January 06, 2006 7:12 PM
Subject: Re: [PERFORM] Improving Inner Join Performance
> "Andy" <frum@ar-sd.net> wrote:
>
>> Yes I have indexes an all join fields.
>> The tables have around 30 columns each and around 100k rows.
>> The database is vacuumed every hour.
>
> What are you settings for:
>
> shared_buffers
> effective_cache_size
>
> And how much RAM do you have in the server?
>
> ---------------------------------
> Frank Wiles <frank@wiles.org>
> http://www.wiles.org
> ---------------------------------
>
>
>
Did you originally post some problem queries? The settings look OK, though 1G of memory isn't very much now-a-days. On Mon, Jan 09, 2006 at 09:56:52AM +0200, Andy wrote: > shared_buffers = 10240 > effective_cache_size = 64000 > RAM on server: 1Gb. > > Andy. > > ----- Original Message ----- > > From: "Frank Wiles" <frank@wiles.org> > To: "Andy" <frum@ar-sd.net> > Sent: Friday, January 06, 2006 7:12 PM > Subject: Re: [PERFORM] Improving Inner Join Performance > > > > On Fri, 6 Jan 2006 09:59:30 +0200 > > "Andy" <frum@ar-sd.net> wrote: > > > >> Yes I have indexes an all join fields. > >> The tables have around 30 columns each and around 100k rows. > >> The database is vacuumed every hour. > > > > What are you settings for: > > > > shared_buffers > > effective_cache_size > > > > And how much RAM do you have in the server? > > > > --------------------------------- > > Frank Wiles <frank@wiles.org> > > http://www.wiles.org > > --------------------------------- > > > > > > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461