Thread: Join query on 1M row table slow
I have a pretty simple select query that joins a table (p) with 125K rows
with another table (pc) with almost one million rows:

select p.*
from product_categories pc
inner join products p
  on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute (all other PG
queries perform fast enough). I have indexes on everything needed, and
EXPLAIN shows they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings in the config?

Redhat 9, PG 7.4.1.
On Tue, 10 Feb 2004, CSN wrote:
> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [snip query]
> The query usually takes about five seconds to execute (all other PG
> queries perform fast enough). I have indexes on everything needed, and
> EXPLAIN shows they're being used. Is there anything else I can do to
> improve performance - such as tweaking some settings in the config?
>
> Redhat 9, PG 7.4.1.

Could you give more info on the hardware? You did not mention how often
you do your vacuum analyze, or how often data is updated/deleted. The
more info you provide, the more we can try to help.

How about your buffer and other settings?
On Tuesday 10 February 2004 19:51, CSN wrote:
> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [snip query]
> Is there anything else I can do to improve performance - such as
> tweaking some settings in the config?

Quite possibly - I'd suggest:

1. Read "Performance Tuning" and the "Annotated conf file" at
   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
2. Vacuum analyze the tables concerned and post the output of EXPLAIN
   ANALYSE with your query.
3. Next time, try the -performance list - you'll probably get a faster
   response there.

--
  Richard Huxton
  Archonet Ltd
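(Concretely, the first half of step 2 is just the following in psql -
table names are taken from the original post:

  -- refresh the planner's statistics for the two tables in the query
  VACUUM ANALYZE product_categories;
  VACUUM ANALYZE products;

The EXPLAIN ANALYSE half is spelled out a few messages below.)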
> 2. Vacuum analyze the tables concerned and post the output of EXPLAIN
> ANALYSE with your query.

=> explain analyze;

results in:

ERROR:  syntax error at or near ";" at character 16
On Tue, 10 Feb 2004, CSN wrote:
> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [snip query]
> The query usually takes about five seconds to execute (all other PG
> queries perform fast enough).

The problem is that in order to do an offset / limit on such a set,
postgresql is gonna have to generate $offset + $limit rows of the joined
set. So it's quite likely that it's generating the whole set first.

It also looks odd having a select p.* from product_categories pc, but not
selecting anything from the pc table.

Could this be rewritten to something like

select p.* from products p
where p.id in (select product_id from product_categories pc
               where pc.category_id = $category_id)
order by p.title limit 25 offset $offset

? Or is that equivalent?
Here's the EXPLAIN:

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=9595.99..9596.05 rows=25 width=290)
   ->  Sort  (cost=9595.99..9598.45 rows=986 width=290)
         Sort Key: p.title
         ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290)
               ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
                     Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
                     Index Cond: ("outer".product_id = p.id)
(8 rows)

Is the "cost" high?

CSN

--- lists@natserv.com wrote:
> Could you give more info on the hardware? You did not mention how often
> you do your vacuum analyze or how often data is updated/deleted.
> [snip]
On Tue, 10 Feb 2004, CSN wrote:
> > 2. Vacuum analyze the tables concerned and post the output of EXPLAIN
> > ANALYSE with your query.
>
> => explain analyze;
>
> results in:
>
> ERROR:  syntax error at or near ";" at character 16

No silly. You do:

explain analyze select ... (rest of the query...)

and it runs the query and tells you how long each bit took, what it
THOUGHT it would get back in terms of number of rows, and what it
actually got back.

Let us know...
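(Applied to the query from the original post, that means something like
the following - the literal 1016 stands in for $category_id:

  EXPLAIN ANALYZE
  SELECT p.*
  FROM product_categories pc
  INNER JOIN products p ON pc.product_id = p.id
  WHERE pc.category_id = 1016
  ORDER BY p.title
  LIMIT 25 OFFSET 0;

Note that EXPLAIN ANALYZE actually executes the query, so the reported
times are real.)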
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > On Tue, 10 Feb 2004, CSN wrote: > > > I have a pretty simple select query that joins a > table > > (p) with 125K rows with another table (pc) with > almost > > one million rows: > > > > select p.* > > from product_categories pc > > inner join products p > > on pc.product_id = p.id > > where pc.category_id = $category_id > > order by p.title > > limit 25 > > offset $offset > > > > The query usually takes about five seconds to > execute > > (all other PG queries perform fast enough). I have > > indexes on everything needed, and EXPLAIN shows > > they're being used. Is there anything else I can > do to > > improve performance - such as tweaking some > settings > > in the config? > > The problem is that in order to do an offset / limit > on such a set, > postgresql is gonna have to generate $offset + limit > of the joined set. > > So, it's quite likely that it's generating the whole > set first. > > It also looks odd having a select p.* from > product_cat pc, but not > selecting anything from the pc table. > > Could this be rewritten to something like > > select p.* from products p where p.id in (select > product_id from product_categories pc where > pc.category_id = $category_id) > order by p.title limit 25 offset $offset > > ? Or is that equivalent? > I think that probably improves things (lower cost? - see my other post): explain select p.* from products p where p.id in ( select product_id from product_categories pc where pc.category_id = 1016) order by p.title limit 25 offset 0; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Limit (cost=4282.18..4282.24 rows=25 width=290) -> Sort (cost=4282.18..4282.46 rows=111 width=290) Sort Key: p.title -> Nested Loop (cost=3609.75..4278.41 rows=111 width=290) -> HashAggregate (cost=3609.75..3609.75 rows=111 width=4) -> Index Scan using idx_pc_category_id on product_categories pc (cost=0.00..3607.28 rows=986 width=4) Index Cond: (category_id = 1016) -> Index Scan using pkey_products_id on products p (cost=0.00..6.01 rows=1 width=290) Index Cond: (p.id = "outer".product_id) (9 rows) I figured the limit/offset was probably causing the problem. What's weird is that when the same query is executed again, it seems much faster - some sort of caching maybe? (table pc is just product_id <=> category_id - I don't really need the category_id) CSN __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html
On Tue, 10 Feb 2004, CSN wrote:
> I think that probably improves things (lower cost? - see my other post):
> [snip plan]
> I figured the limit/offset was probably causing the problem. What's
> weird is that when the same query is executed again, it seems much
> faster - some sort of caching maybe?

Yep. Exactly. What does explain analyze say about the two queries?

> (table pc is just product_id <=> category_id - I don't really need the
> category_id)

If you could eliminate the need for that table in this query you should
get it to run much faster.
Doh! Yeah, now I remember ;)

QUERY 1:

=> explain analyze select p.* from product_categories pc inner join
products p on pc.product_id = p.id where pc.category_id = 1016 order by
p.title limit 25 offset 0;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=9595.99..9596.05 rows=25 width=290) (actual time=514.808..514.942 rows=25 loops=1)
   ->  Sort  (cost=9595.99..9598.45 rows=986 width=290) (actual time=514.794..514.868 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290) (actual time=0.672..421.732 rows=2358 loops=1)
               ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.343..125.762 rows=2358 loops=1)
                     Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.075..0.083 rows=1 loops=2358)
                     Index Cond: ("outer".product_id = p.id)
 Total runtime: 516.174 ms
(9 rows)

QUERY 2:

=> explain analyze select p.* from products p where p.id in ( select
product_id from product_categories pc where pc.category_id = 1016) order
by p.title limit 25 offset 0;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=4282.18..4282.24 rows=25 width=290) (actual time=447.852..447.979 rows=25 loops=1)
   ->  Sort  (cost=4282.18..4282.46 rows=111 width=290) (actual time=447.836..447.904 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290) (actual time=104.256..358.182 rows=2358 loops=1)
               ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4) (actual time=103.922..114.836 rows=2358 loops=1)
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.401..92.253 rows=2358 loops=1)
                           Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.069..0.076 rows=1 loops=2358)
                     Index Cond: (p.id = "outer".product_id)
 Total runtime: 449.370 ms
(10 rows)

-CSN

--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> No silly. You do:
>
> explain analyze select ... (rest of the query...)
> [snip]
--- lists@natserv.com wrote:
> Could you give more info on the hardware?

Intel(R) Celeron(R) CPU 1.70GHz
1 GB RAM

> You did not mention how often you do your vacuum analyze or how often
> data is updated/deleted.

I've done both vacuum and vacuum analyze on the database. Vacuum full
takes forever (I haven't let it run its full course yet). The data is
completely static - no updates/deletes, just selects.

> How about your buffer and other settings?

shared_buffers = 1000

That's about the only setting I changed in postgresql.conf.

TIA,
CSN
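(For context: shared_buffers is counted in 8 KB pages, so 1000 is only
about 8 MB of buffer cache on a 1 GB machine. A sketch of the kind of
values the varlena tuning guide linked above suggests for PG 7.4 on a box
like this - the exact numbers here are illustrative assumptions, not
settings endorsed anywhere in the thread:

  # postgresql.conf (PG 7.4) - illustrative values for a 1 GB machine
  shared_buffers = 10000          # 8 KB pages, so ~80 MB of buffer cache
  sort_mem = 8192                 # KB per sort; beware many concurrent sorts
  effective_cache_size = 65536    # 8 KB pages, ~512 MB of expected OS cache

)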
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > >What's weird is that when the same query > is > > executed again, it seems much faster - some sort > of > > caching maybe?> > > Yep. Exactly. Really? Where can I RTFM about it? CSN __________________________________ Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online. http://taxes.yahoo.com/filing.html
Well, it looks like the number of rows estimate for the nested loop in
the first query and the hash agg in the second are off by a factor of 3
for the first query, and a factor of 20 for the second. Try running
number 1 with

set enable_nestloop = off

and see if the first one gets faster.

You might also wanna try turning off hash aggregate on the second one and
see how that works.

Upping the analyze target on those two tables may help a bit too.

On Tue, 10 Feb 2004, CSN wrote:
> Doh! Yeah, now I remember ;)
> [snip the two EXPLAIN ANALYZE plans quoted above]
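(A concrete sketch of that experiment - enable_hashagg is the setting
that disables hash aggregation; SET only affects the current session, and
RESET restores the default afterwards:

  SET enable_nestloop = off;
  EXPLAIN ANALYZE ...;   -- query 1 (the join form) goes here
  RESET enable_nestloop;

  SET enable_hashagg = off;
  EXPLAIN ANALYZE ...;   -- query 2 (the IN() form) goes here
  RESET enable_hashagg;

)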
On Tue, 10 Feb 2004, CSN wrote:
> > > What's weird is that when the same query is executed again, it
> > > seems much faster - some sort of caching maybe?
> >
> > Yep. Exactly.
>
> Really? Where can I RTFM about it?

Not sure. Basically, your kernel should be caching a fair bit. See what
top says about cache and buffer size. If you've got a big enough kernel
buffer, and the table gets hit often enough, it should stay in kernel
memory.
I disabled enable_hashagg and enable_nestloop. Appears to have made both
queries worse :(

QUERY 1:

=> explain analyze select p.* from product_categories pc inner join
products p on pc.product_id = p.id where pc.category_id = 1016 order by
p.title limit 25 offset 10000;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=65999.78..65999.78 rows=1 width=290) (actual time=7736.029..7736.029 rows=0 loops=1)
   ->  Sort  (cost=65997.31..65999.78 rows=986 width=290) (actual time=7723.794..7730.352 rows=2358 loops=1)
         Sort Key: p.title
         ->  Merge Join  (cost=65306.35..65948.28 rows=986 width=290) (actual time=7028.790..7614.223 rows=2358 loops=1)
               Merge Cond: ("outer".product_id = "inner".id)
               ->  Sort  (cost=3656.31..3658.78 rows=986 width=4) (actual time=102.115..105.357 rows=2358 loops=1)
                     Sort Key: pc.product_id
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.349..94.173 rows=2358 loops=1)
                           Index Cond: (category_id = 1016)
               ->  Sort  (cost=61650.04..61963.62 rows=125430 width=290) (actual time=6926.394..7272.130 rows=124521 loops=1)
                     Sort Key: p.id
                     ->  Seq Scan on products p  (cost=0.00..6638.30 rows=125430 width=290) (actual time=0.102..2855.358 rows=124753 loops=1)
 Total runtime: 8003.067 ms
(13 rows)

QUERY 2:

=> explain analyze select p.* from products p where p.id in ( select
product_id from product_categories pc where pc.category_id = 1016) order
by p.title limit 25 offset 0;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=10931.85..10931.91 rows=25 width=290) (actual time=3667.396..3667.526 rows=25 loops=1)
   ->  Sort  (cost=10931.85..10932.13 rows=111 width=290) (actual time=3667.384..3667.453 rows=25 loops=1)
         Sort Key: p.title
         ->  Hash Join  (cost=3661.52..10928.08 rows=111 width=290) (actual time=111.198..1615.324 rows=2358 loops=1)
               Hash Cond: ("outer".id = "inner".product_id)
               ->  Seq Scan on products p  (cost=0.00..6638.30 rows=125430 width=290) (actual time=0.113..1039.900 rows=124753 loops=1)
               ->  Hash  (cost=3661.24..3661.24 rows=111 width=4) (actual time=110.932..110.932 rows=0 loops=1)
                     ->  Unique  (cost=3656.31..3661.24 rows=111 width=4) (actual time=97.255..106.798 rows=2358 loops=1)
                           ->  Sort  (cost=3656.31..3658.78 rows=986 width=4) (actual time=97.247..99.998 rows=2358 loops=1)
                                 Sort Key: pc.product_id
                                 ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.327..88.436 rows=2358 loops=1)
                                       Index Cond: (category_id = 1016)
 Total runtime: 3669.479 ms
(13 rows)

> upping the analyze target on those two tables may help a bit too.

How exactly do I do that?

SELECT * from thanks limit 1000
;)

CSN

--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> Well, it looks like the number of rows estimate for the nested loop in
> the first query and the hash agg in the second are off by a factor of 3
> for the first query, and a factor of 20 for the second.
> [snip]
On Tue, 10 Feb 2004, CSN wrote:
> I disabled enable_hashagg and enable_nestloop. Appears to have made
> both queries worse :(

Good, then we know that the nest loop and hash agg are probably good
plans.

> How exactly do I do that?
>
> SELECT * from thanks limit 1000
> ;)

It's an alter table thingie:

alter table tablename alter column columnname set statistics 100;

But since it looks like it's picking a good plan, it's probably not a
real big deal.

So, can you get rid of the join / in on the other table, or do you need
it there?
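(Filled in for this thread's tables, that would be something like the
following - the column choice is an assumption based on the plans above,
and the new target only takes effect at the next ANALYZE:

  ALTER TABLE product_categories ALTER COLUMN category_id SET STATISTICS 100;
  ALTER TABLE product_categories ALTER COLUMN product_id SET STATISTICS 100;
  ANALYZE product_categories;

)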
> So, can you get rid of the join / in on the other table, or do you need
> it there?

No - each product can fall under multiple categories, so I need the
product_categories table (and the join on it).

Thanks again,
CSN
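(For readers following along, the many-to-many schema implied by the
plans looks roughly like this - a hypothetical reconstruction, since the
thread never posts the actual DDL:

  CREATE TABLE products (
      id    integer PRIMARY KEY,  -- scanned via pkey_products_id above
      title text                  -- plus other columns (width=290 in the plans)
  );

  CREATE TABLE product_categories (
      product_id  integer REFERENCES products (id),
      category_id integer
  );

  CREATE INDEX idx_pc_category_id ON product_categories (category_id);

)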
On Tue, 10 Feb 2004, CSN wrote:
> > So, can you get rid of the join / in on the other table, or do you
> > need it there?
>
> No - each product can fall under multiple categories, so I need the
> product_categories table (and the join on it).

Oh, OK. Then we might have as efficient a query as we're gonna get.

Oh, another thing is to make the select in your in() clause a select
distinct and see if that helps. If you've got a whole bunch of duplicates
running around in it, it's sometimes faster to distinct it. Then again,
given all the work Tom's been putting into the query planner /
optimization, he may have already done something like that.
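(The rewrite being suggested, sketched out with the literal category id
used earlier in the thread:

  SELECT p.*
  FROM products p
  WHERE p.id IN (SELECT DISTINCT product_id
                 FROM product_categories pc
                 WHERE pc.category_id = 1016)
  ORDER BY p.title
  LIMIT 25 OFFSET 0;

As Tom notes in the next message, recent planners may do this
de-duplication on their own.)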
"scott.marlowe" <scott.marlowe@ihs.com> writes: > Oh, another thing is to make your select in the in() clause a select > distinct and see if that helps. If you've got a whole bunch of duplicates > running around in it it's sometimes faster to distinct it. Then again, > given all the work Tom's been putting in the query planner / optimization, > he may have already done something like that. As of CVS tip I would not recommend inserting DISTINCT in an "IN (subselect)" construct --- the planner will do the equivalent for itself if it figures that's the best way. I do not recall this late at night whether all the relevant changes were in 7.4 or not ... regards, tom lane
I have found in previous versions of postgres that rewriting the join can help. Have you tried something like:
select p.*
from product_categories pc, products p
where pc.product_id = p.id AND pc.category_id = $category_id
order by p.title
limit 25
offset $offset
cheers
Matthew
--
Matthew Lunnon
Senior Software Engineer
RWA Ltd
www.rwa-net.co.uk

----- Original Message -----
From: CSN
To: pgsql-general@postgresql.org
Sent: Tuesday, February 10, 2004 7:51 PM
Subject: [GENERAL] Join query on 1M row table slow

> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [snip query]
> Redhat 9, PG 7.4.1.
On Tue, 10 Feb 2004, CSN wrote:
> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [snip query]

This idiom looks to me a lot like "results paging". You have a query that
returns a lot of rows, and you are formatting them one page at a time in
your CGI or whatever. In PostgreSQL, cursors do this very well:

BEGIN;
DECLARE resultset CURSOR FOR
  select p.*
  from product_categories pc
  inner join products p
    on pc.product_id = p.id
  where pc.category_id = $category_id
  order by p.title;
MOVE $offset IN resultset;
FETCH 25 FROM resultset;
[ repeat as necessary ];
COMMIT;

This does use some resources on the server side, but it is very much
faster than LIMIT/OFFSET. The biggest "gotcha" about cursors is that
their lifetime is limited to the enclosing transaction, so they may not
be appropriate for CGI-type applications.

Bill Gribble
Appears to be somewhat slower - took about 600-2600ms on different runs.

CSN

=> explain analyze select p.* from product_categories pc, products p
where pc.product_id = p.id AND pc.category_id = 1016 order by p.title
limit 25 offset 0;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=9270.77..9270.83 rows=25 width=290) (actual time=2598.686..2598.875 rows=25 loops=1)
   ->  Sort  (cost=9270.77..9273.15 rows=952 width=290) (actual time=2598.677..2598.805 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=0.00..9223.67 rows=952 width=290) (actual time=27.257..2485.644 rows=2358 loops=1)
               ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3493.30 rows=951 width=4) (actual time=26.819..396.049 rows=2358 loops=1)
                     Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.838..0.845 rows=1 loops=2358)
                     Index Cond: ("outer".product_id = p.id)
 Total runtime: 2600.395 ms
(9 rows)

--- Matthew Lunnon <mlunnon@rwa-net.co.uk> wrote:
> I have found in previous versions of postgres that rewriting the join
> can help. Have you tried something like:
>
> select p.*
> from product_categories pc, products p
> where pc.product_id = p.id AND pc.category_id = $category_id
> order by p.title
> limit 25
> offset $offset
> [snip]