Thread: Same query - Slow in production
I'm trying to determine why an identical query is running approximately 500 to 1000 times slower on our production database compared to our backup database server. Both database servers are dual 2.3 GHz G5 Xserves running PostgreSQL 8.1.3; both are configured with 8GB of RAM with identical shared memory settings; both postgresql.conf files are identical; both databases have identical indexes defined. The three relevant tables are all clustered the same, although I'm not sure when clustering was last performed on either server. All three tables have recently been analyzed on both servers. The different explain plans for this query seem to be consistent on both servers regardless of category and the production server is consistently and drastically slower than the backup server. If anyone has any ideas on how to have the production server generate the same explain plan as the backup server, or can suggest anything I might want to try, I would greatly appreciate it. Brian Wipf ClickSpace Interactive Inc. <brian@clickspace.com> Here's the query: SELECT ac.attribute_id FROM attribute_category ac WHERE is_browsable = 'true' AND category_id = 1000962 AND EXISTS ( SELECT 'X' FROM product_attribute_value pav, category_product cp WHERE pav.attribute_id = ac.attribute_id AND pav.status_code is null AND pav.product_id = cp.product_id AND cp.category_id = ac.category_id AND cp.product_is_active = 'true' AND cp.product_status_code = 'complete' ) Explain plans: Fast (backup server): Index Scan using attribute_category__category_id_fk_idx on attribute_category ac (cost=0.00..47943.34 rows=7 width=4) (actual time=0.110..0.263 rows=5 loops=1) Index Cond: (category_id = 1000962) Filter: (((is_browsable)::text = 'true'::text) AND (subplan)) SubPlan -> Nested Loop (cost=0.00..7983.94 rows=3 width=0) (actual time=0.043..0.043 rows=1 loops=5) -> Index Scan using category_product__category_id_is_active_and_status_idx on category_product cp (cost=0.00..4362.64 rows=1103 width=4) (actual time=0.013..0.015 rows=2 loops=5) Index Cond: ((category_id = $1) AND ((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text)) -> Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..3.27 rows=1 width=4) (actual time=0.016..0.016 rows=1 loops=8) Index Cond: (pav.product_id = "outer".product_id) Filter: ((attribute_id = $0) AND (status_code IS NULL)) Total runtime: 0.449 ms (11 rows) Slow (production server): Index Scan using attribute_category__category_id_fk_idx on attribute_category ac (cost=0.00..107115.90 rows=7 width=4) (actual time=1.472..464.437 rows=5 loops=1) Index Cond: (category_id = 1000962) Filter: (((is_browsable)::text = 'true'::text) AND (subplan)) SubPlan -> Nested Loop (cost=18.33..23739.70 rows=4 width=0) (actual time=92.870..92.870 rows=1 loops=5) -> Bitmap Heap Scan on product_attribute_value pav (cost=18.33..8764.71 rows=2549 width=4) (actual time=10.191..45.672 rows=5869 loops=5) Recheck Cond: (attribute_id = $0) Filter: (status_code IS NULL) -> Bitmap Index Scan on product_attribute_value__attribute_id_fk_idx (cost=0.00..18.33 rows=2952 width=0) (actual time=9.160..9.160 rows=33330 loops=5) Index Cond: (attribute_id = $0) -> Index Scan using x_category_product_pk on category_product cp (cost=0.00..5.86 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=29345) Index Cond: ((cp.category_id = $1) AND ("outer".product_id = cp.product_id)) Filter: (((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text)) Total runtime: 464.667 ms (14 rows) Table Descriptions: \d attribute_category; Table "public.attribute_category" Column | Type | Modifiers -----------------+----------------------+----------- attribute_id | integer | not null category_id | integer | not null is_browsable | character varying(5) | is_required | character varying(5) | sort_order | integer | default_unit_id | integer | Indexes: "attribute_category_pk" PRIMARY KEY, btree (attribute_id, category_id) "attribute_category__attribute_id_fk_idx" btree (attribute_id) "attribute_category__category_id_fk_idx" btree (category_id) CLUSTER Foreign-key constraints: "attribute_category_attribute_fk" FOREIGN KEY (attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE INITIALLY DEFERRED "attribute_category_category_fk" FOREIGN KEY (category_id) REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED \d product_attribute_value; Table "public.product_attribute_value" Column | Type | Modifiers ----------------------------+-----------------------+----------- attribute_id | integer | not null attribute_unit_id | integer | attribute_value_id | integer | boolean_value | character varying(5) | decimal_value | numeric(30,10) | product_attribute_value_id | integer | not null product_id | integer | not null product_reference_id | integer | status_code | character varying(32) | Indexes: "product_attribute_value_pk" PRIMARY KEY, btree (product_attribute_value_id) "product_attribute_value__attribute_id_fk_idx" btree (attribute_id) "product_attribute_value__attribute_unit_id_fk_idx" btree (attribute_unit_id) "product_attribute_value__attribute_value_id_fk_idx" btree (attribute_value_id) "product_attribute_value__decimal_value_idx" btree (decimal_value) "product_attribute_value__product_id_fk_idx" btree (product_id) CLUSTER "product_attribute_value__product_reference_id_fk_idx" btree (product_reference_id) Foreign-key constraints: "product_attribute_value_attribute_fk" FOREIGN KEY (attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_attributeunit_fk" FOREIGN KEY (attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_attributevalue_fk" FOREIGN KEY (attribute_value_id) REFERENCES attribute_value(attribute_value_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_product_fk" FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED "product_attribute_value_productreference_fk" FOREIGN KEY (product_reference_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED \d category_product; Table "public.category_product" Column | Type | Modifiers ---------------------+------------------------+----------- category_id | integer | not null product_id | integer | not null en_name_sort_order | integer | fr_name_sort_order | integer | merchant_sort_order | integer | price_sort_order | integer | merchant_count | integer | is_active | character varying(5) | product_is_active | character varying(5) | product_status_code | character varying(32) | product_name_en | character varying(512) | product_name_fr | character varying(512) | product_click_count | integer | Indexes: "x_category_product_pk" PRIMARY KEY, btree (category_id, product_id) "category_product__category_id_is_active_and_status_idx" btree (category_id, product_is_active, product_status_code) "category_product__is_active_idx" btree (is_active) "category_product__merchant_sort_order_idx" btree (merchant_sort_order) "x_category_product__category_id_fk_idx" btree (category_id) CLUSTER "x_category_product__product_id_fk_idx" btree (product_id) Foreign-key constraints: "x_category_product_category_fk" FOREIGN KEY (category_id) REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED "x_category_product_product_fk" FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED
I added to the exists query qualifier: AND cp.category_id = 1000962 (in addition to the cp.category_id = ac.category_id) Now I am getting a much better query plan on our production server: Index Scan using attribute_category__category_id_fk_idx on attribute_category ac (cost=0.00..485.71 rows=7 width=4) (actual time=0.104..0.351 rows=5 loops=1) Index Cond: (category_id = 1000962) Filter: (((is_browsable)::text = 'true'::text) AND (subplan)) SubPlan -> Nested Loop (cost=0.00..24.77 rows=1 width=0) (actual time=0.058..0.058 rows=1 loops=5) -> Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..6.01 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=5) Index Cond: ((category_id = $1) AND (category_id = 1000962)) Filter: (((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text)) -> Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..18.75 rows=1 width=4) (actual time=0.041..0.041 rows=1 loops=5) Index Cond: (pav.product_id = "outer".product_id) Filter: ((attribute_id = $0) AND (status_code IS NULL)) Total runtime: 0.558 ms (12 rows) It is using the x_category_product__category_id_fk_idx on category_product instead of the category_product__category_id_is_active_and_status_idx index as on our backup server. Still not sure what's causing the differences in query execution between the servers, but at least the query is fast again. Brian On 10-May-06, at 4:39 PM, Brian Wipf wrote: > I'm trying to determine why an identical query is running > approximately 500 to 1000 times slower on our production database > compared to our backup database server. > > Both database servers are dual 2.3 GHz G5 Xserves running > PostgreSQL 8.1.3; both are configured with 8GB of RAM with > identical shared memory settings; both postgresql.conf files are > identical; both databases have identical indexes defined. > > The three relevant tables are all clustered the same, although I'm > not sure when clustering was last performed on either server. All > three tables have recently been analyzed on both servers. > > The different explain plans for this query seem to be consistent on > both servers regardless of category and the production server is > consistently and drastically slower than the backup server. > > If anyone has any ideas on how to have the production server > generate the same explain plan as the backup server, or can suggest > anything I might want to try, I would greatly appreciate it. > > Brian Wipf > ClickSpace Interactive Inc. > <brian@clickspace.com> > > Here's the query: > > SELECT ac.attribute_id > FROM attribute_category ac > WHERE is_browsable = 'true' AND > category_id = 1000962 AND > EXISTS ( SELECT 'X' > FROM product_attribute_value pav, > category_product cp > WHERE pav.attribute_id = ac.attribute_id AND > pav.status_code is null AND > pav.product_id = cp.product_id AND > cp.category_id = ac.category_id AND > cp.product_is_active = 'true' AND > cp.product_status_code = 'complete' > ) > > Explain plans: > > Fast (backup server): > Index Scan using attribute_category__category_id_fk_idx on > attribute_category ac (cost=0.00..47943.34 rows=7 width=4) (actual > time=0.110..0.263 rows=5 loops=1) > Index Cond: (category_id = 1000962) > Filter: (((is_browsable)::text = 'true'::text) AND (subplan)) > SubPlan > -> Nested Loop (cost=0.00..7983.94 rows=3 width=0) (actual > time=0.043..0.043 rows=1 loops=5) > -> Index Scan using > category_product__category_id_is_active_and_status_idx on > category_product cp (cost=0.00..4362.64 rows=1103 width=4) (actual > time=0.013..0.015 rows=2 loops=5) > Index Cond: ((category_id = $1) AND > ((product_is_active)::text = 'true'::text) AND > ((product_status_code)::text = 'complete'::text)) > -> Index Scan using > product_attribute_value__product_id_fk_idx on > product_attribute_value pav (cost=0.00..3.27 rows=1 width=4) > (actual time=0.016..0.016 rows=1 loops=8) > Index Cond: (pav.product_id = "outer".product_id) > Filter: ((attribute_id = $0) AND (status_code IS > NULL)) > Total runtime: 0.449 ms > (11 rows) > > Slow (production server): > Index Scan using attribute_category__category_id_fk_idx on > attribute_category ac (cost=0.00..107115.90 rows=7 width=4) > (actual time=1.472..464.437 rows=5 loops=1) > Index Cond: (category_id = 1000962) > Filter: (((is_browsable)::text = 'true'::text) AND (subplan)) > SubPlan > -> Nested Loop (cost=18.33..23739.70 rows=4 width=0) (actual > time=92.870..92.870 rows=1 loops=5) > -> Bitmap Heap Scan on product_attribute_value pav > (cost=18.33..8764.71 rows=2549 width=4) (actual time=10.191..45.672 > rows=5869 loops=5) > Recheck Cond: (attribute_id = $0) > Filter: (status_code IS NULL) > -> Bitmap Index Scan on > product_attribute_value__attribute_id_fk_idx (cost=0.00..18.33 > rows=2952 width=0) (actual time=9.160..9.160 rows=33330 loops=5) > Index Cond: (attribute_id = $0) > -> Index Scan using x_category_product_pk on > category_product cp (cost=0.00..5.86 rows=1 width=4) (actual > time=0.007..0.007 rows=0 loops=29345) > Index Cond: ((cp.category_id = $1) AND > ("outer".product_id = cp.product_id)) > Filter: (((product_is_active)::text = > 'true'::text) AND ((product_status_code)::text = 'complete'::text)) > Total runtime: 464.667 ms > (14 rows) > > Table Descriptions: > > \d attribute_category; > Table "public.attribute_category" > Column | Type | Modifiers > -----------------+----------------------+----------- > attribute_id | integer | not null > category_id | integer | not null > is_browsable | character varying(5) | > is_required | character varying(5) | > sort_order | integer | > default_unit_id | integer | > Indexes: > "attribute_category_pk" PRIMARY KEY, btree (attribute_id, > category_id) > "attribute_category__attribute_id_fk_idx" btree (attribute_id) > "attribute_category__category_id_fk_idx" btree (category_id) > CLUSTER > Foreign-key constraints: > "attribute_category_attribute_fk" FOREIGN KEY (attribute_id) > REFERENCES attribute(attribute_id) DEFERRABLE INITIALLY DEFERRED > "attribute_category_category_fk" FOREIGN KEY (category_id) > REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED > > \d product_attribute_value; > Table "public.product_attribute_value" > Column | Type | Modifiers > ----------------------------+-----------------------+----------- > attribute_id | integer | not null > attribute_unit_id | integer | > attribute_value_id | integer | > boolean_value | character varying(5) | > decimal_value | numeric(30,10) | > product_attribute_value_id | integer | not null > product_id | integer | not null > product_reference_id | integer | > status_code | character varying(32) | > Indexes: > "product_attribute_value_pk" PRIMARY KEY, btree > (product_attribute_value_id) > "product_attribute_value__attribute_id_fk_idx" btree > (attribute_id) > "product_attribute_value__attribute_unit_id_fk_idx" btree > (attribute_unit_id) > "product_attribute_value__attribute_value_id_fk_idx" btree > (attribute_value_id) > "product_attribute_value__decimal_value_idx" btree (decimal_value) > "product_attribute_value__product_id_fk_idx" btree (product_id) > CLUSTER > "product_attribute_value__product_reference_id_fk_idx" btree > (product_reference_id) > Foreign-key constraints: > "product_attribute_value_attribute_fk" FOREIGN KEY > (attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE > INITIALLY DEFERRED > "product_attribute_value_attributeunit_fk" FOREIGN KEY > (attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id) > DEFERRABLE INITIALLY DEFERRED > "product_attribute_value_attributevalue_fk" FOREIGN KEY > (attribute_value_id) REFERENCES attribute_value(attribute_value_id) > DEFERRABLE INITIALLY DEFERRED > "product_attribute_value_product_fk" FOREIGN KEY (product_id) > REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED > "product_attribute_value_productreference_fk" FOREIGN KEY > (product_reference_id) REFERENCES product(product_id) DEFERRABLE > INITIALLY DEFERRED > > \d category_product; > Table "public.category_product" > Column | Type | Modifiers > ---------------------+------------------------+----------- > category_id | integer | not null > product_id | integer | not null > en_name_sort_order | integer | > fr_name_sort_order | integer | > merchant_sort_order | integer | > price_sort_order | integer | > merchant_count | integer | > is_active | character varying(5) | > product_is_active | character varying(5) | > product_status_code | character varying(32) | > product_name_en | character varying(512) | > product_name_fr | character varying(512) | > product_click_count | integer | > Indexes: > "x_category_product_pk" PRIMARY KEY, btree (category_id, > product_id) > "category_product__category_id_is_active_and_status_idx" btree > (category_id, product_is_active, product_status_code) > "category_product__is_active_idx" btree (is_active) > "category_product__merchant_sort_order_idx" btree > (merchant_sort_order) > "x_category_product__category_id_fk_idx" btree (category_id) > CLUSTER > "x_category_product__product_id_fk_idx" btree (product_id) > Foreign-key constraints: > "x_category_product_category_fk" FOREIGN KEY (category_id) > REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED > "x_category_product_product_fk" FOREIGN KEY (product_id) > REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED
Brian Wipf <brian@clickspace.com> writes: > I'm trying to determine why an identical query is running > approximately 500 to 1000 times slower on our production database > compared to our backup database server. It looks to me like it's pure luck that the query is fast on the backup server. The outer side of the EXISTS' join is being badly misestimated: > -> Index Scan using > category_product__category_id_is_active_and_status_idx on > category_product cp (cost=0.00..4362.64 rows=1103 width=4) (actual > time=0.013..0.015 rows=2 loops=5) > Index Cond: ((category_id = $1) AND > ((product_is_active)::text = 'true'::text) AND > ((product_status_code)::text = 'complete'::text)) If there actually had been 1100 matching rows instead of 2, the query would have run 550 times slower, putting it in the same ballpark as the other plan. So what I'm guessing is that the planner sees these two plans as being nearly the same cost, and small differences in the stats between the two databases are enough to tip its choice in one direction or the other. So what you want, of course, is to improve that rowcount estimate. I suppose the reason it's so bad is that we don't have multicolumn statistics ... is there a strong correlation between product_is_active and product_status_code? If so, it might be worth your while to find a way to merge them into one column. regards, tom lane
Hi! How can I speed up my server's performance when I use offset and limit clause. For example I have a query: SELECT * FROM table ORDER BY id, name OFFSET 100000 LIMIT 10000 This query takes a long time about more than 2 minutes. If my query is: SELECT * FROM table ORDER BY id, name OFFSET 50000 LIMIT 10000 It takes about 2 seconds. Thanks
Christian Paul Cosinas wrote: > Hi! > > How can I speed up my server's performance when I use offset and limit > clause. > > For example I have a query: > SELECT * FROM table ORDER BY id, name OFFSET 100000 LIMIT 10000 > > This query takes a long time about more than 2 minutes. > > If my query is: > SELECT * FROM table ORDER BY id, name OFFSET 50000 LIMIT 10000 > It takes about 2 seconds. Please create a new thread rather than replying to someone elses post and changing the subject. These threads can sometimes get missed. You do have an index on id and name don't you? -- Postgresql & php tutorials http://www.designmagick.com/
Why do you want to use it this way ? Explain what you want to do, there probably is another faster solution... On Thu, 11 May 2006 16:45:33 +0200, Christian Paul Cosinas <cpc@cybees.com> wrote: > Hi! > > How can I speed up my server's performance when I use offset and limit > clause. > > For example I have a query: > SELECT * FROM table ORDER BY id, name OFFSET 100000 LIMIT 10000 > > This query takes a long time about more than 2 minutes. > > If my query is: > SELECT * FROM table ORDER BY id, name OFFSET 50000 LIMIT 10000 > It takes about 2 seconds. > > Thanks > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
"Christian Paul Cosinas" <cpc 'at' cybees.com> writes: > Hi! > > How can I speed up my server's performance when I use offset and limit > clause. > > For example I have a query: > SELECT * FROM table ORDER BY id, name OFFSET 100000 LIMIT 10000 > > This query takes a long time about more than 2 minutes. > > If my query is: > SELECT * FROM table ORDER BY id, name OFFSET 50000 LIMIT 10000 > It takes about 2 seconds. First you should read the appropriate documentation. http://www.postgresql.org/docs/8.1/interactive/performance-tips.html -- Guillaume Cottenceau
I am creating an application that gets the value of a large table and write it to a file. Why I want to use offset and limit is for me to create a threaded application so that they will not get the same results. For example: Thread 1 : gets offset 0 limit 5000 Thread 2 : gets offset 5000 limit 5000 Thread 3 : gets offset 10000 limit 5000 And so on... Would there be any other faster way than what It thought? -----Original Message----- From: PFC [mailto:lists@peufeu.com] Sent: Thursday, May 11, 2006 7:06 AM To: Christian Paul Cosinas; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Speed Up Offset and Limit Clause Why do you want to use it this way ? Explain what you want to do, there probably is another faster solution... On Thu, 11 May 2006 16:45:33 +0200, Christian Paul Cosinas <cpc@cybees.com> wrote: > Hi! > > How can I speed up my server's performance when I use offset and limit > clause. > > For example I have a query: > SELECT * FROM table ORDER BY id, name OFFSET 100000 LIMIT 10000 > > This query takes a long time about more than 2 minutes. > > If my query is: > SELECT * FROM table ORDER BY id, name OFFSET 50000 LIMIT 10000 > It takes about 2 seconds. > > Thanks > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
Christian Paul Cosinas wrote: > I am creating an application that gets the value of a large table and write > it to a file. > > Why I want to use offset and limit is for me to create a threaded > application so that they will not get the same results. > > For example: > > Thread 1 : gets offset 0 limit 5000 > Thread 2 : gets offset 5000 limit 5000 > Thread 3 : gets offset 10000 limit 5000 > > And so on... > > Would there be any other faster way than what It thought? In order to return rows 10000 to 15000, it must select all rows from zero to 15000 and then discard the first 10000 -- probablynot what you were hoping for. You might add a "thread" column. Say you want to run ten threads: create sequence thread_seq increment by 1 minvalue 1 maxvalue 10 cycle start with 1; create table mytable( column1 integer, ... other columns..., thread integer default nextval('thread_seq') ); create bitmap index i_mytable_thread on mytable(thread); Now whenever you insert into mytable, you get a value in mytable.thread between 1 and 10, and it's indexed with a highlyefficient bitmap index. So your query becomes: Thread 1: select ... from mytable where ... and thread = 1; Thread 2: select ... from mytable where ... and thread = 2; ... and so forth. Craig
On Tue, May 16, 2006 at 07:20:12PM -0700, Craig A. James wrote: > >Why I want to use offset and limit is for me to create a threaded > >application so that they will not get the same results. > > In order to return rows 10000 to 15000, it must select all rows from zero > to 15000 and then discard the first 10000 -- probably not what you were > hoping for. > > You might add a "thread" column. Say you want to run ten threads: Another possibility is partitioning the table. If you do that using inheritance-based partitioning, you could just select directly from different partition tables, which probably be even faster than using a single table. The downside is it's more work to setup. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> Thread 1 : gets offset 0 limit 5000 > Thread 2 : gets offset 5000 limit 5000 > Thread 3 : gets offset 10000 limit 5000 > > Would there be any other faster way than what It thought? Yeah, sure, use a thread which does the whole query (maybe using a cursor) and fills a queue with the results, then N threads consuming from that queue... it will work better.