Thread: Suggestions on an update query
PostgreSQL version 8.2.4
Memory = 8 GB
CPUs: 1 dual-core Xeon running at 3.0 GHz
I have a problem with an update query taking over 10 hours to run. I rebooted my server and ran the SQL command ANALYZE. Could you please help me with any suggestions? I have included the two tables involved in the update below, as well as the indexes I am using.
The table result_entry contains 17,767,240 rows and the table question_number contains 40,787 rows. Each row in result_entry matches exactly one row in question_number via the fk_question_id field, and each row in question_number matches an average of 436 rows in result_entry.
CREATE TABLE question_number
(
fk_form_id integer not null,
fk_question_id integer not null,
question_number integer not null,
sequence_id integer not null
);
ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey PRIMARY KEY (fk_question_id);
CREATE INDEX question_number_index1 ON question_number USING btree (question_number);
CREATE TABLE result_entry (
fk_result_submission_id integer NOT NULL,
fk_question_id integer NOT NULL,
fk_option_order_id integer NOT NULL,
value character varying,
order_id integer NOT NULL,
question_number integer
);
CREATE INDEX result_entery_index1 ON result_entry USING btree (fk_question_id);
update result_entry set question_number=question_number.question_number
from question_number where result_entry.fk_question_id=question_number.fk_question_id;
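The update relies on that one-to-one match. A quick sanity check (a hypothetical query, not part of the original post) is to count result_entry rows with no partner in question_number; it should return 0:

-- Hypothetical sanity check: result_entry rows with no matching
-- question_number row; the one-to-one claim above means this should be 0.
SELECT count(*)
  FROM result_entry r
  LEFT JOIN question_number q ON q.fk_question_id = r.fk_question_id
 WHERE q.fk_question_id IS NULL;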
explain update result_entry set question_number=question_number.question_number
from question_number where result_entry.fk_question_id=question_number.fk_question_id;
QUERY PLAN
---------------------------------------------------------------------------------
Hash Join (cost=1437.71..1046983.94 rows=17333178 width=32)
Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id)
-> Seq Scan on result_entry (cost=0.00..612216.78 rows=17333178 width=28)
-> Hash (cost=927.87..927.87 rows=40787 width=8)
-> Seq Scan on question_number (cost=0.00..927.87 rows=40787 width=8)
(5 rows)
postgresql.conf settings:
shared_buffers = 1GB
work_mem = 10MB
max_fsm_pages = 204800
random_page_cost = 1.0
effective_cache_size = 8GB
Thanks for any help!
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
I forgot to include an additional parameter I am using in postgresql.conf:
checkpoint_segments = 30
Thanks,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
"Campbell, Lance" <lance@uiuc.edu> writes: > QUERY PLAN > > > ------------------------------------------------------------------------ > > Hash Join (cost=1437.71..1046983.94 rows=17333178 width=32) > Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id) > -> Seq Scan on result_entry (cost=0.00..612216.78 rows=17333178 width=28) > -> Hash (cost=927.87..927.87 rows=40787 width=8) > -> Seq Scan on question_number (cost=0.00..927.87 rows=40787 width=8) > > (5 rows) That looks like an entirely reasonable plan. Is it possible some other session was blocking this update with a lock on a record? Was there lots of I/O at the time? You could peek in pg_locks while the update seems frozen. This looks like a one-time administrative job to add a new column, is that it? You might also consider creating a new table with the new data and replacing the old table with the new one with something like: CREATE TABLE new_result_entry AS SELECT fk_result_submission_id, fk_question_id, fk_option_order_id, value, order_id, question_number.question_number FROM result_entry JOIN question_number USING (fk_question_id) CREATE INDEX result_entery_index1n ON new_result_entry USING btree (fk_question_id); ALTER TABLE result_entry RENAME TO old_result_entry ALTER TABLE newresult_entry RENAME TO result_entry Unfortunately (for this use case) any views, triggers, etc which reference the old table will continue to reference the old table after the renames. You'll have to drop and recreate them. That may not be an option if the data is actively being used though. But if it is an option there are a few advantages 1) it'll be a bit faster 2) you can build the indexes on the new data at the end of the creation b) the resulting table and indexes won't have all the old versions taking up space waiting for a vacuum. > Postgresql.conf settings: > shared_buffers = 1GB > work_mem = 10MB > max_fsm_pages = 204800 > random_page_cost = 1.0 > effective_cache_size = 8GB I would suggest keeping random_page_cost at least slightly above 1.0 and effective_cache_size should probably be about 6GB rather than 8 since the shared buffers and other things which use memory reduce the memory available for cache. Also, work_mem could be larger at least for large batch queries like this. None of this is relevant for this query though. Actually I think a larger work_mem can avoid problems with hash joins so you might try that but I don't think it would be choosing it estimated that might happen -- and the estimates all look accurate. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Fri, 26 Oct 2007 15:31:44 -0500
"Campbell, Lance" <lance@uiuc.edu> wrote:

> I forgot to include an additional parameter I am using in
> postgresql.conf:

O.k. first, just to get it out of the way (and then I will try to help): please do not top post, it makes replying contextually very difficult.

> PostgreSQL version 8.2.4
> Memory = 8 GB
> CPUs: 1 dual-core Xeon running at 3.0 GHz

O.k., you might be grinding through your 30 checkpoint segments, but in reality what I think is happening is that you are doing foreign key checks against all of it, and that is slowing things down.

> The table result_entry contains 17,767,240 rows and the table
> question_number contains 40,787 rows. Each row in result_entry matches
> exactly one row in question_number via the fk_question_id field, and
> each row in question_number matches an average of 436 rows in
> result_entry.

You could disable the foreign key for the update and then reapply it.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
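A sketch of that drop/update/reapply approach (the third table, its key column, and the constraint name never appear in the thread, so the ones below are hypothetical placeholders):

-- Hypothetical sketch: the constraint name, referenced table, and its key
-- column are placeholders; the thread never names the third table.
ALTER TABLE result_entry
  DROP CONSTRAINT result_entry_fk_option_order_id_fkey;

UPDATE result_entry
   SET question_number = question_number.question_number
  FROM question_number
 WHERE result_entry.fk_question_id = question_number.fk_question_id;

ALTER TABLE result_entry
  ADD CONSTRAINT result_entry_fk_option_order_id_fkey
  FOREIGN KEY (fk_option_order_id)
  REFERENCES option_order (option_order_id);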
"Joshua D. Drake" <jd@commandprompt.com> writes: > On Fri, 26 Oct 2007 15:31:44 -0500 > "Campbell, Lance" <lance@uiuc.edu> wrote: > >> I forgot to include an additional parameter I am using in >> Postgresql.conf: >> > > O.k. first, just to get it out of the way (and then I will try and > help). Please do not top post, it makes replying contextually very > difficult. > >> PostgreSql version 8.2.4 >> >> Memory = 8 Gig >> >> CPUs 1 dual core Zeon running at 3.0 >> > > O.k. first you might be grinding through your 20 checkpoint segments > but in reality what I think is happening is you are doing foreign key > checks against all of it and slowing things down. If you're going to berate someone about top-posting perhaps you should attach your own commentary to relevant bits of context :P But the original post didn't include any foreign key constraints. I suspect you've guessed it right though. In fact I suspect what's happening is he doesn't have an index on the referencing column so the foreign key checks are doing sequential scans of. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Sat, 27 Oct 2007 03:04:47 +0100
Gregory Stark <stark@enterprisedb.com> wrote:

>> O.k., you might be grinding through your 30 checkpoint segments, but
>> in reality what I think is happening is that you are doing foreign
>> key checks against all of it, and that is slowing things down.
>
> If you're going to berate someone about top-posting, perhaps you
> should attach your own commentary to the relevant bits of context :P

It was hardly berating, Greg; I even said please.

> I suspect you've guessed right, though. In fact, I suspect what's
> happening is that he doesn't have an index on the referencing column,
> so the foreign key checks are doing sequential scans.

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Thanks for all of your help. The problem was that the result_entry table had some constraints that pointed to a third table. When I removed those constraints, the performance was amazing: the update took less than seven minutes to execute. I did not even consider the fact that constraints to another table would impact the performance.

Thanks again,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
On 10/29/07, Campbell, Lance <lance@uiuc.edu> wrote:

> Thanks for all of your help. The problem was that the result_entry
> table had some constraints that pointed to a third table. When I
> removed those constraints, the performance was amazing: the update
> took less than seven minutes to execute. I did not even consider the
> fact that constraints to another table would impact the performance.

Usually you can put an index on the referenced key in the foreign table to speed things up.