Thread: Suggestions on an update query

Suggestions on an update query

From
"Campbell, Lance"
Date:

PostgreSQL version: 8.2.4
Memory: 8 GB
CPUs: 1 dual-core Xeon running at 3.0 GHz

I have a problem with an update query taking over 10 hours to run. I rebooted my server and ran the SQL command "analyze". Could you please help me with any suggestions? I have included the two tables involved in the update below, as well as the indexes I am using.

The table result_entry contains 17,767,240 rows and the table question_number contains 40,787. Each row from the result_entry table matches one and only one row in the question_number table via the fk_question_id field. Each row from the question_number table matches an average of 436 rows in the result_entry table.

 

CREATE TABLE question_number
(
  fk_question_id                integer         not null,
  fk_form_id                    integer         not null,
  question_number               integer         not null,
  sequence_id                   integer         not null
);

ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey PRIMARY KEY (fk_question_id);

CREATE INDEX question_number_index1 ON question_number USING btree (question_number);

 

 

CREATE TABLE result_entry (
    fk_result_submission_id integer NOT NULL,
    fk_question_id integer NOT NULL,
    fk_option_order_id integer NOT NULL,
    value character varying,
    order_id integer NOT NULL,
    question_number integer
);

CREATE INDEX result_entery_index1 ON result_entry USING btree (fk_question_id);

 

 

update result_entry set question_number=question_number.question_number
from question_number where result_entry.fk_question_id=question_number.fk_question_id;

explain update result_entry set question_number=question_number.question_number
from question_number where result_entry.fk_question_id=question_number.fk_question_id;

 

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Hash Join  (cost=1437.71..1046983.94 rows=17333178 width=32)
   Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id)
   ->  Seq Scan on result_entry  (cost=0.00..612216.78 rows=17333178 width=28)
   ->  Hash  (cost=927.87..927.87 rows=40787 width=8)
         ->  Seq Scan on question_number  (cost=0.00..927.87 rows=40787 width=8)
(5 rows)

 

 

 

postgresql.conf settings:

shared_buffers = 1GB
work_mem = 10MB
max_fsm_pages = 204800
random_page_cost = 1.0
effective_cache_size = 8GB

 

 

Thanks for any help!

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

 

Re: Suggestions on an update query

From
"Campbell, Lance"
Date:

I forgot to include an additional parameter I am using in postgresql.conf:

checkpoint_segments = 30

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

 


From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Campbell, Lance
Sent: Friday, October 26, 2007 3:27 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Suggestions on an update query

 


Re: Suggestions on an update query

From
Gregory Stark
Date:
"Campbell, Lance" <lance@uiuc.edu> writes:

>                                    QUERY PLAN
>
>
> ------------------------------------------------------------------------
>
>  Hash Join  (cost=1437.71..1046983.94 rows=17333178 width=32)
>    Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id)
>    ->  Seq Scan on result_entry  (cost=0.00..612216.78 rows=17333178 width=28)
>    ->  Hash  (cost=927.87..927.87 rows=40787 width=8)
>          ->  Seq Scan on question_number  (cost=0.00..927.87 rows=40787 width=8)
>
> (5 rows)

That looks like an entirely reasonable plan. Is it possible some other session
was blocking this update with a lock on a record? Was there lots of I/O at the
time? You could peek in pg_locks while the update seems frozen.
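A minimal way to take that peek while the update appears frozen (pg_locks is a system view, so this should work on 8.2 as-is):

```sql
-- Show lock requests that are waiting rather than granted
SELECT locktype, relation::regclass AS rel, pid, mode, granted
  FROM pg_locks
 WHERE NOT granted;
```

If this returns any rows for the updating backend's pid, some other session holds a conflicting lock.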

This looks like a one-time administrative job to add a new column, is that it?
You might also consider creating a new table with the new data and replacing
the old table with the new one with something like:

CREATE TABLE new_result_entry AS
  SELECT fk_result_submission_id, fk_question_id, fk_option_order_id,
         value, order_id,
         question_number.question_number
    FROM result_entry
    JOIN question_number USING (fk_question_id);

CREATE INDEX result_entery_index1n ON new_result_entry USING btree (fk_question_id);

ALTER TABLE result_entry RENAME TO old_result_entry;
ALTER TABLE new_result_entry RENAME TO result_entry;

Unfortunately (for this use case) any views, triggers, etc which reference the
old table will continue to reference the old table after the renames. You'll
have to drop and recreate them.

That may not be an option if the data is actively being used, though. But if it is an option there are a few advantages: 1) it'll be a bit faster; 2) you can build the indexes on the new data at the end of the creation; 3) the resulting table and indexes won't have all the old row versions taking up space waiting for a vacuum.


> Postgresql.conf settings:
> shared_buffers = 1GB
> work_mem = 10MB
> max_fsm_pages = 204800
> random_page_cost = 1.0
> effective_cache_size = 8GB

I would suggest keeping random_page_cost at least slightly above 1.0 and
effective_cache_size should probably be about 6GB rather than 8 since the
shared buffers and other things which use memory reduce the memory available
for cache. Also, work_mem could be larger at least for large batch queries
like this.

None of this is relevant for this query though. Actually I think a larger work_mem can avoid problems with hash joins, so you might try that, but I don't think the planner would have chosen a hash join if it estimated that might happen -- and the estimates all look accurate.
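For instance, you could raise it just for the one batch session, something like this (the 256MB figure here is only an illustrative guess, not a recommendation):

```sql
-- Raise work_mem for this session only, run the batch update, then reset
SET work_mem = '256MB';

UPDATE result_entry SET question_number = question_number.question_number
  FROM question_number
 WHERE result_entry.fk_question_id = question_number.fk_question_id;

RESET work_mem;
```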


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Suggestions on an update query

From
"Joshua D. Drake"
Date:
On Fri, 26 Oct 2007 15:31:44 -0500
"Campbell, Lance" <lance@uiuc.edu> wrote:

> I forgot to include an additional parameter I am using in
> Postgresql.conf:
>

O.k. first, just to get it out of the way (and then I will try and
help). Please do not top post, it makes replying contextually very
difficult.
>
> PostgreSql version 8.2.4
>
> Memory = 8 Gig
>
> CPUs 1 dual core Zeon running at 3.0
>

O.k. first you might be grinding through your 30 checkpoint segments, but in reality what I think is happening is that you are doing foreign key checks against all of it, and that is slowing things down.


>
> The table result_entry contains 17,767,240 rows and the table
> question_number contains 40,787.  Each row from the result_entry table
> will match to one and only one row in the table question_number using
> the fk_question_id field.  Each row from the question_number table
> matches to an average of 436 rows on the result_entry table.
>
>


You could disable the foreign key for the update and then reapply it.
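A sketch of that, with made-up names -- the real constraint name and referenced table have to be looked up first (e.g. with \d result_entry in psql, or in pg_constraint):

```sql
-- Constraint, table, and column names below are hypothetical; substitute the real ones.
ALTER TABLE result_entry DROP CONSTRAINT result_entry_fk_option_order_id_fkey;

UPDATE result_entry SET question_number = question_number.question_number
  FROM question_number
 WHERE result_entry.fk_question_id = question_number.fk_question_id;

ALTER TABLE result_entry
  ADD CONSTRAINT result_entry_fk_option_order_id_fkey
  FOREIGN KEY (fk_option_order_id) REFERENCES option_order (option_order_id);
```

Re-adding the constraint will re-validate every row, so it isn't free, but one bulk check at the end beats 17 million per-row checks during the update.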

Joshua D. Drake



--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
            UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Suggestions on an update query

From
Gregory Stark
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:

> On Fri, 26 Oct 2007 15:31:44 -0500
> "Campbell, Lance" <lance@uiuc.edu> wrote:
>
>> I forgot to include an additional parameter I am using in
>> Postgresql.conf:
>>
>
> O.k. first, just to get it out of the way (and then I will try and
> help). Please do not top post, it makes replying contextually very
> difficult.
>
>> PostgreSql version 8.2.4
>>
>> Memory = 8 Gig
>>
>> CPUs 1 dual core Zeon running at 3.0
>>
>
> O.k. first you might be grinding through your 20 checkpoint segments
> but in reality what I think is happening is you are doing foreign key
> checks against all of it and slowing things down.

If you're going to berate someone about top-posting perhaps you should attach
your own commentary to relevant bits of context :P

But the original post didn't include any foreign key constraints. I suspect
you've guessed it right though. In fact I suspect what's happening is that he
doesn't have an index on the referencing column, so the foreign key checks are
doing sequential scans of the referencing table.
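The fix for that case is a plain btree index on the referencing column; a sketch, with a hypothetical table name standing in for whichever table carries the foreign key:

```sql
-- Index the referencing column so each FK check is an index lookup,
-- not a sequential scan of the whole referencing table.
CREATE INDEX referencing_table_fk_question_id_idx
    ON referencing_table (fk_question_id);
```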

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Suggestions on an update query

From
"Joshua D. Drake"
Date:
On Sat, 27 Oct 2007 03:04:47 +0100
Gregory Stark <stark@enterprisedb.com> wrote:


> > O.k. first you might be grinding through your 20 checkpoint segments
> > but in reality what I think is happening is you are doing foreign
> > key checks against all of it and slowing things down.
>
> If you're going to berate someone about top-posting perhaps you
> should attach your own commentary to relevant bits of context :P

It was hardly berating, Greg; I even said please.


> I
> suspect you've guessed it right though. In fact I suspect what's
> happening is he doesn't have an index on the referencing column so
> the foreign key checks are doing sequential scans of.
>

Sincerely,

Joshua D. Drake





Re: Suggestions on an update query

From
"Campbell, Lance"
Date:
Thanks for all of your help. The problem was that the result_entry table
had some constraints that pointed to a third table.  When I removed
those constraints the performance was amazing.  The update took less
than seven minutes to execute.  I did not even consider the fact that
constraints to another table would impact the performance.

Thanks again,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Gregory
Stark
Sent: Friday, October 26, 2007 9:05 PM
To: Joshua D. Drake
Cc: Campbell, Lance; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Suggestions on an update query


"Joshua D. Drake" <jd@commandprompt.com> writes:

> On Fri, 26 Oct 2007 15:31:44 -0500
> "Campbell, Lance" <lance@uiuc.edu> wrote:
>
>> I forgot to include an additional parameter I am using in
>> Postgresql.conf:
>>
>
> O.k. first, just to get it out of the way (and then I will try and
> help). Please do not top post, it makes replying contextually very
> difficult.
>
>> PostgreSql version 8.2.4
>>
>> Memory = 8 Gig
>>
>> CPUs 1 dual core Zeon running at 3.0
>>
>
> O.k. first you might be grinding through your 20 checkpoint segments
> but in reality what I think is happening is you are doing foreign key
> checks against all of it and slowing things down.

If you're going to berate someone about top-posting perhaps you should
attach
your own commentary to relevant bits of context :P

But the original post didn't include any foreign key constraints. I
suspect
you've guessed it right though. In fact I suspect what's happening is he
doesn't have an index on the referencing column so the foreign key
checks are
doing sequential scans of.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: Suggestions on an update query

From
"Scott Marlowe"
Date:
On 10/29/07, Campbell, Lance <lance@uiuc.edu> wrote:
> Thanks for all of your help. The problem was that the result_entry table
> had some constraints that pointed to a third table.  When I removed
> those constraints the performance was amazing.  The update took less
> than seven minutes to execute.  I did not even consider the fact that
> constraints to another table would impact the performance.

Usually you can put an index on the referenced key in the foreign
table to speed things up.
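A sketch of that, with names guessed from Lance's description (the "third table" and its column are hypothetical):

```sql
-- If the key that result_entry's constraint references somehow lacks an index
-- (it normally has one via its PRIMARY KEY or UNIQUE constraint), add one so
-- each foreign key check is an index lookup:
CREATE INDEX third_table_option_order_idx
    ON third_table (option_order_id);
```

That way the constraints can stay in place instead of being dropped and re-added around every bulk update.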