Re: Running update in chunks? - Mailing list pgsql-general

From Kevin Grittner
Subject Re: Running update in chunks?
Date
Msg-id 20130122022150.120590@gmx.com
Whole thread Raw
In response to Running update in chunks?  (Tim Uckun <timuckun@gmail.com>)
Responses Re: Running update in chunks?  (Steve Crawford <scrawford@pinpointresearch.com>)
Re: Running update in chunks?  (Tim Uckun <timuckun@gmail.com>)
Re: Running update in chunks?  (Tim Uckun <timuckun@gmail.com>)
List pgsql-general
Kevin Grittner wrote:

> update imports set make_id = 0
>
> Query returned successfully: 98834 rows affected, 45860 ms execution time.

For difficult problems, there is nothing like a self-contained test
case, that someone else can run to see the issue. Here's a starting
point:

create extension if not exists hstore;
create schema cars;
drop table if exists cars.imports;
CREATE TABLE cars.imports
(
id serial NOT NULL,
target_id integer,
batch_id integer,
make_id integer,
model_id integer,
date timestamp without time zone,
division_id integer,
dealer_id integer,
data hstore,
created_at timestamp without time zone NOT NULL,
updated_at timestamp without time zone NOT NULL,
CONSTRAINT imports_pkey PRIMARY KEY (id)
);
CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree (dealer_id);
CREATE INDEX index_imports_on_division_id ON cars.imports USING btree (division_id);
CREATE INDEX index_imports_on_make_id ON cars.imports USING btree (make_id);
CREATE INDEX index_imports_on_model_id ON cars.imports USING btree (model_id);
insert into cars.imports (created_at, updated_at) select now(), now() from (select generate_series(1, 100000)) x;
vacuum freeze analyze;
\timing on
update cars.imports set make_id = 0;
vacuum analyze;
update cars.imports set make_id = 0;

... and here's what I get when I run it on my desktop computer with
ordinary disk drives and a completely default configuration:

test=# create extension if not exists hstore;
CREATE EXTENSION
Time: 48.032 ms
test=# create schema cars;
CREATE SCHEMA
Time: 8.150 ms
test=# drop table if exists cars.imports;
NOTICE:  table "imports" does not exist, skipping
DROP TABLE
Time: 0.205 ms
test=# CREATE TABLE cars.imports
test-# (
test(# id serial NOT NULL,
test(# target_id integer,
test(# batch_id integer,
test(# make_id integer,
test(# model_id integer,
test(# date timestamp without time zone,
test(# division_id integer,
test(# dealer_id integer,
test(# data hstore,
test(# created_at timestamp without time zone NOT NULL,
test(# updated_at timestamp without time zone NOT NULL,
test(# CONSTRAINT imports_pkey PRIMARY KEY (id)
test(# );
NOTICE:  CREATE TABLE will create implicit sequence "imports_id_seq" for serial column "imports.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "imports_pkey" for table "imports"
CREATE TABLE
Time: 152.677 ms
test=# CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
CREATE INDEX
Time: 6.391 ms
test=# CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree (dealer_id);
CREATE INDEX
Time: 64.668 ms
test=# CREATE INDEX index_imports_on_division_id ON cars.imports USING btree (division_id);
CREATE INDEX
Time: 65.573 ms
test=# CREATE INDEX index_imports_on_make_id ON cars.imports USING btree (make_id);
CREATE INDEX
Time: 64.959 ms
test=# CREATE INDEX index_imports_on_model_id ON cars.imports USING btree (model_id);
CREATE INDEX
Time: 64.906 ms
test=# insert into cars.imports (created_at, updated_at) select now(), now() from (select generate_series(1, 100000))
x;
INSERT 0 100000
Time: 2516.559 ms
test=# vacuum freeze analyze;
VACUUM
Time: 3357.778 ms
test=# \timing on
Timing is on.
test=# update cars.imports set make_id = 0;
UPDATE 100000
Time: 2937.241 ms
test=# vacuum analyze;
VACUUM
Time: 2097.426 ms
test=# update cars.imports set make_id = 0;
UPDATE 100000
Time: 3935.939 ms

Ubuntu 12.10
i7-3770 CPU @ 3.40GHz with 16GB RAM
Linux Kevin-Desktop 3.5.0-22-generic #34-Ubuntu SMP Tue Jan 8 21:47:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
A pair of WDC WD10EALX-009BA0 (15.01H51) 7200 RPM drives in RAID 1.

 PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit

test=# SELECT name, current_setting(name), source
test-#   FROM pg_settings
test-#   WHERE source NOT IN ('default', 'override');
            name            |  current_setting   |        source        
----------------------------+--------------------+----------------------
 application_name           | psql               | client
 client_encoding            | UTF8               | client
 DateStyle                  | ISO, MDY           | configuration file
 default_text_search_config | pg_catalog.english | configuration file
 lc_messages                | en_US.UTF-8        | configuration file
 lc_monetary                | en_US.UTF-8        | configuration file
 lc_numeric                 | en_US.UTF-8        | configuration file
 lc_time                    | en_US.UTF-8        | configuration file
 log_timezone               | US/Central         | configuration file
 max_connections            | 100                | configuration file
 max_stack_depth            | 2MB                | environment variable
 shared_buffers             | 32MB               | configuration file
 TimeZone                   | US/Central         | configuration file
(13 rows)

I did absolutely no tuning from the default configuration.

So, what timings do you get if you run the identical script? Is
there something you can do to the above script (maybe in terms of
populating data) which will cause the performance you see?

-Kevin


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Yet Another Timestamp Question: Time Defaults
Next
From: "Kevin Grittner"
Date:
Subject: Re: Yet Another Timestamp Question: Time Defaults