Running update in chunks? - Mailing list pgsql-general

From Tim Uckun
Subject Running update in chunks?
Date
Msg-id CAGuHJrPHRNmmoWOQD7YfArgGHRrUkBzC33ipsJr1tODYJCmMFg@mail.gmail.com
Responses Re: Running update in chunks?  (Richard Huxton <dev@archonet.com>)
Re: Running update in chunks?  (Tim Uckun <timuckun@gmail.com>)
List pgsql-general
I have seen a lot of slow-update questions asked both here and on
Stack Overflow, but they usually involve large tables. In my case the
dataset is fairly small.

I have an app in which I import data and then merge the records with
an existing table. Currently I do most of the heavy lifting in code,
and it works well enough, but as the imports get bigger I thought I
would rewrite the code to speed it up using Postgres. Basically I take
the data, which I consider dirty, and put it into a table, using
hstore to store the fields. I then run a series of update queries to
locate the "real" records in the various tables. The import data
looks like this https://gist.github.com/4584366 and has about 98K
records in it. The lookup table is very small, with only a couple of
hundred records in it.

This is the query I am running:

update cars.imports i
    set make_id = md.make_id
    from cars.models md where i.model_id = md.id;
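As an aside, one variant I have seen suggested is to skip rows that already have the right value, so re-runs don't rewrite every tuple. This is just a sketch; it assumes make_id may already be populated from an earlier run:

```sql
-- Sketch: only touch rows whose make_id actually changes.
-- IS DISTINCT FROM handles NULLs, so freshly-imported rows still match.
UPDATE cars.imports i
SET make_id = md.make_id
FROM cars.models md
WHERE i.model_id = md.id
  AND i.make_id IS DISTINCT FROM md.make_id;
```

On a first run this updates the same 98K rows, but on subsequent runs it avoids generating dead tuples for rows that are already correct.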


Here is the EXPLAIN ANALYZE output:

Update on imports i  (cost=2.46..49720.34 rows=138858 width=526) (actual time=51968.553..51968.553 rows=0 loops=1)
  ->  Hash Join  (cost=2.46..49720.34 rows=138858 width=526) (actual time=0.044..408.170 rows=98834 loops=1)
        Hash Cond: (i.model_id = md.id)
        ->  Seq Scan on imports i  (cost=0.00..47808.58 rows=138858 width=516) (actual time=0.010..323.616 rows=98834 loops=1)
        ->  Hash  (cost=1.65..1.65 rows=65 width=14) (actual time=0.026..0.026 rows=65 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 4kB
              ->  Seq Scan on models md  (cost=0.00..1.65 rows=65 width=14) (actual time=0.002..0.012 rows=65 loops=1)
Total runtime: 51968.602 ms


This query takes fifty seconds on a MacBook Air with an i7 processor,
eight gigs of RAM, and an SSD. I am using Postgres 9.2 installed with
Homebrew, using the standard conf file.

So it seems to me that this query is running as fast as it can, but
it's still much slower than doing things in code one record at a
time (using some memoization).

Anyway... Presuming I can't really do anything to speed up this query,
does it make sense to try to do this in chunks, and if so, what is the
best technique for doing that?
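For context, the chunking idea I had in mind is something like the following sketch. It assumes the imports table has an integer primary key named "id" (my assumption; the gist doesn't show the schema), and each batch would run in its own transaction:

```sql
-- Sketch of chunked updating, assuming an integer PK "id" on cars.imports.
-- Run each batch in a separate transaction, stepping the range forward:
UPDATE cars.imports i
SET make_id = md.make_id
FROM cars.models md
WHERE i.model_id = md.id
  AND i.id >= 1 AND i.id < 10001;

-- next batch: AND i.id >= 10001 AND i.id < 20001, and so on,
-- driven by a loop in the application or a DO block.
```

The idea being that smaller transactions keep each UPDATE's row rewrite and WAL burst bounded, at the cost of scanning the range predicate repeatedly unless "id" is indexed.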

