Re: about multiprocessingmassdata - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: about multiprocessingmassdata
Date
Msg-id 4F7C7827.7080705@fuzzy.cz
Whole thread Raw
In response to about multiprocessingmassdata  (superman0920 <superman0920@gmail.com>)
List pgsql-performance
On 4.4.2012 17:52, superman0920 wrote:
> Hi list!
>
> i have a table which has 8500000 rows records. i write a java program to
> update these records.
> i use 100 threads to update the records. For example, thread-1 update
> 1~85000 records; thread-2 update 85001~170000 and so on.
> The update sql's aim is remove the space in the column and it is simple:
> update poi set py=replace(py,' ','') where id=?;

That's a very naive approach. It's very likely each thread will do an
index scan for each update (to evaluate the 'id=?' condition. And that's
going to cost you much more than you gain because index scans are quite
CPU and I/O intensive.

Simply update the whole table by

   UPDATE poi SET py = replace(py, ' ','');

Have you actually tried how this performs or did you guess 'it's
definitely going to be very slow so I'll use multiple threads to make
that faster'?

If you really need to parallelize this, you need to do that differently
- e.g. use 'ctid' to skip to update a whole page like this:

   UPDATE poi SET py = replace(py, ' ','')
    WHERE ctid >= '(n,0)'::tid AND ctid < '(n+1,0)'::tid AND;

where 'n' ranges between 0 and number of pages the table (e.g. in pg_class).

But try the simple UPDATE first, my guess is it's going to be much
faster than you expect.

Tomas

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: H800 + md1200 Performance problem
Next
From: Scott Marlowe
Date:
Subject: Re: H800 + md1200 Performance problem