Thread: Help: massive parallel update to the same table

Help: massive parallel update to the same table

From: Red Maple
Hi all,

Our system has a postgres database with a statistics table that is updated every hour by about 10K clients. Each client updates only its own row in the table. So far I am only seeing one of the eight cores on my server being active, which tells me that the updates are being done serially instead of in parallel. Do you know if there is a way for me to make these independent updates happen in parallel?

Thank you, your help is very much appreciated!

Re: Help: massive parallel update to the same table

From: "Kevin Grittner"
Red Maple <redmapleleaf@gmail.com> wrote:

> Our system has a postgres database that has a table for statistic
> which is updated every hour by about 10K clients. Each client only
> make update to its own row in the table. So far I am only seeing
> one core out of eight cores on my server being active which tells
> me that the update is being done serial instead of being parallel.
> Do you know if there is a way for me to make these independent
> updates happen in parallel?

It should be parallel by default.  Are you taking out any explicit
locks?

Also, it seems like you're only doing about three updates per
second.  I would expect a single-row update to run in a couple ms or
less, so it would be rare that two requests would be active at the
same time, so you wouldn't often see multiple cores active at the
same time.  (Of course, the background writer, autovacuum, etc.,
should occasionally show up concurrently with update queries.)

Is there some particular problem you're trying to solve?  (For
example, is something too slow?)

-Kevin

Re: Help: massive parallel update to the same table

From: "Nicholson, Brad (Toronto, ON, CA)"
>From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Red Maple
>Sent: Friday, March 18, 2011 9:05 AM
>To: pgsql-performance@postgresql.org
>Subject: [PERFORM] Help: massive parallel update to the same table
>
>Hi all,
>
>Our system has a postgres database that has a table for statistic which is updated every hour by about 10K clients.
>Each client only make update to its own row in the table. So far I am only seeing one core out of eight cores on my
>server being active which tells me that the update is being done serial instead of being parallel. Do you know if there
>is a way for me to make these independent updates happen in parallel?
>
>Thank you, your help is very much appreciated!

If they are all happening on one core, you are probably using one DB connection to do the updates.  To split them
across multiple cores, you need to use multiple DB connections.  Be careful if/when you restructure things to filter
these requests into a reasonable number of backend DB connections - turning a huge number of clients loose against a
DB is not going to end well.

Brad.
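Brad's advice can be sketched as a small worker pool: one DB connection per worker thread, with all client updates funneled through the pool. This is a minimal simulation, not a real driver; `make_connection` and `update_row` below are hypothetical stand-ins for something like `psycopg2.connect(...)` and an `UPDATE ... WHERE client_id = %s` statement.

```python
import queue
import threading

NUM_WORKERS = 8      # roughly one connection per core
NUM_CLIENTS = 10_000

def make_connection():
    # Hypothetical stand-in for opening a real DB connection.
    return {"updates_done": 0}

def update_row(conn, client_id):
    # Hypothetical stand-in for running the per-client UPDATE.
    conn["updates_done"] += 1

def worker(jobs, conn):
    # Each worker drains jobs from the shared queue using its own connection.
    while True:
        try:
            client_id = jobs.get_nowait()
        except queue.Empty:
            return
        update_row(conn, client_id)

jobs = queue.Queue()
for client_id in range(NUM_CLIENTS):
    jobs.put(client_id)

conns = [make_connection() for _ in range(NUM_WORKERS)]
threads = [threading.Thread(target=worker, args=(jobs, c)) for c in conns]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(c["updates_done"] for c in conns)
print(total)  # all client updates handled by a bounded pool of connections
```

The point of the structure is that the 10K clients never talk to the database directly; only the fixed-size pool does, so concurrency is bounded and the work can spread across cores.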

Re: Help: massive parallel update to the same table

From: "Kevin Grittner"
[rearranged - please don't top-post]

[also, bringing this back to the list - please keep the list copied]

Red Maple <redmapleleaf@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

>> It should be parallel by default.  Are you taking out any
>> explicit locks?

> my clients use psql to remotely run an update function on the
> postgres server. Each client run its own psql to connect to the
> server. What I have noticed is that if I commented out the update
> in the function so that only query is being done then all the core
> would kick in and run at 100%. However if I allow the update on
> the function then only one core would run.

> Currently it take 40min to update all the client statistics

Please show us the part you commented out to get the faster run
time, and the source code for the function you mentioned.

> Do you know if I have configured something incorrectly?
>
> I am running postgres 9.0.2 on fedora core 14. Here is my
> postgres.conf file
>
>
> [over 500 lines of configuration, mostly comments, wrapped]

If you're going to post that, please strip the comments or post the
results of this query:

  http://wiki.postgresql.org/wiki/Server_Configuration

I don't think anything in your configuration will affect this
particular problem, but it seems likely that you could do some
overall tuning.  If you want to do that, you should probably start a
new thread after this issue is sorted out.

-Kevin


Re: Help: massive parallel update to the same table

From: Red Maple
Hi,

Here is my function. If I comment out the update then all the cores run; if not, only one core runs....
CREATE OR REPLACE FUNCTION my_update_device(this_mac text, number_of_devices integer, this_sysuptime integer)
  RETURNS integer AS
$BODY$
DECLARE
  fake_mac macaddr;
  this_id integer;
  new_avgld integer;
BEGIN
  new_avgld = (this_sysuptime / 120) % 100;
  for i in 1..number_of_devices loop
    fake_mac = substring(this_mac from 1 for 11) || ':' || upper(to_hex((i-1)/256)) || ':' || upper(to_hex((i-1)%256));
    select into this_id id from ap where lan_mac = upper(fake_mac::text);
    if not found then
      return -1;
    end if;
    select into this_sysuptime sysuptime from ap_sysuptime where ap_id = this_id for update;
-- ==============================================================================
-- >>>>>>>> if I comment out the next update then all cores will be running, else only one core will be running
-- ==============================================================================
    update ap_sysuptime set sysuptime = this_sysuptime, last_contacted = now() where ap_id = this_id;
    select into new_avgld avg_ld_1min from colubris_device where node_id = this_id for update;
    new_avgld = (this_avgld / 120) % 100;
  end loop;
  return this_id;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
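As an aside, the fake_mac construction in that loop can be checked outside the database. The sketch below mirrors the `substring`/`to_hex`/`upper` expression in plain Python (string level only; the `macaddr` cast and the `ap` lookup are not modelled, so a same-row collision could still arise there):

```python
# Mirror of: substring(this_mac from 1 for 11) || ':' ||
#            upper(to_hex((i-1)/256)) || ':' || upper(to_hex((i-1)%256))
def fake_mac(this_mac: str, i: int) -> str:
    prefix = this_mac[:11]                      # first four octets, e.g. "aa:bb:cc:dd"
    hi = format((i - 1) // 256, "x").upper()    # to_hex((i-1)/256)
    lo = format((i - 1) % 256, "x").upper()     # to_hex((i-1)%256)
    return f"{prefix}:{hi}:{lo}"

macs = [fake_mac("aa:bb:cc:dd:ee:ff", i) for i in range(1, 301)]
print(len(set(macs)))  # 300: each loop iteration yields a distinct string
```

Since the generated strings are distinct for each `i`, a debugging pass would look for the same-row bug elsewhere, e.g. in how the generated value matches rows during the `lan_mac` lookup.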
 
 
 
 

 


Re: Help: massive parallel update to the same table

From: "Kevin Grittner"
Red Maple <redmapleleaf@gmail.com> wrote:

> Here is my function. If I comment out the update then it would run
> all the cores, if not then only one core will run....

> CREATE OR REPLACE FUNCTION

> [...]

>       select sysuptime
>         into this_sysuptime
>         from ap_sysuptime
>         where ap_id = this_id
>         for update;
>
>       -- ==================================================
>       -- >>>>>>>> if I comment out the next update
>       -- >>>>>>>>   then all cores will be running,
>       -- >>>>>>>>   else only one core will be running
>       -- ==================================================
>       update ap_sysuptime
>         set sysuptime      = this_sysuptime,
>             last_contacted = now()
>         where ap_id = this_id;

This proves that you're not showing us the important part.  The
update locks the same row previously locked by the SELECT FOR
UPDATE, so any effect at the row level would be a serialization
failure based on a write conflict, which doesn't sound like your
problem.  They get different locks at the table level, though:

http://www.postgresql.org/docs/9.0/interactive/explicit-locking.html#LOCKING-TABLES

Somewhere in code you're not showing us you're acquiring a lock on
the ap_sysuptime table which conflicts with a ROW EXCLUSIVE lock but
not with a ROW SHARE lock.  The lock types which could do that are
SHARE and SHARE ROW EXCLUSIVE.  CREATE INDEX (without CONCURRENTLY)
could do that; otherwise it seems that you would need to be
explicitly issuing a LOCK statement at one of these levels somewhere
in your transaction.  That is what is causing the transactions to
run one at a time.

-Kevin
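The table-level conflict Kevin describes can be illustrated as a two-session sketch (the SHARE-level LOCK here is the hypothetical hidden statement he suspects, not something shown in the thread):

```sql
-- Session 1: a hypothetical statement elsewhere in the transaction
BEGIN;
LOCK TABLE ap_sysuptime IN SHARE MODE;

-- Session 2: SELECT ... FOR UPDATE only takes ROW SHARE, which does not
-- conflict with SHARE, so this returns immediately:
BEGIN;
SELECT sysuptime FROM ap_sysuptime WHERE ap_id = 1 FOR UPDATE;

-- The UPDATE needs ROW EXCLUSIVE, which does conflict with SHARE, so
-- session 2 blocks here until session 1 commits or rolls back:
UPDATE ap_sysuptime SET sysuptime = 0 WHERE ap_id = 1;
```

This matches the observed symptom: the SELECT ... FOR UPDATE alone runs concurrently, but adding the UPDATE serializes the transactions.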

Re: Help: massive parallel update to the same table

From: Red Maple
Hi,

I have found the bug in my code that made the updates go to the same row in the table instead of two different rows. Now I have all cores up and running at 100%.

Thank you for all your help.
