Thread: Slow concurrent update of same row in a given table
Hi,

While doing some stress testing for updates on a small-sized table we
found the following results. We are not too happy about the speed of the
updates, particularly at high concurrency (10 clients).

Initially we get 119 updates/sec, but it drops to 10 updates/sec as
concurrency is increased.

PostgreSQL: 8.0.3
-------------------------------
TABLE STRUCTURE: general.stress
-------------------------------
| dispatch_id      | integer                  | not null  |
| query_id         | integer                  |           |
| generated        | timestamp with time zone |           |
| unsubscribes     | integer                  |           |
| read_count       | integer                  |           |
| status           | character varying(10)    |           |
| bounce_tracking  | boolean                  |           |
| dispatch_hour    | integer                  |           |
| dispatch_date_id | integer                  |           |
+------------------+--------------------------+-----------+
Indexes:
    "stress_pkey" PRIMARY KEY, btree (dispatch_id)

UPDATE STATEMENT:
update general.stress set read_count=read_count+1 where dispatch_id=114

TOOL USED: Perl/DBI, with prepared statement handles

CONCURRENCY METHOD: executing multiple copies of the same program from
different shells (Linux environment)

CLIENT/SERVER LINK: 10/100 Mbit LAN

CLIENT CODE: stress.pl
-------------------------------------------------------------------------
#!/opt/perl/bin/perl -I/usr/local/masonapache/lib/perl
################################################
# overview: update the table as fast as possible (while(1){});
# on every 100th commit, print the average update frequency
# of the last 100 updates
################################################
use strict;
use Time::HiRes qw(gettimeofday tv_interval);
use Utils;

my $dbh = &Utils::db_connect();
my $sth = $dbh->prepare(
    "update general.stress set read_count=read_count+1 where dispatch_id=114");
my $cnt = 0;
my $t0  = [ gettimeofday ];
while (1) {
    $sth->execute();
    $dbh->commit();
    $cnt++;
    if ($cnt % 100 == 0) {
        my $t1      = [ gettimeofday ];
        my $elapsed = tv_interval($t0, $t1);
        $t0 = $t1;
        printf "Rate: %d updates / sec\n", 100.0 / $elapsed;
    }
}
$sth->finish();
$dbh->disconnect();
-------------------------------------------------------------------------

-------------------------------------------------------------------------
RESULTS:
-------------------------------------------------------------------------
Number of Copies | Updates per Sec

 1  --> 119
 2  -->  59
 3  -->  38
 4  -->  28
 5  -->  22
 6  -->  19
 7  -->  16
 8  -->  14
 9  -->  11
10  -->  11
11  -->  10
-------------------------------------------------------------------------
Note that the table was vacuum analyzed during the tests.
Total number of records in the table: 93
-------------------------------------------------------------------------

Regards,
Rajesh Kumar Mallah
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> Hi
>
> While doing some stress testing for updates on a small-sized table
> we found the following results. We are not too happy about the speed
> of the updates, particularly at high concurrency (10 clients).
>
> Initially we get 119 updates/sec, but it drops to 10 updates/sec
> as concurrency is increased.
>
> PostgreSQL: 8.0.3
> -------------------------------
> TABLE STRUCTURE: general.stress
> -------------------------------
> | dispatch_id      | integer                  | not null  |
> | query_id         | integer                  |           |
> | generated        | timestamp with time zone |           |
> | unsubscribes     | integer                  |           |
> | read_count       | integer                  |           |
> | status           | character varying(10)    |           |
> | bounce_tracking  | boolean                  |           |
> | dispatch_hour    | integer                  |           |
> | dispatch_date_id | integer                  |           |
> +------------------+--------------------------+-----------+
> Indexes:
>     "stress_pkey" PRIMARY KEY, btree (dispatch_id)
>
> UPDATE STATEMENT:
> update general.stress set read_count=read_count+1 where dispatch_id=114

This means you are updating only one row, correct?

> Number of Copies | Updates per Sec
>
>  1  --> 119
>  2  -->  59
>  3  -->  38
>  4  -->  28
>  5  -->  22
>  6  -->  19
>  7  -->  16
>  8  -->  14
>  9  -->  11
> 10  -->  11
> 11  -->  10

So, 11 instances result in 10 updated rows per second, database wide or
per instance? If it is per instance, then 11 * 10 is close to the
performance for one connection.

That being said, when you've got 10 connections fighting over one row, I
wouldn't be surprised if you had bad performance.

Also, at 119 updates a second, you're more than doubling the table's
initial size (dead tuples) each second. How often are you vacuuming, and
are you using VACUUM or VACUUM FULL?

Gavin
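Since the whole test hammers a single row, plain VACUUM (which does not
take an exclusive lock) rather than VACUUM FULL is normally what keeps the
dead row versions in check here. A minimal sketch of monitoring and
reclaiming the bloat during the test, using only the table named in the
original post:

-- VERBOSE reports how many removable row versions were found, which shows
-- how quickly the single hot row is accumulating old versions.
VACUUM VERBOSE general.stress;

-- Keep planner statistics current for the (tiny) table as well.
ANALYZE general.stress;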
On 9/28/05, Gavin Sherry <swm@alcove.com.au> wrote:
> On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:
>
> > Hi
> >
> > While doing some stress testing for updates on a small-sized table
> > we found the following results. We are not too happy about the speed
> > of the updates, particularly at high concurrency (10 clients).
> >
> > Initially we get 119 updates/sec, but it drops to 10 updates/sec
> > as concurrency is increased.
> >
> > PostgreSQL: 8.0.3
> > -------------------------------
> > TABLE STRUCTURE: general.stress
> > -------------------------------
> > | dispatch_id      | integer                  | not null  |
> > | query_id         | integer                  |           |
> > | generated        | timestamp with time zone |           |
> > | unsubscribes     | integer                  |           |
> > | read_count       | integer                  |           |
> > | status           | character varying(10)    |           |
> > | bounce_tracking  | boolean                  |           |
> > | dispatch_hour    | integer                  |           |
> > | dispatch_date_id | integer                  |           |
> > +------------------+--------------------------+-----------+
> > Indexes:
> >     "stress_pkey" PRIMARY KEY, btree (dispatch_id)
> >
> > UPDATE STATEMENT:
> > update general.stress set read_count=read_count+1 where dispatch_id=114
>
> This means you are updating only one row, correct?

Correct.

> > Number of Copies | Updates per Sec
> >
> >  1  --> 119
> >  2  -->  59
> >  3  -->  38
> >  4  -->  28
> >  5  -->  22
> >  6  -->  19
> >  7  -->  16
> >  8  -->  14
> >  9  -->  11
> > 10  -->  11
> > 11  -->  10
>
> So, 11 instances result in 10 updated rows per second, database wide or
> per instance? If it is per instance, then 11 * 10 is close to the
> performance for one connection.

Sorry, I do not understand the difference between "database wide"
and "per instance".

> That being said, when you've got 10 connections fighting over one row, I
> wouldn't be surprised if you had bad performance.
>
> Also, at 119 updates a second, you're more than doubling the table's
> initial size (dead tuples) each second. How often are you vacuuming, and
> are you using VACUUM or VACUUM FULL?

Yes, I realize the obvious phenomenon now (and the uselessness of the
script); we should not consider it a performance degradation.

I am having a performance issue in my live database; that is why I tried
to simulate the situation (maybe the script was over-stressing it).

My original problem is that I send 100,000s of emails carrying a beacon
for tracking readership every Tuesday, and on Wednesday I see a lot of
the said query in pg_stat_activity. Each of these queries updates the
SAME row, the one that corresponds to the previous day's dispatch, and it
is then that I face the performance problem.

I think I can only post further details next Wednesday. Please let me
know how I should deal with the situation if each of the updates takes
100 times longer than the normal update duration.

Best Regards
Mallah.

> Gavin
>
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> > > Number of Copies | Updates per Sec
> > >
> > >  1  --> 119
> > >  2  -->  59
> > >  3  -->  38
> > >  4  -->  28
> > >  5  -->  22
> > >  6  -->  19
> > >  7  -->  16
> > >  8  -->  14
> > >  9  -->  11
> > > 10  -->  11
> > > 11  -->  10
> >
> > So, 11 instances result in 10 updated rows per second, database wide or
> > per instance? If it is per instance, then 11 * 10 is close to the
> > performance for one connection.
>
> Sorry, I do not understand the difference between "database wide"
> and "per instance".

Per instance.

> > That being said, when you've got 10 connections fighting over one row, I
> > wouldn't be surprised if you had bad performance.
> >
> > Also, at 119 updates a second, you're more than doubling the table's
> > initial size (dead tuples) each second. How often are you vacuuming, and
> > are you using VACUUM or VACUUM FULL?
>
> Yes, I realize the obvious phenomenon now (and the uselessness of the
> script); we should not consider it a performance degradation.
>
> I am having a performance issue in my live database; that is why I tried
> to simulate the situation (maybe the script was over-stressing it).
>
> My original problem is that I send 100,000s of emails carrying a beacon
> for tracking readership every Tuesday, and on Wednesday I see a lot of
> the said query in pg_stat_activity. Each of these queries updates the
> SAME row, the one that corresponds to the previous day's dispatch, and it
> is then that I face the performance problem.
>
> I think I can only post further details next Wednesday. Please let me
> know how I should deal with the situation if each of the updates takes
> 100 times longer than the normal update duration.

I see. These problems regularly come up in database design. The best thing
you can do is modify your database design/application such that instead of
incrementing a count in a single row, you insert a row into a table,
recording the 'dispatch_id'. Counting the number of rows for a given
dispatch id will give you your count.

Thanks,

Gavin
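A minimal sketch of the design Gavin describes, assuming a hypothetical
detail table; the name read_event and its columns are illustrative, not
taken from the thread:

-- One row per beacon hit instead of an in-place counter; concurrent
-- sessions only ever INSERT, so they never contend for the same row.
CREATE TABLE general.read_event (
    dispatch_id integer NOT NULL,
    hit_time    timestamp with time zone DEFAULT now()
);

-- Done for every tracking request:
INSERT INTO general.read_event (dispatch_id) VALUES (114);

-- The read count is derived when needed rather than maintained in place:
SELECT count(*) FROM general.read_event WHERE dispatch_id = 114;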
On 9/29/05, Gavin Sherry <swm@alcove.com.au> wrote: > On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote: > > > > > Number of Copies | Update perl Sec > > > > > > > > 1 --> 119 > > > > 2 ---> 59 > > > > 3 ---> 38 > > > > 4 ---> 28 > > > > 5 --> 22 > > > > 6 --> 19 > > > > 7 --> 16 > > > > 8 --> 14 > > > > 9 --> 11 > > > > 10 --> 11 > > > > 11 --> 10 > > > > > > So, 11 instances result in 10 updated rows per second, database wide or > > > per instance? If it is per instance, then 11 * 10 is close to the > > > performance for one connection. > > > > > > Sorry do not understand the difference between "database wide" > > and "per instance" > > Per instance. > > > > > > > > > That being said, when you've got 10 connections fighting over one row, I > > > wouldn't be surprised if you had bad performance. > > > > > > Also, at 119 updates a second, you're more than doubling the table's > > > initial size (dead tuples) each second. How often are you vacuuming and > > > are you using vacuum or vacuum full? > > > > > > Yes I realize the obvious phenomenon now, (and the uselessness of the script) > > , we should not consider it a performance degradation. > > > > I am having performance issue in my live database thats why i tried to > > simulate the situation(may the the script was overstresser). > > > > My original problem is that i send 100 000s of emails carrying a > > beacon for tracking readership every tuesday and on wednesday i see > > lot of the said query in pg_stat_activity each of these query update > > the SAME row that corresponds to the dispatch of last day and it is > > then i face the performance problem. > > > > I think i can only post further details next wednesday , please lemme > > know how should i be dealing with the situation if each the updates takes > > 100times more time that normal update duration. > > I see. These problems regularly come up in database design. The best thing > you can do is modify your database design/application such that instead of > incrementing a count in a single row, you insert a row into a table, > recording the 'dispatch_id'. Counting the number of rows for a given > dispatch id will give you your count. > sorry i will be accumulating huge amount of rows in seperate table with no extra info when i really want just the count. Do you have a better database design in mind? Also i encounter same problem in implementing read count of articles in sites and in counting banner impressions where same row get updated by multiple processes frequently. Thanks & Regds mallah. > Thanks, > > Gavin >
On Thu, 29 Sep 2005, Rajesh Kumar Mallah wrote:

> On 9/29/05, Gavin Sherry <swm@alcove.com.au> wrote:
> > On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:
> >
> > > > > Number of Copies | Updates per Sec
> > > > >
> > > > >  1  --> 119
> > > > >  2  -->  59
> > > > >  3  -->  38
> > > > >  4  -->  28
> > > > >  5  -->  22
> > > > >  6  -->  19
> > > > >  7  -->  16
> > > > >  8  -->  14
> > > > >  9  -->  11
> > > > > 10  -->  11
> > > > > 11  -->  10
> > > >
> > > > So, 11 instances result in 10 updated rows per second, database wide or
> > > > per instance? If it is per instance, then 11 * 10 is close to the
> > > > performance for one connection.
> > >
> > > Sorry, I do not understand the difference between "database wide"
> > > and "per instance".
> >
> > Per instance.
> >
> > > > That being said, when you've got 10 connections fighting over one row, I
> > > > wouldn't be surprised if you had bad performance.
> > > >
> > > > Also, at 119 updates a second, you're more than doubling the table's
> > > > initial size (dead tuples) each second. How often are you vacuuming, and
> > > > are you using VACUUM or VACUUM FULL?
> > >
> > > Yes, I realize the obvious phenomenon now (and the uselessness of the
> > > script); we should not consider it a performance degradation.
> > >
> > > I am having a performance issue in my live database; that is why I tried
> > > to simulate the situation (maybe the script was over-stressing it).
> > >
> > > My original problem is that I send 100,000s of emails carrying a beacon
> > > for tracking readership every Tuesday, and on Wednesday I see a lot of
> > > the said query in pg_stat_activity. Each of these queries updates the
> > > SAME row, the one that corresponds to the previous day's dispatch, and it
> > > is then that I face the performance problem.
> > >
> > > I think I can only post further details next Wednesday. Please let me
> > > know how I should deal with the situation if each of the updates takes
> > > 100 times longer than the normal update duration.
> >
> > I see. These problems regularly come up in database design. The best thing
> > you can do is modify your database design/application such that instead of
> > incrementing a count in a single row, you insert a row into a table,
> > recording the 'dispatch_id'. Counting the number of rows for a given
> > dispatch id will give you your count.
>
> Sorry, but I will be accumulating a huge number of rows in a separate
> table, with no extra info, when I really want just the count. Do you have
> a better database design in mind?
>
> I also encounter the same problem when implementing read counts of
> articles on sites and when counting banner impressions, where the same
> row gets updated frequently by multiple processes.

As I said in private email, accumulating large numbers of rows is not a
problem. In your current application, you are write bound, not read bound.
I've designed many similar systems which have hundreds of millions of rows.
It takes a while to generate the count, but you just do it periodically in
non-busy periods. With 8.1, constraint exclusion will give you
significantly better performance with this system as well.

Thanks,

Gavin
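A rough illustration of how 8.1's constraint exclusion could fit this
design, assuming the hypothetical read_event detail table sketched earlier
and monthly child tables; the partition name and date range are
illustrative:

-- Partition the detail table with inheritance plus CHECK constraints;
-- with constraint_exclusion enabled, a count restricted to one period
-- only has to scan the matching child table.
CREATE TABLE general.read_event_2005_09 (
    CHECK (hit_time >= '2005-09-01' AND hit_time < '2005-10-01')
) INHERITS (general.read_event);

SET constraint_exclusion = on;  -- configuration parameter new in 8.1

SELECT count(*)
  FROM general.read_event
 WHERE dispatch_id = 114
   AND hit_time >= '2005-09-01' AND hit_time < '2005-10-01';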
On Thu, Sep 29, 2005 at 07:59:34AM +0530, Rajesh Kumar Mallah wrote:
> > I see. These problems regularly come up in database design. The best thing
> > you can do is modify your database design/application such that instead of
> > incrementing a count in a single row, you insert a row into a table,
> > recording the 'dispatch_id'. Counting the number of rows for a given
> > dispatch id will give you your count.
>
> Sorry, but I will be accumulating a huge number of rows in a separate
> table, with no extra info, when I really want just the count. Do you have
> a better database design in mind?
>
> I also encounter the same problem when implementing read counts of
> articles on sites and when counting banner impressions, where the same
> row gets updated frequently by multiple processes.

Databases like to work on *sets* of data, not individual rows. Something
like this would probably perform much better than what you've got now, and
would prevent having a huge table lying around:

INSERT INTO holding_table ...  -- done for every incoming connection/what-have-you

CREATE OR REPLACE FUNCTION summarize() RETURNS void AS $$
DECLARE
    v_rows int;
BEGIN
    -- Fold everything collected since the last run into the running total.
    DELETE FROM holding_table;
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    UPDATE count_table SET count = count + v_rows;
END;
$$ LANGUAGE plpgsql;

Periodically (say, once a minute):

SELECT summarize();
VACUUM holding_table;
VACUUM count_table;

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
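The sketch above assumes two tables that the post does not define. A
hypothetical minimal version, with a single global counter row to match
the unqualified UPDATE inside summarize():

-- One row per event; summarize() deletes whatever it has counted.
CREATE TABLE holding_table (
    dispatch_id integer NOT NULL
);

-- A single running-total row, since the UPDATE has no WHERE clause.
CREATE TABLE count_table (
    count bigint NOT NULL DEFAULT 0
);
INSERT INTO count_table (count) VALUES (0);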