Thread: row reuse while UPDATE and vacuum analyze problem
Hi,

While testing DBI logging to Postgres I've run into a serious problem with
performance degradation during updates. In my case I get 15-16 req/sec for
the first 1000 updates, which drops to 1-2 req/sec after 20000 updates. This
is quite unusable even for a medium Web site.

As Tom Lane noticed, an update is just an insert, so if a table has only one
row which is updated by several processes, its size will grow until 'vacuum
analyze'. Indeed 'vacuum analyze' helps a lot, but the index file isn't
affected; it stays big! After 20190 updates and several 'vacuum analyze' runs:

    -rw-------   1 postgres  users  1810432 Jul 28 14:22 hits
    -rw-------   1 postgres  users  1368064 Jul 28 14:22 hits_pkey

    om:/usr/local/pgsql/data/base/discovery$ psql discovery -c 'select count(*) from hits'
    count
    -----
    10000
    (1 row)

    om:/usr/local/pgsql/data/base/discovery$ psql discovery -c 'select sum(count) from hits'
      sum
    -----
    20190
    (1 row)

I inserted 10,000 rows into table hits just to test how the number of rows
affects performance while 2 rows are updated. I didn't notice any difference.
After 'vacuum analyze':

    om:/usr/local/pgsql/data/base/discovery$ l hits*
    -rw-------   1 postgres  users   606208 Jul 28 14:27 hits
    -rw-------   1 postgres  users  1368064 Jul 28 14:27 hits_pkey
    om:/usr/local/pgsql/data/base/discovery$

The index file isn't touched; actually the modification date changes, but the
size remains big.

How could update performance be increased?
1. 'vacuum analyze' should also shrink the index file
2. rows should be reused instead of always inserting new ones

I found in the TODO only:

    * Allow row re-use without vacuum (Vadim)

My site isn't in production yet, so I'd like to know whether there is some
chance the update problem will be solved. I think this is a rather general
problem, and many Web developers would appreciate a solution, just as Jan's
feature patch for LIMIT and the great new MVCC feature inspired many people
to use Postgres in real applications.

	Regards,
		Oleg

PS. For those who are interested, here is my handler for logging accumulated
hits into Postgres.

In httpd.conf:

    PerlModule Apache::HitsDBI0
    <Location /db/pubs.html>
      PerlCleanupHandler Apache::HitsDBI0
    </Location>

Table schema:

    create table hits (
      msg_id       int4 not null primary key,
      count        int4 not null,
      first_access datetime default now(),
      last_access  datetime
    );
    -- grant information
    GRANT SELECT ON hits to PUBLIC;
    GRANT INSERT,UPDATE ON hits to httpd;

The handler:

    package Apache::HitsDBI0;

    use strict;
    # preloaded in startup.pl
    use Apache::Constants qw(:common);
    #use DBI ();

    sub handler {
      my $orig = shift;
      if ( $orig->args() =~ /msg_id=(\d+)/ ) {
        my $msg_id = $1;   # capture before another call can clobber $1
        my $dbh = DBI->connect("dbi:Pg:dbname=discovery") || die DBI->errstr;
        $dbh->{AutoCommit} = 0;
        $dbh->do("LOCK TABLE hits IN SHARE ROW EXCLUSIVE MODE")
          || die $dbh->errstr;
        my $rows_affected = $dbh->do(
          "update hits set count=count+1, last_access=now() where msg_id=$msg_id"
        ) || die $dbh->errstr;
        ## postgres specific !!!
        $dbh->do("Insert Into hits (msg_id,count) values ($msg_id, 1)")
          if ($rows_affected eq '0E0');
        my $rc = $dbh->commit || die $dbh->errstr;
      }
      return OK;
    }

    1;
    __END__

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
On Wed, 28 Jul 1999, Oleg Bartunov wrote:

> How could update performance be increased?
> 1. 'vacuum analyze' should also shrink the index file
> 2. rows should be reused instead of always inserting new ones

Just to clarify, 'row reuse' won't replace inserting (to the best of my
knowledge), only reduce space wastage between vacuums. Especially, again
TTBOMK, with MVCC, where each "instance" of a row is serialized.

Actually, there is a thought... if I understand the concept of MVCC, how is
reusing a row going to work? My understanding is that I can "physically" have
two copies of a row in a table, one newer than the other. So if someone is
running a SELECT while I'm doing an UPDATE, their SELECT will take the older
version of the row (the row as it was when their SELECT started). Depending
on how busy that table is, there will have to be some sort of mechanism for
determining how 'stale' a row is, no?

I.e. on a *very* large table, with multiple SELECTs/UPDATEs happening?

Marc G. Fournier                  ICQ#7615664             IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org          secondary: scrappy@{freebsd|postgresql}.org
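To make the question above concrete, here is a minimal, self-contained C
sketch of the idea (the struct names and the grossly simplified "already
committed" test are invented for illustration; this is not PostgreSQL's
actual visibility code). Each physical version of a row records which
transaction created it and which one replaced it, and a reader's snapshot
picks whichever version was current when the snapshot was taken:

    /* visibility_sketch.c -- toy illustration, not PostgreSQL code */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t Xid;                 /* transaction id */

    /* Hypothetical per-version header, loosely modelled on xmin/xmax. */
    typedef struct {
        Xid xmin;                         /* transaction that created it   */
        Xid xmax;                         /* transaction that replaced it, */
    } TupleVersion;                       /* 0 = still the current version */

    /* Grossly simplified snapshot: assume every xid below 'xid_horizon'
     * had already committed when the snapshot was taken. */
    typedef struct {
        Xid xid_horizon;
    } Snapshot;

    static bool committed_for(Xid xid, const Snapshot *s)
    {
        return xid != 0 && xid < s->xid_horizon;   /* simplification */
    }

    /* A version is visible if its creator committed before the snapshot
     * was taken and its deleter (if any) did not. */
    static bool visible(const TupleVersion *t, const Snapshot *s)
    {
        return committed_for(t->xmin, s) && !committed_for(t->xmax, s);
    }

    int main(void)
    {
        /* Two physical copies of the same logical row: xid 200 updated it. */
        TupleVersion old_version = { .xmin = 100, .xmax = 200 };
        TupleVersion new_version = { .xmin = 200, .xmax = 0   };

        Snapshot before_update = { .xid_horizon = 150 };  /* early SELECT */
        Snapshot after_update  = { .xid_horizon = 300 };  /* later SELECT */

        printf("old reader sees old=%d new=%d\n",
               visible(&old_version, &before_update),
               visible(&new_version, &before_update));    /* prints 1 0 */
        printf("new reader sees old=%d new=%d\n",
               visible(&old_version, &after_update),
               visible(&new_version, &after_update));     /* prints 0 1 */
        return 0;
    }

The point relevant to row reuse is that the old version cannot simply be
overwritten in place while any snapshot old enough to need it is still
running.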
At 09:00 28/07/99 -0300, you wrote:
>Actually, there is a thought... if I understand the concept of MVCC, how is
>reusing a row going to work? My understanding is that I can "physically"
>have two copies of a row in a table, one newer than the other. So if
>someone is running a SELECT while I'm doing an UPDATE, their SELECT will
>take the older version of the row (the row as it was when their SELECT
>started). Depending on how busy that table is, there will have to be some
>sort of mechanism for determining how 'stale' a row is, no?
>
>I.e. on a *very* large table, with multiple SELECTs/UPDATEs happening?

I presume that's part of MVCC: if PGSQL has a transaction id, then you only
need to keep copies of rows back to the earliest started active transaction.
In fact, you only really need to keep enough versions to satisfy all
currently active transactions (i.e. you don't need *all* intermediate
versions). You must also keep a copy of a row prior to the current writer's
update (until they commit).

There was talk a while back about doing a 'background vacuum' -- did that
talk go anywhere? Reusing space is probably the only way to solve the
infinite-growth problem.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: +61-03-5367 7422    Fax: +61-03-5367 7430
http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
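Philip's "keep copies back to the earliest started active transaction" rule
can be sketched as computing a cutoff: the oldest snapshot horizon among the
currently active transactions. The names below (ActiveTxns, oldest_horizon)
are hypothetical, purely for illustration, and the bookkeeping is reduced to
a fixed-size array:

    /* oldest_xmin_sketch.c -- toy illustration, not PostgreSQL code.
     * A dead row version only needs to survive until no active
     * transaction could still want to see it; one way to decide that is
     * to find the oldest snapshot horizon among active transactions. */
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t Xid;

    /* Hypothetical registry of active transactions and the horizon each
     * of them recorded when its snapshot was taken. */
    typedef struct {
        Xid horizons[64];
        int count;
    } ActiveTxns;

    static Xid oldest_horizon(const ActiveTxns *a, Xid next_xid)
    {
        Xid oldest = next_xid;     /* nobody active: keep nothing extra */
        for (int i = 0; i < a->count; i++)
            if (a->horizons[i] < oldest)
                oldest = a->horizons[i];
        return oldest;
    }

    int main(void)
    {
        ActiveTxns a = { .horizons = { 480, 455, 472 }, .count = 3 };
        Xid cutoff = oldest_horizon(&a, 500);

        /* A version whose deleting transaction committed before 'cutoff'
         * is invisible to every running and future transaction. */
        printf("versions deleted before xid %u are reclaimable\n",
               (unsigned) cutoff);
        return 0;
    }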
On Wed, 28 Jul 1999, The Hermit Hacker wrote:

> Actually, there is a thought... if I understand the concept of MVCC, how is
> reusing a row going to work? My understanding is that I can "physically"
> have two copies of a row in a table, one newer than the other. So if
> someone is running a SELECT while I'm doing an UPDATE, their SELECT will
> take the older version of the row (the row as it was when their SELECT
> started). Depending on how busy that table is, there will have to be some
> sort of mechanism for determining how 'stale' a row is, no?
>
> I.e. on a *very* large table, with multiple SELECTs/UPDATEs happening?

This is what I noticed when I started my testing about a week ago: I got
duplicates, because multiple concurrent processes were trying inserts and
updates. After LOCK TABLE hits IN SHARE ROW EXCLUSIVE MODE all those problems
were gone, and performance only slowed down a little. But after many updates
performance degrades very badly, IMO because the table and index files grow
even when I update the same row, and even 'vacuum analyze' doesn't reduce the
size of the index file. In principle I could live with a cron job running
vacuumdb every hour, but vacuum doesn't touch indices.

I hope I'll meet Vadim in Moscow and we'll discuss MVCC aspects in our native
Russian :-)

	Regards,
		Oleg

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
After thinking about the row reuse problem for a while, this is the best I
could come up with. I don't think it is well enough thought out to be worth
implementing yet, but maybe the ideas are a step in the right direction.

A partial solution to row reuse is to allow writing commands (commands with
write locks on a buffer) to purge expendable tuples as they find them. When a
transaction starts, it could look at the snapshots of concurrent transactions
and compute a tide-mark equal to the oldest transaction ID in all the
concurrent snapshots. Any deleted tuple in a heap relation with a committed
tmax value that is smaller than the tide-mark is invisible to any of the
concurrent or future transactions, and is expendable. The write command that
discovers the expendable tuple can then delete it and reclaim the tuple space.

Unfortunately, this solves only half the problem. Deleting the tuple means
that other scans no longer have to read and then discard it, but the space
that the tuple used is not really reclaimed, because insertions always add to
the end of the heap. If all the tuples in a table are of fixed size, then one
solution would be to keep a list of empty tuple slots in the heap and insert
new tuples into these slots first. This would allow inserts to keep the table
well packed.

In the case of tables with variable-length tuples, the problem seems harder.
Since the actual tuples in a table are always accessed through ItemIds, it is
possible for a process with a write lock on a buffer to rearrange the tuples
in the page to remove free space without affecting concurrent processes'
views of the data within the buffer. After freeing the tuple and compacting
the space in the page, the process would have to update the free space list
by removing any previous pointer to space on the page, and then re-inserting
a pointer to the new space on the page. The free space list becomes quite a
bit more complicated in this case, as it has to keep track of the sizes of
the free space segments, and it needs to be indexed both by the block in
which the free space resides and by the size of the space available. This
would seem to indicate the need for both a tree structure and a hash
structure associated with the free space list.

Comments anyone?

Bernie
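As a rough illustration of the free-space list Bernard describes for
variable-length tuples, here is a toy C sketch (the structure and function
names are invented, and a linear scan stands in for the tree and hash indexes
he suggests) showing the two lookups such a list has to support: by block,
when a writer frees space on a page, and by size, when an inserter needs a
page with enough room:

    /* freelist_sketch.c -- toy model of the proposed free-space list,
     * not actual PostgreSQL code.  Each entry records how many free
     * bytes a heap page would have after compaction. */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_PAGES 8
    #define NO_PAGE   (-1)

    typedef struct {
        uint32_t block;        /* heap block number */
        uint32_t free_bytes;   /* free space after compacting the page */
    } FreeEntry;

    typedef struct {
        FreeEntry entries[MAX_PAGES];
        int       count;
    } FreeList;

    /* "Indexed by block": find the entry for a given page. */
    static int find_by_block(const FreeList *fl, uint32_t block)
    {
        for (int i = 0; i < fl->count; i++)
            if (fl->entries[i].block == block)
                return i;
        return NO_PAGE;
    }

    /* "Indexed by size": find a page with at least 'need' free bytes. */
    static int find_by_size(const FreeList *fl, uint32_t need)
    {
        for (int i = 0; i < fl->count; i++)
            if (fl->entries[i].free_bytes >= need)
                return i;
        return NO_PAGE;
    }

    /* A writer freed 'gained' bytes on 'block': update or add its entry. */
    static void note_freed(FreeList *fl, uint32_t block, uint32_t gained)
    {
        int i = find_by_block(fl, block);
        if (i == NO_PAGE && fl->count < MAX_PAGES) {
            i = fl->count++;
            fl->entries[i].block = block;
            fl->entries[i].free_bytes = 0;
        }
        if (i != NO_PAGE)
            fl->entries[i].free_bytes += gained;
    }

    int main(void)
    {
        FreeList fl = { .count = 0 };

        note_freed(&fl, 7, 120);   /* an UPDATE purged a dead version */
        note_freed(&fl, 3, 40);

        int target = find_by_size(&fl, 100);  /* room for a 100-byte tuple? */
        if (target != NO_PAGE) {
            printf("insert into block %u (%u bytes free)\n",
                   (unsigned) fl.entries[target].block,
                   (unsigned) fl.entries[target].free_bytes);
            fl.entries[target].free_bytes -= 100;
        }
        return 0;
    }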
> Actually, there is a thought... if I understand the concept of MVCC, how is
> reusing a row going to work? My understanding is that I can "physically"
> have two copies of a row in a table, one newer than the other. So if
> someone is running a SELECT while I'm doing an UPDATE, their SELECT will
> take the older version of the row (the row as it was when their SELECT
> started). Depending on how busy that table is, there will have to be some
> sort of mechanism for determining how 'stale' a row is, no?
>
> I.e. on a *very* large table, with multiple SELECTs/UPDATEs happening?

You would have to leave referenced rows alone. I think Vadim has this
covered.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
At 15:54 28/07/99 +0000, Bernard Frankpitt wrote:
>A partial solution to row reuse is to allow writing commands (commands
>with write locks on a buffer) to purge expendable tuples as they find
>them. When a transaction starts, it could look at the snapshots of
>concurrent transactions and compute a tide-mark equal to the oldest
>transaction ID in all the concurrent snapshots. Any deleted tuple in a
>heap relation with a committed tmax value that is smaller than the
>tide-mark is invisible to any of the concurrent or future transactions,
>and is expendable. The write command that discovers the expendable tuple
>can then delete it and reclaim the tuple space.

Is there any way that the space can be freed as soon as it is no longer
needed? I'm not sure how the MVCC stuff works, but I assume that when a R/O
TX starts, a lock is taken out on the tables (and/or rows) that are read. If
a R/W TX updates a record, then a new version is written with a new TID. Can
the writer also mark the old version of the row for deletion, so that when
the last reader commits, the row is deleted? I have no idea if this would be
more or less efficient than making the writers do it, but it would have the
advantage of distributing the load across more TXs.

>Unfortunately, this solves only half the problem. Deleting the tuple
>means that other scans no longer have to read and then discard it, but
>the space that the tuple used is not really reclaimed, because insertions
>always add to the end of the heap. If all the tuples in a table are of
>fixed size, then one solution would be to keep a list of empty tuple
>slots in the heap and insert new tuples into these slots first. This
>would allow inserts to keep the table well packed.

This makes sense, but I would guess that fixed-length tuples are not common.

>In the case of tables with variable-length tuples, the problem seems
>harder. [...] This would seem to indicate the need for both a tree
>structure and a hash structure associated with the free space list.

I'm not sure I follow this, and I also don't know anything about the
internals of the data storage code, but... Using my favorite database as a
model, an alternative might be to:

(1) Record the free space on a page-by-page (or any kind of
    chunk-by-chunk) basis.
(2) Don't bother doing any rearrangement when a record is deleted; just
    mark the record as free and update the bytes-free count for the page.
(3) When you want to store a record, look for any page that has enough
    bytes free, then do any space rearrangement necessary to store the
    record.

AFAICT, most of the above steps (with the exception of page-fullness
tracking) must already be performed. This will of course fail totally if
records are allowed to freely cross page boundaries.
Please forgive me if this is way off the mark... and (if you have the
patience) explain what I've missed.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: +61-03-5367 7422    Fax: +61-03-5367 7430
http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
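For comparison with the free-space-list sketch earlier in the thread, here is
a toy sketch of the page-by-page accounting Philip proposes in the message
above (again with invented names, not based on PostgreSQL's storage code):
deletes only increment a per-page free-byte counter, and compaction happens
lazily when an insert actually needs contiguous space.

    /* page_fullness_sketch.c -- toy model of page-by-page free-space
     * tracking with lazy compaction; illustration only. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NPAGES 4

    typedef struct {
        uint32_t bytes_free;        /* total reclaimable bytes, holes included */
        uint32_t contiguous_free;   /* free bytes at the end of the page */
    } PageInfo;

    static PageInfo pages[NPAGES];

    /* Step 2: deleting a record just marks its space free on its page. */
    static void delete_record(uint32_t page, uint32_t len)
    {
        pages[page].bytes_free += len;      /* no rearrangement here */
    }

    /* Step 3: to store a record, find any page with enough free bytes,
     * compacting it first if the free space is fragmented. */
    static bool store_record(uint32_t len, uint32_t *page_out)
    {
        for (uint32_t p = 0; p < NPAGES; p++) {
            if (pages[p].bytes_free < len)
                continue;
            if (pages[p].contiguous_free < len) {
                /* simulate sliding the live tuples together */
                pages[p].contiguous_free = pages[p].bytes_free;
            }
            pages[p].bytes_free      -= len;
            pages[p].contiguous_free -= len;
            *page_out = p;
            return true;
        }
        return false;                        /* would extend the heap instead */
    }

    int main(void)
    {
        for (uint32_t p = 0; p < NPAGES; p++)
            pages[p] = (PageInfo){ .bytes_free = 0, .contiguous_free = 0 };

        pages[2] = (PageInfo){ .bytes_free = 60, .contiguous_free = 60 };
        delete_record(2, 90);                /* old row version purged */

        uint32_t where;
        if (store_record(120, &where))       /* forces compaction of page 2 */
            printf("stored 120 bytes on page %u\n", (unsigned) where);
        return 0;
    }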