Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers
From | Mark Woodward |
---|---|
Subject | Re: vacuum, performance, and MVCC |
Date | |
Msg-id | 17943.24.91.171.78.1151073015.squirrel@mail.mohawksoft.com |
In response to | Re: vacuum, performance, and MVCC (Hannu Krosing <hannu@skype.net>) |
Responses | Re: vacuum, performance, and MVCC; Re: vacuum, performance, and MVCC |
List | pgsql-hackers |
> On Thursday, 2006-06-22 at 12:41, Mark Woodward wrote:
>
>> > Depending on exact details and optimisations done, this can be either
>> > slower or faster than postgresql's way, but they still need to do
>> > something to get transactional visibility rules implemented.
>>
>> I think they have a different strategy. I think they maintain the notion
>> of "current version" of a row, and hunt for previous versions when needed,
>> at least that's how I suspect Oracle does it with redo logs.
>
> Not "current" but "last" :)

True

> And one side effect of redo logs is that it is practically impossible to
> do large deletes on production databases. So you design around that,
> like you have to design around limitations of MVCC.

Think that's bad? Try doing an update in PostgreSQL on a table with 20
million rows and a few indexes. I had to write a script to chunk up the
bulk update into segments and vacuum between each.

>> >> > > There has to be a more linear way of handling this scenario.
>> >> >
>> >> > So vacuum the table often.
>> >>
>> >> It's easy to say VACUUM often... but I'd bet that vacuuming is going
>> >> to lessen the throughput in his tests even more; no matter how it's
>> >> tuned.
>> >
>> > Running VACUUM often/continuously will likely keep his update rate
>> > fluctuations within a corridor of maybe 5-10%, at the cost of 1-2%
>> > extra load. At least if vacuum is configured right and the server is not
>> > already running at 100% IO saturation, in which case it will be worse.
>>
>> Assuming the table is a reasonable size, the I/O required for vacuum
>> doesn't kill everything else!
>
> I have solved the problem of unnecessary IO by keeping active and
> finished rows in separate tables, with the finish() function moving the
> row between tables.

Sorry, but an RDBMS is a "relational database management system"; if you
are the one doing the "database management," it isn't a very good RDBMS.

> In case of the number of actively modified rows being in only tens or
> low hundreds of thousands of rows (i.e. the modified set fits in
> memory), the continuous vacuum process shows up as just another backend,
> not really taking an order of magnitude more resources. It mainly
> generates WAL traffic, as modified pages are already in memory/cache and
> are mostly synced by the background writer and/or checkpoint.
>
> Of course you have to adjust the vacuum_cost_* variables so as to not
> saturate IO.

These sorts of solutions, IMHO, don't show how good PostgreSQL is, but
show where it is very lacking.

>> > The max throughput figure is not something you actually need very often
>> > in production.
>>
>> No, but you need to have some degree of certainty and predictability in
>> the system you are developing.
>
> Yup. You have to design it so it has.

I was referring to the system as a whole and to its individual components.
PostgreSQL's performance under some pathological conditions is not very
predictable or reliable.

>> > What is interesting is setting up the server so that you
>> > can service your loads comfortably. Running the server at 100% load is
>> > not anything you want to do on a production server. There will be things
>> > you need to do anyway and you need some headroom for that.
>>
>> Of course, you design it so peaks are easily managed, but unless you run
>> vacuum continuously, and that has its own set of problems, you run into
>> this problem, and it can get really really bad.
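Hannu doesn't post code for the active/finished split he describes above, but a minimal sketch of that pattern might look like the following. The table names, columns, and the finish() signature are assumptions for illustration, not taken from his actual setup:

```sql
-- Hot table: rows that are still being updated frequently.
CREATE TABLE active_jobs (
    id      integer PRIMARY KEY,
    payload text,
    updated timestamptz NOT NULL DEFAULT now()
);

-- Archive table: rows that are done and will no longer be modified.
CREATE TABLE finished_jobs (
    id       integer PRIMARY KEY,
    payload  text,
    finished timestamptz NOT NULL DEFAULT now()
);

-- Move one row from the hot table to the archive table.
CREATE OR REPLACE FUNCTION finish(p_id integer) RETURNS void AS $$
BEGIN
    INSERT INTO finished_jobs (id, payload)
        SELECT id, payload FROM active_jobs WHERE id = p_id;
    DELETE FROM active_jobs WHERE id = p_id;
END;
$$ LANGUAGE plpgsql;
```

The point of the split is that the heavily updated set stays small, so a frequent VACUUM on active_jobs touches few pages, while finished_jobs only ever receives inserts and needs little vacuuming.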
>
> Usually it gets really bad if you *don't* run vacuum continuously, maybe
> hoping to do it in slower times at night. For a high-update db you have
> to run it continuously, maybe having some 5-15 sec pauses between runs.

And how much I/O does this take?
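The I/O impact Mark is asking about is what the vacuum_cost_* settings Hannu mentions are meant to bound: cost-based vacuum delay (available since PostgreSQL 8.0) makes VACUUM sleep once it has accumulated a configured amount of page-access cost. A rough illustration, with values chosen only for the example (they can be set in postgresql.conf or per session before a manual VACUUM):

```sql
-- Cost-based vacuum delay knobs; the values are illustrative, not recommendations.
SET vacuum_cost_delay = 10;       -- sleep 10 ms each time the cost limit is reached (0 disables throttling)
SET vacuum_cost_limit = 200;      -- accumulated cost at which the sleep kicks in
SET vacuum_cost_page_hit = 1;     -- cost charged for a page found in shared buffers
SET vacuum_cost_page_miss = 10;   -- cost charged for a page read from disk
SET vacuum_cost_page_dirty = 20;  -- cost charged when vacuum dirties a page

VACUUM ANALYZE active_jobs;       -- active_jobs is the hypothetical table from the sketch above
```

A longer delay or lower limit spreads vacuum's reads and writes over more wall-clock time, which is the trade-off Hannu is describing: the continuous vacuum stays below I/O saturation at the cost of each run taking longer.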