Thread: When to do a vacuum for highly active table
We have a highly active table that has virtually all entries updated every
5 minutes.  Typical size of the table is 50,000 entries, and entries have
grown fat.

We are currently vacuuming hourly, and towards the end of the hour we are
seeing degradation when compared to the top of the hour.

Vacuum is slowly killing our system, as it is starting to take up to 10
minutes, and load at the time of vacuum is 6+ on a Linux box.  During the
vacuum, the overall system goes unresponsive, then comes back once vacuum
completes.

If we run vacuum less frequently, degradation continues to the point that
we can't keep up with the throughput, plus vacuum takes longer anyway.

Becoming quite a pickle :-)

We are thinking of splitting the table in two: the part that updates often,
and the part that updates infrequently, as we suspect that record size
impacts vacuum.

Any ideas?

Thanks,
Mark

-----------------
Markus Benne <thing@m-bass.com> writes:
> We have a highly active table that has virtually all entries updated
> every 5 minutes.  Typical size of the table is 50,000 entries, and
> entries have grown fat.

> We are currently vacuuming hourly, and towards the end of the hour we
> are seeing degradation when compared to the top of the hour.

On something like this, you really need to be vacuuming more often, not
less so; I'd think about how to do it every five or ten minutes rather
than backing off.  With only 50K rows it should really not take more than
a couple of seconds to do the vacuum.  When you wait till there are 600K
dead rows, it's going to take a while, plus you are suffering
across-the-board performance degradation from all the dead rows.

If you are using PG 8.0, there are some "vacuum cost" knobs you can fiddle
with to slow down vacuum so it doesn't impose as much I/O load.  Ideally
you could get it to where you could run vacuum as often as you need to
without noticing much impact on foreground processing.  If you're not
using 8.0 ... maybe it's time to update.

Another thing you might want to do is look at "vacuum verbose" output,
which will give you some idea of the time spent in each step.  It might be
that there are specific aspects that could be improved.

> We are thinking of splitting the table in two: the part that updates
> often, and the part that updates infrequently, as we suspect that
> record size impacts vacuum.

You just said that virtually all rows update constantly --- where's the
"infrequent" part?

			regards, tom lane
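To see where that time actually goes, the verbose form can be run by hand; a
minimal sketch, assuming the table in question is named my_hot_table (a
placeholder, since the real name isn't given in the thread):

    -- Run manually to see where vacuum spends its time.
    VACUUM VERBOSE my_hot_table;
    -- The output reports, for the heap and for each index, the pages scanned,
    -- dead row versions removed, and elapsed CPU/IO time, which helps spot
    -- whether the table itself or a bloated index is the slow part.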
On Tue, Aug 30, 2005 at 05:29:17PM -0400, Tom Lane wrote:
> Markus Benne <thing@m-bass.com> writes:
> > We have a highly active table that has virtually all entries updated
> > every 5 minutes.  Typical size of the table is 50,000 entries, and
> > entries have grown fat.
> ...
> > We are thinking of splitting the table in two: the part that updates
> > often, and the part that updates infrequently, as we suspect that
> > record size impacts vacuum.
> You just said that virtually all rows update constantly --- where's
> the "infrequent" part?

I think he means splitting it vertically, instead of horizontally, and it
sounds like an excellent idea if a large enough portion of each record is
in fact mostly fixed.  Otherwise, PostgreSQL is copying the data multiple
times, only to have it expire as part of a dead row.

I've already started to notice such issues with PostgreSQL - but more
because I'm using low-end hardware, and I'm projecting the effect for when
our database becomes much larger, with much higher demand on the database.

This is the sort of scenario where a database without transactional
integrity would significantly out-perform one designed around it.  If
records were fixed-size and updated in place, these problems would occur
far less often.  Is it heresy to suggest MySQL in here? :-)

I switched from MySQL to PostgreSQL several months ago and haven't looked
back - but they do work differently, and for certain uses, one can destroy
the other.  Using a MyISAM table would be the way I would go with this
sort of problem.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them
  all and in the darkness bind them...

                                          http://mark.mielke.cc/
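A rough sketch of what such a vertical split might look like - every name
below is invented, since the original schema isn't shown: keep the frequently
updated columns in a narrow table that is cheap to vacuum often, move the
fat, rarely changing columns into a companion table on the same key, and join
them back together when both halves are needed.

    -- Hypothetical "hot" half: small rows, updated every few minutes.
    CREATE TABLE item_status (
        item_id     integer PRIMARY KEY,
        last_value  numeric,
        updated_at  timestamp NOT NULL DEFAULT now()
    );

    -- Hypothetical "cold" half: the wide, mostly static columns.
    CREATE TABLE item_detail (
        item_id     integer PRIMARY KEY REFERENCES item_status (item_id),
        description text,
        config_data text
    );

    -- Readers that need both halves join on the key, or use a view that
    -- hides the join.
    CREATE VIEW item_full AS
        SELECT s.item_id, s.last_value, s.updated_at,
               d.description, d.config_data
        FROM item_status s JOIN item_detail d USING (item_id);

Whether the narrower hot table vacuums enough faster to pay for the extra
join is exactly the question raised further down the thread.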
thing@m-bass.com (Markus Benne) writes:
> We have a highly active table that has virtually all entries updated
> every 5 minutes.  Typical size of the table is 50,000 entries, and
> entries have grown fat.
>
> We are currently vacuuming hourly, and towards the end of the hour we
> are seeing degradation when compared to the top of the hour.

You're not vacuuming the table nearly often enough.

You should vacuum this table every five minutes, and possibly more often
than that.

[We have some tables like that, albeit smaller than 50K entries, which we
vacuum once per minute in production...]

> We are thinking of splitting the table in two: the part that updates
> often, and the part that updates infrequently, as we suspect that
> record size impacts vacuum.

There's *some* merit to that.

You might discover that there's a "hot spot" that needs to be vacuumed
once per minute.

But it may be simpler to just hit the table with a vacuum once every few
minutes, even though some tuples are seldom updated.
-- 
output = reverse("gro.gultn" "@" "enworbbc")
http://cbbrowne.com/info/spreadsheets.html
Signs of a Klingon Programmer #3: "By filing this TPR you have challenged
the honor of my family.  Prepare to die!"
mark@mark.mielke.cc writes:
> I think he means splitting it vertically, instead of horizontally, and
> it sounds like an excellent idea if a large enough portion of each
> record is in fact mostly fixed.  Otherwise, PostgreSQL is copying the
> data multiple times, only to have it expire as part of a dead row.

Only up to a point.  Fields that are wide enough to get toasted
out-of-line (multiple Kb) do not get physically copied if there's a row
update that doesn't affect them.

We don't really have enough information about his table to guess whether
there's any point in manually partitioning the columns, but my leaning
would be "probably not" --- the overhead of joining the resulting two
tables would be high enough that you'd need a heck of a big improvement to
justify it.

			regards, tom lane
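One way to check whether the wide columns are even candidates for
out-of-line TOAST storage is to look at their storage mode in the system
catalog; the table name below is again a placeholder:

    -- Columns with attstorage 'x' (extended) or 'e' (external) can be moved
    -- out of line once their values are large enough, and such values are
    -- not physically copied by updates that leave them unchanged.
    -- 'p' (plain) columns always travel with the main heap row.
    SELECT attname, format_type(atttypid, atttypmod) AS data_type, attstorage
    FROM pg_attribute
    WHERE attrelid = 'my_hot_table'::regclass
      AND attnum > 0
      AND NOT attisdropped;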
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Markus Benne
> Sent: Wednesday, August 31, 2005 12:14 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] When to do a vacuum for highly active table
>
> We have a highly active table that has virtually all entries updated
> every 5 minutes.  Typical size of the table is 50,000 entries, and
> entries have grown fat.
>
> We are currently vacuuming hourly, and towards the end of the hour we
> are seeing degradation when compared to the top of the hour.
>
> Vacuum is slowly killing our system, as it is starting to take up to 10
> minutes, and load at the time of vacuum is 6+ on a Linux box.  During
> the vacuum, the overall system goes unresponsive, then comes back once
> vacuum completes.

Play with the vacuum_cost_delay option.  In our case it made a BIG
difference (going from very heavy hitting to almost unnoticed vacuuming).

Hope it helps.

Rigmor Ukuhe

> If we run vacuum less frequently, degradation continues to the point
> that we can't keep up with the throughput, plus vacuum takes longer
> anyway.
>
> Becoming quite a pickle :-)
>
> We are thinking of splitting the table in two: the part that updates
> often, and the part that updates infrequently, as we suspect that
> record size impacts vacuum.
>
> Any ideas?
>
> Thanks,
> Mark
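The cost-based settings referred to above can also be set per session before
a manually scheduled vacuum; the values here are only illustrative starting
points, and the table name is a placeholder:

    -- Available from PostgreSQL 8.0 on; tune against the observed I/O load.
    SET vacuum_cost_delay = 10;   -- sleep 10 ms each time the cost limit is reached
    SET vacuum_cost_limit = 200;  -- amount of work accumulated before sleeping
    VACUUM my_hot_table;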
rigmor.ukuhe@finestmedia.ee ("Rigmor Ukuhe") writes:
>> We have a highly active table that has virtually all entries updated
>> every 5 minutes.  Typical size of the table is 50,000 entries, and
>> entries have grown fat.
>>
>> We are currently vacuuming hourly, and towards the end of the hour we
>> are seeing degradation when compared to the top of the hour.
>>
>> Vacuum is slowly killing our system, as it is starting to take up to 10
>> minutes, and load at the time of vacuum is 6+ on a Linux box.  During
>> the vacuum, the overall system goes unresponsive, then comes back once
>> vacuum completes.
>
> Play with the vacuum_cost_delay option.  In our case it made a BIG
> difference (going from very heavy hitting to almost unnoticed
> vacuuming).

That helps only if the ONLY problem you're having is from the direct I/O
of the vacuum.

If part of the problem is that the table is so large that it takes 4h for
VACUUM to complete, thereby leaving a transaction open for 4h, thereby
causing other degradations, then vacuum_cost_delay will have a NEGATIVE
impact, as it will mean that the vacuum on that table will take even
/more/ than 4h. :-(

For the above scenario, it is almost certain that the solution comes in
two pieces:

1.  VACUUM FULL / CLUSTER to bring the size down.

    The table has grown "fat," and no number of repetitions of "plain
    vacuum" will fix this.

2.  Do "plain vacuum" on the table VASTLY more frequently, probably every
    5 minutes, possibly more often than that.

    By doing this, you prevent things from getting so bad again.

By the way, in this sort of situation, _ANY_ transaction that runs more
than about 5 minutes represents a serious enemy to performance, as it will
tend to cause the "hot" table to "get fatter."
-- 
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/linux.html
TECO Madness: a moment of regret, a lifetime of convenience.
 -- Kent Pitman
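In SQL terms, that two-piece plan might look roughly like the following;
table and index names are placeholders, and both VACUUM FULL and CLUSTER
take exclusive locks, so the one-time step belongs in a maintenance window:

    -- One-time recovery of the already-bloated table:
    VACUUM FULL my_hot_table;
    -- or, to rewrite it in index order (the CLUSTER syntax of the 8.0 era):
    CLUSTER my_hot_table_pkey ON my_hot_table;

    -- Then keep it from growing fat again with a cheap plain vacuum every
    -- few minutes, e.g. driven from cron:
    VACUUM my_hot_table;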