Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 17943.24.91.171.78.1151073015.squirrel@mail.mohawksoft.com
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
Responses Re: vacuum, performance, and MVCC  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
> Ühel kenal päeval, N, 2006-06-22 kell 12:41, kirjutas Mark Woodward:
>
>> > Depending on exact details and optimisations done, this can be either
>> > slower or faster than postgresql's way, but they still need to do
>> > something to get transactional visibility rules implemented.
>>
>> I think they have a different strategy. I think they maintain the notion
>> of "current version" of a row, and hunt for previous versions when
>> needed,
>> at least that's how I suspect Oracle does it with redo logs.
>
> Not "current" but "last" :)

True
>
> And one side effect of redo logs is that it is practically impossible to
> do large deletes on production databases. So you design around that,
> like you have to design around limitations of MVCC.

Think that's bad, try doing an update in PostgreSQL on a table with 20
million rows and a few indexes. I had to write a script to chunk up the
block update into segments and vacuum between each.

>
>> >> > > There has to be a more linear way of handling this scenario.
>> >> >
>> >> > So vacuum the table often.
>> >>
>> >> It's easy to say VACUUM often... but I'd bet that vacuuming is going
>> >> to lessen the throughput in his tests even more; no matter how it's
>> >> tuned.
>> >
>> > Running VACUUM often/continuously will likely keep his update rate
>> > fluctuatons within a corridor of maybe 5-10%, at the cost of 1-2%
>> extra
>> > load. At least if vacuum is configured right and the server is not
>> > already running at 100% IO saturation, in which case it will be worse.
>>
>> Assuming the table is a reasonable size, the I/O required for vacuum
>> doesn't kill everything else!
>
> I have solved the problem of unneccessary IO by keeping active and
> finished rows in separate tables, with the finish() function moving the
> row between tables.

Sorry, an RDBMS is a "relational database management system," if you are
doing the "database management," it isn't a very good RDBMS.

>
> In case of the number of actively modified rows being in only tens or
> low hundreds of thousands of rows, (i.e. the modified set fits in
> memory) the continuous vacuum process shows up as just another backend,
> not really taking order of magnitude more resources. It mainly generates
> WAL traffic, as modified pages are already in memory/cache and are
> mostly synced by background writer and/or checkpoint.
>
> Of course you have to adjust vacuum_cost_* variables so as to not
> saturate IO.

These sort of solutions, IMHO, don't show how good PostgreSQL is, but show
where it is very lacking.

>
>> > The max throughput figure is not something you actually need very
>> often
>> > in production.
>>
>> No, but you need to have some degree of certainty and predictability in
>> the system you are developing.
>
> Yup. You have to design it so it has.

I was refereing to the system as a whole and the individual components.
PostgreSQL's performance under some pathalogical condictions is not very
predictable or reliable.

>
>> > What is interesting is setting up the server so that you
>> > can service your loads comfortably. Running the server at 100% lead is
>> > not anything you want to do on production server. There will be things
>> > you need to do anyway and you need some headroom for that.
>>
>> Of course, you design it so peaks are easily managed, but unless you run
>> vacuum continuously, and that has its own set of problems, you run into
>> this problem, and it can get really really bad.
>
> Usually it gets really bad if you *don't* run vacuum continuously, maybe
> hopeing to do it in slower times at night. For high-update db you have
> to run it continuously, maybe having some 5-15 sec pauses between runs.

And how much I/O does this take?


pgsql-hackers by date:

Previous
From: Tzahi Fadida
Date:
Subject: Re: Planning without reason.
Next
From: "Dave Page"
Date:
Subject: Re: [CORE] GPL Source and Copyright Questions