Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Mark Woodward
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 18739.24.91.171.78.1151063858.squirrel@mail.mohawksoft.com
In response to Re: vacuum, performance, and MVCC  (Csaba Nagy <nagy@ecircle-ag.com>)
Responses Re: vacuum, performance, and MVCC  (Csaba Nagy <nagy@ecircle-ag.com>)
List pgsql-hackers
>>     I suppose you have a table memberships (user_id, group_id) or something
>> like it ; it should have as few columns as possible ; then try regularly
>> clustering on group_id (maybe once a week) so that all the records for a
>> particular group are close together. Getting the members of a group to
>> send them an email should be faster (less random seeks).
>
> It is like this, and some more bookkeeping data which must be there...
> we could split the table for smaller records or for updatable/stable
> fields, but at the end of the day it doesn't make much sense, usually
> all the data is needed and I wonder if more big/shallow tables instead
> of one big/wider makes sense...
>
> Regularly clustering is out of question as it would render the system
> unusable for hours. There's no "0 activity hour" we could use for such
> stuff. There's always something happening, only the overall load is
> smaller at night...
>

Let me ask a question: you have this hundred-million-row table. How
much of that table is actually read/write? Would it be possible to divide
the table into two (or more) tables, where one is basically static, with
only infrequent inserts and deletes, and the other is highly updated?
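
To make the question concrete, here is a rough back-of-the-envelope sketch in Python. It relies on the fact that under MVCC an UPDATE writes a complete new version of the row, so the volume of dead row versions that vacuum must later reclaim scales with row width. The update rate and both row widths are made-up illustration values, not figures from this thread, and per-row header overhead is ignored.

```python
# Rough model: each UPDATE under MVCC leaves behind one dead copy of the
# whole row, so dead bytes = number of updates * row width.
# All numbers below are hypothetical, chosen only for illustration.

def dead_bytes(updates: int, row_width_bytes: int) -> int:
    """Bytes of dead row versions left behind by `updates` updates."""
    return updates * row_width_bytes

UPDATES_PER_DAY = 10_000_000  # assumed update rate
WIDE_ROW = 200                # assumed row width of the single combined table
HOT_ROW = 40                  # assumed row width of the split-out, highly updated table

single_table = dead_bytes(UPDATES_PER_DAY, WIDE_ROW)
split_table = dead_bytes(UPDATES_PER_DAY, HOT_ROW)

print(f"one wide table : {single_table / 1e9:.1f} GB of dead rows per day")
print(f"split hot table: {split_table / 1e9:.1f} GB of dead rows per day")
```

Under these assumptions the dead-row volume shrinks in proportion to the row width, which is the reason to ask how much of each row really changes.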

The "big" thing in performance is the amount of disk I/O. If you have a
smaller active table with only a single index, then you may be able to cut
your disk I/O time way down. The smaller the row size, the more rows
fit into a block. The fewer blocks, the less disk I/O. The less disk I/O,
the better the performance.
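
The chain of reasoning above can be sketched numerically. This is only an estimate: it assumes PostgreSQL's default 8192-byte block size, an approximate 24-byte page header, and roughly 28 bytes of per-row overhead (tuple header plus line pointer), and it ignores alignment padding and fill factor. The two row widths are hypothetical.

```python
import math

BLOCK_SIZE = 8192      # PostgreSQL's default block size, in bytes
PAGE_HEADER = 24       # approximate per-page header
PER_ROW_OVERHEAD = 28  # approximate: 24-byte tuple header + 4-byte line pointer

def rows_per_block(row_data_bytes: int) -> int:
    """Estimate how many rows of the given data width fit in one block."""
    return (BLOCK_SIZE - PAGE_HEADER) // (row_data_bytes + PER_ROW_OVERHEAD)

def blocks_needed(total_rows: int, row_data_bytes: int) -> int:
    """Estimate how many blocks a table of `total_rows` rows occupies."""
    return math.ceil(total_rows / rows_per_block(row_data_bytes))

TOTAL_ROWS = 100_000_000  # the hundred-million-row table discussed here

for width in (200, 40):   # hypothetical wide row vs. narrow "active" row
    print(f"{width:>3}-byte rows: {rows_per_block(width):>3} rows/block, "
          f"{blocks_needed(TOTAL_ROWS, width):,} blocks")
```

A full scan of the narrower table touches proportionally fewer blocks, and more of it fits in cache.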

Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
frequently because the indexes grow and vacuuming them doesn't remove
everything; sometimes a REINDEX or a drop/recreate is the only way to get
performance back. So if you wait too long between vacuums, your indexes
grow and spread across more disk blocks than they should, and thus use
more disk I/O to search and/or more shared memory to cache.

