Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Csaba Nagy
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 1151063635.3309.121.camel@coppola.muc.ecircle.de
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
Responses Re: vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
List pgsql-hackers
> Let me ask a question, you have this hundred million row table. OK, how
> much of that table is "read/write?" Would it be posible to divide the
> table into two (or more) tables where one is basically static, only
> infrequent inserts and deletes, and the other is highly updated?

Well, all of it is read write... some of the data might be updated less
frequently, but there's no way I would know which part of the data is
that. Logically is just the same type of data... so unless I find a way
to continuously move back and forth the data between an archive table
and the live table, based on how active the groups are, I can't imagine
any other way of partitioning it. And that would also mean some quite
big load given the pretty high dynamics of the groups.

> The "big" thing in performance is the amount of disk I/O, if you have a
> smaller active table with only a single index, then you may be able to cut
> your disk I/O time really down. The smaller the row size, the more rows
> fit into a block. The fewer blocks the less dissk I/O. The less disk I/O
> the bbetter the performance.

I agree, but it is quite hard to achieve that when the data set is both
big AND the partitioning criteria is highly dynamic. Not to mention that
deleting from that table is also a PITA performance-wise, so I wonder
how well the continuous back and forth between the active and inactive
table would do.

> Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
> frequently because the indexes grow and vacuuming them doesnt remove
> everything, sometimes a REINDEX or a drop/recreate is the only way to get
> performance back. So if you wait too long between vacuums, your indexes
> grow  and spread across more disk blocks than they should and thus use
> more disk I/O to search and/or shared memory to cache.

This is nice in theory, but kills performance. I vacuum the big tables
only overnight, otherwise the server is sluggish.

Cheers,
Csaba.




pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: xlog viewer proposal
Next
From: Andrew Dunstan
Date:
Subject: Re: [CORE] GPL Source and Copyright Questions