Thread: Long-running performance (MVCC, Vacuum, etc.) - Any fix?

Long-running performance (MVCC, Vacuum, etc.) - Any fix?

From
Eric Brown
Date:
I'm building an appliance where I don't want my customers having to
tune postgresql in any way from the configuration when I install it. I
don't even want them to know it is there. Yet in my study of posgresql,
it seems that even conscientious use of a vacuum daemon or cron job in
vacuum's various forms leaves databases after a while much larger and
lower performing than the actual space and performance after a
dump/restore.

It seems to me that with MVCC, an UPDATE (for example) is really a
DELETE+INSERT and it takes VACUUM to mark the old DELETED row as free
space again. So changing one boolean or increasing one counter in a row
of 100 bytes will relocate this row in a different location. While this
in itself isn't terrible as some DB page is dirty no matter what,
unless I'm lucky enough to insert into the same page that changed, my
index(es) for this table will have to be changed and this will dirty
yet another page that wouldn't be dirty without MVCC. And this is not
to mention the resources required by VACUUM. And the disk bloat over
time would imply that postgresql doesn't always efficiently use
free-space with records/rows of similar size and so performance
degrades as there are fewer rows per page over time.

My application is multi-threaded and I could care less about the
concurrency afforded by MVCC. I'd rather just update the things in
place and get less postgresql concurrency but more consistent
long-running performance and disk space utilization.

Is my interpretation correct?
Is there a way to turn off MVCC?
Do fixed sized rows help any?
Is there anybody using this thing in an appliance type application?
I'm quite far along with stored-procedures and whatnot, but if
postgresql really isn't the right solution due to these reasons, I'm
curious if anybody has alternate OpenSource suggestions? (I'm actually
migrating from sleepycat bsddb at the moment because I didn't realize
the licensing costs involved there. My needs really aren't that
extensive really - a few associative indexes, cascading delete, etc.)


My large data sets will have the following characteristics:
5,000,000 Rows x 50 bytes/row (could be fixed) w/1 multi-column index,
1 single-column timestamp index AT 1,000,000 index searches, 300,000
reads, 150,000 updates per day
720,000 Rows x 32 bytes/row fixed w/ 1 multi-column index, 1
single-column timestamp index AT 150,000 index search, 150,000 updates,
100s of reads per day
2,000,000 Rows x 4000 bytes avg/row AT 100,000 inserts per day, 150,000
reads per day

Thanks,
Eric

Eric Brown
408-571-6341
www.propel.com

Re: Long-running performance (MVCC, Vacuum, etc.) - Any fix?

From
Martijn van Oosterhout
Date:
On Tue, Dec 28, 2004 at 02:40:52AM -0800, Eric Brown wrote:
> I'm building an appliance where I don't want my customers having to
> tune postgresql in any way from the configuration when I install it. I
> don't even want them to know it is there. Yet in my study of posgresql,
> it seems that even conscientious use of a vacuum daemon or cron job in
> vacuum's various forms leaves databases after a while much larger and
> lower performing than the actual space and performance after a
> dump/restore.

<snip>

Firstly, you havn't said what version of PostgreSQL you are using.
Recent versions behave better. Also, the autovacuum daemon will
probably do most of what you want...

> My application is multi-threaded and I could care less about the
> concurrency afforded by MVCC. I'd rather just update the things in
> place and get less postgresql concurrency but more consistent
> long-running performance and disk space utilization.
>
> Is my interpretation correct?

Yes. Although, for a simple counter you might be able to use sequences,
depending on the exact semantics. They don't take any additional space
when you increment them.

> Is there a way to turn off MVCC?

Nope. This is a Relational Database with everything that entails. If
you don't want that, maybe you should look into a database designed to
be embedded (SQLDB iirc).

> Do fixed sized rows help any?

I doubt it.

> Is there anybody using this thing in an appliance type application?

Probably, but I don't know where...

> I'm quite far along with stored-procedures and whatnot, but if
> postgresql really isn't the right solution due to these reasons, I'm
> curious if anybody has alternate OpenSource suggestions? (I'm actually
> migrating from sleepycat bsddb at the moment because I didn't realize
> the licensing costs involved there. My needs really aren't that
> extensive really - a few associative indexes, cascading delete, etc.)

I'd suggest, use the autovacuum daemon, tune your FSM settings to what
you expect the load to be and maybe schedule a database-wide
REINDEX/VACUUM FULL regularly. We found doing it once a month was
enough to keep it tip-top over the long term...

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Long-running performance (MVCC, Vacuum, etc.) - Any fix?

From
Vivek Khera
Date:
>>>>> "MvO" == Martijn van Oosterhout <kleptog@svana.org> writes:

MvO> I'd suggest, use the autovacuum daemon, tune your FSM settings to what
MvO> you expect the load to be and maybe schedule a database-wide
MvO> REINDEX/VACUUM FULL regularly. We found doing it once a month was
MvO> enough to keep it tip-top over the long term...

If you know your application usage patterns well, scheduling vacuum
via cron probably would be better than autovacuum.  autovacuum runs
whenever it pleases, and doesn't deal well with extremely large tables
(ie, it never runs vacuum on them since the % change may be small even
though millions of rows are "expired").  You can even vacuum differnt
tables on different schedules.  Just make sure the VACUUMs don't
overlap!

I haven't run reindex or vacuum full since I upgraded to 7.4.x.  The
indexes are not bloating, and sufficient calls to vacuum keep the
"stable set size" of the db pretty constant.  ie, it doesn't grow
without bounds, as the OP fears.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Long-running performance (MVCC, Vacuum, etc.) - Any fix?

From
Greg Stark
Date:
Eric Brown <eric.brown@propel.com> writes:

> My application is multi-threaded and I could care less about the concurrency
> afforded by MVCC. I'd rather just update the things in place and get less
> postgresql concurrency but more consistent long-running performance and disk
> space utilization.

Postgres uses MVCC for guaranteeing that it can roll back transactions too...

> Is my interpretation correct?
> Is there a way to turn off MVCC?
> Do fixed sized rows help any?

Postgres doesn't do anything special for fixed size rows.


I would suggest you look at pg_autovacuum, but in general while a
maintenance-free system is everyone's goal it's pretty far off.

--
greg