Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Rick Gigger
Subject Re: vacuum, performance, and MVCC
Date
Msg-id B930486B-E0F1-4D4F-BD1C-2C9F61703136@alpinenetworking.com
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
List pgsql-hackers
On Jun 22, 2006, at 10:33 AM, Mark Woodward wrote:

>> Ühel kenal päeval, N, 2006-06-22 kell 09:59, kirjutas Mark
>> Woodward:
>>>> After a long battle with technology, pgsql@mohawksoft.com ("Mark
>>>> Woodward"), an earthling, wrote:
>>>>>> Clinging to sanity, pgsql@mohawksoft.com ("Mark Woodward")
>>>>>> mumbled
>>> into
>>>> It pointed to *ALL* the versions.
>>>
>>> Hmm, OK, then the problem is more serious than I suspected.
>>> This means that every index on a row has to be updated on every
>>> transaction that modifies that row. Is that correct?
>>
>> Yes.
>>
>>> I am attaching some code that shows the problem with regard to
>>> applications such as web server session management, when run, each
>>> second
>>> the system can handle fewer and fewer connections. Here is a brief
>>> output:
>>>
>>> markw@ent:~/pgfoo$ ./footest
>>> 1307 sessions per second, elapsed: 1
>>> 1292 sessions per second, elapsed: 2
>>> 1287 sessions per second, elapsed: 3
>>> ....
>>> 1216 sessions per second, elapsed: 25
>>> 1213 sessions per second, elapsed: 26
>>> 1208 sessions per second, elapsed: 27
>>> ....
>>> 1192 sessions per second, elapsed: 36
>>> 1184 sessions per second, elapsed: 37
>>> 1183 sessions per second, elapsed: 38
>>> ....
>>> 1164 sessions per second, elapsed: 58
>>> 1170 sessions per second, elapsed: 59
>>> 1168 sessions per second, elapsed: 60
>>>
>>> As you can see, in about a minute at high load, this very simple
>>> table
>>> lost about 10% of its performance, and I've seen worse based on
>>> update
>>> frequency.  Before you say this is an obscure problem, I can tell
>>> you it
>>> isn't. I have worked with more than a few projects that had to
>>> switch
>>> away
>>> from PostgreSQL because of this behavior.
>>
>> You mean systems that are designed so exactly, that they can't
>> take 10%
>> performance change ?
>
> No, that's not really the point, performance degrades over time, in
> one
> minute it degraded 10%.
>
> The update to session ratio has a HUGE impact on PostgreSQL. If you
> have a
> thousand active sessions, it may take a minute to degrade 10% assuming
> some level of active vs operations per session per action.

How big are your session?  Running with about 1000 sessions, running
vacuum on just the session table is so fast it is barely noticeable.
Vacuuming my session table every 5 minutes keeps them very, very
small and easy to vacuum and performance degradation is not an
issue.  I could probably do it every minute if I had to and it would
be fine.  But my sessions are only about 5k on average.

What is a bigger concern for me is the massive amount of writes to
the disk that happen in postgres to make sure the data is safe.  It's
just a waste of disk bandwidth for data for data that is transient
anyway.

To me postgres (and rdbms's in general) are just not good for
handling session data for web apps.  Once again isn't that why you
wrote mcache?

>
> If an active user causes a session update once a second, that is
> not too
> bad, but if an active user updates a session more often, then it is
> worse.
>
> Generally speaking, sessions aren't updated when they change, they are
> usually updated per HTTP request. The data in a session may not
> change,
> but the session handling code doesn't know this and simply updates
> anyway.

This problem is more or less specific to php no?  Because it reads
the whole session on session open and writes the whole thing on
close.  Because of this I am looking into smarter ways of handling
sessions than this.  Because yes, the session data RARELY changes.  I
am looking into ways of only doing updates when the data changes.  In
fact for a very similar problem, where I had tons of tiny requests
coming in that would NEVER alter the sessions I skipped session_start
and used my own session_touch function to update the timestamp on the
session because that's all I needed to do.  It saved TONS of wasted
overhead.

I don't mean to get off topic but it seems like these sorts of
problems are better solved outside of postgres.  I think your session
daemon is in fact the right approach here.  If you have other tables
with similar problems that is one thing but if it is just php session
tables then I think we need to look for a better use case to look
into this.

>
> In a heavily AJAX site, you may have many smaller HTTP requests
> returning
> items in a page. So, a single page may consist of multiple HTTP
> requests.
> Worse yet, as a user drags an image around, there are lots of
> background
> requests being made. Each request typically means a session lookup
> and a
> session update. This is compounded by the number of active users.
> Since
> the object of a site is to have many active users, this is always a
> problem. It is less intrusive now that non-locking vacuum is there,
> but
> that doesn't mean it isn't a problem.

Once again I think to run an "Enterprise" app (and by that I mean it
scales well) you need "Enterprise" class session management.  The php
model is not good for this and using postgres is not good for this.
It's just not the right tool for this job in my opinion.  I would
think you could gain WAY more by using a more intelligent session
handling model then you could ever eek out of postgres for this problem.

>
>>
>> Or just that they did not vacuum for so long, that performance was
>> less
>> than needed in the end?
>
> In an active site or application, vacuuming often enough to prevent
> this
> often is, itself, a load on the system.

Once again this is just anecdotal evidence but for me that load is
virtually unnoticeable it is so small.  The key here is that session
tables are SMALL.  That is less than a few thousand rows.  Maybe you
are storing a lot more data in there than I am but for me a  vacuum
every 5 minutes solves the problem performance degradation.  It is
still too much disk bandwidth though so I am just going to stop
storing sessions in postgres.

>
>>
>> btw, what did they switch to ?
>
> One switched to oracle and one is using a session handler I wrote
> for PHP.

Just switching the session handler sounds like a pretty good
solutions to this problem to me.

just my $0.02

Rick

pgsql-hackers by date:

Previous
From: Rick Gigger
Date:
Subject: Re: vacuum, performance, and MVCC
Next
From: Rick Gigger
Date:
Subject: Re: vacuum, performance, and MVCC