Thread: Frequently updated tables

Frequently updated tables

From
pgsql@mohawksoft.com
Date:
I've been down several roads about how to handle data that has to change
on a very frequent and rapid manner.

Think about summary tables, WEB session tables, etc. As great as MVCC is
for the vast majority of uses. The overhead of updates and deletes can
kill a project that needs to constantly update tables.

In most cases, the answer is just not to use PostgreSQL for that, but then
you are stuck with Mysql or something worse.

Would having a special class of table that is marked for frequent updates,
which causes the system to lock and update in place, make sense? Is that
even possible?

There has to be a way of making PostgreSQL able to handle this class of
problem. Anyone have a reasonable idea?


Re: Frequently updated tables

From
Mark Kirkwood
Date:

pgsql@mohawksoft.com wrote:

>I've been down several roads about how to handle data that has to change
>on a very frequent and rapid manner.
>
>Think about summary tables, WEB session tables, etc. As great as MVCC is
>for the vast majority of uses. The overhead of updates and deletes can
>kill a project that needs to constantly update tables.
>
>
>  
>

Are you saying that MVCC has *by design* a higher overhead for updates 
and deletes? or are you referring to the gradual loss of performance as 
a consequence of many dead tuples?

I am guessing you mean the latter, but best to be sure :-)

regards

Mark


Re: Frequently updated tables

From
pgsql@mohawksoft.com
Date:
>
>
> pgsql@mohawksoft.com wrote:
>
>>I've been down several roads about how to handle data that has to change
>>on a very frequent and rapid manner.
>>
>>Think about summary tables, WEB session tables, etc. As great as MVCC is
>>for the vast majority of uses. The overhead of updates and deletes can
>>kill a project that needs to constantly update tables.
>>
>>
>>
>>
>
> Are you saying that MVCC has *by design* a higher overhead for updates
> and deletes? or are you referring to the gradual loss of performance as
> a consequence of many dead tuples?
>
> I am guessing you mean the latter, but best to be sure :-)
The best phrasing would be "the accumulating overhead of deletes and
updates."

Yes.


Re: Frequently updated tables

From
Mark Kirkwood
Date:
pgsql@mohawksoft.com wrote:

>The best phrasing would be "the accumulating overhead of deletes and
>updates."
>
>Yes.
>  
>

Are you using 7.3?

I am asking because in 7.3 high update / delete tables could suffer 
(index and toast) bloat that was untamable via (lazy) VACUUM and FSM.
I believe this is fixed in 7.4, so it should be possible to achieve on 
disk size control of tables / indexes by configuring FSM and (lazy) 
VACUUM. Did you find this not to be the case?

regards

Mark


Re: Frequently updated tables

From
pgsql@mohawksoft.com
Date:
>
> pgsql@mohawksoft.com wrote:
>
>>The best phrasing would be "the accumulating overhead of deletes and
>>updates."
>>
>>Yes.
>>
>>
>
> Are you using 7.3?
>
> I am asking because in 7.3 high update / delete tables could suffer
> (index and toast) bloat that was untamable via (lazy) VACUUM and FSM.
> I believe this is fixed in 7.4, so it should be possible to achieve on
> disk size control of tables / indexes by configuring FSM and (lazy)
> VACUUM. Did you find this not to be the case?
>
Interesting, the company is usng 7.3.4. One single row summary table got
up to 2 million dead rows. A select from that single row took a quarter of
a second. A regular vacuum did not fix it, only a vacuum full did.
However, when the test was re-run with constant vacuums, it did not get
out of hand.

My concern is performance, and yes, for inserts PostgreSQL is great. For
data that is constantly being updated, PostgreSQL is a bit weak. Think
about a table with a few million rows that needs to be updated a few
thousand times a minute.

I love PG, I've been using it since version 6x, and it has gotten
fantastic over the years, and in many cases, I would choose it over
Oracle, but for systems that need frequent updates, I have a lot of
concerns.


Re: Frequently updated tables

From
pgsql@mohawksoft.com
Date:
>> I love PG, I've been using it since version 6x, and it has gotten
>> fantastic over the years, and in many cases, I would choose it over
>> Oracle, but for systems that need frequent updates, I have a lot of
>> concerns.
>
> ...that's the price you pay for concurrency man...

I think that's a cop-out. Other databases can handle this case fine and
they have MVCC.

Are we not "open source" "free software" proponents? Isn't one of our
motivations that we can do it better?

This *is* a problem with PostgreSQL, and it *is* a concern for a
reasonable number of potential deployments.




Re: Frequently updated tables

From
Christopher Kings-Lynne
Date:
> I love PG, I've been using it since version 6x, and it has gotten
> fantastic over the years, and in many cases, I would choose it over
> Oracle, but for systems that need frequent updates, I have a lot of
> concerns.

...that's the price you pay for concurrency man...

Chris



Re: Frequently updated tables

From
Alvaro Herrera
Date:
On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
> >I love PG, I've been using it since version 6x, and it has gotten
> >fantastic over the years, and in many cases, I would choose it over
> >Oracle, but for systems that need frequent updates, I have a lot of
> >concerns.
> 
> ...that's the price you pay for concurrency man...

Also he said that the problem was solved with enough lazy VACUUM
scheduling.  I don't understand why he doesn't want to use that
solution.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"People get annoyed when you try to debug them."  (Larry Wall)



Re: Frequently updated tables

From
pgsql@mohawksoft.com
Date:
> On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
>> >I love PG, I've been using it since version 6x, and it has gotten
>> >fantastic over the years, and in many cases, I would choose it over
>> >Oracle, but for systems that need frequent updates, I have a lot of
>> >concerns.
>>
>> ...that's the price you pay for concurrency man...
>
> Also he said that the problem was solved with enough lazy VACUUM
> scheduling.  I don't understand why he doesn't want to use that
> solution.
>

Sigh, because vacuums take away from performance. Imagine a table that has
to be updated on the order of a few thousand times a minute. Think about
the drop in performance during the vacuum.

On a one row table, vacuum is not so bad, but try some benchmarks on a
table with a goodly number of rows.


Re: Frequently updated tables

From
Bruno Wolff III
Date:
On Wed, Jun 09, 2004 at 13:41:27 -0400, pgsql@mohawksoft.com wrote:
> 
> Sigh, because vacuums take away from performance. Imagine a table that has
> to be updated on the order of a few thousand times a minute. Think about
> the drop in performance during the vacuum.
> 
> On a one row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.

But you only need to rapidly vacuum the one table that is keeping your
totals record. This isn't going to be a big hit in performance relative
to the updates that are going on. You don't need to vacuum the tables
you are doing the inserts or updates to at that same rate.


Re: Frequently updated tables

From
"Joshua D. Drake"
Date:
>
> Sigh, because vacuums take away from performance.

This is a known issue that has been pretty much resolved for 7.5. Vacuum
in 7.5 does not take even close to as much IO resources.


Imagine a table that has
> to be updated on the order of a few thousand times a minute. Think about
> the drop in performance during the vacuum.
>
> On a one row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: Frequently updated tables

From
"Joshua D. Drake"
Date:
> Also he said that the problem was solved with enough lazy VACUUM
> scheduling.  I don't understand why he doesn't want to use that
> solution.

Because even lazy VACUUM is horrendous to performance but as I said in
a further post this has been pretty much fixed by (Jan I believe) in 7.5.

Sincerely,

Joshua D. Drake


>


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: Frequently updated tables

From
"Scott Marlowe"
Date:
On Wed, 2004-06-09 at 11:41, pgsql@mohawksoft.com wrote:
> > On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
> >> >I love PG, I've been using it since version 6x, and it has gotten
> >> >fantastic over the years, and in many cases, I would choose it over
> >> >Oracle, but for systems that need frequent updates, I have a lot of
> >> >concerns.
> >>
> >> ...that's the price you pay for concurrency man...
> >
> > Also he said that the problem was solved with enough lazy VACUUM
> > scheduling.  I don't understand why he doesn't want to use that
> > solution.
> >
> 
> Sigh, because vacuums take away from performance. Imagine a table that has
> to be updated on the order of a few thousand times a minute. Think about
> the drop in performance during the vacuum.
> 
> On a one row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.

Several points:

All databases pay to clean up the mess they've made, so to speak.  In
PostgreSQL you get to choose when, instead of always paying the price at
the end of transaction.

Lazy vacuum does not impact performance nearly as much as the old full
vacuum.  With the sleep / delay patch that's been passed around on
hackers its impact is virtually zero on the rest of the database

Properly setup fsm settings, pg_autovacuum deamon, and an installation
of the sleep / delay patch mentioned aobve makes this a non-issue.



Re: Frequently updated tables

From
Alvaro Herrera
Date:
On Wed, Jun 09, 2004 at 01:41:27PM -0400, pgsql@mohawksoft.com wrote:
> > On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:

> > Also he said that the problem was solved with enough lazy VACUUM
> > scheduling.  I don't understand why he doesn't want to use that
> > solution.
> 
> Sigh, because vacuums take away from performance. Imagine a table that has
> to be updated on the order of a few thousand times a minute. Think about
> the drop in performance during the vacuum.
> 
> On a one row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.

Hmm, this can be a problem if VACUUM pollutes the shared buffer pool.
So what about a new buffer replacement policy that takes this into
account and is not fooled by VACUUM?  This is already implemented in
7.5.  Also, how about a background writer process that writes dirty
buffers so that backends don't have to wait for IO to complete when a
dirty buffer has to be written?  This is also in current CVS.


Have you tried and measured how the current CVS code performs?  Jan
Wieck reported a lot of performance improvement some time ago while he
was developing this.  The code has changed since and I have not seen any
measurement.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Oh, oh, las chicas galacianas, lo harán por las perlas,
¡Y las de Arrakis por el agua! Pero si buscas damas
Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck)



Re: Frequently updated tables

From
"Jim C. Nasby"
Date:
On Tue, Jun 08, 2004 at 07:16:45PM -0400, pgsql@mohawksoft.com wrote:
> >
> >
> > pgsql@mohawksoft.com wrote:
> >
> >>I've been down several roads about how to handle data that has to change
> >>on a very frequent and rapid manner.
> >>
> >>Think about summary tables, WEB session tables, etc. As great as MVCC is
> >>for the vast majority of uses. The overhead of updates and deletes can
> >>kill a project that needs to constantly update tables.
> >>
> >>
> >>
> >>
> >
> > Are you saying that MVCC has *by design* a higher overhead for updates
> > and deletes? or are you referring to the gradual loss of performance as
> > a consequence of many dead tuples?
> >
> > I am guessing you mean the latter, but best to be sure :-)
> The best phrasing would be "the accumulating overhead of deletes and
> updates."
> 
> Yes.
Doesn't pg_autovacuum largely take care of this issue?
-- 
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Frequently updated tables

From
pgsql@mohawksoft.com
Date:
> On Wed, Jun 09, 2004 at 13:41:27 -0400,
>   pgsql@mohawksoft.com wrote:
>>
>> Sigh, because vacuums take away from performance. Imagine a table that
>> has
>> to be updated on the order of a few thousand times a minute. Think about
>> the drop in performance during the vacuum.
>>
>> On a one row table, vacuum is not so bad, but try some benchmarks on a
>> table with a goodly number of rows.
>
> But you only need to rapidly vacuum the one table that is keeping your
> totals record. This isn't going to be a big hit in performance relative
> to the updates that are going on. You don't need to vacuum the tables
> you are doing the inserts or updates to at that same rate.
>

I have been talking about two types of problems which are both based on
PostgreSQL's behavior with frequently updated tables.

Summary table: In the single row table system, you have to vacuum very
requently, and this affects performance.

Frequently updated tables: think about the session table for a website.
Each new user gets a new session row. Everytime they refresh or act in the
site, the row is updated. When they leave or their session times out, the
row is deleted. I wrote a RAM only session manager for PHP because
PostgreSQL couldn't handle the volume. (2000 hits a second)

If you have an active site, with hundreds or thousands of hits a second,
vacuuming the table constantly is not practical.

I don't think anyone who has seriously looked at these issues has
concluded that PostgreSQL works fine in these cases. The question is what,
if anything, can be done? The frequent update issue really affects
PostgreSQL's acceptance in web applications, and one which MySQL seems to
do a better job.

IMHO, this issue, a two stage commit based replication system, and a real
and usable setup/configuration system are all that stands between
PostgreSQL and the serious enterprise deployment.


Re: Frequently updated tables

From
Josh Berkus
Date:
"Mohawksoft":

> I don't think anyone who has seriously looked at these issues has
> concluded that PostgreSQL works fine in these cases. The question is what,
> if anything, can be done? The frequent update issue really affects
> PostgreSQL's acceptance in web applications, and one which MySQL seems to
> do a better job.

I think that we'd welcome any suggestions that don't break MVCC.   Do you have 
any?   

MySQL is able to handle this situation -- in MyISAM tables -- precisely 
because there is no transaction isolation and they regard 97% data integrity 
as "good enough".   Essentially, the MyISAM tables are little better than 
delimited text flatfiles.  That's not an approach we can take.     

> IMHO, this issue, a two stage commit based replication system, and a real
> and usable setup/configuration system are all that stands between
> PostgreSQL and the serious enterprise deployment.

There I have to disagree with you.   The features you mention may be important 
to your clients, but they are not to mine; instead, we're waiting for 
clustering, and table partitioning in addition to what's in 7.5.   Please 
don't assume that all DB applications have the same needs as yours.  The 
problems you raise are legitimate, but not everyone shares your priorities. 

Besides, we already have "serious enterprise deployment."   5 of my clients 
are startups which run on PostgreSQL.   The .ORG and .INFO domains run on 
PostgreSQL.   There are two commerical-grade, deployed, ERP systems for 
manufacturers which run on PostgreSQL.    What is your definition of 
"enterprise deployment", exactly?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Frequently updated tables

From
Mark Kirkwood
Date:
pgsql@mohawksoft.com wrote:

>
>I have been talking about two types of problems which are both based on
>PostgreSQL's behavior with frequently updated tables.
>
>Summary table: In the single row table system, you have to vacuum very
>requently, and this affects performance.
>
>Frequently updated tables: think about the session table for a website.
>Each new user gets a new session row. Everytime they refresh or act in the
>site, the row is updated. When they leave or their session times out, the
>row is deleted. I wrote a RAM only session manager for PHP because
>PostgreSQL couldn't handle the volume. (2000 hits a second)
>
>  
>
It would be interesting to see if the vacuum delay patch, fsm tuning + 
vacuum scheduling could have changed this situation. Clearly there is an 
issue here (hence a patch...), but ISTM that just as significant is the 
fact that it is difficult to know how to configure the various bits and 
pieces, and also difficult to know if it has been done optimally.

>If you have an active site, with hundreds or thousands of hits a second,
>vacuuming the table constantly is not practical.
>
>I don't think anyone who has seriously looked at these issues has
>concluded that PostgreSQL works fine in these cases. The question is what,
>if anything, can be done? The frequent update issue really affects
>PostgreSQL's acceptance in web applications, and one which MySQL seems to
>do a better job.
>
>
>  
>
As an aside, I have had similar issues with DB2 and high update tables - 
lock escalations (locklist tuning needed). It is not just 
non-overwriting storage managers that need the magic tuning wand :-)

regards

Mark


Re: Frequently updated tables

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com wrote:
>
>>
>>I have been talking about two types of problems which are both based on
>>PostgreSQL's behavior with frequently updated tables.
>>
>>Summary table: In the single row table system, you have to vacuum very
>>requently, and this affects performance.
>>
>>Frequently updated tables: think about the session table for a website.
>>Each new user gets a new session row. Everytime they refresh or act in
>> the
>>site, the row is updated. When they leave or their session times out, the
>>row is deleted. I wrote a RAM only session manager for PHP because
>>PostgreSQL couldn't handle the volume. (2000 hits a second)
>>
>>
>>
> It would be interesting to see if the vacuum delay patch, fsm tuning +
> vacuum scheduling could have changed this situation. Clearly there is an
> issue here (hence a patch...), but ISTM that just as significant is the
> fact that it is difficult to know how to configure the various bits and
> pieces, and also difficult to know if it has been done optimally.
>
>>If you have an active site, with hundreds or thousands of hits a second,
>>vacuuming the table constantly is not practical.
>>
>>I don't think anyone who has seriously looked at these issues has
>>concluded that PostgreSQL works fine in these cases. The question is
>> what,
>>if anything, can be done? The frequent update issue really affects
>>PostgreSQL's acceptance in web applications, and one which MySQL seems to
>>do a better job.
>>
>>
>>
>>
> As an aside, I have had similar issues with DB2 and high update tables -
> lock escalations (locklist tuning needed). It is not just
> non-overwriting storage managers that need the magic tuning wand :-)

Funny, I've used DB2 for a few projects, but never for a web session
system. This is an interesting data point thanks.