Thread: Frequently updated tables
I've been down several roads about how to handle data that has to change in a very frequent and rapid manner. Think about summary tables, web session tables, etc.

As great as MVCC is for the vast majority of uses, the overhead of updates and deletes can kill a project that needs to constantly update tables. In most cases the answer is just not to use PostgreSQL for that, but then you are stuck with MySQL or something worse.

Would having a special class of table that is marked for frequent updates, which causes the system to lock and update in place, make sense? Is that even possible? There has to be a way of making PostgreSQL able to handle this class of problem. Anyone have a reasonable idea?
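For concreteness, the kind of workload I mean looks something like this (a hypothetical session table; the names are made up for illustration):

    -- One row per active web session.
    CREATE TABLE sessions (
        session_id  text PRIMARY KEY,
        user_id     integer,
        last_hit    timestamp NOT NULL DEFAULT now()
    );

    -- Every page hit rewrites the row; under MVCC each UPDATE
    -- leaves a dead tuple behind for VACUUM to clean up later.
    UPDATE sessions SET last_hit = now() WHERE session_id = 'abc123';

    -- Expired sessions are deleted, producing still more dead tuples.
    DELETE FROM sessions WHERE last_hit < now() - interval '30 minutes';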
pgsql@mohawksoft.com wrote:

>I've been down several roads about how to handle data that has to change
>in a very frequent and rapid manner.
>
>Think about summary tables, web session tables, etc. As great as MVCC is
>for the vast majority of uses, the overhead of updates and deletes can
>kill a project that needs to constantly update tables.
>

Are you saying that MVCC has *by design* a higher overhead for updates
and deletes? Or are you referring to the gradual loss of performance as
a consequence of many dead tuples?

I am guessing you mean the latter, but best to be sure :-)

regards

Mark
> pgsql@mohawksoft.com wrote:
>
>> [original post snipped]
>
> Are you saying that MVCC has *by design* a higher overhead for updates
> and deletes? Or are you referring to the gradual loss of performance as
> a consequence of many dead tuples?
>
> I am guessing you mean the latter, but best to be sure :-)

The best phrasing would be "the accumulating overhead of deletes and
updates."

Yes.
pgsql@mohawksoft.com wrote:

>The best phrasing would be "the accumulating overhead of deletes and
>updates."
>
>Yes.
>

Are you using 7.3?

I am asking because in 7.3, high update / delete tables could suffer
(index and toast) bloat that was untamable via (lazy) VACUUM and FSM.
I believe this is fixed in 7.4, so it should be possible to achieve
on-disk size control of tables / indexes by configuring FSM and (lazy)
VACUUM. Did you find this not to be the case?

regards

Mark
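By "configuring FSM" I mean something along these lines in postgresql.conf; the figures below are only illustrative, and should really be sized from VACUUM VERBOSE output for the actual installation:

    # Free space map sizing (illustrative values, not recommendations).
    # max_fsm_pages must cover the pages that accumulate free space
    # between vacuum runs, or reclaimed space gets forgotten.
    max_fsm_pages = 100000
    # max_fsm_relations should exceed the number of tables and indexes.
    max_fsm_relations = 1000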
> Are you using 7.3?
>
> I am asking because in 7.3, high update / delete tables could suffer
> (index and toast) bloat that was untamable via (lazy) VACUUM and FSM.
> I believe this is fixed in 7.4, so it should be possible to achieve
> on-disk size control of tables / indexes by configuring FSM and (lazy)
> VACUUM. Did you find this not to be the case?

Interesting; the company is using 7.3.4.

One single-row summary table got up to 2 million dead rows. A select from
that single row took a quarter of a second. A regular vacuum did not fix
it; only a vacuum full did. However, when the test was re-run with
constant vacuums, it did not get out of hand.

My concern is performance. Yes, for inserts PostgreSQL is great; for data
that is constantly being updated, PostgreSQL is a bit weak. Think about a
table with a few million rows that needs to be updated a few thousand
times a minute.

I love PG, I've been using it since version 6.x, and it has gotten
fantastic over the years. In many cases I would choose it over Oracle,
but for systems that need frequent updates, I have a lot of concerns.
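For anyone who wants to check for this kind of bloat themselves, the commands involved are just the standard ones (the table name "summary" is hypothetical):

    -- Reports how many dead row versions exist and what gets reclaimed.
    VACUUM VERBOSE summary;

    -- A plain VACUUM only marks the dead space reusable; shrinking the
    -- table file back down took an exclusive-locking VACUUM FULL.
    VACUUM FULL VERBOSE summary;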
>> I love PG, I've been using it since version 6.x, and it has gotten
>> fantastic over the years. In many cases I would choose it over
>> Oracle, but for systems that need frequent updates, I have a lot of
>> concerns.
>
> ...that's the price you pay for concurrency man...

I think that's a cop-out. Other databases can handle this case fine, and
they have MVCC. Are we not "open source" / "free software" proponents?
Isn't one of our motivations that we can do it better?

This *is* a problem with PostgreSQL, and it *is* a concern for a
reasonable number of potential deployments.
> I love PG, I've been using it since version 6.x, and it has gotten
> fantastic over the years. In many cases I would choose it over
> Oracle, but for systems that need frequent updates, I have a lot of
> concerns.

...that's the price you pay for concurrency man...

Chris
On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
> >I love PG, I've been using it since version 6.x, and it has gotten
> >fantastic over the years. In many cases I would choose it over
> >Oracle, but for systems that need frequent updates, I have a lot of
> >concerns.
>
> ...that's the price you pay for concurrency man...

Also, he said that the problem was solved with enough lazy VACUUM
scheduling. I don't understand why he doesn't want to use that solution.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"People get annoyed when you try to debug them." (Larry Wall)
> Also, he said that the problem was solved with enough lazy VACUUM
> scheduling. I don't understand why he doesn't want to use that
> solution.

Sigh, because vacuums take away from performance. Imagine a table that
has to be updated on the order of a few thousand times a minute. Think
about the drop in performance during the vacuum.

On a one-row table, vacuum is not so bad, but try some benchmarks on a
table with a goodly number of rows.
On Wed, Jun 09, 2004 at 13:41:27 -0400, pgsql@mohawksoft.com wrote:
> Sigh, because vacuums take away from performance. Imagine a table that
> has to be updated on the order of a few thousand times a minute. Think
> about the drop in performance during the vacuum.
>
> On a one-row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.

But you only need to rapidly vacuum the one table that is keeping your
totals record. This isn't going to be a big hit in performance relative
to the updates that are going on. You don't need to vacuum the tables
you are doing the inserts or updates to at that same rate.
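In the single-hot-table case, that can be as simple as a tight cron schedule; a sketch (database and table names hypothetical):

    # crontab entry: lazy-vacuum only the hot table, once a minute.
    # Plain VACUUM takes no exclusive lock, so updates keep flowing.
    * * * * * psql -c 'VACUUM summary' mydb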
> Sigh, because vacuums take away from performance.

This is a known issue that has been pretty much resolved for 7.5. Vacuum
in 7.5 does not take anywhere near as much in IO resources.

> Imagine a table that has to be updated on the order of a few thousand
> times a minute. Think about the drop in performance during the vacuum.
>
> On a one-row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
> Also, he said that the problem was solved with enough lazy VACUUM
> scheduling. I don't understand why he doesn't want to use that
> solution.

Because even lazy VACUUM is horrendous for performance, but as I said in
a further post, this has been pretty much fixed (by Jan, I believe) in
7.5.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
On Wed, 2004-06-09 at 11:41, pgsql@mohawksoft.com wrote:
> Sigh, because vacuums take away from performance. Imagine a table that
> has to be updated on the order of a few thousand times a minute. Think
> about the drop in performance during the vacuum.
>
> On a one-row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.

Several points:

All databases pay to clean up the mess they've made, so to speak. In
PostgreSQL you get to choose when, instead of always paying the price at
the end of the transaction.

Lazy vacuum does not impact performance nearly as much as the old full
vacuum. With the sleep / delay patch that's been passed around on
hackers, its impact on the rest of the database is virtually zero.

Properly set up FSM settings, the pg_autovacuum daemon, and an
installation of the sleep / delay patch mentioned above make this a
non-issue. A sketch of that setup follows below.
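For instance (the pg_autovacuum flags here are quoted from memory, so treat them as an assumption and check contrib/pg_autovacuum/README for the real option list):

    # Run the contrib autovacuum daemon in the background.
    # -D detaches it as a daemon; -s sets the base sleep time
    # between checks, in seconds (values are illustrative).
    pg_autovacuum -D -s 60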
On Wed, Jun 09, 2004 at 01:41:27PM -0400, pgsql@mohawksoft.com wrote:
> Sigh, because vacuums take away from performance. Imagine a table that
> has to be updated on the order of a few thousand times a minute. Think
> about the drop in performance during the vacuum.
>
> On a one-row table, vacuum is not so bad, but try some benchmarks on a
> table with a goodly number of rows.

Hmm, this can be a problem if VACUUM pollutes the shared buffer pool. So
what about a new buffer replacement policy that takes this into account
and is not fooled by VACUUM? This is already implemented in 7.5.

Also, how about a background writer process that writes dirty buffers so
that backends don't have to wait for IO to complete when a dirty buffer
has to be written? This is also in current CVS.

Have you tried and measured how the current CVS code performs? Jan Wieck
reported a lot of performance improvement some time ago while he was
developing this. The code has changed since, and I have not seen any
measurement.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Oh, oh, the Galacian girls will do it for pearls, and those of Arrakis
for water! But if you seek ladies who burn like flames, try a daughter
of Caladan!" (Gurney Halleck)
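For reference, the background writer knobs in current CVS look roughly like this in postgresql.conf (values illustrative; the names could still change before 7.5 is released):

    # Background writer settings, current CVS.
    bgwriter_delay = 200      # ms to sleep between rounds
    bgwriter_percent = 1      # % of dirty buffers written per round
    bgwriter_maxpages = 100   # cap on pages written per round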
On Tue, Jun 08, 2004 at 07:16:45PM -0400, pgsql@mohawksoft.com wrote:
> [earlier discussion snipped]
>
> The best phrasing would be "the accumulating overhead of deletes and
> updates."
>
> Yes.

Doesn't pg_autovacuum largely take care of this issue?

--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
> But you only need to rapidly vacuum the one table that is keeping your
> totals record. This isn't going to be a big hit in performance relative
> to the updates that are going on. You don't need to vacuum the tables
> you are doing the inserts or updates to at that same rate.

I have been talking about two types of problems, both based on
PostgreSQL's behavior with frequently updated tables.

Summary table: in the single-row table system, you have to vacuum very
frequently, and this affects performance.

Frequently updated tables: think about the session table for a website.
Each new user gets a new session row. Every time they refresh or act in
the site, the row is updated. When they leave or their session times out,
the row is deleted. I wrote a RAM-only session manager for PHP because
PostgreSQL couldn't handle the volume (2000 hits a second).

If you have an active site, with hundreds or thousands of hits a second,
vacuuming the table constantly is not practical.

I don't think anyone who has seriously looked at these issues has
concluded that PostgreSQL works fine in these cases. The question is
what, if anything, can be done? The frequent-update issue really affects
PostgreSQL's acceptance in web applications, and it is one where MySQL
seems to do a better job.

IMHO, this issue, a two-phase-commit-based replication system, and a real
and usable setup/configuration system are all that stand between
PostgreSQL and serious enterprise deployment.
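To put rough numbers on the session case (back-of-the-envelope, assuming one new row version per hit and small rows, say ~50 tuples per 8 KB page):

    2000 hits/sec * 60 sec        = 120,000 dead tuples per minute
    120,000 tuples / 50 per page  = ~2,400 pages, roughly 19 MB of
                                    bloat per minute unless VACUUM
                                    keeps pace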
"Mohawksoft": > I don't think anyone who has seriously looked at these issues has > concluded that PostgreSQL works fine in these cases. The question is what, > if anything, can be done? The frequent update issue really affects > PostgreSQL's acceptance in web applications, and one which MySQL seems to > do a better job. I think that we'd welcome any suggestions that don't break MVCC. Do you have any? MySQL is able to handle this situation -- in MyISAM tables -- precisely because there is no transaction isolation and they regard 97% data integrity as "good enough". Essentially, the MyISAM tables are little better than delimited text flatfiles. That's not an approach we can take. > IMHO, this issue, a two stage commit based replication system, and a real > and usable setup/configuration system are all that stands between > PostgreSQL and the serious enterprise deployment. There I have to disagree with you. The features you mention may be important to your clients, but they are not to mine; instead, we're waiting for clustering, and table partitioning in addition to what's in 7.5. Please don't assume that all DB applications have the same needs as yours. The problems you raise are legitimate, but not everyone shares your priorities. Besides, we already have "serious enterprise deployment." 5 of my clients are startups which run on PostgreSQL. The .ORG and .INFO domains run on PostgreSQL. There are two commerical-grade, deployed, ERP systems for manufacturers which run on PostgreSQL. What is your definition of "enterprise deployment", exactly? -- Josh Berkus Aglio Database Solutions San Francisco
pgsql@mohawksoft.com wrote:

>I have been talking about two types of problems, both based on
>PostgreSQL's behavior with frequently updated tables.
>
>Summary table: in the single-row table system, you have to vacuum very
>frequently, and this affects performance.
>
>Frequently updated tables: think about the session table for a website.
>Each new user gets a new session row. Every time they refresh or act in
>the site, the row is updated. When they leave or their session times
>out, the row is deleted. I wrote a RAM-only session manager for PHP
>because PostgreSQL couldn't handle the volume (2000 hits a second).
>

It would be interesting to see if the vacuum delay patch, fsm tuning +
vacuum scheduling could have changed this situation. Clearly there is an
issue here (hence a patch...), but ISTM that just as significant is the
fact that it is difficult to know how to configure the various bits and
pieces, and also difficult to know if it has been done optimally.

>If you have an active site, with hundreds or thousands of hits a second,
>vacuuming the table constantly is not practical.
>
>I don't think anyone who has seriously looked at these issues has
>concluded that PostgreSQL works fine in these cases. The question is
>what, if anything, can be done? The frequent-update issue really affects
>PostgreSQL's acceptance in web applications, and it is one where MySQL
>seems to do a better job.
>

As an aside, I have had similar issues with DB2 and high-update tables -
lock escalations (locklist tuning needed). It is not just
non-overwriting storage managers that need the magic tuning wand :-)

regards

Mark
> [quotes snipped]
>
> As an aside, I have had similar issues with DB2 and high-update tables
> - lock escalations (locklist tuning needed). It is not just
> non-overwriting storage managers that need the magic tuning wand :-)

Funny, I've used DB2 for a few projects, but never for a web session
system. This is an interesting data point, thanks.