Thread: Re: What popular, large commercial websites run
Subject: Re: [GENERAL] What popular, large commercial websites run
From: Vic Cekvenich <vic@basebeans.com>

Would profit margin or cost of operation impress management? Low cost = profit.

The way I see it, some managers will buy Oracle. They will have low profit
margins. Some programmers will use PostgreSQL. They will have high margins.
One day the profitable company buys the unprofitable company, and then the
managers of the company that got taken over are all fired. Problem is that
managers don't know IT; they will... fade away.

Vic

Steve Lane wrote:
> On 4/29/02 9:58 AM, "Fran Fabrizio" <ffabrizio@mmrd.com> wrote:
>
>>> Hi, I think this may also be interesting for the mailing list then:
>>> this is a copy of a message found in this list a few days or weeks
>>> ago. I kept it because I thought it might be useful some day :-)
>>>
>>> Arnaud
>>
>> Yes, this message was originally posted to this very same mailing list. =)
>>
>> It's good information, but again not the type for presenting to
>> management. Andrew Sullivan does weigh in (he's the .info guy) but
>> other than that it's a lot of "we're not in production yet", "we use it
>> for our smaller clients", or things like the "regional foundation for
>> contemporary art" in France, which although interesting and good to
>> know, does not excite management.
>>
>> When I say it would be nice to have some large commercial corporations
>> to reference, I mean Fortune 1000 companies, or at least companies
>> business types have heard of. They want to be able to say "If it's
>> good enough for IBM and Ford Motor Co., it's good enough for us." It's
>> frustrating and stupid and the wrong way to think about things, but they
>> are the ones who make the decisions, so we have to speak their
>> language. They want name recognition.
>>
>> I'm sure Pg is being used in some of these companies, but we just don't
>> know about it. That info that was just posted about US Federal Govt
>> agencies using it was great, by the way. Thanks!
>
> We are using Pg as the back end for two quite significant educational
> applications. One is in production, one still in development. The one in
> production is a system to track the special education population of a large
> Midwestern state -- not a huge application in terms of data, but huge in
> terms of criticality. It is in pilot now, and is slowly being rolled out to
> the rest of the state. Ultimately it will track dozens of pages of
> documentation for each of the state's 40K special ed students.
>
> -- sgl
>
> =======================================================
> Steve Lane
>
> Vice President
> Chris Moyer Consulting, Inc.
> 833 West Chicago Ave Suite 203
>
> Voice: (312) 433-2421       Email: slane@fmpro.com
> Fax:   (312) 850-3930       Web:   http://www.fmpro.com
> =======================================================
On Mon, 29 Apr 2002, pgsql-gen Newsgroup wrote:

> The way I see it, some managers will buy Oracle. They will have low
> profit margins. Some programmers will use PostgreSQL. They will have
> high margins.

That's all well and good provided Postgres and Oracle are 100% feature
compatible. They're not. Want inter-database queries? Too bad.
Replication? Nope. Parallel queries? Scratch that. Packages? Big
goose-egg. Ada-style error catching? Zero. In/out variables? Only in
your dreams, buddy. Views that take parameters? Zilch. Wanna actually
drop your foreign keys, or change their status or triggering order for
loading purposes? Not here.

Are these issues being addressed? Sure they are. But I think I've pretty
much proven that Oracle has things Postgres doesn't, and can do things
Postgres can't. If you're a large corporation that needs replication,
parallel database queries, or rollback segments instead of MVCC (to avoid
tainting the datafiles with invalid/old data), you don't have a choice.
It's either Oracle or DB2, really.

The fact is, we've used Postgres for about 5 years now, and I'm
recommending migrating off of it at this very moment. Why? MVCC. When I
finally got sick of doing a full database dump and restore every month,
and a full vacuum every two hours to avoid rampant datafile growth, I made
the official decision to ditch Postgres.

Why are our databases bloating, even after hourly full vacuums? Because
we have a database with a 50-100% data turnover rate at about 100,000
rows, and Postgres just can't handle it. I've watched our 100MB database
grow to 500MB, then 2 gigs. Full dump and restore? 70MB again. Oh, and
the spiking load, and the table locks that occur during full vacuums?
Just take the hit, web-surfers be damned.

For us, Oracle keeps live statistics on the data, in real time. No
analyze. It also uses rollback segments to serve old versions of data
when locks are present, instead of MVCC. MVCC leaves the old version of
the row *in* the table data, right next to unchanged rows, making vacuum
necessary to clean up and to point to the newest row versions without a
sequential scan. Rollback segments just put the old versions in the
segment; if things change, they reapply the data, and no harm done. No
datafile growth. No old versions. No table scans to find valid rows, no
vacuums.

Does it cost more? Sure. But until Postgres can solve these important
problems, we have no other choice, regardless of how much we want to go
the cheaper route. It's not always about money.

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
On Wed, May 01, 2002 at 02:52:21PM -0500, Shaun Thomas wrote:
> On Mon, 29 Apr 2002, pgsql-gen Newsgroup wrote:
>
> > The way I see it, some managers will buy Oracle. They will have low
> > profit margins. Some programmers will use PostgreSQL. They will have
> > high margins.
>
> Why are our databases bloating, even after hourly full vacuums? Because
> we have a database with a 50-100% data turnover rate at about 100,000
> rows, and Postgres just can't handle it. I've watched our 100MB database
> grow to 500MB, then 2 gigs. Full dump and restore? 70MB again. Oh, and
> the spiking load, and the table locks that occur during full vacuums?
> Just take the hit, web-surfers be damned.

I'm very curious to know why you have a problem with a growing database.
Does performance suffer significantly if you don't do the FULL vacuum?
Surely if you can afford Oracle, you can afford relatively much cheaper
storage. You must have other reasons than just not liking a large
database.

Thanks
On Wed, 1 May 2002 postgres@vrane.com wrote:

> I'm very curious to know why you have a problem with a growing
> database. Does performance suffer significantly if you don't do the
> FULL vacuum? Surely if you can afford Oracle, you can afford
> relatively much cheaper storage. You must have other reasons than
> just not liking a large database.

What you want to do to see an issue is:

1. Create a table with a boolean column.
2. Fill it with 50,000 rows with the boolean field set to true and
   50,000 rows with the boolean field set to false.
3. Analyse the table.
4. Run:
     EXPLAIN ANALYZE SELECT * FROM <mytable> WHERE <booleanfield> = true;
5. Run (several times):
     UPDATE <mytable> SET <booleanfield> = false WHERE <booleanfield> = false;
6. Re-analyse the table, then retry the EXPLAIN ANALYZE SELECT from
   step 4.

What you should see is a seqscan in both plans with the final number of
rows the same, but the second will have taken longer.

Now, I haven't tried that exact scenario myself, so don't quote me :)
I'm also assuming there's no caching of the results from the first
explain through to the second.

-- 
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
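Spelled out concretely, that recipe might look like the following (a
sketch only, against a hypothetical table t; as Nigel says, unverified,
and EXPLAIN ANALYZE requires 7.2 or later):

    CREATE TABLE t (flag boolean);

    -- Seed one row of each value, then double the table 16 times to
    -- get 2 * 2^16 = 131,072 rows, roughly the 100,000 suggested:
    INSERT INTO t VALUES (true);
    INSERT INTO t VALUES (false);
    INSERT INTO t SELECT flag FROM t;   -- repeat this line 16 times

    ANALYZE t;
    EXPLAIN ANALYZE SELECT * FROM t WHERE flag = true;

    -- "No-op" updates still write new row versions under MVCC:
    UPDATE t SET flag = false WHERE flag = false;
    UPDATE t SET flag = false WHERE flag = false;
    UPDATE t SET flag = false WHERE flag = false;

    ANALYZE t;
    -- Same seqscan plan and row count, but more dead tuples to step over:
    EXPLAIN ANALYZE SELECT * FROM t WHERE flag = true;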
On Wed, May 01, 2002 at 11:54:37PM +0100, Nigel J. Andrews wrote:

> Then run:
>   EXPLAIN ANALYZE SELECT * FROM <mytable> WHERE <booleanfield> = true;
>
> Then run (several times):
>   UPDATE <mytable> SET <booleanfield> = false WHERE <booleanfield> = false;
>
> What you should see is a seqscan in both plans with the final number of
> rows the same, but the second will have taken longer.

I haven't tried it yet, but isn't creating an index the logical thing to
do here? Whenever I have a "where" clause in my queries I always create
an index.
On Wed, 1 May 2002 postgres@vrane.com wrote:

> > What you should see is a seqscan in both plans with the final number
> > of rows the same, but the second will have taken longer.
>
> I haven't tried it yet, but isn't creating an index the logical thing
> to do here? Whenever I have a "where" clause in my queries I always
> create an index.

Damn, knew I'd miss something out. Yes, it would seem a good idea to
create an index on that boolean column. I'd still expect the second
query (in the explain analyze) to take longer to complete.

I think I might give this a go myself sometime just to check.

-- 
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
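For reference, the index being discussed, and the partial-index variant
that 7.2 supports (if memory serves), would look something like this
against the hypothetical table t from the earlier sketch. Note that on
a two-value boolean column the planner may well ignore the plain index,
since roughly half the table matches either value:

    -- Plain index on the flag column:
    CREATE INDEX t_flag_idx ON t (flag);

    -- A partial index is usually the better tool when queries only
    -- ever want one (ideally rare) value of the flag:
    CREATE INDEX t_flag_true_idx ON t (flag) WHERE flag = true;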
On Wed, May 01, 2002 at 02:52:21PM -0500, Shaun Thomas wrote:
> Why are our databases bloating, even after hourly full vacuums? Because
> we have a database with a 50-100% data turnover rate at about 100,000
> rows, and Postgres just can't handle it. I've watched our 100MB database
> grow to 500MB, then 2 gigs. Full dump and restore? 70MB again. Oh, and
> the spiking load, and the table locks that occur during full vacuums?
> Just take the hit, web-surfers be damned.

Have you tried running frequent (more than hourly) non-full vacuums?

			regards, tom lane
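For context: 7.2's plain (lazy) VACUUM doesn't take the exclusive lock
that VACUUM FULL does, so it can be scheduled aggressively. A sketch of
what Tom suggests, assuming the vacuumdb wrapper with its --analyze and
--quiet switches and a hypothetical database name:

    # crontab entry (a sketch): lazy-vacuum the whole database every
    # 10 minutes, so dead tuples are recycled before bloat builds up.
    */10 * * * *   vacuumdb --analyze --quiet mydb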
On Wed, 1 May 2002 postgres@vrane.com wrote:

> I'm very curious to know why you have a problem with a growing
> database. Does performance suffer significantly if you don't do the
> FULL vacuum? Surely if you can afford Oracle, you can afford
> relatively much cheaper storage. You must have other reasons than
> just not liking a large database.

Well, it's not the fact that it's growing that's the problem. It's the
fact that 100 actual MB of frequently changed data becomes 2 gigs if not
frequently vacuumed. Even with hourly full vacuums, it still slowly
bloats to 200MB in two weeks, with the same amount of data. The worst
part about this is that the more it bloats, the longer vacuum takes, and
the speed of the bloating increases almost exponentially. Given two
months, it's back up to 2 gigs.

Full dump and restore? Back down to 100MB. I'm sorry, but no amount of
disk storage should have to compensate for a database growing to 60x
larger than the actual data stored (the data dump is 30 megs, but you can
expect a certain amount of bloat due to column sizes). When the data
files start hitting 2GB each, the Linux file-size limit comes into play,
and you have no choice but to dump and restore.

Even worse? Since there is only about 100MB of real data in that 2GB
morass, there's 1.9GB of old or invalid rows that vacuum didn't clean up.
That makes all subsequent vacuums slower, which makes their locks last
longer, which means all selects on the tables being vacuumed are stalled
until the vacuum is done. What happens when the vacuum takes half an
hour, and it's a web application? Oh darn, you stop serving pages that
use the database.

Only full vacuum at non-peak times, you say? We tried that. The
datafiles exploded to 2GB within days. *DAYS* The hourly vacuum brought
it under control, but our insert script, which runs every hour and
replaces about 10% of the data per run, really cries bloody murder while
the vacuum is running. As a result, this machine commonly has a load
approaching 20 almost all the time. Turn off Postgres? Less than 1,
even with a concurrent MySQL DB that has a *valid* 2GB database that
contains 2GB of actual data.

I'm not passing blame. I'm not even angry. I'm just frustrated and
tired of babying Postgres so it doesn't cause the server to burst into
flames, crash, or otherwise fail. I actually had less admin overhead
with an Oracle database. Anyone who has used Oracle knows just how hard
it is to administer, but I sure as hell never had to completely dump and
restore the data every month to keep it from eating my drives. I mean,
what good is MVCC to avoid locking, when frequent full vacuums are even
worse? I'd rather have a little locking contention than have a
completely useless database for ten to twenty minutes, every hour.

Heck, maybe it's just our special case, that such a high rate of data
turnover just bewilders Postgres. But as of now, it's just plain
unusable. Until vacuum goes away completely, which it never will as long
as MVCC is in place, Postgres is not an option for us. I just can't see
any way around it. I hate MySQL when I want things like foreign keys or
subselects, so we still need a full DBMS. Hell, we're even considering
giving a look to freaking Interbase, for the love of God. We're
desperate, here. ^_^

As a side note, Sybase used to be our DBMS of choice, but it didn't play
nice with PHP (segfaults on connects, sometimes), and only old versions
fall under the free license, so we ditched it too. It also had weird
syntax (go, for crying out loud!?), so our developers hated it.

If I could only take all the good things from the databases I liked and
make them one database... ease of administration of MySQL + features of
Oracle, for instance. But that database is only available in the
wonderful, magical world of 'you must be kidding' land. Oh well.

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
On Wed, 1 May 2002 postgres@vrane.com wrote:

> > What you should see is a seqscan in both plans with the final number
> > of rows the same, but the second will have taken longer.
>
> I haven't tried it yet, but isn't creating an index the logical thing
> to do here? Whenever I have a "where" clause in my queries I always
> create an index.

An index... on a boolean? My head hurts, stop it.

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
On Wed, 1 May 2002, Tom Lane wrote:

> Have you tried running frequent (more than hourly) non-full vacuums?

I'd love to. But one of our customers is keeping us from upgrading to
7.2. (The bastards.) Full vacuums are my only choice for the time
being. But knowing how vacuum works, how would this help? It would keep
our queries nice and speedy, but unless Postgres has a new, magical way
of knowing that outdated versions of a row are actually outdated, the
overall file bloating will be the same, and we'd still need the full
vacuums every hour.

If vacuum in 7.2 marks the row as outdated, and new data can be placed
*over* it, or a full vacuum completely obliterates *all* rows marked as
invalid, sure. But if it keeps some kind of list of all recent row
versions to avoid a sequential scan through all the invalid rows, should
that list grow large, I could see it losing track with our large
turnover rate. What to do...

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
Shaun Thomas <sthomas@townnews.com> writes:
>> Have you tried running frequent (more than hourly) non-full vacuums?

> I'd love to. But one of our customers is keeping us from upgrading to
> 7.2. (The bastards.) Full vacuums are my only choice for the time
> being. But knowing how vacuum works, how would this help? It would
> keep our queries nice and speedy, but unless Postgres has a new,
> magical way of knowing that outdated versions of a row are actually
> outdated, the overall file bloating will be the same, and we'd still
> need the full vacuums every hour.

It's not any more or less magical than the old vacuum's way of knowing
that it can remove rows. It is a different way of looking at things,
though: you try to keep a steady-state percentage of free space in a
table. 7.1 and before could only append new rows to the end of a table,
so you had to full-vacuum a table down to minimum size and then let it
grow again with updates.

With the new-style vacuum, we don't try to compress the table, we just
record where there's free space due to removal of dead tuples. Then
insertions reuse that space. So if you run a new-style vacuum after
updating say 10% or 25% of the rows, you can maintain a steady-state
table size that's say 10% or 25% larger than the theoretical minimum.

In an installation with a lot of update traffic, you may need to
increase the default size of the free space map to ensure that you can
keep track of all the free space in your heavily-updated tables. We're
still learning about the best way to tune those configuration
parameters.

			regards, tom lane
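The parameters Tom refers to live in postgresql.conf. A sketch with
what I believe were 7.2's shipped defaults (check your own copy; the
right values depend entirely on update volume, and a high-turnover site
would raise both substantially):

    # Free space map sizing: must be large enough to cover all the
    # pages that ever hold reusable free space, across all tables.
    max_fsm_relations = 100     # number of tables tracked in the FSM
    max_fsm_pages = 10000       # total pages with free space tracked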
On Thu, May 02, 2002 at 09:54:12AM -0500, Shaun Thomas wrote:
> On Wed, 1 May 2002 postgres@vrane.com wrote:
>
> > I'm very curious to know why you have a problem with a growing
> > database. Does performance suffer significantly if you don't do the
> > FULL vacuum? Surely if you can afford Oracle, you can afford
> > relatively much cheaper storage. You must have other reasons than
> > just not liking a large database.
>
> Well, it's not the fact that it's growing that's the problem. It's the
> fact that 100 actual MB of frequently changed data becomes 2 gigs if
> not frequently vacuumed. Even with hourly full vacuums, it still
> slowly [...]

Maybe I did not make myself clear. I don't care whether it's growing or
bloating or whatever. If you don't do the FULL vacuum, does your
performance suffer? Considering that 7.2 gives you non-full vacuum, I
want to know whether there are cases out there that an hourly FULL
vacuum will or won't solve.

The only problem I can see in your very long essay is the Linux file
size limit. Well, that is not Postgres's fault. You surely know how to
get around that problem by upgrading the system, or even switching to a
different system without that limit. Have you compared the work involved
in migrating from one DB to another with that of migrating from one OS to
another? I would think they are both equally involved.
On Thu, 2 May 2002, Tom Lane wrote:

> With the new-style vacuum, we don't try to compress the table, we just
> record where there's free space due to removal of dead tuples. Then
> insertions reuse that space. So if you run a new-style vacuum after
> updating say 10% or 25% of the rows, you can maintain a steady-state
> table size that's say 10% or 25% larger than the theoretical minimum.

See, that's what I figured. I just had it backwards. I thought you were
tracking valid rows, not invalid ones. But I was being stupid, since the
invalid ones aren't likely to outnumber the valid ones. Duhh.

Either way, having such a lookup before every insert/update doesn't seem
terribly efficient. It seems like this would displace the slowdown
caused by vacuum to inserts and updates. For a system with a high volume
of inserts and updates, I'm not sure this would be much of an
improvement. It would, however, get rid of the 10-20 minutes of locked
tables during a full vacuum. You have to keep a list to avoid a
sequential scan for every insert or update, though... so I understand.

But it's also why I consider MVCC fundamentally flawed. If Postgres used
rollback segments like Oracle, DB2 or other mainstream RDBMSs, this
wouldn't be an issue; only locked rows are put in the rollback segment,
so the versioning is still available, and it would clean up after itself
on commit or rollback. I almost want to say MVCC was a nice idea for a
research project, but it doesn't really work in practice. A database
with a CVS-like model *seems* like a good idea until you throw vacuum
into the mixture.

It all depends on what you need. This of course wouldn't be an issue for
a data store, or data that doesn't change frequently. For something like
a classified-ad system used by over 500 newspapers, it fails quite
spectacularly in our case. In truth, this is the *one* thing we hate
about Postgres. That's all. Just that one thing. If there were some way
to resolve it, I'd be an enthusiastic supporter.

> In an installation with a lot of update traffic, you may need to
> increase the default size of the free space map to ensure that you
> can keep track of all the free space in your heavily-updated tables.
> We're still learning about the best way to tune those configuration
> parameters.

I'll do that. I'm just trying to save my poor server. I'll do almost
anything to reduce its suffering. You've been a great help. Thanks.

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
Shaun Thomas <sthomas@townnews.com> writes:
> Either way, having such a lookup before every insert/update doesn't
> seem terribly efficient. It seems like this would displace the
> slowdown caused by vacuum to inserts and updates.

Actually, I believe not. With the old scheme, the last page of an active
table was a hot-spot for contention, because every insert/update on the
table had to get an exclusive lock on that page. With the new scheme,
different backends can be inserting tuples into different pages
concurrently. (And yes, the FSM is designed to favor that happening; if
multiple pages are recorded as having free space, different backends will
get pointed to different pages to do insertions in.)

The in-memory FSM could itself become a hot-spot for contention, but the
profiles that I've been able to take do not show any indication that
that's a problem in practice. Any given backend will touch the FSM only
when it's exhausted the free space in its current target insert page, so
the traffic to the FSM is significantly less than the total insertion
traffic. Also, to the extent that this approach keeps the total table
size constant, we avoid a lot of OS-level overhead for repeated
extensions of the disk file.

So I have no reason to think that insert/update is measurably slower in
this scheme than before, and some reasons to think it may often be
faster. I have not tried to benchmark the speed of insert/update taken
in isolation, though.

> But it's also why I consider MVCC fundamentally flawed. If Postgres
> used rollback segments like Oracle, DB2 or other mainstream RDBMSs,
> this wouldn't be an issue; only locked rows are put in the rollback
> segment, so the versioning is still available, and it would clean up
> after itself on commit or rollback.

I do not agree with this reasoning, and I do *not* see that rollback
segments have any advantages over our approach.

			regards, tom lane
I just tested this, and it's true. On my 7.2.1 machine, with bools,
updating 50% of the 100,000 test rows I made, the data usage climbs about
1 meg, and most of it does not get recovered by vacuum.

I'll compile 7.2.1 (I'm running 7.2) and test it for other datatypes as
well.

On Thu, 2 May 2002, Shaun Thomas wrote:

> Well, it's not the fact that it's growing that's the problem. It's the
> fact that 100 actual MB of frequently changed data becomes 2 gigs if
> not frequently vacuumed. Even with hourly full vacuums, it still
> slowly bloats to 200MB in two weeks, with the same amount of data.
>
> [rest of Shaun's essay snipped; see his earlier message upthread]
There HAS to be a bug here. I'm running 7.2, and haven't tried 7.2.1,
but I'm guessing it won't fix this.

With a table with a text and an int field, I get the standard behaviour:
update a lot, the file grows; vacuum (full) it and it shrinks back to its
original size, or nearly so.

With a bool in the table, things get ugly fast. On a 100,000 row table:

    table size in blocks: 48500
    update test set yn=true where yn=true;
    table size in blocks: 49716
    vacuum;
    table size in blocks: 49688
    vacuum full;
    table size in blocks: 50832

and it just keeps going and going and going and going...

Something is broken in vacuum and bools. I haven't tested any types but
text, int, and bool so far. I'll test varchar, float and numeric next
and let ya know.
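A note on measurement: block counts like those above can be read from
the planner statistics in pg_class (a sketch; relpages is only refreshed
by VACUUM or ANALYZE, so the number can lag the actual file on disk):

    ANALYZE test;
    SELECT relname, relpages    -- pages of BLCKSZ bytes, 8KB by default
      FROM pg_class
     WHERE relname = 'test';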
> So I have no reason to think that insert/update is measurably slower
> in this scheme than before, and some reasons to think it may often be
> faster. I have not tried to benchmark the speed of insert/update
> taken in isolation, though.

It has been my experience that inserts/updates are FASTER when there is
some free space in the db from deleted records.
On Thu, 2 May 2002 postgres@vrane.com wrote:

> The only problem I can see in your very long essay is the Linux file
> size limit. Well, that is not Postgres's fault.

Dude, wasting 1.9GB is certainly *not* the OS's fault. Next time your
basement starts flooding, blame your basement for not holding all the
water you didn't ask for, and don't fix the pipe flooding it. What I
currently have is a drain that drains slower than the pipe is flooding my
basement.

So, you would still think it was the OS's fault if I had a 5TB cluster
that was completely full, but once dumped and restored, would only
consume 100MB? So yeah. Throw space at it, that'll fix everything. I'm
off to go buy the house next to mine so I can keep more of that water...

> I would think they are both equally involved.

Nope. We only have to switch once if we pick the right replacement.
I've upgraded Postgres (the customer halting us has since relented) and
we'll see. But I'm not convinced it'll make any difference. MVCC is a
losing proposition in my opinion. Period.

Besides, this all started by some twit saying Postgres and Oracle are
interchangeable. They're not. That's all I'm saying.

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
Hi Shaun,

On Fri, 2002-05-03 at 09:54, Shaun Thomas wrote:
>
> Besides, this all started by some twit saying Postgres and Oracle are
> interchangeable. They're not. That's all I'm saying.

Oh, I hope I'm not the "twit" that said Postgres and Oracle are
interchangeable. My point in an earlier post was that for _our_ clients'
applications there is no reason to use/require Oracle; Postgres fits.
It's a hammer/wrench type of thing: pick the tool that best satisfies the
job requirements.

-- 
Tom Jenkins
Development InfoStructure
http://www.devis.com
Is there a newer version of PgAccess? I get an error when I go to design
an existing table. I figure it has to do with the fact that Postgres is
several versions ahead of the last PgAccess version I have??

Thanks for any help

David Blood
On Fri, 2002-05-03 at 17:39, david blood wrote:
> Is there a newer version of PgAccess? I get an error when I go to
> design an existing table. I figure it has to do with the fact that
> Postgres is several versions ahead of the last PgAccess version I
> have??

pgaccess works just fine for me on 7.2. What is your problem?

Cheers

Tony Grant
-- 
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html
Macromedia UltraDev with PostgreSQL
http://www.animaproductions.com/ultra.html
I should have posted the error. I get this when I click design on a
table. I also do not see the indexes that are on a table.

David Blood

    ERROR: Attribute 'oid' not found

    ERROR: Attribute 'oid' not found
    while executing
    "pg_select pgsql184 {select oid,indexrelid from pg_index where
    (pg_class.relname='tblbadword') and (pg_class.oid=pg_index.indrelid)} rec {
    lappend Pg..."
        ("uplevel" body line 1)
        invoked from within
    "uplevel pg_select $args"
        (procedure "wpg_select" line 3)
        invoked from within
    "wpg_select $CurrentDB "select oid,indexrelid from pg_index where
    (pg_class.relname='$PgAcVar(tblinfo,tablename)') and
    (pg_class.oid=pg_index.indrelid)..."
        (procedure "refreshTableInformation" line 48)
        invoked from within
    "refreshTableInformation"
        (procedure "Tables::design" line 5)
        invoked from within
    "Tables::design $objname"
        ("Tables" arm line 2)
        invoked from within
    "switch $PgAcVar(activetab) { Tables { Tables::design $objname }
    Schema { Schema::open $objname } Queries { Queries::design $objname }
    V..."
        (procedure "Mainlib::cmd_Design" line 7)
        invoked from within
    "Mainlib::cmd_Design"
        invoked from within
    ".pgaw:Main.btndesign invoke"
        ("uplevel" body line 1)
        invoked from within
    "uplevel #0 [list $w invoke]"
        (procedure "tkButtonUp" line 9)
        invoked from within
    "tkButtonUp .pgaw:Main.btndesign "
        (command bound to event)

--- tony <tony@animaproductions.com> wrote:
> On Fri, 2002-05-03 at 17:39, david blood wrote:
> > Is there a newer version of PgAccess? I get an error when I go to
> > design an existing table. I figure it has to do with the fact that
> > Postgres is several versions ahead of the last PgAccess version I
> > have??
>
> pgaccess works just fine for me on 7.2. What is your problem?
>
> Cheers
>
> Tony Grant
david blood <davidjblood@yahoo.com> writes:
> I get this when I click design on a table. I also do not see the
> indexes that are on a table.

> ERROR: Attribute 'oid' not found

> while executing
> "pg_select pgsql184 {select oid,indexrelid from pg_index where
> (pg_class.relname='tblbadword') and (pg_class.oid=pg_index.indrelid)} rec {
> lappend Pg..."

Try using the 7.2 version of pgaccess. That looks to be 7.1 code...

			regards, tom lane
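For the curious, the failing query can be rewritten without the oid
column (which, as noted downthread, pg_index does not have in this
release); a sketch, since indexrelid already identifies each index:

    SELECT pg_index.indexrelid
      FROM pg_index, pg_class
     WHERE pg_class.relname = 'tblbadword'
       AND pg_class.oid = pg_index.indrelid;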
I am not sure where to get that. I have gone to the pgaccess site and am
using the latest version downloadable there.

David Blood

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> david blood <davidjblood@yahoo.com> writes:
> > I get this when I click design on a table. I also do not see the
> > indexes that are on a table.
>
> > ERROR: Attribute 'oid' not found
>
> Try using the 7.2 version of pgaccess. That looks to be 7.1 code...
>
> 			regards, tom lane
On Mon, 6 May 2002, david blood wrote:

> I should have posted the error. I get this when I click design on a
> table. I also do not see the indexes that are on a table.
>
> David Blood
>
> ERROR: Attribute 'oid' not found
>
> ERROR: Attribute 'oid' not found
>
> while executing
> "pg_select pgsql184 {select oid,indexrelid from pg_index where
> (pg_class.relname='tblbadword') and (pg_class.oid=pg_index.indrelid)} rec {
> lappend Pg..."
>
> [rest of traceback snipped]
>
> --- tony <tony@animaproductions.com> wrote:
> > pgaccess works just fine for me on 7.2. What is your problem?

I think the answer to this problem is that yes, there is a newer version
of PgAccess. There must be, since the version shipped with 7.2.1 doesn't
try to retrieve the oid field, which indeed does not exist in pg_index,
as reported in the error message.

-- 
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
On Mon, 6 May 2002, david blood wrote:

> I am not sure where to get that. I have gone to the pgaccess site and
> am using the latest version downloadable there.
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Try using the 7.2 version of pgaccess. That looks to be 7.1 code...

I see Tom got there ahead of me. If you've got this problem you must be
running a more recent backend, which means you must have upgraded it.
Can't you just upgrade the client tools from the same source you upgraded
the backend from?

LOL, my first answer to this, the one Tom beat me to, has just arrived in
my mail box.

-- 
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
Shaun Thomas wrote:
> On Mon, 29 Apr 2002, pgsql-gen Newsgroup wrote:
>
> > The way I see it, some managers will buy Oracle. They will have low
> > profit margins. Some programmers will use PostgreSQL. They will
> > have high margins.
>
> That's all well and good provided Postgres and Oracle are 100% feature
> compatible. They're not. Want inter-database queries? Too bad.
> Replication? Nope. Parallel queries? Scratch that. Packages? Big
> goose-egg. Ada-style error catching? Zero. In/out variables? Only
> in your dreams, buddy. Views that take parameters? Zilch. Wanna
> actually drop your foreign keys, or change their status or triggering
> order for loading purposes? Not here.

Agreed. We are not 100% feature compatible with Oracle. We do have some
things Oracle doesn't, and they have some things we don't. I will admit
they have more things we don't have, and some of the things they have are
really useful, as you pointed out. However, some of the stuff they have
is just bloat, which can be a negative.

> Why are our databases bloating, even after hourly full vacuums?
> Because we have a database with a 50-100% data turnover rate at about
> 100,000 rows, and Postgres just can't handle it. I've watched our
> 100MB database grow to 500MB, then 2 gigs. Full dump and restore?
> 70MB again. Oh, and the spiking load, and the table locks that occur
> during full vacuums? Just take the hit, web-surfers be damned.

I have read the thread and no one seems to have addressed your actual
problem: that a full vacuum doesn't reduce the DB size back to 100MB.
There are two possible causes. One is an old transaction that is keeping
those expired tuples visible (unlikely); the other is index growth,
particularly for ever-increasing keys, like primary keys. We have this
TODO item:

	* Certain indexes will not shrink, e.g. oid indexes with many inserts

I believe this has not been done because there are some problems with
doing it in a crash-safe manner (?). Would you see if recreating some of
the larger indexes helps reduce your disk usage after vacuum?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
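One way to check whether indexes are the culprit Bruce describes is to
compare page counts for a table and its indexes in pg_class (a sketch;
'mytable' is a placeholder, and relpages is only refreshed by VACUUM or
ANALYZE, so run that first):

    VACUUM ANALYZE mytable;

    -- page counts for the table itself and each of its indexes:
    SELECT c2.relname, c2.relpages
      FROM pg_class c, pg_index i, pg_class c2
     WHERE c.relname = 'mytable'
       AND c.oid = i.indrelid
       AND i.indexrelid = c2.oid
    UNION
    SELECT relname, relpages FROM pg_class WHERE relname = 'mytable'
    ORDER BY relpages DESC;

If an index dwarfs its table, rebuilding it (REINDEX, or drop and
recreate) and re-running the query shows how much space was dead.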
On Sat, 1 Jun 2002, Bruce Momjian wrote:

> I have read the thread and no one seems to have addressed your actual
> problem: that a full vacuum doesn't reduce the DB size back to 100MB.

Wow, this is an old thread, too. ^_^

Actually, I copied the vacuumdb script and changed the parameters/code to
make it a reindex script. I submitted it five times in various forms
until it did basically everything everyone was requesting. I think I was
hinting around that it should be added to the official codebase, because
it's a necessary utility.

Why? Because that made my problem go away. The problem was that vacuum
doesn't reindex, so the indexes just kept growing. Either way, there's
no longer a problem. ^_^

Take care.

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
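Shaun's script itself wasn't reposted in this thread. A minimal
stand-in with the same effect might be a shell loop issuing REINDEX over
psql (a sketch only; assumes psql is in the PATH and a user with the
rights to reindex every table):

    #!/bin/sh
    # Crude stand-in for the vacuumdb-derived reindex script described
    # above: rebuild every index on every user table in a database.
    DB="$1"
    psql -t -A -d "$DB" -c \
      "SELECT relname FROM pg_class WHERE relkind = 'r' AND relname !~ '^pg_'" |
    while read table; do
        echo "reindexing $table"
        psql -d "$DB" -c "REINDEX TABLE \"$table\""
    done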
Shaun Thomas wrote:
> On Sat, 1 Jun 2002, Bruce Momjian wrote:
>
> > I have read the thread and no one seems to have addressed your
> > actual problem: that a full vacuum doesn't reduce the DB size back
> > to 100MB.
>
> Wow, this is an old thread, too. ...

Yeah. Bruce, after 2 months, what about reading your mailbox in reverse
order? ;-)

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
> Shaun Thomas wrote:
> > On Sat, 1 Jun 2002, Bruce Momjian wrote:
> >
> > > I have read the thread and no one seems to have addressed your
> > > actual problem: that a full vacuum doesn't reduce the DB size
> > > back to 100MB.
> >
> > Wow, this is an old thread, too. ...
>
> Yeah. Bruce, after 2 months, what about reading your mailbox in
> reverse order? ;-)

The problem is that the threads are often resurrected as new email
messages with no linkage to the old, and with very different subject
lines, so I can't tell whether the issue was addressed later or not.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026