Thread: vacuum locking
It seems a simple "vacuum" (not full or analyze) slows down the database dramatically. I am running vacuum every 15 minutes, but it takes about 5 minutes to run even after a fresh import. Even with vacuuming every 15 minutes, I'm not sure vacuuming is working properly.

There are a lot of updates. The slowest relation is the primary key index, which is composed of a sequence. I've appended a csv with the parsed output from vacuum. The page counts are growing way too fast imo. I believe this is caused by the updates, and index pages not getting re-used. The index values aren't changing, but other values in the table are.

Any suggestions how to make vacuuming more effective and reducing the time it takes to vacuum? I'd settle for less frequent vacuuming or perhaps index rebuilding. The database can be re-imported in about an hour.

Rob
----------------------------------------------------------------
Spacing every 15 minutes
Pages,Tuples,Deleted
7974,1029258,1536
7979,1025951,4336
7979,1026129,52
7979,1025618,686
7979,1025520,152
7980,1025583,28
7995,1028008,6
8004,1030016,14
8010,1026149,4965
8012,1026684,6
8014,1025910,960
8020,1026812,114
8027,1027642,50
8031,1027913,362
8040,1028368,784
8046,1028454,1143
8049,1029155,6
8053,1029980,10
8065,1031506,24
8084,1029134,4804
8098,1031004,346
8103,1029412,3044
8118,1029736,1872
8141,1031643,1704
8150,1032597,286
8152,1033222,6
8159,1029436,4845
8165,1029987,712
8170,1030229,268
8176,1029568,1632
8189,1030136,1540
8218,1030915,3963
8255,1033049,4598
8297,1036583,3866
8308,1031412,8640
8315,1031987,1058
8325,1033892,6
8334,1030589,4625
8350,1031709,1040
8400,1033071,5946
8426,1031555,8368
8434,1031638,2240
8436,1031703,872
8442,1031891,612
Rob Nagler wrote:
> It seems a simple "vacuum" (not full or analyze) slows down the
> database dramatically. I am running vacuum every 15 minutes, but it
> takes about 5 minutes to run even after a fresh import. Even with
> vacuuming every 15 minutes, I'm not sure vacuuming is working
> properly.
>
> There are a lot of updates. The slowest relation is the primary key
> index, which is composed of a sequence. I've appended a csv with the
> parsed output from vacuum. The page counts are growing way too fast
> imo. I believe this is caused by the updates, and index pages not
> getting re-used. The index values aren't changing, but other values
> in the table are.

You should try 7.4 beta and pg_autovacuum which is a contrib module in CVS tip. It works with 7.3 as well. Major reason for 7.4 is, it fixes index growth in vacuum. So if your database is fit, it will stay that way with proper vacuuming.

> Any suggestions how to make vacuuming more effective and reducing the
> time it takes to vacuum? I'd settle for less frequent vacuuming or
> perhaps index rebuilding. The database can be re-imported in about an
> hour.

Make sure that you have FSM properly tuned. Bump it from defaults to suit your needs. I hope you have gone thr. this page for general purpose setting.

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

> Rob
> ----------------------------------------------------------------
> Spacing every 15 minutes
> Pages,Tuples,Deleted
> 7974,1029258,1536
> 7979,1025951,4336
> 7979,1026129,52
> 7979,1025618,686

Assuming those were incremental figures, largest you have is ~8000 tuples per 15 minutes and 26 pages. I think with proper FSM/shared buffers/effective cache and a pg_autovacuum with 1 min. polling interval, you could end up in lot better shape.

Let us know if it works.

Shridhar
> Any suggestions how to make vacuuming more effective and reducing the > time it takes to vacuum? I'd settle for less frequent vacuuming or > perhaps index rebuilding. The database can be re-imported in about an > hour. Which version and what are your FSM settings?
Shridhar Daithankar writes:
> You should try 7.4 beta and pg_autovacuum which is a contrib module
> in CVS tip.

It's on our todo list. :)

How does pg_autovacuum differ from vacuumdb? I mean it seems to call the vacuum operation underneath just as vacuumdb does. I obviously didn't follow the logic as to how it gets there. :-)

> Make sure that you have FSM properly tuned. Bump it from defaults to
> suit your needs. I hope you have gone thr. this page for general
> purpose setting.

I didn't start vacuuming regularly until recently, so I didn't see this problem.

> Assuming those were incremental figures, largest you have is ~8000
> tuples per 15 minutes and 26 pages. I think with proper FSM/shared
> buffers/effective cache and a pg_autovacuum with 1 min. polling
> interval, you could end up in lot better shape.

Here are the numbers that are different. I'm using 7.3:

shared_buffers = 8000
sort_mem = 8000
vacuum_mem = 64000
effective_cache_size = 40000

free says:

             total       used       free     shared    buffers     cached
Mem:       1030676    1005500      25176          0      85020     382280
-/+ buffers/cache:     538200     492476
Swap:      2096472     272820    1823652

It seems effective_cache_size is about right.

vacuum_mem might be slowing down the system? But if I reduce it, won't vacuuming get slower?

max_fsm_relations is probably too low (the default in my conf file says 100, probably needs to be 1000). Not sure how this affects disk usage.

Here's the summary for the two active tables during a vacuum interval with high activity. The other tables don't get much activity, and are much smaller. As you see, the 261 + 65 adds up to the bulk of the 5 minutes it takes to vacuum.

INFO: Removed 8368 tuples in 427 pages.
      CPU 0.06s/0.04u sec elapsed 1.54 sec.
INFO: Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254, UnUsed 1739.
      Total CPU 2.92s/2.58u sec elapsed 65.35 sec.

INFO: Removed 232 tuples in 108 pages.
      CPU 0.01s/0.02u sec elapsed 0.27 sec.
INFO: Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11, UnUsed 641.
      Total CPU 10.19s/6.03u sec elapsed 261.44 sec.

How would vacuuming every minute finish in time? It isn't changing much in the second table, but it's taking 261 seconds to wade through 5m rows.

Assuming I vacuum every 15 minutes, it would seem like max_fsm_pages should be 1000, because that's about what was reclaimed. The default is 10000. Do I need to change this?

Sorry to be so dense, but I just don't know what the right values are.

Thanks muchly for the advice,
Rob
Rob,

> vacuum_mem might be slowing down the system? But if I reduce it,
> won't vacuuming get slower?

Yes, but it will have less of an impact on the system while it's running.

> INFO: Removed 8368 tuples in 427 pages.
>       CPU 0.06s/0.04u sec elapsed 1.54 sec.
> INFO: Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254, UnUsed 1739.
>       Total CPU 2.92s/2.58u sec elapsed 65.35 sec.
>
> INFO: Removed 232 tuples in 108 pages.
>       CPU 0.01s/0.02u sec elapsed 0.27 sec.
> INFO: Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11, UnUsed 641.
>       Total CPU 10.19s/6.03u sec elapsed 261.44 sec.

What sort of disk array do you have? That seems like a lot of time considering how little work VACUUM is doing.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Fri, 17 Oct 2003 09:52:26 -0600, Rob Nagler <nagler@bivio.biz> wrote:
> INFO: Removed 8368 tuples in 427 pages.
>       CPU 0.06s/0.04u sec elapsed 1.54 sec.
> INFO: Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254, UnUsed 1739.
>       Total CPU 2.92s/2.58u sec elapsed 65.35 sec.
>
> INFO: Removed 232 tuples in 108 pages.
>       CPU 0.01s/0.02u sec elapsed 0.27 sec.
> INFO: Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11, UnUsed 641.
>       Total CPU 10.19s/6.03u sec elapsed 261.44 sec.

The low UnUsed numbers indicate that FSM is working fine.

> Assuming I vacuum every 15 minutes, it would seem like max_fsm_pages
> should be 1000, because that's about what was reclaimed. The default
> is 10000. Do I need to change this?

ISTM you are VACUUMing too aggressively. You are reclaiming less than 1% and 0.005%, respectively, of tuples. I would increase FSM settings to ca. 1000 fsm_relations, 100000 fsm_pages and VACUUM *less* often, say every two hours or so.

... or configure autovacuum to VACUUM a table when it has 10% dead tuples.

Servus
 Manfred
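A minimal sketch of the postgresql.conf change Manfred is suggesting, for a 7.3-era installation (the two values come from his post; the comments are illustrative, and the postmaster must be restarted for FSM changes to take effect):

  # Free space map: large enough to remember every page with reclaimable
  # space between the less-frequent VACUUM runs
  max_fsm_relations = 1000
  max_fsm_pages = 100000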
Manfred Koizar writes:
> ISTM you are VACUUMing too aggressively. You are reclaiming less than
> 1% and 0.005%, respectively, of tuples. I would increase FSM settings
> to ca. 1000 fsm_relations, 100000 fsm_pages and VACUUM *less* often,
> say every two hours or so.

I did this. We'll see how it goes.

> ... or configure autovacuum to VACUUM a table when it has 10% dead
> tuples.

This solution doesn't really fix the fact that VACUUM consumes the disk while it is running. I want to avoid the erratic performance on my web server when VACUUM is running.

mfg,
Rob
Josh Berkus writes:
> Yes, but it will have less of an impact on the system while it's running.

We'll find out. I lowered vacuum_mem to 32000.

> What sort of disk array do you have? That seems like a lot of time
> considering how little work VACUUM is doing.

Vendor: DELL     Model: PERCRAID Mirror   Rev: V1.0
Type:   Direct-Access                     ANSI SCSI revision: 02

Two 10K disks attached.

Rob
>>>>> "RN" == Rob Nagler <nagler@bivio.biz> writes: RN> Vendor: DELL Model: PERCRAID Mirror Rev: V1.0 RN> Type: Direct-Access ANSI SCSI revision: 02 AMI or Adaptec based? If AMI, make sure it has write-back cache enabled (and you have battery backup!), and disable the 'readahead' feature if you can. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
>>>>> "RN" == Rob Nagler <nagler@bivio.biz> writes: RN> This solution doesn't really fix the fact that VACUUM consumes the RN> disk while it is running. I want to avoid the erratic performance on RN> my web server when VACUUM is running. What's the disk utilization proir to running vacuum? If it is hovering around 95% or more of capacity, of course you're gonna overwhelm it. This ain't Star Trek -- the engines can't run at 110%, Cap'n! -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Vivek Khera writes:
> AMI or Adaptec based?

Adaptec, I think. AIC-7899 LVD SCSI is what dmidecode says, and Red Hat/Adaptec aacraid driver, Aug 18 2003 is what comes up when it boots. I haven't been able to use the aac utilities with this driver, however, so it's hard to interrogate the device.

> If AMI, make sure it has write-back cache enabled (and you have
> battery backup!), and disable the 'readahead' feature if you can.

I can't do this so easily. It's at a colo, and it's production. I doubt this has anything to do with this problem, anyway. We're talking about hundreds of megabytes of data.

> What's the disk utilization prior to running vacuum? If it is
> hovering around 95% or more of capacity, of course you're gonna
> overwhelm it.

Here's the vmstat 5 at a random time:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0 272372  38416  78220 375048   0   3     2     0    0     0   2   2   0
 0  0  0 272372  30000  78320 375660   0   0    34   274  382   284   5   1  94
 0  1  0 272372  23012  78372 375924   0   0    25   558  445   488   8   2  90
 1  0  0 272368  22744  78472 376192   0   6   125   594  364   664   9   3  88

And here's it during vacuum:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  2  1 277292   9620  72028 409664  46  32  4934  4812 1697   966   8   4  88
 0  3  0 277272   9588  72096 412964  61   0  7303  2478 1391   976   3   3  94
 2  2  0 277336   9644  72136 393264 1326  32  2827  2954 1693  1519   8   3  89

The pages are growing proportionately with the number of tuples, btw. Here's a vacuum snippet from a few days ago after a clean import, running every 15 minutes:

INFO: Removed 2192 tuples in 275 pages.
      CPU 0.06s/0.01u sec elapsed 0.91 sec.
INFO: Pages 24458: Changed 260, Empty 0; Tup 1029223: Vac 2192, Keep 3876, UnUsed 26.
      Total CPU 2.91s/2.22u sec elapsed 65.74 sec.

And here's the latest today, running every 2 hours:

INFO: Removed 28740 tuples in 1548 pages.
      CPU 0.08s/0.06u sec elapsed 3.73 sec.
INFO: Pages 27277: Changed 367, Empty 0; Tup 1114178: Vac 28740, Keep 1502, UnUsed 10631.
      Total CPU 4.78s/4.09u sec elapsed 258.10 sec.

The big tables/indexes are taking longer, but it's a big CPU/elapsed time savings to vacuum every two hours vs every 15 minutes.

There's still the problem that when vacuum is running, interactive performance drops dramatically. A query that takes a couple of seconds to run when the db isn't being vacuumed will take minutes when vacuum is running. It's tough for me to correlate exactly, but I suspect that while postgres is vacuuming an index or table, nothing else runs. In between relations, other stuff gets to run, and then vacuum hogs all the resources again. This could be for disk reasons or simply because postgres locks the index or table while it is being vacuumed. Either way, the behavior is unacceptable. Users shouldn't have to wait minutes while the database picks up after itself.

The concept of vacuuming seems to be problematic. I'm not sure why the database simply can't garbage collect incrementally. AGC is very tricky, especially AGC that involves gigabytes of data on disk. Incremental garbage collection seems to be what other databases do, and it's been my experience that other databases don't have the type of unpredictable behavior I'm seeing with Postgres. I'd rather the database be a little bit slower on average than have to figure out the best time to inconvenience my users.

Since my customer already has Oracle, we'll be running tests in the coming month(s :-) with Oracle to see how it performs under the same load and hardware. I'll keep this group posted.

Rob
Rob Nagler <nagler@bivio.biz> writes:
> Here's the vmstat 5 at a random time:
>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  0  0  0 272372  38416  78220 375048   0   3     2     0    0     0   2   2   0
>  0  0  0 272372  30000  78320 375660   0   0    34   274  382   284   5   1  94
>  0  1  0 272372  23012  78372 375924   0   0    25   558  445   488   8   2  90
>  1  0  0 272368  22744  78472 376192   0   6   125   594  364   664   9   3  88

> And here's it during vacuum:
>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  1  2  1 277292   9620  72028 409664  46  32  4934  4812 1697   966   8   4  88
>  0  3  0 277272   9588  72096 412964  61   0  7303  2478 1391   976   3   3  94
>  2  2  0 277336   9644  72136 393264 1326  32  2827  2954 1693  1519   8   3  89

The increased I/O activity is certainly to be expected, but what I find striking here is that you've got substantial swap activity in the second trace. What is causing that? Not VACUUM I don't think. It doesn't have any huge memory demand. But swapping out processes could account for the perceived slowdown in interactive response.

			regards, tom lane
On Thursday, 23 October 2003 01:32, Rob Nagler wrote:
> The concept of vacuuming seems to be problematic. I'm not sure why
> the database simply can't garbage collect incrementally. AGC is very
> tricky, especially AGC that involves gigabytes of data on disk.
> Incremental garbage collection seems to be what other databases do,
> and it's been my experience that other databases don't have the type
> of unpredictable behavior I'm seeing with Postgres. I'd rather the
> database be a little bit slower on average than have to figure out the
> best time to inconvenience my users.

I think oracle does not do garbage collect, it overwrites the tuples directly and stores the old tuples in undo buffers. Since most transactions are commits, this is a big win.
On Wed, Oct 22, 2003 at 09:27:47PM -0400, Tom Lane wrote: > trace. What is causing that? Not VACUUM I don't think. It doesn't have > any huge memory demand. But swapping out processes could account for What about if you've set vacuum_mem too high? A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Andrew Sullivan <andrew@libertyrms.info> writes: > On Wed, Oct 22, 2003 at 09:27:47PM -0400, Tom Lane wrote: >> trace. What is causing that? Not VACUUM I don't think. It doesn't have >> any huge memory demand. But swapping out processes could account for > What about if you've set vacuum_mem too high? Maybe, but only if it actually had reason to use a ton of memory --- that is, it were recycling a very large number of tuples in a single table. IIRC that didn't seem to be the case here. regards, tom lane
Mario Weilguni <mweilguni@sime.com> writes: > I think oracle does not do garbage collect, it overwrites the tuples directly > and stores the old tuples in undo buffers. Since most transactions are > commits, this is a big win. ... if all tuples are the same size, and if you never have any transactions that touch enough tuples to overflow your undo segment (or even just sit there for a long time, preventing you from recycling undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple problem). And a few other problems that any Oracle DBA can tell you about. I prefer our system. regards, tom lane
On Thu, Oct 23, 2003 at 09:17:41AM -0400, Tom Lane wrote: > > Maybe, but only if it actually had reason to use a ton of memory --- > that is, it were recycling a very large number of tuples in a single > table. IIRC that didn't seem to be the case here. Ah, that's what I was trying to ask. I didn't know if the memory was actually taken by vacuum at the beginning (like shared memory is) or what-all happened. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
Tom Lane writes:
> ... if all tuples are the same size, and if you never have any

Incorrect. If the tuples are smaller, Oracle does the right thing. If there's enough space in the page, it shifts the tuples to make room. That's what pctfree, pctused and pctincrease allow you to control. It's all in memory so it's fast, and I don't think it has to update any indices.

> transactions that touch enough tuples to overflow your undo segment

That's easily configured, and hasn't been a problem in the databases I've managed.

> (or even just sit there for a long time, preventing you from recycling

That's probably bad software or a batch system--which is tuned differently. Any OLTP system has to be able to partition its problems to keep transactions short and small. If it doesn't, it will not be usable.

> undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple
> problem). And a few other problems that any Oracle DBA can tell you
> about. I prefer our system.

Oracle seems to make the assumption that data changes, which is why it manages free space within each page as well as within free lists. The database will be bigger but you get much better performance on DML. It is very good at caching so reads are fast.

Postgres seems to make the assumption that updates and deletes are rare. A delete/insert policy for updates means that a highly indexed table requires lots of disk I/O when the update happens and the concomitant garbage collection when vacuum runs. But then MVCC makes the assumption that there's lots of DML. I don't understand the philosophical split here. I guess I don't understand what application profiles/statistics makes you prefer Postgres' approach over Oracle's.

> The increased I/O activity is certainly to be expected, but what I find
> striking here is that you've got substantial swap activity in the second
> trace. What is causing that? Not VACUUM I don't think. It doesn't have
> any huge memory demand. But swapping out processes could account for
> the perceived slowdown in interactive response.

The box is a bit memory starved, and we'll be addressing that shortly. I don't think it accounts for 3 minute queries, but perhaps it might. vacuum_mem is 32mb, btw.

Rob
>>>>> "RN" == Rob Nagler <nagler@bivio.biz> writes: RN> Vivek Khera writes: >> AMI or Adaptec based? RN> Adaptec, I think. AIC-7899 LVD SCSI is what dmidecode says, and RN> Red Hat/Adaptec aacraid driver, Aug 18 2003 is what comes up when it Cool. No need to diddle with it, then. The Adaptec work quite well, especially if you have battery backup. Anyhow, it seems that as Tom mentioned, you are going into swap when your vacuum runs, so I'll suspect you're just at the edge of total memory utilization, and then you go over the top. Another theory is that the disk capacity is near saturation, the vacuum causes it to slow down just a smidge, and then your application opens additional connections to handle the incoming requests which don't complete fast enough, causing more memory usage with the additional postmasters created. Again, you suffer the slow spiral of death due to resource shortage. I'd start by getting full diagnosis of overall what your system is doing during the vacuum (eg, additional processes created) then see if adding RAM will help. Also, how close are you to the capacity of your disk bandwidth? I don't see that in your numbers. I know in freebsd I can run "systat -vmstat" and it gives me a percentage of utilization that lets me know when I'm near the capacity. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On Thursday, 23 October 2003 15:26, Tom Lane wrote:
> ... if all tuples are the same size, and if you never have any
> transactions that touch enough tuples to overflow your undo segment
> (or even just sit there for a long time, preventing you from recycling
> undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple
> problem). And a few other problems that any Oracle DBA can tell you about.
> I prefer our system.

of course both approaches have advantages, it simply depends on the usage pattern. A case where oracle really rules over postgresql are m<-->n connection tables where each record consist of two foreign keys, the overwrite approach is a big win here.
Rob Nagler <nagler@bivio.biz> writes:
> Incorrect. If the tuples are smaller, Oracle does the right thing. If
> there's enough space in the page, it shifts the tuples to make room.
> That's what pctfree, pctused and pctincrease allow you to control.
> It's all in memory so it's fast, and I don't think it has to update any
> indices.

Note that pctfree/pctused are a big performance drain on the usual case. Try setting them to 0/100 on a table that doesn't get updates (like a many-many relation table) and see how much faster it is to insert and scan.

> > transactions that touch enough tuples to overflow your undo segment
>
> That's easily configured, and hasn't been a problem in the databases
> I've managed.

Judging by the number of FAQ lists out there that explain various quirks of rollback segment configuration I wouldn't say it's so easily configured.

> > (or even just sit there for a long time, preventing you from recycling
>
> That's probably bad software or a batch system--which is tuned
> differently. Any OLTP system has to be able to partition its problems
> to keep transactions short and small. If it doesn't, it will not be
> usable.

Both DSS style and OLTP style databases can be accommodated with rollback segments though it seems to me that DSS style databases lose most of the advantage of rollback segments and optimistic commit.

The biggest problem is on systems where there's a combination of both users. You need tremendous rollback segments to deal with the huge volume of oltp transactions that can occur during a single DSS query. And the DSS query performance is terrible as it has to check the rollback segments for a large portion of the blocks it reads.

> Oracle seems to make the assumption that data changes,

Arguably it's the other way around. Postgres's approach wins whenever most of the tuples in a table have been updated, in that case it just has to scan the whole table ignoring old records not visible to the transaction. Oracle has to consult the rollback segment for any recently updated tuple. Oracle's wins in the case where most of the tuples haven't changed so it can just scan the table without consulting lots of rollback segments.

--
greg
I ran into the same problem with VACUUM on my Linux box. If you are running Linux, take a look at "elvtune" or read this post:

http://groups.google.com/groups?q=stephen+vacuum+linux&hl=en&lr=&ie=UTF-8&selm=gRdjb.7484%241o2.77%40nntp-post.primus.ca&rnum=3

Regards, Stephen

"Rob Nagler" <nagler@bivio.biz> wrote in message news:16272.30527.120343.547492@jump.bivio.com...
> Manfred Koizar writes:
> > ISTM you are VACUUMing too aggressively. You are reclaiming less than
> > 1% and 0.005%, respectively, of tuples. I would increase FSM settings
> > to ca. 1000 fsm_relations, 100000 fsm_pages and VACUUM *less* often,
> > say every two hours or so.
>
> I did this. We'll see how it goes.
>
> > ... or configure autovacuum to VACUUM a table when it has 10% dead
> > tuples.
>
> This solution doesn't really fix the fact that VACUUM consumes the
> disk while it is running. I want to avoid the erratic performance on
> my web server when VACUUM is running.
>
> mfg,
> Rob
Stephen writes:
> I ran into the same problem with VACUUM on my Linux box. If you are running
> Linux, take a look at "elvtune" or read this post:

The default values were -r 64 -w 8192. The article said this was "optimal". I just futzed with different values anywhere from -w 128 -r 128 to -r 16 -w 8192. None of these mattered much when vacuum is running.

This is a RAID1 box with two disks. Even with vacuum and one other postmaster running, it's still got to get a lot of blocks through the I/O system.

Rob
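For anyone reproducing this: elvtune takes the read and write latency values plus the block device to tune. A hedged example of the kind of invocation Rob describes, with the device path purely illustrative:

  # one of the combinations tried above, applied to the RAID device
  elvtune -r 16 -w 8192 /dev/sda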
Mario Weilguni writes: > of course both approaches have advantages, it simply depends on the usage > pattern. A case where oracle really rules over postgresql are m<-->n > connection tables where each record consist of two foreign keys, the > overwrite approach is a big win here. That's usually our case. My company almost always has "groupware" problems to solve. Every record has a "realm" (security) foreign key and typically another key. The infrastructure puts the security key on queries to avoid returning the wrong realm's data. Rob
Vivek Khera writes:
> Also, how close are you to the capacity of your disk bandwidth? I
> don't see that in your numbers. I know in freebsd I can run "systat
> -vmstat" and it gives me a percentage of utilization that lets me know
> when I'm near the capacity.

The vacuum totally consumes the system. It's in a constant "D". As near as I can tell, it's hitting all blocks in the database.

The problem is interactive performance when vacuum is in a D state. Even with just two processes doing "stuff" (vacuum and a select, let's say), the select is very slow.

My understanding of the problem is that if a query hits the disk hard (and many of my queries do) and vacuum is hitting the disk hard, they contend for the same resource and nobody wins. The query optimizer has lots of problems with my queries and ends up doing silly sorts. As a simple example, one query goes like this:

    select avg(f1) from t1 group by f2;

This results in a plan like:

 Aggregate  (cost=171672.95..180304.41 rows=115086 width=32)
   ->  Group  (cost=171672.95..177427.26 rows=1150862 width=32)
         ->  Sort  (cost=171672.95..174550.10 rows=1150862 width=32)
               Sort Key: f2
               ->  Seq Scan on t1  (cost=0.00..39773.62 rows=1150862 width=32)

This is of course stupid, because it sorts 1M rows, which probably means it has to hit disk (sort_mem can only be so large). Turns out there are only about 20 different values of f2, so it would be much better to aggregate without sorting. This is the type of query which runs while vacuum runs and I'm sure the two are just plain incompatible. vacuum is read intensive and this query is write intensive.

Rob
Greg Stark writes:
> Note that pctfree/pctused are a big performance drain on the usual case. Try
> setting them to 0/100 on a table that doesn't get updates (like a many-many
> relation table) and see how much faster it is to insert and scan.

Right. You can optimize each table independently. The "usual" case doesn't exist in most databases, I've found, which is why Oracle does better.

> Judging by the number of FAQ lists out there that explain various quirks of
> rollback segment configuration I wouldn't say it's so easily configured.

Maybe we just got lucky. :-)

> The biggest problem is on systems where there's a combination of both users.

As is ours.

> You need tremendous rollback segments to deal with the huge volume of oltp
> transactions that can occur during a single DSS query. And the DSS query
> performance is terrible as it has to check the rollback segments for a large
> portion of the blocks it reads.

The DSS issues only come into play I think if the queries are long. This is our problem. Postgres does a bad job with DSS, I believe. I mentioned the select avg(f1) from t1 group by f2 in another message. If it were optimized for "standard" SQL, such as, avg, sum, etc., I think it would do a lot better with DSS-type problems. Our problem seems to be that the DSS queries almost always hit disk to sort.

> Arguably it's the other way around. Postgres's approach wins whenever most of
> the tuples in a table have been updated, in that case it just has to scan the
> whole table ignoring old records not visible to the transaction. Oracle has to
> consult the rollback segment for any recently updated tuple. Oracle's wins in
> the case where most of the tuples haven't changed so it can just scan the
> table without consulting lots of rollback segments.

I see what you're saying. I'm not a db expert, just a programmer trying to make his queries go faster, so I'll acknowledge that the design is theoretically better.

In practice, I'm still stuck. As a simple example, this query

    select avg(f1) from t1 group by f2

Takes 33 seconds (see explain analyze in another note in this thread) to run on idle hardware with about 1GB available in the cache. It's clearly hitting disk to do the sort. Being a dumb programmer, I changed the query to:

    select f1 from t1;

And wrote the rest in Perl. It takes 4 seconds to run. Why? The Perl doesn't sort to disk, it aggregates in memory. There are 18 rows returned.
What I didn't mention is that I originally had:

    select avg(f1), t2.name from t1, t2 where t2.f2 = t1.f2 group by t2.name;

Which is much worse:

 Aggregate  (cost=161046.30..162130.42 rows=8673 width=222) (actual time=72069.10..87455.69 rows=18 loops=1)
   ->  Group  (cost=161046.30..161479.95 rows=86729 width=222) (actual time=71066.38..78108.17 rows=963660 loops=1)
         ->  Sort  (cost=161046.30..161263.13 rows=86729 width=222) (actual time=71066.36..72445.74 rows=963660 loops=1)
               Sort Key: t2.name
               ->  Merge Join  (cost=148030.15..153932.66 rows=86729 width=222) (actual time=19850.52..27266.40 rows=963660 loops=1)
                     Merge Cond: ("outer".f2 = "inner".f2)
                     ->  Sort  (cost=148028.59..150437.74 rows=963660 width=58) (actual time=19850.18..21750.12 rows=963660 loops=1)
                           Sort Key: t1.f2
                           ->  Seq Scan on t1  (cost=0.00..32479.60 rows=963660 width=58) (actual time=0.06..3333.39 rows=963660 loops=1)
                     ->  Sort  (cost=1.56..1.60 rows=18 width=164) (actual time=0.30..737.59 rows=931007 loops=1)
                           Sort Key: t2.f2
                           ->  Seq Scan on t2  (cost=0.00..1.18 rows=18 width=164) (actual time=0.05..0.08 rows=18 loops=1)
 Total runtime: 87550.31 msec

Again, there are about 18 values of f2. The optimizer even knows this (it's a foreign key to t2.f2), but instead it does the query plan in exactly the wrong order. It hits disk probably 3 times as much as the simpler query judging by the amount of time this query takes (33 vs 88 secs). BTW, adding an index to t1.f2 has seriously negative effects on many other DSS queries.

I'm still not sure that the sort problem is our only problem when vacuum runs. It's tough to pin down. We'll be adding more memory to see if that helps with the disk contention.

Rob
Rob Nagler <nagler@bivio.biz> writes: > Mario Weilguni writes: > > of course both approaches have advantages, it simply depends on the usage > > pattern. A case where oracle really rules over postgresql are m<-->n > > connection tables where each record consist of two foreign keys, the > > overwrite approach is a big win here. I don't understand why you would expect overwriting to win here. What types of updates do you do on these tables? Normally I found using update on such a table was too awkward to contemplate so I just delete all the relation records that I'm replacing for the key I'm working with and insert new ones. This always works out to be cleaner code. In fact I usually leave such tables with no UPDATE grants on them. In that situation I would have actually expected Postgres to do as well as or better than Oracle since that makes them both functionally equivalent. -- greg
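A sketch of the delete-then-insert pattern Greg describes, with purely illustrative table and column names (t_link, a_id and b_id are not from the thread):

  BEGIN;
  -- drop every link row for the key being replaced ...
  DELETE FROM t_link WHERE a_id = 42;
  -- ... then insert the new set; under MVCC the deleted rows become dead
  -- tuples that a later VACUUM reclaims
  INSERT INTO t_link (a_id, b_id) VALUES (42, 7);
  INSERT INTO t_link (a_id, b_id) VALUES (42, 9);
  COMMIT;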
Rob Nagler <nagler@bivio.biz> writes:
> Greg Stark writes:
> > Note that pctfree/pctused are a big performance drain on the usual case. Try
> > setting them to 0/100 on a table that doesn't get updates (like a many-many
> > relation table) and see how much faster it is to insert and scan.
>
> Right. You can optimize each table independently. The "usual" case
> doesn't exist in most databases, I've found, which is why Oracle does
> better.

Sorry I was unclear. By "usual case" I meant reading, as opposed to updates. The size of the on-disk representation turns out to be a major determinant in a lot of database applications, since the dominant resource is i/o bandwidth. Try doing a fresh import of a large table with pctfree 0 pctused 100 and compare how long a select takes on it compared to the original table.

> In practice, I'm still stuck. As a simple example, this query
> select avg(f1) from t1 group by f2
>
> Takes 33 seconds (see explain analyze in another note in this thread)
> to run on idle hardware with about 1GB available in the cache. It's
> clearly hitting disk to do the sort. Being a dumb programmer, I
> changed the query to:

I didn't see the rest of the thread so forgive me if you've already seen these suggestions.

Firstly, that type of query will be faster in 7.4 due to implementing a new method for doing groups called hash aggregates.

Secondly you could try raising sort_mem. Postgres can't know how much memory it really has before it swaps, so there's a parameter to tell it. And swapping would be much worse than doing disk sorts.

You can raise sort_mem to tell it how much memory it's allowed to use before it goes to disk sorts. You can even use ALTER SESSION to raise it in a few DSS sessions but leave it low for the many OLTP sessions. If it's high in OLTP sessions then you could quickly hit swap when they all happen to decide to use the maximum amount at the same time. But then you don't want to be doing big sorts in OLTP sessions anyways.

Unfortunately there's no way to tell how much memory it thinks it's going to use. I used to use a script to monitor the pgsql_tmp directory in the database to watch for usage.

> select f1 from t1;
>
> And wrote the rest in Perl. It takes 4 seconds to run. Why? The Perl doesn't
> sort to disk, it aggregates in memory. There are 18 rows returned. What I
> didn't mention is that I originally had:

Oof. I expect if you convinced 7.3 to do the sort in memory by a suitable value of sort_mem it would be close, but still slower than perl. 7.4 should be very close since hash aggregates would be more or less equivalent to the perl method.
> select avg(f1), t2.name from t1, t2 where t2.f2 = t1.f2 group by t2.name;
>
> Which is much worse:
>
>  Aggregate  (cost=161046.30..162130.42 rows=8673 width=222) (actual time=72069.10..87455.69 rows=18 loops=1)
>    ->  Group  (cost=161046.30..161479.95 rows=86729 width=222) (actual time=71066.38..78108.17 rows=963660 loops=1)
>          ->  Sort  (cost=161046.30..161263.13 rows=86729 width=222) (actual time=71066.36..72445.74 rows=963660 loops=1)
>                Sort Key: t2.name
>                ->  Merge Join  (cost=148030.15..153932.66 rows=86729 width=222) (actual time=19850.52..27266.40 rows=963660 loops=1)
>                      Merge Cond: ("outer".f2 = "inner".f2)
>                      ->  Sort  (cost=148028.59..150437.74 rows=963660 width=58) (actual time=19850.18..21750.12 rows=963660 loops=1)
>                            Sort Key: t1.f2
>                            ->  Seq Scan on t1  (cost=0.00..32479.60 rows=963660 width=58) (actual time=0.06..3333.39 rows=963660 loops=1)
>                      ->  Sort  (cost=1.56..1.60 rows=18 width=164) (actual time=0.30..737.59 rows=931007 loops=1)
>                            Sort Key: t2.f2
>                            ->  Seq Scan on t2  (cost=0.00..1.18 rows=18 width=164) (actual time=0.05..0.08 rows=18 loops=1)
>  Total runtime: 87550.31 msec
>
> Again, there are about 18 values of f2. The optimizer even knows this
> (it's a foreign key to t2.f2), but instead it does the query plan in
> exactly the wrong order. It hits disk probably 3 times as much as the
> simpler query judging by the amount of time this query takes (33 vs 88
> secs). BTW, adding an index to t1.f2 has seriously negative effects
> on many other DSS queries.

Well, first of all it doesn't really because you said to group by t2.name not f1. You might expect it to at least optimize something like this:

    select avg(f1),t2.name from t1 join t2 using (f2) group by f2

but even then I don't think it actually is capable of using foreign keys as a hint like that. I don't think Oracle does either actually, but I'm not sure.

To convince it to do the right thing you would have to do either:

    SELECT a, t2.name
    FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1
    JOIN t2 USING (f2)

Or use a subquery:

    SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
    FROM t1
    GROUP BY f2

Oh, incidentally, my use of the "JOIN" syntax is a personal preference. Ideally it would produce identical plans but unfortunately that's not always true yet, though 7.4 is closer. I think in the suggestion above it actually would.

--
greg
Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctused 100 and
> compare how long a select takes on it compared to the original table.

BTW, I greatly appreciate your support on this stuff. This list is a fantastic resource.

I think we agree. The question is what is the workload. On tables without updates, postgres will be fast enough. However, postgres is slow on tables with updates afaict. I think of OLTP as a system with updates. One can do DSS on an OLTP database with Oracle, at least it seems to work for one of our projects.

> Firstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.

We'll be trying it as soon as it is out.

> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.

It is at 8000. This is probably as high as I can go with multiple postmasters. The sort area is shared in Oracle (I think :-) in the UGA.

> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low for the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.

This is a web app. I can't control what the user wants to do. Sometimes they update data, and other times they simply look at it.

I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I set sort_mem in the conf file to 512000, restarted postgres. Reran the simpler query (no name) 3 times, and it was still 27 secs.

> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.

I don't have to. The queries that run slow are hitting disk. Anything that takes a minute has to be writing to disk.

> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:

I put f2 in the group by, and it doesn't matter. That's the point. It's the on-disk sort before the aggregate that's killing the query.

> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.

I'll be finding out this week.

> To convince it to do the right thing you would have to do either:
>
>     SELECT a, t2.name
>     FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1
>     JOIN t2 USING (f2)
>
> Or use a subquery:
>
>     SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
>     FROM t1
>     GROUP BY f2

This doesn't solve the problem. It's the GROUP BY that is doing the wrong thing. It's grouping, then aggregating.

Rob
Greg Stark writes:
> I don't understand why you would expect overwriting to win here.
> What types of updates do you do on these tables?

These are statistics that we're adjusting. I think that's pretty normal stuff. The DSS component is the avg() of these numbers on particular groups. The groups are related to foreign keys to customers and other things.

> Normally I found using update on such a table was too awkward to
> contemplate so I just delete all the relation records that I'm
> replacing for the key I'm working with and insert new ones. This
> always works out to be cleaner code. In fact I usually leave such
> tables with no UPDATE grants on them.

In accounting apps, we do this, too. It's awkward with all the relationships to update all the records in the right order. But Oracle wins on delete/insert, too, because it reuses the tuples it already has in memory, and it can reuse the same foreign key index pages, too, since the values are usually the same.

The difference between Oracle and postgres seems to be optimism. postgres assumes the transaction will fail and/or that a transaction will modify lots of data that is used by other queries going on in parallel. Oracle assumes that the transaction is going to be committed, and it might as well make the changes in place.

> In that situation I would have actually expected Postgres to do as well as or
> better than Oracle since that makes them both functionally
> equivalent.

I'll find out soon enough. :-)

Rob
Rob Nagler <nagler@bivio.biz> writes:
> I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
> set sort_mem in the conf file to 512000, restarted postgres. Reran
> the simpler query (no name) 3 times, and it was still 27 secs.

Sorry, I don't know how that bubbled up from the depths of my Oracle memory. In postgres it's just "SET":

    db=> set sort_mem = 512000;
    SET

> > To convince it to do the right thing you would have to do either:
> >
> >     SELECT a, t2.name
> >     FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1
> >     JOIN t2 USING (f2)
> >
> > Or use a subquery:
> >
> >     SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> >     FROM t1
> >     GROUP BY f2
>
> This doesn't solve the problem. It's the GROUP BY that is doing the
> wrong thing. It's grouping, then aggregating.

But at least in the form above it will consider using an index on f2, and it will consider using indexes on t1 and t2 to do the join.

It's unlikely to go ahead and use the indexes though because normally sorting is faster than using the index when scanning the whole table. You should compare the "explain analyze" results for the original query and these two. And check the results with "set enable_seqscan = off" as well.

I suspect you'll find your original query uses sequential scans even when they're disabled because it has no alternative. With the two above it can use indexes but I suspect you'll find they actually take longer than the sequential scan and sort -- especially if you have sort_mem set large enough.

--
greg
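Spelled out, the comparison Greg is suggesting looks something like this, using the placeholder table and column names already in the thread (per-session settings only):

  -- give this session a large in-memory sort area (sort_mem is in KB)
  SET sort_mem = 512000;

  -- baseline plan and timings for the original query
  EXPLAIN ANALYZE SELECT avg(f1) FROM t1 GROUP BY f2;

  -- see whether an index plan is even considered, and what it costs
  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT avg(f1) FROM t1 GROUP BY f2;

  -- put the planner back to normal for the rest of the session
  SET enable_seqscan = on;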
Greg Stark writes:
> > > SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> > > FROM t1
> > > GROUP BY f2
> >
> > This doesn't solve the problem. It's the GROUP BY that is doing the
> > wrong thing. It's grouping, then aggregating.
>
> But at least in the form above it will consider using an index on f2, and it
> will consider using indexes on t1 and t2 to do the join.

There are 20 rows in t2, so an index actually slows down the join. I had to drop the index on t1.f2, because it was trying to use it instead of simply sorting 20 rows.

I've got preliminary results for a number of "hard" queries between oracle and postgres (seconds):

     PG   ORA
      0     5   q1
      1     0   q2
      0     5   q3
      2     1   q4
    219     7   q5
    217     5   q6
     79     2   q7
     31     1   q8

These are averages of 10 runs of each query. I didn't optimize pctfree, etc., but I did run analyze after the oracle import.

One of the reasons postgres is faster on q1-q4 is that postgres supports OFFSET/LIMIT, and oracle doesn't. q7 and q8 are the queries that I've referred to recently (avg of group by).

q5 and q6 are too complex to discuss here, but the fundamental issue is the order in which postgres decides to do things. The choice for me is clear: the developer time trying to figure out how to make the planner do the "obviously right thing" has been too high with postgres. These tests demonstrate to me that for even complex queries, oracle wins for our problem.

It looks like we'll be migrating to oracle for this project from these preliminary results. It's not just the planner problems. The customer is more familiar with oracle, and the vacuum performance is another problem.

Rob
Rob Nagler <nagler@bivio.biz> writes:
> q5 and q6 are too complex to discuss here,

How do you expect us to get better if you don't show us the problems?

BTW, have you tried any of this with a 7.4beta release? Another project that I'm aware of saw several bottlenecks in their Oracle-centric code go away when they tested 7.4 instead of 7.3. For instance, there is hash aggregation capability, which would probably solve the aggregate query problem you were complaining about in
http://archives.postgresql.org/pgsql-performance/2003-10/msg00640.php

			regards, tom lane
Rob,

> q5 and q6 are too complex to discuss here, but the fundamental issue
> is the order in which postgres decides to do things. The choice for
> me is clear: the developer time trying to figure out how to make the
> planner do the "obviously right thing" has been too high with
> postgres. These tests demonstrate to me that for even complex queries,
> oracle wins for our problem.
>
> It looks like we'll be migrating to oracle for this project from these
> preliminary results. It's not just the planner problems. The
> customer is more familiar with oracle, and the vacuum performance is
> another problem.

Hey, we can't win 'em all. If we could, Larry would be circulating his résumé.

I hope that you'll stay current with PostgreSQL developments so that you can do a similarly thorough evaluation for your next project.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Rob Nagler <nagler@bivio.biz> writes:
> One of the reasons postgres is faster on q1-q4 is that postgres
> supports OFFSET/LIMIT, and oracle doesn't. q7 and q8 are the queries
> that I've referred to recently (avg of group by).

Well the way to do offset/limit in Oracle is:

    SELECT *
      FROM (
        SELECT ... , rownum AS n
         WHERE rownum <= OFFSET+LIMIT
      )
     WHERE n > OFFSET

That's basically the same thing Postgres does anyways. It actually has to do the complete query and fetch and discard the records up to the OFFSET and then stop when it hits the LIMIT.

> q5 and q6 are too complex to discuss here, but the fundamental issue
> is the order in which postgres decides to do things.

That's true for pretty much 99% of all query optimization whether it's on Postgres or Oracle. I'm rather curious to see the query and explain analyze output from q5 and q6.

--
greg
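For comparison, the Postgres form of the same pagination is just the clause Rob mentioned; a minimal example with placeholder names:

  SELECT *
    FROM t1
   ORDER BY f2
   LIMIT 20 OFFSET 40;

As Greg notes, the executor still produces and discards the rows before the OFFSET, so a large OFFSET is not free in either database.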
On Wed, 29 Oct 2003, Rob Nagler wrote: > Greg Stark writes: > > > > SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2) > > > > FROM t1 > > > > GROUP BY f2 > > > > > > This doesn't solve the problem. It's the GROUP BY that is doing the > > > wrong thing. It's grouping, then aggregating. > > > > But at least in the form above it will consider using an index on f2, and it > > will consider using indexes on t1 and t2 to do the join. > > There are 20 rows in t2, so an index actually slows down the join. > I had to drop the index on t1.f2, because it was trying to use it > instead of simply sorting 20 rows. t2 was 'vacuum full'ed and analyzed, right? Just guessing.
Josh Berkus writes:
> I hope that you'll stay current with PostgreSQL developments so that you can
> do a similarly thorough evaluation for your next project.

Oh, no worries. This project just happens to be a tough one. We're heavily invested in Postgres. Other projects we maintain that use Postgres are zoescore.com, colosla.org, and paintedsnapshot.com. I am currently working on a very large project where the customer is very committed to Postgres/open source. We're in discussions about what to do about the scalability problems we saw in the other project.

You can help by addressing a dilemma we (my new customer and I) see. I apologize for the length of what follows, but I'm trying to be as clear as possible about our situation.

I have had a lot of push back from the core Postgres folks on the idea of planner hints, which would go a long way to solve the performance problems we are seeing. I presented an alternative approach: have a "style sheet" (Scribe, LaTeX) type of solution in the postmaster, which can be customized by end users. That got no response so I assume it wasn't in line with the "Postgres way" (more below).

The vacuum problem is very serious for the problematic database to the point that one of my customer's customers said:

    However, I am having a hard time understanding why the system is so
    slow... from my perspective it seems like you have some fundamental
    database issues that need to be addressed.

This is simply unacceptable, and that's why we're moving to Oracle. It's very bad for my business reputation.

I don't have a ready solution to vacuuming, and none on the list have been effective. We'll be adding more memory, but it seems to be disk bandwidth problem. I run Oracle on much slower system, and I've never noticed problems of this kind, even when a database-wide validation is running. When vacuum is running, it's going through the entire database, and that pretty much trashes all other queries, especially DSS queries. As always it is just software, and there's got to be 80/20 solution.

Our new project is large, high-profile, but not as data intensive as the problematic one. We are willing to commit significant funding and effort to make Postgres faster. We are "business value" driven. That means we solve problems practically instead of theoretically. This seems to be in conflict with "the Postgres way", which seems to be more theoretical. Our business situation comes ahead of theories.

My customer (who monitors this list) and I believe that our changes would not be accepted back into the Postgres main branch. That presents us with a difficult situation, because we don't want to own a separate branch. (Xemacs helped push emacs, and maybe that's what has to happen here, yet it's not a pretty situation.)

We'll be meeting next week to discuss the situation, and how we'll go forward. We have budget in 2003 to spend on this, but only if the situation can be resolved. Otherwise, we'll have to respect the data we are seeing, and think about our choice of technologies.

Thanks for the feedback.

Rob
scott.marlowe writes: > t2 was 'vacuum full'ed and analyzed, right? Just guessing. Fresh import. I've been told this includes a ANALYZE. Rob
Rob Nagler <nagler@bivio.biz> writes:
> When vacuum is running, it's going through the entire
> database, and that pretty much trashes all other queries, especially
> DSS queries. As always it is just software, and there's got to be
> 80/20 solution.

One thing that's been discussed but not yet tried is putting a tunable delay into VACUUM's per-page loop (ie, sleep N milliseconds after each heap page is processed, and probably each index page too). This might be useless or it might be the 80/20 solution you want. Want to try it and report back?

			regards, tom lane
On Thu, 30 Oct 2003, Rob Nagler wrote:
> The vacuum problem is very serious for the problematic database to the
> point that one of my customer's customers said:
>
>     However, I am having a hard time understanding why the system is so
>     slow... from my perspective it seems like you have some fundamental
>     database issues that need to be addressed.
>
> This is simply unacceptable, and that's why we're moving to Oracle.
> It's very bad for my business reputation.
>
> I don't have a ready solution to vacuuming, and none on the list have
> been effective. We'll be adding more memory, but it seems to be disk
> bandwidth problem. I run Oracle on much slower system, and I've never
> noticed problems of this kind, even when a database-wide validation is
> running. When vacuum is running, it's going through the entire
> database, and that pretty much trashes all other queries, especially
> DSS queries. As always it is just software, and there's got to be
> 80/20 solution.

Have you looked at the autovacuum daemon? Was it found wanting or what? I've had good luck with it so far, so I was just wondering if it might work for your needs as well. It's quite intelligent about which tables et al. it vacuums.
On Thu, 30 Oct 2003, Rob Nagler wrote: > scott.marlowe writes: > > t2 was 'vacuum full'ed and analyzed, right? Just guessing. > > Fresh import. I've been told this includes a ANALYZE. You should probably run analyze by hand just to be sure. If the planner is using an index scan on a table with 20 rows, then it's likely it has the default statistics for the table, not real ones.
Rob,

> I have had a lot of push back from the core Postgres folks on the idea of
> planner hints, which would go a long way to solve the performance
> problems we are seeing.

I can tell you that the general reaction that you'll get is "let's fix the problems with the planner instead of giving the user a workaround." Not that that helps people running on older versions, but it stems from a "let's heal the illness, not the symptoms" attitude that is one of our project's strengths.

> I presented an alternative approach: have a
> "style sheet" (Scribe, LaTeX) type of solution in the postmaster,
> which can be customized by end users. That got no response so I
> assume it wasn't in line with the "Postgres way" (more below).

Or you just posted it on a bad week. I don't remember your post; how about we try it out on Hackers again and we'll argue it out?

> running. When vacuum is running, it's going through the entire
> database, and that pretty much trashes all other queries, especially
> DSS queries. As always it is just software, and there's got to be
> 80/20 solution.

See Tom's post.

> Our new project is large, high-profile, but not as data intensive as
> the problematic one. We are willing to commit significant funding and
> effort to make Postgres faster. We are "business value" driven. That
> means we solve problems practically instead of theoretically. This
> seems to be in conflict with "the Postgres way", which seems to be
> more theoretical. Our business situation comes ahead of theories.

As always, it's a matter of balance. Our "theoretical purity" has given PostgreSQL a reliability and recoverability level only otherwise obtainable from Oracle for six figures. And has allowed us to build an extensibility system that lets users define their own datatypes, operators, aggregates, etc., in a way that is not possible on *any* other database. This is what you're up against when you suggest changes to some of the core components ... people don't want to break what's not broken unless there are substantial, proven gains to be made.

> My customer (who monitors this list) and I believe that our changes
> would not be accepted back into the Postgres main branch.

If you haven't posted, you don't know. A *lot* of suggestions get rejected because the suggestor wants Tom, Bruce, Peter, Joe and Jan to do the actual work or aren't willing to follow-through with testing and maintenance. As I said before, *I* don't remember earlier posts from you offering patches; perhaps it's time to try again?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
> > Our new project is large, high-profile, but not as data intensive as
> > the problematic one. We are willing to commit significant funding and
> > effort to make Postgres faster. We are "business value" driven. That
> > means we solve problems practically instead of theoretically. This
> > seems to be in conflict with "the Postgres way", which seems to be
> > more theoretical. Our business situation comes ahead of theories.
>
> As always, it's a matter of balance. Our "theoretical purity" has given
> PostgreSQL a reliability and recoverability level only otherwise obtainable
> from Oracle for six figures. And has allowed us to build an extensibility
> system that lets users define their own datatypes, operators, aggregates,
> etc., in a way that is not possible on *any* other database. This is what
> you're up against when you suggest changes to some of the core components ...
> people don't want to break what's not broken unless there are substantial,
> proven gains to be made.

Let me add a little historical perspective here --- the PostgreSQL code base is almost 20 years old, and code size has doubled in the past 7 years. We are into PostgreSQL for the long haul --- that means we want code that will be working and maintainable 7 years from now. If your solution doesn't fit that case, well, you might be right, it might get rejected. However, we find that it is worth the time and effort to make our code sustainable, and it is possible your solution could be set up to do that.

However, it requires you to go beyond your "business situation" logic and devote time to contribute something that will make PostgreSQL better 5 years in the future, as well as the next release. We have found very few companies that are not willing to work within that long-term perspective.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Tom Lane writes:
> Rob Nagler <nagler@bivio.biz> writes:
> > q5 and q6 are too complex to discuss here,
>
> How do you expect us to get better if you don't show us the problems?

With all due respect and thanks for the massive amount of help, I have presented the problems. q5 and q6 are a subset of the following general problems:

* Multiple ORDER BY results in no index used.
  Solution: drop multiple ORDER BY, only use first

* Vacuum locks out interactive users
  Solution: don't run vacuum full and only run vacuum at night

* Low cardinality index on large table confuses planner
  Solution: Drop (foreign key) index, which hurts other performance

* Grouped aggregates result in disk sort
  Solution: Wait to 7.4 (may work), or write in Perl (works today)

* Extreme non-linear performance (crossing magic number in optimizer drops performance three orders of magnitude)
  Solution: Don't cross magic number, or code in Perl

The general problem is that our system generates 90% of the SQL we need. There are many advantages to this, such as being able to add OFFSET/LIMIT support with a few lines of code in a matter of hours. Every time we have to custom code a query, or worse, code it in Perl, we lose many benefits. I understand the need to optimize queries, but my general experience with Oracle is that I don't have to do this very often. When the 80/20 rule inverts, there's something fundamentally wrong with the model. That's where we feel we're at. It's cost us a tremendous amount of money to deal with these query optimizations.

The solution is not to fix the queries, but to address the root causes. That's what my other note in this thread is about. I hope you understand the spirit of my suggestion, and work with us to find an acceptable approach to the general problems.

> BTW, have you tried any of this with a 7.4beta release?

I will, but for my other projects, not this one. I'll run this data, because it's a great test case. We have a business decision to make: devote more time to Postgres or go with Oracle. I spent less than a day getting the data into Oracle and to create the benchmark. The payoff is clear, now. The risk of 7.4 is still very high, because the vacuum problem still looms and a simple "past performance is a good indicator of future performance". Going forward, there's no choice. We've had to limit end-user functionality to get Postgres working as well as it does, and that's way below where Oracle is without those same limits and without any effort put into tuning.

Thanks again for all your support.

Rob
> Fresh import. I've been told this includes a ANALYZE. Uh - no it doesn't. Chris