Thread: Simple join optimized badly?
I have two tables, SAMPLE and HITLIST that when joined, generate a monsterous sort. HITLIST_ROWS has about 48,000 rows SAMPLE has about 16 million rows The joined column is indexed in SAMPLE HITLIST_ROWS is a scratch table which is used a few times then discarded. HITLIST_ROWS has no indexes at all There are two plans below. The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like it's sortingthe 16 million rows of the SEARCH table. Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent. First question: HITLIST_ROWS so small, I don't understand why the lack of ANALYZE should cause SAMPLE's contents to be sorted. Second question: Even though ANALYZE brings it down from 26 minutes to 47 seconds, a huge improvement, it still seems slowto me. Its going at roughly 1 row per millisecond -- are my expectations too high? This is a small-ish Dell computer(Xeon), 4 GB memory, with a four-disk SATA software RAID0 (bandwidth limited to about 130 MB/sec due to PCI cards). Other joins of a similar size seem much faster. It looks like I'll need to do an ANALYZE every time I modify HITLIST_ROWS, which seems like a waste because HITLIST_ROWSis rarely used more than once or twice before being truncated and rebuilt with new content. (HITLIST_ROWS can'tbe an actual temporary table, though, because it's a web application and each access is from a new connection.) This is Postgres 8.0.3. (We're upgrading soon.) Thanks, Craig explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=4782.35..1063809.82 rows=613226 width=4) (actual time=174.212..1593886.582 rows=176294 loops=1) Merge Cond: ("outer".version_id = "inner".objectid) -> Index Scan using i_sample_version_id on sample t (cost=0.00..1008713.68 rows=16446157 width=8) (actual time=0.111..1571911.208rows=16446157 loops=1) -> Sort (cost=4782.35..4910.39 rows=51216 width=4) (actual time=173.669..389.496 rows=176329 loops=1) Sort Key: ph.objectid -> Seq Scan on hitlist_rows_378593 ph (cost=0.00..776.16 rows=51216 width=4) (actual time=0.015..90.059 rows=48834loops=1) Total runtime: 1594093.725 ms (7 rows) chmoogle2=> analyze HITLIST_ROWS; ANALYZE chmoogle2=> explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=874.43..457976.83 rows=584705 width=4) (actual time=302.792..47796.719 rows=176294 loops=1) Hash Cond: ("outer".version_id = "inner".objectid) -> Seq Scan on sample t (cost=0.00..369024.57 rows=16446157 width=8) (actual time=46.344..26752.343 rows=16446157 loops=1) -> Hash (cost=752.34..752.34 rows=48834 width=4) (actual time=149.548..149.548 rows=0 loops=1) -> Seq Scan on hitlist_rows_378593 ph (cost=0.00..752.34 rows=48834 width=4) (actual time=0.048..80.721 rows=48834loops=1) Total runtime: 47988.572 ms (6 rows)
"Craig A. James" <cjames@modgraph-usa.com> writes: > There are two plans below. The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like it'ssorting the 16 million rows of the SEARCH table. Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent. It would be interesting to look at the before-ANALYZE cost estimate for the hash join, which you could get by setting enable_mergejoin off (you might have to turn off enable_nestloop too). I recall though that there's a fudge factor in costsize.c that penalizes hashing on a column that no statistics are available for. The reason for this is the possibility that the column has only a small number of distinct values, which would make a hash join very inefficient (in the worst case all the values might end up in the same hash bucket, making it no better than a nestloop). Once you've done ANALYZE it plugs in a real estimate instead, and evidently the cost estimate drops enough to make hashjoin the winner. You might be able to persuade it to use a hashjoin anyway by increasing work_mem enough, but on the whole my advice is to do the ANALYZE after you load up the temp table. The planner really can't be expected to be very intelligent when it has no stats. regards, tom lane
Wouldn't PG supporting simple optmizer hints get around this kinda problem? Seems to me that at least one customer posting per week would be solved via the use of simple hints. If the community is interested... EnterpriseDB has added support for a few different simple types of hints (optimize for speed, optimize for first rows, use particular indexes) for our upcoming 8.2 version. We are glad to submit them into the community process if there is any chance they will eventually be accepted for 8.3. I don't think there is an ANSI standrd for hints, but, that doesn't mean they are not occosaionally extrenmely useful. All hints are effectively harmless/helpful suggestions, the planner is free to ignore them if they are not feasible. --Denis Lussier Founder http://www.enterprisedb.com On 10/7/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Craig A. James" <cjames@modgraph-usa.com> writes: > > There are two plans below. The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like it'ssorting the 16 million rows of the SEARCH table. Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent. > > It would be interesting to look at the before-ANALYZE cost estimate for > the hash join, which you could get by setting enable_mergejoin off (you > might have to turn off enable_nestloop too). I recall though that > there's a fudge factor in costsize.c that penalizes hashing on a column > that no statistics are available for. The reason for this is the > possibility that the column has only a small number of distinct values, > which would make a hash join very inefficient (in the worst case all > the values might end up in the same hash bucket, making it no better > than a nestloop). Once you've done ANALYZE it plugs in a real estimate > instead, and evidently the cost estimate drops enough to make hashjoin > the winner. > > You might be able to persuade it to use a hashjoin anyway by increasing > work_mem enough, but on the whole my advice is to do the ANALYZE after > you load up the temp table. The planner really can't be expected to be > very intelligent when it has no stats. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster >
On Oct 7, 2006, at 8:50 PM, Denis Lussier wrote: > Wouldn't PG supporting simple optmizer hints get around this kinda > problem? Seems to me that at least one customer posting per week > would be solved via the use of simple hints. > > If the community is interested... EnterpriseDB has added support for > a few different simple types of hints (optimize for speed, optimize > for first rows, use particular indexes) for our upcoming 8.2 version. > We are glad to submit them into the community process if there is any > chance they will eventually be accepted for 8.3. +1 (and I'd be voting that way regardless of where my paycheck comes from) While it's important that we continue to improve the planner, it's simply not possible to build one that's smart enough to handle every single situation. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Denis, > Wouldn't PG supporting simple optmizer hints get around this kinda > problem? Seems to me that at least one customer posting per week > would be solved via the use of simple hints. ... and add 100 other problems. Hints are used because the DBA thinks that they are smarter than the optimizer; 99% of the time, they are wrong. Just try manually optimizing a complex query, you'll see -- with three join types, several scan types, aggregates, bitmaps, internal and external sorts, and the ability to collapse subqueries it's significantly more than a human can figure out accurately. Given the availability of hints, the newbie DBA will attempt to use them instead of fixing any of the underlying issues. Craig's post is a classic example of that: what he really needs to do is ANALYZE HITLIST_ROWS after populating it. If he had the option of hints, and was shortsighted (I'm not assuming that Craig is shortsighted, but just for the sake of argument) he'd fix this with a hint and move on ... and then add another hint when he adds a another query which needs HITLIST_ROWS, and another. And then he'll find out that some change in his data (the sample table growing, for example) makes his hints obsolete and he has to go back and re-tune them all. And then ... it comes time to upgrade PostgreSQL. The hints which worked well in version 8.0 won't necessarily work well in 8.2. In fact, many of them may make queries disastrously slow. Ask any Oracle DBA, they'll tell you that upgrading hint is a major PITA, and why Oracle is getting away from Hints and has eliminated the rules-based optimizer. Now, if you were offering us a patch to auto-populate the statistics as a table is loaded, I'd be all for that. But I, personally, would need a lot of convincing to believe that hints don't do more harm than good. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
> ... and add 100 other problems. Hints are used because the DBA thinks that > they are smarter than the optimizer; 99% of the time, they are wrong. > Just try manually optimizing a complex query, you'll see -- with three > join types, several scan types, aggregates, bitmaps, internal and external > sorts, and the ability to collapse subqueries it's significantly more than > a human can figure out accurately. Sorry, this is just wrong, wrong, wrong. I've heard this from several PG developers every time hints have come up in my roughly eighteen months as a PG applicationdeveloper. And in between every assertion that "the application programmers aren't as smart as the optimizer",there are a dozen or two examples where posters to this list are told to increase this setting, decrease thatone, adjust these other two, and the end result is to get the plan that the application programmer -- AND the PG professionals-- knew was the right plan to start with. People are smarter than computers. Period. Now I'll agree that the majority, perhaps the great majority, of questions to this group should NOT be solved with hints. You're absolutely right that in most cases hints are a really bad idea. People will resort to hints when they shouldbe learning better ways to craft SQL, and when they should have read the configuration guides. But that doesn't alter the fact that many, perhaps most, complicated application will, sooner or later, run into a showstoppercase where PG just optimizes wrong, and there's not a damned thing the app programmer can do about it. My example, discussed previously in this forum, is a classic. I have a VERY expensive function (it's in the class of NP-completeproblems, so there is no faster way to do it). There is no circumstance when my function should be used as afilter, and no circumstance when it should be done before a join. But PG has no way of knowing the cost of a function,and so the optimizer assigns the same cost to every function. Big disaster. The result? I can't use my function in any WHERE clause that involves any other conditions or joins. Only by itself. PGwill occasionally decide to use my function as a filter instead of doing the join or the other WHERE conditions first,and I'm dead. The interesting thing is that PG works pretty well for me on big tables -- it does the join first, then applies my expensivefunctions. But with a SMALL (like 50K rows) table, it applies my function first, then does the join. A searchthat completes in 1 second on a 5,000,000 row database can take a minute or more on a 50,000 row database. Instead, I have to separate the WHERE terms into two SQL statements, and do the join myself. I do the first half of my query,suck it all into memory, do the second half, suck it into memory, build a hash table and join the two lists in memory,then take the joined results and apply my function to it. This is not how a relational database should work. It shouldn't fall over dead just when a table's size SHRINKS beyond somethreshold that causes the planner to switch to a poor plan. Since these tables are all in the same database, adjusting configuration parameters doesn't help me. And I suppose I coulduse SET to disable various plans, but how is that any different from a HINT feature? Now you might argue that function-cost needs to be added to the optimizer's arsenal of tricks. And I'd agree with you: ThatWOULD be a better solution than hints. But I need my problem solved TODAY, not next year. Hints can help solve problemsNOW that can be brought to the PG team's attention later, and in the mean time let me get my application to work. Sorry if I seem particularly hot under the collar on this one. I think you PG designers have created a wonderful product. It's not the lack of hints that bothers me, it's the "You app developers are dumber than we are" attitude. We'renot. Some of us know what we're doing, and we need hints. If it is just a matter of resources, that's fine. I understand that these things take time. But please don't keep dismissingthe repeated and serious requests for this feature. It's important. Thanks for listening. Craig
Josh Berkus <josh@agliodbs.com> writes: > Now, if you were offering us a patch to auto-populate the statistics as a > table is loaded, I'd be all for that. Curiously enough, I was just thinking about that after reading Craig's post. autovacuum will do this, sort of, if it's turned on --- but its reaction time is measured in minutes typically so that may not be good enough. Another thing we've been beat up about in the past is that loading a pg_dump script doesn't ANALYZE the data afterward... regards, tom lane
Jonah H. Harris wrote: > On Oct 08, 2006 07:05 PM, Josh Berkus <josh@agliodbs.com> wrote: > > Hints are used because the DBA thinks that they are smarter than > > the optimizer; 99% of the time, they are wrong. > > That's a figure which I'm 100% sure cannot be backed up by fact. > > > Just try manually optimizing a complex query, you'll see -- with three > > join types, several scan types, aggregates, bitmaps, [blah blah blah] > > it's significantly more than a human can figure out accurately. > > Let me get this right... the optimizer is written by humans who know and > can calculate the proper query plan and generate code to do the same; > yet humans aren't smart enough to optimize the queries themselves? A bit > of circular reasoning here? I can do 100! on my computer, but can't do it in my head. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Craig A. James wrote: > > > My example, discussed previously in this forum, is a classic. I have a > VERY expensive function (it's in the class of NP-complete problems, so > there is no faster way to do it). There is no circumstance when my > function should be used as a filter, and no circumstance when it should > be done before a join. But PG has no way of knowing the cost of a > function, and so the optimizer assigns the same cost to every function. > Big disaster. > > The result? I can't use my function in any WHERE clause that involves > any other conditions or joins. Only by itself. PG will occasionally > decide to use my function as a filter instead of doing the join or the > other WHERE conditions first, and I'm dead. > this is an argument for cost-for-functions rather than hints AFAICS. It seems to me that if (in addition to the function cost) we come up with some efficient way of recording cross column statistics we would be well on the way to silencing *most* of the demands for hints. We would still be left with some of the really difficult problems - a metric for "locally correlated" column distributions and a reliable statistical algorithm for most common value sampling (or a different way of approaching this). These sound like interesting computer science or mathematics thesis topics, maybe we could try (again?) to get some interest at that level? Cheers Mark
Bruce Momjian wrote: > I can do 100! on my computer, but can't do it in my head. A poor example. 100! is a simple repetative calculation, something computers are very good at. Optimizing an SQL queryis very difficult, and a completely different class of problem. The fact is the PG team has done a remarkable job with the optimizer so far. I'm usually very happy with its plans. Buthumans still beat computers at many tasks, and there are unquestionably areas where the PG optimizer is not yet fullydeveloped. When the optimizer reaches its limits, and you have to get your web site running, a HINT can be invaluable. I said something in a previous version of this topic, which I'll repeat here. The PG documentation for HINTs should be FILLEDwith STRONG ADMONITIONS to post the problematic queries here before resorting to hints. There will always be fools who abuse hints. Too bad for them, but don't make the rest of us suffer for their folly. Craig
Mark Kirkwood wrote: >> The result? I can't use my function in any WHERE clause that involves >> any other conditions or joins. Only by itself. PG will occasionally >> decide to use my function as a filter instead of doing the join or the >> other WHERE conditions first, and I'm dead. > > this is an argument for cost-for-functions rather than hints AFAICS. Perhaps you scanned past what I wrote a couple paragraphs farther down. I'm going to repeat it because it's the KEY POINTI'm trying to make: Craig James wrote: > Now you might argue that function-cost needs to be added to the > optimizer's arsenal of tricks. And I'd agree with you: That WOULD be a > better solution than hints. But I need my problem solved TODAY, not > next year. Hints can help solve problems NOW that can be brought to the > PG team's attention later, and in the mean time let me get my > application to work. Craig
tgl@sss.pgh.pa.us (Tom Lane) writes: > Another thing we've been beat up about in the past is that loading a > pg_dump script doesn't ANALYZE the data afterward... Do I misrecall, or were there not plans (circa 7.4...) to for pg_dump to have an option to do an ANALYZE at the end? I seem to remember some dispute as to whether the default should be to include the ANALYZE, with an option to suppress it, or the opposite... -- (reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc")) http://www3.sympatico.ca/cbbrowne/wp.html "You can measure a programmer's perspective by noting his attitude on the continuing vitality of FORTRAN." -- Alan J. Perlis
Craig A. James wrote: > > Perhaps you scanned past what I wrote a couple paragraphs farther down. > I'm going to repeat it because it's the KEY POINT I'm trying to make: > > Craig James wrote: >> Now you might argue that function-cost needs to be added to the >> optimizer's arsenal of tricks. And I'd agree with you: That WOULD be >> a better solution than hints. But I need my problem solved TODAY, not >> next year. Hints can help solve problems NOW that can be brought to >> the PG team's attention later, and in the mean time let me get my >> application to work. True enough - but (aside from the fact that hints might take just as long to get into the development tree as cost-for-functions might take to write and put in...) there is a nasty side effect to adding hints - most of the raw material for optimizer improvement disappears (and hence optimizer improvement stalls)- why? simply that everyone then hints everything - welcome to the mess that Oracle are in (and seem to be trying to get out of recently)! I understand that it is frustrating to not have the feature you need now - but you could perhaps view it as a necessary part of the community development process - your need is the driver for optimizer improvement, and it can take time. Now ISTM that hints "solve" the problem by removing the need any further optimizer improvement at all - by making *you* the optimizer. This is bad for those of us in the DSS world, where most ad-hoc tools do not provide the ability to add hints. Cheers Mark
Mark Kirkwood <markir@paradise.net.nz> writes: > True enough - but (aside from the fact that hints might take just as > long to get into the development tree as cost-for-functions might take > to write and put in...) there is a nasty side effect to adding hints - > most of the raw material for optimizer improvement disappears (and hence > optimizer improvement stalls)- why? simply that everyone then hints > everything - welcome to the mess that Oracle are in (and seem to be > trying to get out of recently)! And *that* is exactly the key point here. Sure, if we had unlimited manpower we could afford to throw some at developing a hint language that would be usable and not too likely to break at every PG revision. But we do not have unlimited manpower. My opinion is that spending our development effort on hints will have a poor yield on investment compared to spending similar effort on making the planner smarter. Josh's post points out some reasons why it's not that easy to get long-term benefits from hints --- you could possibly address some of those problems, but a hint language that responds to those criticisms won't be trivial to design, implement, or maintain. See (many) past discussions for reasons why not. regards, tom lane
Tom, > Josh's post points out some reasons why it's not that easy to get > long-term benefits from hints --- you could possibly address some of > those problems, but a hint language that responds to those criticisms > won't be trivial to design, implement, or maintain. See (many) past > discussions for reasons why not. Well, why don't we see what EDB can come up with? If it's not "good enough" we'll just reject it. Unfortunately, EDB's solution is likely to be Oracle-based, which is liable to fall into the trap of "not good enough." -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Unfortunately, EDB's solution is likely to be Oracle-based, which is > liable to fall into the trap of "not good enough." I'd be a bit worried about Oracle patents as well... regards, tom lane
cjames@modgraph-usa.com ("Craig A. James") writes: > Mark Kirkwood wrote: >>> The result? I can't use my function in any WHERE clause that >>> involves any other conditions or joins. Only by itself. PG will >>> occasionally decide to use my function as a filter instead of doing >>> the join or the other WHERE conditions first, and I'm dead. >> this is an argument for cost-for-functions rather than hints AFAICS. > > Perhaps you scanned past what I wrote a couple paragraphs farther > down. I'm going to repeat it because it's the KEY POINT I'm trying > to make: > > Craig James wrote: >> Now you might argue that function-cost needs to be added to the >> optimizer's arsenal of tricks. And I'd agree with you: That WOULD >> be a better solution than hints. But I need my problem solved >> TODAY, not next year. Hints can help solve problems NOW that can be >> brought to the PG team's attention later, and in the mean time let >> me get my application to work. Unfortunately, that "hint language" also needs to mandate a temporal awareness of when hints were introduced so that it doesn't worsen things down the road. e.g. - Suppose you upgrade to 8.4, where the query optimizer becomes smart enough (perhaps combined with entirely new kinds of scan strategies) to make certain of your hints obsolete and/or downright wrong. Those hints (well, *some* of them) ought to be ignored, right? The trouble is that the "hint language" will be painfully large and complex. Its likely-nonstandard interaction with SQL will make query parsing worse. All we really have, at this point, is a vague desire for a "hint language," as opposed to any clear direction as to what it should look like, and how it needs to interact with other system components. That's not nearly enough; there needs to be a clear design. -- (format nil "~S@~S" "cbbrowne" "cbbrowne.com") http://cbbrowne.com/info/advocacy.html 'Typos in FINNEGANS WAKE? How could you tell?' -- Kim Stanley Robinson
On Sun, 2006-10-08 at 18:05, Josh Berkus wrote: > Now, if you were offering us a patch to auto-populate the statistics as a > table is loaded, I'd be all for that. But I, personally, would need a > lot of convincing to believe that hints don't do more harm than good. Actually, I'd much rather have a log option, on by default, that spit out info messages when the planner made a guess that was off by a factor of 20 or 50 or so or more on a plan. I can remember to run stats, but finding slow queries that are slow because the plan was bad, that's the hard part.
On Mon, Oct 09, 2006 at 06:07:29PM +0000, Chris Browne wrote: > cjames@modgraph-usa.com ("Craig A. James") writes: > > Mark Kirkwood wrote: > >>> The result? I can't use my function in any WHERE clause that > >>> involves any other conditions or joins. Only by itself. PG will > >>> occasionally decide to use my function as a filter instead of doing > >>> the join or the other WHERE conditions first, and I'm dead. > >> this is an argument for cost-for-functions rather than hints AFAICS. > > > > Perhaps you scanned past what I wrote a couple paragraphs farther > > down. I'm going to repeat it because it's the KEY POINT I'm trying > > to make: > > > > Craig James wrote: > >> Now you might argue that function-cost needs to be added to the > >> optimizer's arsenal of tricks. And I'd agree with you: That WOULD > >> be a better solution than hints. But I need my problem solved > >> TODAY, not next year. Hints can help solve problems NOW that can be > >> brought to the PG team's attention later, and in the mean time let > >> me get my application to work. > > Unfortunately, that "hint language" also needs to mandate a temporal > awareness of when hints were introduced so that it doesn't worsen > things down the road. > > e.g. - Suppose you upgrade to 8.4, where the query optimizer becomes > smart enough (perhaps combined with entirely new kinds of scan > strategies) to make certain of your hints obsolete and/or downright > wrong. Those hints (well, *some* of them) ought to be ignored, right? Great, then you pull the hints back out of the application. They're a last resort anyway; if you have more than a handful of them in your code you really need to look at what you're doing. > The trouble is that the "hint language" will be painfully large and > complex. Its likely-nonstandard interaction with SQL will make query > parsing worse. > > All we really have, at this point, is a vague desire for a "hint > language," as opposed to any clear direction as to what it should look > like, and how it needs to interact with other system components. > That's not nearly enough; there needs to be a clear design. I can agree to that, but we'll never get any progress so long as every time hints are brought up the response is that they're evil and should never be in the database. I'll also say that a very simple hinting language (ie: allowing you to specify access method for a table, and join methods) would go a huge way towards enabling app developers to get stuff done now while waiting for all these magical optimizer improvements that have been talked about for years. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
[Jim C. Nasby - Mon at 04:18:27PM -0500] > I can agree to that, but we'll never get any progress so long as every > time hints are brought up the response is that they're evil and should > never be in the database. I'll also say that a very simple hinting > language (ie: allowing you to specify access method for a table, and > join methods) would go a huge way towards enabling app developers to get > stuff done now while waiting for all these magical optimizer > improvements that have been talked about for years. Just a comment from the side line; can't the rough "set enable_seqscan=off" be considered as sort of a hint anyway? There have been situations where we've actually had to resort to such crud. Beeing able to i.e. force a particular index is something I really wouldn't put into the application except for as a very last resort, _but_ beeing able to force i.e. the use of a particular index in an interactive 'explain analyze'-query would often be ... if not outright useful, then at least very interessting.
On Mon, Oct 09, 2006 at 11:33:03PM +0200, Tobias Brox wrote: > [Jim C. Nasby - Mon at 04:18:27PM -0500] > > I can agree to that, but we'll never get any progress so long as every > > time hints are brought up the response is that they're evil and should > > never be in the database. I'll also say that a very simple hinting > > language (ie: allowing you to specify access method for a table, and > > join methods) would go a huge way towards enabling app developers to get > > stuff done now while waiting for all these magical optimizer > > improvements that have been talked about for years. > > Just a comment from the side line; can't the rough "set > enable_seqscan=off" be considered as sort of a hint anyway? There have > been situations where we've actually had to resort to such crud. > > Beeing able to i.e. force a particular index is something I really > wouldn't put into the application except for as a very last resort, > _but_ beeing able to force i.e. the use of a particular index in an > interactive 'explain analyze'-query would often be ... if not outright > useful, then at least very interessting. One of the big problems with doing set enable_...=off is that there's no way to embed that into something like a view, so you're almost forced into putting into the application code itself, which makes matters even worse. If you could hint this within a query (maybe even on a per-table level), you could at least encapsulate that into a view. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes: > I'll also say that a very simple hinting > language (ie: allowing you to specify access method for a table, and > join methods) would go a huge way towards enabling app developers to get > stuff done now while waiting for all these magical optimizer > improvements that have been talked about for years. Basically, the claim that it'll be both easy and useful is what I think is horsepucky ... let's see a detailed design if you think it's easy. regards, tom lane
> > One of the big problems with doing set enable_...=off is that there's no > way to embed that into something like a view, so you're almost forced > into putting into the application code itself, which makes matters even > worse. If you could hint this within a query (maybe even on a per-table > level), you could at least encapsulate that into a view. You can easily pass multiple statements within a single exec() or push it into an SPF. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
"Jim C. Nasby" <jim@nasby.net> writes: > One of the big problems with doing set enable_...=off is that there's no > way to embed that into something like a view, so you're almost forced > into putting into the application code itself, which makes matters even > worse. If you could hint this within a query (maybe even on a per-table > level), you could at least encapsulate that into a view. You've almost reinvented one of the points that was made in the last go-round on the subject of hints, which is that keeping them out of the application code is an important factor in making them manageable by a DBA. Hints stored in a system catalog (and probably having the form of "make this statistical assumption" rather than specifically "use that plan") would avoid many of the negatives. regards, tom lane
PG does support hints actually.. and I used them to solve the last performance problem I had, rather than waiting n years for the query planner to be improved. The problem in question (from an automated query planning point of view) is the lack of multi-column statistics, leading to the wrong index being used. The only thing is, the hints are expressed in an obscure, ad-hoc and implementation dependant language. For example, the "Don't use index X" hint (the one I used) can be accessed by replacing your index with an index on values derived from the actual index, instead of the values themselves. Then that index is not available during normal query planning. Another example is the "Maybe use index on X and also sort by X" hint, which you access by adding "ORDER BY X" to your query. That would have solved my problem for a simple select, but it didn't help for an update. Then there's the "Don't use seq scan" hint, which is expressed as "set enable_seqscan=off". That can help when it mistakenly chooses seq scan. And there are many more such hints, which are regularly used by PG users to work around erroneous query plans. While writing this email, I had an idea for a FAQ, which would tell PG users how to access this informal hint language: Q: The query planner keeps choosing the wrong index. How do I force it to use the correct index? A: Have you analyzed your tables, increased statistics, etc etc etc? If that doesn't help, you can change the index to use a value derived from the actual row values. Then the index will not be available unless you explicitly use the derived values in your conditions. With such a FAQ, us people who use PG in the real world can have our queries running reliably and efficiently, while work to improve the query planner continues.
Brian Herlihy wrote: > PG does support hints actually.. > The only thing is, the hints are expressed in an obscure, ad-hoc and > implementation dependant language. > > For example, the "Don't use index X" hint (the one I used) can be accessed by > replacing your index with an index on values derived from the actual index... And then there's select ... from (select ... offset 0) where the "offset 0" prevents any rewriting between the two levels of query. This replaces joins and AND clauses where theplanner makes the wrong choice of join order or filtering. I grepped my code and found four of these (all workaroundsfor the same underlying problem). Imagine I got run over by a train, and someone was reading my code. Which would be easier for them to maintain: Code withweird SQL, or code with sensible, well-written SQL and explicit hints? Luckily for my (hypothetical, I hope) successor,I put massive comments in my code explaining the strange SQL. The bad applications are ALREADY HERE. And they're WORSE to maintain than if we had a formal hint language. The argumentthat hints lead to poor application is true. But lack of hints leads to worse applications. Craig
> Imagine I got run over by a train, and someone was reading my code. > Which would be easier for them to maintain: Code with weird SQL, or code > with sensible, well-written SQL and explicit hints? You forgot the most important option: Code with appropriate documentation about your weird SQL. If you document your code, your argument is moot. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
On Mon, Oct 09, 2006 at 06:45:16PM -0400, Tom Lane wrote: > "Jim C. Nasby" <jim@nasby.net> writes: > > One of the big problems with doing set enable_...=off is that there's no > > way to embed that into something like a view, so you're almost forced > > into putting into the application code itself, which makes matters even > > worse. If you could hint this within a query (maybe even on a per-table > > level), you could at least encapsulate that into a view. > > You've almost reinvented one of the points that was made in the last > go-round on the subject of hints, which is that keeping them out of the > application code is an important factor in making them manageable by a > DBA. Hints stored in a system catalog (and probably having the form of > "make this statistical assumption" rather than specifically "use that > plan") would avoid many of the negatives. Sure, but IIRC no one's figured out what that would actually look like, while it's not hard to come up with a syntax that allows you to tell the optimizer "scan index XYZ to access this table". (And if there's real interest in adding that I'll come up with a proposal.) I'd rather have the ugly solution sooner rather than the elegant one later (if ever). -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Mon, Oct 09, 2006 at 03:41:09PM -0700, Joshua D. Drake wrote: > > > > One of the big problems with doing set enable_...=off is that there's no > > way to embed that into something like a view, so you're almost forced > > into putting into the application code itself, which makes matters even > > worse. If you could hint this within a query (maybe even on a per-table > > level), you could at least encapsulate that into a view. > > You can easily pass multiple statements within a single exec() or push > it into an SPF. Unless I'm missing something, putting multiple statements in a single exec means you're messing with the application code. And you can't update a SRF (also means messing with the application code). Though, I suppose you could update a view that pulled from an SRF... -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Mon, Oct 09, 2006 at 08:22:39PM -0700, Joshua D. Drake wrote: > > > Imagine I got run over by a train, and someone was reading my code. > > Which would be easier for them to maintain: Code with weird SQL, or code > > with sensible, well-written SQL and explicit hints? > > You forgot the most important option: > > Code with appropriate documentation about your weird SQL. > > If you document your code, your argument is moot. You apparently didn't read the whole email. He said he did document his code. But his point is still valid: obscure code is bad even with documentation. Would you put something from the obfuscated C contest into production with comments describing what it does, or would you just write the code cleanly to begin with? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes: > I'd rather have the ugly solution sooner rather than the elegant one > later (if ever). The trouble with that is that we couldn't ever get rid of it, and we'd be stuck with backward-compatibility concerns with the first (over simplified) design. It's important to get it right the first time, at least for stuff that you know perfectly well is going to end up embedded in application code. regards, tom lane
On Tue, Oct 10, 2006 at 09:07:03AM -0500, Jim C. Nasby wrote: > Would you put something from the obfuscated C contest > into production with comments describing what it does, If nothing else, it would be a nice practical joke =) /* Steinar */ -- Homepage: http://www.sesse.net/
On Tue, Oct 10, 2006 at 10:14:48AM -0400, Tom Lane wrote: > "Jim C. Nasby" <jim@nasby.net> writes: > > I'd rather have the ugly solution sooner rather than the elegant one > > later (if ever). > > The trouble with that is that we couldn't ever get rid of it, and we'd > be stuck with backward-compatibility concerns with the first (over > simplified) design. It's important to get it right the first time, > at least for stuff that you know perfectly well is going to end up > embedded in application code. We've depricated things before, I'm sure we'll do it again. Yes, it's a pain, but it's better than not having anything release after release. And having a formal hint language would at least allow us to eventually clean up some of these oddball cases, like the OFFSET 0 hack. I'm also not convinced that even supplimental statistics will be enough to ensure the planner always does the right thing, so query-level hints may have to stay (though it'd be great if that wasn't the case). -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote: > On Mon, Oct 09, 2006 at 03:41:09PM -0700, Joshua D. Drake wrote: >>> One of the big problems with doing set enable_...=off is that there's no >>> way to embed that into something like a view, so you're almost forced >>> into putting into the application code itself, which makes matters even >>> worse. If you could hint this within a query (maybe even on a per-table >>> level), you could at least encapsulate that into a view. >> You can easily pass multiple statements within a single exec() or push >> it into an SPF. > > Unless I'm missing something, putting multiple statements in a single > exec means you're messing with the application code. And you can't > update a SRF (also means messing with the application code). Though, I > suppose you could update a view that pulled from an SRF... I always think of application code as outside the db. I was thinking more in layers. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Jim C. Nasby wrote: > On Mon, Oct 09, 2006 at 08:22:39PM -0700, Joshua D. Drake wrote: >>> Imagine I got run over by a train, and someone was reading my code. >>> Which would be easier for them to maintain: Code with weird SQL, or code >>> with sensible, well-written SQL and explicit hints? >> You forgot the most important option: >> >> Code with appropriate documentation about your weird SQL. >> >> If you document your code, your argument is moot. > > You apparently didn't read the whole email. He said he did document his > code. But his point is still valid: obscure code is bad even with > documentation. Would you put something from the obfuscated C contest > into production with comments describing what it does, or would you just > write the code cleanly to begin with? You are comparing apples to oranges. We aren't talking about an obfuscated piece of code. We are talking about an SQL statement that solves a particular problem. That can easily be documented, and documented with enough verbosity that it is never a question, except to test and see if the problem exists in current versions. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Steinar H. Gunderson wrote: > On Tue, Oct 10, 2006 at 09:07:03AM -0500, Jim C. Nasby wrote: >> Would you put something from the obfuscated C contest >> into production with comments describing what it does, > > If nothing else, it would be a nice practical joke =) nice isn't the word I would use ;) Joshua D. Drake > > /* Steinar */ -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
On Mon, Oct 09, 2006 at 23:33:03 +0200, Tobias Brox <tobias@nordicbet.com> wrote: > > Just a comment from the side line; can't the rough "set > enable_seqscan=off" be considered as sort of a hint anyway? There have > been situations where we've actually had to resort to such crud. That only works for simple queries. To be generally useful, you want to be able to hint how to handle each join being done in the query. The current controlls affect all joins.
Jim, > We've depricated things before, I'm sure we'll do it again. Yes, it's a > pain, but it's better than not having anything release after release. > And having a formal hint language would at least allow us to eventually > clean up some of these oddball cases, like the OFFSET 0 hack. > > I'm also not convinced that even supplimental statistics will be enough > to ensure the planner always does the right thing, so query-level hints > may have to stay (though it'd be great if that wasn't the case). "stay"? I don't think that the general developers of PostgreSQL are going to *accept* anything that stands a significant chance of breaking in one release. You have you challange for the EDB development team: come up with a hinting language which is flexible enough not to do more harm than good (hint: it's not Oracle's hints). -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
On Tue, Oct 10, 2006 at 10:28:29AM -0700, Josh Berkus wrote: > Jim, > > > We've depricated things before, I'm sure we'll do it again. Yes, it's a > > pain, but it's better than not having anything release after release. > > And having a formal hint language would at least allow us to eventually > > clean up some of these oddball cases, like the OFFSET 0 hack. > > > > I'm also not convinced that even supplimental statistics will be enough > > to ensure the planner always does the right thing, so query-level hints > > may have to stay (though it'd be great if that wasn't the case). > > "stay"? I don't think that the general developers of PostgreSQL are going > to *accept* anything that stands a significant chance of breaking in one > release. You have you challange for the EDB development team: come up > with a hinting language which is flexible enough not to do more harm than > good (hint: it's not Oracle's hints). My point was that I think we'll always have a need for fine-grained (ie: table and join level) hints, even if we do get the ability for users to over-ride the statistics system. It's just not possible to come up with automation that will handle every possible query that can be thrown at a system. I don't see how that means breaking anything in a given release. Worst-case, the optimizer might be able to do a better job of something than hints written for an older version of the database, but that's going to be true of any planner override we come up with. BTW, I'm not speaking for EnterpriseDB or it's developers here... query hints are something I feel we've needed for a long time. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote: > (snippage)... but we'll never get any progress so long as every > time hints are brought up the response is that they're evil and should > never be in the database. I'll also say that a very simple hinting > language (ie: allowing you to specify access method for a table, and > join methods) would go a huge way towards enabling app developers to get > stuff done now while waiting for all these magical optimizer > improvements that have been talked about for years. It is possibly because some of us feel they are evil :-) (can't speak for the *real* Pg developers, just my 2c here) As for optimizer improvements well, yeah we all want those - but the basic problem (as I think Tom stated) is the developer resources to do them. As an aside this applies to hints as well - even if we have a patch to start off with - look at how much time bitmap indexes have been worked on to get them ready for release.... Personally I don't agree with the oft stated comment along the lines of "we will never get the optimizer to the point where it does not need some form of hinting" as: 1/ we don't know that to be a true statement, and 2/ it is kind of admitting defeat on a very interesting problem, when in fact a great deal of progress has been made to date, obviously by people who believe it is possible to build a "start enough" optimizer. best wishes Mark
Mark Kirkwood wrote: > who believe it is possible to build a "start enough" optimizer. > That's meant to read "smart enough" optimizer .. sorry.
-- tom lane wrote --------------------------------------------------------- "Jim C. Nasby" <jim@nasby.net> writes: > I'd rather have the ugly solution sooner rather than the elegant one > later (if ever). The trouble with that is that we couldn't ever get rid of it, and we'd be stuck with backward-compatibility concerns with the first (over simplified) design. It's important to get it right the first time, at least for stuff that you know perfectly well is going to end up embedded in application code. regards, tom lane --------------------------------------------------------------------------- I agree that it's important to get it right the first time. It's also important that my queries use the right index NOW. It's no use to me if my queries run efficiently in the next release when I am running those queries right now. Hints would allow me to do that. What would it take for hints to be added to postgres? If someone designed a hint system that was powerful and flexible, and offered to implement it themselves, would this be sufficient? This would address the concerns of having a "bad" hint system, and also the concern of time being better spent on other things. I want to know if the other objections to hints, such as hints being left behind after an improvement to the optimizer, would also be an issue. I don't see this objection as significant, as people are already using ad hoc hacks where they would otherwise use hints. The other reason I don't accept this objection is that people who care about performance will review their code after every DBMS upgrade, and they will read the release notes :)
Brian Herlihy <btherl@yahoo.com.au> writes: > What would it take for hints to be added to postgres? A *whole lot* more thought and effort than has been expended on the subject to date. Personally I have no use for the idea of "force the planner to do exactly X given a query of exactly Y". You don't have exactly Y today, tomorrow, and the day after (if you do, you don't need a hint mechanism at all, you need a mysql-style query cache). IMHO most of the planner mistakes we see that could be fixed via hinting are really statistical estimation errors, and so the right level to be fixing them at is hints about how to estimate the number of rows produced for given conditions. Mind you that's still a plenty hard problem, but you could at least hope that a hint of that form would be useful for more than one query. regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote: > Personally I have no use for the idea of "force the planner to do > exactly X given a query of exactly Y". You don't have exactly Y > today, tomorrow, and the day after (if you do, you don't need a > hint mechanism at all, you need a mysql-style query cache). I don't agree here. I have "exactly Y" running millions of times daily. There's enough data that the statistics on specific values don't help all that much, even at the maximum statistics collection level. By "exactly Y" I mean the form of the query is identical, and the query plan is identical, since only the general statistics are being used for most executions of the query. The specific values vary, so caching is no help. In summary, I have a need to run "exactly Y" with query plan "exactly X". (detail in postscript) > IMHO most of the planner mistakes we see that could be fixed via > hinting are really statistical estimation errors, and so the right > level to be fixing them at is hints about how to estimate the number > of rows produced for given conditions. Do you mean something like "The selectivity of these two columns together is really X"? That would solve my specific problem. And the academic part of me likes the elegance of that solution. On the negative side, it means people must learn how the optimizer uses statistics (which I would never have done if I could have said "Use index X"). > Mind you that's still a plenty > hard problem, but you could at least hope that a hint of that form > would be useful for more than one query. Yes it would be useful for more than one query. I agree that it's the "right" level to hint at, in that it is at a higher level. Maybe the right level is not the best level though? In a business environment, you just want things to work, you don't want to analyze a problem all the way through and find the best, most general solution. As a former academic I understand the two points of view, and I don't think either is correct or wrong. Each view has its place. Since I work for a business now, my focus is on making quick fixes that keep the system running smoothly. Solving problems in the "right" way is not important. If the query slows down again later, we will examine the query plan and do whatever we have to do to fix it. It's not elegant, but it gives fast response times to the customers, and that's what matters. PS The case in question is a table with a 3-column primary key on (A, B, C). It also has an index on (B, C). Re-ordering the primary key doesn't help as I do lookups on A only as well. When I specify A, B and C (the primary key), the optimizer chooses the (B, C) index, on the assumption that specifying these two values will return only 1 row. But high correlation between B and C leads to 100s of rows being returned, and the query gets very slow. The quick fix is to say "Use index (A, B, C)". The statistics level fix would be to say "B and C really have high correlation".
> Brian Herlihy <btherl@yahoo.com.au> writes: > > What would it take for hints to be added to postgres? > > A *whole lot* more thought and effort than has been expended on the > subject to date. > > Personally I have no use for the idea of "force the planner to do > exactly X given a query of exactly Y". You don't have exactly Y > today, tomorrow, and the day after (if you do, you don't need a > hint mechanism at all, you need a mysql-style query cache). > IMHO most of the planner mistakes we see that could be fixed via > hinting are really statistical estimation errors, and so the right > level to be fixing them at is hints about how to estimate the number > of rows produced for given conditions. Mind you that's still a plenty > hard problem, but you could at least hope that a hint of that form > would be useful for more than one query. > Do I understand correctly that you're suggesting it might not be a bad idea to allow users to provide statistics? Is this along the lines of "I'm loading a big table and touching every row of data, so I may as well collect some stats along the way" and "I know my data contains these statistical properties, but the analyzer wasn't able to figure that out (or maybe can't figure it out efficiently enough)"? While it seems like this would require more knowledge from the user (e.g. more about their data, how the planner works, and how it uses statistics) this would actually be helpful/required for those who really care about performance. I guess it's the difference between a tool advanced users can get long term benefit from, or a quick fix that will probably come back to bite you. I've been pleased with Postgres' thoughtful design; recently I've been doing some work with MySQL, and can't say I feel the same way. Also, I'm guessing this has already come up at some point, but what about allowing PG to do some stat collection during queries? If you're touching a lot of data (such as an import process) wouldn't it be more efficient (and perhaps more accurate) to collect stats then, rather than having to re-scan? It would be nice to be able to turn this on/off on a per query basis, seeing as it could have pretty negative impacts on OLTP performance... - Bucky
Bucky Jordan wrote: > > Is this along the lines of "I'm loading a big table and touching every > row of data, so I may as well collect some stats along the way" and "I > know my data contains these statistical properties, but the analyzer > wasn't able to figure that out (or maybe can't figure it out efficiently > enough)"? > > While it seems like this would require more knowledge from the user > (e.g. more about their data, how the planner works, and how it uses > statistics) this would actually be helpful/required for those who really > care about performance. ... The user would have to know his data, but he wouldn't need to know how the planner works. While with hints like "use index X", he *does* need to know how the planner works. Being able to give hints about statistical properties of relations and their relationships seems like a good idea to me. And we can later figure out ways to calculate them automatically. BTW, in DB2 you can declare a table as volatile, which means that the cardinality of the table varies greatly. The planner favors index scans on volatile tables. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > BTW, in DB2 you can declare a table as volatile, which means that the > cardinality of the table varies greatly. The planner favors index scans > on volatile tables. Now that seems like a valuable idea. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom, I'm interested in the problem of cross-column statistics from a theoretical perspective. It would be interesting to sit down and try to reason out a useful solution, or at very least to understand the problem better so I can anticipate when it might come and eat me. From my understanding, the main problem is that if PG knows the selectivity of n conditions C1,C2,...,Cn then it doesn't know whether the combined selectivity will be C1*C2*...*Cn (conditions are independent) or max(C1,C2,...,Cn) (conditions are strictly dependent), or somewhere in the middle. Therefore, row estimates could be orders of magnitude off. I suppose a common example would be a table with a serial primary key column and a timestamp value which is always inserted as CURRENT_TIMESTAMP, so the two columns are strongly correlated. If the planner guesses that 1% of the rows of the table will match pk>1000000, and 1% of the rows of the table will match timestamp > X, then it would be nice for it to know that if you specify both "pk>1000000 AND timestamp>X" that the combined selectivity is still only 1% and not 1% * 1% = 0.01%. As long as I'm sitting down and reasoning about the problem anyway, are there any other types of cases you're aware of where some form of cross- column statistics would be useful? In the unlikely event that I actually come up with a brilliant and simple solution, I'd at least like to make sure that I'm solving the right problem :) Thanks, Mark Lewis On Tue, 2006-10-10 at 22:38 -0400, Tom Lane wrote: > Brian Herlihy <btherl@yahoo.com.au> writes: > > What would it take for hints to be added to postgres? > > A *whole lot* more thought and effort than has been expended on the > subject to date. > > Personally I have no use for the idea of "force the planner to do > exactly X given a query of exactly Y". You don't have exactly Y > today, tomorrow, and the day after (if you do, you don't need a > hint mechanism at all, you need a mysql-style query cache). > IMHO most of the planner mistakes we see that could be fixed via > hinting are really statistical estimation errors, and so the right > level to be fixing them at is hints about how to estimate the number > of rows produced for given conditions. Mind you that's still a plenty > hard problem, but you could at least hope that a hint of that form > would be useful for more than one query. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
On Wed, Oct 11, 2006 at 10:27:26AM -0400, Bucky Jordan wrote: > Also, I'm guessing this has already come up at some point, but what > about allowing PG to do some stat collection during queries? If you're > touching a lot of data (such as an import process) wouldn't it be more > efficient (and perhaps more accurate) to collect stats then, rather than > having to re-scan? It would be nice to be able to turn this on/off on a > per query basis, seeing as it could have pretty negative impacts on OLTP > performance... I suspect that could be highly useful in data warehouse environments where you're more likely to have to sequential scan a table. It would be interesting to have it so that a sequential scan that will run to completion also collects stats along the way. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)