Thread: Critical performance problems on large databases
Hi, it has been repeatedly noted that pgsql is somewhat slow on selects on large tables. For instance, I know a colleague of mine has asked this recently, and on the Google archives I found the following pertinent question and response:

http://archives.postgresql.org/pgsql-sql/2000-03/msg00031.php

Basically, if you have a large table with, say, a million rows and you do

SELECT * FROM Bigtable;

it takes a long time for it to come up with the first page of results, using up lots of computer resources etc., and after the first page is returned the backend basically goes into idle mode for all the rest of the query results retrieval.

We also noted that a

SELECT COUNT(*) FROM BigQuery;

can take quite a long time and again use a lot of resources, whereas

SELECT COUNT(smallcolumn) FROM BigQuery;

may be faster and less resource consuming.

Off the bat, this indicates to me that there is something sub-optimal about PostgreSQL's handling of simple queries. From a database that should perform well in online user transactions one would want the query processing to be streamed as much as possible, i.e., since in a SELECT * FROM Bigtable; there is no work needed other than to retrieve the tuples out of physical storage, the response should be immediate and resource usage low. There should not be large buffer allocations.

Conversely, it looks as if PostgreSQL will always read a sizeable piece (if not all?) of the result set into some buffer area before returning a single row. This would explain the slow startup on the SELECT * FROM Bigtable; query as well as the fact that COUNT(smallcolumn) behaves much faster than COUNT(*). Again, count should be streamed as well so as to use no significant memory resources other than the counter. Apparently a COUNT(*) in postgres is executed as

SELECT * FROM Bigtable INTO $somebuffer
COUNT(tuples in $somebuffer)

Admittedly I am conjecturing here, but the evidence is strong. Especially because I can make a 3-level subquery with group-by and all kinds of stuff go almost faster than a SELECT * FROM Bigtable;

Any enlightenments? Am I wrong? Will this be fixed soon? Is it hard to change pgsql to do better streaming of its operations? As a corollary, I would presume that pgsql won't benefit much from multi-CPU machines because it cannot parallelize its activities. It may be less hard to make all of pgsql thread-safe and threaded than it is to rewrite the execution engine to stream between the different tasks and do less buffering, right? Or maybe all is fine with pgsql and we just didn't figure out how to set up the configuration right?

thanks
-Gunther

PS: we are seriously looking into using pgsql as the core of a BIG medical record system, but we already know that if we can't get quick online responses (< 2 s) on large result sets (10000 records), at least for the first page (~ 100 records), we are in trouble.

--
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org
Gunther Schadow <gunther@aurora.regenstrief.org> writes: > We also noted that a > SELECT COUNT(*) FROM BigQuery; > can take quite a long time and again use a lot of resources, > whereas > SELECT COUNT(smallcolumn) FROM BigQuery; > may be faster and less resource consuming. This is complete nonsense... if anything, the second one will take more cycles, since it has to actually examine a column. As for the other thing, use a cursor and "fetch" a few rows at a time if you want to overlap frontend and backend processing. regards, tom lane
On Wed, 10 Apr 2002, Gunther Schadow wrote:

> Off the bat, this indicates to me that there is something
> sub-obtimal about PostgreSQL handling simple queries. From
> a database that should perform well in online user transactions
> one would want the query processing to be streamed as much
> as possible, i.e., since in a SELECT * FROM Bigtable; there is
> no work needed other than to retrieve the tuples out of
> physical storage, the response should be immediate and resource
> usage low. There should not be large buffer allocations.
>
> Conversely it looks as if PostgreSQL will always read a sizeable
> piece (if not all?) of the result set into some buffer area before
> returning a single row. This would explain the slow startup on
> the SELECT * FROM Bigtable; query as well as the fact that
> COUNT(smallcolumn) behaves much faster than COUNT(*).

IIRC, the entire result set is sent across in the select * from bigtable case, possibly to allow random access to the result set? Not sure really. The usual way to deal with these cases is to use limit/offset or a cursor to fetch pieces of the data as you want them, i.e.:

  begin;
  DECLARE foo CURSOR FOR SELECT * FROM Bigtable;
  FETCH 100 from foo;
  FETCH 100 from foo;
  ...
  end;

> Again, count should be streamed as well such as to use no
> significant memory resources other than the counter. Apparently
> a COUNT(*) in postgres is executed as
>
> SELECT * FROM Bigtable INTO $somebuffer
> COUNT(tuples in $somebuffer)

I believe the thing that takes a long time here is that it has to do a sequential scan of Bigtable, which for large tables is rather time consuming. I generally haven't seen large growth of backends on moderate sized tables for such queries, although due to the sequential scan I try to avoid count() whenever possible.
On Wed, 10 Apr 2002, Tom Lane wrote:

> Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> > We also noted that a
> > SELECT COUNT(*) FROM BigQuery;
> > can take quite a long time and again use a lot of resources,
> > whereas
> > SELECT COUNT(smallcolumn) FROM BigQuery;
> > may be faster and less resource consuming.
>
> This is complete nonsense... if anything, the second one will take
> more cycles, since it has to actually examine a column.

OK, I hate to do this, partly because I seem to remember mention of how this sort of thing has to happen like this because the system isn't clever enough to do the optimisation, but...

Shouldn't

SELECT count(*) FROM chat_post

give an immediate answer, especially when there's an index it can use?

Example, with what appears to be overkill, but the primary key is a two-column affair whereas the 'time' column counted in the second select has its own index:

avid_chat_archive=> SET ENABLE_SEQSCAN TO FALSE;
SET VARIABLE
avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(*) FROM chat_post;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=0) (actual time=48557.48..48557.49 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width=0) (actual time=6.68..32598.56 rows=1024808 loops=1)
Total runtime: 48557.93 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(*) FROM chat_post;
  count
---------
 1024808
(1 row)

avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(time) FROM chat_post;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=8) (actual time=51314.54..51314.55 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width=8) (actual time=6.78..35012.81 rows=1024808 loops=1)
Total runtime: 51314.97 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(time) FROM chat_post;
  count
---------
 1024808
(1 row)

avid_chat_archive=> \d chat_post
              Table "chat_post"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 session_id  | smallint                 | not null
 poster_name | character varying(32)    | not null
 time        | timestamp with time zone | not null
 post_number | smallint                 | not null
Indexes: chat_post_time_key,
         chat_post_user_key
Primary key: chat_post_pkey
Triggers: RI_ConstraintTrigger_4369014,
          RI_ConstraintTrigger_4369012,
          RI_ConstraintTrigger_4369010,
          RI_ConstraintTrigger_4369008,
          RI_ConstraintTrigger_4369006

avid_chat_archive=>

Having muttered about the primary key using two columns, I see the planner can see the table size without having to resort to an index. Which makes sense if only I'd turned my brain on first.

Anyway, the question still stands: why does postgres do this query this way? It is doing the full sequential scan, i.e. fetching the tuples from disk, when this data is not necessary for the query result. Is it to do with the calling requirements of count(), other aggregate functions and/or functions in general when used in the return list, and/or that it requires too much intelligence for the system to determine such optimisations?

And, I can't see this specific item addressed in the FAQ. I'm sure I'm not the only one to have brought this up (oh wait, I'm replying to a related message so obviously not). Shouldn't it be there?

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
On Thu, 11 Apr 2002, Nigel J. Andrews wrote:

> On Wed, 10 Apr 2002, Tom Lane wrote:
>
> > Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> > > We also noted that a
> > > SELECT COUNT(*) FROM BigQuery;
> > > can take quite a long time and again use a lot of resources,
> > > whereas
> > > SELECT COUNT(smallcolumn) FROM BigQuery;
> > > may be faster and less resource consuming.
> >
> > This is complete nonsense... if anything, the second one will take
> > more cycles, since it has to actually examine a column.
>
> OK, I hate to do this, partly because I seem to remember mention of how this
> sort of thing has to happen like this because the system isn't clever enough to
> do the optimisation, but...
>
> [snipped]

Forgot to say this is on 7.2. I'm currently having some odd difficulty running the regression tests for 7.2.1

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
> Having muttered about the primary key using two columns I see the planner can > see the table size without having to revert to an index. Which makes sense if > only I'd turned my brain on first. The planner only gets the estimate from the last analyze/vacuum. > Anyway, the question still stands, why does postgres do this query this > way? It is doing the full sequential scan, i.e. fetching the tuples from > disk, when this data is not necessary for the query result. Is it to do with > calling requirement of count(), other aggregate functions and/or functions in > general when used in the return list and/or that it requires too much > intelligence for the system to determine such optimisations? The reason is that it needs to visit those rows anyway to see whether they are visible to your transaction. If you're visiting every row, sequential scan is faster than index scan. If the index had the visibility information this could be done purely by index, but there are problems with doing that as well (I don't know them, but they're in messages on this topic from the past).
I can return result sets of 4K records from a database with 2M records in tenths of a second. Smaller sets of 100 to 800 records take 0.02 to 0.1 second. This is on linux, pentium III at 500 MHz, pg 7.2.1.

Finding the right indexes made a *big* difference. Index using the same criteria you use for select.

--
P. J. "Josh" Rovero                              Sonalysts, Inc.
Email: rovero@sonalysts.com                      www.sonalysts.com
215 Parkway North                                Work: (860)326-3671 or 442-4355
Waterford CT 06385
***********************************************************************
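As a concrete illustration of "index using the same criteria you use for select" (the table and column names here are hypothetical, not Rovero's actual schema): if the application mostly selects rows for one patient over a date range, the index should cover exactly those columns.

    -- hypothetical schema: index the columns the WHERE clause uses
    CREATE INDEX bigtable_patient_date_idx ON bigtable (patient_id, visit_date);

    -- a query of this shape can then use the index instead of a sequential scan
    SELECT *
      FROM bigtable
     WHERE patient_id = 12345
       AND visit_date >= '2002-01-01'
     ORDER BY visit_date
     LIMIT 100;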
On Wed, 2002-04-10 at 17:39, Gunther Schadow wrote:
> PS: we are seriously looking into using pgsql as the core
> of a BIG medical record system, but we already know that
> if we can't get quick online responses (< 2 s) on
> large rasult sets (10000 records) at least at the first
> page (~ 100 records) we are in trouble.

There are a few tricks to getting fast results for pages of data in large tables. I have an application in which we have a scrolling window displaying data from a million-row table, and I have been able to make it fairly interactively responsive (enough that it's not a problem).

We grab pages of a few screenfuls of data at a time using LIMIT / OFFSET, enough to scroll smoothly over a short range. For LIMIT / OFFSET queries to be fast, I found it was necessary to CREATE INDEX, CLUSTER and ORDER BY the key field.

Then the biggest slowdown is count(*), which we have to do in order to fake up the scrollbar (so we know what proportion of the data has been scrolled through). I have not completely fixed this yet. I want to keep a separate mini-table of how many records are in the big table and update it with a trigger (the table is mostly static). ATM, I just try hard to minimize the times I call count(*).

b.g.
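In SQL, the combination Gribble describes would look roughly like this (the table name and key column are hypothetical; the CLUSTER syntax shown is the old 7.x form, CLUSTER indexname ON tablename):

    -- index and physically order the table by the key used for paging
    CREATE INDEX bigtable_ts_idx ON bigtable (ts);
    CLUSTER bigtable_ts_idx ON bigtable;

    -- fetch one screenful at a time, always ordered by the clustered key
    SELECT *
      FROM bigtable
     ORDER BY ts
     LIMIT 50 OFFSET 200;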
On 11 Apr 2002, Bill Gribble wrote:

> On Wed, 2002-04-10 at 17:39, Gunther Schadow wrote:
> > PS: we are seriously looking into using pgsql as the core
> > of a BIG medical record system, but we already know that
> > if we can't get quick online responses (< 2 s) on
> > large rasult sets (10000 records) at least at the first
> > page (~ 100 records) we are in trouble.
>
> There are a few tricks to getting fast results for pages of data in
> large tables. I have an application in which we have a scrolling window
> displaying data from a million-row table, and I have been able to make
> it fairly interactively responsive (enough that it's not a problem).
>
> We grab pages of a few screenfuls of data at a time using LIMIT /
> OFFSET, enough to scroll smoothly over a short range. For LIMIT /
> OFFSET queries to be fast, I found it was necessary to CREATE INDEX,
> CLUSTER and ORDER BY the key field.

I took a different approach in what I did. I considered selecting the requested page of data only. However, I considered the fact that as the query had been requested it was likely that not just this one page would be viewed by the user. On top of that, I always use a chronological time order for results, which I could probably get without explicitly specifying it in the query if I reloaded the data in time order to sort out the few blocks or rows that are out of sequence; still, it isn't exactly correct to rely on the 'natural' order, is it?

This normal usage means that I very rarely will avoid a long running query on this large table, and since my web server doesn't transmit the page data until the CGI-BIN process has finished, I decided that I would utilise all the required multiple-page data from the long running query on its first request by caching the entire set of pages for it. I can do this because of the nature of the updates to the data. Updates are in batches at known times and so the cached pages for presentation can be held and then expired at the appropriate times.

This doesn't give the quick response time Gunther is looking for but I think it is an alternative for some situations.

> Then the biggest slowdown is count(*), which we have to do in order to
> fake up the scrollbar (so we know what proportion of the data has been
> scrolled through). I have not completely foxed this yet. I want to
> keep a separate mini-table of how many records are in the big table and
> update it with a trigger (the table is mostly static). ATM, I just try
> hard to minimize the times I call count(*).

Now this idea I like. I do count some things in the big table and again, despite there being an index on the column used in the GROUP BY, it does a seqscan. However, I can see that adding triggers to insert etc. in a table and maintaining counts is going to hit the data loading time but is going to speed the count accesses tremendously.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
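A sketch of the trigger-maintained count being discussed here (table, column and function names are hypothetical; the dollar-quoting syntax for the function body is from PostgreSQL releases newer than the 7.2 under discussion, and it assumes the plpgsql language is installed in the database):

    -- one-row side table holding the current count
    CREATE TABLE bigtable_count (n bigint NOT NULL);
    INSERT INTO bigtable_count SELECT count(*) FROM bigtable;

    -- keep it in step on every insert and delete
    CREATE FUNCTION bigtable_count_trig() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE bigtable_count SET n = n + 1;
        ELSE
            UPDATE bigtable_count SET n = n - 1;
        END IF;
        RETURN NULL;  -- return value of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER bigtable_count_update
        AFTER INSERT OR DELETE ON bigtable
        FOR EACH ROW EXECUTE PROCEDURE bigtable_count_trig();

    -- the application then reads the count cheaply
    SELECT n FROM bigtable_count;

As the posters note, this trades slower loads (one extra UPDATE per row, plus contention on the single counter row) for instant counts, so it suits mostly-static tables.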
On Wed, 10 Apr 2002, Gunther Schadow wrote:

> SELECT * FROM Bigtable;
>
> it takes a long time for it to come up with the first page
> of results using up lots of computer resources etc and after
> the first page is returned the backend basically goes into
> idle mode for all the rest of the query results retrieval.

Very simple. PHP and Perl DBI, when doing non-cursor queries, will try and buffer the *entire* result into memory before letting you have it. In PHP, when you run a pg_exec, or mysql_query, or whatever, it actually executes the query, and stores all of the results in memory. Don't believe me? Try closing the database connection, and then do a fetch on the result identifier. Heck, fetch until you reach the end.

You'll get all of your results. Even though the database is closed.

So when you are doing a "SELECT * FROM bigtable", you're telling PHP to not only buffer over a million rows, but to transfer it from the database as well. "SELECT COUNT(*) FROM bigtable" doesn't have this overhead. The database just returns a single row, which you can view basically as soon as the database is done.

There's also something to be said for the LIMIT clause. Somehow I doubt you need every single row in the entire table in order to do your work on that page. Just adding LIMIT/OFFSET will increase your speed significantly.

Or, as someone else mentioned, use a cursor, and let the database buffer your query. It'll be slow too, but you can at least fetch the rows individually and get the perception of speed in your application.

> no work needed other than to retrieve the tuples out of
> physical storage, the response should be immediate and resource
> usage low. There should not be large buffer allocations.

I see how you can get into this kind of thinking. Since select * has no ordering, no calculations, nothing but just returning raw results from the table in whatever format they may be in, how could it possibly be resource intensive? But the logic is wrong. If you want all the data, it'll give it to you. Whether you do it at the console, and it has to throw everything at the screen buffer, or your application, which has to put it in some temporary storage until it gets what it needs and deallocates the memory. That data has to go *somewhere*, it can't just exist in limbo until you decide you want to use some of it.

You can use cursors to get around this, because the database basically runs the query and doesn't send back squat until you actually ask it to. But I'll tell you a well selected limit clause will work almost as well, and reduce the need for the database to maintain a cursor.

Say you're on page 5, and you are showing 20 results per page. Knowing that results start at 0, you can get the offset just by doing:

(PAGE - 1) * PER_PAGE = (5-1) * 20 = 4*20 = 80. So your query becomes:

SELECT * FROM Bigtable LIMIT 20 OFFSET 80;

And voilà. You'll get back 20 rows right where you want them. Also, take care of what you select from the table. Maybe you don't actually need all of the data, but only certain columns. The less data that has to be transferred from database to application, the faster you can get/show your data. Especially if your database is on a separate machine from your application server. Network transfers are *not* instantaneous.

Try:

SELECT col1,col2,col3,etc FROM Bigtable LIMIT x OFFSET y

instead.
You'll save yourself some time, save the database the effort of loading all million rows of every column in the table into the network interface, and save your application the need to fetch them. > COUNT(smallcolumn) behaves much faster than COUNT(*). Or you can do what every database optimization book in the entire universe says, and do COUNT(1). Since all you're testing is the existence of the row, not any value of anything in it. > Again, count should be streamed as well such as to use no > significant memory resources other than the counter. Actually, it's converted into: "*, eh? I think not. Let me rewrite this query and put a 1 here instead." The database tries to keep people from shooting themselves in the foot, but even the most Herculean effort will fail when the user is bound and determined to blow off their big toe. > Any enlightenments? Am I wrong? Will this be fixed soon? > Is it hard to change pgsql to do better streaming of its > operations. Read a few books on database design and optimization, the basics of application memory allocation, and the TCP/IP stack. If you want to understand how to make an application fast from front to back, you have to understand the components that make it work. Your knowledge of application memory performance and network latency seems inherently flawed, and until you get over the assumption that network transfers are free and optimizing queries is a fool's errand, you'll continue to have problems in any database you choose. -- +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ | Shaun M. Thomas INN Database Administrator | | Phone: (309) 743-0812 Fax : (309) 743-0830 | | Email: sthomas@townnews.com AIM : trifthen | | Web : www.townnews.com | | | | "Most of our lives are about proving something, either to | | ourselves or to someone else." | | -- Anonymous | +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
On Thu, Apr 11, 2002 at 02:05:07PM +0100, Nigel J. Andrews wrote:
> On 11 Apr 2002, Bill Gribble wrote:
> > Then the biggest slowdown is count(*), which we have to do in order to
> > fake up the scrollbar (so we know what proportion of the data has been
> > scrolled through).
> seqscan. However, I can see that adding triggers to insert etc. in
> a table and maintain counts is going to hit the data loading time
> but is going to speed the count accesses tremendously.

I suspect the trigger would be pretty miserable for inserts. But why use a trigger? If you need just pretty-close results, you could have a process that runs (say) every 10 minutes which updates a table of stats.

(Naturally, if you need really accurate numbers, that's no help. But if the idea is just a "record _r_ of _nnnn_ results" sort of message, you can do what large databases have been doing for years: if a query returns more than some small-ish number of records, look in the stats table. Then you say "record _r_ of approximately _nnnn_ results".)

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
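The periodic stats-table variant Sullivan describes could look something like this (hypothetical names; the UPDATE would be run from cron or a similar scheduler every few minutes rather than from a trigger):

    -- one row per large table whose approximate size we want cheaply
    CREATE TABLE table_stats (
        tablename   text PRIMARY KEY,
        approx_rows bigint,
        updated_at  timestamp with time zone
    );
    INSERT INTO table_stats VALUES ('bigtable', 0, now());

    -- refreshed periodically, not on every insert
    UPDATE table_stats
       SET approx_rows = (SELECT count(*) FROM bigtable),
           updated_at  = now()
     WHERE tablename = 'bigtable';

    -- the "approximately nnnn results" message then comes from here
    SELECT approx_rows FROM table_stats WHERE tablename = 'bigtable';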
Hi, thanks everyone who responded. I want to set this whole COUNT(...) issue aside for a moment because I have reported hearsay instead of my own experiences. I apologize for muddying the waters with that COUNT issue. But I still want to respond to the SELECT * FROM Bigtable; issue once more.

[[[I have to say I am a bit confused by some of the responses, which basically shrugged the problem off, sending the asker off to anywhere between the schoolbooks and the mental facility. That's not necessary.]]]

I am delighted to hear that one respondent has no problem with a 2M table and snappy response, but unfortunately he didn't say much about the detail. Was that really select * from bigtable; queries, or do we have where clauses and stuff that limits the result set considerably? A number of respondents confirmed my observations, so I think the problem is real.

There was one remark about Perl or PHP always loading the complete result set before returning. Bad for them. I don't use either and I think it's just bad design to do that on the client, but I don't care about bad clients. I care about a good server.

The constructive responses suggested that I use LIMIT/OFFSET and CURSORs. I can see how that could be a workaround for the problem, but I still believe that something is wrong with the PostgreSQL query executor. Loading the entire result set into a buffer without need just makes no sense. Good database engines try to provide for parallel execution of the query plan as much as possible, and that implies streaming. There's a crowd of literature testifying to its importance.

The main reasons for this are (a) the use on multi-processor machines, where one CPU does one task and the other does another task on the same query plan and the results from CPU 1 are streamed to CPU 2 (hey, I have a 6 processor machine in my basement.) Perhaps more importantly, (b) buffering (without need) is inherently bad, because it wastes memory resources, leads to bursty demand on CPU and network, and to slow perceived response times. Buffering is a complete waste if the buffer is being paged out to disk again, and it isn't flexible or scaleable if buffer pages are fixed into physical memory. Streaming is especially important if you want to do distributed joins (and though pgsql doesn't support that yet, it would be foolish to close your eyes to a fundamental problem and then be forced to rework this in a hurry when the time comes for distributed PostgreSQL.)

So, while my client application might benefit from such things as cursors and OFFSET/LIMIT, the query planning and executing may suffer from the buffering. And of course, the point is that it makes sense to design the server such that streaming results to the client is transparent, because it automatically relieves the strain on all resources: CPU, storage and network! Isn't that obvious?

regards
-Gunther

--
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org
Gunther Schadow <gunther@aurora.regenstrief.org> writes: > The constructive responses suggested that I use LIMIT/OFFSET and > CURSORs. I can see how that could be a workaround the problem, but > I still believe that something is wrong with the PostgreSQL query > executer. Loading the entire result set into a buffer without > need just makes no sense. The Postgres backend does not do that. Most of the frontend client-side libraries do, but feel free to write one that does not. Offhand I think the only really serious downside to letting the application code process the result in a streaming fashion is that the application would have to be prepared to undo whatever it's done so far, if it gets an error report partway through the result set. An example: SELECT 1/x FROM foo; where foo.x contains zeroes here and there. You'll get some rows out before the query is abandoned with a divide-by-zero error. By accumulating the result set, the existing libraries are able to offer a cleaner yes-it-succeeded or no-it-didn't API. But this is surely not a fatal objection, just a necessary piece of a less-clean streaming API. > [snip] > And of course, the point is that it makes > sense to design the server such that streaming results to the > client is transparent because it automatically relieves the strain > on all resources, CPU, storage and network! Isn't that obvious? The reason you got brush-off responses before was that you went into lecture mode when you clearly hadn't spent any effort studying Postgres internals. You're still doing that... regards, tom lane
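Tom's divide-by-zero example can be made concrete with a tiny (hypothetical) table; with a streaming API, the first row could already have been handed to the application before the error aborts the rest of the query:

    CREATE TABLE foo (x integer);
    INSERT INTO foo VALUES (1);
    INSERT INTO foo VALUES (0);
    INSERT INTO foo VALUES (2);

    -- typically produces a row for x = 1, then the query is aborted
    -- by a division-by-zero error when it reaches the x = 0 row
    SELECT 1/x FROM foo;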
> The constructive responses suggested that I use LIMIT/OFFSET and
> CURSORs. I can see how that could be a workaround the problem, but
> I still believe that something is wrong with the PostgreSQL query
> executer. Loading the entire result set into a buffer without
> need just makes no sense. Good data base engines try to provide
> for parallel execution of the query plan as much as possible, and
> that implies streaming. There's a crowd of literature about this
> testifying for it's importance.
>
> The main reasons for this is (a) the use on multi-processor machines
> where one CPU does one task and the other does another task on the
> same query plan and the results from CPU 1 is streamed to CPU 2 (hey,
> I have a 6 processor machine in my basement.) Perhaps more importantly
> (b) buffering (without need) is inherently bad, because it wastes
> memory resources leads to bursty demand on CPU and network, and slow
> perceived response times. Buffering is a complete waste if the buffer
> is being paged out to disk again and it isn't flexible or scaleable
> if buffer pages are fixed into physical memory. Straming is
> especially important if you want to do distributed joins (and though
> pgsql doesn't support that yet it would be foolish to close your eyes
> before a fundamental problem and then being forced to rework this in
> a hurry when the time comes for distributed PostgreSQL.)
>
> So, while my client application might benefit from such things as
> cursors and OFFSET/LIMIT, the query planning and executing may
> suffer from the buffering. And of course, the point is that it makes
> sense to design the server such that streaming results to the
> client is transparent because it automatically relieves the strain
> on all resources, CPU, storage and network! Isn't that obvious?

My limited understanding of the internals is that, except for steps that need their entire input before they can produce output (such as a sort) and the final step of sending everything to the client, each node in the plan asks the nodes below it for rows as it needs them, so those nodes can provide results on an as-needed basis without necessarily buffering their result sets.
Tom Lane wrote:

> Gunther Schadow <gunther@aurora.regenstrief.org> writes:
>
>> The constructive responses suggested that I use LIMIT/OFFSET and
>> CURSORs. I can see how that could be a workaround the problem, but
>> I still believe that something is wrong with the PostgreSQL query
>> executer. Loading the entire result set into a buffer without
>> need just makes no sense.
>
> The Postgres backend does not do that. Most of the frontend client-side
> libraries do, but feel free to write one that does not.

Thanks, Tom, that's helpful! So, are you saying that the psql client does all the buffering and not the server? Is the buffering mandatory when using the libpq API? When you say "most frontend client side libraries" do you know of any exceptions who don't do that? Is the odbc client doing this too?

I can see myself fixing this problem, but would like to know if there is something else already?

thanks for the info,
-Gunther

--
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org
The big issue with LIMIT,OFFSET is that it still uses all rows for sorting. I already suggested using partial sorting to avoid sorting all rows if one selects only the first 20 rows, for example. It's very important for Web applications because web users usually read only the first 1-2 pages. Our experiments have shown a 6-times performance win when using partial sorting.

	Oleg

On Thu, 11 Apr 2002, Shaun Thomas wrote:

> [snip]
	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
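The query shape Oleg is discussing is an ORDER BY with a small LIMIT, where the executor sorts the whole input just to return a handful of rows. When the sort key already has a usable index, the planner may be able to avoid the full sort by walking the index and stopping early (hypothetical table and column names):

    -- hypothetical web-forum style table
    CREATE INDEX posts_created_idx ON posts (created);

    -- page 1: without a usable index this sorts every row in posts;
    -- with the index, an index scan can stop after producing 20 rows
    SELECT *
      FROM posts
     ORDER BY created DESC
     LIMIT 20 OFFSET 0;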
Oleg Bartunov wrote: > The big issue with LIMIT,OFFSET is that it still use all rows > for sorting. I already suggested to use partial sorting to avoid > sorting all rows if one selected only first 20 row, for example. > It's very important for Web applications because web users usually > read first 1-2 pages. Our experimnets have shown 6 times performance > win when using partial sorting. That is interesting. I haven't even dared to ask for this. Did you contribute that partial sorting code to postgresql? thanks, -Gunther -- Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org Medical Information Scientist Regenstrief Institute for Health Care Adjunct Assistant Professor Indiana University School of Medicine tel:1(317)630-7960 http://aurora.regenstrief.org
Oleg Bartunov wrote: > The big issue with LIMIT,OFFSET is that it still use all rows > for sorting. I already suggested to use partial sorting to avoid > sorting all rows if one selected only first 20 row, for example. > It's very important for Web applications because web users usually > read first 1-2 pages. Our experimnets have shown 6 times performance > win when using partial sorting. We do have this TODO item: * Allow ORDER BY ... LIMIT to select top values without sort or index using a sequential scan for highest/lowest values -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I have had a similar issue when doing a select on a table of 10k or 15k records. It takes a long time to get the data back. I am using JDBC. If I take a subset of the table it returns very quickly. I figured that the JDBC driver is buffering all the data. I guess I will have to try the cursor or the LIMIT,OFFSET next as well.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Oleg Bartunov
Sent: Thursday, April 11, 2002 1:51 PM
To: Shaun Thomas
Cc: Gunther Schadow; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Critical performance problems on large databases

The big issue with LIMIT,OFFSET is that it still use all rows for sorting. I already suggested to use partial sorting to avoid sorting all rows if one selected only first 20 row, for example. It's very important for Web applications because web users usually read first 1-2 pages. Our experimnets have shown 6 times performance win when using partial sorting.

	Oleg

On Thu, 11 Apr 2002, Shaun Thomas wrote:

> [snip]
On Thu, 11 Apr 2002, Gunther Schadow wrote:

> Oleg Bartunov wrote:
>
> > The big issue with LIMIT,OFFSET is that it still use all rows
> > for sorting. I already suggested to use partial sorting to avoid
> > sorting all rows if one selected only first 20 row, for example.
> > It's very important for Web applications because web users usually
> > read first 1-2 pages. Our experimnets have shown 6 times performance
> > win when using partial sorting.
>
> That is interesting. I haven't even dared to ask for this.
> Did you contribute that partial sorting code to postgresql?

A year ago we had a very crude patch which implemented partial sorting. We had hoped to work more closely on it for 7.2/7.3, but it seems we have no time. If you're ready to work on this issue we could think about contributing our libpsort to postgres.

> thanks,
> -Gunther

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Gunther Schadow wrote: > Oleg Bartunov wrote: > > > The big issue with LIMIT,OFFSET is that it still use all rows > > for sorting. I already suggested to use partial sorting to avoid > > sorting all rows if one selected only first 20 row, for example. > > It's very important for Web applications because web users usually > > read first 1-2 pages. Our experimnets have shown 6 times performance > > win when using partial sorting. > > > That is interesting. I haven't even dared to ask for this. > Did you contribute that partial sorting code to postgresql? We have the TODO item, but the code is not written yet. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Thu, 11 Apr 2002, Bruce Momjian wrote: > Oleg Bartunov wrote: > > The big issue with LIMIT,OFFSET is that it still use all rows > > for sorting. I already suggested to use partial sorting to avoid > > sorting all rows if one selected only first 20 row, for example. > > It's very important for Web applications because web users usually > > read first 1-2 pages. Our experimnets have shown 6 times performance > > win when using partial sorting. > > We do have this TODO item: > > * Allow ORDER BY ... LIMIT to select top values without sort or index > using a sequential scan for highest/lowest values > looks too complex to me :-) I like partial sorting, but it's not important. > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
On Thu, 11 Apr 2002, Bruce Momjian wrote:

> Gunther Schadow wrote:
> > Oleg Bartunov wrote:
> >
> > > The big issue with LIMIT,OFFSET is that it still use all rows
> > > for sorting. I already suggested to use partial sorting to avoid
> > > sorting all rows if one selected only first 20 row, for example.
> > > It's very important for Web applications because web users usually
> > > read first 1-2 pages. Our experimnets have shown 6 times performance
> > > win when using partial sorting.
> >
> > That is interesting. I haven't even dared to ask for this.
> > Did you contribute that partial sorting code to postgresql?
>
> We have the TODO item, but the code is not written yet.

Actually, we have libpsort for partial sorting and a very crude patch for 7.1. Jan and Tom were not optimistic about that patch, so we need one bright person who could incorporate the idea of partial sorting into the current sources and satisfy the core developers. We have no time, unfortunately.

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote:

> On Thu, 11 Apr 2002, Bruce Momjian wrote:
>
>> Oleg Bartunov wrote:
>>
>>> The big issue with LIMIT,OFFSET is that it still use all rows
>>> for sorting. I already suggested to use partial sorting to avoid
>>> sorting all rows if one selected only first 20 row, for example.
>>> It's very important for Web applications because web users usually
>>> read first 1-2 pages. Our experimnets have shown 6 times performance
>>> win when using partial sorting.
>>>
>> We do have this TODO item:
>>
>>  * Allow ORDER BY ... LIMIT to select top values without sort or index
>>    using a sequential scan for highest/lowest values
>>
>
> looks too complex to me :-) I like partial sorting, but it's not
> important.

Oleg, I might agree. I might even take on some of this one. But I think a first step would be for you to put your libpsort and the patch "out there" so that someone could just look and try and see if taking the time to do what needs to be done is feasible with this. If you can put up your lib and patch and documentation (even crude notes) onto a web site, that would be a good start.

regards
-Gunther

--
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org
Well, I've got several responses. I'm far from the office and have a slow and unstable connection, so I will return to this topic next week.

	Oleg

On Thu, 11 Apr 2002, Gunther Schadow wrote:

> [snip]

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
At 11:09 AM 4/11/02 -0500, Gunther Schadow wrote:
>There was one remark about Perl or PHP always loading the complete
>result set before returning. Bad for them. I don't use either and
>I think it's just bad design to do that on the client but I don't
>care about bad clients. I care about a good server.

>The constructive responses suggested that I use LIMIT/OFFSET and
>CURSORs. I can see how that could be a workaround the problem, but
>I still believe that something is wrong with the PostgreSQL query
>executer. Loading the entire result set into a buffer without
>need just makes no sense. Good data base engines try to provide

AFAIK you can turn off buffering the whole result set with perl DBI/DBD.

Buffering allows programmers to easily use each row of a query to make other queries, even if the DB doesn't support cursors. e.g.

select ID from table1;
for each row returned {
    select ID2 from table2 where x=ID;
    for each row returned {
        ....
    }
}

Without buffering or cursors you will have to open another connection or store all rows somewhere then do the subsequent queries using the stored rows.

Cheerio,
Link.
Bruce Momjian wrote:
>
> Oleg Bartunov wrote:
> > The big issue with LIMIT,OFFSET is that it still use all rows
> > for sorting. I already suggested to use partial sorting to avoid
> > sorting all rows if one selected only first 20 row, for example.
> > It's very important for Web applications because web users usually
> > read first 1-2 pages. Our experimnets have shown 6 times performance
> > win when using partial sorting.
>
> We do have this TODO item:
>
>   * Allow ORDER BY ... LIMIT to select top values without sort or index
>     using a sequential scan for highest/lowest values

It may not be wise to disregard the index if there is one for the ORDER BY.

JLL
Jean-Luc Lachance wrote:

> Bruce Momjian wrote:
> >
> > Oleg Bartunov wrote:
> > > The big issue with LIMIT,OFFSET is that it still use all rows
> > > for sorting. I already suggested to use partial sorting to avoid
> > > sorting all rows if one selected only first 20 row, for example.
> > > It's very important for Web applications because web users usually
> > > read first 1-2 pages. Our experimnets have shown 6 times performance
> > > win when using partial sorting.
> >
> > We do have this TODO item:
> >
> >   * Allow ORDER BY ... LIMIT to select top values without sort or index
> >     using a sequential scan for highest/lowest values
>
> It may not be wise to disregard the index if there is one for the ORDER
> BY.

Yes, this would be for cases where the index isn't useful, either because there isn't one or the join order makes it useless.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> Thanks, Tom, that's helpful! So, are you saying the the psql > client does all the buffering and not the server? Is the > buffering mandatory when using the libpq API? When you say > "most frontend client side libraries" do you know of any > exceptions who don't do that? Is the odbc client doing this > too? Yup, yup, and all the ones I know about but I do not know the details of several (including jdbc and odbc). > I can see myself fixing this problem, but would like to know > if there is something else already? Not that I know of. I could imagine extending libpq reasonably straightforwardly: assuming that you use PQsendQuery/PQgetResult, you could add a PQgetPartialResult() that would extract the rows-received-so-far. When it fails to get any more rows, a final PQgetResult would get the query's success/fail status and possible error message. Not sure about all the details here; if you want to pursue it, start a discussion on pgsql-interfaces. regards, tom lane
On Thu, Apr 11, 2002 at 11:09:05AM -0500, Gunther Schadow wrote:
[snip]
> The constructive responses suggested that I use LIMIT/OFFSET and
> CURSORs. I can see how that could be a workaround the problem, but
> I still believe that something is wrong with the PostgreSQL query
> executer.

<editorialize emphasis="mine" text="yours">
> LOADING THE ENTIRE RESULT SET INTO A BUFFER WITHOUT
> NEED JUST MAKES NO SENSE.
</editorialize>

when YOU say "SELECT * FROM SOMETABLE" that's what YOU are doing. the software tries to accommodate your request as best it can.

> Good data base engines try to provide
> for parallel execution of the query plan as much as possible, and
> that implies streaming. There's a crowd of literature about this
> testifying for it's importance.

if you need the first twenty or the last fifty rows for your purpose, then use limit & offset, or try the cursor approach. if you NEED all the rows (a very unusual occurrence outside of mirroring instances -- even periodic summation, for report generation, will use aggregation instead of all individual rows) then ask for them.

--
I use Debian/GNU Linux version 2.2;
Linux server 2.2.17 #1 Sun Jun 25 09:24:41 EST 2000 i586 unknown

DEBIAN NEWBIE TIP #63 from Will Trillich <will@serensoft.com>:
What's the best way to GET RESPONSES ON DEBIAN-USER? There are
several things to keep in mind:
 1) Debians are all volunteers because they enjoy what they do;
    they don't owe you diddly (and you'll be one of us when you
    start getting involved): ASK, and ye shall receive; DEMAND,
    and ye shall be rebuffed
 2) Provide evidence showing that you did put effort into finding
    a solution to your problem (at least demonstrate that you've
    seen the manual)
 3) Be known to offer pointers and assistance to others
 4) Give enough information so that someone else can figure out
    what you're after; and make it legible
 5) Enjoy yourself and have fun -- it'll come across, and we
    enjoy people who enjoy life; a petulant whiner seldom gets
    any useful pointers other than "Out, damn spot!"
Also see http://newbieDoc.sourceForge.net/ ...
Oleg, I have added your name to this TODO list: * Allow ORDER BY ... LIMIT to select top values without sort or index using a sequential scan for highest/lowest values (Oleg) --------------------------------------------------------------------------- Oleg Bartunov wrote: > On Thu, 11 Apr 2002, Bruce Momjian wrote: > > > Gunther Schadow wrote: > > > Oleg Bartunov wrote: > > > > > > > The big issue with LIMIT,OFFSET is that it still use all rows > > > > for sorting. I already suggested to use partial sorting to avoid > > > > sorting all rows if one selected only first 20 row, for example. > > > > It's very important for Web applications because web users usually > > > > read first 1-2 pages. Our experimnets have shown 6 times performance > > > > win when using partial sorting. > > > > > > > > > That is interesting. I haven't even dared to ask for this. > > > Did you contribute that partial sorting code to postgresql? > > > > > > We have the TODO item, but the code is not written yet. > > > > Actually, we have libpsort for partial sorting and very crude patch for 7.1 > Jan and Tom were not optimist about that patch, so we need one bright > person who could incorporate the idea of partial sorting and current sources > and satisfy core developers. We have no time, unfortunately > > > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On 11 Apr 2002, Bill Gribble wrote:

> On Wed, 2002-04-10 at 17:39, Gunther Schadow wrote:
>
> Then the biggest slowdown is count(*), which we have to do in order to
> fake up the scrollbar (so we know what proportion of the data has been
> scrolled through). I have not completely foxed this yet. I want to
> keep a separate mini-table of how many records are in the big table and
> update it with a trigger (the table is mostly static). ATM, I just try
> hard to minimize the times I call count(*).

Don't recall right now which table, but there is a table that has record counts. It is accurate if you have done a vacuum full. Vacuum analyze only gives you estimates, but that may be good enough for your needs.
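The system table being recalled here is presumably pg_class, whose reltuples column holds the row-count estimate the planner uses; it is refreshed by VACUUM and ANALYZE, so it is only as current as the last one of those. For a scrollbar-style "approximately N rows" display, an estimate like this is usually enough:

    -- approximate row count, without a sequential scan of the big table
    SELECT reltuples FROM pg_class WHERE relname = 'bigtable';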