Thread: count(*) slow on large tables
Hi,

I have a somewhat large table: 3 million rows, 1 GB on disk, and growing. Doing a count(*) takes around 40 seconds.

It looks like count(*) fetches the table from disk and walks through it. That made me wonder: why doesn't the optimizer just choose the smallest index, which in my case is around 60 MB, and walk through that instead? It could do so in a fraction of the time.

Dror
--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com
On Thu, Oct 02, 2003 at 12:15:47 -0700, Dror Matalon <dror@zapatec.com> wrote:
> That made me wonder: why doesn't the optimizer just choose the smallest
> index, which in my case is around 60 MB, and walk through that instead?
> It could do so in a fraction of the time.

Because it can't tell from the index whether a tuple is visible to the current transaction, and it would still have to hit the table to check this. So performance would be a lot worse instead of better.
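For anyone who wants to see this for themselves, EXPLAIN shows count(*) planning a sequential scan over the heap; the table name and cost numbers below are made up for illustration, but the plan shape is the point:

    -- count(*) always plans a scan of the heap, never an index-only pass.
    EXPLAIN SELECT count(*) FROM bigtable;
    --  Aggregate  (cost=60917.00..60917.00 rows=1 width=0)
    --    ->  Seq Scan on bigtable  (cost=0.00..53417.00 rows=3000000 width=0)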
> Looks like the count(*) fetches the table from disk and goes through it.
> Made me wonder, why the optimizer doesn't just choose the smallest index
> which in my case is around 60 MB, and goes through it.

Just like other aggregate functions, count(*) won't use indexes when counting the whole table.

Regards,
Tomasz Myrta
On Thu, Oct 02, 2003 at 12:46:45 -0700, Dror Matalon <dror@zapatec.com> wrote:

Please keep replies copied to the list.

> When would it happen that a tuple be invisible to the current
> transaction? Are we talking about permissions?

They could be tuples that were changed by a transaction that hasn't committed, or, in the case of serializable isolation, by a transaction that committed after the current transaction started.
That's one of the drawbacks of MVCC. I once suggested that the transaction number and other housekeeping info be included in the index, but was told to forget it... It would solve once and for all the issue of seq_scan vs index_scan. It would simplify the aggregate problem.

Bruno Wolff III wrote:
> Because it can't tell from the index whether a tuple is visible to the
> current transaction, and it would still have to hit the table to check
> this. So performance would be a lot worse instead of better.
jllachan@nsd.ca (Jean-Luc Lachance) writes:
> That's one of the drawbacks of MVCC. I once suggested that the
> transaction number and other housekeeping info be included in the
> index, but was told to forget it... It would solve once and for all
> the issue of seq_scan vs index_scan. It would simplify the aggregate
> problem.

It would only simplify _one_ case, namely the case where someone cares about the cardinality of a relation, and it would do that at _considerable_ cost.

A while back I outlined how this would have to be done, and for it to be done efficiently, it would be anything BUT simple. It would be very hairy to implement correctly, and all it would cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

If you had a single WHERE clause attached, you would have to revert to walking through the tuples looking for the ones that are live and committed, which is true for any DBMS.

And it still begs the same question of why the result of this query would be particularly meaningful to anyone. I don't see the usefulness; I don't see the value of going to the considerable effort of "fixing" this purported problem.
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
I don't have an opinion on how hard it would be to implement the tracking in the indexes, but "select count(*) from some table" is, in my experience, a query that people tend to run quite often. One of the databases that I've used, I believe it was Informix, had that info cached so that it always knew how many rows there were in any table. It was quite useful.

On Thu, Oct 02, 2003 at 05:57:30PM -0400, Christopher Browne wrote:
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.

--
Dror Matalon, President
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com
The world rejoiced as dror@zapatec.com (Dror Matalon) wrote:
> One of the databases that I've used, I believe it was Informix, had that
> info cached so that it always knew how many rows there were in any
> table. It was quite useful.

I can't imagine why the raw number of tuples in a relation would be expected to necessarily be terribly useful.

I'm involved with managing Internet domains, and it's only when people are being pretty clueless that anyone imagines that "select count(*) from domains;" would be of any use to anyone. There are enough "test domains" and "inactive domains" and other such things that the raw number of "things in the table" isn't really of much use.

- I _do_ care how many pages a table occupies, to some extent, as that determines whether it will fit in my disk space or not, but that's not COUNT(*).

- I might care about auditing the exact numbers of records in order to be assured that a data conversion process was done correctly. But in that case, I want to do something a whole *lot* more detailed than mere COUNT(*).

I'm playing "devil's advocate" here, to some extent. But realistically, there is good reason to be skeptical of the merits of using SELECT COUNT(*) FROM TABLE for much of anything.

Furthermore, the relation that you query mightn't be a physical "table." It might be a more virtual VIEW, and if that's the case, bets are even MORE off. If you go with the common dictum of "good design" that users don't directly access tables, but go through VIEWs, users may have no way to get at SELECT COUNT(*) FROM TABLE.
--
Christopher Browne
http://www.ntlug.org/~cbbrowne/finances.html
I smell a religious war in the air :-). Can you go several days in a row without doing select count(*) on any of your tables?

I suspect that this is somewhat a domain-specific issue. In some areas you don't need to know the total number of rows in your tables; in others you do.

I also suspect that you're right that end-user applications don't use this information as often as DBAs would. On the other hand, it seems that whenever you want to optimize your app (something relevant to this list), one of the things you do need to know is the number of rows in your table.

Dror

On Thu, Oct 02, 2003 at 10:08:18PM -0400, Christopher Browne wrote:
> I'm playing "devil's advocate" here, to some extent. But
> realistically, there is good reason to be skeptical of the merits of
> using SELECT COUNT(*) FROM TABLE for much of anything.

--
Dror Matalon, President
Zapatec Inc
Christopher Browne <cbbrowne@libertyrms.info> writes:
> It would be very hairy to implement correctly, and all it would cover
> is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
>
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

Well, it would be handy for a few other cases as well:

1. It would be useful for the case where you have a partial index with a matching WHERE clause. The optimizer already considers using such indexes, but it has to pay the cost of the tuple lookup, which is substantial.

2. It would be useful for the very common queries of the form WHERE x IN (SELECT id FROM foo WHERE some_indexed_expression). (Or the various equivalent forms, including outer joins that test whether the matching record was found and don't retrieve any other columns in the select list.)

3. It would be useful for many-to-many relationships where the intermediate table has only the two primary key columns being joined. If you create a multi-column index on the two columns, it shouldn't need to look up the tuple. This would effectively be nearly equivalent to an "index-organized table".

4. It would be useful for just about all the referential integrity queries...

I don't mean to say this is definitely a good thing. The tradeoff in complexity and time to maintain the index pages would be large. But don't dismiss it as purely a count(*) optimization hack. I know Oracle is capable of it, and it can speed up your query a lot when you remove that last unnecessary column from a join table, allowing Oracle to skip the step of reading the table.

--
greg
I can tell you that this is one of the first things applications programmers and IT managers notice. It can slightly tarnish postgres' image when it takes many long seconds to do what other databases can do in a snap. The "whys and wherefores" can be hard to get across once they see the comparative numbers.

When I use Informix "dbaccess" it has a "status" which will tell me the row count of a table virtually instantly -- it can be locked out by a user with an exclusive lock, so it's not entirely independent of the table (like a stored value in one of the system catalog tables).

This is not to say Informix is "right" and Postgres is "wrong" ... but it is something that virtually any newcomer will run into head long, with resulting bruises and contusions, not to mention confusion.

At the very least this needs to be VERY clearly explained right up front, along with some of the possible work-arounds, depending on what one is really after with this info.

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: Dror Matalon [mailto:dror@zapatec.com]
Sent: Thu 10/2/2003 9:27 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] count(*) slow on large tables

> I smell a religious war in the air :-). Can you go several days in a
> row without doing select count(*) on any of your tables?
Dror Matalon wrote:
> I smell a religious war in the air :-). Can you go several days in a
> row without doing select count(*) on any of your tables?
>
> I suspect that this is somewhat a domain-specific issue. In some areas
> you don't need to know the total number of rows in your tables; in
> others you do.

If I were you, I would have an autovacuum daemon running, and rather than doing select count(*), I would look at the stats generated by vacuums. They give an approximate number of tuples, and that should be good enough; it is accurate to within a percent or so.

Just another approach to achieving the same thing. Don't be religious about running a query from the SQL prompt. That's it.

Shridhar
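A minimal sketch of this approach: the planner's own row estimate lives in pg_class.reltuples, and it is only as fresh as the last VACUUM or ANALYZE (the table name here is illustrative):

    -- Approximate row count straight from the system catalog; no scan at all.
    -- Accuracy depends on how recently the table was vacuumed or analyzed.
    SELECT relname, reltuples::bigint AS approx_rows, relpages
      FROM pg_class
     WHERE relname = 'mytable';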
Oops! dror@zapatec.com (Dror Matalon) was seen spray-painting on a wall:
> I smell a religious war in the air :-). Can you go several days in a
> row without doing select count(*) on any of your tables?

I would be more likely, personally, to run "VACUUM VERBOSE ANALYZE", which has useful side-effects :-).

> I suspect that this is somewhat a domain-specific issue. In some areas
> you don't need to know the total number of rows in your tables; in
> others you do.

"Relationship tables," which don't contain data in their own right, but which, instead, link together records in other tables, are likely to have particularly useless COUNT(*)'s.

> I also suspect that you're right that end-user applications don't use
> this information as often as DBAs would. On the other hand, it seems
> that whenever you want to optimize your app (something relevant to this
> list), one of the things you do need to know is the number of rows in
> your table.

Ah, but in the case of optimization, there's little need for "transactionally safe, MVCC-managed, known-to-be-exact" values. Approximations are plenty good enough to get the right plan.

Furthermore, it's not the number of rows that is most important when optimizing queries; the number of pages is more relevant to the matter, as that's what the database is slinging around.
--
Christopher Browne
http://www3.sympatico.ca/cbbrowne/multiplexor.html
On Thu, 2 Oct 2003, Christopher Browne wrote:
> I can't imagine why the raw number of tuples in a relation would be
> expected to necessarily be terribly useful.

We use stuff like that for reporting queries.

Example: on our message boards each post is a row. The powers that be like to know how many posts there are total (in addition to "today") -- select count(*) from posts is how it has been done on our Informix db. With our port to PG I instead select reltuples from pg_class.

I know when I log in to a new db (or a db unknown to me) the first thing I do is look at tables and see what sort of data there is... but in code I'd rarely do that. I know some monitoring things around here also do a select count(*) on some table to ensure it is growing, but like you said, this is easily done with the number of pages as well.

Yes, Informix caches this data. I believe Oracle does too. MySQL with InnoDB does the same thing PG does (MyISAM caches it).

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Well, I can think of many more cases where it would be useful:

SELECT COUNT(DISTINCT x) FROM ...
SELECT COUNT(*) FROM ... WHERE x = ?

Also, having the transaction number (visibility) would tip the balance more toward index_scan than seq_scan, because you would not have to look up visibility in the data file. We all know this has been an issue many times.

Having a different index file structure when the index is not UNIQUE would help too. The last page of a non-unique index could hold more stats.

Christopher Browne wrote:
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
We frequently need to know the number of tuples in a table, although sometimes we do have WHERE status='X', for example, but this often doesn't guarantee an indexed scan. And yes, my reasons are the same -- reporting figures, e.g. number of bookings made since the system was introduced.

Have you tried doing SELECT count(pkey) rather than count(*), where pkey is the primary key (assuming you have a single field that is a primary key or a unique indexed key)? This is MUCH faster in my experience. If you don't have such an animal, I'd seriously suggest putting in a serial number and recreating the table with that as the primary key.

The vacuuming bit is not accurate enough for us in many instances. Also, a count can be easily fed into other programs/web pages etc. without having to parse the vacuum output.

Hilary

At 23:22 02/10/2003 -0700, you wrote:
> I can tell you that this is one of the first things applications
> programmers and IT managers notice. It can slightly tarnish postgres'
> image when it takes many long seconds to do what other databases can
> do in a snap.

Hilary Forbes
-------------
DMR Computer Limited: http://www.dmr.co.uk/
Direct line: 01689 889950
Switchboard: (44) 1689 860000
Fax: (44) 1689 860330
E-mail: hforbes@dmr.co.uk
In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
> Well, I can think of many more cases where it would be useful:
>
> SELECT COUNT(DISTINCT x) FROM ...
> SELECT COUNT(*) FROM ... WHERE x = ?

Those are precisely the cases that the "other databases" ALSO fall down on. Maintaining those sorts of statistics would lead [in _ANY_ database; PostgreSQL has no disadvantage in this] to needing for each and every update to update a whole host of statistic values.

It would be fairly reasonable to have a trigger, in PostgreSQL, to manage this sort of information. It would not be outrageously difficult to substantially improve performance of queries, at the considerable cost that each and every update would have to update a statistics table. (A minimal sketch of such a trigger follows below.)

If you're doing a whole lot of these sorts of queries, then it is a reasonable idea to create appropriate triggers for the (probably very few) tables where you are doing these counts. But the notion that this should automatically be applied to all tables always is a dangerous one. It would make update performance Suck Badly, because the extra statistical updates would be quite expensive.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','cbbrowne.com').
http://www3.sympatico.ca/cbbrowne/multiplexor.html
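A minimal sketch of that trigger approach, with hypothetical table and function names (assumes plpgsql is installed, e.g. via createlang). Note that the single counter row serializes concurrent writers on its row lock, which is exactly the update cost warned about above:

    -- Hypothetical exact-count maintenance for one table.
    CREATE TABLE row_counts (tablename text PRIMARY KEY, n bigint NOT NULL);
    INSERT INTO row_counts SELECT 'mytable', count(*) FROM mytable;

    CREATE FUNCTION track_count() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE row_counts SET n = n + 1 WHERE tablename = TG_RELNAME;
            RETURN NEW;
        ELSE  -- DELETE
            UPDATE row_counts SET n = n - 1 WHERE tablename = TG_RELNAME;
            RETURN OLD;
        END IF;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER mytable_count AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE track_count();

    -- COUNT(*) becomes a single-row lookup:
    SELECT n FROM row_counts WHERE tablename = 'mytable';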
Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.

Could this be made a TODO item, perhaps with your attack plan? Of course as a strictly optional feature, useful only for special situations (see below).

I cross-post this to [HACKERS] as it seems relevant to a problem recently discussed there.

> It would be very hairy to implement correctly, and all it would cover
> is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

Not really. Just yesterday there was a discussion on [HACKERS] about implementing btree-organized tables, which would be much less needed if the visibility info were kept in indexes.

> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with visibility info) then there would be no need for "walking through the tuples" in the data relation.

The typical use case cited on [HACKERS] was time-series data, where inserts are roughly in (timestamp, id) order but queries are in (id, timestamp) order. Now if the index included all relevant fields (id, timestamp, data1, data2, ..., dataN), then the query could run on the index only, touching just a few pages and thus vastly improving performance. I agree that this is not something everybody needs, but when it is needed, it is needed bad.

> And it still begs the same question of why the result of this query
> would be particularly meaningful to anyone.

Being able to do fast count(*) is just a side benefit.

----------------
Hannu
> On our message boards each post is a row. The powers that be like to
> know how many posts there are total (in addition to "today") --
> select count(*) from posts is how it has been done on our Informix db.
> With our port to PG I instead select reltuples from pg_class.

We have exactly the same situation, except we just added a 'num_replies' field to each thread and a 'num_posts' field to each forum, so that getting that information out is a very fast operation. Because, of course, there are hundreds of times more reads of that information than writes... (A sketch of that kind of denormalization is below.)

Chris
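A sketch of the counter-column approach, with hypothetical table names. Because the counter is bumped in the same transaction as the insert, it stays consistent under MVCC:

    -- Hypothetical schema: each forum carries a materialized post count.
    ALTER TABLE forums ADD COLUMN num_posts integer;
    UPDATE forums SET num_posts =
        (SELECT count(*) FROM posts WHERE posts.forum_id = forums.id);

    -- Write path: insert the post and bump the counter in one transaction.
    BEGIN;
    INSERT INTO posts (forum_id, author, body) VALUES (42, 'chris', '...');
    UPDATE forums SET num_posts = num_posts + 1 WHERE id = 42;
    COMMIT;

    -- Reads become a single-row lookup instead of an aggregate:
    SELECT num_posts FROM forums WHERE id = 42;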
Christopher Browne wrote:
> It would be very hairy to implement correctly, and all it would cover
> is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

We do have a TODO item:

* Consider using MVCC to cache count(*) queries with no WHERE clause

The idea is to cache a recent count of the table, then have insert/delete add +/- records to the count. A COUNT(*) would get the main cached record plus any visible +/- records. This would allow the count to return the proper value depending on the visibility of the requesting transaction, and it would require _no_ heap or index scan.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
Hannu Krosing <hannu@tm.ee> writes:
> Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
>> A while back I outlined how this would have to be done, and for it to
>> be done efficiently, it would be anything BUT simple.
>
> Could this be made a TODO item, perhaps with your attack plan?

If I recall that discussion correctly, no one including Christopher thought the attack plan was actually reasonable.

What this keeps coming down to is that an optimization that helps only COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in development and maintenance effort to justify its existence. At least if you insist on an exact, MVCC-correct answer.

So far as I've seen, the actual use cases for unqualified COUNT(*) could be handled equally well by an approximate answer. What we should be doing, rather than wasting large amounts of time trying to devise exact solutions, is telling people to look at pg_class.reltuples for approximate answers. We could also be looking at beefing up support for that approach --- maybe provide some syntactic sugar for the lookup, maybe see if we can update reltuples in more places than we do now, make sure that the autovacuum daemon includes "keep reltuples accurate" as one of its design goals, etc etc.

regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The idea is to cache a recent count of the table, then have
> insert/delete add +/- records to the count. A COUNT(*) would get the
> main cached record plus any visible +/- records.

... and it would give the wrong answers. Unless the cache is somehow snapshot-aware, so that it can know which other transactions should be included in your count.

regards, tom lane
Tom Lane wrote:
> ... and it would give the wrong answers. Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should be
> included in your count.

The cache is an ordinary table, with xids on every row. I meant it would require no index/heap scans of the large table --- it would still require a scan of the "count" table.

--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The cache is an ordinary table, with xids on every row. I meant it
> would require no index/heap scans of the large table --- it would still
> require a scan of the "count" table.

Oh, that idea. Yeah, I think we had concluded it might work. You'd better make the TODO item link to that discussion, because there's sure been plenty of discussion of ideas that wouldn't work.

regards, tom lane
Tom Lane wrote:
> Oh, that idea. Yeah, I think we had concluded it might work. You'd
> better make the TODO item link to that discussion, because there's sure
> been plenty of discussion of ideas that wouldn't work.

OK, I beefed up the TODO:

* Use a fixed row count and a +/- count with MVCC visibility rules
  to allow fast COUNT(*) queries with no WHERE clause(?)

I can always give the details if someone asks. It doesn't seem complex enough for a separate TODO.detail item.

--
Bruce Momjian
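For concreteness, a rough sketch of the scheme the TODO describes (all names hypothetical; this is the idea, not a worked-out implementation). Writers insert +/- delta rows rather than updating one shared counter, so they don't block each other, and ordinary MVCC visibility of the delta rows gives each transaction the right answer:

    -- Hypothetical count cache: one base row plus +/- delta rows.
    CREATE TABLE count_cache (
        tablename text NOT NULL,
        delta     bigint NOT NULL  -- base row holds the full count;
                                   -- +1/-1 rows record later changes
    );
    INSERT INTO count_cache SELECT 'mytable', count(*) FROM mytable;

    -- Each INSERT on mytable also does (e.g. via a trigger):
    INSERT INTO count_cache VALUES ('mytable', 1);
    -- Each DELETE on mytable also does:
    INSERT INTO count_cache VALUES ('mytable', -1);

    -- COUNT(*) becomes a sum over whatever delta rows are visible to
    -- this transaction's snapshot -- no scan of mytable at all:
    SELECT sum(delta) FROM count_cache WHERE tablename = 'mytable';

    -- A periodic VACUUM-like pass collapses the deltas back into a
    -- single base row so the cache table stays small.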
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It doesn't seem complex enough for a separate TODO.detail item.

I thought it was, if only because it is so easy to think of wrong implementations.

regards, tom lane
Tom Lane kirjutas L, 04.10.2003 kell 19:07:
> What this keeps coming down to is that an optimization that helps only
> COUNT(*)-of-one-table-with-no-WHERE-clause would be too expensive in
> development and maintenance effort to justify its existence.

Please read further in my email ;)

The point I was trying to make was that faster count(*)'s is just a side effect. If we could (conditionally) keep visibility info in indexes, then this would also solve the much more tricky problem of index-structured tables.

Count(*) is *not* the only query that could benefit from not needing to go to the actual data table for visibility info. The much more needed case would be the "inverse time series" type of queries, which would otherwise thrash cache pages badly.

----------------------------
Hannu
Hannu Krosing <hannu@tm.ee> writes:
> The point I was trying to make was that faster count(*)'s is just a side
> effect. If we could (conditionally) keep visibility info in indexes,

I think that's not happening, conditionally or otherwise. The atomicity problems alone are sufficient reason why not, even before you look at the performance issues.

regards, tom lane
On 10/4/03 2:00 AM, "Hannu Krosing" <hannu@tm.ee> wrote:
> The typical use case cited on [HACKERS] was time-series data, where
> inserts are roughly in (timestamp, id) order but queries are in
> (id, timestamp) order. ... I agree that this is not something everybody
> needs, but when it is needed, it is needed bad.

I would add that automatically index-organizing tuples isn't just useful for time-series data (though it is a good example), but can be used to substantially improve the query performance of any really large table in a number of different and not always direct ways. Once working sets routinely exceed the size of physical RAM, buffer access/utilization efficiency often becomes the center of performance tuning, but not one that many people know much about.

One of the less direct ways of using btree-organized tables for improving scalability is to "materialize" table indexes of tables that *shouldn't* be btree-organized. Not only can you turn tables into indexes, but you can also turn indexes into tables, which can have advantages in some cases.

For example, I did some scalability consulting at a well-known movie rental company with some very large Oracle databases running on big Sun boxen. One of the biggest problems was that their rental history table, which had a detailed record of every movie ever rented by every customer, had grown so large that performance was getting painfully slow. To make matters worse, it and a few related tables had high concurrent usage: a mixture of many performance-sensitive queries grabbing windows of a customer's history, plus a few broader OLAP queries which were not time-sensitive. Everything was technically optimized in a relational and basic configuration sense, and the database guys at the company were at a loss on how to fix this problem. Performance of all queries was essentially bound by how fast pages could be moved between the disk and buffers.

Issue #1: The history rows had quite a lot of columns and the OLAP processes used non-primary indexes, so the table was not particularly suitable for btree-organizing.

Issue #2: Partitioning was not an option because it would have exceeded certain limits in Oracle (at that time; I don't know if that has changed).

Issue #3: Although customer histories were being constantly queried, the data needed most was really an index view of the customer's history, not the details of the history itself.

The solution I came up with was to use a synced btree-organized partial clone of the main history table that only contained a small number of key columns that mattered for generating customer history indexes in the applications that used them. While this substantially increased the disk-space footprint for the same data (since we were cloning it), it greatly reduced the total number of cache misses for the typical query, only fetching the full history-row pages when actually needed. In other words, it basically concentrated more buffer traffic into a smaller number of page buffers.

What we had was an exceedingly active but relatively compact materialized index of the history table that could essentially stay resident in RAM, and a much less active history table+indexes that, while less likely to be buffered than before, had pages accessed at such a reduced frequency that there was a huge net performance gain because disk access plummeted.

Average performance improvement for the time-sensitive queries: 50-70x.

So btree-organized tables can do more than make tables behave like indexes. They can also make indexes behave like tables. Both are very useful in some cases when your working set exceeds the physical buffer space. For smaller databases this has much less utility, and users need to understand the limitations; nonetheless, when tables and databases get really big, it becomes an important tool in the tool belt.

Cheers,

-James Rogers
jamesr@best.com
Quoth tgl@sss.pgh.pa.us (Tom Lane):
> ... and it would give the wrong answers. Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should be
> included in your count.

[That's an excellent summary that Bruce did of what came out of the previous discussion...]

If this "cache" was a table itself, the visibility of its records should be identical to the visibility of the "real" records. +/- records would become visible when the transaction COMMITted, at the very same time the source records became visible.

I thought, at one point, that it would be a slick idea for "record compression" to take place automatically; when you do a COUNT(*), the process would include compressing multiple records down to one. Unfortunately, that turns out to be Tremendously Evil if the same COUNT(*) were being concurrently processed in multiple transactions. Both would repeat much the same work, and this would ultimately lead to one of the transactions aborting. [I recently saw this effect occur, um, a few times...]

For this not to have Evil Effects on unsuspecting transactions, we would instead require some process analogous to VACUUM, where a single transaction would be used to compress the "counts table" down to one record per table. Being independent of "user transactions," it could safely compress the data without injuring unsuspecting transactions.

But in most cases, the cost of this would be pretty prohibitive. Every transaction that adds a record to a table leads to a record being added to table "pg_exact_row_counts". If transactions typically involve adding ONE row to any given table, this effectively doubles the update traffic. Ouch. That means that in a _real_ implementation, it would make sense to pick and choose the tables that would be so managed.

In my earlier arguing of "You don't really want that!", while I may have been guilty of engaging in a _little_ hyperbole, I was certainly _not_ being facetious overall. At work, we tell the developers "avoid doing COUNT(*) inside ordinary transactions!", and that is certainly NOT a facetious comment. I recall a case a while back where system performance was getting brutalized by a lurking COUNT(*). (Combined with some other pathological behaviour, of course!) And note that this wasn't a query that the TODO item could address; it was of the form "SELECT COUNT(*) FROM SOME_TABLE WHERE OWNER = VALUE;"

As you have commented elsewhere in the thread, much of the time the point of asking for COUNT(*) is to get some idea of table size, where the precise number isn't terribly important in comparison with getting the general magnitude. Improving the ability to get approximate values would be of some value.

I would further argue that "SELECT COUNT(*) FROM TABLE" isn't particularly useful even when precision _is_ important. If I'm working on reports that would be used to reconcile things, the queries I use are a whole lot more involved than that simple form. It is far more likely that I'm using a GROUP BY.

It is legitimate to get wishful and imagine that it would be nice if we could get the value of that query "instantaneously." It is also legitimate to think that the effort required to implement that might be better used on improving other things.
--
Christopher Browne
http://www3.sympatico.ca/cbbrowne/
Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
>> The point I was trying to make was that faster count(*)'s is just a
>> side effect. If we could (conditionally) keep visibility info in
>> indexes,
>
> I think that's not happening, conditionally or otherwise. The atomicity
> problems alone are sufficient reason why not, even before you look at
> the performance issues.

What are the atomicity problems of adding a create/expire xid to the index tuples?

--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What are the atomicity problems of adding a create/expire xid to the
> index tuples?

You can't update a tuple's status in just one place ... you have to update the copies in the indexes too.

regards, tom lane
Tom Lane wrote:
> You can't update a tuple's status in just one place ... you have to
> update the copies in the indexes too.

But we don't update the tuple status for a commit; we just mark the xid as committed. We do have lazy status bits that prevent later lookups in pg_clog, but we have those in the index already also. What am I missing?

--
Bruce Momjian
Bruce,

> OK, I beefed up the TODO:
>
> * Use a fixed row count and a +/- count with MVCC visibility rules
>   to allow fast COUNT(*) queries with no WHERE clause(?)

Hmmm ... this doesn't seem effort-worthy to me. How often does anyone do COUNT with no WHERE clause, except GUIs that give you a record count? (Of course, as always, if someone wants to code it, feel free ...) And for those GUIs, wouldn't it be 97% as good to run an ANALYZE and give the approximate record counts for large tables?

As for counts with a WHERE clause, this is obviously up to the user. Joe Conway and I tested using a C trigger to track some COUNT ... GROUP BY values for large tables based on additive numbers. It worked fairly well for accuracy, but the performance penalty on data writes was significant: an 8% to 25% penalty for UPDATEs, depending on the frequency and batch size (higher frequency and larger batch size --> larger penalty). A sketch of the general idea appears below.

It's possible that this could be improved through some mechanism more tightly integrated with the source code. However, the coding effort would be significant (12-20 hours), and it's possible that there would be no improvement, which is why we didn't do it.

We also discussed an asynchronous aggregates collector that would work something like the statistics collector, and keep pre-programmed aggregate data, updating during "low-activity" periods. This would significantly reduce the performance penalty, but at the cost of accuracy ... that is, a 1%-5% variance on high-activity tables would be unavoidable, and all cached aggregates would have to be recalculated on database restart, significantly slowing down startup. Again, we felt that the effort-result payoff was not worthwhile.

Overall, I think the stuff we already have planned ... the hash aggregates in 7.4 and Tom's suggestion of adding an indexable flag to pg_aggs ... are far more likely to yield useful fruit than any caching plan.

--
Josh Berkus
Aglio Database Solutions
San Francisco
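A sketch of that additive GROUP BY tracking, in PL/pgSQL rather than C (all names hypothetical; the C version would do the same bookkeeping with less per-row overhead). It assumes a counts row already exists for every group value:

    -- Hypothetical per-group counts maintained additively by a trigger.
    CREATE TABLE order_status_counts (status text PRIMARY KEY,
                                      n bigint NOT NULL);
    INSERT INTO order_status_counts
        SELECT status, count(*) FROM orders GROUP BY status;

    -- Assumes a row already exists in order_status_counts for each
    -- status value that can appear in orders.
    CREATE FUNCTION track_status_counts() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' OR TG_OP = ''UPDATE'' THEN
            UPDATE order_status_counts SET n = n + 1
             WHERE status = NEW.status;
        END IF;
        IF TG_OP = ''DELETE'' OR TG_OP = ''UPDATE'' THEN
            UPDATE order_status_counts SET n = n - 1
             WHERE status = OLD.status;
        END IF;
        RETURN NULL;  -- AFTER trigger: return value is ignored
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER orders_count_groups
        AFTER INSERT OR UPDATE OR DELETE ON orders
        FOR EACH ROW EXECUTE PROCEDURE track_status_counts();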
> And for those GUIs, wouldn't it be 97% as good to run an ANALYZE and
> give the approximate record counts for large tables?

Interfaces which run a COUNT(*) like that are broken by design. They fail to consider that the table may really be a view, which of course could not be cached with results like that, and may take days to load a full result set (we had some pretty large views in an old billing system).
Bruce Momjian wrote:
> OK, I beefed up the TODO:
>
> * Use a fixed row count and a +/- count with MVCC visibility rules
>   to allow fast COUNT(*) queries with no WHERE clause(?)

May I propose an alternate approach for this optimisation?

- PostgreSQL allows maintaining user-defined variables in shared memory.
- These variables obey transactions but do not get written to disk at all.
- There should be a facility to detect whether such a variable is initialized or not.

How would it help? This is in addition to the trigger proposal that came up earlier. With triggers it's not possible to make values visible across backends unless the trigger updates a table, which eventually leads to the vacuum/dead-tuples problem.

1. User creates a trigger to check updates/inserts for certain conditions.
2. It updates the count as and when required.
3. If the trigger detects the count is not initialized, it would issue the same query the first time. There is no avoiding this issue.

Besides, providing a facility for resident variables could be used imaginatively as well.

Does this make sense? IMO this is a more generalised approach overall. Just a thought.

Shridhar
> How would it help? This is in addition to the trigger proposal that
> came up earlier. With triggers it's not possible to make values visible
> across backends unless the trigger updates a table, which eventually
> leads to the vacuum/dead-tuples problem.

I do this _VERY_ frequently in my databases, only I have my stored procs do the aggregate in a predefined MVCC table that's always there. Here's a denormalized version for public consumption/thought:

    CREATE TABLE global.dba_aggregate_cache (
        dbl        TEXT NOT NULL, -- The database location, doesn't need to be
                                  -- qualified (ex: schema.table.col)
        op         TEXT NOT NULL, -- The operation: SUM, COUNT, etc.
        qual       TEXT,          -- Any kind of conditional, such as a WHERE clause
        val_int    INT,           -- Whatever the value is, of type INT
        val_bigint BIGINT,        -- Whatever the value is, of type BIGINT
        val_text   TEXT,          -- Whatever the value is, of type TEXT
        val_bytea  BYTEA          -- Whatever the value is, of type BYTEA
    );
    CREATE UNIQUE INDEX dba_aggregate_cache_dbl_op_udx
        ON global.dba_aggregate_cache(dbl, op);

Then, I use a function to retrieve this value instead of a SELECT COUNT(*):

    SELECT public.cache_count('dbl', 'qual'); -- in this case, the op is COUNT
    SELECT public.cache_count('dbl');         -- returns the COUNT for the
                                              -- table listed in dbl

Then, I create 4 or 5 functions (depending on the op I'm performing):

1) A private function that _doesn't_ run as security definer, that populates the global.dba_aggregate_cache row if it's empty.

2) A STABLE function for SELECTs; if the row doesn't exist, it calls function #1 to populate its existence.

3) A STABLE function for INSERTs; if the row doesn't exist, it calls function #1 to populate its existence, then adds the necessary bits to make it accurate.

4) A STABLE function for DELETEs; if the row doesn't exist, it calls function #1 to populate its existence, then deletes the necessary bits to make it accurate.

5) A STABLE function for UPDATEs; if the row doesn't exist, it calls function #1 to populate its existence, then updates the necessary bits to make it accurate. It's not uncommon for me to not have an UPDATE function/trigger.

Create triggers for functions 2-5, and test away. It's MVCC, and searching through an indexed table for a single row is obviously vastly faster than a seqscan/aggregate. If I need any kind of aggregate to be fast, I use this system with a derivation of the above table. The problem with it is that I have to retrain others to use cache_count(), or some other function, instead of using COUNT(*).

That said, it'd be nice if there were a way to tell PostgreSQL to do the above for you and teach COUNT(*), SUM(*), or other aggregates to use an MVCC-backed cache similar to the above. If people want their COUNTs to be fast, then they have to live with the INSERT, UPDATE, DELETE cost. The above doesn't work with anything complex such as joins, but it's certainly a start and I think satisfies everyone's gripes other than the tuple churn that _does_ happen (*nudge nudge*, pg_autovacuum could be integrated into the backend to handle this).

Those worried about performance: the pages that are constantly being recycled would likely stay in disk cache (PG or the OS). There's still some commit overhead, but still... no need to over-optimize by requiring the table to be stored in the outdated, slow, and overused shm (also, *nudge nudge*).

Anyway, let me throw that out there as a solution that I use and it works quite well. I didn't explain the use of the qual column, but I think those who grasp the above way of handling things probably grok how to use the qual column in a dynamically executed query.

CREATE AGGREGATE CACHE, anyone?

-sc

--
Sean Chittenden
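For illustration, here is a guess at what function #3 (the INSERT path) might look like. The table and key layout follow Sean's schema above, but the body is a hypothetical sketch, not his actual code, and it keys the cache on the bare relation name for simplicity:

    -- Hypothetical AFTER INSERT trigger maintaining a COUNT row in the cache.
    -- Assumes the (dbl, 'COUNT') row was already seeded by function #1.
    CREATE FUNCTION public.cache_count_insert() RETURNS trigger AS '
    BEGIN
        UPDATE global.dba_aggregate_cache
           SET val_bigint = val_bigint + 1
         WHERE dbl = TG_RELNAME AND op = ''COUNT'';
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER mytable_cache_count AFTER INSERT ON mytable
        FOR EACH ROW EXECUTE PROCEDURE public.cache_count_insert();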
In article <3F7D172E.3060107@persistent.co.in>, Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> If I were you, I would have an autovacuum daemon running, and rather
> than doing select count(*), I would look at the stats generated by
> vacuums. They give an approximate number of tuples, and that should be
> good enough; it is accurate to within a percent or so.

The stats might indeed be a good estimate, provided there were not many changes since the last VACUUM. But how about a variant of COUNT(*) using an index? It would not be quite exact, since it might count tuples not visible in the current transaction, but it might be a much better estimate than the stats.