Thread: count(*) using index scan in "query often, update rarely" environment
Hello all
First of all, I do understand why pgsql, with its MVCC design, has to examine tuples to evaluate "count(*)" and "count(*) where (...)" queries in an environment with heavy concurrent updates.
This kind of usage IMHO isn't the average one. There are many circumstances with a rather "query often, update rarely" character.
Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query?
(sorry for disturbing if this was already discussed)
Regards,
Cestmir Hybl
Re: count(*) using index scan in "query often, update rarely" environment
From
hubert depesz lubaczewski
Date:
On 10/7/05, Cestmir Hybl <cestmirl@freeside.sk> wrote:
Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query?
if i understand your problem correctly, then simple usage of triggers will do the job just fine.
hubert
Yes, I can possibly use triggers to maintain counts of several fixed groups of records or the total record count (but it's impractical).
No, I can't speed up evaluation of generic "count(*) where ()" queries this way.
My question was rather about the general performance of count() queries in an environment with infrequent updates.
Cestmir
----- Original Message -----
To: Cestmir Hybl
Sent: Friday, October 07, 2005 11:54 AM
Subject: Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

On 10/7/05, Cestmir Hybl <cestmirl@freeside.sk> wrote:
> Isn't it possible (and reasonable) for these environments to keep track of
> whether there is a transaction in progress with update to given table and
> if not, use an index scan (count(*) where) or cached value (count(*)) to
> perform this kind of query?

if i understand your problem correctly, then simple usage of triggers will do the job just fine.
hubert
Re: count(*) using index scan in "query often, update rarely" environment
From
"Steinar H. Gunderson"
Date:
On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> Isn't it possible (and reasonable) for these environments to keep track of
> whether there is a transaction in progress with update to given table and
> if not, use an index scan (count(*) where) or cached value (count(*)) to
> perform this kind of query?

Even if there is no running update, there might still be dead rows in the
table. In any case, of course, a new update could always be occurring while
your counting query was still running.

/* Steinar */
--
Homepage: http://www.sesse.net/
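
Steinar's point about dead rows can be illustrated with a toy model. The sketch below is not PostgreSQL's actual visibility logic: the names `TupleVersion`, `visible` and `count_visible` are made up for illustration, and snapshots are reduced to a single transaction id. It only assumes what the thread itself relies on, namely that an index carries an entry for every row version and knows nothing about visibility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    key: str
    xmin: int                   # transaction id that created this row version
    xmax: Optional[int] = None  # transaction id that deleted or replaced it, if any


def visible(t: TupleVersion, snapshot_xid: int, committed: set) -> bool:
    """Toy rule: a version is visible if a committed transaction older than the
    snapshot created it and no committed transaction older than the snapshot
    deleted it."""
    created = t.xmin in committed and t.xmin < snapshot_xid
    deleted = t.xmax is not None and t.xmax in committed and t.xmax < snapshot_xid
    return created and not deleted


def count_visible(heap: list, snapshot_xid: int, committed: set) -> int:
    """count(*) as the toy model sees it: every version has to be examined."""
    return sum(1 for t in heap if visible(t, snapshot_xid, committed))


heap = [
    TupleVersion("a", xmin=1),
    TupleVersion("b", xmin=1, xmax=2),  # old version of "b", dead once xid 2 committed
    TupleVersion("b", xmin=2),          # current version of "b"
    TupleVersion("c", xmin=1, xmax=3),  # deleted by xid 3
]
committed = {1, 2, 3}  # no transaction is running any more

# Assumption of this model: one index entry per row version, dead or alive.
index_entries = len(heap)

print(index_entries)                                      # entries in the index
print(count_visible(heap, snapshot_xid=4, committed=committed))  # rows a new query sees
print(count_visible(heap, snapshot_xid=3, committed=committed))  # rows an older snapshot sees
```

Even with no update in progress, the number of index entries and the number of visible rows differ, and two snapshots can legitimately disagree about the count.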
collision: it's possible to either block updating transaction until index scan ends or discard index scan immediately and finish query using MVCC compliant scan

dead rows: this sounds like more serious counter-argument, I don't know much about dead records management and whether it would be possible/worth to make indexes matching live records when there's no transaction in progress on that table

----- Original Message -----
From: "Steinar H. Gunderson" <sgunderson@bigfoot.com>
To: <pgsql-performance@postgresql.org>
Sent: Friday, October 07, 2005 12:48 PM
Subject: Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

> On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
>> Isn't it possible (and reasonable) for these environments to keep track of
>> whether there is a transaction in progress with update to given table and
>> if not, use an index scan (count(*) where) or cached value (count(*)) to
>> perform this kind of query?
>
> Even if there is no running update, there might still be dead rows in the
> table. In any case, of course, a new update could always be occurring while
> your counting query was still running.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
On Fri, Oct 07, 2005 at 01:14:20PM +0200, Cestmir Hybl wrote:

> collision: it's possible to either block updating transaction until
> index scan ends or discard index scan immediately and finish query using
> MVCC compliant scan

You can't change from one scan method to a different one on the fly.
There's no way to know which tuples have already been returned.

Our index access methods are designed to be very concurrent, and it works
extremely well. One index scan being able to block an update would
destroy that advantage.

> dead rows: this sounds like more serious counter-argument, I don't know
> much about dead records management and whether it would be
> possible/worth to make indexes matching live records when there's no
> transaction in progress on that table

It's not possible, because a finishing transaction would have to clean up
every index it has used, and also any index it hasn't used but has been
modified by another transaction which couldn't clean up by itself but
didn't do the work because the first one was looking at the index. It's
easy to see that it's possible to create an unbounded number of
transactions, each forcing the other to do some index cleanup. This is
not acceptable. Plus, it would be very hard to implement, and a very
wide door to bugs.

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Et put se mouve" (Galileo Galilei)

"Cestmir Hybl" <cestmirl@freeside.sk> writes:
> Isn't it possible (and reasonable) for these environments to keep track
> of whether there is a transaction in progress with update to given table
> and if not, use an index scan (count(*) where) or cached value
> (count(*)) to perform this kind of query?

Please read the archives before bringing up such well-discussed issues.

There's a workable-looking design in the archives (pghackers probably)
for maintaining overall table counts in a separate table, with each
transaction adding one row of "delta" information just before it
commits. I haven't seen anything else that looks remotely attractive.

regards, tom lane
Re: count(*) using index scan in "query often, update rarely" environment
From
"Merlin Moncure"
Date:
On 10/7/05, Cestmir Hybl <cestmirl@freeside.sk> wrote:
> Isn't it possible (and reasonable) for these environments to keep track of
> whether there is a transaction in progress with update to given table and
> if not, use an index scan (count(*) where) or cached value (count(*)) to
> perform this kind of query?

The answer to the first question is subtle. Basically, the PostgreSQL engine is designed for high concurrency. We are definitely on the right side of the cost/benefit tradeoff here. SQL server does not have MVCC (or at least until 2005 appears) so they are on the other side of the tradeoff.

You can of course serialize the access yourself by materializing the count in a small table and use triggers or cleverly designed transactions. This is trickier than it might look however so check the archives for a thorough treatment of the topic.

One interesting thing is that making count(*) over large swaths of data is frequently an indicator of a poorly normalized database. Is it possible to optimize the counting by laying out your data in a different way?

Merlin
Tom Lane wrote:
>
> There's a workable-looking design in the archives (pghackers probably)
> for maintaining overall table counts in a separate table, with each
> transaction adding one row of "delta" information just before it
> commits. I haven't seen anything else that looks remotely attractive.

It might be useful if there was a way to trap certain queries and
rewrite/replace them. That way more complex queries could be
transparently redirected to a summary table etc. I'm guessing that the
overhead to check every query would quickly destroy any gains though.

--
Richard Huxton
Archonet Ltd
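
The "delta" design Tom Lane mentions is only summarized in this thread, so the following is a rough sketch of one way it could look, not the design from the archives. It uses SQLite through Python's sqlite3 module purely so that it runs self-contained; the table `items_count_delta` and the helper functions are hypothetical names, and a PostgreSQL version would differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT);
    -- Hypothetical side table: one row of delta per committed transaction.
    CREATE TABLE items_count_delta (delta INTEGER NOT NULL);
""")


def insert_items(conn, payloads):
    """Insert rows and, just before commit, record the net change in row count."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany("INSERT INTO items (payload) VALUES (?)",
                         [(p,) for p in payloads])
        conn.execute("INSERT INTO items_count_delta (delta) VALUES (?)",
                     (len(payloads),))


def delete_item(conn, item_id):
    """Delete one row and record a negative delta if something was deleted."""
    with conn:
        cur = conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
        if cur.rowcount:
            conn.execute("INSERT INTO items_count_delta (delta) VALUES (?)",
                         (-cur.rowcount,))


def fast_count(conn):
    """Row count derived from the deltas instead of scanning items."""
    return conn.execute(
        "SELECT COALESCE(SUM(delta), 0) FROM items_count_delta").fetchone()[0]


def collapse_deltas(conn):
    """Fold all delta rows into a single row to keep the side table small."""
    with conn:
        total = fast_count(conn)
        conn.execute("DELETE FROM items_count_delta")
        conn.execute("INSERT INTO items_count_delta (delta) VALUES (?)", (total,))


insert_items(conn, ["a", "b", "c"])
delete_item(conn, 1)
collapse_deltas(conn)
print(fast_count(conn))
```

Because each transaction appends its own delta row rather than updating a shared counter, the delta becomes visible together with the rest of the transaction's work. The periodic collapse step is an assumption added here to keep the sum cheap.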
Re: count(*) using index scan in "query often, update rarely" environment
From
hubert depesz lubaczewski
Date:
On 10/7/05, Cestmir Hybl <cestmirl@freeside.sk> wrote:
No, I can't speed-up evaluation of generic "count(*) where ()" queries this way.
no, you can't speed up generic where(), *but* you can check which "where" conditions are the most common. for example, i usually do a where on one column, like:
select count(*) from table where some_particular_column = 'some value';
then you can simply make the trigger aware of the fact that it should count based on the value in some_particular_column.
works good enough for me not to look for alternatives.
depesz
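
A minimal sketch of the kind of trigger depesz describes, keeping one counter per value of some_particular_column. It is written against SQLite via Python's sqlite3 module only so that it runs on its own; in PostgreSQL the same idea would be a PL/pgSQL trigger function. The table names `t` and `t_counts` and the helper `counted` are assumptions for illustration, not anything from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, some_particular_column TEXT NOT NULL);
    -- Hypothetical summary table: one counter per distinct column value.
    CREATE TABLE t_counts (value TEXT PRIMARY KEY, n INTEGER NOT NULL);

    CREATE TRIGGER t_count_ins AFTER INSERT ON t BEGIN
        INSERT OR IGNORE INTO t_counts (value, n) VALUES (NEW.some_particular_column, 0);
        UPDATE t_counts SET n = n + 1 WHERE value = NEW.some_particular_column;
    END;

    CREATE TRIGGER t_count_del AFTER DELETE ON t BEGIN
        UPDATE t_counts SET n = n - 1 WHERE value = OLD.some_particular_column;
    END;

    CREATE TRIGGER t_count_upd AFTER UPDATE OF some_particular_column ON t BEGIN
        UPDATE t_counts SET n = n - 1 WHERE value = OLD.some_particular_column;
        INSERT OR IGNORE INTO t_counts (value, n) VALUES (NEW.some_particular_column, 0);
        UPDATE t_counts SET n = n + 1 WHERE value = NEW.some_particular_column;
    END;
""")


def counted(conn, value):
    """Look up the maintained counter instead of running count(*) on t."""
    row = conn.execute("SELECT n FROM t_counts WHERE value = ?", (value,)).fetchone()
    return row[0] if row else 0


conn.executemany("INSERT INTO t (some_particular_column) VALUES (?)",
                 [("red",), ("red",), ("blue",)])
conn.execute("UPDATE t SET some_particular_column = 'blue' WHERE id = 1")
conn.execute("DELETE FROM t WHERE id = 2")
conn.commit()

print(counted(conn, "red"), counted(conn, "blue"))
```

This only helps for the conditions the trigger was written for, which is the limitation Cestmir raised. Under concurrent writers every change to the same value updates the same counter row, so writers serialize on it.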
Re: count(*) using index scan in "query often, update rarely" environment
From
mark@mark.mielke.cc
Date:
On Fri, Oct 07, 2005 at 12:48:16PM +0200, Steinar H. Gunderson wrote:
> On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> > Isn't it possible (and reasonable) for these environments to keep track of
> > whether there is a transaction in progress with update to given table and
> > if not, use an index scan (count(*) where) or cached value (count(*)) to
> > perform this kind of query?
> Even if there is no running update, there might still be dead rows in the
> table. In any case, of course, a new update could always be occurring while
> your counting query was still running.

I don't see this being different from count(*) as it is today.

Updating a count column is certainly clever. If using a trigger,
perhaps it would allow the equivalent of:

    select count(*) from table for update;

:-)

Cheers,
mark

(not that this is necessarily a good thing!)

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Neighbourhood Coder
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/