Thread: count(*) using index scan in "query often, update rarely" environment

From

"Cestmir Hybl"

Date:

07 October 2005, 06:24:19

Hello all

First of all, I do understand why pgsql with its MVCC design has to examine tuples to evaluate "count(*)" and "count(*) where (...)" queries in an environment with heavy concurrent updates.

This kind of usage IMHO isn't the average one. There are many circumstances with a rather "query often, update rarely" character.

Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query?

(sorry for disturbing if this was already discussed)

Regards,

Cestmir Hybl

Re: count(*) using index scan in "query often, update rarely" environment

From

hubert depesz lubaczewski

Date:

07 October 2005, 06:54:29

On 10/7/05, Cestmir Hybl <cestmirl@freeside.sk> wrote:

Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query?

if i understand your problem correctly, then simple usage of triggers will do the job just fine.
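
A minimal sketch of that trigger approach, for the plain count(*) case only. Assumptions: SQLite (via Python's sqlite3 module) stands in for PostgreSQL so the example runs anywhere, and the names items, items_count, items_ins and items_del are hypothetical. In PostgreSQL the trigger bodies would be written as trigger functions instead, and concurrent writers would serialize on the single counter row.

```python
import sqlite3

# SQLite is only a stand-in for PostgreSQL; all table and trigger
# names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT);

-- One-row side table holding the cached total.
CREATE TABLE items_count (n INTEGER NOT NULL);
INSERT INTO items_count VALUES (0);

-- Keep the cached total in step with every insert and delete.
CREATE TRIGGER items_ins AFTER INSERT ON items
BEGIN
    UPDATE items_count SET n = n + 1;
END;

CREATE TRIGGER items_del AFTER DELETE ON items
BEGIN
    UPDATE items_count SET n = n - 1;
END;
""")

conn.executemany("INSERT INTO items (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM items WHERE payload = 'b'")
conn.commit()

# Reading the cached value touches one row instead of scanning the table.
cached_total = conn.execute("SELECT n FROM items_count").fetchone()[0]
real_total = conn.execute("SELECT count(*) FROM items").fetchone()[0]
print(cached_total, real_total)
```

The cached value replaces only the unqualified count(*); a "count(*) where (...)" query still has to scan.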

hubert

Re: count(*) using index scan in "query often, update rarely" environment

From

"Cestmir Hybl"

Date:

07 October 2005, 07:14:18

Yes, I can possibly use triggers to maintain counts of several fixed groups of records or a total record count (but it's impractical).

No, I can't speed-up evaluation of generic "count(*) where ()" queries this way.

My question was rather about general performance of count() queries in an environment with infrequent updates.

Cestmir

Re: count(*) using index scan in "query often, update rarely" environment

From

"Steinar H. Gunderson"

Date:

07 October 2005, 07:48:24

On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> Isn't it possible (and reasonable) for these environments to keep track of
> whether there is a transaction in progress with update to given table and
> if not, use an index scan (count(*) where) or cached value (count(*)) to
> perform this kind of query?

Even if there is no running update, there might still be dead rows in the
table. In any case, of course, a new update could always be occurring while
your counting query was still running.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: count(*) using index scan in "query often, update rarely" environment

From

"Cestmir Hybl"

Date:

07 October 2005, 08:14:06

collision: it's possible to either block updating transaction until index
scan ends or discard index scan immediately and finish query using MVCC
compliant scan

dead rows: this sounds like more serious counter-argument, I don't know much
about dead records management and whether it would be possible/worth to
make indexes matching live records when there's no transaction in progress
on that table

Re: count(*) using index scan in "query often, update rarely" environment

From

Alvaro Herrera

Date:

07 October 2005, 10:07:20

On Fri, Oct 07, 2005 at 01:14:20PM +0200, Cestmir Hybl wrote:
> collision: it's possible to either block updating transaction until
> index scan ends or discard index scan immediately and finish query using
> MVCC compliant scan

You can't change from one scan method to a different one on the fly.
There's no way to know which tuples have already been returned.

Our index access methods are designed to be very concurrent, and it
works extremely well.  One index scan being able to block an update
would destroy that advantage.

> dead rows: this sounds like more serious counter-argument, I don't know
> much about dead records management and whether it would be
> possible/worth to make indexes matching live records when there's no
> transaction in progress on that table

It's not possible, because a finishing transaction would have to clean
up every index it has used, and also any index it hasn't used but has
been modified by another transaction which couldn't clean up by itself
but didn't do the work because the first one was looking at the index.
It's easy to see that it's possible to create an unbounded number of
transactions, each forcing the other to do some index cleanup.  This is
not acceptable.

Plus, it would be very hard to implement, and a very wide door to bugs.

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"Et put se mouve" (Galileo Galilei)

Re: count(*) using index scan in "query often, update rarely" environment

From

Tom Lane

Date:

07 October 2005, 10:42:32

"Cestmir Hybl" <cestmirl@freeside.sk> writes:
> Isn't it possible (and reasonable) for these environments to keep track
> of whether there is a transaction in progress with update to given table
> and if not, use an index scan (count(*) where) or cached value
> (count(*)) to perform this kind of query?

Please read the archives before bringing up such well-discussed issues.

There's a workable-looking design in the archives (pghackers probably)
for maintaining overall table counts in a separate table, with each
transaction adding one row of "delta" information just before it
commits.  I haven't seen anything else that looks remotely attractive.
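
One possible reading of that delta design, as a minimal sketch rather than the archived proposal itself. Assumptions: SQLite (via Python's sqlite3 module) stands in for PostgreSQL, and the names items, items_count_delta, run_transaction, cached_count and compact are hypothetical. Each transaction appends its own net change as one row, so writers never contend for a shared counter row; the count is the sum of the deltas, and an occasional compaction folds them together.

```python
import sqlite3

# SQLite is only a stand-in for PostgreSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT);
-- Append-only log of per-transaction changes to the row count.
CREATE TABLE items_count_delta (delta INTEGER NOT NULL);
""")

def run_transaction(conn, inserts=(), delete_ids=()):
    """Apply the changes, then record their net effect as one delta row."""
    with conn:  # commits on success, rolls back on an exception
        delta = 0
        for payload in inserts:
            conn.execute("INSERT INTO items (payload) VALUES (?)", (payload,))
            delta += 1
        for item_id in delete_ids:
            cur = conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
            delta -= cur.rowcount  # only count rows that really went away
        if delta:
            # Written last, just before the commit.
            conn.execute("INSERT INTO items_count_delta VALUES (?)", (delta,))

def cached_count(conn):
    """Table count derived from the delta rows alone."""
    row = conn.execute(
        "SELECT coalesce(sum(delta), 0) FROM items_count_delta").fetchone()
    return row[0]

def compact(conn):
    """Fold all delta rows into a single row to keep the side table small."""
    with conn:
        total = cached_count(conn)
        conn.execute("DELETE FROM items_count_delta")
        conn.execute("INSERT INTO items_count_delta VALUES (?)", (total,))

run_transaction(conn, inserts=["a", "b", "c"])
run_transaction(conn, delete_ids=[2])
count_before = cached_count(conn)
compact(conn)
count_after = cached_count(conn)
delta_rows = conn.execute(
    "SELECT count(*) FROM items_count_delta").fetchone()[0]
real_total = conn.execute("SELECT count(*) FROM items").fetchone()[0]
```

Because a delta row becomes visible only when its transaction commits, summing the side table under MVCC should give each reader a count consistent with its own snapshot.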

            regards, tom lane

Re: count(*) using index scan in "query often, update rarely" environment

From

"Merlin Moncure"

Date:

07 October 2005, 10:50:31

On 10/7/05, Cestmir Hybl <cestmirl@freeside.sk> wrote:
Isn't it possible (and reasonable) for these environments to keep track
of whether there is a transaction in progress with update to given table
and if not, use an index scan (count(*) where) or cached value
(count(*)) to perform this kind of query?
________________________________________

The answer to the first question is subtle.  Basically, the PostgreSQL
engine is designed for high concurrency.  We are definitely on the right
side of the cost/benefit tradeoff here.  SQL Server does not have MVCC
(or at least will not until the 2005 release appears), so it is on the
other side of the tradeoff.

You can of course serialize the access yourself by materializing the
count in a small table and using triggers or cleverly designed
transactions.  This is trickier than it might look, however, so check the
archives for a thorough treatment of the topic.

One interesting thing is that making count(*) over large swaths of data
is frequently an indicator of a poorly normalized database.  Is it
possible to optimize the counting by laying out your data in a different
way?

Merlin

Re: count(*) using index scan in "query often, update rarely"

From

Richard Huxton

Date:

07 October 2005, 11:32:24

Tom Lane wrote:
>
> There's a workable-looking design in the archives (pghackers probably)
> for maintaining overall table counts in a separate table, with each
> transaction adding one row of "delta" information just before it
> commits.  I haven't seen anything else that looks remotely attractive.

It might be useful if there was a way to trap certain queries and
rewrite/replace them. That way more complex queries could be
transparently redirected to a summary table etc. I'm guessing that the
overhead to check every query would quickly destroy any gains though.

--
   Richard Huxton
   Archonet Ltd

Re: count(*) using index scan in "query often, update rarely" environment

From

hubert depesz lubaczewski

Date:

08 October 2005, 07:44:10

On 10/7/05, Cestmir Hybl <cestmirl@freeside.sk> wrote:

No, I can't speed-up evaluation of generic "count(*) where ()" queries this way.

no, you can't speed up a generic where(), *but* you can check which "where" clauses are the most common. i usually filter on one column, like:
select count(*) from table where some_particular_column = 'some value';
in that case you can simply make the trigger aware of the fact that it should count based on the value in some_particular_column.
works well enough for me not to look for alternatives.
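
A minimal sketch of such a column-aware trigger. Assumptions: SQLite (via Python's sqlite3 module) stands in for PostgreSQL, and the names orders, status, orders_count_by_status and the three triggers are hypothetical, with status playing the role of some_particular_column. The side table keeps one counter per distinct value.

```python
import sqlite3

# SQLite is only a stand-in for PostgreSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);

-- One counter row per distinct value of the filtered column.
CREATE TABLE orders_count_by_status (
    status TEXT PRIMARY KEY,
    n      INTEGER NOT NULL
);

CREATE TRIGGER orders_ins AFTER INSERT ON orders
BEGIN
    -- Create the counter row on first sight of a value, then bump it.
    INSERT OR IGNORE INTO orders_count_by_status VALUES (NEW.status, 0);
    UPDATE orders_count_by_status SET n = n + 1 WHERE status = NEW.status;
END;

CREATE TRIGGER orders_del AFTER DELETE ON orders
BEGIN
    UPDATE orders_count_by_status SET n = n - 1 WHERE status = OLD.status;
END;

-- A changed value moves one unit from the old counter to the new one.
CREATE TRIGGER orders_upd AFTER UPDATE OF status ON orders
BEGIN
    UPDATE orders_count_by_status SET n = n - 1 WHERE status = OLD.status;
    INSERT OR IGNORE INTO orders_count_by_status VALUES (NEW.status, 0);
    UPDATE orders_count_by_status SET n = n + 1 WHERE status = NEW.status;
END;
""")

conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("new",), ("new",), ("shipped",)])
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 3")
conn.commit()

# The cached counters should agree with a real grouped count.
cached = dict(conn.execute(
    "SELECT status, n FROM orders_count_by_status"))
real = dict(conn.execute(
    "SELECT status, count(*) FROM orders GROUP BY status"))
print(cached, real)
```

This covers only equality filters on the one column the triggers know about; any other where clause still needs a scan.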

depesz

Re: count(*) using index scan in "query often, update rarely" environment

From

mark@mark.mielke.cc

Date:

08 October 2005, 10:48:22

On Fri, Oct 07, 2005 at 12:48:16PM +0200, Steinar H. Gunderson wrote:
> On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> > Isn't it possible (and reasonable) for these environments to keep track of
> > whether there is a transaction in progress with update to given table and
> > if not, use an index scan (count(*) where) or cached value (count(*)) to
> > perform this kind of query?
> Even if there is no running update, there might still be dead rows in the
> table. In any case, of course, a new update could always be occurring while
> your counting query was still running.

I don't see this being different from count(*) as it is today.

Updating a count column is certainly clever. If using a trigger,
perhaps it would allow the equivalent of:

    select count(*) from table for update;

:-)

Cheers,
mark

(not that this is necessarily a good thing!)

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/