Re: Performance of count(*) - Mailing list pgsql-performance

From Bill Moran
Subject Re: Performance of count(*)
Msg-id 20070322083130.1de9dc5b.wmoran@collaborativefusion.com
In response to Re: Performance of count(*)  (ismo.tuononen@solenovo.fi)
List pgsql-performance
In response to ismo.tuononen@solenovo.fi:
>
> approximated count?????
>
> why? who would need it? where can you use it?
>
> calculating costs and deciding how to execute a query needs an
> approximated count, but it's totally worthless information for any user
> IMO.

I don't think so.

We have some AJAX stuff where users enter search criteria on a web form,
and the # of results updates in "real time" as they change their criteria.

Right now, this works fine with small tables using count(*) -- it's fast
enough not to be an issue, but we're aware that we can't use it on large
tables.

An estimate_count(*) or similar that would allow us to put an estimate of
how many results will be returned (not guaranteed accurate) would be very
nice to have in these cases.
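
One rough way to approximate this today is to run EXPLAIN (without ANALYZE) on the search query itself and read the planner's row estimate off the top plan node. The Python sketch below only does the parsing step; the function name estimated_rows is hypothetical, it assumes the plain text EXPLAIN format shown in the plan quoted further down, and the number is only as good as the table statistics the planner has.

```python
import re

# Matches the estimate part of a plan node line such as
#   "Seq Scan on agiraw  (cost=0.00..185197.41 rows=4708941 width=0)"
_ROWS_RE = re.compile(r"\(cost=[\d.]+\.\.[\d.]+ rows=(\d+) width=\d+\)")


def estimated_rows(plan_text):
    """Return the planner's row estimate for the first (top) plan node.

    plan_text is the text output of EXPLAIN for the search query itself,
    not for SELECT count(*) over it: a plain count(*) puts an Aggregate
    node on top, and that node reports rows=1.
    Returns None if no plan node line is found.
    """
    for line in plan_text.splitlines():
        match = _ROWS_RE.search(line)
        if match:
            return int(match.group(1))
    return None


# The scan line from the plan quoted in this thread.
plan = " Seq Scan on agiraw  (cost=0.00..185197.41 rows=4708941 width=0)\n"
print(estimated_rows(plan))  # prints 4708941
```

This is an estimate from statistics, not a count, so it can be well off for complex WHERE clauses and joins.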

We're dealing with complex sets of criteria.  It's very useful for the users
to know in "real time" how much their search criteria are affecting the
result pool.  Once they feel they've limited as much as they can without
reducing the pool too much, they can hit submit and get the actual result.

As I said, we do this with small data sets, but it's not terribly useful
there.  Where it will be useful is searches of large data sets, where
constantly submitting and then retrying is overly time-consuming.

Of course, this is count(*)ing the results of a complex query, possibly
with a bunch of joins and many limitations in the WHERE clause, so I'm
not sure what could be done overall to improve the response time.

> On Thu, 22 Mar 2007, Albert Cervera Areny wrote:
>
> > As you can see, PostgreSQL needs to do a sequential scan to count because of
> > its MVCC nature: indices don't carry transaction information. It's a known
> > drawback inherent to the way PostgreSQL works, a design which gives very good
> > results in other areas. There has been talk of adding some kind of
> > approximated count which wouldn't need a full table scan, but I don't think
> > there's anything there right now.
> >
> > On Thursday 22 March 2007 11:53, Andreas Tille wrote:
> > > Hi,
> > >
> > > I'm trying to find out why a simple count(*) might take so long.
> > > At first I tried explain, which rather quickly knows how many rows
> > > to check, but the final count is two orders of magnitude slower.
> > >
> > > My colleague, who uses MS SQL Server, can't believe that.
> > >
> > > $ psql InfluenzaWeb -c 'explain SELECT count(*) from agiraw ;'
> > >                                QUERY PLAN
> > > -----------------------------------------------------------------------
> > >   Aggregate  (cost=196969.77..196969.77 rows=1 width=0)
> > >     ->  Seq Scan on agiraw  (cost=0.00..185197.41 rows=4708941 width=0)
> > > (2 rows)
> > >
> > > real    0m0.066s
> > > user    0m0.024s
> > > sys     0m0.008s
> > >
> > > $ psql InfluenzaWeb -c 'SELECT count(*) from agiraw ;'
> > >    count
> > > ---------
> > >   4708941
> > > (1 row)
> > >
> > > real    0m4.474s
> > > user    0m0.036s
> > > sys     0m0.004s
> > >
> > >
> > > Any explanation?
> > >
> > > Kind regards
> > >
> > >           Andreas.


--
Bill Moran
Collaborative Fusion Inc.

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

