Re: Yet Another COUNT(*)...WHERE...question - Mailing list pgsql-general

From Andrew Armstrong
Subject Re: Yet Another COUNT(*)...WHERE...question
Date
Msg-id 001401c7df52$95b30c00$b6f3a305@plasma
Whole thread Raw
In response to Re: Yet Another COUNT(*)...WHERE...question  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-general
I don't know how PSQL does it, but MySQL has an SQL_CALC_FOUND_ROWS
extension which allows the query to also return how many rows exist without
the LIMIT clause. Perhaps there is similar for PSQL (check LIMIT docs?)

- Andrew

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Scott Marlowe
Sent: Thursday, 16 August 2007 1:24 AM
To: Phoenix Kiula
Cc: Gregory Stark; Postgres General
Subject: Re: [GENERAL] Yet Another COUNT(*)...WHERE...question

On 8/15/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On 15/08/07, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> > On 8/15/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> > > On 15/08/07, Gregory Stark <stark@enterprisedb.com> wrote:
> > > > "Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
> > > >
> > > > > I'm grappling with a lot of reporting code for our app that relies
on
> > > > > queries such as:
> > > > >
> > > > >      SELECT COUNT(*) FROM TABLE WHERE ....(conditions)...
> > > > >...
> > > > > The number of such possibilities for multiple WHERE conditions is
> > > > > infinite...
> > > >
> > > > Depends on the "conditions" bit. You can't solve all of the infinite
> > > > possibilities -- well you can, just run the query above -- but if
you want > to do better it's all about understanding your data.
> > >
> > >
> > > I am not sure what the advice here is. The WHERE condition comes from
> > > the indices. So if the query was not "COUNT(*)" but just a couple of
> > > columns, the query executes in less than a second. Just that COUNT(*)
> > > becomes horribly slow.
> >
> > Sorry, but I don't believe you.  if you're doing a count(*) on the
> > same dataset that returns in < 1 second, then the count(*) with the
> > same where clause will run in < 1 second.  I haven't seen pgsql do
> > anything else.
>
>
>
> Sorry I was not clear. Imagine an Amazon.com search results page. It
> has about 15 results on Page 1, then it shows "Page 1 of 190".
>
> To show each page, the query probably has a "LIMIT 15 OFFSET 0" for
> Page 1. However, to calculate the total number of pages, they probably
> do a separate counts query, because doing a "select *" and then
> counting the number of rows returned would be even more inefficient
> than a count(*).

When I go to amazon.com I only ever get three pages of results.  ever.
 Because they know that returning 190 pages is not that useful, as
hardly anyone is going to wander through that many pages.

Google, you'll notice says "Results 1 - 10 of about 5,610,000 for
blacksmith"  i.e. it's guesstimating as well.  no reason for google to
look at every single row for blacksmith to know that there's about 5.6
million.

> So, in reporting, two queries are fairly common I would think, unless
> I am missing something?

Yes, optimization. :)  You don't need an exact count to tell someone
that there's more data and they can go to it.  Note that if you are
planning on doing things google sized, you'll need to do what they
did, invent your own specialized database.

For us mere mortals, it's quite likely that you can do something like:

explain select * from table where field like 'abc%';

and then parse the explain output for an approximate number.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq



pgsql-general by date:

Previous
From: Decibel!
Date:
Subject: Re: memory optimization
Next
From: Decibel!
Date:
Subject: Re: Interpreting statistics collector output