Thread: group by

group by

From

Kathy Zhu

Date:

03 October 2003, 19:31:07

Hi,

I notices a weird thing here.

version 7.2.1
on Solaris

table "test", has a field "state".
There are 4 "state" values, 1, 2, 3, 4.

select count(*) from test group by state;
took 11500 msec

but

select count(*) from test where state = 1;
select count(*) from test where state = 2;
select count(*) from test where state = 3;
select count(*) from test where state = 4;
total took 626 msec

Why ??

thanks,
kathy

Re: group by

From

Dennis Gearon

Date:

03 October 2003, 19:45:52

When was the last time you vacuumed full?

Kathy Zhu wrote:

>Hi,
>
>I notices a weird thing here.
>
>version 7.2.1
>on Solaris
>
>table "test", has a field "state".
>There are 4 "state" values, 1, 2, 3, 4.
>
>select count(*) from test group by state;
>took 11500 msec
>
>but
>
>select count(*) from test where state = 1;
>select count(*) from test where state = 2;
>select count(*) from test where state = 3;
>select count(*) from test where state = 4;
>total took 626 msec
>
>Why ??
>
>thanks,
>kathy
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faqs/FAQ.html
>
>
>

Re: group by

From

"Nigel J. Andrews"

Date:

03 October 2003, 19:53:19

On Fri, 3 Oct 2003, Kathy Zhu wrote:

> Hi,
>
> I notices a weird thing here.
>
> version 7.2.1
> on Solaris
>
> table "test", has a field "state".
> There are 4 "state" values, 1, 2, 3, 4.
>
> select count(*) from test group by state;
> took 11500 msec
>
> but
>
> select count(*) from test where state = 1;
> select count(*) from test where state = 2;
> select count(*) from test where state = 3;
> select count(*) from test where state = 4;
> total took 626 msec
>
> Why ??

First thought was caching: the disk blocks are cached in memory after the first
qery so all the others just fetch from there.

Second thought: well I haven't really had it but indexes and sorting is sort of
sloshing around in my mind. Not sure how that applies to here since I can't
remember if the first would require the sort step having retrieved all the
tuples and the others would just use the index pages.

It is Friday though.

--
Nigel J. Andrews

Re: group by

From

Kris Jurka

Date:

03 October 2003, 20:12:05

On Fri, 3 Oct 2003, Kathy Zhu wrote:

> Hi,
>
> I notices a weird thing here.
>
> version 7.2.1
> on Solaris
>
> table "test", has a field "state".
> There are 4 "state" values, 1, 2, 3, 4.
>
> select count(*) from test group by state;
> took 11500 msec
>
> but
>
> select count(*) from test where state = 1;
> select count(*) from test where state = 2;
> select count(*) from test where state = 3;
> select count(*) from test where state = 4;
> total took 626 msec
>

I believe Solaris's qsort implementation (which is used for the group by)
has problems dealing with large numbers of similar values.

I think in later versions of pg our own qsort is used.

Kris Jurka

Re: group by

From

Kathy Zhu

Date:

03 October 2003, 20:33:16

I did a vacuum and got the same result.
I think the problem lies in there is swapping going for groupby when there is a
large number of rows in the table, 5000 in this case.

I guess I have to use group by with caution.

thanks for the all the replies though,
kathy

> Date: Fri, 03 Oct 2003 15:53:02 -0700
> From: Dennis Gearon <gearond@fireserve.net>
> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4)
Gecko/20030624
> X-Accept-Language: en-us, ru, es-mx
> To: Kathy Zhu <Kathy.Zhu@sun.com>
> CC: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] group by
> Content-Transfer-Encoding: 7bit
>
> When was the last time you vacuumed full?
>
> Kathy Zhu wrote:
>
> >Hi,
> >
> >I notices a weird thing here.
> >
> >version 7.2.1
> >on Solaris
> >
> >table "test", has a field "state".
> >There are 4 "state" values, 1, 2, 3, 4.
> >
> >select count(*) from test group by state;
> >took 11500 msec
> >
> >but
> >
> >select count(*) from test where state = 1;
> >select count(*) from test where state = 2;
> >select count(*) from test where state = 3;
> >select count(*) from test where state = 4;
> >total took 626 msec
> >
> >Why ??
> >
> >thanks,
> >kathy
> >
> >
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 5: Have you checked our extensive FAQ?
> >
> >               http://www.postgresql.org/docs/faqs/FAQ.html
> >
> >
> >
>

Re: group by

From

Bruno Wolff III

Date:

04 October 2003, 13:17:20

On Fri, Oct 03, 2003 at 17:32:22 -0600,
  Kathy Zhu <Kathy.Zhu@Sun.COM> wrote:
> I did a vacuum and got the same result.
> I think the problem lies in there is swapping going for groupby when there is a
> large number of rows in the table, 5000 in this case.
>
> I guess I have to use group by with caution.

In 7.4 you will probably find the results more compatible.
For 7.3 and lower, the first case will use a sort to do a group by.
No sort will be done in the second example.
In 7.4 the groub by will use the new hash aggregate method and will
probably be significantly faster than the second way of doing things.

Re: group by

From

Kathy Zhu

Date:

06 October 2003, 12:24:48

That was also one of my guesses why groupby takes longer, although it is not
mentioned in the doc.
thanks !!!
kathy

> Date: Sat, 4 Oct 2003 11:23:48 -0500
> From: Bruno Wolff III <bruno@wolff.to>
> To: Kathy Zhu <Kathy.Zhu@sun.com>
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] group by
> Mail-Followup-To: Kathy Zhu <Kathy.Zhu@Sun.COM>, pgsql-general@postgresql.org
> Content-Disposition: inline
> User-Agent: Mutt/1.5.4i
>
> On Fri, Oct 03, 2003 at 17:32:22 -0600,
>   Kathy Zhu <Kathy.Zhu@Sun.COM> wrote:
> > I did a vacuum and got the same result.
> > I think the problem lies in there is swapping going for groupby when there
is a
> > large number of rows in the table, 5000 in this case.
> >
> > I guess I have to use group by with caution.
>
> In 7.4 you will probably find the results more compatible.
> For 7.3 and lower, the first case will use a sort to do a group by.
> No sort will be done in the second example.
> In 7.4 the groub by will use the new hash aggregate method and will
> probably be significantly faster than the second way of doing things.

Re: group by

From

Bruno Wolff III

Date:

06 October 2003, 12:38:53

On Mon, Oct 06, 2003 at 09:23:05 -0600,
  Kathy Zhu <Kathy.Zhu@sun.com> wrote:
> That was also one of my guesses why groupby takes longer, although it is not
> mentioned in the doc.

If you want to know how a query is being done, you can use explain
(or explain analyze if you want timing information) to see the
plan.