Thread: group by
Hi, I notices a weird thing here. version 7.2.1 on Solaris table "test", has a field "state". There are 4 "state" values, 1, 2, 3, 4. select count(*) from test group by state; took 11500 msec but select count(*) from test where state = 1; select count(*) from test where state = 2; select count(*) from test where state = 3; select count(*) from test where state = 4; total took 626 msec Why ?? thanks, kathy
When was the last time you vacuumed full? Kathy Zhu wrote: >Hi, > >I notices a weird thing here. > >version 7.2.1 >on Solaris > >table "test", has a field "state". >There are 4 "state" values, 1, 2, 3, 4. > >select count(*) from test group by state; >took 11500 msec > >but > >select count(*) from test where state = 1; >select count(*) from test where state = 2; >select count(*) from test where state = 3; >select count(*) from test where state = 4; >total took 626 msec > >Why ?? > >thanks, >kathy > > > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > > >
On Fri, 3 Oct 2003, Kathy Zhu wrote: > Hi, > > I notices a weird thing here. > > version 7.2.1 > on Solaris > > table "test", has a field "state". > There are 4 "state" values, 1, 2, 3, 4. > > select count(*) from test group by state; > took 11500 msec > > but > > select count(*) from test where state = 1; > select count(*) from test where state = 2; > select count(*) from test where state = 3; > select count(*) from test where state = 4; > total took 626 msec > > Why ?? First thought was caching: the disk blocks are cached in memory after the first qery so all the others just fetch from there. Second thought: well I haven't really had it but indexes and sorting is sort of sloshing around in my mind. Not sure how that applies to here since I can't remember if the first would require the sort step having retrieved all the tuples and the others would just use the index pages. It is Friday though. -- Nigel J. Andrews
On Fri, 3 Oct 2003, Kathy Zhu wrote: > Hi, > > I notices a weird thing here. > > version 7.2.1 > on Solaris > > table "test", has a field "state". > There are 4 "state" values, 1, 2, 3, 4. > > select count(*) from test group by state; > took 11500 msec > > but > > select count(*) from test where state = 1; > select count(*) from test where state = 2; > select count(*) from test where state = 3; > select count(*) from test where state = 4; > total took 626 msec > I believe Solaris's qsort implementation (which is used for the group by) has problems dealing with large numbers of similar values. I think in later versions of pg our own qsort is used. Kris Jurka
I did a vacuum and got the same result. I think the problem lies in there is swapping going for groupby when there is a large number of rows in the table, 5000 in this case. I guess I have to use group by with caution. thanks for the all the replies though, kathy > Date: Fri, 03 Oct 2003 15:53:02 -0700 > From: Dennis Gearon <gearond@fireserve.net> > User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 > X-Accept-Language: en-us, ru, es-mx > To: Kathy Zhu <Kathy.Zhu@sun.com> > CC: pgsql-general@postgresql.org > Subject: Re: [GENERAL] group by > Content-Transfer-Encoding: 7bit > > When was the last time you vacuumed full? > > Kathy Zhu wrote: > > >Hi, > > > >I notices a weird thing here. > > > >version 7.2.1 > >on Solaris > > > >table "test", has a field "state". > >There are 4 "state" values, 1, 2, 3, 4. > > > >select count(*) from test group by state; > >took 11500 msec > > > >but > > > >select count(*) from test where state = 1; > >select count(*) from test where state = 2; > >select count(*) from test where state = 3; > >select count(*) from test where state = 4; > >total took 626 msec > > > >Why ?? > > > >thanks, > >kathy > > > > > > > >---------------------------(end of broadcast)--------------------------- > >TIP 5: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faqs/FAQ.html > > > > > > >
On Fri, Oct 03, 2003 at 17:32:22 -0600, Kathy Zhu <Kathy.Zhu@Sun.COM> wrote: > I did a vacuum and got the same result. > I think the problem lies in there is swapping going for groupby when there is a > large number of rows in the table, 5000 in this case. > > I guess I have to use group by with caution. In 7.4 you will probably find the results more compatible. For 7.3 and lower, the first case will use a sort to do a group by. No sort will be done in the second example. In 7.4 the groub by will use the new hash aggregate method and will probably be significantly faster than the second way of doing things.
That was also one of my guesses why groupby takes longer, although it is not mentioned in the doc. thanks !!! kathy > Date: Sat, 4 Oct 2003 11:23:48 -0500 > From: Bruno Wolff III <bruno@wolff.to> > To: Kathy Zhu <Kathy.Zhu@sun.com> > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] group by > Mail-Followup-To: Kathy Zhu <Kathy.Zhu@Sun.COM>, pgsql-general@postgresql.org > Content-Disposition: inline > User-Agent: Mutt/1.5.4i > > On Fri, Oct 03, 2003 at 17:32:22 -0600, > Kathy Zhu <Kathy.Zhu@Sun.COM> wrote: > > I did a vacuum and got the same result. > > I think the problem lies in there is swapping going for groupby when there is a > > large number of rows in the table, 5000 in this case. > > > > I guess I have to use group by with caution. > > In 7.4 you will probably find the results more compatible. > For 7.3 and lower, the first case will use a sort to do a group by. > No sort will be done in the second example. > In 7.4 the groub by will use the new hash aggregate method and will > probably be significantly faster than the second way of doing things.
On Mon, Oct 06, 2003 at 09:23:05 -0600, Kathy Zhu <Kathy.Zhu@sun.com> wrote: > That was also one of my guesses why groupby takes longer, although it is not > mentioned in the doc. If you want to know how a query is being done, you can use explain (or explain analyze if you want timing information) to see the plan.