Thread: Really SLOW using GROUP BY ...!?
Hi,

I would like to know if someone has a solution for me... I have a table with, as of today, about 2.8 million records. The table has good indexes, in our case on idcond and points.

When I do:

    select sum(points) from gains;

With Oracle: 8 sec
With PostgreSQL: 10 sec

OK, that result is fine. But when I do:

    select sum(points) from gains group by idcond;

With Oracle: 22 sec
With PostgreSQL: about 3 minutes!!!

I did a VACUUM ANALYZE of the gains table just before testing. What can I do to optimize this? Why is the GROUP BY so slow?

Thanks for your help!

Regards,
--
Hervé
> But when I do :
> select sum(points) from gains group by idcond;
>
> With Oracle : 22 sec
> With PostGreSQL : about 3 minutes !!!

Try

    select sum(points) from gains
    where idcond >= _minimum_id_cond_value_
    group by idcond;

to see if forced index usage will help. Unfortunately, PG will try to sort the result before grouping anyway, but this trick may help somewhat. Also, use the -S 2048 (or more) backend argument to increase the sort memory size.

Vadim
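Note that Vadim's rewrite is result-preserving: a range predicate on the minimum idcond filters out nothing, it merely gives the planner a reason to consider the index. A small sketch of that equivalence using SQLite through Python (the table contents here are invented, and SQLite's planner is not PostgreSQL's, so this demonstrates only that the two queries return the same answer, not the speedup):

```python
import sqlite3

# Hypothetical miniature version of the "gains" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gains (idcond INTEGER, points INTEGER)")
conn.executemany("INSERT INTO gains VALUES (?, ?)",
                 [(1, 10), (2, 5), (1, 7), (3, 2)])
conn.execute("CREATE INDEX gains_idcond ON gains (idcond)")

# Original query.
plain = conn.execute(
    "SELECT idcond, SUM(points) FROM gains GROUP BY idcond").fetchall()

# Rewritten query: the WHERE clause uses the table's minimum idcond,
# so it excludes no rows.
(min_idcond,) = conn.execute("SELECT MIN(idcond) FROM gains").fetchone()
forced = conn.execute(
    "SELECT idcond, SUM(points) FROM gains "
    "WHERE idcond >= ? GROUP BY idcond", (min_idcond,)).fetchall()

assert dict(plain) == dict(forced)  # same grouped sums either way
```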
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> Try
> select sum(points) from gains where idcond >= _minimum_id_cond_value_
> group by idcond;
> to see if forced index usage will help. Unfortunately, PG will anyway
> try to sort result before grouping,

I beg your pardon?

regression=# set enable_seqscan TO on;
SET VARIABLE
regression=# explain select sum(unique1) from tenk1 group by hundred;
NOTICE:  QUERY PLAN:

Aggregate  (cost=997.39..1047.39 rows=1000 width=8)
  ->  Group  (cost=997.39..1022.39 rows=10000 width=8)
        ->  Sort  (cost=997.39..997.39 rows=10000 width=8)
              ->  Seq Scan on tenk1  (cost=0.00..333.00 rows=10000 width=8)

EXPLAIN
regression=# set enable_seqscan TO off;
SET VARIABLE
regression=# explain select sum(unique1) from tenk1 group by hundred;
NOTICE:  QUERY PLAN:

Aggregate  (cost=0.00..1119.88 rows=1000 width=8)
  ->  Group  (cost=0.00..1094.88 rows=10000 width=8)
        ->  Index Scan using tenk1_hundred on tenk1  (cost=0.00..1069.88 rows=10000 width=8)

EXPLAIN

Unfortunately, neither of these plans is likely to be especially speedy on ~3 million rows. The index scan will just thrash the disk, unless the table has been clustered recently --- and given the deficiencies of our CLUSTER implementation, I'd hesitate to recommend using it.

I have a personal TODO item to see about implementing group + aggregate with a hash table of active aggregate values, per a recent suggestion from devik@cdi.cz. That would allow this query to be done with a sequential scan and no sort, which is probably what Oracle is doing. Won't happen for 7.1, though...

			regards, tom lane
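The hash-table approach Tom describes replaces sort-then-group with a single sequential scan that keeps one running aggregate value per group key in a hash table. A minimal Python sketch of the idea (the rows and names here are illustrative, not PostgreSQL internals):

```python
from collections import defaultdict

def hash_group_sum(rows):
    """One-pass grouped SUM: scan the input once, keeping a running
    total per group key in a hash table, so no sort step is needed."""
    sums = defaultdict(int)
    for idcond, points in rows:   # one sequential scan of the table
        sums[idcond] += points    # update the "active aggregate value"
    return dict(sums)

# Illustrative (idcond, points) rows:
print(hash_group_sum([(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]))
# -> {1: 17, 2: 6, 3: 2}
```

The trade-off is that the hash table must hold one entry per distinct group, which is why sort-based grouping stays useful when the number of groups is very large relative to memory.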
> I beg your pardon?

Got it -- I didn't know about enable_seqscan :-)

> Unfortunately neither of these plans is likely to be especially speedy
> on ~3 million rows. The index scan will just thrash the disk, unless
> the table has been clustered recently --- and given the
> deficiencies of our CLUSTER implementation, I'd hesitate to recommend
> using it.

Easy enough to try -- why not do it?

> I have a personal TODO item to see about implementing group +
> aggregate with a hash table of active aggregate values, per a
> suggestion recently from devik@cdi.cz. That would allow this
> query to be done with a sequential scan and no sort, which is
> probably what Oracle is doing. Won't happen for 7.1 though ...

Definitely a good approach. But for the moment, increasing sort memory is the only hope, and it should help.

Vadim
Hi,

"Mikheev, Vadim" wrote:
>
> > But when I do :
> > select sum(points) from gains group by idcond;
> >
> > With Oracle : 22 sec
> > With PostGreSQL : about 3 minutes !!!
>
> Try
>
> select sum(points) from gains where idcond >= _minimum_id_cond_value_
> group by idcond;
>
> to see if forced index usage will help. Unfortunately, PG will anyway
> try to sort result before grouping, but probably this trick will help
> somehow. Also, use -S 2048 (or more) backend arg to increase sort
> memory size.

I'm using -S 512000 ;))

The result I get is not good ;)

pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

:(

So I VACUUM ANALYZE my table... and then I get exactly the same result: about 3 min! My table has 2729276 records.

The EXPLAIN of your query gives:

NOTICE:  QUERY PLAN:

Aggregate  (cost=488140.30..501773.03 rows=272655 width=8)
  ->  Group  (cost=488140.30..494956.67 rows=2726547 width=8)
        ->  Sort  (cost=488140.30..488140.30 rows=2726547 width=8)
              ->  Seq Scan on gains  (cost=0.00..62890.95 rows=2726547 width=8)

And with my original query:

NOTICE:  QUERY PLAN:

Aggregate  (cost=481763.55..495409.93 rows=272928 width=8)
  ->  Group  (cost=481763.55..488586.74 rows=2729276 width=8)
        ->  Sort  (cost=481763.55..481763.55 rows=2729276 width=8)
              ->  Seq Scan on gains  (cost=0.00..56067.76 rows=2729276 width=8)

It seems to have no effect!?

--
Hervé Piedvache
Elma Ingenierie Informatique
6, rue du Faubourg Saint-Honoré
F-75008 - Paris - France
http://www.elma.fr
Tel: +33-1-44949901
Fax: +33-1-44949902
Email: herve@elma.fr
> > somehow. Also, use -S 2048 (or more) backend arg to increase sort
> > memory size.
>
> I'm using -S 512000 ;))

512MB?! Isn't that too much?

> The result I get is not good ;)
>
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> :(
>
> So I VACUUM ANALYZE my table...
>
> Then I get exactly the same result: about 3 min!

Seems we'll have to wait for the new aggregate + group implementation, sorry :-(

Vadim
Hi Tom,

Tom Lane wrote:
>
> Unfortunately neither of these plans is likely to be especially speedy
> on ~3 million rows. The index scan will just thrash the disk, unless
> the table has been clustered recently --- and given the deficiencies of
> our CLUSTER implementation, I'd hesitate to recommend using it.

Sorry, but I don't understand... are you telling me not to use CLUSTER? What is the risk of using it?

What can I do to solve my slow GROUP BY problem? Just wait until you implement the option you mention below? GROUP BY is a classic SQL command; what can I do to work around this problem? Some other SQL method?

Thanks for your reply.

> I have a personal TODO item to see about implementing group + aggregate
> with a hash table of active aggregate values, per a suggestion recently
> from devik@cdi.cz. That would allow this query to be done with a
> sequential scan and no sort, which is probably what Oracle is doing.
> Won't happen for 7.1 though ...
>
> 			regards, tom lane

Regards,
--
Hervé