Re: loop vs. aggregate was: update and group by/aggregate - Mailing list pgsql-general
From:           Sam Mason
Subject:        Re: loop vs. aggregate was: update and group by/aggregate
Msg-id:         20080827114542.GG7271@frubble.xen.chris-lamb.co.uk
In response to: Re: loop vs. aggregate was: update and group by/aggregate
                (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Responses:      Re: loop vs. aggregate was: update and group by/aggregate
List:           pgsql-general
On Wed, Aug 27, 2008 at 01:19:30PM +0200, Ivan Sergio Borgonovo wrote:
> but this looks much slower than the function:
> function: 113sec
> vs.
> single statement: 488sec
> I repeated the test 3 times with similar results.

Have you been vacuuming (non-full) between runs? And, as always, are
the stats reasonably up to date?

> Can anybody explain why aggregates underperform so badly?

You could try just running the SELECT part to see how long the
aggregation takes.

> I just read that most of the times I can't be smarter than the
> planner and I thought that this would be one of the circumstances
> the planner could outperform my handwritten function.

It's quite easy to outsmart the planner with large amounts of data,
but it's surprising how well it does most of the time. Generally you
can just write whatever is obvious and the planner will do something
good with it. If it doesn't do the right thing you can then worry
about performance, rather than in most languages, where you have to
worry about performance from the start.

> here is the explain:
> "Hash Join  (cost=137408.51..271599.78 rows=209674 width=221)"
> "  Hash Cond: ("outer".itemid = "inner".itemid)"
> "  ->  HashAggregate  (cost=32994.81..36664.11 rows=209674 width=58)"
> "        ->  Hash Join  (cost=8544.62..31946.44 rows=209674 width=58)"
> "              Hash Cond: ("outer".authorid = "inner".authorid)"
> "              ->  Seq Scan on catalog_itemauthor ia  (cost=0.00..10297.21 rows=629021 width=12)"
> "              ->  Hash  (cost=8309.00..8309.00 rows=94248 width=54)"
> "                    ->  Seq Scan on catalog_author a  (cost=0.00..8309.00 rows=94248 width=54)"
> "                          Filter: ((name IS NOT NULL) AND (length(btrim((name)::text, E' \011'::text)) > 1))"
> "  ->  Hash  (cost=79538.96..79538.96 rows=833496 width=189)"
> "        ->  Seq Scan on catalog_items  (cost=0.00..79538.96 rows=833496 width=189)"

The seq scan of catalog_items looks a little strange to me; have you
set work_mem to some obscenely big value?

> Sam... I did your same error the first time I wrote the above
> statement...
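[To time just the aggregation, something along these lines would do; a
sketch only, using the table and column names visible in the plan
above. The count(a.name) aggregate is a stand-in, since the original
statement isn't reproduced in this message:]

```sql
-- Run only the SELECT/GROUP BY side of the statement, with actual
-- per-node timings, to separate the aggregation cost from the UPDATE.
EXPLAIN ANALYZE
SELECT ia.itemid, count(a.name)
FROM catalog_itemauthor ia
JOIN catalog_author a ON a.authorid = ia.authorid
WHERE a.name IS NOT NULL
  AND length(btrim(a.name::text, E' \t')) > 1   -- E'\t' is the E'\011' tab from the plan
GROUP BY ia.itemid;
```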
> missing the where clause but I can't understand the
> need for your longer version "to ensure that characters trimmed from
> the authors' name are consistent".

You're trimming an extra close square bracket (']') in one of the
trim statements and not in the other. If you just do it in one place
then you don't have to worry about inconsistency.

> I prefer to put a constraint on the length of varchar as an early
> warning for something that is unexpected.
> eg. if the loop was not working as expected I'd get an error. In fact
> that's exactly what happened during development of the above
> function.

OK, Knuth generally seemed to recommend doing similar things. Most
modern programming people tend to advocate removing artificial limits
as some fundamental good. I'm never quite so sure, either way!

  Sam
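[For illustration, the trim inconsistency being described looks
roughly like this; the exact character sets are an assumption, since
the earlier statements aren't quoted in this message:]

```sql
-- Inconsistent: one expression strips space, tab and ']', the other
-- only space and tab, so the two can disagree on the same name.
SELECT btrim(name, E' \t]') FROM catalog_author;  -- strips ']' as well
SELECT btrim(name, E' \t')  FROM catalog_author;  -- leaves ']' alone

-- Doing the trim in exactly one place (e.g. one expression that both
-- the filter and the update reuse) removes the question entirely.
```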