Re: loop vs. aggregate was: update and group by/aggregate - Mailing list pgsql-general

From Sam Mason
Subject Re: loop vs. aggregate was: update and group by/aggregate
Date
Msg-id 20080827114542.GG7271@frubble.xen.chris-lamb.co.uk
In response to Re: loop vs. aggregate was: update and group by/aggregate  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
Responses Re: loop vs. aggregate was: update and group by/aggregate
List pgsql-general
On Wed, Aug 27, 2008 at 01:19:30PM +0200, Ivan Sergio Borgonovo wrote:
> but this looks much slower than the function:
> function: 113sec
> vs.
> single statement: 488sec
> I repeated the test 3 times with similar results.

Have you been vacuuming (non-full) between runs? And, as always, are
the stats reasonably up to date?
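
For instance (table names taken from your plan below), something like:

    VACUUM ANALYZE catalog_items;
    VACUUM ANALYZE catalog_author;
    VACUUM ANALYZE catalog_itemauthor;

between runs would clear out the dead rows left behind by each UPDATE
and refresh the statistics the planner works from.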

> Can anybody explain why aggregates underperform so badly?

You could try just running the SELECT part to see how long the
aggregation takes.
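
Something like this (a guess at the SELECT, reconstructed from the
plan you posted below; substitute whatever aggregate of a.name you're
actually using):

    EXPLAIN ANALYZE
    SELECT ia.itemid  -- plus your aggregate over a.name here
      FROM catalog_itemauthor ia
      JOIN catalog_author a ON a.authorid = ia.authorid
     WHERE a.name IS NOT NULL
       AND length(btrim(a.name, E' \011')) > 1
     GROUP BY ia.itemid;

The actual times EXPLAIN ANALYZE reports for the HashAggregate node
should tell you whether it's the aggregation itself or the join back
to catalog_items that's eating the 488 seconds.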

> I just read that most of the time I can't be smarter than the
> planner, and I thought that this would be one of the circumstances
> where the planner could outperform my handwritten function.

It's quite easy to outsmart the planner with large amounts of data, but
it's surprising how well it does most of the time.  Generally you can
just write whatever is obvious and the planner will do something good
with it.  If it doesn't do the right thing then you can worry about
performance, unlike most languages where you have to worry about
performance from the start.

> here is the explain:
> "Hash Join  (cost=137408.51..271599.78 rows=209674 width=221)"
> "  Hash Cond: ("outer".itemid = "inner".itemid)"
> "  ->  HashAggregate  (cost=32994.81..36664.11 rows=209674 width=58)"
> "        ->  Hash Join  (cost=8544.62..31946.44 rows=209674 width=58)"
> "              Hash Cond: ("outer".authorid = "inner".authorid)"
> "              ->  Seq Scan on catalog_itemauthor ia  (cost=0.00..10297.21 rows=629021 width=12)"
> "              ->  Hash  (cost=8309.00..8309.00 rows=94248 width=54)"
> "                    ->  Seq Scan on catalog_author a  (cost=0.00..8309.00 rows=94248 width=54)"
> "                          Filter: ((name IS NOT NULL) AND (length(btrim((name)::text, E' \011'::text)) > 1))"
> "  ->  Hash  (cost=79538.96..79538.96 rows=833496 width=189)"
> "        ->  Seq Scan on catalog_items  (cost=0.00..79538.96 rows=833496 width=189)"

The seqscan of catalog_items looks a little strange to me; have you set
work_mem to some obscenely big value?
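
You can check, and experiment with a smaller value for just your
session, like so:

    SHOW work_mem;
    SET work_mem = '4MB';  -- session-local; pick a modest value
    -- then re-run the EXPLAIN above to see whether the plan changes

A huge work_mem makes hashing everything in memory look cheap, which
would explain the planner favouring seqscans and hash joins here.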

> Sam... I made the same error the first time I wrote the above
> statement... missing the WHERE clause. But I can't understand the
> need for your longer version "to ensure that the characters trimmed
> from the authors' names are consistent".

You're trimming an extra close square bracket (']') in one of the trim
statements and not in the other.  If you just do it in one place then
you don't have to worry about inconsistency.
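
A minimal sketch of what I mean, with the trim computed once in a
subquery so the filter and the output can't disagree (names taken from
your plan):

    SELECT authorid, trimmed_name
      FROM (SELECT authorid, btrim(name, E' \011') AS trimmed_name
              FROM catalog_author
             WHERE name IS NOT NULL) AS a
     WHERE length(trimmed_name) > 1;

Write the trim expression in one place and a stray ']' can only ever
be in or out of both uses at once.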

> I prefer to put a constraint on the length of varchar as an early
> warning for something that is unexpected.
> e.g. if the loop was not working as expected I'd get an error. In
> fact that's exactly what happened during development of the above
> function.

OK, Knuth generally seemed to recommend doing similar things.  Most
modern programming people tend to advocate removing artificial limits
as a fundamental good.  I'm never quite so sure, either way!
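
Purely as illustration (the table and column names here are made up),
the trade-off in DDL terms looks like:

    CREATE TABLE catalog_items_demo (
        itemid  integer PRIMARY KEY,
        -- varchar(1024): an errant loop that keeps appending authors
        -- fails loudly instead of silently growing the row
        authors varchar(1024)
        -- the "no artificial limits" school would use:  authors text
    );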


  Sam
