Re: Performance problem with joined aggregate query - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Performance problem with joined aggregate query
Date
Msg-id AANLkTi=8tf+tx1yPM4ehNNenmyg4jv8+e_BFnKiPwHdD@mail.gmail.com
Whole thread Raw
In response to Re: Performance problem with joined aggregate query  (Anssi Kääriäinen <anssi.kaariainen@thl.fi>)
List pgsql-performance
On Thu, Sep 16, 2010 at 1:51 AM, Anssi Kääriäinen
<anssi.kaariainen@thl.fi> wrote:
> Yes, looks like this is the same case. This makes it hard to use views
> having group by in them, as the whole group by part will always be
> executed. Back to planning board then...
>
> I guess my possibilities for pivot views are:
>  - crosstab: Will make statistics go "bad", that is, the crosstab query
>    will always seem to return static number of rows. This can cause
>    problems in complex queries using the view. IIRC performance is
>    a bit worse than pivot by group by.
>  - left joins: if joining the same table 20 times, there will be some
>    planner overhead. Maybe the best way for my usage case. Also about
>    2x slower than pivot using group by.
>  - subselect each of the columns: way worse performance: for my use
>    case, each added column adds about 50ms to run time, so for 20
>    columns this will take 1 second. The group by pivot query runs in
>    250ms.
>
> Any other ideas?

yes. specifically, if you are targeting the aggregation towards an
array you have another option:

analyze select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;

can be replaced w/

select *, array(select value from test2 where test2.id=test1.id) from
test1 where test1.other_id = 10001;

This (array vs array_agg) will give you faster performance than
aggregation so is better preferred unless you need to group for other
purposes than relating.  I think the join if it could match up over
the group by key can give theoretically better plans but this should
be good enough especially if you aren't pulling a large amount of data
from the outer table.

merlin

pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Is disableing nested_loops a bad idea ?
Next
From: Josh Berkus
Date:
Subject: Re: Where does data in pg_stat_user_tables come from?