Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running - Mailing list pgsql-performance

From El-Lotso
Subject Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
Msg-id 1189584573.13796.12.camel@neuromancer.home.net
In response to Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running  (El-Lotso <el.lotso@gmail.com>)
Responses Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
List pgsql-performance
On Wed, 2007-09-12 at 10:15 +0800, El-Lotso wrote:
> On Tue, 2007-09-11 at 14:23 -0400, Tom Lane wrote:
> > El-Lotso <el.lotso@gmail.com> writes:
> > > sorry.. I sent this as I was about to go to bed and the explain analyse
> > > of the query w/ 4 tables joined per subquery came out.
> >
> > It's those factor-of-1000 misestimates of the join sizes that are
> > killing you, eg this one:
> >
> > >               ->  Hash Join  (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1)
> > >                     Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype))
> > >                     ->  Seq Scan on ts  (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1)
> > >                     ->  Hash  (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1)
> >
> > The single-row-result estimate persuades it to use a nestloop at the
> > next level up, and then when the output is actually 969 rows, that
> > means 969 executions of the other side of the upper join.
>
> Yep, that's consistent with the larger result sets: more rows = more
> loops.


I'm on the verge of giving up... the schema seems simple and yet there are
so many issues with it. Perhaps it's the layout of the data, I don't
know. But between the ordering/normalisation of the data and the one-to-many
relationships of some tables, this is giving the planner a headache
(and me a bulge on the head from knocking it against the wall).

I've tried multiple variations: with subqueries, without subqueries, not
joining the table at all (but including it as a subquery instead, which
gets rewritten to a join anyway), EXISTS/NOT EXISTS, all to no avail.

PG is fast, yes, even with all the nested loops, for up to 48K rows of
results (within 4 minutes), but as soon as I put it into an inner join /
left join / multiple temporary (in-memory) tables it will choke.
Something of this shape:

select a.a, b.b, c.c
from (select x as a, y, z
      from zz) a
inner join b
   on a.a = b.a
left join (select x as c, a, z
           from xx) c
   on c.a = a.a

then it will choke.

I'm really at my wits' end here.
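
One more thing I may still try is bumping the statistics target on the
join columns and re-analysing, in case the misestimate is down to coarse
stats rather than cross-column correlation. Just a sketch, with the
column names guessed from the plan above:

alter table ts alter column start_timestamp set statistics 1000;
alter table ts alter column ttype set statistics 1000;
analyze ts;

Though if the planner is tripping over the correlation between
id/start_timestamp/ttype, better per-column stats probably won't move
the combined selectivity estimate much.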

