Re: query planner uses sequencial scan instead of index scan - Mailing list pgsql-performance

From Tom Lane
Subject Re: query planner uses sequencial scan instead of index scan
Date
Msg-id 11228.1242920470@sss.pgh.pa.us
Whole thread Raw
In response to query planner uses sequencial scan instead of index scan  (Daniel Ferreira <daniel.ferreira@saba.pt>)
List pgsql-performance
Daniel Ferreira <daniel.ferreira@saba.pt> writes:
> has we can see the query planner, decided to do sequencial scan in "a_mov_rcb" table and "recibos", when i set the
flag"enable_seqscan" to false all goes well. 

It's not really the seqscan that's the problem.  The problem is this
rowcount misestimate:

> "                          ->  Hash Join  (cost=9016.52..9325.95 rows=1 width=108) (actual time=156.501..304.851
rows=12586loops=1)" 
> "                                Hash Cond: ((qtd.idasso = app.idasso) AND (qtd.idmov = rct.idmov) AND (qtd.mes =
rct.mes))"

which is causing the planner to suppose that the remaining joins should
be done as nestloops.  That would be the right thing if there really
were only one row... with twelve thousand of them, it's taking
twelve thousand times longer than the planner expected.

The right fix would be to get the estimate to be better. (Even if it
were 5 or 10 rows the planner would probably avoid the nestloops.)
But I'm not sure how much you can improve it by raising the stats
targets.  This is trying to estimate the size of the join between
two GROUP BY subselects, and the planner is not tremendously good
at that.

A brute force solution might go like this:

1. Select the two GROUP BY sub-results into temp tables.

2. ANALYZE the temp tables.

3. Do the original query using the temp tables.

But it's a pain ...

            regards, tom lane

pgsql-performance by date:

Previous
From: Robert Schnabel
Date:
Subject: Re: raid10 hard disk choice
Next
From: "Joshua D. Drake"
Date:
Subject: Re: raid10 hard disk choice