On 22/04/14 14:24, Pavel Stehule wrote:
> what is your effective_cache_size in postgresql.conf?
>
> What is random_page_cost and seq_page_cost?
>
effective_cache_size = 8GB, random_page_cost = 4, seq_page_cost = 1
But I am not asking about how to get a different plan or how to optimize
the query. I know that.
What I'm asking is the following. Assume a node without any filter has a
startup cost C1, a total cost of C2, and produces N rows. Now a filter is
applied through which M rows pass. Then the startup cost for the node
*with* the filter applied should be different from C1, because a certain
number of rows at the beginning is filtered out, right?
I think the startup cost should be something like
C1 + (C2+N*F-C1)*M/N or C1 + 0.5*(C2+N*F-C1)*M/N
where F is the cost to apply the filter to one row.
On average, only one out of every N/M rows matches the filter. So we need
to fetch about N/M rows to produce the first row out of the filter. Now,
you can argue that we don't know where in that set the first matching row
is; on average it would probably be in the middle. That's where the 0.5
comes from.
I certainly got something wrong somewhere, but I think you get the idea.
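Just to put numbers on it, here is a rough back-of-the-envelope sketch
(Python, only to do the arithmetic) that plugs the estimates from the plan
quoted below into the second formula. I don't know the real per-row filter
cost F; since the plan's total cost already contains the filter, I simply
set F to 0 here:

  # Back-of-the-envelope check of the proposed startup cost, using the
  # estimates from the plan quoted below.
  C1 = 0.0          # startup cost of the node without the filter
  C2 = 6964734.35   # total cost of the seqscan; filter cost already included
  N  = 80000000.0   # rows in the table
  M  = 6237993.0    # rows the planner expects to pass the filter
  F  = 0.0          # per-row filter cost; folded into C2 above, so 0 here

  startup = C1 + 0.5 * (C2 + N * F - C1) * M / N
  print(round(startup))   # -> 271537

That lands at roughly 270000, which is the 250000 to 300000 range I
mention below.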
If not the seqscan node, then at least the limit node should have a
startup cost >0 (depending on where the filter is taken into account). In
my case the startup cost for the limit node should be somewhere between
250000 and 300000.
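For comparison, this is how I understand the current estimate for the
limit node to come about (only my reading of the behaviour, not the actual
planner code):

  # My reading of how the current 0.00..1.12 for the limit node comes about
  # (just a sketch of my understanding, not the planner's code): the limit
  # cost seems to be a linear interpolation over the sub-node's cost,
  # assuming the matching rows are spread evenly over the scan.
  sub_startup = 0.0
  sub_total   = 6964734.35
  sub_rows    = 6237993.0   # rows estimated to come out of the filter
  limit_rows  = 1

  limit_total = sub_startup + (sub_total - sub_startup) * limit_rows / sub_rows
  print(round(limit_total, 2))   # -> 1.12, the number shown in the plan

With that model the startup cost simply stays at 0, which is exactly the
part I am questioning.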
Torsten
> 2014-04-22 14:10 GMT+02:00 Torsten Förtsch <torsten.foertsch@gmx.net
> <mailto:torsten.foertsch@gmx.net>>:
>
> Hi,
>
> I got this plan:
>
> Limit  (cost=0.00..1.12 rows=1 width=0)
>   ->  Seq Scan on fmb  (cost=0.00..6964734.35 rows=6237993 width=0)
>         Filter: ...
>
> The table has ~80,000,000 rows. So, the filter, according to the plan,
> filters out >90% of the rows. Although the cost for the first row to
> come out of the seqscan might be 0, the cost for the first row to pass
> the filter and, hence, to hit the limit node is probably higher.