cost_index() and path row estimate. - Mailing list pgsql-hackers
From | Bernd Helmle |
---|---|
Subject | cost_index() and path row estimate. |
Date | |
Msg-id | D6C859D2C8FC254FB517CC07@eje.credativ.lan Whole thread Raw |
Responses |
Re: cost_index() and path row estimate.
|
List | pgsql-hackers |
While looking into a customer performance problem, i saw this in costsize.c, cost_index() (9.3.6, but it looks the same in HEAD): /* Mark the path with the correct row estimate */if (path->path.param_info){ path->path.rows = path->path.param_info->ppi_rows; /* also get the set of clauses that should be enforced by the scan */ allclauses =list_concat(list_copy(path->path.param_info->ppi_clauses), baserel->baserestrictinfo);}else{ path->path.rows = baserel->rows; /* allclauses should just be the rel's restrictionclauses */ allclauses = baserel->baserestrictinfo;} What i'm wondering is the else branch, where the baserel row estimate is assigned to the IndexPath. However, it occurs to me that in conjunction with a partial index, the overall row estimate will be constrained by the row estimate from the partial index itself, no? I see that this doesn't have an impact on the cost estimation of the index scan itself, since cost_index() does this later: /* estimate number of main-table tuples fetched */tuples_fetched = clamp_row_est(indexSelectivity * baserel->tuples); I stumpled across this, since i see heavy misestimates in the EXPLAIN output, where the estimated row count from the partial index vs. the real row count heavily differs. In the customers case, there are two tables, where one of the relation has many tuples in the JOIN condition which are NULLs, like: CREATE TABLE a2(id integer); CREATE TABLE b2(id integer); INSERT INTO a2 SELECT NULL FROM generate_series(1, 9800) AS t(id); INSERT INTO a2 SELECT t.id FROM generate_series(1, 200) AS t(id); INSERT INTO b2 SELECT t.id FROM generate_series(1, 200) AS t(id); CREATE INDEX ON a2(id) WHERE id IS NOT NULL; CREATE INDEX ON b2(id); Here i get the following plan: EXPLAIN ANALYZE SELECT * FROM b2 INNER JOIN a2 ON a2.id = b2.id ; Merge Join (cost=10.79..12.63 rows=4 width=8) (actual time=0.084..0.291 rows=200 loops=1) Merge Cond: (b2.id = a2.id) -> Sort (cost=10.64..11.14 rows=200 width=4) (actual time=0.069..0.082 rows=200 loops=1) Sort Key: b2.id Sort Method: quicksort Memory: 35kB -> Seq Scan on b2 (cost=0.00..3.00rows=200 width=4) (actual time=0.010..0.027 rows=200 loops=1) -> Index Only Scan using a2_id_idx on a2 (cost=0.14..15.14 rows=10000 width=4) (actual time=0.012..0.074 rows=200 loops=1) Heap Fetches: 200Total runtime: 0.350 ms EXPLAIN ANALYZE SELECT * FROM b2 INNER JOIN a2 ON a2.id = b2.id WHERE a2.id IS NOT NULL; Merge Join (cost=10.79..12.12 rows=1 width=8) (actual time=0.080..0.281 rows=200 loops=1) Merge Cond: (b2.id = a2.id) -> Sort (cost=10.64..11.14 rows=200 width=4) (actual time=0.063..0.070 rows=200 loops=1) Sort Key: b2.id Sort Method: quicksort Memory: 35kB -> Seq Scan on b2 (cost=0.00..3.00rows=200 width=4) (actual time=0.010..0.034 rows=200 loops=1) -> Index Only Scan using a2_id_idx on a2 (cost=0.14..15.64 rows=200 width=4) (actual time=0.012..0.052 rows=200 loops=1) Index Cond: (id IS NOT NULL) Heap Fetches: 200Total runtime:0.335 ms With the partial index predicate explicitly specified the row estimate is accurate, without the predicate the row estimate of the index scan on a2_id_idx assumes 10000. It's very likely i miss something really important here, could someone shed some light on this? -- Thanks Bernd
pgsql-hackers by date: