Thread: PATCH: index-only scans with partial indexes
Hi, currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment: * XXX this is overly conservative for partial indexes, since we will * consider attributes involved in the index predicate as required even * though the predicate won't need to be checked at runtime. (The same * is true for attributes used only in index quals, if we are certain * that the index is not lossy.) However, it would be quite expensive * to determine that accurately at this point, so for now we take the * easy way out. In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction). The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple: create table t as select i as a, i as b from generate_series(1,10000000) s(i); create index tidx_partial on t(b) where a > 1000 and a < 2000; vacuum freeze t; analyze t; explain analyze select count(b) from t where a > 1000 and a < 2000; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=39.44..39.45 rows=1 width=4) (actual time=8.350..8.354 rows=1 loops=1) -> Index Scan using tidx_partial on t (cost=0.28..37.98 rows=585 width=4) (actual time=0.034..4.368 rows=999 loops=1) Planning time: 0.197 ms Execution time: 8.441 ms (4 rows) explain analyze select count(b) from t where a > 1000 and a < 2000; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=33.44..33.45 rows=1 width=4) (actual time=8.019..8.023 rows=1 loops=1) -> Index Only Scan using tidx_partial on t (cost=0.28..31.98 rows=585 width=4) (actual time=0.036..4.165 rows=999 loops=1) Heap Fetches: 0 Planning time: 0.188 ms Execution time: 8.106 ms (5 rows) I've done a bunch of tests, and I do see small (hardly noticeable) increase in planning time with long list of WHERE clauses, because all those need to be checked against the index predicate. Not sure if this is what's meant by 'quite expensive' in the comment. Moreover, this was more than compensated by the IOS benefits (even with everything in RAM). But maybe it's possible to fix that somehow? For example, we're certainly doing those checks elsewhere when deciding which clauses need to be evaluated at run-time, so maybe we could cache that somehow? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > currently partial indexes end up not using index only scans in most > cases, because check_index_only() is overly conservative, as explained > in this comment: > ... > I've done a bunch of tests, and I do see small (hardly noticeable) > increase in planning time with long list of WHERE clauses, because all > those need to be checked against the index predicate. Not sure if this > is what's meant by 'quite expensive' in the comment. Moreover, this was > more than compensated by the IOS benefits (even with everything in RAM). > But maybe it's possible to fix that somehow? For example, we're > certainly doing those checks elsewhere when deciding which clauses need > to be evaluated at run-time, so maybe we could cache that somehow? The key problem here is that you're doing those proofs vastly earlier than before, for indexes that might not get used at all in the final plan. If you do some tests with multiple partial indexes you will probably see a bigger planning-time penalty. Perhaps we should bite the bullet and do it anyway, but I'm pretty suspicious of any claim that the planning cost is minimal. regards, tom lane
Hi, On 07/10/2015 10:43 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> currently partial indexes end up not using index only scans in most >> cases, because check_index_only() is overly conservative, as explained >> in this comment: >> ... > >> I've done a bunch of tests, and I do see small (hardly noticeable) >> increase in planning time with long list of WHERE clauses, because all >> those need to be checked against the index predicate. Not sure if this >> is what's meant by 'quite expensive' in the comment. Moreover, this was >> more than compensated by the IOS benefits (even with everything in RAM). > >> But maybe it's possible to fix that somehow? For example, we're >> certainly doing those checks elsewhere when deciding which clauses need >> to be evaluated at run-time, so maybe we could cache that somehow? > > The key problem here is that you're doing those proofs vastly earlier > than before, for indexes that might not get used at all in the final > plan. If you do some tests with multiple partial indexes you will > probably see a bigger planning-time penalty. Hmmm. Maybe we could get a bit smarter by looking at the attnums of each clause before doing the expensive stuff (which is predicate_implied_by I believe), exploiting a few simple observations: * if the clause is already covered by attrs_used, we don't need to process it at all * if the clause uses attributes not included in the index predicate, we know it can't be implied Of course, those are local optimizations, and can't fix some of the problems (e.g. a lot of partial indexes). > Perhaps we should bite the bullet and do it anyway, but I'm pretty > suspicious of any claim that the planning cost is minimal. Perhaps - I'm not claiming the planning cost is minimal. It was in the tests I've done, but no doubt it's possible to construct examples where the planning time will get much worse. With 30 partial indexes, I got an increase from 0.01 ms to ~2.5ms on simple queries. But maybe we could get at least some of the benefits by planning the index scans like today, and then do the IOS check later? Of course, this won't help with cases where the index scan is thrown away while the index only scan would win, but it does help with cases where we end up doing index scan anyway? That's essentially what I'm struggling right now - I do have a 3TB data set, the plan looks like this: QUERY PLAN ------------------------------------------------------------------------ Sort (cost=1003860164.92..1003860164.92 rows=1width=16) Sort Key: orders.o_orderpriority -> HashAggregate Group Key: orders.o_orderpriority -> Merge Semi Join Merge Cond: -> Index Scan using pk_orders on orders Filter: ((o_orderdate >= '1997-07-01'::date) AND (o_orderdate < '1997-10-01 00:00:00'::timestamp)) -> Index Scan using lineitem_l_orderkey_idx_part1 on lineitem and the visibility checks from Index Scans are killing the I/O. An IOS is likely to perform much better here (but haven't ran the query yet). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment:
* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.
In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).
The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple:
create table t as select i as a, i as b from
generate_series(1,10000000) s(i);
create index tidx_partial on t(b) where a > 1000 and a < 2000;
vacuum freeze t;
analyze t;
explain analyze select count(b) from t where a > 1000 and a < 2000;
However, "explain analyze select sum(b) from t where a > 1000 and a < 1999;" still doesn't use the index only
scan. Isn't that also implied by the predicate?
Cheers,
Jeff
25.08.2015 20:19, Jeff Janes пишет:
On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:Hi,
currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment:
* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.
In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).
The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple:
create table t as select i as a, i as b from
generate_series(1,10000000) s(i);
create index tidx_partial on t(b) where a > 1000 and a < 2000;
vacuum freeze t;
analyze t;
explain analyze select count(b) from t where a > 1000 and a < 2000;However, "explain analyze select sum(b) from t where a > 1000 and a < 1999;" still doesn't use the index onlyscan. Isn't that also implied by the predicate?
In this example it doesn't use IndexOnlyScan correctly. If I understand partial indexes right, if index predicate and search clause are not equal, index scan must recheck values when it's fetching them.
'tidx_partial' in example above has no information about 'a' attribute, beside the index->indpred, so it is impossible to recheck qual without referencing to table.
In example:
create index tidx_partial on t(a) where a > 1000 and a < 2000;
explain analyze select sum(a) from t where a > 1000 and a < 1999;
it can use IndexOnlyScan.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Sep 4, 2015 at 4:28 AM, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:
25.08.2015 20:19, Jeff Janes пишет:On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:Hi,
currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment:
* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same
* is true for attributes used only in index quals, if we are certain
* that the index is not lossy.) However, it would be quite expensive
* to determine that accurately at this point, so for now we take the
* easy way out.
In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).
The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple:
create table t as select i as a, i as b from
generate_series(1,10000000) s(i);
create index tidx_partial on t(b) where a > 1000 and a < 2000;
vacuum freeze t;
analyze t;
explain analyze select count(b) from t where a > 1000 and a < 2000;However, "explain analyze select sum(b) from t where a > 1000 and a < 1999;" still doesn't use the index onlyscan. Isn't that also implied by the predicate?
In this example it doesn't use IndexOnlyScan correctly. If I understand partial indexes right, if index predicate and search clause are not equal, index scan must recheck values when it's fetching them.
'tidx_partial' in example above has no information about 'a' attribute, beside the index->indpred, so it is impossible to recheck qual without referencing to table.
In example:
create index tidx_partial on t(a) where a > 1000 and a < 2000;
explain analyze select sum(a) from t where a > 1000 and a < 1999;
it can use IndexOnlyScan.
Yes, of course. Thanks for the explanation, it is obvious now that you have explained it. I kept slipping into thinking that the predicate-dependent variables are included in the index but only when the predicate is met, but that isn't the case.
How can we evaluate Tom's performance concerns? I tried turning log_planner_stats on and using the regression test as a load generator, but I don't think that that is very demanding of a test.
Thanks,
Jeff
Hi, On 09/04/2015 06:10 PM, Jeff Janes wrote: > > How can we evaluate Tom's performance concerns? I tried > turning log_planner_stats on and using the regression test as a load > generator, but I don't think that that is very demanding of a test. I've done a bit of benchmarking today, trying to measure how expensive the additional checks are. Using a simple table with just 4 columns and 1M rows CREATE TABLE t AS SELECT i AS a, i AS b, i AS c, i AS d FROM generate_series(1,1000000) s(i); with three different index sets: - no indexes - 40 regular indexes (indexes-1.sql) - 40 partial indexes (indexes-2.sql) and two different query sets: - matching the partial indexes (queries-1.sql) - not matching the partial indexes (queries-2.sql) which means 6 combinations: A: no indexes / queries-1 B: no indexes / queries-2 C: indexes-1 / queries-1 D: indexes-1 / queries-2 E: indexes-2 / queries-1 F: indexes-2 / queries-2 A summary of 100 EXPLAIN timings looks like this: master A B C D E F ------------------------------------------------------------------------- min 0.10 0.10 0.30 0.29 0.66 0.23 max 1.07 1.00 2.13 1.98 4.52 1.59 median 0.49 0.52 0.31 0.33 0.68 1.12 average 0.43 0.35 0.62 0.49 1.01 0.89 patched A B C D E F ------------------------------------------------------------------------- min 0.11 0.11 0.29 0.29 0.70 0.22 max 0.99 1.05 0.55 1.93 3.79 1.12 median 0.19 0.55 0.32 0.34 0.74 0.24 average 0.42 0.52 0.34 0.55 0.95 0.27 A-D should be exactly the same, because there are no partial indexes, and the results match that expectation. E and F should be different, depending on how expensive the additional checks are. But in this benchmark that's not true - the patched version is actually a bit faster, thanks to noise. I find that a bit strange, but I repeated the benchmark about 3x just to verify it really behaves like this. Maybe I did some stupid mistake and the results are useless, or maybe it needs to be more complex (e.g. the conditions must not be exactly the same). So if someone could rerun the benchmark and review it, that'd be nice. Judging the cost/benefit ratio is a bit tricky. We need to identify the cases where additional planning complexity makes it measurably slower, without getting better performance at execution. And then we need to somehow argue whether those cases are frequent enough or not. ISTM that the worst case would be a data set with many partial indexes, that however don't allow IOS. And the amount of data would have to be small, so that the queries don't take too much time (which would make the additional planning time noise). However that was the idea of the benchmark, and I got no difference. regards Tomas -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 4 September 2015 at 22:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
--
A summary of 100 EXPLAIN timings looks like this:
master A B C D E F
-------------------------------------------------------------------------
min 0.10 0.10 0.30 0.29 0.66 0.23
max 1.07 1.00 2.13 1.98 4.52 1.59
median 0.49 0.52 0.31 0.33 0.68 1.12
average 0.43 0.35 0.62 0.49 1.01 0.89
What are these? Times? in ms?
However that was the idea of the benchmark, and I got no difference.
Please explain what this means and your conclusion, so its clear. That way we can either reject the patch or commit it. Thanks
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 09/05/2015 10:53 AM, Simon Riggs wrote: > On 4 September 2015 at 22:03, Tomas Vondra <tomas.vondra@2ndquadrant.com > <mailto:tomas.vondra@2ndquadrant.com>> wrote: > > A summary of 100 EXPLAIN timings looks like this: > > > master A B C D E F > ------------------------------------------------------------------------- > min 0.10 0.10 0.30 0.29 0.66 0.23 > max 1.07 1.00 2.13 1.98 4.52 1.59 > median 0.49 0.52 0.31 0.33 0.68 1.12 > average 0.43 0.35 0.62 0.49 1.01 0.89 > > > What are these? Times? in ms? Yes, those are planning times in milliseconds. I've been thinking about possible issues in the benchmark, and I ended up with two main suspects: (a) environment - VM running on a laptop. thus quite noisy and subject to various sources of overhead, power-management,etc. (b) time measured using \timing in psql (by running EXPLAIN), so probably influenced by formatting/transfer So I reran the benchmark on a different machine (bare metal, pretty much no noise in the results), and measured the planning time using EXPLAIN ANALYZE (Planning Time). And I got this (milliseconds): A B C D E F ----------------------------------------------------------------- min 0.04 0.04 0.11 0.10 0.37 0.12 max 0.10 0.10 0.92 0.92 1.62 1.23 median 0.04 0.04 0.11 0.11 0.37 0.13 average 0.04 0.04 0.11 0.11 0.38 0.14 A B C D E F ----------------------------------------------------------------- min 0.04 0.04 0.11 0.11 0.38 0.13 max 0.10 0.10 0.92 0.94 1.64 1.21 median 0.04 0.04 0.11 0.11 0.39 0.13 average 0.04 0.04 0.11 0.12 0.40 0.14 So much lower numbers (better CPU, no virtualization, etc.), but otherwise exactly the same conclusion - no overhead compared to master. I think of three ways how to make the checks more expensive: (a) using more indexes The current benchmark already uses 40 indexes (and I've tried with 100), and we've seen no impact at all. Addingmore indexes will eventually show some overhead, but the number of indexes will be very high - I doubtanyone has a table with hundreds of partial indexes on a it. (b) using more complex index predicates I expect the predicate_implied_by() call to get more expensive for more complex predicates. I however believethat's quite uncommon case - vast majority of index predicates that I've seen use just a single equalityclause. (c) using more complex queries (more WHERE conditions) Having more complex WHERE clauses seems quite plausible, though, so I've decided to try it. Instead of the simplequery used before: select a from t where b >= 100 and b <= 200; I've used a query with a bunch of other conditions: select a from t where b >= 100 and b <= 200 and c >= 100 and c <= 200 and d >= 100 and d <= 200 and a >= 100 and a <= 100; And indeed, this made a (tiny) difference - on the master, the planning was 0.50 ms on average, while with thepatch it was 0.55. But 0.05 ms is just barely above noise, even on this HW. Of course, this only impacts the case with partial indexes, all the other cases were exactly the same with andwithout patch. > > However that was the idea of the benchmark, and I got no difference. > > > Please explain what this means and your conclusion, so its clear. That > way we can either reject the patch or commit it. Thanks That means I've been unable to measure any significant overhead of the patch. There certainly are extreme cases where this patch might make the planning noticeably slower, but I believe those are rather artificial, and certainly wouldn't expect them in databases where a tiny increase of planning time would be a problem. This benchmark however only looked at the planning overhead, but we should weight that with respect to possible gains. And IOS is a great optimization - it's not uncommon to see 2-3x improvements on databases that fit into RAM, and order of magnitude improvements on large databases (thanks to eliminating the random I/O when accessing the heap). So my opinion is that we should commit this patch. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > That means I've been unable to measure any significant overhead > of the patch. I've run a lot of benchmarks, and with anything resembling a common query the differences in planning time are lost in the noise. (I didn't create a better example than Tomas of where a lot of indexes cause a minimal increase in planning time.) The test environment is a "bare iron" machine with: 1 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (4 cores; 8 HW threads)16GB DDR3 RAM2 1TB drives in RAID 1ubuntu 14.04 LTS 64-bitvariouscheckouts from master, most recently a7212a99no cassert, default optimizations As one example, to get a heap bigger than RAM I set up like this: drop table if exists t; create table t (a int not null, b int not null, x text not null); insert into t select i, i, repeat(md5(i::text), 50) from generate_series(1,10000000) s(i); vacuum freeze t; checkpoint; I ran one-index tests like this: create index t_b_partial on t(b) where a > 1000 and a < 2000; vacuum analyze t; checkpoint; explain (analyze, buffers, verbose) select count(b) from t where a > 1000 and a < 2000; explain (analyze, buffers, verbose) select count(a) from t where a > 1000 and a < 2000; explain (analyze, buffers, verbose) select count(*) from t where a > 1000 and a < 2000; ... then two-index tests like this: create index t_b_a_partial on t(b, a) where a > 1000 and a < 2000; vacuum analyze t; checkpoint; explain (analyze, buffers, verbose) select count(b) from t where a > 1000 and a < 2000; explain (analyze, buffers, verbose) select count(a) from t where a > 1000 and a < 2000; explain (analyze, buffers, verbose) select count(*) from t where a > 1000 and a < 2000; All queries were run 5 times and (to minimize stray slowdowns from other sources on this desktop machine) I took the minimum plan time and minimum execution time. (My browser and other optional processes were stopped to also minimize noise, but the results still had more noise than I would prefer.) master - single index --------------------- Planning time: 0.078 ms Execution time: 0.544 ms Planning time: 0.079 ms Execution time: 0.533 ms Planning time: 0.066 ms Execution time: 0.491 ms master - both indexes --------------------- Planning time: 0.080 ms Execution time: 0.396 ms Planning time: 0.076 ms Execution time: 0.373 ms Planning time: 0.056 ms Execution time: 0.275 ms patched - single index ---------------------- Planning time: 0.032 ms Execution time: 0.136 ms Planning time: 0.079 ms Execution time: 0.537 ms Planning time: 0.050 ms Execution time: 0.213 ms patched - both indexes ---------------------- Planning time: 0.100 ms Execution time: 0.373 ms Planning time: 0.067 ms Execution time: 0.251 ms Planning time: 0.065 ms Execution time: 0.240 ms In my view, the most disappointing thing about the patch is that when both indexes are present, it doesn't use the narrower one. If *only* the narrower index is present, it runs the index-only scan using that index for count(b) and count(*), which is faster. Can we wrangle this patch into making a better choice among available index-only scans? It also seems disappointing that we don't recognize that count(columnname) could be treated as a synonym for count(*) if columnname is NOT NULL, but that doesn't seem like material for this patch. Benchmarking took so much time I did not get to a close review of the code changes. :-( Based on just the test results, it appears to me that the patch as it stands would be a net win for the vast majority of workloads where it would make any noticeable difference, and I didn't manage to create any big downside. I would like to see whether we can't improve the choice of partial index when there are multiple possibilities -- it seems quite surprising to see identical estimates for indexes of different column counts and key widths, and to see the wider index chosen when the narrow one is clearly (and predictably) faster. I am changing this to Waiting on Author. I will be on vacation without Internet access for the next 15 days, so hopefully someone else can have a look when a new version is posted. If it's still open I'll have a look when I get back. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 09/13/2015 08:03 PM, Kevin Grittner wrote: > > In my view, the most disappointing thing about the patch is that > when both indexes are present, it doesn't use the narrower one. If > *only* the narrower index is present, it runs the index-only scan > using that index for count(b) and count(*), which is faster. Can > we wrangle this patch into making a better choice among available > index-only scans? That's indeed strange, but after poking into that for a while, it seems rather like a costing issue. Let me demonstrate: create table t (a int, b int); insert into t select i,i from generate_series(1,1000000) s(i); create index idx1 on t (a) where b between 300000 and 600000; create index idx2 on t (a,b) where b between 300000 and 600000; vacuum t; analyze t; explain select a from t where b between 300000 and 600000; QUERY PLAN --------------------------------------------------------------------------- Index Only Scan using idx2 on t (cost=0.42..9236.88rows=297823 width=4) Index Cond: ((b >= 300000) AND (b <= 600000)) (2 rows) drop index idx2; QUERY PLAN --------------------------------------------------------------------------- Index Only Scan using idx1 on t (cost=0.42..9236.88rows=297823 width=4) (1 row) Now, both plans are index only scans, but the first one has Index Cond and the other one does not! I've put a bunch of logging into cost_index(), and turns out that while the final cost is *exactly* the same, it's most likely by chance. After we call amcostestimate, we get these two results: idx1: indexStartupCost=0.422500 indexTotalCost=4769.537500 indexSelectivity=0.297823 indexCorrelation=1.000000 idx2: indexStartupCost=0.422500 indexTotalCost=6258.652500 indexSelectivity=0.297823 indexCorrelation=0.750000 So amcostestimate does make a difference, and we get idx1: run_cost = 4769.115000 idx2: run_cost = 6258.230000 and then for both indexes tuples_fetched=297823.000000 loop_count=1.000000 pages_fetched = 0.000000 but then we do run_cost += cpu_per_tuple * tuples_fetched; and we end up with run_cost = 9236.460000 for both indexes. How's that possible? Number of tuples fetched is exactly the same for both indexes (297823), so clearly cpu_per_tuple must be different. That however seems a bit strange, because the only difference between the indexes is the additional column, and the condition should be covered by the index predicate ... It seems that the problem is related to this: qpquals = extract_nonindex_conditions(baserel->baserestrictinfo, path->indexquals); while the "larger" index on (a,b) gets path->indexquals=(b BETWEEN 300000 AND 600000) qpquals=NIL the "smaller" index on (a) gets path->indexquals=NIL qpquals=(b BETWEEN 300000 AND 600000) And so the larger index gets qpqual_cost=0, the smaller one gets qpqual_cost=0.005, and so cpu_per_tuple is either 0.01 or 0.015. Which is exactly the difference between costs from amcostestimate idx1: 4769.115000 + 0.015 * 297823 = 9236.460000 idx2: 6258.230000 + 0.010 * 297823 = 9236.460000 Sppoky! Although it seems like a mere coincidence, thanks to the nice round numbers of tuples in the table, and lucky choice of two conditions. For example after replacing the BETWEEN condition (which is actually two conditions) with a single one (b<300000) - both in the indexes and query, I get this: QUERY PLAN --------------------------------------------------------------------------- Index Only Scan using idx2 on t (cost=0.42..8541.25rows=299507 width=4) Index Cond: (b < 300000) (2 rows) drop index idx2; QUERY PLAN --------------------------------------------------------------------------- Index Only Scan using idx1 on t (cost=0.42..8541.43rows=299507 width=4) (1 row) The plans are not costed exactly the same anymore (I'm not saying the costs are correct - clearly still the 'larger' index was preferred). It's not bound to index only scan either - after adding another column to the table, and requesting it from the query (so preventing IOS), I get exactly the same issues. I really wonder why we get different path->indexquals for those indexes, because that's the root of the issue here. Any ideas? > > It also seems disappointing that we don't recognize that > count(columnname) could be treated as a synonym for count(*) if > columnname is NOT NULL, but that doesn't seem like material for > this patch. Yeah, that's really not what this patch deals with. > > Benchmarking took so much time I did not get to a close review of > the code changes. :-( Based on just the test results, it appears > to me that the patch as it stands would be a net win for the vast > majority of workloads where it would make any noticeable difference, > and I didn't manage to create any big downside. I would like to > see whether we can't improve the choice of partial index when there > are multiple possibilities -- it seems quite surprising to see > identical estimates for indexes of different column counts and key > widths, and to see the wider index chosen when the narrow one is > clearly (and predictably) faster. > > I am changing this to Waiting on Author. > > I will be on vacation without Internet access for the next 15 days, > so hopefully someone else can have a look when a new version is > posted. If it's still open I'll have a look when I get back. Thanks for the feedback! regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, At Sun, 13 Sep 2015 23:21:30 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55F5E8DA.8080303@2ndquadrant.com> > That's indeed strange, but after poking into that for a while, it > seems rather like a costing issue. Let me demonstrate: ... > Now, both plans are index only scans, but the first one has Index Cond > and the other one does not! The seemingly removed IndexCond qual is counted as non-index quals at the last in cost_index. The quals that the partial index implies should be ignored on cost_estimation. > I've put a bunch of logging into cost_index(), and turns out that > while the final cost is *exactly* the same, it's most likely by > chance. After we call amcostestimate, we get these two results: So it is *not by chance* but a stable behavior defined by algorithm. > It seems that the problem is related to this: > > qpquals > = extract_nonindex_conditions(baserel->baserestrictinfo, > path->indexquals); > > while the "larger" index on (a,b) gets > > path->indexquals=(b BETWEEN 300000 AND 600000) > qpquals=NIL > > the "smaller" index on (a) gets > > path->indexquals=NIL > qpquals=(b BETWEEN 300000 AND 600000) > > And so the larger index gets qpqual_cost=0, the smaller one gets > qpqual_cost=0.005, and so cpu_per_tuple is either 0.01 or 0.015. > > Which is exactly the difference between costs from amcostestimate > > idx1: 4769.115000 + 0.015 * 297823 = 9236.460000 > idx2: 6258.230000 + 0.010 * 297823 = 9236.460000 These calculations are exactly right, but you overlooked the breakedown of indexTotalCost for idx2. > Sppoky! Although it seems like a mere coincidence, thanks to the nice > round numbers of tuples in the table, and lucky choice of two > conditions. As said above, it is not a conincidence. The exactly same calculation about baserestrictinfo is simply calculated in different places, cost_index for the former and btcostestiamte(genericcostestimate) for the latter. We should properly ignore or remove the implicitly-applied quals for partial indexes on cost estimation. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 09/14/2015 09:35 AM, Kyotaro HORIGUCHI wrote: > Hi, ,,, >> Which is exactly the difference between costs from amcostestimate >> >> idx1: 4769.115000 + 0.015 * 297823 = 9236.460000 >> idx2: 6258.230000 + 0.010 * 297823 = 9236.460000 > > These calculations are exactly right, but you overlooked the > breakedown of indexTotalCost for idx2. > >> Sppoky! Although it seems like a mere coincidence, thanks to the nice >> round numbers of tuples in the table, and lucky choice of two >> conditions. > > As said above, it is not a conincidence. The exactly same > calculation about baserestrictinfo is simply calculated in > different places, cost_index for the former and > btcostestiamte(genericcostestimate) for the latter. By "coincidence" I meant that we happened to choose such a number of conditions in the index predicate & query that this perfect match is possible. Apparently there are two places that manipulate the costs and in this particular case happen to perfectly compensate the effects. As demonstrated by the example with a single condition, the costs may actually differ for different numbers of clauses (e.g. using a single clause makes the wider index - unexpectedly - cheaper). > > We should properly ignore or remove the implicitly-applied quals > for partial indexes on cost estimation. Probably. So far I've traced the difference to build_index_paths() where we build index_clauses by iterating over index columns - the smaller index does not have the column from the predicate, so we don't add the clause. I'm not particularly familiar with this part of the code, so I wonder where's the best place to fix this, though. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, this looks to be a bug of cost_index(). The attached patch would fix that. ===== The following part in cost_index, > cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple; > > run_cost += cpu_per_tuple * tuples_fetched; Adds, *cpu_tuple_cost* (which is 0.01) + qpqual_cost.per_tuple (0.0025) per tuple even they are index tuples. On the other hand getnericcostestimate adds the following value for the same deed. > indexTotalCost += numIndexTuples * num_sa_scans * (cpu_index_tuple_cost + qual_op_cost); cpu_index_tuple_cost is 0.005, just a half of cpu_tuple cost as default. I think this should be the culprit of the difference. For confirmation, setting cpu_tuple_cost to 0.05 to equate with cpu_index_tuple_cost and the oppisit makes the estimate for both indexes the same value. set cpu_tuple_cost to 0.005; explain select a from t where b < 300000; QUERY PLAN ---------------------------------------------------------------------------Index Only Scan using idx2 on t (cost=0.42..7022.06rows=297876 width=4) Index Cond: (b < 300000) (2 rows) explain select a from t where b < 300000; QUERY PLAN ---------------------------------------------------------------------------Index Only Scan using idx1 on t (cost=0.42..7022.66rows=297876 width=4) (1 row) This should be a bug. The attached patch would fix this and perhaps costs for all of your examples should match except for errors of double precision. I think it is enough since IndexOnlyScan may not have quals on columns out of the index in focus so qpquals should be index quals. regards, At Mon, 14 Sep 2015 10:00:24 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55F67E98.5050904@2ndquadrant.com> > On 09/14/2015 09:35 AM, Kyotaro HORIGUCHI wrote: > > Hi, > ,,, > >> Which is exactly the difference between costs from amcostestimate > >> > >> idx1: 4769.115000 + 0.015 * 297823 = 9236.460000 > >> idx2: 6258.230000 + 0.010 * 297823 = 9236.460000 > > > > These calculations are exactly right, but you overlooked the > > breakedown of indexTotalCost for idx2. > > > >> Sppoky! Although it seems like a mere coincidence, thanks to the nice > >> round numbers of tuples in the table, and lucky choice of two > >> conditions. > > > > As said above, it is not a conincidence. The exactly same > > calculation about baserestrictinfo is simply calculated in > > different places, cost_index for the former and > > btcostestiamte(genericcostestimate) for the latter. > > By "coincidence" I meant that we happened to choose such a number of > conditions in the index predicate & query that this perfect match is > possible. Apparently there are two places that manipulate the costs > and in this particular case happen to perfectly compensate the > effects. Ok, I understood. > As demonstrated by the example with a single condition, the costs may > actually differ for different numbers of clauses (e.g. using a single > clause makes the wider index - unexpectedly - cheaper). > > > > > We should properly ignore or remove the implicitly-applied quals > > for partial indexes on cost estimation. > > Probably. So far I've traced the difference to build_index_paths() > where we build index_clauses by iterating over index columns - the > smaller index does not have the column from the predicate, so we don't > add the clause. I'm not particularly familiar with this part of the > code, so I wonder where's the best place to fix this, though. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index d107d76..d354dc2 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -516,7 +516,12 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count) cost_qual_eval(&qpqual_cost, qpquals,root); startup_cost += qpqual_cost.startup; - cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple; + + /* Indexonly scan applies this qual on index tuples */ + if (indexonly) + cpu_per_tuple = cpu_index_tuple_cost + qpqual_cost.per_tuple; + else + cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple; run_cost += cpu_per_tuple * tuples_fetched;
Sorry. > Hi, this looks to be a bug of cost_index(). The attached patch > would fix that. No, that's wrong. please forget the patch. The qual in qpquals should be indexquals which is excluded because it is not necessary to be applied. The right way would be remove the cost for qpqual in cost_index(). > ===== > The following part in cost_index, > > > cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple; > > > > run_cost += cpu_per_tuple * tuples_fetched; > > Adds, *cpu_tuple_cost* (which is 0.01) + qpqual_cost.per_tuple > (0.0025) per tuple even they are index tuples. On the other hand > getnericcostestimate adds the following value for the same deed. > > > indexTotalCost += numIndexTuples * num_sa_scans * (cpu_index_tuple_cost + qual_op_cost); > > cpu_index_tuple_cost is 0.005, just a half of cpu_tuple cost as > default. I think this should be the culprit of the difference. > > For confirmation, setting cpu_tuple_cost to 0.05 to equate with > cpu_index_tuple_cost and the oppisit makes the estimate for both > indexes the same value. > > > set cpu_tuple_cost to 0.005; > explain select a from t where b < 300000; > QUERY PLAN > --------------------------------------------------------------------------- > Index Only Scan using idx2 on t (cost=0.42..7022.06 rows=297876 width=4) > Index Cond: (b < 300000) > (2 rows) > > explain select a from t where b < 300000; > QUERY PLAN > --------------------------------------------------------------------------- > Index Only Scan using idx1 on t (cost=0.42..7022.66 rows=297876 width=4) > (1 row) > > This should be a bug. The attached patch would fix this and > perhaps costs for all of your examples should match except for > errors of double precision. I think it is enough since > IndexOnlyScan may not have quals on columns out of the index in > focus so qpquals should be index quals. > > regards, > > > At Mon, 14 Sep 2015 10:00:24 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55F67E98.5050904@2ndquadrant.com> > > On 09/14/2015 09:35 AM, Kyotaro HORIGUCHI wrote: > > > Hi, > > ,,, > > >> Which is exactly the difference between costs from amcostestimate > > >> > > >> idx1: 4769.115000 + 0.015 * 297823 = 9236.460000 > > >> idx2: 6258.230000 + 0.010 * 297823 = 9236.460000 > > > > > > These calculations are exactly right, but you overlooked the > > > breakedown of indexTotalCost for idx2. > > > > > >> Sppoky! Although it seems like a mere coincidence, thanks to the nice > > >> round numbers of tuples in the table, and lucky choice of two > > >> conditions. > > > > > > As said above, it is not a conincidence. The exactly same > > > calculation about baserestrictinfo is simply calculated in > > > different places, cost_index for the former and > > > btcostestiamte(genericcostestimate) for the latter. > > > > By "coincidence" I meant that we happened to choose such a number of > > conditions in the index predicate & query that this perfect match is > > possible. Apparently there are two places that manipulate the costs > > and in this particular case happen to perfectly compensate the > > effects. > > Ok, I understood. > > > As demonstrated by the example with a single condition, the costs may > > actually differ for different numbers of clauses (e.g. using a single > > clause makes the wider index - unexpectedly - cheaper). > > > > > > > > We should properly ignore or remove the implicitly-applied quals > > > for partial indexes on cost estimation. > > > > Probably. So far I've traced the difference to build_index_paths() > > where we build index_clauses by iterating over index columns - the > > smaller index does not have the column from the predicate, so we don't > > add the clause. I'm not particularly familiar with this part of the > > code, so I wonder where's the best place to fix this, though. -- Kyotaro Horiguchi NTT Open Source Software Center
I rethinked on this from the first. > Sorry. > > > Hi, this looks to be a bug of cost_index(). The attached patch > > would fix that. > > No, that's wrong. please forget the patch. The qual in qpquals > should be indexquals which is excluded because it is not > necessary to be applied. The right way would be remove the cost > for qpqual in cost_index(). Your patch allows index only scan even if a qual contains non-index column when the qual can be removed by implication from index predicates. So the 'implied' clauses is not needed ever after. It should be excluded from cost estimation and it is not needed on execution even if index only scan is found not to be doable finally. So the implicit quals may be removed on building index paths but I think check_index_only is not the place. Removing implied quals from index quals is not only for index *only* scan so the place for removing such quals is in build_index_paths, in the loop of step 1. After removing the quals there, check_index_only will naturally give disired result. # I remember that I have tried the same or similar thing. I don't # recall what made me give up then. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 09/14/2015 12:51 PM, Kyotaro HORIGUCHI wrote: > I rethinked on this from the first. > >> Sorry. >> >>> Hi, this looks to be a bug of cost_index(). The attached patch >>> would fix that. >> >> No, that's wrong. please forget the patch. The qual in qpquals >> should be indexquals which is excluded because it is not >> necessary to be applied. The right way would be remove the cost >> for qpqual in cost_index(). > > Your patch allows index only scan even if a qual contains > non-index column when the qual can be removed by implication from > index predicates. > > So the 'implied' clauses is not needed ever after. It should be > excluded from cost estimation and it is not needed on execution > even if index only scan is found not to be doable finally. > > So the implicit quals may be removed on building index paths but > I think check_index_only is not the place. > > Removing implied quals from index quals is not only for index > *only* scan so the place for removing such quals is in > build_index_paths, in the loop of step 1. After removing the > quals there, check_index_only will naturally give disired result. > > # I remember that I have tried the same or similar thing. I don't > # recall what made me give up then. I don't think this is particularly related to the patch, because some of the anomalies can be observed even on master. For example, let's force the index scans to be non-IOS by requesting another column from the heap: create table t (a int, b int, c int); insert into t select i,i,i from generate_series(1,1000000) s(i); create index idx1 on t (a) where b between 300000 and 600000; create index idx2 on t (a,b) where b between 300000 and600000; analyze t; vacuum t; The indexes have exactly the same size (thanks to alignment of IndexTuples), and should have exactly the same statistics: test=# \di+ List of relations Schema | Name | Type | Owner | Table | Size | Description --------+------+-------+-------+-------+---------+------------- public | idx1 | index | user | t | 6600 kB | public| idx2 | index | user | t | 6600 kB | (2 rows) Now, let's see the query reading column 'c' (forcing heap fetches) explain select c from t where b between 300000 and 600000; QUERY PLAN ----------------------------------------------------------------------- Index Scan using idx1 on t (cost=0.42..10945.99rows=300971 width=4) (1 row) drop index idx1; set enable_bitmapscan = off; explain select c from t where b between 300000 and 600000; QUERY PLAN ----------------------------------------------------------------------- Index Scan using idx2 on t (cost=0.42..19688.08rows=300971 width=4) Index Cond: ((b >= 300000) AND (b <= 600000)) (2 rows) I claim that either both or none of the indexes should use "Index Cond". This is exactly the same reason that lead to the strange behavior after applying the patch, but in this case the heap access actually introduces some additional cost so the issue is not that obvious. But in reality the costs should be pretty much exactly the same - the indexes have exactly the same size, statistics, selectivity etc. Also, the plan difference suggests this is not merely a costing issue, because while with idx1 (first plan) it was correctly detected we don't need to evaluate the condition on the partial index, on idx2 that's not true and we'll waste time doing that. So we probably can't just tweak the costing a bit - this probably needs to be addressed when actually building the index path. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, sorry in advance for hardly readable long descriptions.. At Mon, 14 Sep 2015 13:27:47 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55F6AF33.8000206@2ndquadrant.com> > I don't think this is particularly related to the patch, because some > of the anomalies can be observed even on master. For example, let's > force the index scans to be non-IOS by requesting another column from > the heap: ... > I claim that either both or none of the indexes should use "Index > Cond". Perhaps I understood your point and I think I understood this. Planner dicides whether to use the partial index far before creating specific paths, when measuring baserel sizes. If some partial index is implied by the restriction list, check_partial_indexes() marks the index as predOk, which means that the index is usable for the baserel scan even if the restriction clauses doesn't match the index columns. Then create_index_paths allows to generating paths for partial indexes with predOK. Index clauses of the created paths for the partial indexes don't contain clauses doesn't match the index columns. It is empty for your first example and it is the same with restrictinfo for the second. Finally, create_indexscan_plan strips baserestriction caluses implied by index predicate in addtion to index conditions. So both of your example has no filter conditions. The following example would show you the result of the above steps. explain select c from t where b between 300000 + 1 and 600000 and c = 3; QUERY PLAN ------------------------------------------------------------------Index Scan using idx1 on t (cost=0.42..11665.77 rows=1width=4) Filter: ((b >= 300001) AND (c = 3)) The index predicate (b >= 300000 AND b <= 600000) implies the restrction (b >= 300001 AND b <= 600000) so idx1 is allowed to be used, then conversely the restriction b >= 300001 is implied by the index predicate b >= 300000 so it is not shown as Index Cond and the other two are not, so they are shown and executed as Filter. But regardless of whether stripped as implied conditions or not at planning phase, the cost for all clauses that don't match the index columns are added when creating index paths. That will be one of the cause of cost error. > This is exactly the same reason that lead to the strange behavior > after applying the patch, but in this case the heap access actually > introduces some additional cost so the issue is not that obvious. So you're right on the point that check_index_only is doing wrong. It should properly ignore restrictions implied by index predicates as your patch is doing. But cost_index doesn't know that some nonindex-conditions of baserestrictinfo is finally useless, and it is assuming that nonindex conditions are always applied on heap tuples. After all, what should be done to properly ignore useless conditions would be, 1. Make create_index_paths() to make the list of restrict clauses which are implied by the index predicate of the index in focus. The list would be stored as a member in IndexOptInfo. Then create index clauses excluding the listed clauses and call get_index_paths using them. Modify check_index_only() to ignore the listed clauses when pulling varattnos.This is similar but different a bit to what I said in the previous mail. 2. Pass the listed clauses to generated IndexPath. 3. Modify extract_nonindex_conditions to be called with the exclusion list and the cluases are exluded from the resultof the function. 4. Make create_indexscan_plan to extract qpqual excluding the exclusion list. The same result could be obtained by more smarter way but the steps above will archive right decision on whether to do index only scan on partial index and correct cost estimate propery ignoring unused restrictions. Does this make sense for you? > But in reality the costs should be pretty much exactly the same - the > indexes have exactly the same size, statistics, selectivity etc. > > Also, the plan difference suggests this is not merely a costing issue, > because while with idx1 (first plan) it was correctly detected we > don't need to evaluate the condition on the partial index, on idx2 > that's not true and we'll waste time doing that. So we probably can't > just tweak the costing a bit - this probably needs to be addressed > when actually building the index path. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello Horiguchi-san, On 09/17/2015 12:45 PM, Kyotaro HORIGUCHI wrote: > > After all, what should be done to properly ignore useless > conditions would be, > > 1. Make create_index_paths() to make the list of restrict > clauses which are implied by the index predicate of the index > in focus. The list would be stored as a member in > IndexOptInfo. Then create index clauses excluding the listed > clauses and call get_index_paths using them. Modify > check_index_only() to ignore the listed clauses when pulling > varattnos. This is similar but different a bit to what I said > in the previous mail. > > 2. Pass the listed clauses to generated IndexPath. > > 3. Modify extract_nonindex_conditions to be called with the > exclusion list and the cluases are exluded from the result of > the function. > > 4. Make create_indexscan_plan to extract qpqual excluding the > exclusion list. > > The same result could be obtained by more smarter way but the > steps above will archive right decision on whether to do index > only scan on partial index and correct cost estimate propery > ignoring unused restrictions. > > Does this make sense for you? Yes, this seems sane. I've been poking at this a bit too, and I came to the same plan in general, except that I think it's better to build list of clauses that are *not* implied by the index, because that's what we need both in cost_index and check_index_only. It also seems to me that this change (arguably a bug fix) should pretty much make the original patch irrelevant, because check_index_only can simply walk over the new list. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55FADEEB.4000907@2ndquadrant.com> > Yes, this seems sane. I've been poking at this a bit too, and I came > to the same plan in general, except that I think it's better to build > list of clauses that are *not* implied by the index, because that's > what we need both in cost_index and check_index_only. I intended to isolate IndexOptInfo from belonging RelOptInfo but the exclusion list also bonds them tightly, and one IndexOptInfo belongs to only one RelOptInfo so no need to isolate. So not-implied-restrictclauses in IndexOptInfo would be preferable. > It also seems to me that this change (arguably a bug fix) should > pretty much make the original patch irrelevant, because > check_index_only can simply walk over the new list. Yeah. This seems to be a bug irrelevant to your index-only-scan ptch. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 09/18/2015 03:46 AM, Kyotaro HORIGUCHI wrote: > Hello, > > At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <55FADEEB.4000907@2ndquadrant.com> >> Yes, this seems sane. I've been poking at this a bit too, and I came >> to the same plan in general, except that I think it's better to build >> list of clauses that are *not* implied by the index, because that's >> what we need both in cost_index and check_index_only. > > I intended to isolate IndexOptInfo from belonging RelOptInfo but > the exclusion list also bonds them tightly, and one IndexOptInfo > belongs to only one RelOptInfo so no need to isolate. So > not-implied-restrictclauses in IndexOptInfo would be preferable. > >> It also seems to me that this change (arguably a bug fix) should >> pretty much make the original patch irrelevant, because >> check_index_only can simply walk over the new list. > > Yeah. This seems to be a bug irrelevant to your index-only-scan > ptch. Attached is a patch that makes this work correctly, I believe. The only difference compared to the v1 of the patch is this piece of code in match_clause_to_index (indexpath.c): if ((index->indpred != NIL) && (predicate_implied_by(lappend(NIL, rinfo->clause), index->indpred))) continue; which effectively says that we should ignore clauses implied by the index predicate. This fixes the way we build IndexClauseSet for the two indexes - originally we'd add the implied clause on the large index but not on the small one (because it does not have the column referenced in the condition). And as far as I can say, this also fixes all the costing issues that I've described because cost_index sees the same thing for both indexes. It's also early enough to fix the actual path, so EXPLAIN shows the same thing for both indexes (so no "Index Cond" present for only one of the indexes). I've originally tried to fix this in extract_nonindex_conditions, but that's way too late - it can only fix the costing, not the path. Arguably this part of the patch is a bugfix of an issue present on master. The patch does not change the check_index_only implementation - it still needs to check the clauses, just like in v1 of the patch. To make this re-check unnecessary, we'd have to stick the remaining clauses somewhere, so that check_index_only can use only the filtered list (instead of walking through the complete list of restrictions). Or perhaps we can do the check in match_clause_to_index, pretty much for free? If the clause is not implied by the index predicate (which we know thanks to the fix), and we don't assign it to any of the index columns, it means we can'd use IOS, no? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, On 09/26/2015 01:28 PM, Tomas Vondra wrote: > The patch does not change the check_index_only implementation - it > still needs to check the clauses, just like in v1 of the patch. To > make this re-check unnecessary, we'd have to stick the remaining > clauses somewhere, so that check_index_only can use only the filtered > list (instead of walking through the complete list of restrictions). After thinking about this a bit more, I realized we already have a good place for keeping this list - IndexClauseSet. So I've done that, and removed most of the code from check_index_only - it still needs to decide whether to use the full list of restrictions (regular indexes) or the filtered list (for partial indexes). Calling predicate_implied_by in match_clause_to_index makes the check a bit earlier, compared to the v1 of the patch. So theoretically there might be cases where we'd interrupt the execution between those places, and the v1 would not pay the price for the call. But those places are fairly close together, so I find that unlikely and I've been unable to reproduce a plausible example of such regression despite trying. The only regression test this breaks is "aggregates", and I believe that's actually correct because it changes the plans like this: -> Index Only Scan Backward using minmaxtest2i on minmaxtest2 Index Cond: (f1 IS NOT NULL) -> Index Only Scan using minmaxtest3i on minmaxtest3 - Index Cond: (f1 IS NOT NULL) i.e. it removes the Index Condition from scans of minmaxtest3i, which is perfectly sensible because the index is defined like this: create index minmaxtest3i on minmaxtest3(f1) where f1 is not null; So it's partial index and that condition is implied by the predicate (it's actually exactly the same). I haven't fixed those regression tests for now. > > Or perhaps we can do the check in match_clause_to_index, pretty much > for free? If the clause is not implied by the index predicate (which > we know thanks to the fix), and we don't assign it to any of the > index columns, it means we can'd use IOS, no? After thinking about this a bit more, this is clearly nonsense. It fails on conditions referencing multiple columns (WHERE a=b) which can't be assigned to a single index column. So the logic would have to be much more complicated, effectively doing what check_index_only is doing just a tiny bit later. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, At Sat, 26 Sep 2015 18:00:33 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <5606C121.10205@2ndquadrant.com> > Hi, > > On 09/26/2015 01:28 PM, Tomas Vondra wrote: > > > The patch does not change the check_index_only implementation - it > > still needs to check the clauses, just like in v1 of the patch. To > > make this re-check unnecessary, we'd have to stick the remaining > > clauses somewhere, so that check_index_only can use only the filtered > > list (instead of walking through the complete list of restrictions). > > After thinking about this a bit more, I realized we already have a > good place for keeping this list - IndexClauseSet. So I've done that, The place looks fine but it might be better that rclauses have baserestrictinfo itself when indpred == NIL. It would make the semantics of rclauses more consistent. > and removed most of the code from check_index_only - it still needs to > decide whether to use the full list of restrictions (regular indexes) > or the filtered list (for partial indexes). The change above allows to reduce the modification still left in check_index_only. > Calling predicate_implied_by in match_clause_to_index makes the check > a bit earlier, compared to the v1 of the patch. So theoretically there > might be cases where we'd interrupt the execution between those > places, and the v1 would not pay the price for the call. But those > places are fairly close together, so I find that unlikely and I've > been unable to reproduce a plausible example of such regression > despite trying. > > The only regression test this breaks is "aggregates", and I believe > that's actually correct because it changes the plans like this: > > -> Index Only Scan Backward using minmaxtest2i on minmaxtest2 > Index Cond: (f1 IS NOT NULL) > -> Index Only Scan using minmaxtest3i on minmaxtest3 > - Index Cond: (f1 IS NOT NULL) > > i.e. it removes the Index Condition from scans of minmaxtest3i, which > is perfectly sensible because the index is defined like this: > > create index minmaxtest3i on minmaxtest3(f1) where f1 is not null; > > So it's partial index and that condition is implied by the predicate > (it's actually exactly the same). Agreed. It looks an unexpected-but-positive product. > I haven't fixed those regression tests for now. > > > > > Or perhaps we can do the check in match_clause_to_index, pretty much > > for free? If the clause is not implied by the index predicate (which > > we know thanks to the fix), and we don't assign it to any of the > > index columns, it means we can'd use IOS, no? > > After thinking about this a bit more, this is clearly nonsense. It > fails on conditions referencing multiple columns (WHERE a=b) which > can't be assigned to a single index column. So the logic would have to > be much more complicated, effectively doing what check_index_only is > doing just a tiny bit later. cost_index() seems to need to be fixed. It would count excluded clauses in estimate. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, On 09/29/2015 12:27 PM, Kyotaro HORIGUCHI wrote: > Hi, > > At Sat, 26 Sep 2015 18:00:33 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <5606C121.10205@2ndquadrant.com> >> Hi, >> >> On 09/26/2015 01:28 PM, Tomas Vondra wrote: >> >>> The patch does not change the check_index_only implementation - it >>> still needs to check the clauses, just like in v1 of the patch. To >>> make this re-check unnecessary, we'd have to stick the remaining >>> clauses somewhere, so that check_index_only can use only the filtered >>> list (instead of walking through the complete list of restrictions). >> >> After thinking about this a bit more, I realized we already have a >> good place for keeping this list - IndexClauseSet. So I've done that, > > The place looks fine but it might be better that rclauses have > baserestrictinfo itself when indpred == NIL. It would make the > semantics of rclauses more consistent. > >> and removed most of the code from check_index_only - it still needs to >> decide whether to use the full list of restrictions (regular indexes) >> or the filtered list (for partial indexes). > > The change above allows to reduce the modification still left in > check_index_only. I'm not sure I understand what change you suggest? Could you explain? The change in check_index_only is effectively just (a) comment update and (b) choice of the right list of clauses. I'd say both changes are needed, although (b) could happen outside check_index_only (i.e. we could do the check elsewhere). Is that what you mean? > > cost_index() seems to need to be fixed. It would count excluded > clauses in estimate. Hmm, good point. The problem is that extract_nonindex_conditions uses baserel->baserestrictinfo again, i.e. it does not skip the implied clauses. So we may either stick the filtered clauses somewhere (for example in the IndexPath), teach extract_nonindex_conditions to use predicate_implied_by. I'd say the first option is better. Agreed? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/29/2015 04:57 PM, Tomas Vondra wrote: > Hello, > > On 09/29/2015 12:27 PM, Kyotaro HORIGUCHI wrote: ... >> >> cost_index() seems to need to be fixed. It would count excluded >> clauses in estimate. > > Hmm, good point. The problem is that extract_nonindex_conditions uses > baserel->baserestrictinfo again, i.e. it does not skip the implied > clauses. So we may either stick the filtered clauses somewhere (for > example in the IndexPath), teach extract_nonindex_conditions to use > predicate_implied_by. I'd say the first option is better. Agreed? And the attached patch v4 should do the trick - it adds 'indexrinfos' to IndexPath and uses it in cost_index(). CREATE TABLE t AS SELECT i AS a, i AS b, i AS c FROM generate_series(1,1000) s(i); CREATE INDEX idx ON t(a) WHERE b > 1000; Then SELECT a FROM t WHERE b > 1000 AND a < 1000; /* size(qpquals) = 0 */ SELECT a FROM t WHERE b > 1000 AND c < 1000; /* size(qpquals) = 1 */ SELECT a FROM t WHERE c > 1000 AND c < 1000; /* size(qpquals) = 2 */ and so on. Which seems correct I believe. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hello, At Tue, 29 Sep 2015 16:57:03 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <560AA6BF.5030805@2ndquadrant.com> > >>> The patch does not change the check_index_only implementation - it > >>> still needs to check the clauses, just like in v1 of the patch. To > >>> make this re-check unnecessary, we'd have to stick the remaining > >>> clauses somewhere, so that check_index_only can use only the filtered > >>> list (instead of walking through the complete list of restrictions). > >> > >> After thinking about this a bit more, I realized we already have a > >> good place for keeping this list - IndexClauseSet. So I've done that, > > > > The place looks fine but it might be better that rclauses have > > baserestrictinfo itself when indpred == NIL. It would make the > > semantics of rclauses more consistent. > > > >> and removed most of the code from check_index_only - it still needs to > >> decide whether to use the full list of restrictions (regular indexes) > >> or the filtered list (for partial indexes). > > > > The change above allows to reduce the modification still left in > > check_index_only. > > I'm not sure I understand what change you suggest? Could you explain? > > The change in check_index_only is effectively just (a) comment update > and (b) choice of the right list of clauses. I'd say both changes are > needed, although (b) could happen outside check_index_only (i.e. we > could do the check elsewhere). Is that what you mean? I meant that the (b) could be done earlier in match_clause_to_index. Currently clauseset->rclauses stores given (restrict) clauses which are not implied by indpred *only when* indpred is present. But it's no problem that rclauses *always* stores such clauses. rclauses is still storing clauses "not implied by the index" for the case. It is what I meant by "more consistent". The following codelet would be more clearer than my crumsy words:) in match_clause_to_index() > if (index->indpred != NIL && > predicate_implied_by(list_make1(rinfo->clause), index->indpred)) > return; /* ignore clauses implied by index */ > > /* track all clauses not implied by indpred */ > clauseset->rclauses = lappend(clauseset->rclauses, rinfo); > > for (indexcol = 0; indexcol < index->ncolumns; indexcol++) in check_index_only() - * For partial indexes use the filtered clauses, otherwise use the - * baserestrictinfo directly for non-partial indexes. - */ - rclauses = (index->indpred != NIL) ? clauses : rel->baserestrictinfo; - - /* * Add all the attributes used by restriction clauses (only those not * implied by the index predicate forpartial indexes). */ - foreach(lc, rclauses) + foreach(lc, clauses) > > cost_index() seems to need to be fixed. It would count excluded > > clauses in estimate. > > Hmm, good point. The problem is that extract_nonindex_conditions uses > baserel->baserestrictinfo again, i.e. it does not skip the implied > clauses. So we may either stick the filtered clauses somewhere (for > example in the IndexPath), teach extract_nonindex_conditions to use > predicate_implied_by. I'd say the first option is better. Agreed? I'll mention this in a separate reply for the following mail. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, it looks fine. > >> cost_index() seems to need to be fixed. It would count excluded > >> clauses in estimate. > > > > Hmm, good point. The problem is that extract_nonindex_conditions uses > > baserel->baserestrictinfo again, i.e. it does not skip the implied > > clauses. So we may either stick the filtered clauses somewhere (for > > example in the IndexPath), teach extract_nonindex_conditions to use > > predicate_implied_by. I'd say the first option is better. Agreed? > > And the attached patch v4 should do the trick - it adds 'indexrinfos' > to IndexPath and uses it in cost_index(). It seems to work as expected. It'd also be reasonable to have "indexrinfos" in IndexPath since cost_index needs it. Loose join clauses and recheck conditions are not broken. By the way your comment for indexrinfos is as following, > * 'indexrinfos' is a list of RestrictInfo nodes from the query's WHERE > * or JOIN conditions, excluding those implied by the index predicate > * (if the index is not partial, the list includes all restriction clauses). But the v4 patch instead leaves it empty for non-partial indexes:) I prefer to follow this comment because looking the condition (index->indpred != NIL) for such purpose in build_index_paths is somewhat uneasy for me. But I don't insist on that if you choose to avoid useless memory and clock consumption to construct a list which is not so meaningful for non-partial indexes (it is almost all cases). The following comment in match_clause_to_index does not need to be a 'XXX' comment. What made you to feel to do so? (I rather feel that it is not necessary at all.) > * XXX We must do this before trying to match the index to index > * columns, because the index predicates may use columns not > * used in the index itself. Anyway some description on rclauses should be added in the comment for match_clause_to_index, instead of the comments currently added *within* the function. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello! On 09/30/2015 10:29 AM, Kyotaro HORIGUCHI wrote: > By the way your comment for indexrinfos is as following, > >> * 'indexrinfos' is a list of RestrictInfo nodes from the query's WHERE >> * or JOIN conditions, excluding those implied by the index predicate >> * (if the index is not partial, the list includes all restriction clauses). > > But the v4 patch instead leaves it empty for non-partial > indexes:) I prefer to follow this comment because looking the > condition (index->indpred != NIL) for such purpose in > build_index_paths is somewhat uneasy for me. But I don't insist > on that if you choose to avoid useless memory and clock > consumption to construct a list which is not so meaningful for > non-partial indexes (it is almost all cases). Good point. I think we may simply point indexrinfos to the existing list of restrictions in that case - we don't need to copy it. So no additional memory / CPU consumption, and it should make the code working with it a bit simpler. > > The following comment in match_clause_to_index does not need to be a > 'XXX' comment. What made you to feel to do so? (I rather feel that it > is not necessary at all.) I agree that it may not be really necessary. I added the comment because initially I've made the check inside the for loop and then spent some time investigating why it's not working as expected. > >> * XXX We must do this before trying to match the index to index >> * columns, because the index predicates may use columns not >> * used in the index itself. > > Anyway some description on rclauses should be added in the > comment for match_clause_to_index, instead of the comments > currently added *within* the function. OK, will do. kind regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/30/2015 12:55 PM, Tomas Vondra wrote: > Hello! > > On 09/30/2015 10:29 AM, Kyotaro HORIGUCHI wrote: > >> By the way your comment for indexrinfos is as following, >> >>> * 'indexrinfos' is a list of RestrictInfo nodes from the query's WHERE >>> * or JOIN conditions, excluding those implied by the index predicate >>> * (if the index is not partial, the list includes all restriction >>> clauses). >> >> But the v4 patch instead leaves it empty for non-partial >> indexes:) I prefer to follow this comment because looking the >> condition (index->indpred != NIL) for such purpose in >> build_index_paths is somewhat uneasy for me. But I don't insist >> on that if you choose to avoid useless memory and clock >> consumption to construct a list which is not so meaningful for >> non-partial indexes (it is almost all cases). > > Good point. I think we may simply point indexrinfos to the existing list > of restrictions in that case - we don't need to copy it. So no > additional memory / CPU consumption, and it should make the code working > with it a bit simpler. Attached is v5 of the patch improving the comments (rephrasing, moving before function etc.). I also attempted to fix the issue with empty list for non-partial indexes by simply setting it to rel->baserestrictinfo in match_restriction_clauses_to_index (for non-partial indexes), but that results in a bunch of ERROR: variable not found in subplan target list during "make installcheck". I can't quite wrap my head around why. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hello, At Thu, 01 Oct 2015 01:36:51 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <560C7213.3010203@2ndquadrant.com> > > Good point. I think we may simply point indexrinfos to the existing > > list > > of restrictions in that case - we don't need to copy it. So no > > additional memory / CPU consumption, and it should make the code > > working > > with it a bit simpler. > > Attached is v5 of the patch improving the comments (rephrasing, moving > before function etc.). Looks good (for me). > I also attempted to fix the issue with empty list for non-partial > indexes by simply setting it to rel->baserestrictinfo in > match_restriction_clauses_to_index (for non-partial indexes), Although it is not the cause of these errors (or any errors on regress), build_paths_for_OR (which doesn't seem to be called in regression tests) doesn't set indexrinfos properly. match_clauses_to_index can commonize(?) the process in return for additional list copy and concat. The attached diff is doing in the latter method. Avoiding the copies will make the code a bit complicated. But anyway regtests doesn't say whether it is sane or not:( (TODO) > but that > results in a bunch of > > ERROR: variable not found in subplan target list > > during "make installcheck". I can't quite wrap my head around why. During considerartion on parameterized joins in get_join_index_paths, caluseset fed to get_index_paths is generated from given join (j|e)clausesets. So completing the clauseset with necessary restrictinfo from rcaluseset fixes the problem. The attached patch applies on the latest v5 patch and will address above issues. (And modifies expected files, which are the manifestation of this improovement). regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index c5c2b6e..105351f 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -662,6 +662,13 @@ get_join_index_paths(PlannerInfo *root, RelOptInfo *rel, /* We should have found something, elsecaller passed silly relids */ Assert(clauseset.nonempty); + /* + * get_index_paths requires that restririctinfo for the index is stored in + * clauseset->indexrinfos. Since it doesn't contain join clauses, just + * copying that of rclauseset is enough. + */ + clauseset.indexrinfos = rclauseset->indexrinfos; + /* Build index path(s) using the collected set of clauses */ get_index_paths(root, rel, index, &clauseset, bitindexpaths); @@ -867,7 +874,6 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel, double loop_count; List *orderbyclauses; List *orderbyclausecols; - List *restrictinfo; List *index_pathkeys; List *useful_pathkeys; bool found_lower_saop_clause; @@ -1015,16 +1021,13 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel, orderbyclausecols = NIL; } - restrictinfo - = (index->indpred != NIL) ? clauses->indexrinfos : rel->baserestrictinfo; - /* * 3. Check if an index-only scan is possible. If we're not building * plain indexscans, this isn't relevantsince bitmap scans don't support * index data retrieval anyway. */ index_only_scan = (scantype != ST_BITMAPSCAN&& - check_index_only(rel, index, restrictinfo)); + check_index_only(rel, index, clauses->indexrinfos)); /* * 4. Generate an indexscan path ifthere are relevant restriction clauses @@ -1038,7 +1041,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel, ipath = create_index_path(root, index, index_clauses, clause_columns, - restrictinfo, + clauses->indexrinfos, orderbyclauses, orderbyclausecols, useful_pathkeys, @@ -1065,7 +1068,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel, ipath = create_index_path(root, index, index_clauses, clause_columns, - restrictinfo, + clauses->indexrinfos, NIL, NIL, useful_pathkeys, @@ -2121,6 +2124,11 @@ match_clauses_to_index(IndexOptInfo *index, Assert(IsA(rinfo, RestrictInfo)); match_clause_to_index(index,rinfo, clauseset); } + + /* copy clauses so that it won't be broken on concatenation afterward */ + if (index->indpred == NIL) + clauseset->indexrinfos = + list_concat(clauseset->indexrinfos, list_copy(clauses));}/* diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index de826b5..88f6a02 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -780,7 +780,6 @@ explain (costs off) -> Index Only Scan Backward using minmaxtest2i on minmaxtest2 Index Cond: (f1 IS NOT NULL) -> Index Only Scan using minmaxtest3i on minmaxtest3 - Index Cond: (f1 IS NOT NULL) InitPlan 2 (returns $1) -> Limit -> Merge Append @@ -792,8 +791,7 @@ explain (costs off) -> Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest2_1 Index Cond: (f1 IS NOT NULL) -> Index Only Scan Backward using minmaxtest3ion minmaxtest3 minmaxtest3_1 - Index Cond: (f1 IS NOT NULL) -(25 rows) +(23 rows)select min(f1), max(f1) from minmaxtest; min | max @@ -819,7 +817,6 @@ explain (costs off) -> Index Only Scan Backward using minmaxtest2i on minmaxtest2 Index Cond: (f1 IS NOT NULL) -> Index Only Scan using minmaxtest3i on minmaxtest3 - Index Cond: (f1 IS NOT NULL) InitPlan 2 (returns $1) -> Limit -> Merge Append @@ -831,9 +828,8 @@ explain (costs off) -> Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest2_1 Index Cond: (f1 IS NOT NULL) -> Index Only Scan Backward using minmaxtest3ion minmaxtest3 minmaxtest3_1 - Index Cond: (f1 IS NOT NULL) -> Result -(27 rows) +(25 rows)select distinct min(f1), max(f1) from minmaxtest; min | max
Hello, > The attached patch applies on the latest v5 patch and will > address above issues. (And modifies expected files, which are the > manifestation of this improovement). As you see, it is a quite bad choice. Ugly and unreadable and fragile. The cause of this seeming mismatch would be the place to hold indexrinfos. It is determined only by baserestrictinfo and indpred. Any other components are not involved. So IndexClauseSet is found not to be the best place after all, I suppose. Instead, I came to think that the better place is IndexOptInfo. Partial indexes are examined in check_partial_index and it seems to be the most proper place to check this so far. Thougts? regards, At Tue, 06 Oct 2015 15:12:02 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20151006.151202.58051959.horiguchi.kyotaro@lab.ntt.co.jp> > Hello, > > At Thu, 01 Oct 2015 01:36:51 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <560C7213.3010203@2ndquadrant.com> > > > Good point. I think we may simply point indexrinfos to the existing > > > list > > > of restrictions in that case - we don't need to copy it. So no > > > additional memory / CPU consumption, and it should make the code > > > working > > > with it a bit simpler. > > > > Attached is v5 of the patch improving the comments (rephrasing, moving > > before function etc.). > > Looks good (for me). > > > I also attempted to fix the issue with empty list for non-partial > > indexes by simply setting it to rel->baserestrictinfo in > > match_restriction_clauses_to_index (for non-partial indexes), > > Although it is not the cause of these errors (or any errors on > regress), build_paths_for_OR (which doesn't seem to be called in > regression tests) doesn't set indexrinfos > properly. match_clauses_to_index can commonize(?) the process in > return for additional list copy and concat. The attached diff is > doing in the latter method. Avoiding the copies will make the > code a bit complicated. > > But anyway regtests doesn't say whether it is sane or not:( (TODO) > > > but that > > results in a bunch of > > > > ERROR: variable not found in subplan target list > > > > during "make installcheck". I can't quite wrap my head around why. > > During considerartion on parameterized joins in > get_join_index_paths, caluseset fed to get_index_paths is > generated from given join (j|e)clausesets. So completing the > clauseset with necessary restrictinfo from rcaluseset fixes the > problem. > > The attached patch applies on the latest v5 patch and will > address above issues. (And modifies expected files, which are the > manifestation of this improovement). -- Kyotaro Horiguchi NTT Open Source Software Center
On 10/08/2015 07:30 AM, Kyotaro HORIGUCHI wrote: > Hello, > >> The attached patch applies on the latest v5 patch and will >> address above issues. (And modifies expected files, which are the >> manifestation of this improovement). > > As you see, it is a quite bad choice. Ugly and unreadable and > fragile. I suppose you mean placing the list into IndexClauseSet? > > The cause of this seeming mismatch would be the place to hold > indexrinfos. It is determined only by baserestrictinfo and > indpred. Any other components are not involved. So IndexClauseSet > is found not to be the best place after all, I suppose. > > Instead, I came to think that the better place is > IndexOptInfo. Partial indexes are examined in check_partial_index > and it seems to be the most proper place to check this so far. AFAIK there's only one IndexOptInfo instance per index, so I'm not sure how would that work with queries that use the index in multiple places? Imagine for example table joined to itself, where both sides use the index with different conditions. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, At Thu, 08 Oct 2015 15:24:35 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <56166E93.8000505@2ndquadrant.com> > > > On 10/08/2015 07:30 AM, Kyotaro HORIGUCHI wrote: > > Hello, > > > >> The attached patch applies on the latest v5 patch and will > >> address above issues. (And modifies expected files, which are the > >> manifestation of this improovement). > > > > As you see, it is a quite bad choice. Ugly and unreadable and > > fragile. > > I suppose you mean placing the list into IndexClauseSet? No, it is a reasonable choice, It's not bad if it has valid clauses only for partial indexes. What is Ugl.. is my additional patch:(, which let IndexClauseSet to have valid clauses. > > The cause of this seeming mismatch would be the place to hold > > indexrinfos. It is determined only by baserestrictinfo and > > indpred. Any other components are not involved. So IndexClauseSet > > is found not to be the best place after all, I suppose. > > > > Instead, I came to think that the better place is > > IndexOptInfo. Partial indexes are examined in check_partial_index > > and it seems to be the most proper place to check this so far. > > AFAIK there's only one IndexOptInfo instance per index, so I'm not > sure how would that work with queries that use the index in multiple > places? No matter if the index is used multiple places, indexrinfos is determined only with baserestrictinfos of the owner relation and itself's indpred, which are invariant through the following steps. One arguable point on the place is that check_partial_indexes could be called again with longer restrictions (by eclass claues), but the comment suggests that the last call will be valid in the following steps so creating indexrinfos in every calls should be valid. However possibly a bit innefficient, we can choose to use the first-created indexrinfos, which would be longer than that could be re-created. > Imagine for example table joined to itself, where both sides > use the index with different conditions. Such table appears as multiple baserels having maybe different baserestrictinfo, so no problme on the case. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, On 10/09/2015 02:59 AM, Kyotaro HORIGUCHI wrote: >>> The cause of this seeming mismatch would be the place to hold >>> indexrinfos. It is determined only by baserestrictinfo and >>> indpred. Any other components are not involved. So IndexClauseSet >>> is found not to be the best place after all, I suppose. >>> >>> Instead, I came to think that the better place is >>> IndexOptInfo. Partial indexes are examined in check_partial_index >>> and it seems to be the most proper place to check this so far. >> >> AFAIK there's only one IndexOptInfo instance per index, so I'm not >> sure how would that work with queries that use the index in multiple >> places? > > No matter if the index is used multiple places, indexrinfos is > determined only with baserestrictinfos of the owner relation and > itself's indpred, which are invariant through the following steps. I'm probably missing something, but let's say we have a table like this: CREATE TABLE t (a INT, b INT, c INT); CREATE INDEX aidx ON t(c) WHERE a = 1; CREATE INDEX bidx ON t(c) WHERE b = 2; and then a trivial query (where each part perfectly matches one of the indexes to allow IOS) SELECT c FROM t WHERE a=1 UNION ALL SELECT c FROM t WHERE b=2; Now, let's say we move indexrinfos to IndexOptInfo - how will that look like for each index? There's only a single IndexOptInfo for each index, so it will have to work with union of all baserestrictinfos. So we'll have these indexrinfos: aidx: {b=2} bidx: {a=1} which makes index only scans unusable. I think we effectively need a separate list of "not implied" clauses per index-baserel combination. Maybe IndexClauseSet is not the right place, but I don't see how IndexOptInfo could work. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, At Fri, 09 Oct 2015 16:32:31 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <5617CFFF.10606@2ndquadrant.com> > Hello, > > On 10/09/2015 02:59 AM, Kyotaro HORIGUCHI wrote: > >>> The cause of this seeming mismatch would be the place to hold > >>> indexrinfos. It is determined only by baserestrictinfo and > >>> indpred. Any other components are not involved. So IndexClauseSet > >>> is found not to be the best place after all, I suppose. > >>> > >>> Instead, I came to think that the better place is > >>> IndexOptInfo. Partial indexes are examined in check_partial_index > >>> and it seems to be the most proper place to check this so far. > >> > >> AFAIK there's only one IndexOptInfo instance per index, so I'm not > >> sure how would that work with queries that use the index in multiple > >> places? > > > > No matter if the index is used multiple places, indexrinfos is > > determined only with baserestrictinfos of the owner relation and > > itself's indpred, which are invariant through the following steps. > > I'm probably missing something, but let's say we have a table like > this: You might be missing the fact that a table could represented as multiple relation(RelOptInfo)s in PlannerInfo or PlannerGlobal. > CREATE TABLE t (a INT, b INT, c INT); > CREATE INDEX aidx ON t(c) WHERE a = 1; > CREATE INDEX bidx ON t(c) WHERE b = 2; > > and then a trivial query (where each part perfectly matches one of the > indexes to allow IOS) > > SELECT c FROM t WHERE a=1 > UNION ALL > SELECT c FROM t WHERE b=2; > > Now, let's say we move indexrinfos to IndexOptInfo - how will that > look like for each index? There's only a single IndexOptInfo for each > index, so it will have to work with union of all baserestrictinfos. Needless to say about IndexOptInfo, the two t's in the two component SELECTS are represented as two different subquery rels having different baserestrictinfo. So it will correctly be planned as the following with my previous patch. Append (cost=0.12..64.66 rows=20 width=4) -> Index Only Scan using aidx on t (cost=0.12..32.23 rows=10 width=4) -> Index Only Scan using bidx on t t_1 (cost=0.12..32.23 rows=10 width=4) (3 rows) The table t is referred to twice by different (alias) names (though the diferrence is made by EXPLAIN, it shows that they are different rels in plantree). > So we'll have these indexrinfos: > > aidx: {b=2} > bidx: {a=1} Yes, but each of them belongs to different rels. So, > which makes index only scans unusable. The are usable. > I think we effectively need a separate list of "not implied" clauses > per index-baserel combination. > Maybe IndexClauseSet is not the right > place, but I don't see how IndexOptInfo could work. Does this make sense? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello Kyotaro-san, Sorry for the long delay since your response in this thread :-( On 10/14/2015 08:06 AM, Kyotaro HORIGUCHI wrote: > > The table t is referred to twice by different (alias) names (though > the diferrence is made by EXPLAIN, it shows that they are different > rels in plantree). > >> So we'll have these indexrinfos: >> >> aidx: {b=2} >> bidx: {a=1} > > Yes, but each of them belongs to different rels. So, > >> which makes index only scans unusable. > > The are usable. Yes, you're of course right. I got confused by the comment at IndexOptInfo that says "per-index information" but as you've pointed out it's really "per-index per-reloptinfo" which makes it perfectly suitable for keeping the info we need. However, I'm not sure the changes to check_partial_indexes() are correct - particularly the first loop. /* * Frequently, there will be no partial indexes, so first check to * make sure there's something useful to do here. */ have_partial = false; foreach(lc, rel->indexlist) { IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); /* * index rinfos are the same to baseristrict infos for non-partial * indexes */ index->indrinfos = rel->baserestrictinfo; if (index->indpred == NIL) continue; /* ignore non-partial indexes */ if (index->predOK) continue; /* don't repeat work if already proven OK */ have_partial = true; break; } Now, imagine we have two indexes - partial and regular. In such case the loop will terminate after the first index (assuming predOK=true), so the regular index won't have indrinfos set. I think we need to remove the 'break' at the end. BTW it took me a while to understand the change at the end: /* Search for the first rinfo that is implied by indpred */ foreach (lcr, rel->baserestrictinfo) { RestrictInfo *rinfo= (RestrictInfo *) lfirst(lcr); if (predicate_implied_by(list_make1(rinfo->clause), index->indpred)) break; } /* This index needs rinfos different from baserestrictinfo */ if (lcr) { ... filter implied conditions ... } Is there a reason why not to simply move the second block into the if block in the foreach loop like this? /* Search for the first rinfo that is implied by indpred */ foreach (lcr, rel->baserestrictinfo) { RestrictInfo *rinfo= (RestrictInfo *) lfirst(lcr); if (predicate_implied_by(list_make1(rinfo->clause), index->indpred)) { ... filter implied conditions... break; } } Otherwise the reworked patch seems fine to me, but I'll give it a bit more testing over the next few days. Thanks for the help so far! -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 7, 2015 at 7:48 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Hello Kyotaro-san, > > Sorry for the long delay since your response in this thread :-( > > On 10/14/2015 08:06 AM, Kyotaro HORIGUCHI wrote: >> >> >> The table t is referred to twice by different (alias) names (though >> the diferrence is made by EXPLAIN, it shows that they are different >> rels in plantree). >> >>> So we'll have these indexrinfos: >>> >>> aidx: {b=2} >>> bidx: {a=1} >> >> >> Yes, but each of them belongs to different rels. So, >> >>> which makes index only scans unusable. >> >> >> The are usable. > > > Yes, you're of course right. I got confused by the comment at IndexOptInfo > that says "per-index information" but as you've pointed out it's really > "per-index per-reloptinfo" which makes it perfectly suitable for keeping the > info we need. > > However, I'm not sure the changes to check_partial_indexes() are correct - > particularly the first loop. > > /* > * Frequently, there will be no partial indexes, so first check to > * make sure there's something useful to do here. > */ > have_partial = false; > foreach(lc, rel->indexlist) > { > IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); > > /* > * index rinfos are the same to baseristrict infos for non-partial > * indexes > */ > index->indrinfos = rel->baserestrictinfo; > > if (index->indpred == NIL) > continue; /* ignore non-partial indexes */ > > if (index->predOK) > continue; /* don't repeat work if already proven OK */ > > have_partial = true; > break; > } > > Now, imagine we have two indexes - partial and regular. In such case the > loop will terminate after the first index (assuming predOK=true), so the > regular index won't have indrinfos set. I think we need to remove the > 'break' at the end. > > BTW it took me a while to understand the change at the end: > > /* Search for the first rinfo that is implied by indpred */ > foreach (lcr, rel->baserestrictinfo) > { > RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr); > > if (predicate_implied_by(list_make1(rinfo->clause), > index->indpred)) > break; > } > > /* This index needs rinfos different from baserestrictinfo */ > if (lcr) > { > ... filter implied conditions ... > } > > Is there a reason why not to simply move the second block into the if block > in the foreach loop like this? > > /* Search for the first rinfo that is implied by indpred */ > foreach (lcr, rel->baserestrictinfo) > { > RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr); > > if (predicate_implied_by(list_make1(rinfo->clause), > index->indpred)) > { > ... filter implied conditions ... > break; > } > } > > > Otherwise the reworked patch seems fine to me, but I'll give it a bit more > testing over the next few days. > > Thanks for the help so far! Tomas, are you still working on that? This thread is stalling for 3 weeks. -- Michael
Hi, On 12/24/2015 04:05 AM, Michael Paquier wrote: > On Mon, Dec 7, 2015 at 7:48 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: ... >> >> Otherwise the reworked patch seems fine to me, but I'll give it a bit more >> testing over the next few days. >> >> Thanks for the help so far! > > Tomas, are you still working on that? This thread is stalling for 3 weeks. I haven't discovered anything interesting during the testing, so I guess the "needs review" state is appropriate. Let's move the patch to the next commitfest. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra wrote: > On 12/24/2015 04:05 AM, Michael Paquier wrote: > >Tomas, are you still working on that? This thread is stalling for 3 weeks. > > I haven't discovered anything interesting during the testing, so I guess the > "needs review" state is appropriate. Let's move the patch to the next > commitfest. Not sure what to do here, since this patch got no feedback at all in this CF. The right thing to do, ISTM, is to just move it again to the next CF. But it'd be really useful if someone can have it a look and verify at least whether it doesn't need a rebase (requiring a further submission) so that other people can play with it. Of course, if Horiguchi-san or anyone has more review comments, that would be even better. Tomas said he'd do more testing, but we never got a report on whether anything turned up. (At this point I'm not sure if either Kyotaro or Tomas should be considered the patch author ... maybe both?) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I am very interested in this patch because it allows to use partial indexes to ... speed up inserts. I have implemented "ALTER INDEX ... WHERE ..." construction which allows to change predicate of partial index without necessityto fully rebuild it. So it is not necessary to insert new records in index immediately (if new records do not match partial index conditions). It can be done later in background (or at night). My experiments show that it allows to increase insert speed five times(for either partial indexes). At the same time we do not loose RDBMS requirement that result of query should not depend on presence of indexes. And itis applicable to all indexes: B-Tree, GIN, GIST,... But such optimization makes sense only of partial indexes can be used without extra overhead, first of all for index-onlyscans. And it is impossible without this patch. On 01/31/2016 03:34 PM, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> On 12/24/2015 04:05 AM, Michael Paquier wrote: >>> Tomas, are you still working on that? This thread is stalling for 3 weeks. >> I haven't discovered anything interesting during the testing, so I guess the >> "needs review" state is appropriate. Let's move the patch to the next >> commitfest. > Not sure what to do here, since this patch got no feedback at all in > this CF. The right thing to do, ISTM, is to just move it again to the > next CF. But it'd be really useful if someone can have it a look and > verify at least whether it doesn't need a rebase (requiring a further > submission) so that other people can play with it. Of course, if > Horiguchi-san or anyone has more review comments, that would be even > better. > > Tomas said he'd do more testing, but we never got a report on whether > anything turned up. > > (At this point I'm not sure if either Kyotaro or Tomas should be > considered the patch author ... maybe both?) > -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Konstantin Knizhnik wrote: > I am very interested in this patch because it allows to use partial indexes to ... speed up inserts. > I have implemented "ALTER INDEX ... WHERE ..." construction which allows to change predicate of partial index without necessityto fully rebuild it. > So it is not necessary to insert new records in index immediately (if new records do not match partial index conditions). > It can be done later in background (or at night). My experiments show that it allows to increase insert speed five times(for either partial indexes). > At the same time we do not loose RDBMS requirement that result of query should not depend on presence of indexes. And itis applicable to all indexes: B-Tree, GIN, GIST,... > > But such optimization makes sense only of partial indexes can be used without extra overhead, first of all for index-onlyscans. > And it is impossible without this patch. That sounds interesting. So please review this patch and let us know whether you like it, or whether you have any better ideas for any particular hunk, or whether you think it should be rewritten from scratch, or just state that it is perfect. If you think it's useful, then it's a good idea to vouch for it to be integrated; and one way of doing that is making sure it meets the quality standards etc. If you don't say anything about the patch, we may never integrate it because we might have doubts about whether it's worthy. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I have applied this patch to our working branch and during several weeks we ran various tests and benchmarks. We have not noticed any problems or performance degradation. And at some queries this patch cause very significant increase of performance - ten times: With this patch: postgres=# explain analyze select count(*) from t where k1<1000000 and pk < 1454434742881892; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=29.65..29.66 rows=1 width=0) (actual time=0.108..0.108 rows=1 loops=1) -> Index Only Scan using idx1 on t (cost=0.43..27.49 rows=861 width=0) (actual time=0.012..0.071 rows=963 loops=1) Index Cond: (k1 < 1000000) Heap Fetches: 0 Planningtime: 0.100 ms Execution time: 0.121 ms (6 rows) Without patch: postgres=# explain analyze select count(*) from t where k1<1000000 and pk < 1454434742881892; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=2951.55..2951.56 rows=1 width=0) (actual time=1.070..1.070 rows=1 loops=1) -> Bitmap Heap Scan on t (cost=19.10..2949.40 rows=861 width=0) (actual time=0.161..0.997 rows=963 loops=1) Recheck Cond: ((k1 < 1000000) AND (pk < '1454434742881892'::bigint)) Heap Blocks: exact=954 -> Bitmap Index Scan on idx1 (cost=0.00..18.88 rows=861 width=0) (actual time=0.083..0.083 rows=963 loops=1) Index Cond: (k1 < 1000000) Planning time: 0.099 ms Executiontime: 1.089 ms (8 rows) On 01.02.2016 01:11, Alvaro Herrera wrote: > Konstantin Knizhnik wrote: >> I am very interested in this patch because it allows to use partial indexes to ... speed up inserts. >> I have implemented "ALTER INDEX ... WHERE ..." construction which allows to change predicate of partial index withoutnecessity to fully rebuild it. >> So it is not necessary to insert new records in index immediately (if new records do not match partial index conditions). >> It can be done later in background (or at night). My experiments show that it allows to increase insert speed five times(for either partial indexes). >> At the same time we do not loose RDBMS requirement that result of query should not depend on presence of indexes. Andit is applicable to all indexes: B-Tree, GIN, GIST,... >> >> But such optimization makes sense only of partial indexes can be used without extra overhead, first of all for index-onlyscans. >> And it is impossible without this patch. > That sounds interesting. So please review this patch and let us know > whether you like it, or whether you have any better ideas for any > particular hunk, or whether you think it should be rewritten from > scratch, or just state that it is perfect. If you think it's useful, > then it's a good idea to vouch for it to be integrated; and one way of > doing that is making sure it meets the quality standards etc. If you > don't say anything about the patch, we may never integrate it because we > might have doubts about whether it's worthy. > -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, On 12/06/2015 11:48 PM, Tomas Vondra wrote: > /* > * Frequently, there will be no partial indexes, so first check to > * make sure there's something useful to do here. > */ > have_partial = false; > foreach(lc, rel->indexlist) > { > IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); > > /* > * index rinfos are the same to baseristrict infos for non-partial > * indexes > */ > index->indrinfos = rel->baserestrictinfo; > > if (index->indpred == NIL) > continue; /* ignore non-partial indexes */ > > if (index->predOK) > continue; /* don't repeat work if already proven OK */ > > have_partial = true; > break; > } Attached is a v6 of the patch, which is actually the version submitted by Kyotaro-san on 2015/10/8 rebased to current master and with two additional changes. Firstly, I've removed the "break" from the initial foreach loop in check_partial_indexes(). As explained in the previous message, I believe this was a bug in the patch. Secondly, I've tried to improve the comments to explain a bit better what the code is doing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hello, Tomas. my cerebral cortext gets squeezed by jumping from parser to planner. At Wed, 24 Feb 2016 01:13:22 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <56CCF5A2.5040702@2ndquadrant.com> > Hi, > > On 12/06/2015 11:48 PM, Tomas Vondra wrote: > > /* > > * Frequently, there will be no partial indexes, so first check to > > * make sure there's something useful to do here. > > */ > > have_partial = false; > > foreach(lc, rel->indexlist) > > { > > IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); > > > > /* > > * index rinfos are the same to baseristrict infos for non-partial > > * indexes > > */ > > index->indrinfos = rel->baserestrictinfo; > > > > if (index->indpred == NIL) > > continue; /* ignore non-partial indexes */ > > > > if (index->predOK) > > continue; /* don't repeat work if already proven OK */ > > > > have_partial = true; > > break; > > } > > Attached is a v6 of the patch, which is actually the version submitted > by Kyotaro-san on 2015/10/8 rebased to current master and with two > additional changes. This relies on the fact I believe that no indexrelinfos are shared among any two baserels. Are you OK with that? > Firstly, I've removed the "break" from the initial foreach loop in > check_partial_indexes(). As explained in the previous message, I > believe this was a bug in the patch. I agreed. The break is surely a bug. (the regression didn't find it though..) > Secondly, I've tried to improve the comments to explain a bit better > what the code is doing. In check_partial_indexes, the following commnet. > /* > * Update restrictinfo for this index by excluding clauses that > * are implied by the index predicates. We first quickly check if > * there are any implied clauses, and if we found at least one > * we do the actual work. This way we don't need the construct > * a copy of the list unless actually needed. > */ Is "need the construct" a mistake of "need to construct" ? match_restriction_clauses_to_index is called only from create_index_paths, and now the former doesn't use its first parameter rel. We can safely remove the useless parameter. - match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index, - IndexClauseSet *clauseset) + match_restriction_clauses_to_index(IndexOptInfo *index, + IndexClauseSet *clauseset) I find no other problem and have no objection to this. May I mark this "Ready for committer" after fixing them? regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 02/25/2016 11:56 AM, Kyotaro HORIGUCHI wrote: > Hello, Tomas. my cerebral cortext gets squeezed by jumping from > parser to planner. LOL > > At Wed, 24 Feb 2016 01:13:22 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <56CCF5A2.5040702@2ndquadrant.com> >> Hi, >> >> On 12/06/2015 11:48 PM, Tomas Vondra wrote: >>> /* >>> * Frequently, there will be no partial indexes, so first check to >>> * make sure there's something useful to do here. >>> */ >>> have_partial = false; >>> foreach(lc, rel->indexlist) >>> { >>> IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); >>> >>> /* >>> * index rinfos are the same to baseristrict infos for non-partial >>> * indexes >>> */ >>> index->indrinfos = rel->baserestrictinfo; >>> >>> if (index->indpred == NIL) >>> continue; /* ignore non-partial indexes */ >>> >>> if (index->predOK) >>> continue; /* don't repeat work if already proven OK */ >>> >>> have_partial = true; >>> break; >>> } >> >> Attached is a v6 of the patch, which is actually the version >> submitted by Kyotaro-san on 2015/10/8 rebased to current master and >> with two additional changes. > > This relies on the fact I believe that no indexrelinfos are > shared among any two baserels. Are you OK with that? I'm not sure what this refers to? You mean the fact that an index will have one IndexOptInfo instance for each baserel? Yes, that seems fine to me. > >> Firstly, I've removed the "break" from the initial foreach loop in >> check_partial_indexes(). As explained in the previous message, I >> believe this was a bug in the patch. > > I agreed. The break is surely a bug. (the regression didn't find > it though..) > >> Secondly, I've tried to improve the comments to explain a bit better >> what the code is doing. > > In check_partial_indexes, the following commnet. > >> /* >> * Update restrictinfo for this index by excluding clauses that >> * are implied by the index predicates. We first quickly check if >> * there are any implied clauses, and if we found at least one >> * we do the actual work. This way we don't need the construct >> * a copy of the list unless actually needed. >> */ > > Is "need the construct" a mistake of "need to construct" ? > > > match_restriction_clauses_to_index is called only from > create_index_paths, and now the former doesn't use its first > parameter rel. We can safely remove the useless parameter. > > - match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo *index, > - IndexClauseSet *clauseset) > + match_restriction_clauses_to_index(IndexOptInfo *index, > + IndexClauseSet *clauseset) > > I find no other problem and have no objection to this. May I mark > this "Ready for committer" after fixing them? OK. Do you want me to do the fixes, or will you do that? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, At Thu, 25 Feb 2016 12:22:45 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <56CEE405.30108@2ndquadrant.com> > >> Attached is a v6 of the patch, which is actually the version > >> submitted by Kyotaro-san on 2015/10/8 rebased to current master and > >> with two additional changes. > > > > This relies on the fact I believe that no indexrelinfos are > > shared among any two baserels. Are you OK with that? > > I'm not sure what this refers to? You mean the fact that an index will > have one IndexOptInfo instance for each baserel? Yes, that seems fine > to me. Yes that what I meant. > >> Firstly, I've removed the "break" from the initial foreach loop in > >> check_partial_indexes(). As explained in the previous message, I > >> believe this was a bug in the patch. > > > > I agreed. The break is surely a bug. (the regression didn't find > > it though..) > > > >> Secondly, I've tried to improve the comments to explain a bit better > >> what the code is doing. > > > > In check_partial_indexes, the following commnet. > > > >> /* > >> * Update restrictinfo for this index by excluding clauses that > >> * are implied by the index predicates. We first quickly check if > >> * there are any implied clauses, and if we found at least one > >> * we do the actual work. This way we don't need the construct > >> * a copy of the list unless actually needed. > >> */ > > > > Is "need the construct" a mistake of "need to construct" ? > > > > > > match_restriction_clauses_to_index is called only from > > create_index_paths, and now the former doesn't use its first > > parameter rel. We can safely remove the useless parameter. > > > > - match_restriction_clauses_to_index(RelOptInfo *rel, IndexOptInfo > > - *index, > > - IndexClauseSet *clauseset) > > + match_restriction_clauses_to_index(IndexOptInfo *index, > > + IndexClauseSet *clauseset) > > > > I find no other problem and have no objection to this. May I mark > > this "Ready for committer" after fixing them? > > OK. Do you want me to do the fixes, or will you do that? Ah. I will do. Please wait a minute. regares, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, This is a (maybe) committer-ready patch of a Tomas Vondra's project. This patch applies on the current master and passes make check. This is to exclude some base-estrict clauses that are implied by index predicates on index scans on partial indexes. First, this patch adds a new member indexrinfos to IndexOptInfo, which usually has the same value with baserestrictinfo of the parent RelOptInfo and cost_index() uses this instead of RelOptInfo.baserestrictinfo. For partial indexes, the clauses implied by the index predicates are removed from the indexrinfos, so that the result plan for scans on such indexes won't contain such restrictions. Such restrictions commonly become filter quals that are nothing but a useless work to do, so this will improve the performance of some index scans on partial indexes. The largest part of the extra cost of the additional work would be the cost of predicate_implied_by() on all clauses of baserectrictinfo and indpred of every IndexOptInfos. The extra work is done in check_partial_indexes() for all index scans on partial indexes. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
I marked this as "ready for commiter" and tried to add me as the *second* author. But the CF app forces certain msyterious order for listed names. Is there any means to arrange the author names in desired order? At Fri, 26 Feb 2016 16:06:37 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20160226.160637.65689630.horiguchi.kyotaro@lab.ntt.co.jp> > Hello, This is a (maybe) committer-ready patch of a Tomas > Vondra's project. > > This patch applies on the current master and passes make check. > > This is to exclude some base-estrict clauses that are implied by > index predicates on index scans on partial indexes. > > First, this patch adds a new member indexrinfos to IndexOptInfo, > which usually has the same value with baserestrictinfo of the > parent RelOptInfo and cost_index() uses this instead of > RelOptInfo.baserestrictinfo. > > For partial indexes, the clauses implied by the index predicates > are removed from the indexrinfos, so that the result plan for > scans on such indexes won't contain such restrictions. Such > restrictions commonly become filter quals that are nothing but a > useless work to do, so this will improve the performance of some > index scans on partial indexes. > > The largest part of the extra cost of the additional work would > be the cost of predicate_implied_by() on all clauses of > baserectrictinfo and indpred of every IndexOptInfos. The extra > work is done in check_partial_indexes() for all index scans on > partial indexes. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > I marked this as "ready for commiter" and tried to add me as the > *second* author. But the CF app forces certain msyterious order > for listed names. Is there any means to arrange the author names > in desired order? Those are automatically classified by alphabetical order. -- Michael
On Fri, Feb 26, 2016 at 6:16 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> I marked this as "ready for commiter" and tried to add me as the >> *second* author. But the CF app forces certain msyterious order >> for listed names. Is there any means to arrange the author names >> in desired order? > > Those are automatically classified by alphabetical order. Doh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Feb 27, 2016 at 1:08 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 26, 2016 at 6:16 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>> I marked this as "ready for commiter" and tried to add me as the >>> *second* author. But the CF app forces certain msyterious order >>> for listed names. Is there any means to arrange the author names >>> in desired order? >> >> Those are automatically classified by alphabetical order. > > Doh. Hm? Not sure I am getting that. My point if that they could appear in the order they have been entered by the user or the order they have been saved, in which case one could take advantage of that to define the a list of authors ordered by the amount of work they did. -- Michael
On Sat, Feb 27, 2016 at 6:19 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sat, Feb 27, 2016 at 1:08 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Feb 26, 2016 at 6:16 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI >>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>> I marked this as "ready for commiter" and tried to add me as the >>>> *second* author. But the CF app forces certain msyterious order >>>> for listed names. Is there any means to arrange the author names >>>> in desired order? >>> >>> Those are automatically classified by alphabetical order. >> >> Doh. > > Hm? Not sure I am getting that. My point if that they could appear in > the order they have been entered by the user or the order they have > been saved, in which case one could take advantage of that to define > the a list of authors ordered by the amount of work they did. Well it seems to me it ought to let you specify the order. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > Hello, This is a (maybe) committer-ready patch of a Tomas > Vondra's project. I think this needs quite a bit of work yet. A few comments: * If we're going to pay the price of identifying implied restriction conditions in check_partial_indexes(), we should at least recoup some of that investment by not doing it again in create_indexscan_plan(). * create_indexscan_plan() has this comment about what it's doing: * We can also discard quals that are implied by a partialindex's * predicate, but only in a plain SELECT; when scanning a target relation * of UPDATE/DELETE/SELECT FORUPDATE, we must leave such quals in the * plan so that they'll be properly rechecked by EvalPlanQual testing. I believe that that problem applies for this optimization as well, and thus that you can only remove implied quals in plain SELECT. At least, if there's some reason why that problem does *not* apply, there had darn well better be a comment explaining why it's safe. * Adding indexrinfos to IndexPath seems unnecessary, since that struct already has the "index" pointer --- you can just get the field out of the IndexOptInfo when you need it. If you insist on having the extra field, this patch is short of the threshold for correctness on adding fields to paths. It missed _outIndexPath for instance. * The additional #include in costsize.c has no apparent reason. * The changes in cost_index() really ought to come with some change in the associated comments. * Personally I'd not change the signature of match_restriction_clauses_to_index; that's just code churn that may have to get undone someday. * The block comment in check_index_only needs more thought than this, as the phrase "The same is true" is now invalid; the "same" it refers to *isn't* the same anymore. * I'm not too thrilled with injecting the initialization of index->indrinfos into the initial loop in check_partial_indexes(). If it stays there, I'd certainly expect the comment ahead of the loop to be changed to have something to do with reality. But can't we find some more-appropriate place to initialize it? Like maybe where the IndexOptInfo is first created? I would not really expect check_partial_indexes() to have side-effects on non-partial indexes. * I think the double loop in check_partial_indexes() is too cute by half. I'd be inclined to just build the replacement list unconditionally while we do the predicate_implied_by() tests. Those are expensive enough that saving one lappend per implication-test is a useless optimization, especially if it requires code as contorted and bug-prone as this. * The comment added to IndexOptInfo is not very adequate, and not spelled correctly either. There's a block comment you should be adding a para to (probably take the text you added for struct IndexPath). And again, there is more work to do to add a field to such a struct, eg outfuncs.c. Usually a good way to find all the places to touch is to grep for some of the existing field names in the struct. * I don't much care for the field name "indrinfos"; it's neither very readable nor descriptive. Don't have a better suggestion right now though. * Not sure if new regression test cases would be appropriate. The changes in the existing cases seem a bit unfortunate actually; I'm afraid that this may be defeating the original intent of those tests. I'm setting this back to Waiting on Author. regards, tom lane
Hello, thank you for the comments. The new v8 patch is attched. At Tue, 08 Mar 2016 18:08:55 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <21567.1457478535@sss.pgh.pa.us> > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > Hello, This is a (maybe) committer-ready patch of a Tomas > > Vondra's project. > > I think this needs quite a bit of work yet. A few comments: Not a few at all. > * If we're going to pay the price of identifying implied restriction > conditions in check_partial_indexes(), we should at least recoup some > of that investment by not doing it again in create_indexscan_plan(). Moved a part of the check from create_indexscan_plan into check_partial_indexes. I noticed that we should avoid to exlude clauses that contain mutable functions so I added that. But I don't understand the reason for the following condition to refuse clause pruning. | rel->relid == root->parse->resultRelation > * create_indexscan_plan() has this comment about what it's doing: > * We can also discard quals that are implied by a partial index's > * predicate, but only in a plain SELECT; when scanning a target relation > * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the > * plan so that they'll be properly rechecked by EvalPlanQual testing. > I believe that that problem applies for this optimization as well, > and thus that you can only remove implied quals in plain SELECT. > At least, if there's some reason why that problem does *not* apply, > there had darn well better be a comment explaining why it's safe. This is done in check_partial_indexes using parse_rowmark. The problem haven't realized with the previous patch because it (accidentially) used rel->baserestirictinfo, not index->indrinfos for scan_clauses in create_scan_plan. But the way how create_scan_plan gets scan_clauses seems bad. I haven't have any clean idea to deal with it. > * Adding indexrinfos to IndexPath seems unnecessary, since that struct > already has the "index" pointer --- you can just get the field out of the > IndexOptInfo when you need it. If you insist on having the extra field, > this patch is short of the threshold for correctness on adding fields to > paths. It missed _outIndexPath for instance. Sorry for the stupid code. I don't insist to keep it. Removed. > * The additional #include in costsize.c has no apparent reason. Thank you for pointing out. Removed. > * The changes in cost_index() really ought to come with some change > in the associated comments. I tried to add a comment but it doesn't seem clear. > * Personally I'd not change the signature of > match_restriction_clauses_to_index; that's just code churn that > may have to get undone someday. No problem, reverted. > * The block comment in check_index_only needs more thought than this, > as the phrase "The same is true" is now invalid; the "same" it refers > to *isn't* the same anymore. Maybe I took this "the same" wrongly. Tried to fix it but I'm not confident on the result. > * I'm not too thrilled with injecting the initialization of > index->indrinfos into the initial loop in check_partial_indexes(). > If it stays there, I'd certainly expect the comment ahead of the > loop to be changed to have something to do with reality. But can't > we find some more-appropriate place to initialize it? Like maybe > where the IndexOptInfo is first created? I would not really expect > check_partial_indexes() to have side-effects on non-partial indexes. Mmm. That is quote right in general. IndexOptInfo is created in get_relation_info() but baserestrictinfo has not been fixed at the point. It is fixed as late as set_append_rel_size, almost just before set_rel_size, and just before the check_partial_indexes. But initializing indrinfos as a side-effect of check_partial_indexes is not good as you pointed. But it is called in two ways, set_tablesample_rel_size and set_plain_rel_size. So the only possible position of that other than check_partial_indexes is set_rel_size. > * I think the double loop in check_partial_indexes() is too cute by half. > I'd be inclined to just build the replacement list unconditionally while > we do the predicate_implied_by() tests. Those are expensive enough that > saving one lappend per implication-test is a useless optimization, > especially if it requires code as contorted and bug-prone as this. Ok, I removed the too cute part and added comment mentioning the reason for the unconditional replacement. > * The comment added to IndexOptInfo is not very adequate, and not spelled > correctly either. There's a block comment you should be adding a para to > (probably take the text you added for struct IndexPath). I understand that you are mentioning here. + List *indrinfos; /* baseristrict info which are not implied by + * indpred */ I rewritten to make sense, maybe. > And again, > there is more work to do to add a field to such a struct, eg outfuncs.c. > Usually a good way to find all the places to touch is to grep for some of > the existing field names in the struct. Sorry, I just forgot of that. (In spite that I myself give such kind of comments..) Yeah, I love find-grep on emacs. By the way, I found this comment in copyfuncs.c but I couldn't find the "subsidiary structs". | * We don't support copying RelOptInfo, IndexOptInfo, or Path nodes. | * There are some subsidiary structs that are useful to copy, though. Finally, all I added for this was one line in _outIndexOptInfo. > * I don't much care for the field name "indrinfos"; it's neither very > readable nor descriptive. Don't have a better suggestion right now > though. I agree with you. I didn't like the name so I rethought that. I followed the seeming rule that prefixing with 'ind' to the field name, but it is not for index, but for the parent relation. So I renamed it as "baserestrictinfo" in this version. > * Not sure if new regression test cases would be appropriate. The changes > in the existing cases seem a bit unfortunate actually; I'm afraid that > this may be defeating the original intent of those tests. Only aggregates.out is modifed in this patch. The comment for the test says that, > -- > -- Test cases that should be optimized into indexscans instead of > -- the generic aggregate implementation. ... > -- try it on an inheritance tree ... > explain (costs off) > select min(f1), max(f1) from minmaxtest; and > -- DISTINCT doesn't do anything useful here, but it shouldn't fail > explain (costs off) > select distinct min(f1), max(f1) from minmaxtest; Utterly no problem from the point of the comment. Although this patch removes "Index Cond"s for the index minmaxtest3i, it is simplly caused by a index predicate on the index, which is the very result of this patch. > I'm setting this back to Waiting on Author. Attached the new version v8. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Sorry, I should correct one point. At Wed, 09 Mar 2016 17:29:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20160309.172949.84135555.horiguchi.kyotaro@lab.ntt.co.jp> > Hello, thank you for the comments. The new v8 patch is attched. > > At Tue, 08 Mar 2016 18:08:55 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <21567.1457478535@sss.pgh.pa.us> > > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > > Hello, This is a (maybe) committer-ready patch of a Tomas > > > Vondra's project. > > > > I think this needs quite a bit of work yet. A few comments: > > Not a few at all. > > > * If we're going to pay the price of identifying implied restriction > > conditions in check_partial_indexes(), we should at least recoup some > > of that investment by not doing it again in create_indexscan_plan(). > > Moved a part of the check from create_indexscan_plan into > check_partial_indexes. > > I noticed that we should avoid to exlude > clauses that contain mutable functions so I added that. But I > don't understand the reason for the following condition to refuse > clause pruning. > > | rel->relid == root->parse->resultRelation > > > > * create_indexscan_plan() has this comment about what it's doing: > > * We can also discard quals that are implied by a partial index's > > * predicate, but only in a plain SELECT; when scanning a target relation > > * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the > > * plan so that they'll be properly rechecked by EvalPlanQual testing. > > I believe that that problem applies for this optimization as well, > > and thus that you can only remove implied quals in plain SELECT. > > At least, if there's some reason why that problem does *not* apply, > > there had darn well better be a comment explaining why it's safe. > > This is done in check_partial_indexes using parse_rowmark. No, using plan_rowmark. It is called for expanded inhertance. > The problem haven't realized with the previous patch because it > (accidentially) used rel->baserestirictinfo, not > index->indrinfos for scan_clauses in create_scan_plan. > > But the way how create_scan_plan gets scan_clauses seems > bad. I haven't have any clean idea to deal with it. > > > * Adding indexrinfos to IndexPath seems unnecessary, since that struct > > already has the "index" pointer --- you can just get the field out of the > > IndexOptInfo when you need it. If you insist on having the extra field, > > this patch is short of the threshold for correctness on adding fields to > > paths. It missed _outIndexPath for instance. > > Sorry for the stupid code. I don't insist to keep it. Removed. > > > * The additional #include in costsize.c has no apparent reason. > > Thank you for pointing out. Removed. > > > * The changes in cost_index() really ought to come with some change > > in the associated comments. > > I tried to add a comment but it doesn't seem clear. > > > * Personally I'd not change the signature of > > match_restriction_clauses_to_index; that's just code churn that > > may have to get undone someday. > > No problem, reverted. > > > * The block comment in check_index_only needs more thought than this, > > as the phrase "The same is true" is now invalid; the "same" it refers > > to *isn't* the same anymore. > > Maybe I took this "the same" wrongly. Tried to fix it but I'm not > confident on the result. > > > * I'm not too thrilled with injecting the initialization of > > index->indrinfos into the initial loop in check_partial_indexes(). > > If it stays there, I'd certainly expect the comment ahead of the > > loop to be changed to have something to do with reality. But can't > > we find some more-appropriate place to initialize it? Like maybe > > where the IndexOptInfo is first created? I would not really expect > > check_partial_indexes() to have side-effects on non-partial indexes. > > Mmm. That is quote right in general. IndexOptInfo is created in > get_relation_info() but baserestrictinfo has not been fixed at > the point. It is fixed as late as set_append_rel_size, almost > just before set_rel_size, and just before the > check_partial_indexes. But initializing indrinfos as a > side-effect of check_partial_indexes is not good as you pointed. > But it is called in two ways, set_tablesample_rel_size and > set_plain_rel_size. So the only possible position of that other > than check_partial_indexes is set_rel_size. > > > > * I think the double loop in check_partial_indexes() is too cute by half. > > I'd be inclined to just build the replacement list unconditionally while > > we do the predicate_implied_by() tests. Those are expensive enough that > > saving one lappend per implication-test is a useless optimization, > > especially if it requires code as contorted and bug-prone as this. > > Ok, I removed the too cute part and added comment mentioning the > reason for the unconditional replacement. > > > * The comment added to IndexOptInfo is not very adequate, and not spelled > > correctly either. There's a block comment you should be adding a para to > > (probably take the text you added for struct IndexPath). > > I understand that you are mentioning here. > > + List *indrinfos; /* baseristrict info which are not implied by > + * indpred */ > > I rewritten to make sense, maybe. > > > And again, > > there is more work to do to add a field to such a struct, eg outfuncs.c. > > Usually a good way to find all the places to touch is to grep for some of > > the existing field names in the struct. > > Sorry, I just forgot of that. (In spite that I myself give such > kind of comments..) Yeah, I love find-grep on emacs. > > By the way, I found this comment in copyfuncs.c but I couldn't > find the "subsidiary structs". > > | * We don't support copying RelOptInfo, IndexOptInfo, or Path nodes. > | * There are some subsidiary structs that are useful to copy, though. > > Finally, all I added for this was one line in _outIndexOptInfo. > > > * I don't much care for the field name "indrinfos"; it's neither very > > readable nor descriptive. Don't have a better suggestion right now > > though. > > I agree with you. I didn't like the name so I rethought that. I > followed the seeming rule that prefixing with 'ind' to the field > name, but it is not for index, but for the parent relation. So I > renamed it as "baserestrictinfo" in this version. > > > * Not sure if new regression test cases would be appropriate. The changes > > in the existing cases seem a bit unfortunate actually; I'm afraid that > > this may be defeating the original intent of those tests. > > Only aggregates.out is modifed in this patch. The comment for the > test says that, > > > -- > > -- Test cases that should be optimized into indexscans instead of > > -- the generic aggregate implementation. > ... > > -- try it on an inheritance tree > ... > > explain (costs off) > > select min(f1), max(f1) from minmaxtest; > > and > > > -- DISTINCT doesn't do anything useful here, but it shouldn't fail > > explain (costs off) > > select distinct min(f1), max(f1) from minmaxtest; > > Utterly no problem from the point of the comment. Although this > patch removes "Index Cond"s for the index minmaxtest3i, it is > simplly caused by a index predicate on the index, which is the > very result of this patch. > > > I'm setting this back to Waiting on Author. > > Attached the new version v8. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On 3/9/16 3:29 AM, Kyotaro HORIGUCHI wrote: > Hello, thank you for the comments. The new v8 patch is attched. As far as I can see this patch should be marked "ready for review" so now it is. Thanks, -- -David david@pgmasters.net
Thank for changing status. At Wed, 16 Mar 2016 12:13:07 -0400, David Steele <david@pgmasters.net> wrote in <56E98613.5000003@pgmasters.net> > On 3/9/16 3:29 AM, Kyotaro HORIGUCHI wrote: > > > Hello, thank you for the comments. The new v8 patch is attched. > > As far as I can see this patch should be marked "ready for review" so > now it is. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 03/09/2016 09:29 AM, Kyotaro HORIGUCHI wrote: > Hello, thank you for the comments. The new v8 patch is attched. I've looked at v8, and I do have a few minor comments: 1) indxpath.c uses get_plan_rowmark without including optimizer/prep.h so the compiler complains about missing prototype etc. 2) check_partial_indexes should probably add back the 'break' at the end of the initial loop, as index->baserestrictinfo is not initialized elsewhere 3) match_restriction_clauses_to_index does not need the 'rel' parameter anymore, so we can remove it Attached is v9 of the patch addressing those points, and also tweaking few comments - either fixing typos, or perhaps making them a bit more clear. And also whitespace/formatting on a few places. FWIW if this patch gets committed, I think it's fair to list Kyotaro-san as the primary author and myself as a secondary one. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I tried to whip this into shape, but there were a few areas I didn't feel I had the necessary understanding to feel comfortable taking on the committer role for it. I've cleaned it up the best I could, fixing whitespace and typos, eliminating an unnecessary addition of an include, improving C comments (at least to my eye), and adding an Assert where it seemed like a good idea to me. My own tests and those of others show performance improvements (up to 10x for some tests) and no significant performance degradation, even with attempts to create "worst case" scenarios. The only changes to the regression tests are to change an "out" file to eliminate re-checking the index predicate against the heap on every matching row, which seems like a good thing. I'm taking my name off as committer and marking it "Ready for Committer". If someone else wants to comment on the issues where Tom and Kyotaro-san still seem unsatisfied to the point where I can get my head around it, I could maybe take it back on as committer -- if anyone feels that could be a net win. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Thank you for polishing this. At Tue, 29 Mar 2016 13:31:19 -0500, Kevin Grittner <kgrittn@gmail.com> wrote in <CACjxUsNm9+tn0Hat0p4wR+sF0bfQDYMPLOEw7FyDycQ2UUrbyA@mail.gmail.com> > I tried to whip this into shape, but there were a few areas I > didn't feel I had the necessary understanding to feel comfortable > taking on the committer role for it. I've cleaned it up the best I > could, fixing whitespace and typos, eliminating an unnecessary > addition of an include, improving C comments (at least to my eye), > and adding an Assert where it seemed like a good idea to me. Especially for this one, === @@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel) continue; /* don'trepeat work if already proven OK */ have_partial = true; + break; } if (!have_partial) return; === The initialization has been moved to set_rel_size so the break can be restored. > My own tests and those of others show performance improvements (up > to 10x for some tests) and no significant performance degradation, > even with attempts to create "worst case" scenarios. > > The only changes to the regression tests are to change an "out" > file to eliminate re-checking the index predicate against the heap > on every matching row, which seems like a good thing. > > I'm taking my name off as committer and marking it "Ready for > Committer". If someone else wants to comment on the issues where > Tom and Kyotaro-san still seem unsatisfied to the point where I > can get my head around it, I could maybe take it back on as > committer -- if anyone feels that could be a net win. FWIW, as mentioned upthread, I added the following condition to decline ripping index predicates from base restrictinfo without understanding the reason to do so. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 03/30/2016 06:01 AM, Kyotaro HORIGUCHI wrote: > Thank you for polishing this. > > At Tue, 29 Mar 2016 13:31:19 -0500, Kevin Grittner <kgrittn@gmail.com> wrote in <CACjxUsNm9+tn0Hat0p4wR+sF0bfQDYMPLOEw7FyDycQ2UUrbyA@mail.gmail.com> >> I tried to whip this into shape, but there were a few areas I >> didn't feel I had the necessary understanding to feel comfortable >> taking on the committer role for it. I've cleaned it up the best I >> could, fixing whitespace and typos, eliminating an unnecessary >> addition of an include, improving C comments (at least to my eye), >> and adding an Assert where it seemed like a good idea to me. > > Especially for this one, > > === > @@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel) > continue; /* don't repeat work if already proven OK */ > > have_partial = true; > + break; > } > if (!have_partial) > return; > === > > The initialization has been moved to set_rel_size so the break > can be restored. FWIW the break was restored in the v9 by me. > > >> My own tests and those of others show performance improvements (up >> to 10x for some tests) and no significant performance degradation, >> even with attempts to create "worst case" scenarios. >> >> The only changes to the regression tests are to change an "out" >> file to eliminate re-checking the index predicate against the heap >> on every matching row, which seems like a good thing. >> >> I'm taking my name off as committer and marking it "Ready for >> Committer". If someone else wants to comment on the issues where >> Tom and Kyotaro-san still seem unsatisfied to the point where I >> can get my head around it, I could maybe take it back on as >> committer -- if anyone feels that could be a net win. > > FWIW, as mentioned upthread, I added the following condition to > decline ripping index predicates from base restrictinfo without > understanding the reason to do so. Ummmm, I'm a bit confused. Which condition? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 30, 2016 at 8:40 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> === >> @@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo >> *rel) >> continue; /* don't repeat >> work if already proven OK */ >> >> have_partial = true; >> + break; >> } >> if (!have_partial) >> return; >> === >> >> The initialization has been moved to set_rel_size so the break >> can be restored. > > FWIW the break was restored in the v9 by me. Yeah, I kept that in v10, so the three of us seem to be on the same page there. >> FWIW, as mentioned upthread, I added the following condition to >> decline ripping index predicates from base restrictinfo without >> understanding the reason to do so. > > Ummmm, I'm a bit confused. Which condition? Yeah, any additional discussion about areas which anyone sees as open or still needing attention might allow me to get enough traction to wrap this; I'm having trouble seeing what the pending issues are where both Tom and Kyotaro-san seemed to be unsatisfied. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@gmail.com> writes: > I'm taking my name off as committer and marking it "Ready for > Committer". If someone else wants to comment on the issues where > Tom and Kyotaro-san still seem unsatisfied to the point where I > can get my head around it, I could maybe take it back on as > committer -- if anyone feels that could be a net win. I'll pick it up. In a quick scan I see some things I don't like, but nothing insoluble: * Having plancat.c initialize the per-IndexOptInfo restriction lists seems fairly bizarre. I realize that Tomas or Kyotaro probably did that in response to my complaint about having check_partial_indexes have side-effects on non-partial indexes, but this isn't an improvement. For one thing, it means we will produce an incorrect plan if more restriction clauses are added to the rel after plancat.c runs, as the comment for check_partial_indexes contemplates. (I *think* that comment may be obsolete, but I'm not sure.) I think a better answer to that complaint is to rename check_partial_indexes to something else, and more importantly adjust its header comment to reflect its expanded responsibilities --- as the patch I was commenting on at the time failed to do. * It took me some time to convince myself that the patch doesn't break generation of plans for EvalPlanQual. I eventually found where it skips removal of restrictions for target relations, but that's drastically undercommented. * Likewise, there's inadequate documentation of why it doesn't break bitmap scans, which also have to be able to recheck correctly. * I'm not sure that we have any regression test cases covering the above two points, but we should. * The comments leave a lot to be desired in general, eg there are repeated failures to update comments only a few lines away from a change. regards, tom lane
On 03/31/2016 01:36 AM, Tom Lane wrote: > Kevin Grittner <kgrittn@gmail.com> writes: >> I'm taking my name off as committer and marking it "Ready for >> Committer". If someone else wants to comment on the issues where >> Tom and Kyotaro-san still seem unsatisfied to the point where I >> can get my head around it, I could maybe take it back on as >> committer -- if anyone feels that could be a net win. > > I'll pick it up. In a quick scan I see some things I don't like, but > nothing insoluble: > > * Having plancat.c initialize the per-IndexOptInfo restriction lists seems > fairly bizarre. I realize that Tomas or Kyotaro probably did that in > response to my complaint about having check_partial_indexes have > side-effects on non-partial indexes, but this isn't an improvement. For > one thing, it means we will produce an incorrect plan if more restriction > clauses are added to the rel after plancat.c runs, as the comment for > check_partial_indexes contemplates. (I *think* that comment may be > obsolete, but I'm not sure.) I think a better answer to that complaint is > to rename check_partial_indexes to something else, and more importantly > adjust its header comment to reflect its expanded responsibilities --- as > the patch I was commenting on at the time failed to do. > > * It took me some time to convince myself that the patch doesn't break > generation of plans for EvalPlanQual. I eventually found where it > skips removal of restrictions for target relations, but that's drastically > undercommented. > > * Likewise, there's inadequate documentation of why it doesn't break > bitmap scans, which also have to be able to recheck correctly. > > * I'm not sure that we have any regression test cases covering the > above two points, but we should. > > * The comments leave a lot to be desired in general, eg there are > repeated failures to update comments only a few lines away from a change. Kyotaro, I may look into fixing those issues early next week, but that's fairly close to the freeze. Also, I'll have to look into parts that I'm not very familiar with (like the EvalPlanQual), so feel free to address those issues ;-) regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, At Wed, 30 Mar 2016 15:40:24 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <8ad11fae-1cb7-2255-d80c-d1daafb5362f@2ndquadrant.com> > FWIW the break was restored in the v9 by me. Yeah, I know it. Sorry for the misleading comment. > > FWIW, as mentioned upthread, I added the following condition to > > decline ripping index predicates from base restrictinfo without > > understanding the reason to do so. > > Ummmm, I'm a bit confused. Which condition? I sent the messages incomplete. The "condition" mentioned above is the following. | rel->relid == root->parse->resultRelation regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Thank you for the comment. The new version is attached. Some issues has not been addressed but the rest will be addresses in the next version. At Thu, 31 Mar 2016 08:42:50 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <90d885f2-e5ce-6668-226f-c817154e4f73@2ndquadrant.com> > On 03/31/2016 01:36 AM, Tom Lane wrote: > > Kevin Grittner <kgrittn@gmail.com> writes: > >> I'm taking my name off as committer and marking it "Ready for > >> Committer". If someone else wants to comment on the issues where > >> Tom and Kyotaro-san still seem unsatisfied to the point where I > >> can get my head around it, I could maybe take it back on as > >> committer -- if anyone feels that could be a net win. > > > > I'll pick it up. In a quick scan I see some things I don't like, but > > nothing insoluble: > > > > * Having plancat.c initialize the per-IndexOptInfo restriction lists > > * seems > > fairly bizarre. I realize that Tomas or Kyotaro probably did that in > > response to my complaint about having check_partial_indexes have > > side-effects on non-partial indexes, but this isn't an improvement. I felt the same thing for it. It is for discussion. It is the earliest point where we can initialize baserestrictinfo of each IndexOptInfo. And the original location is just after and is the latest point. Reverted. > > For one thing, it means we will produce an incorrect plan if > > more restriction clauses are added to the rel after plancat.c > > runs, as the comment for check_partial_indexes contemplates. Mmm. Thank you. I overlooked that. The following code seems saying related thing. > if (index->predOK) > continue; /* don't repeat work if already proven OK */ So, index restrinctioninfo should be calculated even if index->predOK is already true. But refactroring suggested by the following comment will fix this. > > (I *think* that comment may be obsolete, but I'm not sure.) > > I think a better answer to that complaint is to rename > > check_partial_indexes to something else, and more importantly > > adjust its header comment to reflect its expanded > > responsibilities --- as the patch I was commenting on at the > > time failed to do. Ok, I removed the index baserestrictinfo (a bit long to repeat..) creation out of check_partial_indexes and named rebuild_index_restrictinfo. (calling for better names..) This loosens the restriction on the timing to trim an index restrictinfo. It is now moved to create_index_paths. > > * It took me some time to convince myself that the patch doesn't break > > generation of plans for EvalPlanQual. I eventually found where it > > skips removal of restrictions for target relations, but that's > > drastically undercommented. It is originally in create_indexscan_plan and had no documentation about EPQ. But I also want such documentation there so I added it on in rebuild_index_restrictinfo. > > * Likewise, there's inadequate documentation of why it doesn't break > > bitmap scans, which also have to be able to recheck correctly. This trims clauses that implied by index predicate, which donesn't need recheck even if it is a bitmap scan, I believe. Is it wrong? The original comment on check_index_only said that, > * XXX this is overly conservative for partial indexes, since we will > * consider attributes involved in the index predicate as required even > * though the predicate won't need to be checked at runtime. (The same is > * true for attributes used only in index quals, if we are certain that > * the index is not lossy.) However, it would be quite expensive to > * determine that accurately at this point, so for now we take the easy > * way out. This seems to me that such clauses are safely ignored. But not for attributes used only in index quals, because of possible lossy charcter of the index. It seems quite reasonable. This optimization won't be hold if this comment is wrong. > > * I'm not sure that we have any regression test cases covering the > > above two points, but we should. I agree. Will try to add in the next version, maybe posted tomorrow. > > * The comments leave a lot to be desired in general, eg there are > > repeated failures to update comments only a few lines away from a > > change. I'll recheck that and fix in the next version. > Kyotaro, > > I may look into fixing those issues early next week, but that's fairly > close to the freeze. Also, I'll have to look into parts that I'm not > very familiar with (like the EvalPlanQual), so feel free to address > those issues ;-) Aye sir! regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > Thank you for the comment. The new version is attached. Committed with rather heavy editorialization and a batch of regression test cases. regards, tom lane
At Thu, 31 Mar 2016 14:51:02 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <19589.1459450262@sss.pgh.pa.us> > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > Thank you for the comment. The new version is attached. > > Committed with rather heavy editorialization and a batch of regression > test cases. > > regards, tom lane Many thanks for the editorialization (or refactoring), and many descriptive comments and testing, then committing. There seems no problem for me of that. The followings are my reviw and thought on that, FWIW. ====== check_index_predicates:+ * At one time it was possible for this to get re-run after adding more+ * restrictions to the rel,thus possibly letting us prove more indexes OK.+ * That doesn't happen any more (at least not in the core code's usage), !+ * but this code still supports it in case extensions want to mess with the !+ * baserestrictinfo list. We assume that adding more restrictions can't make+ * an index not predOK. We must recomputeindrestrictinfo each time, though,+ * to make sure any newly-added restrictions get into it if needed. I didn't imagine that the assumption is for the sake of extensions.. +check_index_predicates(PlannerInfo *root, RelOptInfo *rel) ... + index->indrestrictinfo = rel->baserestrictinfo; Ah. This has been retuened here. + is_target_rel = (rel->relid == root->parse->resultRelation || + get_plan_rowmark(root->rowMarks, rel->relid) != NULL); This is very descriptive even for me. - * We can also discard quals that are implied by a partial index's - * predicate, but only in a plain SELECT; when scanning a target relation - * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the - * plan so that they'll be properly rechecked by EvalPlanQual testing. - * Uggg. I'm sorry that I couldn't find this commnet just above what I removed. + * way because predicate conditions need to be rechecked if the scan + * becomes lossy, so they have to be included in bitmapqualorig. Of couse 'lossy' means 'may contain unqualified data', my brain must have been in powersaving mode. regards, -- Kyotaro Horiguchi NTT Open Source Software Center