Thread: SELECT queries which violate table constraints
Hi,

I investigated some SELECT query performance issues and noticed that PostgreSQL misses some obvious cases while processing a SELECT query. I mean the case where the WHERE clause contains a condition that contradicts the table structure. (Excuse my language, see the code.)

Example: let the table be:

    CREATE TABLE test
    (
      id numeric(3,0) NOT NULL,
      somecolumn numeric(5,0) NOT NULL,
      CONSTRAINT id_pk PRIMARY KEY (id)
    );

A simple table whose "somecolumn" column has a NOT NULL constraint.

Let's run the following query against the table:

    SELECT somecolumn FROM test WHERE somecolumn IS NULL;

The result is an empty result set, which is obvious, because any null value would violate the table constraint. The thing here is that PostgreSQL does a SeqScan on this table in order to find out whether there are any null values.

Explain:

    Seq Scan on test  (cost=0.00..1.06 rows=1 width=5)
      Filter: (somecolumn IS NULL)
    Planning time: 0.778 ms

The SeqScan can be avoided by creating an index on "somecolumn" that indexes only the null values. That index would be empty and very fast, but also quite pointless, since the table constraint here is simple. No one would write such a query by hand, but programmatically generated queries do this kind of thing. The only workaround I found was to create those empty indexes, but I think the query optimizer could be smarter here.

I took a look at the optimizer code and didn't find anything that handles this kind of situation. (I would expect detecting it to be the optimizer's task.)

I was thinking of a feature where the optimizer could attach a hint for the executor if some query plan path provably leads to an empty result set. If the executor sees this hint, it could skip the seqscan, and in fact even index scans. Comparing the query's conditions against the table constraints should in any case be cheaper than a seqscan.
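(For the record, the empty-index workaround mentioned above can be sketched as a partial index; index and table names here are just the ones from the example.)

    -- Partial index covering only the rows where somecolumn IS NULL.
    -- Because of the NOT NULL constraint this index stays empty, so a
    -- scan on it returns immediately instead of walking the heap.
    CREATE INDEX test_somecolumn_null_idx
        ON test (somecolumn)
        WHERE somecolumn IS NULL;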
The question is: is there any reason why such an optimization phase could not be implemented? Another question is: how does the query engine handle the partitioned-table case? Am I right that table partitions are resolved via table constraints, and that indexes are used to determine which child table to look at? If so, could this kind of new optimization phase also benefit partitioned tables?

Kind regards,
Joni Martikainen
Joni Martikainen <joni@shade-fx.com> writes:
> I investigated some select query performance issues and noticed that
> postgresql misses some obvious cases while processing SELECT query. I
> mean the case where WHERE clause contains statement which condition
> would be against table structure. (excuse my language, look the code)

Your example does what you want if you set constraint_exclusion to ON:

    regression=# explain SELECT somecolumn FROM test WHERE somecolumn IS NULL;
                            QUERY PLAN
    ------------------------------------------------------
     Seq Scan on test  (cost=0.00..25.10 rows=8 width=12)
       Filter: (somecolumn IS NULL)
     Planning time: 0.055 ms
    (3 rows)

    regression=# set constraint_exclusion = on;
    SET
    regression=# explain SELECT somecolumn FROM test WHERE somecolumn IS NULL;
                     QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=0)
       One-Time Filter: false
     Planning time: 0.065 ms
    (3 rows)

There may be other cases where the planner could be smarter, but in this particular case it intentionally doesn't check for this sort of situation by default, because (as you say) the case only happens with badly-written queries, and (as the above output demonstrates) we take rather a big hit in planning time to make those checks.

			regards, tom lane
On 05/10/2014 09:24 PM, Joni Martikainen wrote:
> Hi,
>
> I investigated some select query performance issues and noticed that
> postgresql misses some obvious cases while processing SELECT query. I
> mean the case where WHERE clause contains statement which condition
> would be against table structure. (excuse my language, look the code)
>
> Example:
> Let the table be :
>
> CREATE TABLE test
> (
>   id numeric(3,0) NOT NULL,
>   somecolumn numeric(5,0) NOT NULL,
>   CONSTRAINT id_pk PRIMARY KEY (id)
> );
>
> Simple table with "somecolumn" column which has constraint NOT NULL.
>
> Let's do a following query to the table.
>
> SELECT somecolumn FROM test WHERE somecolumn IS NULL;
>
> Result is empty result set which is obvious because any null value would
> be against the table constrain.
> The thing here is that postgresql does SeqScan to this table in order to
> find out if there is any null values.
>
> Explain:
> "Seq Scan on test  (cost=0.00..1.06 rows=1 width=5)"
> "  Filter: (somecolumn IS NULL)"
> "Planning time: 0.778 ms"
>
> SeqScan can be avoided by making index for "somecolumn" and indexing all
> the null values. That index would be empty and very fast but also very
> pointless since table constraint here is simple.
> No one would do such a query in real life but some programmatically
> generated queries does this kind of things. Only way I found to go
> around this problem was to create those empty indexies but I think the
> query optimizer could be smarter here.
>
> I took a look of the optimizer code and I didn't find any code which
> avoids this kind of situations. (I expect that it would be optimizer's
> task to find out this kind of things)
>
> I was thinking some feature for optimizer where the optimizer could add
> a hint for an executor if some query plan path leads to the empty result
> set case. If executor sees this hint it could avoid doing seqscan and
> actually even index scans. This kind of query constraint vs. table
> constraint comparison should be anyway cheaper process to execute than
> seqscan.
>
> The question is that, is there any reason why such an optimization phase
> could not be implemented? Another question is that how is the query
> engine handling the partitioned table case? Am i right that table
> partitions are solved by table constrains and indexies are used to
> validate which child table to look for? And so forth could this kind of
> new optimization phase benefit partitioned tables?

Actually, the planner can perform that optimization. The trick is called "constraint exclusion". It is typically used for partitioning, where the WHERE-clause restricts the query to a single partition, and you would otherwise have to scan all the partitions.

It is not usually a very useful optimization, and it is somewhat expensive to check for that case, so it is disabled by default except for partitioned tables. But if you do "set constraint_exclusion=on", you will get the plan you're looking for:

    postgres=# set constraint_exclusion=on;
    SET
    postgres=# explain SELECT somecolumn FROM test WHERE somecolumn IS NULL;
                     QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=0)
       One-Time Filter: false
     Planning time: 0.071 ms
    (3 rows)

- Heikki
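(To illustrate the partitioning case: a minimal sketch of the inheritance-based partitioning current as of this thread, with hypothetical table names. Each child carries a CHECK constraint, and constraint exclusion compares the WHERE clause against those constraints to skip children that cannot match.)

    -- Parent plus two children, distinguished by CHECK constraints.
    CREATE TABLE measurements (logdate date NOT NULL, value int);
    CREATE TABLE measurements_2013
        (CHECK (logdate < DATE '2014-01-01')) INHERITS (measurements);
    CREATE TABLE measurements_2014
        (CHECK (logdate >= DATE '2014-01-01')) INHERITS (measurements);

    -- With constraint_exclusion = partition (the default), the planner
    -- proves the WHERE clause contradicts measurements_2013's CHECK
    -- constraint, so that child is excluded from the plan entirely.
    EXPLAIN SELECT * FROM measurements
     WHERE logdate >= DATE '2014-06-01';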