Thread: CHECK constraints and optimizations
Greetings! Just trying some tests out, and wanted to know about some optimizations. If I do a CHECK constraint on a table, is this used to optimize a SELECT or does Postgresql rely mostly on normal index search? For example, I want to create some tables to manage different data in a kind of <object, relationship, object2> manner, but where object2 could be an IP address, text, a number, etc. So I thought of doing the following: ---------- create table tmp ( luid bigserial, object_luid bigint, relationship ltree ); create table tmp1 ( child_luid bigint, check (relationship <@ 'Object') ) inherits (tmp); create table tmp2 ( ip inet, check (relationship <@ 'IP') ) inherits (tmp); insert into tmp1 (object_luid, relationship, child_luid) values (1, 'Object', 2); insert into tmp2 (object_luid, relationship, ip) values (1, 'IP.Packet.Source', '10.1.1.2'); insert into tmp2 (object_luid, relationship, ip) values (2, 'IP.Packet.Source', '10.11.0.1'); create view tmp_view as select luid, object_luid, relationship, child_luid, null as ip from tmp1 union select luid, object_luid, relationship, null, ip from tmp2 ; explain analyze select * from tmp_view where object_luid = 2; explain analyze select * from tmp_view where relationship <@ 'IP.Packet'; explain analyze select * from ( select luid, object_luid, relationship, child_luid, null as ip from tmp1 union all select luid, object_luid, relationship, null, ip from tmp2 ) as foo where relationship <@ 'IP.Packet'; ; ----------------------------------------- When I do the above analyzes, the table for <tmp1> is still scanned, even though the WHERE clause cannot meet the CHECK clause. Now, this is a fairly edge case for optimizations, so I just wanted to check that this indeed will not be utilized. Or will it be only after I have lots of rows in the table, thereby justifying the check? Is there ever a time where CONSTRAINTS will be used to optimize a SELECT? Alternatively, is there another way of accomplishing what I want without the ugly VIEW (as each new table type I add will necessitate rebuilding the VIEW with a new column)? Regards! Ed
Edmund Dengler <edmundd@eSentire.com> writes: > Just trying some tests out, and wanted to know about some optimizations. > If I do a CHECK constraint on a table, is this used to optimize a SELECT It is not. regards, tom lane
On Wed, 5 May 2004, Edmund Dengler wrote: > Greetings! > > Just trying some tests out, and wanted to know about some optimizations. > If I do a CHECK constraint on a table, is this used to optimize a SELECT > or does Postgresql rely mostly on normal index search? I think the only kind of constraint that incidentally improves performance is a unique constraint, which creates a unique index. A check constraint is run on a record when it is changed to make sure it still meets the requirements of the constraint. There is no seperate file that says "this row meets the constraint". Deferred constraints mean the check is to be done at the commit time of the transaction. Note that unique constraints are not necessarily deferrable due to issues caused by using an immediate acting unique index. I don't think this is easily fixable either. So, a check constraint is of no use during a read from the table, and is a performance penalty when writing to it.
On Wed, 05 May 2004 21:09:25 -0400, Tom Lane wrote: >> Just trying some tests out, and wanted to know about some optimizations. >> If I do a CHECK constraint on a table, is this used to optimize a SELECT > > It is not. I one were to try to add some constraint-based optimizations ("semantic query optimizations"), what parts of the code would be most relevant to study? In particular, I'm interested in the case of join eliminations, based on foreign key constraints. E.g. having a SUPPLIER(s_id,...) and a SUPPLIER_PART(s_id,p_id) table where SUPPLIER_PART.s_id references SUPPLIER.s_id. Then, a "SELECT p_id FROM SUPPLIER_PART NATURAL JOIN SUPPLIER" could skip the join and just look in SUPPLIER_PART. Another thing: Oracle and PostgreSQL uses IOs to respond to SELECT * FROM person WHERE age < 30 AND age > 30. DB2 and MySQL sees that the result is the empty set, without wasting IOs. - So here's another place for potential optimizations, although the area is rather hairy, as soon as one moves beyond the most simple cases. By the way, in "An Introduction to Database Systems", Date writes about semantic optimizations: "... such optimization could provide significant performance improvements - much greater improvements, very likely, than are obtained by any of today's more traditional optimization techniques". -- Greetings from Troels Arvin, Copenhagen, Denmark
On Thu, 6 May 2004, Troels Arvin wrote: > On Wed, 05 May 2004 21:09:25 -0400, Tom Lane wrote: > > >> Just trying some tests out, and wanted to know about some optimizations. > >> If I do a CHECK constraint on a table, is this used to optimize a SELECT > > > > It is not. > > I one were to try to add some constraint-based optimizations ("semantic > query optimizations"), what parts of the code would be most relevant to > study? I'll leave the answer to such a question to someone who knows the internals of pgsql a bit better than me. > Oracle and PostgreSQL uses IOs to respond to > SELECT * FROM person WHERE age < 30 AND age > 30. > DB2 and MySQL sees that the result is the empty set, without wasting IOs. > - So here's another place for potential optimizations, although the area > is rather hairy, as soon as one moves beyond the most simple cases. The postgresql team considers the load on their plates to be great enough without bothering to optimize highly non-optimal, poorly thought out queries. I.e. if you're asking for things like in that where clause, no one's gonna optimize that. There's already too much to do around here without focusing on that. Now, if someone gets an itch and wants to try and code it in their spare time...
On Thu, May 06, 2004 at 09:29:42AM -0600, scott.marlowe wrote: > A check constraint is run on a record when it is changed to make sure it > still meets the requirements of the constraint. There is no seperate file > that says "this row meets the constraint". Deferred constraints mean the > check is to be done at the commit time of the transaction. > > Note that unique constraints are not necessarily deferrable due to issues > caused by using an immediate acting unique index. I don't think this is > easily fixable either. > > So, a check constraint is of no use during a read from the table, and > is a performance penalty when writing to it. I have been thinking though, imagine a table with the constraint: x < 1000 If I have a query that has WHERE x > 2000, can't that be optimised to WHERE FALSE? Or WHERE x < 1200 optimised to x < 1000? Obviously not if the constraint is deferred, but otherwise? The other person is correct in that (x < 1000 and x > 2000) is not optimised away by postgresql. Odd, because the capability is there as very similar tests are use by partial indexes and the index code in general. If that worked, the system could just add the (simple) CHECK constraints to the WHERE clause of a query, do the optimisation phrase and then remove any that remain. I can't see why this wouldn't work. Any thoughts? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > I can't see why this wouldn't work. Doubtless you could do it. The problem with the idea is that those inference tests are pretty expensive. I think that any such thing would waste significant numbers of cycles on ordinary queries while only being a win on a few poorly-written queries. We do have to make a tradeoff between planning time and execution time, and I fear that this idea is not going to be a win in those terms. If you feel like experimenting, though, go for it ... regards, tom lane
On Thu, May 06, 2004 at 09:02:21PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I can't see why this wouldn't work. > > Doubtless you could do it. The problem with the idea is that those > inference tests are pretty expensive. I think that any such thing would > waste significant numbers of cycles on ordinary queries while only being > a win on a few poorly-written queries. Is it really that expensive? From the index code I remember playing with way back when I was fiddling with the partial index stuff, there is a table where it takes: X OPa Val1 X OPb Val2 and it has a lookup table on (OPa,OPb) to provide an OPc that can be applied to (Val1,Val2) to determine if one implies the other. I was very impressed actually, quite a neat idea. Quite simple I thought. I wasn't really considering anything more complicated than this. No subclauses, only ANDs. > We do have to make a tradeoff between planning time and execution time, > and I fear that this idea is not going to be a win in those terms. > If you feel like experimenting, though, go for it ... Ofcourse, my ulterior motive is that I want table partitioning based on values within the tuple. And then have queries avoid scanning tables that query things that are not in those tables as inferred by parts of the WHERE clause. Think phone calls with a different subtable for each year, automatically. I toyed with creating a script that would generate the RULEs necessary to implement it in the current system, but splitting a table into four peices would require around 50+ RULEs (4 subtables x 4 conditions x 3 query types), obviously massively more inefficient that what's being suggested here. The solution is to build it right into the storage manager, but I haven't tried that yet. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, May 06, 2004 at 09:02:21PM -0400, Tom Lane wrote: >> Doubtless you could do it. The problem with the idea is that those >> inference tests are pretty expensive. > Is it really that expensive? I'm afraid it would be. You're correct that the basic test in the pred_test routine is simple enough, but there are a few issues: 1. You gotta root through all the index opclasses to see if you can find one involving the operators at hand. (I'm pretty sure this is the most expensive part of pred_test_simple_clause() as it stands.) 2. Using this to detect redundant/contradictory clauses implies comparing every WHERE clause to every other WHERE clause, hence O(N^2) effort. 3. You'll be paying the price even on pretty simple queries. I usually figure that extra planner effort is worthwhile if it only fires on complex queries, or queries where there's particular reason to think a win might be had. (Eg, 7.4 goes out of its way when it sees an IN-subselect clause, but I don't think anyone has a problem with that.) But with this idea, I don't see any way to avoid expending a lot of effort on queries where no win will actually result. BTW, there actually is code in the btree index stuff to detect contradictory index quals, so "x > 10 AND x < 10" will in fact result in no I/O if the chosen plan is an indexscan on x. (This path doesn't have the same problems mentioned above, because by the time control gets there, we've already determined that the operators are indeed in the same index opclass and that the same variable is involved.) So that's another hole in the scope of usefulness of a planning-time test. I was a tad surprised by the assertion up at the top of this thread that MySQL has a test for this case. From what I know of their design philosophy, they'd have even less interest than us in optimizing badly-written queries at the cost of slowing down the normal path. Am I right to guess that what they actually have is a short-circuit case similar to ours for contradictory index quals, and not a blanket check for contradictory WHERE conditions in general? regards, tom lane