index-only quals vs. security_barrier views - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | index-only quals vs. security_barrier views |
Date | |
Msg-id | CA+Tgmob3PQa=FoO=s=Eb4H=vvewPvKnJiqyjFFm7hyb8Q7ajbg@mail.gmail.com Whole thread Raw |
Responses |
Re: index-only quals vs. security_barrier views
Re: index-only quals vs. security_barrier views |
List | pgsql-hackers |
When Heikki worked up his original index-only scan patches (which didn't end up looking much like what eventually got committed), he had the notion of an index-only qual. That is, given a query like this: select sum(1) from foo where substring(a,1,3) = 'abc'; We could evaluate the substring qual before performing the heap fetch, and fetch the tuple from the heap only if the qual passes. The current code is capable of generating an index-only scan plan for this query, but the MVCC visibility check always happens first: if the page is all-visible, we go ahead and evaluate the qual using only the index data, but if the page is not all-visible, we do the heap fetch first and then check the qual only if the tuple is visible to our snapshot. It would be nice to have the ability to do those checks in the other order, in the case where the projected cost of checking the qual is less than the projected cost of the heap fetch. This would allow index-only scans to win in more situations than they can right now, because we'd conceivably avoid quite a bit of random I/O if the qual is fairly selective and there are a decent number of visibility map bits that are unset. In fact, this technique might pay off even if we don't have a covering index: select * from foo where substring(a,1,3) = 'abc'; If the expected selectivity of the qual is low and the index is a lot smaller than the table, we might want to iterate through all the index tuples in the entire index and fetch the heap tuples for only those where the qual passes. This would allow index-only scans - or whatever term you want to use, since there's not much that's index "only" about this case - to potentially win even when no visibility-map bits are set at all. Now, there's a fly in the ointment here, which is that applying arbitrary user-defined functions to tuples that might not be visible doesn't sound very safe. The user-defined function in question might perform some action based on those invisible tuples that has side effects, which would be bad, because now we're violating MVCC semantics. Or it might throw an error, killing the scan dead on the basis of the contents of some tuple that the scan shouldn't even see.However, there is certainly a class of functions forwhich this type of optimization would be safe, and it's an awful lot like the set of functions that can be safely pushed down through a security_barrier view - namely, things that don't have side effects or throw errors. So it's possible that the proleakproof flag KaiGai is proposing to add to pg_proc could do double duty, serving also to identify when it's safe to apply a qual to an index tuple when the corresponding heap tuple might be invisible. However, I have some reservations about assuming that the two concepts are exactly the same. For one thing, people are inevitably going to want to cheat a little bit more here than is appropriate for security views, and encouraging people to mark things LEAKPROOF when they're really not is a security disaster waiting to happen. For another thing, there are some important cases that this doesn't cover, like: select * from foo where substring(a,1,3) like '%def%'; The like operator doesn't seem to be leakproof in the security sense, because it can throw an error if the pattern is something like a single backslash (ERROR: LIKE pattern must not end with escape character) and indeed it doesn't seem like it would be safe here either if the pattern were stored in the table. But if the pattern were constant, it'd be OK, or almost OK: there's still the edge case where the table contains invisible rows but no visible ones - whether or not we complain about the pattern there ought to be the same as whether or not we complain about it on a completely empty table. If we got to that point, then we might as well consider the qual leakproof for security purposes under the same set of circumstances we'd consider it OK to apply to possibly-invisible tuples. I don't have any appetite for trying to do anything more with index-only scans for 9.2, though maybe someone else will think otherwise. But I would like very much to get KaiGai's leakproof stuff committed, and so it seems like a good idea to reconcile the needs of that machinery with what might eventually be needed here. Comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: