Re: LIKE, leading percent, bind parameters and indexes - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: LIKE, leading percent, bind parameters and indexes |
Date | |
Msg-id | 23216.1148741825@sss.pgh.pa.us Whole thread Raw |
In response to | Re: LIKE, leading percent, bind parameters and indexes (Martijn van Oosterhout <kleptog@svana.org>) |
Responses |
Re: LIKE, leading percent, bind parameters and indexes
|
List | pgsql-hackers |
Martijn van Oosterhout <kleptog@svana.org> writes: > On Fri, May 26, 2006 at 11:38:41AM -0500, Jim C. Nasby wrote: >> Also, might a bitmap scan be a win for the %string case? Presumably it's >> much faster to find matching rows via an index and then go back into the >> heap for them; unless you're matching a heck of a lot of rows. > This is an interesting thought. Currently, AFAICS, the bitmap-scan code > only considers operators that are indexable, just like for narmal index > scans. However, in this case the query could scan the entire index, > apply the LIKE to each one and produce a bitmap of possible matches. > Then do a bitmap scan over the table to check the results. > Not just LIKE could use this, but any function marked STABLE. You'd > have to weigh up the cost of scanning the *entire* index (because we > don't have any actual restriction clauses) against avoiding a full > table scan. I've been thinking about this. The general case is that you could have some "auxiliary conditions" (arbitrary nonvolatile expressions using only columns present in the index) as well as the regular index qualification conditions (possibly zero of these). AFAICS it wouldn't matter if the indexscan is bitmap or regular. It seems fairly doable, and would have some nice side effects --- for example, the ancient bugaboo that "foo IS NULL" isn't an indexable operator would be partially assuaged. But there are a couple of gotchas: * It doesn't work for indexes that store "compressed" keys instead of the original column value; which lets out GiST and GIN, at least with some opclasses. I'd be inclined to just implement it for btree and maybe hash, rather than bothering with checking opclasses. (I don't think we have any official way for a GiST/GIN opclass to show whether it does any key compression, anyhow.) * Up to now, the only functions directly invoked by an index AM were members of index opclasses; and since opclasses can only be defined by superusers, there was at least some basis for trusting the functions to behave sanely. But if an index AM is going to invoke arbitrary user-defined expressions then more care is needed. What's particularly bothering me is the notion of executing arbitrary functions while holding a buffer lock on an index page. If the arbitrary functions go off and scan other tables (or even the same table) I think it wouldn't be too hard to get into deadlock situations, especially across multiple backends. And deadlocks on LWLocks are really nasty: there's no deadlock detection, no timeout, and no way out short of SIGQUIT/SIGKILL. That would make it a security hole, even if the conditions needed to trigger it are so bizarre they'd never arise in normal usage. Given that btree now works page-at-a-time in all cases, we could imagine fixing this by postponing checks of auxiliary conditions until after we release the buffer lock. This would require making copies of all index tuples that pass the regular index quals as we scan the page (needing at most BLCKSZ workspace), releasing the buffer lock, and then applying the auxiliary conditions to filter out tuples we don't want to return. The extra data-copying is annoying but probably still beats a trip to the heap. It might be best to just copy the whole page out of shared buffers and into a local page, release the lock immediately, and then go on with checking both indexquals and auxiliary conditions in one pass. This would be more data-copying but the improvement in locality of access to the shared buffer might repay that. I don't recall the locking rules for hash in any detail, but probably something similar would work for hash, assuming anyone even wants to bother with it. Comments? regards, tom lane
pgsql-hackers by date: