Thread: Bitmap index scans use of filters on available columns
create table f as select (random()*100)::int as x, md5(random()::text) as y from generate_series(1,1000000); create index on f (x, y); analyze verbose f; --dont vacuum explain select * from f where x=5 and y like '%abc%'; QUERY PLAN ------------------------------------------------------------------------------Bitmap Heap Scan on f (cost=382.67..9314.72rows=1 width=37) Recheck Cond: (x = 5) Filter: (y ~~ '%abc%'::text) -> Bitmap Index Scan on f_x_y_idx (cost=0.00..382.67 rows=10433 width=0) Index Cond: (x = 5) Is there a fundamental reason the filter on y is not being applied to the index scan, rather than the heap scan? Should this be a to-do item? This could avoid a lot of heap block reads, and also might prevent the bitmap from overflowing work_mem and turning lossy. Cheers, Jeff
While prefix expression y like 'abc%' can be converted to y >= 'abc' (see expand_indexqual_opclause()), I'm not sure any kind of expansion is possible for '%abc%' which would result in a b-tree searchable condition. Jeff Janes <jeff.janes@gmail.com> wrote: > Is there a fundamental reason the filter on y is not being applied to > the index scan, rather than the heap scan? -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
2015-11-04 Antonin Houska <ah@cybertec.at>: > While prefix expression > > y like 'abc%' > > can be converted to > > y >= 'abc' > > (see expand_indexqual_opclause()), I'm not sure any kind of expansion is > possible for '%abc%' which would result in a b-tree searchable condition. I think the question is not about using the b-tree for checking the condition, but about just retrieving the value for y from the index, and just using that to check the condition before fetching the corresponding tuple from the heap. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Nicolas Barbier <nicolas.barbier@gmail.com> writes: > 2015-11-04 Antonin Houska <ah@cybertec.at>: >> (see expand_indexqual_opclause()), I'm not sure any kind of expansion is >> possible for '%abc%' which would result in a b-tree searchable condition. > I think the question is not about using the b-tree for checking the > condition, but about just retrieving the value for y from the index, > and just using that to check the condition before fetching the > corresponding tuple from the heap. Bitmap index scans only return TID bitmaps, not index tuples; indeed the underlying index may not store anything recognizable as tuples. regards, tom lane
On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
postgres=# explain select * from f where x=5 and y like '%abc%';
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37)
Index Cond: (x = 5)
Filter: (y ~~ '%abc%'::text)
--
Nicolas Barbier <nicolas.barbier@gmail.com> writes:
> 2015-11-04 Antonin Houska <ah@cybertec.at>:
>> (see expand_indexqual_opclause()), I'm not sure any kind of expansion is
>> possible for '%abc%' which would result in a b-tree searchable condition.
> I think the question is not about using the b-tree for checking the
> condition, but about just retrieving the value for y from the index,
> and just using that to check the condition before fetching the
> corresponding tuple from the heap.
Bitmap index scans only return TID bitmaps, not index tuples; indeed
the underlying index may not store anything recognizable as tuples.
Agreed, though the OP's question was asking why a Filter condition can't be added to a BitmapIndexScan, so that non-qualifying rows can be excluded from the TID bitmap it generates.
We generate this plan
QUERY PLAN
--------------------------------------------------------------------------
Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37)
Index Cond: (x = 5)
Filter: (y ~~ '%abc%'::text)
So it should be possible to do the Filter condition on the BitmapIndexScan.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes: > On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote: > We generate this plan > Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37) > Index Cond: (x = 5) > Filter: (y ~~ '%abc%'::text) > So it should be possible to do the Filter condition on the BitmapIndexScan. You're missing my point: that is possible in an indexscan, but *not* in a bitmap indexscan, because the index AM APIs are totally different in the two cases. In a bitmap scan, nothing more than a TID bitmap is ever returned out to anyplace that could execute arbitrary expressions. In the case at hand, the planner should have considered a plan of this shape as well. Presumably it concluded it was more expensive than using the bitmap approach. Jeff might try "set enable_bitmapscan = 0" and compare the estimated and actual costs. regards, tom lane
On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
explain select * from f where x=5 and y like '%abc%';
QUERY PLAN
------------------------------------------------------------------------------
Bitmap Heap Scan on f (cost=382.67..9314.72 rows=1 width=37)
-> Bitmap Index Scan on f_x_y_idx (cost=0.00..382.67 rows=10433 width=0)
--
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We generate this plan
> Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37)
> Index Cond: (x = 5)
> Filter: (y ~~ '%abc%'::text)
> So it should be possible to do the Filter condition on the BitmapIndexScan.
You're missing my point: that is possible in an indexscan, but *not* in a
bitmap indexscan, because the index AM APIs are totally different in the
two cases. In a bitmap scan, nothing more than a TID bitmap is ever
returned out to anyplace that could execute arbitrary expressions.
Still misunderstanding each other... sorry about that
If a btree can Filter y like that on an IndexScan, then it can also apply that Filter on y when it is looking through rows before it adds them to the bitmap.
I completely understand that it cannot return the value (of y) via the bitmap.
We should be able to get a plan like this
QUERY PLAN
------------------------------------------------------------------------------
Bitmap Heap Scan on f (cost=382.67..9314.72 rows=1 width=37)
-> Bitmap Index Scan on f_x_y_idx (cost=0.00..382.67 rows=10433 width=0)
Index Cond: (x = 5)
>> Filter: (y ~~ '%abc%'::text)
if the bitmap stays non-lossy.
I see that plannodes.h says
"In a BitmapIndexScan plan node, the targetlist and qual fields are not used and are always NIL. "but it doesn't say why.
Why can't the BitmapIndexScan execute arbitrary expressions? (or What did you mean by that phrase?)
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes: > On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You're missing my point: that is possible in an indexscan, but *not* in a >> bitmap indexscan, because the index AM APIs are totally different in the >> two cases. In a bitmap scan, nothing more than a TID bitmap is ever >> returned out to anyplace that could execute arbitrary expressions. > Still misunderstanding each other... sorry about that > If a btree can Filter y like that on an IndexScan, then it can also apply > that Filter on y when it is looking through rows before it adds them to the > bitmap. btree applies no such filter in either case. "Filter" conditions are applied outside the index AM --- and yes, I will resist any proposal to relocate that responsibility. regards, tom lane
On 4 November 2015 at 16:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You're missing my point: that is possible in an indexscan, but *not* in a
>> bitmap indexscan, because the index AM APIs are totally different in the
>> two cases. In a bitmap scan, nothing more than a TID bitmap is ever
>> returned out to anyplace that could execute arbitrary expressions.
> Still misunderstanding each other... sorry about that
> If a btree can Filter y like that on an IndexScan, then it can also apply
> that Filter on y when it is looking through rows before it adds them to the
> bitmap.
btree applies no such filter in either case. "Filter" conditions are
applied outside the index AM --- and yes, I will resist any proposal to
relocate that responsibility.
I don't think anyone has argued that point, yet, we just don't have enough info to agree yet.
As Jeff said in his OP, "Is there a fundamental reason the filter on y is not being applied to
the index scan, rather than the heap scan?", which still stands.
Why would you resist? And/Or Why is that important?
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 4, 2015 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndquadrant.com> writes: >> On 4 November 2015 at 16:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> You're missing my point: that is possible in an indexscan, but *not* in a >>> bitmap indexscan, because the index AM APIs are totally different in the >>> two cases. In a bitmap scan, nothing more than a TID bitmap is ever >>> returned out to anyplace that could execute arbitrary expressions. > >> Still misunderstanding each other... sorry about that > >> If a btree can Filter y like that on an IndexScan, then it can also apply >> that Filter on y when it is looking through rows before it adds them to the >> bitmap. > > btree applies no such filter in either case. "Filter" conditions are > applied outside the index AM --- and yes, I will resist any proposal to > relocate that responsibility. I don't understand what you're arguing against here - as Simon I think correctly points out, there seems to be a substantial performance improvement possible here. It's not so much that we would apply the Filter conditions outside the index AM as that we would have two kinds of Index Quals that could be enforced by the index AM. Some quals (such as x = 5) could be enforced by scanning only the relevant portion of the index, while others (such as y like '%abc%') don't abridge the portion of the index that needs to be scanned, but could disqualify some index tuples before we go to the trouble of hitting the heap. The problem I see is this might involve testing quals against an index tuple when the corresponding heap tuple isn't visible. This probably isn't safe except for leakproof quals, and there might be some other problems as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndquadrant.com> writes: >> On 4 November 2015 at 15:54, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We generate this plan >> Index Scan using f_x_y_idx on f (cost=0.42..26075.71 rows=209 width=37) >> Index Cond: (x = 5) >> Filter: (y ~~ '%abc%'::text) > >> So it should be possible to do the Filter condition on the BitmapIndexScan. > > You're missing my point: that is possible in an indexscan, but *not* in a > bitmap indexscan, because the index AM APIs are totally different in the > two cases. In a bitmap scan, nothing more than a TID bitmap is ever > returned out to anyplace that could execute arbitrary expressions. I had thought it must already be able to execute arbitrary expressions, due to the ability to already support user-defined btree ops (and ops of non-btree types in the case of other index types). Does that only require a restricted execution environment? > In the case at hand, the planner should have considered a plan of this > shape as well. Presumably it concluded it was more expensive than using > the bitmap approach. Jeff might try "set enable_bitmapscan = 0" and > compare the estimated and actual costs. It is a simplified test case, I can get it to do about anything I want by tweaking the size, selectivity, columns selected, and cache warmth. My understanding is that the normal (not index-only) index scan, which is thought to be slower and actually is slower with a hot cache, also applies the filter on the heap tuple, not the index tuple. But since it only matters much for large queries and large queries tend to prefer bitmaps, I was less interested in that case. And bitmaps can feed into BitmapAnd, which is very helpful in avoiding index-creep where every query gets a new index tailored for it. I think we are missing a trick here that could apply to lots of situations. Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You're missing my point: that is possible in an indexscan, but *not* in a >> bitmap indexscan, because the index AM APIs are totally different in the >> two cases. In a bitmap scan, nothing more than a TID bitmap is ever >> returned out to anyplace that could execute arbitrary expressions. > I had thought it must already be able to execute arbitrary > expressions, due to the ability to already support user-defined btree > ops (and ops of non-btree types in the case of other index types). No. An index AM is only expected to be able to evaluate clauses of the form <indexed_column> <indexable_operator> <constant>, and the key restriction there is that the operator is one that the AM has volunteered to support. Well, actually, it's the opclass more than the AM that determines this, but anyway it's not just some random operator; more than likely, the AM and/or opclass has got special logic about the operator. This also ties into Robert's point about evaluation of operators against index entries for dead or invisible rows. Indexable operators are much less likely than others to have unexpected side-effects. regards, tom lane
Hi, On 11/04/2015 11:32 PM, Tom Lane wrote: > Jeff Janes <jeff.janes@gmail.com> writes: >> On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> You're missing my point: that is possible in an indexscan, but >>> *not* in a bitmap indexscan, because the index AM APIs are >>> totally different in the two cases. In a bitmap scan, nothing >>> more than a TID bitmap is ever returned out to anyplace that >>> could execute arbitrary expressions. > >> I had thought it must already be able to execute arbitrary >> expressions, due to the ability to already support user-defined >> btree ops (and ops of non-btree types in the case of other index >> types). > > No. An index AM is only expected to be able to evaluate clauses of > the form <indexed_column> <indexable_operator> <constant>, and the > key restriction there is that the operator is one that the AM has > volunteered to support. Well, actually, it's the opclass more than > the AM that determines this, but anyway it's not just some random > operator; more than likely, the AM and/or opclass has got special > logic about the operator. Isn't that pretty much exactly the point made by Jeff and Simon, that index AM is currently only allowed to handle the indexable operators, i.e. operators that it can explicitly optimize (e.g. use to walk the btree and such), and completely ignores the other operators despite having all the columns in the index. Which means we'll have to do the heap fetch, which usually means a significant performance hit. > > This also ties into Robert's point about evaluation of operators > against index entries for dead or invisible rows. Indexable operators > are much less likely than others to have unexpected side-effects. I certainly understand there are cases that require care - like the leakproof thing pointed out by Robert for example. I don't immediately see why evaluation against dead rows would be a problem. But maybe we can derive a set of rules required from the operators? Say only those marked as leakproof when RLS is enabled on the table, and perhaps additional things. A "bruteforce" way would be to extend each index AM with every possible operator, but that's not quite manageable I guess. But why couldn't we provide a generic infrastructure that would allow filtering "safe" expressions and validating them on an index tuple? kind regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 4, 2015 at 9:15 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I certainly understand there are cases that require care - like the > leakproof thing pointed out by Robert for example. I don't immediately see > why evaluation against dead rows would be a problem. Well, one thing is that you might leak information about already-deleted rows, which could be a security vulnerability, or more mundanely cause a function to error out when there are no actually visible rows that could trigger such an error. It would be surprising if you could add a CHECK(b != 0) constraint to a table, query it for rows where a/b>1, and get a division by zero error. But I think it's even worse: I believe there can be dead rows in the heap whose tuple descriptor doesn't even match the current pg_attribute contents. Consider BEGIN; ALTER TABLE ... ADD COLUMN ... int8; INSERT ...; ROLLBACK; ALTER TABLE .. ADD COLUMN .. text; SELECT ... If the SELECT tries to decode one of the tuples added by the failed transaction considering the new column as text when the dead tuples in question had it as an int8, I suspect that you can crash the server. Nothing good will happen, anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 11/05/2015 06:51 PM, Robert Haas wrote: > On Wed, Nov 4, 2015 at 9:15 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> I certainly understand there are cases that require care - like the >> leakproof thing pointed out by Robert for example. I don't immediately see >> why evaluation against dead rows would be a problem. > > Well, one thing is that you might leak information about > already-deleted rows, which could be a security vulnerability, or more > mundanely cause a function to error out when there are no actually > visible rows that could trigger such an error. It would be surprising > if you could add a CHECK(b != 0) constraint to a table, query it for > rows where a/b>1, and get a division by zero error. Yes, I guess we don't have this problem when evaluating the expression on heap because we get to check visibility first, and after moving the expression to the index we can't do that. But then again, can we come up with a way to distinguish operators that are safe to evaluate on indexes - either automatic or manual? We already do that with the indexable operators with explicitly listing them in the opclass, and we also explicitly use the LEAKPROOF for a similar thing. I don't think extending the opclass is a really good solution, but why not to invent another *PROOF flag? > But I think it's even worse: I believe there can be dead rows in the > heap whose tuple descriptor doesn't even match the current > pg_attribute contents. Consider BEGIN; ALTER TABLE ... ADD COLUMN ... > int8; INSERT ...; ROLLBACK; ALTER TABLE .. ADD COLUMN .. text; SELECT > ... If the SELECT tries to decode one of the tuples added by the > failed transaction considering the new column as text when the dead > tuples in question had it as an int8, I suspect that you can crash the > server. Nothing good will happen, anyway. But that is about heap, and we're discussing indexes here, right? I don't think you can break the index descriptor like this, because otherwise we'd already have problems with that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 5, 2015 at 1:29 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> Well, one thing is that you might leak information about >> already-deleted rows, which could be a security vulnerability, or more >> mundanely cause a function to error out when there are no actually >> visible rows that could trigger such an error. It would be surprising >> if you could add a CHECK(b != 0) constraint to a table, query it for >> rows where a/b>1, and get a division by zero error. > > Yes, I guess we don't have this problem when evaluating the expression on > heap because we get to check visibility first, and after moving the > expression to the index we can't do that. Right. > But then again, can we come up with a way to distinguish operators that are > safe to evaluate on indexes - either automatic or manual? We already do that > with the indexable operators with explicitly listing them in the opclass, > and we also explicitly use the LEAKPROOF for a similar thing. I don't think > extending the opclass is a really good solution, but why not to invent > another *PROOF flag? I think LEAKPROOF is probably fine for this. How would the new thing be different? > But that is about heap, and we're discussing indexes here, right? I don't > think you can break the index descriptor like this, because otherwise we'd > already have problems with that. Oh, right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 5, 2015 at 7:45 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
On 11/04/2015 11:32 PM, Tom Lane wrote:Jeff Janes <jeff.janes@gmail.com> writes:On Wed, Nov 4, 2015 at 7:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:You're missing my point: that is possible in an indexscan, but
*not* in a bitmap indexscan, because the index AM APIs are
totally different in the two cases. In a bitmap scan, nothing
more than a TID bitmap is ever returned out to anyplace that
could execute arbitrary expressions.I had thought it must already be able to execute arbitrary
expressions, due to the ability to already support user-defined
btree ops (and ops of non-btree types in the case of other index
types).
No. An index AM is only expected to be able to evaluate clauses of
the form <indexed_column> <indexable_operator> <constant>, and the
key restriction there is that the operator is one that the AM has
volunteered to support. Well, actually, it's the opclass more than
the AM that determines this, but anyway it's not just some random
operator; more than likely, the AM and/or opclass has got special
logic about the operator.
Isn't that pretty much exactly the point made by Jeff and Simon, that index AM is currently only allowed to handle the indexable operators, i.e. operators that it can explicitly optimize (e.g. use to walk the btree and such), and completely ignores the other operators despite having all the columns in the index. Which means we'll have to do the heap fetch, which usually means a significant performance hit.
This also ties into Robert's point about evaluation of operators
against index entries for dead or invisible rows. Indexable operators
are much less likely than others to have unexpected side-effects.
I certainly understand there are cases that require care - like the leakproof thing pointed out by Robert for example. I don't immediately see why evaluation against dead rows would be a problem.
Apart from other problems discussed, I think it could also lead to
a performance penality for the cases when the qual condition is
costly as evaluating such a qual against lot of dead rows would be a
time consuming operation. I am not sure, but I think some of the
other well know databases might not have any such problems as
they store visibility info inside index due to which they don't need to
fetch the heap tuple for evaluating such quals.
Hello, At Fri, 6 Nov 2015 09:49:30 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1L_5T-UYsQeMOrX54g3QeXGhhAk5YFmzZqu5MidzxzkRg@mail.gmail.com> > Apart from other problems discussed, I think it could also lead to > a performance penality for the cases when the qual condition is > costly as evaluating such a qual against lot of dead rows would be a > time consuming operation. I am not sure, but I think some of the > other well know databases might not have any such problems as > they store visibility info inside index due to which they don't need to > fetch the heap tuple for evaluating such quals. I was junst thinking of the same thing. Can we estimate the degree of the expected penalty using heap statistics? Of couse not so accurate though. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On 11/05/2015 07:36 PM, Robert Haas wrote: > On Thu, Nov 5, 2015 at 1:29 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > >> But then again, can we come up with a way to distinguish operators that are >> safe to evaluate on indexes - either automatic or manual? We already do that >> with the indexable operators with explicitly listing them in the opclass, >> and we also explicitly use the LEAKPROOF for a similar thing. I don't think >> extending the opclass is a really good solution, but why not to invent >> another *PROOF flag? > > I think LEAKPROOF is probably fine for this. How would the new thing > be different? I don't think so - AFAIK "leakproof" is about not revealing information about arguments, nothing more and nothing less. It does not say whether it's safe to evaluate on indexes, and I don't think we should extend the meaning in this direction. I find it perfectly plausible that there will be leakproof functions that can't be pushed to indexes, and that would not be possible with a single marker. Of course, "leakproof" may be a minimum requirement, but another marker is needed to actually enable pushing the function to index. Of course, we may eventually realize that leakproof really is sufficient, and that we can push all leakproof functions to indexes. In that case we may deprecate the other marker and just use leakproof. But if we go just with leakproof and later find that we really need two markers because not all leakproof functions are not index-pushable, it'll be much harder to fix because it will cause performance regressions for the users (some of the expressions won't be pushed to indexes anymore). But I think the marker is the last thing we need to worry about. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> I think LEAKPROOF is probably fine for this. How would the new thing >> be different? > > I don't think so - AFAIK "leakproof" is about not revealing information > about arguments, nothing more and nothing less. It does not say whether it's > safe to evaluate on indexes, and I don't think we should extend the meaning > in this direction. That seems like a non-answer answer. Clearly, if a function can leak information about it's arguments, for example by throwing an error, we can't call it on tuples that might not even be visible, or the behavior of the query might be change. So any function that is not leakproof is also not safe for this purpose. Now that doesn't rule out the possibility that the functions for which this optimization is safe are a subset of the leakproof functions - but offhand I don't see why that should be the case. The index tuple is just a tuple, and the values it contains are just datums, no more or less than in a heap tuple. There could be a reason for concern here, but saying that it might not be "safe to evaluate on indexes" just begs the question: WHY wouldn't that be safe? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 11/07/2015 02:18 AM, Robert Haas wrote: > On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >>> I think LEAKPROOF is probably fine for this. How would the new thing >>> be different? >> >> I don't think so - AFAIK "leakproof" is about not revealing information >> about arguments, nothing more and nothing less. It does not say whether it's >> safe to evaluate on indexes, and I don't think we should extend the meaning >> in this direction. > > That seems like a non-answer answer. I'm not claiming I have an answer, really. My knowledge of leakproof stuff is a bit shallow. Also, I had a few beers earlier today, which does not really improve the depth of knowledge on any topic except politics and football (aka soccer). So you may be perfectly right. > Clearly, if a function can leak information about it's arguments, > for example by throwing an error, we can't call it on tuples that > might not even be visible, or the behavior of the query might be > change. So any function that is not leakproof is also not safe for > this purpose. > > Now that doesn't rule out the possibility that the functions for > which this optimization is safe are a subset of the leakproof > functions - but offhand I don't see why that should be the case. The > index tuple is just a tuple, and the values it contains are just > datums, no more or less than in a heap tuple. There could be a reason > for concern here, but saying that it might not be "safe to evaluate > on indexes" just begs the question: WHY wouldn't that be safe? Ah. For some reason I thought the "division by zero" example is one of the non-safe cases, but now I see int4div is not marked as leakproof, so we couldn't push it to index anyway. I've however also noticed that all the 'like' procedures are marked as not leak proof, which is a bit unfortunate because that's the example from Jeff's e-mail that started this thread. regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 6, 2015 at 10:28 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello,
>
> At Fri, 6 Nov 2015 09:49:30 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1L_5T-UYsQeMOrX54g3QeXGhhAk5YFmzZqu5MidzxzkRg@mail.gmail.com>
> > Apart from other problems discussed, I think it could also lead to
> > a performance penality for the cases when the qual condition is
> > costly as evaluating such a qual against lot of dead rows would be a
> > time consuming operation. I am not sure, but I think some of the
> > other well know databases might not have any such problems as
> > they store visibility info inside index due to which they don't need to
> > fetch the heap tuple for evaluating such quals.
>
> I was junst thinking of the same thing. Can we estimate the
> degree of the expected penalty using heap statistics? Of couse
> not so accurate though.
>
>
> Hello,
>
> At Fri, 6 Nov 2015 09:49:30 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1L_5T-UYsQeMOrX54g3QeXGhhAk5YFmzZqu5MidzxzkRg@mail.gmail.com>
> > Apart from other problems discussed, I think it could also lead to
> > a performance penality for the cases when the qual condition is
> > costly as evaluating such a qual against lot of dead rows would be a
> > time consuming operation. I am not sure, but I think some of the
> > other well know databases might not have any such problems as
> > they store visibility info inside index due to which they don't need to
> > fetch the heap tuple for evaluating such quals.
>
> I was junst thinking of the same thing. Can we estimate the
> degree of the expected penalty using heap statistics? Of couse
> not so accurate though.
>
I think so. Information related to number of tuples inserted, updated,
hot-updated, deleted and so on is maintained and is shown by
pg_stat_all_tables. By the way, whats your point here, do you mean to
say that we can estimate approximate penality and then decide whether
quals should be evaluated at index level?
On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Hi, > > On 11/07/2015 02:18 AM, Robert Haas wrote: >> >> On Fri, Nov 6, 2015 at 7:11 PM, Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >>>> >>>> I think LEAKPROOF is probably fine for this. How would the new thing >>>> be different? >>> >>> >>> I don't think so - AFAIK "leakproof" is about not revealing information >>> about arguments, nothing more and nothing less. It does not say whether >>> it's >>> safe to evaluate on indexes, and I don't think we should extend the >>> meaning >>> in this direction. >> >> >> That seems like a non-answer answer. > > > I'm not claiming I have an answer, really. My knowledge of leakproof stuff > is a bit shallow. Also, I had a few beers earlier today, which does not > really improve the depth of knowledge on any topic except politics and > football (aka soccer). So you may be perfectly right. > >> Clearly, if a function can leak information about it's arguments, >> for example by throwing an error, we can't call it on tuples that >> might not even be visible, or the behavior of the query might be >> change. So any function that is not leakproof is also not safe for >> this purpose. >> >> Now that doesn't rule out the possibility that the functions for >> which this optimization is safe are a subset of the leakproof >> functions - but offhand I don't see why that should be the case. The >> index tuple is just a tuple, and the values it contains are just >> datums, no more or less than in a heap tuple. There could be a reason >> for concern here, but saying that it might not be "safe to evaluate >> on indexes" just begs the question: WHY wouldn't that be safe? > > > Ah. For some reason I thought the "division by zero" example is one of the > non-safe cases, but now I see int4div is not marked as leakproof, so we > couldn't push it to index anyway. > > I've however also noticed that all the 'like' procedures are marked as not > leak proof, which is a bit unfortunate because that's the example from > Jeff's e-mail that started this thread. Is there a reason they aren't leak proof? I don't see that they have side effects, or that they can throw errors. What do they leak? Anyway, I just picked that for convenience. One of the original pieces that lead me on this quest had a range inclusion rather than a LIKE. I just thought that changing it to a LIKE made the data generator simpler and easier to reason about, without changing the fundamental principle. Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> I've however also noticed that all the 'like' procedures are marked as not >> leak proof, which is a bit unfortunate because that's the example from >> Jeff's e-mail that started this thread. > Is there a reason they aren't leak proof? I don't see that they have > side effects, or that they can throw errors. What do they leak? Huh? regression=# select 'z' like '\'; ERROR: LIKE pattern must not end with escape character regards, tom lane
On Sun, Nov 8, 2015 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Janes <jeff.janes@gmail.com> writes: >> On Fri, Nov 6, 2015 at 7:15 PM, Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >>> I've however also noticed that all the 'like' procedures are marked as not >>> leak proof, which is a bit unfortunate because that's the example from >>> Jeff's e-mail that started this thread. > >> Is there a reason they aren't leak proof? I don't see that they have >> side effects, or that they can throw errors. What do they leak? > > Huh? > > regression=# select 'z' like '\'; > ERROR: LIKE pattern must not end with escape character Ah, I was only thinking of the pattern as a constant. And there is no way to declare a function leakproof on one input but not another. Cheers, Jeff