Thread: int8 primary keys still not using index without manual JDBC driver patch (7.4RC1)
int8 primary keys still not using index without manual JDBC driver patch (7.4RC1)
From
"Craig O'Shannessy"
Date:
Hi all, Just thought I'd mention that I really think this problem needs to be fixed. I I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser bug, and it's the third time I've had to do this. I would think this bug causes quite a lot of people to evaluate postgres and decide it has awful primary key performance! I love postgres, and hate to think that this could be happening. template1=# explain select * from lineitem where lineitemid=26845437; QUERY PLAN -------------------------------------------------------------- Seq Scan on lineitem (cost=0.00..82685.91 rows=1 width=103) Filter: (lineitemid = 26845437) (2 rows) template1=# explain select * from lineitem where lineitemid=26845437::int8; QUERY PLAN -------------------------------------------------------------------------------- Index Scan using lineitem_pkey on lineitem (cost=0.00..3.53 rows=1 width=103) Index Cond: (lineitemid = 26845437::bigint) (2 rows) I've noticed this is in the TODO : Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8, float4, numeric/decimal too [optimizer]) Too hard to fix before 7.4 final? Regards, Craig
Re: int8 primary keys still not using index without manual JDBC driver patch (7.4RC1)
From
Martijn van Oosterhout
Date:
Any particular reason you can't just put the value in quotes and let postgres determine the type? On Fri, Nov 07, 2003 at 10:43:05AM +1100, Craig O'Shannessy wrote: > Hi all, > > Just thought I'd mention that I really think this problem needs to be > fixed. I > > I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser > bug, and it's the third time I've had to do this. I would think this bug > causes quite a lot of people to evaluate postgres and decide it has awful > primary key performance! I love postgres, and hate to think that this > could be happening. > > template1=# explain select * from lineitem where lineitemid=26845437; > QUERY PLAN > -------------------------------------------------------------- > Seq Scan on lineitem (cost=0.00..82685.91 rows=1 width=103) > Filter: (lineitemid = 26845437) > (2 rows) > > template1=# explain select * from lineitem where lineitemid=26845437::int8; > QUERY PLAN > -------------------------------------------------------------------------------- > Index Scan using lineitem_pkey on lineitem (cost=0.00..3.53 rows=1 width=103) > Index Cond: (lineitemid = 26845437::bigint) > (2 rows) > > I've noticed this is in the TODO : > Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8, > float4, numeric/decimal too [optimizer]) > > Too hard to fix before 7.4 final? > > Regards, > > Craig > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "All that is needed for the forces of evil to triumph is for enough good > men to do nothing." - Edmond Burke > "The penalty good people pay for not being interested in politics is to be > governed by people worse than themselves." - Plato
Attachment
I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence), so the SQL is generated. I would think this is a common usage of PostgreSQL, as a database for a modern EJB container. There are options for fixing this (not including fixing postgres itself), IMHO the best is patching the JDBC PreparedStatement code so that setLong() adds '::int8'. The advantage here is that you can use hand coded prepared statements, as well as auto CMP ones, and both will get the proper cast. The real problem is that PostgreSQL out of the box is not really usable for CMP! This really isn't good, and I'm always suprised that it's not fixed. It was very luck we found the bug on the website when we were evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix, and it causes truly horrible performance problems. I spose you'd call it my pet peeve. Craig On Fri, 7 Nov 2003, Martijn van Oosterhout wrote: > Any particular reason you can't just put the value in quotes and let > postgres determine the type? > > On Fri, Nov 07, 2003 at 10:43:05AM +1100, Craig O'Shannessy wrote: > > Hi all, > > > > Just thought I'd mention that I really think this problem needs to be > > fixed. I > > > > I'm patching the 7.4RC1 JDBC drivers as we speak due to this optimiser > > bug, and it's the third time I've had to do this. I would think this bug > > causes quite a lot of people to evaluate postgres and decide it has awful > > primary key performance! I love postgres, and hate to think that this > > could be happening. > > > > template1=# explain select * from lineitem where lineitemid=26845437; > > QUERY PLAN > > -------------------------------------------------------------- > > Seq Scan on lineitem (cost=0.00..82685.91 rows=1 width=103) > > Filter: (lineitemid = 26845437) > > (2 rows) > > > > template1=# explain select * from lineitem where lineitemid=26845437::int8; > > QUERY PLAN > > -------------------------------------------------------------------------------- > > Index Scan using lineitem_pkey on lineitem (cost=0.00..3.53 rows=1 width=103) > > Index Cond: (lineitemid = 26845437::bigint) > > (2 rows) > > > > I've noticed this is in the TODO : > > Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8, > > float4, numeric/decimal too [optimizer]) > > > > Too hard to fix before 7.4 final? > > > > Regards, > > > > Craig > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 6: Have you searched our list archives? > > > > http://archives.postgresql.org > >
Hi Craig, Craig O'Shannessy schrieb: > I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence), > so the SQL is generated. I would think this is a common usage of > PostgreSQL, as a database for a modern EJB container. There are options > for fixing this (not including fixing postgres itself), IMHO the best is > patching the JDBC PreparedStatement code so that setLong() adds '::int8'. > The advantage here is that you can use hand coded prepared statements, as > well as auto CMP ones, and both will get the proper cast. > > The real problem is that PostgreSQL out of the box is not really usable > for CMP! This really isn't good, and I'm always suprised that it's not > fixed. It was very luck we found the bug on the website when we were > evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix, > and it causes truly horrible performance problems. > > I spose you'd call it my pet peeve. > I agree with you wholeheartly - it also bothers me why postgresql can cast [0-9]+ to int4, but only '[0-9]+' to int8 or int2, I really cannot see the difference. Any ideas where we have to look for the place to patch? Regards Tino
Tino Wildenhain wrote: > Hi Craig, > > Craig O'Shannessy schrieb: > > I'm using EJB CMP (Enterprise Java Beans, Container Managed Persistence), > > so the SQL is generated. I would think this is a common usage of > > PostgreSQL, as a database for a modern EJB container. There are options > > for fixing this (not including fixing postgres itself), IMHO the best is > > patching the JDBC PreparedStatement code so that setLong() adds '::int8'. > > The advantage here is that you can use hand coded prepared statements, as > > well as auto CMP ones, and both will get the proper cast. > > > > The real problem is that PostgreSQL out of the box is not really usable > > for CMP! This really isn't good, and I'm always suprised that it's not > > fixed. It was very luck we found the bug on the website when we were > > evaluating PostgreSQL against Oracle, it wasn't easy to track down or fix, > > and it causes truly horrible performance problems. > > > > I spose you'd call it my pet peeve. > > > I agree with you wholeheartly - it also bothers me why > postgresql can cast [0-9]+ to int4, but only > '[0-9]+' to int8 or int2, I really cannot see the > difference. OK, the issue for int8, I think, is that when the lexer/scanner is looking for tokens, it doesn't know how the token is going to be used, so it doesn't know if [0-9]+ should be considered an int4 or int8 so it makes it an int4. When it is a string, there is no type-casting done in the lexer because it is just a string. What I don't understand is why we can't just upgrade an int4 to int8 if the value is greater than an int4, and why we can't just convert it inside to int8 as needed. I am sure there is a reason, but I can't remember it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tino Wildenhain <tino@wildenhain.de> writes: > I agree with you wholeheartly - it also bothers me why > postgresql can cast [0-9]+ to int4, but only > '[0-9]+' to int8 or int2, I really cannot see the > difference. > Any ideas where we have to look for the place to patch? Try reading the thousands of lines of discussion of this problem that exist in the last several years of the pgsql-hackers archives. Basically, we have found no solution that doesn't have side-effects worse than what it fixes. Here's one recent example of a possible solution that crashed and burned on takeoff: http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php regards, tom lane
Tom Lane wrote: > Tino Wildenhain <tino@wildenhain.de> writes: > > I agree with you wholeheartly - it also bothers me why > > postgresql can cast [0-9]+ to int4, but only > > '[0-9]+' to int8 or int2, I really cannot see the > > difference. > > > Any ideas where we have to look for the place to patch? > > Try reading the thousands of lines of discussion of this problem that > exist in the last several years of the pgsql-hackers archives. > Basically, we have found no solution that doesn't have side-effects > worse than what it fixes. Here's one recent example of a possible > solution that crashed and burned on takeoff: > http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php I read that URL and it covered the main stuff. This part was particularly interesting: I am wondering about adding some notion of "conversion distance" associated with casts, and preferring choices that require a smaller conversion distance; perhaps this could replace the concept of "preferred type", too. But again I don't have a specific proposal to make. Any thoughts? The test case that was actually in the regression tests was select to_hex(256*256*256 - 1) AS "ffffff"; ! ERROR: Function to_hex(smallint) does not exist ! Unable to identify a function that satisfies the given argument types ! You may need to add explicit typecasts Even had the parser resolved the overloaded to_hex call, this test would have failed, because int2 multiplication overflows: regression=# select 256::int2*256::int2*256::int2; ?column? ---------- 0 (1 row) I am thinking that it might be good to eliminate all the basic arithmetic operators on int2, so that you get int4 as the minimum width for arithmetic. But this cannot work unless we have some concept like conversion distance, or the parser will just fail to make a choice between int4, int8, etc alternatives. I think your idea of conversion distance is required for any working solution --- without it, things seem impossible --- you have an int4 value and have function for int2 and int8 --- you need to lean in a specific direction and can't just give up. We can try removing most int2 functions and see if that makes such conversions much easier, or try casting incoming constants to int2 to see what happens. I see float4/float8/numeric as similar, though that has precision issues that I am not sure how to address --- when can you tell if the user wants full precision? Sorry I haven't focused on this issue but I am ready to do so if I can be of help. How can we set up some tests of these ideas? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > We can try removing most int2 functions and see if that makes such > conversions much easier, or try casting incoming constants to int2 to > see what happens. I already did that --- that was exactly the substance of the tests I was reporting in that message. I have been thinking lately that the fundamental approach is wrong anyway. Basically the idea was to make the world safe for single-datatype index handling by removing all the cross-type comparison operators. The reason SELECT ... WHERE int8col = 42 isn't indexable is that the = operator is int84eq, which is not to be found in the set of operators associated with an index on int8. What we were thinking was that if we didn't have int84eq then the parser would be forced to promote the 42 to int8, and then the comparison using int8eq would be recognized as indexable. I think this might actually be workable for int8, but it's not going to work for int2 without changing the initial typing of small integer constants, and we already know that that opens a Pandora's box of other problems. But quite aside from the semantic difficulties of rejiggering all that stuff, it's going to break other parts of the optimizer if we do it. In particular it will interfere with handling of mergejoins and recognizing transitive equality. For example consider SELECT ... WHERE a.int8col = b.int4col AND b.int4col = 42; Currently we are able to deduce a.int8col = 42 (where the operator is int84eq). If we remove int84eq then the output of the parser for this example will look like SELECT ... WHERE a.int8col = b.int4col::int8 AND b.int4col = 42; and the transitive equality will not be recognized because b.int4col::int8 is not the same expression as b.int4col. So I'm currently thinking we'd be better off not to try to eliminate the cross-type comparison operators. Instead we need some solution that is narrowly focused on the problem of making a non-indexable comparison indexable, by converting a comparison value of the wrong datatype into the right datatype locally to the indexscan plan generation code. I posted some speculation about that here: http://archives.postgresql.org/pgsql-hackers/2003-09/msg00983.php regards, tom lane
I said: > So I'm currently thinking we'd be better off not to try to eliminate > the cross-type comparison operators. Instead we need some solution > that is narrowly focused on the problem of making a non-indexable > comparison indexable, by converting a comparison value of the wrong > datatype into the right datatype locally to the indexscan plan > generation code. BTW, plan C would be to attack the problem head-on by allowing index opclasses to include cross-datatype operators. This might be the cleanest solution in the long run, but it seems likely to be a lot of work and could force us to break existing user-defined operator classes. I think everyone has shied away from that without much thought, but in principle at least we could probably do it. (Say, extend pg_amop and pg_amproc so that the datatype of the other operand becomes part of the key.) We are now well outside the charter of pgsql-general, so please redirect any followup discussion to pgsql-hackers ... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > We can try removing most int2 functions and see if that makes such > > conversions much easier, or try casting incoming constants to int2 to > > see what happens. > > I already did that --- that was exactly the substance of the tests I was > reporting in that message. > > I have been thinking lately that the fundamental approach is wrong > anyway. Basically the idea was to make the world safe for > single-datatype index handling by removing all the cross-type comparison > operators. The reason > > SELECT ... WHERE int8col = 42 > > isn't indexable is that the = operator is int84eq, which is not to be > found in the set of operators associated with an index on int8. What > we were thinking was that if we didn't have int84eq then the parser > would be forced to promote the 42 to int8, and then the comparison using > int8eq would be recognized as indexable. Could we not always promote int4 to int8 for indexing purposes? I realize OID has issues, though, as you mention in that URL: This algorithm was wrong on both practical and theoretical levels; in the first place it's not very helpful to only be able to handle binary-compatible transformations, and in the second place there isn't any good guarantee that it's not changing the semantics when it replaces the operator. For instance int4 < and oid < do not act the same. Depending on equality of operator names was a bad idea even then, and would be quite unworkable now in the world of schema search paths. Could we just promote int4 constants to int8 always? I just checked and 2^32-1 is already promoted to int8: select 4294967295; so the funny thing is that: SELECT ... WHERE int8col = 4294967296; already uses the int8 index. I think the complex case you mentioned was oid. Let's look at the various possible constant comparisons against an oid column: -1 This constant would match no oid, so we could just allow the sequential scan. I don't think anyone would expect good behavior for such a comparison. 1 This could be promoted to oid cleanly. 2^31+1 This will come in a int8, so we can just downcast to oid automatically. I know your case was "<" comparison. It would be: SELECT ... WHERE oidcol < -1; SELECT ... WHERE oidcol < 1; SELECT ... WHERE oidcol < 2147483649; 2^31+1 These all seem to work, I think. -1 will not use an index, which is OK. I am concerned about having to add catalog maintenance for every index case, which seems it could be a lot. Here is my logic. I am having trouble getting the big picture on this: int2 fits in the int4 range int4 fits in the int8 range oid fits only in the int8 range, some oids fit in int4 This means a valid oid could come in as int4 or int8. I realize this requires hard-coded comparisons to C include defines to get the maxium for each type. I know this breaks our type-neutral style, but in this case, it seems it might be the cleanest way --- abstracting this out into a table seems too hard. Now for int2-based indexes. Can't we just downcast constants to int2 if they fit in the int2 valid range? > I think this might actually be workable for int8, but it's not going to > work for int2 without changing the initial typing of small integer > constants, and we already know that that opens a Pandora's box of other > problems. > > But quite aside from the semantic difficulties of rejiggering all that > stuff, it's going to break other parts of the optimizer if we do it. > In particular it will interfere with handling of mergejoins and > recognizing transitive equality. For example consider > > SELECT ... WHERE a.int8col = b.int4col AND b.int4col = 42; > > Currently we are able to deduce a.int8col = 42 (where the operator > is int84eq). If we remove int84eq then the output of the parser for > this example will look like > > SELECT ... WHERE a.int8col = b.int4col::int8 AND b.int4col = 42; > > and the transitive equality will not be recognized because > b.int4col::int8 is not the same expression as b.int4col. Seems we should keep those cross-type comparisons around for col op col comparisons, at least, as well as internal optimizer use as you described. > So I'm currently thinking we'd be better off not to try to eliminate > the cross-type comparison operators. Instead we need some solution > that is narrowly focused on the problem of making a non-indexable > comparison indexable, by converting a comparison value of the wrong > datatype into the right datatype locally to the indexscan plan > generation code. I posted some speculation about that here: > http://archives.postgresql.org/pgsql-hackers/2003-09/msg00983.php Agreed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Here is my logic. I am having trouble getting the big picture on this: Th big picture is that it doesn't work very well to assume that indexes only need to handle same-datatype comparisons. I think we are ultimately going to have to address that issue more-or-less directly. At least for btree indexes, it doesn't seem that the index mechanics would have too big a problem with this --- as long as you are using the proper comparison function, who cares whether the righthand side of the comparison is the same datatype as the left? It might be harder for other index types, but 99.9% of the problem is with btrees anyway. I think if we implemented it only for btrees we'd still have a usable solution. I'm currently digging around to see how much of the rest of the backend really cares about it ... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Here is my logic. I am having trouble getting the big picture on this: > > Th big picture is that it doesn't work very well to assume that indexes > only need to handle same-datatype comparisons. I think we are > ultimately going to have to address that issue more-or-less directly. > > At least for btree indexes, it doesn't seem that the index mechanics > would have too big a problem with this --- as long as you are using the > proper comparison function, who cares whether the righthand side of the > comparison is the same datatype as the left? It might be harder for > other index types, but 99.9% of the problem is with btrees anyway. > I think if we implemented it only for btrees we'd still have a usable > solution. > > I'm currently digging around to see how much of the rest of the backend > really cares about it ... Wouldn't that logically lead to an "abstract" operator class to be pointed to in the original indexes operator class? In the concrete example, int8, a btree index is using int8_ops for opcamid 403. This operator class is specifically set up with 5 operators designed for int8 to int8 comparision. Assume we create a pseudo operator class that only tells the operator names "<, <=, =, >= and >", but does not already resolve them to the operators and thus the comparision functions and allow int8_ops to be substituted with this pseudo operator class. In the case the planner sees a type mismatch in the comparision, and the index operator class points to our pseudo opclass, then it could try "on the fly" to construct the operator class if it finds cross type operators/functions for all 5 needed operators. Not that I know much about the planner, more asking. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <JanWieck@Yahoo.com> writes: > Tom Lane wrote: >> Th big picture is that it doesn't work very well to assume that indexes >> only need to handle same-datatype comparisons. I think we are >> ultimately going to have to address that issue more-or-less directly. > Wouldn't that logically lead to an "abstract" operator class to be > pointed to in the original indexes operator class? I've just posted a proposal in pgsql-hackers that attacks the problem a little differently: put the cross-type operators directly into the opclass. regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: >> Tom Lane wrote: >>> Th big picture is that it doesn't work very well to assume that indexes >>> only need to handle same-datatype comparisons. I think we are >>> ultimately going to have to address that issue more-or-less directly. > >> Wouldn't that logically lead to an "abstract" operator class to be >> pointed to in the original indexes operator class? > > I've just posted a proposal in pgsql-hackers that attacks the problem > a little differently: put the cross-type operators directly into the > opclass. I like that approach even better than mine. It needs less work during the actual planning. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <JanWieck@Yahoo.com> writes: > I like that approach even better than mine. It needs less work during > the actual planning. Right. I don't think I will need to touch the planner at all, except for recording the operator strategy numbers in indexscan plan nodes. regards, tom lane