Thread: Planning without reason.
Hi, I think there is a bug/misscalculation of some rare query i am using. Suppose we only query one specific relation R. R contains indices but not on all attributes or not on all ordered subset of keys. Query example: (SELECT * FROM R WHERE a=3, b=6,. ...) UNION (SELECT * FROM R WHERE b=5, d=2,. ...) UNION .... And lots of unions. When doing explain analyze i see that some nodes in the plan uses an index and some uses a sequential scan (where the WHERE clause made it impossible to use an index). As you can see, having even one sequential scan should nullify the whole plan to using only one node of sequential scan. Currently, the planner does not seem to understand that. I circumvent the problem by doing a recursive tree search of the plan and checking if there is a node of sequential scan (and redefine the query as a sequential scan) but obviously it is prefferable that the planner will do this. P.s.: I am currently just writing the query as a string and open a cursor. Is there a simple way to use Datums instead of converting the attributes to strings to create a plan for SPI. 10x. -- Regards, Tzahi. -- Tzahi Fadida Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info WARNING TO SPAMMERS: see at http://members.lycos.co.uk/my2nis/spamwarning.html
On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote: > R contains indices but not on all attributes or not on > all ordered subset of keys. > > Query example: > (SELECT * FROM R > WHERE a=3, b=6,. ...) > UNION > (SELECT * FROM R > WHERE b=5, d=2,. ...) > UNION > .... > And lots of unions. Do you need UNION, or do you actually mean UNION ALL? Also, couldn't you just do: SELECT * FROM R WHERE (a=3, b=6, ...) OR (b=5, d=2, ...) etc > I am currently just writing the query as a string and open a cursor. > Is there a simple way to use Datums instead of converting the attributes to > strings to create a plan for SPI. > 10x. I imagine SPI_prepare() and SPI_execp() would be used for this. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Friday 23 June 2006 16:14, Martijn van Oosterhout wrote: > On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote: > > R contains indices but not on all attributes or not on > > all ordered subset of keys. > > > > Query example: > > (SELECT * FROM R > > WHERE a=3, b=6,. ...) > > UNION > > (SELECT * FROM R > > WHERE b=5, d=2,. ...) > > UNION > > .... > > And lots of unions. > > Do you need UNION, or do you actually mean UNION ALL? I am using UNION ALL, but it doesn't matter i am actually doing: (SELECT * FROM R WHERE a=3, b=6,. ... LIMIT 1) UNION ALL (SELECT * FROM R WHERE b=5, d=2,. ... LIMIT 1) UNION ALL .... with LIMIT 1. My initial reasoning was to avoid extra sorts but i guess that the planner just doesn't get the LIMIT 1. I see now that UNION should be better for the planner to undestand (not performance wise). However, UNION alone, doesn't seem to cut it. Following is an example. t7 has 2 attributes and a non-unique index on one attribute. here is a printout: explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * from t7 where a2=139 LIMIT 1); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------Unique (cost=23.18..23.19 rows=2 width=8) (actual time=0.149..0.165 rows=1 loops=1) -> Sort (cost=23.18..23.18 rows=2 width=8) (actual time=0.142..0.148 rows=2 loops=1) Sort Key: a4, a2 -> Append (cost=0.00..23.17 rows=2 width=8) (actual time=0.052..0.106 rows=2 loops=1) -> Limit (cost=0.00..5.65 rows=1 width=8) (actual time=0.046..0.049 rows=1 loops=1) -> Index Scan using indext7 on t7 (cost=0.00..5.65 rows=1 width=8) (actual time=0.038..0.038 rows=1 loops=1) Index Cond: (a4 = 113) -> Limit (cost=0.00..17.50 rows=1 width=8) (actual time=0.035..0.038 rows=1 loops=1) -> Seq Scan on t7 (cost=0.00..17.50 rows=1 width=8) (actual time=0.029..0.029 rows=1 loops=1) Filter: (a2 = 139)Total runtime: 0.334 ms (11 rows) > > Also, couldn't you just do: > > SELECT * FROM R > WHERE (a=3, b=6, ...) > OR (b=5, d=2, ...) > etc No, a filtering action is not enough since my goal is to only use indices when retrieving single tuples each time thus, if i will use OR i cannot control the number of tuples returned by each Or clause. > > > I am currently just writing the query as a string and open a cursor. > > Is there a simple way to use Datums instead of converting the attributes > > to strings to create a plan for SPI. > > 10x. > > I imagine SPI_prepare() and SPI_execp() would be used for this. I am already using SPI_prepare but it uses a query of the form of a char string, which i need to prepare and is quite long. I.e. if i have 100 tuples i wish to retrieve it can be very wasteful to prepare the string in memory and use SPI_prepare to prepare and later execute it. better to use directly the datums (which i already have deformed from previous operations). > > Have a nice day, -- Regards, ��������Tzahi. -- Tzahi Fadida Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info WARNING TO SPAMMERS: �see at http://members.lycos.co.uk/my2nis/spamwarning.html
On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote: > My initial reasoning was to avoid extra sorts but i guess that the planner > just doesn't get the LIMIT 1. I see now that UNION should be better > for the planner to undestand (not performance wise). > However, UNION alone, doesn't seem to cut it. > Following is an example. t7 has 2 attributes and a non-unique index on one > attribute. here is a printout: > explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * from > t7 where a2=139 LIMIT 1); What are the indexes on? If you only have an index on a4, the latter query has to be an index scan and there's no way to optimise it way. > > Also, couldn't you just do: > > > > SELECT * FROM R > > WHERE (a=3, b=6, ...) > > OR (b=5, d=2, ...) > > etc > > No, a filtering action is not enough since my goal is to only use indices > when retrieving single tuples each time thus, if i will use OR i cannot > control the number of tuples returned by each Or clause. I must admit, this is a really strange way of doing it. For example, if multiple rows match, the tuples eventually returned will be a random selection of the rows that matched. Especially with the "limit 1" there's no way the optimiser could combine the individual scans. If you really need the "LIMIT 1" and you don't have full index coverage then you're quite limited as to how it can be optimised. > > > I am currently just writing the query as a string and open a cursor. > > > Is there a simple way to use Datums instead of converting the attributes > > > to strings to create a plan for SPI. > > > 10x. > > > > I imagine SPI_prepare() and SPI_execp() would be used for this. > > I am already using SPI_prepare but it uses a query of the form of a char > string, which i need to prepare and is quite long. I.e. if i have 100 tuples > i wish to retrieve it can be very wasteful to prepare the string in memory > and use SPI_prepare to prepare and later execute it. > better to use directly the datums (which i already have deformed from > previous operations). I'm confused here too. I thought the datums you're talking about were arguments, thus you could push them straight to SPI_execp(). But you seem to be suggesting parts of the actual query are in datum form also? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Friday 23 June 2006 17:47, Martijn van Oosterhout wrote: > On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote: > > My initial reasoning was to avoid extra sorts but i guess that the > > planner just doesn't get the LIMIT 1. I see now that UNION should be > > better for the planner to undestand (not performance wise). > > However, UNION alone, doesn't seem to cut it. > > Following is an example. t7 has 2 attributes and a non-unique index on > > one attribute. here is a printout: > > explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * > > from t7 where a2=139 LIMIT 1); > > What are the indexes on? If you only have an index on a4, the latter > query has to be an index scan and there's no way to optimise it way. That's my point, it should have only used a sequence scan and not also do an index scan. In other words, it should have consolidated the two nodes of index scan and sequence scan into a single plan node where you only scan sequentially the relation and choose a tuple for each UNION clause. > > > > Also, couldn't you just do: > > > > > > SELECT * FROM R > > > WHERE (a=3, b=6, ...) > > > OR (b=5, d=2, ...) > > > etc > > > > No, a filtering action is not enough since my goal is to only use indices > > when retrieving single tuples each time thus, if i will use OR i cannot > > control the number of tuples returned by each Or clause. > > I must admit, this is a really strange way of doing it. For example, if > multiple rows match, the tuples eventually returned will be a random > selection of the rows that matched. Especially with the "limit 1" > there's no way the optimiser could combine the individual scans. It is a query i use for full disjunction which is a part of the algorithm. I am doing it manually, so i don't see why it can't do it itself. I.e.: Scan sequentially R. for each UNION clause find a matching tuple. the end. > > If you really need the "LIMIT 1" and you don't have full index coverage > then you're quite limited as to how it can be optimised. You misunderstood me, i wish the planner to only use sequence scan in the event where even one node is a sequential scan. > > > > > I am currently just writing the query as a string and open a cursor. > > > > Is there a simple way to use Datums instead of converting the > > > > attributes to strings to create a plan for SPI. > > > > 10x. > > > > > > I imagine SPI_prepare() and SPI_execp() would be used for this. > > > > I am already using SPI_prepare but it uses a query of the form of a char > > string, which i need to prepare and is quite long. I.e. if i have 100 > > tuples i wish to retrieve it can be very wasteful to prepare the string > > in memory and use SPI_prepare to prepare and later execute it. > > better to use directly the datums (which i already have deformed from > > previous operations). > > I'm confused here too. I thought the datums you're talking about were > arguments, thus you could push them straight to SPI_execp(). But you > seem to be suggesting parts of the actual query are in datum form also? Example. i have a tuple T i am searching for. T contains attribute1, attribute2. I have T in a heap_deformtuple(T) manner, i.e., i have T->v and T->n (for nulls). Currently i am doing (loosely): "(SELECT * FROM R where attribute1=" + convertDatumToCharString(T->v[0])+ " AND attribute2=" + convertDatumToCharString(T->v[1]) +" LIMIT 1)" + "UNION" ... as above. I can use prepare without conversions but i still have to construct the long query each time. I can't do prepare just once because the where clauses structures are always changing. Thus, i was wondering if i can also construct the part in the plan where i request to SELECT * FROM R... I.e. not to use strings at all. The structure of the query is the same all the time. I.e. there is the SELECT * FROM R and the WHERE clause with LIMIT 1 nodes with UNION ALL between SELECTS. > > Have a nice day, -- Regards, ��������Tzahi. -- Tzahi Fadida Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info WARNING TO SPAMMERS: �see at http://members.lycos.co.uk/my2nis/spamwarning.html
On Fri, Jun 23, 2006 at 06:10:33PM +0300, Tzahi Fadida wrote: > On Friday 23 June 2006 17:47, Martijn van Oosterhout wrote: > > On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote: > > > My initial reasoning was to avoid extra sorts but i guess that the > > > planner just doesn't get the LIMIT 1. I see now that UNION should be > > > better for the planner to undestand (not performance wise). > > > However, UNION alone, doesn't seem to cut it. > > > Following is an example. t7 has 2 attributes and a non-unique index on > > > one attribute. here is a printout: > > > explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * > > > from t7 where a2=139 LIMIT 1); > > > > What are the indexes on? If you only have an index on a4, the latter > > query has to be an index scan and there's no way to optimise it way. > > That's my point, it should have only used a sequence scan and not also > do an index scan. > In other words, it should have consolidated the two nodes of index scan and > sequence scan into a single plan node where you only scan sequentially the > relation and choose a tuple for each UNION clause. I see what you're saying, but that's not necessarily faster. If you consider the case where the tuple found by the seq scan is near the beginning of the table and the tuple by the index near the end, it could be worse to fold them. If you did only a single seq scan it would have to scan the whole table, whereas now it only has to scan the beginning. Just because it says "Seq Scan" doesn't mean it actually scans the whole table. There isn't a way in postgres to specify the plan you want. LIMIT only works on the output of entire nodes, you can't say "output one tuple matching A, one tuple matching B" in a single node. > Example. i have a tuple T i am searching for. > T contains attribute1, attribute2. I have T in a > heap_deformtuple(T) manner, i.e., i have T->v and T->n (for nulls). > Currently i am doing (loosely): > "(SELECT * FROM R where attribute1=" + convertDatumToCharString(T->v[0])+ > " AND attribute2=" + convertDatumToCharString(T->v[1]) +" LIMIT 1)" > + "UNION" > ... as above. > > I can use prepare without conversions but i still have to construct the long > query each time. I can't do prepare just once because the where clauses > structures are always changing. Thus, i was wondering if i can > also construct the part in the plan where i request to SELECT * FROM R... > I.e. not to use strings at all. The structure of the query is the same all the > time. I.e. there is the SELECT * FROM R and the WHERE clause with LIMIT 1 > nodes with UNION ALL between SELECTS. I see. You could do: (SELECT * FROM R WHERE attribute1=$1 and attribute2=$2 LIMIT 1) ... etc ... That saves the conversions to cstrings, but you're looking for something more abstract than that. I'm not sure that exists. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote: >> (SELECT * FROM R >> WHERE a=3, b=6,. ...) >> UNION >> (SELECT * FROM R >> WHERE b=5, d=2,. ...) >> UNION >> .... >> And lots of unions. > Do you need UNION, or do you actually mean UNION ALL? > Also, couldn't you just do: > SELECT * FROM R > WHERE (a=3, b=6, ...) > OR (b=5, d=2, ...) > etc That seems to be what Tzahi wants the system to do for him. But the OR format is not in general logically equivalent to either UNION or UNION ALL, because UNION would cause duplicate output rows to be suppressed whereas UNION ALL could allow the same table row to be emitted multiple times (if two different WHERE conditions could select the same row). It's conceivable that the planner could prove that neither effect is possible in a particular query and then make the transformation automatically, but I'm not about to expend that kind of planning effort on such an odd case --- checking for it would waste entirely too many cycles in most cases. regards, tom lane
Perhaps it is over the top just for my specific query. Basically, i wish not to do something the system should do because, as i already noticed, when versions changes the database can break your code if you don't keep up. I guess i can make a map of attributes participating in an index of a relation. Also, i would have to take into account the type of index used. For example, a btree should have the capability to do prefix key searches while hash indices probably can't. Then check each target tuple if it can use an index. All this before constructing the query for the planner. However, i already played with this in the past. I reached the conclusion that it is better to let the planner decide what is best since it is too complex for me to say things about cost estimates or index changing capabilities. On Friday 23 June 2006 19:28, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote: > >> (SELECT * FROM R > >> WHERE a=3, b=6,. ...) > >> UNION > >> (SELECT * FROM R > >> WHERE b=5, d=2,. ...) > >> UNION > >> .... > >> And lots of unions. > > > > Do you need UNION, or do you actually mean UNION ALL? > > > > Also, couldn't you just do: > > > > SELECT * FROM R > > WHERE (a=3, b=6, ...) > > OR (b=5, d=2, ...) > > etc > > That seems to be what Tzahi wants the system to do for him. But the OR > format is not in general logically equivalent to either UNION or UNION > ALL, because UNION would cause duplicate output rows to be suppressed > whereas UNION ALL could allow the same table row to be emitted multiple > times (if two different WHERE conditions could select the same row). > > It's conceivable that the planner could prove that neither effect is > possible in a particular query and then make the transformation > automatically, but I'm not about to expend that kind of planning effort > on such an odd case --- checking for it would waste entirely too many > cycles in most cases. > > regards, tom lane -- Regards, ��������Tzahi. -- Tzahi Fadida Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info WARNING TO SPAMMERS: �see at http://members.lycos.co.uk/my2nis/spamwarning.html
Tom Lane <tgl@sss.pgh.pa.us> writes: > It's conceivable that the planner could prove that neither effect is > possible in a particular query and then make the transformation > automatically, but I'm not about to expend that kind of planning effort > on such an odd case --- checking for it would waste entirely too many > cycles in most cases. Fwiw these aren't really very rare cases. Usually it goes the other direction though. I seem to recall Oracle did in fact support a plan where it converted OR expressions into a kind of union plan node. But I think Postgres's bitmap index scan satisfies much of the same need. I think the most useful case where the union plan was beneficial was precisely when you had something like WHERE index_col1=1 OR indexed_col2=2. Going from an UNION plan to a OR plan would be somewhat strange. Programmers don't usually write plans as UNION in place of the more natural OR unless they have a reason to. -- greg
On Fri, Jun 23, 2006 at 08:14:07PM +0300, Tzahi Fadida wrote: > I guess i can make a map of attributes participating in an index > of a relation. > Also, i would have to take into account the type of index used. > For example, a btree should have the capability to do prefix key > searches while hash indices probably can't. > Then check each target tuple if it can use an index. > All this before constructing the query for the planner. At the end of the day it comes down to that Postgres has no way currently to express the plan you want, so we're not talking about small planner or optimiser changes, we're talking about creating a completely new node type which could be used for this purpose. It would be a node that sat on top of a "seq scan" and had a number of conditions. Each tuple would be matched against each condition. Once a condition is matched, that tuple is returned. Once a condition has matched N times it is disabled. Once all conditions are disabled, you're done. Seems terribly special purpose, yet I don't see how you could do it any other way. If it wern't for the LIMIT clauses the whole operation would be trivial. > > It's conceivable that the planner could prove that neither effect is > > possible in a particular query and then make the transformation > > automatically, but I'm not about to expend that kind of planning effort > > on such an odd case --- checking for it would waste entirely too many > > cycles in most cases. I think in the case we have here, the transformation wouldn't apply anyway (otherwise it could be done by hand). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.