Thread: Planning without reason.

Planning without reason.

From
Tzahi Fadida
Date:
Hi,
I think there is a bug/misscalculation of some rare query i am using.

Suppose we only query one specific relation R.

R contains indices but not on all attributes or not on
all ordered subset of keys.

Query example:
(SELECT * FROM R
WHERE a=3, b=6,. ...)
UNION
(SELECT * FROM R
WHERE b=5, d=2,. ...)
UNION
....
And lots of unions.

When doing explain analyze i see that some nodes in the plan uses an index
and some uses a sequential scan (where the WHERE clause made it impossible
to use an index).
As you can see, having even one sequential scan should nullify the whole plan
to using only one node of sequential scan.
Currently, the planner does not seem to understand that.
I circumvent the problem by doing a recursive tree search of the plan
and checking if there is a node of sequential scan (and redefine the query
as a sequential scan) but obviously it is prefferable that the planner will do
this.


P.s.:
I am currently just writing the query as a string and open a cursor.
Is there a simple way to use Datums instead of converting the attributes to
strings to create a plan for SPI.
10x.

--
Regards,
        Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at
http://members.lycos.co.uk/my2nis/spamwarning.html


Re: Planning without reason.

From
Martijn van Oosterhout
Date:
On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
> R contains indices but not on all attributes or not on
> all ordered subset of keys.
>
> Query example:
> (SELECT * FROM R
> WHERE a=3, b=6,. ...)
> UNION
> (SELECT * FROM R
> WHERE b=5, d=2,. ...)
> UNION
> ....
> And lots of unions.

Do you need UNION, or do you actually mean UNION ALL?

Also, couldn't you just do:

SELECT * FROM R
WHERE (a=3, b=6, ...)
OR (b=5, d=2, ...)
etc

> I am currently just writing the query as a string and open a cursor.
> Is there a simple way to use Datums instead of converting the attributes to
> strings to create a plan for SPI.
> 10x.

I imagine SPI_prepare() and SPI_execp() would be used for this.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Planning without reason.

From
Tzahi Fadida
Date:
On Friday 23 June 2006 16:14, Martijn van Oosterhout wrote:
> On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
> > R contains indices but not on all attributes or not on
> > all ordered subset of keys.
> >
> > Query example:
> > (SELECT * FROM R
> > WHERE a=3, b=6,. ...)
> > UNION
> > (SELECT * FROM R
> > WHERE b=5, d=2,. ...)
> > UNION
> > ....
> > And lots of unions.
>
> Do you need UNION, or do you actually mean UNION ALL?

I am using UNION ALL, but it doesn't matter
i am actually doing:
(SELECT * FROM R
WHERE a=3, b=6,. ... LIMIT 1)
UNION ALL
(SELECT * FROM R
WHERE b=5, d=2,. ... LIMIT 1)
UNION ALL
....

with LIMIT 1.
My initial reasoning was to avoid extra sorts but i guess that the planner
just doesn't get the LIMIT 1. I see now that UNION should be better
for the planner to undestand (not performance wise).
However, UNION alone, doesn't seem to cut it.
Following is an example. t7 has 2 attributes and a non-unique index on one
attribute. here is a printout:
explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * from
t7 where a2=139 LIMIT 1);                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------Unique
(cost=23.18..23.19 rows=2 width=8) (actual time=0.149..0.165 rows=1  
loops=1)  ->  Sort  (cost=23.18..23.18 rows=2 width=8) (actual time=0.142..0.148
rows=2 loops=1)        Sort Key: a4, a2        ->  Append  (cost=0.00..23.17 rows=2 width=8) (actual
time=0.052..0.106 rows=2 loops=1)              ->  Limit  (cost=0.00..5.65 rows=1 width=8) (actual
time=0.046..0.049 rows=1 loops=1)                    ->  Index Scan using indext7 on t7  (cost=0.00..5.65
rows=1 width=8) (actual time=0.038..0.038 rows=1 loops=1)                          Index Cond: (a4 = 113)
-> Limit  (cost=0.00..17.50 rows=1 width=8) (actual  
time=0.035..0.038 rows=1 loops=1)                    ->  Seq Scan on t7  (cost=0.00..17.50 rows=1 width=8)
(actual time=0.029..0.029 rows=1 loops=1)                          Filter: (a2 = 139)Total runtime: 0.334 ms
(11 rows)


>
> Also, couldn't you just do:
>
> SELECT * FROM R
> WHERE (a=3, b=6, ...)
> OR (b=5, d=2, ...)
> etc

No, a filtering action is not enough since my goal is to only use indices
when retrieving single tuples each time thus, if i will use OR i cannot
control the number of tuples returned by each Or clause.

>
> > I am currently just writing the query as a string and open a cursor.
> > Is there a simple way to use Datums instead of converting the attributes
> > to strings to create a plan for SPI.
> > 10x.
>
> I imagine SPI_prepare() and SPI_execp() would be used for this.

I am already using SPI_prepare but it uses a query of the form of a char
string, which i need to prepare and is quite long. I.e. if i have 100 tuples
i wish to retrieve it can be very wasteful to prepare the string in memory
and use SPI_prepare to prepare and later execute it.
better to use directly the datums (which i already have deformed from
previous operations).

>
> Have a nice day,

--
Regards,
��������Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS: �see at
http://members.lycos.co.uk/my2nis/spamwarning.html


Re: Planning without reason.

From
Martijn van Oosterhout
Date:
On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote:
> My initial reasoning was to avoid extra sorts but i guess that the planner
> just doesn't get the LIMIT 1. I see now that UNION should be better
> for the planner to undestand (not performance wise).
> However, UNION alone, doesn't seem to cut it.
> Following is an example. t7 has 2 attributes and a non-unique index on one
> attribute. here is a printout:
> explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * from
> t7 where a2=139 LIMIT 1);

What are the indexes on?  If you only have an index on a4, the latter
query has to be an index scan and there's no way to optimise it way.

> > Also, couldn't you just do:
> >
> > SELECT * FROM R
> > WHERE (a=3, b=6, ...)
> > OR (b=5, d=2, ...)
> > etc
>
> No, a filtering action is not enough since my goal is to only use indices
> when retrieving single tuples each time thus, if i will use OR i cannot
> control the number of tuples returned by each Or clause.

I must admit, this is a really strange way of doing it. For example, if
multiple rows match, the tuples eventually returned will be a random
selection of the rows that matched. Especially with the "limit 1"
there's no way the optimiser could combine the individual scans.

If you really need the "LIMIT 1" and you don't have full index coverage
then you're quite limited as to how it can be optimised.

> > > I am currently just writing the query as a string and open a cursor.
> > > Is there a simple way to use Datums instead of converting the attributes
> > > to strings to create a plan for SPI.
> > > 10x.
> >
> > I imagine SPI_prepare() and SPI_execp() would be used for this.
>
> I am already using SPI_prepare but it uses a query of the form of a char
> string, which i need to prepare and is quite long. I.e. if i have 100 tuples
> i wish to retrieve it can be very wasteful to prepare the string in memory
> and use SPI_prepare to prepare and later execute it.
> better to use directly the datums (which i already have deformed from
> previous operations).

I'm confused here too. I thought the datums you're talking about were
arguments, thus you could push them straight to SPI_execp(). But you
seem to be suggesting parts of the actual query are in datum form also?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Planning without reason.

From
Tzahi Fadida
Date:
On Friday 23 June 2006 17:47, Martijn van Oosterhout wrote:
> On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote:
> > My initial reasoning was to avoid extra sorts but i guess that the
> > planner just doesn't get the LIMIT 1. I see now that UNION should be
> > better for the planner to undestand (not performance wise).
> > However, UNION alone, doesn't seem to cut it.
> > Following is an example. t7 has 2 attributes and a non-unique index on
> > one attribute. here is a printout:
> > explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select *
> > from t7 where a2=139 LIMIT 1);
>
> What are the indexes on?  If you only have an index on a4, the latter
> query has to be an index scan and there's no way to optimise it way.

That's my point, it should have only used a sequence scan and not also
do an index scan.
In other words, it should have consolidated the two nodes of index scan and
sequence scan into a single plan node where you only scan sequentially the
relation and choose a tuple for each UNION clause.

>
> > > Also, couldn't you just do:
> > >
> > > SELECT * FROM R
> > > WHERE (a=3, b=6, ...)
> > > OR (b=5, d=2, ...)
> > > etc
> >
> > No, a filtering action is not enough since my goal is to only use indices
> > when retrieving single tuples each time thus, if i will use OR i cannot
> > control the number of tuples returned by each Or clause.
>
> I must admit, this is a really strange way of doing it. For example, if
> multiple rows match, the tuples eventually returned will be a random
> selection of the rows that matched. Especially with the "limit 1"
> there's no way the optimiser could combine the individual scans.

It is a query i use for full disjunction which is a part of the algorithm.
I am doing it manually, so i don't see why it can't do it itself.
I.e.: Scan sequentially R. for each UNION clause find a matching tuple.
the end.

>
> If you really need the "LIMIT 1" and you don't have full index coverage
> then you're quite limited as to how it can be optimised.

You misunderstood me, i wish the planner to only use sequence scan in the
event where even one node is a sequential scan.

>
> > > > I am currently just writing the query as a string and open a cursor.
> > > > Is there a simple way to use Datums instead of converting the
> > > > attributes to strings to create a plan for SPI.
> > > > 10x.
> > >
> > > I imagine SPI_prepare() and SPI_execp() would be used for this.
> >
> > I am already using SPI_prepare but it uses a query of the form of a char
> > string, which i need to prepare and is quite long. I.e. if i have 100
> > tuples i wish to retrieve it can be very wasteful to prepare the string
> > in memory and use SPI_prepare to prepare and later execute it.
> > better to use directly the datums (which i already have deformed from
> > previous operations).
>
> I'm confused here too. I thought the datums you're talking about were
> arguments, thus you could push them straight to SPI_execp(). But you
> seem to be suggesting parts of the actual query are in datum form also?

Example. i have a tuple T i am searching for.
T contains attribute1, attribute2. I have T in a
heap_deformtuple(T) manner, i.e., i have T->v and T->n (for nulls).
Currently i am doing (loosely):
"(SELECT * FROM R where attribute1=" + convertDatumToCharString(T->v[0])+
" AND attribute2=" + convertDatumToCharString(T->v[1]) +" LIMIT 1)"
+ "UNION"
... as above.

I can use prepare without conversions but i still have to construct the long
query each time. I can't do prepare just once because the where clauses
structures are always changing. Thus, i was wondering if i can
also construct the part in the plan where i request to SELECT * FROM R...
I.e. not to use strings at all. The structure of the query is the same all the
time. I.e. there is the SELECT * FROM R and the WHERE clause with LIMIT 1
nodes with UNION  ALL between SELECTS.

>
> Have a nice day,

--
Regards,
��������Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS: �see at
http://members.lycos.co.uk/my2nis/spamwarning.html


Re: Planning without reason.

From
Martijn van Oosterhout
Date:
On Fri, Jun 23, 2006 at 06:10:33PM +0300, Tzahi Fadida wrote:
> On Friday 23 June 2006 17:47, Martijn van Oosterhout wrote:
> > On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote:
> > > My initial reasoning was to avoid extra sorts but i guess that the
> > > planner just doesn't get the LIMIT 1. I see now that UNION should be
> > > better for the planner to undestand (not performance wise).
> > > However, UNION alone, doesn't seem to cut it.
> > > Following is an example. t7 has 2 attributes and a non-unique index on
> > > one attribute. here is a printout:
> > > explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select *
> > > from t7 where a2=139 LIMIT 1);
> >
> > What are the indexes on?  If you only have an index on a4, the latter
> > query has to be an index scan and there's no way to optimise it way.
>
> That's my point, it should have only used a sequence scan and not also
> do an index scan.
> In other words, it should have consolidated the two nodes of index scan and
> sequence scan into a single plan node where you only scan sequentially the
> relation and choose a tuple for each UNION clause.

I see what you're saying, but that's not necessarily faster. If you
consider the case where the tuple found by the seq scan is near the
beginning of the table and the tuple by the index near the end, it
could be worse to fold them. If you did only a single seq scan it would
have to scan the whole table, whereas now it only has to scan the
beginning.

Just because it says "Seq Scan" doesn't mean it actually scans the
whole table. There isn't a way in postgres to specify the plan you
want. LIMIT only works on the output of entire nodes, you can't say
"output one tuple matching A, one tuple matching B" in a single node.

> Example. i have a tuple T i am searching for.
> T contains attribute1, attribute2. I have T in a
> heap_deformtuple(T) manner, i.e., i have T->v and T->n (for nulls).
> Currently i am doing (loosely):
> "(SELECT * FROM R where attribute1=" + convertDatumToCharString(T->v[0])+
> " AND attribute2=" + convertDatumToCharString(T->v[1]) +" LIMIT 1)"
> + "UNION"
> ... as above.
>
> I can use prepare without conversions but i still have to construct the long
> query each time. I can't do prepare just once because the where clauses
> structures are always changing. Thus, i was wondering if i can
> also construct the part in the plan where i request to SELECT * FROM R...
> I.e. not to use strings at all. The structure of the query is the same all the
> time. I.e. there is the SELECT * FROM R and the WHERE clause with LIMIT 1
> nodes with UNION  ALL between SELECTS.

I see. You could do:

(SELECT * FROM R WHERE attribute1=$1 and attribute2=$2 LIMIT 1) ... etc ...

That saves the conversions to cstrings, but you're looking for something
more abstract than that. I'm not sure that exists.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Planning without reason.

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
>> (SELECT * FROM R
>> WHERE a=3, b=6,. ...)
>> UNION
>> (SELECT * FROM R
>> WHERE b=5, d=2,. ...)
>> UNION
>> ....
>> And lots of unions.

> Do you need UNION, or do you actually mean UNION ALL?

> Also, couldn't you just do:

> SELECT * FROM R 
> WHERE (a=3, b=6, ...)
> OR (b=5, d=2, ...)
> etc

That seems to be what Tzahi wants the system to do for him.  But the OR
format is not in general logically equivalent to either UNION or UNION
ALL, because UNION would cause duplicate output rows to be suppressed
whereas UNION ALL could allow the same table row to be emitted multiple
times (if two different WHERE conditions could select the same row).

It's conceivable that the planner could prove that neither effect is
possible in a particular query and then make the transformation
automatically, but I'm not about to expend that kind of planning effort
on such an odd case --- checking for it would waste entirely too many
cycles in most cases.
        regards, tom lane


Re: Planning without reason.

From
Tzahi Fadida
Date:
Perhaps it is over the top just for my specific query.
Basically, i wish not to do something the system should do
because, as i already noticed, when versions changes the
database can break your code if you don't keep up.

I guess i can make a map of attributes participating in an index
of a relation.
Also, i would have to take into account the type of index used.
For example, a btree should have the capability to do prefix key
searches while hash indices probably can't.
Then check each target tuple if it can use an index.
All this before constructing the query for the planner.

However, i already played with this in the past. I reached
the conclusion that it is better to let the planner decide what is
best since it is too complex for me to say things about cost
estimates or index changing capabilities.

On Friday 23 June 2006 19:28, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
> >> (SELECT * FROM R
> >> WHERE a=3, b=6,. ...)
> >> UNION
> >> (SELECT * FROM R
> >> WHERE b=5, d=2,. ...)
> >> UNION
> >> ....
> >> And lots of unions.
> >
> > Do you need UNION, or do you actually mean UNION ALL?
> >
> > Also, couldn't you just do:
> >
> > SELECT * FROM R
> > WHERE (a=3, b=6, ...)
> > OR (b=5, d=2, ...)
> > etc
>
> That seems to be what Tzahi wants the system to do for him.  But the OR
> format is not in general logically equivalent to either UNION or UNION
> ALL, because UNION would cause duplicate output rows to be suppressed
> whereas UNION ALL could allow the same table row to be emitted multiple
> times (if two different WHERE conditions could select the same row).
>
> It's conceivable that the planner could prove that neither effect is
> possible in a particular query and then make the transformation
> automatically, but I'm not about to expend that kind of planning effort
> on such an odd case --- checking for it would waste entirely too many
> cycles in most cases.
>
>             regards, tom lane

--
Regards,
��������Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS: �see at
http://members.lycos.co.uk/my2nis/spamwarning.html


Re: Planning without reason.

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> It's conceivable that the planner could prove that neither effect is
> possible in a particular query and then make the transformation
> automatically, but I'm not about to expend that kind of planning effort
> on such an odd case --- checking for it would waste entirely too many
> cycles in most cases.

Fwiw these aren't really very rare cases. Usually it goes the other direction
though. I seem to recall Oracle did in fact support a plan where it converted
OR expressions into a kind of union plan node.

But I think Postgres's bitmap index scan satisfies much of the same need. I
think the most useful case where the union plan was beneficial was precisely
when you had something like WHERE index_col1=1 OR indexed_col2=2.

Going from an UNION plan to a OR plan would be somewhat strange. Programmers
don't usually write plans as UNION in place of the more natural OR unless they
have a reason to.

-- 
greg



Re: Planning without reason.

From
Martijn van Oosterhout
Date:
On Fri, Jun 23, 2006 at 08:14:07PM +0300, Tzahi Fadida wrote:
> I guess i can make a map of attributes participating in an index
> of a relation.
> Also, i would have to take into account the type of index used.
> For example, a btree should have the capability to do prefix key
> searches while hash indices probably can't.
> Then check each target tuple if it can use an index.
> All this before constructing the query for the planner.

At the end of the day it comes down to that Postgres has no way
currently to express the plan you want, so we're not talking about
small planner or optimiser changes, we're talking about creating a
completely new node type which could be used for this purpose.

It would be a node that sat on top of a "seq scan" and had a number of
conditions. Each tuple would be matched against each condition. Once a
condition is matched, that tuple is returned. Once a condition has
matched N times it is disabled. Once all conditions are disabled,
you're done.

Seems terribly special purpose, yet I don't see how you could do it any
other way. If it wern't for the LIMIT clauses the whole operation would
be trivial.

> > It's conceivable that the planner could prove that neither effect is
> > possible in a particular query and then make the transformation
> > automatically, but I'm not about to expend that kind of planning effort
> > on such an odd case --- checking for it would waste entirely too many
> > cycles in most cases.

I think in the case we have here, the transformation wouldn't apply
anyway (otherwise it could be done by hand).

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.