Thread: Re: [INTERFACES] Re: [HACKERS] changes in 6.4

Re: [INTERFACES] Re: [HACKERS] changes in 6.4

From
David Hartwig
Date:

Bruce Momjian wrote:

> >
> > Hannu Krosing wrote:
> >
> > > > The days where every release fixed server crashes, or added a feature
> > > > that users were 'screaming for' may be a thing of the past.
> > >
> > > Is anyone working on fixing the exploding optimisations for many OR-s,
> > > at least the canonic case used by access?
> > >
> > > My impression is that this has fallen somewhere between
> > > insightdist and Vadim.
> >
> > This is really big for the ODBCers. (And I suspect for JDBCers too.)  Many
> > desktop libraries and end-user tools depend on this "record set" strategy to
> > operate effectively.
> >
> > I have put together a workable hack that runs just before cnfify().  The
> > option is activated through the SET command.  Once activated, it identifies
> > queries with this particular multi-OR pattern generated by these RECORD SET
> > strategies.  Qualified query trees are rewritten as multiple UNIONs.   (One
> > for each OR grouping).
> >
> > The results are profound.    Queries that used to scan tables because of the
> > ORs, now make use of any indexes.   Thus, the size of the table has virtually
> > no effect on performance.  Furthermore, queries that used to crash the
> > backend, now run in under a second.
> >
> > Currently the down sides are:
> >     1. If there is no usable index, performance is significantly worse.  The
> > patch does not check to make sure that there is a usable index.  I could use
> > some pointers on this.
> >
> >     2. Small tables are actually a bit slower than without the patch.
> >
> >     3.  Not very elegant.    I am looking for a more generalized solution.
> > I have lots of ideas, but I would need to know the backend much better before
> > attempting any of them.   My favorite idea is before cnfify(), to factor the
> > OR terms and pull out the constants into a virtual (temporary) table spaces.
> > Then rewrite the query as a join.   The optimizer will (should) treat the new
> > query accordingly.  This assumes that an efficient factoring algorithm exists
> > and that temporary tables can exist in the heap.
> >
> > Illustration:
> > SELECT ... FROM tab WHERE
> > (var1 = const1 AND var2 = const2) OR
> > (var1 = const3 AND var2 = const4) OR
> > (var1 = const5 AND var2 = const6)
> >
> > SELECT ... FROM tab, tmp WHERE
> > (var1 = var_x AND var2 = var_y)
> >
> > tmp
> > var_x  | var_y
> > --------------
> > const1|const2
> > const3|const4
> > const5|const6
>
> David, where are we on this?  I know we have OR's using indexes.  Do we
> still need to look at this as a fix, or are we OK?  I have not gotten far
> enough in the optimizer to know how to fix the

Bruce,

If the question is, have I come up with a solution for the cnf'ify problem:  No

If the question is, is it still important:  Very much yes.

It is essential for many RAD tools using remote data objects which make use of key
sets.  Your recent optimization of the OR list goes a long way, but inevitably
users are confronted with multi-part keys.

When I look at the problem my head spins.   I do not have the experience (yet?)
with the backend to be mucking around in the optimizer.  As I see it, cnf'ify is
doing just what it is supposed to do.  Boundless boolean logic.

I think hope may lie, though, in identifying each AND'ed group associated with a key
and tagging it as a special sub-root node which cnf'ify does not penetrate.   This
node would be allowed to pass to the later stages of the optimizer, where it would be
used to plan index scans.  Easy for me to say.

In the meantime, I still have the patch that I described in prior email.  It has
worked well for us.  Let me restate that.   We could not survive without it!
However, I do not feel it is a sufficiently general approach to be
incorporated as a final solution.     I will submit the patch if you (or anyone) do
not come up with a better solution.  It is coded to be activated by a SET KSQO to
minimize its reach.
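The constants-table rewrite in the illustration above is straightforward to sanity-check.  Below is a small sketch using SQLite as a stand-in backend (the `payload` column and the sample rows are invented for illustration); it confirms that the OR-of-ANDs form and the join-against-constants form return the same rows:

```python
import sqlite3

# SQLite stands in for the backend; table/column names follow the
# illustration, the payload column and rows are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (var1 INTEGER, var2 INTEGER, payload TEXT)")
con.executemany("INSERT INTO tab VALUES (?, ?, ?)",
                [(1, 2, 'a'), (3, 4, 'b'), (5, 6, 'c'), (7, 8, 'd')])

# Original shape: OR of AND'ed key groups.
orig = con.execute("""
    SELECT payload FROM tab WHERE
        (var1 = 1 AND var2 = 2) OR
        (var1 = 3 AND var2 = 4) OR
        (var1 = 5 AND var2 = 6)
""").fetchall()

# Rewritten shape: join against a temporary table of the constants.
con.execute("CREATE TEMP TABLE tmp (var_x INTEGER, var_y INTEGER)")
con.executemany("INSERT INTO tmp VALUES (?, ?)", [(1, 2), (3, 4), (5, 6)])
rewritten = con.execute("""
    SELECT payload FROM tab, tmp
    WHERE var1 = var_x AND var2 = var_y
""").fetchall()

assert sorted(orig) == sorted(rewritten)  # same rows either way
```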


Re: [INTERFACES] Re: [HACKERS] changes in 6.4

From
Bruce Momjian
Date:
> [...]

OK, let me try this one.

Why is the system cnf'ifying the query?  Because it wants to have a
list of qualifications that are AND'ed, so it can just pick the most
restrictive/cheapest one and evaluate that one first.

If you have:

    (a=b and c=d) or e=1

In this case, without cnf'ify, it has to evaluate both of them, because
if one is false, you can't be sure another would be true.  In the
cnf'ify case,

    (a=b or e=1) and (c=d or e=1)

In this case, it can choose either one and act on just that one; if a row
fails to meet it, it can stop without evaluating the other restriction.

The fact is that it is only going to use fancy join/index in one of the
two cases, so it tries to pick the best one, and does a brute-force
qualification test on the remaining item if the first one tried is true.

The problem, of course, is that large WHERE clauses can expand this
exponentially.  What it is really trying to do is pick the cheapest
restriction, but the memory explosion and query failures are serious
problems.

The issue is that it thinks it is doing something to help things, while
it is actually hurting things.
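The exponential expansion can be made concrete with a toy version of the distribution step (a sketch of the textbook OR-over-AND distribution, not the backend's actual cnfify() code): an OR of n AND-groups with k literals each distributes into k^n CNF clauses.

```python
from itertools import product

# Toy CNF conversion by distributing OR over AND.  Each disjunct is a
# list (a conjunction) of literal strings; the result has one clause
# for every way of picking a literal from each disjunct, i.e. k**n
# clauses for n disjuncts of k literals, before any simplification.
def cnfify(disjuncts):
    return [set(choice) for choice in product(*disjuncts)]

# (x=3 AND y=4) OR (x=3 AND y=5) OR (x=3 AND y=6):
# 3 disjuncts of 2 literals -> 2**3 = 8 clauses.
query = [["x=3", "y=4"], ["x=3", "y=5"], ["x=3", "y=6"]]
print(len(cnfify(query)))  # 8

# Ten key groups already yield 2**10 = 1024 clauses.
big = [["x=3", "y=%d" % i] for i in range(10)]
print(len(cnfify(big)))  # 1024
```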

In the ODBC case of:

    (x=3 and y=4) or
    (x=3 and y=5) or
    (x=3 and y=6) or ...

it clearly is not going to gain anything by choosing any CHEAPEST path,
because they are all the same in terms of cost, and the use by ODBC
clients is hurting reliability.

I am inclined to agree with David's solution of breaking apart the query
into separate UNION queries in certain cases.  It seems to be the most
logical solution, because the cnf'ify code is working counter to its
purpose in these cases.

Now, the question is how/where to implement this.  I see your idea of
making the OR a join to a temp table that holds all the constants.
Another idea would be to do actual UNION queries:

    SELECT * FROM tab
    WHERE (x=3 and y=4)
    UNION
    SELECT * FROM tab
    WHERE (x=3 and y=5)
    UNION
    SELECT * FROM tab
    WHERE (x=3 and y=6) ...

This would work well for tables with indexes, but for a sequential scan,
you are doing a sequential scan for each UNION.  Another idea is
subselects.  Also, you have to make sure you return the proper rows,
keeping duplicates where they are in the base table, but not returning
them when they meet more than one qualification.

    SELECT * FROM tab
    WHERE (x,y) IN (SELECT 3, 4
            UNION
            SELECT 3, 5
            UNION
            SELECT 3, 6)

I believe we actually support this.  This is not going to use an index
on tab, so it may be slow if x and y are indexed.
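As a quick check of the two forms (a sketch using SQLite as a stand-in, with the x, y columns from the example; the row-valued IN syntax requires a reasonably recent SQLite), including the caveat above about duplicate rows:

```python
import sqlite3

# SQLite as a stand-in backend; rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (x INTEGER, y INTEGER, v TEXT)")
con.executemany("INSERT INTO tab VALUES (?, ?, ?)",
                [(3, 4, 'a'), (3, 5, 'b'), (3, 6, 'c'), (9, 9, 'z')])

UNION_Q = """
    SELECT v FROM tab WHERE (x=3 AND y=4)
    UNION
    SELECT v FROM tab WHERE (x=3 AND y=5)
    UNION
    SELECT v FROM tab WHERE (x=3 AND y=6)
"""
# Row-value IN needs SQLite >= 3.15.
IN_Q = """
    SELECT v FROM tab
    WHERE (x, y) IN (VALUES (3, 4), (3, 5), (3, 6))
"""

assert sorted(con.execute(UNION_Q)) == sorted(con.execute(IN_Q))

# The duplicate-row caveat: UNION implies DISTINCT on the selected
# columns, so a repeated base-table row is collapsed, while the IN
# form keeps it.
con.execute("INSERT INTO tab VALUES (3, 4, 'a')")
print(len(con.execute(UNION_Q).fetchall()),   # 3 -- duplicate collapsed
      len(con.execute(IN_Q).fetchall()))      # 4 -- duplicate kept
```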

Another more bizarre solution is:

    SELECT * FROM tab
    WHERE (x,y) = (SELECT 3, 4) OR
          (x,y) = (SELECT 3, 5) OR
          (x,y) = (SELECT 3, 6)

Again, I think we do this too.  I don't think cnf'ify does anything with
this.  I also believe "=" uses indexes on subselects, while IN does not,
because IN could return lots of rows, and an index is slower than a
non-index join on lots of rows.  Of course, that may have changed now
that we index OR's.

Let me ask another question.  If I do:

    SELECT * FROM tab WHERE x=3 OR x=4

it works, and uses indexes.  Why can't the optimizer just not cnf'ify
things sometimes, and just do:

    SELECT * FROM tab
    WHERE    (x=3 AND y=4) OR
        (x=3 AND y=5) OR
        (x=3 AND y=6)

Why can it handle x=3 OR x=4, but not the more complicated case above,
without trying to be too smart?  If (x,y) is a multi-key index, it could
use that quite easily.  If not, it can do a sequential scan and run the
tests.

Another issue.  To the optimizer, x=3 and x=y are totally different.  In
x=3, it is a column compared to a constant, while in x=y, it is a join.
That makes a huge difference.

In the case of (a=b and c=d) or e=1, you pick the best path and do the
a=b join, and throw in the e=1 entries.  You can't easily do both joins,
because you also need the e=1 stuff.

I wonder what would happen if we prevented cnf'ifying in cases where the
ORs represent only column = constant restrictions.

I meant to really go through the optimizer this month, but other backend
items took my time.

Can someone run some tests on disabling the cnf'ify calls?  It is my
understanding that with the non-cnf'ify'ed query, it can't choose an
optimal path, and starts to do either straight index matches,
sequential scans, or Cartesian products where it joins every row to
every other row looking for a match.

Let's say we turn off cnf-ify just for non-join queries.  Does that
help?

I am not sure of the ramifications of telling the optimizer it no longer
has a variety of paths to choose for evaluating the query.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [INTERFACES] Re: [HACKERS] changes in 6.4

From
David Hartwig
Date:

Bruce Momjian wrote:

> [...]
>
> Now, the question is how/where to implement this.  I see your idea of
> making the OR a join to a temp table that holds all the constants.
> Another idea would be to do actual UNION queries:
>
>         SELECT * FROM tab
>         WHERE (x=3 and y=4)
>         UNION
>         SELECT * FROM tab
>         WHERE (x=3 and y=5)
>         UNION
>         SELECT * FROM tab
>         WHERE (x=3 and y=6) ...
>
> This would work well for tables with indexes, but for a sequential scan,
> you are doing a sequential scan for each UNION.

Practically speaking, the concern about a missing index may not be justified.   The
reason these queries are generated with this shape is that remote data objects on the
client side are being told that a primary key exists on these tables.  The object is told
about these keys in one of two ways.

1.  It queries the database for the primary key of the table.  The ODBC driver services
this request by querying for the attributes used in {table_name}_pkey.

2.  The user manually specifies the primary key.  In this case an actual index may not
exist.   (e.g. MS Access asks the user for this information if a primary key is not found
in a table)

The second case is the only one that would cause a problem.  Fortunately, the solution is
simple.  Add a primary key index!

My only concern is to be able to accurately identify a query with the proper signature
before rewriting it as a UNION.   To what degree should this inspection be taken?
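For what it's worth, the signature test can be sketched over an invented, already-parsed representation (this is not backend node-tree code): the qualification must be an OR of at least two arms, each an AND of column = constant tests over one common column set.

```python
# Sketch of the signature check over a hypothetical parsed form:
# or_groups is a list of arms, each a list of (column, constant) pairs.
def is_keyset_pattern(or_groups):
    if len(or_groups) < 2:
        return False  # a single arm is just an AND; no rewrite needed
    first_cols = sorted(col for col, _ in or_groups[0])
    for arm in or_groups:
        # every arm must test exactly the same column set
        if sorted(col for col, _ in arm) != first_cols:
            return False
    return True

# The Access-style key-set query qualifies ...
assert is_keyset_pattern([[("x", 3), ("y", 4)],
                          [("x", 3), ("y", 5)],
                          [("x", 3), ("y", 6)]])
# ... but arms over different columns do not.
assert not is_keyset_pattern([[("x", 3), ("y", 4)], [("z", 1)]])
```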

BTW,  I would not do the rewrite on OR's without AND's since you have fixed the OR's use
of the index.

There is one other potential issue.  In my experience, using arrays in tables with
UNIONs creates problems: the array comparison operators needed by the implied
DISTINCT are missing.

> [...]
> Can someone run some tests on disabling the cnf'ify calls.  It is my
> understanding that with the non-cnf-ify'ed query, it can't choose an
> optimial path, and starts to do either straight index matches,
> sequential scans, or cartesian products where it joins every row to
> every other row looking for a match.
>
> Let's say we turn off cnf-ify just for non-join queries.  Does that
> help?
>
> I am not sure of the ramifications of telling the optimizer it no longer
> has a variety of paths to choose for evaluating the query.

I did not try this earlier because I thought it was too good to be true.   I was right.
I tried commenting out the normalize() function in cnfify().   The EXPLAIN showed a
sequential scan, and the resulting tuple set was empty.   Time will not allow me to dig
into this further this weekend.

Unless you come up with a better solution, I am going to submit my patch on Monday to
make the Sept. 1st deadline.  It includes a SET switch to activate the rewrite so as not
to cause problems outside the ODBC users.    We can either improve it or yank it by the
Oct. 1st deadline.


Re: [INTERFACES] Re: [HACKERS] changes in 6.4

From
Sbragion Denis
Date:
Hello,

At 11.40 30/08/98 -0400, David Hartwig wrote:
>> Why is the system cnf'ifying the query.  Because it  wants to have a
>> list of qualifications that are AND'ed, so it can just pick the most
>> restrictive/cheapest, and evaluate that one first.

Just a small question about all this optimization stuff.  I'm not a
database expert, but I think we are talking about an NP-complete problem.
Couldn't we convert this optimization problem into another NP problem that
is known to have a good approximate solution?  For example, for the
traveling salesman problem there's an algorithm that provides a solution
that is never more than two times the optimal one, and provides results
that are *really* near the optimal one most of the time.  The simplex
algorithm may be another example.  I think this kind of algorithm would be
better than a collection of tricks for special cases, and these tricks
could still be used when special cases are detected.  Furthermore, I also
know of a free program I used in the past that provides this kind of
optimization for chip design.  I don't remember the exact name of the
program, but I remember it came from Berkeley university.  Of course,
maybe I'm totally missing the point.

Hope it helps !

Bye!

    Dr. Sbragion Denis
    InfoTecna
    Tel, Fax: +39 39 2324054
    URL: http://space.tin.it/internet/dsbragio

Re: [INTERFACES] Re: [HACKERS] changes in 6.4

From
Bruce Momjian
Date:
This is an old message, but still relevant.  I believe 6.6 will have much
better OR memory usage.

> [...]


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


PERL

From
"Jason Doller"
Date:
Hi All

I'm a bit lost!  Where can I find documentation on accessing postgres 
from inside PERL (5)?

Any help will be appreciated.  (The thing is, I'm sure I've seen the info 
somewhere, but for the life of me I can't remember where...)

Thanks

Jason Doller


Re: [INTERFACES] PERL

From
"Brett W. McCoy"
Date:
On Sun, 19 Sep 1999, Jason Doller wrote:

> I'm a bit lost!  Where can I find documentation on accessing postgres 
> from inside PERL (5)?
> 
> Any help will be appreciated.  (The thing is, I'm sure I've seen the info 
> somewhere, but for the life of me I can't remember where...)

Under the source tree, go to the interfaces directory, and there is a
directory for the perl interface (Pg.pm).  You have to enable the Perl
option when you run configure, and you will also have to install the
module as root, since it gets installed under the module hierarchy
wherever you have Perl installed.  Then you only need to do a 'perldoc Pg'
to see the documentation on it.  See the build instructions for more
information on how to install the Perl module.

Brett W. McCoy                                                  http://www.lan2wan.com/~bmccoy/
-----------------------------------------------------------------------
Don't let your mind wander -- it's too little to be let out alone.