Thread: subselect and optimizer

subselect and optimizer

From
t-ishii@sra.co.jp
Date:
Hi,

This question was sent to me by a user who uses PostgreSQL 6.3.1.
Is this normal?
(Note that the patch for src/backend/optimizer/path/prune.c created by
Vadim did not help)
--
Tatsuo Ishii
t-ishii@sra.co.jp
------------------------------------------------------------------
The following query seems to generate a rather slow query plan.

explain select * from product,order_tbl where
product.serial=order_tbl.serial and product.serial in (select serial
from order_tbl where cust_id='ABCDE');

    NOTICE:  QUERY PLAN:

    Hash Join  (cost=906.09 size=744 width=110)
      ->  Seq Scan on order_tbl  (cost=296.13 size=6822 width=36)
      ->  Hash  (cost=0.00 size=0 width=0)
            ->  Seq Scan on product  (cost=358.29 size=744 width=74)
                  SubPlan
                    ->  Index Scan on order_tbl  (cost=2.05 size=1 width=12)

    EXPLAIN

product and order_tbl are defined as follows:

create table product (
serial char(10) primary key,
pname char(15) not null,
price int2);
create index prod_name on product using hash(pname);

create table order_tbl (
cust_id char(5) primary key,
serial char(10) not null,
nums int2,
o_date date);
create index order_ser on order_tbl using hash(serial);

* product has 7289 tuples, and order_tbl has 6818 tuples.

Re: [HACKERS] subselect and optimizer

From
Bruce Momjian
Date:
I will say we have an optimization problem with tables being referenced
multiple times in a query, but I don't know if this is the cause, though
you could test it by making a copy of order_tbl with another name, and
testing the speed.
>
> Hi,
>
> This question was sent to me by a user who uses PostgreSQL 6.3.1.
> Is this normal?
> (Note that the patch for src/backend/optimizer/path/prune.c created by
> Vadim did not help)
> --
> Tatsuo Ishii
> t-ishii@sra.co.jp
> ------------------------------------------------------------------
> The following query seems to generate a rather slow query plan.
>
> explain select * from product,order_tbl where
> product.serial=order_tbl.serial and product.serial in (select serial
> from order_tbl where cust_id='ABCDE');
>
>     NOTICE:  QUERY PLAN:
>
>     Hash Join  (cost=906.09 size=744 width=110)
>       ->  Seq Scan on order_tbl  (cost=296.13 size=6822 width=36)
>       ->  Hash  (cost=0.00 size=0 width=0)
>             ->  Seq Scan on product  (cost=358.29 size=744 width=74)
>                   SubPlan
>                     ->  Index Scan on order_tbl  (cost=2.05 size=1 width=12)
>
>     EXPLAIN
>
> product and order_tbl are defined as follows:
>
> create table product (
> serial char(10) primary key,
> pname char(15) not null,
> price int2);
> create index prod_name on product using hash(pname);
>
> create table order_tbl (
> cust_id char(5) primary key,
> serial char(10) not null,
> nums int2,
> o_date date);
> create index order_ser on order_tbl using hash(serial);
>
> * product has 7289 tuples, and order_tbl has 6818 tuples.
>
>


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] subselect and optimizer

From
t-ishii@sra.co.jp
Date:
>I will say we have an optimization problem with tables being referenced
>multiple times in a query, but I don't know if this is the cause, though
>you could test it by making a copy of order_tbl with another name, and
>testing the speed.

Thank you for your suggestion. I made a copy of order_tbl (named
order_tbl1) and did a query:

    explain select * from product,order_tbl where \
              product.serial=order_tbl.serial and product.serial in \
                      (select serial from order_tbl1  where cust_id='H3550');
    NOTICE:  QUERY PLAN:

    Hash Join  (cost=934.65 size=798 width=112)
      ->  Seq Scan on order_tbl  (cost=296.82 size=6843 width=36)
      ->  Hash  (cost=0.00 size=0 width=0)
            ->  Seq Scan on product  (cost=383.71 size=797 width=76)
                  SubPlan
                    ->  Index Scan on order_tbl1  (cost=2.05 size=1 width=12)

Seems like no change here?
--
Tatsuo Ishii
t-ishii@sra.co.jp

Re: [HACKERS] subselect and optimizer

From
"Boersenspielteam"
Date:
Hi,

Vadim helped me with the patch for my query.

But this patch still didn't help for a simple join without a where
clause. The query plan says it uses two sequential scans, where 6.2.1
uses two index scans.

Strange stuff.

Seems like more than one problem it the optimizer code ...


Ciao

Das Boersenspielteam.

---------------------------------------------------------------------------
                          http://www.boersenspiel.de
                           Das Boersenspiel im Internet
             *Realitaetsnah*  *Kostenlos*  *Ueber 6000 Spieler*
---------------------------------------------------------------------------

Re: [HACKERS] subselect and optimizer

From
Bruce Momjian
Date:
>
> Hi,
>
> Vadim helped me with the patch for my query.
>
> But this patch still didn't help for a simple join without a where
> clause. The query plan says it uses two sequential scans, where 6.2.1
> uses two index scans.
>
> Strange stuff.
>
> Seems like more than one problem it the optimizer code ...
>

But we didn't have subselcts in 6.2.1?

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] subselect and optimizer

From
"Boersenspielteam"
Date:
> > But this patch still didn't help for a simple join without a where
> > clause. The query plan says it uses two sequential scans, where 6.2.1
> > uses two index scans.
>
> But we didn't have subselcts in 6.2.1?

No, but in the more general case of a simple join over two tables
with fields with an index declared on them.

say: Select * from Trans, Spieler where
Spieler.spieler_nr=Trans.spieler_nr

Uses indices in 6.2.1, doesn't use them in 6.3.1 (two seq scans).

I just wanted to remind you, that these problems are not restricted
to subqueries, but seem to be a more general 'flaw' in 6.3.x .

Hope this helps.

Ulrich

Re: [HACKERS] subselect and optimizer

From
Bruce Momjian
Date:
>
> > > But this patch still didn't help for a simple join without a where
> > > clause. The query plan says it uses two sequential scans, where 6.2.1
> > > uses two index scans.
> >
> > But we didn't have subselcts in 6.2.1?
>
> No, but in the more general case of a simple join over two tables
> with fields with an index declared on them.
>
> say: Select * from Trans, Spieler where
> Spieler.spieler_nr=Trans.spieler_nr
>
> Uses indices in 6.2.1, doesn't use them in 6.3.1 (two seq scans).
>
> I just wanted to remind you, that these problems are not restricted
> to subqueries, but seem to be a more general 'flaw' in 6.3.x .

Ah, but that is fixed in 6.3.2 beta.  We particularly waited for a fix
for this before releasing a new beta.  But you say you have Vadim's fix
that is in 6.3.2, and it still doesn't work?

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] subselect and optimizer

From
"Boersenspielteam"
Date:
Hi Bruce,


> > > > But this patch still didn't help for a simple join without a where
> > > > clause. The query plan says it uses two sequential scans, where 6.2.1
> > > > uses two index scans.
> > >
> > > But we didn't have subselcts in 6.2.1?
> >
> > No, but in the more general case of a simple join over two tables
> > with fields with an index declared on them.
> >
> > say: Select * from Trans, Spieler where
> > Spieler.spieler_nr=Trans.spieler_nr
> >
> > Uses indices in 6.2.1, doesn't use them in 6.3.1 (two seq scans).
> >
> > I just wanted to remind you, that these problems are not restricted
> > to subqueries, but seem to be a more general 'flaw' in 6.3.x .
>
> Ah, but that is fixed in 6.3.2 beta.  We particularly waited for a fix
> for this before releasing a new beta.  But you say you have Vadim's fix
> that is in 6.3.2, and it still doesn't work?

Yep, exactly. The query with the where clause is fixed after
applying Vadim's prune.c patch, simple join still uses two seq scans
:-(

I uploaded test data and Vadim fixed one file, but asked you
(Bruce) to look over other files of the optimizer code. There seem
to be other bugs in the optimizer code, which were introduced between
6.2.1 and 6.3. We have seen about 5-6 error reports from different
people, from the simpliest queries like my simple join to rather
complex subqueries. But when a simple join doesn't work (ok, it
works, but kind of crawls), this error is supposed to pop up under
other circumstances too.

Hope you can find this nasty little bug, cause it makes postgres
unusable. Especially before going into development again.

See the mailinglist archives for a post of mine. There is a link in
it,where you can download the test data, it should still be
there. (don't have access to this from home)

I greatly appreciate all the time and hard work all you
PostgreSQL-hackers and contributors put into this fantastic freeware
product. Just to let you know.

Ciao

Ulrich





Ulrich Voss                            \ \   / /__  / ___|__ _| |
VoCal web publishing                    \ \ / / _ \| |   / _` | |
voss@vocalweb.de                         \ V / (_) | |__| (_| | |
http://www.vocalweb.de                    \_/ \___/ \____\__,_|_|
http://www.boersenspiel.de                         web publishing

Re: [HACKERS] subselect and optimizer

From
Bruce Momjian
Date:
> Yep, exactly. The query with the where clause is fixed after
> applying Vadim's prune.c patch, simple join still uses two seq scans
> :-(
>
> I uploaded test data and Vadim fixed one file, but asked you
> (Bruce) to look over other files of the optimizer code. There seem
> to be other bugs in the optimizer code, which were introduced between
> 6.2.1 and 6.3. We have seen about 5-6 error reports from different
> people, from the simpliest queries like my simple join to rather
> complex subqueries. But when a simple join doesn't work (ok, it
> works, but kind of crawls), this error is supposed to pop up under
> other circumstances too.
>
> Hope you can find this nasty little bug, cause it makes postgres
> unusable. Especially before going into development again.
>
> See the mailinglist archives for a post of mine. There is a link in
> it,where you can download the test data, it should still be
> there. (don't have access to this from home)
>
> I greatly appreciate all the time and hard work all you
> PostgreSQL-hackers and contributors put into this fantastic freeware
> product. Just to let you know.

Here is the prune.c file from 6.2.1.  Please try it and let me know if
it fixes the problem:

---------------------------------------------------------------------------

/*-------------------------------------------------------------------------
 *
 * prune.c--
 *      Routines to prune redundant paths and relations
 *
 * Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *      $Header: /usr/local/cvsroot/pgsql/src/backend/optimizer/path/prune.c,v 1.6 1997/09/08 21:45:08 momjian Exp $
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "nodes/pg_list.h"
#include "nodes/relation.h"

#include "optimizer/internal.h"
#include "optimizer/cost.h"
#include "optimizer/paths.h"
#include "optimizer/pathnode.h"

#include "utils/elog.h"


static List *prune_joinrel(Rel *rel, List *other_rels);

/*
 * prune-joinrels--
 *      Removes any redundant relation entries from a list of rel nodes
 *      'rel-list'.
 *
 * Returns the resulting list.
 *
 */
List       *
prune_joinrels(List *rel_list)
{
    List       *temp_list = NIL;

    if (rel_list != NIL)
    {
        temp_list = lcons(lfirst(rel_list),
                   prune_joinrels(prune_joinrel((Rel *) lfirst(rel_list),
                                                lnext(rel_list))));
    }
    return (temp_list);
}

/*
 * prune-joinrel--
 *      Prunes those relations from 'other-rels' that are redundant with
 *      'rel'.  A relation is redundant if it is built up of the same
 *      relations as 'rel'.  Paths for the redundant relation are merged into
 *      the pathlist of 'rel'.
 *
 * Returns a list of non-redundant relations, and sets the pathlist field
 * of 'rel' appropriately.
 *
 */
static List *
prune_joinrel(Rel *rel, List *other_rels)
{
    List       *i = NIL;
    List       *t_list = NIL;
    List       *temp_node = NIL;
    Rel           *other_rel = (Rel *) NULL;

    foreach(i, other_rels)
    {
        other_rel = (Rel *) lfirst(i);
        if (same(rel->relids, other_rel->relids))
        {
            rel->pathlist = add_pathlist(rel,
                                         rel->pathlist,
                                         other_rel->pathlist);
            t_list = nconc(t_list, NIL);        /* XXX is this right ? */
        }
        else
        {
            temp_node = lcons(other_rel, NIL);
            t_list = nconc(t_list, temp_node);
        }
    }
    return (t_list);
}

/*
 * prune-rel-paths--
 *      For each relation entry in 'rel-list' (which corresponds to a join
 *      relation), set pointers to the unordered path and cheapest paths
 *      (if the unordered path isn't the cheapest, it is pruned), and
 *      reset the relation's size field to reflect the join.
 *
 * Returns nothing of interest.
 *
 */
void
prune_rel_paths(List *rel_list)
{
    List       *x = NIL;
    List       *y = NIL;
    Path       *path = NULL;
    Rel           *rel = (Rel *) NULL;
    JoinPath   *cheapest = (JoinPath *) NULL;

    foreach(x, rel_list)
    {
        rel = (Rel *) lfirst(x);
        rel->size = 0;
        foreach(y, rel->pathlist)
        {
            path = (Path *) lfirst(y);

            if (!path->p_ordering.ord.sortop)
            {
                break;
            }
        }
        cheapest = (JoinPath *) prune_rel_path(rel, path);
        if (IsA_JoinPath(cheapest))
        {
            rel->size = compute_joinrel_size(cheapest);
        }
        else
            elog(WARN, "non JoinPath called");
    }
}


/*
 * prune-rel-path--
 *      Compares the unordered path for a relation with the cheapest path. If
 *      the unordered path is not cheapest, it is pruned.
 *
 *      Resets the pointers in 'rel' for unordered and cheapest paths.
 *
 * Returns the cheapest path.
 *
 */
Path       *
prune_rel_path(Rel *rel, Path *unorderedpath)
{
    Path       *cheapest = set_cheapest(rel, rel->pathlist);

    /* don't prune if not pruneable  -- JMH, 11/23/92 */
    if (unorderedpath != cheapest
        && rel->pruneable)
    {

        rel->unorderedpath = (Path *) NULL;
        rel->pathlist = lremove(unorderedpath, rel->pathlist);
    }
    else
    {
        rel->unorderedpath = (Path *) unorderedpath;
    }

    return (cheapest);
}

/*
 * merge-joinrels--
 *      Given two lists of rel nodes that are already
 *      pruned, merge them into one pruned rel node list
 *
 * 'rel-list1' and
 * 'rel-list2' are the rel node lists
 *
 * Returns one pruned rel node list
 */
List       *
merge_joinrels(List *rel_list1, List *rel_list2)
{
    List       *xrel = NIL;

    foreach(xrel, rel_list1)
    {
        Rel           *rel = (Rel *) lfirst(xrel);

        rel_list2 = prune_joinrel(rel, rel_list2);
    }
    return (append(rel_list1, rel_list2));
}

/*
 * prune_oldrels--
 *      If all the joininfo's in a rel node are inactive,
 *      that means that this node has been joined into
 *      other nodes in all possible ways, therefore
 *      this node can be discarded.  If not, it will cause
 *      extra complexity of the optimizer.
 *
 * old_rels is a list of rel nodes
 *
 * Returns a new list of rel nodes
 */
List       *
prune_oldrels(List *old_rels)
{
    Rel           *rel;
    List       *joininfo_list,
               *xjoininfo;

    if (old_rels == NIL)
        return (NIL);

    rel = (Rel *) lfirst(old_rels);
    joininfo_list = rel->joininfo;
    if (joininfo_list == NIL)
        return (lcons(rel, prune_oldrels(lnext(old_rels))));

    foreach(xjoininfo, joininfo_list)
    {
        JInfo       *joininfo = (JInfo *) lfirst(xjoininfo);

        if (!joininfo->inactive)
            return (lcons(rel, prune_oldrels(lnext(old_rels))));
    }
    return (prune_oldrels(lnext(old_rels)));
}


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] subselect and optimizer

From
"Boersenspielteam"
Date:
> Here is the prune.c file from 6.2.1.  Please try it and let me know if
> it fixes the problem:

Nope, doesn't even compile ...

---

gcc -I../../../include -I../../../backend -I/usr/include/curses -O2
-Wall -Wmissing-prototypes -I../..    -c prune.c -o prune.o
prune.c:39: conflicting types for `prune_joinrels'
../../../include/optimizer/paths.h:95: previous declaration of
`prune_joinrels' prune.c: In function `prune_rel_paths': prune.c:127:
`WARN' undeclared (first use this function) prune.c:127: (Each
undeclared identifier is reported only once prune.c:127: for each
function it appears in.)

---

Ciao

Das Boersenspielteam.

---------------------------------------------------------------------------
                          http://www.boersenspiel.de
                           Das Boersenspiel im Internet
             *Realitaetsnah*  *Kostenlos*  *Ueber 6000 Spieler*
---------------------------------------------------------------------------

Re: [HACKERS] subselect and optimizer

From
Bruce Momjian
Date:
>
>
> > Here is the prune.c file from 6.2.1.  Please try it and let me know if
> > it fixes the problem:
>
> Nope, doesn't even compile ...
>
> ---
>
> gcc -I../../../include -I../../../backend -I/usr/include/curses -O2
> -Wall -Wmissing-prototypes -I../..    -c prune.c -o prune.o
> prune.c:39: conflicting types for `prune_joinrels'
> ../../../include/optimizer/paths.h:95: previous declaration of
> `prune_joinrels' prune.c: In function `prune_rel_paths': prune.c:127:
> `WARN' undeclared (first use this function) prune.c:127: (Each
> undeclared identifier is reported only once prune.c:127: for each
> function it appears in.)

OK, please try this patch.  It reverts us to 6.2.1 for this file only.
It may have to be cleaned up a little.  Not sure.  This is against the
current source tree, which is probably the same as 6.3.2 beta.

---------------------------------------------------------------------------


*** /pg/backend/optimizer/path/prune.c    Thu Apr  2 10:18:46 1998
--- /root/prune.c    Thu Apr  2 11:40:32 1998
***************
*** 7,13 ****
   *
   *
   * IDENTIFICATION
!  *      $Header: /usr/local/cvsroot/pgsql/src/backend/optimizer/path/prune.c,v 1.13 1998/04/02 07:27:15 vadim Exp $
   *
   *-------------------------------------------------------------------------
   */
--- 7,13 ----
   *
   *
   * IDENTIFICATION
!  *      $Header: /usr/local/cvsroot/pgsql/src/backend/optimizer/path/prune.c,v 1.6 1997/09/08 21:45:08 momjian Exp $
   *
   *-------------------------------------------------------------------------
   */
***************
*** 29,50 ****
  /*
   * prune-joinrels--
   *      Removes any redundant relation entries from a list of rel nodes
!  *      'rel-list'.  Obviosly, the first relation can't be a duplicate.
   *
   * Returns the resulting list.
   *
   */
! void
  prune_joinrels(List *rel_list)
  {
!     List       *i;

!     /*
!      * rel_list can shorten while running as duplicate relations are
!      * deleted
!      */
!     foreach(i, rel_list)
!         lnext(i) = prune_joinrel((Rel *) lfirst(i), lnext(i));
  }

  /*
--- 29,51 ----
  /*
   * prune-joinrels--
   *      Removes any redundant relation entries from a list of rel nodes
!  *      'rel-list'.
   *
   * Returns the resulting list.
   *
   */
! List       *
  prune_joinrels(List *rel_list)
  {
!     List       *temp_list = NIL;

!     if (rel_list != NIL)
!     {
!         temp_list = lcons(lfirst(rel_list),
!                    prune_joinrels(prune_joinrel((Rel *) lfirst(rel_list),
!                                                 lnext(rel_list))));
!     }
!     return (temp_list);
  }

  /*
***************
*** 62,85 ****
  prune_joinrel(Rel *rel, List *other_rels)
  {
      List       *i = NIL;
!     List       *result = NIL;

      foreach(i, other_rels)
      {
!         Rel       *other_rel = (Rel *) lfirst(i);
!
          if (same(rel->relids, other_rel->relids))
          {
              rel->pathlist = add_pathlist(rel,
                                           rel->pathlist,
                                           other_rel->pathlist);
          }
          else
          {
!             result = nconc(result, lcons(other_rel, NIL));
          }
      }
!     return (result);
  }

  /*
--- 63,89 ----
  prune_joinrel(Rel *rel, List *other_rels)
  {
      List       *i = NIL;
!     List       *t_list = NIL;
!     List       *temp_node = NIL;
!     Rel           *other_rel = (Rel *) NULL;

      foreach(i, other_rels)
      {
!         other_rel = (Rel *) lfirst(i);
          if (same(rel->relids, other_rel->relids))
          {
              rel->pathlist = add_pathlist(rel,
                                           rel->pathlist,
                                           other_rel->pathlist);
+             t_list = nconc(t_list, NIL);        /* XXX is this right ? */
          }
          else
          {
!             temp_node = lcons(other_rel, NIL);
!             t_list = nconc(t_list, temp_node);
          }
      }
!     return (t_list);
  }

  /*
***************
*** 120,126 ****
              rel->size = compute_joinrel_size(cheapest);
          }
          else
!             elog(ERROR, "non JoinPath called");
      }
  }

--- 124,130 ----
              rel->size = compute_joinrel_size(cheapest);
          }
          else
!             elog(WARN, "non JoinPath called");
      }
  }

***************
*** 135,141 ****
   * Returns the cheapest path.
   *
   */
! Path *
  prune_rel_path(Rel *rel, Path *unorderedpath)
  {
      Path       *cheapest = set_cheapest(rel, rel->pathlist);
--- 139,145 ----
   * Returns the cheapest path.
   *
   */
! Path       *
  prune_rel_path(Rel *rel, Path *unorderedpath)
  {
      Path       *cheapest = set_cheapest(rel, rel->pathlist);
***************
*** 166,172 ****
   *
   * Returns one pruned rel node list
   */
! List *
  merge_joinrels(List *rel_list1, List *rel_list2)
  {
      List       *xrel = NIL;
--- 170,176 ----
   *
   * Returns one pruned rel node list
   */
! List       *
  merge_joinrels(List *rel_list1, List *rel_list2)
  {
      List       *xrel = NIL;
***************
*** 192,226 ****
   *
   * Returns a new list of rel nodes
   */
! List *
  prune_oldrels(List *old_rels)
  {
      Rel           *rel;
      List       *joininfo_list,
!                *xjoininfo,
!                *i,
!                *temp_list = NIL;

!     foreach(i, old_rels)
!     {
!         rel = (Rel *) lfirst(i);
!         joininfo_list = rel->joininfo;

!         if (joininfo_list == NIL)
!             temp_list = lcons(rel, temp_list);
!         else
!         {
!             foreach(xjoininfo, joininfo_list)
!             {
!                 JInfo       *joininfo = (JInfo *) lfirst(xjoininfo);

!                 if (!joininfo->inactive)
!                 {
!                     temp_list = lcons(rel, temp_list);
!                     break;
!                 }
!             }
!         }
      }
!     return temp_list;
  }
--- 196,222 ----
   *
   * Returns a new list of rel nodes
   */
! List       *
  prune_oldrels(List *old_rels)
  {
      Rel           *rel;
      List       *joininfo_list,
!                *xjoininfo;

!     if (old_rels == NIL)
!         return (NIL);

!     rel = (Rel *) lfirst(old_rels);
!     joininfo_list = rel->joininfo;
!     if (joininfo_list == NIL)
!         return (lcons(rel, prune_oldrels(lnext(old_rels))));

!     foreach(xjoininfo, joininfo_list)
!     {
!         JInfo       *joininfo = (JInfo *) lfirst(xjoininfo);
!
!         if (!joininfo->inactive)
!             return (lcons(rel, prune_oldrels(lnext(old_rels))));
      }
!     return (prune_oldrels(lnext(old_rels)));
  }


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] subselect and optimizer

From
"Vadim B. Mikheev"
Date:
Boersenspielteam wrote:
>
> No, but in the more general case of a simple join over two tables
> with fields with an index declared on them.
>
> say: Select * from Trans, Spieler where
> Spieler.spieler_nr=Trans.spieler_nr
>
> Uses indices in 6.2.1, doesn't use them in 6.3.1 (two seq scans).

Sorry, old mail from you is lost - what was execution plan in 6.2.1 ?

In current I see that

Hash Join  (cost=5905.62 size=3343409 width=8)
  ->  Seq Scan on trans  (cost=3154.70 size=71112 width=4)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on kurse  (cost=238.61 size=4958 width=4)

IS FASTEST plan ! Result is returned in ~ 56 sec.

Nested Loop  (cost=148934.30 size=3343409 width=8)
  ->  Seq Scan on trans  (cost=3154.70 size=71112 width=4)
  ->  Index Scan on kurse  (cost=2.05 size=4958 width=4)

returns result in ~ 80 sec.

Merge Join  (cost=7411.81 size=3343409 width=8)
  ->  Index Scan on kurse  (cost=337.90 size=4958 width=4)
  ->  Index Scan on trans  (cost=4563.60 size=71112 width=4)

is SLOWEST plan (~200 sec).

Please don't think that using indices is the best way in all cases...

BTW, you can use -fX _backend_ option to forbid some join methods -
I used '-o -fh' to get MJ plan and '-o -fh -fm' to test NL plan.

Vadim

Re: [HACKERS] subselect and optimizer

From
"Boersenspielteam"
Date:
Hi,

then I think this one is solved.

I'll try to reproduce it on my machine, if I get the same results, I
will be a quiet and happy Postgres user again ;-)

I don't have the message, that originated this thread, but is the
slow subselect  from Tatsuo fixed?

Tatsua, can you test queries with the abckend options suggested by
Vadim?

> In current I see that
>
> Hash Join  (cost=5905.62 size=3343409 width=8)
>   ->  Seq Scan on trans  (cost=3154.70 size=71112 width=4)
>   ->  Hash  (cost=0.00 size=0 width=0)
>         ->  Seq Scan on kurse  (cost=238.61 size=4958 width=4)
>
> IS FASTEST plan ! Result is returned in ~ 56 sec.
>
> Nested Loop  (cost=148934.30 size=3343409 width=8)
>   ->  Seq Scan on trans  (cost=3154.70 size=71112 width=4)
>   ->  Index Scan on kurse  (cost=2.05 size=4958 width=4)
>
> returns result in ~ 80 sec.
>
> Merge Join  (cost=7411.81 size=3343409 width=8)
>   ->  Index Scan on kurse  (cost=337.90 size=4958 width=4)
>   ->  Index Scan on trans  (cost=4563.60 size=71112 width=4)
>
> is SLOWEST plan (~200 sec).
>
> Please don't think that using indices is the best way in all cases...
>
> BTW, you can use -fX _backend_ option to forbid some join methods -
> I used '-o -fh' to get MJ plan and '-o -fh -fm' to test NL plan.
>
> Vadim
>

Ciao

Das Boersenspielteam.

---------------------------------------------------------------------------
                          http://www.boersenspiel.de
                           Das Boersenspiel im Internet
             *Realitaetsnah*  *Kostenlos*  *Ueber 6000 Spieler*
---------------------------------------------------------------------------

Re: [HACKERS] subselect and optimizer

From
Bruce Momjian
Date:
>
> Boersenspielteam wrote:
> >
> > No, but in the more general case of a simple join over two tables
> > with fields with an index declared on them.
> >
> > say: Select * from Trans, Spieler where
> > Spieler.spieler_nr=Trans.spieler_nr
> >
> > Uses indices in 6.2.1, doesn't use them in 6.3.1 (two seq scans).
>
> Sorry, old mail from you is lost - what was execution plan in 6.2.1 ?
>
> In current I see that
>
> Hash Join  (cost=5905.62 size=3343409 width=8)
>   ->  Seq Scan on trans  (cost=3154.70 size=71112 width=4)
>   ->  Hash  (cost=0.00 size=0 width=0)
>         ->  Seq Scan on kurse  (cost=238.61 size=4958 width=4)
>
> IS FASTEST plan ! Result is returned in ~ 56 sec.

This is very helpful, and what I suspected.

Two issues.  First, I have heard reports that the optimizer in 6.3.2 is
better than 6.2.1, where indexes are used in 6.3.2 that were not used in
6.2.1.  In your case, you are seeing the opposite, but that may be OK
too.

Second, using an index to join two large tables is not always a good
thing.  The index can be scanned quickly, but it must find the heap for
every index entry, and that can cause the system to scan all over the
heap getting pages.  Sometimes, it is better to just scan through the
heap, and make your own hash index, which is the plan that it is being
used.

> Nested Loop  (cost=148934.30 size=3343409 width=8)
>   ->  Seq Scan on trans  (cost=3154.70 size=71112 width=4)
>   ->  Index Scan on kurse  (cost=2.05 size=4958 width=4)
>
> returns result in ~ 80 sec.
>
> Merge Join  (cost=7411.81 size=3343409 width=8)
>   ->  Index Scan on kurse  (cost=337.90 size=4958 width=4)
>   ->  Index Scan on trans  (cost=4563.60 size=71112 width=4)
>
> is SLOWEST plan (~200 sec).
>
> Please don't think that using indices is the best way in all cases...
>
> BTW, you can use -fX _backend_ option to forbid some join methods -
> I used '-o -fh' to get MJ plan and '-o -fh -fm' to test NL plan.

This is also very helpful.  I had forgotten these options existed.

Hopefully we don't have a bug here.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] subselect and optimizer

From
t-ishii@sra.co.jp
Date:
>then I think this one is solved.
>
>I'll try to reproduce it on my machine, if I get the same results, I
>will be a quiet and happy Postgres user again ;-)
>
>I don't have the message, that originated this thread, but is the
>slow subselect  from Tatsuo fixed?
>
>Tatsua, can you test queries with the abckend options suggested by
>Vadim?

I have tested with 6.3.2 beta. (Sorry test data is not same as my
original posting) Here are results:

"postal" table holds ~110k records. "prefecture" table has 47 records.
query is as follows:

select * from prefecture,postal where prefecture.pid = postal.pid and
postal.town in (select town from postal where newcode = '1040061');

All of columns that appear above have btree index.

No options to backend produced a nested loop plan.

Nested Loop  (cost=98.90 size=11888 width=92)
  ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
  ->  Index Scan on postal  (cost=2.05 size=11888 width=66)
        SubPlan
          ->  Index Scan on postal  (cost=2.05 size=2 width=12)

>        26.78 real        22.35 user         0.58 sys

Next I gave -fn to the backend.

Hash Join  (cost=6246.48 size=11888 width=92)
  ->  Seq Scan on postal  (cost=5842.97 size=11888 width=66)
        SubPlan
          ->  Index Scan on postal  (cost=2.05 size=2 width=12)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)

>        24.97 real        21.30 user         0.50 sys

Finally I tried merge join.

Merge Join  (cost=8580.86 size=11888 width=92)
  ->  Seq Scan  (cost=2.55 size=0 width=0)
        ->  Sort  (cost=2.55 size=0 width=0)
              ->  Seq Scan on prefecture  (cost=2.55 size=47 width=26)
  ->  Index Scan on postal  (cost=8181.90 size=11888 width=66)
        SubPlan
          ->  Index Scan on postal  (cost=2.05 size=2 width=12)

>> In current I see that
>>
>> Hash Join  (cost=5905.62 size=3343409 width=8)
>>   ->  Seq Scan on trans  (cost=3154.70 size=71112 width=4)
>>   ->  Hash  (cost=0.00 size=0 width=0)
>>         ->  Seq Scan on kurse  (cost=238.61 size=4958 width=4)
>        25.63 real        22.13 user         0.51 sys

So in my case Merge Join was the fastest, Hash Join and Nested Loop
(PostgreSQL decides this was the best) were almost same.
I tried for several times and the tendency seemed not changed.
Anyway the differences were not so big.
--
Tatsuo Ishii
t-ishii@sra.co.jp