Thread: Strange query plan

Strange query plan

From
Oleg Bartunov
Date:
Tom,

I have a problem with slow query execution (postgresql 7.1.2):

There are 2 tables - idx, msg_prt:
bug=# \dt   List of relations Name   | Type  | Owner
---------+-------+--------idx     | table | megeramsg_prt | table | megera
(2 rows)

bug=#  \d idx         Table "idx"Attribute |  Type   | Modifier
-----------+---------+----------tid       | integer |lid       | integer |did       | integer |
Index: idxidx

bug=#  \d msg_prt       Table "msg_prt"Attribute |  Type   | Modifier
-----------+---------+----------tid       | integer |
Index: mprt_tid


Also there are 2 indexes - idxidx, mprt_tid

bug=# \d idxidx  Index "idxidx"Attribute |  Type
-----------+---------lid       | integerdid       | integertid       | integer
unique btree

bug=# \d mprt_tid Index "mprt_tid"Attribute |  Type
-----------+---------tid       | integer
unique btree


Query is:
select msg_prt.tid as mid from msg_prt
where exists (select idx.tid from idx where msg_prt.tid=idx.tid  and idx.did=1 and idx.lid in (1207,59587) )

Plan for this query looks very ineffective and query is very slow:

select msg_prt.tid as mid from msg_prtwhere exists (select idx.tid from idx where msg_prt.tid=idx.tid               and
idx.did=1and idx.lid in (1207,59587) )
 
NOTICE:  QUERY PLAN:

Seq Scan on msg_prt  (cost=0.00..119090807.13 rows=69505 width=4) SubPlan   ->  Index Scan using idxidx, idxidx on idx
(cost=0.00..1713.40rows=1 width=4)
 

total: 6.80 sec; number: 1; for one: 6.796 sec;

Statistics on tables:
idx     - 103651 rows
msg_prt - 69505  rows

There are only 16 rows in 'idx' table satisfied subselect condition.
I did vacuum analyze.

Adding another index 'create index tididx on idx (tid);' helps:
select msg_prt.tid as mid from msg_prtwhere exists (select idx.tid from idx where msg_prt.tid=idx.tid              and
idx.did=1and idx.lid in (1207,59587) )
 
NOTICE:  QUERY PLAN:

Seq Scan on msg_prt  (cost=0.00..1134474.94 rows=69505 width=4) SubPlan   ->  Index Scan using tididx on idx
(cost=0.00..16.31rows=1 width=4)
 

total: 1.71 sec; number: 1; for one: 1.711 sec;

but still plan looks ineffective.

The best plan I've got eliminating  IN predicate:
select msg_prt.tid as mid from msg_prtwhere exists (select idx.tid from idx where msg_prt.tid=idx.tid              and
idx.did=1and idx.lid = 1207 and idx.lid=59587 )
 
NOTICE:  QUERY PLAN:

Seq Scan on msg_prt  (cost=0.00..167368.47 rows=69505 width=4) SubPlan   ->  Index Scan using idxidx on idx
(cost=0.00..2.39rows=1 width=4)
 

total: 0.54 sec; number: 1; for one: 0.541 sec;

Unfortunately I can't use this way in general case.

Does it's a known problem ?

data+schema is available from
http://www.sai.msu.su/~megera/postgres/data/bug.dump.gz
It's about 500Kb !
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Strange query plan

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> should be
> select msg_prt.tid as mid from msg_prt
>  where exists (select idx.tid from idx where msg_prt.tid=idx.tid
>                and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));
> but this is not a big win.

Shouldn't be any win at all: the IN expression-list notation will get
translated to exactly that form.

> Anyway, what's about original query ?

IN/EXISTS subqueries suck.  This has been true for a long time and is
going to be true for a while longer, unless someone else fixes it before
I have a chance to look at it.  See if you can't rewrite your query as
a plain join.
        regards, tom lane


Re: Strange query plan

From
Oleg Bartunov
Date:
On Tue, 5 Jun 2001, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
> > should be
> > select msg_prt.tid as mid from msg_prt
> >  where exists (select idx.tid from idx where msg_prt.tid=idx.tid
> >                and idx.did=1 and ( idx.lid = 1207 or idx.lid=59587 ));
> > but this is not a big win.
>
> Shouldn't be any win at all: the IN expression-list notation will get
> translated to exactly that form.

Sure

>
> > Anyway, what's about original query ?
>
> IN/EXISTS subqueries suck.  This has been true for a long time and is
> going to be true for a while longer, unless someone else fixes it before
> I have a chance to look at it.  See if you can't rewrite your query as
> a plain join.
>

That's why we've moved to GiST and feel fine :-) That query we used in
our old project, now swithing to GiST.


>             regards, tom lane
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Strange query plan

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> select msg_prt.tid as mid from msg_prt
>  where exists (select idx.tid from idx where msg_prt.tid=idx.tid
>                 and idx.did=1 and idx.lid in (1207,59587) )
> NOTICE:  QUERY PLAN:

> Seq Scan on msg_prt  (cost=0.00..119090807.13 rows=69505 width=4)
>   SubPlan
>     ->  Index Scan using idxidx, idxidx on idx  (cost=0.00..1713.40 rows=1 width=4)

Actually, this example does reveal an unnecessary inefficiency: the
planner is only using the "idx.lid in (1207,59587)" clause for the
indexscan, ignoring the fact that the did and tid clauses match the
additional columns of your three-column index.  The attached patch
should improve matters.
        regards, tom lane


*** src/backend/optimizer/path/indxpath.c.orig    Sun May 20 16:28:18 2001
--- src/backend/optimizer/path/indxpath.c    Tue Jun  5 12:38:21 2001
***************
*** 397,403 ****                                         clause, false); } 
! /*  * Given an OR subclause that has previously been determined to match  * the specified index, extract a list of
specificopclauses that can be  * used as indexquals.
 
--- 397,403 ----                                         clause, false); } 
! /*----------  * Given an OR subclause that has previously been determined to match  * the specified index, extract a
listof specific opclauses that can be  * used as indexquals.
 
***************
*** 406,415 ****  * given opclause.    However, if the OR subclause is an AND, we have to  * scan it to find the
opclause(s)that match the index.  (There should  * be at least one, if match_or_subclause_to_indexkey succeeded, but
there
!  * could be more.)    Also, we apply expand_indexqual_conditions() to convert
!  * any special matching opclauses to indexable operators.  *  * The passed-in clause is not changed.  */ List *
extract_or_indexqual_conditions(RelOptInfo*rel,
 
--- 406,430 ----  * given opclause.    However, if the OR subclause is an AND, we have to  * scan it to find the
opclause(s)that match the index.  (There should  * be at least one, if match_or_subclause_to_indexkey succeeded, but
there
!  * could be more.)
!  *
!  * Also, we can look at other restriction clauses of the rel to discover
!  * additional candidate indexquals: for example, consider
!  *            ... where (a = 11 or a = 12) and b = 42;
!  * If we are dealing with an index on (a,b) then we can include the clause
!  * b = 42 in the indexqual list generated for each of the OR subclauses.
!  * Essentially, we are making an index-specific transformation from CNF to
!  * DNF.  (NOTE: when we do this, we end up with a slightly inefficient plan
!  * because create_indexscan_plan is not very bright about figuring out which
!  * restriction clauses are implied by the generated indexqual condition.
!  * Currently we'll end up rechecking both the OR clause and the transferred
!  * restriction clause as qpquals.  FIXME someday.)
!  *
!  * Also, we apply expand_indexqual_conditions() to convert any special
!  * matching opclauses to indexable operators.  *  * The passed-in clause is not changed.
+  *----------  */ List * extract_or_indexqual_conditions(RelOptInfo *rel,
***************
*** 417,470 ****                                 Expr *orsubclause) {     List       *quals = NIL; 
!     if (and_clause((Node *) orsubclause))     { 
!         /*
!          * Extract relevant sub-subclauses in indexkey order.  This is
!          * just like group_clauses_by_indexkey() except that the input and
!          * output are lists of bare clauses, not of RestrictInfo nodes.
!          */
!         int           *indexkeys = index->indexkeys;
!         Oid           *classes = index->classlist; 
!         do         {
!             int            curIndxKey = indexkeys[0];
!             Oid            curClass = classes[0];
!             List       *clausegroup = NIL;
!             List       *item; 
!             foreach(item, orsubclause->args)             {                 if (match_clause_to_indexkey(rel, index,
                                          curIndxKey, curClass,
 
!                                              lfirst(item), false))
!                     clausegroup = lappend(clausegroup, lfirst(item));             } 
!             /*
!              * If no clauses match this key, we're done; we don't want to
!              * look at keys to its right.
!              */
!             if (clausegroup == NIL)
!                 break;
! 
!             quals = nconc(quals, clausegroup);
! 
!             indexkeys++;
!             classes++;
!         } while (!DoneMatchingIndexKeys(indexkeys, index));
! 
!         if (quals == NIL)
!             elog(ERROR, "extract_or_indexqual_conditions: no matching clause");
!     }
!     else
!     {
!         /* we assume the caller passed a valid indexable qual */
!         quals = makeList1(orsubclause);
!     }      return expand_indexqual_conditions(quals); }
--- 432,503 ----                                 Expr *orsubclause) {     List       *quals = NIL;
+     int           *indexkeys = index->indexkeys;
+     Oid           *classes = index->classlist; 
!     /*
!      * Extract relevant indexclauses in indexkey order.  This is essentially
!      * just like group_clauses_by_indexkey() except that the input and
!      * output are lists of bare clauses, not of RestrictInfo nodes.
!      */
!     do     {
+         int            curIndxKey = indexkeys[0];
+         Oid            curClass = classes[0];
+         List       *clausegroup = NIL;
+         List       *item; 
!         if (and_clause((Node *) orsubclause))
!         {
!             foreach(item, orsubclause->args)
!             {
!                 Expr   *subsubclause = (Expr *) lfirst(item); 
!                 if (match_clause_to_indexkey(rel, index,
!                                              curIndxKey, curClass,
!                                              subsubclause, false))
!                     clausegroup = lappend(clausegroup, subsubclause);
!             }
!         }
!         else if (match_clause_to_indexkey(rel, index,
!                                           curIndxKey, curClass,
!                                           orsubclause, false))         {
!             clausegroup = makeList1(orsubclause);
!         } 
!         /*
!          * If we found no clauses for this indexkey in the OR subclause
!          * itself, try looking in the rel's top-level restriction list.
!          */
!         if (clausegroup == NIL)
!         {
!             foreach(item, rel->baserestrictinfo)             {
+                 RestrictInfo *rinfo = (RestrictInfo *) lfirst(item);
+                  if (match_clause_to_indexkey(rel, index,                                              curIndxKey,
curClass,
!                                              rinfo->clause, false))
!                     clausegroup = lappend(clausegroup, rinfo->clause);             }
+         } 
!         /*
!          * If still no clauses match this key, we're done; we don't want to
!          * look at keys to its right.
!          */
!         if (clausegroup == NIL)
!             break;
! 
!         quals = nconc(quals, clausegroup);
! 
!         indexkeys++;
!         classes++;
!     } while (!DoneMatchingIndexKeys(indexkeys, index));
! 
!     if (quals == NIL)
!         elog(ERROR, "extract_or_indexqual_conditions: no matching clause");      return
expand_indexqual_conditions(quals);}
 


Re: Strange query plan

From
Oleg Bartunov
Date:
On Tue, 5 Jun 2001, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
> > select msg_prt.tid as mid from msg_prt
> >  where exists (select idx.tid from idx where msg_prt.tid=idx.tid
> >                 and idx.did=1 and idx.lid in (1207,59587) )
> > NOTICE:  QUERY PLAN:
>
> > Seq Scan on msg_prt  (cost=0.00..119090807.13 rows=69505 width=4)
> >   SubPlan
> >     ->  Index Scan using idxidx, idxidx on idx  (cost=0.00..1713.40 rows=1 width=4)
>
> Actually, this example does reveal an unnecessary inefficiency: the
> planner is only using the "idx.lid in (1207,59587)" clause for the
> indexscan, ignoring the fact that the did and tid clauses match the
> additional columns of your three-column index.  The attached patch
> should improve matters.
>
>             regards, tom lane

Cool. Looks better
select msg_prt.tid as mid from msg_prt where exists (select idx.tid from idx where msg_prt.tid=idx.tid and idx.did=1
andidx.lid  in ( 1207, 59587) )
 
NOTICE:  QUERY PLAN:

Seq Scan on msg_prt  (cost=0.00..333700.88 rows=69505 width=4) SubPlan   ->  Index Scan using idxidx, idxidx on idx
(cost=0.00..4.79rows=1 width=4)
 

total: 3.15 sec; number: 1; for one: 3.153 sec;

interesting that droping index 'idxidx' and creating simple
create index tididx on idx (tid);
behaves better, while plan looks worse. Notice, index on tididx
estimates cost better (16).

select msg_prt.tid as mid from msg_prt where exists (select idx.tid from idx where msg_prt.tid=idx.tid and idx.did=1
andidx.lid  in ( 1207, 59587) )
 
NOTICE:  QUERY PLAN:

Seq Scan on msg_prt  (cost=0.00..1134474.94 rows=69505 width=4) SubPlan   ->  Index Scan using tididx on idx
(cost=0.00..16.31rows=1 width=4)
 

total: 1.70 sec; number: 1; for one: 1.703 sec;

Interesting that earlier if I have 2 indexes - idxidx and tididx
optimizer choose tididx, while now (after patching) optimizer
always choose idxidx.

Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83