Thread: A question on the query planner
I am currently working on optimizing some fairly time-consuming queries on a decently large dataset.

The following is the query in question:

SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
  FROM quotes AS q, zips AS z, cars AS c
 WHERE z.zip = q.zip AND
       c.car_id = q.car_id AND
       z.state != 'AA' AND
       z.state != 'AE' AND
       z.state != 'AP' AND
       z.state = 'WA'
 ORDER BY date_time;

The tables are as follows:

                            Table "public.cars"
    Column     |         Type          |               Modifiers
---------------+-----------------------+----------------------------------------
 car_id        | character varying(10) | not null default ''::character varying
 nags_glass_id | character varying(7)  | not null default ''::character varying
 make          | character varying(30) | not null default ''::character varying
 model         | character varying(30) | not null default ''::character varying
 year          | character varying(4)  | not null default ''::character varying
 style         | character varying(30) | not null default ''::character varying
 price         | double precision      | not null default (0)::double precision
Indexes:
    "cars_pkey" primary key, btree (car_id)
    "cars_car_id_btree_index" btree (car_id)
    "make_cars_index" btree (make)
    "model_cars_index" btree (model)
    "year_cars_index" btree ("year")

                            Table "public.quotes"
      Column       |            Type             |                              Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
 quote_id          | bigint                      | not null default nextval('quotes_quote_id_seq'::text)
 visitor_id        | bigint                      | not null default (0)::bigint
 date_time         | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
 car_id            | character varying(10)       | not null default ''::character varying
 email             | text                        | not null default ''::text
 zip               | character varying(5)        | not null default ''::character varying
 current_referrer  | text                        | not null default ''::text
 original_referrer | text                        | not null default ''::text
Indexes:
    "quotes_pkey" primary key, btree (quote_id)
    "car_id_quotes_index" btree (car_id)
    "visitor_id_quotes_index" btree (visitor_id)
    "zip_quotes_index" btree (zip)

                            Table "public.zips"
 Column |         Type          |                     Modifiers
--------+-----------------------+---------------------------------------------------
 zip_id | bigint                | not null default nextval('zips_zip_id_seq'::text)
 zip    | character varying(5)  | not null default ''::character varying
 city   | character varying(28) | not null default ''::character varying
 state  | character varying(2)  | not null default ''::character varying
 lat    | character varying(10) | not null default ''::character varying
 lon    | character varying(10) | not null default ''::character varying
Indexes:
    "zips_pkey" primary key, btree (zip_id)
    "zip_zips_index" btree (zip)
    "zips_state_btree_index" btree (state)

With the default setting of 10 for default_statistics_target, the query runs as follows (from EXPLAIN ANALYZE):

 Sort  (cost=58064.16..58074.20 rows=4015 width=80) (actual time=2415.060..2421.421 rows=4539 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=57728.02..57823.84 rows=4015 width=80) (actual time=2254.056..2345.013 rows=4539 loops=1)
         Merge Cond: ("outer"."?column7?" = "inner"."?column5?")
         ->  Sort  (cost=56880.61..56890.65 rows=4015 width=62) (actual time=2054.353..2062.189 rows=4693 loops=1)
               Sort Key: (q.car_id)::text
               ->  Hash Join  (cost=1403.91..56640.29 rows=4015 width=62) (actual time=8.479..1757.126 rows=10151 loops=1)
                     Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
                     ->  Seq Scan on quotes q  (cost=0.00..10657.42 rows=336142 width=27) (actual time=0.062..657.015 rows=336166 loops=1)
                     ->  Hash  (cost=1402.63..1402.63 rows=511 width=52) (actual time=8.273..8.273 rows=0 loops=1)
                           ->  Index Scan using zips_state_btree_index on zips z  (cost=0.00..1402.63 rows=511 width=52) (actual time=0.215..6.877 rows=718 loops=1)
                                 Index Cond: ((state)::text = 'WA'::text)
                                 Filter: (((state)::text <> 'AA'::text) AND ((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
         ->  Sort  (cost=847.41..870.91 rows=9401 width=37) (actual time=199.172..216.354 rows=11922 loops=1)
               Sort Key: (c.car_id)::text
               ->  Seq Scan on cars c  (cost=0.00..227.01 rows=9401 width=37) (actual time=0.104..43.523 rows=9401 loops=1)
 Total runtime: 2427.937 ms

If I set enable_seqscan = off I get the following:

 Sort  (cost=122108.52..122118.62 rows=4039 width=80) (actual time=701.002..707.442 rows=4541 loops=1)
   Sort Key: q.date_time
   ->  Nested Loop  (cost=0.00..121866.59 rows=4039 width=80) (actual time=0.648..624.134 rows=4541 loops=1)
         ->  Nested Loop  (cost=0.00..102256.36 rows=4039 width=62) (actual time=0.374..381.440 rows=10153 loops=1)
               ->  Index Scan using zips_state_btree_index on zips z  (cost=0.00..1413.31 rows=514 width=52) (actual time=0.042..9.043 rows=718 loops=1)
                     Index Cond: ((state)::text = 'WA'::text)
                     Filter: (((state)::text <> 'AA'::text) AND ((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
               ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.59 rows=48 width=27) (actual time=0.039..0.426 rows=14 loops=718)
                     Index Cond: (("outer".zip)::text = (q.zip)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..4.84 rows=1 width=37) (actual time=0.015..0.017 rows=0 loops=10153)
               Index Cond: ((c.car_id)::text = ("outer".car_id)::text)
 Total runtime: 711.375 ms

I can also get a similar plan if I disable both hash joins and merge joins. Furthermore, I can get some additional speedup without turning off sequential scans: if I set default_statistics_target = 1000 the runtime is around 1200 ms; on the other hand, with default_statistics_target = 100 the runtime is around 12000 ms.

So, my question is: is there any way to get the query planner to recognize the potential performance increase available by using the indexes that are set up, without specifically turning off sequential scans before I run this query every time?

Thanks for the help.

Jared
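[Editorial note: since the post above experiments with default_statistics_target, it may be worth noting that PostgreSQL also allows the statistics target to be raised per column, so only the misestimated columns pay the extra ANALYZE cost. This is a sketch only; the choice of columns below is an assumption based on the query's join and filter keys, not something tested in this thread.]

```sql
-- Sketch: raise the statistics target only for the join/filter columns
-- instead of changing default_statistics_target globally.
ALTER TABLE zips   ALTER COLUMN state  SET STATISTICS 1000;
ALTER TABLE quotes ALTER COLUMN zip    SET STATISTICS 1000;
ALTER TABLE quotes ALTER COLUMN car_id SET STATISTICS 1000;

-- The new targets only take effect at the next ANALYZE.
ANALYZE zips;
ANALYZE quotes;
```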
Jared Carr <jared@89glass.com> writes:
> I am currently working on optimizing some fairly time consuming queries
> on a decently large dataset.

It doesn't look that large from here ;-).  I'd suggest experimenting with reducing random_page_cost, since at least for your test query it sure looks like everything is in RAM.  In theory random_page_cost = 1.0 is the correct setting for all-in-RAM cases.

			regards, tom lane
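[Editorial note: Tom's suggestion can be tried per session before changing postgresql.conf. A minimal sketch, using the query from the original post:]

```sql
-- Sketch: lower random_page_cost for this session only and re-check the plan.
-- 1.0 treats a random page fetch as no more expensive than a sequential one,
-- which is only appropriate when the working set is fully cached in RAM.
SET random_page_cost = 1.0;

EXPLAIN ANALYZE
SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
  FROM quotes AS q, zips AS z, cars AS c
 WHERE z.zip = q.zip AND
       c.car_id = q.car_id AND
       z.state != 'AA' AND
       z.state != 'AE' AND
       z.state != 'AP' AND
       z.state = 'WA'
 ORDER BY date_time;

-- RESET random_page_cost;  -- return to the configured value
```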
On Mon, 2003-12-01 at 16:44, Jared Carr wrote:
> I am currently working on optimizing some fairly time consuming queries
> on a decently large dataset.
>
> The Following is the query in question.
>
> SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
>   FROM quotes AS q, zips AS z, cars AS c
>  WHERE z.zip = q.zip AND
>        c.car_id = q.car_id AND
>        z.state != 'AA' AND
>        z.state != 'AE' AND
>        z.state != 'AP' AND
>        z.state = 'WA'
>  ORDER BY date_time;

This won't completely solve your problem, but z.state = 'WA' would seem to be mutually exclusive of the != AA|AE|AP conditions. While it's not much, it is extra overhead there doesn't seem to be any need for...

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
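[Editorial note: concretely, Robert's observation means the three inequality tests can simply be dropped, since state = 'WA' already rules those values out. A sketch of the reduced query:]

```sql
-- Sketch: the != tests on 'AA', 'AE', 'AP' are redundant once state = 'WA'
-- is required, so the WHERE clause reduces to:
SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
  FROM quotes AS q, zips AS z, cars AS c
 WHERE z.zip = q.zip
   AND c.car_id = q.car_id
   AND z.state = 'WA'
 ORDER BY date_time;
```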
Robert Treat wrote:
> This wont completely solve your problem, but z.state = 'WA' would seem
> to be mutually exclusive of the != AA|AE|AP. While it's not much, it is
> extra overhead there doesn't seem to be any need for...

That is an excellent point; unfortunately it doesn't change the query plan at all.

Furthermore, I noticed that in the following query plan it is doing the sequential scan on quotes first, and then doing the sequential scan on zips. IMHO this should be the other way around, since the result set for zips is considerably smaller, especially given that we are using a WHERE clause to limit the number of items returned from zips. It would seem faster to scan zips and then join onto quotes, but perhaps it needs to do the sequential scan on both regardless. Of course there is still the holy grail of getting it to actually use the indexes. :P

 Sort  (cost=57812.71..57822.86 rows=4058 width=80) (actual time=2522.826..2529.237 rows=4581 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=57473.20..57569.50 rows=4058 width=80) (actual time=2360.656..2451.987 rows=4581 loops=1)
         Merge Cond: ("outer"."?column7?" = "inner"."?column5?")
         ->  Sort  (cost=56625.79..56635.93 rows=4058 width=62) (actual time=2077.209..2085.095 rows=4735 loops=1)
               Sort Key: (q.car_id)::text
               ->  Hash Join  (cost=1088.19..56382.58 rows=4058 width=62) (actual time=86.111..1834.682 rows=10193 loops=1)
                     Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
                     ->  Seq Scan on quotes q  (cost=0.00..10664.25 rows=336525 width=27) (actual time=0.098..658.905 rows=336963 loops=1)
                     ->  Hash  (cost=1086.90..1086.90 rows=516 width=52) (actual time=85.798..85.798 rows=0 loops=1)
                           ->  Seq Scan on zips z  (cost=0.00..1086.90 rows=516 width=52) (actual time=79.532..84.151 rows=718 loops=1)
                                 Filter: ((state)::text = 'WA'::text)
         ->  Sort  (cost=847.41..870.91 rows=9401 width=37) (actual time=282.896..300.082 rows=11950 loops=1)
               Sort Key: (c.car_id)::text
               ->  Seq Scan on cars c  (cost=0.00..227.01 rows=9401 width=37) (actual time=0.102..43.516 rows=9401 loops=1)
Jared Carr <jared@89glass.com> writes:
> Furthermore I noticed that in the following query plan it is doing the
> sequential scan on quotes first, and then doing the sequential scan on zips.
> IMHO this should be the other way around, since the result set for zips is
> considerably smaller, especially given that we are using a where clause to
> limit the number of items returned from zips, so it would seem that it would
> be faster to scan zips then join onto quotes, but perhaps it needs to do the
> sequential scan on both regardless.

> ->  Hash Join  (cost=1088.19..56382.58 rows=4058 width=62) (actual time=86.111..1834.682 rows=10193 loops=1)
>       Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
>       ->  Seq Scan on quotes q  (cost=0.00..10664.25 rows=336525 width=27) (actual time=0.098..658.905 rows=336963 loops=1)
>       ->  Hash  (cost=1086.90..1086.90 rows=516 width=52) (actual time=85.798..85.798 rows=0 loops=1)
>             ->  Seq Scan on zips z  (cost=0.00..1086.90 rows=516 width=52) (actual time=79.532..84.151 rows=718 loops=1)
>                   Filter: ((state)::text = 'WA'::text)

You're misreading it. A hash join works by reading one table into a hash table, then reading the other table and looking up entries in the hash table. Here the zips rows are the ones being read into the hash table, which is appropriate since zips is the smaller input.

> Of course still there is the holy grail of getting it to actually use
> the indexes. :P

> Merge Cond: ("outer"."?column7?" = "inner"."?column5?")

Well, it looks like you have something strange going on. What data type is car_id in each table?

-- 
greg
Greg Stark wrote:
>> Merge Cond: ("outer"."?column7?" = "inner"."?column5?")
>
> Well it looks like you have something strange going on. What data type is
> car_id in each table?

car_id is a varchar(10) in both tables.
Jared Carr <jared@89glass.com> writes:
> Greg Stark wrote:
>> Well it looks like you have something strange going on. What data type is
>> car_id in each table?
> car_id is a varchar(10) in both tables.

Well, for some reason it's being cast to text to do the merge.

What version of postgres is this, btw? The analyzes look like 7.4?

-- 
greg
Greg Stark wrote:
> Well for some reason it's being cast to text to do the merge.
>
> What version of postgres is this, btw? The analyzes look like 7.4?

Yes, this is 7.4.
Jared Carr <jared@89glass.com> writes:
> Greg Stark wrote:
>> Well it looks like you have something strange going on. What data type is
>> car_id in each table?
> car_id is a varchar(10) in both tables.

Huh. The following shows something strange: it seems joining on two varchars no longer works well. Instead the optimizer has to convert both columns to text. I know some inter-type comparisons were removed a while ago, but I would not have thought that would affect varchar-varchar comparisons. I think this is pretty bad.

test=# create table a (x varchar primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a_pkey" for table "a"
CREATE TABLE
test=# create table b (x varchar primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "b_pkey" for table "b"
CREATE TABLE
test=# select * from a,b where a.x=b.x;
 x | x 
---+---
(0 rows)

test=# explain select * from a,b where a.x=b.x;
                            QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=139.66..159.67 rows=1001 width=64)
   Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (a.x)::text
         ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=32)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (b.x)::text
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=32)
(8 rows)

test=# create table a2 (x text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a2_pkey" for table "a2"
CREATE TABLE
test=# create table b2 (x text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "b2_pkey" for table "b2"
CREATE TABLE
test=# explain select * from a2,b2 where a2.x=b2.x;
                            QUERY PLAN
-------------------------------------------------------------------
 Hash Join  (cost=22.50..57.51 rows=1001 width=64)
   Hash Cond: ("outer".x = "inner".x)
   ->  Seq Scan on a2  (cost=0.00..20.00 rows=1000 width=32)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=32)
         ->  Seq Scan on b2  (cost=0.00..20.00 rows=1000 width=32)
(5 rows)

-- 
greg
Greg Stark <gsstark@MIT.EDU> writes:
> Huh. The following shows something strange.

Worse, with enable_hashjoin off it's even more obvious something's broken:

test=# set enable_hashjoin = off;
SET
test=# explain select * from a,b where a.x=b.x;
                            QUERY PLAN
------------------------------------------------------------------
 Merge Join  (cost=139.66..159.67 rows=1001 width=64)
   Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (a.x)::text
         ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=32)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=32)
         Sort Key: (b.x)::text
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=32)
(8 rows)

test=# explain select * from a2,b2 where a2.x=b2.x;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Merge Join  (cost=0.00..63.04 rows=1001 width=64)
   Merge Cond: ("outer".x = "inner".x)
   ->  Index Scan using a2_pkey on a2  (cost=0.00..24.00 rows=1000 width=32)
   ->  Index Scan using b2_pkey on b2  (cost=0.00..24.00 rows=1000 width=32)
(4 rows)

-- 
greg
Greg Stark <gsstark@mit.edu> writes:
> Huh. The following shows something strange. It seems joining on two varchars
> no longer works well. Instead the optimizer has to convert both columns to
> text.

Define "no longer works well".  varchar doesn't have its own comparison operators anymore, but AFAIK that makes no difference.

			regards, tom lane
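[Editorial note: Tom's remark can be checked directly against the system catalogs. A sketch, assuming the 7.x catalog layout and the regtype cast available since 7.3:]

```sql
-- Sketch: list the '=' operators whose operand types involve varchar or
-- text. In 7.4 there is no varchar = varchar entry, so a varchar join key
-- is relabeled to text and compared with the text operators.
SELECT oprname,
       oprleft::regtype  AS left_type,
       oprright::regtype AS right_type
  FROM pg_operator
 WHERE oprname = '='
   AND (oprleft::regtype::text  IN ('text', 'character varying')
    OR  oprright::regtype::text IN ('text', 'character varying'));
```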
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Define "no longer works well".  varchar doesn't have its own comparison
> operators anymore, but AFAIK that makes no difference.

Well, it seems to completely bar the use of a straight merge join between two index scans:

test=# set enable_seqscan = off;
SET
test=# explain select * from a,b where a.x=b.x;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=100000000.00..100002188.86 rows=1001 width=64)
   ->  Seq Scan on a  (cost=100000000.00..100000020.00 rows=1000 width=32)
   ->  Index Scan using b_pkey on b  (cost=0.00..2.16 rows=1 width=32)
         Index Cond: (("outer".x)::text = (b.x)::text)
(4 rows)

test=# explain select * from a2,b2 where a2.x=b2.x;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Merge Join  (cost=0.00..63.04 rows=1001 width=64)
   Merge Cond: ("outer".x = "inner".x)
   ->  Index Scan using a2_pkey on a2  (cost=0.00..24.00 rows=1000 width=32)
   ->  Index Scan using b2_pkey on b2  (cost=0.00..24.00 rows=1000 width=32)
(4 rows)

-- 
greg
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Define "no longer works well".
> Well it seems to completely bar the use of a straight merge join between two
> index scans:

Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah, there's something busted here.  Will get back to you ...

			regards, tom lane
> Hmmm ... [squints] ... it's not supposed to do that ...

The attached patch seems to make it better.

			regards, tom lane


Index: src/backend/optimizer/path/costsize.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/path/costsize.c,v
retrieving revision 1.115
diff -c -r1.115 costsize.c
*** src/backend/optimizer/path/costsize.c	5 Oct 2003 22:44:25 -0000	1.115
--- src/backend/optimizer/path/costsize.c	3 Dec 2003 17:40:58 -0000
***************
*** 1322,1327 ****
--- 1322,1331 ----
  	float4	   *numbers;
  	int			nnumbers;
  
+ 	/* Ignore any binary-compatible relabeling */
+ 	if (var && IsA(var, RelabelType))
+ 		var = (Var *) ((RelabelType *) var)->arg;
+ 
  	/*
  	 * Lookup info about var's relation and attribute; if none available,
  	 * return default estimate.
Index: src/backend/optimizer/path/pathkeys.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/path/pathkeys.c,v
retrieving revision 1.53
diff -c -r1.53 pathkeys.c
*** src/backend/optimizer/path/pathkeys.c	4 Aug 2003 02:40:00 -0000	1.53
--- src/backend/optimizer/path/pathkeys.c	3 Dec 2003 17:40:58 -0000
***************
*** 25,36 ****
  #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
  #include "parser/parse_func.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  
! static PathKeyItem *makePathKeyItem(Node *key, Oid sortop);
  static List *make_canonical_pathkey(Query *root, PathKeyItem *item);
  static Var *find_indexkey_var(Query *root, RelOptInfo *rel,
  				  AttrNumber varattno);
--- 25,37 ----
  #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
+ #include "parser/parse_expr.h"
  #include "parser/parse_func.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  
! static PathKeyItem *makePathKeyItem(Node *key, Oid sortop, bool checkType);
  static List *make_canonical_pathkey(Query *root, PathKeyItem *item);
  static Var *find_indexkey_var(Query *root, RelOptInfo *rel,
  				  AttrNumber varattno);
***************
*** 41,50 ****
   *	  create a PathKeyItem node
   */
  static PathKeyItem *
! makePathKeyItem(Node *key, Oid sortop)
  {
  	PathKeyItem *item = makeNode(PathKeyItem);
  
  	item->key = key;
  	item->sortop = sortop;
  	return item;
--- 42,70 ----
   *	  create a PathKeyItem node
   */
  static PathKeyItem *
! makePathKeyItem(Node *key, Oid sortop, bool checkType)
  {
  	PathKeyItem *item = makeNode(PathKeyItem);
  
+ 	/*
+ 	 * Some callers pass expressions that are not necessarily of the same
+ 	 * type as the sort operator expects as input (for example when dealing
+ 	 * with an index that uses binary-compatible operators).  We must relabel
+ 	 * these with the correct type so that the key expressions will be seen
+ 	 * as equal() to expressions that have been correctly labeled.
+ 	 */
+ 	if (checkType)
+ 	{
+ 		Oid			lefttype,
+ 					righttype;
+ 
+ 		op_input_types(sortop, &lefttype, &righttype);
+ 		if (exprType(key) != lefttype)
+ 			key = (Node *) makeRelabelType((Expr *) key,
+ 										   lefttype, -1,
+ 										   COERCE_DONTCARE);
+ 	}
+ 
  	item->key = key;
  	item->sortop = sortop;
  	return item;
***************
*** 70,78 ****
  {
  	Expr	   *clause = restrictinfo->clause;
  	PathKeyItem *item1 = makePathKeyItem(get_leftop(clause),
! 										 restrictinfo->left_sortop);
  	PathKeyItem *item2 = makePathKeyItem(get_rightop(clause),
! 										 restrictinfo->right_sortop);
  	List	   *newset,
  			   *cursetlink;
--- 90,100 ----
  {
  	Expr	   *clause = restrictinfo->clause;
  	PathKeyItem *item1 = makePathKeyItem(get_leftop(clause),
! 										 restrictinfo->left_sortop,
! 										 false);
  	PathKeyItem *item2 = makePathKeyItem(get_rightop(clause),
! 										 restrictinfo->right_sortop,
! 										 false);
  	List	   *newset,
  			   *cursetlink;
***************
*** 668,674 ****
  		}
  
  		/* OK, make a sublist for this sort key */
! 		item = makePathKeyItem(indexkey, sortop);
  		cpathkey = make_canonical_pathkey(root, item);
  
  		/*
--- 690,696 ----
  		}
  
  		/* OK, make a sublist for this sort key */
! 		item = makePathKeyItem(indexkey, sortop, true);
  		cpathkey = make_canonical_pathkey(root, item);
  
  		/*
***************
*** 785,791 ****
  								 tle->resdom->restypmod,
  								 0);
  		outer_item = makePathKeyItem((Node *) outer_var,
! 									 sub_item->sortop);
  		/* score = # of mergejoin peers */
  		score = count_canonical_peers(root, outer_item);
  		/* +1 if it matches the proper query_pathkeys item */
--- 807,814 ----
  								 tle->resdom->restypmod,
  								 0);
  		outer_item = makePathKeyItem((Node *) outer_var,
! 									 sub_item->sortop,
! 									 true);
  		/* score = # of mergejoin peers */
  		score = count_canonical_peers(root, outer_item);
  		/* +1 if it matches the proper query_pathkeys item */
***************
*** 893,899 ****
  		PathKeyItem *pathkey;
  
  		sortkey = get_sortgroupclause_expr(sortcl, tlist);
! 		pathkey = makePathKeyItem(sortkey, sortcl->sortop);
  
  		/*
  		 * The pathkey becomes a one-element sublist, for now;
--- 916,922 ----
  		PathKeyItem *pathkey;
  
  		sortkey = get_sortgroupclause_expr(sortcl, tlist);
! 		pathkey = makePathKeyItem(sortkey, sortcl->sortop, true);
  
  		/*
  		 * The pathkey becomes a one-element sublist, for now;
***************
*** 937,943 ****
  	{
  		oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(restrictinfo));
  		key = get_leftop(restrictinfo->clause);
! 		item = makePathKeyItem(key, restrictinfo->left_sortop);
  		restrictinfo->left_pathkey = make_canonical_pathkey(root, item);
  		MemoryContextSwitchTo(oldcontext);
  	}
--- 960,966 ----
  	{
  		oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(restrictinfo));
  		key = get_leftop(restrictinfo->clause);
! 		item = makePathKeyItem(key, restrictinfo->left_sortop, false);
  		restrictinfo->left_pathkey = make_canonical_pathkey(root, item);
  		MemoryContextSwitchTo(oldcontext);
  	}
***************
*** 945,951 ****
  	{
  		oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(restrictinfo));
  		key = get_rightop(restrictinfo->clause);
! 		item = makePathKeyItem(key, restrictinfo->right_sortop);
  		restrictinfo->right_pathkey = make_canonical_pathkey(root, item);
  		MemoryContextSwitchTo(oldcontext);
  	}
--- 968,974 ----
  	{
  		oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(restrictinfo));
  		key = get_rightop(restrictinfo->clause);
! 		item = makePathKeyItem(key, restrictinfo->right_sortop, false);
  		restrictinfo->right_pathkey = make_canonical_pathkey(root, item);
  		MemoryContextSwitchTo(oldcontext);
  	}
Index: src/backend/utils/cache/lsyscache.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/utils/cache/lsyscache.c,v
retrieving revision 1.108
diff -c -r1.108 lsyscache.c
*** src/backend/utils/cache/lsyscache.c	4 Oct 2003 18:22:59 -0000	1.108
--- src/backend/utils/cache/lsyscache.c	3 Dec 2003 17:40:58 -0000
***************
*** 465,470 ****
--- 465,493 ----
  }
  
  /*
+  * op_input_types
+  *
+  *		Returns the left and right input datatypes for an operator
+  *		(InvalidOid if not relevant).
+  */
+ void
+ op_input_types(Oid opno, Oid *lefttype, Oid *righttype)
+ {
+ 	HeapTuple	tp;
+ 	Form_pg_operator optup;
+ 
+ 	tp = SearchSysCache(OPEROID,
+ 						ObjectIdGetDatum(opno),
+ 						0, 0, 0);
+ 	if (!HeapTupleIsValid(tp))	/* shouldn't happen */
+ 		elog(ERROR, "cache lookup failed for operator %u", opno);
+ 	optup = (Form_pg_operator) GETSTRUCT(tp);
+ 	*lefttype = optup->oprleft;
+ 	*righttype = optup->oprright;
+ 	ReleaseSysCache(tp);
+ }
+ 
+ /*
   * op_mergejoinable
   *
   * Returns the left and right sort operators corresponding to a
Index: src/include/utils/lsyscache.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/utils/lsyscache.h,v
retrieving revision 1.82
diff -c -r1.82 lsyscache.h
*** src/include/utils/lsyscache.h	4 Oct 2003 18:22:59 -0000	1.82
--- src/include/utils/lsyscache.h	3 Dec 2003 17:41:00 -0000
***************
*** 40,45 ****
--- 40,46 ----
  extern bool opclass_is_hash(Oid opclass);
  extern RegProcedure get_opcode(Oid opno);
  extern char *get_opname(Oid opno);
+ extern void op_input_types(Oid opno, Oid *lefttype, Oid *righttype);
  extern bool op_mergejoinable(Oid opno, Oid *leftOp, Oid *rightOp);
  extern void op_mergejoin_crossops(Oid opno, Oid *ltop, Oid *gtop,
  					  RegProcedure *ltproc, RegProcedure *gtproc);
Tom Lane wrote:
> The attached patch seems to make it better.

The patch definitely makes things more consistent... unfortunately, it is more consistent toward the slower execution times. Of course, I am looking at this simply from a straight performance standpoint and not from the viewpoint of what *should* be happening. At any rate, here are the query plans with the various settings.

Default settings:

 Sort  (cost=15290.20..15300.34 rows=4058 width=80) (actual time=2944.650..2951.292 rows=4672 loops=1)
   Sort Key: q.date_time
   ->  Hash Join  (cost=13529.79..15046.99 rows=4058 width=80) (actual time=2678.033..2873.475 rows=4672 loops=1)
         Hash Cond: (("outer".car_id)::text = ("inner".car_id)::text)
         ->  Seq Scan on cars c  (cost=0.00..227.01 rows=9401 width=37) (actual time=19.887..50.971 rows=9401 loops=1)
         ->  Hash  (cost=13475.65..13475.65 rows=4058 width=62) (actual time=2643.377..2643.377 rows=0 loops=1)
               ->  Hash Join  (cost=1088.19..13475.65 rows=4058 width=62) (actual time=86.739..2497.558 rows=10284 loops=1)
                     Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
                     ->  Seq Scan on quotes q  (cost=0.00..10664.25 rows=336525 width=27) (actual time=0.223..1308.561 rows=340694 loops=1)
                     ->  Hash  (cost=1086.90..1086.90 rows=516 width=52) (actual time=84.329..84.329 rows=0 loops=1)
                           ->  Seq Scan on zips z  (cost=0.00..1086.90 rows=516 width=52) (actual time=78.363..82.901 rows=718 loops=1)
                                 Filter: ((state)::text = 'WA'::text)
 Total runtime: 2955.366 ms

SET enable_seqscan = false;

 Sort  (cost=103557.82..103567.97 rows=4058 width=80) (actual time=1015.122..1021.750 rows=4673 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=102734.94..103314.61 rows=4058 width=80) (actual time=802.908..941.520 rows=4673 loops=1)
         Merge Cond: ("outer"."?column7?" = ("inner".car_id)::text)
         ->  Sort  (cost=102734.94..102745.08 rows=4058 width=62) (actual time=802.112..812.755 rows=4827 loops=1)
               Sort Key: (q.car_id)::text
               ->  Nested Loop  (cost=0.00..102491.73 rows=4058 width=62) (actual time=148.535..555.653 rows=10285 loops=1)
                     ->  Index Scan using zip_zips_index on zips z  (cost=0.00..1272.69 rows=516 width=52) (actual time=148.243..155.577 rows=718 loops=1)
                           Filter: ((state)::text = 'WA'::text)
                     ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.55 rows=48 width=27) (actual time=0.042..0.454 rows=14 loops=718)
                           Index Cond: (("outer".zip)::text = (q.zip)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..506.87 rows=9401 width=37) (actual time=0.220..46.910 rows=12019 loops=1)
 Total runtime: 1027.339 ms

There is still a 3x decrease in execution time here, but it is overall slower than before the patch was applied.
SET enable_mergejoin = false; and SET enable_seqscan = false;

 Sort  (cost=104586.15..104596.29 rows=4058 width=80) (actual time=887.719..894.358 rows=4673 loops=1)
   Sort Key: q.date_time
   ->  Hash Join  (cost=102545.88..104342.94 rows=4058 width=80) (actual time=593.710..815.541 rows=4673 loops=1)
         Hash Cond: (("outer".car_id)::text = ("inner".car_id)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..506.87 rows=9401 width=37) (actual time=0.182..37.306 rows=9401 loops=1)
         ->  Hash  (cost=102491.73..102491.73 rows=4058 width=62) (actual time=593.040..593.040 rows=0 loops=1)
               ->  Nested Loop  (cost=0.00..102491.73 rows=4058 width=62) (actual time=146.647..551.975 rows=10285 loops=1)
                     ->  Index Scan using zip_zips_index on zips z  (cost=0.00..1272.69 rows=516 width=52) (actual time=146.378..153.767 rows=718 loops=1)
                           Filter: ((state)::text = 'WA'::text)
                     ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.55 rows=48 width=27) (actual time=0.044..0.464 rows=14 loops=718)
                           Index Cond: (("outer".zip)::text = (q.zip)::text)
 Total runtime: 898.438 ms

Again a decrease in execution time.
On the other hand, SET enable_hashjoin = false;

 Sort  (cost=62829.86..62840.00 rows=4058 width=80) (actual time=11368.025..11374.629 rows=4673 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=62006.97..62586.65 rows=4058 width=80) (actual time=11188.371..11295.156 rows=4673 loops=1)
         Merge Cond: (("outer".car_id)::text = "inner"."?column7?")
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..506.87 rows=9401 width=37) (actual time=0.167..37.728 rows=9401 loops=1)
         ->  Sort  (cost=62006.97..62017.12 rows=4058 width=62) (actual time=11187.581..11196.343 rows=4827 loops=1)
               Sort Key: (q.car_id)::text
               ->  Merge Join  (cost=60037.99..61763.76 rows=4058 width=62) (actual time=10893.572..10975.658 rows=10285 loops=1)
                     Merge Cond: ("outer"."?column6?" = "inner"."?column4?")
                     ->  Sort  (cost=1110.15..1111.44 rows=516 width=52) (actual time=86.679..87.166 rows=718 loops=1)
                           Sort Key: (z.zip)::text
                           ->  Seq Scan on zips z  (cost=0.00..1086.90 rows=516 width=52) (actual time=79.023..83.921 rows=718 loops=1)
                                 Filter: ((state)::text = 'WA'::text)
                     ->  Sort  (cost=58927.84..59769.15 rows=336525 width=27) (actual time=9848.479..10319.275 rows=340426 loops=1)
                           Sort Key: (q.zip)::text
                           ->  Seq Scan on quotes q  (cost=0.00..10664.25 rows=336525 width=27) (actual time=0.227..2171.917 rows=340740 loops=1)
 Total runtime: 11408.120 ms

Which really is not that surprising.
And finally, SET enable_hashjoin = false; and SET enable_seqscan = false;

 Sort  (cost=103557.82..103567.97 rows=4058 width=80) (actual time=1206.168..1212.880 rows=4673 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=102734.94..103314.61 rows=4058 width=80) (actual time=809.448..949.110 rows=4673 loops=1)
         Merge Cond: ("outer"."?column7?" = ("inner".car_id)::text)
         ->  Sort  (cost=102734.94..102745.08 rows=4058 width=62) (actual time=808.660..819.317 rows=4827 loops=1)
               Sort Key: (q.car_id)::text
               ->  Nested Loop  (cost=0.00..102491.73 rows=4058 width=62) (actual time=151.457..559.886 rows=10285 loops=1)
                     ->  Index Scan using zip_zips_index on zips z  (cost=0.00..1272.69 rows=516 width=52) (actual time=151.179..158.375 rows=718 loops=1)
                           Filter: ((state)::text = 'WA'::text)
                     ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.55 rows=48 width=27) (actual time=0.042..0.455 rows=14 loops=718)
                           Index Cond: (("outer".zip)::text = (q.zip)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..506.87 rows=9401 width=37) (actual time=0.213..47.307 rows=12019 loops=1)
 Total runtime: 1218.459 ms

Anyway, thanks for the attention to this issue, and I hope that this helps some.

Jared
Jared Carr <jared@89glass.com> writes:
> The patch definitely makes things more consistent... unfortunately, it is more
> consistent toward the slower execution times. Of course, I am looking at this
> simply from a straight performance standpoint and not from the viewpoint of what
> *should* be happening. At any rate, here are the query plans with the various
> settings.

The optimizer seems to be at least considering reasonable plans now. It seems from the estimates that you need to rerun ANALYZE. You might try "vacuum full analyze" to be sure.

Also, you might try raising effective_cache_size and/or lowering random_page_cost (it looks like something around 2 might help).

-- 
greg
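[Editorial note: a sketch of Greg's suggestions as a session; the specific values below are illustrative assumptions, not measurements from this thread.]

```sql
-- Refresh both the dead-space picture and the planner statistics.
VACUUM FULL ANALYZE;

-- Sketch: tell the planner how much of the data the OS is likely caching,
-- and lower the random-access penalty toward Greg's "something around 2".
-- effective_cache_size is in 8 kB pages here, so 50000 is roughly 400 MB.
SET effective_cache_size = 50000;
SET random_page_cost = 2;
```

Then re-run EXPLAIN ANALYZE on the problem query to see whether the index plans are now chosen without disabling sequential scans.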
Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> Well it seems to completely bar the use of a straight merge join between two
>> index scans:
>
> Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah,
> there's something busted here.  Will get back to you ...

LOL, but I am not sure why. :-)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Greg Stark wrote:
> The optimizer seems to be at least considering reasonable plans now. It seems
> from the estimates that you need to rerun ANALYZE. You might try "vacuum full
> analyze" to be sure.
>
> Also, you might try raising effective_cache_size and/or lowering
> random_page_cost (it looks like something around 2 might help).

Yep, I had forgotten to run VACUUM since I had patched it. :P The overall performance is definitely better; I will go ahead and tweak the server settings and see what I can get. Thanks again for all the help.
Tom Lane wrote:
>> Hmmm ... [squints] ... it's not supposed to do that ...
>
> The attached patch seems to make it better.

I guess it is too late for 7.3.5. :-( Any chance for 7.4.1?

Regards,
Gaetano Mendola