Thread: Identical Queries

Identical Queries

From
Rob Schall
Date:
Question for anyone...

I tried posting to the bugs, and they said this is a better question for here.
I have to queries. One runs in about 2 seconds. The other takes upwards
of 2 minutes. I have a temp table that is created with 2 columns. This
table is joined with the larger database of call detail records.
However, these 2 queries are handled very differently.

The queries:
First----

calldetail=> EXPLAIN SELECT current.* FROM current JOIN anitmp ON
current.destnum=anitmp.ani AND istf=true;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2026113.09 rows=500908 width=108)
   ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8)
         Filter: (istf = true)
   ->  Index Scan using i_destnum on current  (cost=0.00..2137.36
rows=531 width=108)
         Index Cond: (current.destnum = "outer".ani)
(5 rows)

Second----
calldetail=> EXPLAIN SELECT current.* FROM current JOIN anitmp ON
current.orignum=anitmp.ani AND istf=false;
                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=35.99..3402035.53 rows=5381529 width=108)
   Hash Cond: ("outer".orignum = "inner".ani)
   ->  Seq Scan on current  (cost=0.00..907191.05 rows=10170805 width=108)
   ->  Hash  (cost=33.62..33.62 rows=945 width=8)
         ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8)
               Filter: (istf = false)
(6 rows)


The tables:
               Table "public.current"
  Column  |            Type             | Modifiers
----------+-----------------------------+-----------
 datetime | timestamp without time zone |
 orignum  | bigint                      |
 destnum  | bigint                      |
 billto   | bigint                      |
 cost     | numeric(6,4)                |
 duration | numeric(8,1)                |
 origcity | character(12)               |
 destcity | character(12)               |
 file     | character varying(30)       |
 linenum  | integer                     |
 carrier  | character(1)                |
Indexes:
    "i_destnum" btree (destnum)
    "i_orignum" btree (orignum)


    Table "public.anitmp"
 Column |  Type   | Modifiers
--------+---------+-----------
 ani    | bigint  |
 istf   | boolean |


I was also asked to post the EXPLAIN ANALYZE for both:

calldetail=> EXPLAIN ANALYZE SELECT current.* FROM anitmp JOIN current ON istf=false AND current.orignum=anitmp.ani;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=35.99..3427123.39 rows=5421215 width=108) (actual time=1994.164..157443.544 rows=157 loops=1)
   Hash Cond: ("outer".orignum = "inner".ani)
   ->  Seq Scan on current  (cost=0.00..913881.09 rows=10245809 width=108) (actual time=710.986..137963.320
rows=10893541loops=1) 
   ->  Hash  (cost=33.62..33.62 rows=945 width=8) (actual time=10.948..10.948 rows=0 loops=1)
         ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8) (actual time=10.934..10.939 rows=2 loops=1)
               Filter: (istf = false)
 Total runtime: 157443.900 ms
(7 rows)

calldetail=> EXPLAIN ANALYZE SELECT current.* FROM current JOIN anitmp ON current.destnum=anitmp.ani AND istf=true;
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2037526.69 rows=504602 width=108) (actual time=88.752..1050.295 rows=1445 loops=1)
   ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8) (actual time=8.189..8.202 rows=2 loops=1)
         Filter: (istf = true)
   ->  Index Scan using i_destnum on current  (cost=0.00..2149.40 rows=534 width=108) (actual time=62.365..517.454
rows=722loops=2) 
         Index Cond: (current.destnum = "outer".ani)
 Total runtime: 1052.862 ms
(6 rows)


Anyone have any ideas for me? I have indexes on each of the necessary
columns.

Rob



Re: Identical Queries

From
Stephan Szabo
Date:
On Thu, 1 Mar 2007, Rob Schall wrote:

> Question for anyone...
>
> I tried posting to the bugs, and they said this is a better question for here.
> I have to queries. One runs in about 2 seconds. The other takes upwards
> of 2 minutes. I have a temp table that is created with 2 columns. This
> table is joined with the larger database of call detail records.
> However, these 2 queries are handled very differently.

How many rows are there in anitmp and how many rows in anitmp have
istf=true and how many have istf=false? If you don't currently analyze the
temp table after adding the rows, you might find that doing an analyze
helps, or at least makes the row estimates better.

Re: Identical Queries

From
Rob Schall
Date:
There are 4 entries (wanted to make the playing field level for this
test). There are 2 with true for istf and 2 with false.

Rob


Stephan Szabo wrote:
> On Thu, 1 Mar 2007, Rob Schall wrote:
>
>
>> Question for anyone...
>>
>> I tried posting to the bugs, and they said this is a better question for here.
>> I have to queries. One runs in about 2 seconds. The other takes upwards
>> of 2 minutes. I have a temp table that is created with 2 columns. This
>> table is joined with the larger database of call detail records.
>> However, these 2 queries are handled very differently.
>>
>
> How many rows are there in anitmp and how many rows in anitmp have
> istf=true and how many have istf=false? If you don't currently analyze the
> temp table after adding the rows, you might find that doing an analyze
> helps, or at least makes the row estimates better.
>


Re: Identical Queries

From
"Chad Wagner"
Date:
On 3/1/07, Rob Schall <rschall@callone.net> wrote:
There are 4 entries (wanted to make the playing field level for this
test). There are 2 with true for istf and 2 with false.

Then the difference here has to do with using orignum vs destnum as the join criteria.  There must be more intersections for orignum than destnum, or your statistics are so far out of whack.  It appears to be estimating 5M vs 500K for a result set, and naturally it chose a different plan.

Re: Identical Queries

From
Stephan Szabo
Date:
On Thu, 1 Mar 2007, Rob Schall wrote:

> There are 4 entries (wanted to make the playing field level for this
> test). There are 2 with true for istf and 2 with false.

Then analyzing might help, because I think it's estimating many more rows
for both cases, and with 2 rows estimated to be returned the nested loop
should seem a lot more attractive than at 900+.

Re: Identical Queries

From
"Craig A. James"
Date:
Stephan Szabo wrote:
> I tried posting to the bugs, and they said this is a better question for here.
> I have to queries. One runs in about 2 seconds. The other takes upwards
> of 2 minutes. I have a temp table that is created with 2 columns. This
> table is joined with the larger database of call detail records.
> However, these 2 queries are handled very differently.

Even for a temporary table, you should run ANALYZE on it after you fill it but before you query or join to it.  I found
out(the hard way) that a temporary table of just 100 rows will generate dramatically different plans before and after
ANALYZE.

Craig