Query optimizer 8.0.1 (and 8.0) - Mailing list pgsql-hackers

From pgsql@mohawksoft.com
Subject Query optimizer 8.0.1 (and 8.0)
Date
Msg-id 16398.24.91.171.78.1107555018.squirrel@mail.mohawksoft.com
Whole thread Raw
Responses Re: Query optimizer 8.0.1 (and 8.0)
Re: Query optimizer 8.0.1 (and 8.0)
List pgsql-hackers
Here's one:
I have the USA census TIGER database loaded, the WHOLE THING, the whole
country. It isn't the biggest database, but it is about 40G before
indexes. Every table is over a few million rows. I can quite honestly say,
a sequential scan is almost never the right thing to do.

Info below.

Here is the query:
select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr = 2186 or
zipl=2186);

Well, when you look at the explain, obviously it looks like the sequential
scan is better, but trust me, it takes a good few minutes for the query to
respond with sequential scan enabled. I've run analyze and everything.

I suspect that analyze only samples a very small amount of the database
and gets the wrong idea about it. Is there a way to force analyze to
sample more  rows?

tiger=# analyze verbose rt2 ;
INFO:  analyzing "public.rt2"
INFO:  "rt2": scanned 3000 of 1139825 pages, containing 84160 live rows
and 0 dead rows; 3000 rows in sample, 31975891 estimated total rows
ANALYZE
tiger=# analyze verbose rt1 ;
INFO:  analyzing "public.rt1"
INFO:  "rt1": scanned 3000 of 1527360 pages, containing 90951 live rows
and 0 dead rows; 3000 rows in sample, 46304973 estimated total rows
ANALYZE



tiger=# explain select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr
= 2186 or zipl=2186);                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------Hash Join
(cost=121978.81..3996190.89rows=21118 width=520)  Hash Cond: ("outer".tlid = "inner".tlid)  ->  Seq Scan on rt2
(cost=0.00..1459291.36rows=31946636 width=218)  ->  Hash  (cost=120662.36..120662.36 rows=30579 width=302)        ->
IndexScan using rt1_zipr, rt1_zipl on rt1 
 
(cost=0.00..120662.36 rows=30579 width=302)              Index Cond: ((zipr = 2186) OR (zipl = 2186))
(6 rows)

tiger=# set enable_seqscan=no;
SET
tiger=# explain select * from rt1, rt2 where rt1.tlid = rt2.tlid and (zipr
= 2186 or zipl=2186);                                         QUERY PLAN
-----------------------------------------------------------------------------------------------Nested Loop
(cost=0.00..46868256.00rows=21118 width=520)  ->  Index Scan using rt1_zipr, rt1_zipl on rt1  (cost=0.00..120662.36
 
rows=30579 width=302)        Index Cond: ((zipr = 2186) OR (zipl = 2186))  ->  Index Scan using rt2_tlid on rt2
(cost=0.00..1523.80rows=396
 
width=218)        Index Cond: ("outer".tlid = rt2.tlid)



           Table "public.rt1" Column   |       Type        | Modifiers
-----------+-------------------+-----------tlid      | integer           |side1     | character varying |source    |
charactervarying |fedirp    | character varying |fename    | character varying |fetype    | character varying |fedirs
| character varying |cfcc      | character varying |fraddl    | character varying |toaddl    | character varying
|fraddr   | character varying |toaddr    | character varying |friaddl   | character varying |toiaddl   | character
varying|friaddr   | character varying |toiaddr   | character varying |zipl      | integer           |zipr      |
integer          |aianhhfpl | character varying |aianhhfpr | character varying |aihhtlil  | character varying |aihhtlir
| character varying |statel    | character varying |stater    | character varying |countyl   | character varying
|countyr  | character varying |cousubl   | character varying |cousubr   | character varying |submcdl   | character
varying|submcdr   | character varying |placel    | character varying |placer    | character varying |tractl    |
charactervarying |tractr    | character varying |blockl    | character varying |blockr    | character varying |frlong
| numeric           |frlat     | numeric           |tolong    | numeric           |tolat     | numeric           |
 
Indexes:   "rt1_fename" btree (fename)   "rt1_frlat" btree (frlat)   "rt1_frlong" btree (frlong)   "rt1_tlid" btree
(tlid)  "rt1_tolat" btree (tolat)   "rt1_tolong" btree (tolong)   "rt1_zipl" btree (zipl)   "rt1_zipr" btree (zipr)
 
     Table "public.rt2"Column |  Type   | Modifiers
--------+---------+-----------tlid   | integer |rtsq   | integer |long1  | numeric |lat1   | numeric |long2  | numeric
|lat2  | numeric |long3  | numeric |lat3   | numeric |long4  | numeric |lat4   | numeric |long5  | numeric |lat5   |
numeric|long6  | numeric |lat6   | numeric |long7  | numeric |lat7   | numeric |long8  | numeric |lat8   | numeric
|long9 | numeric |lat9   | numeric |long10 | numeric |lat10  | numeric |
 
Indexes:   "rt2_tlid" btree (tlid)




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Escaping the ARC patent
Next
From: Philip Warner
Date:
Subject: Re: Escaping the ARC patent