Re: TB-sized databases - Mailing list pgsql-performance

From Ron Mayer
Subject Re: TB-sized databases
Date
Msg-id 4758A1B8.20501@cheapcomplexdevices.com
Whole thread Raw
In response to Re: TB-sized databases  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: TB-sized databases
List pgsql-performance
Tom Lane wrote:
> Michael Stone <mstone+postgres@mathom.us> writes:
>> OTOH, the planner can really screw up queries on really large databases.
>> ... I've got some queries that the
>> planner thinks will return on the order of 10^30 rows for that sort of
>> reason. In practice, the query may return 10^3 rows....
>
> Indeed, and if you've got examples where it's that far off, you should
> report them.

If I read this right, I've got quite a few cases where the planner
expects 1 row but gets over 2000.

And within the plan, it looks like there's a step where it expects
511 rows and gets 2390779 which seems to be off by a factor of 4600x.

Also shown below it seems that if I use "OFFSET 0" as a "hint"
I can force a much (10x) better plan.  I wonder if there's room for
a pgfoundry project for a patch set that lets us use more hints
than OFFSET 0.

    Ron

logs=# analyze;
ANALYZE
logs=# explain analyze  select * from fact natural join d_ref natural join d_uag where ref_host = 'download.com.com'
andref_path = '/[path_removed].html' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'; 
                                                                       QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2827.72..398919.05 rows=1 width=242) (actual time=69175.963..141550.628 rows=2474 loops=1)
   Hash Cond: (fact.ref_id = d_ref.ref_id)
   ->  Nested Loop  (cost=2819.88..398908.65 rows=511 width=119) (actual time=3094.740..139361.235 rows=2390779
loops=1)
         ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=45.937..45.948 rows=1
loops=1)
               Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
         ->  Bitmap Heap Scan on fact  (cost=2819.88..396449.49 rows=196223 width=32) (actual time=3048.770..135653.875
rows=2390779loops=1) 
               Recheck Cond: (fact.uag_id = d_uag.uag_id)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual
time=1713.148..1713.148rows=2390779 loops=1) 
                     Index Cond: (fact.uag_id = d_uag.uag_id)
   ->  Hash  (cost=7.83..7.83 rows=1 width=127) (actual time=62.841..62.841 rows=2 loops=1)
         ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=62.813..62.823
rows=2loops=1) 
               Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text =
'download.com.com'::text))
 Total runtime: 141563.733 ms
(13 rows)



############ using "offset 0" to force a better plan.


logs=# explain analyze select * from fact natural join (select * from d_ref natural join d_uag where ref_host =
'download.com.com'and ref_path = '/[path_removed].html' and useragent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows
98)'offset 0) as a; 
                                                                       QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6465.12..7575.91 rows=367 width=2096) (actual time=2659.251..14703.343 rows=2474 loops=1)
   ->  Limit  (cost=0.00..14.22 rows=1 width=218) (actual time=114.968..115.140 rows=2 loops=1)
         ->  Nested Loop  (cost=0.00..14.22 rows=1 width=218) (actual time=114.964..115.127 rows=2 loops=1)
               ->  Index Scan using i_ref__val on d_ref  (cost=0.00..7.83 rows=1 width=127) (actual time=75.891..75.900
rows=2loops=1) 
                     Index Cond: (((ref_path)::text = '[path_removed].html'::text) AND ((ref_host)::text =
'download.com.com'::text))
               ->  Index Scan using i_uag__val on d_uag  (cost=0.00..6.38 rows=1 width=91) (actual time=19.582..19.597
rows=1loops=2) 
                     Index Cond: ((useragent)::text = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)'::text)
   ->  Bitmap Heap Scan on fact  (cost=6465.12..7556.18 rows=367 width=32) (actual time=2240.090..7288.145 rows=1237
loops=2)
         Recheck Cond: ((fact.uag_id = a.uag_id) AND (fact.ref_id = a.ref_id))
         ->  BitmapAnd  (cost=6465.12..6465.12 rows=367 width=0) (actual time=2221.539..2221.539 rows=0 loops=2)
               ->  Bitmap Index Scan on i__fact__uag_id  (cost=0.00..2770.83 rows=196223 width=0) (actual
time=1633.032..1633.032rows=2390779 loops=2) 
                     Index Cond: (fact.uag_id = a.uag_id)
               ->  Bitmap Index Scan on i__fact__ref_id  (cost=0.00..3581.50 rows=253913 width=0) (actual
time=150.614..150.614rows=77306 loops=2) 
                     Index Cond: (fact.ref_id = a.ref_id)
 Total runtime: 14710.870 ms
(15 rows)


pgsql-performance by date:

Previous
From: "kelvan"
Date:
Subject: database tuning
Next
From: Tom Lane
Date:
Subject: Re: TB-sized databases