Re: Queries joining views - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Queries joining views
Date
Msg-id 44EB1DA7.9040000@magproductions.nl
Whole thread Raw
In response to Re: Queries joining views  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Queries joining views
List pgsql-general
Tom Lane wrote:
> Alban Hertroys <alban@magproductions.nl> writes:
>> tablename         | mm_product_table
>> attname           | number
>> histogram_bounds  | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070}
>
>> tablename         | mm_insrel_table
>> attname           | snumber
>> histogram_bounds  |
>> {135,3768,4780,14822,57048,92958,125442,158954,433002,502836,610034}
>
> Hmm ... if I'm not still confused, these are the two columns being
> mergejoined in your slow query (would you double-check that?).

That's correct.

I read up some on the meaning of the pg_stats values, and I noticed that
  mm_insrel_tables' snumber and dnumber columns seem to have a rather
bad correlation. I think this could be improved by clustering on an
index over (number, snumber, dnumber); is that correct?

> But the numbers don't seem to add up.  Given those stats the estimate
> should be that something over 20% of the mm_insrel_table has to be
> scanned to complete the join (since 6070 falls into the third decile
> of the other histogram).  But we saw from Alban's original post that
> the planner must be estimating well under 10% of the table needs to
> be scanned.  Either we're still confused about which columns are being
> joined, or there's some weird bug in the computation.

Since the start of this thread the insrel table has grown to 339195
records (it was closer to 330,000), maybe that changed the statistics a
bit. To be sure, attached is the query plan of the problematic query
again at this moment.

The other table involved, mm_medical_care_container_table, has the
following stats on number:

zorgweb_solaris=> select * from pg_stats where attname = 'number' and
tablename = 'mm_medical_care_container_table';
-[ RECORD 1
]-----+-------------------------------------------------------------------------------
schemaname        | public
tablename         | mm_medical_care_container_table
attname           | number
null_frac         | 0
avg_width         | 4
n_distinct        | -1
most_common_vals  |
most_common_freqs |
histogram_bounds  |
{418768,436686,455444,473600,490610,508680,527182,545038,562786,578528,595132}
correlation       | 0.339138

I sure hope we get this mystery unveiled...

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //
zorgweb_solaris=> explain analyze SELECT
zorgweb_solaris->
insrel.rnumber,insrel.dnumber,insrel.snumber,insrel.dir,insrel.number,medical_care_container.number,product.number
zorgweb_solaris-> FROM mm_medical_care_container medical_care_container,mm_insrel
zorgweb_solaris-> insrel,mm_product product WHERE medical_care_container.number=558332 AND
zorgweb_solaris-> (medical_care_container.number=insrel.dnumber AND
zorgweb_solaris(> product.number=insrel.snumber AND insrel.dir<>1);
                                                                                             QUERY PLAN
                                                  

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..178.61 rows=1 width=28) (actual time=56.089..137.304 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..174.67 rows=1 width=28) (actual time=56.041..137.250 rows=1 loops=1)
         ->  Merge Join  (cost=0.00..170.73 rows=1 width=28) (actual time=55.986..137.189 rows=1 loops=1)
               Merge Cond: ("outer".number = "inner".number)
               ->  Nested Loop  (cost=0.00..2966.75 rows=30 width=28) (actual time=46.161..125.315 rows=2 loops=1)
                     ->  Nested Loop  (cost=0.00..2848.50 rows=30 width=24) (actual time=46.108..125.205 rows=2
loops=1)
                           ->  Index Scan using mm_insrel_full_idx on mm_insrel_table  (cost=0.00..2678.20 rows=30
width=20)(actual time=46.042..125.067 rows=2 loops=1) 
                                 Index Cond: (dnumber = 558332)
                           ->  Index Scan using mm_medical_care_container_table_pkey on mm_medical_care_container_table
(cost=0.00..5.67 rows=1 width=4) (actual time=0.048..0.049 rows=1 loops=2) 
                                 Index Cond: (558332 = number)
                     ->  Index Scan using mm_object_pkey on mm_object  (cost=0.00..3.93 rows=1 width=4) (actual
time=0.041..0.043rows=1 loops=2) 
                           Index Cond: (mm_object.number = "outer".snumber)
               ->  Index Scan using mm_product_table_pkey on mm_product_table  (cost=0.00..67.90 rows=1571 width=4)
(actualtime=0.024..9.462 rows=1571 loops=1) 
         ->  Index Scan using mm_object_pkey on mm_object  (cost=0.00..3.93 rows=1 width=4) (actual time=0.043..0.045
rows=1loops=1) 
               Index Cond: ("outer".number = mm_object.number)
   ->  Index Scan using mm_object_pkey on mm_object  (cost=0.00..3.93 rows=1 width=4) (actual time=0.038..0.041 rows=1
loops=1)
         Index Cond: (number = 558332)
 Total runtime: 138.132 ms
(18 rows)


pgsql-general by date:

Previous
From: Vance Maverick
Date:
Subject: Re: UUID as primary key
Next
From: "Harald Armin Massa"
Date:
Subject: Re: UUID as primary key