Home > mailing lists

self join, parameterized base/join rel path row estimation and generally... - Mailing list pgsql-performance

From	Bongseo Jang
Subject	self join, parameterized base/join rel path row estimation and generally...
Date	January 27, 2014 05:58:50
Msg-id	CAKPsnf1+ZU0W=UYPB_eAi3u8qATx43tny5Hba4x_B_T71qUa6A@mail.gmail.com Whole thread
List	pgsql-performance

Tree view

Hi,

I have two plans of a query.
nestloop plan is much faster, but planner chose the slower one, hashjoin.

http://explain.depesz.com/s/Aqs
http://explain.depesz.com/s/97C

it seems that rows=39698995 are quite overestimated.

-> Nested Loop (cost=0.000..5403.600 rows=39698995 width=45) (actual time=0.392..14.817 rows=943 loops=1)
-> Nested Loop (cost=0.000..17.600 rows=1 width=8) (actual time=0.241..0.246 rows=1 loops=1)
-> Index Scan using seven on hotel three (cost=0.000..6.880 rows=1 width=6) (actual time=0.113..0.115 rows=1 loops=1)
Index Cond: (two = 31750::numeric)
-> Index Scan using echo on oscar_foxtrot november (cost=0.000..10.710 rows=1 width=14) (actual time=0.117..0.118 rows=1 loops=1)
Index Cond: (charlie = three.golf)
-> Index Scan using zulu on oscar_foxtrot juliet (cost=0.000..3849.200 rows=153679 width=45) (actual time=0.147..14.241 rows=943 loops=1)
Index Cond: ((uniform_yankee = november.uniform_yankee) AND (uniform_victor = november.uniform_victor))

pg_stats is like this;

> select attname, null_frac, n_distinct, most_common_vals, most_common_freqs from pg_stats where tablename like 'oscar_foxtrot%' and (attname = 'uniform_yankee' or attname = 'uniform_victor')

"uniform_yankee";0;12;"{83886082,83886085}";"{0.9742,0.02}"

"uniform_victor";0;23;"{1342767106,1342308357}";"{0.973467,0.02}"

I assumed that nestloop rows would be more or less inner_path_rows * outer_path_rows with good pg_stats, and good plan could come based on it.

the plan above is not that case. Suspcious of 40 million rows and small number of values(actually two values) making up 98% of distribution.

so.. I looked up some code and found that rows=153679 is rows of parameterized base rel estimated by eqsel(), and row=39698995 is rows of parameterized join rel by eqjoinsel().

I think wrong plan above comes from the fact that the two estimation cannot be close in general, great difference in my case.

where am i wrong and right?

Is there recommended approach, related issue, commit and so on i can follow ?

thanks

[[nstallation Info]]

PostgreSQL-9.2.5 (via postgresql yum repository)

OS: Centos 6.3 (custom linux-3.10.12 kernel)
postgresql.conf:
effective_cache_size = 10000MB
shared_buffers = 1000MB
work_mem = 100MB
maintenance_work_mem = 100MB
HW: CPU 4-core Xeon x 2 sockets, RAM 256GB

--
Regards,
Jang.

a sound mind in a sound body

pgsql-performance by date:

From: Josh Kupershmidt
Date: 24 January 2014, 21:43:10
Subject: Re: pg_repack solves alter table set tablespace lock

From: Ying He
Date: 27 January 2014, 14:15:13
Subject: Re: pg_repack solves alter table set tablespace lock

self join, parameterized base/join rel path row estimation and generally... - Mailing list pgsql-performance

Previous

Next