wrong join result set estimate - Mailing list pgsql-performance

From Evgeny Shishkin
Subject wrong join result set estimate
Date
Msg-id 5C1F9E10-588B-478C-ADC7-84B2CBA504F4@gmail.com
Whole thread Raw
List pgsql-performance
Hello,

i have a problem with relatively easy query.

EXPLAIN ANALYZE SELECT content.* FROM content JOIN blog ON blog.id = content.blog_id JOIN community_prop ON blog.id = community_prop.blog_id JOIN community ON community.id = community_prop.id WHERE community.id IN (33, 55, 61, 1741, 75, 90, 106, 180, 228, 232, 256, 310, 388, 404, 504, 534, 536, 666, 700, 768, 824, 832, 855, 873, 898, 962, 1003, 1008, 1027, 1051, 1201, 1258, 1269, 1339, 1355, 1360, 1383, 1390, 1430, 1505, 1506, 1530, 1566, 1578, 1616, 1678, 1701, 1713, 1723, 1821, 1842, 1880, 1882, 1894, 1973, 2039, 2069, 2106, 2130, 2204, 2226, 2236, 2238, 2263, 2272, 2310, 2317, 2327, 2353, 2360, 2401, 2402, 2409, 2419, 2425, 2426, 2438, 2440, 2452, 2467, 2494, 2514, 2559, 2581, 2653, 2677, 2679, 2683, 2686, 2694, 2729, 2732, 2739, 2779, 2785, 2795, 2821, 2831, 2839, 2862, 2864, 2866, 2882, 2890, 2905, 2947, 2962, 2964, 2978, 2981, 3006, 3016, 3037, 3039, 3055, 3060, 3076, 3112, 3124, 3135, 3138, 3186, 3213, 3222, 3225, 3269, 3273, 3288, 3291, 3329, 3363, 3375, 3376, 3397, 3415, 3491, 3500, 2296, 3547, 129, 1039, 8, 1053, 1441, 2372, 1974, 289, 2449, 2747, 2075, 57, 3550, 3069, 89, 1603, 1570, 54, 152, 1035, 1456, 506, 1387, 43, 1805, 1851, 1843, 2587, 1908, 1790, 2630, 901, 13, 529, 705, 81, 2668, 1086, 603, 1986, 2516, 2969, 2671, 568, 4636, 1115, 864, 381, 4516, 2608, 677, 88, 1825, 3220, 3284, 947, 1190, 2233, 4489, 3320, 2957, 4146, 1841, 25, 643, 4352, 14, 4261, 3876, 1311, 1342, 4057, 3974) ORDER BY content.time_create DESC LIMIT 10;


As you can see, planner estimates 115 rows, but there are 259554 of them.

This query shows root of the problem
EXPLAIN ANALYZE SELECT content.* FROM content JOIN blog ON blog.id = content.blog_id JOIN community_prop ON blog.id = community_prop.blog_id;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=24498.17..137922.26 rows=2624 width=572) (actual time=36.028..1342.267 rows=408374 loops=1)
   Hash Cond: (content.blog_id = blog.id)
   ->  Seq Scan on content  (cost=0.00..102364.99 rows=1260899 width=572) (actual time=0.030..983.274 rows=1256128 loops=1)
   ->  Hash  (cost=24439.07..24439.07 rows=4728 width=8) (actual time=35.964..35.964 rows=4728 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 185kB
         ->  Nested Loop  (cost=0.00..24439.07 rows=4728 width=8) (actual time=0.064..33.092 rows=4728 loops=1)
               ->  Seq Scan on community_prop  (cost=0.00..463.28 rows=4728 width=4) (actual time=0.004..5.089 rows=4728 loops=1)
               ->  Index Scan using blog_pkey on blog  (cost=0.00..5.06 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=4728)
                     Index Cond: (id = community_prop.blog_id)
 Total runtime: 1361.354 ms

2624 vs 408374

Joining only content with blog: 1260211 vs 1256124.
Joining only blog with community_prop: 4728 vs 4728
Joining only content with community_prop: 78304 vs 408376

SHOW default_statistics_target ;
 default_statistics_target 
---------------------------
 500

I already altered stats on blog_id column 
ALTER TABLE content ALTER COLUMN blog_id SET STATISTICS 1000;

Tried setting 3000 and 10000 on all join columns - did not make a difference.
Tried setting n_distinct on content(blog_id) manually to different values from 10000 to 200000 (exact distinct is 90k, vacuum sets it to 76k) - did not change the estimate result set, only estimated index lookup.

Don't now what to do with this.

Ready to provide any additional information.
Thank you for your time.

pgsql-performance by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Postgres becoming slow, only full vacuum fixes it
Next
From: Matthias
Date:
Subject: Re: [GENERAL] Inaccurate Explain Cost