[PERFORMANCE] Performance index and table - Mailing list pgsql-performance

From Oscar Camuendo
Subject [PERFORMANCE] Performance index and table
Date
Msg-id BN3PR20MB04990FF1BE4ACCBC75924C71F00A0@BN3PR20MB0499.namprd20.prod.outlook.com
Whole thread Raw
Responses Re: [PERFORMANCE] Performance index and table  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance

I'm working on Postgresql 9.5.3 and executed a query which takes 5 or 7 seconds and it should not take more than 0.30 milliseconds, the query is:


-----------QUERY--------------------------------------------------------------------------------------


with recursive t(level,parent_id,id) as (
select 0,parent_id,id from parties where parent_id = 105
union
select t.level + 1,c.parent_id,c.id from parties c join t on c.parent_id = t.id
)
select distinct id from t order by id;

--------------------------------------------------------------------------------------------------------------

The parties table has 245512 rows and one index named "index_parties_on_parent_id" , so I added an EXPLAIN ANALYZE VERBOSE to get more details and it was the result:


--------RESULT--------------------------------------------------------------------------------------


Sort (cost=21237260.78..21237261.28 rows=200 width=4) (actual time=6850.338..6850.343 rows=88 loops=1)
Output: t.id
Sort Key: t.id
Sort Method: quicksort Memory: 29kB
CTE t
-> Recursive Union (cost=0.43..20562814.38 rows=29974967 width=12) (actual time=0.072..6850.180 rows=88 loops=1)
-> Index Scan using index_parties_on_parent_id on public.parties (cost=0.43..3091.24 rows=807 width=8) (actual time=0.064..0.154 rows=23 loops=1)
Output: 0, parties.parent_id, parties.id
Index Cond: (parties.parent_id = 105)
-> Hash Join (cost=777279.14..1996022.38 rows=2997416 width=12) (actual time=2245.623..2283.290 rows=22 loops=3)
Output: (t_1.level + 1), c.parent_id, c.id
Hash Cond: (t_1.id = c.parent_id)
-> WorkTable Scan on t t_1 (cost=0.00..161.40 rows=8070 width=8) (actual time=0.002..0.009 rows=29 loops=3)
Output: t_1.level, t_1.id
-> Hash (cost=606642.73..606642.73 rows=10400673 width=8) (actual time=2206.149..2206.149 rows=1742 loops=3)
Output: c.parent_id, c.id
Buckets: 2097152 Batches: 16 Memory Usage: 16388kB
-> Seq Scan on public.parties c (cost=0.00..606642.73 rows=10400673 width=8) (actual time=71.070..2190.318 rows=244249 loops=3)
Output: c.parent_id, c.id
-> HashAggregate (cost=674436.76..674438.76 rows=200 width=4) (actual time=6850.291..6850.305 rows=88 loops=1)
Output: t.id
Group Key: t.id
-> CTE Scan on t (cost=0.00..599499.34 rows=29974967 width=4) (actual time=0.075..6850.236 rows=88 loops=1)
Output: t.id
Planning time: 0.815 ms
Execution time: 7026.026 ms

----------------------------------------------------------------------------------------------------------------------

So, I could see that index_parties_on_parent_id showed 10400673 rows and checking index_parties_on_parent_id index I get this information: num_rows = 10400673 and index_size = 310 MB 

Could Anybody explain me why the difference between parties table = 245512 and index_parties_on_parent_id index = 10400673? and How could I improve this index and its response time?

pgsql-performance by date:

Previous
From: Robert Klemme
Date:
Subject: Re: Seeing execution plan of foreign key constraint check?
Next
From: Tom Lane
Date:
Subject: Re: [PERFORMANCE] Performance index and table