Re: Performance with sorting and LIMIT on partitioned table - Mailing list pgsql-performance
From | Robert Haas |
---|---|
Subject | Re: Performance with sorting and LIMIT on partitioned table |
Date | |
Msg-id | 603c8f070910181752x76ee07d2q5bbffa28c39914e3@mail.gmail.com Whole thread Raw |
In response to | Performance with sorting and LIMIT on partitioned table (Michal Szymanski <mich20061@gmail.com>) |
Responses |
Re: Performance with sorting and LIMIT on partitioned table
|
List | pgsql-performance |
On Mon, Oct 12, 2009 at 10:14 AM, Michal Szymanski <mich20061@gmail.com> wrote: > We have performance problem with query on partitioned table when query > use order by and we want to use first/last rows from result set. > More detail description: > We have big table where each row is one telephone call (CDR). > Definitnion of this table look like this: > CREATE TABLE accounting.cdr_full_partitioned (it is parrent table) > ( > cdr_id bigint NOT NULL, > id_crx_group_from bigint, -- identifier of user > start_time_invite timestamp with time zone, -- start call time > call_status VARCHAR -- FINF-call finished, FINC-call > unfinished > ..some extra data.. > ) > > We creating 12 partitions using 'start_time_invite' column, simply we > create one partition for each month. We create costraints like this: > ALTER TABLE accounting.cdr_y2009_m09 > ADD CONSTRAINT y2009m09 CHECK (start_time_invite >= '2009-09-01 > 00:00:00+02'::timestamp with time zone AND start_time_invite < > '2009-10-01 00:00:00+02'::timestamp with time zone); > > and we define necessery indexes of course > > CREATE INDEX cdr_full_partitioned_y2009_m09_id_crx_group_to_key1 > ON accounting.cdr_full_partitioned_y2009_m09 > USING btree > (id_crx_group_from, start_time_invite, call_status); > > > The problem appears when we want to select calls for specified user > with specified call_Status e.g: > SELECT * FROM accounting.cdr_full_partitioned > WHERE > id_crx_group_from='522921' AND > call_status='FINS' AND > start_time_invite>='2009-09-28 00:00:00+02' AND > start_time_invite<'2009-10-12 23:59:59+02' AND > ORDER BY start_time_invite LIMIT '100' OFFSET 0 > > you can see execution plan http://szymanskich.net/pub/postgres/full.jpg > as you see 20000 rows were selected and after were sorted what take > very long about 30-40s and after sorting it limit > result to 100 rows. > > Using table without partition > > SELECT * FROM accounting.cdr_full WHERE > (id_crx_group_from='522921') AND ( > call_status='FINS' ) AND (start_time_invite>='2009-01-28 > 00:00:00+02') > AND (start_time_invite<'2009-10-12 23:59:59+02') ORDER BY > start_time_invite LIMIT '100' OFFSET 0 > > execution plan is very simple > "Limit (cost=0.00..406.40 rows=100 width=456)" > " -> Index Scan using > cdr_full_crx_group_from_start_time_invite_status_ind on cdr_full > (cost=0.00..18275.76 rows=4497 width=456)" > " Index Cond: ((id_crx_group_from = 522921::bigint) AND > (start_time_invite >= '2009-01-27 23:00:00+01'::timestamp with time > zone) AND (start_time_invite < '2009-10-12 23:59:59+02'::timestamp > with time zone) AND ((call_status)::text = 'FINS'::text))" > > it use index to fetch first 100 rows and it is super fast and take > less than 0.5s. There is no rows sorting! > I've tried to execute the same query on one partition: > SELECT * FROM accounting.cdr_full_partitioned_y2009_m09 > WHERE (id_crx_group_from='509498') AND ( > call_status='FINS' ) AND (start_time_invite>='2009-09-01 > 00:00:00+02') > AND (start_time_invite<'2009-10-12 23:59:59+02') > > You can see execution plan http://szymanskich.net/pub/postgres/ononeprtition.jpg > and query is superfast because there is no sorting. The question is > how to speed up query when we use partitioning? So far I have not > found solution. I'm wonder how do you solve problems > when result from partition must be sorted and after we want to display > only first/last 100 rows? > We can use own partitioning mechanism and partitioning data using > id_crx_group_from and create dynamic query (depending on > id_crx_group_from we can execute query on one partition) but it is not > most beautiful solution. Yeah - unfortunately the query planner is not real smart about partitioned tables yet. I can't make anything of the JPG link you posted. Can you post the EXPLAIN ANALYZE output for the case that is slow? What PG version is this? ...Robert
pgsql-performance by date: