Re: Performance with sorting and LIMIT on partitioned table - Mailing list pgsql-performance

From Robert Haas
Subject Re: Performance with sorting and LIMIT on partitioned table
Date
Msg-id 603c8f070910181752x76ee07d2q5bbffa28c39914e3@mail.gmail.com
Whole thread Raw
In response to Performance with sorting and LIMIT on partitioned table  (Michal Szymanski <mich20061@gmail.com>)
Responses Re: Performance with sorting and LIMIT on partitioned table
List pgsql-performance
On Mon, Oct 12, 2009 at 10:14 AM, Michal Szymanski <mich20061@gmail.com> wrote:
> We have performance problem with query on partitioned table when query
> use order by and we want to use first/last rows from result set.
> More detail description:
> We have big table where each row is one telephone call (CDR).
> Definitnion of this table look like this:
> CREATE TABLE accounting.cdr_full_partitioned  (it is parrent table)
> (
>  cdr_id bigint NOT NULL,
>  id_crx_group_from bigint,                                             -- identifier of user
>  start_time_invite timestamp with time zone,   -- start call time
>  call_status VARCHAR                                                   -- FINF-call finished, FINC-call
> unfinished
>  ..some extra data..
> )
>
> We creating 12 partitions using 'start_time_invite' column, simply we
> create one partition for each month. We create costraints like this:
> ALTER TABLE accounting.cdr_y2009_m09
>  ADD CONSTRAINT y2009m09 CHECK (start_time_invite >= '2009-09-01
> 00:00:00+02'::timestamp with time zone AND start_time_invite <
> '2009-10-01 00:00:00+02'::timestamp with time zone);
>
> and we define necessery indexes of course
>
> CREATE INDEX cdr_full_partitioned_y2009_m09_id_crx_group_to_key1
>  ON accounting.cdr_full_partitioned_y2009_m09
>  USING btree
>  (id_crx_group_from, start_time_invite, call_status);
>
>
> The problem appears when we want to select calls for specified user
> with specified call_Status e.g:
>  SELECT * FROM accounting.cdr_full_partitioned
>   WHERE
>   id_crx_group_from='522921' AND
>   call_status='FINS' AND
>   start_time_invite>='2009-09-28 00:00:00+02' AND
>   start_time_invite<'2009-10-12 23:59:59+02'   AND
>  ORDER BY start_time_invite  LIMIT '100' OFFSET 0
>
> you can see execution plan  http://szymanskich.net/pub/postgres/full.jpg
>  as you see 20000 rows were selected and after were sorted what take
> very long about 30-40s and after sorting it limit
> result to 100 rows.
>
> Using table without partition
>
>  SELECT * FROM accounting.cdr_full    WHERE
> (id_crx_group_from='522921') AND (
>   call_status='FINS' ) AND (start_time_invite>='2009-01-28
> 00:00:00+02')
>   AND (start_time_invite<'2009-10-12 23:59:59+02') ORDER BY
> start_time_invite  LIMIT '100' OFFSET 0
>
> execution plan is very simple
> "Limit  (cost=0.00..406.40 rows=100 width=456)"
> "  ->  Index Scan using
> cdr_full_crx_group_from_start_time_invite_status_ind on cdr_full
> (cost=0.00..18275.76 rows=4497 width=456)"
> "        Index Cond: ((id_crx_group_from = 522921::bigint) AND
> (start_time_invite >= '2009-01-27 23:00:00+01'::timestamp with time
> zone) AND (start_time_invite < '2009-10-12 23:59:59+02'::timestamp
> with time zone) AND ((call_status)::text = 'FINS'::text))"
>
> it use index to fetch first 100 rows and it is super fast and take
> less than 0.5s. There is no rows sorting!
> I've tried to execute the same query on one partition:
>  SELECT * FROM accounting.cdr_full_partitioned_y2009_m09
>  WHERE (id_crx_group_from='509498') AND (
>   call_status='FINS' ) AND (start_time_invite>='2009-09-01
> 00:00:00+02')
>   AND (start_time_invite<'2009-10-12 23:59:59+02')
>
> You can see execution plan http://szymanskich.net/pub/postgres/ononeprtition.jpg
> and query is superfast because there is no sorting. The question is
> how to speed up query when we use partitioning? So far I have not
> found solution. I'm wonder how do you solve problems
> when result from partition must be sorted and after we want to display
> only first/last 100 rows?
> We can use own partitioning mechanism and partitioning data using
> id_crx_group_from and create dynamic query (depending on
> id_crx_group_from we can execute query on one partition) but it is not
> most beautiful solution.

Yeah - unfortunately the query planner is not real smart about
partitioned tables yet.  I can't make anything of the JPG link you
posted.  Can you post the EXPLAIN ANALYZE output for the case that is
slow?  What PG version is this?

...Robert

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Improving join performance over multiple moderately wide tables
Next
From: Grzegorz Jaśkiewicz
Date:
Subject: Re: Partitioned Tables and ORDER BY