Re: Partitioning with a lot of number of partitions - Mailing list pgsql-novice
From | Josh Kupershmidt |
---|---|
Subject | Re: Partitioning with a lot of number of partitions |
Date | |
Msg-id | CAK3UJRFyeFkt_4M-rPX3uqq1mKk_7X8YJkHvcdpqUb8qpK4DeQ@mail.gmail.com Whole thread Raw |
In response to | Partitioning with a lot of number of partitions (Aleksej Trofimov <aleksej.trofimov@ruptela.lt>) |
List | pgsql-novice |
On Wed, Oct 12, 2011 at 10:09 AM, Aleksej Trofimov <aleksej.trofimov@ruptela.lt> wrote: > By the moment we are using PostgreSQL 9.1.0. Our database is about 270GB and > currently we have a problem with our "hot" table which is about 120GB of > data and 20GB index. That sounds quite manageable with decent hardware. > By "hot" I mean, that we have a lot of > insert/update/select operations on this table (about 30/30/120 per second). > This hot table is called coordinates. In our logic we have several > definitions: Coordinate - basic packet of data; Object - basic unit, which > sends coordinates; Client - client which own object. One client may has a > lot of object. and each object may has a lot of coordinates. All > coordinates are stored in coordinates table. > So by the moment we have an issue with slow insert/select queries... this > why we decided to partition this table. We have tested 2 types of > partitioning: By date (weekly), by object; > This are our results: > > SELECT QUERY (1 object): > > ----------------------------------------------------- > Partition type | M | W | D | > ----------------------------------------------------- > Non-partitioned | 5830 | 460 | 2913 | > Date(weekly) | 1000 | 440 | 106 | > Object | 0.02 | 0.03 | 0.009 | > ------------------------------------------------------ > > M - query for month period > W - query for week period > D - query for day period > All times in ms. > > So according to our results, we have much greater performance using > partitioning by Object, Well hold on, your results only show the timings for "SELECT QUERY (1 object):" -- does this mean you're just selecting one row? If so, that doesn't seem like a particularly useful performance benchmark. What are the _real_ bottleneck queries for your application when you're not using partitioning? Also, 5830 ms seems like a lot for what amounts to an indexed (primary key?) lookup of a single row -- how long does it take for you to SELECT [some object ID] FROM original_unpartitioned_big_table; (try the query a few times, with both same and different object IDs to get a feel for cache effects). BTW, If you do decide to partition based on objects, you don't have to make a partition for each object ID, if there are 5000 of those. You can partition based on ranged of object IDs. > BUT we have a lot of object (about 5000, and > planning to connect 2-5 times bigger object count next year). This plans > leads to the firs question: > > 1) How postgres would "like" such a big count of partitions? Will it > decrease performance in future? 5000 partitions is going to incur significant planner overhead for each query; check the docs and archives (search term "large number of partitions" or similar). > By the moment we noticed several performance > issues with the big count of object id in such conditional statement: SELECT > * FROM coordinates_object.coordinates WHERE object_id in (HERE ARE SEVERAL > OBJECTS); [snip] > " Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))" > "Total runtime: 695.963 ms" > > 2) As you see from EXPLAIN, postgres is using array for filtering Filter: > (object_id = ANY ('{722,728,727,248}'::bigint[]))", why? AFAIR, that's a normal transformation of a IN (...) query to = ANY (...), and I bet you'll see the same thing without partitioning at play. For a variety of reasons (performance, ease of maintenance), I think partitioning is most useful for improving query times for large data-sets when you can separate out your "hot" and "cold" data. Or, if you can't do that, then next-best is to try to partition into sets that will be queried together often (e.g. if you often query a cluster of nearby object IDs together, partition by object ID, or same thing for your other types), so that your queries hit a small number of partitions on average. Josh
pgsql-novice by date: