Re: tables with 300+ partitions - Mailing list pgsql-performance
From | Tomáš Vondra |
---|---|
Subject | Re: tables with 300+ partitions |
Date | |
Msg-id | 4728C208.8080501@fuzzy.cz Whole thread Raw |
In response to | Re: tables with 300+ partitions (Pablo Alcaraz <pabloa@laotraesquina.com.ar>) |
List | pgsql-performance |
> Steven Flatt wrote: >> On 10/30/07, *Pablo Alcaraz* <pabloa@laotraesquina.com.ar >> <mailto:pabloa@laotraesquina.com.ar>> wrote: >> >> I did some testing. I created a 300 partitioned empty table. Then, I >> inserted some rows on it and the perfomance was SLOW too. >> >> Is the problem with inserting to the partitioned table or selecting >> from it? It sounds like inserting is the problem in which case I >> ask: how are you redirecting inserts to the appropriate partition? >> If you're using rules, then insert performance will quickly degrade >> with number of partitions as *every* rule needs to be evaluated for >> *every* row inserted to the base table. Using a trigger which you >> can modify according to some schedule is much faster, or better yet, >> use some application-level logic to insert directly to the desired >> partition. >> >> Steve > I was a program inserting into the base table. The program ran in 200+ > threads and every thread insert data on it. Every thread inserts a row > every 3 seconds aprox.(or they used to do it), but when I put more > partitions the insert speed went to 1 insert every 2 minutes. OK, that gives about 70 inserts per second - depending on the amount of data inserted this may or may not be manageable. What is the size of the data the threads are writing with each insert, or what is the size of the whole table (not the number of rows, but size in MB / GB). What is the table structure - what indices are defined on it, etc.? What kind of SELECT queries do you execute on the table / partitions? Aggregations or simple queries? Have you executed ANALYZE on all the partitions after loading the data? What are the EXPLAIN plan for the slow SELECT queries? Anyway 300 partitions for 200 threads seems a little bit too much to me. I'd use something like 10 partitions or something like that. What strategy have you chosen to redirect the inserts into the partitions, i.e. how do you determine the partition the insert should be written to? Maybe I missed something, but what is the CPU and I/O load? In other words, is the system CPU bound or I/O bound? > The selects that need to evaluate all partitions were slow too, but I > think I can wait for them. :D > > I wonder if the update are slow too. I do not know that. > > Do I need to do a trigger for insert only or I need a trigger to > update and delete too? If you have created the queries using "INHERITS" then all you need to do is redirect inserts - either using a RULE, a BEFORE INSERT trigger, or a stored procedure. Each of these options has advandages / disadvantages: Rules are quite easy to maintain (once you create a new partition you just need to create a new rule), but may have serious overhead in case of many partitions as you have to evaluate all rules . Triggers are not as easy to maintain as all the tables have to be in a single procedure, and adding / removing a partition means modifying the procedure. On the other side the performance may be better in case of many partitions. Both the solutions mentioned above have the advantage of transparency, i.e. the clients don't need to know about them. Stored procedures have the advantages and disadvanteges of a trigger, plus they have to be invoked by the client. Tomas
pgsql-performance by date: