Re: tables with 300+ partitions - Mailing list pgsql-performance

From Tomáš Vondra
Subject Re: tables with 300+ partitions
Date
Msg-id 4728C208.8080501@fuzzy.cz
Whole thread Raw
In response to Re: tables with 300+ partitions  (Pablo Alcaraz <pabloa@laotraesquina.com.ar>)
List pgsql-performance
 > Steven Flatt wrote:
 >> On 10/30/07, *Pablo Alcaraz* <pabloa@laotraesquina.com.ar
 >> <mailto:pabloa@laotraesquina.com.ar>> wrote:
 >>
 >>     I did some testing. I created a 300 partitioned empty table. Then, I
 >>     inserted some rows on it and the perfomance was SLOW too.
 >>
 >> Is the problem with inserting to the partitioned table or selecting
 >> from it?  It sounds like inserting is the problem in which case I
 >> ask: how are you redirecting inserts to the appropriate partition?
 >> If you're using rules, then insert performance will quickly degrade
 >> with number of partitions as *every* rule needs to be evaluated for
 >> *every* row inserted to the base table.  Using a trigger which you
 >> can modify according to some schedule is much faster, or better yet,
 >> use some application-level logic to insert directly to the desired
 >> partition.
 >>
 >> Steve
 > I was a program inserting into the base table. The program ran in 200+
 > threads and every thread insert data on it. Every thread inserts a row
 > every 3 seconds aprox.(or they used to do it), but when I put more
 > partitions the insert speed went to 1 insert every 2 minutes.

OK, that gives about 70 inserts per second - depending on the amount of
data inserted this may or may not be manageable. What is the size of the
data the threads are writing with each insert, or what is the size of
the whole table (not the number of rows, but size in MB / GB). What is
the table structure - what indices are defined on it, etc.?

What kind of SELECT queries do you execute on the table / partitions?
Aggregations or simple queries? Have you executed ANALYZE on all the
partitions after loading the data? What are the EXPLAIN plan for the
slow SELECT queries?

Anyway 300 partitions for 200 threads seems a little bit too much to me.
I'd use something like 10 partitions or something like that. What
strategy have you chosen to redirect the inserts into the partitions,
i.e. how do you determine the partition the insert should be written to?

Maybe I missed something, but what is the CPU and I/O load? In other
words, is the system CPU bound or I/O bound?

 > The selects that need to evaluate all partitions were slow too, but I
 > think I can wait for them. :D
 >
 > I wonder if the update are slow too. I do not know that.
 >
 > Do I need to do a trigger for insert only or I need a trigger to
 > update and delete too?

If you have created the queries using "INHERITS" then all you need to do
is redirect inserts - either using a RULE, a BEFORE INSERT trigger, or a
stored procedure. Each of these options has advandages / disadvantages:

Rules are quite easy to maintain (once you create a new partition you
just need to create a new rule), but may have serious overhead in case
of many partitions as you have to evaluate all rules .

Triggers are not as easy to maintain as all the tables have to be in a
single procedure, and adding / removing a partition means modifying the
procedure. On the other side the performance may be better in case of
many partitions.

Both the solutions mentioned above have the advantage of transparency,
i.e. the clients don't need to know about them. Stored procedures have
the advantages and disadvanteges of a trigger, plus they have to be
invoked by the client.

  Tomas

pgsql-performance by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: hardware and For PostgreSQL
Next
From: Joe Uhl
Date:
Subject: Re: hardware and For PostgreSQL