Thread: Scalability question
Hi,

I have a question about scalability in a high-volume insert situation where the table has a primary key and several non-unique indexes on other columns of the table. How does PostgreSQL behave in terms of scalability? The high volume of inserts comes from multiple transactions.

Best regards,
Zoltán Böszörményi

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
> Hi,
>
> I have a question about scalability in a high-volume insert situation
> where the table has a primary key and several non-unique indexes
> on other columns of the table. How does PostgreSQL behave
> in terms of scalability? The high volume of inserts comes from
> multiple transactions.
>
> Best regards,
> Zoltán Böszörményi

Well, that's a difficult question, as it depends on hardware and software, but with proper tuning the results may be very good. Just do the basic PostgreSQL tuning and then tune for INSERT performance if needed.

It's difficult to give any other recommendations without more detailed knowledge of the problem, but consider these hints:

1) move pg_xlog to a separate drive (so that writes to it stay sequential)
2) move the table with the large number of inserts to a separate tablespace
3) minimize the number of indexes, etc.

The basic rule is that each index adds some overhead to the insert, but how much depends on the datatype, etc. Just prepare some data to import, run the insert with and without the indexes, and compare the times.

Tomas
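The with/without-indexes comparison suggested above could look roughly like the following psql session. This is only a sketch: the table `insert_test`, its columns, the index names, and the row count are made up for illustration, and `\timing on` is a psql meta-command rather than SQL (older psql versions accept only a bare `\timing`, which toggles).

```sql
-- Hypothetical test table; names and sizes are illustrative only.
CREATE TABLE insert_test (
    id bigint PRIMARY KEY,
    a  integer,
    b  text
);

-- psql meta-command: print the elapsed time of each statement.
\timing on

-- Run 1: only the primary key index exists.
INSERT INTO insert_test (id, a, b)
SELECT g, g % 1000, md5(g::text)
FROM generate_series(1, 1000000) AS g;

-- Reset the table, add the non-unique indexes, and repeat.
TRUNCATE insert_test;
CREATE INDEX insert_test_a_idx ON insert_test (a);
CREATE INDEX insert_test_b_idx ON insert_test (b);

-- Run 2: primary key plus two non-unique indexes.
INSERT INTO insert_test (id, a, b)
SELECT g, g % 1000, md5(g::text)
FROM generate_series(1, 1000000) AS g;
```

Note that this measures a single bulk insert from one session. It shows the per-index overhead, but it says nothing about contention between concurrent transactions; for that, the same inserts would have to be driven from several sessions at once.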
tv@fuzzy.cz wrote:
> Well, that's a difficult question, as it depends on hardware and software,
> but with proper tuning the results may be very good. Just do the basic
> PostgreSQL tuning and then tune for INSERT performance if needed.
> It's difficult to give any other recommendations without more detailed
> knowledge of the problem, but consider these hints:
>
> 1) move pg_xlog to a separate drive (so that writes to it stay sequential)
> 2) move the table with the large number of inserts to a separate tablespace
> 3) minimize the number of indexes, etc.
>
> The basic rule is that each index adds some overhead to the insert, but how
> much depends on the datatype, etc. Just prepare some data to import, run the
> insert with and without the indexes, and compare the times.
>
> Tomas

Thanks. The question is more about how this works in theory. For example, if the INSERTs add "similar" records with identical index keys (these are non-unique indexes), does that cause contention? These similar records add index tuples that are supposed to be near each other in the btree.

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
Zoltan Boszormenyi wrote:
> Hi,
>
> I have a question about scalability in a high-volume insert situation
> where the table has a primary key and several non-unique indexes
> on other columns of the table. How does PostgreSQL behave
> in terms of scalability? The high volume of inserts comes from
> multiple transactions.

B-tree and GiST indexes can have multiple concurrent insertions in flight. The potential for blocking is in UNIQUE indexes: if two transactions try to insert the same value into a unique index, the second one will block until the first transaction finishes.

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
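The unique-index blocking described above can be reproduced with two psql sessions. The sketch below assumes a throwaway table named `uniq_demo` (a hypothetical name); the statements are meant to be typed into two separate connections in the order shown, not run as one script.

```sql
-- Setup (either session). The primary key creates a unique index.
CREATE TABLE uniq_demo (k integer PRIMARY KEY);

-- Session 1: insert a key and leave the transaction open.
BEGIN;
INSERT INTO uniq_demo VALUES (1);

-- Session 2: insert the same key. This statement waits,
-- because the outcome depends on what session 1 does.
BEGIN;
INSERT INTO uniq_demo VALUES (1);

-- Session 1: finish the transaction.
COMMIT;
-- Session 2's INSERT now fails with a unique-violation error.
-- Had session 1 issued ROLLBACK instead, session 2's INSERT
-- would have succeeded.
```

Inserting the same value twice into a column that has only a non-unique index does not wait in this way, since there is no uniqueness check to resolve.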
On Wed, Jun 11, 2008 at 3:56 AM, Zoltan Boszormenyi <zb@cybertec.at> wrote:
> Hi,
>
> I have a question about scalability in a high-volume insert situation
> where the table has a primary key and several non-unique indexes
> on other columns of the table. How does PostgreSQL behave
> in terms of scalability? The high volume of inserts comes from
> multiple transactions.

PostgreSQL supports initial fill rates of < 100% for indexes, so set it to 50% filled and new entries that live near current entries will have room to be added without having to split the btree page.

PostgreSQL also allows you to easily put your indexes on other partitions / drive arrays, etc.

PostgreSQL does NOT store visibility info in the indexes, so they stay small and updates to them are pretty fast.
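For reference, the fill rate and index placement mentioned above are set through the `fillfactor` storage parameter and the `TABLESPACE` clause of `CREATE INDEX`. The following is a sketch only: the tablespace name `index_space`, the directory path, and the table and index names are assumptions made for illustration, and 50 is simply the value suggested in the message, not a tuned recommendation.

```sql
-- Hypothetical tablespace on a separate drive array. The directory
-- must already exist and be owned by the PostgreSQL server user.
CREATE TABLESPACE index_space LOCATION '/mnt/array2/pg_indexes';

-- Hypothetical table with one non-unique indexed column.
CREATE TABLE events (
    id     bigint PRIMARY KEY,
    status integer
);

-- Build the index with leaf pages only 50% full, leaving room for
-- new entries that sort near existing ones, and store it in the
-- separate tablespace.
CREATE INDEX events_status_idx
    ON events (status)
    WITH (fillfactor = 50)
    TABLESPACE index_space;
```

A lower fillfactor trades a larger index for fewer page splits, so whether it helps is best checked by measuring insert times on realistic data.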