Re: Performance on Bulk Insert to Partitioned Table - Mailing list pgsql-performance
From: Charles Gomes
Subject: Re: Performance on Bulk Insert to Partitioned Table
Msg-id: BLU002-W187422443209B681CE83B5FAB360@phx.gbl
In response to: Re: Performance on Bulk Insert to Partitioned Table (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-performance
The BBU does combine the writes. I've benchmarked using a single table, and it took 1:34:21.549959 to insert 1,188,000,000 rows (70 writers to a single table). I've also benchmarked having writers target individual partitions, and they get the same job done in 1 hour. I/O is definitely not the bottleneck. Without changing hardware it accelerates things almost 4 times, so it looks like there is a delay in the way PostgreSQL handles the partitions, or in the time taken for the trigger to select which partition to insert into.

When targeting, I issue commands that insert directly into the partition, "INSERT INTO quotes_DATE VALUES (..,..,..,..)", 10k rows at a time.

When not targeting, I leave it to the trigger to decide:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
    tablename varchar(24);
    bdate varchar(10);
    edate varchar(10);
BEGIN
    tablename = 'quotes_' || to_char(new.received_time,'YYYY_MM_DD');
    EXECUTE 'INSERT INTO '|| tablename ||' VALUES (($1).*)' USING NEW;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

Maybe translating this trigger to C could help, but I haven't heard of anyone who used partitioning with a trigger in C, and I don't have the know-how to do it without examples.

________________________________
> Date: Thu, 20 Dec 2012 19:24:09 -0800
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
> From: jeff.janes@gmail.com
> To: charlesrg@outlook.com
> CC: pgsql-performance@postgresql.org
>
> On Thursday, December 20, 2012, Charles Gomes wrote:
> Jeff,
>
> The 8288 writes are fine, as the array has a BBU, it's fine. You see
> about 4% of the utilization.
>
> BBU is great for latency, but it doesn't do much for throughput, unless
> it is doing write combining behind the scenes. Is it HDD or SSD behind
> the BBU? Have you benchmarked it on randomly scattered 8k writes?
>
> I've seen %util reports that were low while watching a strace showed
> obvious IO freezes. So I don't know how much faith to put into low
> %util.
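For illustration only, here is a minimal sketch of what the direct-targeting client side could look like: compute the partition name the same way the trigger does (`quotes_` plus `YYYY_MM_DD` of `received_time`), then build one multi-row INSERT per 10k-row batch. The function names and row shape are assumptions, not code from the thread, and real code would use parameterized queries rather than string formatting.

```python
from datetime import datetime

def partition_name(received_time: datetime) -> str:
    # Mirror the trigger's naming scheme: quotes_YYYY_MM_DD
    return "quotes_" + received_time.strftime("%Y_%m_%d")

def build_batch_insert(rows, table: str) -> str:
    # One multi-row INSERT per batch; values assumed pre-escaped upstream.
    values = ", ".join(
        "(" + ", ".join(str(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} VALUES {values}"

# Two hypothetical rows bound for the 2012-12-20 partition
t = datetime(2012, 12, 20)
sql = build_batch_insert([(1, 100), (2, 200)], partition_name(t))
```

Because each writer thread computes the target table once per batch instead of once per row inside a trigger, the per-row dispatch cost disappears.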
>
> To target directly instead of doing:
> INSERT INTO TABLE VALUES ()
> I use:
> INSERT INTO TABLE_PARTITION_01 VALUES ()
>
> But how is it deciding what partition to use? Does it have to
> re-decide for every row, or does each thread serve only one partition
> throughout its life and so make the decision only once?
>
> By targeting it I see a huge performance increase.
>
> But is that because by targeting you are bypassing the overhead
> of triggers, or is it because you are loading the rows in an order
> which leads to more efficient index maintenance?
>
> I haven't tested using 1 billion rows in a single table. The issue is
> that in the future it will grow to more than 1 billion rows; it will
> get to about 4 billion rows, and that's when I believe partitioning
> would be a major improvement.
>
> The way that partitioning gives you performance improvements is by you
> embracing the partitioning, for example by targeting the loading to
> just one partition without any indexes, creating indexes, and then
> atomically attaching it to the table. If you wish to have partitions,
> but want to use triggers to hide that partitioning from you, then I
> don't think you can expect to get much of a speed-up through using
> partitions.
>
> Anyway, the way I would approach it would be to load to a single
> un-partitioned table, and also load to a single dummy-partitioned table
> which uses a trigger that looks like the one you want to use for real,
> but directs all rows to a single partition. If these loads take the
> same time, you know it is not the trigger which is limiting.
>
> Cheers,
>
> Jeff
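The dummy-partitioned control Jeff describes could be set up something like the following sketch. The table and partition names (`quotes_dummy`, `quotes_dummy_p0`) are illustrative, not from the thread; the point is that the trigger has the same shape as the real one but never makes a per-row partition choice.

```sql
-- Control experiment: same trigger mechanism as the real setup,
-- but every row is sent to one fixed partition.  If loading through
-- this trigger matches the plain un-partitioned load time, the
-- trigger itself is not the bottleneck.
CREATE OR REPLACE FUNCTION quotes_dummy_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- No to_char()/name computation: the target is constant.
    EXECUTE 'INSERT INTO quotes_dummy_p0 VALUES (($1).*)' USING NEW;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER quotes_dummy_before_insert
    BEFORE INSERT ON quotes_dummy
    FOR EACH ROW EXECUTE PROCEDURE quotes_dummy_insert_trigger();
```

Comparing this load against both the plain table and the real date-dispatching trigger separates the cost of the trigger mechanism (EXECUTE ... USING overhead) from the cost of computing the partition name per row.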