Re: concurrent inserts into two separate tables are very slow - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: concurrent inserts into two separate tables are very slow
Date
Msg-id dcc563d10801071508n7297eeeaw121ff16d3dc262cd@mail.gmail.com
In response to Re: concurrent inserts into two separate tables are very slow  (Sergei Shelukhin <realgeek@gmail.com>)
List pgsql-performance
On Jan 7, 2008 4:49 PM, Sergei Shelukhin <realgeek@gmail.com> wrote:
>
> Scott Marlowe wrote:
> > On Jan 5, 2008 9:00 PM, Sergei Shelukhin <realgeek@gmail.com> wrote:
> >
> >> Hi. Running postgres 8.2 on debian.
> >> I've noticed that concurrent inserts (archiving) of large batches of data
> >> into two completely unrelated tables are many times slower than the
> >> same inserts done in sequence.
> >> Is there any way to speed them up apart from buying faster HDs/
> >> changing RAID configuration?
> >>
> >
> > What method are you using to load these data?  Got a short example
> > that illustrates what you're doing?
> >
> >
> The basic structure is as follows: there are several tables with
> transaction data that is stored for one month only.
> The data comes from several sources in different formats and is pushed
> in using a custom script.
> The script takes the source data and puts it into a table it creates
> (the import table) with the same schema as the main table; then it
> deletes the month-old data from the main table. It also searches for
> duplicates in the main table using some specific criteria and deletes
> them too: to make use of the indexes, a second temp table with a single
> id int column is created and populated with one insert ... select query
> holding the transaction ids of rows duplicated between the main and
> import tables, after which delete from pages where id in (select id
> from 2nd-temp-table) is called. Then the script inserts the remainder
> of the import table into the main table.
> There are several data load processes that function in the same manner
> with different target tables.
> When they run in sequence, they take about 20 minutes to complete on
> average. If, however, they run in parallel, they can take up to 3
> hours... I was wondering if it's solely an HD bottleneck, given that
> there's plenty of CPU and RAM available and postgres is configured to
> use it.

Ahh, thanks for the more detailed explanation.  Now I get what you're facing.
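
In outline, the load you describe boils down to something like the SQL
below; the table and column names (main_data, import_data, dup_ids,
txn_id, created_at) are placeholders for whatever your real schema uses:

-- Stage the incoming batch in an import table with the main table's schema
CREATE TABLE import_data (LIKE main_data INCLUDING DEFAULTS);
-- ... bulk load the source data into import_data, e.g. with COPY ...

-- Drop the month-old rows from the main table
DELETE FROM main_data WHERE created_at < now() - interval '1 month';

-- Collect the duplicate transaction ids in a helper table so the
-- delete can use the indexes
CREATE TABLE dup_ids (id int);
INSERT INTO dup_ids (id)
  SELECT m.txn_id
    FROM main_data m
    JOIN import_data i ON i.txn_id = m.txn_id;  -- your duplicate criteria

DELETE FROM main_data WHERE txn_id IN (SELECT id FROM dup_ids);

-- Insert what's left of the import into the main table
INSERT INTO main_data SELECT * FROM import_data;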

There are a few things you could do that would probably help, and doing
more than one of them should help even more.

1: Buy a decent battery-backed caching RAID controller.  This will
smooth out writes a lot.  If you can't afford that...
2: Build a nice big RAID-10 array, say 8 to 14 discs.
3: Put pg_xlog on a physically separate drive from the rest of the database.
4: Put each table being inserted into on a separate physical hard drive
(see the sketch after this list).
5: Stop writing to multiple tables at once.
6: (Not recommended) run with fsync turned off.
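
For item 4, a minimal sketch using tablespaces; the mount point
/mnt/disk2/pgdata and the table name main_data are made up, the
directory has to exist and be owned by the postgres user, and CREATE
TABLESPACE needs superuser:

-- Create a tablespace on the second drive (run as a superuser)
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

-- Move one of the heavily written tables onto it
ALTER TABLE main_data SET TABLESPACE disk2;

For item 3, pg_xlog can't be moved with a tablespace; the usual approach
is to stop the server, move the directory onto the other drive, and
leave a symlink in its place.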

Each of these things can help on its own.  My personal preference for
heavily written databases is a good RAID controller with battery-backed
cache enabled and a lot of discs in RAID-10 or RAID-6 (depending on the
read-versus-write ratio and the need for storage space).  RAID-10 is
normally better for performance; RAID-6 with large arrays is better for
maximizing capacity while maintaining decent performance and
reliability.  RAID-5 is right out.
