Thread: [PERFORM] Insert Concurrency

[PERFORM] Insert Concurrency

From
ROBERT PRICE
Date:

I come from an Oracle background and am porting an application to postgres. App has a table that will contain 100 million rows and has to be loaded by a process that reads messages off a SQS queue and makes web service calls to insert records one row at a time in a postgres RDS instance. I know slow by slow is not the ideal approach but I was wondering if postgres had partitioning or other ways to tune concurrent insert statements. Process will run 50 - 100 concurrent threads.

Re: [PERFORM] Insert Concurrency

From
David Rowley
Date:
On 18 April 2017 at 14:55, ROBERT PRICE <rprice504@hotmail.com> wrote:
> I come from an Oracle background and am porting an application to postgres.
> App has a table that will contain 100 million rows and has to be loaded by a
> process that reads messages off a SQS queue and makes web service calls to
> insert records one row at a time in a postgres RDS instance. I know slow by
> slow is not the ideal approach but I was wondering if postgres had
> partitioning or other ways to tune concurrent insert statements. Process
> will run 50 - 100 concurrent threads.

Have you tested performance and noticed that it is insufficient for
your needs? or do you just assume PostgreSQL suffers from the same
issue as Oracle in regards to INSERT contention on a single table?

You may like to look at pgbench [1] to test the performance if you've
not done so already.

[1] https://www.postgresql.org/docs/9.6/static/pgbench.html
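
A short custom script gets close to this workload; a minimal sketch, with hypothetical table and column names (pgbench 9.6+ \set syntax):

    -- insert_bench.sql: one single-row insert per transaction
    -- run with e.g.: pgbench -n -f insert_bench.sql -c 50 -j 4 -T 60 testdb
    \set id random(1, 100000000)
    INSERT INTO target_table (id, payload) VALUES (:id, 'x');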

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [PERFORM] Insert Concurrency

From
Daniel Blanch Bataller
Date:
Yes, postgres has partitions: https://www.postgresql.org/docs/9.6/static/ddl-partitioning.html

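In 9.6 that means inheritance-based partitioning; a minimal sketch, with hypothetical table and column names:

    CREATE TABLE measurements (
        id      bigint NOT NULL,
        logdate date   NOT NULL,
        payload text
    );

    -- One child table per month; the CHECK constraint lets the planner
    -- skip irrelevant children when constraint_exclusion is on.
    CREATE TABLE measurements_2017_04 (
        CHECK (logdate >= DATE '2017-04-01' AND logdate < DATE '2017-05-01')
    ) INHERITS (measurements);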

But this is not going to help much in the scenario you have. 

Postgres can ingest data very, very fast, 100M records in seconds to minutes, faster than Oracle can serve it in many scenarios (all the ones I have tested).

Especially if you use the COPY command: https://www.postgresql.org/docs/9.6/static/sql-copy.html

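A minimal sketch (file path and table name are hypothetical):

    -- Server-side bulk load from a CSV file:
    COPY target_table (id, payload) FROM '/data/load.csv' WITH (FORMAT csv);

    -- Or client-side through psql, reading a local file:
    -- \copy target_table (id, payload) FROM 'load.csv' WITH (FORMAT csv)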

and even faster if you use the unlogged feature: https://www.postgresql.org/docs/9.6/static/sql-createtable.html

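For example (staging table name is hypothetical; note an unlogged table is truncated after a crash, so only use it for data you can reload):

    -- Skip WAL during the load by writing to an unlogged table.
    CREATE UNLOGGED TABLE staging (LIKE target_table INCLUDING ALL);
    -- ... bulk load into staging with COPY ...
    ALTER TABLE staging SET LOGGED;  -- 9.5+; rewrites the table through the WAL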

You can tune postgres to make it even faster, but that’s not normally necessary; the two pieces of advice above are more than enough. If I remember correctly, you can move 100M records in ~2 minutes.



But if you are going to move a record at a time, you are going to be limited by the fastest transaction rate you can achieve, which is going to be a few hundred per second, and limited in the end by the disk hardware you have. At, say, 200 transactions per second, 100M single-row inserts take 100,000,000 / 200 = 500,000 seconds, close to six days; out of the box and on commodity hardware it can take you up to ten days to move 100M records.

So, my recommendation is to find a way to batch record insertions using COPY; the benefits you can achieve tuning postgres are going to be marginal compared with COPY.

Regards

Daniel Blanch.


On 18 Apr 2017, at 4:55, ROBERT PRICE <rprice504@hotmail.com> wrote:

I come from an Oracle background and am porting an application to postgres. App has a table that will contain 100 million rows and has to be loaded by a process that reads messages off a SQS queue and makes web service calls to insert records one row at a time in a postgres RDS instance. I know slow by slow is not the ideal approach but I was wondering if postgres had partitioning or other ways to tune concurrent insert statements. Process will run 50 - 100 concurrent threads.

Re: [PERFORM] Insert Concurrency

From
Claudio Freire
Date:
On Tue, Apr 18, 2017 at 2:45 AM, Daniel Blanch Bataller
<daniel.blanch.bataller@gmail.com> wrote:
>
> But if you are going to move a record at a time, you are going to be limited
> by the fastest transaction rate you can achieve, which is going to be a few
> hundred per second, and limited in the end by the disk hardware you have.
> Out of the box and on commodity hardware it can take you up to ten days to
> move 100M records.

RDS usually is not commodity hardware, most RDS instances will have
some form of SSD storage, so performance could be much higher than
what you'd get on your laptop.

I'd have to second David's advice: test with pgbench first. It can
quite accurately simulate your use case.


Re: [PERFORM] Insert Concurrency

From
Scott Marlowe
Date:
On Mon, Apr 17, 2017 at 8:55 PM, ROBERT PRICE <rprice504@hotmail.com> wrote:
> I come from an Oracle background and am porting an application to postgres.
> App has a table that will contain 100 million rows and has to be loaded by a
> process that reads messages off a SQS queue and makes web service calls to
> insert records one row at a time in a postgres RDS instance. I know slow by
> slow is not the ideal approach but I was wondering if postgres had
> partitioning or other ways to tune concurrent insert statements. Process
> will run 50 - 100 concurrent threads.

It's not uncommon to look for an Oracle solution while working with
another rdbms. Often what works in one engine doesn't work the same or
as well in another.

Is it possible for you to roll up some of these inserts into a single
transaction in some way? Even inserting ten rows at a time instead of
one at a time can make a big difference in your insert rate. Being
able to roll up 100 or more together even more so.
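
For example (hypothetical table; ten rows per statement and per commit
instead of one):

    BEGIN;
    INSERT INTO target_table (id, payload) VALUES
        (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'),
        (6, 'f'), (7, 'g'), (8, 'h'), (9, 'i'), (10, 'j');
    COMMIT;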

Another possibility is to insert them into a smaller table, then have
a process every so often come along, and insert all the rows there and
then delete them or truncate the table (for truncate you'll need to
lock the table to not lose rows).
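
A sketch of that pattern, with hypothetical table names:

    BEGIN;
    -- EXCLUSIVE mode blocks other writers, so no rows arrive between
    -- the copy and the truncate, while still allowing reads.
    LOCK TABLE insert_buffer IN EXCLUSIVE MODE;
    INSERT INTO big_table SELECT * FROM insert_buffer;
    TRUNCATE insert_buffer;
    COMMIT;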

--
To understand recursion, one must first understand recursion.


Re: [PERFORM] Insert Concurrency

From
David McKelvie
Date:
>> To understand recursion, one must first understand recursion.

This makes no sense unless you also provide the base case.

David


Re: [PERFORM] Insert Concurrency

From
ROBERT PRICE
Date:

Thanks everyone, I decided to have the SQS process changed to create CSV files in an S3 bucket. Then we have a process that uses the COPY command to load the data. The process is loading 500,000 records in around 4 minutes, which should be good enough for now. Going to look at Citus to get up to speed on postgres partitioning for a future need.


From: Scott Marlowe <scott.marlowe@gmail.com>
Sent: Tuesday, April 18, 2017 3:41 PM
To: ROBERT PRICE
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Insert Concurrency
 
On Mon, Apr 17, 2017 at 8:55 PM, ROBERT PRICE <rprice504@hotmail.com> wrote:
> I come from an Oracle background and am porting an application to postgres.
> App has a table that will contain 100 million rows and has to be loaded by a
> process that reads messages off a SQS queue and makes web service calls to
> insert records one row at a time in a postgres RDS instance. I know slow by
> slow is not the ideal approach but I was wondering if postgres had
> partitioning or other ways to tune concurrent insert statements. Process
> will run 50 - 100 concurrent threads.

It's not uncommon to look for an Oracle solution while working with
another rdbms. Often what works in one engine doesn't work the same or
as well in another.

Is it possible for you to roll up some of these inserts into a single
transaction in some way? Even inserting ten rows at a time instead of
one at a time can make a big difference in your insert rate. Being
able to roll up 100 or more together even more so.

Another possibility is to insert them into a smaller table, then have
a process every so often come along, and insert all the rows there and
then delete them or truncate the table (for truncate you'll need to
lock the table to not lose rows).

--
To understand recursion, one must first understand recursion.