Thread: [PERFORM] Insert Concurrency
I come from an Oracle background and am porting an application to Postgres. The app has a table that will contain 100 million rows and has to be loaded by a process that reads messages off an SQS queue and makes web service calls to insert records one row at a time into a Postgres RDS instance. I know slow-by-slow is not the ideal approach, but I was wondering if Postgres had partitioning or other ways to tune concurrent insert statements. The process will run 50-100 concurrent threads.
On 18 April 2017 at 14:55, ROBERT PRICE <rprice504@hotmail.com> wrote:
> I come from an Oracle background and am porting an application to postgres.
> App has a table that will contain 100 million rows and has to be loaded by a
> process that reads messages off a SQS queue and makes web service calls to
> insert records one row at a time in a postgres RDS instance. I know slow by
> slow is not the ideal approach but I was wondering if postgres had
> partitioning or other ways to tune concurrent insert statements. Process
> will run 50 - 100 concurrent threads.

Have you tested performance and noticed that it is insufficient for your
needs? Or do you just assume PostgreSQL suffers from the same issue as Oracle
in regards to INSERT contention on a single table?

You may like to look at pgbench [1] to test the performance if you've not
done so already.

[1] https://www.postgresql.org/docs/9.6/static/pgbench.html

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
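As a concrete illustration of the pgbench suggestion, here is a minimal sketch of a custom script that reproduces the single-row-insert pattern. The messages table, its columns, and the mydb database name are placeholders, not anything from this thread. The script would be run with something like "pgbench -n -c 50 -j 4 -T 60 -f insert_one.sql mydb", where -c 50 -j 4 opens 50 client connections driven by 4 worker threads for 60 seconds, roughly matching the 50-100 concurrent writers described above.

    -- Hypothetical target table (names are placeholders).
    CREATE TABLE messages (
        id      bigserial PRIMARY KEY,
        payload text NOT NULL
    );

    -- insert_one.sql: pgbench runs this script once per transaction, so
    -- each client performs one single-row INSERT per commit, mimicking
    -- the "one message per web service call" pattern.
    INSERT INTO messages (payload) VALUES (md5(clock_timestamp()::text));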
On 18 Apr 2017, at 04:55, ROBERT PRICE <rprice504@hotmail.com> wrote:
> I come from an Oracle background and am porting an application to postgres.
> App has a table that will contain 100 million rows and has to be loaded by a
> process that reads messages off a SQS queue and makes web service calls to
> insert records one row at a time in a postgres RDS instance. I know slow by
> slow is not the ideal approach but I was wondering if postgres had
> partitioning or other ways to tune concurrent insert statements. Process
> will run 50 - 100 concurrent threads.

But if you are going to move a record at a time you are going to be limited
by the fastest transaction rate you can achieve, which is going to be a few
hundred per second, and limited in the end by the disk hardware you have.
Out of the box and on commodity hardware it can take you up to ten days to
move 100M records.
On Tue, Apr 18, 2017 at 2:45 AM, Daniel Blanch Bataller
<daniel.blanch.bataller@gmail.com> wrote:
> But if you are going to move a record at a time you are going to be limited
> by the fastest transaction rate you can achieve, which is going to be a few
> hundred per second, and limited in the end by the disk hardware you have.
> Out of the box and on commodity hardware it can take you up to ten days to
> move 100M records.

RDS usually is not commodity hardware; most RDS instances will have some form
of SSD storage, so performance could be much higher than what you'd get on
your laptop.

I'd have to second David's advice: test with pgbench first. It can quite
accurately simulate your use case.
On Mon, Apr 17, 2017 at 8:55 PM, ROBERT PRICE <rprice504@hotmail.com> wrote:
> I come from an Oracle background and am porting an application to postgres.
> App has a table that will contain 100 million rows and has to be loaded by a
> process that reads messages off a SQS queue and makes web service calls to
> insert records one row at a time in a postgres RDS instance. I know slow by
> slow is not the ideal approach but I was wondering if postgres had
> partitioning or other ways to tune concurrent insert statements. Process
> will run 50 - 100 concurrent threads.

It's not uncommon to look for an Oracle solution while working with another
rdbms. Often what works in one engine doesn't work the same, or as well, in
another.

Is it possible for you to roll up some of these inserts into a single
transaction in some way? Even inserting ten rows at a time instead of one at
a time can make a big difference in your insert rate, and being able to roll
up 100 or more together helps even more.

Another possibility is to insert them into a smaller staging table, then have
a process come along every so often, insert all the rows from there into the
big table, and then delete them or truncate the staging table (for truncate
you'll need to lock the table so you don't lose rows).

--
To understand recursion, one must first understand recursion.
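A rough sketch of both of those suggestions, reusing the hypothetical messages table from the pgbench sketch above (all table and column names remain placeholders):

    -- 1. Batching: many rows per statement and per transaction instead of
    --    one transaction per row.
    BEGIN;
    INSERT INTO messages (payload) VALUES
        ('message 1'),
        ('message 2'),
        ('message 3');          -- ...extend to 100+ rows per statement
    COMMIT;

    -- 2. Staging table: writers insert into a small table, and a periodic
    --    job moves everything across in bulk.
    CREATE TABLE messages_staging (payload text NOT NULL);

    BEGIN;
    -- Block new inserts into the staging table so no rows arrive between
    -- the copy and the truncate.
    LOCK TABLE messages_staging IN EXCLUSIVE MODE;
    INSERT INTO messages (payload) SELECT payload FROM messages_staging;
    TRUNCATE messages_staging;
    COMMIT;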
>> To understand recursion, one must first understand recursion.

This makes no sense unless you also provide the base case.

David
Thanks everyone. I decided to have the SQS process changed to create CSV files in an S3 bucket. Then we have a process that uses the COPY command to load the data. The process is loading 500,000 records in around 4 minutes, which should be good enough for now. Going to look at pg_citus to get up to speed on Postgres partitioning for a future need.
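For anyone landing on this thread later, a minimal sketch of the two pieces mentioned here: a client-side bulk load with psql's \copy (which streams a local CSV through the connection, so it works against a remote RDS instance), and native declarative partitioning, which requires PostgreSQL 10 or later. The file name, table names, columns, and date range are all placeholders, not details from this thread.

    -- Bulk-load one CSV file from the client machine.
    \copy messages (payload) FROM 'batch_0001.csv' WITH (FORMAT csv)

    -- Declarative range partitioning (PostgreSQL 10+), sketched for the
    -- future partitioning need mentioned above.
    CREATE TABLE messages_part (
        id         bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE messages_2017_04 PARTITION OF messages_part
        FOR VALUES FROM ('2017-04-01') TO ('2017-05-01');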