Writing 1100 rows per second

From: Arya F
I'm looking to write about 1100 rows per second to tables of up to 100 million rows. I'm trying to come up with a design where I can do all the writes to a database with no indexes, because with indexes the write performance slows down dramatically once a table grows beyond 30 million rows.


I was thinking of having a server dedicated to all the writes and a second server, with indexes, for reads, using logical replication to keep the read-only server up to date.
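Roughly what I have in mind (the names below are just placeholders):

    -- on the dedicated write server (table kept without indexes)
    CREATE PUBLICATION writes_pub FOR TABLE big_table;

    -- on the read server, which carries all the indexes
    CREATE SUBSCRIPTION reads_sub
        CONNECTION 'host=write-server dbname=mydb user=replicator'
        PUBLICATION writes_pub;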


Would that work? Or do you have any recommendations for how I can achieve good performance for a lot of writes?

Thank you

Re: Writing 1100 rows per second

From: Laurenz Albe
On Wed, 2020-02-05 at 12:03 -0500, Arya F wrote:
> I'm looking to write about 1100 rows per second to tables of up to 100 million rows. I'm trying to
> come up with a design where I can do all the writes to a database with no indexes, because with
> indexes the write performance slows down dramatically once a table grows beyond 30 million rows.
> 
> I was thinking of having a server dedicated to all the writes and a second server, with indexes,
> for reads, using logical replication to keep the read-only server up to date.
> 
> Would that work? Or do you have any recommendations for how I can achieve good performance for a lot of writes?

Logical replication wouldn't make a difference, because with many indexes, replay of the
inserts would be slow as well, and replication would lag more and more.

No matter what you do, there will be no magic way to have your tables indexed and
have fast inserts at the same time.

One idea I can come up with is a table that is partitioned by a column that appears
in a selective search condition, but with no indexes on the table, so that you always get
away with a sequential scan of a single partition.
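A minimal sketch of what that could look like (table, column and range values are made up; the partition key would be whatever column your selective search condition uses):

    CREATE TABLE readings (
        created_at timestamptz NOT NULL,
        device_id  bigint NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE readings_2020_02 PARTITION OF readings
        FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');

    -- no indexes anywhere; a query filtering on created_at is pruned to a
    -- single partition and does a sequential scan of just that partition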

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Writing 1100 rows per second

From: Justin Pryzby
On Wed, Feb 05, 2020 at 12:03:52PM -0500, Arya F wrote:
> I'm looking to write about 1100 rows per second to tables of up to 100 million
> rows. I'm trying to come up with a design where I can do all the writes to a
> database with no indexes, because with indexes the write performance slows
> down dramatically once a table grows beyond 30 million rows.
> 
> I was thinking of having a server dedicated to all the writes and a second
> server, with indexes, for reads, using logical replication to keep the
> read-only server up to date.

Wouldn't the read-only server still have bad performance for all the writes
being replicated to it?

> Would that work? Or do you have any recommendations for how I can achieve
> good performance for a lot of writes?

Can you use partitioning so that the writes mostly affect only one partition at
a time, and its indexes are of a reasonable size, such that they can fit easily
in shared_buffers?

BRIN indexes may help for some of them, but likely not for all of your indexes.
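For example (names are made up; a BRIN index only pays off when the column's values correlate with the physical row order, e.g. an ever-increasing timestamp):

    CREATE INDEX ON readings USING brin (created_at);
    -- orders of magnitude smaller than a btree on the same column,
    -- and much cheaper to maintain during inserts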

Justin



Re: Writing 1100 rows per second

From: Arya F
If I run the database on a server that has enough RAM to hold all the indexes and tables in memory, and the indexes on the HDD were only updated every x seconds, would that work to increase performance dramatically?

On Wed, Feb 5, 2020, 12:15 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Wed, Feb 05, 2020 at 12:03:52PM -0500, Arya F wrote:
> I'm looking to write about 1100 rows per second to tables of up to 100 million
> rows. I'm trying to come up with a design where I can do all the writes to a
> database with no indexes, because with indexes the write performance slows
> down dramatically once a table grows beyond 30 million rows.
>
> I was thinking of having a server dedicated to all the writes and a second
> server, with indexes, for reads, using logical replication to keep the
> read-only server up to date.

Wouldn't the read-only server still have bad performance for all the writes
being replicated to it?

> Would that work? Or do you have any recommendations for how I can achieve
> good performance for a lot of writes?

Can you use partitioning so that the writes mostly affect only one partition at
a time, and its indexes are of a reasonable size, such that they can fit easily
in shared_buffers?

BRIN indexes may help for some of them, but likely not for all of your indexes.

Justin

Re: Writing 1100 rows per second

From: Haroldo Kerry
Arya,
We ran into the same issue of decreasing insert performance with tables of hundreds of millions of rows, and it was indeed due to index updates.
We tested TimescaleDB (a PostgreSQL extension) with success in all of our use cases. It does "behind the scenes" single-level partitioning that is indeed very efficient.
I can't speak to the 1100 inserts/s figure, as that is hardware dependent, but we did get a flat response curve: inserts per second stayed stable into the hundreds of millions of rows, regardless of indexes.
My suggestion: have a look at https://blog.timescale.com/timescaledb-vs-6a696248104e/ and do some PoCs.
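For reference, turning a plain table into a hypertable is essentially a one-liner (table and column names here are only an example):

    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE metrics (
        created_at timestamptz NOT NULL,
        device_id  bigint NOT NULL,
        value      double precision
    );

    -- partitions ("chunks") the table behind the scenes by the time column
    SELECT create_hypertable('metrics', 'created_at');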

Regards,
Haroldo Kerry

On Wed, Feb 5, 2020 at 2:25 PM Arya F <arya6000@gmail.com> wrote:
If I run the database on a server that has enough RAM to hold all the indexes and tables in memory, and the indexes on the HDD were only updated every x seconds, would that work to increase performance dramatically?



--

Haroldo Kerry

CTO/COO

Rua do Rócio, 220, 7° andar, conjunto 72

São Paulo – SP / CEP 04552-000

hkerry@callix.com.br

www.callix.com.br

Re: Writing 1100 rows per second

From: Ogden Brash


On Wed, Feb 5, 2020 at 9:12 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
One idea I can come up with is a table that is partitioned by a column that appears
in a selective search condition, but with no indexes on the table, so that you always get
away with a sequential scan of a single partition.


This is an approach that I am currently using successfully. We have a large dataset that is continually being computed, so insert speed is important to us. The DB currently has about 45 billion rows. There are three columns that are involved in all searches of the data, and we have a separate table for every unique combination of those three values (which gives us about 2000 tables). That also saves the space of storing those columns, since the name of a table defines what the three values are for every row in it. We don't have any indexes on those tables (except for the default one created for the serial primary key). As a result, every search involves exactly one table and a sequential scan of that table. The logic to choose the correct table for insertions or searches lives in our application code, not in SQL.

The sizes of the 2000 tables form a Gaussian distribution, so our largest table has about a billion rows, and many tables have hundreds of millions of rows. The ongoing insertions follow the same distribution, so the bulk of the inserts goes into the largest tables. It is not a speed demon, and I have not run tests recently, but back-of-the-envelope calculations give me confidence that we are definitely inserting more than 1100 rows per second. And that is running the server on an old, puny i5 processor with regular HDDs and only 32 GB of memory.
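Conceptually, each of those ~2000 tables looks something like this (the names are invented for illustration; the three search values are encoded in the table name instead of being stored as columns):

    CREATE TABLE results_a1_b2_c3 (
        id      bigserial PRIMARY KEY,  -- the only index on the table
        payload jsonb                   -- stands in for the actual data columns
    );
    -- the application maps the three values (a1, b2, c3) to this table name
    -- and sends its INSERTs and sequential-scan SELECTs straight to it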

Re: Writing 1100 rows per second

From: Jeff Janes
On Wed, Feb 5, 2020 at 12:25 PM Arya F <arya6000@gmail.com> wrote:
If I run the database on a server that has enough RAM to hold all the indexes and tables in memory, and the indexes on the HDD were only updated every x seconds, would that work to increase performance dramatically?

Perhaps.  Probably not dramatically though.  If x seconds (called a checkpoint) is not long enough for the entire index to have been dirtied, then my finding is that writing half of the pages (randomly interspersed) of a file, even in block order, still has the horrid performance of a long sequence of random writes, not the much better performance of a handful of sequential writes.  Although this probably depends strongly on your RAID controller and OS version and such, so you should try it for yourself on your own hardware.
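If you do experiment with that on your own hardware, these are the settings that control how far apart checkpoints are (values are purely illustrative, not recommendations):

    ALTER SYSTEM SET checkpoint_timeout = '30min';
    ALTER SYSTEM SET max_wal_size = '64GB';
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();  -- these settings only need a reload, not a restart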

Cheers,

Jeff

Re: Writing 1100 rows per second

From: Amol Tarte
I am using Npgsql in my C# program. I have a table into which we need to dump about 5.5 million rows from a flat file once every 24 hours, and that will soon grow to 10 million rows.
Using Npgsql's binary import, all 5.5 million inserts are done in under 3 minutes.
We then create two indexes on the table, which takes 25-30 seconds.
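The plain-SQL shape of what the C# code does through Npgsql's binary COPY API (table, column, and file names are made up):

    -- previous day's indexes already dropped (or the table recreated)
    TRUNCATE daily_import;

    -- bulk load with no indexes in place
    COPY daily_import FROM '/data/daily_dump.bin' WITH (FORMAT binary);

    -- build the two indexes only after the load
    CREATE INDEX ON daily_import (customer_id);
    CREATE INDEX ON daily_import (created_at);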

With Warm Regards,

Amol P. Tarte
Project Manager,
Rajdeep Info Techno Pvt. Ltd.


On Mon, Feb 10, 2020 at 1:00 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Feb 5, 2020 at 12:25 PM Arya F <arya6000@gmail.com> wrote:
If I run the database on a server that has enough RAM to hold all the indexes and tables in memory, and the indexes on the HDD were only updated every x seconds, would that work to increase performance dramatically?

Perhaps.  Probably not dramatically though.  If x seconds (called a checkpoint) is not long enough for the entire index to have been dirtied, then my finding is that writing half of the pages (randomly interspersed) of a file, even in block order, still has the horrid performance of a long sequence of random writes, not the much better performance of a handful of sequential writes.  Although this probably depends strongly on your RAID controller and OS version and such, so you should try it for yourself on your own hardware.

Cheers,

Jeff