Re: best way to write large data-streams quickly? - Mailing list pgsql-general

From Mark Moellering
Subject Re: best way to write large data-streams quickly?
Date
Msg-id CAA0uU3W4Loyv3Bubo_m9TVp80W0odnq1euthH6X+rjgSKWiSjw@mail.gmail.com
In response to Re: best way to write large data-streams quickly?  (Steve Atkins <steve@blighty.com>)
Responses Re: best way to write large data-streams quickly?  (Jerry Sievers <gsievers19@comcast.net>)
List pgsql-general
On Mon, Apr 9, 2018 at 12:01 PM, Steve Atkins <steve@blighty.com> wrote:

> On Apr 9, 2018, at 8:49 AM, Mark Moellering <markmoellering@psyberation.com> wrote:
>
> Everyone,
>
> We are trying to architect a new system, which will have to take several large datastreams (total of ~200,000 parsed files per second) and place them in a database.  I am trying to figure out the best way to import that sort of data into Postgres.
>
> I keep thinking I can't be the first to have this problem and that there are common solutions, but I can't find any.  Does anyone know of some sort of method, third-party program, etc., that can accept data from a number of different sources and push it into Postgres as fast as possible?

Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the benchmarks for different situations compared to COPY.

Depending on what you're doing, using custom code to parse your data and then doing multiple binary COPYs in parallel may be better.

Cheers,
  Steve
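
To make the "custom code plus binary COPY" suggestion concrete, here is a minimal sketch of encoding already-parsed rows into PostgreSQL's binary COPY format. It assumes a hypothetical target table with an integer column and a text column and a UTF8 server encoding; the function names are illustrative and are not part of pg_bulkload or any driver. Parallelism would come from giving each worker its own connection and its own payload.

```python
import struct
from typing import Iterable, Optional, Sequence, Union

# Only int4 and text fields are handled in this sketch (assumption).
Field = Optional[Union[int, str]]

# Fixed 11-byte signature that starts every binary COPY stream.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_field(value: Field) -> bytes:
    """Encode one field as a 4-byte length followed by the field data."""
    if value is None:
        return struct.pack("!i", -1)  # a length of -1 marks NULL
    if isinstance(value, int):
        data = struct.pack("!i", value)  # int4, network byte order
    else:
        data = value.encode("utf-8")  # text, assuming UTF8 server encoding
    return struct.pack("!i", len(data)) + data


def encode_copy_binary(rows: Iterable[Sequence[Field]]) -> bytes:
    """Build a complete binary COPY payload: header, tuples, trailer."""
    out = bytearray(SIGNATURE)
    out += struct.pack("!ii", 0, 0)  # flags field, header extension length
    for row in rows:
        out += struct.pack("!h", len(row))  # number of fields in this tuple
        for value in row:
            out += encode_field(value)
    out += struct.pack("!h", -1)  # file trailer
    return bytes(out)


payload = encode_copy_binary([(1, "a"), (2, None)])

# Hypothetical use with psycopg2, one connection per parallel worker:
#   cur.copy_expert("COPY readings FROM STDIN WITH (FORMAT binary)",
#                   io.BytesIO(payload))
```

Whether this beats text-format COPY or pg_bulkload depends on the data types and the parsing cost, so it is worth benchmarking against your own feeds.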



(fighting google slightly to keep from top-posting...)

Thanks!

How long can you run COPY?  I have been looking at it more closely.  In some ways, it would be simple just to take data from stdin and send it to Postgres, but can I do that literally 24/7?  I am monitoring data feeds that will never stop, and I don't know whether that is how COPY is meant to be used or whether I have to let it finish and start another one at some point.
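
One relevant fact for this question: a COPY is a single statement, so its rows only become visible to other sessions once it completes and its transaction commits, and an error aborts the whole COPY. A common way to handle an endless feed is therefore to cut it into batches and issue one COPY per batch. The sketch below shows only the batching side; the limits, the function names, and the table name in the comment are assumptions for illustration.

```python
import io
import time
from typing import Callable, Iterable, Iterator, List


def batches(lines: Iterable[str],
            max_rows: int = 10_000,
            max_seconds: float = 1.0,
            clock: Callable[[], float] = time.monotonic) -> Iterator[List[str]]:
    """Group an endless stream of rows into batches, one per COPY command.

    A batch is closed when it reaches max_rows or has been open for
    max_seconds. The age check only runs when a row arrives, so a silent
    feed will hold a partial batch until the next row shows up.
    """
    batch: List[str] = []
    started = clock()
    for line in lines:
        batch.append(line)
        if len(batch) >= max_rows or clock() - started >= max_seconds:
            yield batch
            batch = []
            started = clock()
    if batch:  # flush whatever is left when the stream ends
        yield batch


def as_copy_input(batch: List[str]) -> io.StringIO:
    """Wrap one batch as a file-like object of newline-terminated rows."""
    return io.StringIO(
        "".join(row if row.endswith("\n") else row + "\n" for row in batch)
    )


# Hypothetical use with psycopg2, reading tab-separated rows from stdin:
#   for batch in batches(sys.stdin):
#       cur.copy_expert("COPY feed_rows FROM STDIN", as_copy_input(batch))
#       conn.commit()
```

Smaller batches make data visible sooner and limit how much is lost to a bad row; larger batches reduce per-command overhead.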

Thanks for everyone's help and input!

Mark Moellering

