Thread: best way to write large data-streams quickly?

best way to write large data-streams quickly?

From: Mark Moellering

Everyone,

We are trying to architect a new system, which will have to take several large data streams (a total of ~200,000 parsed files per second) and place them in a database.  I am trying to figure out the best way to import that sort of data into Postgres.

I keep thinking I can't be the first to have this problem and that there are common solutions, but I can't find any.  Does anyone know of some sort of method, third-party program, etc., that can accept data from a number of different sources and push it into Postgres as fast as possible?

Thanks in advance,

Mark Moellering

Re: best way to write large data-streams quickly?

From: Steve Atkins

> On Apr 9, 2018, at 8:49 AM, Mark Moellering <markmoellering@psyberation.com> wrote:
>
> Everyone,
>
> We are trying to architect a new system, which will have to take several large data streams (a total of ~200,000 parsed files per second) and place them in a database.  I am trying to figure out the best way to import that sort of data into Postgres.
>
> I keep thinking I can't be the first to have this problem and that there are common solutions, but I can't find any.  Does anyone know of some sort of method, third-party program, etc., that can accept data from a number of different sources and push it into Postgres as fast as possible?

Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the benchmarks for different situations compared to COPY.

Depending on what you're doing, using custom code to parse your data and then doing multiple binary COPYs in parallel may be better.
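
For illustration, a rough sketch of the parallel-COPY idea in Python with psycopg2 (the driver choice, the "events" table, the connection string, and the parse_batch() placeholder are all assumptions, not something from this thread; it also uses text format rather than binary to keep the sketch short):

import io
import multiprocessing

import psycopg2

DSN = "dbname=streams user=loader"  # assumed connection string
COPY_SQL = "COPY events (ts, payload) FROM STDIN WITH (FORMAT text)"

def parse_batch(worker_id):
    """Placeholder for one parsed data stream: yields tab-separated lines."""
    for i in range(100000):
        yield "2018-04-09 12:00:00\tpayload-%d-%d\n" % (worker_id, i)

def load_worker(worker_id):
    conn = psycopg2.connect(DSN)
    try:
        with conn, conn.cursor() as cur:
            buf = io.StringIO("".join(parse_batch(worker_id)))
            cur.copy_expert(COPY_SQL, buf)  # one streaming COPY per worker
    finally:
        conn.close()

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=load_worker, args=(n,)) for n in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

Each worker holds its own connection, so the COPYs run concurrently; whether that beats pg_bulkload will depend on the workload, as noted above.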

Cheers,
  Steve



Re: best way to write large data-streams quickly?

From: Mark Moellering

On Mon, Apr 9, 2018 at 12:01 PM, Steve Atkins <steve@blighty.com> wrote:

> On Apr 9, 2018, at 8:49 AM, Mark Moellering <markmoellering@psyberation.com> wrote:
>
> Everyone,
>
> We are trying to architect a new system, which will have to take several large data streams (a total of ~200,000 parsed files per second) and place them in a database.  I am trying to figure out the best way to import that sort of data into Postgres.
>
> I keep thinking I can't be the first to have this problem and that there are common solutions, but I can't find any.  Does anyone know of some sort of method, third-party program, etc., that can accept data from a number of different sources and push it into Postgres as fast as possible?

Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the benchmarks for different situations compared to COPY.

Depending on what you're doing, using custom code to parse your data and then doing multiple binary COPYs in parallel may be better.

Cheers,
  Steve



(fighting google slightly to keep from top-posting...)

Thanks!

How long can you run COPY?  I have been looking at it more closely.  In some ways, it would be simple just to take data from stdin and send it to Postgres, but can I do that literally 24/7?  I am monitoring data feeds that will never stop, and I don't know if that is how COPY is meant to be used or whether I have to let it finish and start another one at some point.
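
For concreteness, "take data from stdin and send it to Postgres" via COPY looks roughly like this with psycopg2 (the driver, the "events" table, and the connection string are assumptions, not from the thread); it would be fed by something like my_parser | python load_stdin.py:

import sys

import psycopg2

conn = psycopg2.connect("dbname=streams user=loader")  # assumed DSN
with conn, conn.cursor() as cur:
    # One long-running COPY: everything arriving on stdin is streamed in.
    cur.copy_expert("COPY events (ts, payload) FROM STDIN WITH (FORMAT text)",
                    sys.stdin)
conn.close()

The COPY only ends when stdin reaches EOF, which is exactly what the reply below speaks to.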

Thanks for everyone's help and input!

Mark Moellering


Re: best way to write large data-streams quickly?

From: Jerry Sievers

Mark Moellering <markmoellering@psyberation.com> writes:

<snip>

>
> How long can you run COPY?  I have been looking at it more closely.
> In some ways, it would be simple just to take data from stdin and
> send it to Postgres, but can I do that literally 24/7?  I am
> monitoring data feeds that will never stop, and I don't know if that
> is how COPY is meant to be used or whether I have to let it finish
> and start another one at some point.

Launching a single COPY and piping data into it for an extended period and/or in bulk is fine, but nothing will be visible until the statement finishes and, if it was run inside a transaction block, until that block commits.
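
In practice, for a feed that never stops, that means cutting the stream into batches: end the COPY, commit so the rows become visible, and immediately start the next COPY on the same connection. A rough sketch of that rotation (psycopg2, the "events" table, and the batch size are assumptions, not from the thread):

import io

import psycopg2

BATCH_ROWS = 50000  # assumed batch size; tune for latency vs. throughput
COPY_SQL = "COPY events (ts, payload) FROM STDIN WITH (FORMAT text)"

def endless_feed():
    """Placeholder for a data feed that never stops."""
    n = 0
    while True:
        yield "2018-04-09 12:00:00\tpayload-%d\n" % n
        n += 1

conn = psycopg2.connect("dbname=streams user=loader")
cur = conn.cursor()
buf, count = io.StringIO(), 0
for line in endless_feed():
    buf.write(line)
    count += 1
    if count >= BATCH_ROWS:
        buf.seek(0)
        cur.copy_expert(COPY_SQL, buf)
        conn.commit()  # rows from this batch are now visible to other sessions
        buf, count = io.StringIO(), 0

Smaller batches mean the data shows up sooner; larger batches mean fewer commits and higher throughput.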

HTH

>
> Thanks for everyone's help and input!
>
> Mark Moellering

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800