Re: Loading 500m json files to database - Mailing list pgsql-general

From: pinker
Subject: Re: Loading 500m json files to database
Date:
Msg-id: 1585009125214-0.post@n3.nabble.com
In response to: Re: Loading 500m json files to database  (Christopher Browne <cbbrowne@gmail.com>)
Responses: Re: Loading 500m json files to database
List: pgsql-general
Christopher Browne-3 wrote
> Well, you're paying for a lot of overhead in that, as you're
> establishing a psql command, connecting to a database, spawning a
> backend process, starting a transaction, committing a transaction,
> closing the backend process, disconnecting from the database, and
> cleaning up after the launching of the psql command.  And you're
> doing that 500 million times.
>
> The one thing I left out of that was the loading of a single tuple
> into json_parts.
>
> What you could do to improve things quite a lot would be to group
> some number of those files together, so that each time you pay for
> the overhead, you at least get the benefit of loading several
> entries into json_parts.
>
> So, loosely, I'd recommend using /bin/cat (or similar) to assemble
> several files together into one, and then \copy that one file in.
>
> Having 2 tuples loaded at once drops overhead by 50%
> Having 10 tuples loaded at once drops overhead by 90%
> Having 100 tuples loaded at once drops overhead by 99%
> Having 1000 tuples loaded at once drops overhead by 99.9%
>
> There probably isn't too much real value in going past 1000 tuples
> per batch; the overhead, by that point, is getting pretty immaterial.
>
> Reducing that overhead is the single most important thing you can do.

Yes, I was thinking about that, but I have no idea how to do it right
now. Some kind of outer loop to concatenate those files? And do I need
to add a delimiter between them?
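
Something like this is what I had in mind (a rough, untested sketch:
the database name mydb and the path /data/json are placeholders, the
table is assumed to be simply json_parts(js jsonb), and every file is
assumed to hold exactly one JSON document with no raw newlines, tabs,
or backslashes, since COPY's text format treats those specially). If
that holds, no delimiter between files should be needed: one document
per line is exactly what \copy expects for a one-column table.

    #!/bin/bash
    # Build lists of 1000 filenames each, then load every list with a
    # single \copy, i.e. one connection and one transaction per batch.
    # Filenames are assumed to contain no spaces or quotes.
    find /data/json -type f -name '*.json' > filelist
    split -l 1000 -a 6 filelist batch_   # -a 6: enough suffixes for 500k batches

    for list in batch_*; do
        xargs cat < "$list" |
            psql -d mydb -c '\copy json_parts (js) from stdin'
    done

Does that look roughly like what you meant? That would still mean about
500,000 psql invocations, but 1000x fewer than one per file.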


Christopher Browne-3 wrote
> It is also quite likely that you could run such streams in parallel,
> although it would require quite a bit more information about the I/O
> capabilities of your hardware to know if that would do any good.

I can spin up an instance of any size.
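
If the I/O keeps up, I suppose the batch lists from the sketch above
could be fed to several psql sessions at once, e.g. with xargs -P
(again untested; 8 parallel jobs is a guess that would need tuning
against the instance's disks and CPUs):

    # Run 8 of the batch loads concurrently, one psql session each.
    find . -maxdepth 1 -name 'batch_*' |
        xargs -P 8 -I{} sh -c \
            'xargs cat < "{}" | psql -d mydb -c "\copy json_parts (js) from stdin"'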

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


