Thread: announce: spark-postgres 3 released

From: Nicolas Paris
Date:

Hello postgres users,

Spark-postgres is designed for reliable, high-performance ETL on big-data
workloads and offers read, write, and SCD (slowly changing dimension)
capabilities to better bridge Spark and PostgreSQL. Version 3 introduces a
datasource API. It outperforms Sqoop by a factor of 8 and Apache Spark's
core JDBC data source by infinity.

Features:
- use of PostgreSQL COPY statements
- parallel reads/writes
- use of HDFS to store intermediate CSV files
- reindexing after bulk loading
- SCD1 computations done on the Spark side
- use of unlogged tables when needed
- handling of array and multiline string columns
- useful JDBC functions (DDL, updates...)
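For readers unfamiliar with SCD1 (slowly changing dimension, type 1): it keeps no history, so a changed row simply overwrites the stored one. The sketch below is a minimal, pure-Python illustration of the kind of delta such a computation produces; it is NOT spark-postgres's code or API (the announcement only says the computation runs on the Spark side). The function name `scd1_diff` and the `key` parameter are hypothetical, and rows are assumed to be dicts sharing one business-key column.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def scd1_diff(
    existing: List[Row], incoming: List[Row], key: str
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: split incoming rows into (inserts, updates).

    SCD1 keeps no history: a changed row overwrites the old one.
    Rows identical to what is already stored are dropped, so only
    the delta would need to be written to the database.
    """
    # Index the current target table by its business key.
    current = {row[key]: row for row in existing}
    inserts: List[Row] = []
    updates: List[Row] = []
    for row in incoming:
        old = current.get(row[key])
        if old is None:
            inserts.append(row)  # unseen key -> new row
        elif old != row:
            updates.append(row)  # same key, different values -> overwrite
        # otherwise the row is unchanged and nothing is sent
    return inserts, updates


if __name__ == "__main__":
    existing = [
        {"id": 1, "name": "alice", "city": "Paris"},
        {"id": 2, "name": "bob", "city": "Lyon"},
    ]
    incoming = [
        {"id": 1, "name": "alice", "city": "Paris"},  # unchanged
        {"id": 2, "name": "bob", "city": "Nantes"},   # changed
        {"id": 3, "name": "carol", "city": "Lille"},  # new
    ]
    inserts, updates = scd1_diff(existing, incoming, key="id")
    print("inserts:", inserts)
    print("updates:", updates)
```

On the PostgreSQL side, a delta like this could be applied with INSERT ... ON CONFLICT ... DO UPDATE; how spark-postgres actually applies it is not described in this announcement.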

The official repository:
https://framagit.org/parisni/spark-etl/tree/master/spark-postgres

And its mirror on Microsoft GitHub:
https://github.com/EDS-APHP/spark-etl/tree/master/spark-postgres

-- 
nicolas



Re: announce: spark-postgres 3 released

From: Adrian Klaver
Date:

On 11/10/19 4:05 PM, Nicolas Paris wrote:
> Hello postgres users,

Interesting. FYI, the announcement list is:

https://www.postgresql.org/list/pgsql-announce/

-- 
Adrian Klaver
adrian.klaver@aklaver.com