I am using psql to periodically dump Postgres tables into JSON files, which are then imported into Snowflake. For large tables (e.g. 70M rows), psql takes hours to complete. Reading the Postgres table with Spark does not seem to work either: the Postgres read-only replica is the bottleneck, so the Spark cluster never uses more than one worker node, and that node either times out or runs out of memory.
Would vertically scaling the Postgres DB speed up psql? Or is there a thread-related psql parameter that could help? Thanks for any hints.
Regards
Lian
Have you looked into the COPY command? Or the CopyManager Java class?
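To make the COPY suggestion concrete, here is a rough sketch, not a tested recipe for your setup. It assumes the table has an integer key column you can split on; the table name "events", the column "id", the chunk count, the connection URI and the output file names are all made up for illustration. The script only builds one psql command per key range, each wrapping a COPY ... TO STDOUT statement, so the chunks could be run side by side instead of as one long dump.

```python
import shlex


def key_ranges(min_id, max_id, chunks):
    """Split the inclusive range [min_id, max_id] into up to `chunks` half-open ranges."""
    total = max_id - min_id + 1
    step = -(-total // chunks)  # ceiling division
    start = min_id
    while start <= max_id:
        end = min(start + step, max_id + 1)
        yield start, end
        start = end


def copy_statement(table, key, lo, hi):
    """Build a COPY statement that emits one JSON object per row for lo <= key < hi."""
    return (
        f"COPY (SELECT row_to_json(t) FROM {table} t "
        f"WHERE t.{key} >= {lo} AND t.{key} < {hi}) TO STDOUT"
    )


def psql_commands(table, key, min_id, max_id, chunks, dsn):
    """Return one shell command per key range, each writing to its own file."""
    cmds = []
    for i, (lo, hi) in enumerate(key_ranges(min_id, max_id, chunks)):
        sql = copy_statement(table, key, lo, hi)
        cmds.append(
            f"psql {shlex.quote(dsn)} -c {shlex.quote(sql)} > {table}_{i:03d}.json"
        )
    return cmds


if __name__ == "__main__":
    # Hypothetical values: table "events", key "id", ids 1..70,000,000, 8 chunks.
    for cmd in psql_commands("events", "id", 1, 70_000_000, 8, "postgresql://replica/db"):
        print(cmd)
```

Whether running several chunks at once actually helps depends on how much headroom the replica has, so it is worth trying with a small chunk count first. Also note that COPY's default text format escapes backslashes, so JSON values containing them may need post-processing before loading.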