On Mon, Mar 23, 2020 at 3:24 AM pinker <pinker@onet.eu> wrote:
time for i in datafiles/*; do psql -c "\copy json_parts(json_data) FROM $i"& done
I don't know whether this is faster, but the approach below does avoid spinning up a new connection for every file.
#bash, linux
function append_each_split_file_to_etl_load_script() {
  for filetoload in ./*; do
    ronumber="$(basename "$filetoload")"
    # only process files, since subdirectories can be present
    if [[ -f "$filetoload" ]]; then
      echo ""
      # emits: \set invoice `cat 'filename'`
      echo "\set invoice"' `cat '"'""$filetoload""'"'`'
      echo ", ('$ronumber',:'invoice')"
    fi >> "$PSQLSCRIPT"
  done
}
There is a bit of other related code needed (for my specific usage), but this is the core of it. Use psql variables to capture the contents of each file into a variable, then just perform a normal INSERT (specifically, the VALUES (...), (...) variant). Since you can intermix psql meta-commands and SQL, you basically output a bloody long script - one that has memory issues at scale, but you can divide and conquer - and then run "psql --file bloody_long_script_part_1_of_100000.psql" on each part.
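To make the shape concrete, here is a rough sketch of what one generated part file might look like. Only the \set / row pairs are what the function above actually emits; the table and column names, the INSERT header, the placeholder first row (so every generated row can start with a comma), and the terminating semicolon are my assumptions standing in for the related code that isn't shown:

-- hypothetical contents of bloody_long_script_part_1_of_100000.psql
INSERT INTO invoices (ronumber, invoice_json) VALUES
  ('placeholder', '{}')

\set invoice `cat './RO1001'`
, ('RO1001',:'invoice')

\set invoice `cat './RO1002'`
, ('RO1002',:'invoice')
;

psql executes the \set meta-commands (including the backquoted cat) as it reads the file, even while the INSERT is still being accumulated, so each :'invoice' reference is interpolated with the contents of the corresponding data file. Each part is then loaded over a single connection with "psql --file bloody_long_script_part_1_of_100000.psql" (plus whatever connection options you need).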
David J.
Can one put 550M files in a single directory? I thought it topped out at 16M or so.