Thread: Changing csv structure and corresponding etl

From: Shekar Tippur
Date:
Hello,

I am using Redshift to store data from CSV backups that arrive at a regular interval. I use pyspark (with the psycopg library) to perform the ETL. The issue is that the CSV structure changes between runs, and the ETL job fails.
What I found is that the columns get mixed up.
For example, the original column list was A, B, C, D. In the next iteration, the columns can be A, B, C, X, Y, D.
I read in some of the other posts that it is not possible to alter a table to add a column at a particular position within Postgres.
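As an illustration of one workaround (a minimal sketch using the standard csv module, with hypothetical column names matching the example above): if rows are keyed by header name rather than by position, newly inserted columns like X and Y no longer shift the values of existing columns.

```python
import csv
import io

# Hypothetical: the columns the existing Redshift table expects, in order.
TABLE_COLUMNS = ["A", "B", "C", "D"]

# Simulated new backup file whose header has grown to A,B,C,X,Y,D.
new_csv = io.StringIO("A,B,C,X,Y,D\n1,2,3,8,9,4\n")

reader = csv.DictReader(new_csv)
# Pick values by column name so each value lands under the right table
# column, regardless of where the new columns were inserted in the file.
aligned_rows = [[row[col] for col in TABLE_COLUMNS] for row in reader]
print(aligned_rows)  # [['1', '2', '3', '4']]
```

The same idea applies when reading with pyspark: select columns by name in the table's expected order before writing, instead of relying on positional order in the file.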
The table itself currently has millions of rows, so merging tables whenever I get a change may not be a good option, i.e. create a union of the existing table and the new data, drop the original table, and rename the union to the original name.
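For reference, the union-and-rename approach described above might look roughly like this in SQL (table and column names are hypothetical, and the NULL casts would need to match the real column types):

```sql
BEGIN;
-- Backfill the new columns X and Y with NULLs for existing rows,
-- then append the newly loaded data (assumed staged in "staging").
CREATE TABLE backups_new AS
    SELECT a, b, c, CAST(NULL AS varchar) AS x, CAST(NULL AS varchar) AS y, d
    FROM backups
    UNION ALL
    SELECT a, b, c, x, y, d FROM staging;
DROP TABLE backups;
ALTER TABLE backups_new RENAME TO backups;
COMMIT;
```

As noted, with millions of rows this full rewrite is expensive, which is why it may not be a good fit here.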
Any pointers on how to proceed?

Thanks,
Shekar
Sent from my iPhone