Double insertion from scala spark job - Mailing list pgsql-jdbc

From Antoine DUBOIS
Subject Double insertion from scala spark job
Msg-id 1094480618.3358882.1612867353140.JavaMail.zimbra@cc.in2p3.fr
List pgsql-jdbc
Hello

I'm working with Spark and PostgreSQL to compute statistics.
I've run into strange behaviour in my job: when writing output to PostgreSQL, I sometimes get a double insertion into my table (violating a unique constraint):
Detail: Key (xxx, xxx, xxx, xxx, xxx, xxxx, xxxx)=(2021-02-05 00:00:00, data, moredate, evenmoredata, somuchmoredata, dataagain, somuchofit) already exists.  Call getNextException to see other errors in the batch.
The data come out duplicated only with PostgreSQL: if I write the same data to MySQL or to a Parquet file, with the same input and the same processing, I don't observe this behaviour.
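
Side note: the "Call getNextException" hint from the message can be followed by walking the cause chain of the exception Spark rethrows on the driver. A minimal sketch of how that could look (logBatchErrors is a hypothetical helper of mine, and it assumes the SQLException actually surfaces in that chain):

import java.sql.SQLException

// Hypothetical helper: unwrap Spark's exception until a JDBC exception is
// found, then follow its getNextException chain as the message suggests.
def logBatchErrors(e: Throwable): Unit = {
  var cause: Throwable = e
  while (cause != null) {
    cause match {
      case sql: SQLException =>
        var next = sql.getNextException
        while (next != null) {
          println(next.getMessage)
          next = next.getNextException
        }
      case _ =>
    }
    cause = cause.getCause
  }
}

// Usage around the write shown further down:
// try { outputDF.write...save() } catch { case e: Throwable => logBatchErrors(e); throw e }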
Dev spec:
Scala 2.12
Spark Version 3.0.1
JDK 8
jdbc  "org.postgresql" % "postgresql" % "42.2.18"

PostgreSQL 12.5

My code is pretty simple: it applies a SQL query to a Parquet file and writes the result like this:

import org.apache.spark.sql.SaveMode

outputDF.write
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://<HOST>:<PORT>/<SCHEMA>?user=<USERNAME>&password=<PASSWORD>")
  .option("dbtable", "mytable")
  .mode(SaveMode.Append)
  .save()
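
To rule out genuine duplicates in outputDF itself, one thing I could try is dropping duplicates on the constraint columns just before the write. A quick sketch (the column names are placeholders for the real key):

// "day" and "source" stand in for the actual unique-constraint columns.
val dedupedDF = outputDF.dropDuplicates("day", "source")
// ... then write dedupedDF exactly as above ...

But even if that masked the symptom, it wouldn't explain where the duplicates come from.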

What leads me to think this is a PostgreSQL JDBC bug rather than anything else is that the same command, writing to MySQL or to a Parquet file, produces no duplicates in this particular edge case, which only occurs with some of my input files.
If any of you has an idea of what could cause such behaviour (a special character in the input, something misconfigured on my side, maybe an option I don't know about that could help solve this issue), I'd be glad to hear it.
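
One workaround I'm considering, though I'd rather understand the root cause first, is to bypass the DataFrameWriter and make the inserts idempotent with ON CONFLICT DO NOTHING, so a retried Spark task can't violate the constraint a second time. A rough, untested sketch (table and column names are placeholders; the driver jar must be on the executors' classpath):

import java.sql.DriverManager

outputDF.rdd.foreachPartition { rows =>
  // One connection per partition; this closure runs on the executors.
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://<HOST>:<PORT>/<SCHEMA>", "<USERNAME>", "<PASSWORD>")
  try {
    // DO NOTHING silently skips rows that already exist instead of failing the batch.
    val ps = conn.prepareStatement(
      "INSERT INTO mytable (day, source) VALUES (?, ?) ON CONFLICT DO NOTHING")
    rows.foreach { row =>
      ps.setTimestamp(1, row.getTimestamp(0))
      ps.setString(2, row.getString(1))
      ps.addBatch()
    }
    ps.executeBatch()
  } finally {
    conn.close()
  }
}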

I've come to a point where I'm not sure of anything any longer.
I hope someone will have some thoughts about it.

Hope you're doing fine,

Antoine
