Peter Geoghegan <pg@heroku.com> wrote:
> I still strongly feel it ought to be driven by an insert
Could you clarify that? Does this mean that you feel that we
should write to the heap before reading the index to see if the row
will be a duplicate? If so, I think that is a bad idea, since this
will sometimes be used to apply a new data set which hasn't changed
much from the old, and that approach will perform poorly for this
use case, causing a lot of bloat. It certainly would work well for
the case that most of the rows are expected to be INSERTs rather
than UPDATEs, but I'm not sure that's justification for causing
extreme bloat in the other cases.
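To make the bad case concrete, here is a minimal sketch of the kind
of load I have in mind; all names and row counts are invented for
illustration, and the UPSERT statement itself is left out, since its
syntax is the very thing under discussion:

CREATE TABLE account (
    acct_id int     PRIMARY KEY,
    balance numeric NOT NULL
);

CREATE TABLE staging_account (LIKE account INCLUDING ALL);

-- Existing data: one million accounts.
INSERT INTO account
  SELECT i, 100 FROM generate_series(1, 1000000) AS i;

-- Tonight's extract: the same million rows plus a handful of new ones.
INSERT INTO staging_account
  SELECT i, 100 FROM generate_series(1, 1001000) AS i;

-- The UPSERT of staging_account into account is omitted here.  The
-- point is that writing each row to the heap before checking the
-- index would leave roughly one dead heap tuple per duplicate row
-- (about a million of them), while checking the index first would
-- leave almost none.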
Also, just a reminder that I'm going to squawk loudly if the
implementation does not do something fairly predictable and sane
for the case that the table has more than one UNIQUE index and you
attempt to UPSERT a row that is a duplicate of one row on one of
the indexes and a different row on a different index. The example
discussed during your PGCon talk was something like a city table
with two columns, each with a UNIQUE constraint, containing:
city_id | city_name
---------+-----------
       1 | Toronto
       2 | Ottawa
... and an UPSERT comes through for (1, 'Ottawa'). We would all
like for that never to happen, but it will. There must be sane and
documented behavior in that case.
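For reference, here is a minimal sketch of that scenario; the UPSERT
itself is left as a comment with purely illustrative syntax, since
the actual spelling is exactly what is being settled:

CREATE TABLE city (
    city_id   int  PRIMARY KEY,  -- first UNIQUE index
    city_name text UNIQUE        -- second UNIQUE index
);

INSERT INTO city VALUES (1, 'Toronto'), (2, 'Ottawa');

-- The incoming row conflicts with city_id = 1 on one unique index and
-- with city_name = 'Ottawa' on the other, so the implementation must
-- define, and the documentation must state, whether this errors,
-- updates one of the two existing rows, or does something else:
-- UPSERT INTO city VALUES (1, 'Ottawa');  -- illustrative syntax only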
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company