On Thu, Jan 30, 2020 at 11:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I prefer to think of the patch as being about improving the stability
> and predictability of Postgres with certain workloads, rather than
> being about overall throughput. Postgres has an ongoing need to VACUUM
> indexes, so making indexes smaller is generally more compelling than
> it would be with another system. That said, there are certainly quite
> a few cases that have big improvements in throughput and latency.

I also reran TPC-C/benchmarksql with the patch (v30). TPC-C has hardly
any non-unique indexes, which is a little unrealistic. I found that
the patch was up to 7% faster in the first few hours, since it can
control the bloat from certain non-HOT updates. TPC-C isn't a
particularly relevant workload, though, since almost none of its UPDATEs
affect indexed columns. The incoming-item-is-duplicate heuristic works
well with TPC-C, so there is probably hardly any possible downside there.

I think that I should tentatively commit the patch without the GUC.
Just have the storage parameter, so that everyone gets the
optimization without asking for it. We can then review the decision to
enable deduplication generally after the feature has been in the tree
for several months.

There is no need to make a final decision about whether or not the
optimization gets enabled before committing the patch.

--
Peter Geoghegan