On Thu, Jan 30, 2020 at 2:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jan 30, 2020 at 11:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > I prefer to think of the patch as being about improving the stability
> > and predictability of Postgres with certain workloads, rather than
> > being about overall throughput. Postgres has an ongoing need to VACUUM
> > indexes, so making indexes smaller is generally more compelling than
> > it would be with another system. That said, there are certainly quite
> > a few cases that have big improvements in throughput and latency.
>
> I also reran TPC-C/benchmarksql with the patch (v30). TPC-C has hardly
> any non-unique indexes, which is a little unrealistic. I found that
> the patch was up to 7% faster in the first few hours, since it can
> control the bloat from certain non-HOT updates. This isn't a
> particularly relevant workload, since almost all UPDATEs don't affect
> indexed columns. The incoming-item-is-duplicate heuristic works well
> with TPC-C, so there is probably hardly any possible downside there.
>
> Tentatively, I think that I should commit the patch without the GUC.
> Just have the storage parameter, so that everyone gets the
> optimization without asking for it. We can then review the decision to
> enable deduplication generally after the feature has been in the tree
> for several months.
>
> There is no need to make a final decision about whether or not the
> optimization gets enabled before committing the patch.

That seems reasonable.

I suspect that you're right that the worst-case downside is not big
enough to really be a problem given all the upsides. But the advantage
of getting things committed is that we can find out what users think.
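
[Editor's note: to illustrate the storage-parameter approach Peter describes
(a per-index switch, enabled for everyone by default, with no GUC), here is a
hedged sketch. The parameter name deduplicate_items matches what was
eventually committed; the table and index names are hypothetical.]

```sql
-- Deduplication is on by default; a plain index gets it without asking:
CREATE INDEX orders_cust_idx ON orders (customer_id);

-- Opting out is done per index via the storage parameter, not a GUC:
CREATE INDEX orders_cust_idx2 ON orders (customer_id)
    WITH (deduplicate_items = off);

-- An existing index can be toggled; this affects only subsequent
-- insertions, since deduplication is applied lazily at page-split time:
ALTER INDEX orders_cust_idx SET (deduplicate_items = off);
```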
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company