Re: Enabling B-Tree deduplication by default - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Enabling B-Tree deduplication by default
Msg-id CA+TgmoYpqEVP-hTx0Ut4c+16ynMakDy5MCprPHAx61vJvXuogA@mail.gmail.com
In response to Re: Enabling B-Tree deduplication by default  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Enabling B-Tree deduplication by default  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Jan 30, 2020 at 2:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jan 30, 2020 at 11:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > I prefer to think of the patch as being about improving the stability
> > and predictability of Postgres with certain workloads, rather than
> > being about overall throughput. Postgres has an ongoing need to VACUUM
> > indexes, so making indexes smaller is generally more compelling than
> > it would be with another system. That said, there are certainly quite
> > a few cases that have big improvements in throughput and latency.
>
> I also reran TPC-C/benchmarksql with the patch (v30). TPC-C has hardly
> any non-unique indexes, which is a little unrealistic. I found that
> the patch was up to 7% faster in the first few hours, since it can
> control the bloat from certain non-HOT updates. This isn't a
> particularly relevant workload, since almost all UPDATEs don't affect
> indexed columns. The incoming-item-is-duplicate heuristic works well
> with TPC-C, so there is probably hardly any possible downside there.
>
> I think that I should commit the patch without the GUC tentatively.
> Just have the storage parameter, so that everyone gets the
> optimization without asking for it. We can then review the decision to
> enable deduplication generally after the feature has been in the tree
> for several months.
>
> There is no need to make a final decision about whether or not the
> optimization gets enabled before committing the patch.

That seems reasonable.

I suspect that you're right that the worst-case downside is not big
enough to really be a problem given all the upsides. But the advantage
of getting things committed is that we can find out what users think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


