Re: Enabling B-Tree deduplication by default - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Enabling B-Tree deduplication by default
Msg-id CAH2-Wz=YUP20oB1J3fxPfZnwx2mqQr5ixx2ABLc=_0abyfJbbA@mail.gmail.com
In response to Re: Enabling B-Tree deduplication by default  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Enabling B-Tree deduplication by default
List pgsql-hackers
On Thu, Jan 30, 2020 at 9:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
> How do things look in a more sympathetic case?

I prefer to think of the patch as being about improving the stability
and predictability of Postgres with certain workloads, rather than
being about overall throughput. Postgres has an ongoing need to VACUUM
indexes, so making indexes smaller is generally more compelling than
it would be with another system. That said, there are certainly quite
a few cases that have big improvements in throughput and latency.

As I mentioned in my opening e-mail, there is a 60% increase in
transaction throughput when there are 3 extra indexes on the
pgbench_accounts table, at scale 1000 (so every column has an index).
That's a little bit silly, but I think that the extreme cases are
interesting. More recently, I found a 19% increase in throughput for a
similar pgbench workload at scale 5000, with only one extra index, on
pgbench_accounts.abalance (so lots of non-HOT updates). I think that
there were 12 + 16 clients in both cases. We can reliably keep (say)
the extra index on pgbench_accounts.abalance 3x smaller with the
patch, even though there is constant update churn. The difference in
index size between master and patch doesn't depend on having pristine
indexes. There is also about a 15% reduction in transaction latency in
these cases.

We usually manage to keep pgbench_accounts_pkey a lot smaller -- it
depends on the exact distribution of values. Skew that isn't all
concentrated in one part of the index (e.g. because we hash the value
generated by the pgbench PRNG) works best when it comes to controlling
pgbench_accounts_pkey bloat. I have seen plenty of cases where it was
about 50% - 95% smaller after several hours. OTOH, a less favorable
distribution of update values will tend to overwhelm the patch's
ability to soak up extra bloat in pgbench_accounts_pkey, though in a
way that is less abrupt. Deduplication of pgbench_accounts_pkey never
seems to have any downside.

You can even have cases like the insert benchmark that still come out
ahead -- despite having no reads or updates. This will tend to happen
when all non-unique indexes are on lower cardinality columns, with
maybe 5 - 10 tuples for each distinct key value on average. I would
even say that this is the common case. If I look at the join benchmark
data, or
the mouse genome database, or the TPC-E schema, then the patch tends
to leave non-unique indexes a lot smaller than they'd be on master, by
enough to pay for the cycles of deduplication and then some. The patch
makes all indexes taken together (including all unique indexes) about
60% of their original size with the join benchmark database and with
the mouse genome database.  Also, all of the larger TPC-E non-unique
indexes are at least low cardinality enough to be somewhat smaller. If
you can make indexes 2x - 3x smaller, then even inserts will be a lot
faster.
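
To make the "5 - 10 tuples per distinct key" intuition a bit more
concrete, here is a rough back-of-the-envelope model. It is only a
sketch: the byte counts (8 byte tuple header, 6 byte heap TID, 4 byte
line pointer, 8 byte alignment) and the function names are assumptions
chosen for illustration, and it ignores page overhead, free space, and
any limit on how large a single deduplicated tuple can get.

```python
def align(n, boundary=8):
    """Round n up to the next multiple of boundary (assumed alignment)."""
    return (n + boundary - 1) // boundary * boundary


def plain_bytes(key_bytes, n_dups, header=8, line_pointer=4):
    """Space for n_dups duplicates stored as one index tuple each."""
    return n_dups * (align(header + key_bytes) + line_pointer)


def dedup_bytes(key_bytes, n_dups, header=8, tid=6, line_pointer=4):
    """Space for one tuple holding the key once plus n_dups heap TIDs."""
    return align(header + key_bytes + n_dups * tid) + line_pointer


def shrink_factor(key_bytes, n_dups):
    """How many times smaller the deduplicated representation is."""
    return plain_bytes(key_bytes, n_dups) / dedup_bytes(key_bytes, n_dups)


# Hypothetical 4 byte key with 5, 8, and 10 duplicates per distinct value.
for n in (5, 8, 10):
    print(n, round(shrink_factor(4, n), 2))
```

Under these assumptions the factor lands somewhere around 2x to 2.6x,
which is at least in the same ballpark as the 2x - 3x figure above.
Wider keys would shift the numbers, since the key is stored only once
per group of duplicates.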

Back in 2008, Jignesh Shah reported that the TPC-E trade table's indexes
were the source of a lot of problems:

https://www.slideshare.net/jkshah/postgresql-and-benchmarks (See slide 19)

All three of the non-unique indexes on the trade table are only about
30% of their original size with deduplication (e.g. i_t_st_id goes
from 1853 MB to only 571 MB). I haven't been able to run the DBT-5
implementation of TPC-E, since it has bitrotted severely, but I imagine
that deduplication would help a lot. I did manage to get DBT-5 to produce
initial test data, and have been in touch with Mark Wong about it. That's how
I know that all three extra indexes are 30% of their original size.
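
As a quick sanity check on the "about 30%" figure, the i_t_st_id sizes
quoted above can be plugged into a couple of lines of arithmetic. The
helper name is made up for illustration; the two sizes are the only
inputs taken from the numbers in this mail.

```python
def remaining_fraction(before_mb, after_mb):
    """Fraction of the original index size left after deduplication."""
    return after_mb / before_mb


# i_t_st_id: 1853 MB without deduplication, 571 MB with it.
frac = remaining_fraction(1853, 571)
print(f"{frac:.1%} of original size, {1 / frac:.2f}x smaller")
```

That works out to a little under 31% of the original size, or a bit
more than 3x smaller.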

--
Peter Geoghegan


