From: Peter Geoghegan
Subject: Re: Enabling B-Tree deduplication by default
Msg-id: CAH2-Wz=YUP20oB1J3fxPfZnwx2mqQr5ixx2ABLc=_0abyfJbbA@mail.gmail.com
In response to: Re: Enabling B-Tree deduplication by default (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Jan 30, 2020 at 9:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
> How do things look in a more sympathetic case?

I prefer to think of the patch as being about improving the stability and predictability of Postgres with certain workloads, rather than being about overall throughput. Postgres has an ongoing need to VACUUM indexes, so making indexes smaller is generally more compelling than it would be in another system.

That said, there are certainly quite a few cases that show big improvements in throughput and latency. As I mentioned in my opening e-mail, there is a 60% increase in transaction throughput when there are 3 extra indexes on the pgbench_accounts table, at scale 1000 (so every column has an index). That's a little bit silly, but I think that the extreme cases are interesting. More recently, I found a 19% increase in throughput for a similar pgbench workload at scale 5000, with only one extra index, on pgbench_accounts.abalance (so lots of non-HOT updates). I think that there were 12 + 16 clients in both cases. We can reliably keep (say) the extra index on pgbench_accounts.abalance 3x smaller with the patch, even though there is constant update churn. The difference in index size between master and patch doesn't depend on having pristine indexes. There is also about a 15% reduction in transaction latency in these cases.

We usually manage to keep pgbench_accounts_pkey a lot smaller as well -- it depends on the exact distribution of values. Skew that isn't all concentrated in one part of the index (e.g. because we hash the value generated by the pgbench PRNG) works best when it comes to controlling pgbench_accounts_pkey bloat. I have seen plenty of cases where it was about 50% - 95% smaller after several hours. OTOH, a less favorable distribution of update values will tend to overwhelm the patch's ability to soak up extra bloat in pgbench_accounts_pkey, though in a way that is less abrupt.
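For readers who want to poke at a similar scenario, a rough sketch of the setup described above might look like the following. This is a hypothetical reproduction, not the actual test script: the index name is an assumption, and the scale factor and client counts come from the numbers quoted above.

```sql
-- Hypothetical reproduction of the scenario described above.
-- First initialize pgbench at scale 5000 ("pgbench -i -s 5000"),
-- then add the extra index on pgbench_accounts.abalance, which
-- forces non-HOT updates during the standard pgbench workload:
CREATE INDEX pgbench_accounts_abalance_idx
    ON pgbench_accounts (abalance);

-- After a long pgbench run, compare index sizes between a master
-- build and a patched build:
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('pgbench_accounts_pkey',
                  'pgbench_accounts_abalance_idx');
```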
Deduplication of pgbench_accounts_pkey never seems to have any downside.

You can even have cases like the insert benchmark that still come out ahead -- despite having no reads or updates. This will tend to happen when all non-unique indexes are on lower cardinality columns -- maybe 5 - 10 tuples for each distinct key value on average. I would even say that this is the common case. If I look at the join benchmark data, or the mouse genome database, or the TPC-E schema, then the patch tends to leave non-unique indexes a lot smaller than they'd be on master, by enough to pay for the cycles of deduplication and then some. The patch makes all indexes taken together (including all unique indexes) about 60% of their original size with the join benchmark database and with the mouse genome database. Also, all of the larger TPC-E non-unique indexes are at least low cardinality enough to be somewhat smaller. If you can make indexes 2x - 3x smaller, then even inserts will be a lot faster.

Back in 2008, Jignesh Shah reported that the TPC-E trade table's indexes were the source of a lot of problems:

https://www.slideshare.net/jkshah/postgresql-and-benchmarks

(See slide 19)

All three of the non-unique indexes on the trade table are only about 30% of their original size with deduplication (e.g. i_t_st_id goes from 1853 MB to only 571 MB). I haven't been able to run the DBT-5 implementation of TPC-E, since it has severely bitrotted, but I imagine that deduplication would help a lot. I did manage to get DBT-5 to produce initial test data, and have been in touch with Mark Wong about it. That's how I know that all three extra indexes are 30% of their original size.

--
Peter Geoghegan
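As a quick arithmetic check on the roughly-30% figure quoted for i_t_st_id, here is a small sketch using only the two sizes given above:

```python
# Sizes quoted above for the TPC-E trade table's i_t_st_id index.
original_mb = 1853   # size on master
dedup_mb = 571       # size with deduplication enabled

ratio = dedup_mb / original_mb
print(f"i_t_st_id is {ratio:.1%} of its original size")
# prints "i_t_st_id is 30.8% of its original size"
```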