On Wed, Jul 26, 2017 at 3:32 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Jul 14, 2017 at 5:06 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I think that what this probably comes down to, more than anything
>> else, is that you have leftmost hot/bloated leaf pages like this:
>>
>>
>> idx | level | l_item | blkno | btpo_prev |
>> btpo_next | btpo_flags | type | live_items | dead_items |
>> avg_item_size | page_size | free_size | highkey
>>
-----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
>> ...
>> pgbench_accounts_pkey | 0 | 1 | 1 | 0 |
>> 2751 | 65 | l | 100 | 41 | 16 |
>> 8192 | 5328 | 11 00 00 00 00 00 00 00
>> pgbench_accounts_pkey | 0 | 2 | 2751 | 1 |
>> 2746 | 65 | l | 48 | 90 | 16 |
>> 8192 | 5388 | 32 00 00 00 00 00 00 00
>> ...
>>
>> The high key for the leftmost shows that only values below 0x11 belong
>> on the first page. This is about 16 or 17 possible distinct values,
>> and yet the page has 100 live items, and 41 dead items; in total,
>> there is room for 367 items. It's only slightly better with other
>> nearby pages. This is especially bad because once the keyspace gets
>> split up this finely, it's *impossible* to reverse it -- it's more or
>> less a permanent problem, at least until a REINDEX.
>
> I've been thinking about this a lot, because this really does look
> like a pathological case to me. I think that this workload is very
> sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at
> least, I can imagine that mechanism doing a lot better than it
> actually manages to do here. I wonder if it's possible that commit
> 2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf
> pages, should have considered workloads like this.
>
Isn't it possible to confirm if the problem is due to commit
2ed5b87f9? Basically, if we have unlogged tables, then it won't
release the pin. So if the commit in question is the culprit, then
the same workload should not lead to bloat.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com