Thread: [WIP] [B-Tree] Retail IndexTuple deletion

[WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:
Hi,
I have written code for quick index tuple deletion from a relation by
heap tuple TID. The code relates to the "Retail IndexTuple deletion"
enhancement of the btree index on the PostgreSQL wiki [1].
Briefly, it includes three steps:
1. Generate a scan key for the index tuple search.
2. Search the index relation for the tuple that carries the heap tuple's TID.
3. Delete that tuple from the index relation.
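Schematically, in terms of existing nbtree routines (a rough sketch only:
the *_sketch helpers are hypothetical placeholders, real signatures are
abridged, and the patch's actual entry point is bttargetdelete()):

    /* 1. Build an insertion scan key from the dead heap tuple's values */
    skey = _bt_mkscankey(irel, itup);

    /* 2. Descend the tree and position on the leaf item whose heap TID
     * matches the target */
    stack = _bt_search_sketch(irel, skey, &buf);
    offnum = _bt_find_heaptid_sketch(buf, skey, heaptid);

    /* 3. Remove the matching item, WAL-logging the change */
    _bt_delitems_vacuum_sketch(irel, buf, &offnum, 1);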

Currently, index relation cleanup is performed by vacuum, which scans the
whole index relation for dead entries sequentially, tuple by tuple. When
the number of dead entries is small, this simple and safe method can be
significantly outperformed by retail deletion, which uses index scans to
locate the dead entries. Retail deletion could also be used by
distributed systems to reduce the cost of a global index vacuum.

Patch '0001-retail-indextuple-deletion' introduces the new access method
interface function amtargetdelete(). Patch '0002-quick-vacuum-strategy'
implements this function for an alternative lazy index vacuum strategy,
called 'Quick Vacuum'.

The code requires that DEAD tuple storage be held until the scan key has
been created. In this version I add a 'target_index_deletion_factor'
option. If it is greater than 0, heap_page_prune() uses ItemIdMarkDead()
instead of ItemIdSetDead() to set the DEAD flag while keeping the tuple
storage.
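In code form, the heap_page_prune() change amounts to:

    if (target_index_deletion_factor > 0)
        ItemIdMarkDead(lp);     /* set LP_DEAD but keep lp_off/lp_len, so
                                 * the key values stay readable for the
                                 * index scan key */
    else
        ItemIdSetDead(lp);      /* traditional behavior: storage can be
                                 * reclaimed by PageRepairFragmentation() */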
The next step is to develop a background worker that will collect
(tid, scankey) pairs of DEAD tuples from heap_page_prune().

Here are a test description and some execution time measurements showing
the benefit of these patches:

Test:
-----
create table test_range(id serial primary key, value integer);
insert into test_range (value) select random()*1e7/10^N from
generate_series(1, 1e7);  -- N is the parameter varied in the results below
DELETE FROM test_range WHERE value=1;
VACUUM test_range;

Results:
--------

| n | t1, s   | t2, s  | speedup |
|---|---------|--------|---------|
| 0 | 0.00003 | 0.4476 | 1748.4  |
| 1 | 0.00006 | 0.5367 | 855.99  |
| 2 | 0.0004  | 0.9804 | 233.99  |
| 3 | 0.0048  | 1.6493 | 34.576  |
| 4 | 0.5600  | 2.4854 | 4.4382  |
| 5 | 3.3300  | 3.9555 | 1.2012  |
| 6 | 17.700  | 5.6000 | 0.3164  |
|---|---------|--------|---------|
In the table, t1 is the measured execution time of lazy_vacuum_index()
under the Quick-Vacuum strategy, and t2 is the measured execution time of
lazy_vacuum_index() under the Lazy-Vacuum strategy; n corresponds to N in
the INSERT statement above.

Note that a guaranteed bound on the time of the index scans (used for
quick deletion) will be achieved by storing equal-key index tuples in
physical TID order [2], as in patch [3].

[1] 
https://wiki.postgresql.org/wiki/Key_normalization#Retail_IndexTuple_deletion
[2]
https://wiki.postgresql.org/wiki/Key_normalization#Making_all_items_in_the_index_unique_by_treating_heap_TID_as_an_implicit_last_attribute
[3] 
https://www.postgresql.org/message-id/CAH2-WzkVb0Kom%3DR%2B88fDFb%3DJSxZMFvbHVC6Mn9LJ2n%3DX%3DkS-Uw%40mail.gmail.com

--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Sun, Jun 17, 2018 at 9:39 PM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> I have written code for quick index tuple deletion from a relation by heap
> tuple TID. The code relates to the "Retail IndexTuple deletion" enhancement
> of the btree index on the PostgreSQL wiki [1].

I knew that somebody would eventually read that Wiki page.  :-)

> Currently, index relation cleanup is performed by vacuum, which scans the
> whole index relation for dead entries sequentially, tuple by tuple. When
> the number of dead entries is small, this simple and safe method can be
> significantly outperformed by retail deletion, which uses index scans to
> locate the dead entries.

I assume that the lazy vacuum bulk delete thing is much faster for the
situation where you have many dead tuples in the index. However,
allowing a B-Tree index to accumulate so much bloat in the first place
often has consequences that cannot be reversed with anything less than
a REINDEX. It's true that "prevention is better than cure" for all
types of bloat. However, this could perhaps be as much as 100x more
important for B-Tree bloat, since we cannot place new tuples in any
place that is convenient. It's very different to bloat in the heap,
even at a high level.

> The code requires that DEAD tuple storage be held until the scan key has
> been created. In this version I add a 'target_index_deletion_factor'
> option. If it is greater than 0, heap_page_prune() uses ItemIdMarkDead()
> instead of ItemIdSetDead() to set the DEAD flag while keeping the tuple
> storage.

Makes sense.

> The next step is to develop a background worker that will collect
> (tid, scankey) pairs of DEAD tuples from heap_page_prune().

Makes sense.

> | n | t1, s   | t2, s  | speedup |
> |---|---------|--------|---------|
> | 0 | 0.00003 | 0.4476 | 1748.4  |
> | 1 | 0.00006 | 0.5367 | 855.99  |
> | 2 | 0.0004  | 0.9804 | 233.99  |
> | 3 | 0.0048  | 1.6493 | 34.576  |
> | 4 | 0.5600  | 2.4854 | 4.4382  |
> | 5 | 3.3300  | 3.9555 | 1.2012  |
> | 6 | 17.700  | 5.6000 | 0.3164  |
> |---|---------|--------|---------|
> In the table, t1 is the measured execution time of lazy_vacuum_index()
> under the Quick-Vacuum strategy, and t2 is the measured execution time of
> lazy_vacuum_index() under the Lazy-Vacuum strategy; n corresponds to N in
> the INSERT statement above.

The speedup looks promising. However, the real benefit should be in
query performance, especially when we have heavy contention. Very
eager retail index tuple deletion could help a lot there. It already
makes sense to make autovacuum extremely aggressive in this case, to
the point when it's running almost constantly. A more targeted cleanup
process that can run much faster could do the same job, but be much
more eager, and so be much more effective at *preventing* bloating of
the key space [1][2].

> Note that a guaranteed bound on the time of the index scans (used for
> quick deletion) will be achieved by storing equal-key index tuples in
> physical TID order [2], as in patch [3].

I now have greater motivation to develop that patch into a real project.

I bet that my heap-tid-sort patch will allow you to refine your
interface when there are many logical duplicates: You can create one
explicit scan key, but have a list of heap TIDs that need to be killed
within the range of matching index tuples. That could be a lot more
efficient in the event of many non-HOT updates, where most index tuple
values won't actually change. You can sort the list of heap TIDs that
you want to kill once, and then "merge" it with the tuples at the leaf
level as they are matched/killed. It should be safe to avoid
rechecking anything other than the heap TID values.
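Concretely, with both sides in heap TID order, the leaf-level matching
could be a single linear pass, along these lines (a sketch with
hypothetical variable names, not working code; page, opaque, maxoff,
deadtids, ndead, killitems, and nkilled are assumed to be set up by the
caller):

    int          i = 0;
    OffsetNumber offnum = P_FIRSTDATAKEY(opaque);

    while (i < ndead && offnum <= maxoff)
    {
        IndexTuple itup = (IndexTuple) PageGetItem(page,
                                                   PageGetItemId(page, offnum));
        int32      cmp = ItemPointerCompare(&itup->t_tid, &deadtids[i]);

        if (cmp == 0)
        {
            killitems[nkilled++] = offnum;      /* matched: kill it */
            offnum = OffsetNumberNext(offnum);
            i++;
        }
        else if (cmp < 0)
            offnum = OffsetNumberNext(offnum);  /* leaf tuple survives */
        else
            i++;                                /* TID is on a later page */
    }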

[1] http://pgeoghegan.blogspot.com/2017/07/postgresql-index-bloat-microscope.html
[2] https://www.postgresql.org/message-id/CAH2-Wzmf6intNY1ggiNzOziiO5Eq=DsXfeptODGxO=2j-i1NGQ@mail.gmail.com
-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Claudio Freire
Date:
On Mon, Jun 18, 2018 at 4:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Note that a guaranteed bound on the time of the index scans (used for
> > quick deletion) will be achieved by storing equal-key index tuples in
> > physical TID order [2], as in patch [3].
>
> I now have greater motivation to develop that patch into a real project.
>
> I bet that my heap-tid-sort patch will allow you to refine your
> interface when there are many logical duplicates: You can create one
> explicit scan key, but have a list of heap TIDs that need to be killed
> within the range of matching index tuples. That could be a lot more
> efficient in the event of many non-HOT updates, where most index tuple
> values won't actually change. You can sort the list of heap TIDs that
> you want to kill once, and then "merge" it with the tuples at the leaf
> level as they are matched/killed. It should be safe to avoid
> rechecking anything other than the heap TID values.
>
> [1] http://pgeoghegan.blogspot.com/2017/07/postgresql-index-bloat-microscope.html
> [2] https://www.postgresql.org/message-id/CAH2-Wzmf6intNY1ggiNzOziiO5Eq=DsXfeptODGxO=2j-i1NGQ@mail.gmail.com

Actually, once btree tids are sorted, you can continue tree descent
all the way to the exact leaf page that contains the tuple to be
deleted.

Thus, the single-tuple interface ends up being quite OK. Sure, you can
optimize things a bit by scanning a range, but only if vacuum is able
to group keys in order to produce the optimized calls, and I don't see
that terribly likely.

So, IMHO the current interface may be quite enough.


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Mon, Jun 18, 2018 at 1:42 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> Actually, once btree tids are sorted, you can continue tree descent
> all the way to the exact leaf page that contains the tuple to be
> deleted.
>
> Thus, the single-tuple interface ends up being quite OK. Sure, you can
> optimize things a bit by scanning a range, but only if vacuum is able
> to group keys in order to produce the optimized calls, and I don't see
> that terribly likely.

Andrey talked about a background worker that did processing + index
tuple deletion when handed off work by a user backend after it
performs HOT pruning of a heap page. I consider that idea to be a good
place to go with the patch, because in practice the big problem is
workloads that suffer from so-called "write amplification", where most
index tuples are created despite being "logically unnecessary" (e.g.
one index among several prevents an UPDATE being HOT-safe, making
inserts into most of the indexes "logically unnecessary").

I think that it's likely that only descending the tree once in order
to kill multiple duplicate index tuples in one shot will in fact be
*very* important (unless perhaps you assume that that problem is
solved by something else, such as zheap). The mechanism that Andrey
describes is rather unlike VACUUM as we know it today, but that's the
whole point.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Sun, Jun 17, 2018 at 9:39 PM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> Patch '0001-retail-indextuple-deletion' introduces the new access method
> interface function amtargetdelete(). Patch '0002-quick-vacuum-strategy'
> implements this function for an alternative lazy index vacuum strategy,
> called 'Quick Vacuum'.

My compiler shows the following warnings:

/code/postgresql/root/build/../source/src/backend/access/nbtree/nbtree.c:
In function ‘bttargetdelete’:
/code/postgresql/root/build/../source/src/backend/access/nbtree/nbtree.c:1053:3:
warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
   if (needLock)
   ^~
/code/postgresql/root/build/../source/src/backend/access/nbtree/nbtree.c:1055:4:
note: ...this statement, but the latter is misleadingly indented as if
it were guarded by the ‘if’
    npages = RelationGetNumberOfBlocks(irel);
    ^~~~~~
/code/postgresql/root/build/../source/src/backend/access/nbtree/nbtree.c:1074:3:
warning: ‘blkno’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
   cleanup_block(info, stats, blkno);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I think that they're both harmless, though.
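The first one presumably just wants the usual extension-lock idiom, as in
btvacuumscan():

    if (needLock)
        LockRelationForExtension(irel, ExclusiveLock);
    npages = RelationGetNumberOfBlocks(irel);
    if (needLock)
        UnlockRelationForExtension(irel, ExclusiveLock);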

--
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Mon, Jun 18, 2018 at 2:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Sun, Jun 17, 2018 at 9:39 PM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Patch '0001-retail-indextuple-deletion' introduces the new access method
>> interface function amtargetdelete(). Patch '0002-quick-vacuum-strategy'
>> implements this function for an alternative lazy index vacuum strategy,
>> called 'Quick Vacuum'.
>
> My compiler shows the following warnings:

Some real feedback:

What we probably want to end up with here is new lazyvacuum.c code
that does processing for one heap page (and associated indexes) that
is really just a "partial" lazy vacuum. Though it won't do things like
advance relfrozenxid, it will do pruning for the heap page, index
tuple killing, and finally heap tuple killing. It will do all of these
things reliably, just like traditional lazy vacuum. This will be what
your background worker eventually uses.

I doubt that you can use routines like index_beginscan() within
bttargetdelete() at all. I think that you need something closer to
_bt_doinsert() or _bt_pagedel(), that manages its own scan (your code
should probably live in nbtpage.c). It does not make sense to teach
external, generic routines like index_beginscan() about heap TID being
an implicit final index attribute, which is one reason for this (I'm
assuming that this patch relies on my own patch).  Another reason is
that you need to hold an exclusive buffer lock at the point that you
identify the tuple to be killed, until the point that you actually
kill it. You don't do that now.

IOW, the approach you've taken in bttargetdelete() isn't quite correct
because you imagine that it's okay to occasionally "lose" the index
tuple that you originally found, and just move on. That needs to be
100% reliable, or else we'll end up with index tuples that point to
the wrong heap tuples in rare cases with concurrent insertions. As I
said, we want a "partial" lazy vacuum here, which must mean that it's
reliable. Note that _bt_pagedel() actually calls _bt_search() when it
deletes a page. Your patch will not be the first patch that makes
nbtree vacuuming do an index scan. You should be managing your own
insertion scan key, much like _bt_pagedel() does. If you use my patch,
_bt_search() can be taught to look for a specific heap TID.
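In outline, something like this (a hypothetical skeleton; real signatures
are abridged, and the heap-TID-aware descent assumes my patch):

    /* manage our own insertion scan key, as _bt_pagedel() does */
    itup_scankey = _bt_mkscankey(rel, itup);

    /* descend, treating the heap TID as an implicit last attribute */
    stack = _bt_search(rel, keysz, itup_scankey, &buf, BT_WRITE);

    /* buf is now the target leaf page, held with an exclusive buffer
     * lock; locate the victim and kill it before releasing the lock */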

Finally, doing things this way would let you delete multiple
duplicates in one shot, as I described in an earlier e-mail. Only a
single descent of the tree is needed to delete quite a few index
tuples, provided that they all happen to be logical duplicates. Again,
your background worker will take advantage of this.

This code does not follow the Postgres style:

> -       else
> +       }
> +       else {
> +           if (rootoffnum != latestdead)
> +               heap_prune_record_unused(prstate, latestdead);
>             heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
> +       }
>     }

Please be more careful about that. I find it very distracting.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Mon, Jun 18, 2018 at 4:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> Finally, doing things this way would let you delete multiple
> duplicates in one shot, as I described in an earlier e-mail. Only a
> single descent of the tree is needed to delete quite a few index
> tuples, provided that they all happen to be logical duplicates. Again,
> your background worker will take advantage of this.

BTW, when you do this you should make sure that there is only one call
to _bt_delitems_vacuum(), so that there aren't too many WAL records.
Actually, that's not quite correct -- there should be one
_bt_delitems_vacuum() call *per leaf page* per bttargetdelete() call,
which is slightly different. There should rarely be more than one or
two calls to _bt_delitems_vacuum() in total, because your background
worker is only going to delete one heap page's duplicates per
bttargetdelete() call, and because there will be locality/correlation
with my TID order patch.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Mon, Jun 18, 2018 at 4:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> IOW, the approach you've taken in bttargetdelete() isn't quite correct
> because you imagine that it's okay to occasionally "lose" the index
> tuple that you originally found, and just move on. That needs to be
> 100% reliable, or else we'll end up with index tuples that point to
> the wrong heap tuples in rare cases with concurrent insertions.

Attached patch adds a new amcheck check within
bt_index_parent_check(). With the patch, heap TIDs are accumulated in
a tuplesort and sorted at the tail end of verification (before
optional heapallindexed verification runs). This will reliably detect
the kind of corruption I noticed was possible with your patch.

Note that the amcheck enhancement that went along with my
heap-tid-btree-sort patch may not have detected this issue, which is
why I wrote this patch --  the heap-tid-btree-sort amcheck stuff could
detect duplicates, but only when all other attributes happened to be
identical when comparing sibling index tuples (i.e. only when we must
actually compare TIDs across sibling index tuples). If you add this
check, I'm pretty sure that you can detect any possible problem. You
should think about using this to debug your patch.

I may get around to submitting this to a CF, but that isn't a priority
right now.

-- 
Peter Geoghegan

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Masahiko Sawada
Date:
On Tue, Jun 19, 2018 at 8:05 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Jun 18, 2018 at 2:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> On Sun, Jun 17, 2018 at 9:39 PM, Andrey V. Lepikhov
>> <a.lepikhov@postgrespro.ru> wrote:
>>> Patch '0001-retail-indextuple-deletion' introduces the new access method
>>> interface function amtargetdelete(). Patch '0002-quick-vacuum-strategy'
>>> implements this function for an alternative lazy index vacuum strategy,
>>> called 'Quick Vacuum'.

Great!

>>
>> My compiler shows the following warnings:
>
> Some real feedback:
>
> What we probably want to end up with here is new lazyvacuum.c code
> that does processing for one heap page (and associated indexes) that
> is really just a "partial" lazy vacuum. Though it won't do things like
> advance relfrozenxid, it will do pruning for the heap page, index
> tuple killing, and finally heap tuple killing. It will do all of these
> things reliably, just like traditional lazy vacuum. This will be what
> your background worker eventually uses.

I think that we already do a partial lazy vacuum using the visibility
map. That does heap pruning and index tuple killing, but doesn't advance
relfrozenxid. Since this patch adds the ability to delete a small number
of index tuples quickly, what I'd like to do with this patch is to
invoke autovacuum more frequently, and do the target index deletion or
the index bulk-deletion depending on the amount of garbage, the index
size, etc. That is, it might be better if lazy vacuum scans the heap in
the ordinary way and then plans and decides on a method of index deletion
based on costs, similar to what query planning does.

Regards,

--
Masahiko Sawada


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Tue, Jun 19, 2018 at 2:33 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think that we already do a partial lazy vacuum using the visibility
> map. That does heap pruning and index tuple killing, but doesn't advance
> relfrozenxid.

Right, that's what I was thinking. Opportunistic HOT pruning isn't
like vacuuming because it doesn't touch indexes. This patch adds an
alternative strategy for conventional lazy vacuum that is also able to
run a page at a time if needed. Perhaps page-at-a-time operation could
later be used for doing something that is opportunistic in the same
way that pruning is opportunistic, but it's too early to worry about
that.

> Since this patch adds the ability to delete a small number of index
> tuples quickly, what I'd like to do with this patch is to invoke
> autovacuum more frequently, and do the target index deletion or the
> index bulk-deletion depending on the amount of garbage, the index size,
> etc. That is, it might be better if lazy vacuum scans the heap in the
> ordinary way and then plans and decides on a method of index deletion
> based on costs, similar to what query planning does.

That seems to be what Andrey wants to do, though right now the
prototype patch actually just always uses its alternative strategy
while doing any kind of lazy vacuuming (some simple costing code is
commented out right now). It shouldn't be too hard to add some costing
to it. Once we do that, and once we polish the patch some more, we can
do performance testing. Maybe that alone will be enough to make the
patch worth committing; "opportunistic microvacuuming" can come later,
if at all.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 19.06.2018 04:05, Peter Geoghegan wrote:
> On Mon, Jun 18, 2018 at 2:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> On Sun, Jun 17, 2018 at 9:39 PM, Andrey V. Lepikhov
>> <a.lepikhov@postgrespro.ru> wrote:
>>> Patch '0001-retail-indextuple-deletion' introduces the new access method
>>> interface function amtargetdelete(). Patch '0002-quick-vacuum-strategy'
>>> implements this function for an alternative lazy index vacuum strategy,
>>> called 'Quick Vacuum'.
>>
>> My compiler shows the following warnings:
> 
> Some real feedback:
> 
> What we probably want to end up with here is new lazyvacuum.c code
> that does processing for one heap page (and associated indexes) that
> is really just a "partial" lazy vacuum. Though it won't do things like
> advance relfrozenxid, it will do pruning for the heap page, index
> tuple killing, and finally heap tuple killing. It will do all of these
> things reliably, just like traditional lazy vacuum. This will be what
> your background worker eventually uses.
> 
That is the final goal of the patch.

> I doubt that you can use routines like index_beginscan() within
> bttargetdelete() at all. I think that you need something closer to
> _bt_doinsert() or _bt_pagedel(), that manages its own scan (your code
> should probably live in nbtpage.c). It does not make sense to teach
> external, generic routines like index_beginscan() about heap TID being
> an implicit final index attribute, which is one reason for this (I'm
> assuming that this patch relies on my own patch).  Another reason is
> that you need to hold an exclusive buffer lock at the point that you
> identify the tuple to be killed, until the point that you actually
> kill it. You don't do that now.
> 
> IOW, the approach you've taken in bttargetdelete() isn't quite correct
> because you imagine that it's okay to occasionally "lose" the index
> tuple that you originally found, and just move on. That needs to be
> 100% reliable, or else we'll end up with index tuples that point to
> the wrong heap tuples in rare cases with concurrent insertions. As I
> said, we want a "partial" lazy vacuum here, which must mean that it's
> reliable. Note that _bt_pagedel() actually calls _bt_search() when it
> deletes a page. Your patch will not be the first patch that makes
> nbtree vacuuming do an index scan. You should be managing your own
> insertion scan key, much like _bt_pagedel() does. If you use my patch,
> _bt_search() can be taught to look for a specific heap TID.
> 
I agree with these notes. Corrections will be made in the next version
of the patch.

> Finally, doing things this way would let you delete multiple
> duplicates in one shot, as I described in an earlier e-mail. Only a
> single descent of the tree is needed to delete quite a few index
> tuples, provided that they all happen to be logical duplicates. Again,
> your background worker will take advantage of this.
> 
That is a very interesting idea. Accordingly, I plan to change the
bttargetdelete() interface as follows:

IndexTargetDeleteStats*
amtargetdelete(IndexTargetDeleteInfo *info,
               IndexTargetDeleteStats *stats,
               Datum *values, bool *isnull);

where the structure IndexTargetDeleteInfo contains a TID list of dead
heap tuples. All index entries corresponding to this list (or only some
of them) may be deleted by one call of amtargetdelete(), with a single
descent of the tree.
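A sketch of the planned structure (the field names are preliminary and
may change):

    typedef struct IndexTargetDeleteInfo
    {
        Relation         indexRelation;    /* index being cleaned */
        Relation         heapRelation;     /* owning heap relation */
        int              num_dead_tuples;  /* length of dead_tuples */
        ItemPointerData *dead_tuples;      /* TIDs of dead heap tuples */
        bool             isSorted;         /* caller pre-sorted the list? */
    } IndexTargetDeleteInfo;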

> This code does not follow the Postgres style:
> 
>> -       else
>> +       }
>> +       else {
>> +           if (rootoffnum != latestdead)
>> +               heap_prune_record_unused(prstate, latestdead);
>>              heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
>> +       }
>>      }
> 
> Please be more careful about that. I find it very distracting.
> 
Done

-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:
Hi,
Based on your feedback, I have developed a second version of the patch.
In this version:
1. The high-level functions index_beginscan() and index_rescan() are no
longer used. Tree descent is done by _bt_search(); _bt_binsrch() is used
for positioning within the page.
2. A TID list was introduced in the amtargetdelete() interface. Now only
one tree descent is needed to delete all TIDs in the list that share an
equal scan key value - the logical-duplicates deletion problem.
3. Only one WAL record is written for index tuple deletion per leaf page
per amtargetdelete() call.
4. VACUUM can pre-sort the TID list for faster duplicate lookup.
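For item 4, the preliminary sort is a plain qsort of the collected TID
array (tid_comparator is the patch's ItemPointerData comparison routine):

    qsort((void *) vacrelstats->dead_tuples,
          vacrelstats->num_dead_tuples,
          sizeof(ItemPointerData),
          tid_comparator);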

The background worker will come later.


-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> Based on your feedback, I have developed a second version of the patch.
> In this version:
> 1. The high-level functions index_beginscan() and index_rescan() are no
> longer used. Tree descent is done by _bt_search(); _bt_binsrch() is used
> for positioning within the page.
> 2. A TID list was introduced in the amtargetdelete() interface. Now only
> one tree descent is needed to delete all TIDs in the list that share an
> equal scan key value - the logical-duplicates deletion problem.
> 3. Only one WAL record is written for index tuple deletion per leaf page
> per amtargetdelete() call.

Cool.

What is this "race" code about?

> +   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
> +   LockBuffer(buffer, BUFFER_LOCK_SHARE);
> +
> +   page = (Page) BufferGetPage(buffer);
> +   offnum = ItemPointerGetOffsetNumber(tid);
> +   lp = PageGetItemId(page, offnum);
> +
> +   /*
> +    * VACUUM Races: someone already remove the tuple from HEAP. Ignore it.
> +    */
> +   if (!ItemIdIsUsed(lp))
> +       return NULL;

It looks wrong -- why should another process have set the item as
unused? And if we assume that that's possible at all, what's to stop a
third process from actually reusing the item pointer before we arrive
(at get_tuple_by_tid()), leaving us to find a tuple that is totally
unrelated to the original tuple to be deleted?

(Also, you're not releasing the buffer lock before you return.)
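At a minimum, the early exit would need to look more like:

    if (!ItemIdIsUsed(lp))
    {
        /* drop the lock and pin before bailing out */
        UnlockReleaseBuffer(buffer);
        return NULL;
    }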

> 4. VACUUM can pre-sort the TID list for faster duplicate lookup.

This version of the patch prevents my own "force unique keys" patch
from working, since you're not using my proposed new
_bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
them a heap TID). It is essential that your patch be able to quickly
reach any tuple that it needs to kill. Otherwise, the worst case
performance is totally unacceptable; it will never be okay to go
through 10%+ of the index to kill a single tuple hundreds or even
thousands of times per VACUUM. It seems to me that doing this
tid_list_search() binary search is pointless -- you should instead be
relying on logical duplicates using their heap TID as a tie-breaker.
Rather than doing a binary search within tid_list_search(), you should
instead match the presorted heap TIDs at the leaf level against the
sorted in-memory TID list. You know, a bit like a merge join.

I suggest that you go even further than this: I think that you should
just start distributing my patch as part of your patch series. You can
change my code if you need to. I also suggest using "git format-patch"
with simple, short commit messages to produce patches. This makes it a
lot easier to track the version of the patch, changes over time, etc.

I understand why you'd hesitate to take ownership of my code (it has
big problems!), but the reality is that all the problems that my patch
has are also problems for your patch. One patch cannot get committed
without the other, so they are already the same project. As a bonus,
my patch will probably improve the best case performance for your
patch, since multi-deletions will now have much better locality of
access.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Fri, Jun 22, 2018 at 12:43 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Based on your feedback, I have developed a second version of the patch.
>> In this version:
>> 1. The high-level functions index_beginscan() and index_rescan() are no
>> longer used. Tree descent is done by _bt_search(); _bt_binsrch() is used
>> for positioning within the page.
>> 2. A TID list was introduced in the amtargetdelete() interface. Now only
>> one tree descent is needed to delete all TIDs in the list that share an
>> equal scan key value - the logical-duplicates deletion problem.
>> 3. Only one WAL record is written for index tuple deletion per leaf page
>> per amtargetdelete() call.
>
> Cool.
>
> What is this "race" code about?

I noticed another bug in your patch, when running a
"wal_consistency_checking=all" smoke test. I do this simple, generic
test for anything that touches WAL-logging, actually -- it's a good
practice to adopt.

I enable "wal_consistency_checking=all" on the installation, create a
streaming replica with pg_basebackup (which also has
"wal_consistency_checking=all"), and then run "make installcheck"
against the primary. Here is what I see on the standby when I do this
with v2 of your patch applied:

9524/2018-06-22 13:03:12 PDT LOG:  entering standby mode
9524/2018-06-22 13:03:12 PDT LOG:  consistent recovery state reached
at 0/30000D0
9524/2018-06-22 13:03:12 PDT LOG:  invalid record length at 0/30000D0:
wanted 24, got 0
9523/2018-06-22 13:03:12 PDT LOG:  database system is ready to accept
read only connections
9528/2018-06-22 13:03:12 PDT LOG:  started streaming WAL from primary
at 0/3000000 on timeline 1
9524/2018-06-22 13:03:12 PDT LOG:  redo starts at 0/30000D0
9524/2018-06-22 13:03:32 PDT FATAL:  inconsistent page found, rel
1663/16384/1259, forknum 0, blkno 0
9524/2018-06-22 13:03:32 PDT CONTEXT:  WAL redo at 0/3360B00 for
Heap2/CLEAN: remxid 599
9523/2018-06-22 13:03:32 PDT LOG:  startup process (PID 9524) exited
with exit code 1
9523/2018-06-22 13:03:32 PDT LOG:  terminating any other active server processes
9523/2018-06-22 13:03:32 PDT LOG:  database system is shut down

I haven't investigated this at all, but I assume that the problem is a
simple oversight. The new ItemIdSetDeadRedirect() concept that you've
introduced probably necessitates changes in both the WAL logging
routines and the redo/recovery routines. You need to go make those
changes. (By the way, I don't think you should be using the constant
"3" with the ItemIdIsDeadRedirection() macro definition.)

Let me know if you get stuck on this, or need more direction.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 23.06.2018 01:14, Peter Geoghegan wrote:
> On Fri, Jun 22, 2018 at 12:43 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
>> <a.lepikhov@postgrespro.ru> wrote:
>>> Based on your feedback, I have developed a second version of the patch.
>>> In this version:
>>> 1. The high-level functions index_beginscan() and index_rescan() are no
>>> longer used. Tree descent is done by _bt_search(); _bt_binsrch() is used
>>> for positioning within the page.
>>> 2. A TID list was introduced in the amtargetdelete() interface. Now only
>>> one tree descent is needed to delete all TIDs in the list that share an
>>> equal scan key value - the logical-duplicates deletion problem.
>>> 3. Only one WAL record is written for index tuple deletion per leaf page
>>> per amtargetdelete() call.
>>
>> Cool.
>>
>> What is this "race" code about?
> 
I introduced this check with other vacuum workers in mind, which could be
cleaning the relation concurrently. Maybe it is redundant.

> I noticed another bug in your patch, when running a
> "wal_consistency_checking=all" smoke test. I do this simple, generic
> test for anything that touches WAL-logging, actually -- it's a good
> practice to adopt.
> 
> I enable "wal_consistency_checking=all" on the installation, create a
> streaming replica with pg_basebackup (which also has
> "wal_consistency_checking=all"), and then run "make installcheck"
> against the primary. Here is what I see on the standby when I do this
> with v2 of your patch applied:
> 
> 9524/2018-06-22 13:03:12 PDT LOG:  entering standby mode
> 9524/2018-06-22 13:03:12 PDT LOG:  consistent recovery state reached
> at 0/30000D0
> 9524/2018-06-22 13:03:12 PDT LOG:  invalid record length at 0/30000D0:
> wanted 24, got 0
> 9523/2018-06-22 13:03:12 PDT LOG:  database system is ready to accept
> read only connections
> 9528/2018-06-22 13:03:12 PDT LOG:  started streaming WAL from primary
> at 0/3000000 on timeline 1
> 9524/2018-06-22 13:03:12 PDT LOG:  redo starts at 0/30000D0
> 9524/2018-06-22 13:03:32 PDT FATAL:  inconsistent page found, rel
> 1663/16384/1259, forknum 0, blkno 0
> 9524/2018-06-22 13:03:32 PDT CONTEXT:  WAL redo at 0/3360B00 for
> Heap2/CLEAN: remxid 599
> 9523/2018-06-22 13:03:32 PDT LOG:  startup process (PID 9524) exited
> with exit code 1
> 9523/2018-06-22 13:03:32 PDT LOG:  terminating any other active server processes
> 9523/2018-06-22 13:03:32 PDT LOG:  database system is shut down
> 
> I haven't investigated this at all, but I assume that the problem is a
> simple oversight. The new ItemIdSetDeadRedirect() concept that you've
> introduced probably necessitates changes in both the WAL logging
> routines and the redo/recovery routines. You need to go make those
> changes. (By the way, I don't think you should be using the constant
> "3" with the ItemIdIsDeadRedirection() macro definition.)
> 
> Let me know if you get stuck on this, or need more direction.
> 
I have investigated the bug found by the simple smoke test. You're right:
manipulating a line pointer in heap_page_prune() without reflecting it in
the WAL record is not a good idea.
But this consistency problem arises even on a clean PostgreSQL
installation (without my patch) if ItemIdSetDead() is replaced with
ItemIdMarkDead().
A byte-by-byte comparison of the master and replayed pages shows only a
2-byte difference, in the tuple storage part of the page.
I'm not stuck yet, but good ideas are welcome.

-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Masahiko Sawada
Date:
On Fri, Jun 22, 2018 at 8:24 PM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> Hi,
> Based on your feedback, I have developed a second version of the patch.
> In this version:
> 1. The high-level functions index_beginscan() and index_rescan() are no
> longer used. Tree descent is done by _bt_search(); _bt_binsrch() is used
> for positioning within the page.
> 2. A TID list was introduced in the amtargetdelete() interface. Now only
> one tree descent is needed to delete all TIDs in the list that share an
> equal scan key value - the logical-duplicates deletion problem.
> 3. Only one WAL record is written for index tuple deletion per leaf page
> per amtargetdelete() call.
> 4. VACUUM can pre-sort the TID list for faster duplicate lookup.
>
> The background worker will come later.
>
>

Thank you for updating the patches! Here are some comments on the latest patch.

+static void
+quick_vacuum_index(Relation irel, Relation hrel,
+                                  IndexBulkDeleteResult **overall_stats,
+                                  LVRelStats *vacrelstats)
+{
(snip)
+       /*
+        * Collect statistical info
+        */
+       lazy_cleanup_index(irel, *overall_stats, vacrelstats);
+}

I think that we should not call lazy_cleanup_index at the end of
quick_vacuum_index because we call it multiple times during a lazy
vacuum and index statistics can be changed during vacuum. We already
call lazy_cleanup_index at the end of lazy_scan_heap.

bttargetdelete doesn't delete btree pages even if pages become empty.
I think we should do that. Otherwise empty pages will never be recycled. But
please note that if we delete btree pages during bttargetdelete,
recyclable pages might not be recycled. That is, if we choose the
target deletion method every time then the deleted-but-not-recycled
pages could never be touched, unless reaching
vacuum_cleanup_index_scale_factor. So I think we need to either run
bulk-deletion method or do cleanup index before btpo.xact wraparound.

+               ivinfo.indexRelation = irel;
+               ivinfo.heapRelation = hrel;
+               qsort((void *) vacrelstats->dead_tuples,
+                     vacrelstats->num_dead_tuples, sizeof(ItemPointerData),
+                     tid_comparator);

I think sorting vacrelstats->dead_tuples is not necessary, because the
garbage TIDs are already stored in sorted order.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 26.06.2018 15:31, Masahiko Sawada wrote:
> On Fri, Jun 22, 2018 at 8:24 PM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Hi,
>> Based on your feedback, I have developed a second version of the patch.
>> In this version:
>> 1. The high-level functions index_beginscan() and index_rescan() are no
>> longer used. Tree descent is done by _bt_search(); _bt_binsrch() is used
>> for positioning within the page.
>> 2. A TID list was introduced in the amtargetdelete() interface. Now only
>> one tree descent is needed to delete all TIDs in the list that share an
>> equal scan key value - the logical-duplicates deletion problem.
>> 3. Only one WAL record is written for index tuple deletion per leaf page
>> per amtargetdelete() call.
>> 4. VACUUM can pre-sort the TID list for faster duplicate lookup.
>>
>> The background worker will come later.
>>
>>
> 
> Thank you for updating the patches! Here are some comments on the latest patch.
> 
> +static void
> +quick_vacuum_index(Relation irel, Relation hrel,
> +                                  IndexBulkDeleteResult **overall_stats,
> +                                  LVRelStats *vacrelstats)
> +{
> (snip)
> +       /*
> +        * Collect statistical info
> +        */
> +       lazy_cleanup_index(irel, *overall_stats, vacrelstats);
> +}
> 
> I think that we should not call lazy_cleanup_index at the end of
> quick_vacuum_index because we call it multiple times during a lazy
> vacuum and index statistics can be changed during vacuum. We already
> call lazy_cleanup_index at the end of lazy_scan_heap.
> 
Ok

> bttargetdelete doesn't delete btree pages even if pages become empty.
> I think we should do that. Otherwise empty pages will never be recycled. But
> please note that if we delete btree pages during bttargetdelete,
> recyclable pages might not be recycled. That is, if we choose the
> target deletion method every time then the deleted-but-not-recycled
> pages could never be touched, unless reaching
> vacuum_cleanup_index_scale_factor. So I think we need to either run
> bulk-deletion method or do cleanup index before btpo.xact wraparound.
> 
> +               ivinfo.indexRelation = irel;
> +               ivinfo.heapRelation = hrel;
> +               qsort((void *) vacrelstats->dead_tuples,
> +                     vacrelstats->num_dead_tuples, sizeof(ItemPointerData),
> +                     tid_comparator);
> 
OK. I think the caller of bttargetdelete() must decide when to perform
index cleanup.

> I think sorting vacrelstats->dead_tuples is not necessary, because the
> garbage TIDs are already stored in sorted order.
> 
Sorting was introduced with the background worker in mind, as well as
more flexible cleaning strategies, not only a full tuple-by-tuple
relation/block scan.
The caller of bttargetdelete() can set info->isSorted to suppress the
sorting operation.


-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 23.06.2018 00:43, Peter Geoghegan wrote:
> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Based on your feedback, I have developed a second version of the patch.
>> In this version:
>> 1. The high-level functions index_beginscan() and index_rescan() are no
>> longer used. Tree descent is done by _bt_search(); _bt_binsrch() is used
>> for positioning within the page.
>> 2. A TID list was introduced in the amtargetdelete() interface. Now only
>> one tree descent is needed to delete all TIDs in the list that share an
>> equal scan key value - the logical-duplicates deletion problem.
>> 3. Only one WAL record is written for index tuple deletion per leaf page
>> per amtargetdelete() call.
> 
> Cool.
> 
> What is this "race" code about?
> 
>> +   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
>> +   LockBuffer(buffer, BUFFER_LOCK_SHARE);
>> +
>> +   page = (Page) BufferGetPage(buffer);
>> +   offnum = ItemPointerGetOffsetNumber(tid);
>> +   lp = PageGetItemId(page, offnum);
>> +
>> +   /*
>> +    * VACUUM Races: someone already remove the tuple from HEAP. Ignore it.
>> +    */
>> +   if (!ItemIdIsUsed(lp))
>> +       return NULL;
> 
> It looks wrong -- why should another process have set the item as
> unused? And if we assume that that's possible at all, what's to stop a
> third process from actually reusing the item pointer before we arrive
> (at get_tuple_by_tid()), leaving us to find a tuple that is totally
> unrelated to the original tuple to be deleted?
> 
> (Also, you're not releasing the buffer lock before you return.)
> 
>> 4. VACUUM can pre-sort the TID list for faster duplicate lookup.
> 
> This version of the patch prevents my own "force unique keys" patch
> from working, since you're not using my proposed new
> _bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
> them a heap TID). It is essential that your patch be able to quickly
> reach any tuple that it needs to kill. Otherwise, the worst case
> performance is totally unacceptable; it will never be okay to go
> through 10%+ of the index to kill a single tuple hundreds or even
> thousands of times per VACUUM. It seems to me that doing this
> tid_list_search() binary search is pointless -- you should instead be
> relying on logical duplicates using their heap TID as a tie-breaker.
> Rather than doing a binary search within tid_list_search(), you should
> instead match the presorted heap TIDs at the leaf level against the
> sorted in-memory TID list. You know, a bit like a merge join.

I agree with you. The binary search was developed while awaiting your patch.
> 
> I suggest that you go even further than this: I think that you should
> just start distributing my patch as part of your patch series. You can
> change my code if you need to. 

Good. I am ready to start distributing your patch. At the beginning of
this work I planned to write a patch for physical TID ordering in the
btree index. Your patch will make that much easier.

> I also suggest using "git format-patch"
> with simple, short commit messages to produce patches. This makes it a
> lot easier to track the version of the patch, changes over time, etc.
> 
Ok

> I understand why you'd hesitate to take ownership of my code (it has
> big problems!), but the reality is that all the problems that my patch
> has are also problems for your patch. One patch cannot get committed
> without the other, so they are already the same project. As a bonus,
> my patch will probably improve the best case performance for your
> patch, since multi-deletions will now have much better locality of
> access.
> 
I still believe that the patch for physical TID ordering in btree:
1) has its own value, not only for target deletion,
2) will require only a few local changes in my code,
and these patches can be developed independently.

I have prepared a third version of the patches. Summary:
1. DEAD tuples are masked on a page during consistency checking (see the
comments for the mask_dead_tuples() function).
2. Physical TID ordering is still not used.
3. The index cleanup after each quick_vacuum_index() call was removed.

-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Tue, Jun 26, 2018 at 3:31 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> bttargetdelete doesn't delete btree pages even if pages become empty.
> I think we should do that. Otherwise empty pages will never be recycled. But
> please note that if we delete btree pages during bttargetdelete,
> recyclable pages might not be recycled. That is, if we choose the
> target deletion method every time then the deleted-but-not-recycled
> pages could never be touched, unless reaching
> vacuum_cleanup_index_scale_factor. So I think we need to either run
> bulk-deletion method or do cleanup index before btpo.xact wraparound.

As you pointed out, we can certainly never fully delete or recycle
half-dead pages using bttargetdelete. We already need to make some
kind of compromise around page deletion, and it may not be necessary
to insist that bttargetdelete does any kind of page deletion. I'm
unsure of that, though.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Tue, Jun 26, 2018 at 11:40 PM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> I still believe that the patch for physical TID ordering in btree:
> 1) has its own value, not only for target deletion,
> 2) will require only a few local changes in my code,
> and these patches can be developed independently.

I want to be clear on something now: I just don't think that this
patch has any chance of getting committed without something like my
own patch to go with it. The worst case for your patch without that
component is completely terrible. It's not really important for you to
actually formally make it part of your patch, so I'm not going to
insist on that or anything, but the reality is that my patch does not
have independent value -- and neither does yours.

I'm sorry if that sounds harsh, but this is a difficult, complicated
project. It's better to be clear about this stuff earlier on.

> I have prepared a third version of the patches. Summary:
> 1. DEAD tuples are masked on a page during consistency checking (see the
> comments for the mask_dead_tuples() function).
> 2. Physical TID ordering is still not used.
> 3. The index cleanup after each quick_vacuum_index() call was removed.

How does this patch affect opportunistic pruning in particular? Not
being able to immediately reclaim tuple space in the event of a dead
hot chain that is marked LP_DEAD could hurt quite a lot, including
with very common workloads, such as pgbench (pgbench accounts tuples
are quite a lot wider than a raw item pointer, and opportunistic
pruning is much more important than vacuuming). Is that going to be
acceptable, do you think? Have you measured the effects? Can we do
something about it, like make pruning behave differently when it's
opportunistic?

Are you aware of the difference between _bt_delitems_delete() and
_bt_delitems_vacuum(), and the considerations for hot standby? I think
that that's another TODO list item for this patch.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 28.06.2018 05:00, Peter Geoghegan wrote:
> On Tue, Jun 26, 2018 at 11:40 PM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> I still believe that the patch for physical TID ordering in btree:
>> 1) has its own value, not only for target deletion,
>> 2) will require only a few local changes in my code,
>> and these patches can be developed independently.
> 
> I want to be clear on something now: I just don't think that this
> patch has any chance of getting committed without something like my
> own patch to go with it. The worst case for your patch without that
> component is completely terrible. It's not really important for you to
> actually formally make it part of your patch, so I'm not going to
> insist on that or anything, but the reality is that my patch does not
> have independent value -- and neither does yours.
> 
As I wrote in my last email, I will integrate TID sorting into my
patches right now. Please give me access to the latest version of your
code, if possible.
You can track the progress at the https://github.com/danolivo/postgres
git repository.

> I'm sorry if that sounds harsh, but this is a difficult, complicated
> project. It's better to be clear about this stuff earlier on.

Ok. It is clear now.
> 
>> I have prepared a third version of the patches. Summary:
>> 1. DEAD tuples are masked on a page during consistency checking (see the
>> comments for the mask_dead_tuples() function).
>> 2. Physical TID ordering is still not used.
>> 3. The index cleanup after each quick_vacuum_index() call was removed.
> 
> How does this patch affect opportunistic pruning in particular? Not
> being able to immediately reclaim tuple space in the event of a dead
> hot chain that is marked LP_DEAD could hurt quite a lot, including
> with very common workloads, such as pgbench (pgbench accounts tuples
> are quite a lot wider than a raw item pointer, and opportunistic
> pruning is much more important than vacuuming). Is that going to be
> acceptable, do you think? Have you measured the effects? Can we do
> something about it, like make pruning behave differently when it's
> opportunistic?

This is the most "tasty" part of the work. I plan some experimental
research on it at the end of patch development (including the TID sort),
along with parameterized opportunistic pruning for the flexibility of
switching between strategies on the fly.
My current opinion on this question: we can develop a flexible strategy
based on parameters such as the free space in a block, the frequency of
UPDATE/DELETE queries, and the percentage of DEAD tuples in a
block/relation.
A background cleaner, launched by heap_page_prune(), gives an opportunity
to use a different approach for each block or relation.
The DB admin should be able to configure this technique anywhere between
fully non-storage DEAD tuples + vacuum and all-storage DEAD tuples +
target deletion.

> 
> Are you aware of the difference between _bt_delitems_delete() and
> _bt_delitems_vacuum(), and the considerations for hot standby? I think
> that that's another TODO list item for this patch.
> 

Ok

-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Kuntal Ghosh
Date:
On Wed, Jun 27, 2018 at 12:10 PM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> I have prepared a third version of the patches. Summary:
> 1. DEAD tuples are masked on a page during consistency checking (see the
> comments for the mask_dead_tuples() function).

-       ItemIdSetDead(lp);
+       if (target_index_deletion_factor > 0)
+           ItemIdMarkDead(lp);
+       else
+           ItemIdSetDead(lp);
IIUC, you want to hold the storage of DEAD tuples to form the ScanKey
which is required for the index scan in the second phase of quick
vacuum strategy. To achieve that, you've only marked the tuple as DEAD
during pruning. Hence, PageRepairFragmentation cannot claim the space
for DEAD tuples since they still have storage associated with them.
But, you've WAL-logged it as DEAD tuples having no storage. So, when
the WAL record is replayed in standby(or crash recovery), the tuples
will be marked as DEAD having no storage and their space may be
reclaimed by PageRepairFragmentation. Now, if you do byte-by-byte
comparison with wal_consistency tool, it may fail even for normal
tuple as well. Please let me know if you feel the same way.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 29.06.2018 10:00, Kuntal Ghosh wrote:
> On Wed, Jun 27, 2018 at 12:10 PM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> I have prepared a third version of the patches. Summary:
>> 1. DEAD tuples are masked on a page during consistency checking (see the
>> comments for the mask_dead_tuples() function).
> 
> -       ItemIdSetDead(lp);
> +       if (target_index_deletion_factor > 0)
> +           ItemIdMarkDead(lp);
> +       else
> +           ItemIdSetDead(lp);
> IIUC, you want to hold the storage of DEAD tuples to form the ScanKey
> which is required for the index scan in the second phase of quick
> vacuum strategy. To achieve that, you've only marked the tuple as DEAD
> during pruning. Hence, PageRepairFragmentation cannot claim the space
> for DEAD tuples since they still have storage associated with them.
> But, you've WAL-logged it as DEAD tuples having no storage. So, when
> the WAL record is replayed in standby(or crash recovery), the tuples
> will be marked as DEAD having no storage and their space may be
> reclaimed by PageRepairFragmentation. Now, if you do byte-by-byte
> comparison with wal_consistency tool, it may fail even for normal
> tuple as well. Please let me know if you feel the same way.
> 
Thanks for your analysis.
In this development version of the patch I expect the same prune()
strategy on the master and the standby (i.e. target_index_deletion_factor
is equal for both).
In this case the storage of a DEAD tuple is held during replay or
recovery in the same way.
At some future step of development I plan to use a more flexible prune()
strategy. This will require appending an 'isDeadStorageHold' field to the
WAL record.

-- 
Regards,
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Kuntal Ghosh
Date:
On Fri, Jun 29, 2018 at 11:04 AM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 29.06.2018 10:00, Kuntal Ghosh wrote:
>>
>> On Wed, Jun 27, 2018 at 12:10 PM, Andrey V. Lepikhov
>> <a.lepikhov@postgrespro.ru> wrote:
>>>
>>> I have prepared a third version of the patches. Summary:
>>> 1. DEAD tuples are masked on a page during consistency checking (see
>>> the comments for the mask_dead_tuples() function).
>>
>>
>> -       ItemIdSetDead(lp);
>> +       if (target_index_deletion_factor > 0)
>> +           ItemIdMarkDead(lp);
>> +       else
>> +           ItemIdSetDead(lp);
>> IIUC, you want to hold the storage of DEAD tuples to form the ScanKey
>> which is required for the index scan in the second phase of quick
>> vacuum strategy. To achieve that, you've only marked the tuple as DEAD
>> during pruning. Hence, PageRepairFragmentation cannot claim the space
>> for DEAD tuples since they still have storage associated with them.
>> But you've WAL-logged it as DEAD tuples having no storage. So, when
>> the WAL record is replayed on a standby (or during crash recovery), the
>> tuples will be marked as DEAD having no storage, and their space may be
>> reclaimed by PageRepairFragmentation. Now, if you do a byte-by-byte
>> comparison with the wal_consistency_checking tool, it may fail even for
>> a normal tuple as well. Please let me know if you feel the same way.
>>
> Thanks for your analysis.
> In this development version of the patch I expect the same prune() strategy
> on the master and the standby (i.e., target_index_deletion_factor is equal
> on both).
> In this case the storage of a DEAD tuple is held during replay or recovery
> in the same way.
Okay. That's acceptable for now.

> At some future step of development I plan to use a more flexible prune()
> strategy. This will require appending an 'isDeadStorageHold' field to the
> WAL record.
That sounds interesting. I'll be waiting for your next patches.

A few minor comments:

+ qsort((void *)vacrelstats->dead_tuples,
+       vacrelstats->num_dead_tuples, sizeof(ItemPointerData),
+       tid_comparator);
+ ivinfo.isSorted = true;
My understanding is that vacrelstats->dead_tuples is already sorted
by TID. I'm referring to the following part of lazy_scan_heap():

for (blk = 0 to nblocks)
{
    for (offset = 1 to max offset)
    {
        if certain conditions are met, store the tuple in
        vacrelstats->dead_tuples using lazy_record_dead_tuple();
    }
}

So, you don't have to sort them again, right?

+ slot = MakeSingleTupleTableSlot(RelationGetDescr(hrel));
+ econtext->ecxt_scantuple = slot;
+ ExecDropSingleTupleTableSlot(slot);
You don't have to do this for every tuple. Before storing the tuple,
you can just call MemoryContextReset(econtext->ecxt_per_tuple_memory);
perhaps you can check IndexBuildHeapRangeScan for the details.
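
For illustration, the pattern could look roughly like this (a sketch only; 
the exact calls depend on your patch):

/* Sketch: create the slot once, reset the per-tuple memory context
 * inside the loop, and drop the slot once at the end.  Per-tuple
 * allocations (such as the generated scan key) then live in
 * ecxt_per_tuple_memory and are freed by the reset. */
slot = MakeSingleTupleTableSlot(RelationGetDescr(hrel));
econtext->ecxt_scantuple = slot;

while (have_next_dead_tuple())          /* illustrative loop condition */
{
    MemoryContextReset(econtext->ecxt_per_tuple_memory);
    ExecStoreTuple(tuple, slot, InvalidBuffer, false);
    /* ... build the scan key and call amtargetdelete() ... */
}

ExecDropSingleTupleTableSlot(slot);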

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Dilip Kumar
Date:
On Wed, Jun 27, 2018 at 12:10 PM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
>
>
> On 23.06.2018 00:43, Peter Geoghegan wrote:
>>
>> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
>> <a.lepikhov@postgrespro.ru> wrote:
>>>
>>> According to your feedback, i develop second version of the patch.
>>> In this version:

I had a quick look at your patch; I have a few comments.
1.
+ if (use_quick_strategy)
+ quick_vacuum_index(Irel[i], onerel, vacrelstats);
+ else
+ lazy_vacuum_index(Irel[i],

I noticed that inside quick_vacuum_index we again check whether we have
an am routine for target deletion, and fall back to lazy_vacuum_index if
we don't. I think we can have that check here itself?

2.
+ else
+ {
+ IndexBulkDeleteResult **stats;
+
+ lazy_vacuum_index(irel, stats, vacrelstats);
+ }

These stats will be lost. I think if you fix the first point then this
issue will be solved; otherwise, you might need to pass indexstats as a
parameter to the quick_vacuum_index function.
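
Roughly, the call site could then look like this (a sketch with 
illustrative names; indstats stands for whatever stats array the caller 
keeps):

/* Decide on the strategy at the call site, so the lazy fallback keeps
 * its stats pointer; rd_amroutine->amtargetdelete is the new AM member
 * introduced by the patch. */
if (use_quick_strategy &&
    Irel[i]->rd_amroutine->amtargetdelete != NULL)
    quick_vacuum_index(Irel[i], onerel, vacrelstats);
else
    lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);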

3.
+}
+
+#include "access/nbtree.h"
+#include "catalog/index.h"
+#include "executor/executor.h"

You can move these header inclusions to the top of the file.

4. In many places the PG coding standard is not followed; a few examples:
a.
+ bool use_quick_strategy =
(vacrelstats->num_dead_tuples/vacrelstats->old_live_tuples <
target_index_deletion_factor);
+
A space is needed around the operator:
vacrelstats->num_dead_tuples/vacrelstats->old_live_tuples ->
vacrelstats->num_dead_tuples / vacrelstats->old_live_tuples

b. qsort((void *)vacrelstats -> qsort((void *) vacrelstats

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Юрий Соколов
Date:
Thu, 28 Jun 2018 at 8:37, Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>:


On 28.06.2018 05:00, Peter Geoghegan wrote:
> On Tue, Jun 26, 2018 at 11:40 PM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> I still believe that the patch for physical TID ordering in btree:
>> 1) has its own value, not only for target deletion,
>> 2) will require only a few local changes in my code,
>> and this patches can be developed independently.
>
> I want to be clear on something now: I just don't think that this
> patch has any chance of getting committed without something like my
> own patch to go with it. The worst case for your patch without that
> component is completely terrible. It's not really important for you to
> actually formally make it part of your patch, so I'm not going to
> insist on that or anything, but the reality is that my patch does not
> have independent value -- and neither does yours.
>
As I wrote before in the last email, I will integrate TID sorting into my
patches right now. Please give me access to the last version of your
code, if it is possible.
You can track the progress at the https://github.com/danolivo/postgres git
repository.

Peter is absolutely right, imho: tie-breaking by TID within the index
ordering is indispensable for reliable performance of this patch.

With regards, 
Sokolov Yura. 

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 29.06.2018 14:07, Юрий Соколов wrote:
> Thu, 28 Jun 2018 at 8:37, Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>:
> 
> 
> 
>     On 28.06.2018 05:00, Peter Geoghegan wrote:
>      > On Tue, Jun 26, 2018 at 11:40 PM, Andrey V. Lepikhov
>      > <a.lepikhov@postgrespro.ru> wrote:
>      >> I still believe that the patch for physical TID ordering in btree:
>      >> 1) has its own value, not only for target deletion,
>      >> 2) will require only a few local changes in my code,
>      >> and this patches can be developed independently.
>      >
>      > I want to be clear on something now: I just don't think that this
>      > patch has any chance of getting committed without something like my
>      > own patch to go with it. The worst case for your patch without that
>      > component is completely terrible. It's not really important for
>     you to
>      > actually formally make it part of your patch, so I'm not going to
>      > insist on that or anything, but the reality is that my patch does not
>      > have independent value -- and neither does yours.
>      >
>     As I wrote before in the last email, I will integrate TID sorting into
>     my patches right now. Please give me access to the last version of your
>     code, if it is possible.
>     You can track the progress at the https://github.com/danolivo/postgres
>     git repository.
> 
> 
> Peter is absolutely right, imho: tie-breaking by TID within the index
>   ordering is indispensable for reliable performance of this patch.
> 

In the new version, patch [1] was used in cooperation with the 'retail 
indextuple deletion' and 'quick vacuum strategy' patches (see 
'0004-Retail-IndexTuple-Deletion-with-TID-sorting-in-leaf-.patch').

To demonstrate the potential benefits, I did a test:

CREATE TABLE test (id serial primary key, value integer, factor integer);
INSERT INTO test (value, factor) SELECT random()*1e5, random()*1e3 FROM 
generate_series(1, 1e7);
CREATE INDEX ON test(value);
VACUUM;
DELETE FROM test WHERE (factor = 1);
VACUUM test;

Execution time of the last "VACUUM test;" command on my notebook was:

with bulk deletion: 1.6 s;
with Quick Vacuum Strategy: 5.2 s;
with Quick Vacuum Strategy & TID sorting: 0.6 s.

[1] 
https://www.postgresql.org/message-id/CAH2-WzkVb0Kom%3DR%2B88fDFb%3DJSxZMFvbHVC6Mn9LJ2n%3DX%3DkS-Uw%40mail.gmail.com

> With regards,
> Sokolov Yura.

-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company


Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Mon, Jul 2, 2018 at 7:29 AM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> In the new version the patch [1] was used in cooperation with 'retail
> indextuple deletion' and 'quick vacuum strategy' patches (see
> '0004-Retail-IndexTuple-Deletion-with-TID-sorting-in-leaf-.patch'.

Cool.

I'm going to post a revised version of the unique key patch soon. I've
found that it's slightly faster to use DESC ordering for the implicit
heap TID attribute. Doing so also restores the old user-visible
behavior for DROP dependency management, which allows me to remove all
changes to the regression test output.

> Execution time of last "VACUUM test;" command on my notebook was:
>
> with bulk deletion: 1.6 s;
> with Quick Vacuum Strategy: 5.2 s;
> with Quick Vacuum Strategy & TID sorting: 0.6 s.

I'm glad that you looked into this. You could make this faster still,
by actually passing the lowest heap TID in the list of TIDs to kill to
_bt_search() and _bt_binsrch(). You might have to work through several
extra B-Tree leaf pages per bttargetdelete() call without this (you'll
move right multiple times within bttargetdelete()).
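
In sketch form (illustrative only: build_targetdelete_scankey() is an 
invented name, and treating a TID as part of the descent key is exactly 
what the TID-ordering patch makes possible):

/* Seed the descent with the lowest heap TID from the sorted kill
 * list, so _bt_search()/_bt_binsrch() land on the leaf page of the
 * first victim instead of the leftmost page of the duplicate run. */
ItemPointerData lowest = dead_tids[0];   /* kill list sorted ascending */

skey = build_targetdelete_scankey(irel, itup, &lowest);  /* hypothetical */
stack = _bt_search(irel, keysz, skey, false, &lbuf, BT_READ, NULL);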

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Mon, Jul 2, 2018 at 9:28 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>> Execution time of last "VACUUM test;" command on my notebook was:
>>
>> with bulk deletion: 1.6 s;
>> with Quick Vacuum Strategy: 5.2 s;
>> with Quick Vacuum Strategy & TID sorting: 0.6 s.
>
> I'm glad that you looked into this. You could make this faster still,
> by actually passing the lowest heap TID in the list of TIDs to kill to
> _bt_search() and _bt_binsrch(). You might have to work through several
> extra B-Tree leaf pages per bttargetdelete() call without this (you'll
> move right multiple times within bttargetdelete()).

I should add: I think that this doesn't matter so much in your
original test case with v1 of my patch, because you're naturally
accessing the index tuples in almost the most efficient way already,
since your VACUUM works its way from the start of the table until the
end of the table. You'll definitely need to pass a heap TID to
routines like _bt_search() once you start using my v2, though, since
that puts the heap TIDs in DESC sort order. Otherwise, it'll be almost
as slow as the plain "Quick Vacuum Strategy" case was.

In general, the big idea with my patch is that heap TID is just
another attribute. I am not "cheating" in any way; if it's not
possible to descend the tree and arrive at the right leaf page without
looking through several leaf pages, then my patch is broken.
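
For illustration, the tie-break itself is just an ordinary comparison on 
(block, offset); a sketch, not patch code (with the v2 DESC ordering the 
sign is simply flipped):

/* Compare two heap TIDs as if they were the last key attribute:
 * when all user key columns are equal, this decides the tuple's
 * position in the index. */
static int
heap_tid_compare(ItemPointer a, ItemPointer b)
{
    BlockNumber ablk = ItemPointerGetBlockNumber(a);
    BlockNumber bblk = ItemPointerGetBlockNumber(b);

    if (ablk != bblk)
        return (ablk < bblk) ? -1 : 1;
    if (ItemPointerGetOffsetNumber(a) != ItemPointerGetOffsetNumber(b))
        return (ItemPointerGetOffsetNumber(a) <
                ItemPointerGetOffsetNumber(b)) ? -1 : 1;
    return 0;
}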

You might also use _bt_moveright() with my patch. That way, you can
quickly detect that you need to move right immediately, without going
through all the items on the page. This should only be an issue in the
event of a concurrent page split, though. In my patch, I use
_bt_moveright() in a special way for unique indexes: I need to start
at the first leaf page a duplicate could be on for duplicate checking,
but once that's over I want to "jump immediately" to the leaf page the
index tuple actually needs to be inserted on. That's when
_bt_moveright() is called. (Actually, that looks like it breaks unique
index enforcement in the case of my patch, which I need to fix, but
you might still do this.)

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
"Andrey V. Lepikhov"
Date:

On 03.07.2018 00:40, Peter Geoghegan wrote:
> On Mon, Jul 2, 2018 at 9:28 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>>> Execution time of last "VACUUM test;" command on my notebook was:
>>>
>>> with bulk deletion: 1.6 s;
>>> with Quick Vacuum Strategy: 5.2 s;
>>> with Quick Vacuum Strategy & TID sorting: 0.6 s.
>>
>> I'm glad that you looked into this. You could make this faster still,
>> by actually passing the lowest heap TID in the list of TIDs to kill to
>> _bt_search() and _bt_binsrch(). You might have to work through several
>> extra B-Tree leaf pages per bttargetdelete() call without this (you'll
>> move right multiple times within bttargetdelete()).
> 
> I should add: I think that this doesn't matter so much in your
> original test case with v1 of my patch, because you're naturally
> accessing the index tuples in almost the most efficient way already,
> since your VACUUM works its way from the start of the table until the
> end of the table. You'll definitely need to pass a heap TID to
> routines like _bt_search() once you start using my v2, though, since
> that puts the heap TIDs in DESC sort order. Otherwise, it'll be almost
> as slow as the plain "Quick Vacuum Strategy" case was.
> 
> In general, the big idea with my patch is that heap TID is just
> another attribute. I am not "cheating" in any way; if it's not
> possible to descend the tree and arrive at the right leaf page without
> looking through several leaf pages, then my patch is broken.
> 
> You might also use _bt_moveright() with my patch. That way, you can
> quickly detect that you need to move right immediately, without going
> through all the items on the page. This should only be an issue in the
> event of a concurrent page split, though. In my patch, I use
> _bt_moveright() in a special way for unique indexes: I need to start
> at the first leaf page a duplicate could be on for duplicate checking,
> but once that's over I want to "jump immediately" to the leaf page the
> index tuple actually needs to be inserted on. That's when
> _bt_moveright() is called. (Actually, that looks like it breaks unique
> index enforcement in the case of my patch, which I need to fix, but
> you might still do this.)
> 

Done.
The attachment contains an update for use with v2 of the 'Ensure nbtree 
leaf tuple keys are always unique' patch.

Apply order:
1. 0001-Retail-IndexTuple-Deletion-Access-Method.patch - from previous email
2. 0002-Quick-vacuum-strategy.patch - from previous email
3. v2-0001-Ensure-nbtree-leaf-tuple-keys-are-always-unique.patch - from [1]
4. 0004-Retail-IndexTuple-Deletion-with-TID-sorting-in-leaf.patch

[1] 
https://www.postgresql.org/message-id/CAH2-Wzm6D%3DKnV%2BP8bZE-ZtP4e%2BW64HtVTdOenqd1d7HjJL3xZQ%40mail.gmail.com

-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Tue, Jul 3, 2018 at 5:17 AM, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> Done.
> Attachment contains an update for use v.2 of the 'Ensure nbtree leaf tuple
> keys are always unique' patch.

My v3 is still pending, but is now a lot better than v2. There were
bugs in v2 that were fixed.

One area that might be worth investigating is retail index tuple
deletion performed within the executor in the event of non-HOT
updates. Maybe LP_REDIRECT could be repurposed to mean "ghost record",
at least in unique index tuples with no NULL values. The idea is that
MVCC index scans can skip over those if they've already found a
visible tuple with the same value. Also, when there was about to be a
page split, they could be treated a little bit like LP_DEAD items. Of
course, the ghost bit would have to be treated as a hint that could be
"wrong" (e.g. because the transaction hasn't committed yet), so you'd
have to go to the heap in the context of a page split, to double
check. Also, you'd need heuristics that let you give up on this
strategy when it didn't help.
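
In pseudocode the skip would amount to something like this (purely 
hypothetical; the "ghost" bit and ItemIdIsGhost() are invented names and 
don't exist today):

/* Hypothetical sketch: during an MVCC scan of a unique index, once a
 * visible tuple has been returned for the current key, later items
 * marked "ghost" for the same key can be skipped without a heap
 * visit.  The bit is only a hint and may be wrong, so a page split
 * must recheck against the heap before treating items as deletable. */
if (ItemIdIsGhost(itemid) && returned_visible_tuple_for_key)
    continue;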

I think that this could work well enough for OLTP workloads, and might
be more future-proof than doing it in VACUUM. Though, of course, it's
still very complicated.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Masahiko Sawada
Date:
On Fri, Jul 13, 2018 at 4:00 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Jul 3, 2018 at 5:17 AM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Done.
>> Attachment contains an update for use v.2 of the 'Ensure nbtree leaf tuple
>> keys are always unique' patch.
>
> My v3 is still pending, but is now a lot better than v2. There were
> bugs in v2 that were fixed.
>
> One area that might be worth investigating is retail index tuple
> deletion performed within the executor in the event of non-HOT
> updates. Maybe LP_REDIRECT could be repurposed to mean "ghost record",
> at least in unique index tuples with no NULL values. The idea is that
> MVCC index scans can skip over those if they've already found a
> visible tuple with the same value.

I think that's a good idea. The overhead of marking it as a ghost seems
small and it would speed up index scans. If MVCC index scans have
already found a visible tuple with the same value, can they not only
skip scanning but also kill them? If so, we can kill index tuples
without checking the heap.

> Also, when there was about to be a
> page split, they could be treated a little bit like LP_DEAD items. Of
> course, the ghost bit would have to be treated as a hint that could be
> "wrong" (e.g. because the transaction hasn't committed yet), so you'd
> have to go to the heap in the context of a page split, to double
> check. Also, you'd need heuristics that let you give up on this
> strategy when it didn't help.
>
> I think that this could work well enough for OLTP workloads, and might
> be more future-proof than doing it in VACUUM. Though, of course, it's
> still very complicated.

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Thu, Jul 19, 2018 at 4:29 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> One area that might be worth investigating is retail index tuple
>> deletion performed within the executor in the event of non-HOT
>> updates. Maybe LP_REDIRECT could be repurposed to mean "ghost record",
>> at least in unique index tuples with no NULL values. The idea is that
>> MVCC index scans can skip over those if they've already found a
>> visible tuple with the same value.
>
> I think that's a good idea. The overhead of marking it as a ghost seems
> small and it would speed up index scans. If MVCC index scans have
> already found a visible tuple with the same value, can they not only
> skip scanning but also kill them? If so, we can kill index tuples
> without checking the heap.

I think you're talking about making LP_REDIRECT marking in nbtree
represent a "recently dead" hint: the deleting transaction has
committed, and so we are 100% sure that the tuple is about to become
garbage, but it cannot be LP_DEAD just yet because it needs to stay
around for the benefit of at least one existing snapshot/long running
transaction.

That's a different idea to what I talked about, since it could perhaps
work in a way that's much closer to LP_DEAD/kill_prior_tuple (no extra
executor stuff is required). I'm not sure if your idea is better or
worse than what I suggested, though. It would certainly be easier to
implement.

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
This v5 version of the patch is intended for use with the 
v3-0001-Make-all-nbtree-index-tuples-have-unique-keys patch [1].

[1] 
https://www.postgresql.org/message-id/CAH2-WzkmTRXh%3DzyMAUHyG3%3DO-QQip6CJc2VyNijRO-vzgPxmoQ%40mail.gmail.com


On 13.07.2018 00:00, Peter Geoghegan wrote:
> On Tue, Jul 3, 2018 at 5:17 AM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Done.
>> Attachment contains an update for use v.2 of the 'Ensure nbtree leaf tuple
>> keys are always unique' patch.
> My v3 is still pending, but is now a lot better than v2. There were
> bugs in v2 that were fixed.
>
> One area that might be worth investigating is retail index tuple
> deletion performed within the executor in the event of non-HOT
> updates. Maybe LP_REDIRECT could be repurposed to mean "ghost record",
> at least in unique index tuples with no NULL values. The idea is that
> MVCC index scans can skip over those if they've already found a
> visible tuple with the same value. Also, when there was about to be a
> page split, they could be treated a little bit like LP_DEAD items. Of
> course, the ghost bit would have to be treated as a hint that could be
> "wrong" (e.g. because the transaction hasn't committed yet), so you'd
> have to go to the heap in the context of a page split, to double
> check. Also, you'd need heuristics that let you give up on this
> strategy when it didn't help.
>
> I think that this could work well enough for OLTP workloads, and might
> be more future-proof than doing it in VACUUM. Though, of course, it's
> still very complicated.
>

-- 
---
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:

On 13.07.2018 00:00, Peter Geoghegan wrote:
> One area that might be worth investigating is retail index tuple
> deletion performed within the executor in the event of non-HOT
> updates.

I will try to use this idea. But I have already developed a background 
worker (in an experimental state). It cleans index and heap relations 
with a retail deletion approach.
It uses a "soft" strategy with conditional buffer locks. Benchmarks on 
the pgbench test demonstrate the same execution time as PostgreSQL 
without the worker, but 90% of dead tuples in index and heap relations 
are cleaned without vacuum.
My main effort now is tuning the worker.

---
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Masahiko Sawada
Date:
On Sat, Jul 21, 2018 at 3:11 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jul 19, 2018 at 4:29 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> One area that might be worth investigating is retail index tuple
>>> deletion performed within the executor in the event of non-HOT
>>> updates. Maybe LP_REDIRECT could be repurposed to mean "ghost record",
>>> at least in unique index tuples with no NULL values. The idea is that
>>> MVCC index scans can skip over those if they've already found a
>>> visible tuple with the same value.
>>
>> I think that's a good idea. The overhead of marking it as a ghost seems
>> small and it would speed up index scans. If MVCC index scans have
>> already found a visible tuple with the same value, can they not only
>> skip scanning but also kill them? If so, we can kill index tuples
>> without checking the heap.
>
> I think you're talking about making LP_REDIRECT marking in nbtree
> represent a "recently dead" hint: the deleting transaction has
> committed, and so we are 100% sure that the tuple is about to become
> garbage, but it cannot be LP_DEAD just yet because it needs to stay
> around for the benefit of at least one existing snapshot/long running
> transaction.
>
> That's a different idea to what I talked about, since it could perhaps
> work in a way that's much closer to LP_DEAD/kill_prior_tuple (no extra
> executor stuff is required).

I understood. Thank you for the explanation!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
Hi,
I wrote a background worker (hcleaner) to demonstrate an application of 
Retail IndexTuple deletion (see the patch in the attachment).
Like autovacuum, it utilizes the concept of one launcher and many 
workers, but one worker corresponds to one database.

Short description:
A backend collects dirty block numbers in a hash table at the point in 
code immediately after the heap_page_prune() call. The backend sends a 
package of dirty block numbers (not one-by-one!) over a socket at the end 
of the transaction, or when the hash table is full.
The launcher transfers the block numbers to the corresponding workers.
A worker collects dead tuples from a block, cleans the index relations, 
and cleans the heap block. It uses conditional locking with a 
waiting-list approach if the heap block is busy.

hcleaner has experimental status, but make check-world passes.
For benchmarking I used a Xiaomi notebook with an Intel Core i5 (8th gen) 
processor.

BENCHMARK
----------

test: pgbench -i -s 100 && pgbench -c 25 -j 8 -M prepared -P 60 -T 3600
autovacuum = off

master:
-------
number of transactions actually processed: 6373215
latency average = 14.122 ms
latency stddev = 9.458 ms
tps = 1770.291436 (including connections establishing)
tps = 1770.293191 (excluding connections establishing)

VACUUM verbose pgbench_accounts:
--------------------------------
INFO: vacuuming "public.pgbench_accounts"
INFO: scanned index "pgbench_accounts_pkey" to remove 237496 row versions
DETAIL: CPU: user: 4.67 s, system: 0.27 s, elapsed: 8.05 s
INFO: "pgbench_accounts": removed 237496 row versions in 167652 pages
DETAIL: CPU: user: 7.54 s, system: 3.40 s, elapsed: 26.10 s
INFO: index "pgbench_accounts_pkey" now contains 10000000 row versions 
in 27422 pages
DETAIL: 237496 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pgbench_accounts": found 165275 removable, 10000000 nonremovable 
row versions in 167840 out of 167840 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 6373796
There were 82282 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 20.33 s, system: 5.88 s, elapsed: 51.38 s.

patched:
--------
number of transactions actually processed: 6338593
latency average = 14.199 ms
latency stddev = 13.988 ms
tps = 1760.685922 (including connections establishing)
tps = 1760.688038 (excluding connections establishing)

VACUUM verbose pgbench_accounts:
--------------------------------
INFO: vacuuming "public.pgbench_accounts"
INFO: scanned index "pgbench_accounts_pkey" to remove 1804 row versions
DETAIL: CPU: user: 1.84 s, system: 0.05 s, elapsed: 3.34 s
INFO: "pgbench_accounts": removed 1804 row versions in 1468 pages
DETAIL: CPU: user: 0.06 s, system: 0.03 s, elapsed: 1.42 s
INFO: index "pgbench_accounts_pkey" now contains 10000000 row versions 
in 27422 pages
DETAIL: 1618 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pgbench_accounts": found 168561 removable, 10000000 nonremovable 
row versions in 169466 out of 169466 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 6339174
There were 75478 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 12.27 s, system: 4.03 s, elapsed: 31.43 s.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Peter Geoghegan
Date:
On Tue, Aug 7, 2018 at 12:19 AM, Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> I wrote a background worker (hcleaner) to demonstrate application of Retail
> IndexTuple deletion (see patch at attachment).
> Like Autovacuum it utilizes concept of one launcher and many workers. But
> one worker correspond to one database.
>
> Short description:
> Backend collects dirty block numbers by a hash table at the point in code
> immediately after heap_page_prune() call. Backend send a package of dirty
> block numbers (not one-by-one!) by socket at the end of transaction or if
> hash table is full.
> Launcher transfers block numbers to correspond workers.
> Worker collects dead tuples from a block, clean index relations, clean heap
> block. It uses conditional locking with waiting list approach if heap block
> are busy.
>
> hcleaner has experimental status, but make check-world passed

How does this affect ordinary opportunistic pruning?

-- 
Peter Geoghegan


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
On 09.08.2018 05:19, Peter Geoghegan wrote:
> On Tue, Aug 7, 2018 at 12:19 AM, Andrey Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> I wrote a background worker (hcleaner) to demonstrate application of Retail
>> IndexTuple deletion (see patch at attachment).
>> Like Autovacuum it utilizes concept of one launcher and many workers. But
>> one worker correspond to one database.
>>
>> Short description:
>> Backend collects dirty block numbers by a hash table at the point in code
>> immediately after heap_page_prune() call. Backend send a package of dirty
>> block numbers (not one-by-one!) by socket at the end of transaction or if
>> hash table is full.
>> Launcher transfers block numbers to correspond workers.
>> Worker collects dead tuples from a block, clean index relations, clean heap
>> block. It uses conditional locking with waiting list approach if heap block
>> are busy.
>>
>> hcleaner has experimental status, but make check-world passed
> 
> How does this affect ordinary opportunistic pruning?
> 
As far as I understand, the background worker does not affect ordinary 
opportunistic pruning. For a dirty block it does cleaning work like 
vacuum (if it acquires the conditional lock).

I tried two ways:
1. In the executor: TIDs of deleted/updated tuples are collected directly 
in the executor and sent to the background worker, which calls 
heap_page_prune() itself.
But this is not a good variant, because each backend performs 
opportunistic pruning anyway, and that is sufficient work to detect dead 
tuples. Moreover, in this case the worker competes with backends for 
pruning and is heavily loaded when many backends exist.
2. In heap_page_prune_opt(): the backend collects the ids of dirty blocks 
immediately after pruning and sends them to the worker. In this case the 
backend performs all the work of detecting dead tuples. It lets the 
worker skip block cleaning during periods of heavy load: it can clean the 
block later.
Since the locks are conditional, we do not hamper the backends' work 
during high-load periods, and we utilize idle time for relation cleaning 
in a soft manner.

Variant No. 2 looks better, and now I use it.
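
The conditional-lock part looks roughly like this (a sketch, not the 
worker's actual code):

/* Try to clean one block without disturbing backends: give up
 * immediately if the cleanup lock is contended, and retry later. */
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, NULL);
if (!ConditionalLockBufferForCleanup(buf))
{
    ReleaseBuffer(buf);
    /* put blkno on the waiting list for a later pass */
}
else
{
    /* prune the page and delete index entries for its dead tuples */
    UnlockReleaseBuffer(buf);
}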

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Masahiko Sawada
Date:
On Tue, Aug 7, 2018 at 4:19 PM, Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> Hi,
> I wrote a background worker (hcleaner) to demonstrate application of Retail
> IndexTuple deletion (see patch at attachment).

The patch doesn't seem to have the hcleaner code. Could you share it?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
On 15.08.2018 12:17, Masahiko Sawada wrote:
> On Tue, Aug 7, 2018 at 4:19 PM, Andrey Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Hi,
>> I wrote a background worker (hcleaner) to demonstrate application of Retail
>> IndexTuple deletion (see patch at attachment).
> 
> The patch doesn't seem to have the hcleaner code. Could you share it?

I appreciate you pointing out my mistake.

The attachment contains the full patch set: indextuple retail deletion, 
ordering of b-tree tuples by TID (provided by Peter Geoghegan), and the 
background cleaner.

In this version of the background worker you can see the state of the 
hcleaner in the 'pg_stat_progress_cleaner' view, like VACUUM.

Unlike the previous version, hcleaner checks for the presence of a block 
in memory before cleanup (see the RBM_NORMAL_NO_READ mode at the 
ReadBufferExtended() call) and does not read blocks from disk storage 
(only on shutdown, after a SIGTERM signal is caught).

For a feature demonstration you can use a simple test (autovacuum = off):
pgbench -i -s 1 && psql -c $"CREATE INDEX pgbench_accounts_ext ON 
public.pgbench_accounts USING btree (abalance);" && pgbench -t <n> -c 20 
-j 8 -f test.pgb

where test.pgb is:
-------
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-------

My workstation shows:

    |        master          |        patched         |
n   | HEAP size | INDEX size | HEAP size | INDEX size |
----|-----------|------------|-----------|------------|
2e3 | 13 MB     | 2.3 MB     | 13 MB     | 2.3 MB     |
2e4 | 14 MB     | 2.7 MB     | 13 MB     | 2.7 MB     |
2e5 | 14 MB     | 8.0 MB     | 14 MB     | 4.8 MB     |
2e6 | 61 MB     | 58 MB      | 14 MB     | 6.7 MB     |

where HEAP size is the size of the 'pgbench_accounts' relation and INDEX 
size is the size of the 'pgbench_accounts_ext' index relation.
This demonstrates the relation bloat problem, and the influence of 
hcleaner, in an exaggerated setting.

One remaining problem is regression test modification, because hcleaner 
makes the physical order of tuples in relations unpredictable.

> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
> 

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
Hi,

I prepared the next version of the background worker (cleaner), based on 
the retail indextuple deletion patch.
This version shows stable behavior on regression tests and pgbench 
workloads.

In this version:
1. Only AccessShareLock is acquired on cleanup of heap and index 
relations.
2. A somewhat 'aggressive' cleanup strategy is introduced - conditional 
cleanup locks are not used.
3. Only in-memory blocks are cleaned.
4. The cleaner calls heap_page_prune() before cleaning a block.

Benchmarks
----------

Two factors were evaluated: performance (tps) and relation bloat.

Before each test, some rarefaction of pgbench_accounts was modeled by 
deleting 10% of the tuples in each block.
Also, I tested normal and Gaussian distributions of queries on the 
pgbench_accounts relation.
Autovacuum uses default settings.

Script:
pgbench -i -s 10
psql -c $"DELETE FROM pgbench_accounts WHERE (random() < 0.1);"
psql -c $"VACUUM;"
psql -c $"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts 
USING btree (abalance);" &&
pgbench -T 3600 -c 32 -j 8 -M prepared -P 600

NORMAL distribution:
average tps = 1045 (cleaner); = 1077 (autovacuum)

Relations size at the end of test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_branches: 0.1 (cleaner); 2.1 (autovacuum)
pgbench_tellers: 0.4 (cleaner); 2.8 (autovacuum)
pgbench_accounts_pkey: 21 (cleaner); 43 (autovacuum)
pgbench_accounts_ext: 48 (cleaner); 56 (autovacuum)

Gaussian distribution:
average tps = 213 (cleaner); = 213 (autovacuum)

Relations size at the end of test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_accounts_ext: 22 (cleaner); 29 (autovacuum)

Conclusions
-----------
1. For retail indextuple deletion purposes I replaced ItemIdSetDead() 
with ItemIdMarkDead() in the heap_page_prune_execute() operation. 
Consequently, in the case of 100% filling of each relation block we get a 
bloated heap and index, more or less. When the blocks already have free 
space, the cleaner can delay heap and index bloat without a vacuum.
2. The cleaner works fine in the case of skewed access frequency to 
relation blocks.
3. The cleaner does not cause a decrease in performance.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
And the background worker patch is in the attachment.

On 05.09.2018 15:25, Andrey Lepikhov wrote:
> Hi,
> 
> I prepared next version of Background worker (cleaner) based on a retail 
> indextuple deletion patch.
> This version shows stable behavior on regression tests and pgbench 
> workloads.
> 
> In this version:
> 1. Only AccessShareLock are acquired on a cleanup of heap and index 
> relations.
> 2. Some 'aggressive' cleanup strategy introduced - conditional cleanup 
> locks not used.
> 3. Cleanup only an in-memory blocks.
> 4. The Cleaner calls heap_page_prune() before cleanup a block.
> 
> Benchmarks
> ---------
> 
> Two factors were evaluated: performance (tps) and relations blowing.
> 
> Before each test some rarefaction of pgbench_accounts was modeled by 
> deletion 10% of tuples at each block.
> Also, I tested normal and Gaussian distribution of queries on 
> pgbench_accounts relation.
> Autovacuum uses default settings.
> 
> Script:
> pgbench -i -s 10
> psql -c $"DELETE FROM pgbench_accounts WHERE (random() < 0.1);"
> psql -c $"VACUUM;"
> psql -c $"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts 
> USING btree (abalance);" &&
> pgbench -T 3600 -c 32 -j 8 -M prepared -P 600
> 
> NORMAL distribution:
> average tps = 1045 (cleaner); = 1077 (autovacuum)
> 
> Relations size at the end of test, MB:
> pgbench_accounts: 128 (cleaner); 128 (autovacuum)
> pgbench_branches: 0.1 (cleaner); 2.1 (autovacuum)
> pgbench_tellers: 0.4 (cleaner); 2.8 (autovacuum)
> pgbench_accounts_pkey: 21 (cleaner); 43 (autovacuum)
> pgbench_accounts_ext: 48 (cleaner); 56 (autovacuum)
> 
> Gaussian distribution:
> average tps = 213 (cleaner); = 213 (autovacuum)
> 
> Relations size at the end of test, MB:
> pgbench_accounts: 128 (cleaner); 128 (autovacuum)
> pgbench_accounts_ext: 22 (cleaner); 29 (autovacuum)
> 
> Conclusions
> -----------
> 1. For retail indextuple deletion purposes i replaced ItemIdSetDead() by 
> ItemIdMarkDead() in heap_page_prune_execute() operation. Hereupon in the 
> case of 100% filling of each relation block we get a blowing HEAP and 
> index , more or less. When the blocks already have free space, the 
> cleaner can delay blowing the heap and index without a vacuum.
> 2. Cleaner works fine in the case of skewness of access frequency to 
> relation blocks.
> 3. The cleaner does not cause a decrease of performance.
> 

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
The next version of the amtargetdelete() interface and its 
implementation for nbtree, devoted to the problem of retail indextuple 
deletion.
It uses the 'v5-0001-Make-nbtree-indexes-have-unique-keys-in-tuples' 
patch [1] to delete all logical duplicates with one tree descent.

[1] 
https://www.postgresql.org/message-id/CAH2-WzkfK=JVHjkd17TLDvsFb6psduTt5WYiT8dg+-UFc+rSSQ@mail.gmail.com
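
In outline, a bttargetdelete() call then works like this (a sketch with 
hypothetical helpers key_matches, tid_in_kill_list, mark_item_dead, and 
step_right; not the patch code):

/* One descent, then a rightward walk across matching leaf items. */
stack = _bt_search(irel, keysz, skey, false, &buf, BT_READ, NULL);
offnum = _bt_binsrch(irel, buf, keysz, skey, false);

while (BufferIsValid(buf))
{
    Page    page = BufferGetPage(buf);

    for (; offnum <= PageGetMaxOffsetNumber(page); offnum++)
    {
        IndexTuple itup = (IndexTuple)
            PageGetItem(page, PageGetItemId(page, offnum));

        if (!key_matches(itup, skey))       /* past the duplicates */
            return;
        if (tid_in_kill_list(&itup->t_tid))
            mark_item_dead(page, offnum);
    }
    buf = step_right(irel, buf, &offnum);   /* InvalidBuffer if rightmost */
}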

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andrey Lepikhov
Date:
The v6 version of quick vacuum, which utilizes the amtargetdelete() 
interface for retail indextuple deletion.
Now it is simpler and more concise.
It must be applied after Retail-IndexTuple-Deletion-Access-Method.patch.

BENCHMARKS:
-----------

Initial script:
pgbench -i -s $scale # initial DB generation
"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts USING 
btree (abalance);" # additional index

Comparison with lazy vacuum:

script:
"DELETE FROM pgbench_accounts WHERE (random() < $factor);" # delete a 
part of tuples for cleaning strategies comparison
"VACUUM pgbench_accounts;" # check time of vacuum process by bash 'date 
+%s%N | cut -b1-13' command

Results:
        | $scale=10   | $scale=100  |
$factor | QVAC | LVAC | QVAC | LVAC |
1E-6    | -    | -    | 284  | 979  |
1E-5    | 78   | 144  | 288  | 1423 |
1E-4    | 72   | 280  | 388  | 3304 |
1E-3    | 189  | 609  | 2294 | 6029 |
1E-2    | 443  | 783  | 54232| 67884|
1E-1    | 1593 | 1237 | 83092| 86104|

where QVAC means forced use of quick vacuum, and LVAC means lazy vacuum 
for index cleanup. $factor corresponds to the number of vacuumed tuples. 
For example, $scale=10, $factor=1E-1 -> 100000 tuples vacuumed. Time is 
measured in ms.

So, the quick strategy can be used effectively in a vacuum process for up 
to 1-2% of DEAD tuples in a relation.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment

Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Dmitry Dolgov
Date:
> On Fri, Sep 21, 2018 at 5:52 AM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
>
> The v6 version of quick vacuum, which utilizes the amtargetdelete()
> interface for retail indextuple deletion.
> Now it is more simple and laconic.
> It must be applied after Retail-IndexTuple-Deletion-Access-Method.patch.
>
> BENCHMARKS:
> -----------
>
> Initial script:
> pgbench -i -s $scale # initial DB generation
> "CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts USING
> btree (abalance);" # additional index
>
> Comparison with lazy vacuum:
>
> script:
> "DELETE FROM pgbench_accounts WHERE (random() < $factor);" # delete a
> part of tuples for cleaning strategies comparison
> "VACUUM pgbench_accounts;" # check time of vacuum process by bash 'date
> +%s%N | cut -b1-13' command
>
> Results:
>         | $scale=10   | $scale=100  |
> $factor| QVAC | LVAC | QVAC | LVAC |
> 1E-6   | -    | -    | 284  | 979  |
> 1E-5   | 78   | 144  | 288  | 1423 |
> 1E-4   | 72   | 280  | 388  | 3304 |
> 1E-3   | 189  | 609  | 2294 | 6029 |
> 1E-2   | 443  | 783  | 54232| 67884|
> 1E-1   | 1593 | 1237 | 83092| 86104|
>
> where QVAC - forced use of quick vacuum; LVAC - use lazy vacuum for
> index cleanup. $factor corresponds a number of vacuumed tuples. For
> example, $scale=10, $factor=1E-1 -> 100000 tuples vacuumed. Time
> measured in ms.
>
> So, quick strategy can be used in a vacuum process effectively up to
> 1-2% of DEAD tuples in a relation.

Hi,

Unfortunately, this patch doesn't compile anymore:

index.c: In function ‘htid2IndexDatum’:
index.c:4172:2: error: too few arguments to function ‘MakeSingleTupleTableSlot’
  TupleTableSlot *slot = MakeSingleTupleTableSlot(RelationGetDescr(hrel));
  ^

Also I'm a bit confused about the current status of collaboration between this
patch and [1]: does one tightly depend on the other, or not? Does it make sense
to have only one corresponding CF item then? For now I'll move this one to
the next CF.

[1]:
https://www.postgresql.org/message-id/flat/CAH2-WzkVb0Kom%3DR%2B88fDFb%3DJSxZMFvbHVC6Mn9LJ2n%3DX%3DkS-Uw%40mail.gmail.com


Re: [WIP] [B-Tree] Retail IndexTuple deletion

From
Andres Freund
Date:
On 2018-11-29 14:27:50 +0100, Dmitry Dolgov wrote:
> > On Fri, Sep 21, 2018 at 5:52 AM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> >
> > The v6 version of quick vacuum, which utilizes the amtargetdelete()
> > interface for retail indextuple deletion.
> > Now it is more simple and laconic.
> > It must be applied after Retail-IndexTuple-Deletion-Access-Method.patch.
> >
> > BENCHMARKS:
> > -----------
> >
> > Initial script:
> > pgbench -i -s $scale # initial DB generation
> > "CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts USING
> > btree (abalance);" # additional index
> >
> > Comparison with lazy vacuum:
> >
> > script:
> > "DELETE FROM pgbench_accounts WHERE (random() < $factor);" # delete a
> > part of tuples for cleaning strategies comparison
> > "VACUUM pgbench_accounts;" # check time of vacuum process by bash 'date
> > +%s%N | cut -b1-13' command
> >
> > Results:
> >         | $scale=10   | $scale=100  |
> > $factor| QVAC | LVAC | QVAC | LVAC |
> > 1E-6   | -    | -    | 284  | 979  |
> > 1E-5   | 78   | 144  | 288  | 1423 |
> > 1E-4   | 72   | 280  | 388  | 3304 |
> > 1E-3   | 189  | 609  | 2294 | 6029 |
> > 1E-2   | 443  | 783  | 54232| 67884|
> > 1E-1   | 1593 | 1237 | 83092| 86104|
> >
> > where QVAC - forced use of quick vacuum; LVAC - use lazy vacuum for
> > index cleanup. $factor corresponds a number of vacuumed tuples. For
> > example, $scale=10, $factor=1E-1 -> 100000 tuples vacuumed. Time
> > measured in ms.
> >
> > So, quick strategy can be used in a vacuum process effectively up to
> > 1-2% of DEAD tuples in a relation.
> 
> Hi,
> 
> Unfortunately, this patch doesn't compile anymore:
> 
> index.c: In function ‘htid2IndexDatum’:
> index.c:4172:2: error: too few arguments to function ‘MakeSingleTupleTableSlot’
>   TupleTableSlot *slot = MakeSingleTupleTableSlot(RelationGetDescr(hrel));
>   ^
> 
> Also I'm a bit confused about the current status of collaboration between this
> patch and [1]: does one tightly depend on the other, or not? Does it make sense
> to have only one corresponding CF item then? For now I'll move this one to
> the next CF.
> 
> [1]:
> https://www.postgresql.org/message-id/flat/CAH2-WzkVb0Kom%3DR%2B88fDFb%3DJSxZMFvbHVC6Mn9LJ2n%3DX%3DkS-Uw%40mail.gmail.com


Given this patch has not been updated since Dmitry's message above, I'm
marking this as returned with feedback.

Greetings,

Andres Freund