Thread: Microvacuum support for Hash Index

Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi All,

I have added microvacuum support for the hash index access method;
attached is the v1 patch for the same. The patch basically takes care
of the following things:

1. Firstly, it changes the marking of dead tuples from
'tuple-at-a-time' to 'page-at-a-time' during a hash index scan. For
this we accumulate the heap TIDs and offsets of all the hash index
tuples flagged by kill_prior_tuple during the scan, and then mark all
accumulated TIDs as LP_DEAD either while stepping from one page to the
next (for scans in both the forward and backward direction), at the end
of the hash index scan, or during a rescan (a sketch of this
bookkeeping is given below, after point 2).

2. Secondly, when inserting a tuple into the hash index, if not enough
space is found on the current page, it ensures that we first clean up
any dead tuples found on that page before moving to the next page in
the bucket chain or going for a bucket split. This increases page
reusability and reduces the number of page splits, thereby reducing the
overall size of the hash index (see the second sketch below).
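
In code terms, the scan-side bookkeeping looks roughly like this (a
sketch based on the patch fragments quoted later in this thread; the
exact struct and field names may differ):

    /* what we remember about each known-dead index entry during a scan */
    typedef struct HashScanPosItem
    {
        ItemPointerData heapTid;        /* TID of the referenced heap tuple */
        OffsetNumber    indexOffset;    /* index item's offset within its page */
    } HashScanPosItem;

    /* while scanning, whenever the executor reports kill_prior_tuple: */
    if (so->numKilled < MaxIndexTuplesPerPage)
    {
        so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
        so->killedItems[so->numKilled].indexOffset =
            ItemPointerGetOffsetNumber(&(so->hashso_curpos));
        so->numKilled++;
    }

The accumulated entries are then turned into LP_DEAD line pointers in a
single pass when the scan leaves the page, ends, or is restarted.

The insertion-side change can be sketched as follows (illustrative
only; _hash_vacuum_one_page and LH_PAGE_HAS_DEAD_TUPLES are introduced
by this patch, and the real control flow differs in detail):

    /* not enough room on the current page for the new index tuple */
    while (PageGetFreeSpace(page) < itemsz)
    {
        HashPageOpaque pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);

        /* first try to reclaim space held by LP_DEAD index tuples */
        if (pageopaque->hasho_flag & LH_PAGE_HAS_DEAD_TUPLES)
        {
            _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
                                  (buf == bucket_buf) ? true : false);

            if (PageGetFreeSpace(page) >= itemsz)
                break;          /* enough space now, insert on this page */
        }

        /* otherwise advance to the next overflow page, or split the bucket */
        ...
    }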

I have compared the hash index size with and without my patch
(microvacuum_hash_index_v1.patch, attached with this mail) on a
high-end machine at various scale factors, and the results are shown
below. For this test, I created a hash index (pgbench_accounts_aid) on
the aid column of the 'pgbench_accounts' table instead of the primary
key, and the results shown in the table below are for that setup. The
patch (pgbench.patch) containing these changes is also attached with
this mail. Moreover, I am using my own script file
(file_hash_kill_prior_tuple) for updating the indexed column with
pgbench's read-write command. The script file
'file_hash_kill_prior_tuple' is also attached with this mail.

Here are some initial test results showing the benefit of this patch:

postgresql.conf and pgbench settings:
autovacuum=off
client counts = 64
run time duration = 15 mins

./pgbench -c $threads -j $threads -T 900 postgres -f
~/file_hash_kill_prior_tuple

Scale Factor    hash index size @ start    HEAD       HEAD + Patch
10              32 MB                      579 MB     158 MB
50              128 MB                     630 MB     350 MB
100             256 MB                     1255 MB    635 MB
300             1024 MB                    2233 MB    1093 MB


As shown in the above results, at scale factor 10 the hash index size
is reduced almost 4x with my patch, whereas at scale factors 50, 100
and 300 it is roughly halved. This shows that we can reduce the hash
index size to a good extent with this patch.

System specifications:
---------------------------------
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel


Note: The patch (microvacuum_hash_index_v1.patch) is prepared on top
of concurrent_hash_index_v8.patch [1] and wal_hash_index_v5.1.patch [2]
for hash index.


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BX%3D8sUd1UCZDZnE3D9CGi9kw%2Bkjxp2Tnw7SX5w8pLBNw%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CAA4eK1KE%3D%2BkkowyYD0vmch%3Dph4ND3H1tViAB%2B0cWTHqjZDDfqg%40mail.gmail.com


Re: Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Mon, Oct 24, 2016 at 2:21 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Hi All,
>
> I have added a microvacuum support for hash index access method and
> attached is the v1 patch for the same.
>

This is important functionality for hash indexes, as we already have
the same functionality for other index types such as btree.

> The patch basically takes care
> of the following things:
>
> 1. Firstly, it changes the marking of dead tuples from
> 'tuple-at-a-time' to 'page-at-a-time' during hash index scan. For this
> we accumulate the heap tids and offset of all the hash index tuples if
> it is pointed by kill_prior_tuple during scan and then mark all
> accumulated tids as LP_DEAD either while stepping from one page to
> another (assuming the scan in both forward and backward direction) or
> during end of the hash index scan or during rescan.
>
> 2. Secondly, when inserting tuple into hash index table, if not enough
> space is found on a current page then it ensures that we first clean
> the dead tuples if found in the current hash index page before moving
> to the next page in a bucket chain or going for a bucket split. This
> basically increases the page reusability and reduces the number of
> page splits, thereby reducing the overall size of hash index table.
>

Few comments on patch:

1.
+static void
+hash_xlog_vacuum_one_page(XLogReaderState *record)
+{
+ XLogRecPtr lsn = record->EndRecPtr;
+ xl_hash_vacuum *xldata = (xl_hash_vacuum *) XLogRecGetData(record);
+ Buffer bucketbuf = InvalidBuffer;
+ Buffer buffer;
+ Buffer metabuf;
+ Page page;
+ XLogRedoAction action;

While replaying the delete/vacuum record on a standby, it can conflict
with some already running queries.  Basically, the replay can remove
rows which are still visible on the standby.  You need to resolve such
conflicts similar to what we do for btree delete records (refer to
btree_xlog_delete).
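
For reference, btree_xlog_delete does roughly the following before it
touches the page (a from-memory sketch; the hash patch will need its
own function for deriving latestRemovedXid and must carry the heap's
RelFileNode in the WAL record):

    if (InHotStandby)
    {
        TransactionId latestRemovedXid =
            btree_xlog_delete_get_latestRemovedXid(record);

        /*
         * Cancel standby queries whose snapshots might still need the rows
         * being removed; they are identified by the heap's RelFileNode.
         */
        ResolveRecoveryConflictWithSnapshot(latestRemovedXid, xlrec->hnode);
    }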

2.
+ /*
+ * Write-lock the meta page so that we can decrement
+ * tuple count.
+ */
+ _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
+
+ _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
+  (buf == bucket_buf) ? true : false);
+
+ _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

It seems the meta page lock is held here for longer than required, and
it is also not needed when there are no deletable items on the page.
You can take the metapage lock just before decrementing the count.
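
Something along these lines, for example (a sketch only, ignoring the
WAL-logging and critical-section details):

    if (ndeletable > 0)
    {
        /* ... remove the dead items from the page and log the change ... */

        /* hold the meta page lock only around the tuple-count update */
        _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
        metap = HashPageGetMeta(BufferGetPage(metabuf));
        metap->hashm_ntuples -= ndeletable;
        _hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);
    }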

3.
  Assert(offnum <= maxoff);
+

Spurious space.  There are some other similar spurious whitespace
changes in the patch; remove them as well.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi Amit,

Thanks for showing your interest and reviewing my patch. I have
started looking into your review comments. I will share the updated
patch in a day or two.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Fri, Oct 28, 2016 at 4:42 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 2:21 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Hi All,
>>
>> I have added a microvacuum support for hash index access method and
>> attached is the v1 patch for the same.
>>
>
> This is an important functionality for hash index as we already do
> have same functionality for other types of indexes like btree.
>
>> The patch basically takes care
>> of the following things:
>>
>> 1. Firstly, it changes the marking of dead tuples from
>> 'tuple-at-a-time' to 'page-at-a-time' during hash index scan. For this
>> we accumulate the heap tids and offset of all the hash index tuples if
>> it is pointed by kill_prior_tuple during scan and then mark all
>> accumulated tids as LP_DEAD either while stepping from one page to
>> another (assuming the scan in both forward and backward direction) or
>> during end of the hash index scan or during rescan.
>>
>> 2. Secondly, when inserting tuple into hash index table, if not enough
>> space is found on a current page then it ensures that we first clean
>> the dead tuples if found in the current hash index page before moving
>> to the next page in a bucket chain or going for a bucket split. This
>> basically increases the page reusability and reduces the number of
>> page splits, thereby reducing the overall size of hash index table.
>>
>
> Few comments on patch:
>
> 1.
> +static void
> +hash_xlog_vacuum_one_page(XLogReaderState *record)
> +{
> + XLogRecPtr lsn = record->EndRecPtr;
> + xl_hash_vacuum *xldata = (xl_hash_vacuum *) XLogRecGetData(record);
> + Buffer bucketbuf = InvalidBuffer;
> + Buffer buffer;
> + Buffer metabuf;
> + Page page;
> + XLogRedoAction action;
>
> While replaying the delete/vacuum record on standby, it can conflict
> with some already running queries.  Basically the replay can remove
> some row which can be visible on standby.  You need to resolve
> conflicts similar to what we do in btree delete records (refer
> btree_xlog_delete).
>
> 2.
> + /*
> + * Write-lock the meta page so that we can decrement
> + * tuple count.
> + */
> + _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
> +
> + _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
> +  (buf == bucket_buf) ? true : false);
> +
> + _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
> It seems here meta page lock is acquired for duration more than
> required and also it is not required when there are no deletable items
> on page. You can take the metapage lock before decrementing the count.
>
> 3.
>   Assert(offnum <= maxoff);
> +
>
> Spurious space.  There are some other similar spurious white space
> changes in patch, remove them as well.
>
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



Re: Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi,

> While replaying the delete/vacuum record on standby, it can conflict
> with some already running queries.  Basically the replay can remove
> some row which can be visible on standby.  You need to resolve
> conflicts similar to what we do in btree delete records (refer
> btree_xlog_delete).

Agreed. Thanks for raising this point. I have taken care of it in the
attached v2 patch.

> + /*
> + * Write-lock the meta page so that we can decrement
> + * tuple count.
> + */
> + _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
> +
> + _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
> +  (buf == bucket_buf) ? true : false);
> +
> + _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
> It seems here meta page lock is acquired for duration more than
> required and also it is not required when there are no deletable items
> on page. You can take the metapage lock before decrementing the count.

Ok. Corrected. Please refer to the attached v2 patch.


> Spurious space.  There are some other similar spurious white space
> changes in patch, remove them as well.

Corrected. Please refer to the attached v2 patch.


Re: Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
Hi,

On 11/02/2016 01:38 AM, Ashutosh Sharma wrote:
>> While replaying the delete/vacuum record on standby, it can conflict
>> with some already running queries.  Basically the replay can remove
>> some row which can be visible on standby.  You need to resolve
>> conflicts similar to what we do in btree delete records (refer
>> btree_xlog_delete).
>
> Agreed. Thanks for putting this point. I have taken care of it in the
> attached v2 patch.
>

Some initial comments.

_hash_vacuum_one_page:

+        END_CRIT_SECTION();
+        _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

The _hash_chgbufaccess() needs a comment.

You also need a place where you call pfree for so->killedItems - maybe 
in hashkillitems().
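
For illustration, the cleanup could be as simple as this, wherever it
ends up living:

    if (so->killedItems != NULL)
    {
        pfree(so->killedItems);
        so->killedItems = NULL;
        so->numKilled = 0;
    }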

Best regards, Jesper




Re: Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi Jesper,

> Some initial comments.
>
> _hash_vacuum_one_page:
>
> +               END_CRIT_SECTION();
> +               _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
> The _hash_chgbufaccess() needs a comment.
>
> You also need a place where you call pfree for so->killedItems - maybe in
> hashkillitems().

Thanks for reviewing this patch. I would like to let you know that this
patch depends on the patches for concurrent hash index and WAL logging
in hash index. So, until those two patches for hash index are stable,
I won't be able to share the next version of the patch for supporting
microvacuum in hash indexes.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
On 11/11/2016 12:11 AM, Ashutosh Sharma wrote:
> Thanks for reviewing this patch. I would like to update you that this
> patch has got dependency on a patch for concurrent hash index and WAL
> log in hash index. So, till these two patches for hash index are not
> stable I won't be able to share you a next version of patch for
> supporting microvacuum in hash index.
>

As the concurrent hash index patch was committed in 6d46f4, this patch
needs a rebase.

I have moved this submission to the next CF.

Thanks for working on this !

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
On 11/11/2016 12:11 AM, Ashutosh Sharma wrote:
> Hi Jesper,
>
>> Some initial comments.
>>
>> _hash_vacuum_one_page:
>>
>> +               END_CRIT_SECTION();
>> +               _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>>
>> The _hash_chgbufaccess() needs a comment.
>>
>> You also need a place where you call pfree for so->killedItems - maybe in
>> hashkillitems().
>
> Thanks for reviewing this patch. I would like to update you that this
> patch has got dependency on a patch for concurrent hash index and WAL
> log in hash index. So, till these two patches for hash index are not
> stable I won't be able to share you a next version of patch for
> supporting microvacuum in hash index.
>

This can be rebased on the WAL v7 patch [1]. In addition to the previous 
comments you need to take commit 7819ba into account.

[1] 
https://www.postgresql.org/message-id/CAA4eK1%2BdmGNTFMnLO4EbOWJDHUq%3D%2Ba2L8T%3D72ifXeh-Kd8HOsg%40mail.gmail.com

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi,

> This can be rebased on the WAL v7 patch [1]. In addition to the previous
> comments you need to take commit 7819ba into account.
>

Attached is the v3 patch, rebased on PostgreSQL HEAD and the WAL v7
patch. It also takes care of all the previous comments from Jesper [1].

Also, I have changed the status of this patch to "Needs review" for
this commit-fest.

[1] https://www.postgresql.org/message-id/a751842f-2aed-9f2e-104c-34cfe06bfbe2%40redhat.com

With Regards,
Ashutosh Sharma.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
Hi Ashutosh,

On 01/04/2017 06:13 AM, Ashutosh Sharma wrote:
> Attached is the v3 patch rebased on postgreSQL HEAD and WAL v7 patch.
> It also takes care of all the previous comments from Jesper - [1].
>

With an --enable-cassert build (master / WAL v7 / MV v3) and

-- ddl.sql --
CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val;
CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id);
CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val);
ANALYZE;
-- ddl.sql --

-- test.sql --
\set id random(1,10)
\set val random(0,10)
BEGIN;
UPDATE test SET val = :val WHERE id = :id;
COMMIT;
-- test.sql --

using pgbench -M prepared -c 10 -j 10 -T 600 -f test.sql test

crashes after a few minutes with

TRAP: FailedAssertion("!(LWLockHeldByMeInMode(((LWLock*) 
(&(bufHdr)->content_lock)), LW_EXCLUSIVE))", File: "bufmgr.c", Line: 3781)

BTW, better rename 'hashkillitems' to '_hash_kill_items' to follow the 
naming convention in hash.h

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi,

> using pgbench -M prepared -c 10 -j 10 -T 600 -f test.sql test
>
> crashes after a few minutes with
>
> TRAP: FailedAssertion("!(LWLockHeldByMeInMode(((LWLock*)
> (&(bufHdr)->content_lock)), LW_EXCLUSIVE))", File: "bufmgr.c", Line: 3781)

Attached v4 patch fixes this assertion failure.

> BTW, better rename 'hashkillitems' to '_hash_kill_items' to follow the
> naming convention in hash.h

Okay, I have renamed 'hashkillitems' to '_hash_kill_items'. Please
check the attached v4 patch.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
Hi Ashutosh,

On 01/06/2017 12:54 AM, Ashutosh Sharma wrote:
>> using pgbench -M prepared -c 10 -j 10 -T 600 -f test.sql test
>>
>> crashes after a few minutes with
>>
>> TRAP: FailedAssertion("!(LWLockHeldByMeInMode(((LWLock*)
>> (&(bufHdr)->content_lock)), LW_EXCLUSIVE))", File: "bufmgr.c", Line: 3781)
>
> Attached v4 patch fixes this assertion failure.
>

Yes, that fixes the problem.

However (master / WAL v7 / MV v4),

--- ddl.sql ---
CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val;
CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id);
CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val);
ANALYZE;
--- ddl.sql ---

--- test.sql ---
\set id random(1,10)
\set val random(0,10)
BEGIN;
DELETE FROM test WHERE id = :id;
INSERT INTO test VALUES (:id, :val);
COMMIT;
--- test.sql ---

gives

#9  0x000000000098a83e in elog_finish (elevel=20, fmt=0xb6ea92 
"incorrect local pin count: %d") at elog.c:1378
#10 0x00000000007f0b33 in LockBufferForCleanup (buffer=1677) at 
bufmgr.c:3605
#11 0x0000000000549390 in XLogReadBufferForRedoExtended 
(record=0x2afced8, block_id=1 '\001', mode=RBM_NORMAL, 
get_cleanup_lock=1 '\001', buf=0x7ffe3ee27c8c) at xlogutils.c:394
#12 0x00000000004c5026 in hash_xlog_vacuum_one_page (record=0x2afced8) 
at hash_xlog.c:1109
#13 0x00000000004c5547 in hash_redo (record=0x2afced8) at hash_xlog.c:1214
#14 0x000000000053a361 in StartupXLOG () at xlog.c:6975
#15 0x00000000007a4ca0 in StartupProcessMain () at startup.c:216

on the slave instance in a master-slave setup.

Also, the src/backend/access/README file should be updated with a 
description of the changes which this patch provides.

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi Jesper,

> However (master / WAL v7 / MV v4),
>
> --- ddl.sql ---
> CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val;
> CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id);
> CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val);
> ANALYZE;
> --- ddl.sql ---
>
> --- test.sql ---
> \set id random(1,10)
> \set val random(0,10)
> BEGIN;
> DELETE FROM test WHERE id = :id;
> INSERT INTO test VALUES (:id, :val);
> COMMIT;
> --- test.sql ---
>
> gives
>
> #9  0x000000000098a83e in elog_finish (elevel=20, fmt=0xb6ea92 "incorrect
> local pin count: %d") at elog.c:1378
> #10 0x00000000007f0b33 in LockBufferForCleanup (buffer=1677) at
> bufmgr.c:3605
> #11 0x0000000000549390 in XLogReadBufferForRedoExtended (record=0x2afced8,
> block_id=1 '\001', mode=RBM_NORMAL, get_cleanup_lock=1 '\001',
> buf=0x7ffe3ee27c8c) at xlogutils.c:394
> #12 0x00000000004c5026 in hash_xlog_vacuum_one_page (record=0x2afced8) at
> hash_xlog.c:1109
> #13 0x00000000004c5547 in hash_redo (record=0x2afced8) at hash_xlog.c:1214
> #14 0x000000000053a361 in StartupXLOG () at xlog.c:6975
> #15 0x00000000007a4ca0 in StartupProcessMain () at startup.c:216
>
> on the slave instance in a master-slave setup.

Thanks for reporting this problem. It basically happens because I
forgot to unpin the bucketbuf in hash_xlog_vacuum_one_page(). Please
find attached the v5 patch that fixes the issue.

>
> Also, the src/backend/access/README file should be updated with a
> description of the changes which this patch provides.

Okay, I have updated the insertion algorithm in the README file.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
Hi Ashutosh,

On 01/10/2017 05:24 AM, Ashutosh Sharma wrote:
> Thanks for reporting this problem. It is basically coming because i
> forgot to unpin the bucketbuf in hash_xlog_vacuum_one_page(). Please
> find the attached v5 patch that fixes the issue.
>

The crash is now fixed, but the

--- test.sql ---
\set id random(1,10)
\set val random(0,10)
BEGIN;
UPDATE test SET val = :val WHERE id = :id;
COMMIT;
--- test.sql ---

case gives

client 6 aborted in command 3 of script 0; ERROR:  deadlock detected
DETAIL:  Process 14608 waits for ShareLock on transaction 1444620; 
blocked by process 14610.
Process 14610 waits for ShareLock on transaction 1444616; blocked by 
process 14608.
HINT:  See server log for query details.
CONTEXT:  while rechecking updated tuple (12,3) in relation "test"
...

using pgbench -M prepared -c 10 -j 10 -T 300 -f test.sql test

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
Hi Ashutosh,

On 01/10/2017 08:40 AM, Jesper Pedersen wrote:
> On 01/10/2017 05:24 AM, Ashutosh Sharma wrote:
>> Thanks for reporting this problem. It is basically coming because i
>> forgot to unpin the bucketbuf in hash_xlog_vacuum_one_page(). Please
>> find the attached v5 patch that fixes the issue.
>>
>
> The crash is now fixed, but the
>
> --- test.sql ---
> \set id random(1,10)
> \set val random(0,10)
> BEGIN;
> UPDATE test SET val = :val WHERE id = :id;
> COMMIT;
> --- test.sql ---
>
> case gives
>
> client 6 aborted in command 3 of script 0; ERROR:  deadlock detected
> DETAIL:  Process 14608 waits for ShareLock on transaction 1444620;
> blocked by process 14610.
> Process 14610 waits for ShareLock on transaction 1444616; blocked by
> process 14608.
> HINT:  See server log for query details.
> CONTEXT:  while rechecking updated tuple (12,3) in relation "test"
> ...
>
> using pgbench -M prepared -c 10 -j 10 -T 300 -f test.sql test
>

I'm not seeing this deadlock with just the WAL v8 patch applied.

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi Jesper,

> I'm not seeing this deadlock with just the WAL v8 patch applied.
>

Okay, thanks for confirming that.

I would like to update you that I am not able to reproduce this issue
at my end. I suspect that the steps I am following might be slightly
different from yours. Could you please have a look at the steps
mentioned below and confirm whether there is something different in
what I am doing?

Firstly, I am running the test case on the following git commit on HEAD:

<git-commit>
commit ba61a04bc7fefeee03416d9911eb825c4897c223
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Jan 19 19:52:13 2017 -0500
    Avoid core dump for empty prepared statement in an aborted transaction.

    Brown-paper-bag bug in commit ab1f0c822: the old code here coped with
    null CachedPlanSource.raw_parse_tree, the new code not so much.
    Per report from Dave Cramer.

</git-commit>

On top of the above commit, I have applied the WAL v8 patch for hash
index and the MV v5 patch.

Now, with an --enable-cassert build, I am following the steps below:

1) Created a 'test' database

2) psql -d test -f ~/ddl.sql

where ddl.sql is,

-- ddl.sql --
CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val;
CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id);
CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val);
ANALYZE;
-- ddl.sql --

3) pgbench -M prepared -c 10 -j 10 -T 1800 -f ~/test.sql test

where test.sql is,

-- test.sql --
\set id random(1,10)
\set val random(0,10)
BEGIN;
UPDATE test SET val = :val WHERE id = :id;
COMMIT;
-- test.sql --


Machine details are as follows:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8

Also, it would be great if you could confirm whether you have been
getting this issue repeatedly. Thanks.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
Hi Ashutosh,

On 01/20/2017 04:18 AM, Ashutosh Sharma wrote:
> okay, Thanks for confirming that.
>
> I would like to update you that I am not able to reproduce this issue
> at my end. I suspect that the steps i am following might be slightly
> different than your's. Could you please have a look at steps mentioned
> below and confirm if there is something different that I am doing.
>
> Firstly, I am running the test-case on following git commit in head:
>
> <git-commmit>
> commit ba61a04bc7fefeee03416d9911eb825c4897c223
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Thu Jan 19 19:52:13 2017 -0500
>
>     Avoid core dump for empty prepared statement in an aborted transaction.
>
>     Brown-paper-bag bug in commit ab1f0c822: the old code here coped with
>     null CachedPlanSource.raw_parse_tree, the new code not so much.
>     Per report from Dave Cramer.
> </git-commit>
>
> On top of above commit, I have applied WAL v8 patch for hash index and
> MV v5 patch.
>
> Now, with an --enable-cassert build I am following below steps:
>
> 1) Created a 'test' database
>
> 2) psql -d test -f ~/ddl.sql
>
> where ddl.sql is,
>
> -- ddl.sql --
> CREATE TABLE test AS SELECT generate_series(1, 10) AS id, 0 AS val;
> CREATE INDEX IF NOT EXISTS idx_id ON test USING hash (id);
> CREATE INDEX IF NOT EXISTS idx_val ON test USING hash (val);
> ANALYZE;
> -- ddl.sql --
>
> 3) pgbench -M prepared -c 10 -j 10 -T 1800 -f ~/test.sql test
>
> where test.sql is,
>
> -- test.sql --
> \set id random(1,10)
> \set val random(0,10)
> BEGIN;
> UPDATE test SET val = :val WHERE id = :id;
> COMMIT;
> -- test.sql --
>
>
> Machine details are as follows:
>
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                128
> On-line CPU(s) list:   0-127
> Thread(s) per core:    2
> Core(s) per socket:    8
> Socket(s):             8
>
> Also, It would be great if you could confirm as if you have been
> getting this issue repeatedly. Thanks.
>

Yeah, those are the steps; just with a Skylake laptop.

However, I restarted with a fresh master, with WAL v8 and MV v5, and 
can't reproduce the issue.

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
Hi Ashutosh,

On 01/20/2017 03:24 PM, Jesper Pedersen wrote:
> Yeah, those are the steps; just with a Skylake laptop.
>
> However, I restarted with a fresh master, with WAL v8 and MV v5, and
> can't reproduce the issue.
>

I have done some more testing with this, and have moved the patch
back to 'Needs Review' pending Amit's comments.

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Jesper Pedersen
Date:
On 01/23/2017 02:53 PM, Jesper Pedersen wrote:
> I have done some more testing with this, and have moved to the patch
> back to 'Needs Review' pending Amit's comments.
>

Moved to "Ready for Committer".

Best regards, Jesper




Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Thu, Jan 26, 2017 at 6:38 PM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
> On 01/23/2017 02:53 PM, Jesper Pedersen wrote:
>>
>> I have done some more testing with this, and have moved to the patch
>> back to 'Needs Review' pending Amit's comments.
>>
>
> Moved to "Ready for Committer".
>

Don't you think we should try to identify the reason for the deadlock
error you reported upthread [1]?  I know that you and Ashutosh are not
able to reproduce it, but I still feel some investigation is required
to find the reason.  It is quite possible that the test case is such
that the deadlock is expected in rare cases; if that is the case, then
it is okay.  I have not spent enough time on it to comment on whether
it is a test or a code issue.


[1]  - https://www.postgresql.org/message-id/dc6d7247-050f-4014-8c80-a4ee676eb384%40redhat.com


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
>>> I have done some more testing with this, and have moved to the patch
>>> back to 'Needs Review' pending Amit's comments.
>>>
>>
>> Moved to "Ready for Committer".
>>
>
> Don't you think we should try to identify the reason of the deadlock
> error reported by you up thread [1]?  I know that you and Ashutosh are
> not able to reproduce it, but still I feel some investigation is
> required to find the reason.  It is quite possible that the test case
> is such that the deadlock is expected in rare cases, if that is the
> case then it is okay.  I have not spent enough time on that to comment
> whether it is a test or code issue.
>
>
> [1]  - https://www.postgresql.org/message-id/dc6d7247-050f-4014-8c80-a4ee676eb384%40redhat.com
>

I am finally able to reproduce the issue using the attached script
file (deadlock_report). Basically, once I was able to reproduce the
issue with a hash index, I also thought of checking it with a
non-unique B-Tree index and was able to reproduce it there as well.
This certainly tells us that there is nothing wrong at the code level;
rather, it is the test script that is causing the deadlock. I ran
pgbench with two different configurations and my observations are as
follows:

1) With primary keys, i.e. unique values: I have never encountered the
deadlock issue with this configuration.

2) With non-unique indexes (be it hash or B-Tree): I have seen the
deadlock many times with this configuration. Basically, when a column
contains non-unique values there is a high probability that multiple
records will get updated by a single 'UPDATE' statement, and when a
huge number of backends are trying to do so there is a high chance of
a deadlock, which I assume is expected behaviour in a database.

Also attached are the pgbench_bt.patch and pgbench_hash.patch files,
which contain the changes to create the btree and hash indexes.

pgbench settings:
scale factor = 300
shared_buffers = 1GB
client count = 64
run time duration = 30 mins
read-write workload

./pgbench -c $threads -j $threads -T $time_for_reading -M prepared
postgres -f /home/ashu/deadlock_report

I hope this makes things clear now, and if there are no more concerns,
the patch can be moved to the 'Ready for Committer' state. Thank you.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Fri, Jan 27, 2017 at 5:15 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>
>> Don't you think we should try to identify the reason of the deadlock
>> error reported by you up thread [1]?  I know that you and Ashutosh are
>> not able to reproduce it, but still I feel some investigation is
>> required to find the reason.  It is quite possible that the test case
>> is such that the deadlock is expected in rare cases, if that is the
>> case then it is okay.  I have not spent enough time on that to comment
>> whether it is a test or code issue.
>>
>>
>> [1]  - https://www.postgresql.org/message-id/dc6d7247-050f-4014-8c80-a4ee676eb384%40redhat.com
>>
>
> I am finally able to reproduce the issue using the attached script
> file (deadlock_report). Basically, once I was able to reproduce the
> issue with hash index I also thought of checking it with a non unique
> B-Tree index and was able to reproduce it with B-Tree index as well.
> This certainly tells us that there is nothing wrong at the code level
> rather there is something wrong with the test script which is causing
> this deadlock issue. Well, I have ran pgbench with two different
> configurations and my observations are as follows:
>
> 1) With Primary keys i.e. uinque values: I have never encountered
> deadlock issue with this configuration.
>
> 2) With non unique indexes (be it hash or B-Tree): I have seen
> deadlock many a times with this configuration. Basically when we have
> non-unique values associated with a column then there is a high
> probability that multiple records will get updated with a single
> 'UPDATE' statement and when there are huge number of backends trying
> to do so there is high chance of getting deadlock which i assume is
> expected behaviour in database.
>

I agree with your analysis; surely, trying to update multiple rows with
the same values from different backends can lead to a deadlock.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Michael Paquier
Date:
On Sat, Jan 28, 2017 at 8:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 5:15 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>
>>> Don't you think we should try to identify the reason of the deadlock
>>> error reported by you up thread [1]?  I know that you and Ashutosh are
>>> not able to reproduce it, but still I feel some investigation is
>>> required to find the reason.  It is quite possible that the test case
>>> is such that the deadlock is expected in rare cases, if that is the
>>> case then it is okay.  I have not spent enough time on that to comment
>>> whether it is a test or code issue.
>>
>> I am finally able to reproduce the issue using the attached script
>> file (deadlock_report). Basically, once I was able to reproduce the
>> issue with hash index I also thought of checking it with a non unique
>> B-Tree index and was able to reproduce it with B-Tree index as well.
>> This certainly tells us that there is nothing wrong at the code level
>> rather there is something wrong with the test script which is causing
>> this deadlock issue. Well, I have ran pgbench with two different
>> configurations and my observations are as follows:
>>
>> 1) With Primary keys i.e. uinque values: I have never encountered
>> deadlock issue with this configuration.
>>
>> 2) With non unique indexes (be it hash or B-Tree): I have seen
>> deadlock many a times with this configuration. Basically when we have
>> non-unique values associated with a column then there is a high
>> probability that multiple records will get updated with a single
>> 'UPDATE' statement and when there are huge number of backends trying
>> to do so there is high chance of getting deadlock which i assume is
>> expected behaviour in database.
>>
>
> I agree with your analysis, surely trying to update multiple rows with
> same values from different backends can lead to deadlock.

Moved that to CF 2017-03.
-- 
Michael



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
Hi,

Attached is the v6 patch for microvacuum in hash index rebased on top
of 'v10 patch for WAL in hash index - [1]' and 'v1 patch for WAL
consistency check for hash index - [2]'.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2Bk5wR4-kAjPqLoKemuHayQd6RkQQT9gheTfpn%2B72o1UA%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/flat/CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com#CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com

Also attached is the patch (mask_hint_bit_LH_PAGE_HAS_DEAD_TUPLES.patch)
to mask the 'LH_PAGE_HAS_DEAD_TUPLES' flag, which was added as part of
the microvacuum patch.


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Wed, Feb 1, 2017 at 10:30 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Jan 28, 2017 at 8:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Fri, Jan 27, 2017 at 5:15 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>>
>>>> Don't you think we should try to identify the reason of the deadlock
>>>> error reported by you up thread [1]?  I know that you and Ashutosh are
>>>> not able to reproduce it, but still I feel some investigation is
>>>> required to find the reason.  It is quite possible that the test case
>>>> is such that the deadlock is expected in rare cases, if that is the
>>>> case then it is okay.  I have not spent enough time on that to comment
>>>> whether it is a test or code issue.
>>>
>>> I am finally able to reproduce the issue using the attached script
>>> file (deadlock_report). Basically, once I was able to reproduce the
>>> issue with hash index I also thought of checking it with a non unique
>>> B-Tree index and was able to reproduce it with B-Tree index as well.
>>> This certainly tells us that there is nothing wrong at the code level
>>> rather there is something wrong with the test script which is causing
>>> this deadlock issue. Well, I have ran pgbench with two different
>>> configurations and my observations are as follows:
>>>
>>> 1) With Primary keys i.e. uinque values: I have never encountered
>>> deadlock issue with this configuration.
>>>
>>> 2) With non unique indexes (be it hash or B-Tree): I have seen
>>> deadlock many a times with this configuration. Basically when we have
>>> non-unique values associated with a column then there is a high
>>> probability that multiple records will get updated with a single
>>> 'UPDATE' statement and when there are huge number of backends trying
>>> to do so there is high chance of getting deadlock which i assume is
>>> expected behaviour in database.
>>>
>>
>> I agree with your analysis, surely trying to update multiple rows with
>> same values from different backends can lead to deadlock.
>
> Moved that to CF 2017-03.
> --
> Michael



Re: [HACKERS] Microvacuum support for Hash Index

From
Robert Haas
Date:
On Tue, Mar 14, 2017 at 8:02 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Attached is the v6 patch for microvacuum in hash index rebased on top
> of 'v10 patch for WAL in hash index - [1]' and 'v1 patch for WAL
> consistency check for hash index - [2]'.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2Bk5wR4-kAjPqLoKemuHayQd6RkQQT9gheTfpn%2B72o1UA%40mail.gmail.com
> [2] -
https://www.postgresql.org/message-id/flat/CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com#CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com
>
> Also, the patch (mask_hint_bit_LH_PAGE_HAS_DEAD_TUPLES.patch) to mask
> 'LH_PAGE_HAS_DEAD_TUPLES' flag which got added as a part of
> Microvacuum patch is attached with this mail.

Generally, this patch looks like a pretty straightforward adaptation
of the similar btree mechanism to hash indexes, so if it works for
btree it ought to work here, too.  But I noticed a few things while
reading through it.

+        /* Get RelfileNode from relation OID */
+        rel = relation_open(htup->t_tableOid, NoLock);
+        rnode = rel->rd_node;
+        relation_close(rel, NoLock);
         itup->t_tid = htup->t_self;
-        _hash_doinsert(index, itup);
+        _hash_doinsert(index, itup, rnode);

This is an awfully low-level place to be doing something like this.
I'm not sure exactly where this should be happening, but not in the
per-tuple callback.

+    /*
+     * If there's nothing running on the standby we don't need to derive a
+     * full latestRemovedXid value, so use a fast path out of here.  This
+     * returns InvalidTransactionId, and so will conflict with all HS
+     * transactions; but since we just worked out that that's zero people,
+     * it's OK.
+     */
+    if (CountDBBackends(InvalidOid) == 0)
+        return latestRemovedXid;

I see that this comment (and most of what surrounds it) was just
copied from the existing btree example, but isn't there a discrepancy
between the comment and the code?  It says it returns
InvalidTransactionId, but it doesn't.  Also, you dropped the XXX from
the btree original, and the following reachedConsistency check.

+     * Hash Index delete records can conflict with standby queries.You might
+     * think that vacuum records would conflict as well, but we've handled

But they're not called delete records in a hash index.  The function
is called hash_xlog_vacuum_one_page.  The corresponding btree function
is btree_xlog_delete.  So this comment needs a little more updating.

+            if (IsBufferCleanupOK(bucket_buf))
+            {
+                _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
+                                      (buf == bucket_buf) ? true : false,
+                                      hnode);
+                if (bucket_buf != buf)
+                    LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
+
+                if (PageGetFreeSpace(page) >= itemsz)
+                    break;              /* OK, now we have enough space */
+            }

I might be missing something, but I don't quite see why this needs a
cleanup lock on the primary bucket page.  I would think a cleanup lock
on the page we're actually modifying would suffice, and I think if
that's correct it would be a better way to go.  If that's not correct,
then I think the comments needs some work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Wed, Mar 15, 2017 at 1:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 14, 2017 at 8:02 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Attached is the v6 patch for microvacuum in hash index rebased on top
>> of 'v10 patch for WAL in hash index - [1]' and 'v1 patch for WAL
>> consistency check for hash index - [2]'.
>>
>> [1] -
https://www.postgresql.org/message-id/CAA4eK1%2Bk5wR4-kAjPqLoKemuHayQd6RkQQT9gheTfpn%2B72o1UA%40mail.gmail.com
>> [2] -
https://www.postgresql.org/message-id/flat/CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com#CAGz5QCJLERUn_zoO0eDv6_Y_d0o4tNTMPeR7ivTLBg4rUrJdwg@mail.gmail.com
>>
>> Also, the patch (mask_hint_bit_LH_PAGE_HAS_DEAD_TUPLES.patch) to mask
>> 'LH_PAGE_HAS_DEAD_TUPLES' flag which got added as a part of
>> Microvacuum patch is attached with this mail.
>
> Generally, this patch looks like a pretty straightforward adaptation
> of the similar btree mechanism to hash indexes, so if it works for
> btree it ought to work here, too.  But I noticed a few things while
> reading through it.
>
> +        /* Get RelfileNode from relation OID */
> +        rel = relation_open(htup->t_tableOid, NoLock);
> +        rnode = rel->rd_node;
> +        relation_close(rel, NoLock);
>          itup->t_tid = htup->t_self;
> -        _hash_doinsert(index, itup);
> +        _hash_doinsert(index, itup, rnode);
>
> This is an awfully low-level place to be doing something like this.
> I'm not sure exactly where this should be happening, but not in the
> per-tuple callback.
>

I think one possibility is to get it using
indexrel->rd_index->indrelid in _hash_doinsert().


>
> But they're not called delete records in a hash index.  The function
> is called hash_xlog_vacuum_one_page.  The corresponding btree function
> is btree_xlog_delete.  So this comment needs a little more updating.
>
> +            if (IsBufferCleanupOK(bucket_buf))
> +            {
> +                _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
> +                                      (buf == bucket_buf) ? true : false,
> +                                      hnode);
> +                if (bucket_buf != buf)
> +                    LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
> +
> +                if (PageGetFreeSpace(page) >= itemsz)
> +                    break;              /* OK, now we have enough space */
> +            }
>
> I might be missing something, but I don't quite see why this needs a
> cleanup lock on the primary bucket page.  I would think a cleanup lock
> on the page we're actually modifying would suffice, and I think if
> that's correct it would be a better way to go.
>

Offhand, I also don't see any problem with it.


Few other comments:
1.
+ if (ndeletable > 0)
+ {
+ /* No ereport(ERROR) until changes are logged */
+ START_CRIT_SECTION();
+
+ PageIndexMultiDelete(page, deletable, ndeletable);
+
+ pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+ pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;

You are clearing this flag while logging the action, but the same is
not taken care of during replay. Any reason?

2.
+ /* No ereport(ERROR) until changes are logged */
+ START_CRIT_SECTION();
+
+ PageIndexMultiDelete(page, deletable, ndeletable);
+
+ pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+ pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
+
+ /*
+  * Write-lock the meta page so that we can decrement
+  * tuple count.
+  */
+ LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);

The lock on the buffer should be acquired before the critical section.

3.
- /*
-  * Since this can be redone later if needed, mark as a hint.
-  */
- MarkBufferDirtyHint(buf, true);
+ if (so->numKilled < MaxIndexTuplesPerPage)
+ {
+     so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
+     so->killedItems[so->numKilled].indexOffset =
+         ItemPointerGetOffsetNumber(&(so->hashso_curpos));
+     so->numKilled++;
+ }

By looking at the above code, the first thing that comes to mind is
when numKilled can become greater than MaxIndexTuplesPerPage, and why
we ignore the marking of dead tuples once it exceeds
MaxIndexTuplesPerPage.  After looking at the similar btree code, I
realize that it can happen if the user reverses the scan direction.  I
think you should mention in a comment that one should see btgettuple to
understand the reason for the numKilled overrun test, or something like
that.
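
For example, something like:

    /*
     * The numKilled overrun test is not just paranoia: if the caller
     * reverses the scan direction, the same item can be remembered more
     * than once.  See the corresponding test in btgettuple().
     */
    if (so->numKilled < MaxIndexTuplesPerPage)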

4.
+ * We match items by heap TID before assuming they are the right ones to
+ * delete. If an item has moved off the current page due to a split, we'll
+ * fail to find it and do nothing (this is not an error case --- we assume
+ * the item will eventually get marked in a future indexscan).
+ */
+void
+_hash_kill_items(IndexScanDesc scan)

I think this comment doesn't make much sense for a hash index, because
an item won't move off the current page due to a split; only a later
cleanup can remove it.

5.

+
 /*
  * Maximum size of a hash index item (it's okay to have only one per page)

Spurious white space change.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
> Generally, this patch looks like a pretty straightforward adaptation
> of the similar btree mechanism to hash indexes, so if it works for
> btree it ought to work here, too.  But I noticed a few things while
> reading through it.
>
> +        /* Get RelfileNode from relation OID */
> +        rel = relation_open(htup->t_tableOid, NoLock);
> +        rnode = rel->rd_node;
> +        relation_close(rel, NoLock);
>          itup->t_tid = htup->t_self;
> -        _hash_doinsert(index, itup);
> +        _hash_doinsert(index, itup, rnode);
>
> This is an awfully low-level place to be doing something like this.
> I'm not sure exactly where this should be happening, but not in the
> per-tuple callback.

Okay, now I have done this inside _hash_doinsert() instead of in the
callback function. Please have a look at the attached v7 patch.

>
> +    /*
> +     * If there's nothing running on the standby we don't need to derive a
> +     * full latestRemovedXid value, so use a fast path out of here.  This
> +     * returns InvalidTransactionId, and so will conflict with all HS
> +     * transactions; but since we just worked out that that's zero people,
> +     * it's OK.
> +     */
> +    if (CountDBBackends(InvalidOid) == 0)
> +        return latestRemovedXid;
>
> I see that this comment (and most of what surrounds it) was just
> copied from the existing btree example, but isn't there a discrepancy
> between the comment and the code?  It says it returns
> InvalidTransactionId, but it doesn't.  Also, you dropped the XXX from
> the btree original, and the following reachedConsistency check.

It does return InvalidTransactionId if there are no backends running
in any database on the standby. As shown below, 'latestRemovedXid' is
initialised to InvalidTransactionId,

TransactionId   latestRemovedXid = InvalidTransactionId;

so if there are no backend processes running in any database on the
standby, latestRemovedXid will be returned as is.

I have also added the XXX note to the above comment. Please check the
v7 patch attached with this mail.


>
> +     * Hash Index delete records can conflict with standby queries.You might
> +     * think that vacuum records would conflict as well, but we've handled
>
> But they're not called delete records in a hash index.  The function
> is called hash_xlog_vacuum_one_page.  The corresponding btree function
> is btree_xlog_delete.  So this comment needs a little more updating.

Okay, I have tried to rephrase it to avoid the confusion.

>
> +            if (IsBufferCleanupOK(bucket_buf))
> +            {
> +                _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
> +                                      (buf == bucket_buf) ? true : false,
> +                                      hnode);
> +                if (bucket_buf != buf)
> +                    LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
> +
> +                if (PageGetFreeSpace(page) >= itemsz)
> +                    break;              /* OK, now we have enough space */
> +            }
>
> I might be missing something, but I don't quite see why this needs a
> cleanup lock on the primary bucket page.  I would think a cleanup lock
> on the page we're actually modifying would suffice, and I think if
> that's correct it would be a better way to go.  If that's not correct,
> then I think the comments needs some work.

Thanks for that suggestion... I spent a lot of time thinking about this
and also had a small discussion with Amit, but could not find any issue
with taking a cleanup lock on the modified page instead of the primary
bucket page. I had to make some decent code changes for this. The
attached v7 patch has the changes.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
>
> I think one possibility is to get it using
> indexrel->rd_index->indrelid in _hash_doinsert().
>

Thanks. I have tried the same in the v7 patch shared upthread.

>
>>
>> But they're not called delete records in a hash index.  The function
>> is called hash_xlog_vacuum_one_page.  The corresponding btree function
>> is btree_xlog_delete.  So this comment needs a little more updating.
>>
>> +            if (IsBufferCleanupOK(bucket_buf))
>> +            {
>> +                _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
>> +                                      (buf == bucket_buf) ? true : false,
>> +                                      hnode);
>> +                if (bucket_buf != buf)
>> +                    LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
>> +
>> +                if (PageGetFreeSpace(page) >= itemsz)
>> +                    break;              /* OK, now we have enough space */
>> +            }
>>
>> I might be missing something, but I don't quite see why this needs a
>> cleanup lock on the primary bucket page.  I would think a cleanup lock
>> on the page we're actually modifying would suffice, and I think if
>> that's correct it would be a better way to go.
>>
>
> Offhand, I also don't see any problem with it.

I too found no problem with that...

>
> Few other comments:
> 1.
> + if (ndeletable > 0)
> + {
> + /* No ereport(ERROR) until changes are logged */
> + START_CRIT_SECTION();
> +
> + PageIndexMultiDelete(page, deletable, ndeletable);
> +
> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>
> You clearing this flag while logging the action, but same is not taken
> care during replay. Any reasons?

That's because we conditionally WAL-log this flag's status, and when we
do so, we take an FPI of the page.

>
> 2.
> + /* No ereport(ERROR) until changes are logged */
> + START_CRIT_SECTION();
> +
> + PageIndexMultiDelete(page, deletable, ndeletable);
> +
> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
> +
> + /*
> +  * Write-lock the meta page so that we can decrement
> +  * tuple count.
> +  */
> + LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
>
> The lock on buffer should be acquired before critical section.

Point taken. I have taken care of it in the v7 patch.

>
> 3.
> - /*
> -  * Since this can be redone later if needed, mark as a hint.
> -  */
> - MarkBufferDirtyHint(buf, true);
> + if (so->numKilled < MaxIndexTuplesPerPage)
> + {
> +     so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
> +     so->killedItems[so->numKilled].indexOffset =
> +         ItemPointerGetOffsetNumber(&(so->hashso_curpos));
> +     so->numKilled++;
> + }
>
> By looking at above code, the first thing that comes to mind is when
> numKilled can become greater than MaxIndexTuplesPerPage and why we are
> ignoring the marking of dead tuples when it becomes greater than
> MaxIndexTuplesPerPage.  After looking at similar btree code, I realize
> that it could
> happen if user reverses the scan direction.  I think you should
> mention in comments that see btgettuple to know the reason of
> numKilled overun test or something like that.

Added a comment. Please refer to the v7 patch.

>
> 4.
> + * We match items by heap TID before assuming they are the right ones to
> + * delete. If an item has moved off the current page due to a split, we'll
> + * fail to find it and do nothing (this is not an error case --- we assume
> + * the item will eventually get marked in a future indexscan).
> + */
> +void
> +_hash_kill_items(IndexScanDesc scan)
>
> I think this comment doesn't make much sense for hash index because
> item won't move off from the current page due to split, only later
> cleanup can remove it.

Yes, the reason being that no cleanup can happen while a scan is in
progress. Corrected it.

>
> 5.
>
> +
>  /*
>   * Maximum size of a hash index item (it's okay to have only one per page)
>
> Spurious white space change.

Fixed.



Re: [HACKERS] Microvacuum support for Hash Index

From
Robert Haas
Date:
On Wed, Mar 15, 2017 at 11:37 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> +        /* Get RelfileNode from relation OID */
>> +        rel = relation_open(htup->t_tableOid, NoLock);
>> +        rnode = rel->rd_node;
>> +        relation_close(rel, NoLock);
>>          itup->t_tid = htup->t_self;
>> -        _hash_doinsert(index, itup);
>> +        _hash_doinsert(index, itup, rnode);
>>
>> This is an awfully low-level place to be doing something like this.
>> I'm not sure exactly where this should be happening, but not in the
>> per-tuple callback.
>
> Okay, Now I have done this inside _hash_doinsert() instead of callback
> function. Please have a look into the attached v7 patch.

In the btree case, the heap relation isn't re-opened from anywhere in
the btree code.  I think we should try to do the same thing here.  If
we add an argument for the heap relation to _hash_doinsert(),
hashinsert() can easily pass it down; it's already got that value
available.  There are two other calls to _hash_doinsert:

1. _h_indexbuild() calls _hash_doinsert().  It's called only from
hashbuild(), which has the heap relation available.  So we can just
add that as an extra argument to _h_indexbuild() and then from there
pass it to _hash_doinsert.

2. hashbuildCallback calls _hash_doinsert().  Its sixth argument is a
HashBuildState which is set up by hashbuild(), which has the heap
relation available.  So we can just add an extra member to the
HashBuildState and have hashbuild() set it before calling
IndexBuildHeapScan.  hashbuildCallback can then fish it out of the
HashBuildState and pass it to _hash_doinsert().
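
For illustration, option 2 could look roughly like this (a sketch; the
new field name is an assumption):

    typedef struct
    {
        HSpool     *spool;          /* NULL if not using spooling */
        double      indtuples;      /* # tuples accepted into index */
        Relation    heapRel;        /* heap relation we are indexing (new) */
    } HashBuildState;

    /* ... and in hashbuildCallback(), with buildstate = (HashBuildState *) state: */
    itup->t_tid = htup->t_self;
    _hash_doinsert(index, itup, buildstate->heapRel);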

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
On Wed, Mar 15, 2017 at 9:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 15, 2017 at 11:37 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>> +        /* Get RelfileNode from relation OID */
>>> +        rel = relation_open(htup->t_tableOid, NoLock);
>>> +        rnode = rel->rd_node;
>>> +        relation_close(rel, NoLock);
>>>          itup->t_tid = htup->t_self;
>>> -        _hash_doinsert(index, itup);
>>> +        _hash_doinsert(index, itup, rnode);
>>>
>>> This is an awfully low-level place to be doing something like this.
>>> I'm not sure exactly where this should be happening, but not in the
>>> per-tuple callback.
>>
>> Okay, now I have done this inside _hash_doinsert() instead of the callback
>> function. Please have a look at the attached v7 patch.
>
> In the btree case, the heap relation isn't re-opened from anywhere in
> the btree code.  I think we should try to do the same thing here.  If
> we add an argument for the heap relation to _hash_doinsert(),
> hashinsert() can easily pass it down; it's already got that value
> available.  There are two other calls to _hash_doinsert:
>
> 1. _h_indexbuild() calls _hash_doinsert().  It's called only from
> hashbuild(), which has the heap relation available.  So we can just
> add that as an extra argument to _h_indexbuild() and then from there
> pass it to _hash_doinsert.
>
> 2. hashbuildCallback calls _hash_doinsert().  Its sixth argument is a
> HashBuildState which is set up by hashbuild(), which has the heap
> relation available.  So we can just add an extra member to the
> HashBuildState and have hashbuild() set it before calling
> IndexBuildHeapScan.  hashbuildCallback can then fish it out of the
> HashBuildState and pass it to _hash_doinsert().

Okay, I have done the changes as suggested by you. Please refer to the
attached v8 patch. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] Microvacuum support for Hash Index

From
Robert Haas
Date:
On Wed, Mar 15, 2017 at 2:10 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Okay, I have done the changes as suggested by you. Please refer to the
> attached v8 patch. Thanks.

Cool, but this doesn't look right:

+    action = XLogReadBufferForRedo(record, 0, &buffer);
+
+    if (!IsBufferCleanupOK(buffer))
+        elog(PANIC, "hash_xlog_vacuum_one_page: failed to acquire cleanup lock");

That could fail, I think, because of a pin from a Hot Standby backend.
You want to call XLogReadBufferForRedoExtended() with a third argument
of true. Come to think of it, shouldn't hash_xlog_split_allocate_page
be changed the same way?
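
For reference, the suggested call would look roughly like this (a sketch;
only the cleanup-lock argument differs from the quoted code):

    /*
     * Sketch: with get_cleanup_lock = true the redo machinery acquires the
     * cleanup lock itself (resolving Hot Standby pins) instead of the
     * caller having to PANIC when IsBufferCleanupOK() fails.
     */
    action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
                                           true, &buffer);
    if (action == BLK_NEEDS_REDO)
    {
        /* ... remove the logged dead items from the page ... */
    }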

+            /*
+             * Let us mark the page as clean if vacuum removes the DEAD tuples
+             * from an index page. We do this by clearing LH_PAGE_HAS_DEAD_TUPLES
+             * flag.
+             */

Maybe add: Clearing this flag is just a hint; replay won't redo this.

+     * Hash Index records that are marked as LP_DEAD and being removed during
+     * hash index tuple insertion can conflict with standby queries.You might

The word Index shouldn't be capitalized here.  There should be a space
before "You".

The formatting of this comment is oddly narrow:

+ * _hash_vacuum_one_page - vacuum just one index page.
+ * Try to remove LP_DEAD items from the given page.  We
+ * must acquire cleanup lock on the page being modified
+ * before calling this function.

I'd add a blank line after the first line and reflow the rest to be
something more like 75 characters.  pgindent evidently doesn't think
this needs reformatting, but it's oddly narrow.

I suggest changing the header comment of
hash_xlog_vacuum_get_latestRemovedXid like this:

+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.  See also btree_xlog_delete_get_latestRemovedXid,
+ * on which this function is based.

This is looking good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
> +    action = XLogReadBufferForRedo(record, 0, &buffer);
> +
> +    if (!IsBufferCleanupOK(buffer))
> +        elog(PANIC, "hash_xlog_vacuum_one_page: failed to acquire cleanup lock");
>
> That could fail, I think, because of a pin from a Hot Standby backend.
> You want to call XLogReadBufferForRedoExtended() with a third argument
> of true.

Yes, there is a possibility that a new backend may start on the standby
after we kill the conflicting backends. If that new backend has a pin on
the buffer which the startup process is trying to read, then
'IsBufferCleanupOK' will fail, thereby causing the startup process to
PANIC.

> Come to think of it, shouldn't hash_xlog_split_allocate_page
> be changed the same way?

No, the reason being that we are trying to allocate a new bucket page on
the standby, so there can't be any backend holding a pin on a page that is
yet to be initialised.

>
> +            /*
> +             * Let us mark the page as clean if vacuum removes the DEAD tuples
> +             * from an index page. We do this by clearing LH_PAGE_HAS_DEAD_TUPLES
> +             * flag.
> +             */
>
> Maybe add: Clearing this flag is just a hint; replay won't redo this.

Added. Please check the attached v9 patch.

>
> +     * Hash Index records that are marked as LP_DEAD and being removed during
> +     * hash index tuple insertion can conflict with standby queries.You might
>
> The word Index shouldn't be capitalized here.  There should be a space
> before "You".

Corrected.

>
> The formatting of this comment is oddly narrow:
>
> + * _hash_vacuum_one_page - vacuum just one index page.
> + * Try to remove LP_DEAD items from the given page.  We
> + * must acquire cleanup lock on the page being modified
> + * before calling this function.
>
> I'd add a blank line after the first line and reflow the rest to be
> something more like 75 characters.  pgindent evidently doesn't think
> this needs reformatting, but it's oddly narrow.

Corrected.

>
> I suggest changing the header comment of
> hash_xlog_vacuum_get_latestRemovedXid like this:
>
> + * Get the latestRemovedXid from the heap pages pointed at by the index
> + * tuples being deleted.  See also btree_xlog_delete_get_latestRemovedXid,
> + * on which this function is based.
>
> This is looking good.

Changed as per suggestions. Attached v9 patch. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] Microvacuum support for Hash Index

From
Robert Haas
Date:
On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Changed as per suggestions. Attached v9 patch. Thanks.

Wow, when do you sleep?   Will have a look.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Microvacuum support for Hash Index

From
Robert Haas
Date:
On Wed, Mar 15, 2017 at 4:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Changed as per suggestions. Attached v9 patch. Thanks.
>
> Wow, when do you sleep?   Will have a look.

Committed with a few corrections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Microvacuum support for Hash Index

From
Andres Freund
Date:
On 2017-03-15 16:31:11 -0400, Robert Haas wrote:
> On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > Changed as per suggestions. Attached v9 patch. Thanks.
> 
> Wow, when do you sleep?

I think that applies to a bunch of people, including yourself ;)



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:


On Mar 16, 2017 7:49 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Wed, Mar 15, 2017 at 4:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> Changed as per suggestions. Attached v9 patch. Thanks.
>
> Wow, when do you sleep?   Will have a look.

Committed with a few corrections.

Thanks, Robert, for the commit. Thank you, Amit and Jesper, for reviewing this patch.

With Regards, 
Ashutosh Sharma

Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Wed, Mar 15, 2017 at 9:23 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
>>
>> Few other comments:
>> 1.
>> + if (ndeletable > 0)
>> + {
>> + /* No ereport(ERROR) until changes are logged */
>> + START_CRIT_SECTION();
>> +
>> + PageIndexMultiDelete(page, deletable, ndeletable);
>> +
>> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
>> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>>
>> You are clearing this flag while logging the action, but the same is not
>> taken care of during replay. Any reasons?
>
> That's because we conditionally WAL-log this flag status, and when we
> do so, we take its FPI.
>

Sure, but we are not clearing it conditionally.  I am not sure how,
after recovery, it will get cleared if it gets set during normal operation.
Moreover, btree already clears the similar flag during replay (refer to
btree_xlog_delete).


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
On Thu, Mar 16, 2017 at 11:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Mar 15, 2017 at 9:23 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>
>>>
>>> Few other comments:
>>> 1.
>>> + if (ndeletable > 0)
>>> + {
>>> + /* No ereport(ERROR) until changes are logged */
>>> + START_CRIT_SECTION();
>>> +
>>> + PageIndexMultiDelete(page, deletable, ndeletable);
>>> +
>>> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
>>> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>>>
>>> You clearing this flag while logging the action, but same is not taken
>>> care during replay. Any reasons?
>>
>> That's because we  conditionally WAL Log this flag status and when we
>> do so, we take a it's FPI.
>>
>
> Sure, but we are not clearing in conditionally.  I am not sure, how
> after recovery it will be cleared it gets set during normal operation.
> Moreover, btree already clears similar flag during replay (refer
> btree_xlog_delete).

You were right. In case data checksums are enabled or wal_log_hints is set
to true, 'LH_PAGE_HAS_DEAD_TUPLES' will get WAL-logged and therefore
needs to be cleared on the standby as well. Attached is a patch that
clears this flag on the standby during replay.
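
For illustration, the gist of such a change in the redo routine (a sketch
under the assumption that the flag is cleared whenever the vacuum-one-page
record is replayed; not the exact attached patch):

    /* hash_xlog_vacuum_one_page(): after obtaining the buffer for redo */
    if (action == BLK_NEEDS_REDO)
    {
        Page            page = (Page) BufferGetPage(buffer);
        HashPageOpaque  pageopaque;

        /* ... PageIndexMultiDelete() of the offsets carried in the record ... */

        /* mirror the primary: drop the dead-tuples hint as well */
        pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
        pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;

        PageSetLSN(page, lsn);      /* lsn = record->EndRecPtr */
        MarkBufferDirty(buffer);
    }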

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Thu, Mar 16, 2017 at 1:02 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 11:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Mar 15, 2017 at 9:23 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>
>>>>
>>>> Few other comments:
>>>> 1.
>>>> + if (ndeletable > 0)
>>>> + {
>>>> + /* No ereport(ERROR) until changes are logged */
>>>> + START_CRIT_SECTION();
>>>> +
>>>> + PageIndexMultiDelete(page, deletable, ndeletable);
>>>> +
>>>> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
>>>> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>>>>
>>>> You clearing this flag while logging the action, but same is not taken
>>>> care during replay. Any reasons?
>>>
>>> That's because we  conditionally WAL Log this flag status and when we
>>> do so, we take a it's FPI.
>>>
>>
>> Sure, but we are not clearing in conditionally.  I am not sure, how
>> after recovery it will be cleared it gets set during normal operation.
>> Moreover, btree already clears similar flag during replay (refer
>> btree_xlog_delete).
>
> You were right. In case datachecksum is enabled or wal_log_hint is set
> to true, 'LH_PAGE_HAS_DEAD_TUPLES' will get wal logged and therefore
> needs to be cleared on the standby as well.
>

I was thinking about what bad can happen if we don't clear this flag during
replay. The main thing that comes to mind is that after crash recovery, if
the flag is set, inserts on that page might need to traverse all the tuples
on the page once it is full, even if there are no dead tuples on it.  The
flag can be cleared later, when there are dead tuples on the page and we
actually delete them, but I don't think that is worth the price of not
clearing the flag during replay.

> Attached is the patch that
> clears this flag on standby during replay.
>

Don't you think we should also clear it during the replay of
XLOG_HASH_DELETE?  We might want to log the clearing of the flag along with
the WAL record for XLOG_HASH_DELETE.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
>>> Sure, but we are not clearing in conditionally.  I am not sure, how
>>> after recovery it will be cleared it gets set during normal operation.
>>> Moreover, btree already clears similar flag during replay (refer
>>> btree_xlog_delete).
>>
>> You were right. In case datachecksum is enabled or wal_log_hint is set
>> to true, 'LH_PAGE_HAS_DEAD_TUPLES' will get wal logged and therefore
>> needs to be cleared on the standby as well.
>>
>
> I was thinking what bad can happen if we don't clear this flag during
> replay, the main thing that comes to mind is that after crash
> recovery, if the flag is set the inserts on that page might need to
> traverse all the tuples on that page once the page is full even if
> there are no dead tuples in that page.  It can be later cleared when
> there are dead tuples in that page and we actually delete them, but I
> don't think it is worth the price to pay for not clearing the flag
> during replay.

Yes, you are absolutely correct. If we do not clear this flag during
replay, then there is a possibility of _hash_doinsert() unnecessarily
scanning a page with no free space, assuming that the page has some
dead tuples in it when it does not.

>
>> Attached is the patch that
>> clears this flag on standby during replay.
>>
>
> Don't you think, we should also clear it during the replay of
> XLOG_HASH_DELETE?  We might want to log the clear of flag along with
> WAL record for XLOG_HASH_DELETE.
>

Yes, it should be cleared. I completely missed this part in a hurry.
Thanks for pointing it out. I have taken care of it in the attached v2
patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Thu, Mar 16, 2017 at 9:39 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>
>>
>> Don't you think, we should also clear it during the replay of
>> XLOG_HASH_DELETE?  We might want to log the clear of flag along with
>> WAL record for XLOG_HASH_DELETE.
>>
>
> Yes, it should be cleared. I completely missed this part in a hurry.
> Thanks for informing. I have taken care of it in the attached v2
> patch.
>

+ /*
+ * Mark the page as not containing any LP_DEAD items. See comments
+ * in hashbucketcleanup() for details.
+ */
+ pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+ pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;

Your comment here says to refer to hashbucketcleanup, and in that function
the comment says "Clearing this flag is just a hint; replay won't redo
this."  The two seem contradictory.  You need to change the comment in
hashbucketcleanup.  As I said in my previous e-mail, I think you need
to record clearing of this flag in the WAL record XLOG_HASH_DELETE, as you
are not doing this unconditionally, and then during replay clear it
only when the WAL record indicates the same.
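
For illustration, the shape being asked for is roughly the following (a
sketch; clear_dead_marking is the field name that appears in the later
patch versions in this thread):

    /* Sketch: while logging XLOG_HASH_DELETE, record whether the flag
     * was cleared on the primary ... */
    xlrec.clear_dead_marking = clear_dead_marking;

    /* ... and in hash_xlog_delete(), clear the hint only when the record
     * says the primary did so. */
    if (xldata->clear_dead_marking)
    {
        HashPageOpaque  pageopaque;

        pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
        pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
    }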

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
On Fri, Mar 17, 2017 at 8:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 9:39 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>>>
>>>
>>> Don't you think, we should also clear it during the replay of
>>> XLOG_HASH_DELETE?  We might want to log the clear of flag along with
>>> WAL record for XLOG_HASH_DELETE.
>>>
>>
>> Yes, it should be cleared. I completely missed this part in a hurry.
>> Thanks for informing. I have taken care of it in the attached v2
>> patch.
>>
>
> + /*
> + * Mark the page as not containing any LP_DEAD items. See comments
> + * in hashbucketcleanup() for details.
> + */
> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>
> Your comment here says, refer hashbucketcleanup and in that function,
> the comment says "Clearing this flag is just a hint; replay won't redo
> this.".  Both seems contradictory.  You need to change the comment in
> hashbucketcleanup.

Done. Please check the attached v3 patch.

> As I said in my previous e-mail, I think you need
> to record clearing of this flag in WAL record XLOG_HASH_DELETE as you
> are not doing this unconditionally and then during replay clear it
> only when the WAL record indicates the same.

Thank you so much for raising that point. I too think that we should
record the flag status in the WAL record and clear it only when
required during replay.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Fri, Mar 17, 2017 at 12:27 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 8:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> As I said in my previous e-mail, I think you need
>> to record clearing of this flag in WAL record XLOG_HASH_DELETE as you
>> are not doing this unconditionally and then during replay clear it
>> only when the WAL record indicates the same.
>
> Thank you so much for putting that point. I too think that we should
> record the flag status in the WAL record and clear it only when
> required during replay.
>

I think hashdesc.c needs an update (refer case XLOG_HASH_DELETE:).

- * flag. Clearing this flag is just a hint; replay won't redo this.
+ * flag. Clearing this flag is just a hint; replay will check the
+ * status of clear_dead_marking flag before redo it.
  */
  if (tuples_removed && *tuples_removed > 0 &&
  opaque->hasho_flag & LH_PAGE_HAS_DEAD_TUPLES)
+ {
  opaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
+ clear_dead_marking = true;
+ }


I feel the above comment is not required as you are logging this
action explicitly.

+ bool clear_dead_marking; /* TRUE if VACUUM clears

No need to write VACUUM explicitly, you can simply say "TRUE if this
operation clears ...".


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Ashutosh Sharma
Date:
On Fri, Mar 17, 2017 at 6:13 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 12:27 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> On Fri, Mar 17, 2017 at 8:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>> As I said in my previous e-mail, I think you need
>>> to record clearing of this flag in WAL record XLOG_HASH_DELETE as you
>>> are not doing this unconditionally and then during replay clear it
>>> only when the WAL record indicates the same.
>>
>> Thank you so much for putting that point. I too think that we should
>> record the flag status in the WAL record and clear it only when
>> required during replay.
>>
>
> I think hashdesc.c needs an update (refer case XLOG_HASH_DELETE:).

Done. Thanks!

>
> - * flag. Clearing this flag is just a hint; replay won't redo this.
> + * flag. Clearing this flag is just a hint; replay will check the
> + * status of clear_dead_marking flag before redo it.
>   */
>   if (tuples_removed && *tuples_removed > 0 &&
>   opaque->hasho_flag & LH_PAGE_HAS_DEAD_TUPLES)
> + {
>   opaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
> + clear_dead_marking = true;
> + }
>
>
> I feel the above comment is not required as you are logging this
> action explicitly.

That's right. I have removed it in the attached v4 patch.

>
> + bool clear_dead_marking; /* TRUE if VACUUM clears
>
> No need to write VACUUM explicitly, you can simply say "TRUE if this
> operation clears ...".

Corrected. Please find the attached v4 patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Attachment

Re: [HACKERS] Microvacuum support for Hash Index

From
Bruce Momjian
Date:
On Wed, Mar 15, 2017 at 07:26:45PM -0700, Andres Freund wrote:
> On 2017-03-15 16:31:11 -0400, Robert Haas wrote:
> > On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > > Changed as per suggestions. Attached v9 patch. Thanks.
> > 
> > Wow, when do you sleep?
> 
> I think that applies to a bunch of people, including yourself ;)

Gee, no one asks when I sleep.  I wonder why.  ;-)

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] Microvacuum support for Hash Index

From
Amit Kapila
Date:
On Fri, Mar 17, 2017 at 8:34 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 6:13 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Fri, Mar 17, 2017 at 12:27 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>>> On Fri, Mar 17, 2017 at 8:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>>> As I said in my previous e-mail, I think you need
>>>> to record clearing of this flag in WAL record XLOG_HASH_DELETE as you
>>>> are not doing this unconditionally and then during replay clear it
>>>> only when the WAL record indicates the same.
>>>
>>> Thank you so much for putting that point. I too think that we should
>>> record the flag status in the WAL record and clear it only when
>>> required during replay.
>>>
>>
>> I think hashdesc.c needs an update (refer case XLOG_HASH_DELETE:).
>
> Done. Thanks!
>

This version looks good to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Microvacuum support for Hash Index

From
Robert Haas
Date:
On Sat, Mar 18, 2017 at 4:35 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> This version looks good to me.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company