Thread: pageinspect: Hash index support
Hi, Attached is a patch that adds support for hash indexes in pageinspect. The functionality (hash_metap, hash_page_stats and hash_page_items) follows the B-tree functions. This patch will need an update once Amit's and Mithun's work on Concurrent Hash Indexes is committed to account for the new meta-page constants. I'll create a CommitFest entry for this submission. Feedback is most welcome. Best regards, Jesper
Attachment
Jesper Pedersen wrote: > Hi, > > Attached is a patch that adds support for hash indexes in pageinspect. > > The functionality (hash_metap, hash_page_stats and hash_page_items) follows > the B-tree functions. I suggest that pageinspect functions are more convenient to use via the get_raw_page interface, that is, instead of reading the buffer themselves, the buffer is handed over from elsewhere and they receive it as bytea. This enables other use cases such as grabbing a page from one server and examining it in another one. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 31, 2016 at 2:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Jesper Pedersen wrote: >> Attached is a patch that adds support for hash indexes in pageinspect. >> >> The functionality (hash_metap, hash_page_stats and hash_page_items) follows >> the B-tree functions. > > I suggest that pageinspect functions are more convenient to use via the > get_raw_page interface, that is, instead of reading the buffer > themselves, the buffer is handed over from elsewhere and they receive it > as bytea. This enables other use cases such as grabbing a page from one > server and examining it in another one. +1. -- Michael
On Tue, Aug 30, 2016 at 10:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Jesper Pedersen wrote:
> Hi,
>
> Attached is a patch that adds support for hash indexes in pageinspect.
>
> The functionality (hash_metap, hash_page_stats and hash_page_items) follows
> the B-tree functions.
I suggest that pageinspect functions are more convenient to use via the
get_raw_page interface, that is, instead of reading the buffer
themselves, the buffer is handed over from elsewhere and they receive it
as bytea. This enables other use cases such as grabbing a page from one
server and examining it in another one.
I've never needed to do that, but it does sound like it might be useful. But it is also annoying to have to do that when you want to examine a current server. Could we use overloading, so that it can be used in either way?
For hash_page_items, the 'data' is always a 32 bit integer, isn't it? (unlike other pageinspect features, where the data could be any user-defined data type). If so, I'd rather see it in plain hex (without the spaces, without the trailing zeros)
+ /* page type (flags) */
+ if (opaque->hasho_flag & LH_UNUSED_PAGE)
+ stat->type = 'u';
This can never be true, because LH_UNUSED_PAGE is zero. You should make this be the fall-through case, and have LH_META_PAGE be explicitly tested for.
Cheers,
Jeff
On 09/14/2016 04:21 PM, Jeff Janes wrote: >> I suggest that pageinspect functions are more convenient to use via the >> get_raw_page interface, that is, instead of reading the buffer >> themselves, the buffer is handed over from elsewhere and they receive it >> as bytea. This enables other use cases such as grabbing a page from one >> server and examining it in another one. >> > > I've never needed to do that, but it does sound like it might be useful. > But it is also annoying to have to do that when you want to examine a > current server. Could we use overloading, so that it can be used in either > way? > > For hash_page_items, the 'data' is always a 32 bit integer, isn't it? > (unlike other pageinspect features, where the data could be any > user-defined data type). If so, I'd rather see it in plain hex (without > the spaces, without the trailing zeros) > > + /* page type (flags) */ > + if (opaque->hasho_flag & LH_UNUSED_PAGE) > + stat->type = 'u'; > > This can never be true, because LH_UNUSED_PAGE is zero. You should make > this be the fall-through case, and have LH_META_PAGE be explicitly tested > for. > This version adds the overloaded get_raw_page based methods. However, I'm not verifying the page other than page_id, as hasho_flag can contain multiple flags in Amit's patches. And, I don't see having different page ids as a benefit - at least not part of this patch. We should probably just have one of the method sets, so I'll send a v3 with the set voted for. I kept the 'data' field as is, for now. Best regards, Jesper
Attachment
On Tue, Sep 20, 2016 at 1:11 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > This version adds the overloaded get_raw_page based methods. However, I'm > not verifying the page other than page_id, as hasho_flag can contain > multiple flags in Amit's patches. And, I don't see having different page ids > as a benefit - at least not part of this patch. > > We should probably just have one of the method sets, so I'll send a v3 with > the set voted for. > > I kept the 'data' field as is, for now. $ git diff master --check contrib/pageinspect/hashfuncs.c:758: space before tab in indent. + values); That's always good to run to check the format of a patch before sending it. You did not get right the comments from Alvaro upthread. The following functions are added with this patch:function hash_metap(text)function hash_metap_bytea(bytea)function hash_page_items(text,integer)functionhash_page_items_bytea(bytea)function hash_page_stats(text,integer)function hash_page_stats_bytea(bytea,integer) Now the following set of functions would be sufficient: function hash_metapage_info(bytea) function hash_page_items(bytea) function hash_page_stats(bytea) The last time pageinspect has been updated, when BRIN functions have been added, it has been discussed to just use (bytea) as an argument interface and just rely on get_raw_page() to get the pages wanted, so I think that we had better stick with that and keep things simple. Note: the patch checks if a superuser is calling the new functions, which is a good thing. I am switching the patch back to "waiting on author". As far as I can see the patch is in rather good shape, you just need to adapt the docs and shave all the parts that are not needed for the final result. -- Michael
On 09/20/2016 03:19 AM, Michael Paquier wrote: > You did not get right the comments from Alvaro upthread. The following > functions are added with this patch: > function hash_metap(text) > function hash_metap_bytea(bytea) > function hash_page_items(text,integer) > function hash_page_items_bytea(bytea) > function hash_page_stats(text,integer) > function hash_page_stats_bytea(bytea,integer) > > Now the following set of functions would be sufficient: > function hash_metapage_info(bytea) > function hash_page_items(bytea) > function hash_page_stats(bytea) > The last time pageinspect has been updated, when BRIN functions have > been added, it has been discussed to just use (bytea) as an argument > interface and just rely on get_raw_page() to get the pages wanted, so > I think that we had better stick with that and keep things simple. > Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked for both. Attached is v3 with only the bytea based methods. Alvaro, Michael and Jeff - Thanks for the review ! Best regards, Jesper
Attachment
On Tue, Sep 20, 2016 at 5:40 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
On 09/20/2016 03:19 AM, Michael Paquier wrote:You did not get right the comments from Alvaro upthread. The following
functions are added with this patch:
function hash_metap(text)
function hash_metap_bytea(bytea)
function hash_page_items(text,integer)
function hash_page_items_bytea(bytea)
function hash_page_stats(text,integer)
function hash_page_stats_bytea(bytea,integer)
Now the following set of functions would be sufficient:
function hash_metapage_info(bytea)
function hash_page_items(bytea)
function hash_page_stats(bytea)
The last time pageinspect has been updated, when BRIN functions have
been added, it has been discussed to just use (bytea) as an argument
interface and just rely on get_raw_page() to get the pages wanted, so
I think that we had better stick with that and keep things simple.
Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked for both.
Attached is v3 with only the bytea based methods.
Alvaro, Michael and Jeff - Thanks for the review !
Is the 2nd "1" in this call needed?
SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1)
As far as I can tell, that argument is only used to stuff into the output field "blkno", it is not used to instruct the interpretation of the raw page itself. It doesn't seem worthwhile to have the parameter that only echos back to the user what the user already put in (twice). The only existing funtions which take the blkno argument are those that don't use the get_raw_page style.
Also, should we document what the single letter values mean in the hash_page_stats.type column? It is not obvious that 'i' means bitmap, for example.
Cheers,
Jeff
Hi, I am getting a fatal error along with some warnings when trying to apply the v3 patch using "git apply" command. [ashu@localhost postgresql]$ git apply 0001-pageinspect-Hash-index-support_v3.patch 0001-pageinspect-Hash-index-support_v3.patch:29: trailing whitespace. brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) 0001-pageinspect-Hash-index-support_v3.patch:36: trailing whitespace. DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ 0001-pageinspect-Hash-index-support_v3.patch:37: trailing whitespace. pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql\ 0001-pageinspect-Hash-index-support_v3.patch:38: trailing whitespace. pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql\ 0001-pageinspect-Hash-index-support_v3.patch:39: trailing whitespace. pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql fatal: git apply: bad git-diff - expected /dev/null on line 46 Also, i think the USAGE for hash_metap() is refering to hash_metap_bytea(). Please correct it. I have just started reviewing the patch, will keep on posting my comments upon further review. Thanks. With Regards, Ashutosh Sharma. EnterpriseDB: http://www.enterprisedb.com On Tue, Sep 20, 2016 at 10:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Tue, Sep 20, 2016 at 5:40 AM, Jesper Pedersen > <jesper.pedersen@redhat.com> wrote: >> >> On 09/20/2016 03:19 AM, Michael Paquier wrote: >>> >>> You did not get right the comments from Alvaro upthread. The following >>> functions are added with this patch: >>> function hash_metap(text) >>> function hash_metap_bytea(bytea) >>> function hash_page_items(text,integer) >>> function hash_page_items_bytea(bytea) >>> function hash_page_stats(text,integer) >>> function hash_page_stats_bytea(bytea,integer) >>> >>> Now the following set of functions would be sufficient: >>> function hash_metapage_info(bytea) >>> function hash_page_items(bytea) >>> function hash_page_stats(bytea) >>> The last time pageinspect has been updated, when BRIN functions have >>> been added, it has been discussed to just use (bytea) as an argument >>> interface and just rely on get_raw_page() to get the pages wanted, so >>> I think that we had better stick with that and keep things simple. >>> >> >> Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked >> for both. >> >> Attached is v3 with only the bytea based methods. >> >> Alvaro, Michael and Jeff - Thanks for the review ! > > > > Is the 2nd "1" in this call needed? > > SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1) > > As far as I can tell, that argument is only used to stuff into the output > field "blkno", it is not used to instruct the interpretation of the raw page > itself. It doesn't seem worthwhile to have the parameter that only echos > back to the user what the user already put in (twice). The only existing > funtions which take the blkno argument are those that don't use the > get_raw_page style. > > Also, should we document what the single letter values mean in the > hash_page_stats.type column? It is not obvious that 'i' means bitmap, for > example. > > Cheers, > > Jeff
On 09/20/2016 12:45 PM, Jeff Janes wrote: > Is the 2nd "1" in this call needed? > > SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1) > > As far as I can tell, that argument is only used to stuff into the output > field "blkno", it is not used to instruct the interpretation of the raw > page itself. It doesn't seem worthwhile to have the parameter that only > echos back to the user what the user already put in (twice). The only > existing funtions which take the blkno argument are those that don't use > the get_raw_page style. > > Also, should we document what the single letter values mean in the > hash_page_stats.type column? It is not obvious that 'i' means bitmap, for > example. > Adjusted in v4. Code/doc will need an update once the CHI patch goes in. Best regards, Jesper
Attachment
Hi, On 09/20/2016 01:46 PM, Ashutosh Sharma wrote: > I am getting a fatal error along with some warnings when trying to > apply the v3 patch using "git apply" command. > > [ashu@localhost postgresql]$ git apply > 0001-pageinspect-Hash-index-support_v3.patch > 0001-pageinspect-Hash-index-support_v3.patch:29: trailing whitespace. > brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) > 0001-pageinspect-Hash-index-support_v3.patch:36: trailing whitespace. > DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ > 0001-pageinspect-Hash-index-support_v3.patch:37: trailing whitespace. > pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ > 0001-pageinspect-Hash-index-support_v3.patch:38: trailing whitespace. > pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ > 0001-pageinspect-Hash-index-support_v3.patch:39: trailing whitespace. > pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql > fatal: git apply: bad git-diff - expected /dev/null on line 46 > Which platform are you on ? The development branch is @ https://github.com/jesperpedersen/postgres/tree/pageinspect_hash > Also, i think the USAGE for hash_metap() is refering to > hash_metap_bytea(). Please correct it. I have just started reviewing > the patch, will keep on posting my comments upon further review. Fixed in v4. Thanks for the review. Best regards, Jesper
On Wed, Sep 21, 2016 at 3:25 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > On 09/20/2016 12:45 PM, Jeff Janes wrote: >> Is the 2nd "1" in this call needed? >> >> SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1) >> >> As far as I can tell, that argument is only used to stuff into the output >> field "blkno", it is not used to instruct the interpretation of the raw >> page itself. It doesn't seem worthwhile to have the parameter that only >> echos back to the user what the user already put in (twice). The only >> existing funtions which take the blkno argument are those that don't use >> the get_raw_page style. >> >> Also, should we document what the single letter values mean in the >> hash_page_stats.type column? It is not obvious that 'i' means bitmap, for >> example. >> > > Adjusted in v4. Thanks for the new version. > Code/doc will need an update once the CHI patch goes in. If you are referring to the WAL support, I guess that this patch or the other patch could just rebase and adjust as needed. hash_page_items and hash_page_stats share a lot of common points with their btree equivalents, still doing a refactoring would just impact the visibility of the code, and it is wanted as educational in this module, so let's go with things the way you suggest. + <para> + The type information will be '<literal>m</literal>' for a metadata page, + '<literal>v</literal>' for an overflow page, '<literal>b</literal>' for a bucket page, + '<literal>i</literal>' for a bitmap page, and '<literal>u</literal>' for an unused page. + </para> Other functions don't go into this level of details, so I would just rip out this paragraph. The patch looks in pretty good shape to me, so I am switching it as ready for committer. -- Michael
Hi, > Which platform are you on ? > > The development branch is @ > https://github.com/jesperpedersen/postgres/tree/pageinspect_hash I am working on centOS 7. I am still facing the issue when applying your patch using "git apply" command but if i use 'patch' command it works fine however i am seeing some warning messages with 'patch' command as well. I continued reviewing your v4 patch and here are some more comments: 1). Got below error if i pass meta page to hash_page_items(). Does hash_page_items() accept only bucket or bitmap page as input? postgres=# SELECT * FROM hash_page_items(get_raw_page('hash_i4_index', 0)); ERROR: invalid memory alloc request size 18446744073709551593 2). Server crashed when below query was executed on a code that was build with cassert-enabled. postgres=# SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0)); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processingthe request. The connection to the server was lost. Attempting reset: Failed. !> \q The callstack is as follows: (gdb) bt #0 0x00007fef794f15f7 in raise () from /lib64/libc.so.6 #1 0x00007fef794f2ce8 in abort () from /lib64/libc.so.6 #2 0x0000000000967381 in ExceptionalCondition (conditionName=0x7fef699c942d "!(((id)->lp_len != 0))", errorType=0x7fef699c92d4 "FailedAssertion", fileName=0x7fef699c9397 "hashfuncs.c", lineNumber=123) at assert.c:54 #3 0x00007fef699c6ed6 in GetHashPageStatistics (page=0x1060c2c "", stat=0x7ffd76846230) at hashfuncs.c:123 #4 0x00007fef699c70a4 in hash_page_stats (fcinfo=0x7ffd768463e0) at hashfuncs.c:169 #5 0x0000000000682e6b in ExecMakeTableFunctionResult (funcexpr=0x10457f0, econtext=0x1044e28, argContext=0x104ee38, expectedDesc=0x1047640, randomAccess=0 '\000') at execQual.c:2216 #6 0x00000000006a88e2 in FunctionNext (node=0x1044d10) at nodeFunctionscan.c:94 #7 0x000000000068a3d9 in ExecScanFetch (node=0x1044d10, accessMtd=0x6a882b <FunctionNext>, recheckMtd=0x6a8c10 <FunctionRecheck>) at execScan.c:95 #8 0x000000000068a448 in ExecScan (node=0x1044d10, accessMtd=0x6a882b <FunctionNext>, recheckMtd=0x6a8c10 <FunctionRecheck>) at execScan.c:145 #9 0x00000000006a8c45 in ExecFunctionScan (node=0x1044d10) at nodeFunctionscan.c:268 #10 0x000000000067f0cf in ExecProcNode (node=0x1044d10) at execProcnode.c:449 #11 0x000000000067b40b in ExecutePlan (estate=0x1044bf8, planstate=0x1044d10, use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001', numberTuples=0, direction=ForwardScanDirection, dest=0x10444c8) at execMain.c:1567 #12 0x0000000000679527 in standard_ExecutorRun (queryDesc=0xfac778, direction=ForwardScanDirection, count=0) at execMain.c:338 #13 0x00000000006793ab in ExecutorRun (queryDesc=0xfac778, direction=ForwardScanDirection, count=0) at execMain.c:286 #14 0x0000000000816b3e in PortalRunSelect (portal=0xfa49e8, forward=1 '\001', count=0, dest=0x10444c8) at pquery.c:948 #15 0x00000000008167d1 in PortalRun (portal=0xfa49e8, count=9223372036854775807, isTopLevel=1 '\001', dest=0x10444c8, altdest=0x10444c8, completionTag=0x7ffd76846c60 "") at pquery.c:789 #16 0x0000000000810a27 in exec_simple_query (query_string=0x1007018 "SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));") at postgres.c:1094 #17 0x0000000000814b5e in PostgresMain (argc=1, argv=0xfb1d08, dbname=0xf873d8 "postgres", username=0xfb1b70 "edb") at postgres.c:4070 #18 0x000000000078990c in BackendRun (port=0xfacb50) at postmaster.c:4260 3). Could you please let me know, what does the hard coded values '*5' and '+1' represents in the below lines of code. You have used them when allocating memory for storing spare pages allocated before certain split point and block number of bitmap pages inside hash_metap(). spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1); mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1); I guess it would be better to add some comments if using any hard coded values. 4). Inside hash_page_stats(), you are first calling verify_hash_page() to verify if it is a hash page or not and then calling GetHashPageStatistics() to get the page stats wherein you are trying to check what is the type of hash page. Below is the code snippet for this, + /* page type (flags) */ + if (opaque->hasho_flag & LH_META_PAGE) + stat->type = 'm'; + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) + stat->type = 'v'; + else if (opaque->hasho_flag & LH_BUCKET_PAGE) + stat->type = 'b'; + else if (opaque->hasho_flag & LH_BITMAP_PAGE) + stat->type = 'i'; + else + stat->type = 'u'; My question is, can we have a hash page that does not fall under the category of Metapage, overflow page, bitmap page, bucket page? I guess we can't have such type of hash page and incase if we found such type of undefined page i guess we should error out instead of reading the page because it is quite possible that the page would be corrupted. Please let me know your thoughts on this. 5). I think we have added enough functions to show the page level statistics but not the index level statistics like the total number of overflow pages , bucket pages, number of free overflow pages, number of bitmap pages etc. in the hash index. How about adding a function that would store the index level statistics? With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Hi, On 09/21/2016 07:24 AM, Ashutosh Sharma wrote: >> The development branch is @ >> https://github.com/jesperpedersen/postgres/tree/pageinspect_hash > > I am working on centOS 7. I am still facing the issue when applying > your patch using "git apply" command but if i use 'patch' command it > works fine however i am seeing some warning messages with 'patch' > command as well. > git apply w/ v4 works here, so you will have to investigate what happens on your side. > I continued reviewing your v4 patch and here are some more comments: > > 1). Got below error if i pass meta page to hash_page_items(). Does > hash_page_items() accept only bucket or bitmap page as input? > > postgres=# SELECT * FROM hash_page_items(get_raw_page('hash_i4_index', 0)); > ERROR: invalid memory alloc request size 18446744073709551593 > page_stats / page_items should not be used on the metadata page. As these functions are marked as superuser only it is expected that people provides the correct input, especially since the raw page structure is used as the input. > 2). Server crashed when below query was executed on a code that was > build with cassert-enabled. > > postgres=# SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0)); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > !> \q > > The callstack is as follows: > > (gdb) bt > #0 0x00007fef794f15f7 in raise () from /lib64/libc.so.6 > #1 0x00007fef794f2ce8 in abort () from /lib64/libc.so.6 > #2 0x0000000000967381 in ExceptionalCondition > (conditionName=0x7fef699c942d "!(((id)->lp_len != 0))", > errorType=0x7fef699c92d4 "FailedAssertion", > fileName=0x7fef699c9397 "hashfuncs.c", lineNumber=123) at assert.c:54 > #3 0x00007fef699c6ed6 in GetHashPageStatistics (page=0x1060c2c "", > stat=0x7ffd76846230) at hashfuncs.c:123 > #4 0x00007fef699c70a4 in hash_page_stats (fcinfo=0x7ffd768463e0) at > hashfuncs.c:169 > #5 0x0000000000682e6b in ExecMakeTableFunctionResult > (funcexpr=0x10457f0, econtext=0x1044e28, argContext=0x104ee38, > expectedDesc=0x1047640, > randomAccess=0 '\000') at execQual.c:2216 > #6 0x00000000006a88e2 in FunctionNext (node=0x1044d10) at nodeFunctionscan.c:94 > #7 0x000000000068a3d9 in ExecScanFetch (node=0x1044d10, > accessMtd=0x6a882b <FunctionNext>, recheckMtd=0x6a8c10 > <FunctionRecheck>) at execScan.c:95 > #8 0x000000000068a448 in ExecScan (node=0x1044d10, accessMtd=0x6a882b > <FunctionNext>, recheckMtd=0x6a8c10 <FunctionRecheck>) at > execScan.c:145 > #9 0x00000000006a8c45 in ExecFunctionScan (node=0x1044d10) at > nodeFunctionscan.c:268 > #10 0x000000000067f0cf in ExecProcNode (node=0x1044d10) at execProcnode.c:449 > #11 0x000000000067b40b in ExecutePlan (estate=0x1044bf8, > planstate=0x1044d10, use_parallel_mode=0 '\000', operation=CMD_SELECT, > sendTuples=1 '\001', > numberTuples=0, direction=ForwardScanDirection, dest=0x10444c8) at > execMain.c:1567 > #12 0x0000000000679527 in standard_ExecutorRun (queryDesc=0xfac778, > direction=ForwardScanDirection, count=0) at execMain.c:338 > #13 0x00000000006793ab in ExecutorRun (queryDesc=0xfac778, > direction=ForwardScanDirection, count=0) at execMain.c:286 > #14 0x0000000000816b3e in PortalRunSelect (portal=0xfa49e8, forward=1 > '\001', count=0, dest=0x10444c8) at pquery.c:948 > #15 0x00000000008167d1 in PortalRun (portal=0xfa49e8, > count=9223372036854775807, isTopLevel=1 '\001', dest=0x10444c8, > altdest=0x10444c8, > completionTag=0x7ffd76846c60 "") at pquery.c:789 > #16 0x0000000000810a27 in exec_simple_query (query_string=0x1007018 > "SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));") at > postgres.c:1094 > #17 0x0000000000814b5e in PostgresMain (argc=1, argv=0xfb1d08, > dbname=0xf873d8 "postgres", username=0xfb1b70 "edb") at > postgres.c:4070 > #18 0x000000000078990c in BackendRun (port=0xfacb50) at postmaster.c:4260 > > Same deal here. > > 3). Could you please let me know, what does the hard coded values '*5' > and '+1' represents in the below lines of code. You have used them > when allocating memory for storing spare pages allocated before > certain split point and block number of bitmap pages inside > hash_metap(). > > spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1); > > mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1); > > I guess it would be better to add some comments if using any hard coded values. > It is the space needed to output the values. > 4). Inside hash_page_stats(), you are first calling verify_hash_page() > to verify if it is a hash page or not and No, we assume that the page is a valid hash page structure, since there is an explicit cast w/o any further checks. > then calling > GetHashPageStatistics() to get the page stats wherein you are trying > to check what is the type of hash page. Below is the code snippet for > this, > > + /* page type (flags) */ > + if (opaque->hasho_flag & LH_META_PAGE) > + stat->type = 'm'; > + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) > + stat->type = 'v'; > + else if (opaque->hasho_flag & LH_BUCKET_PAGE) > + stat->type = 'b'; > + else if (opaque->hasho_flag & LH_BITMAP_PAGE) > + stat->type = 'i'; > + else > + stat->type = 'u'; > > My question is, can we have a hash page that does not fall under the > category of Metapage, overflow page, bitmap page, bucket page? I guess > we can't have such type of hash page and incase if we found such type > of undefined page i guess we should error out instead of reading the > page because it is quite possible that the page would be corrupted. > Please let me know your thoughts on this. > The "if" statement will need updating once the CHI patch goes in, as it defines new constants for page types. However, as the other pageinspect function it assume that the operator is passing valid data. > 5). I think we have added enough functions to show the page level > statistics but not the index level statistics like the total number of > overflow pages , bucket pages, number of free overflow pages, number > of bitmap pages etc. in the hash index. How about adding a function > that would store the index level statistics? > Sure, additional functions could be of benefit later on. Feel free to investigate that possibility for a future CommitFest. Thanks for your feedback ! Best regards, Jesper
On 09/21/2016 02:14 AM, Michael Paquier wrote: >> Adjusted in v4. > > Thanks for the new version. > >> Code/doc will need an update once the CHI patch goes in. > > If you are referring to the WAL support, I guess that this patch or > the other patch could just rebase and adjust as needed. > It is the main patch [1] that defines the new constants for page type. But I'll submit an update for pageinspect when needed. > hash_page_items and hash_page_stats share a lot of common points with > their btree equivalents, still doing a refactoring would just impact > the visibility of the code, and it is wanted as educational in this > module, so let's go with things the way you suggest. > Ok. > + <para> > + The type information will be '<literal>m</literal>' for a metadata page, > + '<literal>v</literal>' for an overflow page, > '<literal>b</literal>' for a bucket page, > + '<literal>i</literal>' for a bitmap page, and > '<literal>u</literal>' for an unused page. > + </para> > Other functions don't go into this level of details, so I would just > rip out this paragraph. > I'll add an annotation for this part, and leave it for the committer to decide, since Jeff wanted documentation for the 'type' information. > The patch looks in pretty good shape to me, so I am switching it as > ready for committer. > Thanks for your feedback ! Best regards, Jesper
On Wed, Sep 21, 2016 at 9:21 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > On 09/21/2016 07:24 AM, Ashutosh Sharma wrote: > git apply w/ v4 works here, so you will have to investigate what happens on > your side. Works here as well. >> I continued reviewing your v4 patch and here are some more comments: >> >> 1). Got below error if i pass meta page to hash_page_items(). Does >> hash_page_items() accept only bucket or bitmap page as input? >> >> postgres=# SELECT * FROM hash_page_items(get_raw_page('hash_i4_index', >> 0)); >> ERROR: invalid memory alloc request size 18446744073709551593 >> > > page_stats / page_items should not be used on the metadata page. > > As these functions are marked as superuser only it is expected that people > provides the correct input, especially since the raw page structure is used > as the input. btree functions use the block number to do some sanity checks. You cannot do that here as only bytea functions are available, but you could do it in verify_hash_page by looking at the opaque data and look at LH_META_PAGE. Then add a boolean argument into verify_hash_page to see if the caller expects a meta page or not and just issue an error. Actually it would be a good idea to put in those safeguards, even if I agree with you that calling those functions is at the risk of the user... Could you update the patch in this sense? I had fun doing the same tests, aka running the items and stats functions on a meta page, and the meta function on a non-meta page, but at my surprise I did not see a crash, so perhaps I was lucky and perhaps that was because of OSX. -- Michael
Hi, > git apply w/ v4 works here, so you will have to investigate what happens on > your side. > Thanks, It works with v4 patch. > As these functions are marked as superuser only it is expected that people > provides the correct input, especially since the raw page structure is used > as the input. Well, page_stats / page_items does accept bitmap page as input but these function are not defined to read bitmap page as bitmap page doesnot have a standard page layout. Infact if we pass bitmap page as an input to page_stats / page_items, the output is always same. I think we need to have a separete function to read bitmap page. And i am not sure why should a server crash if i pass meta page to hash_page_stats or hash_page_items. > The "if" statement will need updating once the CHI patch goes in, as it > defines new constants for page types. Not sure if CHI patch is adding a new type of hash page other than holding an information about split in progress. Basically my point was can we have hash page of types other than meta page, bucket page, overflow and bitmap page. If pageinspect tool finds a page that doesnot fall under any of these category shouldn't it error out. With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
On 09/21/2016 08:43 AM, Michael Paquier wrote: >> page_stats / page_items should not be used on the metadata page. >> >> As these functions are marked as superuser only it is expected that people >> provides the correct input, especially since the raw page structure is used >> as the input. > > btree functions use the block number to do some sanity checks. You > cannot do that here as only bytea functions are available, but you > could do it in verify_hash_page by looking at the opaque data and look > at LH_META_PAGE. Then add a boolean argument into verify_hash_page to > see if the caller expects a meta page or not and just issue an error. > Actually it would be a good idea to put in those safeguards, even if I > agree with you that calling those functions is at the risk of the > user... Could you update the patch in this sense? > > I had fun doing the same tests, aka running the items and stats > functions on a meta page, and the meta function on a non-meta page, > but at my surprise I did not see a crash, so perhaps I was lucky and > perhaps that was because of OSX. > Attached is v5, which add basic page verification. Thanks for the feedback ! Best regards, Jesper
Attachment
On Tue, Sep 20, 2016 at 11:14 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
+ <para>
+ The type information will be '<literal>m</literal>' for a metadata page,
+ '<literal>v</literal>' for an overflow page,
'<literal>b</literal>' for a bucket page,
+ '<literal>i</literal>' for a bitmap page, and
'<literal>u</literal>' for an unused page.
+ </para>
Other functions don't go into this level of details, so I would just
rip out this paragraph.
I'd argue that the other functions should go into that level detail in some places. Pageinspect is needlessly hard to use; not all precedent is good precedent. Some of them do refer you to header files or README files, which can be useful. But the abbreviations used here are not explained in any header file or README file, so I think the right place to explain them is the documentation in that case. Or change from the single-letter strings to full name strings so they are self-documenting.
Cheers,
Jeff
On 9/21/16 9:30 AM, Jesper Pedersen wrote: > Attached is v5, which add basic page verification. There are still some things that I found that appear to follow the old style (btree) conventions instead the new style (brin, gin) conventions. - Rename hash_metap to hash_metapage_info. - Don't use BuildTupleFromCStrings(). (There is nothing inherently wrong with it, but why convert numbers to strings and back again when it can be done more directly.) - Documentation for hash_page_stats still has blkno argument. - In hash_metap, the magic field could be type bytea, so that it displays in hex. Or text like the brin functions. Some other comments: - hash_metap result fields spares and mapp should be arrays of integer. (Incidentally, the comment in hash.h talks about bitmaps[] but I think it means hashm_mapp[].) - As of very recently, we don't need to move pageinspect--1.5.sql to pageinspect--1.6.sql anymore. Just add pageinspect--1.5--1.6.sql. - In the documentation for hash_page_stats(), the "+" sign is misaligned. - In hash_page_items, the fields itemlen, nulls, vars are always 16, false, false. So perhaps there is no need for them. Similarly, the hash_page_stats in hash_page_stats is always 16. - The data field could be of type bytea. - Add a pointer in the documentation to the relevant header file. Bonus: - Add subsections in the documentation (general functions, B-tree functions, etc.) - Add tests. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 23, 2016 at 9:40 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 9/21/16 9:30 AM, Jesper Pedersen wrote: >> Attached is v5, which add basic page verification. > > There are still some things that I found that appear to follow the old > style (btree) conventions instead the new style (brin, gin) conventions. > > - Rename hash_metap to hash_metapage_info. > > - Don't use BuildTupleFromCStrings(). (There is nothing inherently > wrong with it, but why convert numbers to strings and back again when it > can be done more directly.) > > - Documentation for hash_page_stats still has blkno argument. > > - In hash_metap, the magic field could be type bytea, so that it > displays in hex. Or text like the brin functions. > > Some other comments: > > - hash_metap result fields spares and mapp should be arrays of integer. > how would you like to see both those arrays in tuple, right now, I think Jesper's code is showing it as string. > (Incidentally, the comment in hash.h talks about bitmaps[] but I think > it means hashm_mapp[].) > which comment are you referring here? hashm_mapp contains block numbers of bitmap pages. While looking at patch, I noticed below code which seems somewhat problematic: + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); + + /* page type (flags) */ + if (opaque->hasho_flag & LH_META_PAGE) + stat->type = 'm'; + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) + stat->type = 'v'; + else if (opaque->hasho_flag & LH_BUCKET_PAGE) + stat->type = 'b'; + else if (opaque->hasho_flag & LH_BITMAP_PAGE) + stat->type = 'i'; In the above code, it appears that you are trying to calculate max_avail space for all pages in same way. Don't you need to calculate it differently for bitmap page or meta page as they don't share the same format as other type of pages? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 9/23/16 1:56 AM, Amit Kapila wrote: > On Fri, Sep 23, 2016 at 9:40 AM, Peter Eisentraut >> - hash_metap result fields spares and mapp should be arrays of integer. >> > > how would you like to see both those arrays in tuple, right now, I > think Jesper's code is showing it as string. I'm not sure what you are asking here. >> (Incidentally, the comment in hash.h talks about bitmaps[] but I think >> it means hashm_mapp[].) >> > > which comment are you referring here? hashm_mapp contains block > numbers of bitmap pages. The comment I'm referring to says The blknos of these bitmap pages are kept in bitmaps[]; nmaps is the number of currently existing bitmaps. But there is no "bitmaps" field anywhere. > In the above code, it appears that you are trying to calculate > max_avail space for all pages in same way. Don't you need to > calculate it differently for bitmap page or meta page as they don't > share the same format as other type of pages? Is this even useful for hash indexes? This idea came from btrees. Neither the gin nor the brin code have this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 23, 2016 at 6:11 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 9/23/16 1:56 AM, Amit Kapila wrote: >> which comment are you referring here? hashm_mapp contains block >> numbers of bitmap pages. > > The comment I'm referring to says > > The blknos of these bitmap pages are kept in bitmaps[]; nmaps is the > number of currently existing bitmaps. > > But there is no "bitmaps" field anywhere. > Okay. You are right, it should be hashm_mapp. >> In the above code, it appears that you are trying to calculate >> max_avail space for all pages in same way. Don't you need to >> calculate it differently for bitmap page or meta page as they don't >> share the same format as other type of pages? > > Is this even useful for hash indexes? > I think so. It will be helpful for bucket and overflow pages. They store the index tuples similar to btree. Is there a reason, why you think it won't be useful? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, On 09/23/2016 12:10 AM, Peter Eisentraut wrote: > On 9/21/16 9:30 AM, Jesper Pedersen wrote: >> Attached is v5, which add basic page verification. > > There are still some things that I found that appear to follow the old > style (btree) conventions instead the new style (brin, gin) conventions. > > - Rename hash_metap to hash_metapage_info. > Done. > - Don't use BuildTupleFromCStrings(). (There is nothing inherently > wrong with it, but why convert numbers to strings and back again when it > can be done more directly.) > Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally readable in this case. But, I can change the patch if needed. > - Documentation for hash_page_stats still has blkno argument. > Fixed. > - In hash_metap, the magic field could be type bytea, so that it > displays in hex. Or text like the brin functions. > Changed to 'text'. > Some other comments: > > - hash_metap result fields spares and mapp should be arrays of integer. > B-tree and BRIN uses a 'text' field as output, so left as is. > (Incidentally, the comment in hash.h talks about bitmaps[] but I think > it means hashm_mapp[].) > > - As of very recently, we don't need to move pageinspect--1.5.sql to > pageinspect--1.6.sql anymore. Just add pageinspect--1.5--1.6.sql. > We need to rename the pageinspect--1.5.sql file and add the new methods, right ? Or ? > - In the documentation for hash_page_stats(), the "+" sign is misaligned. > Fixed. > - In hash_page_items, the fields itemlen, nulls, vars are always 16, > false, false. So perhaps there is no need for them. Similarly, the > hash_page_stats in hash_page_stats is always 16. > Fixed, by removing them. We can add them back later if needed. > - The data field could be of type bytea. > Left as is, for same reasons as 'spares' and 'mapp'. > - Add a pointer in the documentation to the relevant header file. > Done. > Bonus: > > - Add subsections in the documentation (general functions, B-tree > functions, etc.) > Done. > - Add tests. > Thanks for the review ! Best regards, Jesper
Attachment
On 09/23/2016 01:56 AM, Amit Kapila wrote: > While looking at patch, I noticed below code which seems somewhat problematic: > > + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); > + > + /* page type (flags) */ > + if (opaque->hasho_flag & LH_META_PAGE) > + stat->type = 'm'; > + else if (opaque->hasho_flag & LH_OVERFLOW_PAGE) > + stat->type = 'v'; > + else if (opaque->hasho_flag & LH_BUCKET_PAGE) > + stat->type = 'b'; > + else if (opaque->hasho_flag & LH_BITMAP_PAGE) > + stat->type = 'i'; > > In the above code, it appears that you are trying to calculate > max_avail space for all pages in same way. Don't you need to > calculate it differently for bitmap page or meta page as they don't > share the same format as other type of pages? > Correct, however the max_avail calculation was removed in v6, since we don't display average item size anymore. Thanks for the feedback ! Best regards, Jesper
On Mon, Sep 26, 2016 at 10:39 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
Hi,
On 09/23/2016 12:10 AM, Peter Eisentraut wrote:
- As of very recently, we don't need to move pageinspect--1.5.sql to
pageinspect--1.6.sql anymore. Just add pageinspect--1.5--1.6.sql.
We need to rename the pageinspect--1.5.sql file and add the new methods, right ? Or ?
You just need to add pageinspect--1.5--1.6.sql. With recent changes to the extension infrastructure, when you create the extension as version 1.6, the infrastructure automatically figures out that it should install 1.5 and then upgrade to 1.6. Or that is my understanding, anyway.
Cheers,
Jeff
On Tue, Sep 20, 2016 at 12:19 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
Note: the patch checks if a superuser is calling the new functions,
which is a good thing.
If we only have the bytea functions and the user needs to supply the raw pages themselves, rather than having the function go get the raw page for you, is there any reason to restrict the interpretation function to super users? I guess if we don't trust the C coded functions not to dereference bogus data in harmful ways?
Cheers,
Jeff
Jeff Janes wrote: > On Tue, Sep 20, 2016 at 12:19 AM, Michael Paquier <michael.paquier@gmail.com > > wrote: > > > > > Note: the patch checks if a superuser is calling the new functions, > > which is a good thing. > > > > If we only have the bytea functions and the user needs to supply the raw > pages themselves, rather than having the function go get the raw page for > you, is there any reason to restrict the interpretation function to super > users? I guess if we don't trust the C coded functions not to dereference > bogus data in harmful ways? Yeah, it'd likely be simple to manufacture a fake page that causes the server to misbehave resulting in a security leak or at least DoS. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/26/16 1:39 PM, Jesper Pedersen wrote: > Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally > readable in this case. But, I can change the patch if needed. The point is that to use BuildTupleFromCStrings() you need to convert numbers to strings, and then they are converted back. This is not a typical way to write row-returning functions. >> - hash_metap result fields spares and mapp should be arrays of integer. > > B-tree and BRIN uses a 'text' field as output, so left as is. These fields are specific to hash, so the precedent doesn't necessarily apply. >> - The data field could be of type bytea. > > Left as is, for same reasons as 'spares' and 'mapp'. Comments from others here? Why not use bytea instead of text? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut wrote: > On 9/26/16 1:39 PM, Jesper Pedersen wrote: > >> - hash_metap result fields spares and mapp should be arrays of integer. > > > > B-tree and BRIN uses a 'text' field as output, so left as is. > > These fields are specific to hash, so the precedent doesn't necessarily > apply. > > >> - The data field could be of type bytea. > > > > Left as is, for same reasons as 'spares' and 'mapp'. > > Comments from others here? Why not use bytea instead of text? The BRIN pageinspect functions aren't a particularly well thought-out precedent IMO. There was practically no review of that part of the BRIN patch, and I just threw it together in the way that seemed to make the most sense at the time. If there's some reason why some other way is better for hash indexes, that should be followed rather than mimicking BRIN. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/26/2016 10:45 PM, Peter Eisentraut wrote: > On 9/26/16 1:39 PM, Jesper Pedersen wrote: >> Left as is, since BuildTupleFromCStrings() vs. xyzGetDatum() are equally >> readable in this case. But, I can change the patch if needed. > > The point is that to use BuildTupleFromCStrings() you need to convert > numbers to strings, and then they are converted back. This is not a > typical way to write row-returning functions. > Ok. Changed: * BuildTupleFromCStrings -> xyzGetDatum * 'type' field: char -> text w/ full description * Removed 'type' information from documentation Best regards, Jesper
Attachment
On 9/27/16 10:10 AM, Jesper Pedersen wrote: > contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 ++++ > contrib/pageinspect/pageinspect--1.5.sql | 279 ------------------ > contrib/pageinspect/pageinspect--1.6.sql | 346 ++++++++++++++++++++++ I think there is still a misunderstanding about these extension files. You only need to supply pageinspect--1.5--1.6.sql and leave all the other ones alone. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/29/2016 11:58 AM, Peter Eisentraut wrote: > On 9/27/16 10:10 AM, Jesper Pedersen wrote: >> contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 ++++ >> contrib/pageinspect/pageinspect--1.5.sql | 279 ------------------ >> contrib/pageinspect/pageinspect--1.6.sql | 346 ++++++++++++++++++++++ > > I think there is still a misunderstanding about these extension files. > You only need to supply pageinspect--1.5--1.6.sql and leave all the > other ones alone. > Ok. Left the 'DATA' section in the Makefile, and the control file as is. Best regards, Jesper
Attachment
I wrote some tests for pageinspect, attached here (hash stuff in a separate patch). I think all the output numbers ought to be deterministic (I excluded everything that might contain xids). Please test. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I think we should look into handling the different page types better. The hash_page_stats function was copied from btree, which only has one type. It's not clear whether all the values apply to each page type. At least they should be null if they don't apply. BRIN has a separate function for each page type, which might make more sense. I suggest taken the test suite that I posted and expanding the tests so that we see output for each different page type. Besides that, I would still like better data types for some of the output columns, as previously discussed. In addition to what I already pointed out, the ctid column can be of type ctid instead of text. Since the commit fest is drawing to a close, I'll set this patch as returned with feedback. Please continue working on it, since there is clearly renewed interest in hash indexes, and we'll need this type of functionality for that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/29/16 4:00 PM, Peter Eisentraut wrote: > Since the commit fest is drawing to a close, I'll set this patch as > returned with feedback. Actually, the CF app informs me that moving to the next CF is more appropriate, so I have done that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/29/2016 04:02 PM, Peter Eisentraut wrote: > On 9/29/16 4:00 PM, Peter Eisentraut wrote: >> Since the commit fest is drawing to a close, I'll set this patch as >> returned with feedback. > > Actually, the CF app informs me that moving to the next CF is more > appropriate, so I have done that. > Ok, thanks for your feedback. Maybe "Returned with Feedback" is more appropriate, as there is still development left. Best regards, Jesper
On Mon, Oct 3, 2016 at 9:52 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Maybe "Returned with Feedback" is more appropriate, as there is still > development left. I have switched it to "waiting on author". -- Michael
On 10/3/16 8:52 AM, Jesper Pedersen wrote: > On 09/29/2016 04:02 PM, Peter Eisentraut wrote: >> On 9/29/16 4:00 PM, Peter Eisentraut wrote: >>> Since the commit fest is drawing to a close, I'll set this patch as >>> returned with feedback. >> >> Actually, the CF app informs me that moving to the next CF is more >> appropriate, so I have done that. >> > > Ok, thanks for your feedback. > > Maybe "Returned with Feedback" is more appropriate, as there is still > development left. I have applied the documentation change that introduced subsections, which seems quite useful independently. I have also committed the tests that I proposed and will work through the failures. As no new patch has been posted for the 2016-11 CF, I will close the patch entry now. Please submit an updated patch when you have the time, keeping an eye on ongoing work to update hash indexes. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/01/2016 03:28 PM, Peter Eisentraut wrote: >> Ok, thanks for your feedback. >> >> Maybe "Returned with Feedback" is more appropriate, as there is still >> development left. > > I have applied the documentation change that introduced subsections, > which seems quite useful independently. I have also committed the tests > that I proposed and will work through the failures. > > As no new patch has been posted for the 2016-11 CF, I will close the > patch entry now. Please submit an updated patch when you have the time, > keeping an eye on ongoing work to update hash indexes. > Agreed. Best regards, Jesper
On 11/1/16 3:28 PM, Peter Eisentraut wrote: > I have also committed the tests > that I proposed and will work through the failures. So, we're down to crashes in gin_metapage_info() on ia64 and sparc64. My guess is that the raw page data that is passed into the function needs to be 8-byte aligned before being accessed as GinMetaPageData. (Maybe GinPageGetMeta() should do it?) Does anyone have a box like this handy to experiment? Of course an actual core dump could tell more. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > So, we're down to crashes in gin_metapage_info() on ia64 and sparc64. > My guess is that the raw page data that is passed into the function > needs to be 8-byte aligned before being accessed as GinMetaPageData. That's what it looks like to me, too. The "bytea" page image is guaranteed to be improperly aligned for 64-bit access, since it will have an int32 length word before the actual page data, breaking the alignment that would exist for a page sitting in a page buffer. This is likely to be a problem for more things than just gin_metapage_info(); sooner or later it could affect just about everything in pageinspect. > (Maybe GinPageGetMeta() should do it?) I think the right thing is likely to be to copy the presented bytea into a palloc'd (and therefore properly aligned) buffer. And not just in this one function. regards, tom lane
On 11/2/16 1:54 PM, Tom Lane wrote: > I think the right thing is likely to be to copy the presented bytea > into a palloc'd (and therefore properly aligned) buffer. And not > just in this one function. Does the attached look reasonable? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 11/2/16 1:54 PM, Tom Lane wrote: >> I think the right thing is likely to be to copy the presented bytea >> into a palloc'd (and therefore properly aligned) buffer. And not >> just in this one function. > Does the attached look reasonable? I'd be inclined to wrap it in some kind of support function for conciseness; and you could put the other boilerplate (the length check) there too. Otherwise, +1. regards, tom lane
Hi All, I have spent some time in reviewing the latest v8 patch from Jesper and could find some issues which i would like to list down, 1) Firstly, the DATA section in the Makefile is referring to "pageinspect--1.6.sql" file and currently this file is missing. +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql 2) Secondly, I can see that the server is crashing when following queries are executed on a code build with cassert-enabled. postgres=# SELECT * FROM hash_page_items(get_raw_page('con_hash_index', 25000)); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> \q [ashu@localhost bin]$ ./psql -d postgres psql (10devel) Type "help" for help. postgres=# SELECT * FROM hash_page_stats(get_raw_page('con_hash_index', 25000)); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> \q postgres=# SELECT * FROM hash_metapage_info(get_raw_page('con_hash_index', 25000)); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> 3) Thirdly, AFAIU the functions hash_page_stats() and hash_page_items() are basically dedicated for bucket and overflow pages in hash index and may not be used with bitmap page. It is currently blocked for metapage but not for the bitmap page. I think we need to block it for bitmap page as well. Apart from issues listed above there are few other review comments that has not been addressed yet. Considering the fact that it is a very important project from hash index perspective and as there has been no work being done on this project for quite some time, I thought of working on it to close some of the open review comments along with the issues found by myself. I would like to share the revised patch which is based on Jesper's v8 patch. The patch has mainly following following changes in it, 1) It introduces two new functions hash_page_type() and hash_bitmap_info(). hash_page_type basically displays the type of hash page whereas hash_bitmap_info() shows the status of a bit for a particular overflow page in bitmap page of hash index. 2) The functions hash_page_stats() and hash_page_items() basically shows the information about data stored in bucket and overflow pages of hash index. If a metapage or bitmap page is passed as an input to this function it throws an error saying "Expected page type:'' got: ''". 3) It also improves verify_hash_page() function to handle any type of page in hash index. It is now being used as verify_hash_page('raw_page', <page_type>) to verify if the page passed to this function is of 'page_type' or not. Here page_type can be LH_META_PAGE, LH_BUCKET_PAGE, LH_OVERFLOW_PAGE, LH_BITMAP_PAGE. Attached is the revised patch. Please have a look and let me know your feedback. I have also changed the status for this patch in the upcoming commitfest to "needs review". Thanks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, On 12/20/2016 05:55 AM, Ashutosh Sharma wrote: > Attached is the revised patch. Please have a look and let me know your > feedback. I have also changed the status for this patch in the > upcoming commitfest to "needs review". Thanks. > I was planning to submit a new version next week for CF/January, so I'll review your changes with the previous feedback in mind. Thanks for working on this ! Best regards, Jesper
Hi Jesper, > I was planning to submit a new version next week for CF/January, so I'll > review your changes with the previous feedback in mind. > > Thanks for working on this ! As i was not seeing any updates from you for last 1 month, I thought of working on it. I have created a commit-fest entry for it and have added your name as main Author. I am sorry, I think i should have taken your opinion before starting to work on it.
Hi Ashutosh, On 12/20/2016 05:55 AM, Ashutosh Sharma wrote: > 1) It introduces two new functions hash_page_type() and > hash_bitmap_info(). hash_page_type basically displays the type of hash > page whereas hash_bitmap_info() shows the status of a bit for a > particular overflow page in bitmap page of hash index. > > 2) The functions hash_page_stats() and hash_page_items() basically > shows the information about data stored in bucket and overflow pages > of hash index. If a metapage or bitmap page is passed as an input to > this function it throws an error saying "Expected page type:'' got: > ''". > > 3) It also improves verify_hash_page() function to handle any type of > page in hash index. It is now being used as > verify_hash_page('raw_page', <page_type>) to verify if the page passed > to this function is of 'page_type' or not. Here page_type can be > LH_META_PAGE, LH_BUCKET_PAGE, LH_OVERFLOW_PAGE, LH_BITMAP_PAGE. > Here is an updated patch with some changes. * Rename convert_ovflblkno_to_bitno to _hash_ovflblkno_to_bitno Such that there is only 1 method, which is exposed * Readded pageinspect--1.6.sql In order to have the latest pageinspect interface in 1 file, as we need something to install from. Removing --1.5.sql otherwise would give test=# CREATE EXTENSION "pageinspect"; ERROR: extension "pageinspect" has no installation script nor update path for version "1.6" * Minor documentation changes Look it over, and maybe there is a simple test case we can add for this. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi Jesper, > * Rename convert_ovflblkno_to_bitno to _hash_ovflblkno_to_bitno > > Such that there is only 1 method, which is exposed Okay, Thanks. It makes sense. > > * Readded pageinspect--1.6.sql > > In order to have the latest pageinspect interface in 1 file, as we need > something to install from. I think there should be no problem even if we simply add pageinspect--1.5--1.6.sql file instead of removing 1.5.sql as with the latest changes to the create extension infrastructure in postgresql it automatically finds a lower version script if unable to find the default version script and then upgrade it to the latest version. > Removing --1.5.sql otherwise would give > > test=# CREATE EXTENSION "pageinspect"; > ERROR: extension "pageinspect" has no installation script nor update path > for version "1.6" I didn't get this. Do you mean to say that if you add 1.6.sql and do not remove 1.5.sql you get this error when trying to add pageinspect module. I didn't get any such error. See below, postgres=# create extension pageinspect WITH VERSION '1.6'; CREATE EXTENSION > > * Minor documentation changes > > Look it over, and maybe there is a simple test case we can add for this. Peter has already written a test-case-[1] based on your earlier patch for supporting hash index with pageinspect module. Once the latest patch (v10) becomes stable i will share a separete patch having a test-case for hash index. Infact I will try to modify an already existing patch by Peter. [1]-https://www.postgresql.org/message-id/bcf8c21b-702e-20a7-a4b0-236ed2363d84%402ndquadrant.com -- With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Hi Ashutosh, On 01/05/2017 07:13 AM, Ashutosh Sharma wrote: >> * Readded pageinspect--1.6.sql >> >> In order to have the latest pageinspect interface in 1 file, as we need >> something to install from. > > I think there should be no problem even if we simply add > pageinspect--1.5--1.6.sql file instead of removing 1.5.sql as with the > latest changes to the create extension infrastructure in postgresql it > automatically finds a lower version script if unable to find the > default version script and then upgrade it to the latest version. > >> Removing --1.5.sql otherwise would give >> >> test=# CREATE EXTENSION "pageinspect"; >> ERROR: extension "pageinspect" has no installation script nor update path >> for version "1.6" > > I didn't get this. Do you mean to say that if you add 1.6.sql and do > not remove 1.5.sql you get this error when trying to add pageinspect > module. I didn't > get any such error. See below, > > postgres=# create extension pageinspect WITH VERSION '1.6'; > CREATE EXTENSION > The previous patch was using pageinspect--1.5.sql as a base, and then uses pageinspect--1.5-1.6.sql to upgrade to version 1.6. Removing pageinspect--1.5.sql, and adding pageinspect--1.6.sql with the current interface will use pageinspect--1.6.sql directly where as existing installations will use the upgrade scripts. Hence I don't see a reason why we should keep pageinspect--1.5.sql around when we can provide a clear interface description in a pageinspect--1.6.sql. > Peter has already written a test-case-[1] based on your earlier patch > for supporting hash index with pageinspect module. Once the latest > patch (v10) becomes stable i will share a separete patch having a > test-case for hash index. Infact I will try to modify an already > existing patch by Peter. > Ok, sounds good. Best regards, Jesper
> The previous patch was using pageinspect--1.5.sql as a base, and then uses > pageinspect--1.5-1.6.sql to upgrade to version 1.6. > > Removing pageinspect--1.5.sql, and adding pageinspect--1.6.sql with the > current interface will use pageinspect--1.6.sql directly where as existing > installations will use the upgrade scripts. > > Hence I don't see a reason why we should keep pageinspect--1.5.sql around > when we can provide a clear interface description in a pageinspect--1.6.sql. okay, agreed. With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 6, 2017 at 1:14 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> The previous patch was using pageinspect--1.5.sql as a base, and then uses >> pageinspect--1.5-1.6.sql to upgrade to version 1.6. >> >> Removing pageinspect--1.5.sql, and adding pageinspect--1.6.sql with the >> current interface will use pageinspect--1.6.sql directly where as existing >> installations will use the upgrade scripts. >> >> Hence I don't see a reason why we should keep pageinspect--1.5.sql around >> when we can provide a clear interface description in a pageinspect--1.6.sql. > > okay, agreed. No, you're missing the point. The patch doesn't need to add pageinspect--1.6.sql at all, nor does it remove pageinspect--1.5.sql. It only needs to add pageinspect--1.5--1.6.sql and change the default version to 1.6. No other changes are required. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > No, you're missing the point. The patch doesn't need to add > pageinspect--1.6.sql at all, nor does it remove pageinspect--1.5.sql. > It only needs to add pageinspect--1.5--1.6.sql and change the default > version to 1.6. No other changes are required. Right, this is a new recommended process since commit 40b449ae8, which see for rationale. Use, eg, commit 11da83a0e as a model for extension update patches. regards, tom lane
On 01/10/2017 10:39 AM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> No, you're missing the point. The patch doesn't need to add >> pageinspect--1.6.sql at all, nor does it remove pageinspect--1.5.sql. >> It only needs to add pageinspect--1.5--1.6.sql and change the default >> version to 1.6. No other changes are required. > > Right, this is a new recommended process since commit 40b449ae8, > which see for rationale. > > Use, eg, commit 11da83a0e as a model for extension update patches. > Thanks for the commit ids ! Revised patched attached. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Jan 10, 2017 at 12:19 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Revised patched attached. + itup = (IndexTuple) PageGetItem(uargs->page, id); + + MemSet(nulls, 0, sizeof(nulls)); + + j = 0; + values[j++] = UInt16GetDatum(uargs->offset); + values[j++] = CStringGetTextDatum(psprintf("(%u,%u)", + BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)), + itup->t_tid.ip_posid)); + + ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); + dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info); It seems like this could be used to index off the end of the page, if you feed it invalid data. + dump = palloc0(dlen * 3 + 1); This is wasteful. Just use palloc and install a terminating NUL byte instead. + sprintf(dump, "%02x", *(ptr + off) & 0xff); *(ptr + off) is normally written ptr[off]. + if (pageopaque->hasho_flag != LH_OVERFLOW_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not an overflow page"), + errdetail("Expected %08x, got %08x.", + LH_OVERFLOW_PAGE, pageopaque->hasho_flag))); I think this is an unnecessary test given that you've already called verify_hash_page(). + if (bitmappage >= metap->hashm_nmaps) + elog(ERROR, "invalid overflow bit number %u", ovflbitno); I think this should be an ereport(), because it's reachable given a bogus page which a user might construct (or a corrupted page). +test=# SELECT * FROM hash_page_items(get_raw_page('con_hash_index', 1)); + itemoffset | ctid | data +------------+-----------------+------------------------- + 1 | (3145728,14376) | 00 c0 ca 3e 00 00 00 00 + 2 | (3145728,14376) | 00 c0 ca 3e 00 00 00 00 + 3 | (3407872,14376) | 00 c0 ca 3e 00 00 00 00 Won't the first 4 bytes always be a hash code and the second 4 bytes always 0? Should we just return the hash code as an int4 or int8 instead of pretending it's a bunch of uninterpretable binary data? + <function>hash_bitmap_info</function> returns information about + the status of a bit for an overflow page in bitmap page of a <acronym>HASH</acronym> + index. For example: +<screen> +test=# SELECT * FROM hash_bitmap_info('con_hash_index', 2050); + bitmapblkno | bitmapbit +-------------+----------- + 65 | 1 +</screen> I find this hard to understand. This says that it returns information, but the nature of the returned information is unspecified and in my opinion unclear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> + itup = (IndexTuple) PageGetItem(uargs->page, id); > + > + MemSet(nulls, 0, sizeof(nulls)); > + > + j = 0; > + values[j++] = UInt16GetDatum(uargs->offset); > + values[j++] = CStringGetTextDatum(psprintf("(%u,%u)", > + > BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)), > + itup->t_tid.ip_posid)); > + > + ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); > + dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info); > > It seems like this could be used to index off the end of the page, if > you feed it invalid data. > I think it should not exceed the page size. This is how it has been implemented for btree as well. However, just to be on a safer side i am planning to add following 'if check' to ensure that we do not go beyond the page size while reading tuples. ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); + if (ptr > page + BLCKSZ) + /* Error */ dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info); Meanwhile, I am working on other review comments and will try to share an updated patch asap. With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
Jesper Pedersen wrote: > + /* > + * We copy the page into local storage to avoid holding pin on the > + * buffer longer than we must, and possibly failing to release it at > + * all if the calling query doesn't fetch all rows. > + */ > + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx); > + > + uargs = palloc(sizeof(struct user_args)); > + > + uargs->page = palloc(BLCKSZ); Is this necessary? I think this was copied from btreefuncs, but there is no buffer release in this code. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, > + values[j++] = UInt16GetDatum(uargs->offset); > + values[j++] = CStringGetTextDatum(psprintf("(%u,%u)", > + > BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)), > + itup->t_tid.ip_posid)); > + > + ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); > + dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info); > > It seems like this could be used to index off the end of the page, if > you feed it invalid data. okay, I have handled it in the attached patch. > > + dump = palloc0(dlen * 3 + 1); > > This is wasteful. Just use palloc and install a terminating NUL byte instead. > fixed. Please check the attached patch. > + sprintf(dump, "%02x", *(ptr + off) & 0xff); > > *(ptr + off) is normally written ptr[off]. > corrected. > + if (pageopaque->hasho_flag != LH_OVERFLOW_PAGE) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("page is not an overflow page"), > + errdetail("Expected %08x, got %08x.", > + LH_OVERFLOW_PAGE, pageopaque->hasho_flag))); > > I think this is an unnecessary test given that you've already called > verify_hash_page(). > Yes, it is not required. I have removed it in the attached patch. > + if (bitmappage >= metap->hashm_nmaps) > + elog(ERROR, "invalid overflow bit number %u", ovflbitno); > > I think this should be an ereport(), because it's reachable given a > bogus page which a user might construct (or a corrupted page). > okay, I have corrected it. > +test=# SELECT * FROM hash_page_items(get_raw_page('con_hash_index', 1)); > + itemoffset | ctid | data > +------------+-----------------+------------------------- > + 1 | (3145728,14376) | 00 c0 ca 3e 00 00 00 00 > + 2 | (3145728,14376) | 00 c0 ca 3e 00 00 00 00 > + 3 | (3407872,14376) | 00 c0 ca 3e 00 00 00 00 > > Won't the first 4 bytes always be a hash code and the second 4 bytes > always 0? Should we just return the hash code as an int4 or int8 > instead of pretending it's a bunch of uninterpretable binary data? > Yes, the first 4 bytes represents a hash code and the second 4 bytes is always zero. Now, returning the hash code as int4. > +test=# SELECT * FROM hash_bitmap_info('con_hash_index', 2050); > + bitmapblkno | bitmapbit > +-------------+----------- > + 65 | 1 > +</screen> > > I find this hard to understand. This says that it returns > information, but the nature of the returned information is unspecified > and in my opinion unclear. > I have rephrased it to make it more clear. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, > >> + /* >> + * We copy the page into local storage to avoid holding pin on the >> + * buffer longer than we must, and possibly failing to release it at >> + * all if the calling query doesn't fetch all rows. >> + */ >> + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx); >> + >> + uargs = palloc(sizeof(struct user_args)); >> + >> + uargs->page = palloc(BLCKSZ); > > Is this necessary? I think this was copied from btreefuncs, but there > is no buffer release in this code. Yes, it was copied from btreefuncs and is not required in this case as we are already passing raw_page as an input to hash_page_items. I havetaken care of it in the updated patch shared up thread. With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Hi, On 01/11/2017 03:16 PM, Ashutosh Sharma wrote: > > I have rephrased it to make it more clear. > Rebased, and removed the compile warn in hashfuncs.c Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Jan 13, 2017 at 12:49 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Rebased, and removed the compile warn in hashfuncs.c I did some testing and review for the patch. I did not see any major issue, but there are few minor cases for which I have few suggestions. 1. Source File : /doc/src/sgml/pageinspect.sgml example given. SELECT * FROM hash_page_type(get_raw_page('con_hash_index', 0)); can be changed to SELECT hash_page_type(get_raw_page('con_hash_index', 0)); 2. @verify_hash_page I can easily produce this error right after the split, so there will be pages which are not inited before it is used. So an error saying it is unexpected might be slightly misleading. I think error message should be improved. postgres=# SELECT hash_page_type(get_raw_page('i1', 12)); ERROR: index table contains unexpected zero page 3. @verify_hash_page (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", In error message "HASH" can downcase to "hash". That makes error messages consistent with other error messages like "page is not a hash page of expected type" 4. The condition is raw_page_size != BLCKSZ; But error msg "input page too small"; I think error message should be changed to "invalid page size". if (raw_page_size != BLCKSZ) + ereport(ERROR,i+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); 5. @hash_bitmap_info Metabuf can be released once after bitmapblkno is set it is off no use. _hash_relbuf(indexRel, metabuf); is Write lock required here for bitmap page? mapbuf = _hash_getbuf(indexRel, bitmapblkno, HASH_WRITE, LH_BITMAP_PAGE); 6. You have renamed convert_ovflblkno_to_bitno to _hash_ovflblkno_to_bitno. But unfortunately, you also moved the function down. The diff appears as if you have completely replaced it. I think we should put it back to at old place. 7. I think it is not your bug, but probably a bug in Hash index itself; page flag is set to 66 (for below test); So the page is both LH_BUCKET_PAGE and LH_BITMAP_PAGE. Is not this a bug in hash index? I have inserted 3000 records. Hash index is on integer column. select hasho_flag FROM hash_page_stats(get_raw_page('i1', 1));hasho_flag ------------ 66 -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
> > 7. I think it is not your bug, but probably a bug in Hash index > itself; page flag is set to 66 (for below test); So the page is both > LH_BUCKET_PAGE and LH_BITMAP_PAGE. Is not this a bug in hash index? > > I have inserted 3000 records. Hash index is on integer column. > select hasho_flag FROM hash_page_stats(get_raw_page('i1', 1)); > hasho_flag > ------------ > 66 > Here is the test for same. After insertion of 3000 records, I think at first split we can see bucket page flag is set with LH_BITMAP_PAGE. create table t1( ti int); create index i1 on t1 using hash(ti); postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));hasho_flag ------------ 2 postgres=# insert into t1 select generate_series(1, 1000); INSERT 0 1000 postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));hasho_flag ------------ 2 (1 row) postgres=# insert into t1 select generate_series(1, 1000); INSERT 0 1000 postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));hasho_flag ------------ 2 (1 row) postgres=# insert into t1 select generate_series(1, 1000); INSERT 0 1000 postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));hasho_flag ------------ 66 (1 row) I think If this is a bug then example given in pageinspect.sgml should be changed in your patch after the bug fix. +hasho_prevblkno | -1 +hasho_nextblkno | 8474 +hasho_bucket | 0 +hasho_flag | 66 +hasho_page_id | 65408 -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 17, 2017 at 1:22 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> >> 7. I think it is not your bug, but probably a bug in Hash index >> itself; page flag is set to 66 (for below test); So the page is both >> LH_BUCKET_PAGE and LH_BITMAP_PAGE. Is not this a bug in hash index? >> >> I have inserted 3000 records. Hash index is on integer column. >> select hasho_flag FROM hash_page_stats(get_raw_page('i1', 1)); >> hasho_flag >> ------------ >> 66 >> > > Here is the test for same. After insertion of 3000 records, I think at > first split we can see bucket page flag is set with LH_BITMAP_PAGE. > I think your calculation is not right. 66 indicates LH_BUCKET_PAGE | LH_BUCKET_NEEDS_SPLIT_CLEANUP which is a valid state after the split. This flag will be cleared either during next split or when vacuum operates on that index page. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 17, 2017 at 3:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > I think your calculation is not right. 66 indicates LH_BUCKET_PAGE | > LH_BUCKET_NEEDS_SPLIT_CLEANUP which is a valid state after the split. > This flag will be cleared either during next split or when vacuum > operates on that index page. Oops, I errored in decimal to binary conversion. Sorry, I was wrong. What you said is correct. No need to change .sgml file. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 13, 2017 at 12:49 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Hi, > > On 01/11/2017 03:16 PM, Ashutosh Sharma wrote: >> >> >> I have rephrased it to make it more clear. >> > > Rebased, and removed the compile warn in hashfuncs.c > Review comments: 1. +static Page +verify_hash_page(bytea *raw_page, int flags) Few checks for meta page are missing, refer _hash_checkpage. 2. + if (!superuser()) + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + (errmsg("must be superuser to use pageinspect functions")))); Isn't it better to use "raw page" instead of "pageinspect" in the above error message? If you agree, then fix other similar occurrences in the patch. 3. + values[j++] = CStringGetTextDatum(psprintf("(%u,%u)", + BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)), + itup->t_tid.ip_posid)); Fix indentation in the third line. 4. +Datum +hash_page_items(PG_FUNCTION_ARGS) +{ + bytea *raw_page = PG_GETARG_BYTEA_P(0); +Datum +hash_bitmap_info(PG_FUNCTION_ARGS) +{ + Oid indexRelid = PG_GETARG_OID(0); + uint32 ovflblkno = PG_GETARG_UINT32(1); Is there a reason for keeping the input arguments for hash_bitmap_info() different from hash_page_items()? 5. +hash_metapage_info(PG_FUNCTION_ARGS) { .. + spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1); .. + mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1); .. } Don't you think we should free the allocated memory in this function? Also, why are you 5 as a multiplier in both the above pallocs, shouldn't it be 4? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, > 1. > +static Page > +verify_hash_page(bytea *raw_page, int flags) > > Few checks for meta page are missing, refer _hash_checkpage. okay, I have added the checks for meta page as well. Please refer to attached patch. > > 2. > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + (errmsg("must be superuser to use pageinspect functions")))); > > Isn't it better to use "raw page" instead of "pageinspect" in the > above error message? If you agree, then fix other similar occurrences > in the patch. Yes, knowing that we deal with raw pages as in brin index it would be good to use raw page instead of pageinspect to maintain the consistency in the error message. > > 3. > + values[j++] = CStringGetTextDatum(psprintf("(%u,%u)", > + BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)), > + itup->t_tid.ip_posid)); > > Fix indentation in the third line. > Corrected. Please check the attached patch. > 4. > +Datum > +hash_page_items(PG_FUNCTION_ARGS) > +{ > + bytea *raw_page = PG_GETARG_BYTEA_P(0); > > > +Datum > +hash_bitmap_info(PG_FUNCTION_ARGS) > +{ > + Oid indexRelid = PG_GETARG_OID(0); > + uint32 ovflblkno = PG_GETARG_UINT32(1); > > Is there a reason for keeping the input arguments for > hash_bitmap_info() different from hash_page_items()? > Yes, there are two reasons behind it. Firstly, we need metapage to identify the bitmap page that holds the information about the overflow page passed as an input to this function. Secondly, we will have to input overflow block number as an input to this function so as to determine the overflow bit number which can be used further to identify the bitmap page. + /* Read the metapage so we can determine which bitmap page to use */ + metabuf = _hash_getbuf(indexRel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); + metap = HashPageGetMeta(BufferGetPage(metabuf)); + + /* Identify overflow bit number */ + ovflbitno = _hash_ovflblkno_to_bitno(metap, ovflblkno); + + bitmappage = ovflbitno >> BMPG_SHIFT(metap); + bitmapbit = ovflbitno & BMPG_MASK(metap); + + if (bitmappage >= metap->hashm_nmaps) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid overflow bit number %u", ovflbitno))); + + bitmapblkno = metap->hashm_mapp[bitmappage]; > 5. > +hash_metapage_info(PG_FUNCTION_ARGS) > { > .. > + spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1); > .. > + mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1); > .. > } > > Don't you think we should free the allocated memory in this function? > Also, why are you 5 as a multiplier in both the above pallocs, > shouldn't it be 4? Yes, we should free it. We have used 5 as a multiplier instead of 4 because of ' ' character. Apart from above comments, the attached patch also handles all the comments from Mithun. With Regards, Ashutosh Sharma. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, On 01/18/2017 04:54 AM, Ashutosh Sharma wrote: >> Is there a reason for keeping the input arguments for >> hash_bitmap_info() different from hash_page_items()? >> > > Yes, there are two reasons behind it. > > Firstly, we need metapage to identify the bitmap page that holds the > information about the overflow page passed as an input to this > function. > > Secondly, we will have to input overflow block number as an input to > this function so as to determine the overflow bit number which can be > used further to identify the bitmap page. > > + /* Read the metapage so we can determine which bitmap page to use */ > + metabuf = _hash_getbuf(indexRel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); > + metap = HashPageGetMeta(BufferGetPage(metabuf)); > + > + /* Identify overflow bit number */ > + ovflbitno = _hash_ovflblkno_to_bitno(metap, ovflblkno); > + > + bitmappage = ovflbitno >> BMPG_SHIFT(metap); > + bitmapbit = ovflbitno & BMPG_MASK(metap); > + > + if (bitmappage >= metap->hashm_nmaps) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("invalid overflow bit number %u", ovflbitno))); > + > + bitmapblkno = metap->hashm_mapp[bitmappage]; > > As discussed off-list (along with my patch that included some of these fixes), it would require calculation from the user to be able to provide the necessary pages through get_raw_page(). Other ideas on how to improve this are most welcome. > Apart from above comments, the attached patch also handles all the > comments from Mithun. > Please, include a version number for your patch files in the future. Fixed in this version: * verify_hash_page: Display magic in hex, like hash_metapage_info * Update header for hash_page_type Moving the patch back to 'Needs Review'. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, I got some 'trailing whitespace' error (shown below) when git applying v7 patch attached upthread. I have corrected the same and also ran pgindent on a new file 'hashfuncs.c' added as a part of this project. Attached is the updated v8 patch. 0001-Add-support-for-hash-index-in-pageinspect-contrib-mo_v7.patch:28: trailing whitespace. brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) 0001-Add-support-for-hash-index-in-pageinspect-contrib-mo_v7.patch:35: trailing whitespace. DATA = pageinspect--1.5.sql pageinspect--1.5--1.6.sql \ 0001-Add-support-for-hash-index-in-pageinspect-contrib-mo_v7.patch:36: trailing whitespace. pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ 0001-Add-support-for-hash-index-in-pageinspect-contrib-mo_v7.patch:37: trailing whitespace. pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ 0001-Add-support-for-hash-index-in-pageinspect-contrib-mo_v7.patch:38: trailing whitespace. pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql pgindent: ------------- ./src/tools/pgindent/pgindent contrib/pageinspect/hashfuncs.c On Wed, Jan 18, 2017 at 9:15 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Hi, > > > On 01/18/2017 04:54 AM, Ashutosh Sharma wrote: >>> >>> Is there a reason for keeping the input arguments for >>> hash_bitmap_info() different from hash_page_items()? >>> >> >> Yes, there are two reasons behind it. >> >> Firstly, we need metapage to identify the bitmap page that holds the >> information about the overflow page passed as an input to this >> function. >> >> Secondly, we will have to input overflow block number as an input to >> this function so as to determine the overflow bit number which can be >> used further to identify the bitmap page. >> >> + /* Read the metapage so we can determine which bitmap page to use */ >> + metabuf = _hash_getbuf(indexRel, HASH_METAPAGE, HASH_READ, >> LH_META_PAGE); >> + metap = HashPageGetMeta(BufferGetPage(metabuf)); >> + >> + /* Identify overflow bit number */ >> + ovflbitno = _hash_ovflblkno_to_bitno(metap, ovflblkno); >> + >> + bitmappage = ovflbitno >> BMPG_SHIFT(metap); >> + bitmapbit = ovflbitno & BMPG_MASK(metap); >> + >> + if (bitmappage >= metap->hashm_nmaps) >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + errmsg("invalid overflow bit number %u", ovflbitno))); >> + >> + bitmapblkno = metap->hashm_mapp[bitmappage]; >> >> > > As discussed off-list (along with my patch that included some of these > fixes), it would require calculation from the user to be able to provide the > necessary pages through get_raw_page(). > > Other ideas on how to improve this are most welcome. > >> Apart from above comments, the attached patch also handles all the >> comments from Mithun. >> > > Please, include a version number for your patch files in the future. > > Fixed in this version: > > * verify_hash_page: Display magic in hex, like hash_metapage_info > * Update header for hash_page_type > > Moving the patch back to 'Needs Review'. > > Best regards, > Jesper > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Jan 18, 2017 at 3:24 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > >> 4. >> +Datum >> +hash_page_items(PG_FUNCTION_ARGS) >> +{ >> + bytea *raw_page = PG_GETARG_BYTEA_P(0); >> >> >> +Datum >> +hash_bitmap_info(PG_FUNCTION_ARGS) >> +{ >> + Oid indexRelid = PG_GETARG_OID(0); >> + uint32 ovflblkno = PG_GETARG_UINT32(1); >> >> Is there a reason for keeping the input arguments for >> hash_bitmap_info() different from hash_page_items()? >> > > Yes, there are two reasons behind it. > > Firstly, we need metapage to identify the bitmap page that holds the > information about the overflow page passed as an input to this > function. > Okay, for that probably index oid is sufficient. > Secondly, we will have to input overflow block number as an input to > this function so as to determine the overflow bit number which can be > used further to identify the bitmap page. > I think you can get that from bucket number by using BUCKET_TO_BLKNO. You can get bucket number from page's opaque data. So, if we follow that then you can have a prototype of a function as hash_bitmap_info(index_oid, page bytea) which will be quite similar to other API's exposed by this module. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
>> Secondly, we will have to input overflow block number as an input to >> this function so as to determine the overflow bit number which can be >> used further to identify the bitmap page. >> > > I think you can get that from bucket number by using BUCKET_TO_BLKNO. > You can get bucket number from page's opaque data. So, if we follow > that then you can have a prototype of a function as > hash_bitmap_info(index_oid, page bytea) which will be quite similar to > other API's exposed by this module. > AFAIU, BUCKET_TO_BLKNO will give us the block number of a primary bucket page rather than overflow page and what we want is an overflow block number so as to determine the overflow bit number which can be used further to identify the bitmap page. If we pass overflow page as an input to hash_bitmap_info() we won't be able to get it's block number. Considering above facts, I think we need to input overflow block number rather than overflow page to hash_bitmap_info(). Please let me know your opinion on this. Thanks.
On Tue, Jan 24, 2017 at 11:41 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> Secondly, we will have to input overflow block number as an input to >>> this function so as to determine the overflow bit number which can be >>> used further to identify the bitmap page. >>> >> >> I think you can get that from bucket number by using BUCKET_TO_BLKNO. >> You can get bucket number from page's opaque data. So, if we follow >> that then you can have a prototype of a function as >> hash_bitmap_info(index_oid, page bytea) which will be quite similar to >> other API's exposed by this module. >> > > AFAIU, BUCKET_TO_BLKNO will give us the block number of a primary > bucket page rather than overflow page and what we want is an overflow > block number so as to determine the overflow bit number which can be > used further to identify the bitmap page. > Right. > If we pass overflow page as > an input to hash_bitmap_info() we won't be able to get it's block > number. > I think we can get it by using its prev block number, but not sure if it is worth the complexity, so let's keep the interface as proposed by you. TBH, I am not sure how important is to expose this (hash_bitmap_info()) function, but let's keep it and see if committer has any opinion on this API. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 19, 2017 at 5:08 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: Thanks, Ashutosh and Jesper. I have tested the patch I do not have any more comments so making it ready for committer. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 24, 2017 at 9:59 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Thu, Jan 19, 2017 at 5:08 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Thanks, Ashutosh and Jesper. I have tested the patch I do not have any > more comments so making it ready for committer. I took a look at this patch. I think hash_page_items() is buggy. It's a little confused about whether it is building an array of datums (which can be assembled into a tuple using heap_form_tuple) or an array of cstrings (which can be assembled into a tuple using BuildTupleFromCStrings). It's mostly the former: the array is of type Datum, and two of the three values put into it are datums. But the code that puts the TID in there is stolen from bt_page_items(), which is constructing an array of cstrings, so it's wrong here. The practical consequence of this is, I believe, that the TIDs output by this function will be totally garbled and wrong, which is possibly why the example in the documentation looks so wacky: + 1 | (3407872,12584) | 1053474816 + 2 | (3407872,12584) | 1053474816 The heap tuple is on page 3407872 at line pointer offset 12584? That's an awfully but not implausibly big page number (about 26GB into the relation) and an awfully and implausibly large line pointer offset (how do we fit 12584 index tuples into an 8192-byte page?). Also, the fact that this example has two index entries with the same hash code pointing at the same heap TID seems wrong; wouldn't that result in index scans returning duplicates? I think what's actually happening here is that (3407872,12584) is actually the attempt to interpret some chunk of memory containing the text representation of a TID as an actual TID. When I print those numbers as hex, I get 0x343128, and those are the digits "4" and "1" and an opening parenthesis ")" as ASCII, so that might fit this theory. The code that tries to extract the hashcode from the item also looks suspicious. I'm not 100% sure whether it's wrong or just strangely-written, but it doesn't make much sense to extract the item contents, then using sprintf() to turn that into a hex string of bytes, then parse the hex string using strtol() to get an integer back. I think what it ought to be doing is getting a pointer to the index tuple and then calling _hash_get_indextuple_hashkey. Another minor complaint about hash_page_items is that it doesn't pfree(uargs->page). I'm not sure it's necessary to pfree anything here, but if we're going to do it I think we might as well be consistent with the btree case. Anyway it can hardly make sense to free the 8-byte structure and not the 8192-byte page to which it's pointing. In hash_bimap_info(), we go to the trouble of creating a raw page but never do anything with it. I guess the idea here is just to error out if the supplied page number is not an overflow page, but there are no comments explaining that. Anyway, I'm not sure it's a very good idea, because it means that if you try to write a query to interrogate the status of all the bitmap pages, it's going to read all of the overflow pages to which they point, which makes the operation a LOT more expensive. I think it would be better to leave this part out; if the user wants to know which pages are actually overflow pages, they can use hash_page_type() to figure it out. Let's make it the job of this function just to check the available/free status of the page in the bitmap, without reading the bitmap itself. In doing that, I think it should either return (bitmapblkno, bitmapbit, bit) or just bit. Returning bitmapblkno but not bitmapbit seems strange. Also, I think it should return bit as a bool, not int4. Another general note is that, in general, if you use the BlahGetDatum() function to construct a datum, the SQL type should be match the macro you picked - e.g. if the SQL type is int4, the macro used should be Int32GetDatum(), not UInt32GetDatum() or anything else. If the SQL type is bool, you should be using BoolGetDatum(). Apart from the above, I did a little work to improve the reporting when a page of the wrong type is verified. Updated patch with those changes attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 1/18/17 10:45 AM, Jesper Pedersen wrote: > Fixed in this version: > > * verify_hash_page: Display magic in hex, like hash_metapage_info > * Update header for hash_page_type > > Moving the patch back to 'Needs Review'. Please include tests in your patch. I have posted a sample test suite in <https://www.postgresql.org/message-id/bcf8c21b-702e-20a7-a4b0-236ed2363d84@2ndquadrant.com>, which you could use. Also, as mentioned before, hash_metapage_info result fields spares and mapp should be arrays of integer, not a text string. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Please find my reply inline. > In hash_bimap_info(), we go to the trouble of creating a raw page but > never do anything with it. I guess the idea here is just to error out > if the supplied page number is not an overflow page, but there are no > comments explaining that. Anyway, I'm not sure it's a very good idea, > because it means that if you try to write a query to interrogate the > status of all the bitmap pages, it's going to read all of the overflow > pages to which they point, which makes the operation a LOT more > expensive. I think it would be better to leave this part out; if the > user wants to know which pages are actually overflow pages, they can > use hash_page_type() to figure it out. Yes, the intention was to ensure that user only passes overflow page as an input to this function. I think if we wan't to avoid creating a raw page then we may need to find some other way to verify if it is an overflow page or not, may be we can make use of hash_check_page(). Let's make it the job of this > function just to check the available/free status of the page in the > bitmap, without reading the bitmap itself. > okay, In that case I think we can check the previous block number that the overflow page is pointing to in order to identify if it is free or in-use. AFAIU, if an overflow page is free it's prev block number will be Invalid. This way, we may not have to read bitmap page. Now the question here is, we also have bucket pages where previous block number is always Invalid but before checking this we ensure that we are only dealing with an overflow page.Please let me know if you feel we do have some better option than this to identify the status of overflow page without reading bitmap. > In doing that, I think it should either return (bitmapblkno, > bitmapbit, bit) or just bit. Returning bitmapblkno but not bitmapbit > seems strange. Also, I think it should return bit as a bool, not > int4. It would be good to return bitmap bit number as well along with the bitmap block number. Also, returning bit as bool would look good. I will do that. I am also working on other review comments and will share the updated patch asap. Thanks.
> The heap tuple is on page 3407872 at line pointer offset 12584? > That's an awfully but not implausibly big page number (about 26GB into > the relation) and an awfully and implausibly large line pointer offset > (how do we fit 12584 index tuples into an 8192-byte page?). Also, the > fact that this example has two index entries with the same hash code > pointing at the same heap TID seems wrong; wouldn't that result in > index scans returning duplicates? I think what's actually happening > here is that (3407872,12584) is actually the attempt to interpret some > chunk of memory containing the text representation of a TID as an > actual TID. When I print those numbers as hex, I get 0x343128, and > those are the digits "4" and "1" and an opening parenthesis ")" as > ASCII, so that might fit this theory. I too had a similar observations when debugging hash_page_items() and I think we were trying to represent tid as a text rather than tid itself which was not correct. Attched patch fixes this bug. Below is what i observed and it is somewhat similar to your explanation in the above comment: (gdb) p PageGetItemId(uargs->page, uargs->offset) $2 = (ItemIdData *) 0x1d84754 (gdb) p *(ItemIdData *) 0x1d84754 $3 = {lp_off = 1664, lp_flags = 1, lp_len = 16} (gdb) p *(IndexTuple) (page + 1664) $4 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 839}, ip_posid = 137}, t_info = 16} (gdb) p BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)) $5 = 839 (gdb) p (itup->t_tid.ip_posid) $6 = 137 (gdb) p CStringGetTextDatum(psprintf("(%u,%u)", 839, 137)) $7 = 30959896 (gdb) p DatumGetCString(30959896) $8 = 0x1d86918 "4" (gdb) p (char *)0x1d86918 $23 = 0x1d86918 "4" > > The code that tries to extract the hashcode from the item also looks > suspicious. I'm not 100% sure whether it's wrong or just > strangely-written, but it doesn't make much sense to extract the item > contents, then using sprintf() to turn that into a hex string of > bytes, then parse the hex string using strtol() to get an integer > back. I think what it ought to be doing is getting a pointer to the > index tuple and then calling _hash_get_indextuple_hashkey. > I think there is nothing wrong with the hashcode being shown but i do agree that it is a bit lengthly method to find a hashcode considering that we already have a extern function _hash_get_indextuple_hashkey() that can be used to find the hashcode. I have corrected this in the attached patch. > Another minor complaint about hash_page_items is that it doesn't > pfree(uargs->page). I'm not sure it's necessary to pfree anything > here, but if we're going to do it I think we might as well be > consistent with the btree case. Anyway it can hardly make sense to > free the 8-byte structure and not the 8192-byte page to which it's > pointing. > In case of btree, we are copying entire page into uargs->page. <btreefuncs.c> memcpy(uargs->page, BufferGetPage(buffer), BLCKSZ); </btrrefuncs.c> But in our case we have a raw_page and uargs->page contains pointer to the page so no need to pfree uargs->page separately. > In hash_bimap_info(), we go to the trouble of creating a raw page but > never do anything with it. I guess the idea here is just to error out > if the supplied page number is not an overflow page, but there are no > comments explaining that. Anyway, I'm not sure it's a very good idea, > because it means that if you try to write a query to interrogate the > status of all the bitmap pages, it's going to read all of the overflow > pages to which they point, which makes the operation a LOT more > expensive. I think it would be better to leave this part out; if the > user wants to know which pages are actually overflow pages, they can > use hash_page_type() to figure it out. Let's make it the job of this > function just to check the available/free status of the page in the > bitmap, without reading the bitmap itself. Point taken. I am now checking the status of an overflow page without reading bitmap page. > > In doing that, I think it should either return (bitmapblkno, > bitmapbit, bit) or just bit. Returning bitmapblkno but not bitmapbit > seems strange. Also, I think it should return bit as a bool, not > int4. > The new bitmap_info() now returns bitmapblkno, bitmapbit and bitstatus as bool. > Another general note is that, in general, if you use the > BlahGetDatum() function to construct a datum, the SQL type should be > match the macro you picked - e.g. if the SQL type is int4, the macro > used should be Int32GetDatum(), not UInt32GetDatum() or anything else. > If the SQL type is bool, you should be using BoolGetDatum(). Sorry to mention but I didn't find any SQL datatype equivalent to uint32 or uint16 in 'C'. So, currently i am using int4 for uint16 and int8 for uint32. > Apart from the above, I did a little work to improve the reporting > when a page of the wrong type is verified. Updated patch with those > changes attached. okay. Thanks. I have done changes on top of this patch. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, > Please include tests in your patch. I have posted a sample test suite > in > <https://www.postgresql.org/message-id/bcf8c21b-702e-20a7-a4b0-236ed2363d84@2ndquadrant.com>, > which you could use. > > Also, as mentioned before, hash_metapage_info result fields spares and > mapp should be arrays of integer, not a text string. I have added the test-case in the v9 patch shared upthread and also corrected datatypes for spares and mapp fields in a metapage. Now these two fields are being shown as an array of integers rather than text. https://www.postgresql.org/message-id/CAE9k0Pke046HKYfuJGcCtP77NyHrun7hBV-v20a0TW4CUg4H%2BA%40mail.gmail.com -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
On Sun, Jan 29, 2017 at 11:09 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > okay. Thanks. I have done changes on top of this patch. Moved to CF 2017-03 as there is a new patch, no reviews. -- Michael
On Sat, Jan 28, 2017 at 9:09 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > okay. Thanks. I have done changes on top of this patch. + ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); + Assert(ptr <= uargs->page + BLCKSZ); I think this should be promoted to an ereport(); these functions can accept an arbitrary bytea. + if (opaque->hasho_flag & LH_BUCKET_PAGE) + stat->hasho_prevblkno = InvalidBlockNumber; + else + stat->hasho_prevblkno = opaque->hasho_prevblkno; I think we should return the raw value here. Mithun's patch to cache the metapage hasn't gone in yet, but even if it does, let's assume anyone using contrib/pageinspect wants to see the data that's physically present, not our gloss on it. Other than that, I don't think I have any other comments on this. The tests that got added look a little verbose to me (do we really need to check pages 1-4 separately in each case when they're all hash pages? if hash_bitmap_info rejects one, surely it will reject the others) but I'm not going to fight too hard if Peter wants it that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> > + ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); > + Assert(ptr <= uargs->page + BLCKSZ); > > I think this should be promoted to an ereport(); these functions can > accept an arbitrary bytea. I think we had added 'ptr' variable initially just to dump hash code in hexadecimal format but now since we have removed that logic from current code, I think it is no more required and I have therefore removed it from the current patch. Below is the code that used it initially. It got changed as per your comment - [1] > > + if (opaque->hasho_flag & LH_BUCKET_PAGE) > + stat->hasho_prevblkno = InvalidBlockNumber; > + else > + stat->hasho_prevblkno = opaque->hasho_prevblkno; > > I think we should return the raw value here. Mithun's patch to cache > the metapage hasn't gone in yet, but even if it does, let's assume > anyone using contrib/pageinspect wants to see the data that's > physically present, not our gloss on it. okay, agreed. I have corrected this in the attached v10 patch. > > Other than that, I don't think I have any other comments on this. The > tests that got added look a little verbose to me (do we really need to > check pages 1-4 separately in each case when they're all hash pages? > if hash_bitmap_info rejects one, surely it will reject the others) but > I'm not going to fight too hard if Peter wants it that way. > I think it's OK to check hash_bitmap_info() or any other functions with different page types at least once. [1]- https://www.postgresql.org/message-id/CA%2BTgmoZUjrVy52TUU3b_Q5L6jcrt7w5R4qFwMXdeCuKQBmYWqQ%40mail.gmail.com -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Feb 2, 2017 at 6:29 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> >> + ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); >> + Assert(ptr <= uargs->page + BLCKSZ); >> >> I think this should be promoted to an ereport(); these functions can >> accept an arbitrary bytea. > > I think we had added 'ptr' variable initially just to dump hash code > in hexadecimal format but now since we have removed that logic from > current code, I think it is no more required and I have therefore > removed it from the current patch. Below is the code that used it > initially. It got changed as per your comment - [1] OK. > I think it's OK to check hash_bitmap_info() or any other functions > with different page types at least once. > > [1]- https://www.postgresql.org/message-id/CA%2BTgmoZUjrVy52TUU3b_Q5L6jcrt7w5R4qFwMXdeCuKQBmYWqQ%40mail.gmail.com Sure, I just don't know if we need to test them 4 or 5 times. But I think this is good enough for now - it's not like this code is written in stone once committed. So, committed. Wow, I wish every patch had this many reviewers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/02/2017 02:24 PM, Robert Haas wrote: > So, committed. Wow, I wish every patch had this many reviewers. > Thanks Robert ! Best regards, Jesper
On Fri, Feb 3, 2017 at 4:28 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > On 02/02/2017 02:24 PM, Robert Haas wrote: >> >> So, committed. Wow, I wish every patch had this many reviewers. >> > > Thanks Robert ! 9 people in total per the commit message. Yes that's rare, and thanks for doing the effort to list everybody. -- Michael
>> I think it's OK to check hash_bitmap_info() or any other functions >> with different page types at least once. >> >> [1]- https://www.postgresql.org/message-id/CA%2BTgmoZUjrVy52TUU3b_Q5L6jcrt7w5R4qFwMXdeCuKQBmYWqQ%40mail.gmail.com > > Sure, I just don't know if we need to test them 4 or 5 times. But I > think this is good enough for now - it's not like this code is written > in stone once committed. > > So, committed. Wow, I wish every patch had this many reviewers. > Thanks Robert and Thank you to all the experts for showing their interest towards this project.
On 02/02/2017 02:28 PM, Jesper Pedersen wrote: > On 02/02/2017 02:24 PM, Robert Haas wrote: >> So, committed. Wow, I wish every patch had this many reviewers. >> > > Thanks Robert ! > This message should have included a thank you to everybody who provided valuable feedback for this feature, and for that I'm sorry. Best regards, Jesper
BTW, the buildfarm is still crashing on ia64 and sparc64 members. I believe this is the same problem addressed in 84ad68d64 for pageinspect's GIN functions, to wit, the payload of a "bytea" is not maxaligned, so machines that care about alignment won't be happy when trying to fetch 64-bit values out of a bytea page image. Clearly, the fix should be to start using the get_page_from_raw() function that Peter introduced in that patch. But it's static in ginfuncs.c, which I thought was a mistake at the time, and it's proven so now. contrib/pageinspect actually seems to lack *any* infrastructure for sharing functions across modules. It's time to remedy that. I propose inventing "pageinspect.h" to hold commonly visible declarations, and moving get_page_from_raw() into rawpage.c, which seems like a reasonably natural home for it. (Alternatively, we could invent a pageinspect.c file to hold utility functions, but that might be overkill.) regards, tom lane
On Fri, Feb 3, 2017 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, the buildfarm is still crashing on ia64 and sparc64 members. > I believe this is the same problem addressed in 84ad68d64 for > pageinspect's GIN functions, to wit, the payload of a "bytea" > is not maxaligned, so machines that care about alignment won't be > happy when trying to fetch 64-bit values out of a bytea page image. > > Clearly, the fix should be to start using the get_page_from_raw() > function that Peter introduced in that patch. But it's static in > ginfuncs.c, which I thought was a mistake at the time, and it's > proven so now. > > contrib/pageinspect actually seems to lack *any* infrastructure > for sharing functions across modules. It's time to remedy that. > I propose inventing "pageinspect.h" to hold commonly visible > declarations, and moving get_page_from_raw() into rawpage.c, > which seems like a reasonably natural home for it. (Alternatively, > we could invent a pageinspect.c file to hold utility functions, > but that might be overkill.) No objections. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 02/03/2017 11:36 AM, Robert Haas wrote: > On Fri, Feb 3, 2017 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, the buildfarm is still crashing on ia64 and sparc64 members. >> I believe this is the same problem addressed in 84ad68d64 for >> pageinspect's GIN functions, to wit, the payload of a "bytea" >> is not maxaligned, so machines that care about alignment won't be >> happy when trying to fetch 64-bit values out of a bytea page image. >> >> Clearly, the fix should be to start using the get_page_from_raw() >> function that Peter introduced in that patch. But it's static in >> ginfuncs.c, which I thought was a mistake at the time, and it's >> proven so now. >> >> contrib/pageinspect actually seems to lack *any* infrastructure >> for sharing functions across modules. It's time to remedy that. >> I propose inventing "pageinspect.h" to hold commonly visible >> declarations, and moving get_page_from_raw() into rawpage.c, >> which seems like a reasonably natural home for it. (Alternatively, >> we could invent a pageinspect.c file to hold utility functions, >> but that might be overkill.) > > No objections. > Attached is v1 of this w/ verify_hash_page() using get_page_from_raw(). Sorry for all the problems. Best regards, Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 02/03/2017 11:41 AM, Jesper Pedersen wrote: >>> contrib/pageinspect actually seems to lack *any* infrastructure >>> for sharing functions across modules. It's time to remedy that. >>> I propose inventing "pageinspect.h" to hold commonly visible >>> declarations, and moving get_page_from_raw() into rawpage.c, >>> which seems like a reasonably natural home for it. (Alternatively, >>> we could invent a pageinspect.c file to hold utility functions, >>> but that might be overkill.) >> >> No objections. >> > > Attached is v1 of this w/ verify_hash_page() using get_page_from_raw(). > > Sorry for all the problems. > Disregard, as Tom has committed a fix. Best regards, Jesper
On Fri, Feb 3, 2017 at 11:49 AM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > Disregard, as Tom has committed a fix. So we're six commits into this mess now and I'm hopeful that we've got most of the problems with type selection and crashing fixed now. However, I discovered another thing that doesn't make sense to me -- and I admit this is another thing I should have noticed before committing, but better late than never. As far as I can tell, the hash_bitmap_info() function is doing something completely ridiculous. One would expect that the purpose of this function was to tell you about the status of pages in the bitmap. The documentation claims that this is what the function does: it claims that this function "shows the status of a bit in the bitmap page for a particular overflow page". So you would think that what the function would do is: 1. Work out which bitmap page contains the bit for the page number in question. 2. Read that bitmap page. 3. Indicate the status of that bit within that page. However, that's not what the function actually does. Instead, it does this: 1. Go examine the overflow page and see whether hasho_prevblkno. If so, claim that the bit is set in the bitmap; if not, claim that it isn't. 2. Work out which bitmap page contains the bit for the page number in question. 3. But don't look at it. Instead, tell the user which bitmap page and bit you would have looked at, but instead of returning the status of that bit, return the value you computed in step 1. I do not think this can be the right approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jan 28, 2017 at 10:25 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi, > > Please find my reply inline. > >> In hash_bimap_info(), we go to the trouble of creating a raw page but >> never do anything with it. I guess the idea here is just to error out >> if the supplied page number is not an overflow page, but there are no >> comments explaining that. Anyway, I'm not sure it's a very good idea, >> because it means that if you try to write a query to interrogate the >> status of all the bitmap pages, it's going to read all of the overflow >> pages to which they point, which makes the operation a LOT more >> expensive. I think it would be better to leave this part out; if the >> user wants to know which pages are actually overflow pages, they can >> use hash_page_type() to figure it out. > > Yes, the intention was to ensure that user only passes overflow page > as an input to this function. I think if we wan't to avoid creating a > raw page then we may need to find some other way to verify if it is an > overflow page or not, may be we can make use of hash_check_page(). > > Let's make it the job of this >> function just to check the available/free status of the page in the >> bitmap, without reading the bitmap itself. >> > > okay, In that case I think we can check the previous block number that > the overflow page is pointing to in order to identify if it is free or > in-use. AFAIU, if an overflow page is free it's prev block number will > be Invalid. This way, we may not have to read bitmap page. Now the > question here is, we also have bucket pages where previous block > number is always Invalid but before checking this we ensure that we > are only dealing with an overflow page.Please let me know if you feel > we do have some better option than this to identify the status of > overflow page without reading bitmap. > I think this was a very poor finding by me. If an overflow page is freed, it is completely filled with zero values rather than marking it's prev and next block number as invalid. Therefore, we won't be able to read a free overflow page as it is a new page and hence, we can't decide if an overflow page is free or not without reading the corresponding bitmap page. >> In doing that, I think it should either return (bitmapblkno, >> bitmapbit, bit) or just bit. Returning bitmapblkno but not bitmapbit >> seems strange. Also, I think it should return bit as a bool, not >> int4. > > It would be good to return bitmap bit number as well along with the > bitmap block number. Also, returning bit as bool would look good. I > will do that. > > I am also working on other review comments and will share the updated > patch asap. Thanks.
> As far as I can tell, the hash_bitmap_info() function is doing > something completely ridiculous. One would expect that the purpose of > this function was to tell you about the status of pages in the bitmap. > The documentation claims that this is what the function does: it > claims that this function "shows the status of a bit in the bitmap > page for a particular overflow page". So you would think that what > the function would do is: > > 1. Work out which bitmap page contains the bit for the page number in question. > 2. Read that bitmap page. > 3. Indicate the status of that bit within that page. > > However, that's not what the function actually does. Instead, it does this: > > 1. Go examine the overflow page and see whether hasho_prevblkno. If > so, claim that the bit is set in the bitmap; if not, claim that it > isn't. > 2. Work out which bitmap page contains the bit for the page number in question. > 3. But don't look at it. Instead, tell the user which bitmap page and > bit you would have looked at, but instead of returning the status of > that bit, return the value you computed in step 1. > > I do not think this can be the right approach. Yes, It is not a right approach. As I mentioned in [1], the overflow page being freed is completely filled with zero values which means it is not in a readable state. So, we won't be able to examine a free overflow page. Considering these facts, I would take following approach, 1) Check if an overflow page is a new page. If so, read a bitmap page to confirm if a bit corresponding to this overflow page is clear or not. For this, I would first add Assert statement to ensure that the bit is clear and if it is, then set the statusbit as false indicating that the page is free. 2) In case if an overflow page is not new, first verify if it is really an overflow page and if so, check if the bit corresponding to it in the bitmap page is SET. If so, set the statusbit as true; If not, we would see an assertion failure happening. If you are okay with this approach, please let me know I will share you an updated patch. Thanks. [1]- https://www.postgresql.org/message-id/CAE9k0PkiwT0qD3fdruU8bgAjTpzJpnqcT0XNWnnKxxFbogbL9A%40mail.gmail.com
On Sat, Feb 4, 2017 at 7:06 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> As far as I can tell, the hash_bitmap_info() function is doing >> something completely ridiculous. One would expect that the purpose of >> this function was to tell you about the status of pages in the bitmap. >> The documentation claims that this is what the function does: it >> claims that this function "shows the status of a bit in the bitmap >> page for a particular overflow page". So you would think that what >> the function would do is: >> >> 1. Work out which bitmap page contains the bit for the page number in question. >> 2. Read that bitmap page. >> 3. Indicate the status of that bit within that page. >> >> However, that's not what the function actually does. Instead, it does this: >> >> 1. Go examine the overflow page and see whether hasho_prevblkno. If >> so, claim that the bit is set in the bitmap; if not, claim that it >> isn't. >> 2. Work out which bitmap page contains the bit for the page number in question. >> 3. But don't look at it. Instead, tell the user which bitmap page and >> bit you would have looked at, but instead of returning the status of >> that bit, return the value you computed in step 1. >> >> I do not think this can be the right approach. > > Yes, It is not a right approach. As I mentioned in [1], the overflow > page being freed is completely filled with zero values which means it > is not in a readable state. So, we won't be able to examine a free > overflow page. Considering these facts, I would take following > approach, > > 1) Check if an overflow page is a new page. If so, read a bitmap page > to confirm if a bit corresponding to this overflow page is clear or > not. For this, I would first add Assert statement to ensure that the > bit is clear and if it is, then set the statusbit as false indicating > that the page is free. > > 2) In case if an overflow page is not new, first verify if it is > really an overflow page and if so, check if the bit corresponding to > it in the bitmap page is SET. If so, set the statusbit as true; If > not, we would see an assertion failure happening. I think this is complicated and not what anybody wants. I think you should do exactly what I said above: return true if the bit is set in the bitmap, and false if it isn't. Full stop. Don't read or do anything with the underlying page. Only read the bitmap page. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>> 1) Check if an overflow page is a new page. If so, read a bitmap page >> to confirm if a bit corresponding to this overflow page is clear or >> not. For this, I would first add Assert statement to ensure that the >> bit is clear and if it is, then set the statusbit as false indicating >> that the page is free. >> >> 2) In case if an overflow page is not new, first verify if it is >> really an overflow page and if so, check if the bit corresponding to >> it in the bitmap page is SET. If so, set the statusbit as true; If >> not, we would see an assertion failure happening. > > I think this is complicated and not what anybody wants. I think you > should do exactly what I said above: return true if the bit is set in > the bitmap, and false if it isn't. Full stop. Don't read or do > anything with the underlying page. Only read the bitmap page. > Okay, As per the inputs from you, I have modified hash_bitmap_info() and have tried to keep the things simple. Attached is the patch that has this changes. Please have a look and let me know if you feel it is not yet in the right shape. Thanks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Feb 8, 2017 at 9:25 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> 1) Check if an overflow page is a new page. If so, read a bitmap page >>> to confirm if a bit corresponding to this overflow page is clear or >>> not. For this, I would first add Assert statement to ensure that the >>> bit is clear and if it is, then set the statusbit as false indicating >>> that the page is free. >>> >>> 2) In case if an overflow page is not new, first verify if it is >>> really an overflow page and if so, check if the bit corresponding to >>> it in the bitmap page is SET. If so, set the statusbit as true; If >>> not, we would see an assertion failure happening. >> >> I think this is complicated and not what anybody wants. I think you >> should do exactly what I said above: return true if the bit is set in >> the bitmap, and false if it isn't. Full stop. Don't read or do >> anything with the underlying page. Only read the bitmap page. > > Okay, As per the inputs from you, I have modified hash_bitmap_info() > and have tried to keep the things simple. Attached is the patch that > has this changes. Please have a look and let me know if you feel it is > not yet in the right shape. Thanks. Well, this changes the function signature and I don't see any advantage in doing that. Also, it still doesn't read the code that reads the overflow page. Any correct patch for this problem needs to include the following hunk: - buf = ReadBufferExtended(indexRel, MAIN_FORKNUM, (BlockNumber) ovflblkno, - RBM_NORMAL, NULL); - LockBuffer(buf, BUFFER_LOCK_SHARE); - _hash_checkpage(indexRel, buf, LH_PAGE_TYPE); - page = BufferGetPage(buf); - opaque = (HashPageOpaque) PageGetSpecialPointer(page); - - if (opaque->hasho_flag != LH_OVERFLOW_PAGE) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("page is not an overflow page"), - errdetail("Expected %08x, got %08x.", - LH_OVERFLOW_PAGE, opaque->hasho_flag))); - - if (BlockNumberIsValid(opaque->hasho_prevblkno)) - bit = true; - - UnlockReleaseBuffer(buf); And then, instead, you need to add some code to set bit based on the bitmap page, like what you have: + mapbuf = _hash_getbuf(indexRel, bitmapblkno, HASH_READ, LH_BITMAP_PAGE); + mappage = BufferGetPage(mapbuf); + freep = HashPageGetBitmap(mappage); + + if (ISSET(freep, bitmapbit)) + bit = 1; Except I would write that last line as... bit = ISSET(freep, bitmapbit) != 0 ...instead of using an if statement. I'm sort of confused as to why the idea of not reading the underlying page is so hard to understand. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Feb 8, 2017 at 11:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Feb 8, 2017 at 11:58 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> And then, instead, you need to add some code to set bit based on the >>> bitmap page, like what you have: >>> >>> + mapbuf = _hash_getbuf(indexRel, bitmapblkno, HASH_READ, LH_BITMAP_PAGE); >>> + mappage = BufferGetPage(mapbuf); >>> + freep = HashPageGetBitmap(mappage); >>> + >>> + if (ISSET(freep, bitmapbit)) >>> + bit = 1; >>> >>> Except I would write that last line as... >>> >>> bit = ISSET(freep, bitmapbit) != 0 >>> >>> ...instead of using an if statement. >>> >>> I'm sort of confused as to why the idea of not reading the underlying >>> page is so hard to understand. >> >> It was not so hard to understand your point. The only reason for >> reading overflow page is to ensure that we are passing an overflow >> block as input to '_hash_ovflblkno_to_bitno(metap, (BlockNumber) >> ovflblkno)'. I had removed the code for reading an overflow page >> assuming that _hash_ovflblkno_to_bitno() will throw an error if we >> pass block number of a page other than overflow page but, it doesn't >> seem to guarantee that it will always return value for an overflow >> page. Here are my observations, >> >> postgres=# select hash_page_type(get_raw_page('con_hash_index', 65)); >> hash_page_type >> ---------------- >> bucket >> (1 row) >> >> postgres=# SELECT * FROM >> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index', >> 65); >> bitmapblkno | bitmapbit | bitstatus >> -------------+-----------+----------- >> 33 | 0 | t >> (1 row) >> >> postgres=# select hash_page_type(get_raw_page('con_hash_index', 64)); >> hash_page_type >> ---------------- >> bucket >> (1 row) >> >> postgres=# SELECT * FROM >> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index', >> 64); >> ERROR: invalid overflow block number 64 >> >> postgres=# select hash_page_type(get_raw_page('con_hash_index', 63)); >> hash_page_type >> ---------------- >> bucket >> (1 row) >> >> postgres=# SELECT * FROM >> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index', >> 63); >> ERROR: invalid overflow block number 63 >> >> postgres=# select hash_page_type(get_raw_page('con_hash_index', 33)); >> hash_page_type >> ---------------- >> bitmap >> (1 row) >> >> postgres=# SELECT * FROM >> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index', >> 33); >> bitmapblkno | bitmapbit | bitstatus >> -------------+-----------+----------- >> 33 | 0 | t >> (1 row) > > Right, I understand. But if the caller cares about that, they can use > hash_page_type() in order to find out whether the result of > hash_bitmap_info() will be meaningful. The problem with the way > you've done it is that if someone wants to check the status of a > million bitmap bits, that currently requires reading a million pages > (8GB) whereas if you don't read the underlying page it requires > reading only enough pages to contain a million bitmap bits (~128kB). > That's a big difference. > > If you want to verify that the supplied page number is an overflow > page before returning the bit, I think you should do that calculation > based on the metapage. There's enough information in hashm_spares to > figure out which pages are primary bucket pages as opposed to overflow > pages, and there's enough information in hashm_mapp to identify which > pages are bitmap pages, and the metapage is always page 0. If you > take that approach, then you can check a million bitmap bits reading > only the relevant bitmap pages plus the metapage, which is a LOT less > I/O than reading a million index pages. > Thanks for the inputs. Attached is the patch modified as per your suggestions. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Feb 9, 2017 at 8:56 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> If you want to verify that the supplied page number is an overflow >> page before returning the bit, I think you should do that calculation >> based on the metapage. There's enough information in hashm_spares to >> figure out which pages are primary bucket pages as opposed to overflow >> pages, and there's enough information in hashm_mapp to identify which >> pages are bitmap pages, and the metapage is always page 0. If you >> take that approach, then you can check a million bitmap bits reading >> only the relevant bitmap pages plus the metapage, which is a LOT less >> I/O than reading a million index pages. > > Thanks for the inputs. Attached is the patch modified as per your suggestions. I think you should just tighten up the sanity checking in the existing function _hash_ovflblkno_to_bitno rather than duplicating the code. I don't think it's called often enough for one extra (cheap) test to be an issue. (You should change the elog in that function to an ereport, too, since it's going to be a user-facing error message now.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I think you should just tighten up the sanity checking in the existing > function _hash_ovflblkno_to_bitno rather than duplicating the code. I > don't think it's called often enough for one extra (cheap) test to be > an issue. (You should change the elog in that function to an ereport, > too, since it's going to be a user-facing error message now.) okay, I have taken care of above two points in the attached patch. Thanks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Feb 9, 2017 at 1:11 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> I think you should just tighten up the sanity checking in the existing >> function _hash_ovflblkno_to_bitno rather than duplicating the code. I >> don't think it's called often enough for one extra (cheap) test to be >> an issue. (You should change the elog in that function to an ereport, >> too, since it's going to be a user-facing error message now.) > > okay, I have taken care of above two points in the attached patch. Thanks. Alright, committed with a little further hacking. That would rejected using hash_bitmap_info() on primary bucket pages and the metapage, but not on bitmap pages, which seems weird. So I fixed that and pushed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 10, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Alright, committed with a little further hacking. I did pull the latest code, and tried Test: ==== create table t1(t int); create index i1 on t1 using hash(t); insert into t1 select generate_series(1, 10000000); postgres=# SELECT spares FROM hash_metapage_info(get_raw_page('i1', 0)); spares ----------------------------------------------------------------------------{0,0,0,1,9,17,33,65,-127,1,1,0,1,-1,-1,-4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0} spares are showing negative numbers; I think the wrong type has been chosen, seems it is rounding at 127, spares are defined as following uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* spare pages before each splitpoint */ it should be always positive. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
Thanks for reporting it. This is because of incorrect data typecasting. Attached is the patch that fixes this issue.
On Tue, Feb 21, 2017 at 2:58 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Fri, Feb 10, 2017 at 1:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Alright, committed with a little further hacking.
I did pull the latest code, and tried
Test:
====
create table t1(t int);
create index i1 on t1 using hash(t);
insert into t1 select generate_series(1, 10000000);
postgres=# SELECT spares FROM hash_metapage_info(get_raw_page('i1', 0));
spares
------------------------------------------------------------ ----------------
{0,0,0,1,9,17,33,65,-127,1,1,0,1,-1,-1,-4,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0}
spares are showing negative numbers; I think the wrong type has been
chosen, seems it is rounding at 127, spares are defined as following
uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* spare pages before each
splitpoint */
it should be always positive.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Tue, Feb 21, 2017 at 3:14 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Thanks for reporting it. This is because of incorrect data typecasting. > Attached is the patch that fixes this issue. Oops, that's probably my fault. Thanks for the patch; committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company