Thread: [HACKERS] Add pgstathashindex() to get hash index table statistics.
Hi All, I have introduced a new function 'pgstathashindex()' inside pgstatuple extension to view the statistics related to hash index table. I could have used 'pgstattuple()' function to view hash index stats instead of adding this new function but there are certain limitations when using pgstattuple() for hash indexes. Firstly, it doesn't work if a hash index contains zero or new pages which is very common in case of hash indexes. Secondly, it doesn't provide information about different types of pages in hash index and its count. Considering these points, I have thought of introducing this function. Attached is the patch for the same. Please have a look and let me your feedback. I would also like to mention that this idea basically came from my colleague Kuntal Ghosh and i implemented it. I have also created a commit-fest entry for this submission. Thanks. With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Dec 21, 2016 at 7:22 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi All, > > I have introduced a new function 'pgstathashindex()' inside pgstatuple > extension to view the statistics related to hash index table. > I think this feature is going to be helpful. I've some initial review for the patch. I've not applied the patch yet. + buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, NULL); Use ReadBuffer() instead. + /* + * Let us ignore metapage and bitmap page when calculating + * free space percentage for tuples in a table. + */ + table_len = (stats.total_pages - 2) * BLCKSZ; + + free_percent = 100.0 * stats.free_space / table_len; I think the calculation for max available spcae is wrong here. You should subtract the page header and special area from the total page size. A check for non-zero denominator should be added while calculating the percentage. There can be multiple bitmap pages. Right? + values[10] = Float8GetDatum(free_percent); Some precision should be added. + <row> + <entry><structfield>ffactor</structfield></entry> + <entry><type>integer</type></entry> + <entry>Average number of tuples per bucket</entry> + </row> I feel that either the column name should be changed or it should just output how full the index method is in percentage. + </varlistentry> + + + <varlistentry> Please remove extra spaces. And, please add some test cases for regression tests. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Hi, > I think the calculation for max available spcae is wrong here. You > should subtract the page header and special area from the total page size. > A check for non-zero denominator should be added while calculating the > percentage. > There can be multiple bitmap pages. Right? Yes, we can have multiple bitmap pages. I have adjusted the calculation for free space percentage accordingly. Please check the attached v2 patch. > > + values[10] = Float8GetDatum(free_percent); > Some precision should be added. Corrected. Please refer v2 patch. > > + <row> > + <entry><structfield>ffactor</structfield></entry> > + <entry><type>integer</type></entry> > + <entry>Average number of tuples per bucket</entry> > + </row> > I feel that either the column name should be changed or it should > just output how full the index method is in percentage. Fixed. Refer to v2 patch. > > + </varlistentry> > + > + > + <varlistentry> > Please remove extra spaces. Done. Please refer to v2 patch. > > And, please add some test cases for regression tests. > Added a test-case. Please check v2 patch attached with this mail. -- With Regards, Ashutosh Sharma. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Jan 6, 2017 at 6:58 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: I've successfully applied the patch on the latest head and ran a regression tests without any failure. There is no major changes. However, I've some minor comments on the patch: +/* + * HASH_ALLOCATABLE_PAGE_SZ represents allocatable + * space (pd_upper - pd_lower) on a hash page. + */ +#define HASH_ALLOCATABLE_PAGE_SZ \ + BLCKSZ - \ + (SizeOfPageHeaderData + sizeof(HashPageOpaqueData)) My suggestion will be not to write "(pd_upper - pd_lower)" in the comment. You may write allocatable space on a empty hash page. + buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, NULL); Use BAS_BULKREAD strategy to read the buffer. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
> However, I've some minor comments on the patch: > > +/* > + * HASH_ALLOCATABLE_PAGE_SZ represents allocatable > + * space (pd_upper - pd_lower) on a hash page. > + */ > +#define HASH_ALLOCATABLE_PAGE_SZ \ > + BLCKSZ - \ > + (SizeOfPageHeaderData + sizeof(HashPageOpaqueData)) > My suggestion will be not to write "(pd_upper - pd_lower)" in the > comment. You may write allocatable space on a empty hash page. Accepted. Have changed the comment accordingly. > > + buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, > RBM_NORMAL, NULL); > Use BAS_BULKREAD strategy to read the buffer. > okay, corrected. Please check the attached v3 patch with corrections. With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, Please find the attached v4 patch rebased on a latest commitID in head. I had to rebase it as the following git commit has some changes in pgstatindex.c file due to which 'git apply' was failing. <git-commit> commit f21a563d25dbae153937aec062161184189478b8 Author: Peter Eisentraut <peter_e@gmx.net> Date: Fri Jan 20 20:29:53 2017 -0500 Move some things from builtins.h to new header files </git-commit> On Thu, Jan 19, 2017 at 12:27 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> However, I've some minor comments on the patch: >> >> +/* >> + * HASH_ALLOCATABLE_PAGE_SZ represents allocatable >> + * space (pd_upper - pd_lower) on a hash page. >> + */ >> +#define HASH_ALLOCATABLE_PAGE_SZ \ >> + BLCKSZ - \ >> + (SizeOfPageHeaderData + sizeof(HashPageOpaqueData)) >> My suggestion will be not to write "(pd_upper - pd_lower)" in the >> comment. You may write allocatable space on a empty hash page. > > Accepted. Have changed the comment accordingly. > >> >> + buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, >> RBM_NORMAL, NULL); >> Use BAS_BULKREAD strategy to read the buffer. >> > > okay, corrected. Please check the attached v3 patch with corrections. > > With Regards, > Ashutosh Sharma > EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Jan 23, 2017 at 2:56 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi, > > Please find the attached v4 patch rebased on a latest commitID in > head. I had to rebase it as the following git commit has some changes > in pgstatindex.c file due to which 'git apply' was failing. > I've looked at the patch. It looks good. However, I was wondering why an exclusive lock for extension is necessary for reading the number blocks in this case. Please refer to the following code. + /* Get the current relation length */ + LockRelationForExtension(rel, ExclusiveLock); + nblocks = RelationGetNumberOfBlocks(rel); + UnlockRelationForExtension(rel, ExclusiveLock); Apart from this, I've nothing else to add. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
> I've looked at the patch. It looks good. However, I was wondering why > an exclusive lock for extension is necessary for reading the number > blocks in this case. Please refer to the following code. > > + /* Get the current relation length */ > + LockRelationForExtension(rel, ExclusiveLock); > + nblocks = RelationGetNumberOfBlocks(rel); > + UnlockRelationForExtension(rel, ExclusiveLock); > > Apart from this, I've nothing else to add. Actually in my case I may not need to acquire ExclusiveLock just to get the number of pages in a relation. Infact I have already opened a relation using 'AccessShareLock' which should be enough to read a table length. Thanks for putting this point. I have corrected it in the attached v5 patch. I think in 'pgstat_index()' currently LockRelationForExtension(rel, ExclusiveLock); is being used just to read a table length which I feel is not required. You may raise this point in the community in a separete mail thread if you agree with it. Also, I guess LockRelationForExtension() is used when trying to add a new page in an existing relation so not sure if it would be right to call it from contrib modules like pgstattuple where we are just trying to read the tables. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Jan 24, 2017 at 10:09 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> I've looked at the patch. It looks good. However, I was wondering why >> an exclusive lock for extension is necessary for reading the number >> blocks in this case. Please refer to the following code. >> >> + /* Get the current relation length */ >> + LockRelationForExtension(rel, ExclusiveLock); >> + nblocks = RelationGetNumberOfBlocks(rel); >> + UnlockRelationForExtension(rel, ExclusiveLock); >> >> Apart from this, I've nothing else to add. > > > Actually in my case I may not need to acquire ExclusiveLock just to > get the number of pages in a relation. Infact I have already opened a > relation using > 'AccessShareLock' which should be enough to read a table length. > Thanks for putting this point. I have corrected it in the attached v5 > patch. I've checked the last attached patch and didn't find any issue. There is no problem with patching and compilation. It also passed all regression tests. Nothing else to add from my side. I'm marking this 'Ready for commiter'. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 24, 2017 at 8:36 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > Nothing else to add from my side. I'm marking this 'Ready for commiter'. Moved to CF 2017-03 with the same status. -- Michael
On Wed, Feb 1, 2017 at 10:13 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Jan 24, 2017 at 8:36 PM, Kuntal Ghosh > <kuntalghosh.2007@gmail.com> wrote: >> Nothing else to add from my side. I'm marking this 'Ready for commiter'. > > Moved to CF 2017-03 with the same status. OK, I took a look at this. - The handling of the extension stuff wasn't correct. You can't go back and modify version 1.4; that's already been released. But version 1.5 hasn't been released yet, so we can (and should) add more stuff to that version instead of creating a new version. We don't need the pushups that you did with superuser checks, because there's no version out there in the wild that has those checks, so users with installed binaries can't be relying on them. I cleaned all that stuff up, which made this significantly simpler. - I removed several of the columns being returned from the metapage. The pageinspect code that I committed yesterday can be used to look at those values; there doesn't seem to be a need to also return them here. What this is really useful for is getting the "real" values by scanning through the whole index and tallying things up. - I adjusted some of the data types, both in the SQL and in the C code, so that they all match and that they're wide enough to return the values they might contain without overflowing (but not wider). I also made the free_percent float8 rather than float4, consistent with other things in this module. One thing I cheated on slightly: I left the version as int4 rather than int8 even though the underlying field is uint32, for consistency with everything else in this module. That's not entirely intellectually satisfying, but I guess it's better to be consistent for the moment. - I fixed a number of cosmetic issues, like the fact that the documentation for this didn't use \x format, unlike all of the other pgstattuple documentation, and the fact that table_len isn't a very good name for a variable representing the total amount of space for tuples, and the fact that the documentation referred to a hash index as a "hash table", and the fact that the wording for the columns wasn't consistent with other functions in pgstattuple with similar columns. Committed with those changes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Feb 4, 2017 at 1:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Feb 1, 2017 at 10:13 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Tue, Jan 24, 2017 at 8:36 PM, Kuntal Ghosh >> <kuntalghosh.2007@gmail.com> wrote: >>> Nothing else to add from my side. I'm marking this 'Ready for commiter'. >> >> Moved to CF 2017-03 with the same status. > > OK, I took a look at this. > > - The handling of the extension stuff wasn't correct. You can't go > back and modify version 1.4; that's already been released. But > version 1.5 hasn't been released yet, so we can (and should) add more > stuff to that version instead of creating a new version. We don't > need the pushups that you did with superuser checks, because there's > no version out there in the wild that has those checks, so users with > installed binaries can't be relying on them. I cleaned all that stuff > up, which made this significantly simpler. > > - I removed several of the columns being returned from the metapage. > The pageinspect code that I committed yesterday can be used to look at > those values; there doesn't seem to be a need to also return them > here. What this is really useful for is getting the "real" values by > scanning through the whole index and tallying things up. > > - I adjusted some of the data types, both in the SQL and in the C > code, so that they all match and that they're wide enough to return > the values they might contain without overflowing (but not wider). I > also made the free_percent float8 rather than float4, consistent with > other things in this module. One thing I cheated on slightly: I left > the version as int4 rather than int8 even though the underlying field > is uint32, for consistency with everything else in this module. > That's not entirely intellectually satisfying, but I guess it's better > to be consistent for the moment. > > - I fixed a number of cosmetic issues, like the fact that the > documentation for this didn't use \x format, unlike all of the other > pgstattuple documentation, and the fact that table_len isn't a very > good name for a variable representing the total amount of space for > tuples, and the fact that the documentation referred to a hash index > as a "hash table", and the fact that the wording for the columns > wasn't consistent with other functions in pgstattuple with similar > columns. > > Committed with those changes. Thanks for above corrections and commit. But, There are couple of things that we might have to change once the patch for 'WAL in Hash Indexes' gets checked-in. 1) The test-case result needs to be changed because there won't be any WARNING message : "WARNING: hash indexes are not WAL-logged and their use is discouraged". 2) From WAL patch for Hash Indexes onwards, we won't have any zero pages in Hash Indexes so I don't think we need to have column showing zero pages (zero_pages). When working on WAL in hash indexes, we found that WAL routine 'XLogReadBufferExtended' does not expect a page to be completely zero page else it returns Invalid Buffer. To fix this, we started initializing freed overflow page and newly allocated bucket pages using _hash_pageinit() hence I don't think there will be any zero pages from here onwards. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
On Mon, Feb 6, 2017 at 10:06 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > On Sat, Feb 4, 2017 at 1:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> Committed with those changes. > > Thanks for above corrections and commit. But, There are couple of > things that we might have to change once the patch for 'WAL in Hash > Indexes' gets checked-in. > > 1) The test-case result needs to be changed because there won't be any > WARNING message : "WARNING: hash indexes are not WAL-logged and their > use is discouraged". > I think this should be changed in WAL patch itself, no need to handle it separately. > 2) From WAL patch for Hash Indexes onwards, we won't have any zero > pages in Hash Indexes so I don't think we need to have column showing > zero pages (zero_pages). When working on WAL in hash indexes, we found > that WAL routine 'XLogReadBufferExtended' does not expect a page to be > completely zero page else it returns Invalid Buffer. To fix this, we > started initializing freed overflow page and newly allocated bucket > pages using _hash_pageinit() hence I don't think there will be any > zero pages from here onwards. > Can't we use PageIsEmpty() to show such information? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, Feb 5, 2017 at 11:36 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> Committed with those changes. > > Thanks for above corrections and commit. But, There are couple of > things that we might have to change once the patch for 'WAL in Hash > Indexes' gets checked-in. > > 1) The test-case result needs to be changed because there won't be any > WARNING message : "WARNING: hash indexes are not WAL-logged and their > use is discouraged". > > 2) From WAL patch for Hash Indexes onwards, we won't have any zero > pages in Hash Indexes so I don't think we need to have column showing > zero pages (zero_pages). When working on WAL in hash indexes, we found > that WAL routine 'XLogReadBufferExtended' does not expect a page to be > completely zero page else it returns Invalid Buffer. To fix this, we > started initializing freed overflow page and newly allocated bucket > pages using _hash_pageinit() hence I don't think there will be any > zero pages from here onwards. Maybe we should call them "unused pages". Those will presumably still exist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 7, 2017 at 2:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Feb 5, 2017 at 11:36 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> Committed with those changes. >> >> Thanks for above corrections and commit. But, There are couple of >> things that we might have to change once the patch for 'WAL in Hash >> Indexes' gets checked-in. >> >> 1) The test-case result needs to be changed because there won't be any >> WARNING message : "WARNING: hash indexes are not WAL-logged and their >> use is discouraged". >> >> 2) From WAL patch for Hash Indexes onwards, we won't have any zero >> pages in Hash Indexes so I don't think we need to have column showing >> zero pages (zero_pages). When working on WAL in hash indexes, we found >> that WAL routine 'XLogReadBufferExtended' does not expect a page to be >> completely zero page else it returns Invalid Buffer. To fix this, we >> started initializing freed overflow page and newly allocated bucket >> pages using _hash_pageinit() hence I don't think there will be any >> zero pages from here onwards. > > Maybe we should call them "unused pages". > +1. If we consider some more names for that column then probably one alternative could be "empty pages". -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> Maybe we should call them "unused pages". > > +1. If we consider some more names for that column then probably one > alternative could be "empty pages". Yeah, but I think "unused" might be better. Because a page could be in use (as an overflow page or primary bucket page) and still be empty. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> Maybe we should call them "unused pages". >> >> +1. If we consider some more names for that column then probably one >> alternative could be "empty pages". > > Yeah, but I think "unused" might be better. Because a page could be > in use (as an overflow page or primary bucket page) and still be > empty. > Based on the earlier discussions, I have prepared a patch that would allow pgstathashindex() to show the number of unused pages in hash index. Please find the attached patch for the same. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Mar 23, 2017 at 11:24 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi, > > On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> Maybe we should call them "unused pages". >>> >>> +1. If we consider some more names for that column then probably one >>> alternative could be "empty pages". >> >> Yeah, but I think "unused" might be better. Because a page could be >> in use (as an overflow page or primary bucket page) and still be >> empty. >> > > Based on the earlier discussions, I have prepared a patch that would > allow pgstathashindex() to show the number of unused pages in hash > index. Please find the attached patch for the same. Thanks. > else if (opaque->hasho_flag & LH_BITMAP_PAGE) stats.bitmap_pages++; + else if (PageIsEmpty(page)) + stats.unused_pages++; I think having this check after PageIsNew() makes more sense then having at the place where you currently have, other than that code-wise your patch looks okay, although I haven't tested it. I think this should also be tracked under PostgreSQL 10 open items, but as this is not directly a bug, so not sure, what do others think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Mar 25, 2017 at 11:02 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Mar 23, 2017 at 11:24 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> Hi, >> >> On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>> Maybe we should call them "unused pages". >>>> >>>> +1. If we consider some more names for that column then probably one >>>> alternative could be "empty pages". >>> >>> Yeah, but I think "unused" might be better. Because a page could be >>> in use (as an overflow page or primary bucket page) and still be >>> empty. >>> >> >> Based on the earlier discussions, I have prepared a patch that would >> allow pgstathashindex() to show the number of unused pages in hash >> index. Please find the attached patch for the same. Thanks. >> > > else if (opaque->hasho_flag & LH_BITMAP_PAGE) > stats.bitmap_pages++; > + else if (PageIsEmpty(page)) > + stats.unused_pages++; > > I think having this check after PageIsNew() makes more sense then > having at the place where you currently have, I don't think having it just below PageIsNew() check would be good. The reason being, there can be a bucket page which is in use but still be empty. So, if we add a check just below PageIsNew check then even though a page is marked as bucket page and is empty it will be shown as unused page which i feel is not correct. Here is one simple example that illustrates this point, Query: ====== select hash_page_type(get_raw_page('con_hash_index', 17)); gdb shows following info for this block number 17, (gdb) p *(PageHeader)page $1 = {pd_lsn = {xlogid = 0, xrecoff = 122297104}, pd_checksum = 0, pd_flags = 0, pd_lower = 24, pd_upper = 8176, pd_special = 8176, pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp= 0x22a82d8} (gdb) p ((HashPageOpaque) PageGetSpecialPointer(page))->hasho_flag $2 = 66 (gdb) p (((HashPageOpaque) PageGetSpecialPointer(page))->hasho_flag & LH_BUCKET_PAGE) $3 = 2 From above information it is clear that this page is a bucket page and is empty. I think it should be shown as bucket page rather than unused page. Also, this has already been discussed in [1]. other than that > code-wise your patch looks okay, although I haven't tested it. > I have tested it properly and didn't find any issues. > I think this should also be tracked under PostgreSQL 10 open items, > but as this is not directly a bug, so not sure, what do others think? Well, Even I am not sure if this has to be added under open items list or not. Thanks. [1] - https://www.postgresql.org/message-id/CA%2BTgmobwLKpKe99qnTJCBzFB4UaVGKrLNX3hwp8wcxObMDx7pA%40mail.gmail.com -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
On Sat, Mar 25, 2017 at 12:33 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > On Sat, Mar 25, 2017 at 11:02 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Thu, Mar 23, 2017 at 11:24 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> Hi, >>> >>> On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>>> Maybe we should call them "unused pages". >>>>> >>>>> +1. If we consider some more names for that column then probably one >>>>> alternative could be "empty pages". >>>> >>>> Yeah, but I think "unused" might be better. Because a page could be >>>> in use (as an overflow page or primary bucket page) and still be >>>> empty. >>>> >>> >>> Based on the earlier discussions, I have prepared a patch that would >>> allow pgstathashindex() to show the number of unused pages in hash >>> index. Please find the attached patch for the same. Thanks. >>> >> >> else if (opaque->hasho_flag & LH_BITMAP_PAGE) >> stats.bitmap_pages++; >> + else if (PageIsEmpty(page)) >> + stats.unused_pages++; >> >> I think having this check after PageIsNew() makes more sense then >> having at the place where you currently have, > > I don't think having it just below PageIsNew() check would be good. > The reason being, there can be a bucket page which is in use but still > be empty. So, if we add a check just below PageIsNew check then even > though a page is marked as bucket page and is empty it will be shown > as unused page which i feel is not correct. Here is one simple example > that illustrates this point, > oh, is it a page where all the items have been deleted and no new items have been inserted? The reason why I have told that place is not appropriate is because all the other checks in near by code are on the page type and this check looks odd at that place, but we might need to do this way if there is no other clean solution. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
>>>>>> >>>>>> +1. If we consider some more names for that column then probably one >>>>>> alternative could be "empty pages". >>>>> >>>>> Yeah, but I think "unused" might be better. Because a page could be >>>>> in use (as an overflow page or primary bucket page) and still be >>>>> empty. >>>>> >>>> >>>> Based on the earlier discussions, I have prepared a patch that would >>>> allow pgstathashindex() to show the number of unused pages in hash >>>> index. Please find the attached patch for the same. Thanks. >>>> >>> >>> else if (opaque->hasho_flag & LH_BITMAP_PAGE) >>> stats.bitmap_pages++; >>> + else if (PageIsEmpty(page)) >>> + stats.unused_pages++; >>> >>> I think having this check after PageIsNew() makes more sense then >>> having at the place where you currently have, >> >> I don't think having it just below PageIsNew() check would be good. >> The reason being, there can be a bucket page which is in use but still >> be empty. So, if we add a check just below PageIsNew check then even >> though a page is marked as bucket page and is empty it will be shown >> as unused page which i feel is not correct. Here is one simple example >> that illustrates this point, >> > > oh, is it a page where all the items have been deleted and no new > items have been inserted? Yes, it is a page from where items have been removed and no new insertion has happened. The reason why I have told that place is > not appropriate is because all the other checks in near by code are on > the page type and this check looks odd at that place, but we might > need to do this way if there is no other clean solution. I got your point but then i think that is the only one solution we have as of now. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
On Thu, Mar 23, 2017 at 1:54 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> Yeah, but I think "unused" might be better. Because a page could be >> in use (as an overflow page or primary bucket page) and still be >> empty. > > Based on the earlier discussions, I have prepared a patch that would > allow pgstathashindex() to show the number of unused pages in hash > index. Please find the attached patch for the same. Thanks. My idea is that we shouldn't end up with both a zero_pages column and an unused_pages column. Instead, we should end up with just an unused_pages column, which will include both pages that are all-zeroes and pages that have a valid special space marked as LH_UNUSED. Also, I don't see why it's correct to test PageIsEmpty() here instead of just testing the page type as we do in pageinspect. Attached is a revised patch that shows what I have in mind; please review. Along the way I made the code for examining the page type more similar to what pageinspect does, because I see no reason for those things to be different, and I think the pageinspect code is better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi, >> >> Based on the earlier discussions, I have prepared a patch that would >> allow pgstathashindex() to show the number of unused pages in hash >> index. Please find the attached patch for the same. Thanks. > > My idea is that we shouldn't end up with both a zero_pages column and > an unused_pages column. Instead, we should end up with just an > unused_pages column, which will include both pages that are all-zeroes > and pages that have a valid special space marked as LH_UNUSED. > > Also, I don't see why it's correct to test PageIsEmpty() here instead > of just testing the page type as we do in pageinspect. > > Attached is a revised patch that shows what I have in mind; please > review. Along the way I made the code for examining the page type > more similar to what pageinspect does, because I see no reason for > those things to be different, and I think the pageinspect code is > better. I have reviewed the patch and it looks good to me. Also, the idea of including both zero and unused pages in a single 'unused' column looks better. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
On Thu, Apr 06, 2017 at 10:45:26AM +0530, Ashutosh Sharma wrote: > >> Based on the earlier discussions, I have prepared a patch that would > >> allow pgstathashindex() to show the number of unused pages in hash > >> index. Please find the attached patch for the same. Thanks. > > > > My idea is that we shouldn't end up with both a zero_pages column and > > an unused_pages column. Instead, we should end up with just an > > unused_pages column, which will include both pages that are all-zeroes > > and pages that have a valid special space marked as LH_UNUSED. > > > > Also, I don't see why it's correct to test PageIsEmpty() here instead > > of just testing the page type as we do in pageinspect. > > > > Attached is a revised patch that shows what I have in mind; please > > review. Along the way I made the code for examining the page type > > more similar to what pageinspect does, because I see no reason for > > those things to be different, and I think the pageinspect code is > > better. > > I have reviewed the patch and it looks good to me. Also, the idea of > including both zero and unused pages in a single 'unused' column looks > better. Thanks. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
On Wed, Apr 12, 2017 at 2:28 AM, Noah Misch <noah@leadboat.com> wrote: > [Action required within three days. This is a generic notification.] > > The above-described topic is currently a PostgreSQL 10 open item. Robert, > since you committed the patch believed to have created it, you own this open > item. If some other commit is more relevant or if this does not belong as a > v10 open item, please let us know. Otherwise, please observe the policy on > open item ownership[1] and send a status update within three calendar days of > this message. Include a date for your subsequent status update. Testers may > discover new open items at any time, and I want to plan to get them all fixed > well in advance of shipping v10. Consequently, I will appreciate your efforts > toward speedy resolution. Thanks. Oops. I had forgotten about this one; committed now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company