Thread: pgstattuple extension for indexes

pgstattuple extension for indexes

From
ITAGAKI Takahiro
Date:
This is an extension of pgstattuple to query information from indexes.
It supports btree, hash and gist. Gin is not supported.
It scans only index pages and does not read corresponding heap tuples.
Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.

Suggestions welcome.

----
$ pgbench -i
$ pgbench -n -t 100 -c 10
# select * from pgstattuple('accounts_pkey');
NOTICE:  0.36% fragmented
HINT:  continuous=273, forward=1, backward=0
-[ RECORD 1 ]------+--------
table_len          | 2260992
tuple_count        | 100996     -- 996 tuples are dead practically,
tuple_len          | 1615936       but no LP_DELETE yet.
tuple_percent      | 71.47
dead_tuple_count   | 4
dead_tuple_len     | 64         -- 64 tuples are marked as LP_DELETE.
dead_tuple_percent | 0
free_space         | 208188
free_percent       | 9.21
----


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
Patch applied.  Thanks.

---------------------------------------------------------------------------


ITAGAKI Takahiro wrote:
> This is an extension of pgstattuple to query information from indexes.
> It supports btree, hash and gist. Gin is not supported.
> It scans only index pages and does not read corresponding heap tuples.
> Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.
>
> Also, I added an experimental feature for btree indexes. It checks
> fragmentation factor of indexes. If an leaf has the right link on the next
> adjacent page in the file, it is assumed to be continuous (not fragmented).
> It will help us to decide when to REINDEX.
>
> Suggestions welcome.
>
> ----
> $ pgbench -i
> $ pgbench -n -t 100 -c 10
> # select * from pgstattuple('accounts_pkey');
> NOTICE:  0.36% fragmented
> HINT:  continuous=273, forward=1, backward=0
> -[ RECORD 1 ]------+--------
> table_len          | 2260992
> tuple_count        | 100996     -- 996 tuples are dead practically,
> tuple_len          | 1615936       but no LP_DELETE yet.
> tuple_percent      | 71.47
> dead_tuple_count   | 4
> dead_tuple_len     | 64         -- 64 tuples are marked as LP_DELETE.
> dead_tuple_percent | 0
> free_space         | 208188
> free_percent       | 9.21
> ----
>
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Patch applied.  Thanks.

>> Also, I added an experimental feature for btree indexes. It checks
>> fragmentation factor of indexes. If an leaf has the right link on the next
>> adjacent page in the file, it is assumed to be continuous (not fragmented).
>> It will help us to decide when to REINDEX.

This was done in an entirely unacceptable way, to wit

    ereport(NOTICE,
        (errmsg("%.2f%% fragmented",
            100.0 * (stat.forward + stat.backward) /
            (stat.continuous + stat.forward + stat.backward)),
        errhint("continuous=%llu, forward=%llu, backward=%llu",
            stat.continuous, stat.forward, stat.backward)));

The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info.  The output
tuple format needs to be extended instead.

The lesser problem that drew my attention is that %llu is unportable
(and in fact draws gcc warnings for me; did you ignore that?).  But
using UINT64_FORMAT instead would create a headache for translation
because the string would vary across platforms.

I won't bother correcting the violations of message style guidelines,
because this code simply has to go away.

            regards, tom lane

Re: pgstattuple extension for indexes

From
ITAGAKI Takahiro
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> >> Also, I added an experimental feature for btree indexes. It checks
> >> fragmentation factor of indexes.

> The really serious problem with reporting this info via NOTICE is that
> there's no way for a program to get its hands on the info.  The output
> tuple format needs to be extended instead.

Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.


BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: pgstattuple extension for indexes

From
Alvaro Herrera
Date:
ITAGAKI Takahiro wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > >> Also, I added an experimental feature for btree indexes. It checks
> > >> fragmentation factor of indexes.
>
> > The really serious problem with reporting this info via NOTICE is that
> > there's no way for a program to get its hands on the info.  The output
> > tuple format needs to be extended instead.
>
> Ok, I added 'fragmented_percent' field to the output tuple. This
> information will help us to decide when to do REINDEX.
> However, it is only avaliable for btree index presently. Other indexes
> should have equivalent information, but I don't know how to determine it.

BTW while you're handling this, why not change the function to use OUT
parameters instead of having to CREATE TYPE to handle the return type?
I think it is easier to handle ...

One question I had was: in the percentage of fragmentation, is higher
better or lower better?  (I'm guessing lower is better, but this should
be mentioned in the docs)


> BTW, should we change VACUUM VERBOSE in the same way? If we do so,
> autovacuum can handle the reports of VACUUM VERBOSE and plan when to
> do VACUUM FULL, REINDEX and/or CLUSTER using the information.
> Is this worth doing?

You mean having VACUUM VERBOSE return a result set?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: pgstattuple extension for indexes

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> ITAGAKI Takahiro wrote:
>> BTW, should we change VACUUM VERBOSE in the same way? If we do so,
>> autovacuum can handle the reports of VACUUM VERBOSE and plan when to
>> do VACUUM FULL, REINDEX and/or CLUSTER using the information.
>> Is this worth doing?

> You mean having VACUUM VERBOSE return a result set?

To me, the point of VACUUM VERBOSE is mostly to give you some
reassurance that it's making progress.  If it were returning rows
instead of notice messages, you'd lose that functionality (at least
in libpq-based clients).  In any case, autovacuum has other ways
of getting the information without needing a change in user-visible
behavior.

            regards, tom lane

Re: pgstattuple extension for indexes

From
"Satoshi Nagayasu"
Date:
Hi,

I'm working on an utility for b-tree index, called `pgstatindex`.

It reports b-tree index statistics like a pgstattuple as below.
----------------------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey1');
-[ RECORD 1 ]------+---------
version            | 2
tree_level         | 2
index_size         | 17956864
root_block_no      | 361
internal_pages     | 8
leaf_pages         | 2184
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 90.07
leaf_fragmentation | 0

pgbench=#
----------------------------------------------------------------

I want to make this to contrib module like a pgstattuple,
and to make this open to public in a few days.

Do you think this is useful?

2006/7/24, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > >> Also, I added an experimental feature for btree indexes. It checks
> > >> fragmentation factor of indexes.
>
> > The really serious problem with reporting this info via NOTICE is that
> > there's no way for a program to get its hands on the info.  The output
> > tuple format needs to be extended instead.
>
> Ok, I added 'fragmented_percent' field to the output tuple. This
> information will help us to decide when to do REINDEX.
> However, it is only avaliable for btree index presently. Other indexes
> should have equivalent information, but I don't know how to determine it.
>
>
> BTW, should we change VACUUM VERBOSE in the same way? If we do so,
> autovacuum can handle the reports of VACUUM VERBOSE and plan when to
> do VACUUM FULL, REINDEX and/or CLUSTER using the information.
> Is this worth doing?
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>
>
>
>

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
Satoshi Nagayasu wrote:
> Hi,
>
> I'm working on an utility for b-tree index, called `pgstatindex`.
>
> It reports b-tree index statistics like a pgstattuple as below.
> ----------------------------------------------------------------
> pgbench=# \x
> Expanded display is on.
> pgbench=# SELECT * FROM pgstatindex('accounts_pkey1');
> -[ RECORD 1 ]------+---------
> version            | 2
> tree_level         | 2
> index_size         | 17956864
> root_block_no      | 361
> internal_pages     | 8
> leaf_pages         | 2184
> empty_pages        | 0
> deleted_pages      | 0
> avg_leaf_density   | 90.07
> leaf_fragmentation | 0
>
> pgbench=#
> ----------------------------------------------------------------
>
> I want to make this to contrib module like a pgstattuple,
> and to make this open to public in a few days.
>
> Do you think this is useful?

Yes, for performance debugging, I think.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Alvaro Herrera
Date:
Satoshi Nagayasu wrote:
> Hi,
>
> I'm working on an utility for b-tree index, called `pgstatindex`.

Does it make sense to merge the pgstatindex stuff with pgstattuple, and
have the fragmentation report into pgstatindex instead of pgstattuple
itself?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: pgstattuple extension for indexes

From
ITAGAKI Takahiro
Date:
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Satoshi Nagayasu wrote:
> > I'm working on an utility for b-tree index, called `pgstatindex`.
>
> Does it make sense to merge the pgstatindex stuff with pgstattuple, and
> have the fragmentation report into pgstatindex instead of pgstattuple
> itself?

It sounds good. We will have two separate commands:
 - pgstattuple: returns tuple-level information
 - pgstatindex: returns page-level information

We can use tuple-level info to check LP_DELETE flags on index tuples,
and use page-level info to check needs for REINDEX.


Do we add pgstatindex as a new contrib module,
or merge it into contrib/pgstattuple?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: pgstattuple extension for indexes

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> Do we add pgstatindex as a new contrib module,
> or merge it into contrib/pgstattuple?

I believe Alvaro was suggesting that you should add it as an additional
SQL function within contrib/pgstattuple.  That'd be my advice too ---
I don't see a reason to break this out as a separate contrib module.

            regards, tom lane

Re: pgstattuple extension for indexes

From
satoshi nagayasu
Date:
Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>

-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version            | 2
tree_level         | 1
index_size         | 3588096
root_block_no      | 3
internal_pages     | 0
leaf_pages         | 437
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------



Attachment

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
I thought these new functions were going to be merged into
/contrib/pgstattuple.

---------------------------------------------------------------------------

satoshi nagayasu wrote:
> Hi folks,
>
> As I said on -PATCHES, I've been working on an utility to get
> a b-tree index information. I'm happy to introduce
> my new functions to you.
>
> pgstattuple module provides a `pgstatindex()`, and other small
> functions, which allow you to get b-tree internal information.
> I believe this module will be helpful to know b-tree index deeply.
>
> So please try it, send comment to me, and have fun.
>
> Thanks,
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
>
> -----------------------------------------------------
> pgbench=# \x
> Expanded display is on.
> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> -[ RECORD 1 ]------+--------
> version            | 2
> tree_level         | 1
> index_size         | 3588096
> root_block_no      | 3
> internal_pages     | 0
> leaf_pages         | 437
> empty_pages        | 0
> deleted_pages      | 0
> avg_leaf_density   | 59.5
> leaf_fragmentation | 49.89
> -----------------------------------------------------
>
>

[ application/x-gzip is not supported, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] pgstattuple extension for indexes

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
>
> I thought these new functions were going to be merged into
> /contrib/pgstattuple.

Well, that's exactly what this patch seems to do ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: [HACKERS] pgstattuple extension for indexes

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > I thought these new functions were going to be merged into
> > /contrib/pgstattuple.
>
> Well, that's exactly what this patch seems to do ...

Well, looking at the tarball it puts everything in pgstatindex, and the
Makefile is:

    #-------------------------------------------------------------------------
    #
    # pgstatindex Makefile
    #
    # $PostgreSQL$
    #
    #-------------------------------------------------------------------------

    SRCS        = pgstatindex.c

    MODULE_big    = pgstatindex
    OBJS        = $(SRCS:.c=.o)
    DOCS        =
    DATA_built    = pgstatindex.sql

    ifdef USE_PGXS
    PGXS := $(shell pg_config --pgxs)
    include $(PGXS)
    else
    subdir = contrib/pgstatindex
    top_builddir = /home/snaga/pgsql/sources/postgresql-8.1.3
    include $(top_builddir)/src/Makefile.global
    include $(top_srcdir)/contrib/contrib-global.mk
    endif

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
nagayasu-san,

This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module.  Can
you make that adjustment?  Thanks.

---------------------------------------------------------------------------

satoshi nagayasu wrote:
> Hi folks,
>
> As I said on -PATCHES, I've been working on an utility to get
> a b-tree index information. I'm happy to introduce
> my new functions to you.
>
> pgstattuple module provides a `pgstatindex()`, and other small
> functions, which allow you to get b-tree internal information.
> I believe this module will be helpful to know b-tree index deeply.
>
> So please try it, send comment to me, and have fun.
>
> Thanks,
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
>
> -----------------------------------------------------
> pgbench=# \x
> Expanded display is on.
> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> -[ RECORD 1 ]------+--------
> version            | 2
> tree_level         | 1
> index_size         | 3588096
> root_block_no      | 3
> internal_pages     | 0
> leaf_pages         | 437
> empty_pages        | 0
> deleted_pages      | 0
> avg_leaf_density   | 59.5
> leaf_fragmentation | 49.89
> -----------------------------------------------------
>
>

[ application/x-gzip is not supported, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Bruce,

I'll fix it in this week. Please wait a few days.
Thanks.

Bruce Momjian wrote:
> nagayasu-san,
>
> This looks good, but we would like the code added to
> /contrib/pgstattuple, rather than it being its own /contrib module.  Can
> you make that adjustment?  Thanks.
>
> ---------------------------------------------------------------------------
>
> satoshi nagayasu wrote:
>> Hi folks,
>>
>> As I said on -PATCHES, I've been working on an utility to get
>> a b-tree index information. I'm happy to introduce
>> my new functions to you.
>>
>> pgstattuple module provides a `pgstatindex()`, and other small
>> functions, which allow you to get b-tree internal information.
>> I believe this module will be helpful to know b-tree index deeply.
>>
>> So please try it, send comment to me, and have fun.
>>
>> Thanks,
>> --
>> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
>>
>> -----------------------------------------------------
>> pgbench=# \x
>> Expanded display is on.
>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
>> -[ RECORD 1 ]------+--------
>> version            | 2
>> tree_level         | 1
>> index_size         | 3588096
>> root_block_no      | 3
>> internal_pages     | 0
>> leaf_pages         | 437
>> empty_pages        | 0
>> deleted_pages      | 0
>> avg_leaf_density   | 59.5
>> leaf_fragmentation | 49.89
>> -----------------------------------------------------
>>
>>
>
> [ application/x-gzip is not supported, skipping... ]
>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>>                http://archives.postgresql.org
>


--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Hi all,

Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.


Satoshi Nagayasu wrote:
> Bruce,
>
> I'll fix it in this week. Please wait a few days.
> Thanks.
>
> Bruce Momjian wrote:
>> nagayasu-san,
>>
>> This looks good, but we would like the code added to
>> /contrib/pgstattuple, rather than it being its own /contrib module.  Can
>> you make that adjustment?  Thanks.
>>
>> ---------------------------------------------------------------------------
>>
>> satoshi nagayasu wrote:
>>> Hi folks,
>>>
>>> As I said on -PATCHES, I've been working on an utility to get
>>> a b-tree index information. I'm happy to introduce
>>> my new functions to you.
>>>
>>> pgstattuple module provides a `pgstatindex()`, and other small
>>> functions, which allow you to get b-tree internal information.
>>> I believe this module will be helpful to know b-tree index deeply.
>>>
>>> So please try it, send comment to me, and have fun.
>>>
>>> Thanks,
>>> --
>>> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
>>>
>>> -----------------------------------------------------
>>> pgbench=# \x
>>> Expanded display is on.
>>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
>>> -[ RECORD 1 ]------+--------
>>> version            | 2
>>> tree_level         | 1
>>> index_size         | 3588096
>>> root_block_no      | 3
>>> internal_pages     | 0
>>> leaf_pages         | 437
>>> empty_pages        | 0
>>> deleted_pages      | 0
>>> avg_leaf_density   | 59.5
>>> leaf_fragmentation | 49.89
>>> -----------------------------------------------------
>>>
>>>
>> [ application/x-gzip is not supported, skipping... ]
>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 4: Have you searched our list archives?
>>>
>>>                http://archives.postgresql.org
>
>


--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
--- pgstattuple.orig/Makefile    2006-08-10 19:22:47.000000000 +0900
+++ pgstattuple/Makefile    2006-08-10 19:24:05.000000000 +0900
@@ -6,7 +6,7 @@
 #
 #-------------------------------------------------------------------------

-SRCS        = pgstattuple.c
+SRCS        = pgstattuple.c pgstatindex.c

 MODULE_big    = pgstattuple
 OBJS        = $(SRCS:.c=.o)
diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
--- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/pgstatindex.c    2006-08-11 17:51:26.000000000 +0900
@@ -0,0 +1,714 @@
+/*
+ * pgstatindex
+ *
+ * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose, without fee, and without a
+ * written agreement is hereby granted, provided that the above
+ * copyright notice and this paragraph and the following two
+ * paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
+ * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
+ * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
+ * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "access/heapam.h"
+#include "access/itup.h"
+#include "access/nbtree.h"
+#include "access/transam.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+
+PG_FUNCTION_INFO_V1(pgstatindex);
+PG_FUNCTION_INFO_V1(bt_metap);
+PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_stats);
+PG_FUNCTION_INFO_V1(pg_relpages);
+
+extern Datum pgstatindex(PG_FUNCTION_ARGS);
+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);
+extern Datum pg_relpages(PG_FUNCTION_ARGS);
+
+#define PGSTATINDEX_TYPE "public.pgstatindex_type"
+#define PGSTATINDEX_NCOLUMNS 10
+
+#define BTMETAP_TYPE "public.bt_metap_type"
+#define BTMETAP_NCOLUMNS 6
+
+#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
+#define BTPAGEITEMS_NCOLUMNS 6
+
+#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
+#define BTPAGESTATS_NCOLUMNS 12
+
+
+#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
+#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
+
+#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
+        if ( !(FirstOffsetNumber<=(offset) && \
+                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
+             elog(ERROR, "Page offset number out of range."); }
+
+#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
+        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
+             elog(ERROR, "Block number out of range."); }
+
+/* ------------------------------------------------
+ * structure for single btree page statistics
+ * ------------------------------------------------
+ */
+typedef struct BTPageStat {
+    uint32 blkno;
+    uint32 live_items;
+    uint32 dead_items;
+    uint32 page_size;
+    uint32 max_avail;
+    uint32 free_size;
+    uint32 avg_item_size;
+    uint32 fragments;
+    bool is_root;
+    bool is_internal;
+    bool is_leaf;
+    bool is_deleted;
+    bool is_empty;
+} BTPageStat;
+
+/* ------------------------------------------------
+ * A structure for a whole btree index statistics
+ * used by pgstatindex().
+ * ------------------------------------------------
+ */
+typedef struct BTIndexStat {
+    uint32 version;
+    BlockNumber root_blkno;
+    uint32 level;
+
+    uint32 live_items;
+    uint32 dead_items;
+
+    uint32 root_pages;
+    uint32 internal_pages;
+    uint32 leaf_pages;
+    uint32 empty_pages;
+    uint32 deleted_pages;
+
+    uint32 page_size;
+    uint32 avg_item_size;
+
+    uint32 max_avail;
+    uint32 free_space;
+
+    uint32 fragments;
+} BTIndexStat;
+
+/* -------------------------------------------------
+ * GetBTPageStatistics()
+ *
+ * Collect statistics of single b-tree leaf page
+ * -------------------------------------------------
+ */
+static bool
+GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
+{
+    Page page           = BufferGetPage(buffer);
+    PageHeader phdr     = (PageHeader) page;
+    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+    int item_size = 0;
+    int off;
+
+    stat->blkno = blkno;
+
+    stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData );
+
+    stat->dead_items = stat->live_items = 0;
+
+    stat->page_size = PageGetPageSize(page);
+
+    /* page type */
+    stat->is_root    = false;
+    stat->is_leaf    = false;
+    stat->is_deleted = false;
+    stat->is_empty   = false;
+
+    if ( P_ISDELETED(opaque) )
+    {
+        stat->is_deleted = true;
+        return true;
+    }
+    else if ( P_IGNORE(opaque) )
+        stat->is_empty = true;
+    else if ( P_ISLEAF(opaque) )
+        stat->is_leaf = true;
+    else if ( P_ISROOT(opaque) )
+        stat->is_root = true;
+    else
+        stat->is_internal = true;
+
+    /*----------------------------------------------
+     * If a next leaf is on the previous block,
+     * it means a fragmentation.
+     *----------------------------------------------
+     */
+    stat->fragments = 0;
+    if ( stat->is_leaf )
+    {
+        if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno )
+            stat->fragments++;
+    }
+
+    /* count live and dead tuples, and free space */
+    for (off=FirstOffsetNumber ; off<=maxoff ; off++)
+    {
+        BTItem btitem;
+        IndexTuple itup;
+
+        ItemId id = PageGetItemId(page, off);
+
+        btitem = (BTItem)PageGetItem(page, id);
+
+        itup = &(btitem->bti_itup);
+
+        item_size += IndexTupleSize(itup);
+
+        if ( !ItemIdDeleted(id) )
+            stat->live_items++;
+        else
+            stat->dead_items++;
+    }
+    stat->free_size = PageGetFreeSpace(page);
+
+#ifdef NOT_USED
+    elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail,
+         (float)stat->free_size/(float)stat->max_avail*100.0);
+#endif
+
+    if ( (stat->live_items + stat->dead_items) > 0 )
+        stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items );
+    else
+        stat->avg_item_size = 0;
+
+    return true;
+}
+
+
+/* ------------------------------------------------------
+ * pgstatindex()
+ *
+ * Usage: SELECT * FROM pgstatindex('t1_pkey');
+ * ------------------------------------------------------
+ */
+Datum
+pgstatindex(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    Relation    rel;
+    RangeVar    *relrv;
+    Datum        result;
+    uint32        nblocks;
+    uint32        blkno;
+    BTIndexStat indexStat;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
+
+    /*-------------------
+     * Read a metapage
+     *-------------------
+     */
+    {
+        Buffer buffer = ReadBuffer(rel, 0);
+        Page page = BufferGetPage(buffer);
+        BTMetaPageData *metad = BTPageGetMeta(page);
+
+//        snprintf(values[j++], 32, "%d", metad->btm_magic);
+        indexStat.version    = metad->btm_version;
+        indexStat.root_blkno = metad->btm_root;
+        indexStat.level      = metad->btm_level;
+//        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+//        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+        ReleaseBuffer(buffer);
+    }
+
+    nblocks = RelationGetNumberOfBlocks(rel);
+
+    /* -- init stat -- */
+    indexStat.fragments      = 0;
+
+    indexStat.root_pages     = 0;
+    indexStat.leaf_pages     = 0;
+    indexStat.internal_pages = 0;
+    indexStat.empty_pages    = 0;
+    indexStat.deleted_pages  = 0;
+
+    indexStat.max_avail  = 0;
+    indexStat.free_space = 0;
+
+    /*-----------------------
+     * Scan all blocks
+     *-----------------------
+     */
+    for (blkno=1 ; blkno<nblocks ; blkno++)
+    {
+        Buffer buffer = ReadBuffer(rel, blkno);
+        BTPageStat stat;
+
+        /* scan one page */
+        stat.blkno = blkno;
+        GetBTPageStatistics(blkno, buffer, &stat);
+
+        /*---------------------
+         * page status (type)
+         *---------------------
+         */
+        if ( stat.is_deleted )
+            indexStat.deleted_pages++;
+        else if ( stat.is_leaf )
+            indexStat.leaf_pages++;
+        else if ( stat.is_internal )
+            indexStat.internal_pages++;
+        else if ( stat.is_empty )
+            indexStat.empty_pages++;
+        else if ( stat.is_root )
+            indexStat.root_pages++;
+        else
+            elog(ERROR, "unknown page status.");
+
+        /* -- leaf fragmentation -- */
+        indexStat.fragments += stat.fragments;
+
+        if ( stat.is_leaf )
+        {
+            indexStat.max_avail  += stat.max_avail;
+            indexStat.free_space += stat.free_size;
+        }
+
+        ReleaseBuffer(buffer);
+    }
+
+    relation_close(rel, AccessShareLock);
+
+#ifdef NOT_USED
+    elog(NOTICE, "[index information]");
+    elog(NOTICE, "version.................: %d", indexStat.version);
+    elog(NOTICE, "tree level..............: %d", indexStat.level);
+    elog(NOTICE, "index size..............: %d", (indexStat.root_pages +
+                                    indexStat.leaf_pages +
+                                    indexStat.internal_pages +
+                                    indexStat.deleted_pages +
+                                    indexStat.empty_pages) * BLCKSZ);
+
+    elog(NOTICE, "");
+    elog(NOTICE, "[page statistics]");
+    elog(NOTICE, "root block number.......: %d", indexStat.root_blkno);
+    elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages);
+    elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages);
+    elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages);
+    elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages);
+
+    elog(NOTICE, "");
+    elog(NOTICE, "[leaf statistics]");
+    elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail *
100.0); 
+    elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages *
100.0); 
+#endif
+
+    /*----------------------------
+     * Build a result tuple
+     *----------------------------
+     */
+    {
+        TupleDesc tupleDesc;
+        int j;
+        char *values[PGSTATINDEX_NCOLUMNS];
+
+        HeapTupleData tupleData;
+        HeapTuple tuple = &tupleData;
+
+        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
+
+        j = 0;
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.version);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.level);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
+                                         indexStat.leaf_pages +
+                                         indexStat.internal_pages +
+                                         indexStat.deleted_pages +
+                                         indexStat.empty_pages) * BLCKSZ );
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 );
+
+        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+                                       values);
+
+        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+    }
+
+    PG_RETURN_DATUM( result );
+}
+
+/* -----------------------------------------------
+ * bt_page()
+ *
+ * Usage: SELECT * FROM bt_page('t1_pkey', 0);
+ * -----------------------------------------------
+ */
+Datum
+bt_page_stats(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    uint32       blkno   = PG_GETARG_UINT32(1);
+    Buffer       buffer;
+
+    Relation    rel;
+    RangeVar    *relrv;
+    Datum        result;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
+
+    buffer = ReadBuffer(rel, blkno);
+
+    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
+
+    if ( blkno==0 )
+        elog(ERROR, "Block 0 is a meta page.");
+
+    {
+        HeapTuple tuple;
+        TupleDesc tupleDesc;
+        int j;
+        char *values[BTPAGESTATS_NCOLUMNS];
+
+        BTPageStat stat;
+
+        GetBTPageStatistics(blkno, buffer, &stat);
+
+        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
+
+        j = 0;
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.blkno);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.live_items);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.dead_items);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.avg_item_size);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.page_size);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.free_size);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.is_deleted);
+        values[j] = palloc(32);
+//        snprintf(values[j++], 32, "%d", opaque->btpo_prev);
+        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+        values[j] = palloc(32);
+//        snprintf(values[j++], 32, "%d", opaque->btpo_next);
+        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+        values[j] = palloc(32);
+//        snprintf(values[j++], 32, "%d", opaque->btpo.level);
+        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+        values[j] = palloc(32);
+//        snprintf(values[j++], 32, "%d", opaque->btpo_flags);
+        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+
+        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+                                       values);
+
+        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+    }
+
+    ReleaseBuffer(buffer);
+
+    relation_close(rel, AccessShareLock);
+
+    PG_RETURN_DATUM(result);
+}
+
+/*-------------------------------------------------------
+ * bt_page_items()
+ *
+ * Get IndexTupleData set in a leaf page
+ *
+ * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
+ *-------------------------------------------------------
+ */
+/* ---------------------------------------------------
+ * data structure for SRF to hold a scan information
+ * ---------------------------------------------------
+ */
+struct user_args
+{
+    TupleDesc tupd;
+    Relation rel;
+    Buffer buffer;
+    Page page;
+    uint16 offset;
+};
+
+Datum
+bt_page_items(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    uint32       blkno   = PG_GETARG_UINT32(1);
+
+    RangeVar    *relrv;
+    Datum result;
+    char *values[BTPAGEITEMS_NCOLUMNS];
+    BTPageOpaque opaque;
+    HeapTuple tuple;
+    ItemId id;
+
+    FuncCallContext *fctx;
+    MemoryContext mctx;
+    struct user_args *uargs = NULL;
+
+    if ( blkno==0 )
+        elog(ERROR, "Block 0 is a meta page.");
+
+    if ( SRF_IS_FIRSTCALL() )
+    {
+        fctx = SRF_FIRSTCALL_INIT();
+        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+        uargs = palloc(sizeof(struct user_args));
+
+        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
+        uargs->offset = FirstOffsetNumber;
+
+        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+        uargs->rel = relation_openrv(relrv, AccessShareLock);
+
+        CHECK_RELATION_BLOCK_RANGE(uargs->rel,  blkno);
+
+        uargs->buffer = ReadBuffer(uargs->rel, blkno);
+
+        if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) )
+            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
+
+        uargs->page = BufferGetPage(uargs->buffer);
+
+        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+        if ( P_ISDELETED(opaque) )
+            elog(NOTICE, "bt_page_items(): this page is deleted.");
+
+        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+        fctx->user_fctx = uargs;
+
+        MemoryContextSwitchTo(mctx);
+    }
+
+    fctx = SRF_PERCALL_SETUP();
+    uargs = fctx->user_fctx;
+
+    if ( fctx->call_cntr < fctx->max_calls )
+    {
+        BTItem btitem;
+        IndexTuple itup;
+
+        id = PageGetItemId(uargs->page, uargs->offset);
+
+        if ( !ItemIdIsValid(id) )
+            elog(ERROR, "Invalid ItemId.");
+
+        btitem = (BTItem)PageGetItem(uargs->page, id);
+        itup = &(btitem->bti_itup);
+
+        {
+            int j = 0;
+
+            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
+
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%d", uargs->offset);
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+            {
+                int off;
+                char *dump;
+                char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info);
+
+                dump = palloc(IndexTupleSize(itup)*3);
+                memset(dump, 0, IndexTupleSize(itup)*3);
+
+                for (off=0 ;
+                     off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ;
+                     off++)
+                {
+                    if ( dump[0]=='\0' )
+                        sprintf(dump, "%02x", *(ptr+off) & 0xff);
+                    else
+                    {
+                        char buf[4];
+                        sprintf(buf, " %02x", *(ptr+off) & 0xff);
+                        strcat(dump, buf);
+                    }
+                }
+                values[j] = dump;
+            }
+
+            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
+            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
+        }
+
+        uargs->offset = uargs->offset + 1;
+
+        SRF_RETURN_NEXT(fctx, result);
+    }
+    else
+    {
+        ReleaseBuffer(uargs->buffer);
+        relation_close(uargs->rel, AccessShareLock);
+
+        SRF_RETURN_DONE(fctx);
+    }
+}
+
+
+/* ------------------------------------------------
+ * bt_metap()
+ *
+ * Get a btree meta-page information
+ *
+ * Usage: SELECT * FROM bt_metap('t1_pkey')
+ * ------------------------------------------------
+ */
+Datum
+bt_metap(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    Buffer       buffer;
+
+    Relation    rel;
+    RangeVar    *relrv;
+    Datum        result;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+        elog(ERROR, "bt_metap() can be used only on b-tree index.");
+
+    buffer = ReadBuffer(rel, 0);
+
+    {
+        BTMetaPageData *metad;
+
+        TupleDesc tupleDesc;
+        int j;
+        char *values[BTMETAP_NCOLUMNS];
+        HeapTuple tuple;
+
+        Page page = BufferGetPage(buffer);
+
+        metad = BTPageGetMeta(page);
+
+        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
+
+        j = 0;
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_magic);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_version);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_root);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_level);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+                                       values);
+
+        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+    }
+
+    ReleaseBuffer(buffer);
+
+    relation_close(rel, AccessShareLock);
+
+    PG_RETURN_DATUM(result);
+}
+
+/* --------------------------------------------------------
+ * pg_relpages()
+ *
+ * Get a number of pages of the table/index.
+ *
+ * Usage: SELECT pg_relpages('t1');
+ *        SELECT pg_relpages('t1_pkey');
+ * --------------------------------------------------------
+ */
+Datum
+pg_relpages(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+
+    Relation    rel;
+    RangeVar    *relrv;
+    int4  relpages;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    relpages = RelationGetNumberOfBlocks(rel);
+
+    relation_close(rel, AccessShareLock);
+
+    PG_RETURN_INT32(relpages);
+}
+
diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
--- pgstattuple.orig/pgstattuple.sql.in    2006-08-10 19:22:47.000000000 +0900
+++ pgstattuple/pgstattuple.sql.in    2006-08-11 17:46:19.000000000 +0900
@@ -22,3 +22,97 @@
 RETURNS pgstattuple_type
 AS 'MODULE_PATHNAME', 'pgstattuplebyid'
 LANGUAGE 'C' STRICT;
+
+--
+-- pgstatindex
+--
+DROP TYPE pgstatindex_type CASCADE;
+CREATE TYPE pgstatindex_type AS (
+  version int4,
+  tree_level int4,
+  index_size int4,
+  root_block_no int4,
+  internal_pages int4,
+  leaf_pages int4,
+  empty_pages int4,
+  deleted_pages int4,
+  avg_leaf_density float8,
+  leaf_fragmentation float8
+);
+
+CREATE OR REPLACE FUNCTION pgstatindex(text)
+RETURNS pgstatindex_type
+AS 'MODULE_PATHNAME', 'pgstatindex'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_metap()
+--
+DROP TYPE bt_metap_type CASCADE;
+CREATE TYPE bt_metap_type AS (
+  magic int4,
+  version int4,
+  root int4,
+  level int4,
+  fastroot int4,
+  fastlevel int4
+);
+
+CREATE OR REPLACE FUNCTION bt_metap(text)
+RETURNS bt_metap_type
+AS 'MODULE_PATHNAME', 'bt_metap'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_items()
+--
+DROP TYPE bt_page_items_type CASCADE;
+CREATE TYPE bt_page_items_type AS (
+  itemoffset int4,
+  ctid tid,
+  itemlen int4,
+  nulls bool,
+  vars bool,
+  data text
+);
+
+DROP FUNCTION bt_page_items(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
+RETURNS SETOF bt_page_items_type
+AS 'MODULE_PATHNAME', 'bt_page_items'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_stats()
+--
+DROP TYPE bt_page_stats_type CASCADE;
+CREATE TYPE bt_page_stats_type AS (
+  blkno int4,
+  live_items int4,
+  dead_items int4,
+  total_items int4,
+  avg_item_size float,
+  page_size int4,
+  free_size int4,
+  is_deleted int4,
+  btpo_prev int4,
+  btpo_next int4,
+  btpo_level int4,
+  btpo_flags int4
+);
+
+DROP FUNCTION bt_page_stats(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
+RETURNS bt_page_stats_type
+AS 'MODULE_PATHNAME', 'bt_page_stats'
+LANGUAGE 'C' STRICT;
+
+--
+-- pg_relpages()
+--
+CREATE OR REPLACE FUNCTION pg_relpages(text)
+RETURNS int
+AS 'MODULE_PATHNAME', 'pg_relpages'
+LANGUAGE 'C' STRICT;
diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
--- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/test.sh    2006-08-11 17:47:05.000000000 +0900
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+export PATH=/usr/local/pgsql814/bin:$PATH
+
+psql pgbench<<EOF
+DROP FUNCTION pgstatindex(text);
+EOF
+
+psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench
+
+psql pgbench<<EOF
+SELECT * FROM pg_relpages('accounts_pkey');
+\x
+SELECT * FROM pgstatindex('accounts_pkey');
+SELECT * FROM bt_metap('accounts_pkey');
+\x
+SELECT * FROM bt_page_items('accounts_pkey', 0);
+SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
+
+\x
+SELECT * FROM bt_page_stats('accounts_pkey', 1);
+EOF

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
I don't see any documentation, so I assume you want me to add something
to README.pgstattuple.

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:
> Hi all,
>
> Here is a patch to add pgstatindex functions to the pgstattuple module,
> which can work with 8.1.4. Please review and try it. Thanks.
>
>
> Satoshi Nagayasu wrote:
> > Bruce,
> >
> > I'll fix it in this week. Please wait a few days.
> > Thanks.
> >
> > Bruce Momjian wrote:
> >> nagayasu-san,
> >>
> >> This looks good, but we would like the code added to
> >> /contrib/pgstattuple, rather than it being its own /contrib module.  Can
> >> you make that adjustment?  Thanks.
> >>
> >> ---------------------------------------------------------------------------
> >>
> >> satoshi nagayasu wrote:
> >>> Hi folks,
> >>>
> >>> As I said on -PATCHES, I've been working on an utility to get
> >>> a b-tree index information. I'm happy to introduce
> >>> my new functions to you.
> >>>
> >>> pgstattuple module provides a `pgstatindex()`, and other small
> >>> functions, which allow you to get b-tree internal information.
> >>> I believe this module will be helpful to know b-tree index deeply.
> >>>
> >>> So please try it, send comment to me, and have fun.
> >>>
> >>> Thanks,
> >>> --
> >>> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> >>>
> >>> -----------------------------------------------------
> >>> pgbench=# \x
> >>> Expanded display is on.
> >>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> >>> -[ RECORD 1 ]------+--------
> >>> version            | 2
> >>> tree_level         | 1
> >>> index_size         | 3588096
> >>> root_block_no      | 3
> >>> internal_pages     | 0
> >>> leaf_pages         | 437
> >>> empty_pages        | 0
> >>> deleted_pages      | 0
> >>> avg_leaf_density   | 59.5
> >>> leaf_fragmentation | 49.89
> >>> -----------------------------------------------------
> >>>
> >>>
> >> [ application/x-gzip is not supported, skipping... ]
> >>
> >>> ---------------------------(end of broadcast)---------------------------
> >>> TIP 4: Have you searched our list archives?
> >>>
> >>>                http://archives.postgresql.org
> >
> >
>
>
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> Phone: +81-3-3523-8122

> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> --- pgstattuple.orig/Makefile    2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/Makefile    2006-08-10 19:24:05.000000000 +0900
> @@ -6,7 +6,7 @@
>  #
>  #-------------------------------------------------------------------------
>
> -SRCS        = pgstattuple.c
> +SRCS        = pgstattuple.c pgstatindex.c
>
>  MODULE_big    = pgstattuple
>  OBJS        = $(SRCS:.c=.o)
> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> --- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/pgstatindex.c    2006-08-11 17:51:26.000000000 +0900
> @@ -0,0 +1,714 @@
> +/*
> + * pgstatindex
> + *
> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
> + *
> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */
> +
> +#include "postgres.h"
> +
> +#include "fmgr.h"
> +#include "funcapi.h"
> +#include "access/heapam.h"
> +#include "access/itup.h"
> +#include "access/nbtree.h"
> +#include "access/transam.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_type.h"
> +#include "utils/builtins.h"
> +#include "utils/inval.h"
> +
> +PG_FUNCTION_INFO_V1(pgstatindex);
> +PG_FUNCTION_INFO_V1(bt_metap);
> +PG_FUNCTION_INFO_V1(bt_page_items);
> +PG_FUNCTION_INFO_V1(bt_page_stats);
> +PG_FUNCTION_INFO_V1(pg_relpages);
> +
> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> +
> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> +#define PGSTATINDEX_NCOLUMNS 10
> +
> +#define BTMETAP_TYPE "public.bt_metap_type"
> +#define BTMETAP_NCOLUMNS 6
> +
> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> +#define BTPAGEITEMS_NCOLUMNS 6
> +
> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> +#define BTPAGESTATS_NCOLUMNS 12
> +
> +
> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> +
> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> +        if ( !(FirstOffsetNumber<=(offset) && \
> +                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
> +             elog(ERROR, "Page offset number out of range."); }
> +
> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> +        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> +             elog(ERROR, "Block number out of range."); }
> +
> +/* ------------------------------------------------
> + * structure for single btree page statistics
> + * ------------------------------------------------
> + */
> +typedef struct BTPageStat {
> +    uint32 blkno;
> +    uint32 live_items;
> +    uint32 dead_items;
> +    uint32 page_size;
> +    uint32 max_avail;
> +    uint32 free_size;
> +    uint32 avg_item_size;
> +    uint32 fragments;
> +    bool is_root;
> +    bool is_internal;
> +    bool is_leaf;
> +    bool is_deleted;
> +    bool is_empty;
> +} BTPageStat;
> +
> +/* ------------------------------------------------
> + * A structure for a whole btree index statistics
> + * used by pgstatindex().
> + * ------------------------------------------------
> + */
> +typedef struct BTIndexStat {
> +    uint32 version;
> +    BlockNumber root_blkno;
> +    uint32 level;
> +
> +    uint32 live_items;
> +    uint32 dead_items;
> +
> +    uint32 root_pages;
> +    uint32 internal_pages;
> +    uint32 leaf_pages;
> +    uint32 empty_pages;
> +    uint32 deleted_pages;
> +
> +    uint32 page_size;
> +    uint32 avg_item_size;
> +
> +    uint32 max_avail;
> +    uint32 free_space;
> +
> +    uint32 fragments;
> +} BTIndexStat;
> +
> +/* -------------------------------------------------
> + * GetBTPageStatistics()
> + *
> + * Collect statistics of single b-tree leaf page
> + * -------------------------------------------------
> + */
> +static bool
> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
> +{
> +    Page page           = BufferGetPage(buffer);
> +    PageHeader phdr     = (PageHeader) page;
> +    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> +    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +    int item_size = 0;
> +    int off;
> +
> +    stat->blkno = blkno;
> +
> +    stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData );
> +
> +    stat->dead_items = stat->live_items = 0;
> +
> +    stat->page_size = PageGetPageSize(page);
> +
> +    /* page type */
> +    stat->is_root    = false;
> +    stat->is_leaf    = false;
> +    stat->is_deleted = false;
> +    stat->is_empty   = false;
> +
> +    if ( P_ISDELETED(opaque) )
> +    {
> +        stat->is_deleted = true;
> +        return true;
> +    }
> +    else if ( P_IGNORE(opaque) )
> +        stat->is_empty = true;
> +    else if ( P_ISLEAF(opaque) )
> +        stat->is_leaf = true;
> +    else if ( P_ISROOT(opaque) )
> +        stat->is_root = true;
> +    else
> +        stat->is_internal = true;
> +
> +    /*----------------------------------------------
> +     * If a next leaf is on the previous block,
> +     * it means a fragmentation.
> +     *----------------------------------------------
> +     */
> +    stat->fragments = 0;
> +    if ( stat->is_leaf )
> +    {
> +        if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno )
> +            stat->fragments++;
> +    }
> +
> +    /* count live and dead tuples, and free space */
> +    for (off=FirstOffsetNumber ; off<=maxoff ; off++)
> +    {
> +        BTItem btitem;
> +        IndexTuple itup;
> +
> +        ItemId id = PageGetItemId(page, off);
> +
> +        btitem = (BTItem)PageGetItem(page, id);
> +
> +        itup = &(btitem->bti_itup);
> +
> +        item_size += IndexTupleSize(itup);
> +
> +        if ( !ItemIdDeleted(id) )
> +            stat->live_items++;
> +        else
> +            stat->dead_items++;
> +    }
> +    stat->free_size = PageGetFreeSpace(page);
> +
> +#ifdef NOT_USED
> +    elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail,
> +         (float)stat->free_size/(float)stat->max_avail*100.0);
> +#endif
> +
> +    if ( (stat->live_items + stat->dead_items) > 0 )
> +        stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items );
> +    else
> +        stat->avg_item_size = 0;
> +
> +    return true;
> +}
> +
> +
> +/* ------------------------------------------------------
> + * pgstatindex()
> + *
> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> + * ------------------------------------------------------
> + */
> +Datum
> +pgstatindex(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    Datum        result;
> +    uint32        nblocks;
> +    uint32        blkno;
> +    BTIndexStat indexStat;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> +        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> +
> +    /*-------------------
> +     * Read a metapage
> +     *-------------------
> +     */
> +    {
> +        Buffer buffer = ReadBuffer(rel, 0);
> +        Page page = BufferGetPage(buffer);
> +        BTMetaPageData *metad = BTPageGetMeta(page);
> +
> +//        snprintf(values[j++], 32, "%d", metad->btm_magic);
> +        indexStat.version    = metad->btm_version;
> +        indexStat.root_blkno = metad->btm_root;
> +        indexStat.level      = metad->btm_level;
> +//        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +//        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    nblocks = RelationGetNumberOfBlocks(rel);
> +
> +    /* -- init stat -- */
> +    indexStat.fragments      = 0;
> +
> +    indexStat.root_pages     = 0;
> +    indexStat.leaf_pages     = 0;
> +    indexStat.internal_pages = 0;
> +    indexStat.empty_pages    = 0;
> +    indexStat.deleted_pages  = 0;
> +
> +    indexStat.max_avail  = 0;
> +    indexStat.free_space = 0;
> +
> +    /*-----------------------
> +     * Scan all blocks
> +     *-----------------------
> +     */
> +    for (blkno=1 ; blkno<nblocks ; blkno++)
> +    {
> +        Buffer buffer = ReadBuffer(rel, blkno);
> +        BTPageStat stat;
> +
> +        /* scan one page */
> +        stat.blkno = blkno;
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        /*---------------------
> +         * page status (type)
> +         *---------------------
> +         */
> +        if ( stat.is_deleted )
> +            indexStat.deleted_pages++;
> +        else if ( stat.is_leaf )
> +            indexStat.leaf_pages++;
> +        else if ( stat.is_internal )
> +            indexStat.internal_pages++;
> +        else if ( stat.is_empty )
> +            indexStat.empty_pages++;
> +        else if ( stat.is_root )
> +            indexStat.root_pages++;
> +        else
> +            elog(ERROR, "unknown page status.");
> +
> +        /* -- leaf fragmentation -- */
> +        indexStat.fragments += stat.fragments;
> +
> +        if ( stat.is_leaf )
> +        {
> +            indexStat.max_avail  += stat.max_avail;
> +            indexStat.free_space += stat.free_size;
> +        }
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    relation_close(rel, AccessShareLock);
> +
> +#ifdef NOT_USED
> +    elog(NOTICE, "[index information]");
> +    elog(NOTICE, "version.................: %d", indexStat.version);
> +    elog(NOTICE, "tree level..............: %d", indexStat.level);
> +    elog(NOTICE, "index size..............: %d", (indexStat.root_pages +
> +                                    indexStat.leaf_pages +
> +                                    indexStat.internal_pages +
> +                                    indexStat.deleted_pages +
> +                                    indexStat.empty_pages) * BLCKSZ);
> +
> +    elog(NOTICE, "");
> +    elog(NOTICE, "[page statistics]");
> +    elog(NOTICE, "root block number.......: %d", indexStat.root_blkno);
> +    elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages);
> +    elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages);
> +    elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages);
> +    elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages);
> +
> +    elog(NOTICE, "");
> +    elog(NOTICE, "[leaf statistics]");
> +    elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail
*100.0 ); 
> +    elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages
*100.0 ); 
> +#endif
> +
> +    /*----------------------------
> +     * Build a result tuple
> +     *----------------------------
> +     */
> +    {
> +        TupleDesc tupleDesc;
> +        int j;
> +        char *values[PGSTATINDEX_NCOLUMNS];
> +
> +        HeapTupleData tupleData;
> +        HeapTuple tuple = &tupleData;
> +
> +        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> +                                         indexStat.leaf_pages +
> +                                         indexStat.internal_pages +
> +                                         indexStat.deleted_pages +
> +                                         indexStat.empty_pages) * BLCKSZ );
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 );
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    PG_RETURN_DATUM( result );
> +}
> +
> +/* -----------------------------------------------
> + * bt_page()
> + *
> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> + * -----------------------------------------------
> + */
> +Datum
> +bt_page_stats(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32       blkno   = PG_GETARG_UINT32(1);
> +    Buffer       buffer;
> +
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> +
> +    buffer = ReadBuffer(rel, blkno);
> +
> +    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> +        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> +
> +    if ( blkno==0 )
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    {
> +        HeapTuple tuple;
> +        TupleDesc tupleDesc;
> +        int j;
> +        char *values[BTPAGESTATS_NCOLUMNS];
> +
> +        BTPageStat stat;
> +
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.blkno);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.live_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.dead_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.avg_item_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.page_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.free_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.is_deleted);
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo_prev);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo_next);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo.level);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo_flags);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/*-------------------------------------------------------
> + * bt_page_items()
> + *
> + * Get IndexTupleData set in a leaf page
> + *
> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> + *-------------------------------------------------------
> + */
> +/* ---------------------------------------------------
> + * data structure for SRF to hold a scan information
> + * ---------------------------------------------------
> + */
> +struct user_args
> +{
> +    TupleDesc tupd;
> +    Relation rel;
> +    Buffer buffer;
> +    Page page;
> +    uint16 offset;
> +};
> +
> +Datum
> +bt_page_items(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32       blkno   = PG_GETARG_UINT32(1);
> +
> +    RangeVar    *relrv;
> +    Datum result;
> +    char *values[BTPAGEITEMS_NCOLUMNS];
> +    BTPageOpaque opaque;
> +    HeapTuple tuple;
> +    ItemId id;
> +
> +    FuncCallContext *fctx;
> +    MemoryContext mctx;
> +    struct user_args *uargs = NULL;
> +
> +    if ( blkno==0 )
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    if ( SRF_IS_FIRSTCALL() )
> +    {
> +        fctx = SRF_FIRSTCALL_INIT();
> +        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> +
> +        uargs = palloc(sizeof(struct user_args));
> +
> +        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> +        uargs->offset = FirstOffsetNumber;
> +
> +        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +        uargs->rel = relation_openrv(relrv, AccessShareLock);
> +
> +        CHECK_RELATION_BLOCK_RANGE(uargs->rel,  blkno);
> +
> +        uargs->buffer = ReadBuffer(uargs->rel, blkno);
> +
> +        if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) )
> +            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> +
> +        uargs->page = BufferGetPage(uargs->buffer);
> +
> +        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> +
> +        if ( P_ISDELETED(opaque) )
> +            elog(NOTICE, "bt_page_items(): this page is deleted.");
> +
> +        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> +        fctx->user_fctx = uargs;
> +
> +        MemoryContextSwitchTo(mctx);
> +    }
> +
> +    fctx = SRF_PERCALL_SETUP();
> +    uargs = fctx->user_fctx;
> +
> +    if ( fctx->call_cntr < fctx->max_calls )
> +    {
> +        BTItem btitem;
> +        IndexTuple itup;
> +
> +        id = PageGetItemId(uargs->page, uargs->offset);
> +
> +        if ( !ItemIdIsValid(id) )
> +            elog(ERROR, "Invalid ItemId.");
> +
> +        btitem = (BTItem)PageGetItem(uargs->page, id);
> +        itup = &(btitem->bti_itup);
> +
> +        {
> +            int j = 0;
> +
> +            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> +
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", uargs->offset);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> +
> +            {
> +                int off;
> +                char *dump;
> +                char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info);
> +
> +                dump = palloc(IndexTupleSize(itup)*3);
> +                memset(dump, 0, IndexTupleSize(itup)*3);
> +
> +                for (off=0 ;
> +                     off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ;
> +                     off++)
> +                {
> +                    if ( dump[0]=='\0' )
> +                        sprintf(dump, "%02x", *(ptr+off) & 0xff);
> +                    else
> +                    {
> +                        char buf[4];
> +                        sprintf(buf, " %02x", *(ptr+off) & 0xff);
> +                        strcat(dump, buf);
> +                    }
> +                }
> +                values[j] = dump;
> +            }
> +
> +            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> +            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> +        }
> +
> +        uargs->offset = uargs->offset + 1;
> +
> +        SRF_RETURN_NEXT(fctx, result);
> +    }
> +    else
> +    {
> +        ReleaseBuffer(uargs->buffer);
> +        relation_close(uargs->rel, AccessShareLock);
> +
> +        SRF_RETURN_DONE(fctx);
> +    }
> +}
> +
> +
> +/* ------------------------------------------------
> + * bt_metap()
> + *
> + * Get a btree meta-page information
> + *
> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> + * ------------------------------------------------
> + */
> +Datum
> +bt_metap(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Buffer       buffer;
> +
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> +        elog(ERROR, "bt_metap() can be used only on b-tree index.");
> +
> +    buffer = ReadBuffer(rel, 0);
> +
> +    {
> +        BTMetaPageData *metad;
> +
> +        TupleDesc tupleDesc;
> +        int j;
> +        char *values[BTMETAP_NCOLUMNS];
> +        HeapTuple tuple;
> +
> +        Page page = BufferGetPage(buffer);
> +
> +        metad = BTPageGetMeta(page);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_magic);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_root);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/* --------------------------------------------------------
> + * pg_relpages()
> + *
> + * Get a number of pages of the table/index.
> + *
> + * Usage: SELECT pg_relpages('t1');
> + *        SELECT pg_relpages('t1_pkey');
> + * --------------------------------------------------------
> + */
> +Datum
> +pg_relpages(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    int4  relpages;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    relpages = RelationGetNumberOfBlocks(rel);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_INT32(relpages);
> +}
> +
> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> --- pgstattuple.orig/pgstattuple.sql.in    2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/pgstattuple.sql.in    2006-08-11 17:46:19.000000000 +0900
> @@ -22,3 +22,97 @@
>  RETURNS pgstattuple_type
>  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>  LANGUAGE 'C' STRICT;
> +
> +--
> +-- pgstatindex
> +--
> +DROP TYPE pgstatindex_type CASCADE;
> +CREATE TYPE pgstatindex_type AS (
> +  version int4,
> +  tree_level int4,
> +  index_size int4,
> +  root_block_no int4,
> +  internal_pages int4,
> +  leaf_pages int4,
> +  empty_pages int4,
> +  deleted_pages int4,
> +  avg_leaf_density float8,
> +  leaf_fragmentation float8
> +);
> +
> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> +RETURNS pgstatindex_type
> +AS 'MODULE_PATHNAME', 'pgstatindex'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_metap()
> +--
> +DROP TYPE bt_metap_type CASCADE;
> +CREATE TYPE bt_metap_type AS (
> +  magic int4,
> +  version int4,
> +  root int4,
> +  level int4,
> +  fastroot int4,
> +  fastlevel int4
> +);
> +
> +CREATE OR REPLACE FUNCTION bt_metap(text)
> +RETURNS bt_metap_type
> +AS 'MODULE_PATHNAME', 'bt_metap'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_items()
> +--
> +DROP TYPE bt_page_items_type CASCADE;
> +CREATE TYPE bt_page_items_type AS (
> +  itemoffset int4,
> +  ctid tid,
> +  itemlen int4,
> +  nulls bool,
> +  vars bool,
> +  data text
> +);
> +
> +DROP FUNCTION bt_page_items(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> +RETURNS SETOF bt_page_items_type
> +AS 'MODULE_PATHNAME', 'bt_page_items'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_stats()
> +--
> +DROP TYPE bt_page_stats_type CASCADE;
> +CREATE TYPE bt_page_stats_type AS (
> +  blkno int4,
> +  live_items int4,
> +  dead_items int4,
> +  total_items int4,
> +  avg_item_size float,
> +  page_size int4,
> +  free_size int4,
> +  is_deleted int4,
> +  btpo_prev int4,
> +  btpo_next int4,
> +  btpo_level int4,
> +  btpo_flags int4
> +);
> +
> +DROP FUNCTION bt_page_stats(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> +RETURNS bt_page_stats_type
> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- pg_relpages()
> +--
> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> +RETURNS int
> +AS 'MODULE_PATHNAME', 'pg_relpages'
> +LANGUAGE 'C' STRICT;
> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> --- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/test.sh    2006-08-11 17:47:05.000000000 +0900
> @@ -0,0 +1,22 @@
> +#!/bin/sh
> +
> +export PATH=/usr/local/pgsql814/bin:$PATH
> +
> +psql pgbench<<EOF
> +DROP FUNCTION pgstatindex(text);
> +EOF
> +
> +psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench
> +
> +psql pgbench<<EOF
> +SELECT * FROM pg_relpages('accounts_pkey');
> +\x
> +SELECT * FROM pgstatindex('accounts_pkey');
> +SELECT * FROM bt_metap('accounts_pkey');
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> +
> +\x
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +EOF
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Alvaro Herrera
Date:
Satoshi Nagayasu wrote:
> Hi all,
>
> Here is a patch to add pgstatindex functions to the pgstattuple module,
> which can work with 8.1.4. Please review and try it. Thanks.

Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:

/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)


While you're at it, please consider removing C++ style comments and
unused code.

Formatting is way off as well, but I guess that is easily fixed with
pgindent.

Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation).  It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Alvaro,

Alvaro Herrera wrote:
> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
>
> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
>
>
> While you're at it, please consider removing C++ style comments and
> unused code.
>
> Formatting is way off as well, but I guess that is easily fixed with
> pgindent.

Thanks for comments. I'm going to fix my patch from now.

> Regarding the pg_relpages function, why do you think it's necessary?
> (It returns the true number of blocks of a given relation).  It may
> belong into core given a reasonable use case, but otherwise it doesn't
> seem to belong into pgstatindex (or pgstattuple for that matter).

I wanted to sample some pages from the table/index, and get their statistics
to know table/index conditions. I know pgstattuple() reports table
statistics, however, pgstattuple() generates heavy CPU and I/O load.

When we need to sample some pages from table/index, we need to know
true number of blocks.

I have another function, called pgstatpage(), to get information inside
a single block/page statistics of the table. pg_relpages() will be used
with this.

Sorry for not mentioned in previous post about pgstatpage(),
but I've remembered about it just now.

Many memories in my brain have already `paged-out` (too busy in last few months),
and some of them got `out-of-memory`. :^)

Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Bruce,

Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.

Thanks.

Satoshi Nagayasu wrote:
> Alvaro,
>
> Alvaro Herrera wrote:
>> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
>>
>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
>>
>>
>> While you're at it, please consider removing C++ style comments and
>> unused code.
>>
>> Formatting is way off as well, but I guess that is easily fixed with
>> pgindent.
>
> Thanks for comments. I'm going to fix my patch from now.
>
>> Regarding the pg_relpages function, why do you think it's necessary?
>> (It returns the true number of blocks of a given relation).  It may
>> belong into core given a reasonable use case, but otherwise it doesn't
>> seem to belong into pgstatindex (or pgstattuple for that matter).
>
> I wanted to sample some pages from the table/index, and get their statistics
> to know table/index conditions. I know pgstattuple() reports table
> statistics, however, pgstattuple() generates heavy CPU and I/O load.
>
> When we need to sample some pages from table/index, we need to know
> true number of blocks.
>
> I have another function, called pgstatpage(), to get information inside
> a single block/page statistics of the table. pg_relpages() will be used
> with this.
>
> Sorry for not mentioned in previous post about pgstatpage(),
> but I've remembered about it just now.
>
> Many memories in my brain have already `paged-out` (too busy in last few months),
> and some of them got `out-of-memory`. :^)
>
> Thanks.


--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
--- pgstattuple.orig/Makefile    2006-02-27 21:54:40.000000000 +0900
+++ pgstattuple/Makefile    2006-08-14 09:28:58.000000000 +0900
@@ -6,7 +6,7 @@
 #
 #-------------------------------------------------------------------------

-SRCS        = pgstattuple.c
+SRCS        = pgstattuple.c pgstatindex.c

 MODULE_big    = pgstattuple
 OBJS        = $(SRCS:.c=.o)
diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
--- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/pgstatindex.c    2006-08-14 11:24:23.000000000 +0900
@@ -0,0 +1,706 @@
+/*
+ * pgstatindex
+ *
+ * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose, without fee, and without a
+ * written agreement is hereby granted, provided that the above
+ * copyright notice and this paragraph and the following two
+ * paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
+ * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
+ * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
+ * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "access/heapam.h"
+#include "access/itup.h"
+#include "access/nbtree.h"
+#include "access/transam.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+
+PG_FUNCTION_INFO_V1(pgstatindex);
+PG_FUNCTION_INFO_V1(bt_metap);
+PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_stats);
+PG_FUNCTION_INFO_V1(pg_relpages);
+
+extern Datum pgstatindex(PG_FUNCTION_ARGS);
+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);
+extern Datum pg_relpages(PG_FUNCTION_ARGS);
+
+#define PGSTATINDEX_TYPE "public.pgstatindex_type"
+#define PGSTATINDEX_NCOLUMNS 10
+
+#define BTMETAP_TYPE "public.bt_metap_type"
+#define BTMETAP_NCOLUMNS 6
+
+#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
+#define BTPAGEITEMS_NCOLUMNS 6
+
+#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
+#define BTPAGESTATS_NCOLUMNS 11
+
+
+#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
+#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
+
+#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
+        if ( !(FirstOffsetNumber<=(offset) && \
+                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
+             elog(ERROR, "Page offset number out of range."); }
+
+#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
+        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
+             elog(ERROR, "Block number out of range."); }
+
+/* ------------------------------------------------
+ * structure for single btree page statistics
+ * ------------------------------------------------
+ */
+typedef struct BTPageStat
+{
+    uint32        blkno;
+    uint32        live_items;
+    uint32        dead_items;
+    uint32        page_size;
+    uint32        max_avail;
+    uint32        free_size;
+    uint32        avg_item_size;
+    uint32        fragments;
+    char        type;
+
+    /* opaque data */
+    BlockNumber btpo_prev;
+    BlockNumber btpo_next;
+    union
+    {
+        uint32        level;
+        TransactionId xact;
+    }            btpo;
+    uint16        btpo_flags;
+    BTCycleId    btpo_cycleid;
+}    BTPageStat;
+
+/* ------------------------------------------------
+ * A structure for a whole btree index statistics
+ * used by pgstatindex().
+ * ------------------------------------------------
+ */
+typedef struct BTIndexStat
+{
+    uint32        magic;
+    uint32        version;
+    BlockNumber root_blkno;
+    uint32        level;
+
+    BlockNumber fastroot;
+    uint32        fastlevel;
+
+    uint32        live_items;
+    uint32        dead_items;
+
+    uint32        root_pages;
+    uint32        internal_pages;
+    uint32        leaf_pages;
+    uint32        empty_pages;
+    uint32        deleted_pages;
+
+    uint32        page_size;
+    uint32        avg_item_size;
+
+    uint32        max_avail;
+    uint32        free_space;
+
+    uint32        fragments;
+}    BTIndexStat;
+
+/* -------------------------------------------------
+ * GetBTPageStatistics()
+ *
+ * Collect statistics of single b-tree leaf page
+ * -------------------------------------------------
+ */
+static bool
+GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
+{
+    Page        page = BufferGetPage(buffer);
+    PageHeader    phdr = (PageHeader) page;
+    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+    int            item_size = 0;
+    int            off;
+
+    stat->blkno = blkno;
+
+    stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
+
+    stat->dead_items = stat->live_items = 0;
+
+    stat->page_size = PageGetPageSize(page);
+
+    /* page type (flags) */
+    if (P_ISDELETED(opaque))
+    {
+        stat->type = 'd';
+        return true;
+    }
+    else if (P_IGNORE(opaque))
+        stat->type = 'e';
+    else if (P_ISLEAF(opaque))
+        stat->type = 'l';
+    else if (P_ISROOT(opaque))
+        stat->type = 'r';
+    else
+        stat->type = 'i';
+
+    /* btpage opaque data */
+    stat->btpo_prev = opaque->btpo_prev;
+    stat->btpo_next = opaque->btpo_next;
+    if (P_ISDELETED(opaque))
+        stat->btpo.xact = opaque->btpo.xact;
+    else
+        stat->btpo.level = opaque->btpo.level;
+    stat->btpo_flags = opaque->btpo_flags;
+    stat->btpo_cycleid = opaque->btpo_cycleid;
+
+    /*----------------------------------------------
+     * If a next leaf is on the previous block,
+     * it means a fragmentation.
+     *----------------------------------------------
+     */
+    stat->fragments = 0;
+    if (stat->type == 'l')
+    {
+        if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
+            stat->fragments++;
+    }
+
+    /* count live and dead tuples, and free space */
+    for (off = FirstOffsetNumber; off <= maxoff; off++)
+    {
+        IndexTuple    itup;
+
+        ItemId        id = PageGetItemId(page, off);
+
+        itup = (IndexTuple) PageGetItem(page, id);
+
+        item_size += IndexTupleSize(itup);
+
+        if (!ItemIdDeleted(id))
+            stat->live_items++;
+        else
+            stat->dead_items++;
+    }
+    stat->free_size = PageGetFreeSpace(page);
+
+    if ((stat->live_items + stat->dead_items) > 0)
+        stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
+    else
+        stat->avg_item_size = 0;
+
+    return true;
+}
+
+
+/* ------------------------------------------------------
+ * pgstatindex()
+ *
+ * Usage: SELECT * FROM pgstatindex('t1_pkey');
+ * ------------------------------------------------------
+ */
+Datum
+pgstatindex(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    Relation    rel;
+    RangeVar   *relrv;
+    Datum        result;
+    uint32        nblocks;
+    uint32        blkno;
+    BTIndexStat indexStat;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    if (!IS_INDEX(rel) || !IS_BTREE(rel))
+        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
+
+    /*-------------------
+     * Read a metapage
+     *-------------------
+     */
+    {
+        Buffer        buffer = ReadBuffer(rel, 0);
+        Page        page = BufferGetPage(buffer);
+        BTMetaPageData *metad = BTPageGetMeta(page);
+
+        indexStat.magic = metad->btm_magic;
+        indexStat.version = metad->btm_version;
+        indexStat.root_blkno = metad->btm_root;
+        indexStat.level = metad->btm_level;
+        indexStat.fastroot = metad->btm_fastroot;
+        indexStat.fastlevel = metad->btm_fastlevel;
+
+        ReleaseBuffer(buffer);
+    }
+
+    nblocks = RelationGetNumberOfBlocks(rel);
+
+    /* -- init stat -- */
+    indexStat.fragments = 0;
+
+    indexStat.root_pages = 0;
+    indexStat.leaf_pages = 0;
+    indexStat.internal_pages = 0;
+    indexStat.empty_pages = 0;
+    indexStat.deleted_pages = 0;
+
+    indexStat.max_avail = 0;
+    indexStat.free_space = 0;
+
+    /*-----------------------
+     * Scan all blocks
+     *-----------------------
+     */
+    for (blkno = 1; blkno < nblocks; blkno++)
+    {
+        Buffer        buffer = ReadBuffer(rel, blkno);
+        BTPageStat    stat;
+
+        /* scan one page */
+        stat.blkno = blkno;
+        GetBTPageStatistics(blkno, buffer, &stat);
+
+        /*---------------------
+         * page status (type)
+         *---------------------
+         */
+        switch (stat.type)
+        {
+            case 'd':
+                indexStat.deleted_pages++;
+                break;
+            case 'l':
+                indexStat.leaf_pages++;
+                break;
+            case 'i':
+                indexStat.internal_pages++;
+                break;
+            case 'e':
+                indexStat.empty_pages++;
+                break;
+            case 'r':
+                indexStat.root_pages++;
+                break;
+            default:
+                elog(ERROR, "unknown page status.");
+        }
+
+        /* -- leaf fragmentation -- */
+        indexStat.fragments += stat.fragments;
+
+        if (stat.type == 'l')
+        {
+            indexStat.max_avail += stat.max_avail;
+            indexStat.free_space += stat.free_size;
+        }
+
+        ReleaseBuffer(buffer);
+    }
+
+    relation_close(rel, AccessShareLock);
+
+    /*----------------------------
+     * Build a result tuple
+     *----------------------------
+     */
+    {
+        TupleDesc    tupleDesc;
+        int            j;
+        char       *values[PGSTATINDEX_NCOLUMNS];
+
+        HeapTupleData tupleData;
+        HeapTuple    tuple = &tupleData;
+
+        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
+
+        j = 0;
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.version);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.level);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
+                                         indexStat.leaf_pages +
+                                         indexStat.internal_pages +
+                                         indexStat.deleted_pages +
+                                         indexStat.empty_pages) * BLCKSZ);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
+
+        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+                                       values);
+
+        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+    }
+
+    PG_RETURN_DATUM(result);
+}
+
+/* -----------------------------------------------
+ * bt_page()
+ *
+ * Usage: SELECT * FROM bt_page('t1_pkey', 0);
+ * -----------------------------------------------
+ */
+Datum
+bt_page_stats(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    uint32        blkno = PG_GETARG_UINT32(1);
+    Buffer        buffer;
+
+    Relation    rel;
+    RangeVar   *relrv;
+    Datum        result;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
+
+    buffer = ReadBuffer(rel, blkno);
+
+    if (!IS_INDEX(rel) || !IS_BTREE(rel))
+        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
+
+    if (blkno == 0)
+        elog(ERROR, "Block 0 is a meta page.");
+
+    {
+        HeapTuple    tuple;
+        TupleDesc    tupleDesc;
+        int            j;
+        char       *values[BTPAGESTATS_NCOLUMNS];
+
+        BTPageStat    stat;
+
+        GetBTPageStatistics(blkno, buffer, &stat);
+
+        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
+
+        j = 0;
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.blkno);
+
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%c", stat.type);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.live_items);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.dead_items);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.avg_item_size);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.page_size);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.free_size);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.btpo_prev);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.btpo_next);
+
+        values[j] = palloc(32);
+        if (stat.type == 'd')
+            snprintf(values[j++], 32, "%d", stat.btpo.xact);
+        else
+            snprintf(values[j++], 32, "%d", stat.btpo.level);
+
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", stat.btpo_flags);
+
+        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+                                       values);
+
+        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+    }
+
+    ReleaseBuffer(buffer);
+
+    relation_close(rel, AccessShareLock);
+
+    PG_RETURN_DATUM(result);
+}
+
+/*-------------------------------------------------------
+ * bt_page_items()
+ *
+ * Get IndexTupleData set in a leaf page
+ *
+ * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
+ *-------------------------------------------------------
+ */
+/* ---------------------------------------------------
+ * data structure for SRF to hold a scan information
+ * ---------------------------------------------------
+ */
+struct user_args
+{
+    TupleDesc    tupd;
+    Relation    rel;
+    Buffer        buffer;
+    Page        page;
+    uint16        offset;
+};
+
+Datum
+bt_page_items(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    uint32        blkno = PG_GETARG_UINT32(1);
+
+    RangeVar   *relrv;
+    Datum        result;
+    char       *values[BTPAGEITEMS_NCOLUMNS];
+    BTPageOpaque opaque;
+    HeapTuple    tuple;
+    ItemId        id;
+
+    FuncCallContext *fctx;
+    MemoryContext mctx;
+    struct user_args *uargs = NULL;
+
+    if (blkno == 0)
+        elog(ERROR, "Block 0 is a meta page.");
+
+    if (SRF_IS_FIRSTCALL())
+    {
+        fctx = SRF_FIRSTCALL_INIT();
+        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+        uargs = palloc(sizeof(struct user_args));
+
+        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
+        uargs->offset = FirstOffsetNumber;
+
+        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+        uargs->rel = relation_openrv(relrv, AccessShareLock);
+
+        CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
+
+        uargs->buffer = ReadBuffer(uargs->rel, blkno);
+
+        if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
+            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
+
+        uargs->page = BufferGetPage(uargs->buffer);
+
+        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+        if (P_ISDELETED(opaque))
+            elog(NOTICE, "bt_page_items(): this page is deleted.");
+
+        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+        fctx->user_fctx = uargs;
+
+        MemoryContextSwitchTo(mctx);
+    }
+
+    fctx = SRF_PERCALL_SETUP();
+    uargs = fctx->user_fctx;
+
+    if (fctx->call_cntr < fctx->max_calls)
+    {
+        IndexTuple    itup;
+
+        id = PageGetItemId(uargs->page, uargs->offset);
+
+        if (!ItemIdIsValid(id))
+            elog(ERROR, "Invalid ItemId.");
+
+        itup = (IndexTuple) PageGetItem(uargs->page, id);
+
+        {
+            int            j = 0;
+
+            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
+
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%d", uargs->offset);
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+            values[j] = palloc(32);
+            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+            {
+                int            off;
+                char       *dump;
+                char       *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+
+                dump = palloc(IndexTupleSize(itup) * 3);
+                memset(dump, 0, IndexTupleSize(itup) * 3);
+
+                for (off = 0;
+                     off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+                     off++)
+                {
+                    if (dump[0] == '\0')
+                        sprintf(dump, "%02x", *(ptr + off) & 0xff);
+                    else
+                    {
+                        char        buf[4];
+
+                        sprintf(buf, " %02x", *(ptr + off) & 0xff);
+                        strcat(dump, buf);
+                    }
+                }
+                values[j] = dump;
+            }
+
+            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
+            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
+        }
+
+        uargs->offset = uargs->offset + 1;
+
+        SRF_RETURN_NEXT(fctx, result);
+    }
+    else
+    {
+        ReleaseBuffer(uargs->buffer);
+        relation_close(uargs->rel, AccessShareLock);
+
+        SRF_RETURN_DONE(fctx);
+    }
+}
+
+
+/* ------------------------------------------------
+ * bt_metap()
+ *
+ * Get a btree meta-page information
+ *
+ * Usage: SELECT * FROM bt_metap('t1_pkey')
+ * ------------------------------------------------
+ */
+Datum
+bt_metap(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+    Buffer        buffer;
+
+    Relation    rel;
+    RangeVar   *relrv;
+    Datum        result;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    if (!IS_INDEX(rel) || !IS_BTREE(rel))
+        elog(ERROR, "bt_metap() can be used only on b-tree index.");
+
+    buffer = ReadBuffer(rel, 0);
+
+    {
+        BTMetaPageData *metad;
+
+        TupleDesc    tupleDesc;
+        int            j;
+        char       *values[BTMETAP_NCOLUMNS];
+        HeapTuple    tuple;
+
+        Page        page = BufferGetPage(buffer);
+
+        metad = BTPageGetMeta(page);
+
+        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
+
+        j = 0;
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_magic);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_version);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_root);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_level);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+        values[j] = palloc(32);
+        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+                                       values);
+
+        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+    }
+
+    ReleaseBuffer(buffer);
+
+    relation_close(rel, AccessShareLock);
+
+    PG_RETURN_DATUM(result);
+}
+
+/* --------------------------------------------------------
+ * pg_relpages()
+ *
+ * Get a number of pages of the table/index.
+ *
+ * Usage: SELECT pg_relpages('t1');
+ *          SELECT pg_relpages('t1_pkey');
+ * --------------------------------------------------------
+ */
+Datum
+pg_relpages(PG_FUNCTION_ARGS)
+{
+    text       *relname = PG_GETARG_TEXT_P(0);
+
+    Relation    rel;
+    RangeVar   *relrv;
+    int4        relpages;
+
+    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+    rel = relation_openrv(relrv, AccessShareLock);
+
+    relpages = RelationGetNumberOfBlocks(rel);
+
+    relation_close(rel, AccessShareLock);
+
+    PG_RETURN_INT32(relpages);
+}
diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
--- pgstattuple.orig/pgstattuple.sql.in    2006-02-28 01:09:50.000000000 +0900
+++ pgstattuple/pgstattuple.sql.in    2006-08-14 10:37:32.000000000 +0900
@@ -22,3 +22,96 @@
 RETURNS pgstattuple_type
 AS 'MODULE_PATHNAME', 'pgstattuplebyid'
 LANGUAGE C STRICT;
+
+--
+-- pgstatindex
+--
+DROP TYPE pgstatindex_type CASCADE;
+CREATE TYPE pgstatindex_type AS (
+  version int4,
+  tree_level int4,
+  index_size int4,
+  root_block_no int4,
+  internal_pages int4,
+  leaf_pages int4,
+  empty_pages int4,
+  deleted_pages int4,
+  avg_leaf_density float8,
+  leaf_fragmentation float8
+);
+
+CREATE OR REPLACE FUNCTION pgstatindex(text)
+RETURNS pgstatindex_type
+AS 'MODULE_PATHNAME', 'pgstatindex'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_metap()
+--
+DROP TYPE bt_metap_type CASCADE;
+CREATE TYPE bt_metap_type AS (
+  magic int4,
+  version int4,
+  root int4,
+  level int4,
+  fastroot int4,
+  fastlevel int4
+);
+
+CREATE OR REPLACE FUNCTION bt_metap(text)
+RETURNS bt_metap_type
+AS 'MODULE_PATHNAME', 'bt_metap'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_items()
+--
+DROP TYPE bt_page_items_type CASCADE;
+CREATE TYPE bt_page_items_type AS (
+  itemoffset int4,
+  ctid tid,
+  itemlen int4,
+  nulls bool,
+  vars bool,
+  data text
+);
+
+DROP FUNCTION bt_page_items(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
+RETURNS SETOF bt_page_items_type
+AS 'MODULE_PATHNAME', 'bt_page_items'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_stats()
+--
+DROP TYPE bt_page_stats_type CASCADE;
+CREATE TYPE bt_page_stats_type AS (
+  blkno int4,
+  type char,
+  live_items int4,
+  dead_items int4,
+  avg_item_size float,
+  page_size int4,
+  free_size int4,
+  btpo_prev int4,
+  btpo_next int4,
+  btpo int4,
+  btpo_flags int4
+);
+
+DROP FUNCTION bt_page_stats(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
+RETURNS bt_page_stats_type
+AS 'MODULE_PATHNAME', 'bt_page_stats'
+LANGUAGE 'C' STRICT;
+
+--
+-- pg_relpages()
+--
+CREATE OR REPLACE FUNCTION pg_relpages(text)
+RETURNS int
+AS 'MODULE_PATHNAME', 'pg_relpages'
+LANGUAGE 'C' STRICT;
diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
--- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/test.sh    2006-08-14 10:23:08.000000000 +0900
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+PGHOME=/home/snaga/pgsql20060814
+export PATH=${PGHOME}/bin:$PATH
+
+psql -p 9999 pgbench<<EOF
+DROP FUNCTION pgstatindex(text);
+EOF
+
+psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
+
+psql -p 9999 pgbench<<EOF
+SELECT * FROM pg_relpages('accounts_pkey');
+\x
+SELECT * FROM pgstatindex('accounts_pkey');
+SELECT * FROM bt_metap('accounts_pkey');
+\x
+SELECT * FROM bt_page_items('accounts_pkey', 0);
+SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
+
+SELECT * FROM bt_page_stats('accounts_pkey', 1);
+\x
+SELECT * FROM bt_page_stats('accounts_pkey', 1);
+SELECT * FROM bt_page_stats('accounts_pkey', 361);
+\x
+SELECT * FROM bt_page_items('accounts_pkey', 361);
+EOF

Re: pgstattuple extension for indexes

From
Alvaro Herrera
Date:
Satoshi Nagayasu wrote:
> Bruce,
>
> Attached patch has been cleaned up,
> and modified to be able to work with CVS HEAD.

I was thinking, isn't it a lot cleaner to define the functions to use
OUT parameters instead of having to define a custom type for each?

Also, in 8.2 there is a uninstall SQL script -- ISTM you should put the
DROP commands there, not in the install script.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: pgstattuple extension for indexes

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I was thinking, isn't it a lot cleaner to define the functions to use
> OUT parameters instead of having to define a custom type for each?

Not really --- it's a bit less notation maybe, but if he's got it
written like that already, I see no need to change it.

> Also, in 8.2 there is a uninstall SQL script -- ISTM you should put the
> DROP commands there, not in the install script.

Agreed.

            regards, tom lane

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
BTIem is no longer in CVS HEAD, though it was in 8.1.X.  Please update
your patch for CVS HEAD.  Thanks.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:
> Hi all,
>
> Here is a patch to add pgstatindex functions to the pgstattuple module,
> which can work with 8.1.4. Please review and try it. Thanks.
>
>
> Satoshi Nagayasu wrote:
> > Bruce,
> >
> > I'll fix it in this week. Please wait a few days.
> > Thanks.
> >
> > Bruce Momjian wrote:
> >> nagayasu-san,
> >>
> >> This looks good, but we would like the code added to
> >> /contrib/pgstattuple, rather than it being its own /contrib module.  Can
> >> you make that adjustment?  Thanks.
> >>
> >> ---------------------------------------------------------------------------
> >>
> >> satoshi nagayasu wrote:
> >>> Hi folks,
> >>>
> >>> As I said on -PATCHES, I've been working on an utility to get
> >>> a b-tree index information. I'm happy to introduce
> >>> my new functions to you.
> >>>
> >>> pgstattuple module provides a `pgstatindex()`, and other small
> >>> functions, which allow you to get b-tree internal information.
> >>> I believe this module will be helpful to know b-tree index deeply.
> >>>
> >>> So please try it, send comment to me, and have fun.
> >>>
> >>> Thanks,
> >>> --
> >>> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> >>>
> >>> -----------------------------------------------------
> >>> pgbench=# \x
> >>> Expanded display is on.
> >>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> >>> -[ RECORD 1 ]------+--------
> >>> version            | 2
> >>> tree_level         | 1
> >>> index_size         | 3588096
> >>> root_block_no      | 3
> >>> internal_pages     | 0
> >>> leaf_pages         | 437
> >>> empty_pages        | 0
> >>> deleted_pages      | 0
> >>> avg_leaf_density   | 59.5
> >>> leaf_fragmentation | 49.89
> >>> -----------------------------------------------------
> >>>
> >>>
> >> [ application/x-gzip is not supported, skipping... ]
> >>
> >>> ---------------------------(end of broadcast)---------------------------
> >>> TIP 4: Have you searched our list archives?
> >>>
> >>>                http://archives.postgresql.org
> >
> >
>
>
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> Phone: +81-3-3523-8122

> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> --- pgstattuple.orig/Makefile    2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/Makefile    2006-08-10 19:24:05.000000000 +0900
> @@ -6,7 +6,7 @@
>  #
>  #-------------------------------------------------------------------------
>
> -SRCS        = pgstattuple.c
> +SRCS        = pgstattuple.c pgstatindex.c
>
>  MODULE_big    = pgstattuple
>  OBJS        = $(SRCS:.c=.o)
> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> --- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/pgstatindex.c    2006-08-11 17:51:26.000000000 +0900
> @@ -0,0 +1,714 @@
> +/*
> + * pgstatindex
> + *
> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
> + *
> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */
> +
> +#include "postgres.h"
> +
> +#include "fmgr.h"
> +#include "funcapi.h"
> +#include "access/heapam.h"
> +#include "access/itup.h"
> +#include "access/nbtree.h"
> +#include "access/transam.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_type.h"
> +#include "utils/builtins.h"
> +#include "utils/inval.h"
> +
> +PG_FUNCTION_INFO_V1(pgstatindex);
> +PG_FUNCTION_INFO_V1(bt_metap);
> +PG_FUNCTION_INFO_V1(bt_page_items);
> +PG_FUNCTION_INFO_V1(bt_page_stats);
> +PG_FUNCTION_INFO_V1(pg_relpages);
> +
> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> +
> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> +#define PGSTATINDEX_NCOLUMNS 10
> +
> +#define BTMETAP_TYPE "public.bt_metap_type"
> +#define BTMETAP_NCOLUMNS 6
> +
> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> +#define BTPAGEITEMS_NCOLUMNS 6
> +
> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> +#define BTPAGESTATS_NCOLUMNS 12
> +
> +
> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> +
> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> +        if ( !(FirstOffsetNumber<=(offset) && \
> +                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
> +             elog(ERROR, "Page offset number out of range."); }
> +
> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> +        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> +             elog(ERROR, "Block number out of range."); }
> +
> +/* ------------------------------------------------
> + * structure for single btree page statistics
> + * ------------------------------------------------
> + */
> +typedef struct BTPageStat {
> +    uint32 blkno;
> +    uint32 live_items;
> +    uint32 dead_items;
> +    uint32 page_size;
> +    uint32 max_avail;
> +    uint32 free_size;
> +    uint32 avg_item_size;
> +    uint32 fragments;
> +    bool is_root;
> +    bool is_internal;
> +    bool is_leaf;
> +    bool is_deleted;
> +    bool is_empty;
> +} BTPageStat;
> +
> +/* ------------------------------------------------
> + * A structure for a whole btree index statistics
> + * used by pgstatindex().
> + * ------------------------------------------------
> + */
> +typedef struct BTIndexStat {
> +    uint32 version;
> +    BlockNumber root_blkno;
> +    uint32 level;
> +
> +    uint32 live_items;
> +    uint32 dead_items;
> +
> +    uint32 root_pages;
> +    uint32 internal_pages;
> +    uint32 leaf_pages;
> +    uint32 empty_pages;
> +    uint32 deleted_pages;
> +
> +    uint32 page_size;
> +    uint32 avg_item_size;
> +
> +    uint32 max_avail;
> +    uint32 free_space;
> +
> +    uint32 fragments;
> +} BTIndexStat;
> +
> +/* -------------------------------------------------
> + * GetBTPageStatistics()
> + *
> + * Collect statistics of single b-tree leaf page
> + * -------------------------------------------------
> + */
> +static bool
> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
> +{
> +    Page page           = BufferGetPage(buffer);
> +    PageHeader phdr     = (PageHeader) page;
> +    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> +    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +    int item_size = 0;
> +    int off;
> +
> +    stat->blkno = blkno;
> +
> +    stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData );
> +
> +    stat->dead_items = stat->live_items = 0;
> +
> +    stat->page_size = PageGetPageSize(page);
> +
> +    /* page type */
> +    stat->is_root    = false;
> +    stat->is_leaf    = false;
> +    stat->is_deleted = false;
> +    stat->is_empty   = false;
> +
> +    if ( P_ISDELETED(opaque) )
> +    {
> +        stat->is_deleted = true;
> +        return true;
> +    }
> +    else if ( P_IGNORE(opaque) )
> +        stat->is_empty = true;
> +    else if ( P_ISLEAF(opaque) )
> +        stat->is_leaf = true;
> +    else if ( P_ISROOT(opaque) )
> +        stat->is_root = true;
> +    else
> +        stat->is_internal = true;
> +
> +    /*----------------------------------------------
> +     * If a next leaf is on the previous block,
> +     * it means a fragmentation.
> +     *----------------------------------------------
> +     */
> +    stat->fragments = 0;
> +    if ( stat->is_leaf )
> +    {
> +        if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno )
> +            stat->fragments++;
> +    }
> +
> +    /* count live and dead tuples, and free space */
> +    for (off=FirstOffsetNumber ; off<=maxoff ; off++)
> +    {
> +        BTItem btitem;
> +        IndexTuple itup;
> +
> +        ItemId id = PageGetItemId(page, off);
> +
> +        btitem = (BTItem)PageGetItem(page, id);
> +
> +        itup = &(btitem->bti_itup);
> +
> +        item_size += IndexTupleSize(itup);
> +
> +        if ( !ItemIdDeleted(id) )
> +            stat->live_items++;
> +        else
> +            stat->dead_items++;
> +    }
> +    stat->free_size = PageGetFreeSpace(page);
> +
> +#ifdef NOT_USED
> +    elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail,
> +         (float)stat->free_size/(float)stat->max_avail*100.0);
> +#endif
> +
> +    if ( (stat->live_items + stat->dead_items) > 0 )
> +        stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items );
> +    else
> +        stat->avg_item_size = 0;
> +
> +    return true;
> +}
> +
> +
> +/* ------------------------------------------------------
> + * pgstatindex()
> + *
> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> + * ------------------------------------------------------
> + */
> +Datum
> +pgstatindex(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    Datum        result;
> +    uint32        nblocks;
> +    uint32        blkno;
> +    BTIndexStat indexStat;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> +        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> +
> +    /*-------------------
> +     * Read a metapage
> +     *-------------------
> +     */
> +    {
> +        Buffer buffer = ReadBuffer(rel, 0);
> +        Page page = BufferGetPage(buffer);
> +        BTMetaPageData *metad = BTPageGetMeta(page);
> +
> +//        snprintf(values[j++], 32, "%d", metad->btm_magic);
> +        indexStat.version    = metad->btm_version;
> +        indexStat.root_blkno = metad->btm_root;
> +        indexStat.level      = metad->btm_level;
> +//        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +//        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    nblocks = RelationGetNumberOfBlocks(rel);
> +
> +    /* -- init stat -- */
> +    indexStat.fragments      = 0;
> +
> +    indexStat.root_pages     = 0;
> +    indexStat.leaf_pages     = 0;
> +    indexStat.internal_pages = 0;
> +    indexStat.empty_pages    = 0;
> +    indexStat.deleted_pages  = 0;
> +
> +    indexStat.max_avail  = 0;
> +    indexStat.free_space = 0;
> +
> +    /*-----------------------
> +     * Scan all blocks
> +     *-----------------------
> +     */
> +    for (blkno=1 ; blkno<nblocks ; blkno++)
> +    {
> +        Buffer buffer = ReadBuffer(rel, blkno);
> +        BTPageStat stat;
> +
> +        /* scan one page */
> +        stat.blkno = blkno;
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        /*---------------------
> +         * page status (type)
> +         *---------------------
> +         */
> +        if ( stat.is_deleted )
> +            indexStat.deleted_pages++;
> +        else if ( stat.is_leaf )
> +            indexStat.leaf_pages++;
> +        else if ( stat.is_internal )
> +            indexStat.internal_pages++;
> +        else if ( stat.is_empty )
> +            indexStat.empty_pages++;
> +        else if ( stat.is_root )
> +            indexStat.root_pages++;
> +        else
> +            elog(ERROR, "unknown page status.");
> +
> +        /* -- leaf fragmentation -- */
> +        indexStat.fragments += stat.fragments;
> +
> +        if ( stat.is_leaf )
> +        {
> +            indexStat.max_avail  += stat.max_avail;
> +            indexStat.free_space += stat.free_size;
> +        }
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    relation_close(rel, AccessShareLock);
> +
> +#ifdef NOT_USED
> +    elog(NOTICE, "[index information]");
> +    elog(NOTICE, "version.................: %d", indexStat.version);
> +    elog(NOTICE, "tree level..............: %d", indexStat.level);
> +    elog(NOTICE, "index size..............: %d", (indexStat.root_pages +
> +                                    indexStat.leaf_pages +
> +                                    indexStat.internal_pages +
> +                                    indexStat.deleted_pages +
> +                                    indexStat.empty_pages) * BLCKSZ);
> +
> +    elog(NOTICE, "");
> +    elog(NOTICE, "[page statistics]");
> +    elog(NOTICE, "root block number.......: %d", indexStat.root_blkno);
> +    elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages);
> +    elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages);
> +    elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages);
> +    elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages);
> +
> +    elog(NOTICE, "");
> +    elog(NOTICE, "[leaf statistics]");
> +    elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail
*100.0 ); 
> +    elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages
*100.0 ); 
> +#endif
> +
> +    /*----------------------------
> +     * Build a result tuple
> +     *----------------------------
> +     */
> +    {
> +        TupleDesc tupleDesc;
> +        int j;
> +        char *values[PGSTATINDEX_NCOLUMNS];
> +
> +        HeapTupleData tupleData;
> +        HeapTuple tuple = &tupleData;
> +
> +        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> +                                         indexStat.leaf_pages +
> +                                         indexStat.internal_pages +
> +                                         indexStat.deleted_pages +
> +                                         indexStat.empty_pages) * BLCKSZ );
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 );
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    PG_RETURN_DATUM( result );
> +}
> +
> +/* -----------------------------------------------
> + * bt_page()
> + *
> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> + * -----------------------------------------------
> + */
> +Datum
> +bt_page_stats(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32       blkno   = PG_GETARG_UINT32(1);
> +    Buffer       buffer;
> +
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> +
> +    buffer = ReadBuffer(rel, blkno);
> +
> +    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> +        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> +
> +    if ( blkno==0 )
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    {
> +        HeapTuple tuple;
> +        TupleDesc tupleDesc;
> +        int j;
> +        char *values[BTPAGESTATS_NCOLUMNS];
> +
> +        BTPageStat stat;
> +
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.blkno);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.live_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.dead_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.avg_item_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.page_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.free_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.is_deleted);
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo_prev);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo_next);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo.level);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +        values[j] = palloc(32);
> +//        snprintf(values[j++], 32, "%d", opaque->btpo_flags);
> +        snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/*-------------------------------------------------------
> + * bt_page_items()
> + *
> + * Get IndexTupleData set in a leaf page
> + *
> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> + *-------------------------------------------------------
> + */
> +/* ---------------------------------------------------
> + * data structure for SRF to hold a scan information
> + * ---------------------------------------------------
> + */
> +struct user_args
> +{
> +    TupleDesc tupd;
> +    Relation rel;
> +    Buffer buffer;
> +    Page page;
> +    uint16 offset;
> +};
> +
> +Datum
> +bt_page_items(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32       blkno   = PG_GETARG_UINT32(1);
> +
> +    RangeVar    *relrv;
> +    Datum result;
> +    char *values[BTPAGEITEMS_NCOLUMNS];
> +    BTPageOpaque opaque;
> +    HeapTuple tuple;
> +    ItemId id;
> +
> +    FuncCallContext *fctx;
> +    MemoryContext mctx;
> +    struct user_args *uargs = NULL;
> +
> +    if ( blkno==0 )
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    if ( SRF_IS_FIRSTCALL() )
> +    {
> +        fctx = SRF_FIRSTCALL_INIT();
> +        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> +
> +        uargs = palloc(sizeof(struct user_args));
> +
> +        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> +        uargs->offset = FirstOffsetNumber;
> +
> +        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +        uargs->rel = relation_openrv(relrv, AccessShareLock);
> +
> +        CHECK_RELATION_BLOCK_RANGE(uargs->rel,  blkno);
> +
> +        uargs->buffer = ReadBuffer(uargs->rel, blkno);
> +
> +        if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) )
> +            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> +
> +        uargs->page = BufferGetPage(uargs->buffer);
> +
> +        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> +
> +        if ( P_ISDELETED(opaque) )
> +            elog(NOTICE, "bt_page_items(): this page is deleted.");
> +
> +        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> +        fctx->user_fctx = uargs;
> +
> +        MemoryContextSwitchTo(mctx);
> +    }
> +
> +    fctx = SRF_PERCALL_SETUP();
> +    uargs = fctx->user_fctx;
> +
> +    if ( fctx->call_cntr < fctx->max_calls )
> +    {
> +        BTItem btitem;
> +        IndexTuple itup;
> +
> +        id = PageGetItemId(uargs->page, uargs->offset);
> +
> +        if ( !ItemIdIsValid(id) )
> +            elog(ERROR, "Invalid ItemId.");
> +
> +        btitem = (BTItem)PageGetItem(uargs->page, id);
> +        itup = &(btitem->bti_itup);
> +
> +        {
> +            int j = 0;
> +
> +            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> +
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", uargs->offset);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> +
> +            {
> +                int off;
> +                char *dump;
> +                char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info);
> +
> +                dump = palloc(IndexTupleSize(itup)*3);
> +                memset(dump, 0, IndexTupleSize(itup)*3);
> +
> +                for (off=0 ;
> +                     off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ;
> +                     off++)
> +                {
> +                    if ( dump[0]=='\0' )
> +                        sprintf(dump, "%02x", *(ptr+off) & 0xff);
> +                    else
> +                    {
> +                        char buf[4];
> +                        sprintf(buf, " %02x", *(ptr+off) & 0xff);
> +                        strcat(dump, buf);
> +                    }
> +                }
> +                values[j] = dump;
> +            }
> +
> +            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> +            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> +        }
> +
> +        uargs->offset = uargs->offset + 1;
> +
> +        SRF_RETURN_NEXT(fctx, result);
> +    }
> +    else
> +    {
> +        ReleaseBuffer(uargs->buffer);
> +        relation_close(uargs->rel, AccessShareLock);
> +
> +        SRF_RETURN_DONE(fctx);
> +    }
> +}
> +
> +
> +/* ------------------------------------------------
> + * bt_metap()
> + *
> + * Get a btree meta-page information
> + *
> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> + * ------------------------------------------------
> + */
> +Datum
> +bt_metap(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Buffer       buffer;
> +
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> +        elog(ERROR, "bt_metap() can be used only on b-tree index.");
> +
> +    buffer = ReadBuffer(rel, 0);
> +
> +    {
> +        BTMetaPageData *metad;
> +
> +        TupleDesc tupleDesc;
> +        int j;
> +        char *values[BTMETAP_NCOLUMNS];
> +        HeapTuple tuple;
> +
> +        Page page = BufferGetPage(buffer);
> +
> +        metad = BTPageGetMeta(page);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_magic);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_root);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/* --------------------------------------------------------
> + * pg_relpages()
> + *
> + * Get a number of pages of the table/index.
> + *
> + * Usage: SELECT pg_relpages('t1');
> + *        SELECT pg_relpages('t1_pkey');
> + * --------------------------------------------------------
> + */
> +Datum
> +pg_relpages(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +
> +    Relation    rel;
> +    RangeVar    *relrv;
> +    int4  relpages;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    relpages = RelationGetNumberOfBlocks(rel);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_INT32(relpages);
> +}
> +
> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> --- pgstattuple.orig/pgstattuple.sql.in    2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/pgstattuple.sql.in    2006-08-11 17:46:19.000000000 +0900
> @@ -22,3 +22,97 @@
>  RETURNS pgstattuple_type
>  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>  LANGUAGE 'C' STRICT;
> +
> +--
> +-- pgstatindex
> +--
> +DROP TYPE pgstatindex_type CASCADE;
> +CREATE TYPE pgstatindex_type AS (
> +  version int4,
> +  tree_level int4,
> +  index_size int4,
> +  root_block_no int4,
> +  internal_pages int4,
> +  leaf_pages int4,
> +  empty_pages int4,
> +  deleted_pages int4,
> +  avg_leaf_density float8,
> +  leaf_fragmentation float8
> +);
> +
> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> +RETURNS pgstatindex_type
> +AS 'MODULE_PATHNAME', 'pgstatindex'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_metap()
> +--
> +DROP TYPE bt_metap_type CASCADE;
> +CREATE TYPE bt_metap_type AS (
> +  magic int4,
> +  version int4,
> +  root int4,
> +  level int4,
> +  fastroot int4,
> +  fastlevel int4
> +);
> +
> +CREATE OR REPLACE FUNCTION bt_metap(text)
> +RETURNS bt_metap_type
> +AS 'MODULE_PATHNAME', 'bt_metap'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_items()
> +--
> +DROP TYPE bt_page_items_type CASCADE;
> +CREATE TYPE bt_page_items_type AS (
> +  itemoffset int4,
> +  ctid tid,
> +  itemlen int4,
> +  nulls bool,
> +  vars bool,
> +  data text
> +);
> +
> +DROP FUNCTION bt_page_items(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> +RETURNS SETOF bt_page_items_type
> +AS 'MODULE_PATHNAME', 'bt_page_items'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_stats()
> +--
> +DROP TYPE bt_page_stats_type CASCADE;
> +CREATE TYPE bt_page_stats_type AS (
> +  blkno int4,
> +  live_items int4,
> +  dead_items int4,
> +  total_items int4,
> +  avg_item_size float,
> +  page_size int4,
> +  free_size int4,
> +  is_deleted int4,
> +  btpo_prev int4,
> +  btpo_next int4,
> +  btpo_level int4,
> +  btpo_flags int4
> +);
> +
> +DROP FUNCTION bt_page_stats(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> +RETURNS bt_page_stats_type
> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- pg_relpages()
> +--
> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> +RETURNS int
> +AS 'MODULE_PATHNAME', 'pg_relpages'
> +LANGUAGE 'C' STRICT;
> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> --- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/test.sh    2006-08-11 17:47:05.000000000 +0900
> @@ -0,0 +1,22 @@
> +#!/bin/sh
> +
> +export PATH=/usr/local/pgsql814/bin:$PATH
> +
> +psql pgbench<<EOF
> +DROP FUNCTION pgstatindex(text);
> +EOF
> +
> +psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench
> +
> +psql pgbench<<EOF
> +SELECT * FROM pg_relpages('accounts_pkey');
> +\x
> +SELECT * FROM pgstatindex('accounts_pkey');
> +SELECT * FROM bt_metap('accounts_pkey');
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> +
> +\x
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +EOF
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Bruce,

Bruce Momjian wrote:
> BTIem is no longer in CVS HEAD, though it was in 8.1.X.  Please update
> your patch for CVS HEAD.  Thanks.

I've posted CVS HEAD workable version on Aug.14.
Please check it out. Thanks.

--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
Satoshi Nagayasu wrote:
> Bruce,
>
> Bruce Momjian wrote:
> > BTIem is no longer in CVS HEAD, though it was in 8.1.X.  Please update
> > your patch for CVS HEAD.  Thanks.
>
> I've posted CVS HEAD workable version on Aug.14.
> Please check it out. Thanks.

OK, I found it, but it has no updates to README.pgstattuple to describe
the new functionality.  Should I write it?

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Sorry, I'll write README (and uninstall.sql?) by tomorrow.

Bruce Momjian wrote:
> Satoshi Nagayasu wrote:
>> Bruce,
>>
>> Bruce Momjian wrote:
>>> BTIem is no longer in CVS HEAD, though it was in 8.1.X.  Please update
>>> your patch for CVS HEAD.  Thanks.
>> I've posted CVS HEAD workable version on Aug.14.
>> Please check it out. Thanks.
>
> OK, I found it, but it has no updates to README.pgstattuple to describe
> the new functionality.  Should I write it?
>


--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
Satoshi Nagayasu wrote:
> Sorry, I'll write README (and uninstall.sql?) by tomorrow.

Thanks.  Yea, you need to update the uninstall too.

---------------------------------------------------------------------------


>
> Bruce Momjian wrote:
> > Satoshi Nagayasu wrote:
> >> Bruce,
> >>
> >> Bruce Momjian wrote:
> >>> BTIem is no longer in CVS HEAD, though it was in 8.1.X.  Please update
> >>> your patch for CVS HEAD.  Thanks.
> >> I've posted CVS HEAD workable version on Aug.14.
> >> Please check it out. Thanks.
> >
> > OK, I found it, but it has no updates to README.pgstattuple to describe
> > the new functionality.  Should I write it?
> >
>
>
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> Phone: +81-3-3523-8122

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
Patch applied.  Thanks.

I updated the README documentation for the new functions, attached.  I
could not update the Japanese version of the README.

---------------------------------------------------------------------------


Satoshi Nagayasu wrote:
> Bruce,
>
> Attached patch has been cleaned up,
> and modified to be able to work with CVS HEAD.
>
> Thanks.
>
> Satoshi Nagayasu wrote:
> > Alvaro,
> >
> > Alvaro Herrera wrote:
> >> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
> >>
> >> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
> >> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this
function)
> >>
> >>
> >> While you're at it, please consider removing C++ style comments and
> >> unused code.
> >>
> >> Formatting is way off as well, but I guess that is easily fixed with
> >> pgindent.
> >
> > Thanks for comments. I'm going to fix my patch from now.
> >
> >> Regarding the pg_relpages function, why do you think it's necessary?
> >> (It returns the true number of blocks of a given relation).  It may
> >> belong into core given a reasonable use case, but otherwise it doesn't
> >> seem to belong into pgstatindex (or pgstattuple for that matter).
> >
> > I wanted to sample some pages from the table/index, and get their statistics
> > to know table/index conditions. I know pgstattuple() reports table
> > statistics, however, pgstattuple() generates heavy CPU and I/O load.
> >
> > When we need to sample some pages from table/index, we need to know
> > true number of blocks.
> >
> > I have another function, called pgstatpage(), to get information inside
> > a single block/page statistics of the table. pg_relpages() will be used
> > with this.
> >
> > Sorry for not mentioned in previous post about pgstatpage(),
> > but I've remembered about it just now.
> >
> > Many memories in my brain have already `paged-out` (too busy in last few months),
> > and some of them got `out-of-memory`. :^)
> >
> > Thanks.
>
>
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> Phone: +81-3-3523-8122

> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> --- pgstattuple.orig/Makefile    2006-02-27 21:54:40.000000000 +0900
> +++ pgstattuple/Makefile    2006-08-14 09:28:58.000000000 +0900
> @@ -6,7 +6,7 @@
>  #
>  #-------------------------------------------------------------------------
>
> -SRCS        = pgstattuple.c
> +SRCS        = pgstattuple.c pgstatindex.c
>
>  MODULE_big    = pgstattuple
>  OBJS        = $(SRCS:.c=.o)
> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> --- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/pgstatindex.c    2006-08-14 11:24:23.000000000 +0900
> @@ -0,0 +1,706 @@
> +/*
> + * pgstatindex
> + *
> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
> + *
> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */
> +
> +#include "postgres.h"
> +
> +#include "fmgr.h"
> +#include "funcapi.h"
> +#include "access/heapam.h"
> +#include "access/itup.h"
> +#include "access/nbtree.h"
> +#include "access/transam.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_type.h"
> +#include "utils/builtins.h"
> +#include "utils/inval.h"
> +
> +PG_FUNCTION_INFO_V1(pgstatindex);
> +PG_FUNCTION_INFO_V1(bt_metap);
> +PG_FUNCTION_INFO_V1(bt_page_items);
> +PG_FUNCTION_INFO_V1(bt_page_stats);
> +PG_FUNCTION_INFO_V1(pg_relpages);
> +
> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> +
> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> +#define PGSTATINDEX_NCOLUMNS 10
> +
> +#define BTMETAP_TYPE "public.bt_metap_type"
> +#define BTMETAP_NCOLUMNS 6
> +
> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> +#define BTPAGEITEMS_NCOLUMNS 6
> +
> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> +#define BTPAGESTATS_NCOLUMNS 11
> +
> +
> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> +
> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> +        if ( !(FirstOffsetNumber<=(offset) && \
> +                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
> +             elog(ERROR, "Page offset number out of range."); }
> +
> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> +        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> +             elog(ERROR, "Block number out of range."); }
> +
> +/* ------------------------------------------------
> + * structure for single btree page statistics
> + * ------------------------------------------------
> + */
> +typedef struct BTPageStat
> +{
> +    uint32        blkno;
> +    uint32        live_items;
> +    uint32        dead_items;
> +    uint32        page_size;
> +    uint32        max_avail;
> +    uint32        free_size;
> +    uint32        avg_item_size;
> +    uint32        fragments;
> +    char        type;
> +
> +    /* opaque data */
> +    BlockNumber btpo_prev;
> +    BlockNumber btpo_next;
> +    union
> +    {
> +        uint32        level;
> +        TransactionId xact;
> +    }            btpo;
> +    uint16        btpo_flags;
> +    BTCycleId    btpo_cycleid;
> +}    BTPageStat;
> +
> +/* ------------------------------------------------
> + * A structure for a whole btree index statistics
> + * used by pgstatindex().
> + * ------------------------------------------------
> + */
> +typedef struct BTIndexStat
> +{
> +    uint32        magic;
> +    uint32        version;
> +    BlockNumber root_blkno;
> +    uint32        level;
> +
> +    BlockNumber fastroot;
> +    uint32        fastlevel;
> +
> +    uint32        live_items;
> +    uint32        dead_items;
> +
> +    uint32        root_pages;
> +    uint32        internal_pages;
> +    uint32        leaf_pages;
> +    uint32        empty_pages;
> +    uint32        deleted_pages;
> +
> +    uint32        page_size;
> +    uint32        avg_item_size;
> +
> +    uint32        max_avail;
> +    uint32        free_space;
> +
> +    uint32        fragments;
> +}    BTIndexStat;
> +
> +/* -------------------------------------------------
> + * GetBTPageStatistics()
> + *
> + * Collect statistics of single b-tree leaf page
> + * -------------------------------------------------
> + */
> +static bool
> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
> +{
> +    Page        page = BufferGetPage(buffer);
> +    PageHeader    phdr = (PageHeader) page;
> +    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> +    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +    int            item_size = 0;
> +    int            off;
> +
> +    stat->blkno = blkno;
> +
> +    stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
> +
> +    stat->dead_items = stat->live_items = 0;
> +
> +    stat->page_size = PageGetPageSize(page);
> +
> +    /* page type (flags) */
> +    if (P_ISDELETED(opaque))
> +    {
> +        stat->type = 'd';
> +        return true;
> +    }
> +    else if (P_IGNORE(opaque))
> +        stat->type = 'e';
> +    else if (P_ISLEAF(opaque))
> +        stat->type = 'l';
> +    else if (P_ISROOT(opaque))
> +        stat->type = 'r';
> +    else
> +        stat->type = 'i';
> +
> +    /* btpage opaque data */
> +    stat->btpo_prev = opaque->btpo_prev;
> +    stat->btpo_next = opaque->btpo_next;
> +    if (P_ISDELETED(opaque))
> +        stat->btpo.xact = opaque->btpo.xact;
> +    else
> +        stat->btpo.level = opaque->btpo.level;
> +    stat->btpo_flags = opaque->btpo_flags;
> +    stat->btpo_cycleid = opaque->btpo_cycleid;
> +
> +    /*----------------------------------------------
> +     * If a next leaf is on the previous block,
> +     * it means a fragmentation.
> +     *----------------------------------------------
> +     */
> +    stat->fragments = 0;
> +    if (stat->type == 'l')
> +    {
> +        if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> +            stat->fragments++;
> +    }
> +
> +    /* count live and dead tuples, and free space */
> +    for (off = FirstOffsetNumber; off <= maxoff; off++)
> +    {
> +        IndexTuple    itup;
> +
> +        ItemId        id = PageGetItemId(page, off);
> +
> +        itup = (IndexTuple) PageGetItem(page, id);
> +
> +        item_size += IndexTupleSize(itup);
> +
> +        if (!ItemIdDeleted(id))
> +            stat->live_items++;
> +        else
> +            stat->dead_items++;
> +    }
> +    stat->free_size = PageGetFreeSpace(page);
> +
> +    if ((stat->live_items + stat->dead_items) > 0)
> +        stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
> +    else
> +        stat->avg_item_size = 0;
> +
> +    return true;
> +}
> +
> +
> +/* ------------------------------------------------------
> + * pgstatindex()
> + *
> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> + * ------------------------------------------------------
> + */
> +Datum
> +pgstatindex(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    Datum        result;
> +    uint32        nblocks;
> +    uint32        blkno;
> +    BTIndexStat indexStat;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> +        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> +
> +    /*-------------------
> +     * Read a metapage
> +     *-------------------
> +     */
> +    {
> +        Buffer        buffer = ReadBuffer(rel, 0);
> +        Page        page = BufferGetPage(buffer);
> +        BTMetaPageData *metad = BTPageGetMeta(page);
> +
> +        indexStat.magic = metad->btm_magic;
> +        indexStat.version = metad->btm_version;
> +        indexStat.root_blkno = metad->btm_root;
> +        indexStat.level = metad->btm_level;
> +        indexStat.fastroot = metad->btm_fastroot;
> +        indexStat.fastlevel = metad->btm_fastlevel;
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    nblocks = RelationGetNumberOfBlocks(rel);
> +
> +    /* -- init stat -- */
> +    indexStat.fragments = 0;
> +
> +    indexStat.root_pages = 0;
> +    indexStat.leaf_pages = 0;
> +    indexStat.internal_pages = 0;
> +    indexStat.empty_pages = 0;
> +    indexStat.deleted_pages = 0;
> +
> +    indexStat.max_avail = 0;
> +    indexStat.free_space = 0;
> +
> +    /*-----------------------
> +     * Scan all blocks
> +     *-----------------------
> +     */
> +    for (blkno = 1; blkno < nblocks; blkno++)
> +    {
> +        Buffer        buffer = ReadBuffer(rel, blkno);
> +        BTPageStat    stat;
> +
> +        /* scan one page */
> +        stat.blkno = blkno;
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        /*---------------------
> +         * page status (type)
> +         *---------------------
> +         */
> +        switch (stat.type)
> +        {
> +            case 'd':
> +                indexStat.deleted_pages++;
> +                break;
> +            case 'l':
> +                indexStat.leaf_pages++;
> +                break;
> +            case 'i':
> +                indexStat.internal_pages++;
> +                break;
> +            case 'e':
> +                indexStat.empty_pages++;
> +                break;
> +            case 'r':
> +                indexStat.root_pages++;
> +                break;
> +            default:
> +                elog(ERROR, "unknown page status.");
> +        }
> +
> +        /* -- leaf fragmentation -- */
> +        indexStat.fragments += stat.fragments;
> +
> +        if (stat.type == 'l')
> +        {
> +            indexStat.max_avail += stat.max_avail;
> +            indexStat.free_space += stat.free_size;
> +        }
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    /*----------------------------
> +     * Build a result tuple
> +     *----------------------------
> +     */
> +    {
> +        TupleDesc    tupleDesc;
> +        int            j;
> +        char       *values[PGSTATINDEX_NCOLUMNS];
> +
> +        HeapTupleData tupleData;
> +        HeapTuple    tuple = &tupleData;
> +
> +        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> +                                         indexStat.leaf_pages +
> +                                         indexStat.internal_pages +
> +                                         indexStat.deleted_pages +
> +                                         indexStat.empty_pages) * BLCKSZ);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail *
100.0);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/* -----------------------------------------------
> + * bt_page()
> + *
> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> + * -----------------------------------------------
> + */
> +Datum
> +bt_page_stats(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32        blkno = PG_GETARG_UINT32(1);
> +    Buffer        buffer;
> +
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> +
> +    buffer = ReadBuffer(rel, blkno);
> +
> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> +        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> +
> +    if (blkno == 0)
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    {
> +        HeapTuple    tuple;
> +        TupleDesc    tupleDesc;
> +        int            j;
> +        char       *values[BTPAGESTATS_NCOLUMNS];
> +
> +        BTPageStat    stat;
> +
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.blkno);
> +
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%c", stat.type);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.live_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.dead_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.avg_item_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.page_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.free_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.btpo_prev);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.btpo_next);
> +
> +        values[j] = palloc(32);
> +        if (stat.type == 'd')
> +            snprintf(values[j++], 32, "%d", stat.btpo.xact);
> +        else
> +            snprintf(values[j++], 32, "%d", stat.btpo.level);
> +
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.btpo_flags);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/*-------------------------------------------------------
> + * bt_page_items()
> + *
> + * Get IndexTupleData set in a leaf page
> + *
> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> + *-------------------------------------------------------
> + */
> +/* ---------------------------------------------------
> + * data structure for SRF to hold a scan information
> + * ---------------------------------------------------
> + */
> +struct user_args
> +{
> +    TupleDesc    tupd;
> +    Relation    rel;
> +    Buffer        buffer;
> +    Page        page;
> +    uint16        offset;
> +};
> +
> +Datum
> +bt_page_items(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32        blkno = PG_GETARG_UINT32(1);
> +
> +    RangeVar   *relrv;
> +    Datum        result;
> +    char       *values[BTPAGEITEMS_NCOLUMNS];
> +    BTPageOpaque opaque;
> +    HeapTuple    tuple;
> +    ItemId        id;
> +
> +    FuncCallContext *fctx;
> +    MemoryContext mctx;
> +    struct user_args *uargs = NULL;
> +
> +    if (blkno == 0)
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    if (SRF_IS_FIRSTCALL())
> +    {
> +        fctx = SRF_FIRSTCALL_INIT();
> +        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> +
> +        uargs = palloc(sizeof(struct user_args));
> +
> +        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> +        uargs->offset = FirstOffsetNumber;
> +
> +        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +        uargs->rel = relation_openrv(relrv, AccessShareLock);
> +
> +        CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
> +
> +        uargs->buffer = ReadBuffer(uargs->rel, blkno);
> +
> +        if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
> +            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> +
> +        uargs->page = BufferGetPage(uargs->buffer);
> +
> +        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> +
> +        if (P_ISDELETED(opaque))
> +            elog(NOTICE, "bt_page_items(): this page is deleted.");
> +
> +        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> +        fctx->user_fctx = uargs;
> +
> +        MemoryContextSwitchTo(mctx);
> +    }
> +
> +    fctx = SRF_PERCALL_SETUP();
> +    uargs = fctx->user_fctx;
> +
> +    if (fctx->call_cntr < fctx->max_calls)
> +    {
> +        IndexTuple    itup;
> +
> +        id = PageGetItemId(uargs->page, uargs->offset);
> +
> +        if (!ItemIdIsValid(id))
> +            elog(ERROR, "Invalid ItemId.");
> +
> +        itup = (IndexTuple) PageGetItem(uargs->page, id);
> +
> +        {
> +            int            j = 0;
> +
> +            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> +
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", uargs->offset);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> +
> +            {
> +                int            off;
> +                char       *dump;
> +                char       *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
> +
> +                dump = palloc(IndexTupleSize(itup) * 3);
> +                memset(dump, 0, IndexTupleSize(itup) * 3);
> +
> +                for (off = 0;
> +                     off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
> +                     off++)
> +                {
> +                    if (dump[0] == '\0')
> +                        sprintf(dump, "%02x", *(ptr + off) & 0xff);
> +                    else
> +                    {
> +                        char        buf[4];
> +
> +                        sprintf(buf, " %02x", *(ptr + off) & 0xff);
> +                        strcat(dump, buf);
> +                    }
> +                }
> +                values[j] = dump;
> +            }
> +
> +            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> +            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> +        }
> +
> +        uargs->offset = uargs->offset + 1;
> +
> +        SRF_RETURN_NEXT(fctx, result);
> +    }
> +    else
> +    {
> +        ReleaseBuffer(uargs->buffer);
> +        relation_close(uargs->rel, AccessShareLock);
> +
> +        SRF_RETURN_DONE(fctx);
> +    }
> +}
> +
> +
> +/* ------------------------------------------------
> + * bt_metap()
> + *
> + * Get a btree meta-page information
> + *
> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> + * ------------------------------------------------
> + */
> +Datum
> +bt_metap(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Buffer        buffer;
> +
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> +        elog(ERROR, "bt_metap() can be used only on b-tree index.");
> +
> +    buffer = ReadBuffer(rel, 0);
> +
> +    {
> +        BTMetaPageData *metad;
> +
> +        TupleDesc    tupleDesc;
> +        int            j;
> +        char       *values[BTMETAP_NCOLUMNS];
> +        HeapTuple    tuple;
> +
> +        Page        page = BufferGetPage(buffer);
> +
> +        metad = BTPageGetMeta(page);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_magic);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_root);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/* --------------------------------------------------------
> + * pg_relpages()
> + *
> + * Get a number of pages of the table/index.
> + *
> + * Usage: SELECT pg_relpages('t1');
> + *          SELECT pg_relpages('t1_pkey');
> + * --------------------------------------------------------
> + */
> +Datum
> +pg_relpages(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    int4        relpages;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    relpages = RelationGetNumberOfBlocks(rel);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_INT32(relpages);
> +}
> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> --- pgstattuple.orig/pgstattuple.sql.in    2006-02-28 01:09:50.000000000 +0900
> +++ pgstattuple/pgstattuple.sql.in    2006-08-14 10:37:32.000000000 +0900
> @@ -22,3 +22,96 @@
>  RETURNS pgstattuple_type
>  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>  LANGUAGE C STRICT;
> +
> +--
> +-- pgstatindex
> +--
> +DROP TYPE pgstatindex_type CASCADE;
> +CREATE TYPE pgstatindex_type AS (
> +  version int4,
> +  tree_level int4,
> +  index_size int4,
> +  root_block_no int4,
> +  internal_pages int4,
> +  leaf_pages int4,
> +  empty_pages int4,
> +  deleted_pages int4,
> +  avg_leaf_density float8,
> +  leaf_fragmentation float8
> +);
> +
> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> +RETURNS pgstatindex_type
> +AS 'MODULE_PATHNAME', 'pgstatindex'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_metap()
> +--
> +DROP TYPE bt_metap_type CASCADE;
> +CREATE TYPE bt_metap_type AS (
> +  magic int4,
> +  version int4,
> +  root int4,
> +  level int4,
> +  fastroot int4,
> +  fastlevel int4
> +);
> +
> +CREATE OR REPLACE FUNCTION bt_metap(text)
> +RETURNS bt_metap_type
> +AS 'MODULE_PATHNAME', 'bt_metap'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_items()
> +--
> +DROP TYPE bt_page_items_type CASCADE;
> +CREATE TYPE bt_page_items_type AS (
> +  itemoffset int4,
> +  ctid tid,
> +  itemlen int4,
> +  nulls bool,
> +  vars bool,
> +  data text
> +);
> +
> +DROP FUNCTION bt_page_items(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> +RETURNS SETOF bt_page_items_type
> +AS 'MODULE_PATHNAME', 'bt_page_items'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_stats()
> +--
> +DROP TYPE bt_page_stats_type CASCADE;
> +CREATE TYPE bt_page_stats_type AS (
> +  blkno int4,
> +  type char,
> +  live_items int4,
> +  dead_items int4,
> +  avg_item_size float,
> +  page_size int4,
> +  free_size int4,
> +  btpo_prev int4,
> +  btpo_next int4,
> +  btpo int4,
> +  btpo_flags int4
> +);
> +
> +DROP FUNCTION bt_page_stats(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> +RETURNS bt_page_stats_type
> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- pg_relpages()
> +--
> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> +RETURNS int
> +AS 'MODULE_PATHNAME', 'pg_relpages'
> +LANGUAGE 'C' STRICT;
> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> --- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/test.sh    2006-08-14 10:23:08.000000000 +0900
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +
> +PGHOME=/home/snaga/pgsql20060814
> +export PATH=${PGHOME}/bin:$PATH
> +
> +psql -p 9999 pgbench<<EOF
> +DROP FUNCTION pgstatindex(text);
> +EOF
> +
> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
> +
> +psql -p 9999 pgbench<<EOF
> +SELECT * FROM pg_relpages('accounts_pkey');
> +\x
> +SELECT * FROM pgstatindex('accounts_pkey');
> +SELECT * FROM bt_metap('accounts_pkey');
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> +
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +\x
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 361);
> +EOF
>
--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
pgstattuple README            2002/08/29 Tatsuo Ishii

1. Functions supported:

    pgstattuple
    -----------
    pgstattuple() returns the relation length, percentage of the "dead"
    tuples of a relation and other info. This may help users to determine
    whether vacuum is necessary or not. Here is an example session:

        test=> \x
        Expanded display is on.
        test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
        -[ RECORD 1 ]------+-------
        table_len          | 458752
        tuple_count        | 1470
        tuple_len          | 438896
        tuple_percent      | 95.67
        dead_tuple_count   | 11
        dead_tuple_len     | 3157
        dead_tuple_percent | 0.69
        free_space         | 8932
        free_percent       | 1.95

    Here are explanations for each column:

        table_len        -- physical relation length in bytes
        tuple_count        -- number of live tuples
        tuple_len        -- total tuples length in bytes
        tuple_percent    -- live tuples in %
        dead_tuple_len    -- total dead tuples length in bytes
        dead_tuple_percent    -- dead tuples in %
        free_space        -- free space in bytes
        free_percent    -- free space in %

    pg_relpages
    -----------
    pg_relpages() returns the number of pages in the relation.

    pgstatindex
    -----------
    pgstatindex() returns an array showing the information about an index:

        test=> \x
        Expanded display is on.
        test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
        -[ RECORD 1 ]------+------
        version            | 2
        tree_level         | 0
        index_size         | 8192
        root_block_no      | 1
        internal_pages     | 0
        leaf_pages         | 1
        empty_pages        | 0
        deleted_pages      | 0
        avg_leaf_density   | 50.27
        leaf_fragmentation | 0

    bt_metap
    --------
    bt_metap() returns information about the btree index metapage:

        test=> SELECT * FROM bt_metap('pg_cast_oid_index');
        -[ RECORD 1 ]-----
        magic     | 340322
        version   | 2
        root      | 1
        level     | 0
        fastroot  | 1
        fastlevel | 0

    bt_page_stats
    -------------
    bt_page_stats() shows information about single btree pages:

        test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
        -[ RECORD 1 ]-+-----
        blkno         | 1
        type          | l
        live_items    | 256
        dead_items    | 0
        avg_item_size | 12
        page_size     | 8192
        free_size     | 4056
        btpo_prev     | 0
        btpo_next     | 0
        btpo          | 0
        btpo_flags    | 3

    bt_page_items
    -------------
    bt_page_items() returns information about specific items on btree pages:

        test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
         itemoffset |  ctid   | itemlen | nulls | vars |    data
        ------------+---------+---------+-------+------+-------------
                  1 | (0,1)   |      12 | f     | f    | 23 27 00 00
                  2 | (0,2)   |      12 | f     | f    | 24 27 00 00
                  3 | (0,3)   |      12 | f     | f    | 25 27 00 00
                  4 | (0,4)   |      12 | f     | f    | 26 27 00 00
                  5 | (0,5)   |      12 | f     | f    | 27 27 00 00
                  6 | (0,6)   |      12 | f     | f    | 28 27 00 00
                  7 | (0,7)   |      12 | f     | f    | 29 27 00 00
                  8 | (0,8)   |      12 | f     | f    | 2a 27 00 00


2. Installing pgstattuple

    $ make
    $ make install
    $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test


3. Using pgstattuple

    pgstattuple may be called as a relation function and is
    defined as follows:

    CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
     AS 'MODULE_PATHNAME', 'pgstattuple'
     LANGUAGE C STRICT;

    CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
     AS 'MODULE_PATHNAME', 'pgstattuplebyid'
     LANGUAGE C STRICT;

    The argument is the relation name (optionally it may be qualified)
    or the OID of the relation.  Note that pgstattuple only returns
    one row.


4. Notes

    pgstattuple acquires only a read lock on the relation. So concurrent
    update may affect the result.

    pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
    returns false.


5. History

    2006/06/28

    Extended to work against indexes.

Re: pgstattuple extension for indexes

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Patch applied.  Thanks.

For some reason I expected this patch to correct the portability errors
and design problems identified here:
http://archives.postgresql.org/pgsql-patches/2006-07/msg00100.php

Not only has it not fixed anything, it's made things worse:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels
-fno-strict-aliasing-g -fpic -I. -I../../src/include -D_GNU_SOURCE   -c -o pgstattuple.o pgstattuple.c 
pgstattuple.c: In function 'pgstat_btree':
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'uint64'
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'uint64'
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'uint64'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels
-fno-strict-aliasing-g -fpic -I. -I../../src/include -D_GNU_SOURCE   -c -o pgstatindex.o pgstatindex.c 
pgstatindex.c: In function 'bt_page_items':
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
ar crs libpgstattuple.a pgstattuple.o pgstatindex.o

The only reason the buildfarm isn't crashing on this contrib module is
that it lacks any regression test to crash on.

            regards, tom lane

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Thanks Bruce,

Here are updated Japanese README, and uninstall_pgstattuple.sql.

Bruce Momjian wrote:
> Patch applied.  Thanks.
>
> I updated the README documentation for the new functions, attached.  I
> could not update the Japanese version of the README.
>
> ---------------------------------------------------------------------------
>
>
> Satoshi Nagayasu wrote:
>> Bruce,
>>
>> Attached patch has been cleaned up,
>> and modified to be able to work with CVS HEAD.
>>
>> Thanks.
>>
>> Satoshi Nagayasu wrote:
>>> Alvaro,
>>>
>>> Alvaro Herrera wrote:
>>>> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
>>>>
>>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
>>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this
function)
>>>>
>>>>
>>>> While you're at it, please consider removing C++ style comments and
>>>> unused code.
>>>>
>>>> Formatting is way off as well, but I guess that is easily fixed with
>>>> pgindent.
>>> Thanks for comments. I'm going to fix my patch from now.
>>>
>>>> Regarding the pg_relpages function, why do you think it's necessary?
>>>> (It returns the true number of blocks of a given relation).  It may
>>>> belong into core given a reasonable use case, but otherwise it doesn't
>>>> seem to belong into pgstatindex (or pgstattuple for that matter).
>>> I wanted to sample some pages from the table/index, and get their statistics
>>> to know table/index conditions. I know pgstattuple() reports table
>>> statistics, however, pgstattuple() generates heavy CPU and I/O load.
>>>
>>> When we need to sample some pages from table/index, we need to know
>>> true number of blocks.
>>>
>>> I have another function, called pgstatpage(), to get information inside
>>> a single block/page statistics of the table. pg_relpages() will be used
>>> with this.
>>>
>>> Sorry for not mentioned in previous post about pgstatpage(),
>>> but I've remembered about it just now.
>>>
>>> Many memories in my brain have already `paged-out` (too busy in last few months),
>>> and some of them got `out-of-memory`. :^)
>>>
>>> Thanks.
>>
>> --
>> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
>> Phone: +81-3-3523-8122
>
>> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
>> --- pgstattuple.orig/Makefile    2006-02-27 21:54:40.000000000 +0900
>> +++ pgstattuple/Makefile    2006-08-14 09:28:58.000000000 +0900
>> @@ -6,7 +6,7 @@
>>  #
>>  #-------------------------------------------------------------------------
>>
>> -SRCS        = pgstattuple.c
>> +SRCS        = pgstattuple.c pgstatindex.c
>>
>>  MODULE_big    = pgstattuple
>>  OBJS        = $(SRCS:.c=.o)
>> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
>> --- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
>> +++ pgstattuple/pgstatindex.c    2006-08-14 11:24:23.000000000 +0900
>> @@ -0,0 +1,706 @@
>> +/*
>> + * pgstatindex
>> + *
>> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
>> + *
>> + * Permission to use, copy, modify, and distribute this software and
>> + * its documentation for any purpose, without fee, and without a
>> + * written agreement is hereby granted, provided that the above
>> + * copyright notice and this paragraph and the following two
>> + * paragraphs appear in all copies.
>> + *
>> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
>> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
>> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
>> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
>> + * OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
>> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
>> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
>> + */
>> +
>> +#include "postgres.h"
>> +
>> +#include "fmgr.h"
>> +#include "funcapi.h"
>> +#include "access/heapam.h"
>> +#include "access/itup.h"
>> +#include "access/nbtree.h"
>> +#include "access/transam.h"
>> +#include "catalog/namespace.h"
>> +#include "catalog/pg_type.h"
>> +#include "utils/builtins.h"
>> +#include "utils/inval.h"
>> +
>> +PG_FUNCTION_INFO_V1(pgstatindex);
>> +PG_FUNCTION_INFO_V1(bt_metap);
>> +PG_FUNCTION_INFO_V1(bt_page_items);
>> +PG_FUNCTION_INFO_V1(bt_page_stats);
>> +PG_FUNCTION_INFO_V1(pg_relpages);
>> +
>> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
>> +extern Datum bt_metap(PG_FUNCTION_ARGS);
>> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
>> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
>> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
>> +
>> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
>> +#define PGSTATINDEX_NCOLUMNS 10
>> +
>> +#define BTMETAP_TYPE "public.bt_metap_type"
>> +#define BTMETAP_NCOLUMNS 6
>> +
>> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
>> +#define BTPAGEITEMS_NCOLUMNS 6
>> +
>> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
>> +#define BTPAGESTATS_NCOLUMNS 11
>> +
>> +
>> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
>> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
>> +
>> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
>> +        if ( !(FirstOffsetNumber<=(offset) && \
>> +                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
>> +             elog(ERROR, "Page offset number out of range."); }
>> +
>> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
>> +        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
>> +             elog(ERROR, "Block number out of range."); }
>> +
>> +/* ------------------------------------------------
>> + * structure for single btree page statistics
>> + * ------------------------------------------------
>> + */
>> +typedef struct BTPageStat
>> +{
>> +    uint32        blkno;
>> +    uint32        live_items;
>> +    uint32        dead_items;
>> +    uint32        page_size;
>> +    uint32        max_avail;
>> +    uint32        free_size;
>> +    uint32        avg_item_size;
>> +    uint32        fragments;
>> +    char        type;
>> +
>> +    /* opaque data */
>> +    BlockNumber btpo_prev;
>> +    BlockNumber btpo_next;
>> +    union
>> +    {
>> +        uint32        level;
>> +        TransactionId xact;
>> +    }            btpo;
>> +    uint16        btpo_flags;
>> +    BTCycleId    btpo_cycleid;
>> +}    BTPageStat;
>> +
>> +/* ------------------------------------------------
>> + * A structure for a whole btree index statistics
>> + * used by pgstatindex().
>> + * ------------------------------------------------
>> + */
>> +typedef struct BTIndexStat
>> +{
>> +    uint32        magic;
>> +    uint32        version;
>> +    BlockNumber root_blkno;
>> +    uint32        level;
>> +
>> +    BlockNumber fastroot;
>> +    uint32        fastlevel;
>> +
>> +    uint32        live_items;
>> +    uint32        dead_items;
>> +
>> +    uint32        root_pages;
>> +    uint32        internal_pages;
>> +    uint32        leaf_pages;
>> +    uint32        empty_pages;
>> +    uint32        deleted_pages;
>> +
>> +    uint32        page_size;
>> +    uint32        avg_item_size;
>> +
>> +    uint32        max_avail;
>> +    uint32        free_space;
>> +
>> +    uint32        fragments;
>> +}    BTIndexStat;
>> +
>> +/* -------------------------------------------------
>> + * GetBTPageStatistics()
>> + *
>> + * Collect statistics of single b-tree leaf page
>> + * -------------------------------------------------
>> + */
>> +static bool
>> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
>> +{
>> +    Page        page = BufferGetPage(buffer);
>> +    PageHeader    phdr = (PageHeader) page;
>> +    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
>> +    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
>> +    int            item_size = 0;
>> +    int            off;
>> +
>> +    stat->blkno = blkno;
>> +
>> +    stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
>> +
>> +    stat->dead_items = stat->live_items = 0;
>> +
>> +    stat->page_size = PageGetPageSize(page);
>> +
>> +    /* page type (flags) */
>> +    if (P_ISDELETED(opaque))
>> +    {
>> +        stat->type = 'd';
>> +        return true;
>> +    }
>> +    else if (P_IGNORE(opaque))
>> +        stat->type = 'e';
>> +    else if (P_ISLEAF(opaque))
>> +        stat->type = 'l';
>> +    else if (P_ISROOT(opaque))
>> +        stat->type = 'r';
>> +    else
>> +        stat->type = 'i';
>> +
>> +    /* btpage opaque data */
>> +    stat->btpo_prev = opaque->btpo_prev;
>> +    stat->btpo_next = opaque->btpo_next;
>> +    if (P_ISDELETED(opaque))
>> +        stat->btpo.xact = opaque->btpo.xact;
>> +    else
>> +        stat->btpo.level = opaque->btpo.level;
>> +    stat->btpo_flags = opaque->btpo_flags;
>> +    stat->btpo_cycleid = opaque->btpo_cycleid;
>> +
>> +    /*----------------------------------------------
>> +     * If a next leaf is on the previous block,
>> +     * it means a fragmentation.
>> +     *----------------------------------------------
>> +     */
>> +    stat->fragments = 0;
>> +    if (stat->type == 'l')
>> +    {
>> +        if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
>> +            stat->fragments++;
>> +    }
>> +
>> +    /* count live and dead tuples, and free space */
>> +    for (off = FirstOffsetNumber; off <= maxoff; off++)
>> +    {
>> +        IndexTuple    itup;
>> +
>> +        ItemId        id = PageGetItemId(page, off);
>> +
>> +        itup = (IndexTuple) PageGetItem(page, id);
>> +
>> +        item_size += IndexTupleSize(itup);
>> +
>> +        if (!ItemIdDeleted(id))
>> +            stat->live_items++;
>> +        else
>> +            stat->dead_items++;
>> +    }
>> +    stat->free_size = PageGetFreeSpace(page);
>> +
>> +    if ((stat->live_items + stat->dead_items) > 0)
>> +        stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
>> +    else
>> +        stat->avg_item_size = 0;
>> +
>> +    return true;
>> +}
>> +
>> +
>> +/* ------------------------------------------------------
>> + * pgstatindex()
>> + *
>> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
>> + * ------------------------------------------------------
>> + */
>> +Datum
>> +pgstatindex(PG_FUNCTION_ARGS)
>> +{
>> +    text       *relname = PG_GETARG_TEXT_P(0);
>> +    Relation    rel;
>> +    RangeVar   *relrv;
>> +    Datum        result;
>> +    uint32        nblocks;
>> +    uint32        blkno;
>> +    BTIndexStat indexStat;
>> +
>> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +    rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> +        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
>> +
>> +    /*-------------------
>> +     * Read a metapage
>> +     *-------------------
>> +     */
>> +    {
>> +        Buffer        buffer = ReadBuffer(rel, 0);
>> +        Page        page = BufferGetPage(buffer);
>> +        BTMetaPageData *metad = BTPageGetMeta(page);
>> +
>> +        indexStat.magic = metad->btm_magic;
>> +        indexStat.version = metad->btm_version;
>> +        indexStat.root_blkno = metad->btm_root;
>> +        indexStat.level = metad->btm_level;
>> +        indexStat.fastroot = metad->btm_fastroot;
>> +        indexStat.fastlevel = metad->btm_fastlevel;
>> +
>> +        ReleaseBuffer(buffer);
>> +    }
>> +
>> +    nblocks = RelationGetNumberOfBlocks(rel);
>> +
>> +    /* -- init stat -- */
>> +    indexStat.fragments = 0;
>> +
>> +    indexStat.root_pages = 0;
>> +    indexStat.leaf_pages = 0;
>> +    indexStat.internal_pages = 0;
>> +    indexStat.empty_pages = 0;
>> +    indexStat.deleted_pages = 0;
>> +
>> +    indexStat.max_avail = 0;
>> +    indexStat.free_space = 0;
>> +
>> +    /*-----------------------
>> +     * Scan all blocks
>> +     *-----------------------
>> +     */
>> +    for (blkno = 1; blkno < nblocks; blkno++)
>> +    {
>> +        Buffer        buffer = ReadBuffer(rel, blkno);
>> +        BTPageStat    stat;
>> +
>> +        /* scan one page */
>> +        stat.blkno = blkno;
>> +        GetBTPageStatistics(blkno, buffer, &stat);
>> +
>> +        /*---------------------
>> +         * page status (type)
>> +         *---------------------
>> +         */
>> +        switch (stat.type)
>> +        {
>> +            case 'd':
>> +                indexStat.deleted_pages++;
>> +                break;
>> +            case 'l':
>> +                indexStat.leaf_pages++;
>> +                break;
>> +            case 'i':
>> +                indexStat.internal_pages++;
>> +                break;
>> +            case 'e':
>> +                indexStat.empty_pages++;
>> +                break;
>> +            case 'r':
>> +                indexStat.root_pages++;
>> +                break;
>> +            default:
>> +                elog(ERROR, "unknown page status.");
>> +        }
>> +
>> +        /* -- leaf fragmentation -- */
>> +        indexStat.fragments += stat.fragments;
>> +
>> +        if (stat.type == 'l')
>> +        {
>> +            indexStat.max_avail += stat.max_avail;
>> +            indexStat.free_space += stat.free_size;
>> +        }
>> +
>> +        ReleaseBuffer(buffer);
>> +    }
>> +
>> +    relation_close(rel, AccessShareLock);
>> +
>> +    /*----------------------------
>> +     * Build a result tuple
>> +     *----------------------------
>> +     */
>> +    {
>> +        TupleDesc    tupleDesc;
>> +        int            j;
>> +        char       *values[PGSTATINDEX_NCOLUMNS];
>> +
>> +        HeapTupleData tupleData;
>> +        HeapTuple    tuple = &tupleData;
>> +
>> +        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
>> +
>> +        j = 0;
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", indexStat.version);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", indexStat.level);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
>> +                                         indexStat.leaf_pages +
>> +                                         indexStat.internal_pages +
>> +                                         indexStat.deleted_pages +
>> +                                         indexStat.empty_pages) * BLCKSZ);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail *
100.0);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
>> +
>> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> +                                       values);
>> +
>> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> +    }
>> +
>> +    PG_RETURN_DATUM(result);
>> +}
>> +
>> +/* -----------------------------------------------
>> + * bt_page()
>> + *
>> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
>> + * -----------------------------------------------
>> + */
>> +Datum
>> +bt_page_stats(PG_FUNCTION_ARGS)
>> +{
>> +    text       *relname = PG_GETARG_TEXT_P(0);
>> +    uint32        blkno = PG_GETARG_UINT32(1);
>> +    Buffer        buffer;
>> +
>> +    Relation    rel;
>> +    RangeVar   *relrv;
>> +    Datum        result;
>> +
>> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +    rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
>> +
>> +    buffer = ReadBuffer(rel, blkno);
>> +
>> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> +        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
>> +
>> +    if (blkno == 0)
>> +        elog(ERROR, "Block 0 is a meta page.");
>> +
>> +    {
>> +        HeapTuple    tuple;
>> +        TupleDesc    tupleDesc;
>> +        int            j;
>> +        char       *values[BTPAGESTATS_NCOLUMNS];
>> +
>> +        BTPageStat    stat;
>> +
>> +        GetBTPageStatistics(blkno, buffer, &stat);
>> +
>> +        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
>> +
>> +        j = 0;
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.blkno);
>> +
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%c", stat.type);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.live_items);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.dead_items);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.avg_item_size);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.page_size);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.free_size);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.btpo_prev);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.btpo_next);
>> +
>> +        values[j] = palloc(32);
>> +        if (stat.type == 'd')
>> +            snprintf(values[j++], 32, "%d", stat.btpo.xact);
>> +        else
>> +            snprintf(values[j++], 32, "%d", stat.btpo.level);
>> +
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", stat.btpo_flags);
>> +
>> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> +                                       values);
>> +
>> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> +    }
>> +
>> +    ReleaseBuffer(buffer);
>> +
>> +    relation_close(rel, AccessShareLock);
>> +
>> +    PG_RETURN_DATUM(result);
>> +}
>> +
>> +/*-------------------------------------------------------
>> + * bt_page_items()
>> + *
>> + * Get IndexTupleData set in a leaf page
>> + *
>> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
>> + *-------------------------------------------------------
>> + */
>> +/* ---------------------------------------------------
>> + * data structure for SRF to hold a scan information
>> + * ---------------------------------------------------
>> + */
>> +struct user_args
>> +{
>> +    TupleDesc    tupd;
>> +    Relation    rel;
>> +    Buffer        buffer;
>> +    Page        page;
>> +    uint16        offset;
>> +};
>> +
>> +Datum
>> +bt_page_items(PG_FUNCTION_ARGS)
>> +{
>> +    text       *relname = PG_GETARG_TEXT_P(0);
>> +    uint32        blkno = PG_GETARG_UINT32(1);
>> +
>> +    RangeVar   *relrv;
>> +    Datum        result;
>> +    char       *values[BTPAGEITEMS_NCOLUMNS];
>> +    BTPageOpaque opaque;
>> +    HeapTuple    tuple;
>> +    ItemId        id;
>> +
>> +    FuncCallContext *fctx;
>> +    MemoryContext mctx;
>> +    struct user_args *uargs = NULL;
>> +
>> +    if (blkno == 0)
>> +        elog(ERROR, "Block 0 is a meta page.");
>> +
>> +    if (SRF_IS_FIRSTCALL())
>> +    {
>> +        fctx = SRF_FIRSTCALL_INIT();
>> +        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
>> +
>> +        uargs = palloc(sizeof(struct user_args));
>> +
>> +        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
>> +        uargs->offset = FirstOffsetNumber;
>> +
>> +        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +        uargs->rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +        CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
>> +
>> +        uargs->buffer = ReadBuffer(uargs->rel, blkno);
>> +
>> +        if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
>> +            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
>> +
>> +        uargs->page = BufferGetPage(uargs->buffer);
>> +
>> +        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
>> +
>> +        if (P_ISDELETED(opaque))
>> +            elog(NOTICE, "bt_page_items(): this page is deleted.");
>> +
>> +        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
>> +        fctx->user_fctx = uargs;
>> +
>> +        MemoryContextSwitchTo(mctx);
>> +    }
>> +
>> +    fctx = SRF_PERCALL_SETUP();
>> +    uargs = fctx->user_fctx;
>> +
>> +    if (fctx->call_cntr < fctx->max_calls)
>> +    {
>> +        IndexTuple    itup;
>> +
>> +        id = PageGetItemId(uargs->page, uargs->offset);
>> +
>> +        if (!ItemIdIsValid(id))
>> +            elog(ERROR, "Invalid ItemId.");
>> +
>> +        itup = (IndexTuple) PageGetItem(uargs->page, id);
>> +
>> +        {
>> +            int            j = 0;
>> +
>> +            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
>> +
>> +            values[j] = palloc(32);
>> +            snprintf(values[j++], 32, "%d", uargs->offset);
>> +            values[j] = palloc(32);
>> +            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
>> +            values[j] = palloc(32);
>> +            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
>> +            values[j] = palloc(32);
>> +            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
>> +            values[j] = palloc(32);
>> +            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
>> +
>> +            {
>> +                int            off;
>> +                char       *dump;
>> +                char       *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
>> +
>> +                dump = palloc(IndexTupleSize(itup) * 3);
>> +                memset(dump, 0, IndexTupleSize(itup) * 3);
>> +
>> +                for (off = 0;
>> +                     off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
>> +                     off++)
>> +                {
>> +                    if (dump[0] == '\0')
>> +                        sprintf(dump, "%02x", *(ptr + off) & 0xff);
>> +                    else
>> +                    {
>> +                        char        buf[4];
>> +
>> +                        sprintf(buf, " %02x", *(ptr + off) & 0xff);
>> +                        strcat(dump, buf);
>> +                    }
>> +                }
>> +                values[j] = dump;
>> +            }
>> +
>> +            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
>> +            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
>> +        }
>> +
>> +        uargs->offset = uargs->offset + 1;
>> +
>> +        SRF_RETURN_NEXT(fctx, result);
>> +    }
>> +    else
>> +    {
>> +        ReleaseBuffer(uargs->buffer);
>> +        relation_close(uargs->rel, AccessShareLock);
>> +
>> +        SRF_RETURN_DONE(fctx);
>> +    }
>> +}
>> +
>> +
>> +/* ------------------------------------------------
>> + * bt_metap()
>> + *
>> + * Get a btree meta-page information
>> + *
>> + * Usage: SELECT * FROM bt_metap('t1_pkey')
>> + * ------------------------------------------------
>> + */
>> +Datum
>> +bt_metap(PG_FUNCTION_ARGS)
>> +{
>> +    text       *relname = PG_GETARG_TEXT_P(0);
>> +    Buffer        buffer;
>> +
>> +    Relation    rel;
>> +    RangeVar   *relrv;
>> +    Datum        result;
>> +
>> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +    rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> +        elog(ERROR, "bt_metap() can be used only on b-tree index.");
>> +
>> +    buffer = ReadBuffer(rel, 0);
>> +
>> +    {
>> +        BTMetaPageData *metad;
>> +
>> +        TupleDesc    tupleDesc;
>> +        int            j;
>> +        char       *values[BTMETAP_NCOLUMNS];
>> +        HeapTuple    tuple;
>> +
>> +        Page        page = BufferGetPage(buffer);
>> +
>> +        metad = BTPageGetMeta(page);
>> +
>> +        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
>> +
>> +        j = 0;
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", metad->btm_magic);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", metad->btm_version);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", metad->btm_root);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", metad->btm_level);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
>> +        values[j] = palloc(32);
>> +        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
>> +
>> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> +                                       values);
>> +
>> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> +    }
>> +
>> +    ReleaseBuffer(buffer);
>> +
>> +    relation_close(rel, AccessShareLock);
>> +
>> +    PG_RETURN_DATUM(result);
>> +}
>> +
>> +/* --------------------------------------------------------
>> + * pg_relpages()
>> + *
>> + * Get a number of pages of the table/index.
>> + *
>> + * Usage: SELECT pg_relpages('t1');
>> + *          SELECT pg_relpages('t1_pkey');
>> + * --------------------------------------------------------
>> + */
>> +Datum
>> +pg_relpages(PG_FUNCTION_ARGS)
>> +{
>> +    text       *relname = PG_GETARG_TEXT_P(0);
>> +
>> +    Relation    rel;
>> +    RangeVar   *relrv;
>> +    int4        relpages;
>> +
>> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +    rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +    relpages = RelationGetNumberOfBlocks(rel);
>> +
>> +    relation_close(rel, AccessShareLock);
>> +
>> +    PG_RETURN_INT32(relpages);
>> +}
>> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
>> --- pgstattuple.orig/pgstattuple.sql.in    2006-02-28 01:09:50.000000000 +0900
>> +++ pgstattuple/pgstattuple.sql.in    2006-08-14 10:37:32.000000000 +0900
>> @@ -22,3 +22,96 @@
>>  RETURNS pgstattuple_type
>>  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>>  LANGUAGE C STRICT;
>> +
>> +--
>> +-- pgstatindex
>> +--
>> +DROP TYPE pgstatindex_type CASCADE;
>> +CREATE TYPE pgstatindex_type AS (
>> +  version int4,
>> +  tree_level int4,
>> +  index_size int4,
>> +  root_block_no int4,
>> +  internal_pages int4,
>> +  leaf_pages int4,
>> +  empty_pages int4,
>> +  deleted_pages int4,
>> +  avg_leaf_density float8,
>> +  leaf_fragmentation float8
>> +);
>> +
>> +CREATE OR REPLACE FUNCTION pgstatindex(text)
>> +RETURNS pgstatindex_type
>> +AS 'MODULE_PATHNAME', 'pgstatindex'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_metap()
>> +--
>> +DROP TYPE bt_metap_type CASCADE;
>> +CREATE TYPE bt_metap_type AS (
>> +  magic int4,
>> +  version int4,
>> +  root int4,
>> +  level int4,
>> +  fastroot int4,
>> +  fastlevel int4
>> +);
>> +
>> +CREATE OR REPLACE FUNCTION bt_metap(text)
>> +RETURNS bt_metap_type
>> +AS 'MODULE_PATHNAME', 'bt_metap'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_page_items()
>> +--
>> +DROP TYPE bt_page_items_type CASCADE;
>> +CREATE TYPE bt_page_items_type AS (
>> +  itemoffset int4,
>> +  ctid tid,
>> +  itemlen int4,
>> +  nulls bool,
>> +  vars bool,
>> +  data text
>> +);
>> +
>> +DROP FUNCTION bt_page_items(text, int4);
>> +
>> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
>> +RETURNS SETOF bt_page_items_type
>> +AS 'MODULE_PATHNAME', 'bt_page_items'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_page_stats()
>> +--
>> +DROP TYPE bt_page_stats_type CASCADE;
>> +CREATE TYPE bt_page_stats_type AS (
>> +  blkno int4,
>> +  type char,
>> +  live_items int4,
>> +  dead_items int4,
>> +  avg_item_size float,
>> +  page_size int4,
>> +  free_size int4,
>> +  btpo_prev int4,
>> +  btpo_next int4,
>> +  btpo int4,
>> +  btpo_flags int4
>> +);
>> +
>> +DROP FUNCTION bt_page_stats(text, int4);
>> +
>> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
>> +RETURNS bt_page_stats_type
>> +AS 'MODULE_PATHNAME', 'bt_page_stats'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- pg_relpages()
>> +--
>> +CREATE OR REPLACE FUNCTION pg_relpages(text)
>> +RETURNS int
>> +AS 'MODULE_PATHNAME', 'pg_relpages'
>> +LANGUAGE 'C' STRICT;
>> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
>> --- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
>> +++ pgstattuple/test.sh    2006-08-14 10:23:08.000000000 +0900
>> @@ -0,0 +1,27 @@
>> +#!/bin/sh
>> +
>> +PGHOME=/home/snaga/pgsql20060814
>> +export PATH=${PGHOME}/bin:$PATH
>> +
>> +psql -p 9999 pgbench<<EOF
>> +DROP FUNCTION pgstatindex(text);
>> +EOF
>> +
>> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
>> +
>> +psql -p 9999 pgbench<<EOF
>> +SELECT * FROM pg_relpages('accounts_pkey');
>> +\x
>> +SELECT * FROM pgstatindex('accounts_pkey');
>> +SELECT * FROM bt_metap('accounts_pkey');
>> +\x
>> +SELECT * FROM bt_page_items('accounts_pkey', 0);
>> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
>> +
>> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
>> +\x
>> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
>> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
>> +\x
>> +SELECT * FROM bt_page_items('accounts_pkey', 361);
>> +EOF
>>
>>
>> ------------------------------------------------------------------------
>>
>> pgstattuple README            2002/08/29 Tatsuo Ishii
>>
>> 1. Functions supported:
>>
>>     pgstattuple
>>     -----------
>>     pgstattuple() returns the relation length, percentage of the "dead"
>>     tuples of a relation and other info. This may help users to determine
>>     whether vacuum is necessary or not. Here is an example session:
>>
>>         test=> \x
>>         Expanded display is on.
>>         test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
>>         -[ RECORD 1 ]------+-------
>>         table_len          | 458752
>>         tuple_count        | 1470
>>         tuple_len          | 438896
>>         tuple_percent      | 95.67
>>         dead_tuple_count   | 11
>>         dead_tuple_len     | 3157
>>         dead_tuple_percent | 0.69
>>         free_space         | 8932
>>         free_percent       | 1.95
>>
>>     Here are explanations for each column:
>>
>>         table_len        -- physical relation length in bytes
>>         tuple_count        -- number of live tuples
>>         tuple_len        -- total tuples length in bytes
>>         tuple_percent    -- live tuples in %
>>         dead_tuple_len    -- total dead tuples length in bytes
>>         dead_tuple_percent    -- dead tuples in %
>>         free_space        -- free space in bytes
>>         free_percent    -- free space in %
>>
>>     pg_relpages
>>     -----------
>>     pg_relpages() returns the number of pages in the relation.
>>
>>     pgstatindex
>>     -----------
>>     pgstatindex() returns an array showing the information about an index:
>>
>>         test=> \x
>>         Expanded display is on.
>>         test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
>>         -[ RECORD 1 ]------+------
>>         version            | 2
>>         tree_level         | 0
>>         index_size         | 8192
>>         root_block_no      | 1
>>         internal_pages     | 0
>>         leaf_pages         | 1
>>         empty_pages        | 0
>>         deleted_pages      | 0
>>         avg_leaf_density   | 50.27
>>         leaf_fragmentation | 0
>>
>>     bt_metap
>>     --------
>>     bt_metap() returns information about the btree index metapage:
>>
>>         test=> SELECT * FROM bt_metap('pg_cast_oid_index');
>>         -[ RECORD 1 ]-----
>>         magic     | 340322
>>         version   | 2
>>         root      | 1
>>         level     | 0
>>         fastroot  | 1
>>         fastlevel | 0
>>
>>     bt_page_stats
>>     -------------
>>     bt_page_stats() shows information about single btree pages:
>>
>>         test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
>>         -[ RECORD 1 ]-+-----
>>         blkno         | 1
>>         type          | l
>>         live_items    | 256
>>         dead_items    | 0
>>         avg_item_size | 12
>>         page_size     | 8192
>>         free_size     | 4056
>>         btpo_prev     | 0
>>         btpo_next     | 0
>>         btpo          | 0
>>         btpo_flags    | 3
>>
>>     bt_page_items
>>     -------------
>>     bt_page_items() returns information about specific items on btree pages:
>>
>>         test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
>>          itemoffset |  ctid   | itemlen | nulls | vars |    data
>>         ------------+---------+---------+-------+------+-------------
>>                   1 | (0,1)   |      12 | f     | f    | 23 27 00 00
>>                   2 | (0,2)   |      12 | f     | f    | 24 27 00 00
>>                   3 | (0,3)   |      12 | f     | f    | 25 27 00 00
>>                   4 | (0,4)   |      12 | f     | f    | 26 27 00 00
>>                   5 | (0,5)   |      12 | f     | f    | 27 27 00 00
>>                   6 | (0,6)   |      12 | f     | f    | 28 27 00 00
>>                   7 | (0,7)   |      12 | f     | f    | 29 27 00 00
>>                   8 | (0,8)   |      12 | f     | f    | 2a 27 00 00
>>
>>
>> 2. Installing pgstattuple
>>
>>     $ make
>>     $ make install
>>     $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
>>
>>
>> 3. Using pgstattuple
>>
>>     pgstattuple may be called as a relation function and is
>>     defined as follows:
>>
>>     CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
>>      AS 'MODULE_PATHNAME', 'pgstattuple'
>>      LANGUAGE C STRICT;
>>
>>     CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
>>      AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>>      LANGUAGE C STRICT;
>>
>>     The argument is the relation name (optionally it may be qualified)
>>     or the OID of the relation.  Note that pgstattuple only returns
>>     one row.
>>
>>
>> 4. Notes
>>
>>     pgstattuple acquires only a read lock on the relation. So concurrent
>>     update may affect the result.
>>
>>     pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
>>     returns false.
>>
>>
>> 5. History
>>
>>     2006/06/28
>>
>>     Extended to work against indexes.


--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

Re: pgstattuple extension for indexes

From
Satoshi Nagayasu
Date:
Tom Lane wrote:
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels
-fno-strict-aliasing-g -fpic -I. -I../../src/include -D_GNU_SOURCE   -c -o pgstatindex.o pgstatindex.c 
> pgstatindex.c: In function 'bt_page_items':
> pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
> pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'

I guess my '%d' should be '%zd', right?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
*** pgstatindex.c    2006-09-03 02:05:29.000000000 +0900
--- pgstatindex.c.new    2006-09-04 08:22:42.000000000 +0900
***************
*** 561,567 ****
              values[j] = palloc(32);
              snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
              values[j] = palloc(32);
!             snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
              values[j] = palloc(32);
              snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
              values[j] = palloc(32);
--- 561,567 ----
              values[j] = palloc(32);
              snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
              values[j] = palloc(32);
!             snprintf(values[j++], 32, "%zd", IndexTupleSize(itup));
              values[j] = palloc(32);
              snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
              values[j] = palloc(32);

Re: pgstattuple extension for indexes

From
ITAGAKI Takahiro
Date:
Bruce Momjian <bruce@momjian.us> wrote:
> Patch applied.  Thanks.

The two attached patches fix contrib/pgstattuple.

pgstattuple.c.diff removes the fragmemtation reporting in pgstattuple().
It is no longer needed, because pgstatindex() has upward functionality now.
Also, the report using elog was judged as improper in earlier discusses.

pgstattuple.sql.in.diff removes DROP statements in the installer. The DROP
statements make some unpleasant ERROR logs during install. According to
other contrib modules, DROPs should be in the uninstaller and should not
be in the installer.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment

Re: pgstattuple extension for indexes

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> The two attached patches fix contrib/pgstattuple.

Good, applied.  I made some additional changes to get install/uninstall/
reinstall to work cleanly after the latest additions, and to get it to
compile without warnings on a 64-bit Fedora machine.  (It seems to
actually work there, too.)

I notice that the original pgstattuple() function comes in two flavors,
one with OID input and one with text-relation-name input.  Shouldn't all
the others be likewise?

            regards, tom lane

Re: pgstattuple extension for indexes

From
Tom Lane
Date:
Satoshi Nagayasu <nagayasus@nttdata.co.jp> writes:
> Tom Lane wrote:
>> pgstatindex.c: In function 'bt_page_items':
>> pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'

> I guess my '%d' should be '%zd', right?

No, that sounds even less portable :-(

Given the expected range of IndexTupleSize(), it seems sufficient to
cast its result to int and then use %d formatting.  I've done that
in the latest commit.

            regards, tom lane

Re: pgstattuple extension for indexes

From
Bruce Momjian
Date:
Applied.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:
> Thanks Bruce,
>
> Here are updated Japanese README, and uninstall_pgstattuple.sql.
>
> Bruce Momjian wrote:
> > Patch applied.  Thanks.
> >
> > I updated the README documentation for the new functions, attached.  I
> > could not update the Japanese version of the README.
> >
> > ---------------------------------------------------------------------------
> >
> >
> > Satoshi Nagayasu wrote:
> >> Bruce,
> >>
> >> Attached patch has been cleaned up,
> >> and modified to be able to work with CVS HEAD.
> >>
> >> Thanks.
> >>
> >> Satoshi Nagayasu wrote:
> >>> Alvaro,
> >>>
> >>> Alvaro Herrera wrote:
> >>>> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
> >>>>
> >>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
> >>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this
function)
> >>>>
> >>>>
> >>>> While you're at it, please consider removing C++ style comments and
> >>>> unused code.
> >>>>
> >>>> Formatting is way off as well, but I guess that is easily fixed with
> >>>> pgindent.
> >>> Thanks for comments. I'm going to fix my patch from now.
> >>>
> >>>> Regarding the pg_relpages function, why do you think it's necessary?
> >>>> (It returns the true number of blocks of a given relation).  It may
> >>>> belong into core given a reasonable use case, but otherwise it doesn't
> >>>> seem to belong into pgstatindex (or pgstattuple for that matter).
> >>> I wanted to sample some pages from the table/index, and get their statistics
> >>> to know table/index conditions. I know pgstattuple() reports table
> >>> statistics, however, pgstattuple() generates heavy CPU and I/O load.
> >>>
> >>> When we need to sample some pages from table/index, we need to know
> >>> true number of blocks.
> >>>
> >>> I have another function, called pgstatpage(), to get information inside
> >>> a single block/page statistics of the table. pg_relpages() will be used
> >>> with this.
> >>>
> >>> Sorry for not mentioned in previous post about pgstatpage(),
> >>> but I've remembered about it just now.
> >>>
> >>> Many memories in my brain have already `paged-out` (too busy in last few months),
> >>> and some of them got `out-of-memory`. :^)
> >>>
> >>> Thanks.
> >>
> >> --
> >> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> >> Phone: +81-3-3523-8122
> >
> >> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> >> --- pgstattuple.orig/Makefile    2006-02-27 21:54:40.000000000 +0900
> >> +++ pgstattuple/Makefile    2006-08-14 09:28:58.000000000 +0900
> >> @@ -6,7 +6,7 @@
> >>  #
> >>  #-------------------------------------------------------------------------
> >>
> >> -SRCS        = pgstattuple.c
> >> +SRCS        = pgstattuple.c pgstatindex.c
> >>
> >>  MODULE_big    = pgstattuple
> >>  OBJS        = $(SRCS:.c=.o)
> >> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> >> --- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
> >> +++ pgstattuple/pgstatindex.c    2006-08-14 11:24:23.000000000 +0900
> >> @@ -0,0 +1,706 @@
> >> +/*
> >> + * pgstatindex
> >> + *
> >> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
> >> + *
> >> + * Permission to use, copy, modify, and distribute this software and
> >> + * its documentation for any purpose, without fee, and without a
> >> + * written agreement is hereby granted, provided that the above
> >> + * copyright notice and this paragraph and the following two
> >> + * paragraphs appear in all copies.
> >> + *
> >> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> >> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> >> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> >> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> >> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> >> + *
> >> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> >> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> >> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> >> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> >> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> >> + */
> >> +
> >> +#include "postgres.h"
> >> +
> >> +#include "fmgr.h"
> >> +#include "funcapi.h"
> >> +#include "access/heapam.h"
> >> +#include "access/itup.h"
> >> +#include "access/nbtree.h"
> >> +#include "access/transam.h"
> >> +#include "catalog/namespace.h"
> >> +#include "catalog/pg_type.h"
> >> +#include "utils/builtins.h"
> >> +#include "utils/inval.h"
> >> +
> >> +PG_FUNCTION_INFO_V1(pgstatindex);
> >> +PG_FUNCTION_INFO_V1(bt_metap);
> >> +PG_FUNCTION_INFO_V1(bt_page_items);
> >> +PG_FUNCTION_INFO_V1(bt_page_stats);
> >> +PG_FUNCTION_INFO_V1(pg_relpages);
> >> +
> >> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> >> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> >> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> >> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> >> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> >> +
> >> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> >> +#define PGSTATINDEX_NCOLUMNS 10
> >> +
> >> +#define BTMETAP_TYPE "public.bt_metap_type"
> >> +#define BTMETAP_NCOLUMNS 6
> >> +
> >> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> >> +#define BTPAGEITEMS_NCOLUMNS 6
> >> +
> >> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> >> +#define BTPAGESTATS_NCOLUMNS 11
> >> +
> >> +
> >> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> >> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> >> +
> >> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> >> +        if ( !(FirstOffsetNumber<=(offset) && \
> >> +                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
> >> +             elog(ERROR, "Page offset number out of range."); }
> >> +
> >> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> >> +        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> >> +             elog(ERROR, "Block number out of range."); }
> >> +
> >> +/* ------------------------------------------------
> >> + * structure for single btree page statistics
> >> + * ------------------------------------------------
> >> + */
> >> +typedef struct BTPageStat
> >> +{
> >> +    uint32        blkno;
> >> +    uint32        live_items;
> >> +    uint32        dead_items;
> >> +    uint32        page_size;
> >> +    uint32        max_avail;
> >> +    uint32        free_size;
> >> +    uint32        avg_item_size;
> >> +    uint32        fragments;
> >> +    char        type;
> >> +
> >> +    /* opaque data */
> >> +    BlockNumber btpo_prev;
> >> +    BlockNumber btpo_next;
> >> +    union
> >> +    {
> >> +        uint32        level;
> >> +        TransactionId xact;
> >> +    }            btpo;
> >> +    uint16        btpo_flags;
> >> +    BTCycleId    btpo_cycleid;
> >> +}    BTPageStat;
> >> +
> >> +/* ------------------------------------------------
> >> + * A structure for a whole btree index statistics
> >> + * used by pgstatindex().
> >> + * ------------------------------------------------
> >> + */
> >> +typedef struct BTIndexStat
> >> +{
> >> +    uint32        magic;
> >> +    uint32        version;
> >> +    BlockNumber root_blkno;
> >> +    uint32        level;
> >> +
> >> +    BlockNumber fastroot;
> >> +    uint32        fastlevel;
> >> +
> >> +    uint32        live_items;
> >> +    uint32        dead_items;
> >> +
> >> +    uint32        root_pages;
> >> +    uint32        internal_pages;
> >> +    uint32        leaf_pages;
> >> +    uint32        empty_pages;
> >> +    uint32        deleted_pages;
> >> +
> >> +    uint32        page_size;
> >> +    uint32        avg_item_size;
> >> +
> >> +    uint32        max_avail;
> >> +    uint32        free_space;
> >> +
> >> +    uint32        fragments;
> >> +}    BTIndexStat;
> >> +
> >> +/* -------------------------------------------------
> >> + * GetBTPageStatistics()
> >> + *
> >> + * Collect statistics of single b-tree leaf page
> >> + * -------------------------------------------------
> >> + */
> >> +static bool
> >> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
> >> +{
> >> +    Page        page = BufferGetPage(buffer);
> >> +    PageHeader    phdr = (PageHeader) page;
> >> +    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> >> +    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> >> +    int            item_size = 0;
> >> +    int            off;
> >> +
> >> +    stat->blkno = blkno;
> >> +
> >> +    stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
> >> +
> >> +    stat->dead_items = stat->live_items = 0;
> >> +
> >> +    stat->page_size = PageGetPageSize(page);
> >> +
> >> +    /* page type (flags) */
> >> +    if (P_ISDELETED(opaque))
> >> +    {
> >> +        stat->type = 'd';
> >> +        return true;
> >> +    }
> >> +    else if (P_IGNORE(opaque))
> >> +        stat->type = 'e';
> >> +    else if (P_ISLEAF(opaque))
> >> +        stat->type = 'l';
> >> +    else if (P_ISROOT(opaque))
> >> +        stat->type = 'r';
> >> +    else
> >> +        stat->type = 'i';
> >> +
> >> +    /* btpage opaque data */
> >> +    stat->btpo_prev = opaque->btpo_prev;
> >> +    stat->btpo_next = opaque->btpo_next;
> >> +    if (P_ISDELETED(opaque))
> >> +        stat->btpo.xact = opaque->btpo.xact;
> >> +    else
> >> +        stat->btpo.level = opaque->btpo.level;
> >> +    stat->btpo_flags = opaque->btpo_flags;
> >> +    stat->btpo_cycleid = opaque->btpo_cycleid;
> >> +
> >> +    /*----------------------------------------------
> >> +     * If a next leaf is on the previous block,
> >> +     * it means a fragmentation.
> >> +     *----------------------------------------------
> >> +     */
> >> +    stat->fragments = 0;
> >> +    if (stat->type == 'l')
> >> +    {
> >> +        if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> >> +            stat->fragments++;
> >> +    }
> >> +
> >> +    /* count live and dead tuples, and free space */
> >> +    for (off = FirstOffsetNumber; off <= maxoff; off++)
> >> +    {
> >> +        IndexTuple    itup;
> >> +
> >> +        ItemId        id = PageGetItemId(page, off);
> >> +
> >> +        itup = (IndexTuple) PageGetItem(page, id);
> >> +
> >> +        item_size += IndexTupleSize(itup);
> >> +
> >> +        if (!ItemIdDeleted(id))
> >> +            stat->live_items++;
> >> +        else
> >> +            stat->dead_items++;
> >> +    }
> >> +    stat->free_size = PageGetFreeSpace(page);
> >> +
> >> +    if ((stat->live_items + stat->dead_items) > 0)
> >> +        stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
> >> +    else
> >> +        stat->avg_item_size = 0;
> >> +
> >> +    return true;
> >> +}
> >> +
> >> +
> >> +/* ------------------------------------------------------
> >> + * pgstatindex()
> >> + *
> >> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> >> + * ------------------------------------------------------
> >> + */
> >> +Datum
> >> +pgstatindex(PG_FUNCTION_ARGS)
> >> +{
> >> +    text       *relname = PG_GETARG_TEXT_P(0);
> >> +    Relation    rel;
> >> +    RangeVar   *relrv;
> >> +    Datum        result;
> >> +    uint32        nblocks;
> >> +    uint32        blkno;
> >> +    BTIndexStat indexStat;
> >> +
> >> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> +    rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> >> +        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> >> +
> >> +    /*-------------------
> >> +     * Read a metapage
> >> +     *-------------------
> >> +     */
> >> +    {
> >> +        Buffer        buffer = ReadBuffer(rel, 0);
> >> +        Page        page = BufferGetPage(buffer);
> >> +        BTMetaPageData *metad = BTPageGetMeta(page);
> >> +
> >> +        indexStat.magic = metad->btm_magic;
> >> +        indexStat.version = metad->btm_version;
> >> +        indexStat.root_blkno = metad->btm_root;
> >> +        indexStat.level = metad->btm_level;
> >> +        indexStat.fastroot = metad->btm_fastroot;
> >> +        indexStat.fastlevel = metad->btm_fastlevel;
> >> +
> >> +        ReleaseBuffer(buffer);
> >> +    }
> >> +
> >> +    nblocks = RelationGetNumberOfBlocks(rel);
> >> +
> >> +    /* -- init stat -- */
> >> +    indexStat.fragments = 0;
> >> +
> >> +    indexStat.root_pages = 0;
> >> +    indexStat.leaf_pages = 0;
> >> +    indexStat.internal_pages = 0;
> >> +    indexStat.empty_pages = 0;
> >> +    indexStat.deleted_pages = 0;
> >> +
> >> +    indexStat.max_avail = 0;
> >> +    indexStat.free_space = 0;
> >> +
> >> +    /*-----------------------
> >> +     * Scan all blocks
> >> +     *-----------------------
> >> +     */
> >> +    for (blkno = 1; blkno < nblocks; blkno++)
> >> +    {
> >> +        Buffer        buffer = ReadBuffer(rel, blkno);
> >> +        BTPageStat    stat;
> >> +
> >> +        /* scan one page */
> >> +        stat.blkno = blkno;
> >> +        GetBTPageStatistics(blkno, buffer, &stat);
> >> +
> >> +        /*---------------------
> >> +         * page status (type)
> >> +         *---------------------
> >> +         */
> >> +        switch (stat.type)
> >> +        {
> >> +            case 'd':
> >> +                indexStat.deleted_pages++;
> >> +                break;
> >> +            case 'l':
> >> +                indexStat.leaf_pages++;
> >> +                break;
> >> +            case 'i':
> >> +                indexStat.internal_pages++;
> >> +                break;
> >> +            case 'e':
> >> +                indexStat.empty_pages++;
> >> +                break;
> >> +            case 'r':
> >> +                indexStat.root_pages++;
> >> +                break;
> >> +            default:
> >> +                elog(ERROR, "unknown page status.");
> >> +        }
> >> +
> >> +        /* -- leaf fragmentation -- */
> >> +        indexStat.fragments += stat.fragments;
> >> +
> >> +        if (stat.type == 'l')
> >> +        {
> >> +            indexStat.max_avail += stat.max_avail;
> >> +            indexStat.free_space += stat.free_size;
> >> +        }
> >> +
> >> +        ReleaseBuffer(buffer);
> >> +    }
> >> +
> >> +    relation_close(rel, AccessShareLock);
> >> +
> >> +    /*----------------------------
> >> +     * Build a result tuple
> >> +     *----------------------------
> >> +     */
> >> +    {
> >> +        TupleDesc    tupleDesc;
> >> +        int            j;
> >> +        char       *values[PGSTATINDEX_NCOLUMNS];
> >> +
> >> +        HeapTupleData tupleData;
> >> +        HeapTuple    tuple = &tupleData;
> >> +
> >> +        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> >> +
> >> +        j = 0;
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", indexStat.version);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", indexStat.level);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> >> +                                         indexStat.leaf_pages +
> >> +                                         indexStat.internal_pages +
> >> +                                         indexStat.deleted_pages +
> >> +                                         indexStat.empty_pages) * BLCKSZ);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail *
100.0);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
> >> +
> >> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> >> +                                       values);
> >> +
> >> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> >> +    }
> >> +
> >> +    PG_RETURN_DATUM(result);
> >> +}
> >> +
> >> +/* -----------------------------------------------
> >> + * bt_page()
> >> + *
> >> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> >> + * -----------------------------------------------
> >> + */
> >> +Datum
> >> +bt_page_stats(PG_FUNCTION_ARGS)
> >> +{
> >> +    text       *relname = PG_GETARG_TEXT_P(0);
> >> +    uint32        blkno = PG_GETARG_UINT32(1);
> >> +    Buffer        buffer;
> >> +
> >> +    Relation    rel;
> >> +    RangeVar   *relrv;
> >> +    Datum        result;
> >> +
> >> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> +    rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> +    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> >> +
> >> +    buffer = ReadBuffer(rel, blkno);
> >> +
> >> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> >> +        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> >> +
> >> +    if (blkno == 0)
> >> +        elog(ERROR, "Block 0 is a meta page.");
> >> +
> >> +    {
> >> +        HeapTuple    tuple;
> >> +        TupleDesc    tupleDesc;
> >> +        int            j;
> >> +        char       *values[BTPAGESTATS_NCOLUMNS];
> >> +
> >> +        BTPageStat    stat;
> >> +
> >> +        GetBTPageStatistics(blkno, buffer, &stat);
> >> +
> >> +        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> >> +
> >> +        j = 0;
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.blkno);
> >> +
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%c", stat.type);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.live_items);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.dead_items);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.avg_item_size);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.page_size);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.free_size);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.btpo_prev);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.btpo_next);
> >> +
> >> +        values[j] = palloc(32);
> >> +        if (stat.type == 'd')
> >> +            snprintf(values[j++], 32, "%d", stat.btpo.xact);
> >> +        else
> >> +            snprintf(values[j++], 32, "%d", stat.btpo.level);
> >> +
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", stat.btpo_flags);
> >> +
> >> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> >> +                                       values);
> >> +
> >> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> >> +    }
> >> +
> >> +    ReleaseBuffer(buffer);
> >> +
> >> +    relation_close(rel, AccessShareLock);
> >> +
> >> +    PG_RETURN_DATUM(result);
> >> +}
> >> +
> >> +/*-------------------------------------------------------
> >> + * bt_page_items()
> >> + *
> >> + * Get IndexTupleData set in a leaf page
> >> + *
> >> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> >> + *-------------------------------------------------------
> >> + */
> >> +/* ---------------------------------------------------
> >> + * data structure for SRF to hold a scan information
> >> + * ---------------------------------------------------
> >> + */
> >> +struct user_args
> >> +{
> >> +    TupleDesc    tupd;
> >> +    Relation    rel;
> >> +    Buffer        buffer;
> >> +    Page        page;
> >> +    uint16        offset;
> >> +};
> >> +
> >> +Datum
> >> +bt_page_items(PG_FUNCTION_ARGS)
> >> +{
> >> +    text       *relname = PG_GETARG_TEXT_P(0);
> >> +    uint32        blkno = PG_GETARG_UINT32(1);
> >> +
> >> +    RangeVar   *relrv;
> >> +    Datum        result;
> >> +    char       *values[BTPAGEITEMS_NCOLUMNS];
> >> +    BTPageOpaque opaque;
> >> +    HeapTuple    tuple;
> >> +    ItemId        id;
> >> +
> >> +    FuncCallContext *fctx;
> >> +    MemoryContext mctx;
> >> +    struct user_args *uargs = NULL;
> >> +
> >> +    if (blkno == 0)
> >> +        elog(ERROR, "Block 0 is a meta page.");
> >> +
> >> +    if (SRF_IS_FIRSTCALL())
> >> +    {
> >> +        fctx = SRF_FIRSTCALL_INIT();
> >> +        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> >> +
> >> +        uargs = palloc(sizeof(struct user_args));
> >> +
> >> +        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> >> +        uargs->offset = FirstOffsetNumber;
> >> +
> >> +        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> +        uargs->rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> +        CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
> >> +
> >> +        uargs->buffer = ReadBuffer(uargs->rel, blkno);
> >> +
> >> +        if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
> >> +            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> >> +
> >> +        uargs->page = BufferGetPage(uargs->buffer);
> >> +
> >> +        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> >> +
> >> +        if (P_ISDELETED(opaque))
> >> +            elog(NOTICE, "bt_page_items(): this page is deleted.");
> >> +
> >> +        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> >> +        fctx->user_fctx = uargs;
> >> +
> >> +        MemoryContextSwitchTo(mctx);
> >> +    }
> >> +
> >> +    fctx = SRF_PERCALL_SETUP();
> >> +    uargs = fctx->user_fctx;
> >> +
> >> +    if (fctx->call_cntr < fctx->max_calls)
> >> +    {
> >> +        IndexTuple    itup;
> >> +
> >> +        id = PageGetItemId(uargs->page, uargs->offset);
> >> +
> >> +        if (!ItemIdIsValid(id))
> >> +            elog(ERROR, "Invalid ItemId.");
> >> +
> >> +        itup = (IndexTuple) PageGetItem(uargs->page, id);
> >> +
> >> +        {
> >> +            int            j = 0;
> >> +
> >> +            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> >> +
> >> +            values[j] = palloc(32);
> >> +            snprintf(values[j++], 32, "%d", uargs->offset);
> >> +            values[j] = palloc(32);
> >> +            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> >> +            values[j] = palloc(32);
> >> +            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> >> +            values[j] = palloc(32);
> >> +            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> >> +            values[j] = palloc(32);
> >> +            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> >> +
> >> +            {
> >> +                int            off;
> >> +                char       *dump;
> >> +                char       *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
> >> +
> >> +                dump = palloc(IndexTupleSize(itup) * 3);
> >> +                memset(dump, 0, IndexTupleSize(itup) * 3);
> >> +
> >> +                for (off = 0;
> >> +                     off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
> >> +                     off++)
> >> +                {
> >> +                    if (dump[0] == '\0')
> >> +                        sprintf(dump, "%02x", *(ptr + off) & 0xff);
> >> +                    else
> >> +                    {
> >> +                        char        buf[4];
> >> +
> >> +                        sprintf(buf, " %02x", *(ptr + off) & 0xff);
> >> +                        strcat(dump, buf);
> >> +                    }
> >> +                }
> >> +                values[j] = dump;
> >> +            }
> >> +
> >> +            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> >> +            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> >> +        }
> >> +
> >> +        uargs->offset = uargs->offset + 1;
> >> +
> >> +        SRF_RETURN_NEXT(fctx, result);
> >> +    }
> >> +    else
> >> +    {
> >> +        ReleaseBuffer(uargs->buffer);
> >> +        relation_close(uargs->rel, AccessShareLock);
> >> +
> >> +        SRF_RETURN_DONE(fctx);
> >> +    }
> >> +}
> >> +
> >> +
> >> +/* ------------------------------------------------
> >> + * bt_metap()
> >> + *
> >> + * Get a btree meta-page information
> >> + *
> >> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> >> + * ------------------------------------------------
> >> + */
> >> +Datum
> >> +bt_metap(PG_FUNCTION_ARGS)
> >> +{
> >> +    text       *relname = PG_GETARG_TEXT_P(0);
> >> +    Buffer        buffer;
> >> +
> >> +    Relation    rel;
> >> +    RangeVar   *relrv;
> >> +    Datum        result;
> >> +
> >> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> +    rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> >> +        elog(ERROR, "bt_metap() can be used only on b-tree index.");
> >> +
> >> +    buffer = ReadBuffer(rel, 0);
> >> +
> >> +    {
> >> +        BTMetaPageData *metad;
> >> +
> >> +        TupleDesc    tupleDesc;
> >> +        int            j;
> >> +        char       *values[BTMETAP_NCOLUMNS];
> >> +        HeapTuple    tuple;
> >> +
> >> +        Page        page = BufferGetPage(buffer);
> >> +
> >> +        metad = BTPageGetMeta(page);
> >> +
> >> +        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> >> +
> >> +        j = 0;
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", metad->btm_magic);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", metad->btm_version);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", metad->btm_root);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", metad->btm_level);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> >> +        values[j] = palloc(32);
> >> +        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> >> +
> >> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> >> +                                       values);
> >> +
> >> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> >> +    }
> >> +
> >> +    ReleaseBuffer(buffer);
> >> +
> >> +    relation_close(rel, AccessShareLock);
> >> +
> >> +    PG_RETURN_DATUM(result);
> >> +}
> >> +
> >> +/* --------------------------------------------------------
> >> + * pg_relpages()
> >> + *
> >> + * Get a number of pages of the table/index.
> >> + *
> >> + * Usage: SELECT pg_relpages('t1');
> >> + *          SELECT pg_relpages('t1_pkey');
> >> + * --------------------------------------------------------
> >> + */
> >> +Datum
> >> +pg_relpages(PG_FUNCTION_ARGS)
> >> +{
> >> +    text       *relname = PG_GETARG_TEXT_P(0);
> >> +
> >> +    Relation    rel;
> >> +    RangeVar   *relrv;
> >> +    int4        relpages;
> >> +
> >> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> +    rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> +    relpages = RelationGetNumberOfBlocks(rel);
> >> +
> >> +    relation_close(rel, AccessShareLock);
> >> +
> >> +    PG_RETURN_INT32(relpages);
> >> +}
> >> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> >> --- pgstattuple.orig/pgstattuple.sql.in    2006-02-28 01:09:50.000000000 +0900
> >> +++ pgstattuple/pgstattuple.sql.in    2006-08-14 10:37:32.000000000 +0900
> >> @@ -22,3 +22,96 @@
> >>  RETURNS pgstattuple_type
> >>  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
> >>  LANGUAGE C STRICT;
> >> +
> >> +--
> >> +-- pgstatindex
> >> +--
> >> +DROP TYPE pgstatindex_type CASCADE;
> >> +CREATE TYPE pgstatindex_type AS (
> >> +  version int4,
> >> +  tree_level int4,
> >> +  index_size int4,
> >> +  root_block_no int4,
> >> +  internal_pages int4,
> >> +  leaf_pages int4,
> >> +  empty_pages int4,
> >> +  deleted_pages int4,
> >> +  avg_leaf_density float8,
> >> +  leaf_fragmentation float8
> >> +);
> >> +
> >> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> >> +RETURNS pgstatindex_type
> >> +AS 'MODULE_PATHNAME', 'pgstatindex'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- bt_metap()
> >> +--
> >> +DROP TYPE bt_metap_type CASCADE;
> >> +CREATE TYPE bt_metap_type AS (
> >> +  magic int4,
> >> +  version int4,
> >> +  root int4,
> >> +  level int4,
> >> +  fastroot int4,
> >> +  fastlevel int4
> >> +);
> >> +
> >> +CREATE OR REPLACE FUNCTION bt_metap(text)
> >> +RETURNS bt_metap_type
> >> +AS 'MODULE_PATHNAME', 'bt_metap'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- bt_page_items()
> >> +--
> >> +DROP TYPE bt_page_items_type CASCADE;
> >> +CREATE TYPE bt_page_items_type AS (
> >> +  itemoffset int4,
> >> +  ctid tid,
> >> +  itemlen int4,
> >> +  nulls bool,
> >> +  vars bool,
> >> +  data text
> >> +);
> >> +
> >> +DROP FUNCTION bt_page_items(text, int4);
> >> +
> >> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> >> +RETURNS SETOF bt_page_items_type
> >> +AS 'MODULE_PATHNAME', 'bt_page_items'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- bt_page_stats()
> >> +--
> >> +DROP TYPE bt_page_stats_type CASCADE;
> >> +CREATE TYPE bt_page_stats_type AS (
> >> +  blkno int4,
> >> +  type char,
> >> +  live_items int4,
> >> +  dead_items int4,
> >> +  avg_item_size float,
> >> +  page_size int4,
> >> +  free_size int4,
> >> +  btpo_prev int4,
> >> +  btpo_next int4,
> >> +  btpo int4,
> >> +  btpo_flags int4
> >> +);
> >> +
> >> +DROP FUNCTION bt_page_stats(text, int4);
> >> +
> >> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> >> +RETURNS bt_page_stats_type
> >> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- pg_relpages()
> >> +--
> >> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> >> +RETURNS int
> >> +AS 'MODULE_PATHNAME', 'pg_relpages'
> >> +LANGUAGE 'C' STRICT;
> >> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> >> --- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
> >> +++ pgstattuple/test.sh    2006-08-14 10:23:08.000000000 +0900
> >> @@ -0,0 +1,27 @@
> >> +#!/bin/sh
> >> +
> >> +PGHOME=/home/snaga/pgsql20060814
> >> +export PATH=${PGHOME}/bin:$PATH
> >> +
> >> +psql -p 9999 pgbench<<EOF
> >> +DROP FUNCTION pgstatindex(text);
> >> +EOF
> >> +
> >> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
> >> +
> >> +psql -p 9999 pgbench<<EOF
> >> +SELECT * FROM pg_relpages('accounts_pkey');
> >> +\x
> >> +SELECT * FROM pgstatindex('accounts_pkey');
> >> +SELECT * FROM bt_metap('accounts_pkey');
> >> +\x
> >> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> >> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> >> +
> >> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> >> +\x
> >> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> >> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
> >> +\x
> >> +SELECT * FROM bt_page_items('accounts_pkey', 361);
> >> +EOF
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> pgstattuple README            2002/08/29 Tatsuo Ishii
> >>
> >> 1. Functions supported:
> >>
> >>     pgstattuple
> >>     -----------
> >>     pgstattuple() returns the relation length, percentage of the "dead"
> >>     tuples of a relation and other info. This may help users to determine
> >>     whether vacuum is necessary or not. Here is an example session:
> >>
> >>         test=> \x
> >>         Expanded display is on.
> >>         test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
> >>         -[ RECORD 1 ]------+-------
> >>         table_len          | 458752
> >>         tuple_count        | 1470
> >>         tuple_len          | 438896
> >>         tuple_percent      | 95.67
> >>         dead_tuple_count   | 11
> >>         dead_tuple_len     | 3157
> >>         dead_tuple_percent | 0.69
> >>         free_space         | 8932
> >>         free_percent       | 1.95
> >>
> >>     Here are explanations for each column:
> >>
> >>         table_len        -- physical relation length in bytes
> >>         tuple_count        -- number of live tuples
> >>         tuple_len        -- total tuples length in bytes
> >>         tuple_percent    -- live tuples in %
> >>         dead_tuple_len    -- total dead tuples length in bytes
> >>         dead_tuple_percent    -- dead tuples in %
> >>         free_space        -- free space in bytes
> >>         free_percent    -- free space in %
> >>
> >>     pg_relpages
> >>     -----------
> >>     pg_relpages() returns the number of pages in the relation.
> >>
> >>     pgstatindex
> >>     -----------
> >>     pgstatindex() returns an array showing the information about an index:
> >>
> >>         test=> \x
> >>         Expanded display is on.
> >>         test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
> >>         -[ RECORD 1 ]------+------
> >>         version            | 2
> >>         tree_level         | 0
> >>         index_size         | 8192
> >>         root_block_no      | 1
> >>         internal_pages     | 0
> >>         leaf_pages         | 1
> >>         empty_pages        | 0
> >>         deleted_pages      | 0
> >>         avg_leaf_density   | 50.27
> >>         leaf_fragmentation | 0
> >>
> >>     bt_metap
> >>     --------
> >>     bt_metap() returns information about the btree index metapage:
> >>
> >>         test=> SELECT * FROM bt_metap('pg_cast_oid_index');
> >>         -[ RECORD 1 ]-----
> >>         magic     | 340322
> >>         version   | 2
> >>         root      | 1
> >>         level     | 0
> >>         fastroot  | 1
> >>         fastlevel | 0
> >>
> >>     bt_page_stats
> >>     -------------
> >>     bt_page_stats() shows information about single btree pages:
> >>
> >>         test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
> >>         -[ RECORD 1 ]-+-----
> >>         blkno         | 1
> >>         type          | l
> >>         live_items    | 256
> >>         dead_items    | 0
> >>         avg_item_size | 12
> >>         page_size     | 8192
> >>         free_size     | 4056
> >>         btpo_prev     | 0
> >>         btpo_next     | 0
> >>         btpo          | 0
> >>         btpo_flags    | 3
> >>
> >>     bt_page_items
> >>     -------------
> >>     bt_page_items() returns information about specific items on btree pages:
> >>
> >>         test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
> >>          itemoffset |  ctid   | itemlen | nulls | vars |    data
> >>         ------------+---------+---------+-------+------+-------------
> >>                   1 | (0,1)   |      12 | f     | f    | 23 27 00 00
> >>                   2 | (0,2)   |      12 | f     | f    | 24 27 00 00
> >>                   3 | (0,3)   |      12 | f     | f    | 25 27 00 00
> >>                   4 | (0,4)   |      12 | f     | f    | 26 27 00 00
> >>                   5 | (0,5)   |      12 | f     | f    | 27 27 00 00
> >>                   6 | (0,6)   |      12 | f     | f    | 28 27 00 00
> >>                   7 | (0,7)   |      12 | f     | f    | 29 27 00 00
> >>                   8 | (0,8)   |      12 | f     | f    | 2a 27 00 00
> >>
> >>
> >> 2. Installing pgstattuple
> >>
> >>     $ make
> >>     $ make install
> >>     $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
> >>
> >>
> >> 3. Using pgstattuple
> >>
> >>     pgstattuple may be called as a relation function and is
> >>     defined as follows:
> >>
> >>     CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
> >>      AS 'MODULE_PATHNAME', 'pgstattuple'
> >>      LANGUAGE C STRICT;
> >>
> >>     CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
> >>      AS 'MODULE_PATHNAME', 'pgstattuplebyid'
> >>      LANGUAGE C STRICT;
> >>
> >>     The argument is the relation name (optionally it may be qualified)
> >>     or the OID of the relation.  Note that pgstattuple only returns
> >>     one row.
> >>
> >>
> >> 4. Notes
> >>
> >>     pgstattuple acquires only a read lock on the relation. So concurrent
> >>     update may affect the result.
> >>
> >>     pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
> >>     returns false.
> >>
> >>
> >> 5. History
> >>
> >>     2006/06/28
> >>
> >>     Extended to work against indexes.
>
>
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> Phone: +81-3-3523-8122

> $PostgreSQL: pgsql/contrib/pgstattuple/README.pgstattuple.euc_jp,v 1.7 2006/07/06 02:18:07 momjian Exp $
>
> pgstattuple README        2002/08/22 ÀаæãÉ×
>
> 1. ¥µ¥Ý¡¼¥È¤µ¤ì¤Æ¤¤¤ë´Ø¿ô
>
>    pgstattuple
>    -----------
>    pgstattuple¤Ï¡¤UPDATE¤äDELETE¤Çºî¤é¤ì¤¿¥ê¥ì¡¼¥·¥ç¥ó¤Î¥´¥ßÎΰè¤ÎÂ礭¤µ¤ò¡¤
>    ¥ê¥ì¡¼¥·¥ç¥ó¼«ÂΤÎʪÍýŪ¤ÊÂ礭¤µ¤ËÂФ¹¤ë¥Ñ¡¼¥»¥ó¥Æ¡¼¥¸¤ÇÊֵѤ·¤Þ¤¹¡¥¤Ä
>    ¤Þ¤ê¡¤ÊÖµÑÃͤ¬Â礭¤±¤ì¤Ð¡¤¤½¤ì¤À¤±¥´¥ß¤â¿¤¤¤Î¤Ç¡¤vacuum¤ò¤«¤±¤ëɬ
>    Íפ¬¤¢¤ë¤È¤¤¤¦È½ÃǤνõ¤±¤Ë¤Ê¤ë¤ï¤±¤Ç¤¹¡¥¤³¤ì°Ê³°¤Ë¤â¤¤¤í¤¤¤í¤Ê¾ðÊó
>    ¤¬ÊÖ¤ê¤Þ¤¹¡¥
>    pgstattuple() ¤Ï¡¢¥ê¥ì¡¼¥·¥ç¥ó¤ÎŤµ¤ä¡¢¥¿¥×¥ë¤Î"¥´¥ßÎΰè"¤Î³ä¹ç¤Ê¤É¤Î
>    ¾ðÊó¤òÊֵѤ·¤Þ¤¹¡£¤³¤ì¤é¤Î¾ðÊó¤Ï¡¢vacuum ¤¬É¬Íפ«¤É¤¦¤«¡¢¥æ¡¼¥¶¤¬È½ÃÇ
>    ¤¹¤ë¤Î¤ËÌòΩ¤Ä¤Ç¤·¤ç¤¦¡£Î㤨¤Ð°Ê²¼¤Î¤è¤¦¤Ê·Á¤Ë¤Ê¤ê¤Þ¤¹¡§
>
>        test=> \x
>        Expanded display is on.
>        test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
>        -[ RECORD 1 ]------+-------
>        table_len          | 458752
>        tuple_count        | 1470
>        tuple_len          | 438896
>        tuple_percent      | 95.67
>        dead_tuple_count   | 11
>        dead_tuple_len     | 3157
>        dead_tuple_percent | 0.69
>        free_space         | 8932
>        free_percent       | 1.95
>
>    °Ê²¼¤¬³Æ¹àÌܤÎÆâÍƤǤ¹¡§
>
>        table_len        -- ¥ê¥ì¡¼¥·¥ç¥ó¤ÎʪÍýŪ¤ÊÂ礭¤µ(¥Ð¥¤¥È)
>        tuple_count        -- ¥¿¥×¥ë¿ô
>        tuple_len        -- ¥¿¥×¥ëĹ¤Î¹ç·×(¥Ð¥¤¥È)
>        tuple_percent    -- ¥¿¥×¥ë¤Î³ä¹ç¡¥table_len¤ËÂФ¹¤ëtuple_len¤ÎÈæΨ¡¥
>        dead_tuple_len    -- ¥Ç¥Ã¥É¥¿¥×¥ë¿ô
>        dead_tuple_percent    -- ¥Ç¥Ã¥É¥¿¥×¥ë¤Î³ä¹ç¡¥table_len¤ËÂФ¹¤ëtuple_len¤ÎÈæΨ¡¥
>        free_space        -- ºÆÍøÍѲÄǽ¤ÊÎΰè(¥Ð¥¤¥È)
>        free_percent    -- ºÆÍøÍѲÄǽ¤ÊÎΰ补table_len¤ËÂФ¹¤ëfree_space¤ÎÈæΨ¡¥
>
>    pg_relpages
>    -----------
>    pg_relpages() ¤Ï¥ê¥ì¡¼¥·¥ç¥ó¤Î¥Ú¡¼¥¸¿ô¤òÊֵѤ·¤Þ¤¹¡¥
>
>    pgstatindex
>    -----------
>    pgstatindex() ¤Ï¥¤¥ó¥Ç¥Ã¥¯¥¹¤Ë´Ø¤¹¤ë¾ðÊó¤òÊֵѤ·¤Þ¤¹¡§
>
>        test=> \x
>        Expanded display is on.
>        test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
>        -[ RECORD 1 ]------+------
>        version            | 2
>        tree_level         | 0
>        index_size         | 8192
>        root_block_no      | 1
>        internal_pages     | 0
>        leaf_pages         | 1
>        empty_pages        | 0
>        deleted_pages      | 0
>        avg_leaf_density   | 50.27
>        leaf_fragmentation | 0
>
>    bt_metap
>    --------
>    bt_metap() ¤Ïbtree¥¤¥ó¥Ç¥Ã¥¯¥¹¤Î¥á¥¿¥Ú¡¼¥¸¤Ë´Ø¤¹¤ë¾ðÊó¤òÊֵѤ·¤Þ¤¹¡§
>
>        test=> SELECT * FROM bt_metap('pg_cast_oid_index');
>        -[ RECORD 1 ]-----
>        magic     | 340322
>        version   | 2
>        root      | 1
>        level     | 0
>        fastroot  | 1
>        fastlevel | 0
>
>    bt_page_stats
>    -------------
>    bt_page_stats() ¤Ï¡¢btree¤Îñ°ì¤Î¥Ú¡¼¥¸¤Ë´Ø¤¹¤ë¾ðÊó¤òɽ¼¨¤·¤Þ¤¹¡§
>
>        test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
>        -[ RECORD 1 ]-+-----
>        blkno         | 1
>        type          | l
>        live_items    | 256
>        dead_items    | 0
>        avg_item_size | 12
>        page_size     | 8192
>        free_size     | 4056
>        btpo_prev     | 0
>        btpo_next     | 0
>        btpo          | 0
>        btpo_flags    | 3
>
>    bt_page_items
>    -------------
>    bt_page_items() ¤Ï¡¢³Æbtree¥Ú¡¼¥¸¤Î¸ÄÊ̤Υ¢¥¤¥Æ¥à¤Ë´Ø¤¹¤ë¾ðÊó¤òɽ¼¨¤·¤Þ¤¹¡§
>
>        test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
>         itemoffset |  ctid   | itemlen | nulls | vars |    data
>        ------------+---------+---------+-------+------+-------------
>                  1 | (0,1)   |      12 | f     | f    | 23 27 00 00
>                  2 | (0,2)   |      12 | f     | f    | 24 27 00 00
>                  3 | (0,3)   |      12 | f     | f    | 25 27 00 00
>                  4 | (0,4)   |      12 | f     | f    | 26 27 00 00
>                  5 | (0,5)   |      12 | f     | f    | 27 27 00 00
>                  6 | (0,6)   |      12 | f     | f    | 28 27 00 00
>                  7 | (0,7)   |      12 | f     | f    | 29 27 00 00
>                  8 | (0,8)   |      12 | f     | f    | 2a 27 00 00
>
>
> 2. pgstattuple¤Î¥¤¥ó¥¹¥È¡¼¥ë
>
>    PostgreSQL¤¬/usr/local/pgsql¤Ë¥¤¥ó¥¹¥È¡¼¥ëºÑ¤Ç¤¢¤ê¡¤test¥Ç¡¼¥¿¥Ù¡¼
>    ¥¹¤Ëpgstattuple¤ò¥¤¥ó¥¹¥È¡¼¥ë¤¹¤ë¾ì¹ç¤Î¼ê½ç¤ò¼¨¤·¤Þ¤¹¡¥
>
>     $ make
>     $ make install
>
>     ¥æ¡¼¥¶ÄêµÁ´Ø¿ô¤òÅÐÏ¿¤·¤Þ¤¹¡¥
>
>     $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
>
>
> 3. pgstattuple¤Î»È¤¤Êý
>
>    pgstattuple¤Î¸Æ¤Ó½Ð¤··Á¼°¤Ï°Ê²¼¤Ç¤¹¡¥
>
>    CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
>      AS 'MODULE_PATHNAME', 'pgstattuple'
>      LANGUAGE C STRICT;
>
>    Âè°ì°ú¿ô: ¥ê¥ì¡¼¥·¥ç¥ó̾
>
>    ´Ø¿ô¤ÎÌá¤ê¤Ïpgstattuple_type·¿¤Ç¤¹¡¥
>
>    pgstattuple¤Ï¥ê¥ì¡¼¥·¥ç¥ó¤ËAccessShareLock¤·¤«¤«¤±¤Ê¤¤¤Î¤Ç¡¤
>    pgstattuple ¤ò¼Â¹ÔÃæ¤Ë³ºÅö¥ê¥ì¡¼¥·¥ç¥ó¤Ë¹¹¿·¤äºï½ü¤¬È¯À¸¤¹¤ë¤È¡¤Àµ¤·¤¯
>    ¤Ê¤¤·ë²Ì¤òÊÖ¤¹²ÄǽÀ­¤¬¤¢¤ê¤Þ¤¹¡¥
>
>    pgstattuple¤¬¥¿¥×¥ë¤ò¡Ö¥´¥ß¡×¤ÈȽÃǤ¹¤ë´ð½à¤Ï¡¤
>    HeapTupleSatisfiesNow()¤¬µ¶¤òÊÖ¤·¤¿¤È¤­¤Ç¤¹¡¥
>
> 4. pgstattuple¤Î¥é¥¤¥»¥ó¥¹¾ò·ï¤Ë¤Ä¤¤¤Æ
>
>    pgstattuple.c¤ÎËÁƬ¤Ë½ñ¤¤¤Æ¤¢¤ëÄ̤ê¤Ç¤¹¡¥¤Þ¤¿¡¤pgstattuple ¤Ï´°Á´¤Ë̵ÊÝ
>    ¾Ú¤Ç¤¹¡¥pgstattuple ¤ò»ÈÍѤ·¤¿¤³¤È¤Ë¤è¤Ã¤ÆÀ¸¤¸¤ë¤¤¤«¤Ê¤ë·ë²Ì¤Ë´Ø¤·¤Æ
>    ¤âÀÕǤ¤òÉ餤¤Þ¤»¤ó¡¥
>
> 5. ²þÄûÍúÎò
>
>    2006/06/28
>
>     ¥¤¥ó¥Ç¥Ã¥¯¥¹¤ËÂФ·¤Æ¤âÆ°ºî¤¹¤ë¤è¤¦¤Ë³ÈÄ¥¡£
>
>    2002/09/04
>
>     SRFÊѹ¹¤Ëȼ¤¤¡¤Tom Lane    ¤¬½¤Àµ¥¤¥ó¥¿¡¼¥Õ¥§¥¤¥¹¤Î½¤Àµ¤ò¹Ô¤Ã¤¿¡¥
>     ¤½¤Î¤³¤È¤ò¤³¤ÎREADME¤Ë¤âÈ¿±Ç¡¥
>
>    2002/08/23
>
>     SRF(Set Returning Function)¤ò»È¤Ã¤Æ7.3ÍѤ˽ñ¤­´¹¤¨¡¥
>
>    2001/12/20 Tom Lane¤Ë¤è¤ë½¤Àµ
>
>        Fix pgstattuple to acquire a read lock on the target table.  This
>     prevents embarassments such as having the table dropped or truncated
>     partway through the scan.  Also, fix free space calculation to include
>     pages that currently contain no tuples.
>
>    2001/10/01 PostgreSQL 7.2 ÍÑcontrib module¤ËÅÐÏ¿
>
>    2001/08/30 pgstattuple ¥Ð¡¼¥¸¥ç¥ó 0.1¥ê¥ê¡¼¥¹
>
> -- Adjust this setting to control where the objects get created.
> SET search_path = public;
>
> DROP FUNCTION pgstattuple(oid);
>
> DROP FUNCTION pgstattuple(text);
>
> DROP TYPE pgstattuple_type;
>
> DROP FUNCTION pgstatindex(text);
> DROP FUNCTION bt_metap(text);
> DROP FUNCTION bt_page_items(text, int4);
> DROP FUNCTION bt_page_stats(text, int4);
> DROP FUNCTION pg_relpages(text);
>
> DROP TYPE pgstatindex_type;
> DROP TYPE bt_metap_type;
> DROP TYPE bt_page_items_type;
> DROP TYPE bt_page_stats_type;

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +