Re: pgstattuple extension for indexes - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: pgstattuple extension for indexes
Date
Msg-id 200609021706.k82H6Sr29059@momjian.us
Whole thread Raw
In response to Re: pgstattuple extension for indexes  (Satoshi Nagayasu <nagayasus@nttdata.co.jp>)
Responses Re: pgstattuple extension for indexes
Re: pgstattuple extension for indexes
Re: pgstattuple extension for indexes
List pgsql-patches
Patch applied.  Thanks.

I updated the README documentation for the new functions, attached.  I
could not update the Japanese version of the README.

---------------------------------------------------------------------------


Satoshi Nagayasu wrote:
> Bruce,
>
> Attached patch has been cleaned up,
> and modified to be able to work with CVS HEAD.
>
> Thanks.
>
> Satoshi Nagayasu wrote:
> > Alvaro,
> >
> > Alvaro Herrera wrote:
> >> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
> >>
> >> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
> >> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this
function)
> >>
> >>
> >> While you're at it, please consider removing C++ style comments and
> >> unused code.
> >>
> >> Formatting is way off as well, but I guess that is easily fixed with
> >> pgindent.
> >
> > Thanks for comments. I'm going to fix my patch from now.
> >
> >> Regarding the pg_relpages function, why do you think it's necessary?
> >> (It returns the true number of blocks of a given relation).  It may
> >> belong into core given a reasonable use case, but otherwise it doesn't
> >> seem to belong into pgstatindex (or pgstattuple for that matter).
> >
> > I wanted to sample some pages from the table/index, and get their statistics
> > to know table/index conditions. I know pgstattuple() reports table
> > statistics, however, pgstattuple() generates heavy CPU and I/O load.
> >
> > When we need to sample some pages from table/index, we need to know
> > true number of blocks.
> >
> > I have another function, called pgstatpage(), to get information inside
> > a single block/page statistics of the table. pg_relpages() will be used
> > with this.
> >
> > Sorry for not mentioned in previous post about pgstatpage(),
> > but I've remembered about it just now.
> >
> > Many memories in my brain have already `paged-out` (too busy in last few months),
> > and some of them got `out-of-memory`. :^)
> >
> > Thanks.
>
>
> --
> NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
> Phone: +81-3-3523-8122

> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> --- pgstattuple.orig/Makefile    2006-02-27 21:54:40.000000000 +0900
> +++ pgstattuple/Makefile    2006-08-14 09:28:58.000000000 +0900
> @@ -6,7 +6,7 @@
>  #
>  #-------------------------------------------------------------------------
>
> -SRCS        = pgstattuple.c
> +SRCS        = pgstattuple.c pgstatindex.c
>
>  MODULE_big    = pgstattuple
>  OBJS        = $(SRCS:.c=.o)
> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> --- pgstattuple.orig/pgstatindex.c    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/pgstatindex.c    2006-08-14 11:24:23.000000000 +0900
> @@ -0,0 +1,706 @@
> +/*
> + * pgstatindex
> + *
> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
> + *
> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */
> +
> +#include "postgres.h"
> +
> +#include "fmgr.h"
> +#include "funcapi.h"
> +#include "access/heapam.h"
> +#include "access/itup.h"
> +#include "access/nbtree.h"
> +#include "access/transam.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_type.h"
> +#include "utils/builtins.h"
> +#include "utils/inval.h"
> +
> +PG_FUNCTION_INFO_V1(pgstatindex);
> +PG_FUNCTION_INFO_V1(bt_metap);
> +PG_FUNCTION_INFO_V1(bt_page_items);
> +PG_FUNCTION_INFO_V1(bt_page_stats);
> +PG_FUNCTION_INFO_V1(pg_relpages);
> +
> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> +
> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> +#define PGSTATINDEX_NCOLUMNS 10
> +
> +#define BTMETAP_TYPE "public.bt_metap_type"
> +#define BTMETAP_NCOLUMNS 6
> +
> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> +#define BTPAGEITEMS_NCOLUMNS 6
> +
> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> +#define BTPAGESTATS_NCOLUMNS 11
> +
> +
> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> +
> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> +        if ( !(FirstOffsetNumber<=(offset) && \
> +                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
> +             elog(ERROR, "Page offset number out of range."); }
> +
> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> +        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> +             elog(ERROR, "Block number out of range."); }
> +
> +/* ------------------------------------------------
> + * structure for single btree page statistics
> + * ------------------------------------------------
> + */
> +typedef struct BTPageStat
> +{
> +    uint32        blkno;
> +    uint32        live_items;
> +    uint32        dead_items;
> +    uint32        page_size;
> +    uint32        max_avail;
> +    uint32        free_size;
> +    uint32        avg_item_size;
> +    uint32        fragments;
> +    char        type;
> +
> +    /* opaque data */
> +    BlockNumber btpo_prev;
> +    BlockNumber btpo_next;
> +    union
> +    {
> +        uint32        level;
> +        TransactionId xact;
> +    }            btpo;
> +    uint16        btpo_flags;
> +    BTCycleId    btpo_cycleid;
> +}    BTPageStat;
> +
> +/* ------------------------------------------------
> + * A structure for a whole btree index statistics
> + * used by pgstatindex().
> + * ------------------------------------------------
> + */
> +typedef struct BTIndexStat
> +{
> +    uint32        magic;
> +    uint32        version;
> +    BlockNumber root_blkno;
> +    uint32        level;
> +
> +    BlockNumber fastroot;
> +    uint32        fastlevel;
> +
> +    uint32        live_items;
> +    uint32        dead_items;
> +
> +    uint32        root_pages;
> +    uint32        internal_pages;
> +    uint32        leaf_pages;
> +    uint32        empty_pages;
> +    uint32        deleted_pages;
> +
> +    uint32        page_size;
> +    uint32        avg_item_size;
> +
> +    uint32        max_avail;
> +    uint32        free_space;
> +
> +    uint32        fragments;
> +}    BTIndexStat;
> +
> +/* -------------------------------------------------
> + * GetBTPageStatistics()
> + *
> + * Collect statistics of single b-tree leaf page
> + * -------------------------------------------------
> + */
> +static bool
> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
> +{
> +    Page        page = BufferGetPage(buffer);
> +    PageHeader    phdr = (PageHeader) page;
> +    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> +    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +    int            item_size = 0;
> +    int            off;
> +
> +    stat->blkno = blkno;
> +
> +    stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
> +
> +    stat->dead_items = stat->live_items = 0;
> +
> +    stat->page_size = PageGetPageSize(page);
> +
> +    /* page type (flags) */
> +    if (P_ISDELETED(opaque))
> +    {
> +        stat->type = 'd';
> +        return true;
> +    }
> +    else if (P_IGNORE(opaque))
> +        stat->type = 'e';
> +    else if (P_ISLEAF(opaque))
> +        stat->type = 'l';
> +    else if (P_ISROOT(opaque))
> +        stat->type = 'r';
> +    else
> +        stat->type = 'i';
> +
> +    /* btpage opaque data */
> +    stat->btpo_prev = opaque->btpo_prev;
> +    stat->btpo_next = opaque->btpo_next;
> +    if (P_ISDELETED(opaque))
> +        stat->btpo.xact = opaque->btpo.xact;
> +    else
> +        stat->btpo.level = opaque->btpo.level;
> +    stat->btpo_flags = opaque->btpo_flags;
> +    stat->btpo_cycleid = opaque->btpo_cycleid;
> +
> +    /*----------------------------------------------
> +     * If a next leaf is on the previous block,
> +     * it means a fragmentation.
> +     *----------------------------------------------
> +     */
> +    stat->fragments = 0;
> +    if (stat->type == 'l')
> +    {
> +        if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> +            stat->fragments++;
> +    }
> +
> +    /* count live and dead tuples, and free space */
> +    for (off = FirstOffsetNumber; off <= maxoff; off++)
> +    {
> +        IndexTuple    itup;
> +
> +        ItemId        id = PageGetItemId(page, off);
> +
> +        itup = (IndexTuple) PageGetItem(page, id);
> +
> +        item_size += IndexTupleSize(itup);
> +
> +        if (!ItemIdDeleted(id))
> +            stat->live_items++;
> +        else
> +            stat->dead_items++;
> +    }
> +    stat->free_size = PageGetFreeSpace(page);
> +
> +    if ((stat->live_items + stat->dead_items) > 0)
> +        stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
> +    else
> +        stat->avg_item_size = 0;
> +
> +    return true;
> +}
> +
> +
> +/* ------------------------------------------------------
> + * pgstatindex()
> + *
> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> + * ------------------------------------------------------
> + */
> +Datum
> +pgstatindex(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    Datum        result;
> +    uint32        nblocks;
> +    uint32        blkno;
> +    BTIndexStat indexStat;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> +        elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> +
> +    /*-------------------
> +     * Read a metapage
> +     *-------------------
> +     */
> +    {
> +        Buffer        buffer = ReadBuffer(rel, 0);
> +        Page        page = BufferGetPage(buffer);
> +        BTMetaPageData *metad = BTPageGetMeta(page);
> +
> +        indexStat.magic = metad->btm_magic;
> +        indexStat.version = metad->btm_version;
> +        indexStat.root_blkno = metad->btm_root;
> +        indexStat.level = metad->btm_level;
> +        indexStat.fastroot = metad->btm_fastroot;
> +        indexStat.fastlevel = metad->btm_fastlevel;
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    nblocks = RelationGetNumberOfBlocks(rel);
> +
> +    /* -- init stat -- */
> +    indexStat.fragments = 0;
> +
> +    indexStat.root_pages = 0;
> +    indexStat.leaf_pages = 0;
> +    indexStat.internal_pages = 0;
> +    indexStat.empty_pages = 0;
> +    indexStat.deleted_pages = 0;
> +
> +    indexStat.max_avail = 0;
> +    indexStat.free_space = 0;
> +
> +    /*-----------------------
> +     * Scan all blocks
> +     *-----------------------
> +     */
> +    for (blkno = 1; blkno < nblocks; blkno++)
> +    {
> +        Buffer        buffer = ReadBuffer(rel, blkno);
> +        BTPageStat    stat;
> +
> +        /* scan one page */
> +        stat.blkno = blkno;
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        /*---------------------
> +         * page status (type)
> +         *---------------------
> +         */
> +        switch (stat.type)
> +        {
> +            case 'd':
> +                indexStat.deleted_pages++;
> +                break;
> +            case 'l':
> +                indexStat.leaf_pages++;
> +                break;
> +            case 'i':
> +                indexStat.internal_pages++;
> +                break;
> +            case 'e':
> +                indexStat.empty_pages++;
> +                break;
> +            case 'r':
> +                indexStat.root_pages++;
> +                break;
> +            default:
> +                elog(ERROR, "unknown page status.");
> +        }
> +
> +        /* -- leaf fragmentation -- */
> +        indexStat.fragments += stat.fragments;
> +
> +        if (stat.type == 'l')
> +        {
> +            indexStat.max_avail += stat.max_avail;
> +            indexStat.free_space += stat.free_size;
> +        }
> +
> +        ReleaseBuffer(buffer);
> +    }
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    /*----------------------------
> +     * Build a result tuple
> +     *----------------------------
> +     */
> +    {
> +        TupleDesc    tupleDesc;
> +        int            j;
> +        char       *values[PGSTATINDEX_NCOLUMNS];
> +
> +        HeapTupleData tupleData;
> +        HeapTuple    tuple = &tupleData;
> +
> +        tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> +                                         indexStat.leaf_pages +
> +                                         indexStat.internal_pages +
> +                                         indexStat.deleted_pages +
> +                                         indexStat.empty_pages) * BLCKSZ);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail *
100.0);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/* -----------------------------------------------
> + * bt_page()
> + *
> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> + * -----------------------------------------------
> + */
> +Datum
> +bt_page_stats(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32        blkno = PG_GETARG_UINT32(1);
> +    Buffer        buffer;
> +
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> +
> +    buffer = ReadBuffer(rel, blkno);
> +
> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> +        elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> +
> +    if (blkno == 0)
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    {
> +        HeapTuple    tuple;
> +        TupleDesc    tupleDesc;
> +        int            j;
> +        char       *values[BTPAGESTATS_NCOLUMNS];
> +
> +        BTPageStat    stat;
> +
> +        GetBTPageStatistics(blkno, buffer, &stat);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.blkno);
> +
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%c", stat.type);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.live_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.dead_items);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.avg_item_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.page_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.free_size);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.btpo_prev);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.btpo_next);
> +
> +        values[j] = palloc(32);
> +        if (stat.type == 'd')
> +            snprintf(values[j++], 32, "%d", stat.btpo.xact);
> +        else
> +            snprintf(values[j++], 32, "%d", stat.btpo.level);
> +
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", stat.btpo_flags);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/*-------------------------------------------------------
> + * bt_page_items()
> + *
> + * Get IndexTupleData set in a leaf page
> + *
> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> + *-------------------------------------------------------
> + */
> +/* ---------------------------------------------------
> + * data structure for SRF to hold a scan information
> + * ---------------------------------------------------
> + */
> +struct user_args
> +{
> +    TupleDesc    tupd;
> +    Relation    rel;
> +    Buffer        buffer;
> +    Page        page;
> +    uint16        offset;
> +};
> +
> +Datum
> +bt_page_items(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    uint32        blkno = PG_GETARG_UINT32(1);
> +
> +    RangeVar   *relrv;
> +    Datum        result;
> +    char       *values[BTPAGEITEMS_NCOLUMNS];
> +    BTPageOpaque opaque;
> +    HeapTuple    tuple;
> +    ItemId        id;
> +
> +    FuncCallContext *fctx;
> +    MemoryContext mctx;
> +    struct user_args *uargs = NULL;
> +
> +    if (blkno == 0)
> +        elog(ERROR, "Block 0 is a meta page.");
> +
> +    if (SRF_IS_FIRSTCALL())
> +    {
> +        fctx = SRF_FIRSTCALL_INIT();
> +        mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> +
> +        uargs = palloc(sizeof(struct user_args));
> +
> +        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> +        uargs->offset = FirstOffsetNumber;
> +
> +        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +        uargs->rel = relation_openrv(relrv, AccessShareLock);
> +
> +        CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
> +
> +        uargs->buffer = ReadBuffer(uargs->rel, blkno);
> +
> +        if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
> +            elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> +
> +        uargs->page = BufferGetPage(uargs->buffer);
> +
> +        opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> +
> +        if (P_ISDELETED(opaque))
> +            elog(NOTICE, "bt_page_items(): this page is deleted.");
> +
> +        fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> +        fctx->user_fctx = uargs;
> +
> +        MemoryContextSwitchTo(mctx);
> +    }
> +
> +    fctx = SRF_PERCALL_SETUP();
> +    uargs = fctx->user_fctx;
> +
> +    if (fctx->call_cntr < fctx->max_calls)
> +    {
> +        IndexTuple    itup;
> +
> +        id = PageGetItemId(uargs->page, uargs->offset);
> +
> +        if (!ItemIdIsValid(id))
> +            elog(ERROR, "Invalid ItemId.");
> +
> +        itup = (IndexTuple) PageGetItem(uargs->page, id);
> +
> +        {
> +            int            j = 0;
> +
> +            BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> +
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", uargs->offset);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> +            values[j] = palloc(32);
> +            snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> +
> +            {
> +                int            off;
> +                char       *dump;
> +                char       *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
> +
> +                dump = palloc(IndexTupleSize(itup) * 3);
> +                memset(dump, 0, IndexTupleSize(itup) * 3);
> +
> +                for (off = 0;
> +                     off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
> +                     off++)
> +                {
> +                    if (dump[0] == '\0')
> +                        sprintf(dump, "%02x", *(ptr + off) & 0xff);
> +                    else
> +                    {
> +                        char        buf[4];
> +
> +                        sprintf(buf, " %02x", *(ptr + off) & 0xff);
> +                        strcat(dump, buf);
> +                    }
> +                }
> +                values[j] = dump;
> +            }
> +
> +            tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> +            result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> +        }
> +
> +        uargs->offset = uargs->offset + 1;
> +
> +        SRF_RETURN_NEXT(fctx, result);
> +    }
> +    else
> +    {
> +        ReleaseBuffer(uargs->buffer);
> +        relation_close(uargs->rel, AccessShareLock);
> +
> +        SRF_RETURN_DONE(fctx);
> +    }
> +}
> +
> +
> +/* ------------------------------------------------
> + * bt_metap()
> + *
> + * Get a btree meta-page information
> + *
> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> + * ------------------------------------------------
> + */
> +Datum
> +bt_metap(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +    Buffer        buffer;
> +
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    Datum        result;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    if (!IS_INDEX(rel) || !IS_BTREE(rel))
> +        elog(ERROR, "bt_metap() can be used only on b-tree index.");
> +
> +    buffer = ReadBuffer(rel, 0);
> +
> +    {
> +        BTMetaPageData *metad;
> +
> +        TupleDesc    tupleDesc;
> +        int            j;
> +        char       *values[BTMETAP_NCOLUMNS];
> +        HeapTuple    tuple;
> +
> +        Page        page = BufferGetPage(buffer);
> +
> +        metad = BTPageGetMeta(page);
> +
> +        tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> +
> +        j = 0;
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_magic);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_version);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_root);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_level);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +        values[j] = palloc(32);
> +        snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> +        tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> +                                       values);
> +
> +        result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> +    }
> +
> +    ReleaseBuffer(buffer);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_DATUM(result);
> +}
> +
> +/* --------------------------------------------------------
> + * pg_relpages()
> + *
> + * Get a number of pages of the table/index.
> + *
> + * Usage: SELECT pg_relpages('t1');
> + *          SELECT pg_relpages('t1_pkey');
> + * --------------------------------------------------------
> + */
> +Datum
> +pg_relpages(PG_FUNCTION_ARGS)
> +{
> +    text       *relname = PG_GETARG_TEXT_P(0);
> +
> +    Relation    rel;
> +    RangeVar   *relrv;
> +    int4        relpages;
> +
> +    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> +    rel = relation_openrv(relrv, AccessShareLock);
> +
> +    relpages = RelationGetNumberOfBlocks(rel);
> +
> +    relation_close(rel, AccessShareLock);
> +
> +    PG_RETURN_INT32(relpages);
> +}
> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> --- pgstattuple.orig/pgstattuple.sql.in    2006-02-28 01:09:50.000000000 +0900
> +++ pgstattuple/pgstattuple.sql.in    2006-08-14 10:37:32.000000000 +0900
> @@ -22,3 +22,96 @@
>  RETURNS pgstattuple_type
>  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>  LANGUAGE C STRICT;
> +
> +--
> +-- pgstatindex
> +--
> +DROP TYPE pgstatindex_type CASCADE;
> +CREATE TYPE pgstatindex_type AS (
> +  version int4,
> +  tree_level int4,
> +  index_size int4,
> +  root_block_no int4,
> +  internal_pages int4,
> +  leaf_pages int4,
> +  empty_pages int4,
> +  deleted_pages int4,
> +  avg_leaf_density float8,
> +  leaf_fragmentation float8
> +);
> +
> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> +RETURNS pgstatindex_type
> +AS 'MODULE_PATHNAME', 'pgstatindex'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_metap()
> +--
> +DROP TYPE bt_metap_type CASCADE;
> +CREATE TYPE bt_metap_type AS (
> +  magic int4,
> +  version int4,
> +  root int4,
> +  level int4,
> +  fastroot int4,
> +  fastlevel int4
> +);
> +
> +CREATE OR REPLACE FUNCTION bt_metap(text)
> +RETURNS bt_metap_type
> +AS 'MODULE_PATHNAME', 'bt_metap'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_items()
> +--
> +DROP TYPE bt_page_items_type CASCADE;
> +CREATE TYPE bt_page_items_type AS (
> +  itemoffset int4,
> +  ctid tid,
> +  itemlen int4,
> +  nulls bool,
> +  vars bool,
> +  data text
> +);
> +
> +DROP FUNCTION bt_page_items(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> +RETURNS SETOF bt_page_items_type
> +AS 'MODULE_PATHNAME', 'bt_page_items'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_stats()
> +--
> +DROP TYPE bt_page_stats_type CASCADE;
> +CREATE TYPE bt_page_stats_type AS (
> +  blkno int4,
> +  type char,
> +  live_items int4,
> +  dead_items int4,
> +  avg_item_size float,
> +  page_size int4,
> +  free_size int4,
> +  btpo_prev int4,
> +  btpo_next int4,
> +  btpo int4,
> +  btpo_flags int4
> +);
> +
> +DROP FUNCTION bt_page_stats(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> +RETURNS bt_page_stats_type
> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- pg_relpages()
> +--
> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> +RETURNS int
> +AS 'MODULE_PATHNAME', 'pg_relpages'
> +LANGUAGE 'C' STRICT;
> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> --- pgstattuple.orig/test.sh    1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/test.sh    2006-08-14 10:23:08.000000000 +0900
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +
> +PGHOME=/home/snaga/pgsql20060814
> +export PATH=${PGHOME}/bin:$PATH
> +
> +psql -p 9999 pgbench<<EOF
> +DROP FUNCTION pgstatindex(text);
> +EOF
> +
> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
> +
> +psql -p 9999 pgbench<<EOF
> +SELECT * FROM pg_relpages('accounts_pkey');
> +\x
> +SELECT * FROM pgstatindex('accounts_pkey');
> +SELECT * FROM bt_metap('accounts_pkey');
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> +
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +\x
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 361);
> +EOF
>
--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
pgstattuple README            2002/08/29 Tatsuo Ishii

1. Functions supported:

    pgstattuple
    -----------
    pgstattuple() returns the relation length, percentage of the "dead"
    tuples of a relation and other info. This may help users to determine
    whether vacuum is necessary or not. Here is an example session:

        test=> \x
        Expanded display is on.
        test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
        -[ RECORD 1 ]------+-------
        table_len          | 458752
        tuple_count        | 1470
        tuple_len          | 438896
        tuple_percent      | 95.67
        dead_tuple_count   | 11
        dead_tuple_len     | 3157
        dead_tuple_percent | 0.69
        free_space         | 8932
        free_percent       | 1.95

    Here are explanations for each column:

        table_len        -- physical relation length in bytes
        tuple_count        -- number of live tuples
        tuple_len        -- total tuples length in bytes
        tuple_percent    -- live tuples in %
        dead_tuple_len    -- total dead tuples length in bytes
        dead_tuple_percent    -- dead tuples in %
        free_space        -- free space in bytes
        free_percent    -- free space in %

    pg_relpages
    -----------
    pg_relpages() returns the number of pages in the relation.

    pgstatindex
    -----------
    pgstatindex() returns an array showing the information about an index:

        test=> \x
        Expanded display is on.
        test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
        -[ RECORD 1 ]------+------
        version            | 2
        tree_level         | 0
        index_size         | 8192
        root_block_no      | 1
        internal_pages     | 0
        leaf_pages         | 1
        empty_pages        | 0
        deleted_pages      | 0
        avg_leaf_density   | 50.27
        leaf_fragmentation | 0

    bt_metap
    --------
    bt_metap() returns information about the btree index metapage:

        test=> SELECT * FROM bt_metap('pg_cast_oid_index');
        -[ RECORD 1 ]-----
        magic     | 340322
        version   | 2
        root      | 1
        level     | 0
        fastroot  | 1
        fastlevel | 0

    bt_page_stats
    -------------
    bt_page_stats() shows information about single btree pages:

        test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
        -[ RECORD 1 ]-+-----
        blkno         | 1
        type          | l
        live_items    | 256
        dead_items    | 0
        avg_item_size | 12
        page_size     | 8192
        free_size     | 4056
        btpo_prev     | 0
        btpo_next     | 0
        btpo          | 0
        btpo_flags    | 3

    bt_page_items
    -------------
    bt_page_items() returns information about specific items on btree pages:

        test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
         itemoffset |  ctid   | itemlen | nulls | vars |    data
        ------------+---------+---------+-------+------+-------------
                  1 | (0,1)   |      12 | f     | f    | 23 27 00 00
                  2 | (0,2)   |      12 | f     | f    | 24 27 00 00
                  3 | (0,3)   |      12 | f     | f    | 25 27 00 00
                  4 | (0,4)   |      12 | f     | f    | 26 27 00 00
                  5 | (0,5)   |      12 | f     | f    | 27 27 00 00
                  6 | (0,6)   |      12 | f     | f    | 28 27 00 00
                  7 | (0,7)   |      12 | f     | f    | 29 27 00 00
                  8 | (0,8)   |      12 | f     | f    | 2a 27 00 00


2. Installing pgstattuple

    $ make
    $ make install
    $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test


3. Using pgstattuple

    pgstattuple may be called as a relation function and is
    defined as follows:

    CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
     AS 'MODULE_PATHNAME', 'pgstattuple'
     LANGUAGE C STRICT;

    CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
     AS 'MODULE_PATHNAME', 'pgstattuplebyid'
     LANGUAGE C STRICT;

    The argument is the relation name (optionally it may be qualified)
    or the OID of the relation.  Note that pgstattuple only returns
    one row.


4. Notes

    pgstattuple acquires only a read lock on the relation. So concurrent
    update may affect the result.

    pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
    returns false.


5. History

    2006/06/28

    Extended to work against indexes.

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Resurrecting per-page cleaner for
Next
From: Bruce Momjian
Date:
Subject: Re: better support of out parameters in plperl