ANALYZE patch for review - Mailing list pgsql-patches

From Mark Cave-Ayland
Subject ANALYZE patch for review
Date
Msg-id 8F4A22E017460A458DB7BBAB65CA6AE512D0F6@openmanage
Whole thread Raw
Responses Re: ANALYZE patch for review
List pgsql-patches
Hi everyone,


Here is a first attempt at a patch to allow a customised ANALYZE
function to be specified for user-defined types, relating to the
following two threads:

http://archives.postgresql.org/pgsql-hackers/2003-10/msg00113.php and
http://archives.postgresql.org/pgsql-hackers/2004-01/msg00236.php


This is my first attempt at a patch for Postgres so I don't expect it to
get applied without significant review. Included along with this email
is a sample datatype that can be used to see that the ANALYZE function
is being called by inserting a fixed value into the pg_statistic table.

Some points:

    1) This patch does nothing other than provide a mechanism for
the user to place
    their own values in the pg_statistic table. Currently I am still
trying to complete
    the loop by finding out how and in what format a GiST
selectivity function has
    access to this data.

    The user-defined function uses a new STATISTICS_KIND_CUSTOM
    constant to hold the data as I am not 100% sure how the planner
interprets the
    STATISTIC_KIND_* values in the pg_statistic table. My aim is to
be able to store
    a 2D histogram so that spatial row count estimates can be used
by the planner
    in PostGIS (http://postgis.refractions.net) without having to
maintain the
    statistics by manually running a stored procedure. The
pg_statistic.h file doesn't
    seem clear to me as to what the values of the various columns
should be when not
    dealing with single one-dimensional histograms.....


    2) All existing types are assigned the default analyze function
called
    pg_analyze_alg_default().


    3) The function is called within examine_attribute() to setup a
valid VacAttrStats
    function if the column is analyzable and point to the algorithm
function itself
    which currently uses the existing API of compute_scalar_stats()
and     compute_minimal_stats().


    4) The VacAttrStats structure may need to be moved out to a
different .h file
    for access by the programmer (currently it is in
commands/vacuum.h) - maybe a
    new analyze.h would be better?


    5) Currently no thought has been given to providing statistics
on functional
    indexes (see
http://archives.postgresql.org/pgsql-general/2004-01/msg00978.php)


In other words, the basic work is done but there is still some way to go
before it is complete. Hopefully by putting this out now other
pgsql-hackers will get the chance to review and refine the patch
(particularly in the area of functional indexes) so this
feature can make it out into 7.5.


Many thanks to the hackers who have spent their time helping me get this
far (particularly Tom).


Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.

Attachment

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: pltcl - "Cache lookup for attribute" error - version 2
Next
From: Jan Wieck
Date:
Subject: Re: pltcl - "Cache lookup for attribute" error - version