Re: ANALYZE patch for review - Mailing list pgsql-patches

From Mark Cave-Ayland
Subject Re: ANALYZE patch for review
Msg-id 8F4A22E017460A458DB7BBAB65CA6AE502654D@openmanage
In response to ANALYZE patch for review  ("Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk>)
Responses Re: ANALYZE patch for review
List pgsql-patches
Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 29 January 2004 15:31
> To: Mark Cave-Ayland
> Cc: pgsql-patches@postgresql.org
> Subject: Re: [PATCHES] ANALYZE patch for review
>
>

<lots cut about pointers>

OK, I've had another attempt at writing the code as you suggested, but
the more I work on it the less I like it :(. What I would like to do is
make the VacAttrStats structure contain only the information that is
updated in the pg_statistic table. However, this fell apart when I
realised that update_attstats() requires the attr and attrtype fields to
be present. Doh.

So I'd like to propose a slightly different solution. I think that
examine_attribute() should return a pointer to a custom structure
containing any information that needs to be passed to the datatype
specific routine (not the entire VacAttrStats structure), or NULL if the
column should not be analyzed. I'm also considering changing the
examine_attribute() input parameters to be the Relation, Attribute and
Type for the current column, along with a pointer to a bool that
indicates whether or not the column should be analyzed.

If examine_attribute() sets the bool to false then the column is
ignored. If the bool is set to true then a VacAttrStats structure is
created in memory, and then the Attribute and Type tuple information is
copied into the VacAttrStats structure. A new field for VacAttrStats
will contain the pointer to the custom structure returned by
examine_attribute() which can then be passed into the compute_*_stats()
functions as an extra parameter.
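
To make the proposal above concrete, here is a rough sketch of the control
flow in plain C. It is not the patch itself: every Sketch*/sketch_* name is
made up for illustration, the Attribute and Type tuples are reduced to tiny
stand-in structs, the Relation parameter is left out, and the rule that
decides whether a column is analyzed (a zero statistics target) is only an
assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the Attribute and Type tuple information. */
typedef struct SketchAttribute
{
    int         attnum;
    int         attstattarget;  /* assumption: 0 means "do not analyze" */
} SketchAttribute;

typedef struct SketchType
{
    unsigned int typid;
} SketchType;

/* Hypothetical VacAttrStats with the proposed extra field. */
typedef struct SketchVacAttrStats
{
    SketchAttribute attr;       /* copy of the Attribute information */
    SketchType  attrtype;       /* copy of the Type information */
    void       *extra_data;     /* custom structure; may be NULL */
} SketchVacAttrStats;

/*
 * Type-specific hook. The return value is the custom data (NULL if there
 * is none); *analyze says separately whether the column is analyzed, so
 * a NULL return is not ambiguous.
 */
static void *
sketch_examine_attribute(const SketchAttribute *attr,
                         const SketchType *type,
                         bool *analyze)
{
    assert(attr != NULL && type != NULL && analyze != NULL);
    *analyze = (attr->attstattarget != 0);
    return NULL;                /* this example type has no custom data */
}

/* Caller side: build the stats structure only if the hook asks for it. */
static SketchVacAttrStats *
sketch_prepare_column(const SketchAttribute *attr, const SketchType *type)
{
    bool        analyze = false;
    void       *extra;
    SketchVacAttrStats *stats;

    extra = sketch_examine_attribute(attr, type, &analyze);
    if (!analyze)
        return NULL;            /* column is ignored */

    stats = malloc(sizeof(*stats));
    if (stats == NULL)
        return NULL;
    stats->attr = *attr;
    stats->attrtype = *type;
    stats->extra_data = extra;  /* later handed to compute_*_stats() */
    return stats;
}
```

With this shape, sketch_prepare_column() gives back a structure whose
extra_data is NULL for a column that is analyzed without custom data, and
gives back NULL itself for a column whose hook sets the bool to false.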

This seems to achieve the aims of abstracting the statistics data from
the intermediate information required by the statistics routines,
allowing extra/custom data to be passed between the typanalyze function
and the statistics algorithm, and allowing the user to have the attr and
attrtype structures given to them. The only thing I don't really like
about this is passing a pointer to a bool into examine_attribute().
However, it is needed to distinguish between a NULL meaning 'I have no
custom data but the analyze function should still be called' and one
meaning 'This column should not be analyzed'. I can't think of a better
solution at the moment.

> > I'm beginning to think that perhaps we're looking at this in the
> > wrong way, and that a more elegant version of what you're suggesting
> > could be implemented using a major/minor method of identifying a
> > statistics type.
>
> If you suppose that the "major" field is the upper bits of
> the statistics ID value, then this is just a slightly
> different way of thinking about the range-based allocation
> method I suggested before. However, the range-based method
> can adapt to allocating different amounts of identifier space
> to different owners, whereas a major/minor approach can't
> easily do that since you've defined it to be 2^N minor IDs
> for each major code.

I was thinking perhaps in terms of an extra staowner int2 field in
pg_statistic where the IDs are allocated by the PGDG. Then each
group/project would only require one owner id to be allocated to them
and would then have the existing 2^16 stakind space to organise
themselves. The advantage of this is that projects can allocate their
own stakind values, implementing new or improved statistics algorithms
without having to wait for a new allocation from the PGDG.
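
As a rough illustration of the two schemes being compared, the sketch
below treats an (owner, kind) pair as a major/minor identifier packed into
one 32-bit value, and models range-based allocation as a lookup table of
owned ranges. All names and the example ranges are hypothetical. The
fields are unsigned here to keep the bit arithmetic simple, whereas an
int2 column in PostgreSQL is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (staowner, stakind) pair: one owner id, 2^16 kinds each. */
typedef struct SketchStatId
{
    uint16_t    staowner;
    uint16_t    stakind;
} SketchStatId;

/* Major/minor view: owner in the upper 16 bits, kind in the lower 16. */
static uint32_t
sketch_pack_id(SketchStatId id)
{
    return ((uint32_t) id.staowner << 16) | (uint32_t) id.stakind;
}

static SketchStatId
sketch_unpack_id(uint32_t packed)
{
    SketchStatId id;

    id.staowner = (uint16_t) (packed >> 16);
    id.stakind = (uint16_t) (packed & 0xFFFFu);
    return id;
}

/* Range-based view: each owner holds an arbitrary inclusive range. */
typedef struct SketchIdRange
{
    uint32_t    first;
    uint32_t    last;
    int         owner;
} SketchIdRange;

/* Return the owner of id, or -1 if no range has been allocated for it. */
static int
sketch_range_owner(const SketchIdRange *ranges, size_t nranges, uint32_t id)
{
    assert(ranges != NULL || nranges == 0);
    for (size_t i = 0; i < nranges; i++)
    {
        if (id >= ranges[i].first && id <= ranges[i].last)
            return ranges[i].owner;
    }
    return -1;
}
```

The packed form gives every owner exactly 2^16 kinds, while the range
table can hand out 100 IDs to one owner and 10000 to another, which is the
flexibility Tom points out above. The separate owner field trades that
flexibility for letting each project manage its own kinds independently.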


Many thanks,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


