Re: visualizing B-tree index coverage - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: visualizing B-tree index coverage
Date
Msg-id Pine.GSO.4.62.0501260951440.6363@ra.sai.msu.su
Whole thread Raw
In response to Re: visualizing B-tree index coverage  ("TJ O'Donnell" <tjo@acm.org>)
List pgsql-general
Excuse me for bothering but what kind of search engine you
developed. Does it looks like sets comparing ?

     Oleg

On Tue, 25 Jan 2005, TJ O'Donnell wrote:

> Since I'm using a multi-column index, I can greatly influence
> the nature of the index created, depending on which columns I use
> and how many.  I'm searching for an optimal set
> of columns that creates an index that, for sure does not have
> every value the same, nor only two values.  Instead, I want to see
> how well I've spread the index out over the data (if that phrasing makes sense).
>
> More specifically, I have character data representing molecular structures.
> I've written (rather slow) search functions.  I can create any number of
> columns that "fingerprint" each structure, e.g. # Carbon atoms, # N atoms,
> # single bonds, etc.  I expect my fingerprints will not be unique (fingerprint may
> be a poor analogy), but rather will classify similar structures together.
> I create a multi-column index on these counts and
> get about 2-3 times speedup using 13 columns right now.
> For example:
>
> select count(smiles) from structure where  oe_matches(smiles,'c1ccccc1CC(=O)NC')  about 15 sec.
>
> select count(smiles) from structure where
> (_c, _n, _o, _s, _p, _halo,
>  _arom_c, _arom_n, _arom_o, _arom_s,
>  _atoms, _single_bonds, _other_bonds)  >=
> ( 3,1,1,0,0,0, 6,0,0,0, 11,4,7 )
> and oe_matches(smiles,'c1ccccc1CC(=O)NC')   about 6 seconds
> when the (_c, etc.) is a multi-column index.
>
> The data isn't inherently structured in any way that invites some particular number of columns
> for indexing.  I don't want to use too many, nor too few columns.  I also
> want to optimize the nature(which atom types, bond types, etc.)
> of the count columns.  While I could do this
> and use the speedup as the measure of success, I think
> that if my B-tree were "covering" the data well, I would get the best results.
> Covering means finding that optimal situation where there is not one index for all rows
> and also not a unique index for every row - something inbetween would be ideal,
> or is that basically a wrong idea?
>
> TJ
>
>
>
>> Useful explanation of PostgreSQL index format:
>> http://www.faqs.org/docs/ppbook/c13329.htm
>>
>> I think you are aiming for the wrong thing.
>> The worst possible index is one with every value the same.
>> The second worst (still basically useless) is one with only two values. The greater the
>> differentiation of the data, the more workload is
>> reduced on a search.
>>
>> Since it isn't a straight binary tree, I don't think that having highly dissimilar data in the
>> index should be a problem.
>>
>> Do you have data or experience that shows otherwise?
>>
>> -----Original Message-----
>> From: pgsql-general-owner@postgresql.org
>> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of TJ O'Donnell Sent: Tuesday, January 25,
>> 2005 2:19 PM
>> To: pgsql-general@postgresql.org
>> Cc: tjo@acm.org
>> Subject: [GENERAL] visualizing B-tree index coverage
>>
>> Does anyone know of a tools that allows one to visualize
>> the tree created by a multi-column B-tree index?
>> A picture of a tree with branches, showing how "branchy" the
>> tree is would be great.
>> I'm wondering how well I've "clustered" the data in my table
>> using the multi-column index.  In other words, do my
>> multi-columns sufficiently but not overly discriminate rows from each other?
>> Do I have too many with the same index? (not enough branches)
>> Do I have a unique index for each row? (way too many branches)
>>
>> Thanks,
>> TJ
>>
>>
>>
>> ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to
>> increase your free space map settings
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>               http://archives.postgresql.org
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

pgsql-general by date:

Previous
From: Dale Sykora
Date:
Subject: text field constraint advice
Next
From: Michael Fuhr
Date:
Subject: Re: text field constraint advice