Re: visualizing B-tree index coverage - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: visualizing B-tree index coverage |
Date | |
Msg-id | Pine.GSO.4.62.0501260951440.6363@ra.sai.msu.su Whole thread Raw |
In response to | Re: visualizing B-tree index coverage ("TJ O'Donnell" <tjo@acm.org>) |
List | pgsql-general |
Excuse me for bothering but what kind of search engine you developed. Does it looks like sets comparing ? Oleg On Tue, 25 Jan 2005, TJ O'Donnell wrote: > Since I'm using a multi-column index, I can greatly influence > the nature of the index created, depending on which columns I use > and how many. I'm searching for an optimal set > of columns that creates an index that, for sure does not have > every value the same, nor only two values. Instead, I want to see > how well I've spread the index out over the data (if that phrasing makes sense). > > More specifically, I have character data representing molecular structures. > I've written (rather slow) search functions. I can create any number of > columns that "fingerprint" each structure, e.g. # Carbon atoms, # N atoms, > # single bonds, etc. I expect my fingerprints will not be unique (fingerprint may > be a poor analogy), but rather will classify similar structures together. > I create a multi-column index on these counts and > get about 2-3 times speedup using 13 columns right now. > For example: > > select count(smiles) from structure where oe_matches(smiles,'c1ccccc1CC(=O)NC') about 15 sec. > > select count(smiles) from structure where > (_c, _n, _o, _s, _p, _halo, > _arom_c, _arom_n, _arom_o, _arom_s, > _atoms, _single_bonds, _other_bonds) >= > ( 3,1,1,0,0,0, 6,0,0,0, 11,4,7 ) > and oe_matches(smiles,'c1ccccc1CC(=O)NC') about 6 seconds > when the (_c, etc.) is a multi-column index. > > The data isn't inherently structured in any way that invites some particular number of columns > for indexing. I don't want to use too many, nor too few columns. I also > want to optimize the nature(which atom types, bond types, etc.) > of the count columns. While I could do this > and use the speedup as the measure of success, I think > that if my B-tree were "covering" the data well, I would get the best results. > Covering means finding that optimal situation where there is not one index for all rows > and also not a unique index for every row - something inbetween would be ideal, > or is that basically a wrong idea? > > TJ > > > >> Useful explanation of PostgreSQL index format: >> http://www.faqs.org/docs/ppbook/c13329.htm >> >> I think you are aiming for the wrong thing. >> The worst possible index is one with every value the same. >> The second worst (still basically useless) is one with only two values. The greater the >> differentiation of the data, the more workload is >> reduced on a search. >> >> Since it isn't a straight binary tree, I don't think that having highly dissimilar data in the >> index should be a problem. >> >> Do you have data or experience that shows otherwise? >> >> -----Original Message----- >> From: pgsql-general-owner@postgresql.org >> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of TJ O'Donnell Sent: Tuesday, January 25, >> 2005 2:19 PM >> To: pgsql-general@postgresql.org >> Cc: tjo@acm.org >> Subject: [GENERAL] visualizing B-tree index coverage >> >> Does anyone know of a tools that allows one to visualize >> the tree created by a multi-column B-tree index? >> A picture of a tree with branches, showing how "branchy" the >> tree is would be great. >> I'm wondering how well I've "clustered" the data in my table >> using the multi-column index. In other words, do my >> multi-columns sufficiently but not overly discriminate rows from each other? >> Do I have too many with the same index? (not enough branches) >> Do I have a unique index for each row? (way too many branches) >> >> Thanks, >> TJ >> >> >> >> ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to >> increase your free space map settings > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-general by date: