Re: What is wrong with hashed index usage? - Mailing list pgsql-hackers
From | Dann Corbit |
---|---|
Subject | Re: What is wrong with hashed index usage? |
Date | |
Msg-id | D90A5A6C612A39408103E6ECDD77B82906F47B@voyager.corporate.connx.com Whole thread Raw |
In response to | What is wrong with hashed index usage? ("Dann Corbit" <DCorbit@connx.com>) |
Responses |
Re: What is wrong with hashed index usage?
|
List | pgsql-hackers |
> -----Original Message----- > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > Sent: Friday, June 21, 2002 6:32 AM > To: Tom Lane > Cc: Neil Conway; mloftis@wgops.com; Dann Corbit; > pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] What is wrong with hashed index usage? > > > Tom Lane wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > <para> > > > ! Because of the limited utility of hash indexes, a > B-tree index > > > ! should generally be preferred over a hash index. > We do not have > > > ! sufficient evidence that hash indexes are actually > faster than > > > ! B-trees even for <literal>=</literal> comparisons. > Moreover, > > > ! hash indexes require coarser locks; see <xref > > > ! linkend="locking-indexes">. > > > </para> > > > </note> > > > </para> > > > --- 181,189 ---- > > > </synopsis> > > > <note> > > > <para> > > > ! Testing has shown that hash indexes are slower > than btree indexes, > > > ! and the size and build time for hash indexes is > much worse. For > > > ! these reasons, hash index use is discouraged. > > > > This change strikes me as a step backwards. The existing > wording tells > > the truth; the proposed revision removes the facts in favor > of a blanket > > assertion that is demonstrably false. > > OK, which part of is "demonstrably false"? I think the old "should > generally be preferred" is too vague. No one has come up with a case > where hash has shown to be faster, and a lot of cases where > it is slower. I agree with Tom. Maybe it is not true for PostgreSQL that hashed indexes are better, but for every other database if you are doing single lookups and do not need to order the items sequentially, hashed indexes are better. What this indicates to me is that hashed indexes could {potentially} be much better implemented for PostgreSQL. See section 2.4: http://citeseer.nj.nec.com/cache/papers/cs/21214/http:zSzzSzwww.cs.cmu.e duzSz~christoszSzcourseszSz826-resourceszSzFOILS-LATEXzSzslides.pdf/inde xing-multimedia-databases.pdf See http://ycmi.med.yale.edu/nadkarni/db_course/NonStd_Contents.htm See also: http://www-courses.cs.uiuc.edu/~cs411/RR2_goodpoints.html From the Oracle Rdb documentation: 1.3.5 Retrieval Methods Oracle Rdb provides several methods for retrieving or accessing data. In the physical design of your database, consider that Oracle Rdb can use one or more of the following methods to retrieve the rows in a table: Sequential: locating a row or rows in sequence by retrieving data within a logical area Sorted index lookup with value retrieval: using the database key (dbkey) for the value from the index to retrieve the row Sorted index only: using data values in the index key pertinent to your query Hashed index retrieval: for retrieving exact data value matches Dbkey only: retrieving a row through its dbkey You determine the retrieval method Oracle Rdb chooses by creating one or more sorted or hashed indexes. Sorted index retrieval provides indexed sequential access to rows in a table. (A sorted index is also called a B-tree index.) By contrast, hashed index retrieval, also known as hash-addressing, provides direct retrieval of a specific row. Retrieval of a row is based on a given value of some set of columns in the row (called the search key). Use a hashed index primarily for random, direct retrieval when you can supply the entire hashed key on which the hashed index is defined, such as an employee identification number (ID). For this kind of retrieval, input/output operations can be significantly reduced, particularly for tables with many rows and large indexes. For example, to retrieve a row using a sorted index that is four levels deep, Oracle Rdb may need to do a total of five input/output operations, one for each level of the sorted index and one to retrieve the actual row. By using a hashed index, the number of input/output operations may be reduced to one or two because hashed index retrieval retrieves the row directly.
pgsql-hackers by date: