Thread: hash taboo?
I've been reading the postgresql manual and I find there is very little discussion about hash compared to btree. Most of the focus seems to be on using btree indices even that the default for 'create index' is btree also. From the documentation, it seems the only difference between either searching method is that btree can be used with multiple operators whilst hash can only be used with '='. Furthermore, hash seems to be contained in memory, so should be limited to small queries or, in my case, queries using limit (without using sort which would need to retrieve the entire data anyways). My conclusion is that if I can live with just using '=' and using slightly more memory, I should be using hash. Unfortunately, there is very little sign in the documentation that I should be using hash at all. Perhaps I have missed something important. If someone could help me make a more rational decision on using searching methods, I'd appreciate. Thanks in advance, Marc
Run some performace tests and let us know. > I've been reading the postgresql manual and I find there is very little > discussion about hash compared to btree. Most of the focus seems to be on > using btree indices even that the default for 'create index' is btree > also. From the documentation, it seems the only difference between either > searching method is that btree can be used with multiple operators whilst > hash can only be used with '='. Furthermore, hash seems to be contained in > memory, so should be limited to small queries or, in my case, queries > using limit (without using sort which would need to retrieve the entire > data anyways). > > My conclusion is that if I can live with just using '=' and using slightly > more memory, I should be using hash. Unfortunately, there is very little > sign in the documentation that I should be using hash at all. Perhaps I > have missed something important. > > If someone could help me make a more rational decision on using searching > methods, I'd appreciate. > > Thanks in advance, > Marc > > > ************ > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
My results were exactly the same for btree and hash, even when vacumming between each index creation. Here's my query: SELECT * FROM prod_base WHERE mid='2'; Here's my result: Index Scan using prod_mid_idx on prod_base (cost=2.05 rows=2 width=120) My database is perhaps not big enough to run some relevant tests, so please let me know if there's another way I could get a better idea of the resources used for using each searching method. > Run some performace tests and let us know. > > > I've been reading the postgresql manual and I find there is very little > > discussion about hash compared to btree. Most of the focus seems to be on > > using btree indices even that the default for 'create index' is btree > > also. From the documentation, it seems the only difference between either > > searching method is that btree can be used with multiple operators whilst > > hash can only be used with '='. Furthermore, hash seems to be contained in > > memory, so should be limited to small queries or, in my case, queries > > using limit (without using sort which would need to retrieve the entire > > data anyways). > > > > My conclusion is that if I can live with just using '=' and using slightly > > more memory, I should be using hash. Unfortunately, there is very little > > sign in the documentation that I should be using hash at all. Perhaps I > > have missed something important. > > > > If someone could help me make a more rational decision on using searching > > methods, I'd appreciate. > > > > Thanks in advance, > > Marc > > > > > > ************ > > > > > > > -- > Bruce Momjian | http://www.op.net/~candle > maillist@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 >
> My results were exactly the same for btree and hash, even when vacumming > between each index creation. Here's my query: > SELECT * FROM prod_base WHERE mid='2'; > > Here's my result: > Index Scan using prod_mid_idx on prod_base (cost=2.05 rows=2 width=120) > > My database is perhaps not big enough to run some relevant tests, so > please let me know if there's another way I could get a better idea of the > resources used for using each searching method. You have to look at index creation speed and index access speed. Not sure which one wins in each category. Also, index modification speed may be important. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 ************
Excellent point, your last comment gives me a tangible incentive for using hash instead of btree. Since I don't need to use other operators than '=', there is really no need to spend extra time creating a btree while all I need is a hash table. In the end, both are as fast for searching, but I gain some additional speed for inserting and removing entries. > > My results were exactly the same for btree and hash, even when vacumming > > between each index creation. Here's my query: > > SELECT * FROM prod_base WHERE mid='2'; > > > > Here's my result: > > Index Scan using prod_mid_idx on prod_base (cost=2.05 rows=2 width=120) > > > > My database is perhaps not big enough to run some relevant tests, so > > please let me know if there's another way I could get a better idea of the > > resources used for using each searching method. > > You have to look at index creation speed and index access speed. > > Not sure which one wins in each category. Also, index modification > speed may be important. Thanks again, Marc
> Excellent point, your last comment gives me a tangible incentive for using > hash instead of btree. Since I don't need to use other operators than '=', > there is really no need to spend extra time creating a btree while all I > need is a hash table. In the end, both are as fast for searching, but I > gain some additional speed for inserting and removing entries. Is the hash faster to create? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> My results were exactly the same for btree and hash, even when vacumming > between each index creation. Here's my query: > SELECT * FROM prod_base WHERE mid='2'; > > Here's my result: > Index Scan using prod_mid_idx on prod_base (cost=2.05 rows=2 width=120) > > My database is perhaps not big enough to run some relevant tests, so > please let me know if there's another way I could get a better idea of the > resources used for using each searching method. You have to look at index creation speed and index access speed. Not sure which one wins in each category. Also, index modification speed may be important. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 ************