Thread: [PATCH]-hash index improving

[PATCH]-hash index improving

From
"Xiao Meng"
Date:
The patch stores only the hash code in the index tuple.
It is based on Neil Conway's patch against an old version of PostgreSQL.
It passes the regression tests, but I haven't tested performance yet.
Anyone interested is welcome to run a performance test ;-)
You can undefine the macro HASHVALUE_ONLY in hash.h to get the
original implementation.
It's a preliminary implementation, and I'm looking for input here.
Hope to hear from you.

--
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: mx.cogito@gmail.com
MSN: cnEnder@live.com
http://xiaomeng.yo2.cn

Attachment

Re: [PATCH]-hash index improving

From
"Jonah H. Harris"
Date:
On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote:
> The patch stores only the hash code in the index tuple.
> It is based on Neil Conway's patch against an old version of PostgreSQL.
> It passes the regression tests, but I haven't tested performance yet.
> Anyone interested is welcome to run a performance test ;-)
> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> original implementation.
> It's a preliminary implementation, and I'm looking for input here.
> Hope to hear from you.

Tom, Simon, Heikki, Greg, we need to make sure this SoC project
succeeds and would appreciate your review and input.

Based on some of Kenneth Marshall and Tom Raney's past hash index test
cases, Xiao and I are going to work on some benchmarks for measuring
the performance-related aspects of this project.

Thanks!

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/


Re: [PATCH]-hash index improving

From
Alvaro Herrera
Date:
Xiao Meng escribió:
> The patch stores only the hash code in the index tuple.
> It is based on Neil Conway's patch against an old version of PostgreSQL.
> It passes the regression tests, but I haven't tested performance yet.
> Anyone interested is welcome to run a performance test ;-)
> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> original implementation.

I think having the HASHVALUE_ONLY define is not a good idea -- it just
makes the patch harder to read.  I suggest just removing the old code
and putting the new code in place.  (That's why we have revision
control.)


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PATCH]-hash index improving

From
"Jonah H. Harris"
Date:
On Thu, Jul 17, 2008 at 12:42 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> makes the patch harder to read.  I suggest just removing the old code
> and putting the new code in place.  (That's why we have revision
> control.)

Agreed.  I'm also getting a crash with large data sets, so we'll make
sure to get that fixed in the next version of the patch.

-Jonah


Re: [PATCH]-hash index improving

From
Kenneth Marshall
Date:
On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote:
> Xiao Meng escribió:
> > The patch stores only the hash code in the index tuple.
> > It is based on Neil Conway's patch against an old version of PostgreSQL.
> > It passes the regression tests, but I haven't tested performance yet.
> > Anyone interested is welcome to run a performance test ;-)
> > You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> > original implementation.
> 
> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> makes the patch harder to read.  I suggest just removing the old code
> and putting the new code in place.  (That's why we have revision
> control.)
> 
One thing it helps with is building an old version and a new version for
comparative testing; otherwise, you could end up with an apples-to-oranges
comparison. I certainly think that the final patch should not have it,
but it is useful now for testing and comparisons.

My two cents,
Ken


Re: [PATCH]-hash index improving

From
Kenneth Marshall
Date:
On Thu, Jul 17, 2008 at 02:00:07PM -0400, Jonah H. Harris wrote:
> On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote:
> >> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> >> makes the patch harder to read.  I suggest just removing the old code
> >> and putting the new code in place.  (That's why we have revision
> >> control.)
> >>
> > One thing it helps is building an old version and a new version
> > for comparative testing. Otherwise, you could end up with an apples-to-
> > oranges comparison. I certainly think that the final patch should not
> > have it, but it is useful now for testing and comparisons.
> 
> Yes, that's why Xiao did it that way.  However, we traditionally just
> submit a patch with only the changes and it's up to the person testing
> to have an identical build-tree without the patch for testing.
> Another reason for it is that even if you build without the define,
> the patch author may have mistakenly added something outside the ifdef
> which could impact testing.
> 
> I agree with Alvaro that we should submit it as a standard change patch.

Okay, that makes sense.

Ken


Re: [PATCH]-hash index improving

From
"Jonah H. Harris"
Date:
On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote:
>> I think having the HASHVALUE_ONLY define is not a good idea -- it just
>> makes the patch harder to read.  I suggest just removing the old code
>> and putting the new code in place.  (That's why we have revision
>> control.)
>>
> One thing it helps is building an old version and a new version
> for comparative testing. Otherwise, you could end up with an apples-to-
> oranges comparison. I certainly think that the final patch should not
> have it, but it is useful now for testing and comparisons.

Yes, that's why Xiao did it that way.  However, we traditionally just
submit a patch with only the changes and it's up to the person testing
to have an identical build-tree without the patch for testing.
Another reason for it is that even if you build without the define,
the patch author may have mistakenly added something outside the ifdef
which could impact testing.

I agree with Alvaro that we should submit it as a standard change patch.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/


Re: [PATCH]-hash index improving

From
Alvaro Herrera
Date:
Kenneth Marshall escribió:
> On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote:

> > I think having the HASHVALUE_ONLY define is not a good idea -- it just
> > makes the patch harder to read.  I suggest just removing the old code
> > and putting the new code in place.  (That's why we have revision
> > control.)
> > 
> One thing it helps is building an old version and a new version
> for comparative testing. Otherwise, you could end up with an apples-to-
> oranges comparison. I certainly think that the final patch should not
> have it, but it is useful now for testing and comparisons.

For this purpose I think it would be easier to have a separate tree with
the patch, and one without it.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PATCH]-hash index improving

From
"Jonah H. Harris"
Date:
On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote:
> The patch stores only the hash code in the index tuple.
> It is based on Neil Conway's patch against an old version of PostgreSQL.
> It passes the regression tests, but I haven't tested performance yet.
> Anyone interested is welcome to run a performance test ;-)
> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> original implementation.
> It's a preliminary implementation, and I'm looking for input here.
> Hope to hear from you.

I've spent some time today performing tests similar to those mentioned
here: http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php

Using a word list of 2650024 unique words (maximum length is 118
bytes), build times are still high, but I'm not really seeing any
performance improvements over b-tree.  I haven't profiled it yet, but
my test is as follows:

- Created the dict table
- Loaded the dict table
- Counted the records in the dict table
- Created the index
- Shut down the database
- Randomly selected 200 entries from the word list and built a file
full of (SELECT * FROM dict WHERE word = '<word>') queries using them.
- Cleared out the kernel cache
- Started the database
- Ran the query file

The result is a 5-10ms improvement in the overall execution time of
all 200 queries.  The per-query difference is practically unnoticeable.
As this is in the range of noise, methinks there's a larger problem
with hash indexes.  I haven't looked heavily into their implementation,
but do any of you know of any major design flaws?

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/


Re: [PATCH]-hash index improving

From
Kenneth Marshall
Date:
On Thu, Jul 17, 2008 at 04:24:28PM -0400, Jonah H. Harris wrote:
> On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote:
> > The patch stores only the hash code in the index tuple.
> > It is based on Neil Conway's patch against an old version of PostgreSQL.
> > It passes the regression tests, but I haven't tested performance yet.
> > Anyone interested is welcome to run a performance test ;-)
> > You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> > original implementation.
> > It's a preliminary implementation, and I'm looking for input here.
> > Hope to hear from you.
> 
> I've spent some time today performing tests similar to those mentioned
> here (http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php)
> 
> Using a word list of 2650024 unique words (maximum length is 118
> bytes), build times are still high, but I'm not really seeing any
> performance improvements over b-tree.  I haven't profiled it yet, but
> my test is as follows:
> 
> - Created the dict table
> - Loaded the dict table
> - Counted the records in the dict table
> - Created the index
> - Shutdown the database
> - Randomly selected 200 entries from the word list and built a file
> full of (SELECT * FROM dict WHERE word = '<word>') queries using them.
> - Cleared out the kernel cache
> - Started the database
> - Ran the query file
> 
> The result is a 5-10ms improvement in the overall execution time of
> all 200 queries.  The per-query difference is practically unnoticeable.
> As this is in the range of noise, methinks there's a larger problem
> with hash indexes.  I haven't looked heavily into their implementation,
> but do any of you know of any major design flaws?
> 
Jonah,

Thank you for running these tests. I was trying to reproduce my initial
tests on the original system to make it more apples-to-apples, but the
latest release needs more semaphore resources than 8.2 did, and fixing
that on Solaris 8 will require a reboot. Would you mind posting the
timings for the hash_only index versus the hash_value index versus the
btree index for the same test? Also, what is the on-disk size of all
three indexes? This will allow us to figure out the bucket/page load,
or fill factor, for each scenario.

The basic implementation looked reasonable. I will take a look at
the patch this evening.

Regards,
Ken


Re: [PATCH]-hash index improving

From
Simon Riggs
Date:
On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote:
> I'm not really seeing any performance improvements over b-tree.

I'd like to see a theoretical analysis on the benefit case before we
spend too many teraflops on investigation.

In which cases do we expect that hash indexes will beat btrees?

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: [PATCH]-hash index improving

From
"Dann Corbit"
Date:
-----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Simon Riggs
> Sent: Thursday, July 17, 2008 4:10 PM
> To: Jonah H. Harris
> Cc: Xiao Meng; pgsql-hackers@postgresql.org; Kenneth Marshall
> Subject: Re: [HACKERS] [PATCH]-hash index improving
>
> On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote:
> > I'm not really seeing any performance improvements over b-tree.
>
> I'd like to see a theoretical analysis on the benefit case before we
> spend too many teraflops on investigation.
>
> In which cases do we expect that hash indexes will beat btrees?

Large table unique index equality search should be very fast with a
hashed index (and that is the only place where any advantage will be
seen).  Hashed indexes are useless for any search besides equality, and
they gain more and more as the number of levels in the b-tree index
increases.

Here is a hash index lookup that is frightfully fast:
http://www.corpit.ru/mjt/tinycdb.html

It's a constant database, but the file format might be worth
examination.  Here is a quickie definition of the CDB format:
http://cr.yp.to/cdb/cdb.txt

I think that the problem with hashed indexes tends to be the blocking of
index pages on disk.  For instance, the FastDB/GigaBase author was able
to make FastDB memory based hash indexes that are faster than the memory
based b-tree versions, but not for the disk based GigaBase database.
The only way to get better performance from hash based indexes is to
read fewer index pages than if a tree-based index were used.  So I think
that the scheme used to create the index pages is the focus to make them
worthwhile.



Re: [PATCH]-hash index improving

From
David Fetter
Date:
On Thu, Jul 17, 2008 at 02:11:20PM -0400, Alvaro Herrera wrote:
> Kenneth Marshall escribió:
> > On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote:
> 
> > > I think having the HASHVALUE_ONLY define is not a good idea --
> > > it just makes the patch harder to read.  I suggest just removing
> > > the old code and putting the new code in place.  (That's why we
> > > have revision control.)
> > > 
> > One thing it helps is building an old version and a new version
> > for comparative testing. Otherwise, you could end up with an
> > apples-to-oranges comparison. I certainly think that the final
> > patch should not have it, but it is useful now for testing and
> > comparisons.
> 
> For this purpose I think it would be easier to have a separate tree
> with the patch, and one without it.

Here's one tree.  Anyone can get an initial copy as follows:

git clone http://git.postgresql.org/git/~davidfetter/hash/.git

Xiao Meng, if you let me know where your git repo is, say by cloning
onto a machine I can see from the internet and applying your patches
to it, I can pull and let others see it :)

Yes, I know it's a little cumbersome, but we'll get something slicker
as we figure out what "slicker" really should mean.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: [PATCH]-hash index improving

From
"Jonah H. Harris"
Date:
On Thu, Jul 17, 2008 at 7:37 PM, Dann Corbit <DCorbit@connx.com> wrote:

>> In which cases do we expect that hash indexes will beat btrees?
>
> Large table unique index equality search should be very fast with hashed
> index (and the only place where any advantage will be seen).

Yes, this is the exact use-case.  Likewise, Dann has provided a good
description regarding the primary implementation goals of a disk-based
hash table.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/


Re: [PATCH]-hash index improving

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Xiao Meng escribió:
>> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
>> original implementation.

> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> makes the patch harder to read.

While we are griping about readability: failure to update the comments
to match the code is NOT, NOT, NOT acceptable.  I had barely started
to read the patch before encountering this insult to the reader:

    /* Hash indexes are never lossy (at the moment anyway) */
-    scan->xs_recheck = false;
+#ifdef HASHVALUE_ONLY
+    scan->xs_recheck = true;
+#else
+    scan->xs_recheck = false;
+#endif

The fact that the patch doesn't touch backend/access/hash/README is
already grounds for rejection, but can't you be bothered to fix a
comment that is literally one line away from where you are making
it wrong?
        regards, tom lane


Re: [PATCH]-hash index improving

From
"Jonah H. Harris"
Date:
On Fri, Jul 18, 2008 at 1:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While we are griping about readability: failure to update the comments
> to match the code is NOT, NOT, NOT acceptable.  I had barely started
> to read the patch before encountering this insult to the reader:
> ...

As this is Xiao's first patch to the community, I'd appreciate it if
you guys would chill a little.  I'm fully aware of what needs to be
done with it clean-up wise, but given that we're under some time
constraints, I asked that he submit his preliminary patch for those
who wanted to preview/test it.

Again, this patch is nowhere near acceptance quality, nor was it
intended to be.  I'll make sure he gets the next one cleaned up prior
to submitting it.

The real question now is the one listed above: why are hash index
queries, including this patch, no better than b-tree even for straight
equality comparisons?  Because hash is lossy, we could easily be
performing multiple I/Os for rechecks, and that may be causing some of
the performance problems.  I haven't checked I/O for that yet, but was
wondering if you (Tom) knew of any major design/implementation
deficiency we should be taking into consideration regarding PG's hash
index implementation?

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/


Re: [PATCH]-hash index improving

From
Tom Lane
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> The real question now has been listed above; why are hash index
> queries, including this patch, no better than b-tree even for straight
> equality comparisons?

Well, the theoretical advantage is that a hash probe is O(1) while a
btree probe is O(log N) (ignoring a lot of possible nonoptimalities
on each side of course).  So you'd need a test case large enough for
log N to be noticeably more than 1, which I think means in practice that
the upper levels of the btree don't fit in memory.  So small test cases
aren't going to show any improvement.

I think the historical problem with our hash implementation has been
that it was terribly space-inefficient, because of the fixed (and large)
bucket size.  A dense btree index can probably beat a sparse hash up to
exceedingly large values of N.  It's not clear to me how much the patch
at hand does to fix that.

The patch might introduce a new problem, which is extra heap visits
caused by the lossiness of the hash value.  Again, that doesn't hurt
much ideally, but maybe you chanced to use a test case where it does
hurt.  It would be worth sticking in a bit of debug instrumentation to
see whether the number of failed heap visits is significant.

In any case, the reported test seems to be dealing with index sizes in
the tens-of-megabytes range, and it doesn't surprise me a whole lot that
an O(log N) penalty isn't showing up there.  Try a test case where the
index size is significantly larger than your RAM.
        regards, tom lane


Re: [PATCH]-hash index improving

From
Simon Riggs
Date:
On Thu, 2008-07-17 at 16:37 -0700, Dann Corbit wrote:

> Large table unique index equality search should be very fast with hashed
> index (and the only place where any advantage will be seen).  Hashed
> indexes are useless for any search besides equality and gain more and
> more when the levels of the b-tree index increase.

I think a comparison with a btree using a functional index should be
shown.

> The only way to get better performance from hash based indexes is to
> read fewer index pages than if a tree-based index were used.  So I think
> that the scheme used to create the index pages is the focus to make them
> worthwhile.

Agreed. Some math on that, plus a clear focus on making this faster than
a btree is critical to this project.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: [PATCH]-hash index improving

From
Gregory Stark
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:

> On Thu, 2008-07-17 at 16:37 -0700, Dann Corbit wrote:
>
>> Large table unique index equality search should be very fast with hashed
>> index (and the only place where any advantage will be seen).  Hashed
>> indexes are useless for any search besides equality and gain more and
>> more when the levels of the b-tree index increase.
>
> I think a comparison with a btree using a functional index should be
> shown.

To do that you'll have to think about the use cases you think hash should win
on. 

For cpu-bound databases with small indexes there might be a win if you can
avoid the binary search of all the elements on a page. (Have we modified btree
to do that or does it still scan sequentially on the leaf pages?)

For i/o-bound databases with very large indexes there should be an opportunity
where btree lookups are O(logn) and hash lookups can in theory be O(1).

However, to get O(1) lookups, hashes need to do extra work at insert
time. If they don't expand the hash as necessary, then they end up just
being a linear speedup to whatever lookup algorithm is used to scan the
buckets. That isn't going to win over btree, which is already doing a
binary search.

The extra work at insert time is O(n log n) amortized, but I'm not sure
good amortized performance is good enough for Postgres. Users are unhappy
when their average performance is good but 1 in 1000 inserts thrashes
their i/o rewriting the whole index...

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


Re: [PATCH]-hash index improving

From
Simon Riggs
Date:
On Fri, 2008-07-18 at 11:07 +0100, Gregory Stark wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:

> hash lookups can in theory be O(1).

I'm not sure that applies here.  I'm interested in how *this*
patch will work, not in more generic algorithm theory.

To the patch authors: can we please see a table showing the expected
number of logical I/Os (i.e., block accesses) for btrees and hash indexes,

e.g. for 100-byte rows...

rows    btree        hash
----    -----        ----
10^2
10^3
10^4
10^5
10^6
10^7
10^8

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: [PATCH]-hash index improving

From
Alvaro Herrera
Date:
Gregory Stark escribió:

> For cpu-bound databases with small indexes there might be a win if you can
> avoid the binary search of all the elements on a page. (Have we modified btree
> to do that or does it still scan sequentially on the leaf pages?)

Hmm?  It has used binary search since as long as I can remember ... see
_bt_first and _bt_binsrch.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [PATCH]-hash index improving

From
"Heikki Linnakangas"
Date:
Gregory Stark wrote:
> For i/o-bound databases with very large indexes there should be an opportunity
> where btree lookups are O(logn) and hash lookups can in theory be O(1).

Ignoring the big-O complexity, if a hash index only stores a 32-bit hash 
code instead of the whole key, it could be a big win in storage size, 
and therefore in cache-efficiency and performance, when the keys are 
very long.

Granted, it's not very common to use a 1K text field as a key column...

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: [PATCH]-hash index improving

From
Gregory Stark
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> Gregory Stark wrote:
>> For i/o-bound databases with very large indexes there should be an opportunity
>> where btree lookups are O(logn) and hash lookups can in theory be O(1).
>
> Ignoring the big-O complexity, if a hash index only stores a 32-bit hash code
> instead of the whole key, it could be a big win in storage size, and therefore
> in cache-efficiency and performance, when the keys are very long.

I think it has to show an improvement over an expression index over
(hashany(col)) and not just an improvement over an index over "col" due to col
being large.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


Re: [PATCH]-hash index improving

From
"Jonah H. Harris"
Date:
On Fri, Jul 18, 2008 at 10:44 AM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
> Ignoring the big-O complexity, if a hash index only stores a 32-bit hash
> code instead of the whole key, it could be a big win in storage size, and
> therefore in cache-efficiency and performance, when the keys are very long.

Agreed.  My thinking is that there's either something inherently wrong
with the implementation, or we're performing so many disk I/Os that
it's nearly equivalent to b-tree.  Tom has a couple suggestions which
Xiao and I will explore.

> Granted, it's not very common to use a 1K text field as a key column...

Especially for direct equality comparison :)

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/


Re: [PATCH]-hash index improving

From
Tom Lane
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> Agreed.  My thinking is that there's either something inherently wrong
> with the implementation, or we're performing so many disk I/Os that
> it's nearly equivalent to b-tree.  Tom has a couple suggestions which
> Xiao and I will explore.

I finally got a chance to look through the patch in some detail.
If I haven't missed something, there are just two improvement ideas
embodied in it:

1. Store just the hash value, and not the index key value, in hash
index entries.  (This means that all hash indexscans become lossy
and have to be rechecked at the heap.)

2. Keep the contents of each index page ordered by hash value, and use
binary instead of linear search to find the matching item(s) during an
indexscan.  (This depends on #1 because recomputing the hash values
during the binary search would be too expensive --- although you could
also make it work by storing *both* the hash value and the original
key.)

I suppose that the main point of #1 is to reduce index size by
allowing more tuples to fit in each bucket.  However, the patch
neglects to adjust the target-fillfactor calculation in _hash_metapinit,
which means that the code won't actually put any more tuples per bucket
(on average) than it did before.  So the index isn't actually any
smaller and you get no I/O savings --- you just have more unused
space on a typical page.  Fixing that might help.

FWIW, I had always assumed that part of the solution to hash's woes
would involve decoupling the bucket size from the page size, so
that you could have multiple buckets per page.  But maybe the
binary-search idea makes that unnecessary.  I'm not sure.  A whole
lot depends on how evenly the buckets get filled.  You should probably
investigate how many tuples actually end up in each bucket with and
without the patch.

In the realm of micro-optimizations that might be significant, I think
you really need to get rid of all those _create_hash_desc calls,
particularly the one in _hash_checkqual which is part of the inner loop
of an indexscan.  Not only are they slow but they probably represent a
serious memory leak in a scan that returns many tuples.  For reading the
hash value out of an existing index tuple, I don't think you should be
bothering with a tupdesc at all --- don't use index_getattr, just map a
C struct onto the known layout of an index tuple with a single never-null
int field.  This would particularly make for a noticeable improvement in
the speed of _hash_binsearch.  The tupdescs used in storing an index
entry are probably needed, but you could just use a single statically
declared tupdesc (look at the ones in relcache.c for examples of
building one as a constant).
        regards, tom lane


Re: [PATCH]-hash index improving

From
Kenneth Marshall
Date:
I just ran my original 16M word test case against the patched
version, and like Tom noted below, the tuples per bucket
calculation is wrong which results in identical index sizes
for both the original version and the hash-value-only version.

On Fri, Jul 18, 2008 at 12:23:25PM -0400, Tom Lane wrote:
> "Jonah H. Harris" <jonah.harris@gmail.com> writes:
> > Agreed.  My thinking is that there's either something inherently wrong
> > with the implementation, or we're performing so many disk I/Os that
> > it's nearly equivalent to b-tree.  Tom has a couple suggestions which
> > Xiao and I will explore.
> 
> I finally got a chance to look through the patch in some detail.
> If I haven't missed something, there are just two improvement ideas
> embodied in it:
> 
> 1. Store just the hash value, and not the index key value, in hash
> index entries.  (This means that all hash indexscans become lossy
> and have to be rechecked at the heap.)
> 
> 2. Keep the contents of each index page ordered by hash value, and use
> binary instead of linear search to find the matching item(s) during an
> indexscan.  (This depends on #1 because recomputing the hash values
> during the binary search would be too expensive --- although you could
> also make it work by storing *both* the hash value and the original
> key.)
> 
> I suppose that the main point of #1 is to reduce index size by
> allowing more tuples to fit in each bucket.  However, the patch
> neglects to adjust the target-fillfactor calculation in _hash_metapinit,
> which means that the code won't actually put any more tuples per bucket
> (on average) than it did before.  So the index isn't actually any
> smaller and you get no I/O savings --- you just have more unused
> space on a typical page.  Fixing that might help.
> 
> FWIW, I had always assumed that part of the solution to hash's woes
> would involve decoupling the bucket size from the page size, so
> that you could have multiple buckets per page.  But maybe the
> binary-search idea makes that unnecessary.  I'm not sure.  A whole
> lot depends on how evenly the buckets get filled.  You should probably
> investigate how many tuples actually end up in each bucket with and
> without the patch.
> 
I think that while the binary-search idea will improve the lookup
over the original sequential scan of the bucket, it makes updates
much more expensive, particularly with buckets approaching 100%
full. The idea that I have been mulling over tries to improve access
times by breaking a bucket into mini virtual buckets within a page. We
restrict the size of the mini-bucket to pagesize/2^n. The
sweet spot should be around n=7 or 8, which for an 8k page size yields
a mini-bucket size of 64 or 32 bytes. Then the search for the value
in a page is to read the virtual bucket corresponding to n bits
of the hash value.
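The mini-bucket selection described above might look like this (a hedged sketch; the constants, names, and the choice of which hash bits to consume are assumptions, not taken from any patch):

```c
#include <stdint.h>

/* Hypothetical sketch of the mini-bucket idea: carve each 8 kB page
 * into 2^n fixed-size virtual buckets and pick one using n bits of the
 * hash code.  The bit range chosen here (the bits just above those that
 * select the on-disk bucket) is an assumption. */
#define PAGE_SIZE     8192
#define MINI_BITS     7                             /* n: 2^7 = 128 mini-buckets */
#define MINI_BUCKETS  (1u << MINI_BITS)
#define MINI_SIZE     (PAGE_SIZE / MINI_BUCKETS)    /* 64 bytes each */

static uint32_t
mini_bucket_for(uint32_t hashval, unsigned bucket_bits)
{
    /* skip the low bits already consumed to choose the on-disk bucket,
     * then take the next MINI_BITS bits to pick the mini-bucket */
    return (hashval >> bucket_bits) & (MINI_BUCKETS - 1);
}

static uint32_t
mini_bucket_offset(uint32_t mini)
{
    /* byte offset of the mini-bucket within the page */
    return mini * MINI_SIZE;
}
```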

The second piece is to take advantage of the fact that the size of
the mini-bucket is not an even multiple of the size of a hash index
tuple, and aggregate all the lost space for use as the "first" overflow
area for all of a page's mini-buckets. This avoids the I/O needed to
read a full overflow page from disk and accommodates the imperfections
in the hash function distribution. The overflow pages, both the virtual
"first" and subsequent real pages, would benefit from the binary lookups.
It may also be worth storing the high and low hash values specially to
avoid the search in a page if the value could not be on the page.

> In the realm of micro-optimizations that might be significant, I think
> you really need to get rid of all those _create_hash_desc calls,
> particularly the one in _hash_checkqual which is part of the inner loop
> of an indexscan.  Not only are they slow but they probably represent a
> serious memory leak in a scan that returns many tuples.  For reading the
> hash value out of an existing index tuple, I don't think you should be
> bothering with a tupdesc at all --- don't use index_getattr, just map a
> C struct onto the known layout of an index tuple with a single never-null
> int field.  This would particularly make for a noticeable improvement in
> the speed of _hash_binsearch.  The tupdescs used in storing an index
> entry are probably needed, but you could just use a single statically
> declared tupdesc (look at the ones in relcache.c for examples of
> building one as a constant).
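Tom's struct-overlay suggestion could look roughly like this (illustrative only; the header layout below is a stand-in for the real IndexTupleData header, whose actual definition would have to be used):

```c
#include <stdint.h>

/* Hypothetical sketch: since every entry stores a single never-null
 * uint32 hash code, skip index_getattr/tupdesc entirely and read the
 * value through a struct that mirrors the expected on-page layout.
 * The 8-byte header is a placeholder for the fixed index tuple header. */
typedef struct HashIndexTuple
{
    uint8_t  header[8];     /* stand-in for the fixed index tuple header */
    uint32_t hash_value;    /* the single, never-null key column */
} HashIndexTuple;

static uint32_t
fast_get_hash(const void *itup)
{
    /* one aligned load instead of a tupdesc-driven attribute fetch */
    return ((const HashIndexTuple *) itup)->hash_value;
}
```

Avoiding the per-tuple tupdesc build is what makes this attractive inside the inner loop of `_hash_binsearch` and `_hash_checkqual`.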
> 
+1

>             regards, tom lane
> 

I think that this sort of virtual bucket would allow us to take
better advantage of the O(1) behavior. What do you all think?

Regards,
Ken


Re: [PATCH]-hash index improving

From
Kenneth Marshall
Date:
FYI,

I just patched the fill-factor calculation and re-ran my test.
The index size dropped from 513M to 43M which is the same disk
footprint as the corresponding btree index.

Have a nice weekend.
Ken

On Fri, Jul 18, 2008 at 12:23:14PM -0500, Kenneth Marshall wrote:
> I just ran my original 16M word test case against the patched
> version, and like Tom noted below, the tuples per bucket
> calculation is wrong which results in identical index sizes
> for both the original version and the hash-value-only version.
> 


Re: [PATCH]-hash index improving

From
"Xiao Meng"
Date:
I'm sorry for the delayed reply. I couldn't get access to the internet these
days for some reason.
I do apologize for my rough work and its very bad readability. I posted it
in a hurry and I didn't mean to cause the readers so much
inconvenience.
I'll never make such a mistake again.
Currently, I've made some of the optimizations Tom advised and removed the
macro HASHVALUE_ONLY.  And I'm working on fixing the problem that it
crashes on large data sets.
I'll post a new patch later.
Thank you for all your advice and testing.

-- 
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: mx.cogito@gmail.com
MSN: cnEnder@live.com
http://xiaomeng.yo2.cn


Re: [PATCH]-hash index improving

From
"Xiao Meng"
Date:
Well, I'll do it after I finish my second patch.
A hash index should be more efficient than a btree when N is big enough.
It seems worthwhile to find out how big N is experimentally.

On Fri, Jul 18, 2008 at 6:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Fri, 2008-07-18 at 11:07 +0100, Gregory Stark wrote:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
>> hash lookups can in theory be O(1).
>
> I'm not sure whether that applies here? I'm interested in how *this*
> patch will work, not in more generic algorithm theory.
>
> To patch authors: Can we please see a table showing expected number of
> logical I/Os (i,e, block accesses) for btrees and hash indexes
>
> e.g. for 100-byte rows...
>
> rows    btree           hash
> ----    -----           ----
> 10^2
> 10^3
> 10^4
> 10^5
> 10^6
> 10^7
> 10^8
>
> --
>  Simon Riggs           www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
>
>



-- 
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: mx.cogito@gmail.com
MSN: cnEnder@live.com
http://xiaomeng.yo2.cn


Re: [PATCH]-hash index improving

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Xiao Meng
> Sent: Tuesday, July 22, 2008 7:57 PM
> To: Simon Riggs
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [PATCH]-hash index improving
> 
> Well, I'll do it after I finish my second patch.
> A hash index should be more efficient than a btree when N is big enough.
> It seems worthwhile to find out how big N is experimentally.

The savings will depend on many factors.  Another thing (besides volume) which is important is the sort of key data
being indexed.

Consider a unique key on six varchar(40) fields:

1.  Country
2.  State/Province
3.  City
4.  Division
5.  Branch
6.  Office

Typically, a key such as this will have lots of clumps of similar data, only being differentiated with the final
segment.  This sort of index is often used for reporting purposes.  To determine a unique entry, it is not unlikely
that more than 200 characters will be traversed.  A hash index gets a special boost here because a much smaller data
signature is stored.  Even a 64-bit hash occupies only 8 bytes.  On the other hand, consider an index on a field
consisting of a single character.  Here, the pages of the b-tree will have a huge volume of entries per page,
requiring fewer pages to search, and the hash index is many times larger and hence more pages will have to be loaded.

These things make a hash index desirable:
1. Unique index
2. Long keys
3. Huge data cardinality
4. Equality search

These things make a hash index undesirable:
1.  Non-unique index
2.  Short keys
3.  Small data sets

These things render a hash index as worthless (except in COMBINATION with a b-tree type index):
1.  Need for range searches like BETWEEN
2.  Need for ORDER BY on the column(s)

As an aside:
I guess it will also be nice if you can CLUSTER both index and data values on the hash.  It may need a different
algorithm than a b-tree clustering concept.  I know that Rdb uses different kinds of storage areas for hashed indexes
versus b-tree indexes.

This effort to create hashed indexes is very valuable.  Because it becomes more and more dominant as the data scales
up, right at the point when things get tougher is when it becomes the most helpful.  If you have a tiny table, it does
not even matter if you index it, because (for instance) 10 rows will probably always stay in memory and iteration will
find what is needed instantly.  But if you have hundreds of millions of rows or billions of rows, now is when
performance really matters.  So when the data scales to preposterous size (which it has an uncanny ability to do) the
boost of performance becomes even more valuable.


Re: [PATCH]-hash index improving

From
Simon Riggs
Date:
On Wed, 2008-07-23 at 10:57 +0800, Xiao Meng wrote:
> Well, I'll do it after I finish my second patch.
> A hash index should be more efficient than a btree when N is big enough.
> It seems worthwhile to find out how big N is experimentally.

Agreed.

We should also examine the basic thinking of the index.

My understanding is that it dynamically resizes hash as the index grows.
If we already believe the only benefit would come when the index is
large, having special handling for small tables seems like a waste of
time because we will never use it in those contexts. 

So starting the hash at a fairly large size makes more sense than it
might otherwise seem to.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: [PATCH]-hash index improving

From
Kenneth Marshall
Date:
On Tue, Jul 22, 2008 at 08:36:34PM -0700, Dann Corbit wrote:
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> > owner@postgresql.org] On Behalf Of Xiao Meng
> > Sent: Tuesday, July 22, 2008 7:57 PM
> > To: Simon Riggs
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] [PATCH]-hash index improving
> > 
> > Well, I'll do it after I finish my second patch.
> > A hash index should be more efficient than a btree when N is big enough.
> > It seems worthwhile to find out how big N is experimentally.
> 
> The savings will depend on many factors.  Another thing (besides volume) which is important is the sort of key data
> being indexed.
> 
> Consider a unique key on six varchar(40) fields:
> 
> 1.  Country
> 2.  State/Province
> 3.  City
> 4.  Division
> 5.  Branch
> 6.  Office
> 
> Typically, a key such as this will have lots of clumps of similar data, only being differentiated with the final
> segment.  This sort of index is often used for reporting purposes.  To determine a unique entry, it is not unlikely
> that more than 200 characters will be traversed.  A hash index gets a special boost here because a much smaller data
> signature is stored.  Even a 64-bit hash occupies only 8 bytes.  On the other hand, consider an index on a field
> consisting of a single character.  Here, the pages of the b-tree will have a huge volume of entries per page,
> requiring fewer pages to search, and the hash index is many times larger and hence more pages will have to be loaded.
> 
> These things make a hash index desirable:
> 1. Unique index
> 2. Long keys
> 3. Huge data cardinality
> 4. Equality search
> 
> These things make a hash index undesirable:
> 1.  Non-unique index
> 2.  Short keys
> 3.  Small data sets
> 
I mentioned in a previous E-mail adding some additional informational settings
that can be used like fill-factor to improve the layout and performance of a hash
index. They are roughly:
  - number of elements
  - maximum number of elements
  - multiplicity - estimate of element repetition in a non-unique index
Knowing the number of elements in advance can allow the index to be pre-created
in the optimal layout and disk footprint. For every factor-of-256 increase in
the number of buckets, you can reduce the space needed for the stored hash
value by 8 bits. For large indexes you can store a 64-bit hash in the same
space as the 32-bit hash in a small index. This will allow for the use of
larger hash values, which will result in better data distribution between the
buckets and more O(1)-like behavior.
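The bit-saving arithmetic above can be illustrated as follows (a sketch under the assumption that the bucket count is fixed at 2^k, so the low k hash bits are implied by which bucket an entry lives in; function names are mine):

```c
#include <stdint.h>

/* Hypothetical sketch: with 2^bucket_bits buckets fixed in advance,
 * the low bucket_bits bits of every hash are implied by the entry's
 * bucket, so only the remaining high bits need to be stored. */
static uint64_t
stored_hash(uint64_t full_hash, unsigned bucket_bits)
{
    return full_hash >> bucket_bits;            /* drop the implied low bits */
}

static uint64_t
reconstruct_hash(uint64_t stored, uint32_t bucket, unsigned bucket_bits)
{
    return (stored << bucket_bits) | bucket;    /* splice the implied bits back */
}
```

With 256 (2^8) buckets, 8 bits per stored hash are saved; every further factor of 256 in the bucket count saves another 8 bits, which is what lets a 64-bit hash fit in the space a 32-bit one would otherwise need.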

> These things render a hash index as worthless (except in COMBINATION with a b-tree type index):
> 1.  Need for range searches like BETWEEN
> 2.  Need for ORDER BY on the column(s)
> 
> As an aside:
> I guess it will also be nice if you can CLUSTER both index and data values on the hash.  It may need a different
> algorithm than a b-tree clustering concept.  I know that Rdb uses different kinds of storage areas for hashed indexes
> versus b-tree indexes.
> 
Clustering a hash index will allow for much smaller indexes through the use
of prefix-compression of the common heap tuple ids. Now an entry in the hash
index needs sizeof(hash) + sizeof(heap tuple id), which is 4 + 6 = 10 bytes,
before clustering. After clustering, and for large indexes, this could drop to
4 bytes per entry plus a constant value.
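A back-of-the-envelope check of those figures (the run counts and the per-run prefix cost below are illustrative assumptions, not measurements):

```c
#include <stddef.h>

/* Hypothetical size estimate: uncompressed entries need hash (4 bytes)
 * plus heap tuple id (6 bytes); if runs of clustered entries share a
 * tuple-id prefix stored once per run, the per-entry cost approaches
 * just the 4-byte hash. */
static size_t
uncompressed_bytes(size_t nentries)
{
    return nentries * (4 + 6);            /* 10 bytes per entry */
}

static size_t
compressed_bytes(size_t nentries, size_t nruns)
{
    return nentries * 4 + nruns * 6;      /* 4 bytes/entry + shared prefixes */
}
```

For 1000 entries in 10 clustered runs this comes to 4060 bytes versus 10000 uncompressed, roughly the "4 bytes per entry plus a constant" claimed above.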

> This effort to create hashed indexes is very valuable.  Because it becomes more and more dominant as the data scales
> up, right at the point when things get tougher is when it becomes the most helpful.  If you have a tiny table, it
> does not even matter if you index it, because (for instance) 10 rows will probably always stay in memory and
> iteration will find what is needed instantly.  But if you have hundreds of millions of rows or billions of rows, now
> is when performance really matters.  So when the data scales to preposterous size (which it has an uncanny ability
> to do) the boost of performance becomes even more valuable.

Although there is a clear theoretical benefit from the O(1) lookup for large indexes,
I think that the cross-over point between btree and hash indexes may occur at
smaller index sizes than might be expected, due to the possibly smaller memory
footprint needed for the hash index. Of course, this will all need to be tested.

Regards,
Ken