Re: Experimenting with hash join prefetch - Mailing list pgsql-hackers

From:           Andres Freund
Subject:        Re: Experimenting with hash join prefetch
Msg-id:         20200204013127.llyrwlf7e636wm5w@alap3.anarazel.de
In response to: Re: Experimenting with hash join prefetch (Thomas Munro <thomas.munro@gmail.com>)
Responses:      Re: Experimenting with hash join prefetch
List:           pgsql-hackers
Hi,

On 2020-02-04 01:48:49 +1300, Thomas Munro wrote:
> On Fri, Apr 12, 2019 at 4:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > ... if we also prefetch during
> > the build phase ...
>
> Here's an experimental patch to investigate just that part.  I tried
> initiating a prefetch of the bucket just before we copy the tuple and
> then finally insert it, but it doesn't seem to be far enough apart (at
> least for small tuples), which is a shame because that'd be a one line
> patch.  So I made a little insert queue that prefetches and defers the
> insertion until N tuples later, and then I managed to get between 10%
> and 20% speed-up for contrived tests like this:

How much of the benefit here comes from the prefetching, and how much just
from writing the code in a manner that allows for more out-of-order
execution?  Because there are no dependencies preventing execution of the
next queued tuple anymore, I'd assume that is a good part of what helps?

Code like this really should look roughly like:

    while true:
        have_skew = False

        # get tuples
        for i in 0..batchsize:
            tuples[i] = ExecProcNode(outerNode)
            if tuples[i] == NULL:
                # have slowpath handle this
                break

        # compute their hash values
        for i in 0..batchsize:
            hashvalues[i] = ExecHashGetHashValue(tuples[i])

        # check whether they go into skew buckets
        if have_skew_table:
            for i in 0..batchsize:
                skewbuckets[i] = ExecHashGetSkewBucket(tuples[i], hashvalues[i])
                if skewbuckets[i] != INVALID_SKEW_BUCKET_NO:
                    have_skew = True
            if have_skew:
                # handle everything here
                continue

        # assume there are no skew tuples going forward, all handled above

        # compute bucket/batch for all tuples
        have_into_batchfile = False
        for i in 0..batchsize:
            (hashbuckets[i], batchnos[i]) = ExecHashGetBucketAndBatch(hashvalues[i])
            if batchnos[i] != hashtable->curbatch:
                have_into_batchfile = True
        if have_into_batchfile:
            # slowpath
            continue

        # allocate all tuples
        for i in 0..batchsize:
            hjtuples[i] = alloc_mintuple(tuples[i])

        # and finally insert them
        for i in 0..batchsize:
            hjtuples[i].next = buckets[hashbuckets[i]]
            buckets[hashbuckets[i]] = hjtuples[i]

Obviously it's a bit more complicated in reality than this, but I do think
that's where we have to go to make crucial parts like this faster (same
with hashaggs, and a few other places).

I would bet this helps significantly even if there's no prefetch
instruction - but using explicit prefetching might help further.  It also
allows us to increase branch costs, because we can amortize them over a
few tuples.

> create unlogged table t as select generate_series(1, 100000000)::int i;
> select pg_prewarm('t');
> set work_mem='8GB';
>
> select count(*) from t t1 join t t2 using (i);
>
>             master     patched/N=1       patched/N=4
> workers=0   89.808s    80.556s (+11%)    76.864s (+16%)
> workers=2   27.010s    22.679s (+19%)    23.503s (+15%)
> workers=4   16.728s    14.896s (+12%)    14.167s (+18%)
>
> Just an early experiment, but I thought it looked promising enough to share.

Nice!

Greetings,

Andres Freund
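To make the explicit-prefetch variant of the batched loop above concrete, here is a minimal, self-contained C sketch of a batched hash-table insert that prefetches the bucket heads before chasing them. It is not PostgreSQL code: the Tuple struct, hash_key(), NBUCKETS, and BATCHSIZE are illustrative assumptions, and __builtin_prefetch is the GCC/Clang builtin.

    /*
     * Standalone sketch of a batched insert loop with an explicit prefetch
     * pass.  Illustrative only; not the hash join patch being discussed.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBUCKETS   (1 << 20)       /* power of two, so we can mask */
    #define BATCHSIZE  16

    typedef struct Tuple
    {
        struct Tuple *next;
        uint32_t      key;
    } Tuple;

    static Tuple *buckets[NBUCKETS];

    static inline uint32_t
    hash_key(uint32_t key)
    {
        /* cheap stand-in hash; PostgreSQL would use its own hash functions */
        key ^= key >> 16;
        key *= 0x85ebca6b;
        key ^= key >> 13;
        return key;
    }

    static void
    insert_batch(Tuple *tuples, int ntuples)
    {
        uint32_t bucketno[BATCHSIZE];
        int      i;

        /* pass 1: compute all bucket numbers; no dependent memory accesses */
        for (i = 0; i < ntuples; i++)
            bucketno[i] = hash_key(tuples[i].key) & (NBUCKETS - 1);

        /* pass 2: prefetch the bucket heads we are about to modify */
        for (i = 0; i < ntuples; i++)
            __builtin_prefetch(&buckets[bucketno[i]], 1 /* write */, 3);

        /* pass 3: link the tuples in; the bucket heads should now be cached */
        for (i = 0; i < ntuples; i++)
        {
            tuples[i].next = buckets[bucketno[i]];
            buckets[bucketno[i]] = &tuples[i];
        }
    }

    int
    main(void)
    {
        int    ntuples = 1000000;
        Tuple *tuples = malloc(sizeof(Tuple) * ntuples);
        int    i;

        for (i = 0; i < ntuples; i++)
            tuples[i].key = (uint32_t) rand();

        for (i = 0; i < ntuples; i += BATCHSIZE)
        {
            int n = (ntuples - i < BATCHSIZE) ? ntuples - i : BATCHSIZE;

            insert_batch(&tuples[i], n);
        }

        printf("inserted %d tuples\n", ntuples);
        free(tuples);
        return 0;
    }

Because the bucket numbers are computed in a separate pass, the loads of the bucket heads are independent of one another, so out-of-order execution alone can already overlap much of the latency, which matches the question raised above; the prefetch pass would mainly pay off once the bucket array is much larger than the caches.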