Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Msg-id 26bb3b68-3c51-2dac-d6ce-ae321e6bc801@enterprisedb.com
In response to Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 4/6/21 5:58 AM, David Rowley wrote:
> On Sat, 20 Mar 2021 at 09:41, James Coleman <jtc331@gmail.com> wrote:
>> I've attached a cleaned up patch. Last CF it was listed in is
>> https://commitfest.postgresql.org/29/2542/ -- what's the appropriate
>> step to take here given it's an already existing patch, but not yet
>> moved into recent CFs?
> 
> I had a look at this patch.  I like the idea of using a simplehash.h
> hash table to hash the constant values so that repeat lookups can be
> performed much more quickly, however, I'm a bit concerned that there
> are quite a few places in the code where we often just execute a
> ScalarArrayOpExpr once and I'm a bit worried that we'll slow down
> expression evaluation of those cases.
> 
> The two cases that I have in mind are:
> 
> 1. eval_const_expressions() where we use the executor to evaluate the
> ScalarArrayOpExpr to see if the result is Const.
> 2. CHECK constraints with IN clauses and single-row INSERTs.
> 
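As a concrete illustration of case #2 (a hypothetical example of mine,
not taken from the patch or from any of the benchmarks below): a
single-row INSERT like this evaluates the IN() list in the CHECK
constraint exactly once, so a hash table built for it never gets a
chance to pay for itself:

    -- the constraint's IN() runs just once for this statement
    create table t (x int
        check (x in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)));
    insert into t values (1);
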
> I tried to benchmark both of these but I'm struggling to get stable
> enough performance for #2, even with fsync=off.  Sometimes I'm getting
> results 2.5x slower than other runs.
> 
> For benchmarking #1 I'm also not too sure I'm getting stable enough
> results for them to mean anything.
> 
> I was running:
> 
> create table a (a int);
> 
> bench.sql: explain select * from a where a
> in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
> 
> drowley@amd3990x:~$ pgbench -n -T 60 -j 64 -c 64 -f bench.sql -P 10 postgres
> Master (6d41dd045):
> tps = 992586.991045 (without initial connection time)
> tps = 987964.990483 (without initial connection time)
> tps = 994309.670918 (without initial connection time)
> 
> Master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> tps = 956022.557626 (without initial connection time)
> tps = 963043.352823 (without initial connection time)
> tps = 968582.600100 (without initial connection time)
> 
> This puts the patched version about 3% slower. I'm not sure how much
> of that is changes in the binary and noise and how much is the
> needless hashtable build done for eval_const_expressions().
> 
> I wondered if we should make it the query planner's job of deciding if
> the ScalarArrayOpExpr should be hashed or not.  I ended up with the
> attached rough-cut patch that introduces HashedScalarArrayOpExpr and
> has the query planner decide if it's going to replace
> ScalarArrayOpExpr with these HashedScalarArrayOpExpr during
> preprocess_expression().  I do think that we might want to consider
> being a bit selective about when we do these replacements.  It seems
> likely that we'd want to do this for EXPRKIND_QUAL and maybe
> EXPRKIND_TARGET, but I imagine that converting ScalarArrayOpExpr to
> HashedScalarArrayOpExpr for EXPRKIND_VALUES would be a waste of time
> since those will just be executed once.
> 
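To make that distinction a bit more concrete (this is just my reading of
the idea, not code from either patch): a qual like the first statement
below may be evaluated once per row of "a", so a hashed array lookup has
a chance to amortize its build cost, while the IN() expressions in a
multi-row VALUES list are each evaluated exactly once, so converting
them would likely be wasted effort:

    -- EXPRKIND_QUAL: potentially evaluated for every row of "a"
    select * from a where a in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);

    -- EXPRKIND_VALUES: each IN() below runs exactly once (random() just
    -- keeps it from being folded to a constant at plan time)
    insert into a
    values (case when (random() * 100)::int in (1,2,3,4,5) then 1 else 0 end),
           (case when (random() * 100)::int in (6,7,8,9,10) then 1 else 0 end);
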
> I tried the same above test with the
> v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch plus the
> attached rough-cut patch and got:
> 
> master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> + v5-0002-Rough-cut-patch-for-HashedScalarArrayOpExpr.patch
> tps = 1167969.983173 (without initial connection time)
> tps = 1199636.793314 (without initial connection time)
> tps = 1190690.939963 (without initial connection time)
> 
> I can't really explain why this became faster. I was expecting it just
> to reduce that slowdown of the v4 patch a little. I don't really see
> any reason why it would become faster.  It's almost 20% faster which
> seems like too much to just be fluctuations in code alignment in the
> binary.
> 

Interesting. I tried this on the "small" machine I use for benchmarking,
with the same SQL script you used, and also with variants using 10 and
100 values in the IN() list - i.e. fewer/more than the 16 values in your
script.

I only ran this with a single client - the machine has only 4 cores and
the test should not be sensitive to concurrency, so 1 client seems fine.
The averages of 10 runs of 15 seconds each look like this (throughput in
tps):

            simple   prepared      10/s    10/p    100/s    100/p
    -------------------------------------------------------------
    master   21847      59476     23343   59380    11757    56488
    v4       21546      57757     22864   57704    11572    57350
    v4+v5    23374      56089     24410   56140    14765    55302

The first two columns are your bench.sql, with -M simple or -M prepared.
The other columns are the 10- and 100-value variants; /s means -M simple,
/p means -M prepared.
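
For reference, the 10/100-value scripts are just the obvious variants of
your bench.sql, e.g. (the exact constants don't matter):

    -- bench-10.sql
    explain select * from a where a in (1,2,3,4,5,6,7,8,9,10);

    -- bench-100.sql is the same query with the constants 1..100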

Compared to master:

            simple   prepared      10/s    10/p    100/s    100/p
    -------------------------------------------------------------
    v4      98.62%     97.11%    97.95%  97.18%   98.43%  101.52%
    v4+v5  106.99%     94.31%   104.57%  94.54%  125.59%   97.90%

That mostly matches your observation - with v4 alone there's a small
performance hit (~2%), although that might just be due to changes in the
layout of the binary. And v4+v5 improves things a bit (even compared to
master), although I don't see the same ~20% speedup for your 16-value
script - I only see a comparable improvement (+25%) with 100 values.

It's also a bit strange that in prepared mode v5 actually hurts the
performance a bit.

That being said, this is a pretty extreme test case. I expect that once
the table is not empty, the results will show a clear improvement. I'll
collect some of those results.
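
Roughly the kind of test I have in mind (the row count and the values
are arbitrary, just enough to make the qual do real work per row):

    -- populate the table so the IN() qual is evaluated once per row
    insert into a select i % 1000 from generate_series(1, 1000000) s(i);
    analyze a;

    -- same shape of query as before, now with rows to filter
    select count(*) from a
     where a in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);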


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


