Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date:
Msg-id: 26bb3b68-3c51-2dac-d6ce-ae321e6bc801@enterprisedb.com
In response to: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
List: pgsql-hackers
On 4/6/21 5:58 AM, David Rowley wrote:
> On Sat, 20 Mar 2021 at 09:41, James Coleman <jtc331@gmail.com> wrote:
>> I've attached a cleaned up patch. Last CF it was listed in is
>> https://commitfest.postgresql.org/29/2542/ -- what's the appropriate
>> step to take here given it's an already existing patch, but not yet
>> moved into recent CFs?
>
> I had a look at this patch. I like the idea of using a simplehash.h
> hash table to hash the constant values so that repeat lookups can be
> performed much more quickly. However, I'm a bit concerned that there
> are quite a few places in the code where we execute a
> ScalarArrayOpExpr just once, and I'm worried that we'll slow down
> expression evaluation in those cases.
>
> The two cases that I have in mind are:
>
> 1. eval_const_expressions(), where we use the executor to evaluate the
> ScalarArrayOpExpr to see if the result is Const.
> 2. CHECK constraints with IN clauses and single-row INSERTs.
>
> I tried to benchmark both of these, but I'm struggling to get stable
> enough performance for #2, even with fsync=off. Sometimes I'm getting
> results 2.5x slower than other runs.
>
> For benchmarking #1 I'm also not too sure I'm getting stable enough
> results for them to mean anything.
>
> I was running:
>
> create table a (a int);
>
> bench.sql: explain select * from a where a
> in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
>
> drowley@amd3990x:~$ pgbench -n -T 60 -j 64 -c 64 -f bench.sql -P 10 postgres
>
> Master (6d41dd045):
> tps = 992586.991045 (without initial connection time)
> tps = 987964.990483 (without initial connection time)
> tps = 994309.670918 (without initial connection time)
>
> Master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> tps = 956022.557626 (without initial connection time)
> tps = 963043.352823 (without initial connection time)
> tps = 968582.600100 (without initial connection time)
>
> This puts the patched version about 3% slower.
> I'm not sure how much of that is changes in the binary and noise, and
> how much is the needless hashtable build done for
> eval_const_expressions().
>
> I wondered if we should make it the query planner's job to decide
> whether the ScalarArrayOpExpr should be hashed. I ended up with the
> attached rough-cut patch that introduces HashedScalarArrayOpExpr and
> has the query planner decide whether to replace ScalarArrayOpExpr
> with HashedScalarArrayOpExpr during preprocess_expression(). I do
> think we might want to be a bit selective about when we do these
> replacements. It seems likely that we'd want to do this for
> EXPRKIND_QUAL and maybe EXPRKIND_TARGET, but converting
> ScalarArrayOpExpr to HashedScalarArrayOpExpr for EXPRKIND_VALUES
> would be a waste of time, since those are only executed once.
>
> I tried the same test with
> v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch plus
> the attached rough-cut patch and got:
>
> master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> + v5-0002-Rough-cut-patch-for-HashedScalarArrayOpExpr.patch
> tps = 1167969.983173 (without initial connection time)
> tps = 1199636.793314 (without initial connection time)
> tps = 1190690.939963 (without initial connection time)
>
> I can't really explain why this became faster. I was expecting it
> just to reduce the slowdown of the v4 patch a little, and I don't see
> any reason why it would actually become faster. Almost 20% faster
> seems like too much to be mere fluctuations in code alignment in the
> binary.

Interesting. I tried this on the "small" machine I use for benchmarking,
with the same SQL script you used, and also with IN() lists containing
10 and 100 values - i.e. fewer/more than your script, which used 16
values. I only ran this with a single client: the machine has only 4
cores, and this should not be related to concurrency, so 1 client seems
fine.
The averages of 10 runs, 15 seconds each, look like this:

           simple   prepared     10/s     10/p    100/s    100/p
  --------------------------------------------------------------
  master    21847      59476    23343    59380    11757    56488
  v4        21546      57757    22864    57704    11572    57350
  v4+v5     23374      56089    24410    56140    14765    55302

The first two columns are your bench.sql, with -M simple or prepared.
The other columns are 10 or 100 values, where /s is simple and /p is
prepared. Compared to master:

           simple   prepared     10/s     10/p    100/s    100/p
  --------------------------------------------------------------
  v4        98.62%    97.11%    97.95%   97.18%   98.43%  101.52%
  v4+v5    106.99%    94.31%   104.57%   94.54%  125.59%   97.90%

That mostly matches your observation - there's a small performance hit
(~2%), although that might be due to changes in the layout of the
binary. And v4+v5 improves that a bit (even compared to master),
although I don't see the same 20% speedup - I see a +25% improvement,
but only with 100 values. It's a bit strange that in prepared mode v5
actually hurts performance slightly.

That being said, this is a pretty extreme test case. I'm pretty sure
that once the table is not empty, the results will show a clear
improvement. I'll collect some of those results.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company