Thread: make additional use of optimized linear search routines
Hi hackers, Now that we have some optimized linear search routines [0], I thought I'd quickly check whether we could use them elsewhere. To start, I took another look at a previously posted patch [1] and noticed two potentially useful applications of pg_lfind32(). The attached patches replace the open-coded linear searches with calls to pg_lfind32(). I haven't done any performance analysis with these patches yet, and the overall impact might be limited, but it seemed like low-hanging fruit. I'm hoping to spend a bit more time looking for additional applications of the pg_lfind*() suite of functions (and anywhere else where SIMD might be useful, really). If you have any ideas in mind, I'm all ears. [0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/port/pg_lfind.h [1] https://postgr.es/m/20220802221301.GA742739%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Sep 2, 2022 at 2:52 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
I'm hoping to spend a bit more time looking for additional applications of
the pg_lfind*() suite of functions (and anywhere else where SIMD might be
useful, really). If you have any ideas in mind, I'm all ears.
+1 for the proposal. I did some simple grep work in the source codes but
not too much outputs besides the two places addressed in your patches.
Here are some places I noticed that might be optimized with pg_lfind*().
But 1) these places have issues that arguments differ in signedness, and
2) I'm not sure whether they are performance-sensitive or not.
In check_valid_internal_signature()
for (int i = 0; i < nargs; i++)
{
if (declared_arg_types[i] == ret_type)
return NULL; /* OK */
}
In validateFkOnDeleteSetColumns()
for (int j = 0; j < numfks; j++)
{
if (fkattnums[j] == setcol_attnum)
{
seen = true;
break;
}
}
In pg_isolation_test_session_is_blocked()
for (i = 0; i < num_blocking_pids; i++)
for (j = 0; j < num_interesting_pids; j++)
{
if (blocking_pids[i] == interesting_pids[j])
PG_RETURN_BOOL(true);
}
not too much outputs besides the two places addressed in your patches.
Here are some places I noticed that might be optimized with pg_lfind*().
But 1) these places have issues that arguments differ in signedness, and
2) I'm not sure whether they are performance-sensitive or not.
In check_valid_internal_signature()
for (int i = 0; i < nargs; i++)
{
if (declared_arg_types[i] == ret_type)
return NULL; /* OK */
}
In validateFkOnDeleteSetColumns()
for (int j = 0; j < numfks; j++)
{
if (fkattnums[j] == setcol_attnum)
{
seen = true;
break;
}
}
In pg_isolation_test_session_is_blocked()
for (i = 0; i < num_blocking_pids; i++)
for (j = 0; j < num_interesting_pids; j++)
{
if (blocking_pids[i] == interesting_pids[j])
PG_RETURN_BOOL(true);
}
In dotrim()
for (i = 0; i < setlen; i++)
{
if (str_ch == set[i])
break;
}
if (i >= setlen)
break; /* no match here */
And the function has_lock_conflicts().
Thanks
Richard
for (i = 0; i < setlen; i++)
{
if (str_ch == set[i])
break;
}
if (i >= setlen)
break; /* no match here */
And the function has_lock_conflicts().
Thanks
Richard
On Fri, Sep 02, 2022 at 08:15:46PM +0800, Richard Guo wrote: > +1 for the proposal. I did some simple grep work in the source codes but > not too much outputs besides the two places addressed in your patches. Thanks for taking a look! > Here are some places I noticed that might be optimized with pg_lfind*(). > But 1) these places have issues that arguments differ in signedness, and > 2) I'm not sure whether they are performance-sensitive or not. Yeah, I doubt that these typically deal with many elements or are performance-sensitive enough to bother with. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, Sep 02, 2022 at 03:16:11PM -0700, Nathan Bossart wrote: > On Fri, Sep 02, 2022 at 08:15:46PM +0800, Richard Guo wrote: >> +1 for the proposal. I did some simple grep work in the source codes but >> not too much outputs besides the two places addressed in your patches. > > Thanks for taking a look! Ohoh. This sounds like a good idea to me, close to what John has applied lately. I'll take a closer look.. -- Michael
Attachment
On Sat, Sep 03, 2022 at 10:06:58AM +0900, Michael Paquier wrote: > Ohoh. This sounds like a good idea to me, close to what John has > applied lately. I'll take a closer look.. So, the two code paths patched here are rather isolated. The one in TransactionIdIsInProgress() requires an overflowed set of subxids still running, something similar to what the isolation test subxid-overflow does. XidIsConcurrent() is also kind of hard to reason about with a benchmark. Anyway, I did not know about the work done with SIMD instructions in pg_lfind.h and after playing the API I have run some micro benchmarks with on pg_lfind32() and I can see some improvements. With a range of 100~10k elements in a fixed number of repeated calls with a for loop and lfind(), I could not get up to the 40% speedup. That was somewhat closer to 15%~20% on x86 and 20%~25% with arm64. There is a trend where things got better with a higher number of elements with lfind(). In short, switching those code paths to use the linear search routines looks like a good thing in the long-term, so I would like to apply this patch. If you have any comments or objections, please feel free. -- Michael
Attachment
On Wed, Sep 21, 2022 at 1:40 PM Michael Paquier <michael@paquier.xyz> wrote:
In short, switching those code paths to use the linear search routines
looks like a good thing in the long-term, so I would like to apply
this patch. If you have any comments or objections, please feel
free.
Yeah, I agree that the changes in the patch are meaningful even if the
performance gain is limited.
I wonder if there are other code paths we can replace with the linear
search routines. I tried to search for them but no luck.
Thanks
Richard
performance gain is limited.
I wonder if there are other code paths we can replace with the linear
search routines. I tried to search for them but no luck.
Thanks
Richard
On Wed, Sep 21, 2022 at 02:28:08PM +0800, Richard Guo wrote: > I wonder if there are other code paths we can replace with the linear > search routines. I tried to search for them but no luck. I have been looking at a couple of simple patterns across the tree but no luck here either. Well, if someone spots something, it could always be done later. For now I have applied the bits discussed on this thread. -- Michael
Attachment
On Thu, Sep 22, 2022 at 09:52:57AM +0900, Michael Paquier wrote: > On Wed, Sep 21, 2022 at 02:28:08PM +0800, Richard Guo wrote: >> I wonder if there are other code paths we can replace with the linear >> search routines. I tried to search for them but no luck. > > I have been looking at a couple of simple patterns across the tree but > no luck here either. Well, if someone spots something, it could > always be done later. For now I have applied the bits discussed on > this thread. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com