Re: WIP: expression evaluation improvements - Mailing list pgsql-hackers

From Andres Freund
Subject Re: WIP: expression evaluation improvements
Date
Msg-id 20191028223218.uhz3o753z64xkpcd@alap3.anarazel.de
Whole thread Raw
In response to Re: WIP: expression evaluation improvements  (Soumyadeep Chakraborty <sochakraborty@pivotal.io>)
Responses Re: WIP: expression evaluation improvements
List pgsql-hackers
Hi,

On 2019-10-27 23:46:22 -0700, Soumyadeep Chakraborty wrote:
> Apologies, I realize my understanding of symbol resolution and the
> referenced_functions mechanism wasn't correct. Thank you for your very
> helpful
> explanations.

No worries! I was just wondering whether I was misunderstanding you.


> > If indeed the only case this is being hit is language PL handlers, it
> > might be better to instead work out the symbol name for that handler -
> > we should be able to get that via pg_language.lanplcallfoid.
> 
> I took a stab at this (on top of your patch set):
> v1-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patch

I think I'd probably try to apply this to master independent of the
larger patchset, to avoid a large dependency.


> From 07c7ff996706c6f71e00d76894845c1f87956472 Mon Sep 17 00:00:00 2001
> From: soumyadeep2007 <sochakraborty@pivotal.io>
> Date: Sun, 27 Oct 2019 17:42:53 -0700
> Subject: [PATCH v1] Resolve PL handler names for JITed code instead of using
>  const pointers
> 
> Using const pointers to PL handler functions prevents optimization
> opportunities in JITed code. Now fmgr_symbol() resolves PL function
> references to the corresponding language's handler.
> llvm_function_reference() now no longer needs to create the global to
> such a function.

Did you check whether there's any cases this fails in the tree with your
patch applied? The way I usually do that is by running the regression
tests like
PGOPTIONS='-cjit_above_cost=0' make -s -Otarget check-world

(which will take a bit longer if use an optimized LLVM build, and a
*lot* longer if you use a debug llvm build)


> Discussion: https://postgr.es/m/20191024224303.jvdx3hq3ak2vbit3%40alap3.anarazel.de:wq
> ---
>  src/backend/jit/llvm/llvmjit.c | 29 +++--------------------------
>  src/backend/utils/fmgr/fmgr.c  | 30 +++++++++++++++++++++++-------
>  2 files changed, 26 insertions(+), 33 deletions(-)
> 
> diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
> index 82c4afb701..69a4167ac9 100644
> --- a/src/backend/jit/llvm/llvmjit.c
> +++ b/src/backend/jit/llvm/llvmjit.c
> @@ -369,38 +369,15 @@ llvm_function_reference(LLVMJitContext *context,
>  
>      fmgr_symbol(fcinfo->flinfo->fn_oid, &modname, &basename);
>  
> -    if (modname != NULL && basename != NULL)
> +    if (modname != NULL)
>      {
>          /* external function in loadable library */
>          funcname = psprintf("pgextern.%s.%s", modname, basename);
>      }
> -    else if (basename != NULL)
> -    {
> -        /* internal function */
> -        funcname = psprintf("%s", basename);
> -    }
>      else
>      {
> -        /*
> -         * Function we don't know to handle, return pointer. We do so by
> -         * creating a global constant containing a pointer to the function.
> -         * Makes IR more readable.
> -         */
> -        LLVMValueRef v_fn_addr;
> -
> -        funcname = psprintf("pgoidextern.%u",
> -                            fcinfo->flinfo->fn_oid);
> -        v_fn = LLVMGetNamedGlobal(mod, funcname);
> -        if (v_fn != 0)
> -            return LLVMBuildLoad(builder, v_fn, "");
> -
> -        v_fn_addr = l_ptr_const(fcinfo->flinfo->fn_addr, TypePGFunction);
> -
> -        v_fn = LLVMAddGlobal(mod, TypePGFunction, funcname);
> -        LLVMSetInitializer(v_fn, v_fn_addr);
> -        LLVMSetGlobalConstant(v_fn, true);
> -
> -        return LLVMBuildLoad(builder, v_fn, "");
> +        /* internal function or a PL handler */
> +        funcname = psprintf("%s", basename);
>      }

Hm. Aren't you breaking things here? If fmgr_symbol returns a basename
of NULL, as is the case for all internal functions, you're going to
print a NULL pointer, no?


>      /* check if function already has been added */
> diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
> index 099ebd779b..71398bb3c1 100644
> --- a/src/backend/utils/fmgr/fmgr.c
> +++ b/src/backend/utils/fmgr/fmgr.c
> @@ -265,11 +265,9 @@ fmgr_info_cxt_security(Oid functionId, FmgrInfo *finfo, MemoryContext mcxt,
>  /*
>   * Return module and C function name providing implementation of functionId.
>   *
> - * If *mod == NULL and *fn == NULL, no C symbol is known to implement
> - * function.
> - *
>   * If *mod == NULL and *fn != NULL, the function is implemented by a symbol in
> - * the main binary.
> + * the main binary. If the function being looked up is not a C language
> + * function, it's language handler name is returned.
>   *
>   * If *mod != NULL and *fn !=NULL the function is implemented in an extension
>   * shared object.
> @@ -285,6 +283,11 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
>      bool        isnull;
>      Datum        prosrcattr;
>      Datum        probinattr;
> +    Oid            language;
> +    HeapTuple    languageTuple;
> +    Form_pg_language languageStruct;
> +    HeapTuple    plHandlerProcedureTuple;
> +    Form_pg_proc plHandlerProcedureStruct;
>  
>      /* Otherwise we need the pg_proc entry */
>      procedureTuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(functionId));
> @@ -304,8 +307,9 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
>          return;
>      }
>  
> +    language = procedureStruct->prolang;
>      /* see fmgr_info_cxt_security for the individual cases */
> -    switch (procedureStruct->prolang)
> +    switch (language)
>      {
>          case INTERNALlanguageId:
>              prosrcattr = SysCacheGetAttr(PROCOID, procedureTuple,
> @@ -342,9 +346,21 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
>              break;
>  
>          default:
> +            languageTuple = SearchSysCache1(LANGOID,
> +                                             ObjectIdGetDatum(language));
> +            if (!HeapTupleIsValid(languageTuple))
> +                elog(ERROR, "cache lookup failed for language %u", language);
> +            languageStruct = (Form_pg_language) GETSTRUCT(languageTuple);
> +            plHandlerProcedureTuple = SearchSysCache1(PROCOID,
> +                                                      ObjectIdGetDatum(
> +                                                          languageStruct->lanplcallfoid));
> +            if (!HeapTupleIsValid(plHandlerProcedureTuple))
> +                elog(ERROR, "cache lookup failed for function %u", functionId);
> +            plHandlerProcedureStruct = (Form_pg_proc) GETSTRUCT(plHandlerProcedureTuple);
>              *mod = NULL;
> -            *fn = NULL;            /* unknown, pass pointer */
> -            break;
> +            *fn = pstrdup(NameStr(plHandlerProcedureStruct->proname));
> +            ReleaseSysCache(languageTuple);
> +            ReleaseSysCache(plHandlerProcedureTuple);
>      }


> > I do want to benefit from getting accurate signatures for patch
> > [PATCH v2 26/32] WIP: expression eval: relative pointer suppport
> > I had a number of cases where I passed the wrong parameters, and llvm
> > couldn't tell me...
> 
> I took a stab:
> v1-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patch

Cool! I'll probably merge that into my patch (with attribution of
course).

I wonder if it'd nicer to not have separate C variables for all of
these, and instead look them up on-demand from the module loaded in
llvm_create_types(). Not sure.


> On a separate note, I had submitted a patch earlier to optimize functions
> earlier
> in accordance to the code comment:
> /*
>  * Do function level optimization. This could be moved to the point where
>  * functions are emitted, to reduce memory usage a bit.
>  */
>  LLVMInitializeFunctionPassManager(llvm_fpm);
> Refer:
> https://www.postgresql.org/message-id/flat/CAE-ML+_OE4-sHvn0AA_qakc5qkZvQvainxwb1ztuuT67SPMegw@mail.gmail.com
> I have rebased that patch on top of your patch set. Here it is:
> v2-0001-Optimize-generated-functions-earlier-to-lower-mem.patch

Sorry for not replying to that earlier.  I'm not quite sure it's
actually worthwhile doing so - did you try to measure any memory / cpu
savings?

The magnitude of wins aside, I also have a local patch that I'm going to
try to publish this or next week, that deduplicates functions more
aggressively, mostly to avoid redundant optimizations. It's quite
possible that we should run that before the function passes - or even
give up entirely on the function pass optimizations. As the module pass
manager does the same optimizations it's not that clear in which cases
it'd be beneficial to run it, especially if it means we can't
deduplicate before optimizations.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Taylor Vesely
Date:
Subject: Re: Zedstore - compressed in-core columnar storage
Next
From: Andres Freund
Date:
Subject: merging HashJoin and Hash nodes