Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date
Msg-id 557e8853-7cb8-9c01-6193-1d4109345401@iki.fi
Whole thread Raw
In response to Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (James Coleman <jtc331@gmail.com>)
Responses Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
List pgsql-hackers
On 08/09/2020 22:25, James Coleman wrote:
> On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> You could also apply the optimization for non-Const expressions, as long
>> as they're stable (i.e. !contain_volatile_functions(expr)).
> 
> Is that true? Don't we also have to worry about whether or not the
> value is stable (i.e., know when a param has changed)? There have been
> discussions about being able to cache stable subexpressions, and my
> understanding was that we'd need to have that infrastructure (along
> with the necessarily invalidation when the param changes) to be able
> to use this for non-Const expressions.

Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And 
Vars. I think the conditions are the same as those checked in 
match_clause_to_partition_key() in partprune.c (it's a long function, 
search for "if (!IsA(rightop, Const))"). Not sure it's worth the 
trouble, then. But it would be nice to handle queries like "WHERE column 
= ANY ($1)"

>> Deconstructing the array Datum into a simple C array on first call would
>> be a win even for very small arrays and for AND semantics, even if you
>> don't use a hash table.
> 
> Because you wouldn't have to repeatedly detoast it? Or some other
> reason I'm not thinking of? My intuition would have been that (aside
> from detoasting if necessary) there wouldn't be much real overhead in,
> for example, an array storing integers.

Dealing with NULLs and different element sizes in the array is pretty 
complicated. Looping through the array currently looks like this:

    /* Loop over the array elements */
    s = (char *) ARR_DATA_PTR(arr);
    bitmap = ARR_NULLBITMAP(arr);
    bitmask = 1;

    for (int i = 0; i < nitems; i++)
    {
        Datum        elt;
        Datum        thisresult;

        /* Get array element, checking for NULL */
        if (bitmap && (*bitmap & bitmask) == 0)
        {
            fcinfo->args[1].value = (Datum) 0;
            fcinfo->args[1].isnull = true;
        }
        else
        {
            elt = fetch_att(s, typbyval, typlen);
            s = att_addlength_pointer(s, typlen, s);
            s = (char *) att_align_nominal(s, typalign);
            fcinfo->args[1].value = elt;
            fcinfo->args[1].isnull = false;
        }

        [do stuff with Datum/isnull]

        /* advance bitmap pointer if any */
        if (bitmap)
        {
            bitmask <<= 1;
            if (bitmask == 0x100)
            {
                bitmap++;
                bitmask = 1;
            }
        }
    }

Compared with just:

    for (int i = 0; i < nitems; i++)
    {
        Datum        elt = datums[i];

        [do stuff with the Datum]
    }

I'm not sure how much difference that makes, but I presume it's not 
zero, and it seems like an easy win when you have the code to deal with 
the Datum array representation anyway.

- Heikki



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: VACUUM (INTERRUPTIBLE)?
Next
From: Tomas Vondra
Date:
Subject: Re: Ideas about a better API for postgres_fdw remote estimates