Re: Using Expanded Objects other than Arrays from plpgsql - Mailing list pgsql-hackers

From Michel Pelletier
Subject Re: Using Expanded Objects other than Arrays from plpgsql
Date
Msg-id CACxu=vKCSSqO6y0ES4SJnFSMBa8szANOykQkG3L1LsLnN2v0JA@mail.gmail.com
Whole thread Raw
In response to Re: Using Expanded Objects other than Arrays from plpgsql  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Using Expanded Objects other than Arrays from plpgsql
List pgsql-hackers
On Tue, Dec 3, 2024 at 4:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michel Pelletier <pelletier.michel@gmail.com> writes:
> Here's a WIP patch for a pgexpanded example in src/test/modules.

I didn't look at your patch yet, but in the meantime here's an update
that takes the next step towards what I promised.

Awesome!  I made a support function for my set_element(matrix, i, j, value) function and it works great, but I do have a couple questions about how to move forward with some of my other methods.  For reference, here's the set_element test function, and a trace of function calls, after implementing the support function for set_element, the expand_matrix function is called twice, first by nvals(), and later by the first set_element, but not the subsequent sets, so the true clause of the  PLPGSQL_RWOPT_INPLACE case in plpgsql_param_eval_var_check works great and as expected:

postgres=# create or replace function test_se(graph matrix) returns matrix language plpgsql as
    $$
    declare
        nvals bigint;
    begin
        graph = wait(graph);
        nvals = nvals(graph);
        raise notice 'nvals: %', nvals;
        graph = set_element(graph, 4, 2, 42);
        graph = set_element(graph, 4, 3, 43);
        graph = set_element(graph, 4, 4, 44);
        return graph;
    end;
    $$;
CREATE FUNCTION
postgres=# select nvals(test_se('int32'::matrix));
DEBUG:  new_matrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  matrix_wait
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix            <- wait expands with support function
DEBUG:  new_matrix
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix            <- nvals() reexpands
DEBUG:  new_matrix
DEBUG:  context_callback_matrix_free
NOTICE:  nvals: 0
DEBUG:  scalar_int32
DEBUG:  new_scalar
DEBUG:  matrix_set_element
DEBUG:  DatumGetMatrix            < set_element does not reexpand, yay!
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  scalar_int32
DEBUG:  new_scalar
DEBUG:  matrix_set_element
DEBUG:  DatumGetMatrix
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  scalar_int32
DEBUG:  new_scalar
DEBUG:  matrix_set_element
DEBUG:  DatumGetMatrix
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  matrix_nvals
DEBUG:  DatumGetMatrix
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_matrix_free
┌───────┐
│ nvals │
├───────┤
│     3 │
└───────┘

 My question is about nvals = nvals(graph) in that function above, the object is flattened and then rexpanded, even after the object was expanded and returned by wait() [1] into a r/w pointer.  set_element honors the expanded object from wait(), but nvals does not.  It seems like I want to be able to pass the argument as a RO pointer, but I'm not sure how to trigger the "else" clause in the PLPGSQL_RWOPT_INPLACE case.  I can see how the support function triggers the if, but I don't see a similar way to trigger the else.  I'm almost certainly missing something.

[1](Due to the asynchronous nature of the GraphBLAS API, there is a GrB_wait() function to wait until an object is "complete" computationally, but since this object was just expanded, there is no pending work, so the wait() is essentially a noop but a handy way for me to return a r/w pointer for subsequent operations).

set_element and wait are the simple case where there is only one reference to the r/w object in the argument list.  As we've discussed I have many functions that can possibly have multiple references to the same object.  The core operation of the library is matrix multiplication, and all other complex functions follow a very similar pattern, so I'll just focus on that one here:

CREATE FUNCTION mxm(
    a matrix,
    b matrix,
    op semiring default null,
    inout c matrix default null,
    mask matrix default null,
    accum binaryop default null,
    descr descriptor default null
    )
RETURNS matrix

The order of arguments mostly follows the order in the C API.  a and b are the left and right matrix operands and the matrix product is the return value.  If c is not null, then it is a pre-created return value which may contain partial results already from some previous operations, otherwise mxm creates a new matrix of the correct dimensions and returns that.  I think the inout is meaningless as it doesn't seem to change anything, but I'm using it as a visual indication in code that c can be the return value if it's not null.

Here's an example of doing Triangle Counting using Burkhardt's method [2], where a, b, and the mask are all the same adjacency matrix (here the Newman/karate graph.  The 'plus_pair' semiring is optimized for structural counting, and the descriptor 's' tells suitesparse to only use the structure of the mask and not to consider the values):


CREATE OR REPLACE FUNCTION public.tcount_burkhardt(graph matrix)
 RETURNS bigint
 LANGUAGE plpgsql
AS $$
    begin
        graph = wait(graph);
        graph = mxm(graph, graph, 'plus_pair_int32', mask=>graph, descr=>'s');
        return reduce_scalar(graph) / 6;
    end;
    $$;

postgres=# select tcount_burkhardt(graph) from karateg;
DEBUG:  matrix_wait
DEBUG:  DatumGetMatrix
DEBUG:  expand_matrix                    <- wait expands and returns r/w pointer with support function
DEBUG:  new_matrix
DEBUG:  matrix_mxm                        <- mxm starts here
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                    <- expanding left operand again
DEBUG:  new_matrix
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                    <- expanding right operand again
DEBUG:  new_matrix
DEBUG:  DatumGetSemiring
DEBUG:  expand_semiring 
DEBUG:  new_semiring
DEBUG:  new_matrix
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                    <- expanding mask argument again
DEBUG:  new_matrix
DEBUG:  DatumGetDescriptor
DEBUG:  expand_descriptor
DEBUG:  new_descriptor
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_descriptor_free
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_semiring_free
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_matrix_free
DEBUG:  matrix_reduce_scalar
DEBUG:  DatumGetMatrix
DEBUG:  matrix_get_flat_size
DEBUG:  flatten_matrix
DEBUG:  expand_matrix                                        <- reduce also re-expands matrix
DEBUG:  new_matrix
DEBUG:  new_scalar
DEBUG:  scalar_div_int32
DEBUG:  DatumGetScalar
DEBUG:  new_scalar
DEBUG:  cast_scalar_int64
DEBUG:  DatumGetScalar
DEBUG:  context_callback_scalar_free
DEBUG:  context_callback_scalar_free
DEBUG:  context_callback_matrix_free
DEBUG:  context_callback_matrix_free
┌──────────────────┐
│ tcount_burkhardt                  │
├──────────────────┤
│               45                           │
└──────────────────┘

mxm calls expand_matrix three times for each of the three arguments.  Ideally I'd like the already expanded rw pointer from wait() to be honored by mxm so that it doesn't re-expand the object three times but, like set_element, not at all.

Hope that question makes sense, still going on the main theory that I'm not understanding the support function, and maybe the c argument thing being optional throws a wrench in the plan, and I'm happy to try and find a workaround for that.  Maybe always requiring the result to be pre-constructed and then making c required and reassigning back to the same input argument is the right approach?

Some good news I always like seeing is that 45 is the right answer.  Very close to having optimal sparse linear algebra in Postgres!

Thanks for your help!  I'll move onto your comments on the test module next.

-Michel
  

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: inconsistent use of SearchSysCacheCopy
Next
From: jian he
Date:
Subject: Re: proposal: schema variables