Thread: Re: proposal: schema variables

Re: proposal: schema variables

From
Laurenz Albe
Date:
On Tue, 2024-10-29 at 08:16 +0100, Pavel Stehule wrote:
> again, necessary rebase

I have started looking at patch 5, and I have some questions and comments.

- The commit message is headed "memory cleaning after DROP VARIABLE", but
  the rest of the commit message speaks of sinval messages.  These two
  things are independent, aren't they?  And both lead to the need to validate
  the variables, right?

  Then this code comment would for example be wrong:

     /* true after accepted sinval message */
     static bool needs_validation = false;

  It also becomes "true" after DROP VARIABLE, right?
  I am happy to fix the comment, but I want to understand the patch first.

- I see that the patch adds cleanup of invalid session variable to each
  COMMIT.  Is that a good idea?  I'd expect that it is good enough to clean
  up whenever session variables are accessed.
  Calling remove_invalid_session_variables() during each COMMIT will affect
  all transactions, and I don't see the benefit.

  Also, do we need to call it during pg_session_variables()?

Yours,
Laurenz Albe



Re: proposal: schema variables

From
Pavel Stehule
Date:


so 2. 11. 2024 v 6:46 odesílatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
On Tue, 2024-10-29 at 08:16 +0100, Pavel Stehule wrote:
> again, necessary rebase

I have started looking at patch 5, and I have some questions and comments.

- The commit message is headed "memory cleaning after DROP VARIABLE", but
  the rest of the commit message speaks of sinval messages.  These two
  things are independent, aren't they?  And both lead to the need to validate
  the variables, right?

Maybe it can be formulated differently, but it is true. There are a lot of sinval messages, but in this case
only sinval messages related to DROP VARIABLE are interesting. 


  Then this code comment would for example be wrong:

     /* true after accepted sinval message */
     static bool needs_validation = false;

  It also becomes "true" after DROP VARIABLE, right?
  I am happy to fix the comment, but I want to understand the patch first.

sinval message can be raised by any operation over the pg_variable table.

 <-><-->if (hashvalue == 0 || svar->hashvalue == hashvalue)
 <-><-->{
 <-><--><-->svar->is_valid = false;
+<-><--><-->needs_validation = true;
+<-><-->}
+<->}

When I execute DROP VARIABLE, then the hash value is specified, but the hash can be zero for some massive cleaning, and there are other events that can send sinval message. I think an ANALYZE does this. So the comment /* true after accepted sinval message */ is more accurate than /* true after DROP VARIABLE */.
 

- I see that the patch adds cleanup of invalid session variable to each
  COMMIT.  Is that a good idea?  I'd expect that it is good enough to clean
  up whenever session variables are accessed.
  Calling remove_invalid_session_variables() during each COMMIT will affect
  all transactions, and I don't see the benefit.

If I remember well, there were two reasons why I did it.

1. Minimize the unwanted surprises for users that will check memory usage - So if you drop the variables, then the allocated space is released in possibly near time. The rule - allocated space is released, when in the next transaction you use any session variable looks a little bit crazy (although I think so there will not be real significant difference in functionality). Correct me, if I am wrong, but I don't remember any memory (or resource) cleaning in Postgres, that is delayed to second transactions. I agree, there is overhead of cleaning, but this can be very fast when the user doesn't use session variables, because the hash table with session variables is not initialized. I can imagine some usage some hooks there as alternative

2. The main reason why it is implemented is implementation of temporal variables with RESET or DROP on transaction end. Related code should be triggered at commit time, it cannot be delayed.


  Also, do we need to call it during pg_session_variables()?

I think it can be removed. Originally pg_session_variables showed only valid variables, but it is not true now.

Regards

Pavel
 

Yours,
Laurenz Albe

Re: proposal: schema variables

From
Laurenz Albe
Date:
On Sat, 2024-11-02 at 08:36 +0100, Pavel Stehule wrote:
> so 2. 11. 2024 v 6:46 odesílatel Laurenz Albe <laurenz.albe@cybertec.at> napsal:
> > - The commit message is headed "memory cleaning after DROP VARIABLE", but
> >   the rest of the commit message speaks of sinval messages.  These two
> >   things are independent, aren't they?  And both lead to the need to validate
> >   the variables, right?
>
> Maybe it can be formulated differently, but it is true. There are a lot of sinval messages, but in this case
> only sinval messages related to DROP VARIABLE are interesting.

Okay...

> >   Then this code comment would for example be wrong:
> >
> >      /* true after accepted sinval message */
> >      static bool needs_validation = false;
> >
> >   It also becomes "true" after DROP VARIABLE, right?
> >   I am happy to fix the comment, but I want to understand the patch first.
> >
>
> sinval message can be raised by any operation over the pg_variable table.

I set a watchpoint on "needs_validation", and when I run DROP VARIABLE, it is called
directly from the command:

Hardware watchpoint 2: needs_validation

Old value = false
New value = true
0x00000000006ad44c in SessionVariableDropPostprocess (varid=<optimized out>) at session_variable.c:169
169                svar->drop_lxid = MyProc->vxid.lxid;
(gdb) where
#0  0x00000000006ad44c in SessionVariableDropPostprocess (varid=<optimized out>) at session_variable.c:169
#1  0x0000000000625dec in DropVariableById (varid=<optimized out>) at pg_variable.c:259
#2  0x00000000005fec57 in doDeletion (object=0x4ac85e8, flags=0) at dependency.c:1450
...
#8  0x00000000008bb7e0 in standard_ProcessUtility (pstmt=0x49ac700, queryString=0x49abbb0 "DROP VARIABLE a;",
readOnlyTree=<optimizedout>,  
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x49acac0, qc=0x7ffe19d2db60) at utility.c:1076
...
#12 0x00000000008b6742 in exec_simple_query (query_string=0x49abbb0 "DROP VARIABLE a;") at postgres.c:1283

Later, there is a sinval message and "pg_variable_cache_callback()" is called, which sets
"needs_validation" again.

Perhaps my confustion is as follows: if DROP VARIABLE sends an invalidation message,
and the handler sets "needs_validation", why is it necessary to set "needs_validation"
in SessionVariableDropPostprocess() too?

If we didn't set "needs_validation" in SessionVariableDropPostprocess(), the comment
would be true.

> > - I see that the patch adds cleanup of invalid session variable to each
> >   COMMIT.  Is that a good idea?  I'd expect that it is good enough to clean
> >   up whenever session variables are accessed.
> >   Calling remove_invalid_session_variables() during each COMMIT will affect
> >   all transactions, and I don't see the benefit.
>
> If I remember well, there were two reasons why I did it.
>
> 1. Minimize the unwanted surprises for users that will check memory usage - So if you
>    drop the variables, then the allocated space is released in possibly near time.
>    The rule - allocated space is released, when in the next transaction you use any
>    session variable looks a little bit crazy (although I think so there will not be real
>    significant difference in functionality).
>    Correct me, if I am wrong, but I don't remember any memory (or resource) cleaning
>    in Postgres, that is delayed to second transactions. I agree, there is overhead of cleaning,
>    but this can be very fast when the user doesn't use session variables, because the hash table
>    with session variables is not initialized. I can imagine some usage some hooks there as alternative

I don't think that is a good enough reason.

Memory used by an idle backend is not totally predictable for other reasons as well:
- the catalog cache
- memory that PostgreSQL freed, but that is kept in the malloc arena so that
  it can be reused for the next malloc() call

I believe that the disadvantage of keeping some memory allocated across transaction
is not as bad as the pain of going through all variables on each COMMIT.
If you have set one or two session variables and run a lot of statements in
autocommit mode, that is an unnecessary overhead.

> 2. The main reason why it is implemented is implementation of temporal variables
>    with RESET or DROP on transaction end. Related code should be triggered at
>    commit time, it cannot be delayed.

That makes sense.

If I remember right, temporary variables are an optional feature.
So I suggest that you move AtPreEOXact_SessionVariables() to the patch that adds
temporary session variables.

> >   Also, do we need to call it during pg_session_variables()?
>
> I think it can be removed. Originally pg_session_variables showed only valid variables, but it is not true now.

Right.

I'll wait for your reaction to my above suggestions before I start working on the comments.

Yours,
Laurenz Albe