Thread: [HACKERS] Passing values to a dynamic background worker

[HACKERS] Passing values to a dynamic background worker

From
Keith Fiske
Date: Mon, 17 Apr 2017 16:19:13 -0400
So after reading a recent thread on the steep learning curve for PG internals [1], I figured I'd share where I've gotten stuck with this in a new thread vs hijacking that one.

One of the goals I had with pg_partman was to see if I could get the partitioning python scripts redone as C functions using a dynamic background worker, so that a single call could commit in batches. My thinking was to have a user function that accepts arguments for things like the interval value, batch size, and the other arguments to the python script, then start and stop a dynamic bgw for each batch so it can commit after each one. The dynamic bgw would essentially just have to call the already existing partition_data() plpgsql function, but I have to be able to pass the argument values that the user gave down into the dynamic bgw.

I've reached a roadblock in that bgw_main_arg can only accept a single argument that must be passed by value for a dynamic bgw. I already worked around this when passing the database name to my existing bgw that does partition maintenance (I pass a simple integer to use as an array index). But I'm not sure how to do this for passing multiple values in. I'm assuming this is where I'd look into storing values in shared memory so they can be reused later? I'm not even sure if that's the right approach, and if it is, where to even start to understand how to do that, let alone how it would interact with the background worker system. If you look at my existing C code, you can see it's very simple and doesn't do much more than the worker_spi example. I've yet to have to interact with any memory contexts or such things, and as the referenced thread below mentions, doing so is quite a steep learning curve.

Any guidance for a newer internals dev here would be great.

Re: [HACKERS] Passing values to a dynamic background worker

From
Kyotaro HORIGUCHI
Date:
Hello,

At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske <keith@omniti.com> wrote in
<CAG1_KcAFJ60pac_QnnZX0qeO12NENiPOcohuoQvs297WaT_ObQ@mail.gmail.com>
> So after reading a recent thread on the steep learning curve for PG
> internals [1], I figured I'd share where I've gotten stuck with this in a
> new thread vs hijacking that one.
> 
> One of the goals I had with pg_partman was to see if I could get the
> partitioning python scripts redone as C functions using a dynamic
> background worker to be able to commit in batches with a single call. My
> thinking was to have a user-function that can accept arguments for things
> like the interval value, batch size, and other arguments to the python
> script, then start/stop a dynamic bgw up for each batch so it can commit
> after each one. The dynamic bgw would essentially just have to call the
> already existing partition_data() plpgsql function, but I have to be able
> to pass the argument values that the user gave down into the dynamic bgw.
> 
> I've reached a roadblock in that bgw_main_arg can only accept a single
> argument that must be passed by value for a dynamic bgw. I already worked
> around this for passing the database name to my existing use of a bgw
> with doing partition maintenance (pass a simple integer to use as an index
> array value). But I'm not sure how to do this for passing multiple values
> in. I'm assuming this would be the place where I'd see about storing values
> in shared memory to be able to re-use later? I'm not even sure if that's
> the right approach, and if it is, where to even start to understand how to
> do that.

I think you are on the right track: shared memory is what you
need. There are two ways to acquire shared memory for such a
purpose. One is static shared memory, which is allocated at
server start alongside shared_buffers; the other is dynamic
shared memory (DSM). If you need a fixed-size segment, the former
will work. If the amount you need is not known in advance, DSM
will work.
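Whichever kind of segment is used, the user's arguments have to be laid out as one contiguous block. Below is a minimal plain-C sketch, assuming a hypothetical PartitionBatchArgs layout (the field names are illustrative, not pg_partman's): fixed-size fields first, the variable-length table name stored inline at the end, and a helper that computes the byte count one would request from dsm_create(). malloc() stands in for the segment so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical argument block for one batch.  Fixed-size fields come
 * first; the parent table name is stored inline at the end (flexible
 * array member), so the whole block is one contiguous chunk that can be
 * copied into a shared memory segment.
 */
typedef struct PartitionBatchArgs
{
    int     batch_size;         /* rows per batch (illustrative) */
    int     batch_count;        /* number of batches (illustrative) */
    char    parent_table[];     /* NUL-terminated, stored inline */
} PartitionBatchArgs;

/* Bytes needed for a block holding the given table name. */
size_t
partition_batch_args_size(const char *parent_table)
{
    return offsetof(PartitionBatchArgs, parent_table) + strlen(parent_table) + 1;
}

/*
 * Fill a block in caller-provided memory of at least
 * partition_batch_args_size(parent_table) bytes.  In an extension this
 * memory would be the segment's address; here it is any buffer.
 */
PartitionBatchArgs *
partition_batch_args_fill(void *mem, int batch_size, int batch_count,
                          const char *parent_table)
{
    PartitionBatchArgs *args = mem;

    args->batch_size = batch_size;
    args->batch_count = batch_count;
    memcpy(args->parent_table, parent_table, strlen(parent_table) + 1);
    return args;
}
```

If every field has a bounded size (for example a NAMEDATALEN-sized name buffer instead of the flexible array member), sizeof() of the struct is enough and the block could just as well live in static shared memory.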

You will find how to use (static) shared memory in the following
section of the documentation; pg_stat_statements.c is also a good
reference. This kind of shared memory is guaranteed to be mapped
at the same address in every process, so pointers into it can be
used directly.

https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp83376336
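For the fixed-size case, the integer-index trick already used for the database name generalizes: the integer can index an array of argument structs instead of an array of names. A single-process sketch follows; BatchSlot and the *_batch_slot functions are hypothetical, and a plain static array stands in for memory that a real extension would obtain with ShmemInitStruct() from its shmem_startup_hook, as pg_stat_statements.c does. Real code would also hold an LWLock while claiming or releasing slots, since several backends may do so concurrently; that locking is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_BATCH_SLOTS 8           /* hypothetical fixed capacity */
#define TABLE_NAME_LEN  64          /* fixed-size name buffer */

typedef struct BatchSlot
{
    bool    in_use;
    int     batch_size;
    char    parent_table[TABLE_NAME_LEN];
} BatchSlot;

/*
 * Stand-in for an array carved out of static shared memory.  Since that
 * memory is mapped at the same address in every backend, an index (or
 * even a pointer) into it stays valid everywhere.
 */
static BatchSlot batch_slots[MAX_BATCH_SLOTS];

/* Launcher side: claim a free slot and fill it; returns its index or -1. */
int
claim_batch_slot(int batch_size, const char *parent_table)
{
    for (int i = 0; i < MAX_BATCH_SLOTS; i++)
    {
        if (!batch_slots[i].in_use)
        {
            batch_slots[i].in_use = true;
            batch_slots[i].batch_size = batch_size;
            /* snprintf always NUL-terminates, truncating long names */
            snprintf(batch_slots[i].parent_table, TABLE_NAME_LEN, "%s",
                     parent_table);
            return i;
        }
    }
    return -1;                      /* all slots busy */
}

/* Worker side: look up its arguments by the index it was handed. */
const BatchSlot *
get_batch_slot(int index)
{
    assert(index >= 0 && index < MAX_BATCH_SLOTS);
    return &batch_slots[index];
}

/* Either side: free the slot once the batch is finished. */
void
release_batch_slot(int index)
{
    assert(index >= 0 && index < MAX_BATCH_SLOTS);
    batch_slots[index].in_use = false;
}
```

The launcher would pass the returned index through bgw_main_arg (e.g. Int32GetDatum(index)), and the worker would recover it with DatumGetInt32(main_arg).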


On the other hand, AFAICS, DSM doesn't seem well documented. I
managed to find a related page on the PostgreSQL Wiki, but it
seems a bit old.

https://wiki.postgresql.org/wiki/Parallel_Internal_Sort

DSM is a little more complex than static shared memory, and it is
*not* guaranteed to be mapped at the same address in every worker.
You will find an example in LaunchParallelWorkers() and the
related functions in parallel.c. The basic usage is as follows.

- Create a segment:
   dsm_segment *seg = dsm_create(size, 0);
- Send its handle via bgw_main_arg:
   worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
- Attach the segment on the other side:
   dsm_segment *seg = dsm_attach(DatumGetUInt32(main_arg));

On both sides, the address of the attached shared memory is
obtained using dsm_segment_address(seg).
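Because the segment can sit at a different address in each process, anything stored inside it that refers to other data in the segment should be an offset from the segment's start rather than a raw pointer. A plain-C sketch, assuming a hypothetical SegmentHeader layout: base plays the role of the value returned by dsm_segment_address(), and copying the bytes into a second buffer stands in for another process mapping the same segment elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Header at the start of a shared segment.  It records where the string
 * lives as an offset from the segment start, never as a raw pointer,
 * because each process may see the segment at a different base address.
 */
typedef struct SegmentHeader
{
    size_t  table_name_off;     /* offset of a NUL-terminated string */
} SegmentHeader;

/* Writer: place the string right after the header and record its offset.
 * Assumes the segment is large enough for header + name. */
void
segment_store_name(char *base, const char *name)
{
    SegmentHeader *hdr = (SegmentHeader *) base;

    hdr->table_name_off = sizeof(SegmentHeader);
    strcpy(base + hdr->table_name_off, name);
}

/* Reader: resolve the offset against this process's own base address. */
const char *
segment_get_name(const char *base)
{
    const SegmentHeader *hdr = (const SegmentHeader *) base;

    return base + hdr->table_name_off;
}
```

Had the header stored a char * instead, the copy at the second address would still point into the first buffer, which in a real second process would be an unmapped or unrelated address.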

dsm_detach(seg) detaches the segment. Once all users of the
segment have detached it, it is destroyed.
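The whole lifecycle (create, pass the handle, attach, detach, destroy on last detach) can be modelled in a few lines of single-process C. The toy_dsm_* names are hypothetical: they mimic the roles of dsm_create(), dsm_segment_handle(), dsm_segment_address(), dsm_attach() and dsm_detach(), not their implementation, and uintptr_t stands in for Datum.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_SEGMENTS 4

typedef uint32_t toy_dsm_handle;    /* like dsm_handle: a 32-bit value */
typedef uintptr_t ToyDatum;         /* like Datum: a pointer-sized integer */

typedef struct ToySegment
{
    void   *mem;                    /* NULL when the slot is unused */
    int     refcnt;                 /* number of attached users */
} ToySegment;

static ToySegment toy_segments[TOY_MAX_SEGMENTS];

/* Creator: allocate a zeroed segment; the creator counts as attached. */
toy_dsm_handle
toy_dsm_create(size_t size)
{
    for (toy_dsm_handle h = 0; h < TOY_MAX_SEGMENTS; h++)
    {
        if (toy_segments[h].mem == NULL)
        {
            toy_segments[h].mem = calloc(1, size);
            if (toy_segments[h].mem == NULL)
                abort();
            toy_segments[h].refcnt = 1;
            return h;
        }
    }
    abort();                        /* no free slot; fine for a toy */
}

/* Address of an attached segment (cf. dsm_segment_address()). */
void *
toy_dsm_address(toy_dsm_handle h)
{
    assert(h < TOY_MAX_SEGMENTS && toy_segments[h].refcnt > 0);
    return toy_segments[h].mem;
}

/* The handle travels to the worker as the single by-value argument. */
ToyDatum toy_handle_to_datum(toy_dsm_handle h) { return (ToyDatum) h; }
toy_dsm_handle toy_datum_to_handle(ToyDatum d) { return (toy_dsm_handle) d; }

/* Worker: attach by handle and return the segment's address. */
void *
toy_dsm_attach(toy_dsm_handle h)
{
    assert(h < TOY_MAX_SEGMENTS && toy_segments[h].mem != NULL);
    toy_segments[h].refcnt++;
    return toy_segments[h].mem;
}

/* Any user: detach; whoever detaches last destroys the segment. */
void
toy_dsm_detach(toy_dsm_handle h)
{
    assert(h < TOY_MAX_SEGMENTS && toy_segments[h].refcnt > 0);
    if (--toy_segments[h].refcnt == 0)
    {
        free(toy_segments[h].mem);
        toy_segments[h].mem = NULL;
    }
}

/* True while the segment still exists. */
int
toy_dsm_exists(toy_dsm_handle h)
{
    return h < TOY_MAX_SEGMENTS && toy_segments[h].mem != NULL;
}
```

The reference count is also why the launching backend should stay attached until the worker has attached: if it detaches first, the segment can be destroyed before dsm_attach() runs in the worker.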

You might also need some locking or notification mechanism.
LWLocks and latches are the usual tools for that purpose.
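A minimal sketch of the publish-then-check handshake, with C11 atomics swapped in for PostgreSQL's LWLock and latch (those need a running server, so they cannot be shown standalone). BatchStatus and its functions are hypothetical, and the sketch assumes atomic_int is lock-free so it would also work in memory shared between processes.

```c
#include <assert.h>
#include <stdatomic.h>

enum { BATCH_PENDING = 0, BATCH_DONE = 1 };

/*
 * Status area that would live inside the shared segment.  The worker
 * writes rows_moved first and then publishes it with a release store, so
 * a reader that observes BATCH_DONE with an acquire load also sees
 * rows_moved.
 */
typedef struct BatchStatus
{
    long        rows_moved;     /* written by the worker before publishing */
    atomic_int  state;          /* BATCH_PENDING or BATCH_DONE */
} BatchStatus;

void
batch_status_init(BatchStatus *st)
{
    st->rows_moved = 0;
    atomic_init(&st->state, BATCH_PENDING);
}

/* Worker side: record the result, then publish it. */
void
batch_status_finish(BatchStatus *st, long rows_moved)
{
    st->rows_moved = rows_moved;
    atomic_store_explicit(&st->state, BATCH_DONE, memory_order_release);
}

/*
 * Launcher side: non-blocking check.  Returns 1 and stores the result if
 * the batch is done, else 0.  A real launcher would sleep on its latch
 * between checks instead of spinning, and the worker would SetLatch()
 * the launcher after publishing.
 */
int
batch_status_poll(BatchStatus *st, long *rows_moved)
{
    if (atomic_load_explicit(&st->state, memory_order_acquire) != BATCH_DONE)
        return 0;
    *rows_moved = st->rows_moved;
    return 1;
}
```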


> Let alone in the context of how that would interact with the
> background worker system. If you look at my existing C code, you can see
> it's very simple and doesn't do much more than the worker_spi example. I've
> yet to have to interact with any memory contexts or such things, and as the
> referenced thread below mentions, doing so is quite a steep learning curve.
> 
> Any guidance for a newer internals dev here would be great.
> 
> 1.
> https://www.postgresql.org/message-id/CAH%3Dt1kqwCBF7J1bP0RjgsTcp-SaJaHrF4Yhb1iiQZMe3W-FX2w%40mail.gmail.com
> 
> --
> Keith Fiske
> Database Administrator
> OmniTI Computer Consulting, Inc.
> http://www.keithf4.com

Good luck!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Passing values to a dynamic background worker

From
Amit Langote
Date:
On 2017/04/18 18:12, Kyotaro HORIGUCHI wrote:
> At Mon, 17 Apr 2017 16:19:13 -0400, Keith Fiske wrote:
>> So after reading a recent thread on the steep learning curve for PG
>> internals [1], I figured I'd share where I've gotten stuck with this in a
>> new thread vs hijacking that one.
>>
>> One of the goals I had with pg_partman was to see if I could get the
>> partitioning python scripts redone as C functions using a dynamic
>> background worker to be able to commit in batches with a single call. My
>> thinking was to have a user-function that can accept arguments for things
>> like the interval value, batch size, and other arguments to the python
>> script, then start/stop a dynamic bgw up for each batch so it can commit
>> after each one. The dynamic bgw would essentially just have to call the
>> already existing partition_data() plpgsql function, but I have to be able
>> to pass the argument values that the user gave down into the dynamic bgw.
>>
>> I've reached a roadblock in that bgw_main_arg can only accept a single
>> argument that must be passed by value for a dynamic bgw. I already worked
>> around this for passing the database name to my existing use of a bgw
>> with doing partition maintenance (pass a simple integer to use as an index
>> array value). But I'm not sure how to do this for passing multiple values
>> in. I'm assuming this would be the place where I'd see about storing values
>> in shared memory to be able to re-use later? I'm not even sure if that's
>> the right approach, and if it is, where to even start to understand how to
>> do that.
> 
> On the other hand, AFAICS, DSM doesn't seem well documented. I
> managed to find a related page on the PostgreSQL Wiki, but it
> seems a bit old.
> 
> https://wiki.postgresql.org/wiki/Parallel_Internal_Sort
> 
> DSM is a little more complex than static shared memory, and it is
> *not* guaranteed to be mapped at the same address in every worker.
> You will find an example in LaunchParallelWorkers() and the
> related functions in parallel.c. The basic usage is as follows.
> 
> - Create a segment:
>    dsm_segment *seg = dsm_create(size, 0);
> - Send its handle via bgw_main_arg:
>    worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
> - Attach the segment on the other side:
>    dsm_segment *seg = dsm_attach(DatumGetUInt32(main_arg));
> 
> On both sides, the address of the attached shared memory is
> obtained using dsm_segment_address(seg).
> 
> dsm_detach(seg) detaches the segment. Once all users of the
> segment have detached it, it is destroyed.

Perhaps the more modern DSA (dynamic shared memory area) mechanism could be applicable here, too.

Some recent commits demonstrate examples of DSA usage, such as BRIN
autosummarization commit (7526e10224f) and tidbitmap.c's shared iteration
support commit (98e6e89040a05).
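DSA builds an allocator on top of DSM: an allocation is identified by a dsa_pointer, a relative value that each process turns into a local address with dsa_get_address(). The toy below only illustrates that idea with a bump allocator over a single region; ToyArea and the toy_area_* functions are hypothetical, and real DSA can grow across several segments and supports dsa_free(), which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t toy_dsa_pointer;
#define TOY_INVALID_DSA_POINTER ((toy_dsa_pointer) 0)

/* Bookkeeping stored at the start of the region itself. */
typedef struct ToyArea
{
    size_t  capacity;           /* bytes available after the header */
    size_t  used;               /* bytes handed out so far */
} ToyArea;

static size_t
align_up(size_t n)
{
    size_t al = _Alignof(max_align_t);

    return (n + al - 1) & ~(al - 1);
}

/* Initialize an area in a suitably aligned region of total_size bytes. */
void
toy_area_init(void *region, size_t total_size)
{
    ToyArea *area = region;

    area->capacity = total_size - align_up(sizeof(ToyArea));
    area->used = 0;
}

/*
 * Allocate size bytes and return an offset from the region start, or
 * TOY_INVALID_DSA_POINTER if the area is full.  Offsets start past the
 * header, so 0 never names a real allocation.
 */
toy_dsa_pointer
toy_area_alloc(void *region, size_t size)
{
    ToyArea *area = region;
    size_t   need = align_up(size);

    if (need > area->capacity - area->used)
        return TOY_INVALID_DSA_POINTER;
    toy_dsa_pointer dp = align_up(sizeof(ToyArea)) + area->used;
    area->used += need;
    return dp;
}

/* Convert a relative pointer into an address in this process's mapping. */
void *
toy_area_get_address(void *region, toy_dsa_pointer dp)
{
    return dp == TOY_INVALID_DSA_POINTER ? NULL : (char *) region + dp;
}
```

Because only offsets are handed around, the same toy_dsa_pointer resolves correctly against any copy or mapping of the region, which is the property that makes DSA allocations shareable between the launcher and the worker.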

Thanks,
Amit




Re: [HACKERS] Passing values to a dynamic background worker

From
Keith Fiske
Date:


On Tue, Apr 18, 2017 at 5:40 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Perhaps the more modern DSA mechanism could be applicable here, too.
>
> Some recent commits demonstrate examples of DSA usage, such as BRIN
> autosummarization commit (7526e10224f) and tidbitmap.c's shared iteration
> support commit (98e6e89040a05).


Thank you both very much for the suggestions!

Keith

Re: [HACKERS] Passing values to a dynamic background worker

From
Peter Eisentraut
Date:
On 4/17/17 16:19, Keith Fiske wrote:
> I've reached a roadblock in that bgw_main_arg can only accept a single
> argument that must be passed by value for a dynamic bgw. I already
> worked around this for passing the database name to my existing use
> of a bgw with doing partition maintenance (pass a simple integer to use
> as an index array value). But I'm not sure how to do this for passing
> multiple values in.

You can also store this kind of information in a table.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Passing values to a dynamic background worker

From
Keith Fiske
Date:

On Tue, Apr 18, 2017 at 12:34 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> You can also store this kind of information in a table.

True, but that seemed like the easy way out. :) I'm trying to find ways to learn internals better through projects I'm actively working on.

Keith