Re: POC: Extension for adding distributed tracing - pg_tracing - Mailing list pgsql-hackers
From | Anthonin Bonnefoy |
---|---|
Subject | Re: POC: Extension for adding distributed tracing - pg_tracing |
Date | |
Msg-id | CAO6_XqrTwFTakufrMgG5Ha9gz1ZHUO4WwqVTOUOez2cS0XKUag@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Extension for adding distributed tracing - pg_tracing (Aleksander Alekseev <aleksander@timescale.com>) |
Responses |
Re: POC: Extension for adding distributed tracing - pg_tracing
Re: POC: Extension for adding distributed tracing - pg_tracing |
List | pgsql-hackers |
Hi, Thanks for the review! > ``` > +-- Worker can take some additional time to end and report their spans > +SELECT pg_sleep(0.2); > + pg_sleep > +---------- > + > +(1 row) > ``` > > Pretty sure this will fail on buildfarm from time to time. Maybe this > is a case when it is worth using TAP tests? Especially considering the > support of pg_tracing.notify_channel. The sleep was unnecessary as the leader waits for all workers to finish and report. All workers should have added their spans in the shared buffer before the leader and I removed the sleep from the test. I haven't seen a recent failure on this test since then. I did add a TAP test for the notify part since it seems to be the only way to test it. > ``` > + Only superusers can change this setting. > + This parameter can only be set at server start. > ``` > > I find this confusing. If the parameter can only be set at server > start doesn't it mean that whoever has access to postgresql.conf can > change it? The 'only superusers...' part was incorrect. This setting was needed to compute the size of the shmem request so modifying it would require a server start. Though for this parameter, I've realised that I could use the global's max_parallel_workers instead, removing the need to have a max_traced_parallel_workers parameter. > Perhaps TABLE command should be added to the list, since it's > basically a syntax sugar for SELECT: Added. > Obviously pg_tracing.drop_on_full_buffer = on will not work properly > with pg_tracing.notify_threshold because there is a race condition > possible. By the time one receives and processes the notification the > buffer can be already cleaned. That's definitely a possibility. Lowering the notification threshold can lower the chance of the buffer being cleared when handling the notification but the race can still happen. For my use cases, spans should be sent as soon as possible and old spans are less valuable than new spans but other usage would want the opposite, thus the parameters to control the behavior. > This makes me think that 1) pg_tracing.drop_on_full_buffer = on could > be not the best default value I don't have a strong opinion on the default behaviour, I will switch to keep buffer on full. > 2) As a user I would like a support of ring buffer mode. Even if > implementing this mode is out of scope of this particular patch, I > would argue that instead of drop_on_full_buffer boolean parameter we > need something like: > > pg_tracing.buffer_mode = keep_on_full | drop_on_full | ring_buffer | > perhaps_other_future_modes Having a buffer ring was also my first approach but it adds some complexity and I've switched to the full drop + truncate query file for the first iteration. Ring buffer would definitely be a better experience, I've switched the parameter to an enum with only keep_on_full and drop_on_full for the time being. > Do you think it possible to document how much shared memory does the > extension consume depending on the parameters? Alternatively maybe > it's worth providing a SQL function that returns this value? As a user > I would like to know whether 10 MB, 500 MB or 1.5 GB of shared memory > are consumed. Good point. I've updated the doc to give some approximations with a query to detail the memory usage. > I would argue that ns ( 1 / 1_000_000_000 sec! ) precision is > redundant and only complicates the interface. This is way below the > noise floor formed from network delays, process switching by the OS, > noisy neighbours under the same supervisor, time slowdown by ntpd, > etc. I don't recall ever needing spans with better than 1 us > precision. > > On top of that if you have `span_start_ns` you should probably rename > `duration` to `duration_ns` for consistency. I believe it would be > more convenient to have `duration` in microseconds. Nothing prevents > us from using fractions of microseconds (e.g. 12.345 us) if necessary. I was definitely on the fence on this. I've initially tried to match the precision provided by query instrumentation but that definitely introduced a lot of complexity. Also, while trace standard specify that start and end should be in nanoseconds, it turns out that the nanosecond precision is generally ignored. I've switched to a start+end in TimestampTz, dropping the ns precision. Tests had to be rewritten to have a stable order with spans having that can have the same start/end. I've also removed some spans: ExecutorStart, ExecutorEnd, PostParse. Those spans generally had a sub microsecond duration and didn't bring much value except for debugging pg_tracing itself. To avoid the confusion of having spans with a 0us duration, removing them altogether seemed better. ExecutorFinish also has a sub microsecond duration unless it has a nested query (triggers) so I'm only emitting them in those cases. > Negative span ids, for sure, are possible, but I don't recall seeing > many negative ids in my life. I suggest not to surprise the user > unless this is really necessary. That one is gonna be difficult. Traceid and spanid are 8 bytes so I'm exporting them using bigint. PostgreSQL doesn't support unsigned bigint thus the negative span ids. An alternative would be to use a bytea but bigint are easier to read and interact with. The latest patch contains changes around the nanosecond precision drop, test updates to make them more stable and simplification around span management. Regards, Anthonin On Fri, Nov 17, 2023 at 3:34 PM Aleksander Alekseev <aleksander@timescale.com> wrote: > > Hi, > > Thanks for the updated patch! > > > Some small changes, mostly around making tests less flaky: > > - Removed the top_span and nested_level in the span output, those were > > mostly used for debugging > > - More tests around Parse span in nested queries > > - Remove explicit queryId in the tests > > ``` > +-- Worker can take some additional time to end and report their spans > +SELECT pg_sleep(0.2); > + pg_sleep > +---------- > + > +(1 row) > ``` > > Pretty sure this will fail on buildfarm from time to time. Maybe this > is a case when it is worth using TAP tests? Especially considering the > support of pg_tracing.notify_channel. > > ``` > + Only superusers can change this setting. > + This parameter can only be set at server start. > ``` > > I find this confusing. If the parameter can only be set at server > start doesn't it mean that whoever has access to postgresql.conf can > change it? > > ``` > + <varname>pg_tracing.track_utility</varname> controls whether spans > + should be generated for utility statements. Utility commands are > + all those other than <command>SELECT</command>, > <command>INSERT</command>, > + <command>UPDATE</command>, <command>DELETE</command>, and > <command>MERGE</command>. > + The default value is <literal>on</literal>. > ``` > > Perhaps TABLE command should be added to the list, since it's > basically a syntax sugar for SELECT: > > ``` > =# create table t(x int); > =# insert into t values (1), (2), (3); > =# table t; > x > --- > 1 > 2 > 3 > (3 rows) > ``` > > This part worries me a bit: > > ``` > + <listitem> > + <para> > + <varname>pg_tracing.drop_on_full_buffer</varname> controls whether > + span buffer should be emptied when <varname>pg_tracing.max_span</varname> > + spans is reached. If <literal>off</literal>, new spans are dropped. > + The default value is <literal>on</literal>. > + </para> > + </listitem> > + </varlistentry> > ``` > > Obviously pg_tracing.drop_on_full_buffer = on will not work properly > with pg_tracing.notify_threshold because there is a race condition > possible. By the time one receives and processes the notification the > buffer can be already cleaned. This makes me think that 1) > pg_tracing.drop_on_full_buffer = on could be not the best default > value 2) As a user I would like a support of ring buffer mode. Even if > implementing this mode is out of scope of this particular patch, I > would argue that instead of drop_on_full_buffer boolean parameter we > need something like: > > pg_tracing.buffer_mode = keep_on_full | drop_on_full | ring_buffer | > perhaps_other_future_modes > > ``` > + <para> > + The module requires additional shared memory proportional to > + <varname>pg_tracing.max_span</varname>. Note that this > + memory is consumed whenever the module is loaded, even if > + no spans are generated. > + </para> > ``` > > Do you think it possible to document how much shared memory does the > extension consume depending on the parameters? Alternatively maybe > it's worth providing a SQL function that returns this value? As a user > I would like to know whether 10 MB, 500 MB or 1.5 GB of shared memory > are consumed. > > ``` > + The start time of a span has two parts: <literal>span_start</literal> and > + <literal>span_start_ns</literal>. <literal>span_start</literal> is > a Timestamp > + with time zone providing up a microsecond precision. > <literal>span_start_ns</literal> > + provides the nanosecond part of the span start. > <literal>span_start + span_start_ns</literal> > + can be used to get the start with nanosecond precision. > ``` > > I would argue that ns ( 1 / 1_000_000_000 sec! ) precision is > redundant and only complicates the interface. This is way below the > noise floor formed from network delays, process switching by the OS, > noisy neighbours under the same supervisor, time slowdown by ntpd, > etc. I don't recall ever needing spans with better than 1 us > precision. > > On top of that if you have `span_start_ns` you should probably rename > `duration` to `duration_ns` for consistency. I believe it would be > more convenient to have `duration` in microseconds. Nothing prevents > us from using fractions of microseconds (e.g. 12.345 us) if necessary. > > ``` > +=# select trace_id, parent_id, span_id, span_start, duration, span_type, ... > + trace_id | parent_id | span_id | > ... > +----------------------+----------------------+----------------------+---- > + 479062278978111363 | 479062278978111363 | 479062278978111363 ... > + 479062278978111363 | 479062278978111363 | 479062278978111363 ... > + 479062278978111363 | 479062278978111363 | -7732425595282414585 ... > ``` > > Negative span ids, for sure, are possible, but I don't recall seeing > many negative ids in my life. I suggest not to surprise the user > unless this is really necessary. > > -- > Best regards, > Aleksander Alekseev
Attachment
pgsql-hackers by date: