Thread: TRACE_SORT defined by default

TRACE_SORT defined by default

From

Joe Conway

Date:

24 April 2019, 21:07:13

I just noticed that TRACE_SORT is defined by default (since 2005
apparently). It seems odd since it is the only debugging code enabled by
default.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

signature.asc

Re: TRACE_SORT defined by default

From

Peter Geoghegan

Date:

24 April 2019, 21:10:44

On Wed, Apr 24, 2019 at 2:07 PM Joe Conway <mail@joeconway.com> wrote:
> I just noticed that TRACE_SORT is defined by default (since 2005
> apparently). It seems odd since it is the only debugging code enabled by
> default.

I think that we should get rid of the #ifdef stuff, so that it isn't
possible to disable the trace_sort instrumentation my commenting out
the TRACE_SORT entry in pg_config_manual.h. I recall being opposed on
this point by Robert Haas. Possibly because he just didn't want to
deal with it at the time.

-- 
Peter Geoghegan

Re: TRACE_SORT defined by default

From

Joe Conway

Date:

24 April 2019, 21:15:25

On 4/24/19 5:10 PM, Peter Geoghegan wrote:
> On Wed, Apr 24, 2019 at 2:07 PM Joe Conway <mail@joeconway.com> wrote:
>> I just noticed that TRACE_SORT is defined by default (since 2005
>> apparently). It seems odd since it is the only debugging code enabled by
>> default.
>
> I think that we should get rid of the #ifdef stuff, so that it isn't
> possible to disable the trace_sort instrumentation my commenting out
> the TRACE_SORT entry in pg_config_manual.h. I recall being opposed on
> this point by Robert Haas. Possibly because he just didn't want to
> deal with it at the time.

Has anyone ever (or recently) measured the impact on performance to have
this enabled? Is it that generically useful for debugging of production
instances of Postgres that we really want it always enabled despite the
performance impact?

Maybe the answer to both is "yes", but if so I would agree that we ought
to remove the define and ifdef's and just bake it in.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

signature.asc

Re: TRACE_SORT defined by default

From

Peter Geoghegan

Date:

24 April 2019, 21:23:31

On Wed, Apr 24, 2019 at 2:15 PM Joe Conway <mail@joeconway.com> wrote:
> Has anyone ever (or recently) measured the impact on performance to have
> this enabled? Is it that generically useful for debugging of production
> instances of Postgres that we really want it always enabled despite the
> performance impact?

It is disabled by default, in the sense that the trace_sort GUC
defaults to off. I believe that the overhead of building in the
instrumentation without enabling it is indistinguishable from zero. In
any case the current status quo is that it's built by default. I have
used it in production, though not very often. It's easy to turn it on
and off.

> Maybe the answer to both is "yes", but if so I would agree that we ought
> to remove the define and ifdef's and just bake it in.

We're only talking about removing the option of including the
instrumentation in binaries when Postgres is built. I'm not aware that
anyone is doing that. It nobody was doing that, then nobody could be
affected by removing the #ifdef crud.

I suspect that the reason that this hasn't happened already is because
it leaves trace_sort/TRACE_SORT in the slightly awkward position of no
longer quite meeting the traditional definition of a "developer
option".

-- 
Peter Geoghegan

Re: TRACE_SORT defined by default

From

Alvaro Herrera

Date:

24 April 2019, 21:29:08

On 2019-Apr-24, Peter Geoghegan wrote:

> I suspect that the reason that this hasn't happened already is because
> it leaves trace_sort/TRACE_SORT in the slightly awkward position of no
> longer quite meeting the traditional definition of a "developer
> option".

This is a really strange argument.  You're saying that somebody thought
about it: "Hmm, well, I can remove this preprocessor symbol but then
trace_sort would no longer resemble a developer option.  So I'm going to
leave the symbol alone".  I don't think that's what happened.  It seems
more likely to me that nobody has gone to the trouble of deciding that
the symbol is worth removing, let alone actually doing it.

If the instrumentation is good, and you seem to be saying that it is, I
think we should just remove the symbol and be done with it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: TRACE_SORT defined by default

From

Peter Geoghegan

Date:

24 April 2019, 21:31:36

On Wed, Apr 24, 2019 at 2:29 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> This is a really strange argument.  You're saying that somebody thought
> about it: "Hmm, well, I can remove this preprocessor symbol but then
> trace_sort would no longer resemble a developer option.  So I'm going to
> leave the symbol alone".  I don't think that's what happened.  It seems
> more likely to me that nobody has gone to the trouble of deciding that
> the symbol is worth removing, let alone actually doing it.

It doesn't seem very important now.

> If the instrumentation is good, and you seem to be saying that it is, I
> think we should just remove the symbol and be done with it.

Sounds like a plan. Do you want to take care of it, Joe?

-- 
Peter Geoghegan

Re: TRACE_SORT defined by default

From

Tom Lane

Date:

24 April 2019, 22:04:41

Peter Geoghegan <pg@bowt.ie> writes:
> On Wed, Apr 24, 2019 at 2:15 PM Joe Conway <mail@joeconway.com> wrote:
>> Has anyone ever (or recently) measured the impact on performance to have
>> this enabled? Is it that generically useful for debugging of production
>> instances of Postgres that we really want it always enabled despite the
>> performance impact?

> It is disabled by default, in the sense that the trace_sort GUC
> defaults to off. I believe that the overhead of building in the
> instrumentation without enabling it is indistinguishable from zero.

It would probably be useful to actually prove that rather than just
assuming it.  I do see some code under the symbol that is executed
even when !trace_sort, and in any case Andres keeps complaining that
even always-taken branches are expensive ...

> In
> any case the current status quo is that it's built by default. I have
> used it in production, though not very often. It's easy to turn it on
> and off.

Would any non-wizard really have a use for it?

It seems like we should either make this really a developer option
(and hence not enabled by default) or else move it into some other
category than DEVELOPER_OPTIONS.

            regards, tom lane

Re: TRACE_SORT defined by default

From

Peter Geoghegan

Date:

24 April 2019, 22:40:44

On Wed, Apr 24, 2019 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It is disabled by default, in the sense that the trace_sort GUC
> > defaults to off. I believe that the overhead of building in the
> > instrumentation without enabling it is indistinguishable from zero.
>
> It would probably be useful to actually prove that rather than just
> assuming it.

The number of individual trace_sort LOG messages is proportionate to
the number of runs produced.

> I do see some code under the symbol that is executed
> even when !trace_sort, and in any case Andres keeps complaining that
> even always-taken branches are expensive ...

I think that you're referring to the stuff needed for the D-Trace
probes. It's a pity that there isn't better support for that, since
Linux has a lot for options around static userspace probes these days
(SystemTap is very much out of favor, and never was very popular).
There seems to be a recognition among the Linux people that the
distinction between users and backend experts is blurred. The kernel
support for things like eBPF and BCC is still patchy, but that will
change.

> Would any non-wizard really have a use for it?

That depends on what the cut-off point is for wizard. I recognize that
there is a need to draw the line somewhere. I suspect that a fair
number of people could intuit problems in a real-world scenario using
trace_sort, without having any grounding in the theory, and without
much knowledge of tuplesort.c specifically.

> It seems like we should either make this really a developer option
> (and hence not enabled by default) or else move it into some other
> category than DEVELOPER_OPTIONS.

The information that it makes available is approximately the same as
the information made available by the new
pg_stat_progress_create_index view, but with getrusage() stats.

-- 
Peter Geoghegan

Re: TRACE_SORT defined by default

From

Jeff Janes

Date:

25 April 2019, 15:53:16

On Wed, Apr 24, 2019 at 6:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <pg@bowt.ie> writes:

> In
> any case the current status quo is that it's built by default. I have
> used it in production, though not very often. It's easy to turn it on
> and off.

Would any non-wizard really have a use for it?

I've had people use it to get some insight into the operation and memory usage of Aggregate nodes, since those nodes offer nothing useful via EXPLAIN ANALYZE. It would be a shame to lose that ability on package-installed PostgreSQL unless we fix Aggregate node reporting first.

Cheers,

Jeff

Re: TRACE_SORT defined by default

From

Alvaro Herrera

Date:

25 April 2019, 20:52:03

On 2019-Apr-25, Jeff Janes wrote:

> I've had people use it to get some insight into the operation and memory
> usage of Aggregate nodes, since those nodes offer nothing useful via
> EXPLAIN ANALYZE.  It would be a shame to lose that ability on
> package-installed PostgreSQL unless we fix Aggregate node reporting first.

But the proposal is not to remove the _code_.  The proposal is just to
remove that "#ifdef" lines that would make it conditionally compilable,
*if* the symbol that they test weren't always enabled.  In other words,
turn it from "always compiled, but you can turn it off although nobody
does" into "always compiled".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: TRACE_SORT defined by default

From

Peter Geoghegan

Date:

25 April 2019, 20:54:58

On Thu, Apr 25, 2019 at 1:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> But the proposal is not to remove the _code_.  The proposal is just to
> remove that "#ifdef" lines that would make it conditionally compilable,
> *if* the symbol that they test weren't always enabled.  In other words,
> turn it from "always compiled, but you can turn it off although nobody
> does" into "always compiled".

Tom suggested that we might want to remove the code as an alternative.
We have two almost-opposite proposals here.

As I said, I think that the way that we think about or define
developer options frames this discussion.

-- 
Peter Geoghegan

Re: TRACE_SORT defined by default

From

Tom Lane

Date:

25 April 2019, 20:56:33

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Apr-25, Jeff Janes wrote:
>> I've had people use it to get some insight into the operation and memory
>> usage of Aggregate nodes, since those nodes offer nothing useful via
>> EXPLAIN ANALYZE.  It would be a shame to lose that ability on
>> package-installed PostgreSQL unless we fix Aggregate node reporting first.

> But the proposal is not to remove the _code_.  The proposal is just to
> remove that "#ifdef" lines that would make it conditionally compilable,
> *if* the symbol that they test weren't always enabled.  In other words,
> turn it from "always compiled, but you can turn it off although nobody
> does" into "always compiled".

Well, I was suggesting that we ought to consider the alternative of
making it *not* always compiled, and Jeff was pushing back on that.

            regards, tom lane

Re: TRACE_SORT defined by default

From

Peter Geoghegan

Date:

25 April 2019, 20:56:56

On Thu, Apr 25, 2019 at 1:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, I was suggesting that we ought to consider the alternative of
> making it *not* always compiled, and Jeff was pushing back on that.

Right. Sorry.

-- 
Peter Geoghegan

Re: TRACE_SORT defined by default

From

Tomas Vondra

Date:

25 April 2019, 21:49:09

On Wed, Apr 24, 2019 at 06:04:41PM -0400, Tom Lane wrote:
>Peter Geoghegan <pg@bowt.ie> writes:
>> On Wed, Apr 24, 2019 at 2:15 PM Joe Conway <mail@joeconway.com> wrote:
>>> Has anyone ever (or recently) measured the impact on performance to have
>>> this enabled? Is it that generically useful for debugging of production
>>> instances of Postgres that we really want it always enabled despite the
>>> performance impact?
>
>> It is disabled by default, in the sense that the trace_sort GUC
>> defaults to off. I believe that the overhead of building in the
>> instrumentation without enabling it is indistinguishable from zero.
>
>It would probably be useful to actually prove that rather than just
>assuming it.  I do see some code under the symbol that is executed
>even when !trace_sort, and in any case Andres keeps complaining that
>even always-taken branches are expensive ...
>

Did I hear the magical word "benchmark" over here?

I suppose it'd be useful to have some actual numbers showing what
overhead this actually has, and whether disabling it would make any
difference. I can't run anything right away, but I could get us some
numbers in a couple of days, assuming there is some agreement on which
cases we need to test.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: TRACE_SORT defined by default

From

Craig Ringer

Date:

29 October 2019, 08:54:47

On Thu, 25 Apr 2019 at 06:41, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 24, 2019 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > It is disabled by default, in the sense that the trace_sort GUC
> > > defaults to off. I believe that the overhead of building in the
> > > instrumentation without enabling it is indistinguishable from zero.
> >
> > It would probably be useful to actually prove that rather than just
> > assuming it.
>
> The number of individual trace_sort LOG messages is proportionate to
> the number of runs produced.
>
> > I do see some code under the symbol that is executed
> > even when !trace_sort, and in any case Andres keeps complaining that
> > even always-taken branches are expensive ...
>
> I think that you're referring to the stuff needed for the D-Trace
> probes. It's a pity that there isn't better support for that, since
> Linux has a lot for options around static userspace probes these days
> (SystemTap is very much out of favor, and never was very popular).

Which is a real shame. I got into it last week and I cannot believe
I've wasted time and effort trying to get anything useful out of perf
when it comes to tracing. There's just no comparison.

As usual, the incoming replacement tools are broken, incompatible and
incomplete, especially for userspace. Because really, bah, users, who
cares about users? But yet issues with systemtap are being dismissed
with variants of "that's obsolete, use perf or eBPF-tools". Right,
just like a CnC router can be easily replaced with a chisel.

I expect eBPF-tools will be pretty amazing once it matures. But for
those of us stuck in "working with actual user applications" land, on
RHEL6 and RHEL7 and the like, it won't be doing so in a hurry.

With that said, static probes are useful but frustrating. The
dtrace-alike model SystemTap adopted, and that we use in Pg via
SystemTap's 'dtrace' script, generates quite dtrace-alike static probe
points, complete with the frustrating deficiencies of those probe
points. Most importantly, they don't record the probe argument names
or data types, and if you want to get a string value you need to
handle each string probe argument individually.

Static probes are fantastic as labels and they let you see the program
state in places that are often impractical for debuginfo-based DWARF
probing since they give you a stable, consistent way to see something
other than function args and return values. But they're frustratingly
deficient compared to DWARF-based probes in other ways.

> There seems to be a recognition among the Linux people that the
> distinction between users and backend experts is blurred. The kernel
> support for things like eBPF and BCC is still patchy, but that will
> change.

Just in time for it to be deemed obsolete and replaced with another
tool that only works with the kernel for the first couple of years,
probably. Like perf was.

> > Would any non-wizard really have a use for it?

Extremely strongly yes.

If nothing else you can wrap these tools up into toolsets and scripts
that give people insights into running systems just by running the
script. Non-invasively, without code changes, on an existing running
system.

I wrote a systemtap script script last week that tracks each
transaction in a Pg backend from xid allocation though to
commit/rollback/prepare/commitprepared/rollbackprepared and takes
stats on xid allocations, txn durations, etc. Then figures out if the
committed txn needs logical decoding by any existing slots and tracks
how long each txn takes between commit until a given logical walsender
finishes decoding it and sending it. Collects stats on
inserts/updates/deletes etc in each txn, txn size, etc. Then follows
the txn and observes it being received and applied by a logical
replication client (pglogical). So it can report latencies and
throughputs in every step through the logical replication pipeline.

Right now that script isn't pretty, and it's not something easily
reused. But I could wrap it up in a script that turned on/off parts
based on what a user needed, wrote the stats to csv for
postprocessing, etc.

The underlying tool isn't for end users. But the end result sure can
be. After all, we don't expect users to mess with xact.c and
transam.c, we just expect them to run SQL, but we don't say PostgreSQL
is only for wizards not end users.

For that reason I'm trying to find time to add a large pile more probe
points to PostgreSQL. Informed in part by what I learned writing the
above script. I want probes for WaitEventSetWait, transam activities
(especially xid allocation, commit, rollback), 2pc, walsender
activity, reorderbuffer, walreceiver, slots, global xmin/catalog_xmin
changes, writes, file flushes, buffer access, and lots more. (Pg's
existing probes on transaction start and finish are almost totally
useless as you can't tell if the txn then gets an xid allocated,
whether the commit generates an xlog record or not, etc).

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: TRACE_SORT defined by default

From

Craig Ringer

Date:

29 October 2019, 09:00:21

On Fri, 26 Apr 2019 at 05:49, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Wed, Apr 24, 2019 at 06:04:41PM -0400, Tom Lane wrote:
> >Peter Geoghegan <pg@bowt.ie> writes:
> >> On Wed, Apr 24, 2019 at 2:15 PM Joe Conway <mail@joeconway.com> wrote:
> >>> Has anyone ever (or recently) measured the impact on performance to have
> >>> this enabled? Is it that generically useful for debugging of production
> >>> instances of Postgres that we really want it always enabled despite the
> >>> performance impact?
> >
> >> It is disabled by default, in the sense that the trace_sort GUC
> >> defaults to off. I believe that the overhead of building in the
> >> instrumentation without enabling it is indistinguishable from zero.
> >
> >It would probably be useful to actually prove that rather than just
> >assuming it.  I do see some code under the symbol that is executed
> >even when !trace_sort, and in any case Andres keeps complaining that
> >even always-taken branches are expensive ...
> >
>
> Did I hear the magical word "benchmark" over here?
>
> I suppose it'd be useful to have some actual numbers showing what
> overhead this actually has, and whether disabling it would make any
> difference. I can't run anything right away, but I could get us some
> numbers in a couple of days, assuming there is some agreement on which
> cases we need to test.


If you're worried about overheads of dtrace-style probes, you can
(with systemtap ones like we use) generate a set of semaphores as a
separate .so that you link into the final build. Then you can test for
TRACE_POSTGRESQL_FOO_BAR_ENABLED() and only do any work required to
generate input for the trace call if that returns true. You can
generally unlikely() it since you don't care about the cost of it with
tracing enabled nearly as much.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise