Thread: Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes: > On Fri, May 04, 2007 at 12:38:18PM -0400, Tom Lane wrote: >> Magnus Hagander <magnus@hagander.net> writes: >>> Could we show it in EXPLAIN ANALYZE somehow? I'm thinking it would be good >>> to see at runtime (for example as a hint that if you put in a bit more >>> work_mem it might get used) >> >> I don't see that this is any more interesting than whether the sort >> spilled to disk or not; which is something we don't show in EXPLAIN >> ANALYZE either. trace_sort is the agreed API for examining that. > Now that you mention it, that'd be nice to have as well - the point being > making it available without recompile. Well, trace_sort is available by default, but... >> It's not exactly easy to do, because (a) none of this information >> is exposed outside tuplesort.c, and (b) the tuplesortstate object >> is probably gone by the time EXPLAIN ANALYZE runs, anyway. > Hmm. Ok. Don't know enough about those parts of the code to comment on > that, but I'll certainly take your word for it :-) I take back point (b) --- the state object is released at ExecutorEnd, and EXPLAIN ANALYZE examines the tree before doing that, so if we added some kind of reporting function to tuplesort.c's API it'd be doable easily enough. What do you think the output should look like? The first thought that comes to mind is to add "method=memory" (or disk or top-N) to the "actual" annotation: regression=# explain analyze select * from tenk1 order by fivethous limit 100; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------Limit (cost=840.19..840.44 rows=100 width=244) (actual time=140.511..141.604 rows=100 loops=1) -> Sort (cost=840.19..865.19rows=10000 width=244) (actual time=140.492..140.880 rows=100 loops=1 method=top-N) ^^^^^^^^^^^^ Sort Key: fivethous -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) (actual time=0.074..51.849 rows=10000 loops=1)Totalruntime: 143.089 ms (5 rows) Another possibility, which could be wedged into explain.c slightly more easily, is to append "Method: top-N" or some such to the Sort Key line, but I'm not sure that that would look nice. Comments, ideas? regards, tom lane
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
"Magnus Hagander"
Date:
> >> It's not exactly easy to do, because (a) none of this information > >> is exposed outside tuplesort.c, and (b) the tuplesortstate object > >> is probably gone by the time EXPLAIN ANALYZE runs, anyway. > > > Hmm. Ok. Don't know enough about those parts of the code to comment on > > that, but I'll certainly take your word for it :-) > > I take back point (b) --- the state object is released at ExecutorEnd, > and EXPLAIN ANALYZE examines the tree before doing that, so if we added > some kind of reporting function to tuplesort.c's API it'd be doable > easily enough. > > What do you think the output should look like? The first thought that > comes to mind is to add "method=memory" (or disk or top-N) to the > "actual" annotation: > > regression=# explain analyze select * from tenk1 order by fivethous limit 100; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------ > Limit (cost=840.19..840.44 rows=100 width=244) (actual time=140.511..141.604 rows=100 loops=1) > -> Sort (cost=840.19..865.19 rows=10000 width=244) (actual time=140.492..140.880 rows=100 loops=1 method=top-N) > ^^^^^^^^^^^^ > Sort Key: fivethous > -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) (actual time=0.074..51.849 rows=10000 loops=1) > Total runtime: 143.089 ms > (5 rows) Looks pretty good to me. Easy to find and hard to misunderstand :) /Magnus
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Jim Nasby
Date:
On May 4, 2007, at 7:08 PM, Tom Lane wrote: > What do you think the output should look like? The first thought that > comes to mind is to add "method=memory" (or disk or top-N) to the > "actual" annotation: > > regression=# explain analyze select * from tenk1 order by fivethous > limit 100; > QUERY PLAN > ---------------------------------------------------------------------- > -------------------------------------------------- > Limit (cost=840.19..840.44 rows=100 width=244) (actual > time=140.511..141.604 rows=100 loops=1) > -> Sort (cost=840.19..865.19 rows=10000 width=244) (actual > time=140.492..140.880 rows=100 loops=1 method=top-N) > > ^^^^^^^^^^^^ > Sort Key: fivethous > -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 > width=244) (actual time=0.074..51.849 rows=10000 loops=1) > Total runtime: 143.089 ms > (5 rows) > > Another possibility, which could be wedged into explain.c slightly > more > easily, is to append "Method: top-N" or some such to the Sort Key > line, > but I'm not sure that that would look nice. If the method is disk it would be nice to know how much spilled to disk. That would tell you if it would be worth increasing work_mem, and by how much. On a related note, it would also be *really* nice if we kept stats on how many sorts or hashes had spilled to disk, perhaps along with how much had spilled. Right now the only way to monitor that in a production system is to setup a cron job to watch pgsql_tmp, which is far from elegant. I know there's concern about how much we add to the stats file, but I don't think this needs to be on a per-relation basis; per-database should be fine. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Tom Lane
Date:
Jim Nasby <decibel@decibel.org> writes: > On a related note, it would also be *really* nice if we kept stats on > how many sorts or hashes had spilled to disk, perhaps along with how > much had spilled. Right now the only way to monitor that in a > production system is to setup a cron job to watch pgsql_tmp, which is > far from elegant. No, you can turn on trace_sort and track it from watching the log. If pgfouine hasn't got something for that already, I'd be surprised. regards, tom lane
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Josh Berkus
Date:
> > What do you think the output should look like? The first thought that > > comes to mind is to add "method=memory" (or disk or top-N) to the > > "actual" annotation: Having the "disk" and "memory" would be really useful too. -- Josh Berkus PostgreSQL @ Sun San Francisco
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Tom Lane
Date:
Jim Nasby <decibel@decibel.org> writes: > If the method is disk it would be nice to know how much spilled to > disk. That would tell you if it would be worth increasing work_mem, > and by how much. Well, a more radical proposal is to add a whole 'nother line to the output, which would give us room for several bits of info. Perhaps like this: -> Sort (cost=840.19..865.19 rows=10000 width=244) (actual time=151.769..152.157 rows=100 loops=1) Sort Key: fivethous Sort Method: top-N Memory: 17KB -> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) (actual or Sort Method: disk Memory: 1000KB Disk: 18482KB Not sure what other info might be useful. regards, tom lane
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
"Guillaume Smet"
Date:
On 5/4/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > -> Sort (cost=840.19..865.19 rows=10000 width=244) (actual time=140.492..140.880 rows=100 loops=1 method=top-N) > ^^^^^^^^^^^^ > Sort Key: fivethous > Another possibility, which could be wedged into explain.c slightly more > easily, is to append "Method: top-N" or some such to the Sort Key line, > but I'm not sure that that would look nice. Is it possible to have something like Sort (disk|top-N|memory) instead of Sort? I'm really not sure it's a good idea to break the (actual time=0.074..51.849 rows=10000 loops=1) output we have for every node. It's easier to read the output when it's consistent. If not, append it at the end of the Sort Key line is better IMHO. -- Guillaume
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
"Guillaume Smet"
Date:
On 5/4/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > No, you can turn on trace_sort and track it from watching the log. > If pgfouine hasn't got something for that already, I'd be surprised. Well, it hasn't. I never used trace_sort so i didn't think of implementing something to use it. I'll take a look at it for the next versions to see if I can do something useful. If anyone has suggestions/needs on this point, they are welcome. -- Guillaume
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
"Guillaume Smet"
Date:
On 5/4/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sort Method: disk Memory: 1000KB Disk: 18482KB +1 for this one. -- Guillaume
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes: > Is it possible to have something like Sort (disk|top-N|memory) instead > of Sort? That would be sane if the decision were fixed at plan time, but it isn't. What do you think of the add-a-line approach? regards, tom lane
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Stefan Kaltenbrunner
Date:
Guillaume Smet wrote: > On 5/4/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Sort Method: disk Memory: 1000KB Disk: 18482KB > > +1 for this one. I like that one too ... Stefan
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > Guillaume Smet wrote: >> On 5/4/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Sort Method: disk Memory: 1000KB Disk: 18482KB >> >> +1 for this one. > I like that one too ... OK, in the event it looks like one of these four messages: "Sort Method: top-N heapsort Memory: %ldkB" "Sort Method: quicksort Memory: %ldkB" "Sort Method: external sort Disk:%ldkB" "Sort Method: external merge Disk: %ldkB" where "external merge" implies that the final merge pass was done on-the-fly instead of materializing the fully sorted data on disk. I'm not wedded to these method descriptions if anyone has better phrases in mind. Also, I tried to make the disk cases print disk and memory usage both, but was getting wacko numbers for memory usage. I had forgotten that the disk-sort path doesn't bother to track memory usage accurately once it starts returning tuples to the caller (since at that point all the decisions are made). I'm not really excited about fixing that; it would add per-tuple overhead for not much value. regards, tom lane
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Jim Nasby
Date:
On May 4, 2007, at 7:49 PM, Tom Lane wrote: > Jim Nasby <decibel@decibel.org> writes: >> On a related note, it would also be *really* nice if we kept stats on >> how many sorts or hashes had spilled to disk, perhaps along with how >> much had spilled. Right now the only way to monitor that in a >> production system is to setup a cron job to watch pgsql_tmp, which is >> far from elegant. > > No, you can turn on trace_sort and track it from watching the log. > If pgfouine hasn't got something for that already, I'd be surprised. There's several problems with that. First, trace_sort isn't documented (or at least it's not in postgresql.conf), so most folks don't know it exists. Second, in order to see it's output you have to drop log_min_messages to debug. That results in a huge log volume, especially on a production system. Aside from that, log files are not a good way to monitor performance, they should be used for reporting on exception conditions. If the log was meant to be the means for monitoring performance, then why have the statistics system at all? As for pgfouine, I've never been to a customer that knew what it was. But almost all of them have other monitoring tools such as cricket, MRTG and Nagios setup. Those that don't at least know they exist. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Tom Lane
Date:
Jim Nasby <decibel@decibel.org> writes: > There's several problems with that. First, trace_sort isn't > documented (or at least it's not in postgresql.conf), so most folks > don't know it exists. Second, in order to see it's output you have to > drop log_min_messages to debug. That results in a huge log volume, > especially on a production system. Nonsense ... trace_sort output is at LOG level. regards, tom lane
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Jim Nasby
Date:
On May 6, 2007, at 9:32 AM, Tom Lane wrote: > Jim Nasby <decibel@decibel.org> writes: >> There's several problems with that. First, trace_sort isn't >> documented (or at least it's not in postgresql.conf), so most folks >> don't know it exists. Second, in order to see it's output you have to >> drop log_min_messages to debug. That results in a huge log volume, >> especially on a production system. > > Nonsense ... trace_sort output is at LOG level. I stand corrected. But my point still remains. It wouldn't be unusual for a website database to be running several sorts a second; that means 4 lines per sort, which is still a large amount of data. If we really want to make the logfile the approved method for monitoring performance, then why do we have the stats infrastructure at all? It could all be replaced with logging output and pgfouine. Why are we maintaining two separate sets of code for monitoring database performance? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Alvaro Herrera
Date:
Jim Nasby wrote: > If we really want to make the logfile the approved method for > monitoring performance, then why do we have the stats infrastructure > at all? It could all be replaced with logging output and pgfouine. First we'd have to fix the usability problem of our redirect_stderr stuff for pgfouine (multiline messages from different backends are mixed, according to Guillaume). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Magnus Hagander
Date:
Alvaro Herrera wrote: > Jim Nasby wrote: > >> If we really want to make the logfile the approved method for >> monitoring performance, then why do we have the stats infrastructure >> at all? It could all be replaced with logging output and pgfouine. > > First we'd have to fix the usability problem of our redirect_stderr > stuff for pgfouine (multiline messages from different backends are > mixed, according to Guillaume). I'd like to sign on to the list of people saying that logging isn't the best way to do performance measuring. Providing a way to get at the counters in realtime for monitoring or graphing or whatever is what AFAIK everybody else do, and it's for a good reason - it fits in to existing monitoring/management solutions. It may be that our current stats system isn't the best way to do this, but if it isn't that just means we have to come up with a better way. //Magnus
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Magnus Hagander
Date:
Magnus Hagander wrote: > Alvaro Herrera wrote: >> Jim Nasby wrote: >> >>> If we really want to make the logfile the approved method for >>> monitoring performance, then why do we have the stats infrastructure >>> at all? It could all be replaced with logging output and pgfouine. >> First we'd have to fix the usability problem of our redirect_stderr >> stuff for pgfouine (multiline messages from different backends are >> mixed, according to Guillaume). > > I'd like to sign on to the list of people saying that logging isn't the > best way to do performance measuring. Providing a way to get at the > counters in realtime for monitoring or graphing or whatever is what > AFAIK everybody else do, and it's for a good reason - it fits in to > existing monitoring/management solutions. It may be that our current > stats system isn't the best way to do this, but if it isn't that just > means we have to come up with a better way. Speaking of which, it might be interesting to actually show these values in the stats collector. I was thinking three cols for each database (probably the best level?) that counts each of those three counters. If you have a lot of sorts (percentage-wise) spilling to disk, it is often something you want to investigate, so exposing it that way seems like a good thing. Comments on that? Oh, and is it too late to sneak this into 8.3 claiming it's an extension to the addition of the new sort feature? ;-) //Magnus
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Jim Nasby
Date:
On May 8, 2007, at 2:24 PM, Magnus Hagander wrote: > Speaking of which, it might be interesting to actually show these > values > in the stats collector. I was thinking three cols for each database > (probably the best level?) that counts each of those three > counters. If > you have a lot of sorts (percentage-wise) spilling to disk, it is > often > something you want to investigate, so exposing it that way seems > like a > good thing. What 3 columns? In-memory sorts, on-disk sorts, and on-disk size? (Sum of how much spilled to disk). I agree that per-database makes sense, though I'd settle for per- cluster. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Magnus Hagander
Date:
On Wed, May 09, 2007 at 10:55:12AM -0500, Jim Nasby wrote: > On May 8, 2007, at 2:24 PM, Magnus Hagander wrote: > >Speaking of which, it might be interesting to actually show these > >values > >in the stats collector. I was thinking three cols for each database > >(probably the best level?) that counts each of those three > >counters. If > >you have a lot of sorts (percentage-wise) spilling to disk, it is > >often > >something you want to investigate, so exposing it that way seems > >like a > >good thing. > > What 3 columns? In-memory sorts, on-disk sorts, and on-disk size? > (Sum of how much spilled to disk). I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that would be the new feature..) //Magnus
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Gregory Stark
Date:
"Magnus Hagander" <magnus@hagander.net> writes: >> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size? >> (Sum of how much spilled to disk). > > I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that > would be the new feature..) Tom's code distinguished in-memory, top-N, on-disk with final merge postponed, and on-disk with materialized result. Four categories. But I think the distinction between the two types of in-memory and the two types of on-disk sorts is only really useful when you're looking at an individual query. And even then probably only useful to a Postgres hacker, not a DBA. It seems like it would be more useful to just break it down into in-memory and on-disk but for each give number of sorts, number of tuples, and space used. What would be really handy is breaking this down by table -- probably that would only be possible when the sort is sorting directly a table scan. I don't even know how easy it would be to get that information. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Magnus Hagander
Date:
Gregory Stark wrote: > "Magnus Hagander" <magnus@hagander.net> writes: > >>> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size? >>> (Sum of how much spilled to disk). >> I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that >> would be the new feature..) > > Tom's code distinguished in-memory, top-N, on-disk with final merge postponed, > and on-disk with materialized result. Four categories. But I think the > distinction between the two types of in-memory and the two types of on-disk > sorts is only really useful when you're looking at an individual query. And > even then probably only useful to a Postgres hacker, not a DBA. Missed the two on-disk distinctions, yeah. But you're probably right that on-disk vs in-memory is enough, the interesting thing is to get indications on when you hit disk given what it does for performance. > It seems like it would be more useful to just break it down into in-memory and > on-disk but for each give number of sorts, number of tuples, and space used. > > What would be really handy is breaking this down by table -- probably that > would only be possible when the sort is sorting directly a table scan. I don't > even know how easy it would be to get that information. And how would you deal with the data that's sorting the result of a join or something like that - makes things a lot more complicated ;) And the original question remains, 8.3 or 8.4... //Magnus
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Heikki Linnakangas
Date:
Magnus Hagander wrote: > Gregory Stark wrote: >> "Magnus Hagander" <magnus@hagander.net> writes: >> >>>> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size? >>>> (Sum of how much spilled to disk). >>> I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts (that >>> would be the new feature..) >> Tom's code distinguished in-memory, top-N, on-disk with final merge postponed, >> and on-disk with materialized result. Four categories. But I think the >> distinction between the two types of in-memory and the two types of on-disk >> sorts is only really useful when you're looking at an individual query. And >> even then probably only useful to a Postgres hacker, not a DBA. > > Missed the two on-disk distinctions, yeah. But you're probably right > that on-disk vs in-memory is enough, the interesting thing is to get > indications on when you hit disk given what it does for performance. Keep in mind that when the sort "goes to disk", it actually just means that it used up work_mem and switches to merge sort with tapes. In a typical configuration, there's plenty of RAM available to buffer the tapes, so the terms on-disk and in-memory sorts are misleading. If I've understood earlier discussion correctly, the quicksort -> tape sort point is not even that interesting because the tape sort is actually not that much slower than quicksort, as long as it fits in RAM. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [COMMITTERS] pgsql: Teach tuplesort.c about "top N" sorting, in which only the first
From
Jim Nasby
Date:
On May 9, 2007, at 11:01 AM, Magnus Hagander wrote: > On Wed, May 09, 2007 at 10:55:12AM -0500, Jim Nasby wrote: >> On May 8, 2007, at 2:24 PM, Magnus Hagander wrote: >>> Speaking of which, it might be interesting to actually show these >>> values >>> in the stats collector. I was thinking three cols for each database >>> (probably the best level?) that counts each of those three >>> counters. If >>> you have a lot of sorts (percentage-wise) spilling to disk, it is >>> often >>> something you want to investigate, so exposing it that way seems >>> like a >>> good thing. >> >> What 3 columns? In-memory sorts, on-disk sorts, and on-disk size? >> (Sum of how much spilled to disk). > > I was thinking in-mem sorts, on-disk sorts, limited-by-LIMIT sorts > (that > would be the new feature..) It would also be useful to know how much got spilled to disk. If it's a large amount per sort, then there's probably not much you could do, but if it's just a tad over available memory per-sort... -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)