Thread: Progress bar updates
Has anyone thought about what it would take to get progress bars from clients like pgadmin? (Or dare I even suggest psql:)

My first thought would be a message like CancelQuery which would cause the backend to peek into a static data structure and return a message that the client could parse and display something intelligent. Various commands would then stuff information into this data structure as they worked.

For a first cut this "data structure" could just be a float between 0 and 1. Or perhaps it should be two integers, a "current" and an "estimated final". That would let the client do more intelligent things when the estimates change for the length of the whole job. Later I could imagine elaborating into more complex structures for representing multi-step processes or even whole query plans.

I also see it possibly being interesting to stuff this data structure into shared memory, handled just like Tom handled the "current command". That would let you see the other queries running on the server, how long they've been running, and estimates for how long they'll continue to run.

I would suggest starting with utility functions like index builds or COPY, which would have to be specially handled anyway. Handling all optimizable queries in a single generic implementation seems like something to tackle only once the basic infrastructure is there and working for simple cases.

Of course the estimates would be not much better than guesses. But if you want to say the feature isn't worth having since the estimates won't be perfectly accurate, be prepared to swear that you've never looked at the "% complete" that modern ftp clients and web browsers display, even though they too are, of course, wildly inaccurate. They nonetheless provide the feedback the user desperately wants: reassurance that his job is making progress and isn't years away from finishing.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
+1 Looked, talked it through, it's a very important feature to BI users.

- Luke

Sent from my GoodLink synchronized handheld (www.good.com)

-----Original Message-----
From: Gregory Stark [mailto:gsstark@mit.edu]
Sent: Tuesday, July 18, 2006 02:38 PM Eastern Standard Time
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Progress bar updates
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org On Behalf Of Gregory Stark
> Sent: 18 July 2006 19:36
> Subject: [HACKERS] Progress bar updates
>
> For a first cut this "data structure" could just be a float between 0 and 1.
> Or perhaps it should be two integers, a "current" and an "estimated final".
> That would let the client do more intelligent things when the estimates
> change for the length of the whole job.

Hi Greg,

I would vote for the latter so that we could give more meaningful feedback - for example, when vacuuming you might give a scale of 0 to <num tables>. In cases such as COPY, where you mightn't have any idea of an upper bound, a simple heartbeat could be supplied so that at least the client could count rows (or 100's of rows) processed, or whatever. It would certainly allow us to present a nicer user experience in pgAdmin :-)

Regards, Dave.
Gregory Stark wrote:
> Has anyone thought about what it would take to get progress bars from
> clients like pgadmin? (Or dare I even suggest psql:)

Some weeks ago I proposed a PROGRESS parameter for COPY, to enable progress feedback via notices. tgl thinks nobody needs that...

Regards, Andreas
On Tue, 2006-07-18 at 14:35 -0400, Gregory Stark wrote:
> My first thought would be a message like CancelQuery which would cause the
> backend to peek into a static data structure and return a message that the
> client could parse and display something intelligent.

I'm not quite sure what you're suggesting; presumably you'd need to open another client connection to send the "status report" message to a backend (since a backend will not be polling its input socket during query execution). That just seems like the wrong approach -- stashing a backend's current status into shared memory sounds more promising, IMHO, and won't require changes to the FE/BE protocol.

> I would suggest starting with utility functions like index builds or COPY
> which would have to be specially handled anyways. Handling all optimizable
> queries in a single generic implementation seems like something to tackle
> only once the basic infrastructure is there and working for simple cases.
>
> Of course the estimates would be not much better than guesses.

Estimating query progress for DDL should be reasonably doable, but I think it would require some hard thought to get even somewhat accurate estimates for SELECT queries -- and I'm not sure there's much point doing this if we don't at least have an idea how we might implement reasonably accurate progress reporting for every kind of query.

This paper is worth a read:

Gang Luo, Jeffrey F. Naughton, Curt Ellmann and Michael Watzke: Toward a Progress Indicator for Database Queries. SIGMOD Conference 2004: 791-802.

Interestingly, they apparently implemented a prototype using PostgreSQL.

-Neil
Neil Conway <neilc@samurai.com> writes:
> I'm not quite sure what you're suggesting; presumably you'd need to open
> another client connection to send the "status report" message to a
> backend (since a backend will not be polling its input socket during
> query execution). That just seems like the wrong approach -- stashing a
> backend's current status into shared memory sounds more promising, IMHO,
> and won't require changes to the FE/BE protocol.

Yeah, I was about to make the same comment. The new support for query status in shared memory should make it pretty cheap to update a progress indicator there, and then it'd be trivial to expose the indicator to other backends via pg_stat_activity.

Sending the progress info directly to the connected client implies protocol changes (fairly trivial ones) and client changes (possibly highly nontrivial ones --- think about how you'd get the info out through something like a webserver application with multiple layers of software in the way). In practice, if a query is taking long enough for this feature to be interesting, making another connection and looking to see what's happening is not a problem, and it's likely to be the most practical way anyway for many clients.

regards, tom lane
Andreas,

> Some weeks ago I proposed a PROGRESS parameter for COPY, to enable
> progress feedback via notices. tgl thinks nobody needs that...

Well, *Tom* doesn't need it. What mechanism did you propose to make this work?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Yeah, I was about to make the same comment. The new support for query
> status in shared memory should make it pretty cheap to update a progress
> indicator there, and then it'd be trivial to expose the indicator to
> other backends via pg_stat_activity.

I think that would be a fine feature too. But I don't think that reduces the desire clients have to be able to request updates on the status of their own queries.

> In practice, if a query is taking long enough for this feature to be
> interesting, making another connection and looking to see what's happening
> is not a problem, and it's likely to be the most practical way anyway for
> many clients.

It would be the most practical way for a DBA to monitor an application. But it's not going to be convenient for clients like pgadmin or psql. Even a web server may want to, for example, stream ajax code updating a progress bar until it has results, and then stream the ajax to display the results. Having to get the backend pid before your query and then open a second database connection to monitor your first connection would be extra footwork for nothing.

-- greg
On Wed, 2006-07-19 at 05:18, Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> > Yeah, I was about to make the same comment. The new support for query
> > status in shared memory should make it pretty cheap to update a progress
> > indicator there, and then it'd be trivial to expose the indicator to
> > other backends via pg_stat_activity.
>
> I think that would be a fine feature too. But I don't think that reduces
> the desire clients have to be able to request updates on the status of
> their own queries.

Another \x command could be added to psql to do just that.

> > In practice, if a query is taking long enough for this feature to be
> > interesting, making another connection and looking to see what's
> > happening is not a problem, and it's likely to be the most practical
> > way anyway for many clients.
>
> It would be the most practical way for a DBA to monitor an application.
> But it's not going to be convenient for clients like pgadmin or psql.
> Even a web server may want to, for example, stream ajax code updating a
> progress bar until it has results and then stream the ajax to display
> the results. Having to get the backend pid before your query and then
> open a second database connection to monitor your first connection would
> be extra footwork for nothing.

You would have to do some extra work anyway. Opening another connection is not such a big deal.
-- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org On Behalf Of Greg Stark
> Sent: 19 July 2006 10:19
> Subject: Re: [HACKERS] Progress bar updates
>
> It would be the most practical way for a DBA to monitor an application.
> But it's not going to be convenient for clients like pgadmin or psql.
> Even a web server may want to, for example, stream ajax code updating a
> progress bar until it has results and then stream the ajax to display
> the results. Having to get the backend pid before your query and then
> open a second database connection to monitor your first connection would
> be extra footwork for nothing.

Not to mention that we already get occasional complaints about the number of connections pgAdmin can open (even though it's only one per database for the main app, plus one per query tool or data editor window).

Regards, Dave.
Josh Berkus wrote:
> Andreas,
>
>> Some weeks ago I proposed a PROGRESS parameter for COPY, to enable
>> progress feedback via notices. tgl thinks nobody needs that...
>
> Well, *Tom* doesn't need it. What mechanism did you propose to make this
> work?

I extended the parser to accept that keyword, and emitted a notice whenever n lines had been copied. I found that convenient when transferring a large amount of data, to estimate the total runtime. The patch was submitted a while ago to -hackers, together with a compression feature, and was torn down in a way not likely to inspire me to continue.

Regards, Andreas
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> In practice, if a query is taking long enough for this feature to be
>> interesting, making another connection and looking to see what's
>> happening is not a problem, and it's likely to be the most practical
>> way anyway for many clients.
>
> It would be the most practical way for a DBA to monitor an application.
> But it's not going to be convenient for clients like pgadmin or psql.

[ shrug... ] Let me explain it to you this way: a progress counter visible through pg_stat_activity is something that might possibly get done in time for 8.2. If you insist on having the other stuff right off the bat as well, it won't get done this cycle.

regards, tom lane
On Wednesday 19 July 2006 07:33, Tom Lane wrote:
> [ shrug... ] Let me explain it to you this way: a progress counter
> visible through pg_stat_activity is something that might possibly get
> done in time for 8.2. If you insist on having the other stuff right
> off the bat as well, it won't get done this cycle.

Having the progress, or the estimated time of completion, in pg_stat_activity sounds like a good starting point; the rest of the desired features can be bolted on top of this down the road.

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
http://www.wavefire.com
ph: 250.717.0200 fx: 250.763.1759
Neil Conway wrote:
> Estimating query progress for DDL should be reasonably doable, but I
> think it would require some hard thought to get even somewhat accurate
> estimates for SELECT queries -- and I'm not sure there's much point
> doing this if we don't at least have an idea how we might implement
> reasonably accurate progress reporting for every kind of query.

We already have EXPLAIN ANALYZE. Perhaps the right way to do this is something that provides similar output. I could see something that looks like EXPLAIN for the parts that have not yet executed, something reasonable to show progress of the currently active part of the plan (current time, rows, loops), and EXPLAIN ANALYZE output for the parts which have been completed. I can see how this might lead to dynamically re-planning queries.

Going backwards, perhaps there's something related to progress monitoring that could be taken from the TelegraphCQ work?

Drew
> It would be the most practical way for a DBA to monitor an application.
> But it's not going to be convenient for clients like pgadmin or psql.
> Having to get the backend pid before your query and then open a second
> database connection to monitor your first connection would be extra
> footwork for nothing.

But that said, it CAN be coded and work just fine, no?
Why make it so complicated? There could be a GUC to indicate that the client is interested in progress updates. For the execution phase, elog(INFO, ...) could be emitted for each major plan node. (The client would probably run the explain plan beforehand, or it would be embedded in the elog.) During the downloading of the rows, the client would display the bar relative to the number of estimated rows returned.

-M

On Jul 18, 2006, at 2:35 PM, Gregory Stark wrote:
> Has anyone thought about what it would take to get progress bars from
> clients like pgadmin? (Or dare I even suggest psql:)
>
> My first thought would be a message like CancelQuery which would cause
> the backend to peek into a static data structure and return a message
> that the client could parse and display something intelligent. Various
> commands would then stuff information into this data structure as they
> worked.

AgentM
agentm@themactionfaction.com
> We already have EXPLAIN ANALYZE. Perhaps the right way to do this is
> something that provides similar output. I could see something that
> looks like EXPLAIN for the parts that have not yet executed, something
> reasonable to show progress of the currently active part of the plan
> (current time, rows, loops), and EXPLAIN ANALYZE output for the parts
> which have been completed.

Now this is something that would really help testing a system, by dynamically seeing the plans of queries which run too long. That, combined with the ability to see the values of bind parameters, would be a useful debug aid.

Cheers, Csaba.
This is how Netezza and Teradata do it, and it works very well. In each of their cases you can see a graphical representation of the plan, with progress for each stage. For the command line it would be great to just dump the current status, which would provide a snapshot of the EXPLAIN ANALYZE.

- Luke

Sent from my GoodLink synchronized handheld (www.good.com)

-----Original Message-----
From: Csaba Nagy [mailto:nagy@ecircle-ag.com]
Sent: Thursday, July 20, 2006 04:52 AM Eastern Standard Time
To: Andrew Hammond
Cc: postgres hackers
Subject: Re: [HACKERS] Progress bar updates