Thread: Parallel query execution
I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
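The first pg_upgrade path Bruce describes, forking to run external binaries and reaping them in batches, can be sketched roughly as follows. This is only an illustration in Python (pg_upgrade itself is C); the function name and the simple batch-drain strategy are invented for the example:

```python
import subprocess
import sys

def run_in_parallel(commands, max_workers=4):
    """Launch external commands (think per-database pg_dump/psql jobs)
    up to max_workers at a time, then wait for each batch to drain.
    Returns how many commands exited non-zero."""
    failures = 0
    running = []
    for cmd in commands:
        running.append(subprocess.Popen(cmd))
        if len(running) == max_workers:
            failures += sum(p.wait() != 0 for p in running)
            running = []
    failures += sum(p.wait() != 0 for p in running)
    return failures

if __name__ == "__main__":
    # Three trivial commands standing in for per-database dump jobs.
    print(run_in_parallel([[sys.executable, "-c", "pass"]] * 3))  # prints 0
```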
* Bruce Momjian (bruce@momjian.us) wrote: > Parallelism isn't going to help all queries, in fact it might be just a > small subset, but it will be the larger queries. The pg_upgrade > parallelism only helps clusters with multiple databases or tablespaces, > but the improvements are significant. This would be fantastic and I'd like to help. Parallel query and real partitioning are two of our biggest holes for OLAP and data warehouse users. > Please consider updating the page yourself or posting your ideas to this > thread. Thanks. Will do. Thanks, Stephen
On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote: > I believe it is time to start adding parallel execution to the backend. > We already have some parallelism in the backend: > effective_io_concurrency and helper processes. I think it is time we > start to consider additional options. A few months back, I remarked [1] that speeding up sorting using pipelining and asynchronous I/O was probably parallelism low-hanging fruit. That hasn't changed, though I personally still don't have the bandwidth to look into it in a serious way. [1] http://www.postgresql.org/message-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Jan 15, 2013 at 10:39:10PM +0000, Peter Geoghegan wrote: > On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote: > > I believe it is time to start adding parallel execution to the backend. > > We already have some parallelism in the backend: > > effective_io_concurrency and helper processes. I think it is time we > > start to consider additional options. > > A few months back, I remarked [1] that speeding up sorting using > pipelining and asynchronous I/O was probably parallelism low-hanging > fruit. That hasn't changed, though I personally still don't have the > bandwidth to look into it in a serious way. > > [1] http://www.postgresql.org/message-id/CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com OK, I added the link to the wiki. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jan 15, 2013 at 10:53:29PM +0000, Simon Riggs wrote: > On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote: > > > I mentioned last year that I wanted to start working on parallelism: > > We don't normally begin discussing topics for next release just as a > CF is starting. > > Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 15 January 2013 22:14, Bruce Momjian <bruce@momjian.us> wrote: > I mentioned last year that I wanted to start working on parallelism: We don't normally begin discussing topics for next release just as a CF is starting. Why is this being discussed now? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote: >> Why is this being discussed now? > > It is for 9.4 and will take months. I didn't think there was a better > time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open-ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 16/01/13 11:14, Bruce Momjian wrote:
> I mentioned last year that I wanted to start working on parallelism:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Years ago I added thread-safety to libpq. Recently I added two parallel
> execution paths to pg_upgrade. The first parallel path allows execution
> of external binaries pg_dump and psql (to restore). The second parallel
> path does copy/link by calling fork/thread-safe C functions. I was able
> to do each in 2-3 days.
>
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes. I think it is time we
> start to consider additional options.
>
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries. The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.
>
> I have summarized my ideas by updating our Parallel Query Execution wiki
> page:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Please consider updating the page yourself or posting your ideas to this
> thread. Thanks.

Hmm...

How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels?

On large multiple-processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used?

Once a selection of rows has been made, then if there is a lot of reformatting going on, could this be done in parallel? I can think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality there are more subtleties, and aspects of both strategies will be used in a hybrid fashion along with other approaches.

I expect that before any parallel algorithm is invoked, some sort of threshold needs to be exceeded to make it worthwhile. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables each over 10 million rows using joins may benefit from more extreme parallelism.

I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing.

Cheers,
Gavin
On Tue, Jan 15, 2013 at 11:01:04PM +0000, Simon Riggs wrote: > On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote: > > >> Why is this being discussed now? > > > > It is for 9.4 and will take months. I didn't think there was a better > > time. We don't usually discuss features during beta testing. > > Bruce, there are many, many patches on the queue. How will we ever get > to beta testing if we begin open-ended discussions on the next release? > > If we can't finish what we've started for 9.3, why talk about 9.4? > > Yes, it's a great topic for discussion, but there are better times. Like when? I don't remember a policy against discussing things now. Does anyone else remember this? Are you saying feature discussion is only allowed between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote: > On 16/01/13 11:14, Bruce Momjian wrote: > > I mentioned last year that I wanted to start working on parallelism: > > https://wiki.postgresql.org/wiki/Parallel_Query_Execution > > Years ago I added thread-safety to libpq. Recently I added two parallel > execution paths to pg_upgrade. The first parallel path allows execution > of external binaries pg_dump and psql (to restore). The second parallel > path does copy/link by calling fork/thread-safe C functions. I was able > to do each in 2-3 days. > > I believe it is time to start adding parallel execution to the backend. > We already have some parallelism in the backend: > effective_io_concurrency and helper processes. I think it is time we > start to consider additional options. > > Parallelism isn't going to help all queries, in fact it might be just a > small subset, but it will be the larger queries. The pg_upgrade > parallelism only helps clusters with multiple databases or tablespaces, > but the improvements are significant. > > I have summarized my ideas by updating our Parallel Query Execution wiki > page: > > https://wiki.postgresql.org/wiki/Parallel_Query_Execution > > Please consider updating the page yourself or posting your ideas to this > thread. Thanks. > > > Hmm... > > How about being aware of multiple spindles - so if the requested data covers > multiple spindles, then data could be extracted in parallel. This may, or may > not, involve multiple I/O channels? Well, we usually label these as tablespaces. I don't know if spindle-level is a reasonable level to add. > On large multiple processor machines, there are different blocks of memory that > might be accessed at different speeds depending on the processor. Possibly a > mechanism could be used to split a transaction over multiple processors to > ensure the fastest memory is used? That seems too far-out for an initial approach. 
> Once a selection of rows has been made, then if there is a lot of reformatting > going on, could this be done in parallel? I can think of 2 very > simplistic strategies: (A) use a different processor core for each column, or > (B) farm out sets of rows to different cores. I am sure in reality, there are > more subtleties and aspects of both the strategies will be used in a hybrid > fashion along with other approaches. Probably strategy (B), but that is going to require making some of the modules thread/fork-safe, and that is going to be tricky. > I expect that before any parallel algorithm is invoked, then some sort of > threshold needs to be exceeded to make it worthwhile. Different aspects of > the parallel algorithm may have their own thresholds. It may not be worth > applying a parallel algorithm for 10 rows from a simple table, but selecting > 10,000 records from multiple tables each over 10 million rows using joins may > benefit from more extreme parallelism. Right, I bet we will need some way to control when the overhead of parallel execution is worth it. > I expect that UNIONs, as well as the processing of partitioned tables, may be > amenable to parallel processing. Interesting idea on UNION. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
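The threshold Gavin and Bruce are talking about boils down to a cost comparison the planner would make before spawning workers. A toy sketch of that decision, with every number and name invented purely for illustration:

```python
def worth_parallelizing(est_rows, per_row_cost, worker_startup_cost, workers):
    """Parallelism pays off only when the work it saves exceeds the fixed
    overhead of launching and coordinating the worker processes."""
    serial_cost = est_rows * per_row_cost
    parallel_cost = workers * worker_startup_cost + serial_cost / workers
    return parallel_cost < serial_cost

# 10 rows from a simple table: never worth forking for.
assert not worth_parallelizing(10, 0.01, 1000.0, 4)
# A join reading multi-million-row tables: easily worth it.
assert worth_parallelizing(10_000_000, 0.01, 1000.0, 4)
```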
* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote: > How about being aware of multiple spindles - so if the requested > data covers multiple spindles, then data could be extracted in > parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better and be simpler for us if it was handled in the backend. > On large multiple processor machines, there are different blocks of > memory that might be accessed at different speeds depending on the > processor. Possibly a mechanism could be used to split a transaction > over multiple processors to ensure the fastest memory is used? Let's work on getting it working on the h/w that PG is most commonly deployed on first.. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today..?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over what threads run on what processors with access to what memory on small-NUMA systems (x86-based). > Once a selection of rows has been made, then if there is a lot of > reformatting going on, could this be done in parallel? I can > think of 2 very simplistic strategies: (A) use a different > processor core for each column, or (B) farm out sets of rows to > different cores. I am sure in reality, there are more subtleties > and aspects of both the strategies will be used in a hybrid fashion > along with other approaches. Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this..
I would think we'd do two things: parallelize based on partitioning, and parallelize seqscans across the individual heap files, which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation, but I could see advantages in matching up with what the kernel thinks are independent files. > I expect that before any parallel algorithm is invoked, then some > sort of threshold needs to be exceeded to make it worthwhile. Certainly. That needs to be included in the optimization model to support this. Thanks, Stephen
On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote: > * Gavin Flower (GavinFlower@archidevsys.co.nz) wrote: > > How about being aware of multiple spindles - so if the requested > > data covers multiple spindles, then data could be extracted in > > parallel. This may, or may not, involve multiple I/O channels? > > Yes, this should dovetail with partitioning and tablespaces to pick up > on exactly that. We're implementing our own poor-man's parallelism > using exactly this to use as much of the CPU and I/O bandwidth as we > can. I have every confidence that it could be done better and be > simpler for us if it was handled in the backend. Yes, I have listed tablespaces and partitions as possible parallel options on the wiki. > > On large multiple processor machines, there are different blocks of > > memory that might be accessed at different speeds depending on the > > processor. Possibly a mechanism could be used to split a transaction > > over multiple processors to ensure the fastest memory is used? > > Let's work on getting it working on the h/w that PG is most commonly > deployed on first.. I agree that we don't want to paint ourselves into > a corner with this, but I don't think massive NUMA systems are what we > should focus on first (are you familiar with any that run PG today..?). > I don't expect we're going to be trying to fight with the Linux (or > whatever) kernel over what threads run on what processors with access to > what memory on small-NUMA systems (x86-based). Agreed. > > Once a selection of rows has been made, then if there is a lot of > > reformatting going on, could this be done in parallel? I can > > think of 2 very simplistic strategies: (A) use a different > > processor core for each column, or (B) farm out sets of rows to > > different cores. I am sure in reality, there are more subtleties > > and aspects of both the strategies will be used in a hybrid fashion > > along with other approaches.
> > Given our row-based storage architecture, I can't imagine we'd do > anything other than take a row-based approach to this.. I would think > we'd do two things: parallelize based on partitioning, and parallelize > seqscans across the individual heap files, which are split on a per-1G > boundary already. Perhaps we can generalize that and scale it based on > the number of available processors and the size of the relation, but I > could see advantages in matching up with what the kernel thinks are > independent files. The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. > > I expect that before any parallel algorithm is invoked, then some > > sort of threshold needs to be exceeded to make it worthwhile. > > Certainly. That needs to be included in the optimization model to > support this. I have updated the wiki to reflect the ideas mentioned above. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
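Stephen's per-1GB idea amounts to handing each worker one segment-sized slice of the relation and summing the per-slice results. The sketch below is a loose analogy in Python, not backend code: threads stand in for worker processes (file reads release the GIL, so the I/O genuinely overlaps), and counting a byte stands in for evaluating quals against tuples; all names are invented:

```python
import os
from concurrent.futures import ThreadPoolExecutor

SEGMENT = 1 << 30  # PostgreSQL splits heap files on 1GB boundaries

def scan_segment(path, offset, length, needle):
    # One worker's share: scan a single segment-sized slice.  A one-byte
    # needle cannot straddle a boundary, much as a tuple never crosses one.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length).count(needle)

def parallel_scan(path, needle, segment=SEGMENT):
    size = os.path.getsize(path)
    with ThreadPoolExecutor() as pool:
        jobs = [pool.submit(scan_segment, path, off, segment, needle)
                for off in range(0, size, segment)]
        return sum(job.result() for job in jobs)
```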
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote:
> I mentioned last year that I wanted to start working on parallelism:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Years ago I added thread-safety to libpq. Recently I added two parallel
> execution paths to pg_upgrade. The first parallel path allows execution
> of external binaries pg_dump and psql (to restore). The second parallel
> path does copy/link by calling fork/thread-safe C functions. I was able
> to do each in 2-3 days.
>
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes. I think it is time we
> start to consider additional options.
>
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries. The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.
>
> I have summarized my ideas by updating our Parallel Query Execution wiki
> page:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Please consider updating the page yourself or posting your ideas to this
> thread. Thanks.

Honestly that would be a great feature, and I would be happy to help work on it. Taking advantage of parallelism on a server with multiple cores, especially for things like large sorting operations, would be great. Just thinking out loud, but wouldn't it be the role of the planner to determine whether a given query is worth parallelizing? The executor would then be in charge of actually firing the tasks in parallel that the planner has determined necessary.
--
Michael Paquier
http://michael.otacoo.com
On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote: > > > On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian <bruce@momjian.us> wrote: > > I mentioned last year that I wanted to start working on parallelism: > > https://wiki.postgresql.org/wiki/Parallel_Query_Execution > > Years ago I added thread-safety to libpq. Recently I added two parallel > execution paths to pg_upgrade. The first parallel path allows execution > of external binaries pg_dump and psql (to restore). The second parallel > path does copy/link by calling fork/thread-safe C functions. I was able > to do each in 2-3 days. > > I believe it is time to start adding parallel execution to the backend. > We already have some parallelism in the backend: > effective_io_concurrency and helper processes. I think it is time we > start to consider additional options. > > Parallelism isn't going to help all queries, in fact it might be just a > small subset, but it will be the larger queries. The pg_upgrade > parallelism only helps clusters with multiple databases or tablespaces, > but the improvements are significant. > > I have summarized my ideas by updating our Parallel Query Execution wiki > page: > > https://wiki.postgresql.org/wiki/Parallel_Query_Execution > > Please consider updating the page yourself or posting your ideas to this > thread. Thanks. > > Honestly that would be a great feature, and I would be happy to help work on > it. > Taking advantage of parallelism on a server with multiple cores, especially for > things like large sorting operations, would be great. > Just thinking out loud, but wouldn't it be the role of the planner to determine > whether a given query is worth parallelizing? The executor would then be in > charge of actually firing the tasks in parallel that the planner has determined > necessary. Yes, it would probably be driven off of the optimizer statistics.
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote: >> Given our row-based storage architecture, I can't imagine we'd do >> anything other than take a row-based approach to this.. I would think >> we'd do two things: parallelize based on partitioning, and parallelize >> seqscan's across the individual heap files which are split on a per-1G >> boundary already. Perhaps we can generalize that and scale it based on >> the number of available processors and the size of the relation but I >> could see advantages in matching up with what the kernel thinks are >> independent files. > > The 1GB idea is interesting. I found in pg_upgrade that file copy would > just overwhelm the I/O channel, and that doing multiple copies on the > same device had no win, but those were pure I/O operations --- a > sequential scan might be enough of a mix of I/O and CPU that parallelism > might help. AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time).
* Claudio Freire (klaussfreire@gmail.com) wrote: > On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote: > > The 1GB idea is interesting. I found in pg_upgrade that file copy would > > just overwhelm the I/O channel, and that doing multiple copies on the > > same device had no win, but those were pure I/O operations --- a > > sequential scan might be enough of a mix of I/O and CPU that parallelism > > might help. > > AFAIR, synchroscans were introduced because multiple large sequential > scans were counterproductive (big time). Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though- we're talking about scanning and processing independent sets of data using multiple processes. It's certainly possible that in some cases that won't be as good, but there will be quite a few cases where it's much, much better. Consider a very complicated function running against each row which makes the CPU the bottleneck instead of the I/O system. That type of query will never run faster than a single CPU in a single-process environment, regardless of whether you have synch-scans or not, while in a multi-process environment you'll take advantage of the extra CPUs which are available and use more of the I/O bandwidth that isn't yet exhausted. Thanks, Stephen
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Claudio Freire (klaussfreire@gmail.com) wrote: >> On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > The 1GB idea is interesting. I found in pg_upgrade that file copy would >> > just overwhelm the I/O channel, and that doing multiple copies on the >> > same device had no win, but those were pure I/O operations --- a >> > sequential scan might be enough of a mix of I/O and CPU that parallelism >> > might help. >> >> AFAIR, synchroscans were introduced because multiple large sequential >> scans were counterproductive (big time). > > Sequentially scanning the *same* data over and over is certainly > counterproductive. Synchroscans fixed that, yes. That's not what we're > talking about though- we're talking about scanning and processing > independent sets of data using multiple processes. I don't see the difference. Blocks are blocks (unless they're cached). > It's certainly > possible that in some cases that won't be as good If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it "a lot worse" rather than "not as good". > but there will be > quite a few cases where it's much, much better. Just cached segments.
* Claudio Freire (klaussfreire@gmail.com) wrote: > On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost <sfrost@snowman.net> wrote: > > Sequentially scanning the *same* data over and over is certainly > > counterproductive. Synchroscans fixed that, yes. That's not what we're > > talking about though- we're talking about scanning and processing > > independent sets of data using multiple processes. > > I don't see the difference. Blocks are blocks (unless they're cached). Not quite. Having to go out to the kernel isn't free. Additionally, the seq scans used to pollute our shared buffers prior to synch-scanning, which didn't help things. > > It's certainly > > possible that in some cases that won't be as good > > If memory serves me correctly (and it does, I suffered it a lot), the > performance hit is quite considerable. Enough to make it "a lot worse" > rather than "not as good". I feel like we must not be communicating very well. If the CPU is pegged at 100% and the I/O system is at 20%, adding another CPU at 100% will bring the I/O load up to 40% and you're now processing data twice as fast overall. If you're running a single CPU at 20% and your I/O system is at 100%, then adding another CPU isn't going to help and may even degrade performance by causing problems for the I/O system. The goal of the optimizer will be to model the plan to account for exactly that, as best it can. > > but there will be > > quite a few cases where it's much, much better. > > Just cached segments. No, certainly not just cached segments. Any situation where the CPU is the bottleneck. Thanks, Stephen
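Stephen's arithmetic can be made concrete with a toy model in which each worker can saturate at most one core while I/O bandwidth is shared. The model deliberately assumes bandwidth stays constant under concurrent readers, which is exactly the assumption Claudio challenges downthread for spinning disks; all numbers are illustrative:

```python
def relative_throughput(workers, io_per_worker):
    """Throughput of N CPU-bound workers relative to one worker, where
    each wants io_per_worker of the total I/O bandwidth.  Scaling is
    linear until aggregate I/O demand reaches 100%."""
    demand = workers * io_per_worker
    return workers if demand <= 1.0 else workers / demand

# Stephen's example: a CPU-pegged scan using 20% of the I/O system.
assert relative_throughput(1, 0.2) == 1     # baseline
assert relative_throughput(2, 0.2) == 2     # I/O rises to 40%, speed doubles
assert relative_throughput(10, 0.2) == 5.0  # beyond 5 workers, I/O is the wall
```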
>> but there will be >> quite a few cases where it's much, much better. > > Just cached segments. Actually, thanks to much faster storage (think SSD, SAN), it's easily possible for PostgreSQL to become CPU-limited on a seq scan query, even when reading from disk. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote: > Actually, thanks to much faster storage (think SSD, SAN), it's easily > possible for PostgreSQL to become CPU-limited on a seq scan query, even > when reading from disk. Particularly with a complex filter being applied or if it's feeding into something above that's expensive.. Thanks, Stephen
Claudio, Stephen, It really seems like the areas where we could get the most "bang for the buck" in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
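Josh's item 2 works because a commutative (and associative) aggregate can be computed as independent per-worker partial states that are merged at the end. A sketch of the shape of it, with invented names, using (sum, count) partials so AVG falls out as well; threads are used purely for brevity where the backend would use separate processes:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_state(chunk):
    # Each worker reduces its chunk of rows to a small transition state;
    # (sum, count) is enough to reconstruct SUM, COUNT, and AVG afterward.
    return (sum(chunk), len(chunk))

def parallel_avg(rows, workers=4):
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_state, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

assert parallel_avg(list(range(1, 101))) == 50.5
```

Order-dependent aggregates (string_agg with an ORDER BY, for instance) do not split this way, which is why the item is scoped to commutative aggregates.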
On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Claudio, Stephen,
>
> It really seems like the areas where we could get the most "bang for the
> buck" in parallelism would be:
>
> 1. Parallel sort
> 2. Parallel aggregation (for commutative aggregates)
> 3. Parallel nested loop join (especially for expression joins, like GIS)
parallel data load? :/
--
Michael Paquier
http://michael.otacoo.com
On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: > > > On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote: > > Claudio, Stephen, > > It really seems like the areas where we could get the most "bang for the > buck" in parallelism would be: > > 1. Parallel sort > 2. Parallel aggregation (for commutative aggregates) > 3. Parallel nested loop join (especially for expression joins, like GIS) > > parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
> > On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > > Claudio, Stephen,
> > >
> > > It really seems like the areas where we could get the most "bang for the
> > > buck" in parallelism would be:
> > >
> > > 1. Parallel sort
> > > 2. Parallel aggregation (for commutative aggregates)
> > > 3. Parallel nested loop join (especially for expression joins, like GIS)
> >
> > parallel data load? :/
>
> We have that in pg_restore, and I think we are getting parallel dump in
> 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest.
> Is it still being worked on?
Not exactly, I meant something like being able to use parallel processing when doing INSERT or COPY directly in core. If there is a parallel processing infrastructure, it could also be used for such write operations. I agree that the cases mentioned by Josh are far more appealing though...
--
Michael Paquier
http://michael.otacoo.com
On Wed, Jan 16, 2013 at 12:55 AM, Stephen Frost <sfrost@snowman.net> wrote: >> If memory serves me correctly (and it does, I suffered it a lot), the >> performance hit is quite considerable. Enough to make it "a lot worse" >> rather than "not as good". > > I feel like we must not be communicating very well. > > If the CPU is pegged at 100% and the I/O system is at 20%, adding > another CPU at 100% will bring the I/O load up to 40% and you're now > processing data twice as fast overall Well, there's the fault in your logic. It won't be as linear. Adding another sequential scan will decrease bandwidth, if the I/O system was doing say 10MB/s at 20% load, now it will be doing 20MB/s at 80% load (maybe even worse). Quite suddenly you'll meet diminishing returns, and the I/O subsystem which wasn't the bottleneck will become it, bandwidth being the key. You might end up with less bandwidth than you've started, if you go far enough past that knee. Add some concurrent operations (connections) to the mix and it just gets worse. Figuring out where the knee is may be the hardest problem you'll face. I don't think it'll be predictable enough to make I/O parallelization in that case worth the effort. If you instead think of parallelizing random I/O (say index scans within nested loops), that might work (or it might not). Again it depends a helluva lot on what else is contending with the I/O resources and how far ahead of optimum you push it. I've faced this problem when trying to prefetch on index scans. If you try to prefetch too much, you induce extra delays and it's a bad tradeoff. Feel free to do your own testing.
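Claudio's "knee" can be illustrated with a deliberately crude model (the numbers and the seek-penalty factor below are invented for illustration, not measured): demand grows linearly with the number of scan workers, but each extra concurrent sequential reader converts some sequential I/O into seeks, so the aggregate ceiling drops, and past the knee total throughput actually falls:

```python
def scan_throughput(workers, per_worker_mbps=10.0, io_limit_mbps=25.0,
                    seek_penalty=0.15):
    # Toy model, not a benchmark: demand grows linearly with workers,
    # but each extra concurrent sequential reader turns some sequential
    # I/O into seeks, shrinking the aggregate ceiling.
    demand = workers * per_worker_mbps
    ceiling = io_limit_mbps * (1 - seek_penalty) ** (workers - 1)
    return min(demand, ceiling)

# Throughput rises, hits a knee, then declines as contention dominates.
for n in range(1, 7):
    print(n, round(scan_throughput(n), 1))
```

With these made-up parameters the model peaks at two workers and declines from three onward, which is the shape of the argument: the hard part is that the knee's location depends on the hardware and on concurrent load, so it isn't easily predictable by the planner.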
Bruce Momjian escribió: > On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: > > > > > > On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote: > > > > Claudio, Stephen, > > > > It really seems like the areas where we could get the most "bang for the > > buck" in parallelism would be: > > > > 1. Parallel sort > > 2. Parallel aggregation (for commutative aggregates) > > 3. Parallel nested loop join (especially for expression joins, like GIS) > > > > parallel data load? :/ > > We have that in pg_restore, and I think we are getting parallel dump in > 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. > Is it still being worked on? It's in the previous-to-last commitfest. IIRC that patch required review and testing from people with some Windows background. There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tuesday, January 15, 2013, Simon Riggs wrote:
On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote:
>> Why is this being discussed now?
>
> It is for 9.4 and will take months. I didn't think there was a better
> time. We don't usually discuss features during beta testing.
Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open ended discussions on next release?
If we can't finish what we've started for 9.3, why talk about 9.4?
Yes, its a great topic for discussion, but there are better times.
Possibly so. But unless we are to introduce a "thinkfest", how do we know when such a better time would be?
Lately commit-fests have been basically a continuous thing, except during beta which would be an even worse time to discuss it. I think that parallel execution is huge and probably more likely for 9.5 (10.0?) than 9.4 for the general case (maybe some special cases for 9.4, like index builds). Yet the single biggest risk I see to the future of the project is the lack of parallel execution.
Cheers,
Jeff
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > There are still 34 items needing attention in CF3. I suggest that, if > you have some spare time, your help would be very much appreciated > there. The commitfest that started on Jan 15th has 65 extra items. > Anything currently listed in CF3 can rightfully be considered to be part > of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. regards, tom lane
At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote: > > In case you hadn't noticed, we've totally lost control of > the CF process. What can we do to get it back on track? I know various people (myself included) have been trying to keep CF3 moving, e.g. sending followup mail, adjusting patch status, etc. I want to help, but I don't know what's wrong. What are the committers working on, and what is the status of the "Ready for committer" patches? Is the problem that the patches marked Ready aren't, in fact, ready? Or is it lack of feedback from authors? Or something else? Would it help at all to move all pending items (i.e. anything less than ready) from CF3 to CF4, just so that the committers have only one list to look at, while reviewers can work on the other? Only psychological, but maybe that's better than the current situation? -- Abhijit
On Tue, Jan 15, 2013 at 11:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> There are still 34 items needing attention in CF3. I suggest that, if >> you have some spare time, your help would be very much appreciated >> there. The commitfest that started on Jan 15th has 65 extra items. >> Anything currently listed in CF3 can rightfully be considered to be part >> of CF4, too. > > In case you hadn't noticed, we've totally lost control of the CF > process. Quite aside from the lack of progress on closing CF3, major > hackers who should know better are submitting significant new feature > patches now, despite our agreement in Ottawa that nothing big would be > accepted after CF3. At this point I'd bet against releasing 9.3 during > 2013. I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more...so looking for the red "Nobody" in the 'reviewers' column probably understates the shortage of review. I'm curious what the qualitative feelings are on patches or clusters thereof and what kind of review would be helpful in clearing the field. -- fdr
On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jan 15, 2013 at 11:01:04PM +0000, Simon Riggs wrote: >> On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote: >> >> >> Why is this being discussed now? >> > >> > It is for 9.4 and will take months. I didn't think there was a better >> > time. We don't usually discuss features during beta testing. >> >> Bruce, there are many, many patches on the queue. How will we ever get >> to beta testing if we begin open ended discussions on next release? >> >> If we can't finish what we've started for 9.3, why talk about 9.4? >> >> Yes, its a great topic for discussion, but there are better times. > > Like when? I don't remember a policy of not discussing things now. > Does anyone else remember this? Are you saying feature discussion is > only between commit-fests? Is this written down anywhere? I only > remember beta-time as a time not to discuss features. We kind of do - when in a CF we should do reviewing of existing patches, when outside a CF we should do discussions and work on new features. It's on http://wiki.postgresql.org/wiki/CommitFest. It doesn't specifically say do this and don't do that, but it says focus on review and discussing things that will happen that far ahead is definitely not focusing on review.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 2013-01-16 02:07:29 -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > There are still 34 items needing attention in CF3. I suggest that, if > > you have some spare time, your help would be very much appreciated > > there. The commitfest that started on Jan 15th has 65 extra items. > > Anything currently listed in CF3 can rightfully be considered to be part > > of CF4, too. > > In case you hadn't noticed, we've totally lost control of the CF > process. Quite aside from the lack of progress on closing CF3, major > hackers who should know better are submitting significant new feature > patches now, despite our agreement in Ottawa that nothing big would be > accepted after CF3. At this point I'd bet against releasing 9.3 during > 2013. To be honest I don't think new patches coming in late are the actual problem though. If you look at the CF progress in December and January there have been rather few non-trivial commits of patches that haven't been authored by committers: - auto updatable views (Dean via Tom) - autovacuum truncate lock (Jan via Kevin) - minimal recovery.conf (Zoltan via Magnus) I might have missed one or two, but thats about it. There's quite some smaller patches going through but thats all coming directly from on-list conversations. I benefit from that, no question, but its still frustrating for others. At this point it feels like reviewing doesn't really help all that much. There's a noticeable amount of patches in "Ready for Committer" state and while one or two might not have that status rightfully thats made up by others that are ready but not marked as such. While I think I put a fair amount of time into reviewing I am happy to increase that, but if it just means patches are lingering around anyway, why bother? Now that sounds a bit like I am blaming committers. I am not. Its their own time and they all got other stuff they need (but not necessarily like) to do and thats absolutely fair.
But we still need a way to at least partially solve this issue. Regards, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 16, 2013 at 9:21 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote: >> >> In case you hadn't noticed, we've totally lost control of >> the CF process. > > What can we do to get it back on track? Not sure. One start might be to actually start having commitfest managers. I haven't actually checked numbers, but my feeling is that last round most commitfests (maybe not all) had a commitfest manager that worked (usually hard) at keeping things moving, whereas this round most (or all?) commitfests have been unmanaged. I'm not sure if that's the reason for it, but it's a feeling I have.. > I know various people (myself included) have been trying to keep CF3 > moving, e.g. sending followup mail, adjusting patch status, etc. > > I want to help, but I don't know what's wrong. What are the committers > working on, and what is the status of the "Ready for commiter" patches? > Is the problem that the patches marked Ready aren't, in fact, ready? Or > is it lack of feedback from authors? Or something else? I'm not one of the committers that pick up the most patches, but from reading messages on the lists I think fairly often patches that are marked as ready, aren't. Sometimes they require a small change, which is fine, but more often than not once it hits a committer it ends up with a lot of feedback requiring rather extensive changes. As in it technical works, but it's better to do it in a different way. I'm not sure how to catch those better. > Would it help at all to move all pending items (i.e. anything less than > ready) from CF3 to CF4, just so that the committers have only one list > to look at, while reviewers can work on the other? Only psychological, > but maybe that's better than the current situation? No, it would. They should've been bounced to the next commitfest when the last one closed. The problem was we never closed it... 
Tom also wrote: > In case you hadn't noticed, we've totally lost control of the CF > process. Quite aside from the lack of progress on closing CF3, major > hackers who should know better are submitting significant new feature > patches now, despite our agreement in Ottawa that nothing big would be > accepted after CF3. At this point I'd bet against releasing 9.3 during > 2013. Well, if we said that, why don't we just single out those patches, and bounce them right now, before people put more work into it. We also talked about the one-patch-one-review. Did someone ever check if that worked out - did we get that spread, or did we end up with the same ratio as last time?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 16 January 2013 08:21, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote: >> >> In case you hadn't noticed, we've totally lost control of >> the CF process. > > What can we do to get it back on track? "Totally lost control" is an overstatement. The current situation is that there are clearly more patches than people working on them; a situation that we've been in, with various degrees for all of the last 9 years of my involvement with PostgreSQL. AFAIK there was no CF manager assigned to Nov2012 CF, so not surprising it is off track. Not sure we ever agreed a process for assigning CF mgrs, so again, not surprising it is off track. I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 16, 2013 at 1:12 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 16 January 2013 08:21, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: >> At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote: >>> >>> In case you hadn't noticed, we've totally lost control of >>> the CF process. >> >> What can we do to get it back on track? > > "Totally lost control" is an overstatement. The current situation is > that there are clearly more patches than people working on them; a > situation that we've been in, with various degrees for all of the last > 9 years of my involvement with PostgreSQL. > > AFAIK there was no CF manager assigned to Nov2012 CF, so not > surprising it is off track. Not sure we ever agreed a process for > assigning CF mgrs, so again, not surprising it is off track. I don't think we have a process for it, and that's perhaps something we need to discuss for the future. > I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. If he's willing to do that, then +1 from me.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, Jan 16, 2013 at 2:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> There are still 34 items needing attention in CF3. I suggest that, if >> you have some spare time, your help would be very much appreciated >> there. The commitfest that started on Jan 15th has 65 extra items. >> Anything currently listed in CF3 can rightfully be considered to be part >> of CF4, too. > > In case you hadn't noticed, we've totally lost control of the CF > process. Quite aside from the lack of progress on closing CF3, major > hackers who should know better are submitting significant new feature > patches now, despite our agreement in Ottawa that nothing big would be > accepted after CF3. At this point I'd bet against releasing 9.3 during > 2013. Or we could reject all of those patches. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jan 16, 2013 at 6:52 AM, Magnus Hagander <magnus@hagander.net> wrote: > On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian <bruce@momjian.us> wrote: >> On Tue, Jan 15, 2013 at 11:01:04PM +0000, Simon Riggs wrote: >>> On 15 January 2013 22:55, Bruce Momjian <bruce@momjian.us> wrote: >>> >>> >> Why is this being discussed now? >>> > >>> > It is for 9.4 and will take months. I didn't think there was a better >>> > time. We don't usually discuss features during beta testing. >>> >>> Bruce, there are many, many patches on the queue. How will we ever get >>> to beta testing if we begin open ended discussions on next release? >>> >>> If we can't finish what we've started for 9.3, why talk about 9.4? >>> >>> Yes, its a great topic for discussion, but there are better times. >> >> Like when? I don't remember a policy of not discussing things now. >> Does anyone else remember this? Are you saying feature discussion is >> only between commit-fests? Is this written down anywhere? I only >> remember beta-time as a time not to discuss features. > > We kind of do - when in a CF we should do reviewing of existing > patches, when outside a CF we should do discussions and work on new > features. It's on http://wiki.postgresql.org/wiki/CommitFest. It > doesn't specifically say do this and don't do htat, but it says focus > on review and discussing things that will happen that far ahead is > definitely not focusing on review. Bruce is evidently under the impression that he's no longer under any obligation to review or commit other people's patches, or participate in the CommitFest process in any way. I believe that he has not committed a significant patch written by someone else in several years. If the committers on the core team aren't committed to the process, it doesn't stand much chance of working. 
The fact that I have been completely buried for the last six months is perhaps not helping, either, but even at the very low level of engagement I've been at recently, I've still done more reviews (a few) than patch submissions (none). I view it as everyone's responsibility to maintain a similar balance in their own work. And some people are, but not enough, especially among the committers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
At 2013-01-16 13:08:27 +0100, magnus@hagander.net wrote: > > One start might be to actually start having commitfest managers. (I'm skipping over this point, since Craig's nomination as CF manager is being discussed elsewhere in this thread.) > As in it technical works, but it's better to do it in a different way. > I'm not sure how to catch those better. It may be best to return the patch with feedback in that case (i.e. fail fast) rather than fix the problem. With some advice about the right way to do things, the original author or someone else can do the remaining work before a committer looks at it again. If that works, then everyone involved will have a better idea of where each patch stands, rather than having the opaque backlog we do now. In the best case, more people can get some hands-on experience in what it takes to prepare properly committable patches, and thus improve any reviews they do in future. I can only speak for myself, but I would be happy to help polish up patches like this. I don't know if this will help (and I haven't checked to see if it's been discussed before), but perhaps it's worth trying? Right now there seems to be very little else we non-committers can do to help directly. Aside: I thought about importing all CF patches into separate branches in one git repository, so that one doesn't have to play hunt-the-wumpus to find the latest version of a patch, and it's easier to submit small fixes, easier to reject or fix patches that don't apply, easier to fix whitespace errors once and for all (or even run pg_indent), etc. If any committers think it would be helpful, please let me know. > We also talked about the one-patch-one-review. Did someone ever check > if that worked out - did we get that spread, or did we end up with the > same ratio as last time? Here's a breakdown based purely on the names from the CF page (i.e. 
I didn't check archives to see who actually posted reviews, and didn't take into account reviews posted without updating the CF page). First, authors: 1 Abhijit Menon-Sen 1 Alexander Korotkov, Oleg Bartunov 1 Alexander Lakhin 1 Amit Kapila, Hari Babu 1 Asif Rehman 1 Atri Sharma 1 cjk 1 Dean Rasheed 1 Etsuro Fujita 1 Greg Smith 1 Heikki Linnakangas 1 Jameison Martin 1 Jan Wieck 1 Jeff Janes 1 Jeremy Evans 1 Joachim Wieland 1 Josh Kupershmidt 1 Kevin Grittner 1 Kyotaro Horiguichi 1 Lars Kanis 1 Marko Tiikkaja, Joel Jacobson 1 Oskari Saarenmaa 1 Phil Sorber 1 Robert Haas 1 Satoshi Nagayasu 1 Vik Reykja 1 Will Leinweber 2 Álvaro Herrera 2 Andres Freund 2 Fujii Masao 2 Kyotaro Horiguchi 2 Peter Eisentraut 3 Amit Kapila 3 Dimitri Fontaine 3 Pavel Stehule 3 Tomas Vondra 3 Zoltán Böszörményi 7 Alexander Korotkov 13 Karl O. Pinc Second, reviewers: 1 Alastair Turner 1 Ali Dar 1 Amit Khandekar 1 Andrew Dunstan 1 Asif Rehman 1 Dimitri Fontaine 1 Etsuro Fujita 1 Hari Babu 1 Heikki Linnakangas 1 Ibrar Ahmed 1 Jeevan Chalke 1 Josh Berkus 1 Josh Kupershmidt 1 KaiGai Kohei 1 Kevin Grittner 1 Kyotaro Horiguchi 1 Magnus Hagander 1 Michael Paquier 1 Muhammad Usama 1 Piyush Newe 1 Simon Riggs 1 Vik Reykja 1 Zoltán Böszörményi 2 Abhijit Menon-Sen 2 Andres Freund 2 Fujii Masao 2 Karl O. Pinc 2 Marko Tiikkaja 2 Noah Misch 2 Shigeru Hanada 2 Tom Lane 3 Peter Geoghegan 3 Tomas Vondra 4 Jeff Davis 4 Pavel Stehule 4 Robert Haas 6 Amit Kapila 9 Peter Eisentraut 10 Nobody I haven't looked too carefully, but it seems to me that the people who reviewed patches reviewed more than the number they submitted, but also many people didn't review anything. -- Abhijit
* Claudio Freire (klaussfreire@gmail.com) wrote: > Well, there's the fault in your logic. It won't be as linear. I really don't see how this has become so difficult to communicate. It doesn't have to be linear. We're currently doing massive amounts of parallel processing by hand using partitioning, tablespaces, and client-side logic to split up the jobs. It's certainly *much* faster than doing it in a single thread. It's also faster with 10 processes going than 5 (we've checked). With 10 going, we've hit the FC fabric limit (and these are spinning disks in the SAN, not SSDs). I'm also sure it'd be much slower if all 10 processes were trying to read data through a single process that's reading from the I/O system. We've got some processes which essentially end up doing that and we don't come anywhere near the total FC fabric bandwidth when just scanning through the system because, at that point, you do hit the limits of how fast the individual drive sets can provide data. To be clear- I'm not suggesting that we would parallelize a SeqScan node and have the nodes above it be single-threaded. As I said upthread- we want to parallelize reading and processing the data coming in. Perhaps at some level that works out to not change how we actually *do* seqscans at all and instead something higher in the plan tree just creates multiple of them on independent threads, but it's still going to end up being parallel I/O in the end. I'm done with this thread for now- as brought up, we need to focus on getting 9.3 out the door. Thanks, Stephen
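The hand-rolled approach Stephen describes, client-side logic fanning out per-partition work and combining the results, looks roughly like the sketch below. The partition names and the per-partition function are hypothetical stand-ins; a real client would run a query per partition over its own database connection:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition names; a real setup would list actual tables.
PARTITIONS = ["events_2012q1", "events_2012q2",
              "events_2012q3", "events_2012q4"]

def scan_partition(name):
    # Stand-in for running something like
    # "SELECT count(*) FROM <name>" over a dedicated connection.
    return len(name)

def parallel_scan(partitions, workers=4):
    # One worker per partition; partial results combined client-side.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan_partition, partitions))
```

The point of the example is the shape, not the code: each worker drives its own I/O stream against its own partition (ideally its own tablespace/spindles), and the only coordination is the final combine, which is why this pattern scales until the shared I/O fabric saturates.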
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > In case you hadn't noticed, we've totally lost control of the CF > process. I concur. > Quite aside from the lack of progress on closing CF3, major > hackers who should know better are submitting significant new feature > patches now, despite our agreement in Ottawa that nothing big would be > accepted after CF3. For my small part, it wasn't my intent to drop a contentious patch at the end. I had felt it was pretty minor and relatively simple. My arguments regarding the popen patch were simply that it didn't address one of the use-cases that I was hoping to. I'll hold off on working on the compressed transport for now in favor of doing reviews and trying to help get 9.3 wrapped up. Thanks, Stephen
* Daniel Farina (daniel@heroku.com) wrote: > I have been skimming the commitfest application, and unlike some of > the previous commitfests a huge number of patches have had review at > some point in time, but probably need more...so looking for the red > "Nobody" in the 'reviewers' column probably understates the shortage > of review. I've been frustrated by that myself. I realize we don't want to duplicate work but I'm really starting to think that having the Reviewers column has turned out to actually work against us. > I'm curious what the qualitative feelings are on patches or clusters > thereof and what kind of review would be helpful in clearing the > field. I haven't been thrilled with the patches that I've looked at but they've also been ones that hadn't been reviewed before, so perhaps that's what should be expected. It'd be neat if we had some idea of what committers were actively working on and keep off of *those*, but keep working on the ones which aren't being worked by a committer currently. Thanks, Stephen
2013-01-16 14:18 keltezéssel, Abhijit Menon-Sen írta: > At 2013-01-16 13:08:27 +0100, magnus@hagander.net wrote: >> One start might be to actually start having commitfest managers. > (I'm skipping over this point, since Craig's nomination as CF manager is > being discussed elsewhere in this thread.) > >> As in it technical works, but it's better to do it in a different way. >> I'm not sure how to catch those better. > It may be best to return the patch with feedback in that case (i.e. fail > fast) rather than fix the problem. With some advice about the right way > to do things, the original author or someone else can do the remaining > work before a committer looks at it again. If that works, then everyone > involved will have a better idea of where each patch stands, rather than > having the opaque backlog we do now. > > In the best case, more people can get some hands-on experience in what > it takes to prepare properly committable patches, and thus improve any > reviews they do in future. I can only speak for myself, but I would be > happy to help polish up patches like this. > > I don't know if this will help (and I haven't checked to see if it's > been discussed before), but perhaps it's worth trying? Right now there > seems to be very little else we non-committers can do to help directly. > > Aside: I thought about importing all CF patches into separate branches > in one git repository, so that one doesn't have to play hunt-the-wumpus > to find the latest version of a patch, and it's easier to submit small > fixes, easier to reject or fix patches that don't apply, easier to fix > whitespace errors once and for all (or even run pg_indent), etc. If any > committers think it would be helpful, please let me know. > >> We also talked about the one-patch-one-review. Did someone ever check >> if that worked out - did we get that spread, or did we end up with the >> same ratio as last time? > Here's a breakdown based purely on the names from the CF page (i.e. 
I > didn't check archives to see who actually posted reviews, and didn't > take into account reviews posted without updating the CF page). > > First, authors: > > 1 Abhijit Menon-Sen > 1 Alexander Korotkov, Oleg Bartunov > 1 Alexander Lakhin > 1 Amit Kapila, Hari Babu > 1 Asif Rehman > 1 Atri Sharma > 1 cjk > 1 Dean Rasheed > 1 Etsuro Fujita > 1 Greg Smith > 1 Heikki Linnakangas > 1 Jameison Martin > 1 Jan Wieck > 1 Jeff Janes > 1 Jeremy Evans > 1 Joachim Wieland > 1 Josh Kupershmidt > 1 Kevin Grittner > 1 Kyotaro Horiguichi > 1 Lars Kanis > 1 Marko Tiikkaja, Joel Jacobson > 1 Oskari Saarenmaa > 1 Phil Sorber > 1 Robert Haas > 1 Satoshi Nagayasu > 1 Vik Reykja > 1 Will Leinweber > 2 Álvaro Herrera > 2 Andres Freund > 2 Fujii Masao > 2 Kyotaro Horiguchi > 2 Peter Eisentraut > 3 Amit Kapila > 3 Dimitri Fontaine > 3 Pavel Stehule > 3 Tomas Vondra > 3 Zoltán Böszörményi It should be 2 submitted patches. The PQconninfo() was a subfeature of the pg_basebackup patch, Magnus added it to the commitfest separately. > 7 Alexander Korotkov > 13 Karl O. Pinc > > Second, reviewers: > > 1 Alastair Turner > 1 Ali Dar > 1 Amit Khandekar > 1 Andrew Dunstan > 1 Asif Rehman > 1 Dimitri Fontaine > 1 Etsuro Fujita > 1 Hari Babu > 1 Heikki Linnakangas > 1 Ibrar Ahmed > 1 Jeevan Chalke > 1 Josh Berkus > 1 Josh Kupershmidt > 1 KaiGai Kohei > 1 Kevin Grittner > 1 Kyotaro Horiguchi > 1 Magnus Hagander > 1 Michael Paquier > 1 Muhammad Usama > 1 Piyush Newe > 1 Simon Riggs > 1 Vik Reykja > 1 Zoltán Böszörményi I reviewed 2 patches: https://commitfest.postgresql.org/action/patch_view?id=1005 https://commitfest.postgresql.org/action/patch_view?id=1006 > 2 Abhijit Menon-Sen > 2 Andres Freund > 2 Fujii Masao > 2 Karl O. Pinc > 2 Marko Tiikkaja > 2 Noah Misch > 2 Shigeru Hanada > 2 Tom Lane > 3 Peter Geoghegan > 3 Tomas Vondra > 4 Jeff Davis > 4 Pavel Stehule > 4 Robert Haas > 6 Amit Kapila > 9 Peter Eisentraut > 10 Nobody > > I haven't looked too carefully, but it seems to me that the people who > reviewed patches reviewed more than the number they submitted, but also > many people didn't review anything. > > -- Abhijit > > -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Simon Riggs escribió: > On 16 January 2013 08:21, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > > At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote: > >> > >> In case you hadn't noticed, we've totally lost control of > >> the CF process. > > > > What can we do to get it back on track? > > "Totally lost control" is an overstatement. The current situation is > that there are clearly more patches than people working on them; a > situation that we've been in, with various degrees for all of the last > 9 years of my involvement with PostgreSQL. > > AFAIK there was no CF manager assigned to Nov2012 CF, so not > surprising it is off track. Not sure we ever agreed a process for > assigning CF mgrs, so again, not surprising it is off track. FWIW, the September 2012 commitfest also went without a manager *for a month*. I started doing the job at that point because it was pretty clear to me that there was a serious problem. And when I finally closed it (just before the Nov. 2012 one was about to start), it was my intention to manage the next one as well; however somebody else told me they wanted that job, so I stepped aside. That was, probably, a bad decision on my part, and I blame myself for not keeping a closer eye on things that I intuitively knew were going wrong. > I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. Seconded. I particularly like the fact that Craig is not already a PG developer, so he's not going to be working on his own patches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > > I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. > > Seconded. I particularly like the fact that Craig is not already a PG > developer, so he's not going to be working on his own patches. So when can he start? :D Thanks, Stephen
On 01/15/2013 11:32 PM, Bruce Momjian wrote: > On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: >> >> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote: >> >> Claudio, Stephen, >> >> It really seems like the areas where we could get the most "bang for the >> buck" in parallelism would be: >> >> 1. Parallel sort >> 2. Parallel aggregation (for commutative aggregates) >> 3. Parallel nested loop join (especially for expression joins, like GIS) >> >> parallel data load? :/ > We have that in pg_restore, and I think we are getting parallel dump in > 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. > Is it still being worked on? > I am about half way through reviewing it. Unfortunately paid work takes precedence over unpaid work. cheers andrew
At 2013-01-16 09:02:45 -0500, sfrost@snowman.net wrote: > > So when can he start? :D Also, what should he start with? CF3 as it stands today, or CF4 with all of the pending patches moved from CF3, immense though the result may be? I slightly prefer the latter, so that we're all on the same page when it comes to seeing what needs to be done. (If others agree, I can do some patch-moving, unless some special privileges are needed to do it.) -- Abhijit
On 01/16/2013 10:02 PM, Stephen Frost wrote: > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: >> > I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. >> >> Seconded. I particularly like the fact that Craig is not already a PG >> developer, so he's not going to be working on his own patches. > > So when can he start? :D Hey, I've had a few toy patches accepted... but yeah, I'm more interested in build systems, QA, docs, etc. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 01/16/2013 08:12 PM, Simon Riggs wrote: > I would like to nominate Craig Ringer to be independent CF mgr for > Jan2013 CF. I'm happy to step up and help out. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
* Abhijit Menon-Sen (ams@2ndQuadrant.com) wrote: > Also, what should he start with? CF3 as it stands today, or CF4 with all > of the pending patches moved from CF3, immense though the result may be? > I slightly prefer the latter, so that we're all on the same page when it > comes to seeing what needs to be done. I'd leave it up to him to decide, but I think the general thought was to move it all to one place. Thanks, Stephen
On Wed, Jan 16, 2013 at 10:33 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Claudio Freire (klaussfreire@gmail.com) wrote: >> Well, there's the fault in your logic. It won't be as linear. > > I really don't see how this has become so difficult to communicate. > > It doesn't have to be linear. > > We're currently doing massive amounts of parallel processing by hand > using partitioning, tablespaces, and client-side logic to split up the > jobs. It's certainly *much* faster than doing it in a single thread. > It's also faster with 10 processes going than 5 (we've checked). With > 10 going, we've hit the FC fabric limit (and these are spinning disks in > the SAN, not SSDs). I'm also sure it'd be much slower if all 10 > processes were trying to read data through a single process that's > reading from the I/O system. We've got some processes which essentially > end up doing that and we don't come anywhere near the total FC fabric > bandwidth when just scanning through the system because, at that point, > you do hit the limits of how fast the individual drive sets can provide > data. Well... just closing then (to let people focus on 9.3's CF), that's a level of hardware I haven't had experience with, but it seems to behave much differently than regular (big and small) RAID arrays. In any case, perhaps tablespaces are a hint here: if nodes are working on different tablespaces, there's an indication that they *can* be parallelized efficiently. That could be fleshed out in a "parallel execution" node, but for that to work the whole execution engine needs to be thread-safe (or it has to fork). It won't be easy. It's best to concentrate on lower-hanging fruit, like sorting and aggregates. Now back to the CF.
On Wed, Jan 16, 2013 at 01:08:27PM +0100, Magnus Hagander wrote: > On Wed, Jan 16, 2013 at 9:21 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > > At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote: > >> > >> In case you hadn't noticed, we've totally lost control of > >> the CF process. > > > > What can we do to get it back on track? > > Not sure. One start might be to actually start having commitfest > managers. I haven't actually checked numbers, but my feeling is that > last round most commitfests (maybe not all) had a commitfest manager > that worked (usually hard) at keeping things moving, whereas this > round most (or all?) commitfests have been unmanaged. I'm not sure if > that's the reason for it, but it's a feeling I have.. Agreed. > > I want to help, but I don't know what's wrong. What are the committers > > working on, and what is the status of the "Ready for committer" patches? > > Is the problem that the patches marked Ready aren't, in fact, ready? Or > > is it lack of feedback from authors? Or something else? > > I'm not one of the committers that pick up the most patches, but from > reading messages on the lists I think fairly often patches that are > marked as ready, aren't. Sometimes they require a small change, which > is fine, but more often than not once it hits a committer it ends up > with a lot of feedback requiring rather extensive changes. That's an intrinsic hazard of non-committer reviews. All the people who are truly skilled at distinguishing ready patches from not-ready patches are already committers, but a non-committer review catching even one important problem adds value. I'd rather have more people getting involved in the process with high-effort reviews, however imperfect. > As in it > technically works, but it's better to do it in a different way. I'm not > sure how to catch those better. That's even harder, because it's subjective. Committers often disagree on such matters.
On Wed, Jan 16, 2013 at 08:42:29AM -0500, Stephen Frost wrote: > * Daniel Farina (daniel@heroku.com) wrote: > > I have been skimming the commitfest application, and unlike some of > > the previous commitfests a huge number of patches have had review at > > some point in time, but probably need more...so looking for the red > > "Nobody" in the 'reviewers' column probably understates the shortage > > of review. > > I've been frustrated by that myself. I realize we don't want to > duplicate work but I'm really starting to think that having the > Reviewers column has turned out to actually work against us. That column tells the CF manager whom to browbeat. Without a CF manager, a stale entry can indeed make a patch look under-control when it isn't.
On Wed, Jan 16, 2013 at 01:37:28PM +0900, Michael Paquier wrote: > > > On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: > > > > > > On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote: > > > > Claudio, Stephen, > > > > It really seems like the areas where we could get the most "bang for > the > > buck" in parallelism would be: > > > > 1. Parallel sort > > 2. Parallel aggregation (for commutative aggregates) > > 3. Parallel nested loop join (especially for expression joins, like > GIS) > > > > parallel data load? :/ > > We have that in pg_restore, and I think we are getting parallel dump in > 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. > Is it still being worked on? > > Not exactly, I meant something like being able to use parallel processing when > doing INSERT or COPY directly in core. If there is a parallel processing > infrastructure, it could also be used for such write operations. I agree that > the cases mentioned by Josh are far more appealing though... I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
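[The parent-reads/child-parses split described above can be sketched outside the backend. The following is a toy Python illustration, not backend code: the tab-splitting and \N handling only loosely mimic COPY's text format, and the function names are invented for this example.]

```python
from multiprocessing import Pool

def parse_copy_line(line):
    # Toy stand-in for COPY's per-line CPU work: split on tabs and
    # decode the \N null marker, loosely mimicking COPY text format.
    return [None if f == r"\N" else f for f in line.rstrip("\n").split("\t")]

def parallel_parse(lines, workers=4):
    # The parent stays a cheap reader/distributor; the CPU-bound
    # parsing is farmed out to child worker processes.
    with Pool(workers) as pool:
        return pool.map(parse_copy_line, lines, chunksize=64)
```

The sketch sidesteps the hard parts discussed later in the thread (ordering guarantees, extent locking on insert); it only shows that the parsing stage itself is embarrassingly parallel.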
On Wed, Jan 16, 2013 at 01:48:29AM -0300, Alvaro Herrera wrote: > Bruce Momjian escribió: > > On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: > > > > > > > > > On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote: > > > > > > Claudio, Stephen, > > > > > > It really seems like the areas where we could get the most "bang for the > > > buck" in parallelism would be: > > > > > > 1. Parallel sort > > > 2. Parallel aggregation (for commutative aggregates) > > > 3. Parallel nested loop join (especially for expression joins, like GIS) > > > > > > parallel data load? :/ > > > > We have that in pg_restore, and I think we are getting parallel dump in > > 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. > > Is it still being worked on? > > It's in the previous-to-last commitfest. IIRC that patch required > review and testing from people with some Windows background. > > There are still 34 items needing attention in CF3. I suggest that, if > you have some spare time, your help would be very much appreciated > there. The commitfest that started on Jan 15th has 65 extra items. > Anything currently listed in CF3 can rightfully be considered to be part > of CF4, too. Wow, I had no idea we were that far behind. I have avoided commit-fest work because I often travel, so I might leave items abandoned; instead I try to do cleanup of items that never make the commit-fest --- I thought that was something that needed doing too, and I rarely can complete that task. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Jan 16, 2013 at 08:11:06AM -0500, Robert Haas wrote: > > We kind of do - when in a CF we should do reviewing of existing > > patches, when outside a CF we should do discussions and work on new > > features. It's on http://wiki.postgresql.org/wiki/CommitFest. It > > doesn't specifically say do this and don't do htat, but it says focus > > on review and discussing things that will happen that far ahead is > > definitely not focusing on review. > > Bruce is evidently under the impression that he's no longer under any > obligation to review or commit other people's patches, or participate > in the CommitFest process in any way. I believe that he has not > committed a significant patch written by someone else in several > years. If the committers on the core team aren't committed to the > process, it doesn't stand much chance of working. I assume you know I was the most frequent committer of other people's patches for years before the commit-fests started, so I thought I would move on to other things. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote: > > On 01/15/2013 11:32 PM, Bruce Momjian wrote: > >On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: > >> > >>On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> > >> Claudio, Stephen, > >> > >> It really seems like the areas where we could get the most "bang for the > >> buck" in parallelism would be: > >> > >> 1. Parallel sort > >> 2. Parallel aggregation (for commutative aggregates) > >> 3. Parallel nested loop join (especially for expression joins, like GIS) > >> > >>parallel data load? :/ > >We have that in pg_restore, and I think we are getting parallel dump in > >9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. > >Is it still being worked on? > > > > > I am about half way through reviewing it. Unfortunately paid work > takes precedence over unpaid work. Do you think it will make it into 9.3? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 01/16/2013 12:20 PM, Bruce Momjian wrote: > On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote: >> On 01/15/2013 11:32 PM, Bruce Momjian wrote: >>> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: >>>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus <josh@agliodbs.com> wrote: >>>> >>>> Claudio, Stephen, >>>> >>>> It really seems like the areas where we could get the most "bang for the >>>> buck" in parallelism would be: >>>> >>>> 1. Parallel sort >>>> 2. Parallel aggregation (for commutative aggregates) >>>> 3. Parallel nested loop join (especially for expression joins, like GIS) >>>> >>>> parallel data load? :/ >>> We have that in pg_restore, and I think we are getting parallel dump in >>> 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. >>> Is it still being worked on? >>> >> >> I am about half way through reviewing it. Unfortunately paid work >> takes precedence over unpaid work. > Do you think it will make it into 9.3? Yes, I hope it will. cheers andrew
> I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. +1, although I'll suggest that we should have *two* CF managers for this one to keep the workload manageable. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
> I assume you know I was the most frequent committer of other people's > patches for years before the commit-fests started, so I thought I would > move on to other things. Why would you think that? Given the volume of incoming patches, we need more committers than ever. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Jan 16, 2013 at 09:50:07AM -0800, Josh Berkus wrote: > > > I assume you know I was the most frequent committer of other people's > > patches for years before the commit-fests started, so I thought I would > > move on to other things. > > Why would you think that? Given the volume of incoming patches, we need > more committers than ever. Well, I usually do stuff no one wants to do, and it seems we have people doing this. Also, I had my hand in deciding lots of things when I was committing all those patches in the past, so I thought others should get the chance. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
> Well, I usually do stuff no one wants to do, and it seems we have people > doing this. Also, I had my hand in deciding lots of things when I was > committing all those patches in the past, so I thought others should get > the chance. Well, we clearly don't have *enough* people committing patches. So we could really use your help. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Stephen Frost <sfrost@snowman.net> writes: > * Abhijit Menon-Sen (ams@2ndQuadrant.com) wrote: >> Also, what should he start with? CF3 as it stands today, or CF4 with all >> of the pending patches moved from CF3, immense though the result may be? >> I slightly prefer the latter, so that we're all on the same page when it >> comes to seeing what needs to be done. > I'd leave it up to him to decide, but I think the general thought was to > move it all to one place. The original intention, per agreement at the last dev meeting, https://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting#CommitFest_Schedule was that we'd have a "triage" discussion between CF3 and CF4 to try to figure out which remaining big patches had a realistic chance of getting committed during CF4. The ones that didn't could then be deferred to 9.4 without first sucking a lot of time away from the ones that could get in. If we decide to fold CF3 and CF4 together, either we lose that step (which would make me sad, it seems like a good idea) or we need to figure another way to work it into the process. regards, tom lane
> If we decide to fold CF3 and CF4 together, either we lose that step > (which would make me sad, it seems like a good idea) or we need to > figure another way to work it into the process. Well, we should have the triage discussion ASAP then. We were really supposed to have it a week ago. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Bruce Momjian (bruce@momjian.us) wrote: > I am not sure how a COPY could be easily parallelized, but I suppose it > could be done as part of the 1GB segment feature. People have > complained that COPY is CPU-bound, so it might be very interesting to > see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound, but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. Thanks, Stephen
Josh Berkus <josh@agliodbs.com> writes: >> If we decide to fold CF3 and CF4 together, either we lose that step >> (which would make me sad, it seems like a good idea) or we need to >> figure another way to work it into the process. > Well, we should have the triage discussion ASAP then. We were really > supposed to have it a week ago. Well, I don't want to just say "hey, it's triage time!" while we're still actively working on the CF. Personally, the reason I've accomplished nothing since early December on CF3 is lack of bandwidth --- bug fixing, email-answering, and non-PG Red Hat work have consumed all my time that wasn't eaten by holiday distractions. I'm hoping to get something done soon towards committing immediately-committable CF entries, but if we're instead focusing on patches that require triage discussions, that's likely to not happen. Looking back at the original schedule agreement, I see I misremembered it a bit; we actually had the idea for two rounds of which-patches-have-a-chance:

CF1: June 15 - July 15
CF2: Sept 15 - Oct 15
CF3: Nov 15 - Dec 15
  Planning Week: Dec 8-15
CF4.1: Jan 15 - Feb 15
  Final Triage: Feb 1-7

but both of those discussion weeks were mid-CF, and I now think that that's got little chance of working because of bandwidth considerations. We certainly totally forgot about the first one. I think a realistic answer might be to admit that we've slipped quite a bit. Set the end date of CF3 to perhaps end of January, do triage the first week of February, and then start CF4 after that, about three or four weeks later than planned. regards, tom lane
On Wed, Jan 16, 2013 at 03:13:50PM -0500, Tom Lane wrote: > I think a realistic answer might be to admit that we've slipped quite a > bit. Set the end date of CF3 to perhaps end of January, do triage the > first week of February, and then start CF4 after that, about three or > four weeks later than planned. This makes sense. Things have just not worked out as planned and we should adjust. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
2013/1/16 Stephen Frost <sfrost@snowman.net>: > * Bruce Momjian (bruce@momjian.us) wrote: >> I am not sure how a COPY could be easily parallelized, but I suppose it >> could be done as part of the 1GB segment feature. People have >> complained that COPY is CPU-bound, so it might be very interesting to >> see if we could offload some of that parsing overhead to a child. > > COPY can certainly be CPU bound but before we can parallelize that > usefully we need to solve the problem around extent locking when trying > to do multiple COPY's to the same table. Probably updating any related indexes and constraint checking should be parallelized too. Regards Pavel > > Thanks, > > Stephen
On Wed, Jan 16, 2013 at 10:06:51PM +0100, Pavel Stehule wrote: > 2013/1/16 Stephen Frost <sfrost@snowman.net>: > > * Bruce Momjian (bruce@momjian.us) wrote: > >> I am not sure how a COPY could be easily parallelized, but I suppose it > >> could be done as part of the 1GB segment feature. People have > >> complained that COPY is CPU-bound, so it might be very interesting to > >> see if we could offload some of that parsing overhead to a child. > > > > COPY can certainly be CPU bound but before we can parallelize that > > usefully we need to solve the problem around extent locking when trying > > to do multiple COPY's to the same table. > > Probably updating any related indexes and constraint checking should be > parallelized too. Wiki updated: https://wiki.postgresql.org/wiki/Parallel_Query_Execution -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
2013/1/16 Bruce Momjian <bruce@momjian.us>: > Wiki updated: > > https://wiki.postgresql.org/wiki/Parallel_Query_Execution Could we add CTEs to that opportunities list? I think that some kinds of queries in CTEs could be easily parallelized. []s -- Dickson S. Guedes mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br
On Wed, Jan 16, 2013 at 07:57:01PM -0200, Dickson S. Guedes wrote: > 2013/1/16 Bruce Momjian <bruce@momjian.us>: > > Wiki updated: > > > > https://wiki.postgresql.org/wiki/Parallel_Query_Execution > > Could we add CTEs to that opportunities list? I think that some kinds of > queries in CTEs could be easily parallelized. I added CTEs with joins. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Jan 16, 2013 at 12:36 PM, Josh Berkus <josh@agliodbs.com> wrote: >> I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. > > +1, although I'll suggest that we should have *two* CF managers for this > one to keep the workload manageable. That has never worked before, so I'm reluctant to start now. Either one person does all the work and the other person still gets half the credit, or neither person does any work because they're hoping the other person has it covered. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tuesday, January 15, 2013, Gavin Flower wrote:
On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks.
Hmm...
How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels?
effective_io_concurrency does this for bitmap scans. I thought there was a patch in the commitfest to extend this to ordinary index scans, but now I can't find it. But it still doesn't give you CPU parallelism. The nice thing about CPU parallelism is that it usually brings some amount of I/O parallelism for free, while the reverse is less likely to be so.
Cheers,
Jeff
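[As a rough illustration of the point above -- a single process can still get I/O parallelism by issuing read hints ahead of its reads, which is the idea behind effective_io_concurrency for bitmap heap scans -- here is a Python sketch. It is illustrative only, not backend code; os.posix_fadvise exists only on POSIX systems, so the hint is guarded.]

```python
import os

def read_with_prefetch(path, block_size=8192, depth=4):
    """Read a file block by block, hinting the kernel `depth` blocks
    ahead so several I/O requests can be in flight at once."""
    data = bytearray()
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            if hasattr(os, "posix_fadvise"):  # no-op where unsupported
                ahead = min(depth * block_size, size - offset)
                os.posix_fadvise(fd, offset, ahead, os.POSIX_FADV_WILLNEED)
            chunk = f.read(block_size)
            if not chunk:
                break
            data += chunk
            offset += len(chunk)
    return bytes(data)
```

The hints overlap device latency with computation in the reading process, but as Jeff notes, they buy no CPU parallelism: the parsing and processing of the returned blocks still happens in one process.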
On Tuesday, January 15, 2013, Stephen Frost wrote:
* Gavin Flower (GavinFlower@archidevsys.co.nz) wrote:
> How about being aware of multiple spindles - so if the requested
> data covers multiple spindles, then data could be extracted in
> parallel. This may, or may not, involve multiple I/O channels?
Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that.
I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that.
Cheers,
Jeff
On 16 January 2013 19:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Stephen Frost <sfrost@snowman.net> writes: >> * Abhijit Menon-Sen (ams@2ndQuadrant.com) wrote: >>> Also, what should he start with? CF3 as it stands today, or CF4 with all >>> of the pending patches moved from CF3, immense though the result may be? >>> I slightly prefer the latter, so that we're all on the same page when it >>> comes to seeing what needs to be done. > >> I'd leave it up to him to decide, but I think the general thought was to >> move it all to one place. > > The original intention, per agreement at the last dev meeting, > https://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting#CommitFest_Schedule > was that we'd have a "triage" discussion between CF3 and CF4 to try to > figure out which remaining big patches had a realistic chance of getting > committed during CF4. The ones that didn't could then be deferred to > 9.4 without first sucking a lot of time away from the ones that could > get in. Sorry to correct you, but that was not the original intention/agreement. Robert made that suggestion, and I opposed it, saying it was too early and suggesting triage week for the first week of Feb instead. My recollection, sitting opposite you, was that you agreed, as did many others. I agree we need triage, and have no problem if you lead that. But let's wait until early Feb, please. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 16, 2013 at 10:04 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> Hmm... >> >> How about being aware of multiple spindles - so if the requested data >> covers multiple spindles, then data could be extracted in parallel. This >> may, or may not, involve multiple I/O channels? > > > > effective_io_concurrency does this for bitmap scans. I thought there was a > patch in the commitfest to extend this to ordinary index scans, but now I > can't find it. I never pushed it to the CF since it interacts so badly with the kernel. I was thinking about pushing the small part that is a net win in all cases, the back-sequential patch, but that's independent of any spindle count. It's more related to rotating media and read request merges than it is to multiple spindles or parallelization. The kernel guys basically are waiting for me to patch the kernel. I think I convinced our IT guy at the office to lend me a machine for tests... so it might happen soon.
On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: > On Tuesday, January 15, 2013, Stephen Frost wrote: > > * Gavin Flower (GavinFlower@archidevsys.co.nz) wrote: > > How about being aware of multiple spindles - so if the requested > > data covers multiple spindles, then data could be extracted in > > parallel. This may, or may not, involve multiple I/O channels? > > Yes, this should dovetail with partitioning and tablespaces to pick up > on exactly that. > > > I'd rather not have the benefits of parallelism be tied to partitioning if we > can help it. Hopefully implementing parallelism in core would result in > something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: >> On Tuesday, January 15, 2013, Stephen Frost wrote: >> >> * Gavin Flower (GavinFlower@archidevsys.co.nz) wrote: >> > How about being aware of multiple spindles - so if the requested >> > data covers multiple spindles, then data could be extracted in >> > parallel. This may, or may not, involve multiple I/O channels? >> >> Yes, this should dovetail with partitioning and tablespaces to pick up >> on exactly that. >> >> >> I'd rather not have the benefits of parallelism be tied to partitioning if we >> can help it. Hopefully implementing parallelism in core would result in >> something more transparent than that. > > We will need a way to know we are not saturating the I/O channel with > random I/O that could have been sequential if it was single-threaded. > Tablespaces give us that info; not sure what else does. I do also think tablespaces are a safe bet. But it wouldn't help for parallelizing sorts or other operations with tempfiles (tempfiles reside on the same tablespace), or even over a single table (same tablespace again). And when the query is CPU-bound, it could be parallelized by simply making a multithreaded memory sort. Well, not so simply, but I do think it's an important building block.
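[The in-memory parallel sort "building block" mentioned above is usually sketched as sort-runs-then-merge: split the input, sort the chunks concurrently, then do a k-way merge of the sorted runs. A toy Python version using worker processes follows; a real backend implementation would share memory rather than copy chunks between processes, so this only illustrates the shape of the algorithm.]

```python
import heapq
from multiprocessing import Pool

def parallel_sort(items, workers=4):
    # Sort chunks in parallel worker processes, then k-way merge the
    # sorted runs in the parent. Tiny inputs aren't worth the overhead.
    if len(items) < workers * 2:
        return sorted(items)
    step = (len(items) + workers - 1) // workers
    chunks = [items[i:i + step] for i in range(0, len(items), step)]
    with Pool(workers) as pool:
        runs = pool.map(sorted, chunks)
    return list(heapq.merge(*runs))
```

The chunk sorts scale with core count; the final merge is sequential, which is why pipelining the merge with consumers (as suggested earlier in the thread for sorting) matters in practice.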
On Wed, Jan 16, 2013 at 11:56:21PM -0300, Claudio Freire wrote: > On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: > >> On Tuesday, January 15, 2013, Stephen Frost wrote: > >> > >> * Gavin Flower (GavinFlower@archidevsys.co.nz) wrote: > >> > How about being aware of multiple spindles - so if the requested > >> > data covers multiple spindles, then data could be extracted in > >> > parallel. This may, or may not, involve multiple I/O channels? > >> > >> Yes, this should dovetail with partitioning and tablespaces to pick up > >> on exactly that. > >> > >> > >> I'd rather not have the benefits of parallelism be tied to partitioning if we > >> can help it. Hopefully implementing parallelism in core would result in > >> something more transparent than that. > > > > We will need a way to know we are not saturating the I/O channel with > > random I/O that could have been sequential if it was single-threaded. > > Tablespaces give us that info; not sure what else does. > > I do also think tablespaces are a safe bet. But it wouldn't help for > parallelizing sorts or other operations with tempfiles (tempfiles > reside on the same tablespace), or even over a single table (same We can round-robin temp tablespace usage if you list multiple entries. > tablespace again). And when the query is CPU-bound, it could be > parallelized by simply making a multithreaded memory sort. Well, not > so simply, but I do think it's an important building block. Yes, and detecting when to use these parallel features will be hard. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
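[The round-robin behavior Bruce refers to -- temp file placement when temp_tablespaces lists multiple entries -- amounts to cycling through a list. A minimal sketch; the class name is invented for this illustration and is not PostgreSQL code:]

```python
from itertools import cycle

class TempTablespaceRouter:
    """Hand out tablespaces round-robin, the way temp files are spread
    across tablespaces when temp_tablespaces lists several entries."""
    def __init__(self, tablespaces):
        self._next = cycle(tablespaces)

    def pick(self):
        return next(self._next)
```

Spreading consecutive temp files across tablespaces this way keeps concurrent sorts from competing for one set of spindles, which is the I/O-saturation concern raised above.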
On Wednesday, January 16, 2013, Stephen Frost wrote:
* Bruce Momjian (bruce@momjian.us) wrote:
> I am not sure how a COPY could be easily parallelized, but I suppose it
> could be done as part of the 1GB segment feature. People have
> complained that COPY is CPU-bound, so it might be very interesting to
> see if we could offload some of that parsing overhead to a child.
COPY can certainly be CPU bound but before we can parallelize that
usefully we need to solve the problem around extent locking when trying
to do multiple COPY's to the same table.
I think that is rather over-stating it. Even with unindexed untriggered tables, I can get some benefit from doing hand-rolled parallel COPY before the extension lock becomes an issue, at least on some machines. And with triggered or indexed tables, all the more so.
Cheers,
Jeff
Simon Riggs <simon@2ndQuadrant.com> writes: > I agree we need triage, and have no problem if you lead that. But lets > wait until early Feb, please. See my later response in the thread, where I corrected my initial recollection (of the first proposal) to be what the meeting minutes say we agreed to. However, since we already missed the scheduling agreed to then, the question that's on the table now is what we should do instead. regards, tom lane
On 01/17/2013 06:01 AM, Robert Haas wrote: > On Wed, Jan 16, 2013 at 12:36 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> I would like to nominate Craig Ringer to be independent CF mgr for Jan2013 CF. >> +1, although I'll suggest that we should have *two* CF managers for this >> one to keep the workload manageable. > That has never worked before, so I'm reluctant to start now. Either > one person does all the work and the other person still gets half the > credit, or neither person does any work because they're hoping the > other person has it covered. Alright - it sounds like there's a tentative agreement on having me help out, and as time's a-flowing I'll get to it. If anyone has anything they think needs specific attention, please email me directly. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
At 2013-01-16 22:40:07 -0500, tgl@sss.pgh.pa.us wrote: > > However, since we already missed the scheduling agreed to then, the > question that's on the table now is what we should do instead. I suggest we close CF3 and bring the pending CF3 patches into CF4, but still have a triage of CF4 patches in early February. Until then, there are several patches marked "Ready for committer" for committers to look at when they have time (14 in CF3, 9 in CF4, with two weeks remaining to the beginning of February). Merging the two CFs will result in a huge number of patches, but I don't think we can treat CF4 as being any less "in progress" than CF3 at this point. People have already started reviewing those patches, etc. -- Abhijit
On Wed, 2013-01-16 at 15:13 -0500, Tom Lane wrote: > I think a realistic answer might be to admit that we've slipped quite > a bit. Set the end date of CF3 to perhaps end of January, do triage > the first week of February, and then start CF4 after that, about three > or four weeks later than planned. I'd suggest moving all the open items from CF3 into CF4 and start CF4 right away. If we did that, the number of patches in CF4 will be about the same as in the last commit fests of the previous years, so it wouldn't be completely off track. If you postpone the start of the last commit fest to mid-February, I'd expect that we will have a much larger number of patches.
Peter Eisentraut <peter_e@gmx.net> writes: > If you postpone the start of the last commit fest to mid-February, I'd > expect that we will have a much larger number of patches. Since some people seem to be under the impression that new work is still okay to submit now, you have a point ... we are definitely past what was envisioned as the deadline for new 9.3 patches. regards, tom lane
On 01/17/2013 12:48 PM, Abhijit Menon-Sen wrote: > At 2013-01-16 22:40:07 -0500, tgl@sss.pgh.pa.us wrote: >> However, since we already missed the scheduling agreed to then, the >> question that's on the table now is what we should do instead. > I suggest we close CF3 and bring the pending CF3 patches into CF4, but > still have a triage of CF4 patches in early February. Until then, there > are several patches marked "Ready for committer" for committers to look > at when they have time (14 in CF3, 9 in CF4, with two weeks remaining to > the beginning of February). This seems sensible to me. 2012-11 is gone, whether truly finished or not, and if everyone's OK with it I'd like to move all open work into 2013-01, close 2012-11, and open 2013-03 for post-9.3 work. That'll at least provide a place for post-9.3 patches and consolodate everything for somewhat easier tracking. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 17, 2013 at 3:52 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Michael Paquier
http://michael.otacoo.com
This seems sensible to me. 2012-11 is gone, whether truly finished or
not, and if everyone's OK with it I'd like to move all open work into
2013-01, close 2012-11, and open 2013-03 for post-9.3 work. That'll at
least provide a place for post-9.3 patches and consolodate everything
for somewhat easier tracking.
Is it really necessary to create a new commit fest just to move the items? Marking the patches that are considered as being too late for 9.3 should be just returned with feedback. The former patch writers, or people who want to pick up the old patches and resubmit them, will just need to move the items to the 9.4 commit fests once they are officially created.
-- http://michael.otacoo.com
At 2013-01-17 16:05:05 +0900, michael.paquier@gmail.com wrote: > > Is it really necessary to create a new commit fest just to move the > items? Marking the patches that are considered as being too late for > 9.3 should be just returned with feedback. Opening 2013-03 is not so much to move existing patches, but to give people a place to submit *new* post-9.3 patches without interrupting the 2013-01 CF. -- Abhijit
On Wed, Jan 16, 2013 at 1:51 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote:
>
> In case you hadn't noticed, we've totally lost control of
> the CF process.
What can we do to get it back on track?
I know various people (myself included) have been trying to keep CF3
moving, e.g. sending followup mail, adjusting patch status, etc.
I want to help, but I don't know what's wrong. What are the committers
working on, and what is the status of the "Ready for commiter" patches?
Is the problem that the patches marked Ready aren't, in fact, ready? Or
is it lack of feedback from authors? Or something else?
ISTM that even committers are often overwhelmed with the work, their own as well as that of reviewing other's patches and committing them. Especially when a patch is large or touches core areas, I can feel the significant work that the committer has to do even after some help and initial review from CF reviewers. On the other hand, if the patches are not committed in time, we loose context, interest dies down and when the patch is actually picked up by a committer who is often not involved in the original discussion, many points need to be revisited and reworked.
Would it help to step up a few developers and create a second line of committers ? The commits by the second line committers will still be reviewed by the first line committers before they make into the product, but may be at later stage or when we are in beta. I thought of even suggesting that the master branch will contain only commits by the first line committers. We then maintain a secondary branch which also have commits from the second line committers in addition to all commits from the master branch. The first line committers can then cherry pick from the secondary branch at some later stage. But not sure if this will add more overhead and defeat the problem at hand.
My 2 cents on this difficult topic.
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
<p dir="ltr"><br /> On Jan 17, 2013 8:15 AM, "Abhijit Menon-Sen" <<a href="mailto:ams@2ndquadrant.com">ams@2ndquadrant.com</a>>wrote:<br /> ><br /> > At 2013-01-17 16:05:05 +0900, <ahref="mailto:michael.paquier@gmail.com">michael.paquier@gmail.com</a> wrote:<br /> > ><br /> > > Is it reallynecessary to create a new commit fest just to move the<br /> > > items? Marking the patches that are consideredas being too late for<br /> > > 9.3 should be just returned with feedback.<br /> ><br /> > Opening2013-03 is not so much to move existing patches, but to give<br /> > people a place to submit *new* post-9.3 patcheswithout interrupting<br /> > the 2013-01 CF.<p dir="ltr">Yeah, and +1 for doing that. The sooner the better. Bywhichever of the time frames for the cf that had been discussed, it should certainly not be open for accepting new patchesfor 9.3 anymore. <p dir="ltr">/Magnus
On Thu, Jan 17, 2013 at 8:19 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > > On Wed, Jan 16, 2013 at 1:51 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> > wrote: >> >> At 2013-01-16 02:07:29 -0500, tgl@sss.pgh.pa.us wrote: >> > >> > In case you hadn't noticed, we've totally lost control of >> > the CF process. >> >> What can we do to get it back on track? >> >> I know various people (myself included) have been trying to keep CF3 >> moving, e.g. sending followup mail, adjusting patch status, etc. >> >> I want to help, but I don't know what's wrong. What are the committers >> working on, and what is the status of the "Ready for commiter" patches? >> Is the problem that the patches marked Ready aren't, in fact, ready? Or >> is it lack of feedback from authors? Or something else? >> > > ISTM that even committers are often overwhelmed with the work, their own as > well as that of reviewing other's patches and committing them. Especially > when a patch is large or touches core areas, I can feel the significant work > that the committer has to do even after some help and initial review from CF > reviewers. On the other hand, if the patches are not committed in time, we > loose context, interest dies down and when the patch is actually picked up > by a committer who is often not involved in the original discussion, many > points need to be revisited and reworked. > > Would it help to step up a few developers and create a second line of > committers ? The commits by the second line committers will still be > reviewed by the first line committers before they make into the product, but > may be at later stage or when we are in beta. I thought of even suggesting > that the master branch will contain only commits by the first line > committers. We then maintain a secondary branch which also have commits from > the second line committers in addition to all commits from the master > branch. 
The first line committers can then cherry pick from the secondary > branch at some later stage. But not sure if this will add more overhead and > defeat the problem at hand. While we can certainly do that, it would probably help just to havae a "second line of reviewers". Basically a set of more senior reviewers - so a patch would go submission -> reviewer -> senior reviewer -> committer. With the second line of reviewers focusing more on the whole "how to do things", etc. As you, I'm not sure if it creates more overhead than it solves, but it might be worth a try. In a way it already exists, because I'm sure committers pay slightly more attention to reviews by people who ahve been doing it a lot and are known to process those things, than to new entries. All that woudl bee needed was for those people to realize it would be helpful for them to do a second-stage review even if somebody else has done the first one. --Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
<div class="moz-cite-prefix">On 01/17/2013 04:43 PM, Magnus Hagander wrote:<br /></div><blockquote cite="mid:CABUevEz1PZVgMd-crbHKjngK+p4NYNVFVs8Gg-5EtNkNv9JnTg@mail.gmail.com"type="cite"><p dir="ltr"><br /> On Jan 17, 20138:15 AM, "Abhijit Menon-Sen" <<a href="mailto:ams@2ndquadrant.com" moz-do-not-send="true">ams@2ndquadrant.com</a>>wrote:<br /> ><br /> > At 2013-01-17 16:05:05 +0900, <a href="mailto:michael.paquier@gmail.com"moz-do-not-send="true">michael.paquier@gmail.com</a> wrote:<br /> > ><br />> > Is it really necessary to create a new commit fest just to move the<br /> > > items? Marking the patchesthat are considered as being too late for<br /> > > 9.3 should be just returned with feedback.<br /> ><br/> > Opening 2013-03 is not so much to move existing patches, but to give<br /> > people a place to submit *new*post-9.3 patches without interrupting<br /> > the 2013-01 CF.<p dir="ltr">Yeah, and +1 for doing that. The soonerthe better. By whichever of the time frames for the cf that had been discussed, it should certainly not be open foraccepting new patches for 9.3 anymore.<br /></blockquote> I've moved all pending patches from 2012-11 to 2013-01. I'llgo through and poke them for aliveness and start chasing things up; in the mean time, any chance of closing 2012-11?<br/><pre class="moz-signature" cols="72">-- Craig Ringer <a class="moz-txt-link-freetext" href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a>PostgreSQLDevelopment, 24x7 Support, Training & Services</pre>
On Thu, Jan 17, 2013 at 8:17 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > I've moved all pending patches from 2012-11 to 2013-01. I'll go through and > poke them for aliveness and start chasing things up; in the mean time, any > chance of closing 2012-11? Done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thursday, January 17, 2013, Magnus Hagander wrote:
> Would it help to step up a few developers and create a second line of
> committers ? The commits by the second line committers will still be
> reviewed by the first line committers before they make into the product, but
> may be at later stage or when we are in beta.
Perhaps this is because I've never needed or played with "prepared transactions"; but to me a commit is a commit is a commit. If it isn't in the product, then whatever it was was not a commit.
..
While we can certainly do that, it would probably help just to havae a
"second line of reviewers". Basically a set of more senior reviewers -
so a patch would go submission -> reviewer -> senior reviewer ->
committer. With the second line of reviewers focusing more on the
whole "how to do things", etc.
That order seems partially backwards to me. If someone needs to glance at a patch and say "That whole idea is never going to work" or more optimistically "you need to be making that change in smgr, not lmgr", it is probably not going to be a novice reviewer who does that.
Sometime this type of high-level summary review does happen, at the senior person's whim, but is not a formal part of the commit fest process.
What I don't know is how much work it takes for one of those senior people to make one of those summary judgments, compared to how much it takes for them to just do an entire review from scratch.
Cheers,
Jeff
On Sat, Jan 19, 2013 at 5:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > Sometime this type of high-level summary review does happen, at the senior > person's whim, but is not a formal part of the commit fest process. > > What I don't know is how much work it takes for one of those senior people > to make one of those summary judgments, compared to how much it takes for > them to just do an entire review from scratch. IME, making such summary judgements can often be done in a few minutes, but convincing that the patch submitter that you haven't created the objection purely as an obstacle to progress is the work of a lifetime. We could perhaps do better at avoiding perverse incentives, there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 20 January 2013 18:42, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Jan 19, 2013 at 5:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> Sometime this type of high-level summary review does happen, at the senior >> person's whim, but is not a formal part of the commit fest process. >> >> What I don't know is how much work it takes for one of those senior people >> to make one of those summary judgments, compared to how much it takes for >> them to just do an entire review from scratch. > > IME, making such summary judgements can often be done in a few > minutes, but convincing that the patch submitter that you haven't > created the objection purely as an obstacle to progress is the work of > a lifetime. We could perhaps do better at avoiding perverse > incentives, there. (Without meaning to paraphrase you in any negative way...) Judgements made in a few minutes are very frequently wrong, and it takes a lot of time to convince the person making snap decisions that they should revise their thinking in light of new or correct information. I feel we are very prone to making judgements on little information. This is especially true with regard to voting, with people zooming in from the side without having even read a patch to vote one way or the other, voting for the person not the idea. It can be a big problem telling the difference between a patch submitter that really is in possession of information that should be heeded and someone so blinded by their patch/idea that they mistakenly push forwards. At times, I have been both and I've also witnessed both as committer. There is a clear and understandable conservatism in being a reviewer/committer that people that haven't done it don't understand. I definitely didn't until I was a committer and I remember clearly me pushing for HS to go into 8.4 when it was a long way from being ready. 
I think it would be useful to expand the pool of committers and perhaps that can be done via some intermediate stage, though I do worry that the sense of responsibility might not be as strong in the intermediate rank ($NAME). I don't think we should be encouraging people to make fast decisions. Senior or not. (Though we must make decisions and have some coming soon). This is high in my mind right now since I've been looking at skip checkpoint patch for months, seeing how I feel about it. Nervous, basically. From that I think that some areas of the code are more critical than others and harder to fix in production if they go wrong. We need to be taking more care in critical areas and it would be useful to be able to mark a patch for its level of risk, rather than just "is it ready". That way we can gauge the risk/benefit. Examples of high risk would be checksums, examples of low risk would be logical replication patches. Anyway, some thoughts to discuss in May. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
2013/1/20 Simon Riggs <simon@2ndquadrant.com>: > On 20 January 2013 18:42, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sat, Jan 19, 2013 at 5:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> Sometime this type of high-level summary review does happen, at the senior >>> person's whim, but is not a formal part of the commit fest process. >>> >>> What I don't know is how much work it takes for one of those senior people >>> to make one of those summary judgments, compared to how much it takes for >>> them to just do an entire review from scratch. >> >> IME, making such summary judgements can often be done in a few >> minutes, but convincing that the patch submitter that you haven't >> created the objection purely as an obstacle to progress is the work of >> a lifetime. We could perhaps do better at avoiding perverse >> incentives, there. > > (Without meaning to paraphrase you in any negative way...) > > Judgements made in a few minutes are very frequently wrong, and it > takes a lot of time to convince the person making snap decisions that > they should revise their thinking in light of new or correct > information. I feel we are very prone to making judgements on little > information. This is especially true with regard to voting, with > people zooming in from the side without having even read a patch to > vote one way or the other, voting for the person not the idea. > > It can be a big problem telling the difference between a patch > submitter that really is in possession of information that should be > heeded and someone so blinded by their patch/idea that they mistakenly > push forwards. At times, I have been both and I've also witnessed both > as committer. > > There is a clear and understandable conservatism in being a > reviewer/committer that people that haven't done it don't understand. > I definitely didn't until I was a committer and I remember clearly me > pushing for HS to go into 8.4 when it was a long way from being ready. 
> I think it would be useful to expand the pool of committers and > perhaps that can be done via some intermediate stage, though I do > worry that the sense of responsibility might not be as strong in the > intermediate rank ($NAME). > > I don't think we should be encouraging people to make fast decisions. > Senior or not. (Though we must make decisions and have some coming > soon). > > This is high in my mind right now since I've been looking at skip > checkpoint patch for months, seeing how I feel about it. Nervous, > basically. > > From that I think that some areas of the code are more critical than > others and harder to fix in production if they go wrong. We need to be > taking more care in critical areas and it would be useful to be able > to mark a patch for its level of risk, rather than just "is it ready". > That way we can gauge the risk/benefit. Examples of high risk would be > checksums, examples of low risk would be logical replication patches. > > Anyway, some thoughts to discuss in May. +1 Pavel > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jan 20, 2013 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > (Without meaning to paraphrase you in any negative way...) > > Judgements made in a few minutes are very frequently wrong, and it > takes a lot of time to convince the person making snap decisions that > they should revise their thinking in light of new or correct > information. I feel we are very prone to making judgements on little > information. This is especially true with regard to voting, with > people zooming in from the side without having even read a patch to > vote one way or the other, voting for the person not the idea. > > It can be a big problem telling the difference between a patch > submitter that really is in possession of information that should be > heeded and someone so blinded by their patch/idea that they mistakenly > push forwards. At times, I have been both and I've also witnessed both > as committer. > > There is a clear and understandable conservatism in being a > reviewer/committer that people that haven't done it don't understand. > I definitely didn't until I was a committer and I remember clearly me > pushing for HS to go into 8.4 when it was a long way from being ready. > I think it would be useful to expand the pool of committers and > perhaps that can be done via some intermediate stage, though I do > worry that the sense of responsibility might not be as strong in the > intermediate rank ($NAME). > > I don't think we should be encouraging people to make fast decisions. > Senior or not. (Though we must make decisions and have some coming > soon). > > This is high in my mind right now since I've been looking at skip > checkpoint patch for months, seeing how I feel about it. Nervous, > basically. > > From that I think that some areas of the code are more critical than > others and harder to fix in production if they go wrong. 
We need to be > taking more care in critical areas and it would be useful to be able > to mark a patch for its level of risk, rather than just "is it ready". > That way we can gauge the risk/benefit. Examples of high risk would be > checksums, examples of low risk would be logical replication patches. > > Anyway, some thoughts to discuss in May. I agree with some but not all of your observations here, but they're along a different line than the point I was trying to make. I would distinguish between value judgements (i.e. this patch is bad because we don't want that) and architectural judgments (i.e. this patch is bad because the logic you've added needs to go in the planner, not the parser). I often disagree with the value judgements that others make, regardless of how much or little time they've spent on them, because, well, at the end of the day, it's an opinion. Our experiences inform our opinions, and since we all have different experiences, we won't always have the same opinions about everything. Architectural judgements are something else again. If Tom says that a particular piece of logic needs to live in the planner rather than the parser, the chances that he is right are upwards of 90%, in my experience. There are more complicated or controversial cases, of course, but I find it's often not difficult to answer the question "assuming we were going to do this, how ought we to do it?". People don't always like hearing those answers, and they're not *always* easy, but there are many cases where I think they are. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sunday, January 20, 2013, Simon Riggs wrote:
On 20 January 2013 18:42, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 19, 2013 at 5:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Sometime this type of high-level summary review does happen, at the senior
>> person's whim, but is not a formal part of the commit fest process.
>>
>> What I don't know is how much work it takes for one of those senior people
>> to make one of those summary judgments, compared to how much it takes for
>> them to just do an entire review from scratch.
>
> IME, making such summary judgements can often be done in a few
> minutes, but convincing that the patch submitter that you haven't
> created the objection purely as an obstacle to progress is the work of
> a lifetime. We could perhaps do better at avoiding perverse
> incentives, there.
As a junior reviewer, I'd like to know if my main task should be to decide between 1) writing a review convincing you or Tom that your judgement is hasty, or 2) to convince the author that your judgement is correct. That would provide me with some direction, rather than just having me pondering whether a certain variable name ought to have one more or one fewer underscores in it. On the other hand if a summary judgement is that the patch is fundamentally sound but needs some line-by-line code-lawyering, or some performance testing, or documentation/usability testing, or needs to ponder the implications to part XYZ of the code base (which neither I nor the other have even heard of before), that would also be good to know.
My desire is not for you to tell me what the outcome of the review should be, but rather what the focus of it should be.
As it is now, when I see a patch that needs review but has no comments, I don't know if that is because no senior developer has read it, or because they have read it but didn't think it needed comment. The wiki page does list points to consider when reviewing a submission, but not all points are equally relevant to all patches--and it is not always obvious to me which are more relevant.
(Without meaning to paraphrase you in any negative way...)
Judgements made in a few minutes are very frequently wrong, and it
takes a lot of time to convince the person making snap decisions that
they should revise their thinking in light of new or correct
information.
This is true, but I'd like to know up front that convincing them to revise their thinking is the main thing I need to do during the review process. Restating and clarifying the submitter's arguments, perhaps with the addition of some experimental evidence, is one part where I think I can contribute, provided I know that that is what I need to be doing. Having them reserve their opinion until after it is marked "ready for commiter" doesn't make it easier to change their mind, I don't think. As a reviewer I can't help address their specific concerns, if I don't know what those were.
Cheers,
Jeff
On Sun, Jan 20, 2013 at 7:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > As a junior reviewer, I'd like to know if my main task should be to decide > between 1) writing a review convincing you or Tom that your judgement is > hasty, or 2) to convince the author that your judgement is correct. That's a hard question. I'm not sure there's a categorical answer. I take positions both in support of and in opposition to the positions of other reviewers, and I suppose I don't see why you shouldn't do the same. It is of course worth bearing in mind that unless at least one committer is willing to support a given approach, it's not going anywhere ... but on the other hand, committers are just people, and do sometimes change their minds, too. So, really, I think you should try to persuade the person that you think is wrong, whoever that is. What I like least is when someone writes a review and says "some people think this is a good idea and some people think it's a bad idea". When I go back to look at the discussion and make a decision about whether to commit something, I want to have a clear idea whether most people liked it or most people didn't like it, and why. That helps to inform my thinking. When it's just me and the submitter, it's hard to tell whether anyone cares at all, one way or the other. > As it is now, when I see a patch that needs review but has no comments, I > don't know if that is because no senior developer has read it, or because > they have read it but didn't think it needed comment. The wiki page does > list points to consider when reviewing a submission, but not all points are > equally relevant to all patches--and it is not always obvious to me which > are more relevant. The main things I personally like to see reviewers do, in descending order of importance, are: 1. express about whether we want it or not, with supporting reasoning 2. review the architecture and draw attention to any possibly worrisome points 3. 
checklist items (does it have docs? does it have regression tests?coding style OK? etc.) > This is true, but I'd like to know up front that convincing them to revise > their thinking is the main thing I need to do during the review process. > Restating and clarifying the submitter's arguments, perhaps with the > addition of some experimental evidence, is one part where I think I can > contribute, provided I know that that is what I need to be doing. Agreed. > Having > them reserve their opinion until after it is marked "ready for commiter" > doesn't make it easier to change their mind, I don't think. As a reviewer I > can't help address their specific concerns, if I don't know what those were. Also agreed. I think one important duty of a reviewer (alluded to above) is to try to draw focus to whatever the central issues around the patch are. For some patches, the big question is "is it OK to break backward compatibility like this?", whereas for others it may be "is this actually useful?" and for still others it may be "does the SQL standard have anything to say about this?". Even if you as a reviewer don't know the answer to those questions, clearly delineating the key issues may enable others to comment without having to read and understand the whole patch, which can expedite things considerably. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 01/20/2013 09:37 PM, Robert Haas wrote: > On Sun, Jan 20, 2013 at 7:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> As a junior reviewer, I'd like to know if my main task should be to decide >> between 1) writing a review convincing you or Tom that your judgement is >> hasty, or 2) to convince the author that your judgement is correct. > That's a hard question. I don't think so. IMNSHO your job is neither - it is to give us your independent judgment. cheers andrew
On Sun, Jan 20, 2013 at 10:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 01/20/2013 09:37 PM, Robert Haas wrote: >> >> On Sun, Jan 20, 2013 at 7:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> >>> As a junior reviewer, I'd like to know if my main task should be to >>> decide >>> between 1) writing a review convincing you or Tom that your judgement is >>> hasty, or 2) to convince the author that your judgement is correct. >> >> That's a hard question. > > > > I don't think so. IMNSHO your job is neither - it is to give us your > independent judgment. That's pretty much what I went on to say. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jan 20, 2013 at 7:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> As a junior reviewer, I'd like to know if my main task should be to decide >> between 1) writing a review convincing you or Tom that your judgement is >> hasty, or 2) to convince the author that your judgement is correct. > ... I think you should try > to persuade the person that you think is wrong, whoever that is. Absolutely. Form your own opinion and marshal an argument for it. At the end of the day, most of us are reasonable people and can be convinced by appropriate evidence. regards, tom lane
On 21.01.2013 02:07, Jeff Janes wrote: > As a junior reviewer, I'd like to know if my main task should be to decide > between 1) writing a review convincing you or Tom that your judgement is > hasty, or 2) to convince the author that your judgement is correct. That > would provide me with some direction, rather than just having me pondering > whether a certain variable name ought to have one more or one fewer > underscores in it. On the other hand if a summary judgement is that the > patch is fundamentally sound but needs some line-by-line code-lawyering, or > some performance testing, or documentation/usability testing, or needs to > ponder the implications to part XYZ of the code base (which neither I nor > the other have even heard of before), that would also be good to know. > > My desire is not for you to tell me what the outcome of the review should > be, but rather what the focus of it should be. Often a patch contains a contentious change buried deep in the patch. The patch submitter might not realize there's anything out of the ordinary in changing parser behavior based on a GUC, or doing things in the executor that should be done in the planner, so he doesn't mention it in the submission email or highlight it with any comments. The longer the patch, the easier it is for things like that to hide in the crowd. If a reviewer can bring those potentially contentious things that don't "smell right" to light, that's extremely helpful. It's not so important what judgment you make on it; just bringing it up, in a short, separate reply to the email thread, will allow the submitter to reconsider, and a committer to jump in early to say "yeah, you can't do that, because X.". IMHO that's the single most important task of a review. - Heikki
> IMHO that's the single most important task of a review. Really? I'd say the most important task for a review is "does the patch do what it says it does?". That is, if the patch is supposed to implement feature X, does it actually? If it's a performance patch, does performance actually improve? If the patch doesn't implement what it's supposed to, who cares what the code looks like? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> IMHO that's the single most important task of a review. > Really? I'd say the most important task for a review is "does the patch > do what it says it does?". That is, if the patch is supposed to > implement feature X, does it actually? If it's a performance patch, > does performance actually improve? > If the patch doesn't implement what it's supposed to, who cares what the > code looks like? But even before that, you have to ask whether what it's supposed to do is something we want. regards, tom lane
> But even before that, you have to ask whether what it's supposed to do > is something we want. The reviewer can't usually answer that though. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Jan 21, 2013 at 2:09 PM, Josh Berkus <josh@agliodbs.com> wrote: >> But even before that, you have to ask whether what it's supposed to do >> is something we want. > > The reviewer can't usually answer that though. They can answer whether THEY want it, though. And Tom, Andrew, and I all just got through arguing that that is one of the most, if not the most, important parts of a review. Seriously. Opinions are good. Lack of opinions leads to "ivory tower" syndrome, which of course we've got, but I think most of us are sufficiently self-aware to at least know that it isn't a good thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jan 20, 2013 at 7:07 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> As a junior reviewer, I'd like to know if my main task should be to decide >> between 1) writing a review convincing you or Tom that your judgement is >> hasty, or 2) to convince the author that your judgement is correct. > > Even if you as a > reviewer don't know the answer to those questions, clearly delineating > the key issues may enable others to comment without having to read and > understand the whole patch, which can expedite things considerably. We wouldn't have to convince anyone that a judgement was hasty if only those making the judgement took some time to read the patch before publishing it on list (even if reading doesn't change the judgement, it at least removes the hasty part). Offering advice and making judgements are two very different things, and I'm lucky enough to have received plenty of very good advice from senior developers here who seldom read my patches before offering it. I'm yet to see a good use of anyone's time once a hasty judgement has been made about a patch. And I think there's no value judgement in calling a judgement hasty here: it's simply a decision you took and published *about a patch* before reading the patch. That's all. If you don't have the time to read the patch, offer advice. Anything more, and you can set a watch to count the cumulative time we are all losing trying to convince each other about something else. Ok, now you will tell me there's no difference and it's just playing with words. It's not. Judgement is about how the patch does things; advice is about how to do things in general. 
Ok, here's a hypothetical example, using some words we tend never to use here because we are all very polite and concerned about each other's happiness when talking about carefully crafted code: Advice: You don't do things that way; this way is the only one we will ever accept, because we've been sweating blood over the years to get to a position where it now works. Hint: it's not talking about the patch content, but about what is supposedly a problem with the patch. It's easy to answer "I'm so happy I didn't actually do it that way". Judgement: You need to think about the problem you want to solve before sending a patch, because there's a hole in it too big for me to be able to count how many bugs are going to dance in there. It's not a patch, it's a quagmire. BTW, I didn't read it, it stinks too much. Hint: imagine it was your patch and now you have to try and convince any other committer to have a look at it. Now, I've been reviewing patches in all commit fests but one, and began reviewing well before the idea of a commit fest even existed. My idea of a good review is being able to come up with one of only 3 possible summaries: - it looks good, please consider a commit; any remaining work is going to have to be done by a committer anyway - it's not ready; please fix this and that, think about this use case, review the docs, whatever - it's ahead of me; this patch now needs a committer (or just someone else) to read it and then weigh in. At least it compiles, follows the usual code and comment style, the documentation is complete and free of errors, and I tried it and it does what it says (perfs, features, etc., bonus points for creative usage and trying to make it crash). Of course, reviewers don't get much feedback in general, so I can only hope that guideline is good enough for most committers. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Jan 16, 2013 at 8:18 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > Here's a breakdown based purely on the names from the CF page (i.e. I > didn't check archives to see who actually posted reviews, and didn't > take into account reviews posted without updating the CF page). FWIW, I reviewed at least one at the point you did this survey, and I did update the CF page, but I didn't put my name into that box marked reviewers because it is an extra step (Edit Patch) that isn't immediately obvious. Isn't there a way to automatically populate that based on people linking in their reviews? I guess it might be difficult when a CF manager comes along to add them and they aren't their reviews.
On Mon, Jan 21, 2013 at 6:23 PM, Phil Sorber <phil@omniti.com> wrote: > On Wed, Jan 16, 2013 at 8:18 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: >> Here's a breakdown based purely on the names from the CF page (i.e. I >> didn't check archives to see who actually posted reviews, and didn't >> take into account reviews posted without updating the CF page). > > FWIW, I reviewed at least one at the point you did this survey, and I > did update the CF page, but I didn't put my name into that box marked > reviewers because it is an extra step (Edit Patch) that isn't > immediately obvious. Isn't there a way to automatically populate that > based on people linking in their reviews? Sadly, the CF application doesn't have access to the user name -> real name mappings. And while it's only mildly annoying that the updates are displayed under user name rather than real name, showing the user name for the author and reviewers fields would be really annoying, especially because not all patch authors necessarily even have accounts. > I guess it might be > difficult when a CF manager comes along to add them and they aren't > their reviews. That's also an issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jan 21, 2013 at 5:48 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Advice: You don't do things that way, this way is the only one we > will ever accept, because we've been sweating blood over > the years to get in a position where it now works. > > Hint: it's not talking about the patch content, but what is > supposedly a problem with the patch. It's easy to answer > "I'm so happy I didn't actually do it that way". > > Judgement: You need to think about the problem you want to solve > before sending a patch, because there's a hole in it too > big for me to be able to count how many bugs are going to > dance in there. It's not a patch, it's a quagmire. BTW, > I didn't read it, it stinks too much. > > Hint: imagine it was your patch and now you have to try and > convince any other committer to have a look at it. I'm not going to pretend that all review comments are constructive, but I also think that to some degree the difference between these two things depends on your perspective. I recall, in particular, the email that prompted the famous "in short: -1 from me regards tom lane" T-shirt, which I believe to be this one: http://www.postgresql.org/message-id/28927.1236820868@sss.pgh.pa.us That's not a positive review, but when it comes down to it, it's a pretty factual email. IMHO, anyway, and YMMV. My own experience is different from yours, I guess. I actually like it when I post a patch, or suggest a concept, and Tom fires back with a laundry list of reasons it won't work. It often induces me to step back and approach the same problem from a different and better angle, and the result is often better for it. What I don't like is when I (or anyone) posts a patch and somebody says something that boils down to "no one wants that". *That* ticks me off. Because you know what? At a minimum, *I* want that. If I didn't, I wouldn't have written a patch. And usually, the customers I support want that, too. 
Now, somebody else may not want it, and that is fine. But IMHO, posting a patch should be considered prima facie evidence of non-zero demand for the associated feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert, > http://www.postgresql.org/message-id/28927.1236820868@sss.pgh.pa.us > > That's not a positive review, but when it comes down to it, it's a > pretty factual email. IMHO, anyway, and YMMV. Really? I've always thought that was a pretty constructive review. It certainly gave me the laundry list of things I'd have to fix to ever get that change in, and showed how what looked like a simple patch was actually fairly complicated. > My own experience is different from yours, I guess. I actually like > it when I post a patch, or suggest a concept, and Tom fires back with > a laundry list of reasons it won't work. This can be a problem with new submitters, though. If you're not used to the current community dialog, that email can be taken as "your idea is stupid because" rather than what it actually means, which is "fix these issues and resubmit, please". That's often not clearly communicated, and is important with new submitters. > It often induces me to step > back and approach the same problem from a different and better angle, > and the result is often better for it. What I don't like is when I > (or anyone) posts a patch and somebody says something that boils down > to "no one wants that". *That* ticks me off. Because you know what? > At a minimum, *I* want that. If I didn't, I wouldn't have written a > patch. And usually, the customers I support want that, too. On the other hand, saying "this patch has potential side effects / performance penalty / makes admin more complex / has a lot of code to maintain. How common is the use-case you're talking about?" is legit. Features which impose a "cost" on people who don't use them need to be justified as "useful enough" to be worth the cost. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Jan 21, 2013 at 8:23 PM, Robert Haas <robertmhaas@gmail.com> wrote: > What I don't like is when I > (or anyone) posts a patch and somebody says something that boils down > to "no one wants that". *That* ticks me off. Because you know what? > At a minimum, *I* want that. If I didn't, I wouldn't have written a > patch. And usually, the customers I support want that, too. Now, > somebody else may not want it, and that is fine. But IMHO, posting a > patch should be considered prima facie evidence of non-zero demand for > the associated feature. +1 I'd much rather hear "what you are trying to accomplish is much better done this other way." rather than, "why would you want to do this?" or as you said "no one wants that." > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jan 21, 2013 at 8:47 PM, Josh Berkus <josh@agliodbs.com> wrote: >> My own experience is different from yours, I guess. I actually like >> it when I post a patch, or suggest a concept, and Tom fires back with >> a laundry list of reasons it won't work. > > This can be a problem with new submitters, though. If you're not used > to the current community dialog, that email can be taken as "your idea > is stupid because" rather than what it actually means, which is "fix > these issues and resubmit, please". That's often not clearly > communicated, and is important with new submitters. > +1 to this as well. I definitely felt that way when submitting my first patch. Robert might even recall a convo at a conference that he and I had about it. If not for that I might have given up long ago. Now I can say I am used to it though, and I appreciate the honest and constructive criticism I receive because over time I have seen that Tom and others are even critical of themselves and it's really for the betterment of the final product.
On Mon, Jan 21, 2013 at 8:47 PM, Josh Berkus <josh@agliodbs.com> wrote: >> http://www.postgresql.org/message-id/28927.1236820868@sss.pgh.pa.us >> >> That's not a positive review, but when it comes down to it, it's a >> pretty factual email. IMHO, anyway, and YMMV. > > Really? I've always thought that was a pretty constructive review. It > certainly gave me the laundry list of things I'd have to fix to ever get > that change in, and how what looked like a simple patch is actually > fairly complicated. I agree that it was constructive, but it wasn't exactly endorsing the concept you put forward, so it was not a positive, aka favorable, review. Also, you did put it on a T-shirt. :-) >> My own experience is different from yours, I guess. I actually like >> it when I post a patch, or suggest a concept, and Tom fires back with >> a laundry list of reasons it won't work. > > This can be a problem with new submitters, though. If you're not used > to the current community dialog, that email can be taken as "your idea > is stupid because" rather than what it actually means, which is "fix > these issues and resubmit, please". That's often not clearly > communicated, and is important with new submitters. Yep. Even long-time participants in the process sometimes get demoralized. It's always fun to be the smartest guy in the room, and one rarely gets that luxury with any regularity on pgsql-hackers. >> It often induces me to step >> back and approach the same problem from a different and better angle, >> and the result is often better for it. What I don't like is when I >> (or anyone) posts a patch and somebody says something that boils down >> to "no one wants that". *That* ticks me off. Because you know what? >> At a minimum, *I* want that. If I didn't, I wouldn't have written a >> patch. And usually, the customers I support want that, too. Now, >> somebody else may not want it, and that is fine. 
But IMHO, posting a >> patch should be considered prima facie evidence of non-zero demand for >> the associated feature. > > On the other hand, saying "this patch has potential side effects / > performance penalty / makes admin more complex / has a lot of code to > maintain. How common is the use-case you're talking about?" is legit. > Features which impose a "cost" on people who don't use them need to be > justified as "useful enough" to be worth the cost. Totally agreed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jan 22, 2013 at 7:57 AM, Phil Sorber <phil@omniti.com> wrote: > On Mon, Jan 21, 2013 at 8:47 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> My own experience is different from yours, I guess. I actually like >>> it when I post a patch, or suggest a concept, and Tom fires back with >>> a laundry list of reasons it won't work. >> >> This can be a problem with new submitters, though. If you're not used >> to the current community dialog, that email can be taken as "your idea >> is stupid because" rather than what it actually means, which is "fix >> these issues and resubmit, please". That's often not clearly >> communicated, and is important with new submitters. >> > > +1 to this as well. I definitely felt that way when submitting my > first patch. Robert might even recall a convo at a conference that he > and I had about it. If not for that I might have given up long ago. > Now I can say I am used to it though, and I appreciate the honest and > constructive criticism I receive because over time I have seen that > Tom and others are even critical of themselves and it's really for the > betterment of the final product. For me our reluctance for any kind of change is a major demoralizing factor. For example, the recent discussion on Jeff Davis's patch on the PD_ALL_VISIBLE flag. Our committers are quite candid in accepting that they may not have put in so much thought or done any real testing while adding the original code, but when it comes to making a change we show reluctance, ask for several test results, and the patch author really needs to show extra worth for the change. That looks a bit unfair, because the original code may not have received the same attention. Though I completely understand that the committers usually have a better sense and gut feel. 
If Heikki had not put that page-level flag in the first place, and someone had suggested adding it today, my sense is that we would have pushed him/her back and asked for many explanations: why waste a bit in the page, or the performance increase is just 10% and that too in the worst case, so not worth it. Another example is SKIP_PAGES_THRESHOLD in vacuumlazy.c. I'm quite convinced that the current default of 32 is quite arbitrary, and the entire logic of not skipping heap pages unless there are 32 in sequence is hard to justify, but if that needs to be changed, we would need hours of testing to show its worth. (BTW, I know both these changes were probably committed by Heikki, but that's just a coincidence :-) I have great respect for Heikki's talent and he is often right in his judgement.) Having said that, I quite understand the reasoning behind the reluctance to change - we are a small community and can't afford to spend cycles on unnecessary regressions. But is that changing in recent years? I mean, aren't there a lot more developers and a lot more companies using/testing the new features/releases and reporting issues? I see a marked increase in the number of bugs reported (haven't really counted, just an observation, and it could be wrong). Not sure if it's a result of increased testing/adoption, or a slight drop in quality, or just the sheer increase in the number of features/changes we are doing. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Pavan Deolasee <pavan.deolasee@gmail.com> writes: > For me our reluctance for any kind of change is a major demoralizing > factor. I hardly think we're "reluctant for any kind of change" --- the rate of commits belies that. What we want is a convincing case that a proposed change is an improvement over what we have. (You're entirely right that there are plenty of dubious places in the existing code, but most of them replaced nothing at all, and were thus improvements. To improve further requires, well, clear improvement.) In the case of PD_ALL_VISIBLE, I believe there is very serious reason to think removing it would be a performance loss at least some of the time. We need proof it's an overall improvement, not lack of proof it isn't. > Having said that, I quite understand the reasoning behind the > reluctance for change - we are a small community and can't afford to > spend cycles spending on unnecessary regressions. But is that changing > in the recent years ? I mean, aren't there a lot more developers and a > lot more companies using/testing the new features/releases and > reporting issues ? Yeah, and a lot more fairly-new developers who don't understand all the connections in the existing system. Let me just push back a bit here: based on the amount of time I've had to spend fixing bugs over the past five months, 9.2 was our worst release ever. I don't like that trend, and I don't want to see it continued because we get laxer about accepting patches. IMO we are probably too lax already. In this connection I refer you to Sturgeon's Law(*): 90% of everything is crud. Applied to our problem, it says that 90% of all patch ideas are bad. Therefore, we should be expecting to reject a large fraction of submitted patches. It distresses me that people seem to think the default result for a submitted patch is that it gets committed. That's backwards. 
For a very long time we've tried to encourage people to submit rough ideas to pgsql-hackers for discussion *before* they start coding. The point of that is to weed out, or fix if possible, (some of) the bad ideas before a lot of effort has gotten expended on them. It seems though that less and less of that is actually happening, so I wonder whether the commitfest process is encouraging inefficiency by making people think they should start a discussion with a mostly-complete patch. Or maybe CF review is just crowding out any serious discussion of rough ideas. There was some discussion at the last dev meeting of creating a "design review" process within commitfests, but nothing got done about it ... regards, tom lane (*) The wikipedia entry is good enough. Whether Ted had the percentage spot-on is not the key issue here.
<p dir="ltr"><br /> On Jan 22, 2013 1:31 AM, "Robert Haas" <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>wrote:<br /> ><br /> > On Mon, Jan 21, 2013 at 6:23PM, Phil Sorber <<a href="mailto:phil@omniti.com">phil@omniti.com</a>> wrote:<br /> > > On Wed, Jan 16, 2013at 8:18 AM, Abhijit Menon-Sen <<a href="mailto:ams@2ndquadrant.com">ams@2ndquadrant.com</a>> wrote:<br /> >>> Here's a breakdown based purely on the names from the CF page (i.e. I<br /> > >> didn't check archivesto see who actually posted reviews, and didn't<br /> > >> take into account reviews posted without updatingthe CF page).<br /> > ><br /> > > FWIW, I reviewed at least one at the point you did this survey, andI<br /> > > did update the CF page, but I didn't put my name into that box marked<br /> > > reviewers becauseit is an extra step (Edit Patch) that isn't<br /> > > immediatly obvious. Isn't there a way to automaticallypopulate that<br /> > > based on people linking in their reviews?<br /> ><br /> > Sadly, the CFapplication doesn't have access to the user name -> real<br /> > name mappings. And while it's only<p dir="ltr">Sureit does. It only has it for users that have logged in at least once. But that wouldn't be a problem for thisscenario as they would have already logged in to post said link. <p dir="ltr">/Magnus
Tom Lane <tgl@sss.pgh.pa.us> writes: > In this connection I refer you to Sturgeon's Law(*): 90% of everything > is crud. Applied to our problem, it says that 90% of all patch ideas > are bad. Therefore, we should be expecting to reject a large fraction > of submitted patches. It distresses me that people seem to think the > default result for a submitted patch is that it gets committed. That's > backwards. +1 I still think it takes loads of stupid ideas and discussion before reaching a point where a patch can be brewed and proposed. Once you reach a certain point though, typically when we begin talking about careful implementation details, then my feeling is that a patch is necessary for the discussion to remain a productive use of everyone's time. So the goal, in your proposed terms, would be to only spend time cooking a patch for about 10% of your ideas, and to be prepared to rewrite it almost from scratch at least a couple of times (for a simple enough patch). That matches my experience. > For a very long time we've tried to encourage people to submit rough > ideas to pgsql-hackers for discussion *before* they start coding. > The point of that is to weed out, or fix if possible, (some of) the bad > ideas before a lot of effort has gotten expended on them. It seems > though that less and less of that is actually happening, so I wonder > whether the commitfest process is encouraging inefficiency by making > people think they should start a discussion with a mostly-complete > patch. Or maybe CF review is just crowding out any serious discussion > of rough ideas. There was some discussion at the last dev meeting of > creating a "design review" process within commitfests, but nothing got > done about it ... I share the feeling that while the commit fest is a giant step forward as far as allowing patches to make progress and hit the next release without delaying said release, it might be encouraging people to cook patches too early: there's no entry for "crazy ideas" or design. 
I guess that in getting more formal it's harder for newcomers to just throw (not so) random ideas on list, as that used to be the only way to begin a contribution to PostgreSQL. My understanding is that we already have too many lists, but maybe we could have another one just to speak about those 10% of ideas that turn into patches or commit fest entries (pgsql-workers), and keep -hackers for crazy ideas, design, community processes, etc. I'm not sold on it myself, but it could maybe help in encouraging design ideas again? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Jan 22, 2013 at 11:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > In this connection I refer you to Sturgeon's Law(*): 90% of everything > is crud. Applied to our problem, it says that 90% of all patch ideas > are bad. That reminds me of a conversation with my master's thesis guide, an accomplished professor and an IEEE fellow with loads of research work to his credit. I was disappointed that I wasn't doing *significant* research work, to which he replied that 90% of work is just like that, but every once in a while someone gets inspired by that work and either takes it further or creates a landmark work out of it. I was satisfied with his answer. So the 90% crap you mentioned is probably not all that crap; we need it so that newer ideas are conceived while discussing some of it. > Therefore, we should be expecting to reject a large fraction > of submitted patches. It distresses me that people seem to think the > default result for a submitted patch is that it gets committed. That's > backwards. > Yeah, that expectation probably needs to be set. What would help more is to somehow guide the patch submitter to a related area, and encourage them to handle that issue, or the same issue in a slightly different but more acceptable way. I think we already do a lot of that, but more can be done. It's OK to push back on a patch submitter; they will be fine as long as their second or third attempt succeeds. If Tom or equivalent feels that the idea is a non-starter, just reject it at the first go, before the submitter has done multiple iterations, to save everyone's time and reduce frustration. > For a very long time we've tried to encourage people to submit rough > ideas to pgsql-hackers for discussion *before* they start coding. > The point of that is to weed out, or fix if possible, (some of) the bad > ideas before a lot of effort has gotten expended on them. It seems > though that less and less of that is actually happening, Agree. 
Many a time it's important, though, that senior folks give their opinion one way or the other early in the review cycle. That does not always happen. I remember during HOT development I used to wait days and weeks for Tom to take note of my emails and respond, even if he was ripping apart my ideas. But the fact that HE had looked into the work was a great satisfaction. Once someone is taking an interest, there is a chance of convincing that person. Thankfully we have a lot more Tom-alikes (not Toms yet :-)) now than 5 years back, and that's why we are seeing a lot more intrusive work. > There was some discussion at the last dev meeting of > creating a "design review" process within commitfests, but nothing got > done about it ... > Yeah, agree. Maybe we need to put that in the process itself, so that no patch is submitted unless the idea has been discussed and agreed upon to some extent. Of course, a few things you will only know once you start writing the code, but at least the major points must have been accepted by at least one major developer or committer. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Robert Haas <robertmhaas@gmail.com> writes: > I'm not going to pretend that all review comments are constructive, > but I also think that to some degree the difference between these two > things depends on your perspective. I recall, in particular, the > email that prompted the famous "in short: -1 from me regards tom lane" > T-shirt, which I believe to be this one: > > http://www.postgresql.org/message-id/28927.1236820868@sss.pgh.pa.us > > That's not a positive review, but when it comes down to it, it's a > pretty factual email. IMHO, anyway, and YMMV. This email is not saying anything about the patch's content, nor offering any judgement ("you did it the wrong way"); it is all about offering some pieces of advice on the complexity of SET and RESET. > My own experience is different from yours, I guess. I actually like > it when I post a patch, or suggest a concept, and Tom fires back with > a laundry list of reasons it won't work. It often induces me to step > back and approach the same problem from a different and better angle, > and the result is often better for it. What I don't like is when I What I was talking about is judgements on a patch the commenter didn't read. Being offered advice by people in the know is awesome, and if that happens early in the patch's life (design), it's even better. > (or anyone) posts a patch and somebody says something that boils down > to "no one wants that". *That* ticks me off. Because you know what? > At a minimum, *I* want that. If I didn't, I wouldn't have written a > patch. And usually, the customers I support want that, too. Now, > somebody else may not want it, and that is fine. But IMHO, posting a > patch should be considered prima facie evidence of non-zero demand for > the associated feature. That part reminds me too much of the Inline Extension patch series for me to comment any further. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
> Yeah, agree. May be we need to put that in the process itself. So no > patch be submitted unless the idea has been discussed and agreed upon > to some extent. Of course, few things you will only know once you > start writing the code. But at least the major points must have been > accepted by at least one major developer or a committer. This is a bit of a problem, because some discussion starts too late - in the commitfest. In my experience, +/- 30% of committed patches got zero response after the proposal. Usually people watch some interesting subset of topics and don't comment on "uninteresting" proposals. There is no timeout for rejecting or accepting proposals. > Thanks, > Pavan
On 01/22/2013 01:15 AM, Tom Lane wrote: > > Yeah, and a lot more fairly-new developers who don't understand all the > connections in the existing system. I think it's just in the nature of the beast we're dealing with to be much more conservative about what we accept than it might be for some other projects. This helps to contribute to our deserved reputation for stability and reliability. > > For a very long time we've tried to encourage people to submit rough > ideas to pgsql-hackers for discussion *before* they start coding. > The point of that is to weed out, or fix if possible, (some of) the bad > ideas before a lot of effort has gotten expended on them. Getting this idea across can be surprisingly difficult, especially in cultures not used to dealing with open source. Large commercial ventures with a history in proprietary software tend to want to come to the table with a shiny new feature patch, all gift wrapped and sparkly. And they are not alone - even new individual contributors want to work this way. I wouldn't blame the commitfest procedure - it's just something we need to keep hammering on. cheers andrew
On 22/01/13 22:35, Dimitri Fontaine wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> In this connection I refer you to Sturgeon's Law(*): 90% of everything is crud. Applied to our problem, it says that 90% of all patch ideas are bad. Therefore, we should be expecting to reject a large fraction of submitted patches. It distresses me that people seem to think the default result for a submitted patch is that it gets committed. That's backwards.
>
> +1
>
> I still think it takes loads of stupid ideas and discussion before reaching a point where a patch can be brewed and proposed. Once you reach a certain point though, typically when we begin talking about careful implementation details, then my feeling is that a patch is necessary for the discussion to remain a productive use of everyone's time. So the goal in your proposed terms would be to only spend time cooking a patch for about 10% of your ideas, and be prepared to rewrite it from about scratch at least a couple of times (for a simple enough patch). That matches my experience.
>
>> For a very long time we've tried to encourage people to submit rough ideas to pgsql-hackers for discussion *before* they start coding. The point of that is to weed out, or fix if possible, (some of) the bad ideas before a lot of effort has gotten expended on them. It seems though that less and less of that is actually happening, so I wonder whether the commitfest process is encouraging inefficiency by making people think they should start a discussion with a mostly-complete patch. Or maybe CF review is just crowding out any serious discussion of rough ideas. There was some discussion at the last dev meeting of creating a "design review" process within commitfests, but nothing got done about it ...
>
> I share that feeling that while commit fest is a giant step forward as far as allowing patches to make progress and hit the next release without delaying said release, it might be encouraging people to cook patches too early: there's no entry for "crazy ideas" or design. I guess in getting more formal it's harder for newcomers to just throw (not so) random ideas on list, as it used to be the only way to begin a contribution to PostgreSQL. My understanding is that we already have too many lists, but maybe we could have another one to just speak about those 10% ideas that turn into patches or commit fest entries (pgsql-workers) and keep hackers for crazy ideas, design, community processes, etc. I'm not sold on it myself, but it could maybe help in encouraging design ideas again?
>
> Regards,

Maybe we should have a list named 'crazy talk' for such ideas. Possibly encourage Tom and other pg heavyweights to write brief notes about their intentions, so people can see that even the best pg developers can have half-baked ideas, and to encourage newcomers and less experienced developers to let people know well in advance what their intentions are.

Most people don't realize that to be really stupid, one needs to be both highly intelligent and creative! More to the point, perhaps, is that the most effective developers often have many silly ideas that are well intentioned but not practicable, are initially implemented poorly, or develop something that seems good until grim reality hits. However, when they strike gold, they improve pg in considerable ways, make trivial changes that are disproportionately useful, or put a lot of effort into making what superficially looks like a simple idea actually work without bad side effects.

I am sure, even with my 40+ years of development experience in other areas, that if I attempted a 'trivial' change to pg, I would most likely cause unwanted side effects without realizing it - assuming I got it working at all! Tom and others would then quite rightly reject my patch, or give me constructive criticism (that I might take as negative feedback if I was depressed!). However, if I first proposed my idea on a 'crazy talk' mailing list, then possibly I would either get suggestions as to how to proceed to avoid such side effects, or be told that what I proposed was not worth the effort. In the latter case, it would be up to me to research reasons for proceeding, if I felt strongly that I was right.

Cheers,
Gavin
On Tue, Jan 22, 2013 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, and a lot more fairly-new developers who don't understand all the > connections in the existing system. Let me just push back a bit here: > based on the amount of time I've had to spend fixing bugs over the past > five months, 9.2 was our worst release ever. I don't like that trend, > and I don't want to see it continued because we get laxer about > accepting patches. IMO we are probably too lax already. Really? Hmm, that's not good. I seem to recall 8.4.x being pretty bad, and some of the recent bugs we fixed were actually 9.1.x problems that slipped through the cracks. > For a very long time we've tried to encourage people to submit rough > ideas to pgsql-hackers for discussion *before* they start coding. > The point of that is to weed out, or fix if possible, (some of) the bad > ideas before a lot of effort has gotten expended on them. It seems > though that less and less of that is actually happening, so I wonder > whether the commitfest process is encouraging inefficiency by making > people think they should start a discussion with a mostly-complete > patch. Or maybe CF review is just crowding out any serious discussion > of rough ideas. There was some discussion at the last dev meeting of > creating a "design review" process within commitfests, but nothing got > done about it ... I think there's been something of a professionalization of PostgreSQL development over the last few years. More and more people are able to get paid to work on PostgreSQL as part or in a few cases all of their job. This trend includes me, but also a lot of other people. There are certainly good things about this, but I think it increases the pressure to get patches committed. If you are developing a patch on your own time and it doesn't go in, it may be annoying but that's about it. 
If you're getting paid to get that patch committed and it doesn't go in, either your boss cares or your customer cares, or both, so now you have a bigger problem. Of course, this isn't always true: I don't know everyone's employment arrangements, but there are some people who are paid to work on PostgreSQL with no real specific agenda, just a thought of generally contributing to the community. However, I believe this to be less common than an arrangement involving specific deliverables. Whatever the arrangement, jobs where you get to work on PostgreSQL as part of your employment mean that you can get more stuff done. Whatever you can get done during work time plus any nonwork time you care to contribute will be more than what you could get done in the latter time alone. And I think we're seeing that, too. That then puts more pressure on the people who need to do reviews and commits, because there's just more stuff to look at, and you know, a lot of it is really good stuff. A lot of it has big problems, too, but we could be doing a lot worse. Notwithstanding, it's a lot of work, and the number of people who know the system well enough to review the really difficult patches is growing, but not as fast as the number of people who have time to write them. For all of that, I'm not sure that people failing to seek consensus before coding is really so much of a problem as you seem to think. A lot of the major efforts underway for this release have been discussed extensively on multiple threads. The fact that some of those ideas may be less than half-baked at this point may in some cases be the submitter's fault, but there are also cases where it's just that they haven't got enough looking-at from the people who know enough to evaluate them in detail, and perhaps some cases that are really nobody's fault: nothing in life is going to be perfect all the time no matter how hard everyone tries.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-01-23 11:44:29 -0500, Robert Haas wrote: > On Tue, Jan 22, 2013 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Yeah, and a lot more fairly-new developers who don't understand all the > > connections in the existing system. Let me just push back a bit here: > > based on the amount of time I've had to spend fixing bugs over the past > > five months, 9.2 was our worst release ever. I don't like that trend, > > and I don't want to see it continued because we get laxer about > > accepting patches. IMO we are probably too lax already. > > Really? Hmm, that's not good. I seem to recall 8.4.x being pretty > bad, and some of the recent bugs we fixed were actually 9.1.x problems > that slipped through the cracks. FWIW I concur with Tom's assessment. > On Tue, Jan 22, 2013 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > For a very long time we've tried to encourage people to submit rough > > ideas to pgsql-hackers for discussion *before* they start coding. > > The point of that is to weed out, or fix if possible, (some of) the bad > > ideas before a lot of effort has gotten expended on them. It seems > > though that less and less of that is actually happening, so I wonder > > whether the commitfest process is encouraging inefficiency by making > > people think they should start a discussion with a mostly-complete > > patch. Or maybe CF review is just crowding out any serious discussion > > of rough ideas. There was some discussion at the last dev meeting of > > creating a "design review" process within commitfests, but nothing got > > done about it ... Are you thinking of specific patches? From what I remember, all the bigger patches around the 9.3 cycle were quite heavily discussed during multiple stages of their development. I agree that it's not necessarily the case for smaller patches, but at least for me in those cases it's often hard to judge how complex something is before doing an initial prototype.
One aspect of this might be that it's sometimes easier to convince -hackers of some idea if you can prove it's doable. Another that the amount of bikeshedding seems to be far lower if a patch already shapes a feature in some way. > I think there's been something of a professionalization of PostgreSQL > development over the last few years. More and more people are able > to get paid to work on PostgreSQL as part or in a few cases all of > their job. This trend includes me, but also a lot of other people. Yes. > There are certainly good things about this, but I think it increases > the pressure to get patches committed. If you are developing a patch > on your own time and it doesn't go in, it may be annoying but that's > about it. If you're getting paid to get that patch committed and it > doesn't go in, either your boss cares or your customer cares, or both, > so now you have a bigger problem. And it also might shape the likelihood of getting paid for future work, be that a specific patch or time for hacking/maintenance. > For all of that, I'm not sure that people failing to seek consensus > before coding is really so much of a problem as you seem to think. A > lot of the major efforts underway for this release have been discussed > extensively on multiple threads. The fact that some of those ideas > may be less than half-baked at this point may in some cases be the > submitter's fault, but there also cases where it's just that they > haven't got enough looking-at from the people who know enough to > evaluate them in detail, and perhaps some cases that are really > nobody's fault: nothing in life is going to be perfect all the time no > matter how hard everyone tries. I agree. Take logical replication/decoding for example. While we developed a prototype first, we/I tried to take in as much feedback from the community as we could. Just about no code, and only few concepts, from the initial prototype remain, and that's absolutely good.
I don't think a more "talk about it first" approach would have helped here. Do others disagree? But as soon as the rough, rough design was laid out, the amount of specific feedback shrank. Part of that is that some people involved in the discussions had changes in their work situation that influenced the amount of time they could spend on it, but I think one other problem is that at some point it gets hard to give feedback on a complex, evolving patch without it eating up a large amount of time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 21, 2013 at 02:04:14PM -0500, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >> IMHO that's the single most important task of a review. > > > Really? I'd say the most important task for a review is "does the patch > > do what it says it does?". That is, if the patch is supposed to > > implement feature X, does it actually? If it's a performance patch, > > does performance actually improve? > > > If the patch doesn't implement what it's supposed to, who cares what the > > code looks like? > > But even before that, you have to ask whether what it's supposed to do > is something we want. Yep. Our TODO list has a pretty short summary of this at the top: Desirability -> Design -> Implement -> Test -> Review -> Commit -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 01/23/2013 09:08 AM, Andres Freund wrote: > On 2013-01-23 11:44:29 -0500, Robert Haas wrote: >> On Tue, Jan 22, 2013 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, and a lot more fairly-new developers who don't understand all the >>> connections in the existing system. Let me just push back a bit here: >>> based on the amount of time I've had to spend fixing bugs over the past >>> five months, 9.2 was our worst release ever. I don't like that trend, >>> and I don't want to see it continued because we get laxer about >>> accepting patches. IMO we are probably too lax already. >> >> Really? Hmm, that's not good. I seem to recall 8.4.x being pretty >> bad, and some of the recent bugs we fixed were actually 9.1.x problems >> that slipped through the cracks. > > FWIW I concur with Tom's assessment. The only way to fix increasing bug counts is through more-comprehensive regular testing. Currently we have regression/unit tests which cover maybe 30% of our code. Performance testing is largely ad-hoc. We don't require comprehensive acceptance testing for new patches. And we have > 1m lines of code. Of course our bug count is increasing. I'm gonna see if I can do something about improving our test coverage. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
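[For context on how a figure like Josh's "maybe 30%" can be produced: PostgreSQL's build supports coverage instrumentation (configure with --enable-coverage, run the regression tests, then "make coverage-html" generates per-file gcov reports). The overall number is then just a line-weighted average of the per-file gcov summaries. A hypothetical sketch of that aggregation step - the helper name and the sample data below are made up for illustration, not real PostgreSQL numbers:]

```python
import re

def aggregate_coverage(gcov_summaries):
    """Combine per-file gcov summary lines into one overall percentage.

    Each entry is expected to look like gcov's stdout summary,
    e.g. "Lines executed:45.20% of 812".  Each file is weighted by
    its line count, which is what an overall figure should mean.
    """
    covered = 0.0
    total = 0
    pat = re.compile(r"Lines executed:([\d.]+)% of (\d+)")
    for line in gcov_summaries:
        m = pat.search(line)
        if not m:
            continue  # skip lines that aren't gcov summary lines
        pct, nlines = float(m.group(1)), int(m.group(2))
        covered += pct / 100.0 * nlines
        total += nlines
    return 100.0 * covered / total if total else 0.0

# Made-up sample data: two files, 1000 lines each.
samples = [
    "Lines executed:50.00% of 1000",
    "Lines executed:10.00% of 1000",
]
print(round(aggregate_coverage(samples), 2))  # 30.0
```

[The weighting matters: a simple average of the two percentages would give the same 30% here only because the files are the same size; with unequal file sizes the two methods diverge.]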
On 01/23/2013 09:51 AM, Josh Berkus wrote: > The only way to fix increasing bug counts is through more-comprehensive > regular testing. Currently we have regression/unit tests which cover > maybe 30% of our code. Performance testing is largely ad-hoc. We don't > require comprehensive acceptance testing for new patches. And we have > > 1m lines of code. Of course our bug count is increasing. > And... slow down the release cycle, or slow down the number of features that are accepted. Don't get me wrong, I love everything we have and are adding every cycle, but there does seem to be a definite imbalance between the number of features added and the time spent allowing those features to settle. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579
* Robert Haas (robertmhaas@gmail.com) wrote: > For all of that, I'm not sure that people failing to seek consensus > before coding is really so much of a problem as you seem to think. For my part, I don't think the lack of consensus-finding before submitting patches is, in itself, a problem. The problem, imv, is that everyone is expecting that once they've written a patch and put it on a commitfest that it's going to get committed- and it seems like committers are feeling under pressure that, because something's on the CF app, it needs to be committed in some form. There's a lot of good stuff out there, sure, and even more good *ideas*, but it's important to make sure we can provide a stable system with regular releases. As discussed, we really need to be ready to truly triage the remaining patch set, figure out who is going to work on what, and punt the rest til post-9.3. Thanks, Stephen
On Wed, Jan 23, 2013 at 1:44 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> For all of that, I'm not sure that people failing to seek consensus >> before coding is really so much of a problem as you seem to think. > > For my part, I don't think the lack of consensus-finding before > submitting patches is, in itself, a problem. > > The problem, imv, is that everyone is expecting that once they've > written a patch and put it on a commitfest that it's going to get > committed- and it seems like committers are feeling under pressure > that, because something's on the CF app, it needs to be committed > in some form. > FWIW, I have NO delusions that something I propose or submit or put in a CF is necessarily going to get committed. For me it's not committed until I can see it in 'git log' and even then, I've seen stuff get reverted. I would hope that if a committer isn't comfortable with a patch they would explain why, and decline to commit. Then it's up to the submitter as to whether or not they want to make changes, try to explain why they are right and the committer is wrong, or withdraw the patch. > There's a lot of good stuff out there, sure, and even more good *ideas*, > but it's important to make sure we can provide a stable system with > regular releases. As discussed, we really need to be ready to truely > triage the remaining patch set, figure out who is going to work on what, > and punt the rest til post-9.3. > > Thanks, > > Stephen
On Wed, Jan 23, 2013 at 3:23 PM, Phil Sorber <phil@omniti.com> wrote: > On Wed, Jan 23, 2013 at 1:44 PM, Stephen Frost <sfrost@snowman.net> wrote: >> * Robert Haas (robertmhaas@gmail.com) wrote: >>> For all of that, I'm not sure that people failing to seek consensus >>> before coding is really so much of a problem as you seem to think. >> >> For my part, I don't think the lack of consensus-finding before >> submitting patches is, in itself, a problem. >> >> The problem, imv, is that everyone is expecting that once they've >> written a patch and put it on a commitfest that it's going to get >> committed- and it seems like committers are feeling under pressure >> that, because something's on the CF app, it needs to be committed >> in some form. >> > > FWIW, I have NO delusions that something I propose or submit or put in > a CF is necessarily going to get committed. For me it's not committed > until I can see it in 'git log' and even then, I've seen stuff get > reverted. I would hope that if a committer isn't comfortable with a > patch they would explain why, and decline to commit. Then it's up to > the submitter as to whether or not they want to make changes, try to > explain why they are right and the committer is wrong, or withdraw the > patch. I think that's the right attitude, but it doesn't always work out that way. Reviewers and committers sometimes spend a lot of time writing a review and then get flamed for their honest opinion about the readiness of a patch. Of course, reviewers and committers can be jerks, too. As far as I know, we're all human, here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23.01.2013 20:44, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> For all of that, I'm not sure that people failing to seek consensus >> before coding is really so much of a problem as you seem to think. > > For my part, I don't think the lack of consensus-finding before > submitting patches is, in itself, a problem. I feel the same. Posting a patch before design and consensus may be a huge waste of time for the submitter himself, but it's not a problem for others. > The problem, imv, is that everyone is expecting that once they've > written a patch and put it on a commitfest that it's going to get > committed- and it seems like committers are feeling under pressure > that, because something's on the CF app, it needs to be committed > in some form. I'm sure none of the committers have a problem rejecting a patch, when there's clear grounds for it. Rejecting is the easiest way to deal with a patch. However, at least for me, >50% of the patches in any given commitfest don't interest me at all. I don't object to them, per se, but I don't want to spend any time on them either. I can imagine the same to be true for all other committers too. If a patch doesn't catch the interest of any committer, it's going to just sit there in the commitfest app for a long time, even if it's been reviewed. > As discussed, we really need to be ready to truely > triage the remaining patch set, figure out who is going to work on what, > and punt the rest til post-9.3. FWIW, here's how I feel about some of the patches. It's not an exhaustive list. "Event Triggers: Passing Information to User Functions (from 2012-11)" I don't care about this whole feature, and haven't looked at the patch. Probably not worth the complexity. But won't object if someone else wants to deal with it. "Extension templates" Same as above. "Checksums (initdb-time)" I don't want this feature, and I've said that on the thread.
"Identity projection (partitioning)" Nothing's happened for over a month. Seems dead for that reason. "Remove unused targets from plan" Same here. "Store additional information in GIN index" Same here. "Index based regexp search for pg_trgm" I'd like to see this patch go in, but it still needs a fair amount of work. Probably needs to be pushed to next commitfest for that reason. "allowing privileges on untrusted languages" Seems like a bad idea to me, for the reasons Tom mentioned on that thread. "Skip checkpoint on promoting from streaming replication" Given that we still don't have an updated patch for this, it's probably getting too late for this. Would be nice to see the patch or an explanation of the new design Simon's been working on. "Patch to compute Max LSN of Data Pages (from 2012-11)" Meh. Seems like a really special-purpose tool. Probably better to put this on pgfoundry. "logical changeset generation v4" This is a boatload of infrastructure for supporting logical replication, yet we have no code actually implementing logical replication that would go with this. The premise of logical replication over trigger-based was that it'd be faster, yet we cannot assess that without a working implementation. I don't think this can be committed in this state. "built-in/SQL Command to edit the server configuration file (postgresql.conf)" I think this should be a pgfoundry project, not in core. At least for now. "json generator function enhancements" "Json API and extraction functions" To be honest, I don't understand why the json datatype had to be built-in to begin with. But it's too late to object to that now, and if the datatype is there, these functions probably should be, too. Or could these be put into a separate "json-extras" extension? I dunno. "psql watch" Meh. In practice, for the kind of ad-hoc monitoring this would be useful for, you might as well just use "watch psql -c 'select ...' ". Yes, that reconnects for each query, but so what.
"plpgsql_check_function" I don't like this in its current form. A lot of code that mirrors pl_exec.c. I think we'll have to find a way to somehow re-use the existing code for this. Like, compile the function as usual, but don't stop on error. - Heikki
Heikki, * Heikki Linnakangas (hlinnakangas@vmware.com) wrote: > FWIW, here's how I feel about some of the patches. It's not an exhaustive list. Thanks for going through them and commenting on them. > "Event Triggers: Passing Information to User Functions (from 2012-11)" > I don't care about this whole feature, and haven't looked at the > patch.. Probably not worth the complexity. But won't object if > someone else wants to deal with it. I'd like to see it happen. > "Extension templates" > Same as above. This one isn't actually all that complex, though it does add a few additional catalogs for keeping track of things. For my part, I'd like to see this go in as I see it being another step closer to Packages that a certain other RDBMS has. > "Checksums (initdb-time)" > I don't want this feature, and I've said that on the thread. I see a lot of value in this. > "Identity projection (partitioning)" > Nothing's happened for over a month. Seems dead for that reason. I don't think this is dead.. > "Remove unused targets from plan" > Same here. > > "Store additional information in GIN index" > Same here. Haven't been following these so not sure. Some of these are in a state of lack of progress for having not been reviewed. > "Index based regexp search for pg_trgm" > I'd like to see this patch go in, but it still needs a fair amount > of work. Probably needs to be pushed to next commitfest for that > reason. Agreed. > "allowing privileges on untrusted languages" > Seems like a bad idea to me, for the reasons Tom mentioned on that thread. On the fence about this one. I like the idea of reducing the need to be a superuser, but there are risks associated with this change also. > "Skip checkpoint on promoting from streaming replication" > Given that we still don't have an updated patch for this, it's > probably getting too late for this. Would be nice to see the patch > or an explanation of the new design Simon's been working.
> > "Patch to compute Max LSN of Data Pages (from 2012-11)" > Meh. Seems like a really special-purpose tool. Probably better to > put this on pgfoundry. Agreed on these two. > "logical changeset generation v4" > This is a boatload of infrastructure for supporting logical > replication, yet we have no code actually implementing logical > replication that would go with this. The premise of logical > replication over trigger-based was that it'd be faster, yet we > cannot asses that without a working implementation. I don't think > this can be committed in this state. Andres had a working implementation of logical replication, with code to back it up and performance numbers showing how much faster it is, at PGCon last year, as I recall... Admittedly, it probably needs changing and updating due to the changes which the patches have been going through, but I don't think the claim that we don't know it's faster is fair. > "built-in/SQL Command to edit the server configuration file > (postgresql.conf)" > I think this should be a pgfoundry project, not in core. At least for now. I don't think it would ever get deployed or used then.. > "json generator function enhacements" > "Json API and extraction functions" > To be honest, I don't understand why the json datatype had to be > built-in to begin with. But it's too late to object to that now, and > if the datatype is there, these functions probably should be, too. > Or could these be put into a separate "json-extras" extension? I > dunno. There were good reasons to add json as a data type, I thought, though I don't have them offhand. > "psql watch" > Meh. In practice, for the kind of ad-hoc monitoring this would be > useful for, you might as well just use "watch psql -c 'select ...' > ". Yes, that reconnects for each query, but so what. I do that pretty often. A better approach, imv, would be making psql a bit more of a 'real' shell, with loops, conditionals, better variable handling, etc. 
> "plpgsql_check_function" > I don't like this in its current form. A lot of code that mirrors > pl_exec.c. I think we'll have to find a way to somehow re-use the > existing code for this. Like, compile the function as usual, but > don't stop on error. Haven't looked at this yet, but your concerns make sense to me. Thanks! Stephen
Hello > > I do that pretty often. A better approach, imv, would be making psql a > bit more of a 'real' shell, with loops, conditionals, better variable > handling, etc. > After a few years of prototyping in this area, I am not sure this is a good idea. Maybe it would be better to start some new console from scratch. >> "plpgsql_check_function" >> I don't like this in its current form. A lot of code that mirrors >> pl_exec.c. I think we'll have to find a way to somehow re-use the >> existing code for this. Like, compile the function as usual, but >> don't stop on error. > > Haven't looked at this yet, but your concerns make sense to me. I welcome any movement on this subject - again, I wrote a lot of variants, and the current one is the most maintainable (in my opinion) - there is a redundant structure (not redundant code), and the code is simple. After merging the code, it lost readability :(. But I would like to continue this work - I am sure it makes sense, and people and some large companies use it with success, so it should be in core - in some form. Regards Pavel > > Thanks! > > Stephen
On Wed, Jan 23, 2013 at 10:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think there's been something of a professionalization of PostgreSQL
> development over the last few years. More and more people are able
> to get paid to work on PostgreSQL as part or in a few cases all of
> their job. This trend includes me, but also a lot of other people.
> There are certainly good things about this, but I think it increases
> the pressure to get patches committed.

Also, many of the new developers who were previously working at
proprietary companies (that includes me) may not have seen such long
cycles for feature development, from design to acceptance/rejection.
The problem is aggravated because the same developer was previously
doing development at a much faster pace and will find our style very
conservative. Of course, the quality of work might be a notch lower
when you are coding features at that pace, and you may introduce
regressions and bugs along the way, but that probably gets compensated
for by the more structured QA and testing that proprietary companies
do. And many of these developers may have been working on equally
important and mission-critical products before.

I know we are very conscious of our code quality and product
stability, and for the right reasons, but I wonder whether overall
productivity would go up if we did the same, i.e. had quicker feature
acceptance cycles compensated for by more QA. The problem is that,
being a community-driven project, we attract many developers but very
few QA people.

Would it help to significantly enhance our regression coverage so that
bugs are caught early (assuming that's a problem)? If so, maybe we
should have a month-long QAfest where every developer only writes test
cases.

Would it help to have a formal product manager (or a team) that gets
to decide whether we want a particular feature or not? This might be
similar to our current model of discussion on hackers, but more
time-bound, with the team having the authority to accept or reject
ideas at the end of the time period instead of leaving things vague
for a later decision after people have already put in a lot of effort.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
On Thursday, January 24, 2013 2:58 AM Stephen Frost wrote:
> Heikki,
>
> * Heikki Linnakangas (hlinnakangas@vmware.com) wrote:
> > FWIW, here's how I feel about some of the patches. It's not an
> > exhaustive list.
>
> Thanks for going through them and commenting on them.
>
> > "built-in/SQL Command to edit the server configuration file
> > (postgresql.conf)"
> > I think this should be a pgfoundry project, not in core. At least
> > for now.
>
> I don't think it would ever get deployed or used then..

I think a lot of testing has already been done for this feature, and
it has been reviewed by quite a few people. Quite a lot of discussion
has already happened about the feature's value and high-level design.
At this point there are no design-level concerns for this patch, only
a few issues which can be dealt with through minor changes. Do you see
any lack of importance for this feature?

With Regards,
Amit Kapila.
On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote:
> I mentioned last year that I wanted to start working on parallelism:
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes. I think it is time we
> start to consider additional options.
>
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries. The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.
I just got out of a meeting that included Oracle Spatial folks, who
were boasting of big performance increases in enabling parallel query
on their spatial queries. Basically the workloads on things like big
spatial joins are entirely CPU bound, so they are seeing that adding
15 processors makes things 15x faster. Spatial folks would love love
love to see parallel query execution.
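For illustration only, the near-linear scaling on CPU-bound work that
makes those spatial joins 15x faster is easy to reproduce outside the
database. This small Python sketch (the workload, job count, and worker
counts are made up; it is not PostGIS or PostgreSQL code) farms
identical CPU-bound tasks out to a pool of worker processes, the same
shape of fan-out Bruce used in pg_upgrade:

```python
import time
from multiprocessing import Pool

def cpu_bound(n):
    """Stand-in for an expensive spatial predicate: pure CPU, no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run(workers, jobs=8, n=200_000):
    """Run `jobs` identical CPU-bound tasks across `workers` processes.

    Returns (elapsed seconds, list of results); the results must be
    identical no matter how many workers computed them.
    """
    start = time.perf_counter()
    with Pool(workers) as pool:
        results = pool.map(cpu_bound, [n] * jobs)
    return time.perf_counter() - start, results

if __name__ == "__main__":
    serial, r1 = run(1)
    parallel, r2 = run(4)
    assert r1 == r2  # same answers, regardless of worker count
    print(f"1 worker: {serial:.3f}s, 4 workers: {parallel:.3f}s")
```

On a 4-core machine the 4-worker run approaches a quarter of the serial
time, minus process start-up overhead. That fixed overhead is also why
small queries won't benefit, which matches the point that parallelism
will mostly help the larger queries.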
--
Paul Ramsey
http://cleverelephant.ca
http://postgis.net
On Thu, Jan 24, 2013 at 02:34:49PM -0800, Paul Ramsey wrote:
> On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote:
> > I mentioned last year that I wanted to start working on parallelism:
> >
> > https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> >
> > I believe it is time to start adding parallel execution to the backend.
> > We already have some parallelism in the backend:
> > effective_io_concurrency and helper processes. I think it is time we
> > start to consider additional options
> >
> > Parallelism isn't going to help all queries, in fact it might be just a
> > small subset, but it will be the larger queries. The pg_upgrade
> > parallelism only helps clusters with multiple databases or tablespaces,
> > but the improvements are significant.
>
> I just got out of a meeting that included Oracle Spatial folks, who
> were boasting of big performance increases in enabling parallel query
> on their spatial queries. Basically the workloads on things like big
> spatial joins are entirely CPU bound, so they are seeing that adding
> 15 processors makes things 15x faster. Spatial folks would love love
> love to see parallel query execution.

I added PostGIS under the "Expensive Functions" opportunity:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution#Specific_Opportunities

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +