Re: cutting down the TODO list thread - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: cutting down the TODO list thread |
Date | |
Msg-id | CAEze2WhOS_ADOt-QortU69wTqDT8bNWh09+nNTDOAM=Cz8rq6w@mail.gmail.com Whole thread Raw |
In response to | Re: cutting down the TODO list thread (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: cutting down the TODO list thread
|
List | pgsql-hackers |
On Mon, 15 May 2023 at 20:51, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, May 15, 2023 at 2:05 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, and CLUSTER > > > -> There are external tools that help with this kind of analysis > > > > Althrough there are external tools which help with the analysis, the > > silent complaint of this item seems to be "PostgreSQL doesn't provide > > the user with actionable maintenance tasks/DDL info", and I > > wholeheartedly agree with that. > > Well, if SET PERFORMANCE_TIPS is adding a new GUC, that's inherently > something that could not happen in a contrib module, only in core. If Why wouldn't it? pgss has (amongst others) pgss.save, other extensions have their own GUCs, why can't this be one of them? Sure, it'd need to be namespaced to the extension, but that's never held anyone back from adding their own GUCs. > it's dedicated syntax of some kind, same thing. I am not at all > convinced that adding something spelled SET PERFORMANCE_TIPS in any > form is a good idea, Well, that part I agree with; the design of this maintenance/performance helper is very different from how I would imagine such feature to exist in or alongside PostgreSQL. I did not make that very clear in my initial mail, sorry for that, but I wouldn't suggest that "SET PERFORMANCE_TIPS" > but at the very least someone would have to > propose what the possible values of that option would be and what they > would do in a pretty detailed way for us to decide whether we liked it > or not. It seems to me that it would be quite difficult to get any > kind of consensus. If you came up with several different kinds of > performance tips and made the GUC a comma-separated list of tip types, > I suspect that you'd get a good bit of opposition: maybe Tom would > dislike one of the tip types for some reason, me a second, and Andres > a third. If you remove all three of those you've gutted your > implementation. Whee. Yes, I know that bikeshedding and strong personal opinions are common on this list - I know I am guilty of that as well. Yet the expectation that people will tend to bikeshed shouldn't be a reason to move a feature to the "Not Wanted" list. If that were the qualifier, then we wouldn't add any new major features, because there's always going to be some bikeshedding for major new features. > But even if we leave the syntax aside, it is very difficult, IMHO, to > come up with something in this area that makes sense to put in core. > There are so many things you could warn about, and so many possible > algorithms that you could use to emit warnings. We do have a few > things somewhat like this in core, like the warning that checkpoints > are happening too close together, or the hint that when you said DROP > TABLE olders maybe you really meant DROP TABLE orders. But in those > cases, the situation is fairly unambiguous: if your checkpoints are > happening too close together, you should probably raise max_wal_size, > as long as it's not going to run you out of disk space. If you > specified a non-existent object name, you should probably correct the > object name to something that does exist. > > But things like CREATE INDEX or CLUSTER are a lot trickier. I struggle > to think of what individual PostgreSQL command would have enough > context to know that those things are a good idea. For example, > clustering a table on an index makes sense if (1) there are queries > that would run faster with the clustering and (2) they are run > frequently enough and are expensive enough that the savings would be > material and (3) the clustering wouldn't degrade so quickly as to be > pointless. But I don't see how it would be possible to discover this > situation without instrumenting the whole workload, or at least having > some trace of the workload. Even if you have the data, you probably > need to do a bunch of number-crunching to come up with good > recommendations, and that's expensive, and you probably have to be OK > with a significantly higher risk of wrong answers, too, because the > past may be different from the future, and the planner's estimates of > what the clustering would save might be wrong. Agreed on most parts, including the "one statement is unlikely to have enough context". But, not all of those issues need to be tackled to have actionable suggestions. Just a "look at <X = shared_buffers> because it is likely you left performance on the table" would be a good start. Other examples of such suggestions would be detecting a small fraction of hot updates vs missed-hot updates which could produce a hint to decrease the fillfactor of a table; a high number of table scans vs tuples returned could produce an indication about likely missing indexes; etc. These hints wouldn't necessarily have to be produced with per-statement hints like the TODO suggests. > I wouldn't go so far as to say that doing anything of this sort is > absolutely and categorically hopeless, but suggesting to an aspiring > hacker (or even an established one) that they go try to implement SET > PERFORMANCE_TIPS isn't helpful at all. At least in my opinion, it's > not clear what that means, or that we want it, or what we might want > instead, or even that we want anything at all. We should aim to have a > TODO list filled with things that are actionable and likely to be > worth the effort someone might choose to invest in them. Agreed; and that's why I'm not against removing the specific wording of the item. This may not have been clearly described in my previous mail, but I would instead like to see a TODO list item which covers the need to improve the number of cases where we provide actionable advice, and specifically those cases where there is not One Obvious Issue (OOI;s like when getting close to wraparound; or close checkpoints, or ...). For example, we could have an extension, similar to pageinspect, that scans the catalogs and detects actionable problems like tables with 50%+ unused space, or tables that are clustered on an index but have lost all semblance of ordering (according to statistics), or ... I guess my main point is that we gather a lot of statistics, but do not provide tooling which provides actionable reports. And I think that is the core of such a todo item - we could use a feature like that either in core, or more likely in contrib as an extension (like pgss and pageinspect). Kind regards, Matthias van de Meent.
pgsql-hackers by date: