Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem? - Mailing list pgsql-general

From Sam Gendler
Subject Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?
Date
Msg-id CAEV0TzD7bvmSURG4S7XbgRvSPX93WO0e7zPVtRZi5THvd_Cgkg@mail.gmail.com
Whole thread Raw
In response to Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?  (tutiluren@tutanota.com)
List pgsql-general


On Thu, Sep 24, 2020 at 10:40 PM <tutiluren@tutanota.com> wrote:

Well not partial as in incremental. Instead dump only some portion of the schema with or without its associated data.
It's funny that you should bring that up, considering how it was one of my points... See the point about pg_dump's bug on Windows.

And you seem to have ignored the fact that one of the core developers pointed out that it likely isn't a pg_dump bug - if your terminal is using the same locale as the database, it should have no difficulty dealing with the characters you are having trouble with.  It seems likely that you simply need to learn how to get your terminal set up correctly for it to work.
 
I'm saying that PostGIS has a bug due to incorrectly constructed internal queries which makes it impossible to properly name the schema where PostGIS is to reside, causing my database to look very ugly when it has to say "postgis" instead of "PostGIS" for PostGIS's schema. And that was an example of how sloppy/bad third-party things always are, and is one reason why I don't like it when I have to rely on "extensions".
They are extensions so you aren't required to use them and rely on their way of doing things. You have the choice of writing your own code/extension or do without completely.
It sure is great to have such choices... I can't take it seriously when people say things like this. It's similar to "it's open source so you can easily vet it yourself". It's not taking reality into consideration at all.

As for doing without it, that would make it impossible to deal with GPS coordinates/maps. So it's not really a choice at all.

Never mind that your tone in your emails is remarkably rude for someone who is doing nothing but complain about perceived shortfalls in a product that is entirely free and of which you appear to be fairly far from an expert user, has it occurred to you that YOUR issue is that you have absolutely no understanding of the variety of uses that people put a database like postgres to out in the world? Imagine if the core database server was subject to the development schedule of every extension that you happen to think ought to be included in the core product - or vice versa.  Someone finds a bug in PostGIS and it can't be fixed until the next major or minor release of the core postgres server?  That would result in a terrible user experience and force development of all extensions to move in lockstep with the core server.  The vast majority of users do not ever use GIS extensions and have absolutely no use for their presence in the core product. The fact that you need it is your one individual use case.  I've been using Postgres professionally since 2005 and I've only ever used postgis for a single project. Who are you to say what should and should not be included in a product to which you contribute absolutely nothing? And in the same breath, you complain about the quality of that extension.  If it is so lacking in quality, why would you want it embedded in the core service where it could potentially impact the quality of every user's experience?  Maybe consider that the dev team has good reasons for keeping non-essential functionality in extensions instead of the core codebase.
That would entail building an AI into the code that would deal with
all the possible OS(versions), Postgres(versions), hardware
permutations.

I... guess. If "AI" means "a series of ifs". Which is what software... is? I doubt that people who can make the world's most advanced open source database cannot check the amount of RAM and see how fast the CPU/disk is.

It is more then that. It would have to take into account the behavior changes that happen in Postgres between major versions. It also would have to account for OS specific parameters and the changes that happen there between OS versions. It also would need to 'know' how the database was going to be used; readonly, heavy writes, etc. Also how the database should play with other programs on the same machine. Add to the mix containers, cloud instances and so on and you are outrunning the ability of 'ifs' to handle it.
If it changes that much, it's far, far worse than I even thought, and it sounds like it will be pointless to even *try* to learn it as it keeps changing between versions/OSes/other stuff. 

I can't help but feel as if people just don't want to answer this and other concerns I have. As if there's some silent agreement along the lines of "securing PG DBAs' jobs".

You really come off as something of an ass with this commentary. DBAs have plenty of far more interesting things to do with their time than performance tune a server config and certainly don't need that in order to be secure in their role. What is clear is that you have absolutely no understanding of the variety of ways in which a database server might be configured depending on the way the database is being used, the quantity of data, the rate of queries, the size of queries, the other applications that must share the host, the speed and quantity of storage systems, etc.  If it is such an easy problem to solve, then I'm sure everyone here would encourage you to build it yourself and contribute it back to the project - all of us who have spent years learning how to fine tune the performance of postgresql would surely thank you for taking a big chunk of what is otherwise basically toil off our plates.  Take a look at the performance mailing list some time and look at the effort even people with decades of postgres experience go to in order to determine optimal settings for new hardware and then tell us how easy it would be to just automatically configure a server with magic values that work for everyone.  You are just flat-out wrong, here, and you lack the expertise or experience to even understand that you don't know enough to have a valid opinion on this point.  Everyone else has tried to let you know politely, but your tone has ticked me off so I'm just telling it to you straight. 
 

The thing is 'general mode' is going to mean something different to someone running a database in the MB-low GB range vs. high GB vs. TB vs. PB.
I don't mean this to sound rude, but it's like talking to a wall... What I mean is that there are obviously technical means for software to know whether they are exhausting the system they are running on or not, and expecting people to understand all these intricate internal parameters is just... bizarre. There ought to be some kind of "abstract" setting for those of us who aren't able to (or even *wish* to) comprehend all the PG internals, and just want an efficient database using (roughly) as much of our machine as we want.

Well, you most definitely DO sound rude, and you have since your first email on this thread. And there are most definitely NOT technical means for software to know whether it is exhausting a system or not. Just for starters, it has no idea if it is sharing server resources with other processes or if all resources are dedicated to itself. If disk response is suddenly slow, is that because some index or table just hit critical mass and caused a knee in a performance curve that might be tuned out or is it because someone just started bulk copying a bunch of data to the same network attached storage that the database is using?  There is no way to tell if a query is slow because other queries were occupying all of the CPU, because something else sucked up all the memory and the OS is swapping, or because some other process just consumed 90% of CPU resources on the host.  The database doesn't know if 99% of your queries are going to be to a bunch of 1MB tables or to the one 40TB table that is actually just being written for auditing purposes and will rarely, if ever be queried on the production host.  Never mind the differences between data warehousing workloads vs OLTP workloads. 

Just maybe, the folks around here, many with decades of postgresql experience, know more than you do about the variety of uses a database might be put to, not to mention the variety of settings that go into tuning for them, and have come to the conclusion that it is a problem too complicated to solve via automation. Instead of arguing with them, and then complaining about how hard it is for YOU - someone who almost certainly is smarter than any algorithm is likely to be - cannot figure it out. Maybe take some time to do some reading and learning, set up a system and do some benchmarking of various config changes, and figure it out. Honestly, it really isn't all that hard, but it IS time consuming. Skill acquisition usually is.  That's why employers pay well the people who do it. Not only are there already tools available for 'automatically' tuning your database based on your projected utilization and available resources, but many of us have somehow managed to acquire the knowledge you seek via the documentation and the shared knowledgebase that is the internet.  Instead of berating the developers, how about just asking for assistance and then using that assistance to further your own knowledge.  No one is going to be quick to volunteer to assist someone who just angrily writes accusatory complaints that not enough was done to proactively assist them - as if the folks on this mailing list don't have anything better to do with their time than to drop everything to help you - and yet they do.  Instead of being angry about it, maybe thank them instead.

The postgresql docs are reference docs.  They are written in a manner which documents available APIs and capabilities in a thorough manner. As a result, they aren't that useful for learning which bits and pieces are most useful for doing the thing that you want to do. But there are countless books, blog posts, mailing lists, and other resources available to you - just as they are for almost every other open source project of significance.  Can you name a database that is even remotely as capable as PostgreSQL which is significantly more easily administered than PostgreSQL? Oracle?  Nope, it is MASSIVELY more difficult. SQL Server? Nope, also a pain in the neck to tune and, in my opinion as something of a novice user of that system, a whole heck of a lot more poorly documented when it comes to administrative tasks. MySQL?  Not really.  Maybe slightly simpler to configure if you don't need all of the functionality that postgres brings to the table, but otherwise every bit as hard to manage.  If you want Microsoft Access simplicity, you should use Microsoft Access.  There's a reason all of those databases have manually managed configurations that take lots of experience to become fully competent with.  The only way to simplify configuration is to simplify functionality, and you would inevitably complain about that if it happened.

 
"There's plenty of guides" and "the information is out there" doesn't help me and all the other people who have stuck with the default config and thus a massively restricted PG database for all these years. Just because it's easy to you doesn't mean it's easy to everyone else. Just dealing with composing efficient-enough SQL queries and designing an optimized database structure is (way) more than enough work for most of us. I don't have the luxury of some hired DBA who sits all day tuning the PG server. Besides, I've already explained the privacy issues with that even if I had the money...

None of us had the luxury of having some hired DBA who sat all day tuning the PG server.  That's why we sat down and acquired those skills for ourselves. And I'm reasonably certain that none of us sit around all day idly modifying database configs during our copious spare time.  Complex systems are hard.  Welcome to your career.  The only thing that will keep you gainfully employed for the next decades is if you learn how to quickly and effectively acquire new skills (and not piss of whole communities of potential support in the process).  And you know what is a damn good way to do that?  Read books about the tech.  Read the mailing lists for the tech.  And then contribute something useful to that tech.  Feel free to fix any and/or all of the perceived defects that you seem to think are so trivial to resolve.  I can promise you that if your implementations are of high quality and don't break other functionality or diverge from the project's core mission, there's a high likelihood that they'll be merged into the codebase eventually.

I just typed PostgreSQL into my oreilly bookshelf and came up with over 3000 hits.  The first page is entirely high quality books published by reputable publishers within the last 3 years, covering pg 11 or 12.  The first one is called "Mastering PostgreSQL 12" and seems a likely candidate.  A few rows farther down, I find "PostgreSQL Configuration: Best Practices for Performance and Security" which was published less than 6 months ago and is surely entirely up to date.  I refuse to believe that you could possibly have so much difficulty learning how to configure a system from a book like that, which almost certainly gives you step by step guidance for determining optimal values for the most relevant settings for use cases similar to yours.  It's probably worth pointing out that there's likely a 99% chance that the authors of those books are reading your emails on this list.  So maybe go buy, borrow, or check out one of their books and learn how to configure your system.  Unless you pay them an hourly rate, no one here is going to configure your system for you.

There's my $0.02 (and then some). 

--sam

pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?
Next
From: Jack Douglas
Date:
Subject: Yum repository RPM behind release