Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem? - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?
Date
Msg-id 22fd4d2d-e7da-f30c-62fa-d317b160abd3@aklaver.com
Whole thread Raw
In response to Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?  (tutiluren@tutanota.com)
List pgsql-general
On 9/22/20 3:28 PM, tutiluren@tutanota.com wrote:
> 
> Sep 21, 2020, 7:53 PM by jd@commandprompt.com:
> 
>     I have to agree that pg_dump is largely a step child backup program.
>     It has consistently been found over the years to be lacking in a
>     number of areas. Unfortunately, working on pg_dump isn't sexy and it
>     is difficult to get volunteers or even paid resources to do such a
>     thing. The real solution for pg_dump is a complete refactor which
>     includes pg_dumpall and it is not a small undertaking. It should be
>     noted that it is also a less and less used program. On our team it
>     is normally used for only very specific needs (grabbing a schema)
>     and we use binary backups or logical replication to receive specific
>     data.
> 
> Huh? Are you saying that there is another, superior way to back up 
> PostgreSQL databases other than pg_dump? I re-read the manual on it just 
> now, but didn't see a single word about it being "legacy" or 
> "deprecated" or even that there's any other way to do it. What do you mean?

It is not deprecated or legacy and it is still used by many. The issue 
is that people try to do 'partial' dumps without reading the docs on 
what those switches actually do and the potential conflicts therein. For 
large installations there it is often better to use some form of 
replication(binary and/or logical) to maintain a continuous backup. 
pg_dump can take a long time on very big databases, so it may not keep up.


> 
>     This is really using the wrong tool for the job type of issue. PG
>     was never designed for such a scenario.
> 
> I hate the "wrong tool for the job" argument. It assumes that everyone 
> has infinite time, energy and brain capacity to learn endless redundant 
> tools just to "use the right tool for the job" rather than "use what you 
> actually know". I know PG. I don't know SQLite. They are very different. 
> So obviously, I want to use PG.

This comes down to what your definition of embedded is?  A matter of 
determining whether we are talking apples or oranges.


> 
> What exactly makes PG unsuitable for this? I don't get it. But at the 
> same time, I also realize that it's not going to happen at this point. 
> The entire concept of a desktop computer appears to be phased out as we 
> speak...

> 
> While I understand what you mean, and even agree in theory, in practice, 
> this always results in crappy third-party solutions which I don't want 
> to deal with. PostGIS, for example, forces me to use "postgis" for its 
> schema instead of "PostGIS" just because they arrogantly didn't 
> construct their internal queries properly. "Little" things like that.

Huh? A schema is just a name space, why does it matter how the extension 
chooses to define it? I mean you could have number of permutations of 
postgis.

> 
>         The practical end result of this is that I've always gone back
>         to using the untouched default configuration file (except for
>         the logging-related options), which, especially in the past on
>         FreeBSD, *severely* crippled my PG database to not even come
>         close to taking advantage of the full power of the hardware.
>         Instead, it felt like I was using maybe 1% of the machine's
>         power, even with a proper database design and indexes and all of
>         that stuff, simply because the default config was so
>         "conservative" and it couldn't be just set to "use whatever
>         resources are available".
> 
> 
>     Not to be unkind but this does seem lazy. There are literally
>     hundreds of "how to make postgres go fast", "how to optimize
>     postgres" if you take 15 minutes to Google. It is true that the
>     project (outside of the wiki) doesn't have much information in the
>     official documentation but that doesn't mean that the information is
>     not available.
> 
> Hundreds of crappy, outdated, confusing, badly written "web tutorialz" 
> are worth nothing. A couple of clear, unambiguous documentation 
> paragraphs are worth their (metaphorical) weight in gold.

The problem is the 'unambiguous' part. There are so many ways Postgres 
is used, writing a configuration doc that suited everyone would have so 
many if/and/or/buts that it would define ambiguity. If you want that 
information post an email here with the information on your server specs 
and proposed usage.

> 
> Claiming that "the information is out there" is just hand-waving. It's 
> shifting the burden to the user to actively hunt for information, and 
> very likely be misled by all the garbage articles out there. I learned 
> some horrible practices early on from "web tutz" and it took me many 
> years to unlearn that stuff.
> 
> I know that writing documentation isn't fun, but it's necessary. Also, 
> my overall point was to not even have to deal with the specifics, but 
> just be able to tell PG with a single config option that it's allowed to 
> "use most of the machine's resources".

That would entail building an AI into  the code that would deal with all 
the possible OS(versions), Postgres(versions), hardware permutations.

> 
>         I wish so much for PG to have a mode where it self-tunes itself
>         as needed, over time, based on the actual workload, or at least
>         allowed some kind of abstract "performance mode" such as: "you
>         are allowed to use significant system resources, PG", or: "You
>         are one of my most important applications. Just use as much
>         power as you currently need, but at least save about 10% for the
>         rest of the system, will you?" Maybe this is also harder than it
>         sounds to accomplish, but for somebody like me who has zero
>         funding, I cannot hire some professional to sit down with me and
>         fine-tune my system for $899/hour.
> 
> 
>     See my comment about Google. The information is out there and easy
>     to find.
> 
> I guess I'm the worst idiot in the world, then, who can't DuckDuckGo 
> (Google is evil) it even after 15 years.
> 
> Seriously, I didn't type my feedback "for fun". It may be difficult for 
> very intelligent people to understand (as often is the case, because you 
> operate on a whole different level), but the performance-related 
> PostgreSQL configuration options are a *nightmare* to me and many 
> others. I spent *forever* reading about them and couldn't make any sense 
> of it all. Each time I tried, I would give up, frustrated and angry, 
> with no real clue what "magic numbers" it wanted.
> 
> It's quite baffling to me how this can be so difficult for you all to 
> understand. Even if we disregard the sheer intelligence factor, it's 
> clear that users of PG don't have the same intimate knowledge of PG's 
> internals as the PG developers, nor could possibly be expected to.
> 
> As mentioned, I kept going back to the default configuration over and 
> over again. Anyone who doesn't is either a genius or pretends/thinks 
> that they understand it. (Or I'm extremely dumb.)

Does your server runs to your satisfaction with the default settings?
If not what issues do you see?

> 
> Very often, I get the feeling that things like that are the way they are 
> on purpose. Work security and whatnot. But it's very frustrating for 
> people like me who can't afford to buy help and don't have the enormous 
> brain capacity necessary to comprehend the complex relations between the 
> numerous performance-related config options. It really is that difficult.
> 
>     Discord and Slack
> 
> Those modern services don't even let me load them. It's the same thing 
> with everything these days: "verify with phone", "we've detected 
> suspicious activity", "fake error message", etc.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



pgsql-general by date:

Previous
From: tutiluren@tutanota.com
Date:
Subject: Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?
Next
From: raf
Date:
Subject: I'm surprised that this worked