Re: Proof of concept: standalone backend with full FE/BE protocol - Mailing list pgsql-hackers

From Christopher Browne
Subject Re: Proof of concept: standalone backend with full FE/BE protocol
Date
Msg-id CAFNqd5VF-Y_xGjBAdp6YVGcUYufzSTTs81XfT9f55kKJ_Sye4w@mail.gmail.com
Whole thread Raw
In response to Re: Proof of concept: standalone backend with full FE/BE protocol  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers

Preface: I think there's some great commentary here, and find myself agreeing
pretty whole-heartedly.

On Tue, Nov 13, 2012 at 2:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 13 November 2012 17:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> The most popular relational database in the world is Microsoft Access,
>> not MySQL. Access appears desirable because it allows a single user to
>> create and use a database (which is very good). But all business
>> databases have a requirement for at least one of: high availability,
>> multi-user access or downstream processing in other parts of the
>> business.
>
> That's a mighty sweeping claim, which you haven't offered adequate
> evidence for.  The fact of the matter is that there is *lots* of demand
> for simple single-user databases, and what I'm proposing is at least a
> first step towards getting there.

I agree there is lots of demand for simple single-user databases and I
wish that too. What I don't agree with is something that casts that
requirement in stone by architecturally/permanently disallowing
secondary connections.

Evidence for claims:
* The whole Business Intelligence industry relies on being able to
re-purpose existing data, forming integrated webs of interconnecting
databases. All of that happens after the initial developers write the
first version of the database application.
* Everybody wants a remote backup, whether its for your mobile phone
contact list or your enterprise datastore.

People are migrating away from embedded databases in droves for these
very reasons.

There seems to be a continuum of different sorts of scenarios of more-to-less
concurrency that are desirable for some different reasons.  From most-to-least,
I can see:

1 - Obviously, there's the case that Postgres is eminently good at, of supporting
  many users concurrently using a database.  We love that, let's not break it :-).

2 - We have found it useful to have some extra work processes that do some
  useful internal things, such as vacuuming, forcing background writes,
  collecting statistics.  And an online backup requires having a second process.

3 - People doing embedded systems find it attractive to attach all the data
  to the singular user running the system.  Witness the *heavy* deployment
  of SQLite on Android and iOS.  People make an assumption that this is
  a single-process thing, but I am inclined to be a bit skeptical.  What they
  *do* know is that it's convenient to not spawn extra processes and do
  IPC.  That's not quite the same thing as it being a certainty that they
  definitely want not to have more than one process.

4 - There are times when there *is* certainty about not wanting there to be more
  than one process.  When running pg_upgrade, or, at certain times, when
  doing streaming replication node status switches, one might have that
  certainty.  Or when reindexing system tables, which needs single user mode.

For us to conflate the 3rd and 4th items seems like a mistake to me.
 
> The main disadvantage of approaching this via the existing single-user
> mode is that you won't have any autovacuum, bgwriter, etc, support.
> But the flip side is that that lack of infrastructure is a positive
> advantage for certain admittedly narrow use-cases, such as disaster
> recovery and pg_upgrade.  So while I agree that this isn't the only
> form of single-user mode that we'd like to support, I think it is *a*
> form we'd like to support, and I don't see why you appear to be against
> having it at all.

I have no problem with people turning things off, I reject the idea
that we should encourage people to never be able to turn them back on.

Yep.  That seems like conflating #2 with #4.

It's mighty attractive to have a forcible "single process mode" to add safety to
certain activities.

I think we need a sharper knife, though, so we don't ablate off stuff like #2, just
because someone imagined that "Must Have Single Process!!!" was the right
doctrine.
 
> A more reasonable objection would be that we need to make sure that this
> isn't foreclosing the option of having a multi-process environment with
> a single user connection.  I don't see that it is, but it might be wise
> to sketch exactly how that case would work before accepting this.

Whatever we provide will become the norm. I don't have a problem with
you providing BOTH the proposed single user mode AND the multi-process
single user connection mode in this release. But if you provide just
one of them and its the wrong one, we will be severely hampered in the
future.

Yes, I am very much against this project producing a new DBMS
architecture that works on top of PostgreSQL data files, yet prevents
maintenance, backup, replication and multi-user modes.

I see this decision as a critical point for this project, so please
consider this objection and where it comes from.

I don't think we're necessarily *hugely* hampered by doing one of these;
I don't think it precludes doing the other.

Pointedly, I think it's a fine thing to have a way to force the system into
a single process to support #4 (e.g. - Single Process Mode to support
recovery from failover, pg_upgrade, and other pretty forcible SUM stuff).
That's useful to lots of people doing non-embedded cases.

That doesn't preclude allowing a somewhat more in-between case later.

Perhaps in-between is supported by slightly richer configuration:
- pg_hba.conf knows about replication connections; perhaps we could
  augment it to have rules for the "internal daemons" like autovac, stats
  collector, and background writer.
- pg_hba.conf might be augmented to forbid >1 connection per user,
  for regular connections?

Possibly I'm not solving the true problem, of course.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Enabling Checksums
Next
From: Robert Haas
Date:
Subject: Re: Enabling Checksums