Thread: Confusing terminology

Confusing terminology

From
Peter Eisentraut
Date:
PostgreSQL often uses terminology in programs and documentation that is
vague and meaningless to anyone who is new to the system or not familiar
with the implementation.  Users are often heard to complain that in
PostgreSQL everything is named differently.  We should clean these things
up and in the future think twice before we write something or name
something.

In particular, I have four examples in mind:

"postmaster" is widely used.  The term is already confusing to outsiders
because it has nothing to do with the post office.  But OK, it's
historical.  But note that many users never see a process or executable
called postmaster because it's started automatically or they use pg_ctl,
or they're just using their ODBC client.  People tend to know what a
"server" is, so I suggest that term except when you are actually talking
about the executable.

"backend" is often used to mean one of the child processes of the
postmaster.  In a general sense, backend is just the same as server, which
is the opposite of client or frontend.  Users can't be expected to know
that the server forks subprocesses to do its thing.  One of the statistics
access functions is described as "Number of active backends in database".
How does that work?  I thought you could only run one postmaster per data
area? -- I think the term "session" is generally clearer, because you
already have session users and these things.

"tuple" is described in one place as "A tuple is an individual state of a
row; each update of a row creates a new tuple for the same logical row."
This definition is inconsistent with common usage -- and even the rest of
the manual.

A "query" is actually only something that retrieves data from a database,
that is, a SELECT statement.  UPDATEs are not queries, DELETEs are not
queries, and certainly CREATE TABLE isn't a query.  These things are just
statements or commands.  Some documentation has this completely mixed up.

Please take these kinds of issues into consideration, as they could make
many users' lives just slightly easier.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Confusing terminology

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> In particular, I have four examples in mind:

> [ "postmaster" and "backend" -> "server" and "session" ]

I think we should use "postmaster" when we are specifically discussing
the parent daemon process as distinguished from its children, and
"backend" when we are specifically describing a child process as
distinguished from its parent and siblings.  "Server" is fine in any
context where you just mean "that software at the other end of the wire
from the client".  Look at it this way: if Postgres were implemented as
a monolithic server process, would your documentation still be correct
and sensible?  If so, say "server".  Use the other two terms when you
need to distinguish the parts.  Example:
    After receiving a connection request, the postmaster spawns
    a backend process to handle that client session.

While this is certainly project-specific language, it's useful to people
who may actually have to look at the code; and if they're reading
documentation that is talking about the parts of the server in the first
place, they're not that far away from wanting to look at code.  I don't
think that
    After receiving a connection request, the server spawns
    a session process to handle that client session.

is an improvement --- it seems more to have reduced the concept to a
tautology.  (Also, as seen here, I don't care for using "session"
to describe a process.  A session is a different sort of animal.)
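
The distinction drawn above can be sketched as a toy model. This is only an illustration of the vocabulary, not PostgreSQL code: the class names `Postmaster` and `Backend`, the `accept` and `active_backends` methods, and the sample clients and databases are all made up for the example, and the "spawn" step is just a list append rather than a real process fork.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Backend:
    """One child process, serving exactly one client session."""
    pid: int
    database: str
    client: str


@dataclass
class Postmaster:
    """The parent daemon: accepts connections and spawns backends."""
    backends: list = field(default_factory=list)
    _pids: count = field(default_factory=lambda: count(1001))

    def accept(self, client: str, database: str) -> Backend:
        # A real server would start a child process here; this sketch
        # only records the child so that it can be counted later.
        backend = Backend(next(self._pids), database, client)
        self.backends.append(backend)
        return backend

    def active_backends(self, database: str) -> int:
        # Counts child processes (one per session), never postmasters.
        return sum(1 for b in self.backends if b.database == database)


postmaster = Postmaster()            # a single parent for the whole server
postmaster.accept("alice", "sales")
postmaster.accept("bob", "sales")
postmaster.accept("carol", "hr")

print(postmaster.active_backends("sales"))  # 2
```

Seen from the client side, all of this is simply "the server"; the two narrower terms only matter once the parent and its children have to be told apart.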

> "tuple" is described in one place as "A tuple is an individual state of a
> row; each update of a row creates a new tuple for the same logical row."
> This definition is inconsistent with common usage -- and even the rest of
> the manual.

Give us "common usage" that distinguishes these two concepts, please.
I agree that we've not been consistent, but unless someone lays down
a clear definition for everyone to follow, it won't get better.

Maybe it's time for someone to prepare an "official" glossary that sets
out all these terms carefully, so that people will have something to
refer to when they're trying to pick a word to use.
        regards, tom lane


Re: Confusing terminology

From
Bruce Momjian
Date:
> While this is certainly project-specific language, it's useful to people
> who may actually have to look at the code; and if they're reading
> documentation that is talking about the parts of the server in the first
> place, they're not that far away from wanting to look at code.  I don't
> think that
> 
>     After receiving a connection request, the server spawns
>     a session process to handle that client session.
> 
> is an improvement --- it seems more to have reduced the concept to a
> tautology.  (Also, as seen here, I don't care for using "session"
> to describe a process.  A session is a different sort of animal.)

Yes, I feel session is something that exists between the client and the
server.  It is not a server-only concept.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Confusing terminology

From
Peter Eisentraut
Date:
Tom Lane writes:

> from the client".  Look at it this way: if Postgres were implemented as
> a monolithic server process, would your documentation still be correct
> and sensible?  If so, say "server".

That was exactly my thought.

> Use the other two terms when you need to distinguish the parts.
> Example:
>
>     After receiving a connection request, the postmaster spawns
>     a backend process to handle that client session.

This is OK, because it's true:  There's a new process and it's at the
backend side of the wire.  (Actually, a session is something that exists
between a client and a server.)  What I don't like is language like "how
many backends are active on this database?" -- It's one: PostgreSQL.  It
would be correct to say "how many (PostgreSQL) backend *processes* are
active...", or maybe just "how many clients are connected to this
database".

> While this is certainly project-specific language, it's useful to people
> who may actually have to look at the code; and if they're reading
> documentation that is talking about the parts of the server in the first
> place, they're not that far away from wanting to look at code.

Right, but there are only specific chapters in the documentation that talk
about this.

> > "tuple" is described in one place as "A tuple is an individual state of a
> > row; each update of a row creates a new tuple for the same logical row."
> > This definition is inconsistent with common usage -- and even the rest of
> > the manual.
>
> Give us "common usage" that distinguishes these two concepts, please.

The libpq API uses tuple to mean row (and field to mean column).  Other
APIs like pgtcl and libpq++ have copied that.  I think that that's more
common usage than xmin and xmax.

> I agree that we've not been consistent, but unless someone lays down
> a clear definition for everyone to follow, it won't get better.

I think it's OK to use tuple == row, and "row state" or "tuple state" when
you're talking about MVCC (which is only rarely done anyway).  A row can
have more than one state at the same time under MVCC, but a row can have
more than one tuple???
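
The "one logical row, several states" idea can be made concrete with a small sketch. It is a deliberately simplified model, not how the server stores data: `RowVersion`, `LogicalRow`, and the visibility rule in `visible_to` are assumptions made for illustration, and the rule ignores commit status, aborted transactions, and everything else a real snapshot has to consider. Only the field names `xmin` and `xmax` are borrowed from the discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    """One stored state of a logical row (a "tuple" in the narrow sense)."""
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that superseded it, if any


class LogicalRow:
    def __init__(self, value: str, txid: int):
        self.versions = [RowVersion(value, xmin=txid)]

    def update(self, value: str, txid: int) -> None:
        # An update never overwrites: it closes the current version
        # and appends a new one for the same logical row.
        self.versions[-1].xmax = txid
        self.versions.append(RowVersion(value, xmin=txid))

    def visible_to(self, snapshot: int) -> Optional[str]:
        # Simplified rule: a version is visible if it was created at or
        # before the snapshot and not yet superseded as of the snapshot.
        for v in self.versions:
            if v.xmin <= snapshot and (v.xmax is None or v.xmax > snapshot):
                return v.value
        return None


row = LogicalRow("draft", txid=10)
row.update("final", txid=20)

print(len(row.versions))      # 2 versions of 1 logical row
print(row.visible_to(15))     # draft
print(row.visible_to(25))     # final
```

Whichever word ends up attached to `RowVersion`, the sketch shows why two terms are wanted: any single snapshot sees one row, while storage holds two versions of it.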

> Maybe it's time for someone to prepare an "official" glossary that sets
> out all these terms carefully, so that people will have something to
> refer to when they're trying to pick a word to use.

Yeah, I think I'd like to set something like this up as part of the
program message style guide that I've talked about recently.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Confusing terminology

From
Bruce Momjian
Date:
> >     After receiving a connection request, the postmaster spawns
> >     a backend process to handle that client session.
> 
> This is OK, because it's true:  There's a new process and it's at the
> backend side of the wire.  (Actually, a session is something that exists
> between a client and a server.)  What I don't like is language like "how
> many backends are active on this database?" -- It's one: PostgreSQL.  It
> would be correct to say "how many (PostgreSQL) backend *processes* are
> active...", or maybe just "how many clients are connected to this
> database".

Or how many sessions.  That seems to be the best wording unless you want
to highlight the existence of backend processes.

I am not sure I agree that there is only one backend running.  Well, maybe
I see your point, but it seems a little confusing.  We used the term
'backend' with Ingres and it always meant your backend process.

> > Maybe it's time for someone to prepare an "official" glossary that sets
> > out all these terms carefully, so that people will have something to
> > refer to when they're trying to pick a word to use.
> 
> Yeah, I think I'd like to set something like this up as part of the
> program message style guide that I've talked about recently.

There is a crude attempt in the FAQ.  Maybe we can add there.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Confusing terminology

From
Mike Mascari
Date:
Bruce Momjian wrote:
> 
> > >     After receiving a connection request, the postmaster spawns
> > >     a backend process to handle that client session.
> >
> > This is OK, because it's true:  There's a new process and it's at the
> > backend side of the wire.  (Actually, a session is something that exists
> > between a client and a server.)  What I don't like is language like "how
> > many backends are active on this database?" -- It's one: PostgreSQL.  It
> > would be correct to say "how many (PostgreSQL) backend *processes* are
> > active...", or maybe just "how many clients are connected to this
> > database".
> 
> Or how many sessions.  That seems to be the best wording unless you want
> to highlight the existence of backend processes.
> 
> I am not sure I agree that there is only one backend running.  Well, maybe
> I see your point, but it seems a little confusing.  We used the term
> 'backend' with Ingres and it always meant your backend process.
> 
> > > Maybe it's time for someone to prepare an "official" glossary that sets
> > > out all these terms carefully, so that people will have something to
> > > refer to when they're trying to pick a word to use.
> >
> > Yeah, I think I'd like to set something like this up as part of the
> > program message style guide that I've talked about recently.
> 
> There is a crude attempt in the FAQ.  Maybe we can add there.

What about "relation" vs. "table"? 

CREATE TABLE foo(key integer);

ERROR: Relation 'foo' already exists

I realize the historical context of the word, but it flies in the face
of the language.

Mike Mascari
mascarm@mascari.com


Re: Confusing terminology

From
"Roderick A. Anderson"
Date:
On Fri, 18 Jan 2002, Mike Mascari wrote:

> What about "relation" vs. "table"? 
> 
> CREATE TABLE foo(key integer);
> 
> ERROR: Relation 'foo' already exists

Can a table named foo and a view named foo exist in the same database?


Cheers,
Rod
-- 
    Let Accuracy Triumph Over Victory
        Zetetic Institute
        "David's Sling"
        Marc Stiegler
 



Re: Confusing terminology

From
"Arguile"
Date:
Roderick A. Anderson writes:
>
> On Fri, 18 Jan 2002, Mike Mascari wrote:
>
> > What about "relation" vs. "table"?
> >
> > CREATE TABLE foo(key integer);
> >
> > ERROR: Relation 'foo' already exists
>
> Can a table named foo and a view named foo exist in the same database?
>
>
> Cheers,
> Rod

No. There's no reasonable way for the server to know which you mean when you
execute a statement. This applies to tables, views, sequences, etc.
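
A rough sketch of why the names collide, assuming a catalog that keeps every kind of relation under one set of names. The `Catalog` class, its `create` method, and the `RelationExists` exception are hypothetical names for this example only; the message text is copied from the error quoted earlier in the thread.

```python
class RelationExists(Exception):
    pass


class Catalog:
    """Toy catalog: every kind of relation is stored under one set of names."""

    def __init__(self):
        self.relations = {}   # name -> kind ("table", "view", "sequence", ...)

    def create(self, kind: str, name: str) -> None:
        # The lookup ignores the kind, so a view cannot reuse a table's name.
        if name in self.relations:
            raise RelationExists(f"Relation '{name}' already exists")
        self.relations[name] = kind


catalog = Catalog()
catalog.create("table", "foo")

try:
    catalog.create("view", "foo")
except RelationExists as err:
    message = str(err)

print(message)  # Relation 'foo' already exists
```

Because the check is keyed on the name alone, the message cannot honestly say "table" or "view"; "relation" is the only word that covers whatever already holds the name.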





Re: Confusing terminology

From
"carl garland"
Date:
>Maybe it's time for someone to prepare an "official" glossary that sets
>out all these terms carefully, so that people will have something to
>refer to when they're trying to pick a word to use.

Maybe we could coax Robert Easter to add a section of Postgres-specific
terminology to his already wonderful page of Essential Database
Terminology, which is linked off the techdocs page.  Having them in
the same place would prevent duplication of effort, and his existing
glossary is very well done IMO.
Best Regards,
Carl Garland




Re: Confusing terminology

From
"Roderick A. Anderson"
Date:
On Fri, 18 Jan 2002, Arguile wrote:

> Roderick A. Anderson writes:

> > > ERROR: Relation 'foo' already exists
> >
> > Can a table named foo and a view named foo exist in the same database?
> >
> No. There's no reasonable way for the server to know which you mean when you
> execute a statement. This applies to tables, views, sequences, etc.

It was a rhetorical question (aka. smart-ass).  My point was a table,
view, sequence, and their friends are all relations.  Or at least to my
understanding the table and view are relations.
   And therefore Relation 'foo' already exists makes sense to me.


Best,
Rod
-- 
    Let Accuracy Triumph Over Victory
        Zetetic Institute
        "David's Sling"
        Marc Stiegler
 



Re: Confusing terminology

From
"Arguile"
Date:
Roderick A. Anderson wrote:

[snip]
>
> It was a rhetorical question (aka. smart-ass).  My point was a table,
> view, sequence, and their friends are all relations.  Or at least to my
> understanding the table and view are relations.
>    And therefore Relation 'foo' already exists makes sense to me.
>

Oops, sorry about that. Text, at times, doesn't fully convey the proper
tone. :)




Re: Confusing terminology

From
Mike Mascari
Date:
"Roderick A. Anderson" wrote:
> 
> On Fri, 18 Jan 2002, Arguile wrote:
> 
> > Roderick A. Anderson writes:
> 
> > > > ERROR: Relation 'foo' already exists
> > >
> > > Can a table named foo and a view named foo exist in the same database?
> > >
> > No. There's no reasonable way for the server to know which you mean when you
> > execute a statement. This applies to tables, views, sequences, etc.
> 
> It was a rhetorical question (aka. smart-ass).  My point was a table,
> view, sequence, and their friends are all relations.  Or at least to my
> understanding the table and view are relations.
>    And therefore Relation 'foo' already exists makes sense to me.

CREATE INDEX i_foo1 on foo(key);

ERROR: DefineIndex: relation "foo" not found

ALTER TABLE foo ADD COLUMN key integer;

ERROR: ALTER TABLE: column name "key" already exists in table "foo"

TRUNCATE TABLE foo;

ERROR:  Relation 'foo' does not exist

Mike Mascari
mascarm@mascari.com


Re: Confusing terminology

From
Peter Eisentraut
Date:
Roderick A. Anderson writes:

> It was a rhetorical question (aka. smart-ass).  My point was a table,
> view, sequence, and their friends are all relations.  Or at least to my
> understanding the table and view are relations.
>    And therefore Relation 'foo' already exists makes sense to me.

From a point of view of implementation, the term "relation" also covers
indexes, which can be confusing.  Standard SQL (which doesn't have indexes
or sequences) uses the term "table" to mean both regular tables and views.
Neither of these choices is entirely pretty.

-- 
Peter Eisentraut   peter_e@gmx.net