commit d9e4657a17b5f2a32b37534f355737433770a88a Author: David G. Johnston Date: Mon Nov 9 23:07:29 2020 +0000 v0012-architecture-suggestions diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml index b7589f9a4f..bd22ada939 100644 --- a/doc/src/sgml/architecture.sgml +++ b/doc/src/sgml/architecture.sgml @@ -28,10 +28,11 @@ - The first step when an Instance starts is the start of the + All aspects of an Instance are launched and managed using a single primary + process termed the Postmaster. - It loads the configuration files, allocates Shared Memory, and - starts the other processes of the Instance: + It loads configuration files, allocates Shared Memory, and + starts the other collaborating processes of the Instance: Background Writer, Checkpointer, WAL Writer, @@ -39,9 +40,10 @@ Autovacuum, Statistics Collector, Logger, and more. - Later, the Postmaster starts + Later, the Postmaster listens on its configured system port(s) and in response + to client connection attempts launches Backend processes - which communicate with clients and handle their requests. + to which it delegates authentication, communication, and the handling of their requests. visualizes the processes of an Instance and the main aspects of their collaboration. @@ -62,14 +64,6 @@ - - When a client application tries to connect to a - database, - this request is handled initially by the Postmaster. It - starts a new Backend process, which handles all further - client's requests. - - Client requests like SELECT or UPDATE usually lead to the @@ -84,9 +78,9 @@ Shared Memory is limited in size and it can become necessary to evict pages. As long as the content of such pages hasn't - changed, this is not a problem. But in Shared Memory also - write actions take place. Modified pages are called dirty - pages or dirty buffers and before they can be evicted they + changed, this is not a problem. But writes directly modify + the pages in Shared Memory. 
Modified pages are called dirty + pages (or dirty buffers) and before they can be evicted they must be written to disk. This happens regularly by the Checkpointer and the Background Writer processes to ensure that the disk version of the pages are up-to-date. @@ -98,7 +92,7 @@ WAL record is created from the delta-information (difference between the old and the new content) and stored in another area of - Shared Memory. The parallel running WAL Writer process + Shared Memory. The concurrently running WAL Writer process reads them and appends them to the end of the current WAL file. Such sequential writes are faster than writes to random @@ -108,8 +102,8 @@ - Second, the transfer of dirty buffers from Shared Memory to - files must take place. This is the primary task of the + Second, the instance transfers dirty buffers from Shared Memory to + files. This is the primary task of the Background Writer process. Because I/O activities can block other processes, it starts periodically and acts only for a short period. Doing so, its extensive (and @@ -123,14 +117,8 @@ Checkpoints. A Checkpoint is a point in time when all older dirty buffers, all older WAL records, and finally a special Checkpoint record - are written and flushed to disk. Heap and index files, - and WAL files are now in sync. - Older WAL is no longer required. In other words, - a possibly occurring recovery, which integrates the delta - information of WAL into heap and index files, will happen - by replaying only WAL past the last-recorded checkpoint. - This limits the amount of WAL to be replayed - during recovery in the event of a crash. + are written and flushed to disk. + Older WAL files are no longer required to recover the system from a crash. @@ -141,8 +129,10 @@ less common). Options and details are covered in the backup and restore section ().
For our purposes here, just note that the WAL Archiver process - can be enabled and configured to run a script on filled WAL - files — usually to copy them to a remote location. + can be enabled and configured to run a script on completed WAL + files — usually to copy them to a remote location. Note + that a WAL file is completed when it reaches its full size or when + a switch is forced, for example by pg_switch_wal() or an expiring + archive_timeout; writing a Checkpoint record does not by itself + complete the current file. @@ -166,20 +156,29 @@ A server contains one or more database clusters (clusters - for short). Each cluster contains three or more - databases. - Each database can contain many - schemas. - A schema can contain + for short). By default each newly initialized cluster contains three + databases + (one interactive and two templates, see ). + Each database can contain many user-writable + schemas + (by default a single schema named public, writable by all users) and three + system-generated user-facing schemas (pg_catalog, pg_temp, and information_schema). + Schemas contain tables, views, and a lot - of other objects. Each table or view belongs to a single schema - only; they cannot belong to another schema as well. The same is - true for the schema/database and database/cluster relation. + of other objects. visualizes this hierarchy. + + Every object uniquely resides in a single schema, + though a single client connection can access multiple schemas within + the same database simultaneously. Special configuration is required to + access multiple databases, even within the same cluster, from a single + client connection. + +
Cluster, Database, Schema @@ -196,61 +195,12 @@
- - A cluster is the outer container for a - collection of databases. Clusters are created by the command - . - - - - template0 is the very first - database of any cluster. Database template0 - is created during the initialization phase of the cluster. - In a second step, database template1 is generated - as a copy of template0, and finally database - postgres is generated as a copy of - template1. Any - new databases - of the cluster that a user might need, - such as my_db, will be copied from the - template1 database. Due to the unique - role of template0 as the pristine original - of all other databases, no client can connect to it. - - - - Every database must contain at least one schema because all - SQL Objects - must be contained in a schema. - Schemas are namespaces for SQL objects and ensure - (with one exception) that the SQL object names are used only once within - their scope across all types of SQL objects. E.g., it is not possible - to have a table employee and a view - employee within the same schema. But it is - possible to have two tables employee in two - different schemas. In this case, the two tables - are separate objects and independent of each - other. The only exception to this cross-type uniqueness is that - unique constraints - and the according unique index - () use the same name. - - - - Some schemas are predefined. public - acts as the default schema and contains all SQL objects - which are created within public or - without using an explicit schema name. public - should not contain user-defined SQL objects. Instead, it is - recommended to create a separate schema that holds individual - objects like application-specific tables or views. - pg_catalog is a schema for all tables and views of the - System Catalog. - information_schema is a schema for several - tables and views of the System Catalog in a way that conforms - to the SQL standard. 
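The schema layout discussed in this section can be inspected from any client session. As an illustrative, read-only sketch (exact rows vary by server version and site configuration):

```sql
-- List the schemas (namespaces) defined in the current database;
-- a fresh database typically shows public, pg_catalog,
-- information_schema, and internal pg_toast entries.
SELECT nspname FROM pg_namespace ORDER BY nspname;

-- The search order used to resolve unqualified object names
SHOW search_path;
```

Both statements are safe to run against any database and modify nothing.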
- - + + + + There are many different SQL object types: database, schema, table, view, materialized @@ -261,8 +211,8 @@ strict hierarchy: All database names, all tablespace names, and all role names are automatically - available throughout the cluster, independent from - the database or schema in which they were defined originally. + available throughout the cluster, not just the database in which + the SQL Command was executed. shows the relation between the object types. @@ -286,7 +236,7 @@ - The physical Perspective: Directories and Files + The Physical Perspective: Directories and Files PostgreSQL organizes long-lasting (persistent) @@ -297,7 +247,7 @@ variable PGDATA points to this directory. The example shown in uses - data as the name of this root directory. + pgdata as the name of this root directory.
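Since the following paragraphs refer to this root directory repeatedly, note that a running instance can report its own location. A minimal sketch (reading this setting may require superuser or equivalent privileges):

```sql
-- Absolute path of the cluster's data directory (PGDATA)
SHOW data_directory;
```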
@@ -317,16 +267,16 @@
- data contains many subdirectories and + pgdata contains many subdirectories and some files, all of which are necessary to store long-lasting as well as temporary data. The following paragraphs describe the files and subdirectories in - data. + pgdata. - base is a subdirectory in which one - subdirectory per database exists. The names of those + base is a subdirectory containing one + subdirectory per database. The names of those subdirectories consist of numbers. These are the internal Object Identifiers (OID), which are numbers to identify the database definition in the @@ -335,7 +285,7 @@ Within the database-specific - subdirectories, there are many files: one or more for + subdirectories there are many files: one or more for every table and every index to store heap and index data. Those files are accompanied by files for the Free Space Maps @@ -348,10 +298,12 @@ Another subdirectory is global which contains files with information about Global SQL Objects. - One type of such Global SQL Objects are - tablespaces. - In global there is information about - the tablespaces; not the tablespaces themselves. + + + + In pg_tblspc, there are symbolic links + that point to directories containing SQL objects + that exist within a non-default tablespace. @@ -370,17 +322,12 @@ - In pg_tblspc, there are symbolic links - that point to directories containing SQL objects - that exist within a non-default tablespace. - - - - In the root directory data + In the root directory pgdata there are also some files. In many cases, the configuration files of the cluster are stored here. If the instance is up and running, the file postmaster.pid exists here + (by default) and contains the process ID (pid) of the Postmaster which started the instance.
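The numeric directory names described above can be resolved back to SQL-level objects from within a session. A sketch (OIDs and paths are installation-specific, so no concrete output is shown):

```sql
-- base/ holds one subdirectory per database, named after its OID
SELECT oid, datname FROM pg_database ORDER BY oid;

-- File backing a given table, relative to the data directory,
-- typically of the form base/<database-oid>/<relfilenode>
SELECT pg_relation_filepath('pg_class');
```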