diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml index b6c0eb4a31..e547a87d08 100644 --- a/doc/src/sgml/architecture.sgml +++ b/doc/src/sgml/architecture.sgml @@ -13,42 +13,26 @@ Collaboration of Processes, RAM, and Files - In a client/server architecture - clients do not have direct access to stored data. Instead, - they send requests to the server and receive - the requested data in response. In the case of - PostgreSQL, the server launches a - single process for each connected client, referred to as a - Backend process. - - - - - - It acts in close cooperation with the - instance which - is a group of tightly coupled server-side processes plus a - Shared Memory - area located in RAM. - Notably, PostgreSQL does not utilize application threading within its - implementation. - - - - - - - - During instance startup time, the - Postmaster - process loads the - configuration files, allocates - Shared Memory, - and starts supporting background processes: + In a client/server architecture, clients do not have direct access + to database files and the data stored in them. Instead, they send + requests to the server and receive the requested data in response. + In the case of PostgreSQL, the server + launches a single process for each client connection, referred to as a + Backend process. + Those Backend processes handle the client's requests by acting on + Shared Memory. + This leads to further activities (file access, WAL, vacuum, ...) of the + Instance. The + Instance is a group of server-side processes acting on a common + Shared Memory. Notably, PostgreSQL does not utilize application + threading within its implementation. + + + + The first step in starting an Instance is the start of the + Postmaster. + It loads the configuration files, allocates Shared Memory, and + starts the other processes of the Instance: Background Writer, Checkpointer, WAL Writer,
- visualizes - the main aspects of their collaboration. + Later, the Postmaster starts + Backend processes + which communicate with clients and handle their requests. + visualizes the processes + of an Instance and the main aspects of their collaboration.
@@ -77,155 +64,110 @@
- - - - - When a client application tries to connect to a database, - this request is handled initially by the - Postmaster process. It checks authorization, - starts a new Backend process, - and instructs the client application to connect to it. All - further client requests go to this process and are handled - by it. + this request is handled initially by the Postmaster. It + starts a new Backend process and instructs the client + application to connect to it. All further client requests + go to this process and are handled by it. Client requests like SELECT or UPDATE usually lead to the - necessity to read or write some data. In a first attempt - the client's Backend process tries - to get the information out of Shared - Memory. This Shared - Memory is a mirror of parts of the - heap and - index files. - Because files are often larger than memory, it's likely that - the desired information is not (completely) available - in RAM. In this case the Backend process - must transfer additional file pages to - Shared Memory. Files are physically - organized in pages. Every transfer between files and - RAM is performed in units of complete pages; such transfers - do not change the size or layout of pages. - - - - Reading file pages is much slower than reading - RAM. This is the primary motivation for the usage of - Shared Memory. As soon as one - of the Backend processes has - read pages into memory, those pages become available for all - other Backend processes for direct - access in RAM. - - - - Shared Memory is limited in size. - Sooner or later, it becomes necessary to overwrite old RAM - pages. As long as the content of such pages hasn't - changed, this is not a problem. But in - Shared Memory also write - actions take place - — performed by any of the Backend - processes (or an - autovacuum process, - or other processes). Such modified pages are called - dirty pages. - Before dirty pages can be overwritten, - they must be written back to disk.
This is a two-step process. - + necessity to read or write some data. This is carried out + by the client's Backend process. Reads involve a page-level + cache housed in Shared Memory (for details see: + ) for the benefit of all processes + in the instance. Writes also involve this cache, in addition + to a journal, called the write-ahead log or WAL. + + + + Shared Memory is limited in size. Thus, it becomes necessary + to evict pages. As long as the content of such pages hasn't + changed, this is not a problem. But write actions also take + place in Shared Memory. Modified pages are called dirty + pages or dirty buffers, and before they can be evicted they + must be written back to disk. The Background Writer and the + Checkpointer process do this regularly to ensure + that the on-disk version of the pages is kept up-to-date. + The synchronization from RAM to disk consists of two steps. + + + First, whenever the content of a page changes, a WAL record - is created out - of the delta-information (difference between the old and - the new content) and stored in another area of - Shared Memory. These - WAL records are read by the - WAL Writer process, - which runs in parallel to the Backend - processes and other processes of - the Instance. It writes - the continuously arising WAL records to - the end of the current + is created out of the delta-information (difference between the + old and the new content) and stored in another area of + Shared Memory. The WAL Writer process, running in parallel, + reads them and appends them to the end of the current WAL file. - Because this writing is sequential, it is much - faster than the more or less random access - to data files with heap - and index information. - As mentioned, this WAL-writing happens - in an independent process. All - WAL records created out of one - dirty page must be transferred - to disk before the dirty page - itself can be transferred to disk.
- - - - Second, the transfer of dirty buffers - from Shared Memory to file must - take place. This is the primary task of the - Background Writer process. Because - I/O activities can block other processes significantly, - it starts periodically and acts only for a short period. - Doing so, its expensive I/O activities are spread over - time, avoiding debilitating I/O peaks. Also, the - Checkpointer process transfers - dirty buffers to file — - see next paragraph. - - - - The Checkpointer creates + Such sequential writes are much faster than writes to random + positions of heap and index files. All WAL records created + out of one dirty page must be transferred to disk before the + dirty page itself can be transferred to disk in the second step. + + + + Second, the transfer of dirty buffers from Shared Memory to + files must take place. This is the primary task of the + Background Writer process. Because I/O activities can block + other processes significantly, it starts periodically and + acts only for a short period. Doing so, its extensive (and + expensive) I/O activities are spread over time, avoiding + debilitating I/O peaks. Also, the Checkpointer process + transfers dirty buffers to file. + + + + The Checkpointer creates Checkpoints. - A Checkpoint - is a point in time when all older dirty buffers, - all older WAL records, and - finally a special Checkpoint record - have been written and flushed to disk. - After a Checkpoint, we say - data files and WAL files are in sync. - In case of a recovery (after a crash of the instance) - it can be relied upon that the information of all - WAL records preceding - the last Checkpoint record - were already integrated into the data files. This - speeds up the recovery. + A Checkpoint is a point in time when all older dirty buffers, + all older WAL records, and finally a special Checkpoint record + have been written and flushed to disk. Heap and index files + on the one hand and WAL files on the other hand are in sync. 
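The ordering rule described above (every WAL record created out of a dirty page reaches disk before the page itself does) can be sketched as a toy model. The following Python sketch is illustrative only, since PostgreSQL itself is written in C; all names such as `Page`, `ToyBufferManager`, and `evict` are invented for this example.

```python
# Toy model of the write-ahead rule (not PostgreSQL source code).

class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.dirty = False
        self.lsn = 0          # position of the last WAL record describing this page

class ToyBufferManager:
    def __init__(self):
        self.flushed_lsn = 0  # WAL is durably on disk up to this position
        self.next_lsn = 1
        self.disk = {}        # simulated data files: page_id -> last written lsn

    def modify(self, page):
        """Change a page: first create a WAL record, then dirty the page."""
        page.lsn = self.next_lsn
        self.next_lsn += 1
        page.dirty = True

    def wal_flush_upto(self, lsn):
        """Simulate the WAL Writer appending records to the WAL file."""
        self.flushed_lsn = max(self.flushed_lsn, lsn)

    def evict(self, page):
        """A dirty page may reach the data file only after its WAL did."""
        if page.dirty and page.lsn > self.flushed_lsn:
            self.wal_flush_upto(page.lsn)   # step one: WAL record to disk
        self.disk[page.page_id] = page.lsn  # step two: page itself to disk
        page.dirty = False

mgr = ToyBufferManager()
p = Page(42)
mgr.modify(p)
mgr.evict(p)
assert mgr.flushed_lsn >= mgr.disk[42]  # WAL never lags behind the data file
```

The invariant checked by the final assertion is the essence of both steps: flushing WAL is cheap (sequential), so doing it first makes the later page write safe.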
+ WAL from before the Checkpoint is no longer required. In other words, + a possibly occurring recovery, which integrates the delta + information of WAL into heap and index files, will happen + by replaying only WAL past the last recorded checkpoint + on top of the current heap and index files. This speeds up recovery. - As a result of data changes, - WAL records arise and get written - to WAL files. - Those WAL files — in combination with - a previously taken Base Backup — - are necessary to restore a database after a crash of the - disk on which data files have been stored. Therefore it is - recommended to transfer a copy of the - WAL files - to a second, independent place. The purpose of the - WAL Archiver process is to perform - this copy action. + While the Checkpointer ensures that a running system can crash + and restart itself in a valid state, the administrator needs + to handle the case where the heap and index files themselves become + corrupted (and possibly the locally written WAL, though that is + less common). The options and details are covered extensively + in the backup and restore section (). + For our purposes here, note just that the WAL Archiver process + can be enabled and configured to run a script on filled WAL + files — usually to copy them to a remote location. + + + - The Statistics Collector collects - counters about accesses to SQL objects - like tables, rows, indexes, pages, and more. It stores the - obtained information in system tables. + The Statistics Collector collects counters about accesses to + SQL objects like tables, rows, indexes, pages, and more. It + stores the obtained information in system tables. - The Logger writes - text lines about serious and less serious events which can happen - during database access, e.g., wrong password, no permission, - long-running queries, etc. + The Logger writes text lines about serious and less serious + events which can happen during database access, e.g., wrong + password, no permission, long-running queries, etc.
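The effect of a Checkpoint on recovery can be illustrated with a small sketch. This is a simplified, hypothetical Python model, not how PostgreSQL actually stores WAL; the record layout and function name are invented for the example.

```python
# Toy WAL: a list of records in LSN order. After the checkpoint at LSN 3,
# the data files are guaranteed to contain the effects of LSNs 1 and 2,
# so recovery only needs to replay what came afterwards.
wal = [
    {"lsn": 1, "kind": "change", "page": "A"},
    {"lsn": 2, "kind": "change", "page": "B"},
    {"lsn": 3, "kind": "checkpoint"},   # data files and WAL in sync up to here
    {"lsn": 4, "kind": "change", "page": "A"},
]

def records_to_replay(wal):
    """Return the change records a recovery would have to replay."""
    last_ckpt = max((r["lsn"] for r in wal if r["kind"] == "checkpoint"),
                    default=0)
    return [r for r in wal if r["kind"] == "change" and r["lsn"] > last_ckpt]

assert [r["lsn"] for r in records_to_replay(wal)] == [4]
```

The shorter the distance to the last checkpoint record, the less WAL must be replayed, which is exactly why checkpoints speed up recovery.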
@@ -286,46 +228,40 @@ such as my_db, will be copied from the template1 database. Due to the unique role of template0 as the pristine original - of all other databases, no client - can connect to it. + of all other databases, no client can connect to it. - Every database must contain - at least one schema because - schemas contain the other - SQL Objects. - Schemas are namespaces for - their SQL objects and ensure — with one - exception — that within their scope, names are used only once across all - types of SQL objects. E.g., it is not possible + Every database must contain at least one schema because all + SQL Objects + are contained in a schema. + Schemas are namespaces for their SQL objects and ensure + (with one exception) that within their scope names are used + only once across all types of SQL objects. E.g., it is not possible to have a table employee and a view - employee within the same - schema. But it is possible to have - two tables employee in different - schemas. In this case, the two tables + employee within the same schema. But it is + possible to have two tables employee in + different schemas. In this case, the two tables are separate objects and independent of each other. The only exception to this cross-type uniqueness is that unique constraints - and the according unique index - use the same name. + and the according unique index + () use the same name. Some schemas are predefined. public - acts as the default schema and contains all - SQL objects which are created - within public or without using an explicit schema - name. public should not contain user-defined - SQL objects. Instead, it is recommended to - create a separate schema that - holds individual objects like application-specific tables or - views. pg_catalog is a schema for all tables - and views of the - System Catalog. + acts as the default schema and contains all SQL objects + which are created within public or + without using an explicit schema name. 
public + should not contain user-defined SQL objects. Instead, it is + recommended to create a separate schema that holds individual + objects like application-specific tables or views. + pg_catalog is a schema for all tables and views of the + System Catalog. information_schema is a schema for several - tables and views of the System Catalog - in a way that conforms to the SQL standard. + tables and views of the System Catalog in a way that conforms + to the SQL standard. @@ -334,11 +270,11 @@ view, index, constraint, sequence, function, procedure, trigger, role, data type, operator, tablespace, extension, foreign data wrapper, and more. A few of them, the - Global SQL Objects, - are outside of the strict hierarchy: - All database names, all tablespace names, and all role names - are automatically known and available throughout the - cluster, independent from + Global SQL Objects, are outside of the + strict hierarchy: All database names, + all tablespace names, and all + role names are automatically known and + available throughout the cluster, independently of the database or schema in which they were defined originally. shows the relation between the object types. @@ -369,7 +305,7 @@ PostgreSQL organizes long-lasting data as well as volatile state information about transactions or replication actions in the file system. Every - Cluster has its root directory + has its root directory somewhere in the file system. In many cases, the environment variable PGDATA points to this directory. The example shown in @@ -427,8 +363,8 @@ subdirectories, there are files containing information about Global SQL Objects. One type of such Global SQL Objects are - tablespaces. In - global there is information about + tablespaces. + In global there is information about the tablespaces, not the tablespaces themselves. @@ -443,14 +379,14 @@ The subdirectory pg_xact contains information about the status of each transaction: - in_progress, committed, aborted, or sub_committed.
+ in_progress, committed, + aborted, or sub_committed. In pg_tblspc, there are symbolic links - that point to directories containing such - SQL objects that are created within - tablespaces. + that point to directories containing such SQL objects + that are created within tablespaces. @@ -474,7 +410,7 @@ MVCC — Multiversion Concurrency Control - In most cases, PostgreSQL based applications + In most cases, PostgreSQL databases support many clients at the same time. Therefore, it is necessary to protect concurrently running requests from unwanted overwriting of other's data as well as from reading inconsistent data. Imagine an @@ -533,7 +469,7 @@ - The description in this chapter simplifies by omitting detail. + The description in this chapter simplifies by omitting some details. When many transactions are running simultaneously, things can get complicated. Sometimes transactions get aborted via ROLLBACK immediately or after a lot of other activities, sometimes @@ -644,7 +580,7 @@ xids grow, old row versions get out of scope over time. If an old row version is no longer valid for ALL existing transactions, it's called dead. The - space occupied by all dead row versions is called + space occupied by dead row versions is part of the bloat. @@ -680,13 +616,13 @@ bloat. This chapter explains how the SQL command VACUUM and the automatically running - autovacuum processes clean up + Autovacuum processes clean up by eliminating bloat. - Autovacuum runs automatically by + Autovacuum runs automatically by default. Its default parameters as well as such for VACUUM fit well for most standard situations. Therefore a novice database manager can @@ -696,30 +632,27 @@ - Client processes can issue the SQL command VACUUM at arbitrary - points in time. DBAs do this when they recognize special situations, - or they start it in batch jobs which run periodically. - Autovacuum processes run as part of the - instance at the server. - There is a constantly running autovacuum daemon. 
- It permanently controls the state of all databases based on values that - are collected by the + Client processes can issue the SQL command VACUUM + at arbitrary points in time. DBAs do this when they recognize + special situations, or they start it in batch jobs which run + periodically. Autovacuum processes run as part of the + Instance at the server. + There is a constantly running Autovacuum daemon. It continuously + monitors the state of all databases based on values that are collected by the Statistics Collector + and starts Autovacuum processes whenever it detects certain situations. Thus, it's a dynamic behavior of PostgreSQL with the intention to tidy up — whenever it is appropriate. - VACUUM, as well as - autovacuum, don't just eliminate bloat. - They perform additional tasks for minimizing future I/O activities of themselves as well as of other processes. - This extra work can be done in a very efficient way - since in most cases the expensive physical access to pages - has taken place anyway to eliminate bloat. - The additional operations are: + VACUUM, as well as Autovacuum, don't just eliminate + bloat. They perform additional tasks for minimizing future + I/O activities of themselves as well as of other processes. + This extra work can be done in a very efficient way since in most + cases the expensive physical access to pages has taken place anyway + to eliminate bloat. The additional operations are: @@ -758,7 +691,7 @@ freeze is controlled by configuration parameters, runtime flags, and in extreme situations by the processes themselves. Because vacuum operations typically are I/O - intensive, which can hinder other activities, autovacuum + intensive, which can hinder other activities, Autovacuum avoids performing many vacuum operations in bulk. Instead, it carries out many small actions with time gaps in between.
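The idea of spreading vacuum I/O over time rather than doing it in bulk can be sketched as cost-based throttling, loosely in the spirit of the `vacuum_cost_limit` and `vacuum_cost_delay` parameters. The numbers and the function name below are invented for illustration; this is not PostgreSQL's actual algorithm.

```python
# Illustrative sketch of cost-based I/O throttling (invented numbers).
def throttled_page_visits(n_pages, page_cost=10, cost_limit=200):
    """Return how many pauses a vacuum-like scan of n_pages would take."""
    balance, pauses = 0, 0
    for _ in range(n_pages):
        balance += page_cost          # every page visit has an I/O cost
        if balance >= cost_limit:
            pauses += 1               # real code would sleep here,
            balance = 0               # spreading the I/O load over time
    return pauses

assert throttled_page_visits(100) == 5   # one pause per 20 pages
```

With these made-up numbers, a 100-page scan pauses five times, turning one large I/O burst into several small ones with gaps in between, which is the behavior the paragraph above describes.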
The SQL command VACUUM runs immediately @@ -784,8 +717,8 @@ xmax must contain an xid which is older - than the oldest xid of all - currently running transactions (min(pg_stat_activity.backend_xmin)). + than the oldest xid of all currently running transactions + (min(pg_stat_activity.backend_xmin)). This criterion guarantees that no existing or upcoming transaction will have read or write access to this row version. @@ -810,9 +743,9 @@ After the vacuum operation detects a superfluous row version, it - marks its space as free for future use of writing - actions. Only in rare situations (or in the case of VACUUM FULL), - is this space released to the operating system. In most cases, + marks its space as free for future use of writing actions. Only + in rare situations (or in the case of VACUUM FULL) + is this space released to the operating system. In most cases, it remains occupied by PostgreSQL and will be used by future INSERT or UPDATE commands concerning this row or a @@ -860,7 +793,7 @@ - When an autovacuum process acts. For optimization + When an Autovacuum process acts. For optimization purposes, it considers the Visibility Map in the same way as VACUUM. Additionally, it ignores tables with few modifications; see , @@ -877,7 +810,7 @@ This logic only applies to row versions of the heap. Index entries don't use xmin/xmax. Nevertheless, such index entries, which would lead to outdated row versions, are released - accordingly. (??? more explanations ???) + accordingly. @@ -996,7 +929,8 @@ - The transaction of xmin must be committed. + The transactions of xmin and + xmax must be committed. @@ -1029,8 +963,8 @@ - When an autovacuum process runs. Such - a process acts in one of two modes: + When an Autovacuum process runs. Such a process acts in one + of two modes: @@ -1053,7 +987,7 @@ (default: 200 million). The value of the oldest unfrozen xid is stored per table in pg_class.relfrozenxid.
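The deadness criterion just described (xmax committed and older than the oldest xid of all currently running transactions) can be modeled in a few lines. This Python sketch is a deliberate simplification: it treats xids as plain, ever-growing integers and ignores wraparound; the function and field names are invented.

```python
# Simplified model of the dead-row-version test (not PostgreSQL code).
def is_dead(row, committed_xids, running_backend_xmins):
    """A row version is dead when its deleting transaction (xmax) is
    committed and older than min(pg_stat_activity.backend_xmin)."""
    if row["xmax"] is None or row["xmax"] not in committed_xids:
        return False          # never deleted, or deletion not committed
    oldest_running = min(running_backend_xmins, default=float("inf"))
    return row["xmax"] < oldest_running

row = {"xmin": 90, "xmax": 100}
# Deleted by committed xid 100; all running transactions started later:
assert is_dead(row, committed_xids={100}, running_backend_xmins=[105, 120])
# A transaction with backend_xmin 95 might still need to see this version:
assert not is_dead(row, committed_xids={100}, running_backend_xmins=[95])
```

The second assertion shows why the criterion is needed: as long as any snapshot predates the deletion, the old version must survive.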
- In this aggressive mode autovacuum + In this aggressive mode Autovacuum processes all such pages of the selected table that are marked in the Visibility Map to potentially have bloat or unfrozen rows. @@ -1065,7 +999,7 @@ - In the first two cases and with autovacuum in + In the first two cases and with Autovacuum in aggressive mode, the system knows to which value the oldest unfrozen xid has moved forward and logs the value in pg_class.relfrozenxid. @@ -1093,13 +1027,11 @@ Protection against Wraparound Failure - The autovacuum processes are initiated by the - constantly running autovacuum daemon. - If the daemon detects that for a table - autovacuum_freeze_max_age is exceeded, it - starts an autovacuum process in - aggressive mode - (see above) — even if autovacuum is disabled. + The Autovacuum processes are initiated by the constantly running + Autovacuum daemon. If the daemon detects that + autovacuum_freeze_max_age is exceeded for a table, it + starts an Autovacuum process in aggressive mode + (see above) — even if Autovacuum is disabled. Visibility Map and Free Space Map @@ -1127,7 +1059,7 @@ The setting of the flags is silently done by VACUUM - and autovacuum during their bloat and freeze operations. + and Autovacuum during their bloat and freeze operations. This is done to speed up future vacuum actions, regular accesses to heap pages, and some accesses to the index. Every data-modifying operation on any row @@ -1138,10 +1070,9 @@ The Free Space Map (FSM) tracks the amount of free space per page. It is organized as a highly condensed b-tree of (rounded) sizes. - As long as VACUUM or - autovacuum change the free space - on any processed page, they log the new values in - the FSM in the same way as all other writing + Whenever VACUUM or Autovacuum change + the free space on any processed page, they log the new + values in the FSM in the same way as all other writing processes. @@ -1153,11 +1084,10 @@ decisions for the generation of execution plans.
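The age comparison behind aggressive mode can be sketched with modular arithmetic. This is an illustrative simplification of 32-bit xid age, not the actual PostgreSQL implementation; the function names are invented, though the 200 million default matches `autovacuum_freeze_max_age`.

```python
# Simplified sketch of xid age with 32-bit wraparound.
XID_MODULUS = 2**32

def xid_age(next_xid, xid):
    """How many transactions ago `xid` was assigned, modulo 2^32."""
    return (next_xid - xid) % XID_MODULUS

AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # default: 200 million

def needs_aggressive_vacuum(next_xid, relfrozenxid):
    """Does the table's oldest unfrozen xid exceed the freeze threshold?"""
    return xid_age(next_xid, relfrozenxid) > AUTOVACUUM_FREEZE_MAX_AGE

assert not needs_aggressive_vacuum(1_000_000, relfrozenxid=500_000)
assert needs_aggressive_vacuum(250_000_000, relfrozenxid=1_000)
# The age survives wraparound: an xid assigned shortly before the counter
# wrapped is still correctly seen as only 30 transactions old.
assert xid_age(20, XID_MODULUS - 10) == 30
```

Freezing rows and advancing `pg_class.relfrozenxid` keeps every stored xid's age comfortably below the modulus, which is what the wraparound protection described above achieves.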
This information can be gathered with the SQL commands ANALYZE or VACUUM ANALYZE. - But autovacuum processes also gather + But Autovacuum processes also gather such information. Depending on the percentage of changed rows per table , - the autovacuum daemon starts - autovacuum processes to collect + the Autovacuum daemon starts Autovacuum processes to collect statistics per table. This dynamic invocation of analyze operations allows PostgreSQL to adapt queries to changing circumstances. @@ -1221,7 +1151,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; The atomicity also affects the visibility of changes. No - connections running simultaneously to a data modifying + connection running simultaneously with a data modifying transaction will ever see any change before the transaction successfully executes a COMMIT — even in the lowest @@ -1293,7 +1223,7 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; ROLLBACK command instead of a COMMIT. The ROLLBACK cancels the transaction, and all changes made so far remain - invisible forever; it's like they never happened. There + invisible forever; it is as if they had never happened. There is no need for the application to log its activities and undo every step of the transaction separately. @@ -1319,10 +1249,11 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; - There is an additional feature which defines transactions' - isolation level - to each other in a declarative way. It automatically - prevents applications from some strange situations. + There is an additional feature, the + 'isolation level', + which separates transactions from each other in certain ways. + It automatically protects applications from certain + anomalies. @@ -1405,19 +1336,21 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; are also in the files, but - as usual - they are never seen by any of the following transactions because uncommitted changes are never shown.
Such recovery actions run - completely automatically, it is not necessary that you - configure or start anything by yourself. + completely automatically, it is not necessary that a + database administrator configure or start anything by + themselves. Disk crash If a disk crashes, the course of action described previously cannot work. It is likely that the WAL files and/or the - data and index files are no longer available. You need - to take special actions to overcome such situations. + data and index files are no longer available. The + database administrator must take special actions to + overcome such situations. - You obviously need a backup. How to take such a backup + They obviously need a backup. How to take such a backup and use it as a starting point for a recovery of the cluster is explained in more detail in the next chapter. @@ -1428,13 +1361,12 @@ UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; It is conceivable that over time the disk gets full, and there is no room for additional data. In this case, PostgreSQL stops accepting - commands which change the data or even terminates - completely. No data loss or data corruption will - occur. + data-modifying commands or even terminates completely. + No data loss or data corruption will occur. - To come out of such a situation, you should remove - unused files from this disk. But you should never + To come out of such a situation, the administrator should + remove unused files from this disk. But they should never delete files from the data directory. Nearly all of them are necessary for the consistency diff --git a/doc/src/sgml/start.sgml b/doc/src/sgml/start.sgml index 8751410179..abb61445f2 100644 --- a/doc/src/sgml/start.sgml +++ b/doc/src/sgml/start.sgml @@ -76,30 +76,27 @@ A process at the server site with the name Postmaster.
postgres + postmaster It accepts connection requests from client applications, starts (forks) a new - Backend process for each of them, and passes + Backend process for each of them, and passes the connection to it. From that point on, the client and the new - Backend process - communicate directly without intervention by the original - postgres process. Thus, the - postgres process is always running, waiting - for new client connections, whereas clients and associated - Backend processes come and go. - (All of this is of course invisible to the user. We only mention it - here for completeness.) + Backend process communicate directly without intervention by the original + Postmaster process. Thus, the Postmaster process is always running, + waiting for new client connections, whereas clients and associated + Backend processes come and go. (All of this is of course invisible + to the user. We only mention it here for completeness.) A group of processes at the server site, the instance, to which also the - postgres process belongs. Their duties are - handling of central, common database activities like file access, - vacuum, checkpoints, - replication, and more. The mentioned Backend processes - delegate those actions to the instance. + linkend="glossary-instance">Instance, to which + the Postmaster process also belongs. Their duties include the + handling of central, common database activities like file access, transaction + handling, vacuum, checkpoints, replication, and more. The mentioned + Backend processes delegate those actions to the instance. @@ -127,18 +124,6 @@ file name) on the database server machine. - - The PostgreSQL server can handle - multiple concurrent connections from clients. To achieve this it - starts (forks) a new process for each connection. - From that point on, the client and the new server process - communicate without intervention by the original - postgres process.
Thus, the - supervisor server process is always running, waiting for - client connections, whereas client and associated server processes - come and go. (All of this is of course invisible to the user. We - only mention it here for completeness.) -