Design and development of mission-critical high-performance systems based on PostgreSQL
A new version of pg_probackup, a backup tool developed by Postgres Pro team, has been released
Postgres Pro Standard 22.214.171.124 has been released today. It is based on PostgreSQL 9.6.2 and contains the new upstream fixes and Postgres Pro extensions updates, including pg_pathman, pg_probackup and sr_plan.
What is the scientifically correct way if using NULLs ?
Gilles Darold from the french company Dalibo, the ora2pg author, has blogged that the performance improvement given by the Svetlana's patch can be as high as 45%.
As already mentioned, several versions of each row can be simultaneously available in the database. And we need to somehow distinguish one version from another one. To this end, each version is labeled with its effective "time" (
xmin) and expiration "time" (
xmax). Quotation marks denote that a special incrementing counter is used rather than the time itself. And this counter is the transaction identifier.
(As usual, in reality this is more complicated: the transaction ID cannot always increment due to a limited bit depth of the counter. But we will explore more details of this when our discussion reaches freezing.)
When a row is created, the value of
xmin is set equal to the ID of the transaction that performed the INSERT command, while
xmax is not filled in.
When a row is deleted, the
xmax value of the current version is labeled with the ID of the transaction that performed DELETE.
An UPDATE command actually performs two subsequent operations: DELETE and INSERT. In the current version of the row,
xmax is set equal to the ID of the transaction that performed UPDATE. Then a new version of the same row is created, in which the value of
xmin is the same as
xmax of the previous version.
Last time we talked about data consistency, looked at the difference between levels of transaction isolation from the point of view of the user and figured out why this is important to know. Now we are starting to explore how PostgreSQL implements snapshot isolation and multiversion concurrency.
In this article, we will look at how data is physically laid out in files and pages. This takes us away from discussing isolation, but such a digression is necessary to understand what follows. We will need to figure out how the data storage is organized at a low level.
If you look inside tables and indexes, it turns out that they are organized in a similar way. Both are database objects that contain some data consisting of rows.
There is no doubt that a table consists of rows, but this is less obvious for an index. However, imagine a B-tree: it consists of nodes that contain indexed values and references to other nodes or table rows. It's these nodes that can be considered index rows, and in fact, they are.
Actually, a few more objects are organized in a similar way: sequences (essentially single-row tables) and materialized views (essentially, tables that remember the query). And there are also regular views, which do not store data themselves, but are in all other senses similar to tables.
All these objects in PostgreSQL are called the common word relation. This word is extremely improper because it is a term from the relational theory. You can draw a parallel between a relation and a table (view), but certainly not between a relation and an index. But it just so happened: the academic origin of PostgreSQL manifests itself. It seems to me that it's tables and views that were called so first, and the rest swelled over time.
With this article I start a series about the internal structure of PostgreSQL.
The material will be based on training courses (in Russian) on administration that Pavel Luzanov and I are creating. Not everyone likes to watch video (I definitely do not), and reading slides, even with comments, is no good at all.
Unfortunately, the only course available in English at the moment is 2-Day Introduction to PostgreSQL 11.
Of course, the articles will not be exactly the same as the content of the courses. I will talk only about how everything is organized, omitting the administration itself, but I will try to do it in more detail and more thoroughly. And I believe that the knowledge like this is as useful to an application developer as it is to an administrator.
I will target those who already have some experience in using PostgreSQL and at least in general understand what is what. The text will be too difficult for beginners. For example, I will not say a word about how to install PostgreSQL and run psql.
The stuff in question does not vary much from version to version, but I will use PostgreSQL 11.
The first series deals with issues related to isolation and multiversion concurrency, and the plan of the series is as follows:
Off we go!
Pg_probackup is a utility to manage backup and recovery of PostgreSQL database clusters. It is designed to perform periodic backups of the PostgreSQL instance that enable you to restore the server in case of a failure.
Functions for defining and using variables in client sessions.
An active monitoring agent for Postgres Pro. Based on zabbix, mamonsu provides an extensible cross-platform solution that can collect and visualize multiple Postgres Pro and system metrics.
JsQuery – is a language to query jsonb data type, introduced in PostgreSQL release 9.4. It's primary goal is to provide an additional functionality to jsonb (currently missing in PostgreSQL), such as a simple and effective way to search in nested objects and arrays, more comparison operators with indexes support.
Temporary tables are often being used. Their performance could be increased if they are removed from system catalog. Also it is attractive to allow using temporary tables on read-only standbys.
PGLZ compression of individual values is used now in PostgreSQL. That’s good, but sometimes it’s possible to achieve significant compression only when compressing multiple values together. This is why we’re considering page-level data compression.
Current approach for execution is essentially a kind plan tree interpretation. JIT-compilation of queries means compilation of plan tree into binary. Such compilation could get rid of multiple levels of indirection therefore accelerating queries.
Postgres Professional is the Russian PostgreSQL company founded by Russian PostgreSQL contributors. Company has 50+ employees, among them three Major PostgreSQL Contributors.
Postgres Professional is an active member of international PostgreSQL community, developers had committed 93 patches to the latest release of PostgreSQL 10.0.
Our company had successfully performed large PostgreSQL projects including database migration projects for well-known Russian and international companies. We provide industrial PostgreSQL services: vendor technical support, migration, custom extensions and core patches development, migration-related consulting, training and certification.