Re: Questions about PostgreSQL implementation details - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: Questions about PostgreSQL implementation details |
Date | |
Msg-id | CAMsr+YGdO6WsuXPABEWE3aT3a0Juwx1WpP=VrJv0j7qhDhjf+A@mail.gmail.com Whole thread Raw |
In response to | Questions about PostgreSQL implementation details (Julien Delplanque <julien.delplanque@inria.fr>) |
List | pgsql-hackers |
On Mon, 9 Dec 2019 at 23:35, Julien Delplanque <julien.delplanque@inria.fr> wrote:
Hello PostgreSQL hackers,
I hope I am posting on the right mailing-list.
I am actually doing a PhD related to relational databases and software engineering.
I use PostgreSQL for my research.
I have a few questions about the internals of PostgreSQL and I think they require experts knowledge.
I could not find documentation about that in the nice PostgreSQL documentation but maybe I missed something? Tell me if it is the case.
There are a bunch of README files in the source tree that concern various innards of PostgreSQL. They're not always referred to by any comments etc, so you have to know they exist. They're usually well worth reading, though it can take a while before you understand enough of PostgreSQL's architecture for them to make sense...
Try
find src/ -name README\*
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are they just a virtual representation ?
That's been largely answered. But I want to point out an important caveat that isn't obvious to new people: The oid of a relation (pg_class.oid) is not the same thing as the pg_class.relfilenode, which is usually the base of the filename of the on-disk storage for the relation. On an idle or new database most relations are created with an equal oid and relfilename, so it's easy to think the oid maps to the on-disk name of a relation, but it doesn't. The relation oid will not change so long as the relation exists, but the relfilenode may change if the table contents are rewritten, etc. Additionally, there are special tables that are "relmapped" such that they don't have a normal relfilenode at all, instead access is indirected via a separate mapping. IIRC that's mainly necessary so we can bootstrap access to the catalog tables that tell us how to read the catalogs.
What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?
Not really.
PostgreSQL has a caching layer - sycache, relcache, catcache - and invalidation scheme that it relies on. It doesn't execute regular queries on the system catalogs. It also has simplifying rules around how they are updated and accessed. See the logic in genam.c etc. Catalogs may also represent things that aren't just other DB rows - for example, pg_class entries are associated with files on disk for individual database tables.
You can't just insert into pg_class, pg_attribute, etc and expect that to safely create a table. Though it's surprising how much you can get away with by hacking the catalogs if you're very careful and you trick PostgreSQL into firing appropriate invalidations. I'd quite like to have a SQL-exposed way to do a forced global cache flush and invalidation for use in emergency scary catalog hacking situations.
So you can do quite a bit with direct catalog surgery, but it's dangerous and if you break the database, you get to keep the pieces.
Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries are implemented in C directly.
Right. See standard_ProcessUtility() and friends.
Q1.2 If it is possible and not done, what is the reason?
Speed - no need to run the full executor. Simplification of catalog access. Caching and invalidations. Chicken/egg problems: how do you "CREATE TABLE pg_class"? . Lots more.
Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have a single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?
System catalogs are not permitted to have CONSTRAINTs (CHECK constraints, UNIQUE constraints, PRIMARY KEY constraints, FOREIGN KEY constraints, etc).
All such management is done in C level logic with the assistance of the pg_depend catalog and the relationships it tracks.
Q2.1 If they are not implemented via data constraints on meta-description tables, why ?
Same as above.
Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL?
Not AFAIK.
Why?
pgsql-hackers by date: