Hacking PostgreSQL 13 Webinar: Questions & Answers
To get things done before 2020 is over, I decided to publish my blog post with answers to your questions from our webinar on PostgreSQL 13 that we hosted in October. I’m still looking into release 13 and will soon give another talk on PG13 monitoring-related features that became available in this version of PostgreSQL. For now, let’s go through the webinar questions.
Q. What's the developer's say about why the new vacuum option: Parallel n_workers is not applicable on vacuum full / auto vacuum?
A. Vacuum FULL doesn’t have much in common with regular vacuum. It uses a different algorithm, thus parallel vacuum full would be another feature. I am not aware of any development effort in this area.
Autovacuum could possibly use this optimization, but developers decided to disable it, because it will make autovacuum tuning more complex. Parallelism increases resource consumption, which is good to effectively utilise them. However, it may not be desirable for background processes, such as autovacuum.
Q. Some more insight about "Password Protocol" would be great!
A. I was referring to the discussion in pgsql-hackers entitled “Add "password_protocol" connection parameter to libpq” https://www.postgresql.org/message-id/flat/edaeee0d-5367-ad46-6a5a-1357ea595151%40postgresql.org
It eventually evolved into the ‘channel_binding’ option. You can find the documentation here: https://www.postgresql.org/docs/13/libpq-connect.html
Q. I wonder who needs these new Postgres releases? In my recent 2 projects, we used PostgreSQL 9.+, and everyone was happy! :)
A. Maybe they exist for PostgreSQL hackers developing the DBMS so they could have jobs and entertain themselves. However, with 100+ features coming with each new release, I guess, they still have some value.
Q. Do we have any built-in tools for a selective dump of DB?
A. Built-in utility pg_dump has options to select objects (schemas, tables, access privileges, foreign data, etc) to dump.
Q. Does btree deduplication have any overhead on insert and update operations?
A. The overhead for the insert-only benchmark is about 2%. The overhead is amortized across insertions, as deduplication only happens when the page is full to prevent page split.
Also, for certain workloads, deduplication can be undesirable because index size reduction may increase lock contention.
Q. Any plans to implement partitioning on a column that is not a part of the primary key and reference partitioning?
A. None that I am aware of. A bunch of other features for partitioning is being actively developed now. If you are interested in it, please, feel free to try new features early and share your thoughts. It will help to move the discussion forward.
Q. How to become a Postgres developer?
A. This article pretty much covers the topic. https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
If you want to develop a low-level extension, look for an article like this one:
https://www.highgo.ca/2020/01/10/how-to-create-test-and-debug-an-extension-written-in-c-for-postgresql/
Always check the version of PostgreSQL, used in the examples, because the internal API may change from version to version.
If you want to learn more about the PostgreSQL internals for a start, I highly recommend this series of articles: http://www.interdb.jp/pg/index.html
if you want to contribute to the community version, consider reviewing existing patches on commitfest: https://commitfest.postgresql.org/31/
Q. What is the best way of "data validation" migration from Oracle to PostgreSQL? Is there any forum about сross-engine database migration from Oracle to PostgreSQL?
A. From what I know, Alicja Kucharczyk has recently created the group on LinkedIn covering Oracle-to-PostgreSQL migrations: https://www.linkedin.com/groups/13921469/.
Q. When do you think Postgres 13 should be used in the production environment?
Q. Considering that PostgreSQL 13 was released on Sept 24th, when would you consider the version stable?
A. It mostly depends on the practice of your company. I would consider the first (13.1) minor release stable enough for most users. However, large corporations and banks often wait for several years before they upgrade their database software.
Whatever you choose, an early adopter trail or a skeptic road, always test a database upgrade specifically on your data and workload before upgrading your production database.
Q. What is the request performance of Postgres 13?
A. Performance highly depends on specific database setup and workload. You can find some standard benchmarks here: https://www.2ndquadrant.com/en/blog/tpc-h-performance-since-postgresql-8-3/
Q. What do you think about HA based on logical replication?
A. Well, logical replication should be a topic for a separate discussion. We cannot normally recommend anything as a solution for HA without looking into a certain well-described case. You can share the details with us via info@postgrespro.com
Q. Is there anything related to pre-fetching WAL files to speed up the start-up process?
A. Yep. We have a couple of active discussions:
https://commitfest.postgresql.org/31/2410/
https://commitfest.postgresql.org/30/2732/
If you are interested in this feature, feel free to join and share your thoughts.
Q. Why is jsonb field so slow?
A. I would you to address this question to Oleg Bartunov, the head of our internal group working on PostgreSQL JSONB, you can send an email to info@postgrespro.com, and it will be forwarded to our JSONB developers.
Q. Will your PostgreSQL course ever be available in English, Anastasia?
A. To be honest, it isn’t my top priority for now. The first version of this course only scratched a surface, now I want to go deeper.
If you are interested in the architecture details, you can check this series of articles: http://www.interdb.jp/pg/index.html
If you want to hear some specific topic, feel free to contact me in Linkedin or e-mail. I am always in search of exciting topics for my talks.
Q. Any chance to change the defaults for autovacuum thresholds? What I often see, is that autovacuum is focusing on very small tables ~1-20MB in size and is running thousands times a day on it. I think that increasing the threshold for both vacuum and analyze would help a lot to 10k for instance.
A. Maybe you will find this discussion relevant. https://commitfest.postgresql.org/31/2658/
Don’t hesitate to share your thoughts with developers.
Q. Can we use parallelism to execute reindex after an upgrade?
A. Internally REINDEX uses CREATE INDEX, which in its turn is able to utilize multiple CPUs to sort data. Besides, you can always run several REINDEX processes in parallel.
If you have more questions for me, please email me or join me this Thursday, December 10, at PostgreSQL Monitoring Day with Zabbix & PostgresPro where I will be discussing monitoring-related features in PostgreSQL 13.