Re: Initial COPY of Logical Replication is too slow - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: Initial COPY of Logical Replication is too slow |
| Date | |
| Msg-id | CAD21AoDQM62GOtaTzD_CVMSsFhv6o9c0Au1dSM1QuxeKFkWAKw@mail.gmail.com |
| In response to | Re: Initial COPY of Logical Replication is too slow (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Responses | Re: Initial COPY of Logical Replication is too slow |
| List | pgsql-hackers |
On Mon, Jan 26, 2026 at 12:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jan 19, 2026 at 9:44 AM Marcos Pegoraro <marcos@f10.com.br> wrote:
> >
> > On Fri, Dec 19, 2025 at 22:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> >> Yeah, if we pass a publication that a lot of tables belong to to
> >> pg_get_publication_tables(), it could take a long time to return as it
> >> needs to construct many entries.
> >
> > Well, I don't know how to help, but I'm sure it's working badly.
> > Today I added some fields on my server, and looking at the logs I could
> > see how slow this process is:
> >
> > duration: 2213.872 ms statement: SELECT DISTINCT (CASE WHEN
> > (array_length(gpt.attrs, 1) = c.relnatts) THEN NULL ELSE gpt.attrs END)
> > FROM pg_publication p, LATERAL pg_get_publication_tables(p.pubname) gpt,
> > pg_class c WHERE gpt.relid = 274376788 AND c.oid = gpt.relid AND
> > p.pubname IN ( 'mypub' )
> >
> > 2 seconds to get the list of fields of a table is really too slow.
> > How can we solve this?
>
> After more investigation of the slowness, it seems that the
> list_concat_unique_oid() call below is quite slow when the database
> has a lot of tables to publish:
>
>     relids = GetPublicationRelations(pub_elem->oid,
>                                      pub_elem->pubviaroot ?
>                                      PUBLICATION_PART_ROOT :
>                                      PUBLICATION_PART_LEAF);
>     schemarelids = GetAllSchemaPublicationRelations(pub_elem->oid,
>                                                     pub_elem->pubviaroot ?
>                                                     PUBLICATION_PART_ROOT :
>                                                     PUBLICATION_PART_LEAF);
>     pub_elem_tables = list_concat_unique_oid(relids, schemarelids);
>
> This is simply because it's O(n^2), where n is the number of OIDs in
> schemarelids in the test case. A simple change would be to sort & dedup
> instead. With the attached experimental patch, the
> pg_get_publication_tables() execution time is halved in my environment
> (796 ms -> 430 ms with 50k tables). If the number of tables is not
> large, this method might be slower than today, but it's not a huge
> regression.
> In the initial tablesync cases, it could be optimized further by
> introducing a new SQL function that gets the column list and row
> filter expression of a specific table. This way, we can filter the
> result by relid at an early stage, instead of gathering all the
> information and then filtering by relid as the tablesync worker does
> today, avoiding the overhead of collecting system catalog scan
> results.

I've drafted this idea, and it looks like a better approach. The patch
introduces a pg_get_publication_table_info() SQL function that returns
the column list and row filter expression, as
pg_get_publication_tables() does, but checks only the specified table.
In my environment, the tablesync worker's query in question goes from
288 ms down to 0.6 ms with 50k tables in one publication. Feedback is
very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com