Optimizing ResourceOwner to speed up COPY - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Optimizing ResourceOwner to speed up COPY
Msg-id: 84f20db7-7d57-4dc0-8144-7e38e0bbe75d@vondra.me
List: pgsql-hackers
Hi,

While reviewing and testing a nearby patch (using COPY for batching in postgres_fdw), I noticed some of the COPY queries are spending a substantial amount of time in ResourceOwnerAddToHash(). The exact figure depends on the amount of data in the COPY, but it was often close to 50% (according to perf record/report).

The reason is pretty simple - ResourceOwner tracks the resources in a very simple hash table, with O(n^2) behavior for duplicates. This happens with COPY because COPY creates an array of 1000 tuple slots, and each slot references the same tuple descriptor. The descriptor is added to the ResourceOwner once for each slot.

The trouble is that ResourceOwner uses a simple open addressing hash table, so the duplicate values end up in a long dense "train" of entries, slowing down both additions and deletions. Starting with an empty hash table, adding the first entry is fine - the first slot is available. But each subsequent addition has to skip over the existing entries - the n-th addition skips n-1 of them, so inserting N duplicates costs N * (N-1) / 2 probes in total. And then the same thing happens for deletions, in reverse.

This is not the first time I noticed ResourceOwner in profiles, but I never investigated it. I don't know if all those cases were caused by this same behavior, but it's likely at least some were ...

There's an easy way to improve this: allow a single hash entry to represent multiple references to the same resource. The attached patch adds a "count" to ResourceElem, tracking how many times that resource was added. So if you add 1000 tuple slots, the descriptor will have just one ResourceElem entry with count=1000. This reduces the insert/delete complexity - not quite to O(1), but much lower than O(n^2). It also reduces the memory usage with duplicates, and postpones when the hash table needs to be enlarged. Of course, if there are no duplicates, it'll use a bit more memory.

The deduplication is "opportunistic" in the sense that you may still end up with multiple entries for the same value/kind - when adding a resource finds an empty slot before hitting the matching entry, it'll use that slot. Also, only the hash table is deduplicated, not the initial array (which is, however, quite small). This is a reasonable trade-off: cases with no duplicates don't get any more expensive, while cases with many duplicate values still improve.
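To make this concrete, here's a minimal sketch of what the insert path looks like with a reference count (simplified and illustrative, not the actual patch - the struct layout and helper names roughly follow src/backend/utils/resowner/resowner.c, but the details here are assumptions):

/* Each hash entry tracks how many times the resource was added. */
typedef struct ResourceElem
{
	Datum		item;		/* the tracked resource */
	const ResourceOwnerDesc *kind;	/* NULL means the slot is empty */
	uint32		count;		/* number of references (the new field) */
} ResourceElem;

static void
ResourceOwnerAddToHash(ResourceOwner owner, Datum value,
					   const ResourceOwnerDesc *kind)
{
	uint32		mask = owner->capacity - 1;	/* capacity is a power of 2 */
	uint32		idx = hash_resource_elem(value, kind) & mask;

	for (;;)
	{
		ResourceElem *elem = &owner->hash[idx];

		if (elem->kind == NULL)
		{
			/* Empty slot found first - start a new entry. */
			elem->item = value;
			elem->kind = kind;
			elem->count = 1;
			owner->nhash++;
			return;
		}

		if (elem->kind == kind && elem->item == value)
		{
			/*
			 * Duplicate - bump the count instead of claiming another
			 * slot. Without this, the n-th duplicate would have to
			 * probe past n-1 identical entries, which is where the
			 * O(n^2) behavior comes from.
			 */
			elem->count++;
			return;
		}

		/* Collision with a different resource - keep probing. */
		idx = (idx + 1) & mask;
	}
}

The release path then decrements the count and only frees the slot when it drops to zero. And because the loop takes whichever comes first - an empty slot or the matching entry - the deduplication stays opportunistic, as described above.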
To demonstrate the benefits, here's the throughput (tps) of COPY loading a certain number of rows from a file into an UNLOGGED table, with 1, 4 and 8 clients:

                     master          |         patched
     rows |      1      4      8     |      1      4      8
  --------|--------------------------|-------------------------
       10 | 112090 418421 594905     | 112871 419043 589646
      100 |  35265 121903 168238     |  42536 138578 183740
     1000 |   1450   5700  10725     |   5847  21594  30713
    10000 |    531   1943   2988     |    743   2619   3557
   100000 |     72    244    324     |     76    256    332

Or, as relative throughput (patched / master):

     rows |      1      4      8
  --------|----------------------
       10 |   101%   100%    99%
      100 |   121%   114%   109%
     1000 |   403%   379%   286%
    10000 |   140%   135%   119%
   100000 |   106%   105%   102%

Those are some nice improvements, especially with 1000 rows. That makes sense, because 1000 is the internal batch size in COPY - the overhead gradually increases up to 1000 rows, and beyond that it amortizes over many batches. It has very little impact on large COPY.

I've done the same test with a logged table. The results depend significantly on the storage (how fast it can handle commits), which is not surprising. But with good storage the overall behavior is almost exactly the same.

I haven't done any benchmarking focused on cases with no/few duplicates yet (although the COPY with 10 rows could qualify). But as explained earlier, the behavior should not change at all - there's an extra uint32 per entry, but that's it.

regards

-- 
Tomas Vondra