Thread: Small patch to fix an O(N^2) problem in foreign keys
Attached is a small patch and a script to reproduce the issue. The problem is a cache introduced in commit 45ba4247 that improves foreign key lookups during bulk updates when the FK value does not change. When restoring a schema dump from a database with many (say 100,000) foreign keys, this cache is growing very big and every ALTER TABLE command is causing a InvalidateConstraintCacheCallBack(), which does a sequential hash table scan. The patch uses a heuristic method of detecting when the hash table should be destroyed and recreated. InvalidateConstraintCacheCallBack() adds the current size of the hash table to a counter. When that sum reaches 1,000,000, the hash table is flushed. This improves the schema restore of a database with 100,000 foreign keys by factor 3. According to my tests the patch does not interfere with the bulk updates, the original feature was supposed to improve. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info
Attachment
Jan Wieck <jan@wi3ck.info> wrote: > The problem is a cache introduced in commit 45ba4247 that improves That's a bit off; 45ba424f seems to be what you mean. > foreign key lookups during bulk updates when the FK value does not > change. When restoring a schema dump from a database with many (say > 100,000) foreign keys, this cache is growing very big and every ALTER > TABLE command is causing a InvalidateConstraintCacheCallBack(), which > does a sequential hash table scan. > > The patch uses a heuristic method of detecting when the hash table > should be destroyed and recreated. InvalidateConstraintCacheCallBack() > adds the current size of the hash table to a counter. When that sum > reaches 1,000,000, the hash table is flushed. This improves the schema > restore of a database with 100,000 foreign keys by factor 3. > > According to my tests the patch does not interfere with the bulk > updates, the original feature was supposed to improve. In the real-world customer case that caused you to look into this, I thought 45ba424f drove schema-only restore time from 2 hours to about 25 hours, and that this patch takes it back down to 2 hours. Am I remembering right? And this came about because it added 20-some hours to a pg_upgrade run? If there are no objections, I will push this as a bug fix back to 9.3, where the performance regression was introduced. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/08/2015 08:49 AM, Kevin Grittner wrote: > Jan Wieck <jan@wi3ck.info> wrote: > >> The problem is a cache introduced in commit 45ba4247 that improves > > That's a bit off; 45ba424f seems to be what you mean. Copy paste over paper. > >> foreign key lookups during bulk updates when the FK value does not >> change. When restoring a schema dump from a database with many (say >> 100,000) foreign keys, this cache is growing very big and every ALTER >> TABLE command is causing a InvalidateConstraintCacheCallBack(), which >> does a sequential hash table scan. >> >> The patch uses a heuristic method of detecting when the hash table >> should be destroyed and recreated. InvalidateConstraintCacheCallBack() >> adds the current size of the hash table to a counter. When that sum >> reaches 1,000,000, the hash table is flushed. This improves the schema >> restore of a database with 100,000 foreign keys by factor 3. >> >> According to my tests the patch does not interfere with the bulk >> updates, the original feature was supposed to improve. > > In the real-world customer case that caused you to look into this, > I thought 45ba424f drove schema-only restore time from 2 hours to > about 25 hours, and that this patch takes it back down to 2 hours. > Am I remembering right? And this came about because it added > 20-some hours to a pg_upgrade run? From 2 hours to >50, but yes, this is that case. > > If there are no objections, I will push this as a bug fix back to > 9.3, where the performance regression was introduced. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info
Jan Wieck <jan@wi3ck.info> wrote: >>> The patch uses a heuristic method of detecting when the hash table >>> should be destroyed and recreated. InvalidateConstraintCacheCallBack() >>> adds the current size of the hash table to a counter. When that sum >>> reaches 1,000,000, the hash table is flushed. This improves the schema >>> restore of a database with 100,000 foreign keys by factor 3. >>> >>> According to my tests the patch does not interfere with the bulk >>> updates, the original feature was supposed to improve. >> >> In the real-world customer case that caused you to look into this, >> I thought 45ba424f drove schema-only restore time from 2 hours to >> about 25 hours, and that this patch takes it back down to 2 hours. >> Am I remembering right? And this came about because it added >> 20-some hours to a pg_upgrade run? > > From 2 hours to >50, but yes, this is that case. > >> If there are no objections, I will push this as a bug fix back to >> 9.3, where the performance regression was introduced. Hearing no objections, pushed with one minor tweak Jan and I discussed off-list -- the original patch duplicated a bit of code that we agreed should be split into a static function and called from both places, to protect against someone later changing one and missing the other. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company