Re: some namespace.c refactoring - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: some namespace.c refactoring |
Date | |
Msg-id | CALDaNm0xD7+B4EsijJRuNUTk-=wFWOiJ2gycUtgKZQnutOfZCA@mail.gmail.com Whole thread Raw |
In response to | Re: some namespace.c refactoring (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
List | pgsql-hackers |
On Thu, 23 Feb 2023 at 16:38, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 20.02.23 15:03, Peter Eisentraut wrote: > > On 15.02.23 19:04, Alvaro Herrera wrote: > >> That said, I think most of this code is invoked for DDL, where > >> performance is not so critical; probably just fixing > >> get_object_property_data to not be so naïve would suffice. > > > > Ok, I'll look into that. > > I did a variety of performance testing on this now. > > I wrote a C function that calls the "is visible" functions in a tight loop: > > Datum > pg_test_visible(PG_FUNCTION_ARGS) > { > int32 count = PG_GETARG_INT32(0); > Oid relid = PG_GETARG_OID(1); > Oid typid = PG_GETARG_OID(2); > > for (int i = 0; i < count; i++) > { > RelationIsVisible(relid); > TypeIsVisible(typid); > //ObjectIsVisible(RelationRelationId, relid); > //ObjectIsVisible(TypeRelationId, typid); > } > > PG_RETURN_VOID(); > } > > (It's calling two different ones to defeat the caching in > get_object_property_data().) > > Here are some run times: > > unpatched: > > select pg_test_visible(100_000_000, 'pg_class', 'int4'); > Time: 4536.747 ms (00:04.537) > > select pg_test_visible(100_000_000, 'tenk1', 'widget'); > Time: 10828.802 ms (00:10.829) > > (Note that the "is visible" functions special case system catalogs.) > > patched: > > select pg_test_visible(100_000_000, 'pg_class', 'int4'); > Time: 11409.948 ms (00:11.410) > > select pg_test_visible(100_000_000, 'tenk1', 'widget'); > Time: 18649.496 ms (00:18.649) > > So, it's slower, but it's not clear whether it matters in practice, > considering this test. > > I also wondered if this is visible through a normal external function > call, so I tried > > do $$ begin perform pg_get_statisticsobjdef(28999) from > generate_series(1, 1_000_000); end $$; > > (where that is the OID of the first object from select * from > pg_statistic_ext; in the regression database). > > unpatched: > > Time: 6952.259 ms (00:06.952) > > patched (first patch only): > > Time: 6993.655 ms (00:06.994) > > patched (both patches): > > Time: 7114.290 ms (00:07.114) > > So there is some visible impact, but again, the test isn't realistic. > > Then I tried a few ways to make get_object_property_data() faster. I > tried building a class_id+index cache that is qsort'ed (once) and then > bsearch'ed, that helped only minimally, not enough to make up the > difference. I also tried just searching the class_id+index cache > linearly, hoping maybe that if the cache is smaller it would be more > efficient to access, but that actually made things (minimally) worse, > probably because of the indirection. So it might be hard to get much > more out of this. I also thought about PerfectHash, but I didn't code > that up yet. > > Another way would be to not use get_object_property_data() at all but > write a "common" function that we pass in all it needs hardcodedly, like > > bool > RelationIsVisible(Oid relid) > { > return IsVisible_common(RELOID, > Anum_pg_class_relname > Anum_pg_class_relnamespace); > } > > This would still save a lot of duplicate code. > > But again, I don't think the micro-performance really matters here. I'm seeing that there has been no activity in this thread for almost a year now, I'm planning to close this in the current commitfest unless someone is planning to take it forward. Regards, Vignesh
pgsql-hackers by date: