I think this can be solved easily in the patch by having ri_BuildQueryKey() compare the child constraint's fk_attnums to the parent's; if they are equal, use the parent's constraint_id, otherwise use the child constraint's. That way, the cache entry is reused in the common case where they are identical.
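To make the idea concrete, here is a rough standalone sketch of the key selection. It is not the actual ri_triggers.c code: SketchConstraint, its fields, and both sketch_* functions are made-up stand-ins for illustration, and it only looks one level up the partition hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_KEYS 32

/* Hypothetical, trimmed-down stand-in for the per-constraint info. */
typedef struct SketchConstraint
{
    uint32_t    constraint_id;                  /* this constraint's own id */
    int         nkeys;                          /* number of FK columns */
    int16_t     fk_attnums[SKETCH_MAX_KEYS];    /* FK column numbers */
    const struct SketchConstraint *parent;      /* NULL if no parent */
} SketchConstraint;

/* True if both constraints reference the same FK columns in the same order. */
static bool
sketch_same_fk_attnums(const SketchConstraint *a, const SketchConstraint *b)
{
    if (a->nkeys != b->nkeys)
        return false;
    return memcmp(a->fk_attnums, b->fk_attnums,
                  (size_t) a->nkeys * sizeof(int16_t)) == 0;
}

/*
 * Pick the constraint id to put in the query key: the parent's when the
 * column numbers match (so the cached plan is shared), else the child's own.
 */
static uint32_t
sketch_query_key_constraint_id(const SketchConstraint *c)
{
    assert(c != NULL);

    if (c->parent != NULL && sketch_same_fk_attnums(c, c->parent))
        return c->parent->constraint_id;

    return c->constraint_id;
}
```

With this, partitions whose attribute numbers line up with the parent all map to one cache entry, and only partitions with a different column layout (e.g. after a dropped column) get their own.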
Somewhat of a detour, but while reviewing the patch for Statement-Level RI checks, Andres and I observed that SPI accounted for a large portion of the RI overhead.
Given that we're already looking at these checks, I was wondering if this might be the time to consider implementing these checks by directly scanning the constraint index.