These aren't the combining codepoints. They're new substitutions defined in r34 of the Latin-ASCII transliteration file. I had wondered about those, too, and did some testing.
I don't think this is quite right.
However, you are correct that something isn't right. While investigating why I was getting different output, I had reverted to the generate_unaccent_rules.py from BEFORE my changes and then applied my update for the transliteration file format to that reverted version. The patch for generate_unaccent_rules.py should still be good, but the generated rules file didn't include the combining diacriticals. While regenerating it, I want to double-check some of the additions before resubmitting.
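For reference, one quick way to check whether a generated rules file covers the combining diacriticals might look like the sketch below. It is only an illustration, not part of the patch: the helper name combining_sources is hypothetical, and it assumes each rule line starts with the source string, followed by whitespace and an optional replacement.

```python
import unicodedata


def combining_sources(rules_text):
    """Return the rule sources made up solely of combining marks.

    Assumption: each non-empty line starts with the source string,
    optionally followed by whitespace and a replacement.
    """
    found = []
    for line in rules_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        src = fields[0]
        # unicodedata.combining() returns a non-zero canonical combining
        # class for combining marks such as U+0301 COMBINING ACUTE ACCENT.
        if all(unicodedata.combining(ch) != 0 for ch in src):
            found.append(src)
    return found


if __name__ == "__main__":
    # Tiny inline sample: one ordinary rule and two combining-mark rules.
    sample = "\u00e0\ta\n\u0301\n\u0308\n"
    for src in combining_sources(sample):
        print("U+%04X" % ord(src[0]))
```

An empty result for a freshly generated file would point to the problem described above, where the combining diacriticals are missing from the output.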
Could you also add some tests in contrib/unaccent/sql/unaccent.sql at the same time? That would make it easy to check the extent of the patches proposed on this thread.
That makes sense. I'm happy to do that. Let me look at that file and see how extensive the other changes (encoding and removal of special characters) would be.