When I was thinking about adding a built-in connection pool, my rough plan was mostly "a bgworker doing something like pgbouncer" (that is, listening on a separate port and proxying everything to regular backends). Obviously, that has pros and cons, and it probably would not serve the threading use case well.
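To make the "listen on a separate port and proxy everything" idea concrete, here is a minimal sketch of such a proxy in Python. It is not how pgbouncer or any bgworker is implemented; the names `pipe`, `handle_client` and `start_proxy` are hypothetical, and it does no pooling or protocol parsing at all. It only shows that every byte in both directions passes through the single proxying process.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, backend_host, backend_port):
    """Open one backend connection per client and relay traffic both ways."""
    backend_reader, backend_writer = await asyncio.open_connection(
        backend_host, backend_port
    )
    # Both directions are relayed by this one process; nothing reaches
    # the backend without going through here.
    await asyncio.gather(
        pipe(client_reader, backend_writer),
        pipe(backend_reader, client_writer),
    )


async def start_proxy(listen_port, backend_host, backend_port):
    """Listen on a separate port and proxy each connection to the backend."""

    async def on_client(reader, writer):
        await handle_client(reader, writer, backend_host, backend_port)

    return await asyncio.start_server(on_client, "127.0.0.1", listen_port)
```

As a usage assumption, `await start_proxy(6432, "127.0.0.1", 5432)` inside a running event loop would relay local port 6432 to a server on 5432; the port numbers are only illustrative.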
And we would get the same problem as with pgbouncer: one process will not be able to handle all connections... Certainly it is possible to start several such scheduling bgworkers... But in any case it is more efficient to multiplex sessions in the backends themselves.
pgbouncer holds the client connection the whole time. Once we implement listeners, all the work can be done by the worker processes, not by the listeners.
Sorry, I do not understand your point. In my case pgbench establishes a connection to pgbouncer only once, at the beginning of the test. And pgbouncer spends all its time in context switches (CPU usage is 100% and it is mostly in kernel space: the top of the profile is kernel functions). The picture will be the same if, instead of pgbouncer, you do such scheduling in one bgworker. Modern systems are not able to perform more than several hundreds of connection switches per second. So with a single multiplexing thread or process you cannot get more than 100k TPS, while on a powerful NUMA system it is possible to achieve millions of TPS. This is illustrated by the results I sent in the previous mail: by spawning 10 instances of pgbouncer I was able to get 7 times higher throughput.
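The scaling arithmetic behind those two numbers can be sketched as follows. This is only a back-of-the-envelope illustration: the helper names are hypothetical, the inputs are the 100k ceiling and the 10-instance, 7x result quoted above, and the projected aggregate is an extrapolation under the assumption that the ceiling applies to each instance, not a measured figure.

```python
def parallel_efficiency(observed_speedup: float, instances: int) -> float:
    """Fraction of ideal linear scaling that was actually achieved."""
    return observed_speedup / instances


def aggregate_ceiling(single_ceiling_tps: int, observed_speedup: int) -> int:
    """Projected ceiling if one multiplexer tops out at single_ceiling_tps
    and N instances together yield observed_speedup times one instance.
    Assumption: the single-process ceiling applies to every instance."""
    return single_ceiling_tps * observed_speedup


# Figures from the message above: 10 pgbouncer instances gave a 7x speedup,
# and a single multiplexing process is said to be capped at about 100k TPS.
eff = parallel_efficiency(7, 10)
print(f"parallel efficiency: {eff:.0%}")
print(f"projected ceiling: {aggregate_ceiling(100_000, 7)} TPS")
```

Under these assumptions the ten proxies scale at roughly 70% of linear, which still leaves the projected ceiling below the millions of TPS mentioned for a large NUMA machine.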
pgbouncer is proxy software. I don't think a native pooler should be a proxy too. So comparing pgbouncer with a hypothetical native pooler is not fair, because pgbouncer passes all communication