Bruce Momjian <bruce@momjian.us> wrote:
> ENGEMANN, DAYSE wrote:
>> pg_dump -h sourcemachine -U sourceuser source_dbname \
>> | psql target_dbname
>
> Has anyone done any measurements of whether it is faster to run
> the dump on the source machine (with psql connecting remotely) or
> from a remote machine (where psql would be local)?
I haven't, because I want to dump with the pg_dump binary from the
target environment, and because I want to restore as a database
superuser, which we only allow through a local ident connection.
It would take a pretty big performance difference to overcome the
operational motivations for running on the target.
Thinking about this a little, though, brought to mind the
performance issues we hit when converting from Sybase to PostgreSQL
using a home-grown Java conversion utility. We found the best
performance running it on the target for that conversion as well.
We also got a rather large performance boost by reading on one
thread and writing on another -- even a 50-row queue to decouple
the threads yielded a very large benefit.

I've heard that we have a big bottleneck in parsing the input
during a restore; I suspect it alternates with disk I/O as the
bottleneck. I know multi-threading is always controversial, but I
wonder whether there might be some way to decouple the parsing in
COPY FROM from the tuple insertion, to keep two cores busy on the
target even during a piped conversion like this.
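
For what it's worth, the reader/writer split in our utility was
nothing fancier than a small bounded queue between two threads.
A rough sketch of the pattern (class name and row representation
are made up for illustration; the loop bodies are stand-ins for
the actual source reads and target writes):

    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipelinedTransfer {
        // Sentinel the reader enqueues once the source is exhausted.
        private static final List<String> EOF = List.of();

        public static void main(String[] args) throws InterruptedException {
            // A small bounded queue is all it takes to decouple the
            // two threads; 50 matches the figure mentioned above.
            BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(50);

            Thread reader = new Thread(() -> {
                try {
                    for (int i = 0; i < 1_000; i++) {
                        // Stand-in for fetching the next source row.
                        queue.put(List.of("col1-" + i, "col2-" + i));
                    }
                    queue.put(EOF);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            Thread writer = new Thread(() -> {
                try {
                    for (List<String> row = queue.take(); row != EOF;
                            row = queue.take()) {
                        // Stand-in for the target insert; while this
                        // blocks on I/O, the reader keeps filling the
                        // queue.
                        System.out.println(String.join("\t", row));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            reader.start();
            writer.start();
            reader.join();
            writer.join();
        }
    }

The same shape is what I have in mind for COPY FROM: one thread
parsing input lines into tuples, another doing the inserts, with a
short queue between them.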
-Kevin