On Mon, 2011-10-31 at 18:26 -0300, Fernando Hevia wrote:
> [...]
> I could reproduce the issue in a fresh Windows 7 install with no other apps
> running other than pgAdmin v1.14.0.
> From what I could see, the execute-to-file function runs in 2 stages:
> 1. Rows are retrieved from DB server to RAM
> 2. Rows are written from RAM to file
>
That's right.
> The delay clearly occurs in step 2.
That's also right.
> While with small datasets (<2000) the writing to disk delay is barely
> perceivable, when the number of rows is incremented (>10k) it is quite
> distinctive how step 1 keeps completing in the expected time frame but step
> 2 takes much much longer. In any case it should be the other way around.
>
Nope, step 2 does a lot of work.
> With 10k rows (2.3 MB) it took 3.5 seconds to retrieve data from DB and 40
> seconds to write the file to an SATA 7200 disk with write-through cache.
> With 100k rows (23 MB) the DB retrieve went for 35 seconds while the file
> writing part took over 4 minutes (didn't wait for it to finish).
>
I don't have the same numbers:
* 1000 : 1s
* 10000 : 1s
* 200000 : 7s
* 1000000 : 18s
> The file is being written at an avg 60 KB per second, which is extremely
> slow.
If the only thing pgAdmin does was writing, I would agree. But,
actually, it does a lot more things:
* for each row * for each column * adds the column separator, if needed * grabs one cell's value * quotes the
value,if needed (which also means doubling the quote if it's within the value) * adds the line separator * converts
itto the encoding, if needed * writes it to the file
That could take some time.
I searched if there were some parts that took much longer than others,
but failed to find one.
--
Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com