Michael Simms <grim@argh.demon.co.uk> writes:
>> If I do a large search, the first time is about three times slower than
>> any subsequent overlapping (same data) searches. I would like to always
>> get the higher performance.
> What happens the first time is that it must read the data off the disc. After
> that, the data comes from memory IF it is cached. Disc reads will always be
> slower with current disc technology.
There is that effect, but I suspect Edwin may also be seeing another
effect. When a tuple is first inserted or modified, it is written into
the table with a marker saying (in effect) "Inserted by transaction NNN,
not committed yet". To find out whether the tuple is really any good,
you have to go and consult pg_log to see if that transaction got
committed. Obviously, that's slow, so the first subsequent transaction
that makes that pg_log check and finds that NNN really did get committed
will rewrite the disk page with the tuple's state changed to "Known
committed".
So, the first select after an update transaction will spend additional
cycles checking pg_log and marking committed tuples. In effect, it's
doing the last phase of the update. We could instead force the update
to do all its own housekeeping, but the overall result wouldn't be any
faster; probably it'd be slower.
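
As a rough sketch (the table and column names below are made up), any
cheap query that visits the freshly modified tuples right after the
update commits will absorb those pg_log lookups, so the searches you
actually care about run at the "second time" speed:

    BEGIN;
    UPDATE bigtable SET flag = 1 WHERE flag = 0;  -- hypothetical table/column
    COMMIT;

    -- Throwaway scan: it visits the new tuple versions, checks each one
    -- against pg_log, and rewrites their pages as "known committed", so
    -- later selects skip that work.
    SELECT count(*) FROM bigtable;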
> I would imagine (I'm not an expert, but this is from observation) that if
> you drastically increase the number of shared memory buffers, then when
> you start up your front-end and simply do a select * from the tables, it
> may even keep them all in memory from the start.
The default buffer space (64 disk pages) is not very large --- use
a larger -B setting if you have the memory to spare.
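
For instance (the numbers and paths are only illustrative), you could
start the postmaster with more buffers and then scan the tables you care
about once, so later queries find the pages already in shared memory:

    # 1024 buffers of 8K each is about 8MB of shared buffer space; the
    # kernel's shared-memory limit (SHMMAX) may need raising to match.
    postmaster -B 1024 -D /usr/local/pgsql/data &

    # Warm the cache once (database and table names are just examples).
    psql -c "SELECT count(*) FROM bigtable" mydb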
regards, tom lane