Don Baccus <dhogaza@pacifier.com> writes:
> Of course, if I've understood past postings to this list,
> Postgres also fsynch's after read-only selects, too,
I recently learned something about this that I hadn't understood before.
When a tuple is written out during an insert/update transaction, it is
marked as not definitely committed (since of course Postgres can't know
whether you'll abort the transaction later). The ID of the transaction
that wrote it is stored with it. Subsequently, whenever the tuple is
scanned, the backend has to go to the "transaction log" to see if that
transaction has been committed yet --- if not, it ignores the tuple.
As soon as the transaction is known to be committed, the next operation
that visits that tuple will mark it as "known committed", so as to avoid
future consultations of the transaction log. This happens *even if the
current operation is a select*. That is why selects can cause disk
writes in Postgres.
Similar things happen when a tuple is replaced or deleted, of course.
In short, if you load a bunch of tuples into a table, the first select
after the load can run a lot slower than you might expect, because it'll
be writing back most or all of the pages it touches. But that penalty
doesn't affect every select, only the first one to scan a newly-written
tuple.
regards, tom lane