Re: Page Checksums + Double Writes - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: Page Checksums + Double Writes |
Date | |
Msg-id | 4EF4546E0200002500044091@gw.wicourts.gov |
In response to | Re: Page Checksums + Double Writes ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses | Re: Page Checksums + Double Writes |
List | pgsql-hackers |
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

>> I would suggest you examine how to have an array of N bgwriters,
>> then just slot the code for hinting into the bgwriter. That way a
>> bgwriter can set hints, calc CRC and write pages in sequence on a
>> particular block. The hinting needs to be synchronised with the
>> writing to give good benefit.
>
> I'll think about that. I see pros and cons, and I'll have to see
> how those balance out after I mull them over.

I think maybe the best solution is to create some common code to use from both. The problem with *just* doing it in bgwriter is that it would not help much with workloads like the one Robert has been using for most of his performance testing: a database which fits entirely in shared buffers and starts thrashing on CLOG.

For a background hinter process, my goal would be to deal with xids as they are passed by the global xmin value, so that you have a cheap way to know that they are ripe for hinting, and you can frequently hint a bunch of transactions that are all in the same CLOG page, which is recent enough that it is likely to be already loaded.

Now, a background hinter isn't going to be a net win if it has to grovel through every tuple on every dirty page every time it sweeps through the buffers, so the idea depends on having a sufficiently efficient way to identify interesting buffers. I'm hoping to improve on this, but my best idea so far is to add a field to the buffer header for the "earliest unhinted xid" for the page. Whenever this background process wakes up and is scanning through the buffers (probably just in buffer number order), it does a quick check, without any pin or lock, to see if the buffer is dirty and the earliest unhinted xid is below the global xmin. If it passes both of those tests, there is definitely useful work which can be done if the page doesn't get evicted before we can do it. We pin the page, recheck those conditions, and then we look at each tuple and hint where possible.
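A minimal C sketch of the cheap, lock-free pre-check described above, under stated assumptions. `HintBufferDesc`, its `earliest_unhinted_xid` field and `buffer_worth_hinting()` are hypothetical names for illustration, not PostgreSQL APIs. `xid_precedes()` is modeled on PostgreSQL's `TransactionIdPrecedes()` (a wraparound-aware comparison), and `clog_page_of()` mirrors the CLOG page arithmetic assuming the default 8 kB block size and 2 status bits per transaction, which is why xids that pass the global xmin together tend to share one CLOG page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Assumption: default 8 kB block size; CLOG keeps 2 status bits per xact. */
#define SKETCH_BLCKSZ              8192
#define SKETCH_CLOG_XACTS_PER_PAGE (SKETCH_BLCKSZ * 4)

/* Modeled on TransactionIdPrecedes(): permanent xids compare numerically,
 * normal xids compare modulo 2^32 so the test survives xid wraparound. */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* CLOG page holding this xid's status. Xids that pass the global xmin
 * together usually map to the same, recently loaded, page. */
static uint32_t clog_page_of(TransactionId xid)
{
    return xid / SKETCH_CLOG_XACTS_PER_PAGE;
}

/* Hypothetical buffer-header fields used by the background hinter. */
typedef struct HintBufferDesc
{
    bool          dirty;
    TransactionId earliest_unhinted_xid; /* InvalidTransactionId: nothing to hint */
} HintBufferDesc;

/* Quick test done without any pin or lock. The result may be stale, so the
 * caller must pin the buffer and recheck before touching any tuple. */
static bool buffer_worth_hinting(const HintBufferDesc *buf, TransactionId global_xmin)
{
    TransactionId xid = buf->earliest_unhinted_xid;

    assert(global_xmin >= FirstNormalTransactionId);
    return buf->dirty
        && xid != InvalidTransactionId
        && xid_precedes(xid, global_xmin);
}
```

The explicit `InvalidTransactionId` check matters: without it, xid 0 would compare as older than any global xmin and every clean-of-work buffer would look ripe.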
As we go, we remember the earliest xid that we see which is *not* being hinted, to store back into the buffer header when we're done. Of course, we would also update the buffer header for new tuples, or when an xmax is set, if the xid involved precedes what we have in the buffer header. This would not only help avoid multiple page writes as unhinted tuples on the page are read, but would also minimize thrashing on CLOG and move some of the hinting work out of the critical path of reading a tuple and into a background process.

Thoughts?

-Kevin
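The per-buffer pass and the "earliest unhinted xid" bookkeeping described above might fit together roughly as below. This is a single-threaded sketch under loudly stated assumptions: `SketchPage`, `SketchTuple`, `hint_page()` and `note_new_xid()` are hypothetical, and pinning, locking against concurrent writers, and the CLOG lookup that decides between a committed and an aborted hint are not modeled, so a "hint" here is only a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define SKETCH_MAX_TUPLES        8

/* Same wraparound-aware comparison as TransactionIdPrecedes(). */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Hypothetical, heavily simplified tuple header and page. */
typedef struct SketchTuple
{
    TransactionId xmin;
    TransactionId xmax;          /* InvalidTransactionId if never set */
    bool          xmin_hinted;
    bool          xmax_hinted;
} SketchTuple;

typedef struct SketchPage
{
    bool          dirty;
    TransactionId earliest_unhinted_xid; /* the proposed buffer-header field */
    int           ntuples;
    SketchTuple   tuples[SKETCH_MAX_TUPLES];
} SketchPage;

/* Keep *earliest at the oldest unhinted xid seen so far. */
static void note_unhinted(TransactionId *earliest, TransactionId xid)
{
    if (*earliest == InvalidTransactionId || xid_precedes(xid, *earliest))
        *earliest = xid;
}

/* Called when a tuple is inserted or an xmax is set on this page. */
static void note_new_xid(SketchPage *page, TransactionId xid)
{
    note_unhinted(&page->earliest_unhinted_xid, xid);
    page->dirty = true;
}

/* Hint one xid if it precedes global_xmin (so its outcome is final);
 * otherwise fold it into the new earliest-unhinted value. Returns 1 if hinted. */
static int hint_or_note(TransactionId xid, bool *hinted,
                        TransactionId global_xmin, TransactionId *earliest)
{
    if (xid == InvalidTransactionId || *hinted)
        return 0;
    if (!xid_precedes(xid, global_xmin))
    {
        note_unhinted(earliest, xid);
        return 0;
    }
    *hinted = true;   /* real code would consult CLOG: committed vs. aborted */
    return 1;
}

/* Per-buffer pass: recheck, hint each tuple, store back the new minimum. */
static int hint_page(SketchPage *page, TransactionId global_xmin)
{
    TransactionId earliest = InvalidTransactionId;
    int           nhinted = 0;

    assert(page->ntuples <= SKETCH_MAX_TUPLES);
    /* Pinning is not modeled; recheck because the unlocked test may be stale. */
    if (!page->dirty
        || page->earliest_unhinted_xid == InvalidTransactionId
        || !xid_precedes(page->earliest_unhinted_xid, global_xmin))
        return 0;

    for (int i = 0; i < page->ntuples; i++)
    {
        SketchTuple *tup = &page->tuples[i];

        nhinted += hint_or_note(tup->xmin, &tup->xmin_hinted, global_xmin, &earliest);
        nhinted += hint_or_note(tup->xmax, &tup->xmax_hinted, global_xmin, &earliest);
    }
    page->earliest_unhinted_xid = earliest;
    return nhinted;
}
```

The recheck after pinning is what makes the unlocked pre-check safe to get wrong: a stale read costs at most a wasted pin, never an incorrect hint.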