Thread: Trying to understand page structures in PG
Hello I understand that when an update of say 100,000 rows are made, PG writes the updated rows as a new row. These new rows are not visible to any sessions except the one creating it. At commit time PG flips something internally to make these rows visible to all. My Q: what happens to those rows which use to contain the values before the update. Shouldn't something change in those rows to indicate that those rows are no longer valid. Who does it chain those rows to the new rows. thanks.
Rakesh Kumar wrote: > I understand that when an update of say 100,000 rows are made, PG > writes the updated rows as a new row. These new rows are not visible > to any sessions except the one creating it. At commit time PG flips > something internally to make these rows visible to all. > > My Q: what happens to those rows which use to contain the values > before the update. Shouldn't something change in those rows to > indicate that those rows are no longer valid. Who does it chain those > rows to the new rows. Every row has two system columns associated with it: xmin and xmax xmin is the transaction ID that created the row, while xmax is the transaction ID that removed the row. So when an update takes place, xmax of the original row and xmin of the new row are set to the transaction ID of the current transaction. Furthermore, the commit log (CLOG) logs for each transaction whether it was committed or rolled back. Now when a backend examines a row, it first checks if the row is visible, i.e. xmin must be less or equal to the current transaction ID and xmax must be 0 or greater than the current transaction ID or belong to a transaction that was rolled back. To save CLOG lookups, the first reader who consults the CLOG will save the result in so-called "hint bits" on the row itself. Yours, Laurenz Albe
On 04/06/2016 02:39 AM, Rakesh Kumar wrote: > Hello > > I understand that when an update of say 100,000 rows are made, PG > writes the updated rows as a new row. These new rows are not visible > to any sessions except the one creating it. At commit time PG flips > something internally to make these rows visible to all. > > My Q: what happens to those rows which use to contain the values > before the update. Shouldn't something change in those rows to > indicate that those rows are no longer valid. Who does it chain those > rows to the new rows. They are marked dead and at a threshold vacuum will come along automatically and mark them reusable. The vacuum and maintenance docs explain this pretty well. Sincerely, JD > > thanks. > > -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
On Wed, 2016-04-06 at 10:33 +0000, Albe Laurenz wrote: <snip> > Every row has two system columns associated with it: xmin and xmax > > xmin is the transaction ID that created the row, while xmax is > the transaction ID that removed the row. > > So when an update takes place, xmax of the original row and xmin > of the new row are set to the transaction ID of the current > transaction. > > > Yours, > Laurenz Albe > </snip> Hello Laurenz, Out of curiosity:- 1) If you only updated a column defined as BOOLEAN, DATE or TIMESTAMP, the tuples physical size should be the same. Is it written back to the same page with altered xmin, xmax values? 2) If you defined a table with a FILLFACTOR of say 70%, and there is still space left on its page for the updated tuple, does the same apply? I am curious because of "while xmax is the transaction ID that *removed* the row". Not urgent. Cheers, rob
On Wed, Apr 6, 2016 at 8:35 AM, rob stone <floriparob@gmail.com> wrote: > On Wed, 2016-04-06 at 10:33 +0000, Albe Laurenz wrote: > > <snip> >> Every row has two system columns associated with it: xmin and xmax >> >> xmin is the transaction ID that created the row, while xmax is >> the transaction ID that removed the row. >> >> So when an update takes place, xmax of the original row and xmin >> of the new row are set to the transaction ID of the current >> transaction. > Out of curiosity:- > 1) If you only updated a column defined as BOOLEAN, DATE or TIMESTAMP, > the tuples physical size should be the same. Is it written back to the > same page with altered xmin, xmax values? Being the same size doesn't matter, because it *has* to be copied. If there is room for the copy to go onto the same page, then that is done. If there is not room, then it goes onto a different page. > > 2) If you defined a table with a FILLFACTOR of say 70%, and there is > still space left on its page for the updated tuple, does the same > apply? > > I am curious because of "while xmax is the transaction ID that > *removed* the row". "marked for removal" would be more accurate. If the row were actually physically removed, it would no longer have a xmax to set. Cheers, Jeff
On Wed, Apr 6, 2016 at 6:33 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > Rakesh Kumar wrote: > Every row has two system columns associated with it: xmin and xmax > > xmin is the transaction ID that created the row, while xmax is > the transaction ID that removed the row. > > So when an update takes place, xmax of the original row and xmin > of the new row are set to the transaction ID of the current transaction. > > Furthermore, the commit log (CLOG) logs for each transaction whether > it was committed or rolled back. > > Now when a backend examines a row, it first checks if the row is > visible, i.e. xmin must be less or equal to the current transaction ID > and xmax must be 0 or greater than the current transaction ID > or belong to a transaction that was rolled back. > > To save CLOG lookups, the first reader who consults the CLOG will > save the result in so-called "hint bits" on the row itself. I am assuming the same must be true for the indexes also. Does PG look up primary key by examining the rows like you described above.
Jeff Janes wrote: >> I am curious because of "while xmax is the transaction ID that >> *removed* the row". > > "marked for removal" would be more accurate. If the row were actually > physically removed, it would no longer have a xmax to set. Yes, thanks for the clarification. I was thinking "logically removed", but I realize my wording was confusing. Yours, Laurenz Albe
Rakesh Kumar wrote: >> Every row has two system columns associated with it: xmin and xmax >> >> xmin is the transaction ID that created the row, while xmax is >> the transaction ID that removed the row. >> >> So when an update takes place, xmax of the original row and xmin >> of the new row are set to the transaction ID of the current transaction. >> >> Furthermore, the commit log (CLOG) logs for each transaction whether >> it was committed or rolled back. >> >> Now when a backend examines a row, it first checks if the row is >> visible, i.e. xmin must be less or equal to the current transaction ID >> and xmax must be 0 or greater than the current transaction ID >> or belong to a transaction that was rolled back. >> >> To save CLOG lookups, the first reader who consults the CLOG will >> save the result in so-called "hint bits" on the row itself. > > I am assuming the same must be true for the indexes also. Does PG look > up primary key > by examining the rows like you described above. Rows remain in the index as long as they remain in the table. What actually removes old rows from the table - VACUUM - also removes them from the index. The index does not contain visibility information about the indexed row, so every row found in an index must still be checked for visibility, either as described above, or in the case of an index-only scan by checking the visibility map if all tuples on that heap page are visible to all transactions. Yours, Laurenz Albe