Re: Postgresql | Vacuum information - Mailing list pgsql-admin

From MichaelDBA
Subject Re: Postgresql | Vacuum information
Date
Msg-id 5A98250D.5090303@sqlexec.com
Whole thread Raw
In response to Re: Postgresql | Vacuum information  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-admin
oh my my, think David has the correct answer here.  Thanks David for your research and correction.

So INSERT logic: see if space on current target page, then check FSM, then default to end of relation

Regards,
Michael Vitale

Wednesday, February 28, 2018 4:53 PM
On Wed, Feb 28, 2018 at 2:21 PM, MichaelDBA <MichaelDBA@sqlexec.com> wrote:
wee bit further elaboration...

MVCC for updates is DELETE followed by INSERT.  That is what the docs are talking about when mentioning "new rows", not the type of "new row" that is done with an INSERT statement which is ALWAYS appended to end of table.  I actually had to ask about this one in PostgreSQL slack chat because I wasn't 100% positive that index clustering order does not influence INSERT location, but apparently even clustering order does not influence a location for INSERTs other than the end of the table.

​ My (admittedly inexperienced) reading of the code (src/backend/access/head/hio.c::RelationGetBufferForTuple) leads me to conclude that my assumption about the behavior is indeed correct.  Both update-related insertions and isolated insertions result in the system using the free space map (FSM) to locate existing pages with free space on them to use for persisting the newly created tuples.

Specifically:

/*
* We first try to put the tuple on the same page we last inserted a tuple
* on, as cached in the BulkInsertState or relcache entry.  If that
* doesn't work, we ask the Free Space Map to locate a suitable page.
* Since the FSM's info might be out of date, we have to be prepared to
* loop around and retry multiple times. (To insure this isn't an infinite
* loop, we must update the FSM with the correct amount of free space on
* each page that proves not to be suitable.)  If the FSM has no record of
* a page with enough free space, we give up and extend the relation.
*
* When use_fsm is false, we either put the tuple onto the existing target
* page or extend the relation.
*/

and

if (targetBlock == InvalidBlockNumber && use_fsm)
{
/*
* We have no cached target page, so ask the FSM for an initial
* target.
*/
targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
[...]

David J.

Wednesday, February 28, 2018 4:21 PM
wee bit further elaboration...

MVCC for updates is DELETE followed by INSERT.  That is what the docs are talking about when mentioning "new rows", not the type of "new row" that is done with an INSERT statement which is ALWAYS appended to end of table.  I actually had to ask about this one in PostgreSQL slack chat because I wasn't 100% positive that index clustering order does not influence INSERT location, but apparently even clustering order does not influence a location for INSERTs other than the end of the table.

We all learned something here, lol

Regards,
Michael Vitale


Wednesday, February 28, 2018 4:04 PM
Actually, David, that reference to the docs about "new rows" in https://www.postgresql.org/docs/10/static/routine-vacuuming.html#VACUUM-BASICS, recovering disk space, relates to UPDATEs and DELETEs, not INSERTs.  The main gain for reusing dead space is where UPDATES are concerned where it will try to insert on the same page if possible.

Regards,
Michael Vitale

Wednesday, February 28, 2018 1:37 PM
On Wed, Feb 28, 2018 at 11:31 AM, Sohel Tamboli <sohel.tamboli0016@gmail.com> wrote:
Hi, 

I need some information on vacuum in postgresql. I know that "vacuum full" recreate full table and releases space to OS. Only "vacuum" clears the dead tuples and free the space but does not returns free space to OS, indeed it keeps free space as a part of table. 
My question is, after running only "vacuum", how does new data or insert is written to the table? I need to know that Does new data gets inserted in free space available in between of live tuples or gets inserted at the end of table everytime. 


There would be no point to non-full vacuuming if "new data [was] inserted ... at the end of the table everytime"...

​ This logic is also documented:​

​"
The space it occupies must then be reclaimed for reuse by new rows, to avoid unbounded growth of disk space requirements. This is done by running VACUUM.
​"​

​ David J.

Wednesday, February 28, 2018 1:31 PM
Hi, 

I need some information on vacuum in postgresql. I know that "vacuum full" recreate full table and releases space to OS. Only "vacuum" clears the dead tuples and free the space but does not returns free space to OS, indeed it keeps free space as a part of table. 
My question is, after running only "vacuum", how does new data or insert is written to the table? I need to know that Does new data gets inserted in free space available in between of live tuples or gets inserted at the end of table everytime. 

Appreciate your quick response!! 

Thanks, 
Sohel

pgsql-admin by date:

Previous
From: Rui DeSousa
Date:
Subject: Re: Reliable WAL file shipping over unreliable network
Next
From: Alvaro Herrera
Date:
Subject: Re: postgresql 9.6 - cannot freeze committed xmax