Thread: COPY FREEZE and PD_ALL_VISIBLE
I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE. Or is there some reason this can't be done?
Since the whole point of COPY FREEZE is to avoid needing to rewrite the entire table, it seems rather perverse that the first time the table is vacuumed, it needs to rewrite the entire table.
Cheers,
Jeff
On Sun, Oct 18, 2015 at 5:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE. > Or is there some reason this can't be done? > > Since the whole point of COPY FREEZE is to avoid needing to rewrite the > entire table, it seems rather perverse that the first time the table is > vacuumed, it needs to rewrite the entire table. *facepalm* I don't know how hard that is to implement, but +1 for trying to figure out a way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Oct 20, 2015 at 7:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Oct 18, 2015 at 5:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE.
> Or is there some reason this can't be done?
>
> Since the whole point of COPY FREEZE is to avoid needing to rewrite the
> entire table, it seems rather perverse that the first time the table is
> vacuumed, it needs to rewrite the entire table.
*facepalm*
I don't know how hard that is to implement, but +1 for trying to
figure out a way.
It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages, since the code in hio that requests the relation to be extended already has info on the tuple's intended freeze status.
Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple is actually written into the page. Not only because clearing would defeat the purpose, but also because it will cause an error--apparently the incipient page is not yet in a state where visibilitymap_clear is willing to deal with it.
With this patch, you get a table which has PD_ALL_VISIBLE set for all pages, but which doesn't have a _vm file. Index-only scans will visit the heap for each tuple until the first VACUUM is done.
The first vacuum will read the entire table, but not need to write it anymore. And will create the _vm file.
I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that. Set a flag somewhere and then create it in bulk at the end of the transaction? Set it bit by bit as the pages are extended and initialized?
Cheers,
Jeff
Attachment
On Wed, Oct 21, 2015 at 1:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages, > since the code in hio that requests the relation to be extended already has > info on the tuple's intended freeze status. > > Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple > is actually written into the page. Not only because clearing would defeat > the purpose, but also because it will cause an error--apparently the > incipient page is not yet in a state where visibilitymap_clear is willing to > deal with it. Wouldn't it be better to instead fill the page with tuples first and THEN set PD_ALL_VISIBLE? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 18 October 2015 at 17:23, Jeff Janes <jeff.janes@gmail.com> wrote:
--
I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE. Or is there some reason this can't be done?Since the whole point of COPY FREEZE is to avoid needing to rewrite the entire table, it seems rather perverse that the first time the table is vacuumed, it needs to rewrite the entire table.
That's pretty darn weird. I remember measuring the benefit during testing. I remember Jeff D was talking about removing that flag at that time, so perhaps that's it.
Either way, my bug, my bad, thanks for the report.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:
--
On Tue, Oct 20, 2015 at 7:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:On Sun, Oct 18, 2015 at 5:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I'm planning on adding a todo item to have COPY FREEZE set PD_ALL_VISIBLE.
> Or is there some reason this can't be done?
>
> Since the whole point of COPY FREEZE is to avoid needing to rewrite the
> entire table, it seems rather perverse that the first time the table is
> vacuumed, it needs to rewrite the entire table.
*facepalm*
I don't know how hard that is to implement, but +1 for trying to
figure out a way.It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages, since the code in hio that requests the relation to be extended already has info on the tuple's intended freeze status.Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple is actually written into the page. Not only because clearing would defeat the purpose, but also because it will cause an error--apparently the incipient page is not yet in a state where visibilitymap_clear is willing to deal with it.With this patch, you get a table which has PD_ALL_VISIBLE set for all pages, but which doesn't have a _vm file.
Patch is simple enough. All usage looks safe, so I reckon this is good.
Index-only scans will visit the heap for each tuple until the first VACUUM is done.The first vacuum will read the entire table, but not need to write it anymore. And will create the _vm file.I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that. Set a flag somewhere and then create it in bulk at the end of the transaction? Set it bit by bit as the pages are extended and initialized?
Easy enough to do it at the end of the COPY FREEZE in one step. No need to wait until EOXact.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:Index-only scans will visit the heap for each tuple until the first VACUUM is done.The first vacuum will read the entire table, but not need to write it anymore. And will create the _vm file.I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that. Set a flag somewhere and then create it in bulk at the end of the transaction? Set it bit by bit as the pages are extended and initialized?Easy enough to do it at the end of the COPY FREEZE in one step.
Here, we might want to consider that setting bit in visibility map
will generate WAL log whereas Copy Freeze otherwise skip WAL
when wal_level is less than archive. This can lead to extra disk
writes which can slow down Copy Freeze, but OTOH that might
be acceptable.
On Fri, Oct 23, 2015 at 2:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Oct 21, 2015 at 1:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages,
> > since the code in hio that requests the relation to be extended already has
> > info on the tuple's intended freeze status.
> >
> > Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple
> > is actually written into the page. Not only because clearing would defeat
> > the purpose, but also because it will cause an error--apparently the
> > incipient page is not yet in a state where visibilitymap_clear is willing to
> > deal with it.
>
> Wouldn't it be better to instead fill the page with tuples first and
> THEN set PD_ALL_VISIBLE?
>
>
> On Wed, Oct 21, 2015 at 1:31 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > It turns out it was pretty easy to set PD_ALL_VISIBLE on the new pages,
> > since the code in hio that requests the relation to be extended already has
> > info on the tuple's intended freeze status.
> >
> > Then you just need to refrain from clearing PD_ALL_VISIBLE when that tuple
> > is actually written into the page. Not only because clearing would defeat
> > the purpose, but also because it will cause an error--apparently the
> > incipient page is not yet in a state where visibilitymap_clear is willing to
> > deal with it.
>
> Wouldn't it be better to instead fill the page with tuples first and
> THEN set PD_ALL_VISIBLE?
>
Ideally that would be better and if we want to do that way, then I think it
needs to be done at end of Copy Freeze operation, also for that we
need to mark the buffer as dirty again (do you see any other way
of achieving the same?). OTOH, I don't see any harm even if we do it
in the way as done in patch.
On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
--
On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:Index-only scans will visit the heap for each tuple until the first VACUUM is done.The first vacuum will read the entire table, but not need to write it anymore. And will create the _vm file.I think we really want to create _vm file as well as set PD_ALL_VISIBLE, but I don't know the best way to do that. Set a flag somewhere and then create it in bulk at the end of the transaction? Set it bit by bit as the pages are extended and initialized?Easy enough to do it at the end of the COPY FREEZE in one step.Here, we might want to consider that setting bit in visibility mapwill generate WAL log whereas Copy Freeze otherwise skip WALwhen wal_level is less than archive. This can lead to extra diskwrites which can slow down Copy Freeze, but OTOH that mightbe acceptable.
I'm building the map as I go, in the latest version of this patch I'm working on.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 3, 2015 at 8:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive. This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
>
> I'm building the map as I go, in the latest version of this patch I'm working on.
>
>
> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive. This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
>
> I'm building the map as I go, in the latest version of this patch I'm working on.
>
As, you are working on this patch, I have marked it as "Waiting on Author".
On Wed, Nov 4, 2015 at 12:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Nov 3, 2015 at 8:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> >>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> >>> wrote: >>>> >>>> Easy enough to do it at the end of the COPY FREEZE in one step. >>> >>> >>> Here, we might want to consider that setting bit in visibility map >>> will generate WAL log whereas Copy Freeze otherwise skip WAL >>> when wal_level is less than archive. This can lead to extra disk >>> writes which can slow down Copy Freeze, but OTOH that might >>> be acceptable. >> >> >> I'm building the map as I go, in the latest version of this patch I'm >> working on. >> > > As, you are working on this patch, I have marked it as "Waiting on Author". There has not been an update for this patch in a while, switched to "returned with feedback" because of lack of author's input. -- Michael
On Wed, Dec 23, 2015 at 6:14 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Nov 4, 2015 at 12:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Tue, Nov 3, 2015 at 8:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> >>> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> >>>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> >>>> wrote: >>>>> >>>>> Easy enough to do it at the end of the COPY FREEZE in one step. >>>> >>>> >>>> Here, we might want to consider that setting bit in visibility map >>>> will generate WAL log whereas Copy Freeze otherwise skip WAL >>>> when wal_level is less than archive. This can lead to extra disk >>>> writes which can slow down Copy Freeze, but OTOH that might >>>> be acceptable. >>> >>> >>> I'm building the map as I go, in the latest version of this patch I'm >>> working on. >>> >> >> As, you are working on this patch, I have marked it as "Waiting on Author". > > There has not been an update for this patch in a while, switched to > "returned with feedback" because of lack of author's input. And I've added Simon as an author, as I don't plan on picking it up again myself as long as he is working on his alternative. Thanks, Jeff
On Tue, Nov 3, 2015 at 6:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com> >> wrote: >>> >>> On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote: >>> >>>> Index-only scans will visit the heap for each tuple until the first >>>> VACUUM is done. >>>> >>>> The first vacuum will read the entire table, but not need to write it >>>> anymore. And will create the _vm file. >>>> >>>> I think we really want to create _vm file as well as set PD_ALL_VISIBLE, >>>> but I don't know the best way to do that. Set a flag somewhere and then >>>> create it in bulk at the end of the transaction? Set it bit by bit as the >>>> pages are extended and initialized? >>> >>> >>> Easy enough to do it at the end of the COPY FREEZE in one step. >> >> >> Here, we might want to consider that setting bit in visibility map >> will generate WAL log whereas Copy Freeze otherwise skip WAL >> when wal_level is less than archive. This can lead to extra disk >> writes which can slow down Copy Freeze, but OTOH that might >> be acceptable. > > > I'm building the map as I go, in the latest version of this patch I'm > working on. Hi Simon, Is this still on your radar? If you would like someone else to pick it up, can you post the WIP patch you have? Thanks, Jeff
Hello,
osmium export -c osmium.config -f pg belarus-latest.osm.pbf -v --progress | psql -1 -c 'create table byosm(geom geometry, osm_type text, osm_id bigint, tags jsonb);copy byosm from stdin freeze;'
Today I bumped into need to limit first VACUUM time on data import.
I'm using utility called osmium together with COPY FREEZE to import openstreetmap data into database.
osmium export -c osmium.config -f pg belarus-latest.osm.pbf -v --progress | psql -1 -c 'create table byosm(geom geometry, osm_type text, osm_id bigint, tags jsonb);copy byosm from stdin freeze;'
However, first pass of VACUUM rewrites the whole table. Here is two logs of VACUUM VERBOSE in a row:
In Russian Postgres Telegram group I've been recommended this thread.
Can the patch be revived? What is needed to get it up for 12?
On Sun, Aug 14, 2016 at 10:37 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Nov 3, 2015 at 6:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 3 November 2015 at 15:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Oct 23, 2015 at 6:29 AM, Simon Riggs <simon@2ndquadrant.com>
>> wrote:
>>>
>>> On 21 October 2015 at 13:31, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>
>>>> Index-only scans will visit the heap for each tuple until the first
>>>> VACUUM is done.
>>>>
>>>> The first vacuum will read the entire table, but not need to write it
>>>> anymore. And will create the _vm file.
>>>>
>>>> I think we really want to create _vm file as well as set PD_ALL_VISIBLE,
>>>> but I don't know the best way to do that. Set a flag somewhere and then
>>>> create it in bulk at the end of the transaction? Set it bit by bit as the
>>>> pages are extended and initialized?
>>>
>>>
>>> Easy enough to do it at the end of the COPY FREEZE in one step.
>>
>>
>> Here, we might want to consider that setting bit in visibility map
>> will generate WAL log whereas Copy Freeze otherwise skip WAL
>> when wal_level is less than archive. This can lead to extra disk
>> writes which can slow down Copy Freeze, but OTOH that might
>> be acceptable.
>
>
> I'm building the map as I go, in the latest version of this patch I'm
> working on.
Hi Simon,
Is this still on your radar? If you would like someone else to pick
it up, can you post the WIP patch you have?
Thanks,
Jeff
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
On Tue, Jan 15, 2019 at 8:48 PM Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
Hello,Today I bumped into need to limit first VACUUM time on data import.I'm using utility called osmium together with COPY FREEZE to import openstreetmap data into database.
osmium export -c osmium.config -f pg belarus-latest.osm.pbf -v --progress | psql -1 -c 'create table byosm(geom geometry, osm_type text, osm_id bigint, tags jsonb);copy byosm from stdin freeze;'However, first pass of VACUUM rewrites the whole table. Here is two logs of VACUUM VERBOSE in a row:In Russian Postgres Telegram group I've been recommended this thread.Can the patch be revived? What is needed to get it up for 12?
I posted a new patch [1] for consideration to include in PG12. I started a new thread because the patch is completely new and this thread was a bit too old.
Thanks,
Pavan
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
PostgreSQL Development, 24x7 Support, Training & Services