Thread: BUG #14290: materialized view refresh doesn't use temp_tablespace
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDI5MApMb2dnZWQgYnk6ICAg ICAgICAgIFJpY2sgT3R0ZW4KRW1haWwgYWRkcmVzczogICAgICByb3R0ZW5A d2luZGZpc2gubmV0ClBvc3RncmVTUUwgdmVyc2lvbjogOS41LjQKT3BlcmF0 aW5nIHN5c3RlbTogICBNYWMgb3IgVWJ1bnR1CkRlc2NyaXB0aW9uOiAgICAg ICAgCgpJJ3ZlIHJlY2VudGx5IG9ic2VydmVkIHRoYXQgaWYgd2hpbGUgcmVm cmVzaGluZyBhIG1hdGVyaWFsaXplZCB2aWV3LCB0ZW1wCnNwYWNlIGlzIHJl cXVpcmVkLCBpdCB3aWxsIG5vdCB1c2UgdGhlIGRlZmF1bHQgdGVtcF90YWJs ZXNwYWNlcyBkZWZpbmVkIGluCnBvc3RncmVzcWwuY29uZi4gIEluc3RlYWQg aXQgdXNlcyB0aGUgdGFibGVzcGFjZSB3aGVyZSB0aGUgbWF0ZXJpYWxpemVk IHZpZXcKbGl2ZXMuDQoNClRoaXMgc3VycHJpc2VkIG1lIHRoZSBvdGhlciBk YXkgd2hlbiBteSBwcm9kdWN0aW9uIGRhdGFiYXNlIHJhbiBvdXQgb2YgZGlz awpzcGFjZS4gIEkgd2FzIGFibGUgdG8gY29uZmlybSB0aGUgYmVoYXZpb3Ig aW4gbXkgZGV2ZWxvcG1lbnQgZW52aXJvbm1lbnQuDQoNClRvIHJlcHJvZHVj ZToNCjEpIFNldCB1cCBhIHRlbXAgdGFibGVzcGFjZS4NCjIpIENvbmZpZ3Vy ZSB0ZW1wX3RhYmxlc3BhY2VzIGluIHBvc3RncnNxbC5jb25mIHRvIHVzZSB0 aGF0IHRhYmxlc3BhY2UuDQozKSBUdXJuIG9uIGxvZ2dpbmcgb2YgdGFibGVz cGFjZSBuYW1lcy4NCjQpIFJlZHVjZSB3b3JrX21lbSAoc28gdGhhdCB0ZW1w IHNwYWNlIGlzIG1vcmUgbGlrZWx5IHRvIGJlIHJlcXVpcmVkKS4NCjUpIEJv dW5jZSB0aGUgREIsIG9yIHJlbG9hZCB0aGUgcG9zdGdyZXNxbC5jb25mDQo2 KSBSZWZyZXNoIGEgZmFpcmx5IGxhcmdlIG1hdGVyaWFsaXplZCB2aWV3Lg0K NykgT2JzZXJ2ZSBpbiB0aGUgbG9ncyB3aGVyZSB0aGUgdGVtcCB0YWJsZXNw YWNlIHdhcyBjcmVhdGVkLg0KDQpZb3UgY2FuIGRvIGl0IGNvbmN1cnJlbnRs eSwgb3Igbm90LiAgSW4gYm90aCBjYXNlcyB0aGUgdGVtcG9yYXJ5IHRhYmxl cyBlbmQKdXAgaW4gdGhlIHNhbWUgdGFibGVzcGFjZSBhcyB0aGUgbWF0ZXJp YWxpemVkIHZpZXcsIG5vdCBpbiB0aGUKdGVtcF90YWJsZXNwYWNlLg0KDQpJ ZiB0aGlzIGNhbid0IGJlIGVhc2lseSBjaGFuZ2VkIHRvIHVzZSB0aGUgZGVm YXVsdCB0ZW1wIHRhYmxlc3BhY2VzLCBpdAp3b3VsZCBiZSBnb29kIGlmIHRo ZSBkb2N1bWVudGF0aW9uIHdlcmUgdXBkYXRlZCB0byBhbGVydCBtYXRlcmlh bGl6ZWQgdmlldwp1c2VycyB0byB0aGlzIGNhdmVhdC4NCg0KUmVmcmVzaGlu ZyBhIG1hdGVyaWFsaXplZCB2aWV3IG1heSB1c2UgYXMgbXVjaCB0ZW1wb3Jh cnkgc3BhY2UsIG9yIG1vcmUsCnRoYW4gaXRzIGN1cnJlbnQgc2l6ZS4gIEZv ciBleGFtcGxlLCBJIGhhdmUgYSAxMDBHIG1hdGVyaWFsaXplZCB2aWV3LiAK UmVmcmVzaGluZyBpdCB1c2VzIGFsbW9zdCAzMDBHIGFkZGl0aW9uYWwgZGlz ayB3aGlsZSB0aGUgcmVmcmVzaCBpcyBydW5uaW5nLgogDQoKCg==
On Fri, Aug 19, 2016 at 4:06 AM, <rotten@windfish.net> wrote: > I've recently observed that if while refreshing a materialized view, temp > space is required, it will not use the default temp_tablespaces defined in > postgresql.conf. Instead it uses the tablespace where the materialized view > lives. > > If this can't be easily changed to use the default temp tablespaces, it > would be good if the documentation were updated to alert materialized view > users to this caveat. > > Refreshing a materialized view may use as much temporary space, or more, > than its current size. For example, I have a 100G materialized view. > Refreshing it uses almost 300G additional disk while the refresh is running. Per the docs: https://www.postgresql.org/docs/9.6/static/runtime-config-client.html This variable specifies tablespaces in which to create temporary objects (temp tables and indexes on temp tables) when a CREATE command does not explicitly specify a tablespace. Temporary files for purposes such as sorting large data sets are also created in these tablespaces. And as far as I know, there is no concept of temporary object for matviews, so the documentation is correct IMO, and the behavior you are seeing expected. -- Michael
Refreshing materialized views can definitely create temporary objects. They even get logged when I turn on logging of temporary objects.
On 2016-08-24 03:47, Michael Paquier wrote:
On Fri, Aug 19, 2016 at 4:06 AM, <rotten@windfish.net> wrote:I've recently observed that if while refreshing a materialized view, temp space is required, it will not use the default temp_tablespaces defined in postgresql.conf. Instead it uses the tablespace where the materialized view lives. If this can't be easily changed to use the default temp tablespaces, it would be good if the documentation were updated to alert materialized view users to this caveat. Refreshing a materialized view may use as much temporary space, or more, than its current size. For example, I have a 100G materialized view. Refreshing it uses almost 300G additional disk while the refresh is running.Per the docs: https://www.postgresql.org/docs/9.6/static/runtime-config-client.html This variable specifies tablespaces in which to create temporary objects (temp tables and indexes on temp tables) when a CREATE command does not explicitly specify a tablespace. Temporary files for purposes such as sorting large data sets are also created in these tablespaces. And as far as I know, there is no concept of temporary object for matviews, so the documentation is correct IMO, and the behavior you are seeing expected.
On Thu, Aug 18, 2016 at 2:06 PM, <rotten@windfish.net> wrote: > I've recently observed that if while refreshing a materialized view, temp > space is required, it will not use the default temp_tablespaces defined in > postgresql.conf. Instead it uses the tablespace where the materialized view > lives. > > This surprised me the other day when my production database ran out of disk > space. I was able to confirm the behavior in my development environment. > > To reproduce: > 1) Set up a temp tablespace. > 2) Configure temp_tablespaces in postgrsql.conf to use that tablespace. > 3) Turn on logging of tablespace names. > 4) Reduce work_mem (so that temp space is more likely to be required). > 5) Bounce the DB, or reload the postgresql.conf > 6) Refresh a fairly large materialized view. > 7) Observe in the logs where the temp tablespace was created. > > You can do it concurrently, or not. In both cases the temporary tables end > up in the same tablespace as the materialized view, not in the > temp_tablespace. You are confusing two completely different things: temporary objects (created by statements starting with CREATE TEMPORARY) and temporary files (created when data spills to disk during, for example, a sort, hash, or materialization of data internal to processing some statement). The former are placed based on temp_tablespaces; the latter are normally placed in the base/pgsql_tmp/ subdirectory. It would be possible to place them underneath a tablespace specified by temp_tablespaces, and it might even be a good enhancement to implement, but that is not the normal or default location for temporary files. Out of curiosity, and to help justify this as a feature request worth pursuing, can you explain why you want to do this? For example, have you placed your temporary tablespace on a faster medium? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
The distinction between temporary objects that write to disk, and temporary files, never occurred to me before. I had always thought they were the same thing. I think it is a pretty confusing fine point.
Yes, I do have the temporary tablespace on the fastest disk since I thought that when I spilled out of memory, I'd still want the best performance I could get during those operations.
The temporary files generated during the refresh are several times larger than the final materialized view. This tells me they are probably doing much more I/O than the final (concurrent) write into the materialized view. It made sense to put those temporary writes onto the faster disk.
Additionally, in a tiered storage environment, I might put my regular tablespace on a highly redundant (ie, expensive) disk array, and then put my temporary tablespace on a fast, but cheaper disk array because I don't really need the redundancy in the temporary tables the way I do with my regular tables.
Another issue is a psychology of disk management. When I look at a 1000G "regular" tablespace that only has 200G of tables on it, I might think "oh, there is plenty of space there for more tables" or "oh, look at all the space I'm wasting, I should pare that back and save a few dollars".
Whereas, when I see an 800G temp tablespace on the database, even if it isn't full _at_the_moment_, I think to myself "wow, there must be some big transient queries that need that space which must be why it is there".
I now know that I need all that extra space in the "regular tablespace" to support refreshing the materialized views and that in general I should plan for lots of empty space in my regular tablespaces to support that. As I mentioned, I discovered this when a refresh view concurrently failed because it ran the regular tablespace out of disk, even though the disk was only at 40% capacity (when the refresh wasn't running) and I had a large temporary tablespace set up to support things like that. Fortunately I had everything on logical volume managers so it was easy to extend the regular tablespace on the fly once the issue became apparent. This could have been a much more painful thing to discover.
Lastly, if I had several tablespaces with these sorts of materialized views in them, they could all share the same temporary tablespace for the refresh. Which means I wouldn't need so much extra space in every tablespace (as long as they didn't all refresh at the same time).
On 2016-08-24 13:46, Kevin Grittner wrote:
You are confusing two completely different things: temporary
objects (created by statements starting with CREATE TEMPORARY) and temporary files (created when data spills to disk during, for example, a sort, hash, or materialization of data internal to processing some statement). The former are placed based on temp_tablespaces; the latter are normally placed in the base/pgsql_tmp/ subdirectory. It would be possible to place them underneath a tablespace specified by temp_tablespaces, and it might even be a good enhancement to implement, but that is not the normal or default location for temporary files. Out of curiosity, and to help justify this as a feature request worth pursuing, can you explain why you want to do this? For example, have you placed your temporary tablespace on a faster medium? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 24, 2016 at 1:21 PM, Rick Otten <rotten@windfish.net> wrote: > The distinction between temporary objects that write to disk, and temporary > files, never occurred to me before. I had always thought they were the same > thing. I think it is a pretty confusing fine point. On further investigation, both are supposed to use the tablespaces you specify, which does make this a bug. Will fix. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>>>> "Kevin" == Kevin Grittner <kgrittn@gmail.com> writes: Kevin> You are confusing two completely different things: temporary Kevin> objects (created by statements starting with CREATE TEMPORARY) Kevin> and temporary files (created when data spills to disk during, Kevin> for example, a sort, hash, or materialization of data internal Kevin> to processing some statement). The former are placed based on Kevin> temp_tablespaces; the latter are normally placed in the Kevin> base/pgsql_tmp/ subdirectory. This is not true. If temp_tablespaces is set, then all sort / hash / materialization files will use the list of tablespaces in temp_tablespaces on a rotational basis (to spread load). The only exception is for temp files that might have to outlive the current transaction; these are forced into the default tablespace of the current database. See storage/file/fd.c However, materialized view refresh (without CONCURRENTLY) needs the new copy of the matview's data to be in the same tablespace as the old copy, since it moves the data into place via a heap swap. This restriction does not apply to sort/hash/materialization files created by the matview's query. -- Andrew (irc:RhodiumToad)
>>>>> "rotten" == rotten <rotten@windfish.net> writes: rotten> I've recently observed that if while refreshing a materialized rotten> view, temp space is required, it will not use the default rotten> temp_tablespaces defined in postgresql.conf. Instead it uses rotten> the tablespace where the materialized view lives. rotten> This surprised me the other day when my production database ran rotten> out of disk space. I was able to confirm the behavior in my rotten> development environment. rotten> To reproduce: rotten> 1) Set up a temp tablespace. rotten> 2) Configure temp_tablespaces in postgrsql.conf to use that tablespace. rotten> 3) Turn on logging of tablespace names. rotten> 4) Reduce work_mem (so that temp space is more likely to be required). rotten> 5) Bounce the DB, or reload the postgresql.conf rotten> 6) Refresh a fairly large materialized view. rotten> 7) Observe in the logs where the temp tablespace was created. I can't reproduce this. postgres=# show server_version; server_version ---------------- 9.5.4 postgres=# create materialized view mvx1 as select * from generate_series(1,100000); SELECT 100000 postgres=# set temp_tablespaces = 't1'; SET postgres=# set log_temp_files = 0; SET postgres=# set work_mem = '64kB'; SET postgres=# refresh materialized view mvx1; LOG: temporary file: path "pg_tblspc/18002/PG_9.5_201510051/pgsql_tmp/pgsql_tmp91239.1", size 1400000 STATEMENT: refresh materialized view mvx1; REFRESH MATERIALIZED VIEW (this clearly shows that the temp file created by the FunctionScan of generate_series is located in tablespace oid 18002, which happens to be t1) -- Andrew (irc:RhodiumToad)
Is t1 also the tablespace for your materialized view? (It isn't clear from what you've posted below.)
In my environment my default tablespace for the database is different than the temp_tablespace.
On 2016-08-24 14:58, Andrew Gierth wrote:
"rotten" == rotten <rotten@windfish.net> writes:rotten> I've recently observed that if while refreshing a materializedrotten> view, temp space is required, it will not use the defaultrotten> temp_tablespaces defined in postgresql.conf. Instead it usesrotten> the tablespace where the materialized view lives. rotten> This surprised me the other day when my production database ranrotten> out of disk space. I was able to confirm the behavior in myrotten> development environment. rotten> To reproduce:rotten> 1) Set up a temp tablespace.rotten> 2) Configure temp_tablespaces in postgrsql.conf to use that tablespace.rotten> 3) Turn on logging of tablespace names.rotten> 4) Reduce work_mem (so that temp space is more likely to be required).rotten> 5) Bounce the DB, or reload the postgresql.confrotten> 6) Refresh a fairly large materialized view.rotten> 7) Observe in the logs where the temp tablespace was created. I can't reproduce this. postgres=# show server_version;server_version ----------------9.5.4 postgres=# create materialized view mvx1 as select * from generate_series(1,100000); SELECT 100000 postgres=# set temp_tablespaces = 't1'; SET postgres=# set log_temp_files = 0; SET postgres=# set work_mem = '64kB'; SET postgres=# refresh materialized view mvx1; LOG: temporary file: path "pg_tblspc/18002/PG_9.5_201510051/pgsql_tmp/pgsql_tmp91239.1", size 1400000 STATEMENT: refresh materialized view mvx1; REFRESH MATERIALIZED VIEW (this clearly shows that the temp file created by the FunctionScan of generate_series is located in tablespace oid 18002, which happens to be t1)
>>>>> "Rick" == Rick Otten <rotten@windfish.net> writes: Rick> Is t1 also the tablespace for your materialized view? (It isn't Rick> clear from what you've posted below.) No. The matview was created in pg_default. Rick> In my environment my default tablespace for the database is Rick> different than the temp_tablespace. Possibly relevant here is that to put a tablespace into temp_tablespaces, you must have CREATE permission on it. If you set the value in postgresql.conf rather than with SET, you may not see errors from lack of permission, instead tablespaces without permission will be silently omitted. -- Andrew (irc:RhodiumToad)
Aha. Maybe that is my problem. The refresh is silently failing to use TEMP_TABLESPACES.
When I explicitly do the grant create, the refresh then uses the temp tablespace as originally expected.
Thanks.
On 2016-08-24 16:14, Andrew Gierth wrote:
"Rick" == Rick Otten <rotten@windfish.net> writes:Rick> Is t1 also the tablespace for your materialized view? (It isn'tRick> clear from what you've posted below.) No. The matview was created in pg_default. Rick> In my environment my default tablespace for the database isRick> different than the temp_tablespace. Possibly relevant here is that to put a tablespace into temp_tablespaces, you must have CREATE permission on it. If you set the value in postgresql.conf rather than with SET, you may not see errors from lack of permission, instead tablespaces without permission will be silently omitted.