Thread: Creating a read/write virtual table?
Hi, I'm the author of the libferris virtual filesystem. Last year I added the ability to expose a ferris filesystem as a virtual table in SQLite [1] and would like to do much the same for PostgreSQL if possible.

I was digging into how to make the whole libferris-as-vtable thing happen with pg:
http://www.postgresql.org/docs/8.4/static/xfunc-c.html
It strikes me that in the returning-sets part:
http://www.postgresql.org/docs/8.4/static/xfunc-c.html#XFUNC-C-RETURN-SET
I might be able to prod the TupleDesc that is generated by get_call_result_type() to work out what should be returned (sketched below). There are two major drawbacks to this approach:

1) This runs as the postgres user on the server.
2) This is read only.

In my SQLite vtable, neither of the above restrictions applies. To mitigate (1) I was hopeful that there was a way to get the connection to PG on the server to fork/setuid, much as one can set up apache to run web responses as particular linux uids. Given the difference between \copy and COPY (client-side vs. server-side filesystem access), I don't know if such a restriction is easy to lift.

I think using a function restricts the results to being read only. Perhaps a second function could allow updates to be processed separately.

Any pointers / suggestions on how to do this sort of thing with PostgreSQL would be wonderful. Please CC any replies, as I'm not currently on the list.

[1] http://ldn.linuxfoundation.org/article/libferris-and-sqlite-a-powerful-combination-part-1
    https://ldn.linuxfoundation.org/node/26821
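For reference, a minimal sketch of the set-returning-function route described above, written against the 8.4 funcapi.h interface. The function name ferris_ls and the idea of keeping an open libferris handle in user_fctx are assumptions for illustration, not anything that ships with libferris:

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(ferris_ls);

    /*
     * Skeleton set-returning function.  On the first call we ask Pg what
     * row shape the caller expects; that TupleDesc is what a libferris
     * layer could inspect to decide which file attributes to return.
     */
    Datum
    ferris_ls(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;

        if (SRF_IS_FIRSTCALL())
        {
            MemoryContext oldcontext;
            TupleDesc     tupdesc;

            funcctx = SRF_FIRSTCALL_INIT();
            oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

            if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("function returning record called in context "
                                "that cannot accept type record")));

            funcctx->tuple_desc = BlessTupleDesc(tupdesc);
            /* funcctx->user_fctx would hold the open libferris context */

            MemoryContextSwitchTo(oldcontext);
        }

        funcctx = SRF_PERCALL_SETUP();

        /*
         * Per row: build Datum/isnull arrays matching tuple_desc, then
         *   SRF_RETURN_NEXT(funcctx,
         *                   HeapTupleGetDatum(heap_form_tuple(...)));
         * When the directory is exhausted:
         */
        SRF_RETURN_DONE(funcctx);
    }

Declared with something like CREATE FUNCTION ferris_ls(text) RETURNS SETOF record, the caller supplies a column definition list in the FROM clause, and that list is exactly what get_call_result_type() hands back as a TupleDesc.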
On 12/04/10 11:34, Ben Martin wrote:
> In my SQLite vtable, neither of the above restrictions applies. To
> mitigate (1) I was hopeful that there was a way to get the connection
> to PG on the server to fork/setuid

The pg backends are designed to run as one user. You might have issues setting it up to do otherwise.

Consider talking to your storage backend over a unix socket, pipe, tcp/ip socket, shared memory, or other suitable IPC mechanism from Pg. This also protects Pg from any issues caused by the storage backend doing things like spawning unexpected threads, corrupting the stack/heap, outright crashing, etc.

> I think using a function restricts the results to being read only.

If you're planning on writing a C-level UDF loaded via 'CREATE FUNCTION', then yes, it'll have to return a materialized data set, and there's no way to UPDATE/INSERT/DELETE against the function's results.

> Perhaps a second function could allow updates to be processed
> separately.

Yep. Though that's clumsy, I expect it'd work, and honestly it's probably your best bet.

> Any pointers / suggestions on how to do this sort of thing with
> PostgreSQL would be wonderful. Please CC any replies, as I'm not
> currently on the list.

AFAIK Pg's innards aren't currently very friendly toward extension with arbitrary user-provided table/database storage backends. Unlike (say) MySQL, there's no pluggable storage engine system.

That means Pg's planner/executor can make a lot of decisions based on things it "knows" about how Pg's data storage works, which speeds and simplifies it and lets it be much smarter. Pg can enforce strict rules about transactional scope, ACID, error handling, etc. based on knowledge of the data storage's capabilities and expectations. On the other hand, it means it's hard to plug in your own storage system. After all, how would your external system handle something like this:

BEGIN;
INSERT INTO my_external_storage(a, b) VALUES (1,2);
SAVEPOINT sp_a;
INSERT INTO my_external_storage(a, b) VALUES (3,4);
ROLLBACK TO SAVEPOINT sp_a;
INSERT INTO my_external_storage(a, b) VALUES (5,6);
COMMIT;

How would it honour SERIALIZABLE isolation if it doesn't itself maintain MVCC snapshots and have some way of mapping Pg's xids to its internal versions? Etc.

Pg would have to be able to error out on many commands with "not supported by underlying storage system" or the like. Yet sometimes it won't even know that until it's well into executing something.

SQLite has the advantage of simplicity. It doesn't have to worry about complicated concurrency rules, versioning, user-defined data type storage, etc. And MySQL was designed to handle pluggable backends, so it's prepared for storage backends not to be able to perform certain operations, and it makes some big sacrifices in its broader capabilities to permit them.

I guess at this point I have to ask: why do you want to do this? What will you gain?

--
Craig Ringer
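A minimal sketch of the out-of-process arrangement Craig suggests, with the UDF acting as a thin client to a helper daemon that links libferris. The socket path /tmp/ferrisd.sock and the one-line "LIST" request protocol are invented for this sketch; nothing like this ships with libferris or Pg:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    /*
     * Ask the (hypothetical) ferris helper daemon for a listing.  The
     * daemon, not the Pg backend, links libferris and touches the
     * filesystem, so a crash or stray thread there can't hurt Pg.
     */
    static int
    ferris_request(const char *path, char *reply, size_t replylen)
    {
        struct sockaddr_un addr;
        char    req[1024];
        int     fd, len;
        ssize_t n;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/tmp/ferrisd.sock", sizeof(addr.sun_path) - 1);

        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        {
            close(fd);
            return -1;
        }

        /* One request line out; one reply document back. */
        len = snprintf(req, sizeof(req), "LIST %s\n", path);
        if (len < 0 || write(fd, req, len) != len)
        {
            close(fd);
            return -1;
        }

        n = read(fd, reply, replylen - 1);
        close(fd);
        if (n < 0)
            return -1;
        reply[n] = '\0';
        return 0;
    }

A helper like this is also where the fork/setuid idea becomes practical: the daemon can authenticate the request and fork a child under the appropriate uid before touching the filesystem, without the Pg backend ever changing its own identity.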
On Mon, 2010-04-12 at 13:15 +0800, Craig Ringer wrote:
> On 12/04/10 11:34, Ben Martin wrote:
>
> > In my SQLite vtable, neither of the above restrictions applies. To
> > mitigate (1) I was hopeful that there was a way to get the connection
> > to PG on the server to fork/setuid
>
> The pg backends are designed to run as one user. You might have issues
> setting it up to do otherwise.

I suspected this would be the case. With apache it's somewhat different: it can set the uid and then access the filesystem, whereas having multiple users and processes accessing pg data files would be a much messier thing.

> Consider talking to your storage backend over a unix socket, pipe,
> tcp/ip socket, shared memory, or other suitable IPC mechanism from Pg.
> This also protects Pg from any issues caused by the storage backend
> doing things like spawning unexpected threads, corrupting the
> stack/heap, outright crashing, etc.

Hmm, this might be a good way to go anyway, just for the isolation. And actually, fork/exec'ing ferrisls --xml and massaging that into the return value might be a fairly quick and robust initial implementation (a rough sketch appears after this message). Thinking more broadly, being able to run "a program" and massage its xml/json/yaml output into the return value of a custom function might be a handy generic thing to have. I'll have a quick search around to make sure such a thing doesn't already exist...

> > I think using a function restricts the results to being read only.
>
> If you're planning on writing a C-level UDF loaded via 'CREATE
> FUNCTION', then yes, it'll have to return a materialized data set, and
> there's no way to UPDATE/INSERT/DELETE against the function's results.
>
> > Perhaps a second function could allow updates to be processed
> > separately.
>
> Yep. Though that's clumsy, I expect it'd work, and honestly it's
> probably your best bet.

OK.

> > Any pointers / suggestions on how to do this sort of thing with
> > PostgreSQL would be wonderful. Please CC any replies, as I'm not
> > currently on the list.
>
> AFAIK Pg's innards aren't currently very friendly toward extension with
> arbitrary user-provided table/database storage backends. Unlike
> (say) MySQL, there's no pluggable storage engine system.
>
> That means Pg's planner/executor can make a lot of decisions based on
> things it "knows" about how Pg's data storage works, which speeds and
> simplifies it and lets it be much smarter. Pg can enforce strict
> rules about transactional scope, ACID, error handling, etc. based on
> knowledge of the data storage's capabilities and expectations. On the
> other hand, it means it's hard to plug in your own storage system.

This all sounds very reasonable. Things also become a quagmire when the database needs to ask the table implementation about virtual indexes, their selectivity, and so on. Once again, that's not too much of a problem for smallish data sets and SQLite, where a full virtual table scan is acceptable for performance.

> After all, how would your external system handle something like this:
>
> BEGIN;
> INSERT INTO my_external_storage(a, b) VALUES (1,2);
> SAVEPOINT sp_a;
> INSERT INTO my_external_storage(a, b) VALUES (3,4);
> ROLLBACK TO SAVEPOINT sp_a;
> INSERT INTO my_external_storage(a, b) VALUES (5,6);
> COMMIT;
>
> How would it honour SERIALIZABLE isolation if it doesn't itself
> maintain MVCC snapshots and have some way of mapping Pg's xids to its
> internal versions? Etc.

This indeed gets very hairy. I have thought about trying to implement transactions in the filesystem, but for some virtual filesystems it is not really possible, especially when rollback itself could fail.

> Pg would have to be able to error out on many commands with "not
> supported by underlying storage system" or the like. Yet sometimes it
> won't even know that until it's well into executing something.
>
> SQLite has the advantage of simplicity. It doesn't have to worry about
> complicated concurrency rules, versioning, user-defined data type
> storage, etc. And MySQL was designed to handle pluggable backends, so
> it's prepared for storage backends not to be able to perform certain
> operations, and it makes some big sacrifices in its broader
> capabilities to permit them.
>
> I guess at this point I have to ask: why do you want to do this? What
> will you gain?

A few things I do with the SQLite engine are analysing apache log files or other separated files (like running awk with SQL as the query language), and interacting with web services as a table, e.g. selecting data from a google spreadsheet and a local sqlite database in a single query. As a fair number of my use cases simply involve introducing and joining data from other sources into queries, I think starting out with just a record-set-returning function would be a good thing.

This nicely avoids the issue that the filesystem can't really do MVCC or transactions properly, but gives access to some more diverse data in a query for folks who want to perform such queries.
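A rough sketch of the fork/exec idea above: run ferrisls --xml (the flag Ben mentions) under popen() and capture its output for later conversion into a result set. The helper name run_ferrisls_xml and the buffering details are illustrative only:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * Run "ferrisls --xml <url>" and return its stdout as a malloc()ed,
     * NUL-terminated string, or NULL on failure.  The caller would parse
     * this into tuples.
     */
    static char *
    run_ferrisls_xml(const char *url)
    {
        char    cmd[1024];
        char    buf[4096];
        char   *out;
        size_t  outlen = 0;
        size_t  n;
        FILE   *p;

        /* A real version must shell-quote url; skipped in this sketch. */
        snprintf(cmd, sizeof(cmd), "ferrisls --xml %s", url);

        p = popen(cmd, "r");
        if (p == NULL)
            return NULL;

        out = malloc(1);
        if (out == NULL)
        {
            pclose(p);
            return NULL;
        }
        out[0] = '\0';

        while ((n = fread(buf, 1, sizeof(buf), p)) > 0)
        {
            char *tmp = realloc(out, outlen + n + 1);
            if (tmp == NULL)
            {
                free(out);
                pclose(p);
                return NULL;
            }
            out = tmp;
            memcpy(out + outlen, buf, n);
            outlen += n;
            out[outlen] = '\0';
        }

        if (pclose(p) != 0)     /* non-zero exit status: treat as failure */
        {
            free(out);
            return NULL;
        }
        return out;
    }

The captured XML would then be parsed (with libxml2 or similar) and fed out through the set-returning function, one tuple per directory entry.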
On 12/04/10 17:52, Ben Martin wrote:
> A few things I do with the SQLite engine are analysing apache log files
> or other separated files (like running awk with SQL as the query
> language), and interacting with web services as a table, e.g. selecting
> data from a google spreadsheet and a local sqlite database in a single
> query. As a fair number of my use cases simply involve introducing and
> joining data from other sources into queries, I think starting out with
> just a record-set-returning function would be a good thing.
>
> This nicely avoids the issue that the filesystem can't really do MVCC
> or transactions properly, but gives access to some more diverse data in
> a query for folks who want to perform such queries.

If that's all you're after, you may well be able to use the existing COPY protocol to populate a temp table by outputting suitable CSV from your data source.

--
Craig Ringer
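A minimal sketch of that suggestion using libpq's COPY support, which is what psql's \copy drives under the hood. The table name ferris_import, its two text columns, and the premise that the caller has already rendered the external data to CSV are all assumptions for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    /*
     * Push CSV rendered from the external data source into a temp table
     * over the COPY protocol.  Returns 0 on success, -1 on failure.
     */
    static int
    load_csv(PGconn *conn, const char *csv)
    {
        PGresult *res;

        res = PQexec(conn, "CREATE TEMP TABLE ferris_import(a text, b text)");
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);

        res = PQexec(conn, "COPY ferris_import FROM STDIN WITH CSV");
        if (PQresultStatus(res) != PGRES_COPY_IN)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);

        if (PQputCopyData(conn, csv, (int) strlen(csv)) != 1 ||
            PQputCopyEnd(conn, NULL) != 1)
            return -1;

        /* COPY reports overall success or failure in one final result. */
        res = PQgetResult(conn);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            PQclear(res);
            return -1;
        }
        PQclear(res);
        return 0;
    }

Because the COPY runs over the normal client connection, the data is read with the client's filesystem access and credentials, which sidesteps the run-as-postgres problem from the first message; the temp table can then be joined like any other.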