Thread: MySQL file system
Anyone heard about this? http://no.spam.ee/~tonu/mysqlfs.html -- ---------------------------------------------------- Ned Lilly e: ned@greatbridge.com Vice President w: www.greatbridge.com Evangelism / Hacker Relations v: 757.233.5523 Great Bridge, LLC f: 757.233.5555
On Tue, 16 Jan 2001, Ned Lilly wrote: > http://no.spam.ee/~tonu/mysqlfs.html Can I puke now? Please. -- Dominic J. Eidson "Baruk Khazad! Khazad ai-menu!" - Gimli ------------------------------------------------------------------------------- http://www.the-infinite.org/ http://www.the-infinite.org/~dominic/
Hehe, very amusing. (albeit vaporware for now) I guess they want to it with KOrbit, and some software translating cobra->mysql (mysql doesn't have corba bindings yet, to my knowledge). The same should be possible with postgres, and it actually DOES have corba bindings (albeit they are broken AFAIK :) -alex I'm sure something like this would be possible On Tue, 16 Jan 2001, Ned Lilly wrote: > Anyone heard about this? > > http://no.spam.ee/~tonu/mysqlfs.html > >
more on slashdot: http://slashdot.org/article.pl?sid=01/01/16/1855253&mode=nested Alex Pilosov wrote: > Hehe, very amusing. (albeit vaporware for now) > > I guess they want to it with KOrbit, and some software translating > cobra->mysql (mysql doesn't have corba bindings yet, to my knowledge). > > The same should be possible with postgres, and it actually DOES have corba > bindings (albeit they are broken AFAIK :) > > -alex > > I'm sure something like this would be possible > > On Tue, 16 Jan 2001, Ned Lilly wrote: > >> Anyone heard about this? >> >> http://no.spam.ee/~tonu/mysqlfs.html >> >> -- ---------------------------------------------------- Ned Lilly e: ned@greatbridge.com Vice President w: www.greatbridge.com Evangelism / Hacker Relations v: 757.233.5523 Great Bridge, LLC f: 757.233.5555
Ned Lilly wrote: > > Anyone heard about this? > > http://no.spam.ee/~tonu/mysqlfs.html I was bout to copy about 3000 RFCs onto my hard drive last night. Bowsing them online was just too painfully slow. Then I realized that although I had the disk space for the total file size, they were mostly smaller than a block on my disk, so I was going to loose alot of space. My first thought was, well postgress would stor that as one file - that might work. Then I thought "I'm trying to be FAST here -- any time spent on this is a distraction. Too bad there is no such thing as PostgreSQL/FS" It's not earth shattering, and sure there are other solutions. But it's cute and could be useful. Add some bugs and a few security compromises, and it almost looks like M$. -- Karl DeBisschop kdebisschop@alert.infoplease.com Learning Network/Information Please http://www.infoplease.com Netsaint Plugin Developer kdebisschop@users.sourceforge.net
Hi all - I'm not here to start a war, but it seems to me that there is a fairly large amount of MySQL bashing in this group. Why? What's the point? It seems to me that if PostgreSQL wants an enemy to fight it should be Microsoft SQLServer and Oracle. It seems to me that it's PostgreSQL/MySQL vs those two. Why fight amongst ourselves? Now, if there is something truly lacking in MySQL, that completely different because this stuff ends up in the archives and people make decisions based off what they find there. However, it needs to be kept professional, don't you think? What are my impressions going to be of PostgreSQL (and now Great Bridge since their name will pop up as well) if I search for "mysql vs postgresql" and see a lot of mysql bashing posts in thse lists? I'm going to think you all are a bunch of whining losers who can't fathom the possibility that there might be good (or even better) alternatives out there. That can't be good for PostgreSQL, can it? -philip
Hey Guys, I am using PHP to process this query - I get the error shown below: I can't see why I should get this error as I have other similar queries that work fine. $sql = "insert into projects values('$number', '$description', '$date', '$deadline', '$manager', '$csindex', '$status', '$dataentrycontact', '$company', '$unique')"; $sql_result = pg_exec($connection,$sql) or die ("Couldn't execute query."); Warning: 1 is not a PostgresSQL link index in /www/vhtdocs/testarea/intranet/addproject.php3 on line 23 Any help appreciated. Thanks, Abe
On Tue, 16 Jan 2001, Ned Lilly wrote: > Anyone heard about this? > > http://no.spam.ee/~tonu/mysqlfs.html That only confirms the criticisms that some have leveled at MySQL of being an SQL interface to a filesystem and not a true relational database management system. -- Brett http://www.chapelperilous.net/~bmccoy/ --------------------------------------------------------------------------- Boys, you have ALL been selected to LEAVE th' PLANET in 15 minutes!!
Karl DeBisschop wrote: > > Ned Lilly wrote: > > > > Anyone heard about this? > > > > http://no.spam.ee/~tonu/mysqlfs.html > > I was bout to copy about 3000 RFCs onto my hard drive last night. > Bowsing them online was just too painfully slow. > > Then I realized that although I had the disk space for the total file > size, they were mostly smaller than a block on my disk, so I was going > to loose alot of space. > > My first thought was, well postgress would stor that as one file - that > might work. Then I thought "I'm trying to be FAST here -- any time spent > on this is a distraction. Too bad there is no such thing as > PostgreSQL/FS" > So what is needed it to be able to mount a .zip or a .tar file as a file system with the loopback device. Mounting a database on the filesystem sounds pretty useless to me. -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com
* Joseph Shraibman <jks@selectacast.net> [010116 19:29] wrote: > Karl DeBisschop wrote: > > > > Ned Lilly wrote: > > > > > > Anyone heard about this? > > > > > > http://no.spam.ee/~tonu/mysqlfs.html > > > > I was bout to copy about 3000 RFCs onto my hard drive last night. > > Bowsing them online was just too painfully slow. > > > > Then I realized that although I had the disk space for the total file > > size, they were mostly smaller than a block on my disk, so I was going > > to loose alot of space. > > > > My first thought was, well postgress would stor that as one file - that > > might work. Then I thought "I'm trying to be FAST here -- any time spent > > on this is a distraction. Too bad there is no such thing as > > PostgreSQL/FS" > > > > So what is needed it to be able to mount a .zip or a .tar file as a file > system with the loopback device. Mounting a database on the filesystem > sounds pretty useless to me. Raw disk access allows: 1) removing the double buffering problem associated with read/write + shm 2) your own filesystem semantics and explicit block flushing versus whole file flushing (fsync) 3) removing layers of abstraction, you loose the FS abstraction and allow the database abstraction to take over which shortens the code path and gives complete control over the data to the database. However it limits: 1) doing backups, you must either have a utility that can read the 'dbfs' or use 'dd' to copy the entire disk. (this is ok, we have pg_dump) 2) taking advantage of built in filesystem advantages like snapshots, logging and other advanced features. (only things i can think of is snapshots to make an immediate backup but then again we have pg_dump as well as filesystems that can grow) 3) portability, not all OS's support character devices, the ones that don't can't safely guarantee data write ordering. 4) development time, as the database now has a filesystem module to maintain. I'm sure there's a few points that I missed but there are some advantages, Oracle does or did have a "raw disk" mode. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
> > So what is needed it to be able to mount a .zip or a .tar file as a file > > system with the loopback device. Mounting a database on the filesystem > > sounds pretty useless to me. > > Raw disk access allows: > > 1) removing the double buffering problem associated with read/write + shm > 2) your own filesystem semantics and explicit block flushing versus whole > file flushing (fsync) > 3) removing layers of abstraction, you loose the FS abstraction and allow > the database abstraction to take over which shortens the code path > and gives complete control over the data to the database. > > However it limits: > > 1) doing backups, you must either have a utility that can read the > 'dbfs' or use 'dd' to copy the entire disk. > (this is ok, we have pg_dump) > 2) taking advantage of built in filesystem advantages like snapshots, > logging and other advanced features. > (only things i can think of is snapshots to make an immediate backup > but then again we have pg_dump as well as filesystems that can grow) > 3) portability, not all OS's support character devices, the ones that > don't can't safely guarantee data write ordering. > 4) development time, as the database now has a filesystem module > to maintain. > > I'm sure there's a few points that I missed but there are some > advantages, Oracle does or did have a "raw disk" mode. > I think the conclusion we came up to in the past is that raw database access was important in the 80's with older file systems, but modern ones don't need raw access, and most database vendors are moving away from recommanding raw access. [ FYI, Linux ext2 is not a modern file system.] -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > > > > So what is needed it to be able to mount a .zip or a .tar file as a file > > > system with the loopback device. Mounting a database on the filesystem > > > sounds pretty useless to me. Which can be done with cramfs on Linux. Still could be a cute trick with PostgreSQL that could be somewhat handy from timeto time. <snip> > > I'm sure there's a few points that I missed but there are some > > advantages, Oracle does or did have a "raw disk" mode. > > > > I think the conclusion we came up to in the past is that raw database > access was important in the 80's with older file systems, but modern > ones don't need raw access, and most database vendors are moving away > from recommanding raw access. I think this was a different sense of 'raw access' - it meant 'allow the user to see a database as if it were a filesystem'if I read the post correctly. Not the same issue, I think. -- Karl DeBisschop kdebisschop@alert.infoplease.com Learning Network/Information Please http://www.infoplease.com Netsaint Plugin Developer kdebisschop@users.sourceforge.net
Philip Hallstrom wrote: > I'm not here to start a war, but it seems to me that there is a > fairly large amount of MySQL bashing in this group. Why? I believe it's a matter of different cultural backgrounds: core developers and old users are mostly concerned about portability, SQL standards compliance, result correctness. MySql community is geared toward speed, ease-of-use, web integration, hyped featurism. While all these goals belong to the other "camp" as well, it's a matter of approach. Personally, I don't like Monty and other's approach which seems to come from people who never opened a Database theory textbook (strictly IMVHO ofcourse). -- Alessio F. Bragadini alessio@albourne.com APL Financial Services http://village.albourne.com Nicosia, Cyprus phone: +357-2-755750 "It is more complicated than you think" -- The Eighth Networking Truth from RFC 1925
On Tue, Jan 16, 2001 at 17:36:24 -0500, Karl DeBisschop wrote: > Then I thought "I'm trying to be FAST here -- any time spent on this is a > distraction. Too bad there is no such thing as PostgreSQL/FS" There is/was, for Linux at least; see http://noframes.linuxjournal.com/lj-issues/issue42/1383.html I haven't been able to find the sources though. HTH, Ray -- I'm having trouble keeping my clever schemes separate from my sarcasm. Wally
On Tue, Jan 16, 2001 at 22:21:19 -0500, Joseph Shraibman wrote: > So what is needed it to be able to mount a .zip or a .tar file as a file > system with the loopback device. For Linux, there used to be a "userfs" that allowed this kind of thing. AFAIK development stopped quite some time ago, as there were numerous problems with deadlocks. > Mounting a database on the filesystem sounds pretty useless to me. Depends on the database, I guess. There is a succesful configuration management system that uses a database as its backend and exports trees/branches/versions via NFS. This combines the ease of use of a filesystem (rather than "projects" in some IDE) with the reliability (transactions) provided by a DBMS. Ray -- "Never trust a poll you haven't rigged yourself."
On Wed, 17 Jan 2001, J.H.M. Dassen (Ray) wrote: > > So what is needed it to be able to mount a .zip or a .tar file as a file > > system with the loopback device. > > For Linux, there used to be a "userfs" that allowed this kind of thing. > AFAIK development stopped quite some time ago, as there were numerous > problems with deadlocks. I am working (since a long time now), on a migration filesystem where you can extend a filesystem with data on tapes, NFS, etc, using a stacking filesystem (FIST). Migration can be scheduled or automatic, when the fs is full. It only migrates/unmigrates complete files. http://www-internal.alphanet.ch/~schaefer/mfs.html It's a bit your poor man's HSM. It is supposed it will be one day merged with filed (http://www-internal.alphanet.ch/~schaefer/filed.html). That's Linux-only, however, and still very gamma.
Um, I think that file servers may have use for it (DBFS), as they will want to store documents and media files in a database with various plug-ins available to be able to get into files, and more importantly, to be able to find them quickly when required. How do I find all documents with the keyword 'telecommunications'? I could do a grep -r, or I could have it all stored in a database with plug-ins having extracted the keywords on insert, and run an SQL query (there are better examples, this is just a simple one). I think the file system idea is a good one, *in a niche area*, and I think that Postgres is probably better suited to doing it than mySQL, given some of the things that Postgres can do, and the ORDB descriptor.
In Ned's case, I agree that a .tar.gz mount (well it's not actually a mount, simply a smart file browser) is probably the best solution. It's the way I do it, and I'm reasonably happy with it.
Cheers...
MikeA
-----Original Message-----
From: Joseph Shraibman [mailto:jks@selectacast.net]
Sent: 17 January 2001 03:21
To: Karl DeBisschop
Cc: Ned Lilly; PostgreSQL General
Subject: Re: [GENERAL] MySQL file system
Karl DeBisschop wrote:
>
> Ned Lilly wrote:
> >
> > Anyone heard about this?
> >
> > http://no.spam.ee/~tonu/mysqlfs.html
>
> I was bout to copy about 3000 RFCs onto my hard drive last night.
> Bowsing them online was just too painfully slow.
>
> Then I realized that although I had the disk space for the total file
> size, they were mostly smaller than a block on my disk, so I was going
> to loose alot of space.
>
> My first thought was, well postgress would stor that as one file - that
> might work. Then I thought "I'm trying to be FAST here -- any time spent
> on this is a distraction. Too bad there is no such thing as
> PostgreSQL/FS"
>
So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device. Mounting a database on the filesystem
sounds pretty useless to me.
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
Nick West - Global Infrastructure Manager.
This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.
www.mimesweeper.com
**********************************************************************
>Then I realized that although I had the disk space for the total file >size, they were mostly smaller than a block on my disk, so I was going >to loose alot of space. The solution to this is sub-allocation, which Novell has achieved. NOT a DB filesystem. >It's not earth shattering, and sure there are other solutions. But it's >cute and could be useful. Add some bugs and a few security compromises, >and it almost looks like M$. Heh, except the stuff from Microsoft is likely more useful ;) Rob Nelson rdnelson@co.centre.pa.us
>Raw disk access allows: If I'm correct, mysql is providing a filesystem, not a way to access raw disk, like Oracle does. Huge difference there - with a filesystem, you have overhead of FS *and* SQL at the same time. Rob Nelson rdnelson@co.centre.pa.us
Hello, I have what I think is a trivial problem. But I wasn't able to find a solution in the FAQ or in the docs I have. It has to do with case sensitivity. I am working with an application of PostrreSQL (7.02) and PHP4. My table name is (for example) avi_f_amm and contains a field named Number (capital N) The query select * from avi_f_amm works fine. The query select * from avi_f_amm where Number gt 1 does not because number (small cap n) is to a field in the table. In other words Postgresql does not recognize "Number" and thinks it is "number" Any idea of what is going on or where should I look in the documentation? Fausto -- Fausto Guzzetti CNR - IRPI Perugia via della Madonna Alta, 126 06128 PERUGIA (Italy) Tel. +39 075.505.4943 Fax. +39 075.505.1325 E.mail F.Guzzetti@irpi.pg.cnr.it
<snip> > >Then I realized that although I had the disk space for the total file > >size, they were mostly smaller than a block on my disk, so I was going > >to loose alot of space. > > The solution to this is sub-allocation, which Novell has achieved. NOT a DB > filesystem. > > >It's not earth shattering, and sure there are other solutions. But it's > >cute and could be useful. Add some bugs and a few security compromises, > >and it almost looks like M$. > > Heh, except the stuff from Microsoft is likely more useful ;) > > > Rob Nelson > rdnelson@co.centre.pa.us > Doesn't ReiserFS support sub-allocation? I have never used it but remember that it was supposed to be efficient in storage of small files.
I believe "Number" should work, refer to the postgres Manual: http://www.postgresql.org/docs/user/x524.htm - Stuart > -----Original Message----- > From: Fausto Guzzetti [SMTP:Fausto.Guzzetti@irpi.pg.cnr.it] > Sent: Wednesday, January 17, 2001 1:21 PM > To: PostgreSQL General > Subject: Re: MySQL file system > > > Hello, > > I have what I think is a trivial problem. But I wasn't able to find a > solution > in the FAQ or in the docs I have. > > It has to do with case sensitivity. I am working with an application of > PostrreSQL (7.02) and PHP4. > > My table name is (for example) avi_f_amm and contains a field named Number > (capital N) > > The query select * from avi_f_amm works fine. > The query select * from avi_f_amm where Number gt 1 does not because > number > (small cap n) is to a field in the table. > In other words Postgresql does not recognize "Number" and thinks it is > "number" > > Any idea of what is going on or where should I look in the documentation? > > Fausto > > -- > Fausto Guzzetti > CNR - IRPI Perugia > via della Madonna Alta, 126 > 06128 PERUGIA (Italy) > Tel. +39 075.505.4943 > Fax. +39 075.505.1325 > E.mail F.Guzzetti@irpi.pg.cnr.it >
* Robert D. Nelson <RDNELSON@co.centre.pa.us> [010117 05:17] wrote: > >Raw disk access allows: > > If I'm correct, mysql is providing a filesystem, not a way to access raw > disk, like Oracle does. Huge difference there - with a filesystem, you have > overhead of FS *and* SQL at the same time. Oh, so it's sort of like /proc for mysql? What a terrible waste of time and resources. :( -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
I wasn't following the thread too closely, but database for a filesystem has been done. BeOS uses a database for a filesystem as well as AS/400 and Mainframes. Adam Lang Systems Engineer Rutgers Casualty Insurance Company http://www.rutgersinsurance.com ----- Original Message ----- From: "Alfred Perlstein" <bright@wintelcom.net> To: "Robert D. Nelson" <RDNELSON@co.centre.pa.us> Cc: "Joseph Shraibman" <jks@selectacast.net>; "Karl DeBisschop" <karl@debisschop.net>; "Ned Lilly" <ned@greatbridge.com>; "PostgreSQL General" <pgsql-general@postgresql.org> Sent: Wednesday, January 17, 2001 12:23 PM Subject: Re: [GENERAL] MySQL file system > * Robert D. Nelson <RDNELSON@co.centre.pa.us> [010117 05:17] wrote: > > >Raw disk access allows: > > > > If I'm correct, mysql is providing a filesystem, not a way to access raw > > disk, like Oracle does. Huge difference there - with a filesystem, you have > > overhead of FS *and* SQL at the same time. > > Oh, so it's sort of like /proc for mysql? > > What a terrible waste of time and resources. :( > > -- > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > "I have the heart of a child; I keep it in a jar on my desk."
Is there a way to tune queries? I'm doing queries that join around 5-6 tables. All join fields are indexed either in hash (where tables are small enough and join is done on "="), or btree (big tables, not joined on "="). The tables have between several hundred and several tens of millions of records. The problem is that this tends to take a _VERY_ long time. I know that I'm asking for a bit much on such a huge task, but if I break the queries down manually into 10 simper ones that I could run manually with temporary tables, each of those would take a few seconds at most. The optimizer occasionally decides to do sequential scans, and this is probably what is killing the performance. Is there any way to give the optimizer hints? I suspect that it would help in a vast majority of cases if it looked ad the where clauses in views and selects on views and tried cutting down the working set through that, and then pruning down the rest as it went along. It just seems a bit strange that doing subqueries with temporary tables should be so much faster. Doing VACUUM ANALYZE often helps, but not always. Is there any way to give the optimizer hints on how to speed up selects on views that do big joins, both inner and outer? Another question - I have to do a join on the "<" operation. Something like: SELECT * FROM Table1, Table2 WHERE Table1.Field1 < Table2.Field2 ORDER BY Table1.Field1 DESC, Table2.Field2 DESC; This will give me a very large set of records. However, I only really need a few of those records. I only want the highest Field1, Field2 combination records for some other field in Table1. Effectively, something like: CREATE VIEW SomeView AS SELECT * FROM Table1, Table2 WHERE Table1.Field1 < Table2.Field2 ORDER BY Table1.Field1 DESC, Table2.Field2 DESC; and then doing: SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that DISTINCT would pick the first record returned for each of the different occurences in SomeView. Unfortunately it doesn't. By having a quick scan at the returned records, it doesn't seem to pick them in any particular order. Is this the case? And is there an SQL standard that says which records should DISTINCT return (first, last, or arbitrary)? And is there a way to enforce this behaviour, just in case the default behaviour changes in the future? Alternatively, can anyone think of a solution to this problem? Thanks. Gordan
On Thu, 18 Jan 2001, Gordan Bobic wrote: > Is there a way to tune queries? > > I'm doing queries that join around 5-6 tables. All join fields are indexed > either in hash (where tables are small enough and join is done on "="), or > btree (big tables, not joined on "="). The tables have between several > hundred and several tens of millions of records. The problem is that this > tends to take a _VERY_ long time. I know that I'm asking for a bit much on > such a huge task, but if I break the queries down manually into 10 simper > ones that I could run manually with temporary tables, each of those would > take a few seconds at most. The optimizer occasionally decides to do > sequential scans, and this is probably what is killing the performance. Is > there any way to give the optimizer hints? I suspect that it would help in > a vast majority of cases if it looked ad the where clauses in views and > selects on views and tried cutting down the working set through that, and > then pruning down the rest as it went along. It just seems a bit strange > that doing subqueries with temporary tables should be so much faster. Is it deciding that a lot of rows will match when it does the sequential scans? I guess a copy of the schema and query would help. > Doing VACUUM ANALYZE often helps, but not always. Is there any way to give > the optimizer hints on how to speed up selects on views that do big joins, > both inner and outer? > > Another question - I have to do a join on the "<" operation. Something > like: > > SELECT * FROM Table1, Table2 WHERE Table1.Field1 < Table2.Field2 ORDER BY > Table1.Field1 DESC, Table2.Field2 DESC; > > This will give me a very large set of records. However, I only really need > a few of those records. I only want the highest Field1, Field2 combination > records for some other field in Table1. Effectively, something like: > > CREATE VIEW SomeView AS SELECT * FROM Table1, Table2 WHERE Table1.Field1 < > Table2.Field2 ORDER BY Table1.Field1 DESC, Table2.Field2 DESC; > > and then doing: > > SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that > DISTINCT would pick the first record returned for each of the different > occurences in SomeView. > > Unfortunately it doesn't. By having a quick scan at the returned records, > it doesn't seem to pick them in any particular order. Is this the case? And > is there an SQL standard that says which records should DISTINCT return > (first, last, or arbitrary)? And is there a way to enforce this behaviour, > just in case the default behaviour changes in the future? DISTINCT ON is a non-standard "feature" and yes, without additional help, it will not pick a particular row (well, it's probably the first one it comes across). I believe if you use order by you can get the first/last by the metric you ordered by. So if you did Field3, Field1 DESC, Field2 DESC in the order by it should give you the highest Field1 valued row, and then higest Field2 for ties.
On Wed, Jan 17, 2001 at 02:21:01PM +0100, Fausto Guzzetti wrote: > > Hello, > > I have what I think is a trivial problem. But I wasn't able to find a solution > in the FAQ or in the docs I have. > > It has to do with case sensitivity. I am working with an application of > PostrreSQL (7.02) and PHP4. > > My table name is (for example) avi_f_amm and contains a field named Number > (capital N) > > The query select * from avi_f_amm works fine. > The query select * from avi_f_amm where Number gt 1 does not because number > (small cap n) is to a field in the table. > In other words Postgresql does not recognize "Number" and thinks it is "number" > > Any idea of what is going on or where should I look in the documentation? Try: select * from avi_f_amm where "Number" > 1; Cheers, Patrick
"Gordan Bobic" <gordan@freeuk.com> writes: > It just seems a bit strange > that doing subqueries with temporary tables should be so much faster. Hard to respond to this complaint without a specific example. > SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that > DISTINCT would pick the first record returned for each of the different > occurences in SomeView. It does, but kindly notice that you have not constrained what the first record returned is. See the example of SELECT DISTINCT ON usage in the SELECT reference page, http://www.postgresql.org/users-lounge/docs/7.0/postgres/sql-select.htm regards, tom lane
Hi. I'm getting a strange error message. It says: NOTICE: RegisterSharedInvalid: SI buffer overflow NOTICE: InvalidateSharedInvalid: cache state reset Is this bad/dangerous? What can I do to correct the problem? I'm using v7.1b3 from the RPMs. Thanks. Gordan
"Gordan Bobic" <gordan@freeuk.com> writes: > NOTICE: RegisterSharedInvalid: SI buffer overflow > NOTICE: InvalidateSharedInvalid: cache state reset > Is this bad/dangerous? What can I do to correct the problem? No, it's really just a debugging message that we never got around to downgrading to DEBUG level ... once upon a time that condition was risky, but I think it is safe now. If you see this a *lot* then you might have a performance problem due to catcache getting flushed all the time. An occasional occurrence is not something to worry over, however. regards, tom lane
This idea is such a popular (even old) one that Oracle developed it for 8i -- IFS. Yep, AS/400 has had it forever, and BeOS is another example. Informix has had its DataBlades for years, as well. In fact, Reiser-FS is an FS implemented on a DB, albeit probably not a SQL DB. AIX's LVM and JFS is extent/DB-based, as well. Let's see now, why would all those guys do that? (Now, some of those that aren't SQL-based probably won't allow SQL queries on files, so just think about those that do, for a minute).... Rather than asking why, a far better question is why not? There is SO much functionality to be gained here that it's silly to ask why. At a higher level, treating BLOBs as files and as DB entries simultaneously has so many uses, that one has trouble answering the question properly without the puzzled stare back at the questioner. Again, look at the above list, particularly at AS/400 -- the entire OS's FS sits on top of DB/2! For example, think how easy dynamically generated web sites could access online catalog information, with all those JPEG's, GIFs, PNGs, HTML files, Text files, .PDF's, etc., both in the DB and in the FS. This would be so much easier to maintain, when you have webmasters, web designers, artists, programmers, sysadmins, dba's, etc., all trying to manage a big, dynamic, graphics-rich web site. Who cares if the FS is a bit slow, as long as it's not too slow? That's not the point, anyway. The point is easy access to data: asset management, version control, the ability to access the same data as a file and as a BLOB simultaneously, the ability to replicate easier, the ability to use more tools on the same info, etc. It's not for speed, per se; instead, it's for accessibility. Think about this issue. You have some already compiled text-based program that works on binary files, but not on databases -- it was simply never designed into the program. How are you going to get your graphics BLOBs into that program? Oh yeah, let's write another program to transform our data into files, first, then after processing delete them in some cleanup routine.... Why? If you have a DB'ed FS, then file data can simultaneously have two views -- one for the DB and one as an FS. (You can easily reverse the scenario.) Not only does this save time and disk space; it saves you from having to pay for the most expensive element of all -- programmer time. BTW, once this FS-on-a-DB concept really sinks in, imagine how tightly integrated Linux/Unix apps could be written. Imagine if a bunch of GPL'ed software started coding for this and used this as a means to exchange data, all using a common set of libraries. You could get to the point of uniting files, BLOBs, data of all sorts, IPC, version control, etc., all under one umbrella, especially if XML was the means data was exchanged. Heck, distributed authentication, file access, data access, etc., could be improved greatly. Well, this paragraph sounds like flame bait, but really consider the ramifications. Also, read the next paragraph.... Something like this *has* existed for Postgres for a long time -- PGFS, by Brian Bartholomew. It's even supposedly matured with age. Unfortunately, I cannot get to http://www.wv.com/ (Working Version's main site). Working Version is a version control system that keeps old versions of files around in the FS. It uses PG as the back-end DB and lets you mount it like another FS. It's supposedly an awesome system, but where is it? It's not some clunky korbit thingy, either. (If someone can find it, please let me know by email, if possible.) The only thing I can find on this is from a Google search, which caches everything but the actual software: http://www.google.com/search?q=pgfs+postgres&num=100&hl=en&lr=lang_en&newwindow=1&safe=active Also, there is the Perl-FS that can be transformed into something like PGFS: http://www.assurdo.com/perlfs/ It allows you to write Perl code that can mount various protocols or data types as an FS, in user space. (One example is the ability to mount FTP sites, BTW.) The Linux Journal liked it so much that they covered it in the SysAdmin issue: http://www2.linuxjournal.com/lj-issues/issue42/1383.html Instead of ridiculing something you've never tried, consider that MySQL-FS, Oracle (IFS), Informix (DataBlades), AS/400 (DB/2), BeOS, and Reiser-FS are doing this today. Do you want to be left behind and let them tell us what it's good for? Or, do we want this for PG? (Reiser-FS, BTW, is FASTER than ext2, but has no SQL hooks). There were many posts on this on slashdot: http://slashdot.org/article.pl?sid=01/01/16/1855253&mode=thread (I wrote some comments here, as well, just look for mikehoskins) I, for one, want to see this succeed for MySQL, PostgreSQL, msql, etc. It's an awesome feature that doesn't need to be speedy because it can save HUMANS time. The question really is, "When do we want to catch up to everyone else?" We are always moving to higher levels of abstraction, anyway, so it's just a matter of time. PG should participate. Adam Lang wrote: > I wasn't following the thread too closely, but database for a filesystem has > been done. BeOS uses a database for a filesystem as well as AS/400 and > Mainframes. > > Adam Lang > Systems Engineer > Rutgers Casualty Insurance Company > http://www.rutgersinsurance.com > ----- Original Message ----- > From: "Alfred Perlstein" <bright@wintelcom.net> > To: "Robert D. Nelson" <RDNELSON@co.centre.pa.us> > Cc: "Joseph Shraibman" <jks@selectacast.net>; "Karl DeBisschop" > <karl@debisschop.net>; "Ned Lilly" <ned@greatbridge.com>; "PostgreSQL > General" <pgsql-general@postgresql.org> > Sent: Wednesday, January 17, 2001 12:23 PM > Subject: Re: [GENERAL] MySQL file system > > > * Robert D. Nelson <RDNELSON@co.centre.pa.us> [010117 05:17] wrote: > > > >Raw disk access allows: > > > > > > If I'm correct, mysql is providing a filesystem, not a way to access raw > > > disk, like Oracle does. Huge difference there - with a filesystem, you > have > > > overhead of FS *and* SQL at the same time. > > > > Oh, so it's sort of like /proc for mysql? > > > > What a terrible waste of time and resources. :( > > > > -- > > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > > "I have the heart of a child; I keep it in a jar on my desk."
This idea is such a popular (even old) one that Oracle developed it for 8i -- IFS. Yep, AS/400 has had it forever, and BeOS is another example. Informix has had its DataBlades for years, as well. In fact, Reiser-FS is an FS implemented on a DB, albeit probably not a SQL DB. AIX's LVM and JFS is extent/DB-based, as well. Let's see now, why would all those guys do that? (Now, some of those that aren't SQL-based probably won't allow SQL queries on files, so just think about those that do, for a minute).... Rather than asking why, a far better question is why not? There is SO much functionality to be gained here that it's silly to ask why. At a higher level, treating BLOBs as files and as DB entries simultaneously has so many uses, that one has trouble answering the question properly without the puzzled stare back at the questioner. Again, look at the above list, particularly at AS/400 -- the entire OS's FS sits on top of DB/2! For example, think how easy dynamically generated web sites could access online catalog information, with all those JPEG's, GIFs, PNGs, HTML files, Text files, .PDF's, etc., both in the DB and in the FS. This would be so much easier to maintain, when you have webmasters, web designers, artists, programmers, sysadmins, dba's, etc., all trying to manage a big, dynamic, graphics-rich web site. Who cares if the FS is a bit slow, as long as it's not too slow? That's not the point, anyway. The point is easy access to data: asset management, version control, the ability to access the same data as a file and as a BLOB simultaneously, the ability to replicate easier, the ability to use more tools on the same info, etc. It's not for speed, per se; instead, it's for accessibility. Think about this issue. You have some already compiled text-based program that works on binary files, but not on databases -- it was simply never designed into the program. How are you going to get your graphics BLOBs into that program? Oh yeah, let's write another program to transform our data into files, first, then after processing delete them in some cleanup routine.... Why? If you have a DB'ed FS, then file data can simultaneously have two views -- one for the DB and one as an FS. (You can easily reverse the scenario.) Not only does this save time and disk space; it saves you from having to pay for the most expensive element of all -- programmer time. BTW, once this FS-on-a-DB concept really sinks in, imagine how tightly integrated Linux/Unix apps could be written. Imagine if a bunch of GPL'ed software started coding for this and used this as a means to exchange data, all using a common set of libraries. You could get to the point of uniting files, BLOBs, data of all sorts, IPC, version control, etc., all under one umbrella, especially if XML was the means data was exchanged. Heck, distributed authentication, file access, data access, etc., could be improved greatly. Well, this paragraph sounds like flame bait, but really consider the ramifications. Also, read the next paragraph.... Something like this *has* existed for Postgres for a long time -- PGFS, by Brian Bartholomew. It's even supposedly matured with age. Unfortunately, I cannot get to http://www.wv.com/ (Working Version's main site). Working Version is a version control system that keeps old versions of files around in the FS. It uses PG as the back-end DB and lets you mount it like another FS. It's supposedly an awesome system, but where is it? It's not some clunky korbit thingy, either. (If someone can find it, please let me know by email, if possible.) The only thing I can find on this is from a Google search, which caches everything but the actual software: http://www.google.com/search?q=pgfs+postgres&num=100&hl=en&lr=lang_en&newwindow=1&safe=active Also, there is the Perl-FS that can be transformed into something like PGFS: http://www.assurdo.com/perlfs/ It allows you to write Perl code that can mount various protocols or data types as an FS, in user space. (One example is the ability to mount FTP sites, BTW.) Instead of ridiculing something you've never tried, consider that MySQL-FS, Oracle (IFS), Informix (DataBlades), AS/400 (DB/2), BeOS, and Reiser-FS are doing this today. Do you want to be left behind and let them tell us what it's good for? Or, do we want this for PG? (Reiser-FS, BTW, is FASTER than ext2, but has no SQL hooks). There were many posts on this on slashdot: http://slashdot.org/article.pl?sid=01/01/16/1855253&mode=thread (I wrote some comments here, as well, just look for mikehoskins) I, for one, want to see this succeed for MySQL, PostgreSQL, msql, etc. It's an awesome feature that doesn't need to be speedy because it can save HUMANS time. The question really is, "When do we want to catch up to everyone else?" We are always moving to higher levels of abstraction, anyway, so it's just a matter of time. PG should participate. Adam Lang wrote: > I wasn't following the thread too closely, but database for a filesystem has > been done. BeOS uses a database for a filesystem as well as AS/400 and > Mainframes. > > Adam Lang > Systems Engineer > Rutgers Casualty Insurance Company > http://www.rutgersinsurance.com > ----- Original Message ----- > From: "Alfred Perlstein" <bright@wintelcom.net> > To: "Robert D. Nelson" <RDNELSON@co.centre.pa.us> > Cc: "Joseph Shraibman" <jks@selectacast.net>; "Karl DeBisschop" > <karl@debisschop.net>; "Ned Lilly" <ned@greatbridge.com>; "PostgreSQL > General" <pgsql-general@postgresql.org> > Sent: Wednesday, January 17, 2001 12:23 PM > Subject: Re: [GENERAL] MySQL file system > > > * Robert D. Nelson <RDNELSON@co.centre.pa.us> [010117 05:17] wrote: > > > >Raw disk access allows: > > > > > > If I'm correct, mysql is providing a filesystem, not a way to access raw > > > disk, like Oracle does. Huge difference there - with a filesystem, you > have > > > overhead of FS *and* SQL at the same time. > > > > Oh, so it's sort of like /proc for mysql? > > > > What a terrible waste of time and resources. :( > > > > -- > > -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] > > "I have the heart of a child; I keep it in a jar on my desk."
On Tue, Jan 30, 2001 at 06:30:36PM -0600, Mike Hoskins wrote: <snip> > I, for one, want to see this succeed for MySQL, PostgreSQL, msql, etc. It's an > awesome feature that doesn't need to be speedy because it can save HUMANS time. > > The question really is, "When do we want to catch up to everyone else?" We are > always moving to higher levels of abstraction, anyway, so it's just a matter of > time. PG should participate. The rest of your argument is probably fine. I'm not qualified to argue the merits of this one way or another. But this passage's type of reasoning reminds me of the Linus/Tannenbaum microkernel debates. All the academics agreed that microkernel was _the_ way to go, and Linus disagreed for some very good reasons. So now we have Linux, and Minix has faded away. Yes, there are some working microkernels out there. But popularity or popular agreement does not make something a good idea. Can you say, "Windows"? Paul
What you're saying seems to be to have a data structure where the same data can be accessed in both the filesystem style and the RDBMs style. How does that work? How is the mapping done between both structures? Slapping a filesystem on top of a RDBMs doesn't do that does it? Most filesystems are basically databases already, just differently structured and featured databases. And so far most of them do their job pretty well. You move a folder/directory somewhere, and everything inside it moves. Tons of data are already arranged in that form. Though porting over data from one filesystem to another is not always straightforward, RDBMSes are far worse. Maybe what would be nice is not a filesystem based on a database, rather one influenced by databases. One with a decent fulltextindex for data and filenames, where you have the option to ignore or not ignore nonalphanumerics and still get an indexed search. Then perhaps we could do something like the following: select file.name from path "/var/logs/" where file.name like "%.log%' and file.lastmodified > '2000/1/1' and file.contents =~ 'te_st[0-9]+\.gif$' use index Checkpoints would be nice too. Then I can rollback to a known point if I screw up ;). In fact the SQL style interface doesn't have to be built in at all. Neither does the index have to be realtime. I suppose there could be an option to make it realtime if performance is not an issue. What could be done is to use some fast filesystem. Then we add tools to maintain indexes, for SQL style interfaces and other style interfaces. Checkpoints and rollbacks would be harder of course. Cheerio, Link.