Thread: Closes Match
Hi there, I want to match a string against the database and get the closest (shorter) match. I was asking myself if its possible to realise this in SQL. So if the string is: abcdefg And in the database are entries like: a ab abc I just want to get back 'abc'. I looked around in the internet and the documentation but found no solution. Cheers Christian
On Mon, Jul 25, 2005 at 19:31:11 +0200, Christian Rusa <christian@rusa.at> wrote: > Hi there, > > I want to match a string against the database and get the closest > (shorter) match. > I was asking myself if its possible to realise this in SQL. > > So if the string is: > abcdefg > > And in the database are entries like: > a > ab > abc > > I just want to get back 'abc'. > > I looked around in the internet and the documentation but found no solution. You could ORDER BY the length of the strings descending and use LIMIT to return only one row. You can write a regular expression to do the matching. Something like: ^(a(b(c(d(e(fg?)?)?)?)?)?$
unsubscribe -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Bruno Wolff III Sent: Monday, July 25, 2005 10:50 AM To: Christian Rusa Cc: pgsql-sql@postgresql.org Subject: Re: [SQL] Closes Match On Mon, Jul 25, 2005 at 19:31:11 +0200, Christian Rusa <christian@rusa.at> wrote: > Hi there, > > I want to match a string against the database and get the closest > (shorter) match. > I was asking myself if its possible to realise this in SQL. > > So if the string is: > abcdefg > > And in the database are entries like: > a > ab > abc > > I just want to get back 'abc'. > > I looked around in the internet and the documentation but found no solution. You could ORDER BY the length of the strings descending and use LIMIT to return only one row. You can write a regular expression to do the matching. Something like: ^(a(b(c(d(e(fg?)?)?)?)?)?$ ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Hello Would you like to advice to use REINDEX DATABASE on regular basis ? if (yes) how it should be connected with VACUUM FULL ANALYZE which is run regularly ? (reindex before vacuum or vacuum before reindex?) else haw to determine _when_ to run REINDEX ? many thanks
> Hello > > Would you like to advice to use REINDEX DATABASE on regular basis ? > > if (yes) > how it should be connected with VACUUM FULL ANALYZE which is run > regularly ? (reindex before vacuum or vacuum before reindex?) > > else > haw to determine _when_ to run REINDEX ? If you are doing ordinary VACUUM ANALYZE frequently enough, it shouldn't be necessary to either VACUUM FULL or REINDEX. Back in the 7.2 days, there were sorts of update patterns that would mandate reindexing every so often, as you could get cases where index pages would be very sparsely populated. That was alleviated in version 7.3, I believe, and was clearly evident in 7.4. You know you need to REINDEX if analysis of an index shows that it is sparsely populated. This generally shows up if you do an analyze on the table and find an index has more pages than tuples. But if you run VACUUM reasonably frequently, this shouldn't be necessary... -- let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];; http://cbbrowne.com/info/rdbms.html Rules of the Evil Overlord #153. "My Legions of Terror will be an equal-opportunity employer. Conversely, when it is prophesied that no man can defeat me, I will keep in mind the increasing number of non-traditional gender roles." <http://www.eviloverlord.com/>
O Christopher Browne έγραψε στις Jul 26, 2005 : > > Hello > > > > Would you like to advice to use REINDEX DATABASE on regular basis ? > > > > if (yes) > > how it should be connected with VACUUM FULL ANALYZE which is run > > regularly ? (reindex before vacuum or vacuum before reindex?) > > > > else > > haw to determine _when_ to run REINDEX ? > > If you are doing ordinary VACUUM ANALYZE frequently enough, it > shouldn't be necessary to either VACUUM FULL or REINDEX. > > Back in the 7.2 days, there were sorts of update patterns that would > mandate reindexing every so often, as you could get cases where index > pages would be very sparsely populated. That was alleviated in > version 7.3, I believe, and was clearly evident in 7.4. > > You know you need to REINDEX if analysis of an index shows that it is > sparsely populated. This generally shows up if you do an analyze on > the table and find an index has more pages than tuples. > > But if you run VACUUM reasonably frequently, this shouldn't be > necessary... I vacuum analyze every hour, however after reindexdb my (currently) 2.47 GB db is reduced to 2.37 GB, thus helping both db-wise and freebsd_cache-wise boost performance. All the above in 7.4.6. I am not arguing that i have done detailed analysis of the situation, i'm just saying that i have witnessed a performance gain after running contrib/reindexdb every month (or so). > -- -Achilleus
achill@matrix.gatewaynet.com (Achilleus Mantzios) writes: > O Christopher Browne έγραψε στις Jul 26, 2005 : > >> > Hello >> > >> > Would you like to advice to use REINDEX DATABASE on regular basis ? >> > >> > if (yes) >> > how it should be connected with VACUUM FULL ANALYZE which is run >> > regularly ? (reindex before vacuum or vacuum before reindex?) >> > >> > else >> > haw to determine _when_ to run REINDEX ? >> >> If you are doing ordinary VACUUM ANALYZE frequently enough, it >> shouldn't be necessary to either VACUUM FULL or REINDEX. >> >> Back in the 7.2 days, there were sorts of update patterns that would >> mandate reindexing every so often, as you could get cases where index >> pages would be very sparsely populated. That was alleviated in >> version 7.3, I believe, and was clearly evident in 7.4. >> >> You know you need to REINDEX if analysis of an index shows that it is >> sparsely populated. This generally shows up if you do an analyze on >> the table and find an index has more pages than tuples. >> >> But if you run VACUUM reasonably frequently, this shouldn't be >> necessary... > > I vacuum analyze every hour, however after reindexdb > my (currently) 2.47 GB db is reduced to 2.37 GB, > thus helping both db-wise and freebsd_cache-wise boost performance. > > All the above in 7.4.6. > > I am not arguing that i have done detailed analysis of the > situation, i'm just saying that i have witnessed a performance gain after > running contrib/reindexdb every month (or so). That doesn't strike me as being a material improvement, and it comes at a pretty high cost. You can get a savings of about 4% of the space, but at the cost of taking an appreciable outage during which the database is not usable. I wouldn't expect the 4% savings in space to lead to a particularly measurable improvement in performance, certainly not one worth the outage. -- (format nil "~S@~S" "cbbrowne" "acm.org") http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. "I will not tell my Legions of Terror "And he must be taken alive!" The command will be: ``And try to take him alive if it is reasonably practical.''" <http://www.eviloverlord.com/>
O Chris Browne έγραψε στις Jul 26, 2005 : > achill@matrix.gatewaynet.com (Achilleus Mantzios) writes: > > O Christopher Browne έγραψε στις Jul 26, 2005 : > > > >> > Hello > >> > > >> > Would you like to advice to use REINDEX DATABASE on regular basis ? > >> > > >> > if (yes) > >> > how it should be connected with VACUUM FULL ANALYZE which is run > >> > regularly ? (reindex before vacuum or vacuum before reindex?) > >> > > >> > else > >> > haw to determine _when_ to run REINDEX ? > >> > >> If you are doing ordinary VACUUM ANALYZE frequently enough, it > >> shouldn't be necessary to either VACUUM FULL or REINDEX. > >> > >> Back in the 7.2 days, there were sorts of update patterns that would > >> mandate reindexing every so often, as you could get cases where index > >> pages would be very sparsely populated. That was alleviated in > >> version 7.3, I believe, and was clearly evident in 7.4. > >> > >> You know you need to REINDEX if analysis of an index shows that it is > >> sparsely populated. This generally shows up if you do an analyze on > >> the table and find an index has more pages than tuples. > >> > >> But if you run VACUUM reasonably frequently, this shouldn't be > >> necessary... > > > > I vacuum analyze every hour, however after reindexdb > > my (currently) 2.47 GB db is reduced to 2.37 GB, > > thus helping both db-wise and freebsd_cache-wise boost performance. > > > > All the above in 7.4.6. > > > > I am not arguing that i have done detailed analysis of the > > situation, i'm just saying that i have witnessed a performance gain after > > running contrib/reindexdb every month (or so). > > That doesn't strike me as being a material improvement, and it comes > at a pretty high cost. The numbers above are just the numbers reported after doing a live reindex on the spot after deciding to answer to this email. In the past (when the db was smaller) and for the same version, i have come across much "striking" numbers, and much more striking boost gain. (e.g. a 2+ GB db directory turning to a ~1.5 GB dir). BTW whats the limit of kern.ipc.shmmax for 32-bit intel? ~ 2^31 = ~ 2 GB. In those cases i noticed severe boost gain. (And of course the 1 hour vacuum analyze schedule was always there) > > You can get a savings of about 4% of the space, but at the cost of > taking an appreciable outage during which the database is not usable. 1st not all database is unusable, during the whole reindexdb run, 2nd outage outside office hours is acceptable for those who apply. > > I wouldn't expect the 4% savings in space to lead to a particularly > measurable improvement in performance, certainly not one worth the > outage. > I am just saying that the common saying "reindex is not needed for 7.4+" maybe is not true in all circumstances. -- -Achilleus
achill@matrix.gatewaynet.com (Achilleus Mantzios) writes: > O Chris Browne έγραψε στις Jul 26, 2005 : >> You can get a savings of about 4% of the space, but at the cost of >> taking an appreciable outage during which the database is not usable. > > 1st not all database is unusable, during the whole reindexdb run, > 2nd outage outside office hours is acceptable for those who apply. I'm not willing to assume that. I help support some applications where "outages outside office hours" are acceptable; I help support some other applications for which "office hours" are 24 hours per day, 7 days per week, and such an outage would be deemed decidedly unacceptable. >> I wouldn't expect the 4% savings in space to lead to a particularly >> measurable improvement in performance, certainly not one worth the >> outage. > I am just saying that the common saying "reindex is not needed for > 7.4+" maybe is not true in all circumstances. Let me quote the release notes for 7.4.1: "Make free space map efficiently reuse empty index pages, and other free space management improvements In previous releases, B-tree index pages that were left empty because of deleted rows could only be reused by rows withindex values similar to the rows originally indexed on that page. In 7.4, VACUUM records empty index pages and allowsthem to be reused for any future index rows." <http://www.postgresql.at/pg/pgsql/doc/html/release-7-4.html> In versions earlier than 7.4, running a REINDEX periodically was *essential* if you had update patterns consistent with the (remarkably common) scenario described above. This reason to reindex (which was the main reason we required reindexing when using 7.2) has been resolved and gone away in 7.4. There may be other factors that could mandate REINDEX; as far as I can tell, the main such factor that remains would be where a table sees enormous numbers of updates but is not VACUUMed often enough. _That_ scenario isn't consistent with what you describe, as it would be expected to involve a whole lot more than 4% growth in the size of the database. -- (format nil "~S@~S" "cbbrowne" "acm.org") http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. "I will not tell my Legions of Terror "And he must be taken alive!" The command will be: ``And try to take him alive if it is reasonably practical.''" <http://www.eviloverlord.com/>
Chris Browne <cbbrowne@acm.org> writes: > achill@matrix.gatewaynet.com (Achilleus Mantzios) writes: >> I am just saying that the common saying "reindex is not needed for >> 7.4+" maybe is not true in all circumstances. > ... > In versions earlier than 7.4, running a REINDEX periodically was > *essential* if you had update patterns consistent with the (remarkably > common) scenario described above. > This reason to reindex (which was the main reason we required > reindexing when using 7.2) has been resolved and gone away in 7.4. > There may be other factors that could mandate REINDEX; as far as I can > tell, the main such factor that remains would be where a table sees > enormous numbers of updates but is not VACUUMed often enough. I think the case that 7.4 resolved is where you have a continually-moving window of index values; for example, an index on a timestamp column in a table where you delete entries older than 30 days. Before 7.4, index pages for timestamps older than 30 days would become empty and then just sit there, with no other way to reclaim them than REINDEX. The case that isn't resolved yet is where you have a usage pattern that causes a lot of index pages to become mostly but not entirely empty. For example, your entries are timestamps, and you have a cleanup process that removes just 99 out of every 100 successive entries. This'll leave you with just a couple of index entries per page, which might not be infinite bloat but it's surely not too efficient. regards, tom lane
O Chris Browne έγραψε στις Jul 27, 2005 : > achill@matrix.gatewaynet.com (Achilleus Mantzios) writes: > > O Chris Browne έγραψε στις Jul 26, 2005 : > >> You can get a savings of about 4% of the space, but at the cost of > >> taking an appreciable outage during which the database is not usable. > > > > 1st not all database is unusable, during the whole reindexdb run, > > 2nd outage outside office hours is acceptable for those who apply. > > I'm not willing to assume that. > > I help support some applications where "outages outside office hours" > are acceptable; I help support some other applications for which > "office hours" are 24 hours per day, 7 days per week, and such an > outage would be deemed decidedly unacceptable. > > >> I wouldn't expect the 4% savings in space to lead to a particularly > >> measurable improvement in performance, certainly not one worth the > >> outage. > > > I am just saying that the common saying "reindex is not needed for > > 7.4+" maybe is not true in all circumstances. > > Let me quote the release notes for 7.4.1: > > "Make free space map efficiently reuse empty index pages, and other > free space management improvements > > In previous releases, B-tree index pages that were left empty > because of deleted rows could only be reused by rows with index > values similar to the rows originally indexed on that page. In > 7.4, VACUUM records empty index pages and allows them to be reused > for any future index rows." > > <http://www.postgresql.at/pg/pgsql/doc/html/release-7-4.html> > > In versions earlier than 7.4, running a REINDEX periodically was > *essential* if you had update patterns consistent with the (remarkably > common) scenario described above. > > This reason to reindex (which was the main reason we required > reindexing when using 7.2) has been resolved and gone away in 7.4. > > There may be other factors that could mandate REINDEX; as far as I can > tell, the main such factor that remains would be where a table sees > enormous numbers of updates but is not VACUUMed often enough. I'll try to postpone the next reindexdb at the end of august, and get some numbers then. However the big difference in performance as i told was near the 2GB "threshold", and at *that* point (and maybe for different reasons) performance gain was remarkable. > > _That_ scenario isn't consistent with what you describe, as it would > be expected to involve a whole lot more than 4% growth in the size of > the database. > -- -Achilleus
O Achilleus Mantzios έγραψε στις Jul 28, 2005 : > O Chris Browne έγραψε στις Jul 27, 2005 : > > > I'll try to postpone the next reindexdb at the end of august, > and get some numbers then. > However the big difference in performance as i told was near the 2GB > "threshold", and at *that* point (and maybe for different reasons) > performance gain was remarkable. > Well, today i run reindexdb and i noticed a decrease in database size, from 2890148K to 2527552K, about 12.54% decrease in size. However i have not an estimation of the distribution of the SQL commands (INSERT,UPDATE,DELETE) over August. I am running 7.4.6. > > -- -Achilleus
Hello all ! look template1=> \set a select template1=> \set b 1 template1=> :a :b; column ------ 1 hope this is only psql's feathure not the server's one;
On Wed, Nov 23, 2005 at 02:54:27PM +0300, Eugene E. wrote: > template1=> \set a select > template1=> \set b 1 > template1=> :a :b; > > column > ------ > 1 > > hope this is only psql's feathure not the server's one; What about this do you find surprising? Have you read the psql documentation regarding \set, variables, and interpolation? http://www.postgresql.org/docs/8.1/interactive/app-psql.html#APP-PSQL-VARIABLES test=> \set a select test=> \set b 1 test=> \set ECHO queries test=> :a :b; select 1;?column? ---------- 1 (1 row) -- Michael Fuhr
Hi all the serious problem with permissions is encountered NOTE: the following example is really useful but there is no room to describe it's use. db=# CREATE USER u; db=# CREATE TABLE t (i int, a text); db=# REVOKE all ON t FROM u; db=# GRANT update,insert,delete ON t TO u; db=# \c - u db=> INSERT INTO t VALUES (1,'x'); INSERT db=> UPDATE t SET a='y' WHERE i=1; ERROR: Permission denied for relation t; db=> UPDATE t SET a='y'; UPDATE 1) The user "u" is permitted but unable to perfom the operation ! 2) A user is able to update WHOLE table but unable to update ANY part of it ! Please examine the following patch and make your judgment: --- src/backend/executor/execMain.c.orig 2005-11-22 1:23:08.000000000 +0300 +++ src/backend/executor/execMain.c 2006-02-17 13:19:29.000000000 +0300 @@ -460,6 +460,16 @@ bool do_select_into; TupleDesc tupType; + if ( operation == CMD_UPDATE || operation == CMD_DELETE ) + { + ListCell *l; + foreach(l, parseTree->rtable) + { + RangeTblEntry *rte = lfirst(l); + rte->requiredPerms ^= ACL_SELECT; + } + } + /* * Do permissions checks. It's sufficient to examine the query's top * rangetable here --- subplan RTEs will be checked during
Hi Fellow PostgreSQL users, Just a question here. Is there any bad effect on the PostgreSQL performance If I encounter many fails on inserting records to database with primary key column. For example I have this table CREATE TABLE unique_items ( item_id text NOT NULL, CONSTRAINT unique_items_pkey PRIMARY KEY (item_id), CONSTRAINT unique_item_id_fk FOREIGN KEY (item_id) REFERENCES items (item_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE ) WITH OIDS; With these valu: Unique items ------------ Item001 Then I have a program that insert 1(one) million times like this: Insert into unique_items(item_id) values('Item001) Ofcourse we all know that it will fail because there is already a record in the database. Would there be any bad effect on the database or none? Regards, Ian I choose Polesoft Lockspam to fight spam, and you? http://www.polesoft.com/refer.html
On Mon, 2006-03-13 at 12:51 +0300, Eugene E. wrote: > Hi all > the serious problem with permissions is encountered > > NOTE: the following example is really useful but there is no room to > describe it's use. > > > db=# CREATE USER u; > db=# CREATE TABLE t (i int, a text); > db=# REVOKE all ON t FROM u; > db=# GRANT update,insert,delete ON t TO u; > db=# \c - u > > db=> INSERT INTO t VALUES (1,'x'); > INSERT > db=> UPDATE t SET a='y' WHERE i=1; > ERROR: Permission denied for relation t; > db=> UPDATE t SET a='y'; > UPDATE > > 1) The user "u" is permitted but unable to perfom the operation ! > 2) A user is able to update WHOLE table but unable to update ANY part of > it ! > Good chance this was on purpose. BEGIN; UPDATE compensation SET salary = salary WHERE name = 'Tom' and salary BETWEEN 50000 and 60000; -- No rows updated -- that's not Toms salary rollback; BEGIN; UPDATE compensation SETsalary = salary WHERE name = 'Tom' and salary BETWEEN 60000 and 70000; -- One row updated so I found the range,I need a raise! rollback; By allowing the user a where clause you grant them select privileges. You will find that delete works the same way. This is one of those times when per column permissions are useful. You could grant them select access on the "name" column but not the "salary" column. --
Rod Taylor wrote: > By allowing the user a where clause you grant them select privileges. > You will find that delete works the same way. > > This is one of those times when per column permissions are useful. You > could grant them select access on the "name" column but not the "salary" > column. If I understand clearly, the patch he posted modified things so that if the user issued an UPDATE command, the SELECT permission was required as well. Thus a user with UPDATE privileges but no SELECT was not allowed to execute the UPDATE command. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, 2006-03-13 at 10:46 -0400, Alvaro Herrera wrote: > Rod Taylor wrote: > > > By allowing the user a where clause you grant them select privileges. > > You will find that delete works the same way. > > > > This is one of those times when per column permissions are useful. You > > could grant them select access on the "name" column but not the "salary" > > column. > > If I understand clearly, the patch he posted modified things so that if > the user issued an UPDATE command, the SELECT permission was required as > well. Thus a user with UPDATE privileges but no SELECT was not allowed > to execute the UPDATE command. Okay, I got it backward. The exclamation mark behind the first point made me think it was an issue. What information can be retrieved from a structure by being able to update all rows? You possibly find the range of values (max/min via integer overflow) but still would not know which tuple is associated with which value. I don't have a use-case for update but the delete / insert combination is quite handy for doing data resets from a host which should not be able to retrieve the interim states.
"Eugene E." <sad@bankir.ru> writes: > db=# REVOKE all ON t FROM u; > db=# GRANT update,insert,delete ON t TO u; > db=# \c - u > db=> INSERT INTO t VALUES (1,'x'); > INSERT > db=> UPDATE t SET a='y' WHERE i=1; > ERROR: Permission denied for relation t; > db=> UPDATE t SET a='y'; > UPDATE This behavior is correct and as documented in the UPDATE reference page: You must have the UPDATE privilege on the table to update it, aswell as the SELECT privilege to any table whose values arereadin the expressions or condition. The use of "i" in the WHERE clause is what causes SELECT privilege to be needed. If we had per-column privileges then we could be finer-grained about it, but we don't (yet). > Please examine the following patch and make your judgment: This patch is so wrong it's not even worth discussing :-( regards, tom lane
Christian Paul B. Cosinas wrote: > > Then I have a program that insert 1(one) million times like this: > Insert into unique_items(item_id) values('Item001) > > Ofcourse we all know that it will fail because there is already a record in > the database. > > Would there be any bad effect on the database or none? No long-term effects. Obviously it takes time to run one million queries even if they are all inserts that fail. -- Richard Huxton Archonet Ltd
> What information can be retrieved from a structure by being able to > update all rows? Write a plpgsql function snoop(x) which inserts x into a table 'log' created by you, and also returns x.UPDATE users SET password=snoop(password).Read log table.Done. If you have WHERE rights on a table, you can guess any column content pretty quick. Just do a binary search. Some time ago I had a friend whose website had some SQL injection holes, and he said "so what ? who cares ? I have magicquotes" (lol), so I coded a python script which injected a "password BETWEEN x AND y" (using CHR() to avoid quotes) and narrowed the search. It took about 1 minute to get the password (which turned out to be md5 that resisted a few seconds to dictionary attack using whatever evil hax0rz tool).
Tom Lane wrote: > "Eugene E." <sad@bankir.ru> writes: > >>db=# REVOKE all ON t FROM u; >>db=# GRANT update,insert,delete ON t TO u; >>db=# \c - u > > >>db=> INSERT INTO t VALUES (1,'x'); >>INSERT >>db=> UPDATE t SET a='y' WHERE i=1; >>ERROR: Permission denied for relation t; >>db=> UPDATE t SET a='y'; >>UPDATE > > > This behavior is correct and as documented in the UPDATE reference page: Good if you have a strange behavior - just document it. quite good. > You must have the UPDATE privilege on the table to update it, as > well as the SELECT privilege to any table whose values are read > in the expressions or condition. This means that some privileges are NOT INDEPENDENT.
PFC wrote: > > >> What information can be retrieved from a structure by being able to >> update all rows? > > > Write a plpgsql function snoop(x) which inserts x into a table > 'log' created by you, and also returns x. > UPDATE users SET password=snoop(password). > Read log table. > Done. This trick is available _NOW_. (in the current state of permission system)
Eugene E. wrote: > PFC wrote: > >> >> >>> What information can be retrieved from a structure by being able to >>> update all rows? >> >> >> >> Write a plpgsql function snoop(x) which inserts x into a table >> 'log' created by you, and also returns x. >> UPDATE users SET password=snoop(password). >> Read log table. >> Done. > > > This trick is available _NOW_. > (in the current state of permission system) I AM WRONG ! sorry.
Hi, Eugene, Eugene E. wrote: > This means that some privileges are NOT INDEPENDENT. No, it means that the UPDATE operation needs both UPDATE and SELECT privileges. Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Markus Schaber wrote: > Hi, Eugene, > > Eugene E. wrote: > > >>This means that some privileges are NOT INDEPENDENT. > > > No, it means that the UPDATE operation needs both UPDATE and SELECT > privileges. > > Markus > thanx. I already clear this to me.