Thread: Isnumeric function?
<p><font size="2">How could you determine if a value being inserted into a varchar column is numeric?</font><p><font size="2">Iwas thinking of using a Regular expression to find this, something like</font><p><font size="2">.. Where content~* '^[0-9]{1,10}'</font><p><font size="2">There must be an easier way like a isNumeric() function?</font><p><fontsize="2">Theo</font><table><tr><td bgcolor="#ffffff"><font color="#000000">______________________________________________________________________<br/>This email, including attachments,is intended only for the addressee<br />and may be confidential, privileged and subject to copyright. If you<br/>have received this email in error, please advise the sender and delete<br />it. If you are not the intended recipientof this email, you must not<br />use, copy or disclose its content to anyone. You must not copy or <br />communicateto others content that is confidential or subject to <br />copyright, unless you have the consent of the contentowner.<br /></font></td></tr></table>
On Tue, 2004-09-07 at 06:44, Theo Galanakis wrote: > How could you determine if a value being inserted into a varchar > column is numeric? > > I was thinking of using a Regular expression to find this, something > like > > .. Where content ~* '^[0-9]{1,10}' ~ '^[0-9]+$' Your version only checks the beginning of the string. > There must be an easier way like a isNumeric() function? Not that I know of. -- Oliver Elphick olly@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA ======================================== "For whosoever shall call upon the name of the Lord shall be saved." Romans 10:13
Oliver, Theo: > ~ '^[0-9]+$' Actually, I usually do: ~ '^[0-9]+\.?[0-9]*$' ... to include decimals. However, the above assumes that there is at least a "0" before the decimal; it would be nice to adapt it to matching a leading decimal (i.e. .057 ) as well. Can't see any easy way, though ... -- Josh Berkus Aglio Database Solutions San Francisco
On Wed, 2004-09-08 at 17:47, Josh Berkus wrote: > Oliver, Theo: > > > ~ '^[0-9]+$' > > Actually, I usually do: > > ~ '^[0-9]+\.?[0-9]*$' > > ... to include decimals. However, the above assumes that there is at least a > "0" before the decimal; it would be nice to adapt it to matching a leading > decimal (i.e. .057 ) as well. Can't see any easy way, though ... ~ '^([0-9]+|[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)$' -- Oliver Elphick olly@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA ======================================== "Put on the whole armor of God, that ye may be able to stand against thewiles of the devil." Ephesians 6:11
Theo, Oliver, > Any reason why you don't like ~ '^([0-9]?)+\.?[0-9]*$' ? Yes, because it also matches "." , which is not a valid numeric value. > ~ '^([0-9]+|[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)$' Ah, the brute force approach ;-) Actually, the above could be written: ~ '^([0-9]+)|([0-9]*\\.[0-9]+)$' ... though that still seems inelegant to me. Is there a regex expert in the house? -- Josh Berkus Aglio Database Solutions San Francisco
On Wed, 2004-09-08 at 18:48, Josh Berkus wrote: > Theo, Oliver, > > > Any reason why you don't like ~ '^([0-9]?)+\.?[0-9]*$' ? > > Yes, because it also matches "." , which is not a valid numeric value. > > > ~ '^([0-9]+|[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)$' > > Ah, the brute force approach ;-) Nothing like using a nice big hammer! > Actually, the above could be written: > > ~ '^([0-9]+)|([0-9]*\\.[0-9]+)$' But that doesn't allow a trailing decimal point. > > ... though that still seems inelegant to me. Is there a regex expert in the > house? All the elegant approaches I can think of match the empty string. There must be at least one digit and 0 or 1 decimal point with no other characters permitted. If you use this as a constraint, you could make it elegant and combine it with another constraint to exclude '' and '.'. -- Oliver Elphick olly@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA ======================================== "Put on the whole armor of God, that ye may be able to stand against thewiles of the devil." Ephesians 6:11
<p><font size="2">Thankyou all for your feedback. I actually only want to check for whole numbers, so the ~ '^[0-9]+$' expressionis good.</font><p><font size="2">The issue really is that our CMS system sometimes holds the value of primary keyswithin a "content" varchar column(don't ask!), which is a nightmare to search across. I tried applying an index acrossthe "content" varchar column and it failed.</font><p> <font size="2">error: btree item size 2744 exceeds maximum2713.</font><p><font size="2">I assume I had to change some server settings to extend the maximum, however in theend this column holds content, and even applying an index would be incredible slow to search across hundred of thousandsof "content" records looking for a primary key.</font><p><font size="2">So I came up with the following. A Insert/updatetrigger would call a procedure to check to see if the content is numeric(a whole number), if so would updatean indexed integer column called (content_numeric). Which would be the base column to search appon.</font><br /><p><fontsize="2">Here is the function anyway:</font><br /><font size="2">CREATE OR REPLACE FUNCTION update_content_node()</font><br/><font size="2"> RETURNS trigger AS</font><br /><font size="2">'</font><br /><font size="2">begin</font><br/><font size="2"> /* New function body */</font><br /><font size="2"> IF NEW.content ~ \'^[0-9]+$\'THEN</font><br /><font size="2"> NEW.content_numeric := NEW.content;</font><br /><font size="2"> ELSE</font><br/><font size="2"> NEW.content_numeric := null;</font><br /><font size="2"> END IF;</font><br /><font size="2"> RETURN NEW;</font><br /><font size="2">end;</font><br /><font size="2">'</font><br /><font size="2"> LANGUAGE'plpgsql' IMMUTABLE;</font><br /><p><font size="2">Does anyone have any better suggestions???</font><p><font size="2">Theo</font><br/><br /><table><tr><td bgcolor="#ffffff"><font color="#000000">______________________________________________________________________<br/>This email, including attachments,is intended only for the addressee<br />and may be confidential, privileged and subject to copyright. If you<br/>have received this email in error, please advise the sender and delete<br />it. If you are not the intended recipientof this email, you must not<br />use, copy or disclose its content to anyone. You must not copy or <br />communicateto others content that is confidential or subject to <br />copyright, unless you have the consent of the contentowner.<br /></font></td></tr></table>
Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> writes: > So I came up with the following. A Insert/update trigger would call a > procedure to check to see if the content is numeric(a whole number), if so > would update an indexed integer column called (content_numeric). Which would > be the base column to search appon. > CREATE OR REPLACE FUNCTION update_content_node() > RETURNS trigger AS > ' > begin > /* New function body */ > IF NEW.content ~ \'^[0-9]+$\' THEN > NEW.content_numeric := NEW.content; > ELSE > NEW.content_numeric := null; Hmm. Seems like you could get burnt by "content" that is by chance a long string of digits --- you'd get an integer overflow error at the attempt to assign to content_numeric. Can you make an assumption that indexable keys are at most 9 digits? If so thenIF NEW.content ~ \'^[0-9]{1,9}$\' THEN Or use a bigint column and crank up the number of digits appropriately. regards, tom lane
<p><font size="2">Thanks Tom,</font><br /><font size="2">Actually I did not attach the latest function, I did have a limitof 9 numerical characters, found that out when I applied the update to move all current numerical values to that column.</font><p><fontsize="2">Theo</font><p><font size="2">-----Original Message-----</font><br /><font size="2">From: TomLane [<a href="mailto:tgl@sss.pgh.pa.us">mailto:tgl@sss.pgh.pa.us</a>] </font><br /><font size="2">Sent: Thursday, 9 September2004 9:57 AM</font><br /><font size="2">To: Theo Galanakis</font><br /><font size="2">Cc: pgsql-sql@postgresql.org</font><br/><font size="2">Subject: Re: [SQL] Isnumeric function? </font><br /><p><font size="2">TheoGalanakis <Theo.Galanakis@lonelyplanet.com.au> writes:</font><br /><font size="2">> So I came up withthe following. A Insert/update trigger would call a </font><br /><font size="2">> procedure to check to see if thecontent is numeric(a whole number), </font><br /><font size="2">> if so would update an indexed integer column called(content_numeric). </font><br /><font size="2">> Which would be the base column to search appon.</font><p><fontsize="2">> CREATE OR REPLACE FUNCTION update_content_node()</font><br /><font size="2">> RETURNStrigger AS</font><br /><font size="2">> '</font><br /><font size="2">> begin</font><br /><font size="2">> /* New function body */</font><br /><font size="2">> IF NEW.content ~ \'^[0-9]+$\' THEN</font><br /><fontsize="2">> NEW.content_numeric := NEW.content;</font><br /><font size="2">> ELSE</font><br /><font size="2">> NEW.content_numeric := null;</font><p><font size="2">Hmm. Seems like you could get burnt by "content"that is by chance a long string of digits --- you'd get an integer overflow error at the attempt to assign to content_numeric. Can you make an assumption that indexable keys are at most 9 digits? If so then</font><p> <fontsize="2">IF NEW.content ~ \'^[0-9]{1,9}$\' THEN</font><br /><font size="2">Or use a bigint column and crank up the numberof digits appropriately.</font><p> <font size="2">regards, tom lane</font><table><tr><td bgcolor="#ffffff"><fontcolor="#000000">______________________________________________________________________<br />This email,including attachments, is intended only for the addressee<br />and may be confidential, privileged and subject to copyright.If you<br />have received this email in error, please advise the sender and delete<br />it. If you are not theintended recipient of this email, you must not<br />use, copy or disclose its content to anyone. You must not copy or<br />communicate to others content that is confidential or subject to <br />copyright, unless you have the consent ofthe content owner.<br /></font></td></tr></table>
Oliver Elphick wrote: >On Wed, 2004-09-08 at 18:48, Josh Berkus wrote: > > >>Theo, Oliver, >> >> >> >>>Any reason why you don't like ~ '^([0-9]?)+\.?[0-9]*$' ? >>> >>> >>Yes, because it also matches "." , which is not a valid numeric value. >> >> >> >>> ~ '^([0-9]+|[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)$' >>> >>> >>Ah, the brute force approach ;-) >> >> > >Nothing like using a nice big hammer! > > > Would "^([0-9]+\\.{0,1}[0-9]*|\\.[0-9]+)$" be a little cleaner? >>Actually, the above could be written: >> >>~ '^([0-9]+)|([0-9]*\\.[0-9]+)$' >> >> > >But that doesn't allow a trailing decimal point. > > > >>... though that still seems inelegant to me. Is there a regex expert in the >>house? >> >> > >All the elegant approaches I can think of match the empty string. There >must be at least one digit and 0 or 1 decimal point with no other >characters permitted. If you use this as a constraint, you could make >it elegant and combine it with another constraint to exclude '' and '.'. > >
Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> writes: > error: btree item size 2744 exceeds maximum 2713. > > I assume I had to change some server settings to extend the maximum, however I would guess the block size. But I'm just guessing. > in the end this column holds content, and even applying an index would be > incredible slow to search across hundred of thousands of "content" records > looking for a primary key. Perhaps you could have an indexed column that contains a crc32 hash? Then you could do searches by comparing crc32 which make for fast efficient integer index lookups. You should still include a comparison against the original content column since it is possible for there to be a rare crc32 collision. This doesn't let you do range lookups efficiently. But it does let you look up specific values quickly even when they aren't numeric. -- greg
Ok, how about this. At least it works in my testing. I have extended it to allow a negative sign (trailing also), which I would expect to be allowed in a comprehensive "isnumeric" function. If I am wrong, feel free to slap me around; although correcting the regex would be more constructive. ;-) create function isnumeric(text) returns boolean as ' select $1 ~ \'(-?([0-9]+\\.?[0-9]*|[0-9]*\\.?[0-9]+)|([0-9]+\\.?[0-9]*|[0-9]*\\.?[0-9]+)-?)\' 'language 'sql'; --- Thomas Swan <tswan@idigx.com> wrote: > Oliver Elphick wrote: > > >On Wed, 2004-09-08 at 18:48, Josh Berkus wrote: > > > > > >>Theo, Oliver, > >> > >> > >> > >>>Any reason why you don't like ~ > '^([0-9]?)+\.?[0-9]*$' ? > >>> > >>> > >>Yes, because it also matches "." , which is not a > valid numeric value. > >> > >> > >> > >>> ~ '^([0-9]+|[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)$' > >>> > >>> > >>Ah, the brute force approach ;-) > >> > >> > > > >Nothing like using a nice big hammer! > > > > > > > Would "^([0-9]+\\.{0,1}[0-9]*|\\.[0-9]+)$" be a > little cleaner? > > >>Actually, the above could be written: > >> > >>~ '^([0-9]+)|([0-9]*\\.[0-9]+)$' > >> > >> > > > >But that doesn't allow a trailing decimal point. > > > > > > > >>... though that still seems inelegant to me. Is > there a regex expert in the > >>house? > >> > >> > > > >All the elegant approaches I can think of match the > empty string. There > >must be at least one digit and 0 or 1 decimal point > with no other > >characters permitted. If you use this as a > constraint, you could make > >it elegant and combine it with another constraint > to exclude '' and '.'. > > > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > __________________________________ Do you Yahoo!? Yahoo! Mail - 50x more storage than other providers! http://promotions.yahoo.com/new_mail
How about this CREATE OR REPLACE FUNCTION is_numeric ( text ) RETURNS bool AS ' if { [string is integer $1] || [string is double $1] } { return true } return false ' LANGUAGE 'pltcl' IMMUTABLE; SELECT is_numeric ( '-1' );is_numeric ------------t (1 row) SELECT is_numeric ( '+1e-1' );is_numeric ------------t (1 row) SELECT is_numeric ( '1.1.1' );is_numeric ------------f (1 row) -- Vadim Passynkov -----Original Message----- From: Jeff Eckermann [mailto:jeff_eckermann@yahoo.com] Sent: Thursday, September 09, 2004 10:02 AM To: Thomas Swan; olly@lfix.co.uk Cc: Josh Berkus; Theo Galanakis; pgsql-sql@postgresql.org Subject: Re: [SQL] Isnumeric function? Ok, how about this. At least it works in my testing. I have extended it to allow a negative sign (trailing also), which I would expect to be allowed in a comprehensive "isnumeric" function. If I am wrong, feel free to slap me around; although correcting the regex would be more constructive. ;-) create function isnumeric(text) returns boolean as ' select $1 ~ \'(-?([0-9]+\\.?[0-9]*|[0-9]*\\.?[0-9]+)|([0-9]+\\.?[0-9]*|[0-9]*\\.?[0-9]+) -?)\' 'language 'sql'; --- Thomas Swan <tswan@idigx.com> wrote: > Oliver Elphick wrote: > > >On Wed, 2004-09-08 at 18:48, Josh Berkus wrote: > > > > > >>Theo, Oliver, > >> > >> > >> > >>>Any reason why you don't like ~ > '^([0-9]?)+\.?[0-9]*$' ? > >>> > >>> > >>Yes, because it also matches "." , which is not a > valid numeric value. > >> > >> > >> > >>> ~ '^([0-9]+|[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)$' > >>> > >>> > >>Ah, the brute force approach ;-) > >> > >> > > > >Nothing like using a nice big hammer! > > > > > > > Would "^([0-9]+\\.{0,1}[0-9]*|\\.[0-9]+)$" be a > little cleaner? > > >>Actually, the above could be written: > >> > >>~ '^([0-9]+)|([0-9]*\\.[0-9]+)$' > >> > >> > > > >But that doesn't allow a trailing decimal point. > > > > > > > >>... though that still seems inelegant to me. Is > there a regex expert in the > >>house? > >> > >> > > > >All the elegant approaches I can think of match the > empty string. There > >must be at least one digit and 0 or 1 decimal point > with no other > >characters permitted. If you use this as a > constraint, you could make > >it elegant and combine it with another constraint > to exclude '' and '.'. > > > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > __________________________________ Do you Yahoo!? Yahoo! Mail - 50x more storage than other providers! http://promotions.yahoo.com/new_mail ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Theo, > Does anyone have any better suggestions??? Well, one suggestion would be to take a machete to your application. Putting key references and text data in the same column? Sheesh. If that's not an option, in addition to the approach you've taken, you could also do a partial index on the appropriate numeric values: CREATE INDEX idx_content_numeric ON botched_table(content) WHERE content ~ '^[0-9]{1,9}$'; However, this approach may be more/less effective that the segregation approach you've already taken. -- Josh Berkus Aglio Database Solutions San Francisco
<p><font size="2">I was just thinking, wouldn't it be great if the pg community had a site where anyone could contributetheir generic functions, or request for a particular function.</font><p><font size="2">Cold Fusion has a cflib.org,perhaps a pglib.org?</font><br /><p><font size="2">-----Original Message-----</font><br /><font size="2">From:Jeff Eckermann [<a href="mailto:jeff_eckermann@yahoo.com">mailto:jeff_eckermann@yahoo.com</a>] </font><br /><fontsize="2">Sent: Friday, 10 September 2004 12:02 AM</font><br /><font size="2">To: Thomas Swan; olly@lfix.co.uk</font><br/><font size="2">Cc: Josh Berkus; Theo Galanakis; pgsql-sql@postgresql.org</font><br /><font size="2">Subject:Re: [SQL] Isnumeric function?</font><br /><p><font size="2">Ok, how about this. At least it works in mytesting. </font><br /><font size="2">I have extended it to allow a negative sign (trailing</font><br /><font size="2">also),which I would expect to be allowed in a</font><br /><font size="2">comprehensive "isnumeric" function. IfI am wrong,</font><br /><font size="2">feel free to slap me around; although correcting the</font><br /><font size="2">regexwould be more constructive. ;-)</font><p><font size="2">create function isnumeric(text) returns boolean as'</font><br /><font size="2">select $1 ~ \'(-?([0-9]+\\.?[0-9]*|[0-9]*\\.?[0-9]+)|([0-9]+\\.?[0-9]*|[0-9]*\\.?[0-9]+)-?)\'</font><br/><font size="2">'language 'sql';</font><p><fontsize="2">--- Thomas Swan <tswan@idigx.com> wrote:</font><p><font size="2">> Oliver Elphickwrote:</font><br /><font size="2">> </font><br /><font size="2">> >On Wed, 2004-09-08 at 18:48, Josh Berkuswrote:</font><br /><font size="2">> > </font><br /><font size="2">> ></font><br /><font size="2">>>>Theo, Oliver,</font><br /><font size="2">> >></font><br /><font size="2">> >> </font><br/><font size="2">> >></font><br /><font size="2">> >>>Any reason why you don't like ~</font><br/><font size="2">> '^([0-9]?)+\.?[0-9]*$' ?</font><br /><font size="2">> >>> </font><br /><fontsize="2">> >>></font><br /><font size="2">> >>Yes, because it also matches "." , which is nota</font><br /><font size="2">> valid numeric value.</font><br /><font size="2">> >></font><br /><font size="2">>>> </font><br /><font size="2">> >></font><br /><font size="2">> >>> ~ '^([0-9]+|[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)$'</font><br/><font size="2">> >>> </font><br /><font size="2">>>>></font><br /><font size="2">> >>Ah, the brute force approach ;-)</font><br /><font size="2">>>> </font><br /><font size="2">> >></font><br /><font size="2">> ></font><br /><fontsize="2">> >Nothing like using a nice big hammer!</font><br /><font size="2">> ></font><br /><font size="2">>> </font><br /><font size="2">> ></font><br /><font size="2">> Would "^([0-9]+\\.{0,1}[0-9]*|\\.[0-9]+)$"be a</font><br /><font size="2">> little cleaner?</font><br /><font size="2">></font><br /><font size="2">> >>Actually, the above could be written:</font><br /><font size="2">>>></font><br /><font size="2">> >>~ '^([0-9]+)|([0-9]*\\.[0-9]+)$'</font><br /><font size="2">>>> </font><br /><font size="2">> >></font><br /><font size="2">> ></font><br /><fontsize="2">> >But that doesn't allow a trailing decimal point.</font><br /><font size="2">> ></font><br/><font size="2">> > </font><br /><font size="2">> ></font><br /><font size="2">> >>...though that still seems inelegant to me. Is</font><br /><font size="2">> there a regex expert in the</font><br/><font size="2">> >>house?</font><br /><font size="2">> >> </font><br /><font size="2">>>></font><br /><font size="2">> ></font><br /><font size="2">> >All the elegant approachesI can think of match the</font><br /><font size="2">> empty string. There</font><br /><font size="2">> >mustbe at least one digit and 0 or 1 decimal point</font><br /><font size="2">> with no other</font><br /><font size="2">>>characters permitted. If you use this as a</font><br /><font size="2">> constraint, you could make</font><br/><font size="2">> >it elegant and combine it with another constraint</font><br /><font size="2">>to exclude '' and '.'.</font><br /><font size="2">> > </font><br /><font size="2">> ></font><br/><font size="2">> </font><br /><font size="2">> </font><br /><font size="2">> ---------------------------(endof</font><br /><font size="2">> broadcast)---------------------------</font><br /><fontsize="2">> TIP 6: Have you searched our list archives?</font><br /><font size="2">> </font><br /><font size="2">> <a href="http://archives.postgresql.org" target="_blank">http://archives.postgresql.org</a></font><br/><font size="2">> </font><br /><br /><p> <br/><font size="2">__________________________________</font><br /><font size="2">Do you Yahoo!?</font><br /><font size="2">Yahoo!Mail - 50x more storage than other providers! <a href="http://promotions.yahoo.com/new_mail" target="_blank">http://promotions.yahoo.com/new_mail</a></font><table><tr><tdbgcolor="#ffffff"><font color="#000000">______________________________________________________________________<br/>This email, including attachments,is intended only for the addressee<br />and may be confidential, privileged and subject to copyright. If you<br/>have received this email in error, please advise the sender and delete<br />it. If you are not the intended recipientof this email, you must not<br />use, copy or disclose its content to anyone. You must not copy or <br />communicateto others content that is confidential or subject to <br />copyright, unless you have the consent of the contentowner.<br /></font></td></tr></table>
Theo, > I was just thinking, wouldn't it be great if the pg community had a site > where anyone could contribute their generic functions, or request for a > particular function. In theory, this is supposed to be a feature of pgFoundry.org. However, there is a bug in gForge that prevents us from using it right now, and fixing the bug is complicated. -- Josh Berkus Aglio Database Solutions San Francisco
<br /><p><font size="2">Josh,</font><p><font size="2">I agree with the machete technique, unfortunately The structure isinplace and a work-around was required.</font><p><font size="2">I created the Index you specified, however it chooses torun a seq scan on the column rather than a Index scan. How can you force it to use that Index..</font><p><font size="2">CREATEINDEX idx_content_numeric ON botched_table(content) WHERE content ~ '^[0-9]{1,9}$';</font><p><font size="2">select* from botched_table where content = 200::integer</font><p><font size="2">Theo</font><br /><font size="2">-----OriginalMessage-----</font><br /><font size="2">From: Josh Berkus [<a href="mailto:josh@agliodbs.com">mailto:josh@agliodbs.com</a>]</font><br /><font size="2">Sent: Friday, 10 September 20044:46 AM</font><br /><font size="2">To: Theo Galanakis; pgsql-sql@postgresql.org</font><br /><font size="2">Subject: Re:[SQL] Isnumeric function?</font><br /><p><font size="2">Theo,</font><p><font size="2">> Does anyone have any bettersuggestions???</font><p><font size="2">Well, one suggestion would be to take a machete to your application. Putting</font><br /><font size="2">key references and text data in the same column? Sheesh.</font><p><font size="2">Ifthat's not an option, in addition to the approach you've taken, you could </font><br /><font size="2">also doa partial index on the appropriate numeric values:</font><p><font size="2">CREATE INDEX idx_content_numeric ON botched_table(content)WHERE content ~ '^[0-9]{1,9}$';</font><p><font size="2">However, this approach may be more/less effectivethat the segregation </font><br /><font size="2">approach you've already taken.</font><p><font size="2">-- </font><br/><font size="2">Josh Berkus</font><br /><font size="2">Aglio Database Solutions</font><br /><font size="2">SanFrancisco</font><table><tr><td bgcolor="#ffffff"><font color="#000000">______________________________________________________________________<br/>This email, including attachments,is intended only for the addressee<br />and may be confidential, privileged and subject to copyright. If you<br/>have received this email in error, please advise the sender and delete<br />it. If you are not the intended recipientof this email, you must not<br />use, copy or disclose its content to anyone. You must not copy or <br />communicateto others content that is confidential or subject to <br />copyright, unless you have the consent of the contentowner.<br /></font></td></tr></table>
On Friday 10 September 2004 04:20, Theo Galanakis wrote: > I was just thinking, wouldn't it be great if the pg community had a site > where anyone could contribute their generic functions, or request for a > particular function. i vote positive.
O Theo Galanakis έγραψε στις Sep 10, 2004 : > > > Josh, > > I agree with the machete technique, unfortunately The structure is inplace > and a work-around was required. > > I created the Index you specified, however it chooses to run a seq scan on > the column rather than a Index scan. How can you force it to use that > Index.. > > CREATE INDEX idx_content_numeric ON botched_table(content) WHERE content ~ > '^[0-9]{1,9}$'; > > select * from botched_table where content = 200::integer EXPLAIN ANALYZE is your friend. VACUUM [FULL] ANALYZE also. Try with 200::text In the end if there is an option for the planner to use the index but he doesn't, then maybe its not worth it. > > Theo > -----Original Message----- > From: Josh Berkus [mailto:josh@agliodbs.com] > Sent: Friday, 10 September 2004 4:46 AM > To: Theo Galanakis; pgsql-sql@postgresql.org > Subject: Re: [SQL] Isnumeric function? > > > Theo, > > > Does anyone have any better suggestions??? > > Well, one suggestion would be to take a machete to your application. > Putting > key references and text data in the same column? Sheesh. > > If that's not an option, in addition to the approach you've taken, you could > > also do a partial index on the appropriate numeric values: > > CREATE INDEX idx_content_numeric ON botched_table(content) WHERE content ~ > '^[0-9]{1,9}$'; > > However, this approach may be more/less effective that the segregation > approach you've already taken. > > -- -Achilleus
Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> writes: > I created the Index you specified, however it chooses to run a seq scan on > the column rather than a Index scan. How can you force it to use that > Index.. > > CREATE INDEX idx_content_numeric ON botched_table(content) WHERE content ~ > '^[0-9]{1,9}$'; > > select * from botched_table where content = 200::integer You need to put a "and content ~ '^[0-9]{1,9}$'" in your query, the clause has to match the clause in the partial index pretty closely. perhaps you would find it convenient to make a view ofselect * from botched_table where content ~ '^[0-9]{1,9}$' and then just always select these values from that view. Also the "::integer" is useless. It actually gets cast to text here anyways. The index is on the text contents of the content column. You might consider making the index a functional index on content::integer instead. I suspect that would be faster and smaller than an index on the text version of content: slo=> create table botched_table (content text); CREATE TABLE slo=> create index idx_botched_table on botched_table ((content::integer)) where content ~ '^[0-9]{1,9}$'; CREATE INDEX slo=> create view botched_view as (select content::integer as content_id, * from botched_table where content ~ '^[0-9]{1,9}$'); CREATE VIEW slo=> explain select * from botched_view where content_id = 1; QUERY PLAN ----------------------------------------------------------------------------------------Index Scan using idx_botched_tableon botched_table (cost=0.00..3.72 rows=3 width=32) Index Cond: ((content)::integer = 1) Filter: (content~ '^[0-9]{1,9}$'::text) (3 rows) -- greg
Greg Stark <gsstark@MIT.EDU> writes: > Theo Galanakis <Theo.Galanakis@lonelyplanet.com.au> writes: > > > I created the Index you specified, however it chooses to run a seq scan on > > the column rather than a Index scan. How can you force it to use that > > Index.. > > > > CREATE INDEX idx_content_numeric ON botched_table(content) WHERE content ~ > > '^[0-9]{1,9}$'; > > > > select * from botched_table where content = 200::integer > > You need to put a "and content ~ '^[0-9]{1,9}$'" in your query, the clause has > to match the clause in the partial index pretty closely. Well this is weird. I tried to come up with a cleaner way to arrange this than the view I described before using a function. But postgres isn't using the partial index when it seems it ought to be available. When I say it has to match "pretty closely" in this case I think it would have to match exactly, however in the case of simple range operators postgres knows how to figure out implications. Ie, "where a>1" should use a partial index built on "where a>0". slo=> create table test (a integer); CREATE TABLE slo=> create index idx_text on test (a) where a > 0; CREATE INDEX slo=> explain select * from test where a > 0; QUERY PLAN ------------------------------------------------------------------------Index Scan using idx_text on test (cost=0.00..17.50rows=334 width=4) Index Cond: (a > 0) (2 rows) slo=> explain select * from test where a > 1; QUERY PLAN ------------------------------------------------------------------------Index Scan using idx_text on test (cost=0.00..17.50rows=334 width=4) Index Cond: (a > 1) (2 rows) That's all well and good. But when I tried to make a version of your situation that used a function I found it doesn't work so well with functional indexes: slo=> create function test(integer) returns integer as 'select $1' language plpgsql immutable; CREATE FUNCTION slo=> create index idx_test_2 on test (test(a)) where test(a) > 0; CREATE INDEX slo=> explain select test(a) from test where test(a) > 0; QUERY PLAN --------------------------------------------------------------------------Index Scan using idx_test_2 on test (cost=0.00..19.17rows=334 width=4) Index Cond: (test(a) > 0) (2 rows) slo=> explain select test(a) from test where test(a) > 1; QUERY PLAN -------------------------------------------------------Seq Scan on test (cost=0.00..25.84 rows=334 width=4) Filter: (test(a)> 1) (2 rows) I can't figure out why this is happening. I would think it has something to do with the lack of statistics on functional indexes except a) none of the tables is analyzed anyways and b) the estimated row count is the same anyways. -- greg
Greg Stark <gsstark@MIT.EDU> writes: > That's all well and good. But when I tried to make a version of your > situation that used a function I found it doesn't work so well with > functional indexes: > ... > I can't figure out why this is happening. You're using 7.3 or older? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@MIT.EDU> writes: > > That's all well and good. But when I tried to make a version of your > > situation that used a function I found it doesn't work so well with > > functional indexes: > > ... > > I can't figure out why this is happening. > > You're using 7.3 or older? 7.4.3. -- greg