Thread: problem with splitting a string
Hi,

I'm trying to develop a contrib module to parse sqlf queries. I'm using lemon as a LALR parser generator (because I think it's easier than bison) and re2c (because I think it's easier than flex), but when I try to split the string into words, postgres adds some weird characters (this works in pure gcc). I write something like "CREATE FUZZY PREDICATE joven ON 0..120 AS (0,0,35,120);", but postgresql adds a stray character at the end of "joven" and the other words.

The code I use to split the string is:

void parse_query(char *str, const char **sqlf){

    parse_words(str);
    *sqlf = fuzzy_query;
}
void parse_words(char *str){
    char *word;
    int token;
    const char semicolon = ';';
    const char dot = '.';
    const char comma = ',';
    const char open_bracket = '(';
    const char close_bracket = ')';
    struct Token sToken;

    int i = 0;

    void *pParser = ParseAlloc(malloc);

    while (str[i] != '\0'){
        int c = 0;

        word = (char *)malloc(sizeof(char));

        if (isspace(str[i]) || str[i] == semicolon){
            i++;
            continue;
        }

        if (str[i] == open_bracket || str[i] == close_bracket ||
            str[i] == dot || str[i] == comma){
            word[c] = str[i];
            i++;
            token = scan(word, strlen(word));
            Parse(pParser, token, sToken);
            continue;
        }else{
            while (!isspace(str[i]) && str[i] != semicolon && str[i] != '\0' &&
                   str[i] != open_bracket && str[i] != close_bracket &&
                   str[i] != dot && str[i] != comma){
                word[c++] = str[i++];
            }
        }

        token = scan(word, strlen(word));

        if (token == PARAMETRO){
            //TODO: I don't know why it needs the malloc function again, all I know is it's working
            const char *param = word;
            word = (char *)malloc(sizeof(char));
            sToken.z = param;
        }

        Parse(pParser, token, sToken);
        free(word);
    }
    Parse(pParser, 0, sToken);
    ParseFree(pParser, free);

}

Header:

#ifndef SQLF_H_
#define SQLF_H_

typedef struct Token {
    const char *z;
    int value;
    unsigned n;
} Token;
void parse_query(char *str, const char **sqlf);
void parse_words(char *str);
int scan(char *s, int l);

#endif /* SQLF_H_ */

Screen:

postgres=# select * from fuzzy.sqlf('CREATE FUZZY PREDICATE joven ON 0..120 AS (0,0,35,120);'::text);
ERROR: syntax error at or near ""
LINE 1: INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�
�,120
...
^
QUERY: INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�
�,120
�,0�
�,0�
�,35
�,120
�);

Thanks for any help
Werner Echezuria <wercool@gmail.com> writes:
> I'm trying to develop a contrib module in order to parse sqlf queries, I'm
> using lemon as a LALR parser generator (because I think it's easier than
> bison) and re2c (because I think it's easier than flex) but when I try to
> split the string into words postgres add some weird characters (this works
> in pure gcc), I write something like "CREATE FUZZY PREDICATE joven ON 0..120
> AS (0,0,35,120);", but postgresql adds a character like at the end of
> "joven" and the others words.

Maybe you are expecting 'text' values to be null-terminated? They are not. You might look into using TextDatumGetCString or related functions to convert.

regards, tom lane

PS: the chances of us accepting a contrib module that requires significant unusual infrastructure to build seem pretty low from where I sit. You're certainly free to do whatever you want for private work, or even for a pgfoundry project --- but if you do have ambitions of this eventually becoming contrib, "it's easier" is not going to be sufficient rationale to not use bison/flex.
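To illustrate the point about 'text' values, here is a minimal, self-contained model in plain C. It is not PostgreSQL code: the struct `LenPrefixed` and the function `to_cstring` are hypothetical stand-ins for a length-prefixed value and for the kind of work a conversion such as TextDatumGetCString has to do, namely copy the payload into a buffer one byte larger and append the terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a length-prefixed value: 'data' holds exactly
 * 'len' bytes and is NOT followed by a '\0'. */
typedef struct LenPrefixed {
    size_t len;
    const char *data;
} LenPrefixed;

/* Copy the payload into a fresh buffer one byte larger and terminate it.
 * The caller frees the result. Returns NULL on allocation failure. */
char *to_cstring(const LenPrefixed *t)
{
    char *out;

    assert(t != NULL);
    out = malloc(t->len + 1);      /* +1 leaves room for the terminator */
    if (out == NULL)
        return NULL;
    memcpy(out, t->data, t->len);
    out[t->len] = '\0';            /* only now is strlen(out) safe */
    return out;
}
```

Calling strlen directly on `data` would read past the payload into whatever bytes happen to follow it, which is the failure mode being described.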
Hi,

Well, I use TextDatumGetCString in the main file, but the weird characters are still there.

This is the main file:

#include "postgres.h"
#include "fmgr.h"
#include "gram.h"
#include "sqlf.h"
#include "utils/builtins.h"

extern Datum sqlf(PG_FUNCTION_ARGS);

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(sqlf);

Datum
sqlf(PG_FUNCTION_ARGS){

    char *query = TextDatumGetCString(PG_GETARG_DATUM(0));
    const char *parse_str;
    char *result;

    parse_query(query, &parse_str);

    result = parse_str;

    PG_RETURN_TEXT_P(cstring_to_text(result));
}

About the PS: OK, I understand that if I want this included as a contrib module I need to use bison/flex. I had never thought about it, but now I have a couple of questions:
What are the chances of it really being included in PostgreSQL as a contrib module?
Are there any requirements I have to follow?

2009/8/6 Tom Lane <tgl@sss.pgh.pa.us>
> Maybe you are expecting 'text' values to be null-terminated? They are
> not. You might look into using TextDatumGetCString or related functions
> to convert.
>
> regards, tom lane
>
> PS: the chances of us accepting a contrib module that requires
> significant unusual infrastructure to build seem pretty low from
> where I sit. You're certainly free to do whatever you want for
> private work, or even for a pgfoundry project --- but if you do
> have ambitions of this eventually becoming contrib, "it's easier"
> is not going to be sufficient rationale to not use bison/flex.
Werner Echezuria <wercool@gmail.com> writes:
> Well, I use TextDatumGetCString in the main file, but it remains with the
> weird characters.

Hmm, no ideas then. Your interface code looks fine (making parse_str const seems a bit strange, but it's not related to the problem at hand). Given that the problems appear at token boundaries I'd guess that re2c isn't behaving the way you expect, but I'm not familiar with that tool so I can't give any specific advice.

> About the PS: Ok, I understand that if I want that you include this as a
> contrib module I need to use bison/flex, I never thought about it, but I now
> have a couple of questions:
> What are the chances to really include it in PostgreSQL as a contrib module?
> Are there any requirement I have to follow?

Well, it'd mainly be a question of whether there's enough interest out there, which I can't judge. From a project standpoint we just require that it be BSD-licensed and not impose any undue new burden on maintainers (thus not wanting new build tools), but beyond that it's a matter of how many people might use it.

regards, tom lane
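One thing worth checking in the posted parse_words, independent of re2c: each `word` buffer is allocated with malloc(sizeof(char)), which is a single byte, then filled with several characters and handed to strlen without ever being NUL-terminated. That could plausibly produce stray bytes at token boundaries, and it can appear to work in a standalone program by luck. Below is a minimal sketch in plain C of sizing and terminating the buffer. The helper names `is_punct_token`, `ends_word` and `next_word` are hypothetical and not part of the posted code, and inside a backend palloc/pfree would be the usual choice instead of malloc/free.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Single-character tokens, as in the posted code. */
static int is_punct_token(char c)
{
    return c == '(' || c == ')' || c == '.' || c == ',';
}

/* Characters that end a word: end of input, ';', whitespace, punctuation. */
static int ends_word(char c)
{
    return c == '\0' || c == ';' ||
           isspace((unsigned char) c) || is_punct_token(c);
}

/*
 * Hypothetical helper: extract the next token starting at str[*pos].
 * Returns a malloc'd, NUL-terminated copy (caller frees), or NULL at
 * end of input or on allocation failure. Advances *pos past the token.
 */
char *next_word(const char *str, size_t *pos)
{
    size_t start, len;
    char *word;

    /* Skip separators that carry no token. */
    while (str[*pos] != '\0' &&
           (isspace((unsigned char) str[*pos]) || str[*pos] == ';'))
        (*pos)++;
    if (str[*pos] == '\0')
        return NULL;

    start = *pos;
    if (is_punct_token(str[*pos]))
        (*pos)++;                      /* one-character token */
    else
        while (!ends_word(str[*pos]))
            (*pos)++;                  /* run of word characters */

    len = *pos - start;
    assert(len > 0);
    word = malloc(len + 1);            /* room for the terminator */
    if (word == NULL)
        return NULL;
    memcpy(word, str + start, len);
    word[len] = '\0';                  /* strlen(word) is now well defined */
    return word;
}
```

A caller would loop on `next_word` until it returns NULL, pass each word to the scanner, and free it only once nothing (such as a token's `z` pointer) still refers to it.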
Tom Lane wrote:
> Well, it'd mainly be a question of whether there's enough interest out
> there, which I can't judge. From a project standpoint we just require
> that it be BSD-licensed and not impose any undue new burden on
> maintainers (thus not wanting new build tools), but beyond that it's a
> matter of how many people might use it.

What use is there for fuzzy predicates? I think it would mainly be to stop more students from coming up with new implementations of the same thing over and over.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> What use is there for fuzzy predicates? I think it would mainly be to
> stop more students from coming up with new implementations of the same
> thing over and over.

Well, I'm sorry if none of us involved in these projects has ever explained the real usefulness of sqlf and fuzzy databases. I guess we focused only on the technical problems and never explained the theory.

For example, here is a paragraph from the paper "Flexible queries in relational databases":

"This paper deals with this second type of "uncertainty" and is concerned essentially with database language extensions in order to deal with more expressive requirements. Indeed, consider a query such that, for instance, "retrieve the apartments which are not too expensive and not too far from downtown". In such a case, there does not exist a definite threshold for which the price becomes suddenly too high, but rather we have to discriminate between prices which are perfectly acceptable for the user, and other prices, somewhat higher, which are still more or less acceptable (especially if the apartment is close to downtown). Note that the meaning of vague predicate expressions like "not too expensive" is context/user dependent, rather than universal. Fuzzy set membership functions [26] are convenient tools for modelling user's preference profiles and the large panoply of fuzzy set connectives can capture the different user attitudes concerning the way the different criteria present in his/her query compensate or not; see [4] for a unified presentation in the fuzzy set framework of the existing proposals for handling flexible queries. Moreover in a given query, some part of the request may be less important to fulfill (e.g., in the above example, the price requirement may be judged more important than the distance to downtown); the handling of importance leads to the need for weighted connectives, as it will be seen in the following."

I really think this could be useful, but it is sometimes difficult to implement, and I'm trying to find a different, easier way to do things.

regards
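As a concrete illustration of a fuzzy set membership function, here is a minimal sketch in C. It assumes that the four numbers in a definition like (0,0,35,120) are the corners of a trapezoid; the thread does not say so, so this reading is an assumption, and `trapezoid` is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Degree of membership in [0,1] for a trapezoid with corners
 * a <= b <= c <= d: 1 on the plateau [b,c], 0 outside (a,d),
 * and linear on the two slopes in between.
 */
double trapezoid(double x, double a, double b, double c, double d)
{
    assert(a <= b && b <= c && c <= d);
    if (x >= b && x <= c)
        return 1.0;                 /* plateau; also covers a == b or c == d */
    if (x <= a || x >= d)
        return 0.0;                 /* outside the support */
    if (x < b)
        return (x - a) / (b - a);   /* rising slope; a < x < b, so b > a */
    return (d - x) / (d - c);       /* falling slope; c < x < d, so d > c */
}
```

Under that reading, `trapezoid(20, 0, 0, 35, 120)` gives 1.0 (fully "joven"), `trapezoid(77.5, 0, 0, 35, 120)` gives 0.5, and the degree reaches 0.0 at 120, so there is no single threshold where the predicate flips from true to false.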