Thread: problem with splitting a string

problem with splitting a string

From
Werner Echezuria
Date:
Hi,

I'm trying to develop a contrib module in order to parse sqlf queries. I'm
using lemon as a LALR parser generator (because I think it's easier than
bison) and re2c (because I think it's easier than flex), but when I try to
split the string into words Postgres adds some weird characters (this works
in pure gcc). I write something like "CREATE FUZZY PREDICATE joven ON 0..120
AS (0,0,35,120);", but PostgreSQL adds a garbage character at the end of
"joven" and the other words.

The code I use to split the string is:

void parse_query(char *str, const char **sqlf){

    parse_words(str);
    *sqlf=fuzzy_query;
}
void parse_words(char *str){
    char *word;
    int token;
    const char semicolon =';';
    const char dot='.';
    const char comma=',';
    const char open_bracket='(';
    const char close_bracket=')';
    struct Token sToken;

    int i = 0;

    void* pParser = ParseAlloc (malloc);

    while(str[i] !='\0'){
        int c=0;

        word=(char *)malloc(sizeof(char));

        if(isspace(str[i]) || str[i]==semicolon){
            i++;
            continue;
        }

        if (str[i]==open_bracket || str[i]==close_bracket ||
            str[i]==dot || str[i]==comma){
                word[c]= str[i];
                i++;
                token=scan(word, strlen(word));
                Parse(pParser, token, sToken);
                continue;
        }else{
            while(!isspace(str[i]) && str[i]!=semicolon && str[i]!='\0' &&
                    str[i]!=open_bracket && str[i]!=close_bracket &&
                    str[i]!=dot && str[i]!=comma){
                        word[c++] = str[i++];
            }
        }

        token=scan(word,strlen(word));

        if (token==PARAMETRO){
            //TODO: I don't know why it needs the malloc function again, all I know is it's working
            const char *param=word;
            word=(char *)malloc(sizeof(char));
            sToken.z=param;
        }

        Parse(pParser, token, sToken);
        free(word);
    }
  Parse(pParser, 0, sToken);
  ParseFree(pParser, free );

}

Header:

#ifndef SQLF_H_
#define SQLF_H_

typedef struct Token {
  const char *z;
  int value;
  unsigned n;
} Token;
void parse_query(char *str, const char **sqlf);
void parse_words(char *str);
int scan(char *s, int l);

#endif /* SQLF_H_ */

Screen:

postgres=# select * from fuzzy.sqlf('CREATE FUZZY PREDICATE joven ON 0..120 AS (0,0,35,120);'::text);
ERROR:  syntax error at or near ""
LINE 1: INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�
                                                               �,120
                                                                    ...
                                                         ^
QUERY:  INSERT INTO fuzzydb.pg_fuzzypredicate VALUES(joven,0�
                                                               �,120
                                                                    �,0�
                                                                          �,0�
                                                                                 �,35
                                                                                      �,120
                                                                                            �);

Thanks for any help

Re: problem with splitting a string

From
Tom Lane
Date:
Werner Echezuria <wercool@gmail.com> writes:
> I'm trying to develop a contrib module in order to parse sqlf queries, I'm
> using lemon as a LALR parser generator (because I think it's easier than
> bison) and re2c (because I think it's easier than flex) but when I try to
> split the string into words postgres add some weird characters (this works
> in pure gcc), I write something like "CREATE FUZZY PREDICATE joven ON 0..120
> AS (0,0,35,120);", but postgresql adds a character like  at the end of
> "joven" and the others words.

Maybe you are expecting 'text' values to be null-terminated?  They are
not.  You might look into using TextDatumGetCString or related functions
to convert.
        regards, tom lane

PS: the chances of us accepting a contrib module that requires
significant unusual infrastructure to build seem pretty low from
where I sit.  You're certainly free to do whatever you want for
private work, or even for a pgfoundry project --- but if you do
have ambitions of this eventually becoming contrib, "it's easier"
is not going to be sufficient rationale to not use bison/flex.
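
For context on the point about 'text' values: a value that carries its own length and has no trailing '\0' must be copied into a terminated buffer before functions such as strlen can safely be used on it. The sketch below illustrates that idea in plain C. CountedText and counted_text_to_cstring are hypothetical names invented for this example, and the struct is not PostgreSQL's actual varlena layout; in a real extension, TextDatumGetCString is the call that does this job.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-counted string.
 * This is NOT PostgreSQL's varlena layout, only an illustration. */
typedef struct CountedText {
    size_t      len;   /* number of valid bytes in data */
    const char *data;  /* len bytes, with no trailing '\0' */
} CountedText;

/* Copy a counted string into a freshly allocated, null-terminated
 * C string. Returns NULL if allocation fails. The caller frees it. */
static char *counted_text_to_cstring(CountedText t)
{
    char *s;

    assert(t.data != NULL || t.len == 0);

    s = malloc(t.len + 1);          /* one extra byte for the terminator */
    if (s == NULL)
        return NULL;
    if (t.len > 0)
        memcpy(s, t.data, t.len);
    s[t.len] = '\0';
    return s;
}
```

Without the explicit terminator, strlen would keep reading past the end of the data until it happened to find a zero byte.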


Re: problem with splitting a string

From
Werner Echezuria
Date:
Hi,

Well, I use TextDatumGetCString in the main file, but the weird characters
are still there.

This is the main file:

#include "postgres.h"
#include "fmgr.h"
#include "gram.h"
#include "sqlf.h"
#include "utils/builtins.h"

extern Datum sqlf(PG_FUNCTION_ARGS);

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(sqlf);

Datum
sqlf(PG_FUNCTION_ARGS){

    char        *query = TextDatumGetCString(PG_GETARG_DATUM(0));
    const char    *parse_str;
    char         *result;

    parse_query(query,&parse_str);

    result=parse_str;

    PG_RETURN_TEXT_P(cstring_to_text(result));
}

About the PS: OK, I understand that if I want you to include this as a
contrib module I need to use bison/flex. I never thought about it, but I now
have a couple of questions:
What are the chances of it really being included in PostgreSQL as a contrib
module?
Are there any requirements I have to follow?

2009/8/6 Tom Lane <tgl@sss.pgh.pa.us>

> Werner Echezuria <wercool@gmail.com> writes:
> > I'm trying to develop a contrib module in order to parse sqlf queries, I'm
> > using lemon as a LALR parser generator (because I think it's easier than
> > bison) and re2c (because I think it's easier than flex) but when I try to
> > split the string into words postgres add some weird characters (this works
> > in pure gcc), I write something like "CREATE FUZZY PREDICATE joven ON 0..120
> > AS (0,0,35,120);", but postgresql adds a character like   at the end of
> > "joven" and the others words.
>
> Maybe you are expecting 'text' values to be null-terminated?  They are
> not.  You might look into using TextDatumGetCString or related functions
> to convert.
>
>                         regards, tom lane
>
> PS: the chances of us accepting a contrib module that requires
> significant unusual infrastructure to build seem pretty low from
> where I sit.  You're certainly free to do whatever you want for
> private work, or even for a pgfoundry project --- but if you do
> have ambitions of this eventually becoming contrib, "it's easier"
> is not going to be sufficient rationale to not use bison/flex.

Re: problem with splitting a string

From
Tom Lane
Date:
Werner Echezuria <wercool@gmail.com> writes:
> Well, I use TextDatumGetCString in the main file, but it remains with the
> weird characters.

Hmm, no ideas then.  Your interface code looks fine (making parse_str
const seems a bit strange, but it's not related to the problem at hand).
Given that the problems appear at token boundaries I'd guess that re2c
isn't behaving the way you expect, but I'm not familiar with that tool
so I can't give any specific advice.

> About the PS: Ok, I understand that if I want that you include this as a
> contrib module I need to use bison/flex, I never thought about it, but I now
> have a couple of questions:
> What are the chances to really include it in PostgreSQL as a contrib module?
> Are there any requirement I have to follow?

Well, it'd mainly be a question of whether there's enough interest out
there, which I can't judge.  From a project standpoint we just require
that it be BSD-licensed and not impose any undue new burden on
maintainers (thus not wanting new build tools), but beyond that it's a
matter of how many people might use it.
        regards, tom lane
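
One thing in the posted parse_words could produce garbage at token boundaries independently of re2c: word is allocated with malloc(sizeof(char)), which is a single byte, and it is never null-terminated before strlen(word) is called. That can appear to work in a standalone program and still fail inside the server, where the heap contents differ. The following is a minimal sketch of a tokenizer step that sizes and terminates the buffer. next_word and is_punct_token are hypothetical names, not part of the original module, and the lemon/re2c calls are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the single-character tokens from the posted code. */
static int is_punct_token(char ch)
{
    return ch == '(' || ch == ')' || ch == '.' || ch == ',';
}

/* Return the next token of str, starting at *pos, as a freshly allocated,
 * null-terminated string, and advance *pos past it. Returns NULL at the end
 * of the input or if allocation fails. The caller frees the result. */
static char *next_word(const char *str, size_t *pos)
{
    size_t i, start, len;
    char  *word;

    assert(str != NULL && pos != NULL);
    i = *pos;

    /* Skip separators: whitespace and semicolons. */
    while (str[i] != '\0' &&
           (isspace((unsigned char) str[i]) || str[i] == ';'))
        i++;
    if (str[i] == '\0') {
        *pos = i;
        return NULL;
    }

    start = i;
    if (is_punct_token(str[i])) {
        i++;                        /* punctuation is a one-character token */
    } else {
        while (str[i] != '\0' && !isspace((unsigned char) str[i]) &&
               str[i] != ';' && !is_punct_token(str[i]))
            i++;
    }

    len = i - start;
    word = malloc(len + 1);         /* room for the token plus '\0' */
    if (word == NULL)
        return NULL;
    memcpy(word, str + start, len);
    word[len] = '\0';               /* strlen(word) is now well defined */

    *pos = i;
    return word;
}
```

Each returned word could then be handed to scan(word, strlen(word)) and freed by the caller once the parser no longer needs it.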


Re: problem with splitting a string

From
Alvaro Herrera
Date:
Tom Lane wrote:

> Well, it'd mainly be a question of whether there's enough interest out
> there, which I can't judge.  From a project standpoint we just require
> that it be BSD-licensed and not impose any undue new burden on
> maintainers (thus not wanting new build tools), but beyond that it's a
> matter of how many people might use it.

What use is there for fuzzy predicates?  I think it would mainly be to
stop more students from coming up with new implementations of the same
thing over and over.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: problem with splitting a string

From
Werner Echezuria
Date:
> What use is there for fuzzy predicates?  I think it would mainly be to
> stop more students from coming up with new implementations of the same
> thing over and over.

Well, I'm sorry if none of us involved in these projects has yet explained
the true usefulness of sqlf and fuzzy databases. I guess we focused only on
the technical problem and never explained the theory.

For example, here is a paragraph from the paper "Flexible queries in
relational databases":

    "This paper deals with this second type of "uncertainty" and is concerned
    essentially with database language extensions in order to deal with more
    expressive requirements. Indeed, consider a query such that, for instance,
    "retrieve the apartments which are not too expensive and not too far from
    downtown". In such a case, there does not exist a definite threshold for
    which the price becomes suddenly too high, but rather we have to
    discriminate between prices which are perfectly acceptable for the user,
    and other prices, somewhat higher, which are still more or less acceptable
    (especially if the apartment is close to downtown). Note that the meaning
    of vague predicate expressions like "not too expensive" is context/user
    dependent, rather than universal. Fuzzy set membership functions [26] are
    convenient tools for modelling user's preference profiles and the large
    panoply of fuzzy set connectives can capture the different user attitudes
    concerning the way the different criteria present in his/her query
    compensate or not; see [4] for a unified presentation in the fuzzy set
    framework of the existing proposals for handling flexible queries.
    Moreover in a given query, some part of the request may be less important
    to fulfill (e.g., in the above example, the price requirement may be
    judged more important than the distance to downtown); the handling of
    importance leads to the need for weighted connectives, as it will be seen
    in the following."

I really think this could be something useful, but it is sometimes difficult
to implement, and I'm trying to find a different and easier way to do things.

regards
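
The membership functions mentioned in the quoted passage can be illustrated with a trapezoid, a common shape for fuzzy predicates. The sketch below rests on assumptions: Trapezoid and trapezoid_membership are hypothetical names, and reading the four numbers in the thread's (0,0,35,120) as trapezoid corners is a guess that the thread does not confirm.

```c
#include <assert.h>

/* Hypothetical type: a trapezoid with corners a <= b <= c <= d.
 * Membership is 0 outside [a, d], 1 on [b, c], and linear in between. */
typedef struct Trapezoid {
    double a, b, c, d;
} Trapezoid;

/* Degree, in [0, 1], to which x satisfies the fuzzy predicate t. */
static double trapezoid_membership(Trapezoid t, double x)
{
    assert(t.a <= t.b && t.b <= t.c && t.c <= t.d);

    if (x < t.a || x > t.d)
        return 0.0;                          /* clearly outside */
    if (x >= t.b && x <= t.c)
        return 1.0;                          /* fully satisfied */
    if (x < t.b)
        return (x - t.a) / (t.b - t.a);      /* rising edge, a <= x < b */
    return (t.d - x) / (t.d - t.c);          /* falling edge, c < x <= d */
}
```

Under that reading, an age of 20 would satisfy joven with degree 1, and the degree would fall linearly from 1 at 35 to 0 at 120, so there is no single cutoff age.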