Re: Ltree syntax improvement - Mailing list pgsql-hackers
From | Nikolay Shaplov |
---|---|
Subject | Re: Ltree syntax improvement |
Date | |
Msg-id | 1659194.9VRYtGT3Pi@x200m Whole thread Raw |
In response to | Ltree syntax improvement (Dmitry Belyavsky <beldmit@gmail.com>) |
Responses |
Re: Ltree syntax improvement
|
List | pgsql-hackers |
В письме от вторник, 29 января 2019 г. 20:43:07 MSK пользователь Dmitry Belyavsky написал: > Dear all, > > Please find attached the patch extending the sets of symbols allowed in > ltree labels. The patch introduces 2 variants of escaping symbols, via > backslashing separate symbols and via quoting the labels in whole for > ltree, lquery and ltxtquery datatypes. Sorry I still did not do full revision, I've stumbled up on other things in the code, which, as I think should be fixed before final checking through. (I guess that code do what is expected, since it passes tests and so on, it needs recheck, but I'd like to do this full recheck only once) So my doubts about this code is that a lot parts of code is just a copy-paste of the same code that was written right above. And it can be rewritten in a way that the same lines (or parts of lines) is repeated only once... This should make code more readable, and make code more similar to the code of postgres core, I've seen. Here I will speak about lquery_in function from ltree_io.c. 1. May be it is better here to switch from else if (state == LQPRS_[something]) to switch (stat) { case LQPRS_[something]: } because because here the main case is to determine what is the current state of your state machine, do something with it, and then go to the next step. Now after you've done what should be done in this step, you should make sure that no other code is executed writing more ifs... If you use switch/case you will be able to say break wherever you like, it will save you some ifs. Theoretical example (cl is for character length fc is for fist char) if (cl==1 && fc =='1') {do_case1();} else if (cl==1 && fc =='2') {do_case2();} else if (cl==1 && fc =='3') {do_case3();} else {do_otherwise();} If it is wrapped in switch/case it will be like if (cl==1) { if (fc =='1') {do_case1(); break;} if (fc =='2') {do_case2(); break;} if (fc =='3') {do_case3(); break;} } /*if nothing is found */ do_otherwise(); This for example will allow you to get rid of multiply charlen == 1 checks in else if ((lptr->flag & LVAR_QUOTEDPART) == 0) { if (charlen == 1 && t_iseq(ptr, '@')) { if (lptr->start == ptr) UNCHAR; lptr->flag |= LVAR_INCASE; curqlevel->flag |= LVAR_INCASE; } else if (charlen == 1 && t_iseq(ptr, '*')) { if (lptr->start == ptr) UNCHAR; lptr->flag |= LVAR_ANYEND; curqlevel->flag |= LVAR_ANYEND; } else if (charlen == 1 && t_iseq(ptr, '%')) { if (lptr->start == ptr) UNCHAR; lptr->flag |= LVAR_SUBLEXEME; curqlevel->flag |= LVAR_SUBLEXEME; } else if (charlen == 1 && t_iseq(ptr, '|')) { real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); if (state == LQPRS_WAITDELIMSTRICT) adjust_quoted_nodeitem(lptr); check_level_length(lptr, pos); state = LQPRS_WAITVAR; } else if (charlen == 1 && t_iseq(ptr, '.')) { real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); if (state == LQPRS_WAITDELIMSTRICT) adjust_quoted_nodeitem(lptr); check_level_length(lptr, pos); state = LQPRS_WAITLEVEL; curqlevel = NEXTLEV(curqlevel); } else if (charlen == 1 && t_iseq(ptr, '\\')) { if (state == LQPRS_WAITDELIMSTRICT) UNCHAR; state = LQPRS_WAITESCAPED; } else { if (charlen == 1 && strchr("!{}", *ptr)) UNCHAR; if (state == LQPRS_WAITDELIMSTRICT) { if (t_isspace(ptr)) { ptr += charlen; pos++; tail_space_bytes += charlen; tail_space_symbols = 1; continue; } UNCHAR; } if (lptr->flag & ~LVAR_QUOTEDPART) UNCHAR; } } There are a lot of them, I would suggest to reduce there number to one check in the right place. Similar thing is about "GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR);" in this part of code. if (charlen == 1) { if (t_iseq(ptr, '!')) { GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); lptr->start = ptr + 1; state = LQPRS_WAITDELIM; curqlevel->numvar = 1; curqlevel->flag |= LQL_NOT; hasnot = true; } else if (t_iseq(ptr, '*')) state = LQPRS_WAITOPEN; else if (t_iseq(ptr, '\\')) { GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); lptr->start = ptr; curqlevel->numvar = 1; state = LQPRS_WAITESCAPED; } else if (strchr(".|@%{}", *ptr)) { UNCHAR; } else { GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); lptr->start = ptr; state = LQPRS_WAITDELIM; curqlevel->numvar = 1; if (t_iseq(ptr, '"')) { lptr->flag |= LVAR_QUOTEDPART; } } } else { GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); lptr->start = ptr; state = LQPRS_WAITDELIM; curqlevel->numvar = 1; } It is repeated almost is every branch of this if-subtree... Can it be rewritten the way it is repeated only once? And so on. So my request is to reduce multiply repetition of the same code... PS. One other thing that I've been thinking over but did not came to final conclusion... It is often quite often case if (t_iseq(ptr, 'a'){same_code(); code_for_a();} else if (t_iseq(ptr, 'b'){same_code(); code_for_b();} else if (t_iseq(ptr, 'c'){same_code(); code_for_c);} else something_else(); I've been thinking if it is possible to move this to const unsigned char c=TOUCHAR(ptr); switch(c) { case 'a': case 'b': case 'c': same_code(); if (c=='a') {code_for_a();} if (c=='b') {code_for_b();} if (c=='c') {code_for_c();} break; default: something_else(); } I did not still get, how valid is replacement of t_iseq() with TOUCHAR() + "==". It is quite equivalent now, but I do not know how the core code is going to evolve, so can't get final understanding. I even tried to discuss it with Teodor (you've seen it) but still did come to any conclusion. So for now status of this idea is "my considerations about ways to deduplicate the code". So this is all essential things I can say about this patch as it is now. May be somebody can add something more...
pgsql-hackers by date: