librelist archives

Literal Javascript?

From:
Adrian
Date:
2011-05-05 @ 01:45
I'm working on a parser that adds minor syntactic improvements to
JavaScript. It is designed to be a strict superset, so apart from the
additions, all input should pass through untouched. I sort of have this
working: I have a catch-all token that matches any characters and is tried
after the other tokens. In other words, I have a token matching `[\s\S]+`
named `JS`. The problem is that, because the match is greedy, none of the
input after literal JavaScript is ever matched by the other tokens. I tried a
lazy match and matching only a single character, but neither seemed to work.
Is it at all possible to allow literal JavaScript like this? My parser source
is below:


    %lex

    %%

    "@@{"    return 'AFN';
    "@{"     return 'FN';
    "}"      return 'CLOSE';
    [\s\S]+  return 'JS';
    <<EOF>>  return 'EOF';

    /lex

    %start expressions

    %%

    expressions
      : e EOF
        {return $1;}
      ;

    e
      : AFN e CLOSE
        {{$$ = (function() { $2 }());}}
      | FN e CLOSE
        {{$$ = "function() {" + $2 + "}";}}
      | JS
        {$$ = $1;}
      ;

Obviously it will do more eventually, but I can't really do anything with it
until I have this problem sorted out.
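
The greedy-match behaviour described above can be reproduced outside Jison. The following is only a minimal sketch in plain JavaScript: `rules` and `lex` are hypothetical names, and it assumes a lexer that tries each rule anchored at the current position and takes the first one that matches. Under that assumption, any input that does not begin with one of the special tokens is consumed whole by the `JS` rule.

```javascript
// Hypothetical stand-in for the lexer rules in the post (not Jison itself).
// Assumption: rules are tried in order, anchored at the current position,
// and the first rule that matches wins.
const rules = [
  [/^@@\{/, 'AFN'],
  [/^@\{/, 'FN'],
  [/^\}/, 'CLOSE'],
  [/^[\s\S]+/, 'JS'], // greedy catch-all: matches everything that is left
];

function lex(input) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    for (const [re, name] of rules) {
      const m = re.exec(rest);
      if (m) {
        tokens.push([name, m[0]]);
        rest = rest.slice(m[0].length);
        break;
      }
    }
  }
  tokens.push(['EOF', '']);
  return tokens;
}

// The "@{" and "}" in the middle never get a chance to match.
const greedyTokens = lex('var f = @{ return 1; };');
console.log(greedyTokens);
```

With this input the sketch produces a single `JS` token covering the entire string, followed by `EOF`.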

Re: [jison] Literal Javascript?

From:
Zachary Carter
Date:
2011-05-05 @ 02:07
The tricky part about this is that the `JS` code could also have
closing braces, so in order to know when you reach the brace matching
your extensions, you'll effectively have to parse it.

You may not have to parse all of JavaScript's syntax though. I would
try tokenizing chunks of non-brace characters and braces separately.
Something like:

...
[^{}]+    return 'NonBraceJS';
"{"       return 'OPEN';
"}"       return 'CLOSE';

...

Then in your grammar you can build the JS strings back up and you'll
also have correctly matched braces.
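
As a rough illustration of this idea, here is a hand-written sketch in plain JavaScript rather than a Jison grammar. The names `tokenize` and `translate` are hypothetical. Two things in it are assumptions made for the sketch, not something stated in this thread: the chunk pattern also excludes `@`, so that a preceding chunk does not absorb the `@` of `@{`, and `@@{ ... }` is taken to mean an immediately invoked function.

```javascript
// Hypothetical helpers, not part of Jison.
function tokenize(src) {
  // Alternation order mirrors lexer rule order: "@@{" before "@{" before "{".
  // Assumption: chunks exclude "@" so "@{" is seen as its own token;
  // a lone "@" falls through to the last alternative.
  const re = /@@\{|@\{|\{|\}|[^{}@]+|@/g;
  return src.match(re) || [];
}

function translate(src) {
  const tokens = tokenize(src);
  let pos = 0;

  // Rebuilds text until the "}" that closes the current block (or end of input).
  function body() {
    let out = '';
    while (pos < tokens.length && tokens[pos] !== '}') {
      const tok = tokens[pos++];
      if (tok === '@{' || tok === '@@{' || tok === '{') {
        const inner = body();
        if (tokens[pos++] !== '}') throw new Error('unbalanced braces');
        if (tok === '@{') out += 'function() {' + inner + '}';
        else if (tok === '@@{') out += '(function() {' + inner + '}())'; // assumed meaning
        else out += '{' + inner + '}'; // ordinary JS braces pass through
      } else {
        out += tok;
      }
    }
    return out;
  }

  const result = body();
  if (pos < tokens.length) throw new Error('unexpected "}"');
  return result;
}

console.log(translate('var f = @{ if (x) { return 1; } };'));
```

Because every `{` is paired with its own `}` during the rebuild, the inner `if` block's closing brace is not mistaken for the end of the `@{ ... }` extension.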

I actually need to implement something like this for Jison's grammar
parser so we can get rid of those nasty double braces around semantic
actions. If you are still having trouble by next week I should have
something working.

-- 
Zach Carter

Re: [jison] Literal Javascript?

From:
Zachary Carter
Date:
2011-06-19 @ 18:24
FYI, here is the solution I used for semantic actions in Jison's grammar files.

The lexer makes use of start conditions[1]:

%x action

%%

// ... other rules ...
"{"                     yy.depth=0; this.begin('action'); return '{';
// ... other rules ...

<action>[^{}]+          return 'ACTION_BODY';
<action>"{"             yy.depth++; return '{';
<action>"}"             yy.depth==0 ? this.begin('INITIAL') : yy.depth--; return '}';


And here are the relevant grammar rules:

action
    : '{' action_body '}'
        {$$ = $2;}
    |
        {$$ = '';}
    ;

action_body
    :
        {$$ = '';}
    | ACTION_BODY
        {$$ = yytext;}
    | action_body '{' action_body '}' action_body
        {$$ = $1+$2+$3+$4+$5;}
    ;
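
For readers unfamiliar with start conditions, the depth-counting logic of the lexer rules above can be imitated in plain JavaScript. This is only a sketch: `lexAction` is a hypothetical helper, the "other rules" of the `INITIAL` state are reduced to skipping characters, and the `state` variable stands in for the lexer's active start condition.

```javascript
// Hypothetical helper imitating the <action> start condition by hand.
function lexAction(src) {
  const tokens = [];
  let state = 'INITIAL'; // stands in for the active start condition
  let depth = 0;         // plays the role of yy.depth
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (state === 'INITIAL') {
      if (ch === '{') {
        depth = 0;
        state = 'action';
        tokens.push(['{', ch]);
      }
      i++; // other INITIAL rules are elided here: everything else is skipped
    } else if (ch === '{') {
      depth++;
      tokens.push(['{', ch]);
      i++;
    } else if (ch === '}') {
      // At depth 0 this brace closes the action; otherwise it closes a nested block.
      if (depth === 0) state = 'INITIAL';
      else depth--;
      tokens.push(['}', ch]);
      i++;
    } else {
      // Equivalent of <action>[^{}]+ : one chunk of non-brace text.
      const m = /^[^{}]+/.exec(src.slice(i));
      tokens.push(['ACTION_BODY', m[0]]);
      i += m[0].length;
    }
  }
  return tokens;
}

const actionTokens = lexAction('rule { if (a) { b(); } } next');
console.log(actionTokens.map(t => t[0]).join(' '));
```

Like the rules it imitates, the sketch counts every brace it sees, so a brace inside a string literal or comment in the action body would be counted as well.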

-- 
Zach Carter

Re: [jison] Literal Javascript?

From:
Robert Plummer
Date:
2011-06-19 @ 18:50
How would this need to be modified in order to work with tiki syntax? Would
you be interested in working together with us to integrate this system into
tiki syntax as a paying gig?  What do you charge per hour?

-- 
Robert Plummer

Re: [jison] Literal Javascript?

From:
Zachary Carter
Date:
2011-06-19 @ 20:25
On Sun, Jun 19, 2011 at 2:50 PM, Robert Plummer
<robertleeplummerjr@gmail.com> wrote:
> How would this need to be modified in order to work with tiki syntax? Would
> you be interested in working together with us to integrate this system into
> tiki syntax as a paying gig?  What do you charge per hour?

Your case is a bit different in some ways -- I'll shoot an email off-list.

-- 
Zach Carter