librelist archives

gobbling up a block

From:
Joe Hellerstein
Date:
2011-05-18 @ 15:37
This is probably an elementary question, so forgive me.  

I'm writing a little DSL and would like it to be able to contain embedded 
Ruby blocks (delimited by curly braces or do/end).  During parsing I'd 
like to just gobble up the text of the block and stuff it into a single 
node in the AST.  

Of course there may be nested blocks inside the Ruby.  The nesting 
examples on the website seem to explicitly parse any matching delimiters. 
But I don't want to parse the Ruby here, I just want to gobble it up.  
However, catching the Ruby as 
 str('{') >> any.repeat(1) >> str('}') 
causes parsing to fail since the rightmost curly brace gets captured by 
the any.repeat.  I include an example below and output.

Any suggestions?

Thanks.
Joe

require 'rubygems'
require 'parslet'

class Mini < Parslet::Parser
 rule(:lbrace)     { str('{') >> space? }
 rule(:rbrace)     { str('}') >> space? }
 rule(:word) { match['a-z'].repeat(1) >> space? }
 rule(:space)      { match('\s').repeat(1) }
 rule(:space?)     { space.maybe }
 rule(:block) { lbrace >> any.repeat(1) >> rbrace }
 rule(:stmt) { (block.as(:block) | word).repeat }  
 root :stmt
end

def parse(str)
 mini = Mini.new
 print "Parsing #{str}: "

 p mini.parse(str)
 rescue Parslet::ParseFailed => error
   puts error, mini.root.error_tree
end

parse "joe is here {hi}"

(joeh@manila) 1351 > ruby lilparse.rb
Parsing joe is here {hi}: Don't know what to do with {hi} at line 1 char 13.
`- Expected one of [block:BLOCK, WORD]. at line 1 char 13.
  |- Failed to match sequence (LBRACE .{1, }) at line 1 char 17.
  |  `- Failed to match sequence ('}' SPACE?) at line 1 char 17.
  |     `- Premature end of input at line 1 char 17.
  `- Failed to match sequence ([a-z]{1, } SPACE?) at line 1 char 13.
     `- Expected at least 1 of [a-z] at line 1 char 13.
        `- Failed to match [a-z] at line 1 char 13.
(joeh@manila) 1352 > 

Re: [ruby.parslet] gobbling up a block

From:
Jonathan Rochkind
Date:
2011-05-18 @ 15:43
In general, this is somewhat similar to a paren-balancing task. 
https://github.com/kschiess/parslet/blob/master/example/parens.rb

That might help you. Although wait, I guess that particular example 
doesn't allow your parens to have any content, heh.

Although what you are doing may be even simpler than a paren-balancing 
task with content -- what do you want to happen to your nested blocks?  
They don't need to be parsed out separately; they are still gobbled up 
into just one string, the top-level block structure?

But I think that example should give you some hints maybe, or the more 
complicated one that actually lets the parens have content, here: 
https://github.com/kschiess/parslet/blob/master/example/minilisp.rb

Re: [ruby.parslet] gobbling up a block

From:
Jonathan Rochkind
Date:
2011-05-18 @ 15:47
I.e., something like this maybe (just typing it into the email client, 
don't know if it even compiles, let alone works, but may give you some 
ideas).

rule :block do
    str('{') >> content.maybe >> block.maybe >> content.maybe >> str('}')
end

rule :content do
    match['^{}'].repeat
end

The trick I think might work is the recursive call to block that will 
allow a (balanced) block to be inside a block, but otherwise we don't 
allow '{' or '}'.  Except I actually have no idea if this will actually 
work, heh, but some ideas to work with.
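
The same recursive idea can be sketched outside Parslet with Ruby's stdlib StringScanner. This is only an illustration of the balancing logic, not Parslet code; gobble_block is a made-up helper name, and it assumes braces inside the block are always balanced (no special handling of strings or comments).

```ruby
require 'strscan'

# Returns the text of one balanced {...} block starting at the scanner's
# current position, or nil (with the position restored) if none is found.
def gobble_block(scanner)
  start = scanner.pos
  return nil unless scanner.scan(/\{/)

  loop do
    scanner.scan(/[^{}]+/)              # content: anything except braces
    if scanner.check(/\{/)
      unless gobble_block(scanner)      # nested block: recurse
        scanner.pos = start
        return nil
      end
    elsif scanner.scan(/\}/)
      # pos is a byte offset, so slice by bytes
      return scanner.string.byteslice(start, scanner.pos - start)
    else
      scanner.pos = start               # ran out of input: unbalanced
      return nil
    end
  end
end

scanner = StringScanner.new("{hi {it's} joe} rest")
p gobble_block(scanner)   # => "{hi {it's} joe}"
p scanner.rest            # => " rest"
```

The whole block comes back as one string, nested braces included, which is the "single node" behaviour asked for.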



Re: [ruby.parslet] gobbling up a block

From:
Jonathan Rochkind
Date:
2011-05-19 @ 14:31
So, do you think your task ends up being basically writing a parser for 
the ruby language? That's obviously a somewhat hard problem. :)

What does the "DSL" you are embedding these 'blocks' in look like?  
There may be an easier way of approaching the parsing, knowing the 
context.  Or you may be able to alter your "DSL" to make the parsing 
easier, by using more unambiguously parseable things in the rest of the DSL.

But there's a reason that many ruby libraries use plain old ruby code as 
a "DSL" -- you already have a ruby parser, ruby itself, and don't need 
to write one.

Jonathan

On 5/18/2011 5:25 PM, Joe Hellerstein wrote:
> The plot thickens when we consider handling "do..end" blocks in Ruby.
>
> To extend our curly-brace-balancing scheme, we'd need to balance all
> uses of "end" inside a Ruby block.  That is fine -- there's only a small
> set of Ruby expressions that end in "end" and I can enumerate their
> starting words (if, unless, while, until, case, for, class, module, def).
>
> The problem is the way Ruby allows the use of "if"/"unless" at the end
> of a Ruby statement without a matching "end".  To do my "end" balancing
> right and handle these cases, I'd need to recognize Ruby statements so I
> could figure out that the "if"/"unless" logic was a suffix.  Blech.
>
> Any further thoughts?
>
> J
>
>
> On May 18, 2011, at 12:39 PM, Joe Hellerstein wrote:
>
>> Nicely done and thank you!  As long as I don't mind enforcing
>> delimiter-balancing in the thing I'm gobbling up (and in this case I
>> don't), your trick works.
>>
>> Fixed example below for future ref.
>>
>> Joe
>>
>> require 'rubygems'
>> require 'parslet'
>>
>> class Mini < Parslet::Parser
>>   rule(:lbrace)     { str('{') >> space? }
>>   rule(:rbrace)     { str('}') >> space? }
>>   rule(:word)       { match['a-z'].repeat(1) >> space? }
>>   rule(:space)      { match('\s').repeat(1) }
>>   rule(:space?)     { space.maybe }
>>   rule(:block)      { lbrace >> (content | block).repeat(1) >> rbrace }
>>   rule(:content)    { match['^{}'] }
>>
>>   rule(:stmt)       { (block.as(:block) | word).repeat }
>>   root :stmt
>> end
>>
>> def parse(str)
>>   mini = Mini.new
>>   print "Parsing #{str}: "
>>
>>   p mini.parse(str)
>> rescue Parslet::ParseFailed => error
>>   puts error, mini.root.error_tree
>> end
>>
>> parse "joe is here {hi {it's} joe}"
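
The modifier "if"/"unless" problem Joe describes in the quoted message can be seen with a deliberately naive keyword counter. This is a sketch only: naive_depth and OPENERS are made-up names, the opener list is Joe's list plus "do", and the counter ignores strings, comments and everything else a real Ruby lexer would handle.

```ruby
# Keywords assumed to open a construct that is closed by "end".
OPENERS = %w[if unless while until case for class module def do].freeze

# Counts openers minus "end" keywords; 0 means "looks balanced".
def naive_depth(src)
  src.scan(/\b[a-z_]+\b/).sum do |word|
    if OPENERS.include?(word)
      1
    elsif word == 'end'
      -1
    else
      0
    end
  end
end

block_form    = "do |x|\n  if x > 0\n    puts x\n  end\nend"
modifier_form = "do |x|\n  puts x if x > 0\nend"

p naive_depth(block_form)     # => 0
p naive_depth(modifier_form)  # => 1
```

Both snippets are valid Ruby blocks, but the counter reports the second as unbalanced because the trailing "if" has no "end" of its own. Telling the two uses of "if" apart requires knowing where statements begin, which is the part that starts to look like parsing Ruby.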

Re: [ruby.parslet] gobbling up a block

From:
Jonathan Rochkind
Date:
2011-05-19 @ 14:51
For instance, if your 'dsl' was YAML but a value could look like a 'ruby 
block'... then you would just need to parse YAML, your parser wouldn't 
have to care that a value happened to look like a 'ruby block', that's 
just payload like any other.  And once parsed, you could investigate the 
payloads using imperative code, or regexps, to see which 
ones looked like 'a ruby block', if it mattered to pull those out.

And of course there's already a YAML parser built into ruby, you 
wouldn't need to write one.
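
As a rough sketch of that idea, using Ruby's bundled yaml library: the document layout, the key names and the BLOCK_LIKE pattern below are all invented for illustration, and the pattern only checks whether a value looks like a block; it does not validate the Ruby inside.

```ruby
require 'yaml'

# Hypothetical DSL file: ordinary YAML where some values happen to
# contain text that looks like a Ruby block.
source = <<~'DOC'
  name: counter
  on_tick: "{ |n| n + 1 }"
  on_reset: |
    do |state|
      state.clear if state
    end
  note: just some text
DOC

doc = YAML.safe_load(source)

# Crude test for "looks like a ruby block": {...} or do...end.
BLOCK_LIKE = /\A\s*(?:\{.*\}|do\b.*\bend)\s*\z/m

blocks = doc.select { |_key, value| value.is_a?(String) && value.match?(BLOCK_LIKE) }
p blocks.keys   # => ["on_tick", "on_reset"]
```

The YAML parser never needs to understand the block text; it is delivered as a plain string and inspected afterwards.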

If your "dsl" is not YAML but is your own home-built thing, the same 
basic approach could be tried, of structuring it so your parser doesn't 
actually have to recognize a 'block', it's just a payload in the larger 
structure.

One way or another, I suspect you want to re-think the nature of your 
"dsl" to be easier to work with.

Ironically, however, Parslet itself demonstrates the utility of using 
Plain Old Ruby for your "dsl" -- note how a parslet grammar is just ruby 
code; it's not parsed by anything other than ruby itself.  Again, 
there's a reason many ruby libraries take this approach: writing your 
own parser for a 'dsl' instead can be a lot of cost for little benefit 
compared to just structuring your API such that plain old ruby is a 
decent "dsl".
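
A minimal sketch of what "plain old ruby as the dsl" can look like, with invented names (RuleSet, rule, define) that are not taken from Parslet or from any particular library:

```ruby
# A tiny plain-Ruby "DSL": the blocks are parsed by Ruby itself, so
# do/end, curly braces and modifier-if all work without extra effort.
class RuleSet
  attr_reader :rules

  def initialize
    @rules = {}
  end

  # Store the block under a name; Ruby has already parsed it.
  def rule(name, &body)
    @rules[name] = body
  end

  # Evaluate the DSL block with a fresh RuleSet as self.
  def self.define(&dsl)
    set = new
    set.instance_eval(&dsl)
    set
  end
end

rules = RuleSet.define do
  rule(:double) { |x| x * 2 }

  rule(:greet) do |name|
    "hi #{name}" if name
  end
end

p rules.rules[:double].call(21)     # => 42
p rules.rules[:greet].call('joe')   # => "hi joe"
```

The cost is that the DSL text must be valid Ruby and is only available as blocks at run time, not as a syntax tree.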


Re: [ruby.parslet] gobbling up a block

From:
Joe Hellerstein
Date:
2011-05-19 @ 20:14
Understood.  My DSL is Bloom (http://bloom-lang.net).  We already did a 
"plain-old-ruby" implementation via significant metaprogramming in the 
"bud" prototype. I'm trying to put a cleaner parser front-end on it to 
support program rewriting in a better way than our current Ruby AST 
rewrites.

Anyhow, for the time being I think it's sufficient for me to introduce a 
new pair of bracketing keywords into the DSL as a workaround.
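
A sketch of why dedicated bracketing keywords make the gobbling easy: with a closing keyword that cannot occur in ordinary Ruby, no balancing is needed at all. The keywords ruby_begin and ruby_end below are placeholders invented for this example (the actual keywords are not given in this thread), and gobble_bracketed is a made-up helper using the stdlib StringScanner.

```ruby
require 'strscan'

OPEN_KW  = 'ruby_begin'  # hypothetical opening keyword
CLOSE_KW = 'ruby_end'    # hypothetical closing keyword

# Returns the raw text between the two keywords, or nil (with the
# scanner position restored) if the bracketed region is not found here.
def gobble_bracketed(scanner)
  start = scanner.pos
  return nil unless scanner.scan(/#{OPEN_KW}\b/)

  # scan_until returns everything up to and including the closing keyword.
  chunk = scanner.scan_until(/\b#{CLOSE_KW}\b/)
  if chunk.nil?
    scanner.pos = start
    return nil
  end
  chunk.delete_suffix(CLOSE_KW).strip
end

scanner = StringScanner.new('ruby_begin items.each do |i| puts i if i end ruby_end tail')
p gobble_bracketed(scanner)  # => "items.each do |i| puts i if i end"
p scanner.rest               # => " tail"
```

The embedded do/end and the modifier "if" pass through untouched, since the scanner only looks for the closing keyword.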

Thanks for the help!
Joe


