This is probably an elementary question, so forgive me.
I'm writing a little DSL and would like it to be able to contain embedded
Ruby blocks (delimited by curly braces or do/end). During parsing I'd
like to just gobble up the text of the block and stuff it into a single
node in the AST.
Of course there may be nested blocks inside the Ruby. The nesting
examples on the website seem to explicitly parse any matching delimiters.
But I don't want to parse the Ruby here, I just want to gobble it up.
However, catching the Ruby as
str('{') >> any.repeat(1) >> str('}')
causes parsing to fail since the rightmost curly brace gets captured by
the any.repeat. I include an example below and output.
Any suggestions?
Thanks.
Joe

require 'rubygems'
require 'parslet'

class Mini < Parslet::Parser
  rule(:lbrace) { str('{') >> space? }
  rule(:rbrace) { str('}') >> space? }
  rule(:word) { match['a-z'].repeat(1) >> space? }
  rule(:space) { match('\s').repeat(1) }
  rule(:space?) { space.maybe }
  rule(:block) { lbrace >> any.repeat(1) >> rbrace }
  rule(:stmt) { (block.as(:block) | word).repeat }
  root :stmt
end

def parse(str)
  mini = Mini.new
  print "Parsing #{str}: "
  p mini.parse(str)
rescue Parslet::ParseFailed => error
  puts error, mini.root.error_tree
end

parse "joe is here {hi}"

(joeh@manila) 1351 > ruby lilparse.rb
Parsing joe is here {hi}: Don't know what to do with {hi} at line 1 char 13.
`- Expected one of [block:BLOCK, WORD]. at line 1 char 13.
   |- Failed to match sequence (LBRACE .{1, }) at line 1 char 17.
   |  `- Failed to match sequence ('}' SPACE?) at line 1 char 17.
   |     `- Premature end of input at line 1 char 17.
   `- Failed to match sequence ([a-z]{1, } SPACE?) at line 1 char 13.
      `- Expected at least 1 of [a-z] at line 1 char 13.
         `- Failed to match [a-z] at line 1 char 13.
(joeh@manila) 1352 >

In general, this is somewhat similar to a paren-balancing task:
https://github.com/kschiess/parslet/blob/master/example/parens.rb

That might help you. Although wait, I guess that particular example
doesn't allow your parens to have any content, heh.

What you are doing may be even simpler than a paren-balancing task with
content, though. What do you want to happen to your nested blocks? They
don't need to be parsed out separately; they are still gobbled up in just
one string along with the top-level block, right?

But I think that example should give you some hints, or maybe the more
complicated one that actually lets the parens have content, here:
https://github.com/kschiess/parslet/blob/master/example/minilisp.rb

On 5/18/2011 11:37 AM, Joe Hellerstein wrote:
> This is probably an elementary question, so forgive me.
> [...]

I.e., something like this maybe (just typing it into the email client,
don't know if it even compiles, let alone works, but it may give you some
ideas).

rule :block do
  # As written, this permits at most one nested block per level.
  str('{') >> content.maybe >> block.maybe >> content.maybe >> str('}')
end

rule :content do
  # Any run of characters that are not braces.
  match['^{}'].repeat
end

The trick I think might work is the recursive call to block, which
allows a (balanced) block to be inside a block, but otherwise we don't
allow '{' or '}'. I actually have no idea whether this will work, heh,
but it's some ideas to work with.
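
For readers who want to see the recursion in isolation, here is a minimal plain-Ruby sketch of the same idea. It is not Parslet code and not from the original thread: it uses StringScanner from the standard library, and `gobble_block` is a hypothetical helper name. It assumes the grammar block := '{' (content | block)* '}', where content is any run of non-brace characters.

```ruby
require 'strscan'

# Hypothetical helper: returns the text of the balanced block that starts
# at the scanner's current position, or nil (with the position restored)
# if the braces do not balance.
def gobble_block(scanner)
  start = scanner.pos
  return nil unless scanner.scan(/\{/)

  until scanner.scan(/\}/)
    next if scanner.scan(/[^{}]+/)        # content: anything but braces

    # Either a nested block follows, or we ran out of input.
    unless scanner.check(/\{/) && gobble_block(scanner)
      scanner.pos = start
      return nil
    end
  end

  # StringScanner positions are byte offsets, so slice by bytes.
  scanner.string.byteslice(start, scanner.pos - start)
end

scanner = StringScanner.new("{hi {it's} joe} tail")
p gobble_block(scanner)   # the whole outer block as one string
p scanner.rest            # whatever follows the block
```

Like the Parslet rule, this only counts braces, so a brace inside a Ruby string literal or comment would throw it off.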

So, do you think your task ends up being basically writing a parser for
the Ruby language? That's obviously a somewhat hard problem. :)

What does the "DSL" you are embedding these 'blocks' in look like?
There may be an easier way of approaching the parsing, knowing the
context. Or you may be able to alter your "DSL" to make the parsing
easier, by using more unambiguously parseable constructs in the rest of
the DSL.

But there's a reason that many Ruby libraries use plain old Ruby code as
a "DSL": you already have a Ruby parser, Ruby itself, and don't need
to write one.

Jonathan

On 5/18/2011 5:25 PM, Joe Hellerstein wrote:
> The plot thickens when we consider handling "do..end" blocks in Ruby.
>
> To extend our curly-brace-balancing scheme, we'd need to balance all
> uses of "end" inside a Ruby block. That is fine -- there's only a small
> set of Ruby expressions that end in "end" and I can enumerate their
> starting words (if, unless, while, until, case, for, class, module, def.)
>
> The problem is the way Ruby allows the use of "if"/"unless" at the end
> of a Ruby statement without a matching "end". To do my "end" balancing
> right and handle these cases, I'd need to recognize Ruby statements so I
> could figure out that the "if"/"unless" logic was a suffix. Blech.
>
> Any further thoughts?
>
> J
>
> On May 18, 2011, at 12:39 PM, Joe Hellerstein wrote:
>
>> Nicely done and thank you! As long as I don't mind enforcing
>> delimiter-balancing in the thing I'm gobbling up (and in this case I
>> don't), your trick works.
>>
>> Fixed example below for future ref.
>>
>> Joe
>>
>> require 'rubygems'
>> require 'parslet'
>>
>> class Mini < Parslet::Parser
>>   rule(:lbrace) { str('{') >> space? }
>>   rule(:rbrace) { str('}') >> space? }
>>   rule(:word) { match['a-z'].repeat(1) >> space? }
>>   rule(:space) { match('\s').repeat(1) }
>>   rule(:space?) { space.maybe }
>>   rule(:block) { lbrace >> (content | block).repeat(1) >> rbrace }
>>   rule(:content) { match['^{}'] }
>>
>>   rule(:stmt) { (block.as(:block) | word).repeat }
>>   root :stmt
>> end
>>
>> def parse(str)
>>   mini = Mini.new
>>   print "Parsing #{str}: "
>>
>>   p mini.parse(str)
>> rescue Parslet::ParseFailed => error
>>   puts error, mini.root.error_tree
>> end
>>
>> parse "joe is here {hi {it's} joe}"
>>
>> On May 18, 2011, at 11:47 AM, Jonathan Rochkind wrote:
>>
>>> Ie, something like this maybe (just typing it into the email client,
>>> dont' know if it even compiles, let alone works, but may give you some
>>> ideas).
>>> [...]

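
One way to act on the "you already have a Ruby parser" point for do..end blocks, sketched here as an assumption rather than anything proposed in the thread: let Ripper (Ruby's standard-library parser interface) decide where the block stops, instead of counting keywords. `gobble_do_block` is a hypothetical helper. It tries each "end" in turn and returns the shortest prefix that Ruby accepts as a complete block, so a trailing "if"/"unless" modifier needs no special handling.

```ruby
require 'ripper'

# Hypothetical helper: given text that starts with "do", return the
# shortest prefix ending in "end" that Ruby itself accepts as a complete
# block, or nil if no prefix parses.
def gobble_do_block(text)
  return nil unless text =~ /\Ado\b/

  offset = 0
  while (idx = text.index(/\bend\b/, offset))
    candidate = text[0, idx + 3]
    # Attach the block to a method call so it is a complete expression.
    # Ripper.sexp returns nil when the source has a syntax error.
    return candidate if Ripper.sexp("proc #{candidate}")

    offset = idx + 3
  end
  nil
end

# A modifier "if" has no matching "end", yet the block is found intact.
p gobble_do_block("do |x| puts x if x > 1 end and more text")
# A nested if...end is skipped over until the outer "end" is reached.
p gobble_do_block("do if a then b end end trailing")
```

This is a heuristic: it re-parses the prefix once per candidate "end", and it assumes the shortest parsable prefix is the intended block.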

For instance, if your 'dsl' was YAML but a value could look like a 'ruby
block'... then you would just need to parse YAML. Your parser wouldn't
have to care that a value happened to look like a 'ruby block'; that's
just payload like any other. And once parsed, you could investigate the
payloads using imperative code, or with regexps, to see which
ones looked like 'a ruby block', if it mattered to pull those out.

And of course there's already a YAML parser built into Ruby; you
wouldn't need to write one.

If your "dsl" is not YAML but is your own home-built thing, the same
basic approach could be tried: structure it so your parser doesn't
actually have to recognize a 'block', it's just a payload in the larger
structure.

One way or another, I suspect you want to re-think the nature of your
"dsl" to be easier to work with.

Ironically, however, Parslet itself demonstrates the utility of using
Plain Old Ruby for your "dsl": note how a Parslet grammar is just Ruby
code, it's not parsed by anything other than Ruby itself. Again,
there's a reason many Ruby libraries take this approach. Writing your
own parser for a 'dsl' instead can be a lot of cost for little benefit
compared to just structuring your API such that plain old Ruby is a
decent "dsl".

On 5/19/2011 10:31 AM, Jonathan Rochkind wrote:
> So, do you think your task ends up being basically writing a parser for
> the ruby language? That's obviously a somewhat hard problem. :)
> [...]

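
A minimal sketch of the YAML-payload idea, under stated assumptions: the document keys, the sample values, and the `block_payloads` helper are all made up for illustration, and the regexp is only a loose "looks like a block" check that does not validate the Ruby inside.

```ruby
require 'yaml'

# Hypothetical helper: parse the document as ordinary YAML, then pick out
# the string values that merely look like a Ruby block ({...} or do...end).
def block_payloads(yaml_text)
  block_like = /\A(?:\{.*\}|do\b.*\bend)\z/m
  doc = YAML.safe_load(yaml_text)
  doc.select { |_key, value| value.is_a?(String) && value.match?(block_like) }
end

source = <<~DOC
  name: joe
  filter: "{ |x| x > 1 }"
  action: "do |x| puts x end"
DOC

# Only the entries whose values look like blocks are returned.
p block_payloads(source)
```

The YAML parser never needs to know about braces or do/end; the block text travels as an opaque string until after parsing.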

Understood. My DSL is Bloom, http://bloom-lang.net

We already did a "plain-old-ruby" implementation via significant
metaprogramming in the "bud" prototype. I'm trying to put a cleaner
parser front-end on it to support program rewriting in a better way than
our current Ruby AST rewrites.

Anyhow, for the time being I think it's sufficient for me to introduce a
new pair of bracketing keywords into the DSL as a workaround.

Thanks for the help!

Joe

On May 19, 2011, at 10:51 AM, Jonathan Rochkind wrote:
> For instance, if your 'dsl' was YAML but a value could look like a 'ruby
> block'... then you would just need to parse YAML, your parser wouldn't
> have to care that a value happened to look like a 'ruby block', that's
> just payload like any other.
> [...]
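
A rough plain-Ruby sketch of what the bracketing-keyword workaround buys. The keyword names `ruby_begin` and `ruby_end` and the helper `extract_embedded_ruby` are hypothetical, not Bloom syntax. The assumption is that the closing keyword never appears inside the embedded Ruby, so the text can be gobbled without balancing anything.

```ruby
# Hypothetical helper: return the text between each ruby_begin/ruby_end
# pair. The non-greedy match stops at the first closing keyword, so no
# brace or "end" counting is needed.
def extract_embedded_ruby(source)
  source.scan(/\bruby_begin\b(.*?)\bruby_end\b/m).flatten.map(&:strip)
end

dsl = <<~DOC
  rule r1 ruby_begin items.each do |x| puts x if x > 1 end ruby_end
  rule r2 ruby_begin { |y| y } ruby_end
DOC

# Each embedded chunk comes back as one opaque string.
p extract_embedded_ruby(dsl)
```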