librelist archives

« back to archive

Error message: duplicate subtrees

Error message: duplicate subtrees

From:
Thiel Chang
Date:
2011-05-02 @ 14:20
Hi guys,

Due to a problem with code generation I changed my grammar rules.

After running this code: parser.parse_with_debug( '1  +  2' )

I got this result:

{:stm=>{:ident=>"2"@4}}
Duplicate subtrees while merging result of
   stm:EXP
only the values of the latter will be kept. (keys: [:ident])

Can anyone tell me what's  wrong?

And I also have another question: when do you have to use "space" in a 
grammar rule?

Thanks in advance,

Thiel








Re: [ruby.parslet] Error message: duplicate subtrees

From:
Ant Skelton
Date:
2011-05-02 @ 14:36
On Mon, May 2, 2011 at 3:20 PM, Thiel Chang <schang@wxs.nl> wrote:

> Hi guys,
>
> Due to a problem with code generation I changed my grammar rules.
>
> After running this code: parser.parse_with_debug( '1  +  2' )
>
> I got this result:
>
> {:stm=>{:ident=>"2"@4}}
> Duplicate subtrees while merging result of
>   stm:EXP
> only the values of the latter will be kept. (keys: [:ident])
>
> Can anyone tell me what's  wrong?
>

Hard to say without seeing the grammar, but I'm guessing you have a rule
that has duplicate ".as(:stm)"  expressions?

And I also have another question: when do you have to use "space" in a
> grammar rule?
>

When you want to consume whitespace characters. generally these are
optional, so you use the "space?" idiom.
Thus
     rule(:foo) { str('1') >> space? str('+') >> space? >> str('2) }

matches your above example, and also '1+2', '1     +     2' etc.


cheers

ant

Re: [ruby.parslet] Error message: duplicate subtrees

From:
Thiel Chang
Date:
2011-05-02 @ 15:02
Hi Anton,

Many thanks for your advice.  I  have enclosed my grammar rules:

   rule(:epsilon)     { any.absnt? }

   rule(:lparen)      { str('(') >> space? }
   rule(:rparen)     { str(')') >> space? }

   rule(:space)       { match('\s').repeat(1) }
   rule(:space?)     { space.maybe }

   rule(:plus_op)     { str('+') >> space? }
   rule(:mult_op)    { str('*') >> space? }
   rule(:min_op)     { str('-') >> space? }
   rule(:div_op)      { str('/') >> space? }

   rule(:integer)    { (space? >> (str('+') | str('-')).maybe >> space? >>
                                match("[0-9]").repeat(1)).as(:integer) 
 >> space?}

   rule(:exp)            {  term >> exp_acc }

   rule(:exp_acc)    { space? >>  mult_op >>  exp |
                                 space? >>  div_op >> exp |
                                epsilon  }

   rule(:term)         { factor >>  space? >>  term_acc }

   rule(:term_acc)   {  (space? >>  plus_op >>  term) |
                                    space? >>  min_op >> term |
                                   epsilon}

   rule(:factor)        { identifier |float | integer | var |  (lparen 
 >> exp >> rparen) |  epsilon }

   rule(:stm)            { exp }

   root  :stm

Actually, this should be a  simple grammar for parsing arithmetic 
expressions.

But I am still struggling:-)

Thiel


Op 2-5-2011 16:36, Ant Skelton schreef:
> On Mon, May 2, 2011 at 3:20 PM, Thiel Chang <schang@wxs.nl 
> <mailto:schang@wxs.nl>> wrote:
>
>     Hi guys,
>
>     Due to a problem with code generation I changed my grammar rules.
>
>     After running this code: parser.parse_with_debug( '1  +  2' )
>
>     I got this result:
>
>     {:stm=>{:ident=>"2"@4}}
>     Duplicate subtrees while merging result of
>       stm:EXP
>     only the values of the latter will be kept. (keys: [:ident])
>
>     Can anyone tell me what's  wrong?
>
>
> Hard to say without seeing the grammar, but I'm guessing you have a 
> rule that has duplicate ".as(:stm)"  expressions?
>
>     And I also have another question: when do you have to use "space" in a
>     grammar rule?
>
>
> When you want to consume whitespace characters. generally these are 
> optional, so you use the "space?" idiom.
> Thus
>      rule(:foo) { str('1') >> space? str('+') >> space? >> str('2) }
>
> matches your above example, and also '1+2', '1     +     2' etc.
>
>
> cheers
>
> ant

Re: [ruby.parslet] Error message: duplicate subtrees

From:
Ant Skelton
Date:
2011-05-02 @ 15:27
On Mon, May 2, 2011 at 4:02 PM, Thiel Chang <schang@wxs.nl> wrote:

>  Hi Anton,
>
> Anton?! ;)


> Many thanks for your advice.  I  have enclosed my grammar rules:
>
*snip*

Your 'term' rule contains factor, which will insert (:integer => '1') when
it matches. But the 'term' rule also includes 'term_acc', which includes
'term'....which will insert (:integer => '2') when it matches.....but you've
already got one of those, which is why Parslet complains. If you investigate
the ".as(:foo)" stuff, you'll be able to get some hierarchy into your parse
tree. For example changing your term rule to:


   rule(:term)  { factor >> space? >> term_acc.as('foo') }

will cause the parse to succeed.


Actually, this should be a  simple grammar for parsing arithmetic
> expressions.
>

Haha, that's never easy in a PEG ;) I've just done something very similar to
a C expression parser, I ended up with a big chain of rules that follow this
format:

     rule(:logicalOrExpression)       { logicalAndExpression >>
logicalOrExpressionTail.repeat }
    rule(:logicalOrExpressionTail)   { space? >>
logicalOrOperator.as(:binop) >> space? >> logicalAndExpression.as(:op) }
    rule(:logicalOrOperator)         { str('||') }

    rule(:logicalAndExpression)      { inclusiveOrExpression >>
logicalAndExpressionTail.repeat }
    rule(:logicalAndExpressionTail)  { space? >>
logicalAndOperator.as(:binop) >> space? >> inclusiveOrExpression.as(:op) }
    rule(:logicalAndOperator)        { str('&&') }

 etc etc

This is a common idiom for refactoring to avoid left recursion (which is a
big no-no in PEGs), although it can be somewhat counter-intuitive. I found
that studying grammars for other parsers very helpful; Spirit (for
c++/boost) has some good examples, as does ANTLR. You can translate these
into parslet without too much difficulty.

But I am still struggling :-)
>

Good luck!

ant

Re: [ruby.parslet] Error message: duplicate subtrees

From:
Thiel Chang
Date:
2011-05-03 @ 12:34
Hi Ant,

Thanks for your comment. I changed the rule(:term) as you suggested and 
my "1 + 2 " expression was parsed. Unfortunately, the parser did not 
generate the plus-operator.
Then I added  the .as(:plus) in the rule and the parser showed the 
plus-operator.

My lesson learned is that I have to experiment with Parslet as like a 
game just to discover the parsing rules and the (hidden?) features. 
Anyhow, I'm glad that this forum provides me the answers I needed.

Cheers,

Thiel

Op 2-5-2011 17:27, Ant Skelton schreef:
> On Mon, May 2, 2011 at 4:02 PM, Thiel Chang <schang@wxs.nl 
> <mailto:schang@wxs.nl>> wrote:
>
>     Hi Anton,
>
> Anton?! ;)
>
>     Many thanks for your advice.  I  have enclosed my grammar rules:
>
> *snip*
> Your 'term' rule contains factor, which will insert (:integer => '1') 
> when it matches. But the 'term' rule also includes 'term_acc', which 
> includes 'term'....which will insert (:integer => '2') when it 
> matches.....but you've already got one of those, which is why Parslet 
> complains. If you investigate the ".as(:foo)" stuff, you'll be able to 
> get some hierarchy into your parse tree. For example changing your 
> term rule to:
>
>
>    rule(:term)  { factor >> space? >> term_acc.as 
> <http://term_acc.as>('foo') }
>
> will cause the parse to succeed.
>
>
>     Actually, this should be a  simple grammar for parsing arithmetic
>     expressions.
>
>
> Haha, that's never easy in a PEG ;) I've just done something very 
> similar to a C expression parser, I ended up with a big chain of rules 
> that follow this format:
>
>    rule(:logicalOrExpression)       { logicalAndExpression >> 
> logicalOrExpressionTail.repeat }
>     rule(:logicalOrExpressionTail)   { space? >> 
> logicalOrOperator.as(:binop) >> space? >> logicalAndExpression.as(:op) }
>     rule(:logicalOrOperator)         { str('||') }
>
>     rule(:logicalAndExpression)      { inclusiveOrExpression >> 
> logicalAndExpressionTail.repeat }
>     rule(:logicalAndExpressionTail)  { space? >> 
> logicalAndOperator.as(:binop) >> space? >> inclusiveOrExpression.as(:op) }
>     rule(:logicalAndOperator)        { str('&&') }
>
>  etc etc
>
> This is a common idiom for refactoring to avoid left recursion (which 
> is a big no-no in PEGs), although it can be somewhat 
> counter-intuitive. I found that studying grammars for other parsers 
> very helpful; Spirit (for c++/boost) has some good examples, as does 
> ANTLR. You can translate these into parslet without too much difficulty.
>
>     But I am still struggling:-)
>
>
> Good luck!
>
> ant

Re: [ruby.parslet] Error message: duplicate subtrees

From:
Ant Skelton
Date:
2011-05-03 @ 15:24
> Thanks for your comment. I changed the rule(:term) as you suggested and my
> "1 + 2 " expression was parsed. Unfortunately, the parser did not generate
> the plus-operator.
> Then I added  the .as(:plus) in the rule and the parser showed the
> plus-operator.
>

Ok. Once you start using .as, anything you *didn't* use it on gets thrown
away; parslet assumes that since you didn't ask for it, you don't need it.
99% of the time, that's a hugely useful thing as it filters out a lot of the
chaff from your parse tree.


> My lesson learned is that I have to experiment with Parslet as like a game
> just to discover the parsing rules and the (hidden?) features. Anyhow, I'm
> glad that this forum provides me the answers I needed.
>

It's all in the docs on the website. I think there were a couple of things
that I discovered from the rdoc doco, but that's it. The problem is, there's
a lot to take in, so you need to read it a good few times. The problem you
describe above, for example, is covered in the docs.

cheers

ant

Re: Error message: duplicate subtrees

From:
Kaspar Schiess
Date:
2011-05-02 @ 15:37
>     rule(:logicalOrExpression)       { logicalAndExpression >>
> logicalOrExpressionTail.repeat }
>      rule(:logicalOrExpressionTail)   { space? >>
> logicalOrOperator.as(:binop) >> space? >> logicalAndExpression.as(:op) }
>      rule(:logicalOrOperator)         { str('||') }
>
>      rule(:logicalAndExpression)      { inclusiveOrExpression >>
> logicalAndExpressionTail.repeat }
>      rule(:logicalAndExpressionTail)  { space? >>
> logicalAndOperator.as(:binop) >> space? >> inclusiveOrExpression.as(:op) }
>      rule(:logicalAndOperator)        { str('&&') }
>
>   etc etc
>
> This is a common idiom for refactoring to avoid left recursion (which is
> a big no-no in PEGs), although it can be somewhat counter-intuitive. I
> found that studying grammars for other parsers very helpful; Spirit (for
> c++/boost) has some good examples, as does ANTLR. You can translate
> these into parslet without too much difficulty.

To add a really not well thought out remark: Could this kind of code be 
eliminated through a single meta rule? Like one that says:

	expression_of(term, [%w(&& ||), %w(* /), %w(+ -)])

The expression_of method would take term and build a hierarchy of binary 
rules that encode precedence in PEG-style. It would be so much easier to 
read... Left recursion elimination in the engine seems suddenly less 
elegant to me.

But maybe such a thing cannot be created for reasons not obvious to me 
right now. Anyone biting?

k

Re: [ruby.parslet] Re: Error message: duplicate subtrees

From:
Ant Skelton
Date:
2011-05-02 @ 15:51
On Mon, May 2, 2011 at 4:37 PM, Kaspar Schiess <eule@space.ch> wrote:

>
> To add a really not well thought out remark: Could this kind of code be
> eliminated through a single meta rule? Like one that says:
>
>        expression_of(term, [%w(&& ||), %w(* /), %w(+ -)])
>
> The expression_of method would take term and build a hierarchy of binary
> rules that encode precedence in PEG-style. It would be so much easier to
> read...


Sounds interesting... I considered doing this myself using a custom
function, but gave up because I didn't really know what I was doing, plus
the old way is sort of familiar looking. I've got my expression grammar down
to the usual rules and about 10 transformations....there's some slightly
hairy stuff to do with stitching chains of binary operators together
(especially in the presence of parentheses), but it works and is passing
tests.



> Left recursion elimination in the engine seems suddenly less
> elegant to me.
>

Ah yes, I was going to ask about the imminent  left-recursive parslet....

But maybe such a thing cannot be created for reasons not obvious to me
> right now. Anyone biting?


I can see 'expression_of' causing people to wonder why right-associative
operators aren't working, doing it by hand at least makes this reasonably
obvious. I skipped around the problem by not having any ;)

cheers

ant