librelist archives

Matching 1-2 arbitrary words surrounded by more specific rules

From: Cody Russell
Date: 2011-12-12 @ 21:53

Hello!

I'm new to parslet, and I've never used any parsers before now (and btw,
thanks for making this great library, it's been fun to use so far!)  I'm
trying to make an interface where I can create calendar events by typing in
natural-like phrases.  I've written rules to match dates, times, and entire
date ranges.  Next I was going to build on top of this so that I can say
something like "lesson with John Doe on Friday at 2:00pm".  The part that
I'm having trouble with is matching the name.

I tried to do something like this:

rule(:word)  { match('\\w').repeat }
rule(:attendee) { ( word | ( word >> ( space >> word ) ) ).as(:attendee) }
rule(:event_type) { ( str('lesson') | str('class') | str('interview') ).as(:event_type) }

rule(:event) do
    event_type >> space >> ( str('with ') >> attendee >> space ).maybe >> ( range | datetime | date | time )
end

It is correctly matching "lesson with John Friday at 2:00pm", but it's not
matching "lesson with john doe friday at 2:00pm", and I'm having trouble
getting it to do this.  I suppose there could even be multi-word last names
such as "ludwig van beethoven" but since I can't even get "john doe" to
match yet I figure there is no reason to try to get larger names to match
yet. :)

Is there a common strategy for trying to do something like this?

Thanks very much,
   Cody

Re: Matching 1-2 arbitrary words surrounded by more specific rules

From: Kaspar Schiess
Date: 2011-12-13 @ 08:32

Hi Cody,

The kind of sentences you're trying to match are highly ambiguous. 
Suppose you want to schedule dinner with Monday Doe, named after the day 
she was born? You will have to exclude all week-days from all names in 
order to even match names - and then clashes like this one are 
programmed to happen.
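
To make that clash concrete, here is a minimal sketch using plain Ruby regular expressions rather than parslet. The WEEKDAYS list, the NAME_WORD and EVENT patterns and the split_event helper are all made up for illustration. The point is only that once week-days are banned from names, a phrase about Monday Doe can no longer be matched at all.

```ruby
# Illustrative only: a "name word" is any word that is not a week-day.
WEEKDAYS  = %w[monday tuesday wednesday thursday friday saturday sunday].freeze
NAME_WORD = /(?!(?:#{WEEKDAYS.join('|')})\b)\w+/i
EVENT     = /\A(?<event_type>\w+) with (?<attendee>#{NAME_WORD}(?: #{NAME_WORD})*) (?<when>.+)\z/i

# Returns a hash with the three parts, or nil when the phrase does not match.
def split_event(text)
  m = EVENT.match(text)
  m && { event_type: m[:event_type], attendee: m[:attendee], when: m[:when] }
end

# attendee is "john doe", when is "friday at 2:00pm"
p split_event('lesson with john doe friday at 2:00pm')

# nil: "monday" is rejected as a name word, so Monday Doe cannot be an attendee
p split_event('dinner with monday doe tuesday at 7pm')
```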

If you look at the history of parsing, formalisms other than PEG have
been developed for this kind of sentence. They use probabilistic and
heuristic 'interpretation'-type approaches. [1] gives a good overview,
and many of these algorithms have been implemented in Ruby.

Parslet implements PEG very slavishly: it doesn't even skip whitespace
for you or do left recursion. This is going to stay that way. PEG is very
good (IMHO) for computer languages. It avoids many ambiguities by posing 
a formalism that doesn't allow them, which appeals to me because it is 
elegant.

So that means: Parsing natural language with parslet will make you 
unhappy, folks! Cody, the common strategy for parsing things like this 
is to use a bottom-up parser, LR(k) or the like. There are very elegant
natural language frameworks out there as well. And last (but .. not 
least) there is chronic[2], which does part of what you want...

regards,
kaspar

[1] http://en.wikipedia.org/wiki/Parser
[2] http://rubygems.org/gems/chronic
[3] http://duckduckgo.com/?q=natural+language+ruby

Re: [ruby.parslet] Re: Matching 1-2 arbitrary words surrounded by more specific rules

From: Nigel Thorne
Date: 2011-12-14 @ 11:20

Hi Cody..

I have to agree with Kaspar, this isn't what Parslet is for..

However...If I _had_ to get your parser to work..

1/ You are parsing (word | word space word)... Parslet's alternation is
ordered: it always takes the first alternative that matches, so 'word' wins
and 'word space word' is never tried. Flip them so that 'word' becomes the
fallback option...

2/ There is ambiguity on the day (as it may be part of the name)... so I
would change the grammar to have "on" as your indicator that you are
starting a date... and 'at' to indicate a time.  This would mean you can't
have someone called "on" as their last name... shame. With this (and
assuming your order is always the same... i.e. event_type, optional attendee,
date, time... ) you can write:

 rule(:attendee) { (word >> (space >> str('on').absent? >> word).repeat).as(:attendee) }

<code>
require 'rubygems'
require 'parslet'
require 'parslet/convenience' # provides parse_with_debug

class SimpleParser < Parslet::Parser
  rule(:space) { str(' ') }
  rule(:word)  { match('\w').repeat(1) }
  # First attempt: longer alternative first, 'word' as the fallback.
  # rule(:attendee) { ((word >> space >> word) | word).as(:attendee) }
  # One or more words, stopping before a word that begins with 'on'.
  rule(:attendee) { (word >> (space >> str('on').absent? >> word).repeat).as(:attendee) }
  rule(:event_type) { (str('lesson') | str('class') | str('interview')).as(:event_type) }
  rule(:attendance) { str('with') >> space >> attendee }
  rule(:stuff) { any.repeat }
  rule(:temporality) { str('on') >> space >> stuff.as(:when) }
  rule(:event) { event_type >> space >> attendance >> space >> temporality }
  root(:event)
end

@parser = SimpleParser.new

# puts @parser.attendee.parse_with_debug("nigel thorne")
puts @parser.parse_with_debug("lesson with john doe on friday at 2:00pm")
</code>

... something like this
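
Point 1/ (ordered choice) can also be seen without parslet. What follows is a rough sketch with hand-rolled matchers on top of Ruby's StringScanner. The names WORD, SPACE, seq and choice are hypothetical helpers invented for this illustration and are not parslet's API. They only mimic the PEG rule that the first matching alternative wins.

```ruby
require 'strscan'

# Hypothetical mini-combinators, NOT parslet's API. Each matcher is a lambda
# that takes a StringScanner and returns the matched text, or nil on failure.
WORD  = ->(s) { s.scan(/\w+/) }
SPACE = ->(s) { s.scan(/ /) }

# Sequence: every part must match in order, otherwise rewind and return nil.
def seq(*parts)
  lambda do |s|
    start = s.pos
    pieces = []
    parts.each do |part|
      piece = part.call(s)
      break if piece.nil?
      pieces << piece
    end
    if pieces.size == parts.size
      pieces.join
    else
      s.pos = start # rewind so the next alternative starts from the same place
      nil
    end
  end
end

# Ordered choice: take the first alternative that matches. Later alternatives
# are only tried when the earlier ones fail.
def choice(*alternatives)
  lambda do |s|
    result = nil
    alternatives.each do |alt|
      result = alt.call(s)
      break unless result.nil?
    end
    result
  end
end

word_first = choice(WORD, seq(WORD, SPACE, WORD)) # the order from the question
long_first = choice(seq(WORD, SPACE, WORD), WORD) # flipped, 'word' as fallback

p word_first.call(StringScanner.new('john doe')) # "john"
p long_first.call(StringScanner.new('john doe')) # "john doe"
p long_first.call(StringScanner.new('john'))     # "john"
```

With the original order the choice stops after "john", so " doe" is left over for the rest of the grammar, which then fails on it.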


---
"No man is an island... except Philip"



Re: [ruby.parslet] Re: Matching 1-2 arbitrary words surrounded by more specific rules

From: Jonathan Rochkind
Date: 2011-12-14 @ 15:43

I wonder if an 'ordinary' regex-based solution might actually do you 
better.

But I'd definitely start out looking at the source of the 'chronic' gem, 
as Kaspar suggested, to see how they do it -- they do some portion of 
what you want to do already, fairly well.
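
For what it's worth, a regex-based version might look roughly like the sketch below. It assumes the "on" keyword convention proposed earlier in the thread (event type, optional "with" plus name words, then "on" and the date/time text). EVENT_RE and parse_event are names invented here for illustration.

```ruby
# Hypothetical pattern: event type, optional " with <name words>", then
# " on <anything>". A name word may not be the whole word "on".
EVENT_RE = /\A(?<event_type>lesson|class|interview)(?: with (?<attendee>\w+(?: (?!on\b)\w+)*))? on (?<when>.+)\z/

# Returns a hash of the captured parts, or nil when the phrase does not match.
def parse_event(text)
  m = EVENT_RE.match(text)
  return nil unless m

  { event_type: m[:event_type], attendee: m[:attendee], when: m[:when] }
end

p parse_event('lesson with john doe on friday at 2:00pm')
p parse_event('lesson with ludwig van beethoven on friday at 2:00pm')
p parse_event('class on monday')             # attendee is nil
p parse_event('lesson with john doe friday') # nil, there is no "on" keyword
```

The text captured as `when` is left unparsed here. It could then be handed to a date parser such as chronic. Note that `(?!on\b)` only excludes the whole word "on", so a name like "onslow" still passes, whereas `str('on').absent?` in the parslet version also rejects any word that merely starts with "on".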
