Tuesday, August 15, 2006

August 15th, 2006 -- Last Day of SoC

In fact, this is a post for 8/14. I worked until three o'clock this morning.

Since fglock++ worked a lot on changing Pugs::Rule::Compiler to make module v6 better, I also made some changes to make my existing code work after the merge from pX/ to perl5/. After that, since most parts of Literal.hs are in the test file of the Parsec emitting module, I began writing the translation tool.

It is now here, with the Literal.grammar added. Since rule parameters are not supported, three of the rules are still written in Haskell after the "use Haskell" statement. Besides that, a patch file is included which makes the generated file actually work.

There are two reasons for needing a patch file: a non-LL grammar which needs backtracking, and a weird error message that appears when a type annotation is missing. There are three places where "try" is added to the generated source to allow backtracking; they can be avoided by rewriting the grammar.

But the need for the annotation is not clear to me. The original parser looks like this:
ruleTwigil :: RuleParser String
ruleTwigil = option "" . choice . map string $ words " ^ * ? . ! + ; "
And the generated version is (ugly and obviously not hand-written):
ruleTwigil = option "" $ do
        string "^"
    <|>
        string "*"
    <|>
        string "?"
    <|>
        string "."
    <|>
        string "!"
    <|>
        string "+"
    <|>
        string ";"
Without the type annotation, both versions of the code cause the error
No instance for (MonadState RuleState (GenParser Char st))
Any comment?
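
For comparison, here is a minimal sketch of the two versions side by side using stock Parsec only. It assumes that Parser String can stand in for Pugs' RuleParser String (which carries a RuleState that is not modelled here), and the names twigilHand, twigilGenerated and run are made up for this illustration. It only shows where the annotation goes and that both shapes accept the same input; it does not explain the MonadState error.

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: Parser String replaces RuleParser String from Pugs.
-- Hand-written style: build the alternatives from a word list.
twigilHand :: Parser String
twigilHand = option "" . choice . map string $ words " ^ * ? . ! + ; "

-- Generated style: one explicit branch per alternative, with the
-- same annotation added on top.
twigilGenerated :: Parser String
twigilGenerated = option "" $ do
        string "^"
    <|>
        string "*"
    <|>
        string "?"
    <|>
        string "."
    <|>
        string "!"
    <|>
        string "+"
    <|>
        string ";"

-- Hypothetical helper: run a parser and discard the error details.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

With these definitions, run twigilHand and run twigilGenerated should agree on any input, returning the twigil when one is present and the empty string otherwise.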

Monday, July 31, 2006

July 31st, 2006 -- Parsec emitter and underlay problems

The main goal of this week was to complete the Parsec emitter. However, due to some inherent limitations, it's not possible to really "complete" it; I just try my best to implement as many components as I can and hope those will be enough for writing a complete Perl 6 grammar. The grammar constructs implemented since the last post are:
  • :sigspace option
  • complete \X syntax (but not \Xxxx)
  • numbered captures
  • subrule with parameters
  • non-capture group

I also mailed pmichaud++ about the different semantics of "non-backtracking" in Perl 6 rules and in Parsec's parsing strategy. The :ratchet option, which is turned on by rule and token, makes backtracking over an atom fail. But the other branches of an alternation are still tried even if the first several atoms of one branch have matched. In Parsec, the whole parse fails if it goes into a branch, consumes some tokens and is then unable to go any further. However, this can be changed by wrapping the branch in "try", which, as the name suggests, tries to match the branch but falls back to the other ones if it fails. Parsec's default behavior is thus like adding a "::" after each atom, whereas rule and token in a Perl 6 grammar add a ":".

One way to solve it is to add "try" everywhere. But that means giving up the high-efficiency parsing provided by Parsec. When the grammar is not LL, it's unavoidable. Parsec performs best on LL grammars (according to its official page), so at this stage I'll feed only LL grammars to it and add no additional "try".
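
The difference can be seen in a minimal sketch with stock Parsec. The two-branch grammar and the names noTry, withTry and run are hypothetical, chosen only because both branches start with the same character, so the grammar is not LL(1).

```haskell
import Text.ParserCombinators.Parsec

-- Without try: once the first branch has consumed 'a', the second
-- branch is never attempted, even though it would match.
noTry :: Parser String
noTry = string "ab" <|> string "ac"

-- With try: a failure inside the first branch rewinds the input,
-- so the second branch still gets its chance.
withTry :: Parser String
withTry = try (string "ab") <|> string "ac"

-- Hypothetical helper: run a parser and discard the error details.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

On the input "ac", run noTry fails while run withTry succeeds; on "ab" both succeed. The cost of "try" is that Parsec has to keep the input around so it can rewind, which is why adding it everywhere gives up part of the efficiency.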

Sunday, July 23, 2006

July 23rd, 2006 -- Pugs::Emitter::Rule::Parsec

I finally escaped from the final exams and projects, and from an unexpectedly busy early July. The first checkin of the Pugs::Emitter::Rule::Parsec module was on July 20th. It accepted the yada example given in the README of the MiniPerl6 module and emitted correct Haskell code for it. (By the way, the Pugs::Grammar::MiniPerl6 and Pugs::Compiler::Rule modules have been moved from pX/ to perl5/.)

After three days of hacking, it now accepts a lot of rule constructs. Also, a test file was added. In Parser.Literal there are 10 parser routines: two of them take arguments (namedLiteral and possiblyTypeLiteral), which I currently don't know how to express in a Perl 6 rule; one uses the previous parsing state to decide the next action (ruleWordboundary); two use negative look-ahead (ruleDot and ruleLongDot), but <!before pattern> is not ready yet; the other five can be generated easily. In fact, all five of them are already in the test file. I tested the result by replacing the existing code with the generated code, which showed that it works with Pugs' parser, too.

UPDATE: <!before pattern> support is added to both Pugs::Compiler::Rule and Parsec emitter. However, since the existing parser is not LL, I have to put an additional "try" in the generated code by hand to make it work.
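
As a rough illustration of how a negative look-ahead maps onto Parsec, here is a minimal sketch using notFollowedBy. The parsers singleDot and dotOrRange are hypothetical stand-ins, not the real ruleDot or ruleLongDot from Pugs, and the run helper is only there for the example.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical stand-in for a rule such as  '.' <!before '.'> :
-- match one dot, but only if no second dot follows.
singleDot :: Parser Char
singleDot = do
    c <- char '.'
    notFollowedBy (char '.')
    return c

-- The first dot is already consumed when the look-ahead fails, so a
-- later alternative is only reachable if the branch is wrapped in try.
dotOrRange :: Parser String
dotOrRange = try (fmap (: []) singleDot) <|> string ".."

-- Hypothetical helper: run a parser and discard the error details.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

Here singleDot accepts ".x" but rejects "..", and dotOrRange accepts both forms only because of the "try" around its first branch.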

Monday, June 05, 2006

June 5th, 2006 -- Named Captures

Someone in another open source project I joined made a mistake, and since the system is online, it corrupted many users' config files. Helping to fix them took me a lot of time. (The error report is still growing..)

I added a rule to Pugs::Grammar::Rule so that it accepts more named capture syntax. Previously it accepted only $<var> := (pattern); the parentheses were required. Now the more general $<var> := [pattern] and $<var> := <subrule> are also valid.
While dealing with the Perl 5 ratchet emitter, I had already hacked the named capture emitting routine, so I made sure that it still works with the extended syntax. What surprised me is that even the non-ratchet emitter works fine with the newly accepted syntax. However, I still suspect that the positional captures are not generated correctly. When $<var> := (pattern) is used, the (pattern) should NOT be listed in the positional captures.

Anyway, the ratchet tests show that Pugs::Emitter::Rule::Perl5::Ratchet works correctly. I'll add tests for the non-ratchet one if possible.

On the other hand, the Pugs::Emitter::Rule::Parsec plan has been in my head for a while, and there IS a Perl module for it now -- in my working copy, with a lot of functions that have empty bodies. As soon as it has some basic constructs, I will check it in and begin our Parser/*.hs replacement.

Saturday, June 03, 2006

June 3rd, 2006 -- Well formatted grammar

I still have not begun writing Pugs::Emitter::Rule::Parsec; I am still dealing with MiniPerl6. Aufrank++ and spinclad++ gave some great comments about my variable matching rule. And the post I sent to perl.perl6.language, which was NOT sent to the list since I posted it directly by group.google, was answered by pmichaud++, my mentor :)

The thread is here (http://0rz.net/7b1pZ). Following his comments, I changed the matching rule of <ws> and changed some rules into tokens. The grammar file is not so hateful now.

Update: Audrey++ confirmed that the PGE-way of <ws> is correct and has updated S05.

Thursday, June 01, 2006

June 1st, 2006 -- A new month, a new stage

Since MiniPerl6 is finished, and it seems that there will not be big changes, I am planning to move on to the next stage, Pugs::Emitter::Rule::Parsec. There is no progress yet, but I spent some time exploring papers on monads (most of them were published 10+ years ago), mostly for my current research topic but also because Parsec is a monadic parser and I haven't totally understood monads.

On the other hand, I don't really like (pronounced "hate") the current MiniPerl6.grammar. It was well-formatted before :sigspace was implemented in Pugs::Compiler::Rule. But after the default-on option was implemented, I could not put in extra spaces to format it well anymore. I identified two reasons why this happened and posted a suggestion on perl.perl6.language, but no one has replied yet.

The two reasons (or, suggestion of changes) are:
  1. Spaces at the beginning and end of rule blocks should be ignored, since the spaces before and after the current rule are most likely defined in the rules that use the current one. (The spaces around a rule-level alternative could also be ignored.)
  2. I am not sure about the default rule of <ws>; I couldn't find it in S05. Currently the engine uses :P5/\s+/, but I would like it to be :P5/\s*/ when it's before or after non-words and to remain the same (\s+) otherwise.
I think these will help me format the grammar file better.
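
The second suggestion can be sketched as a small pure function. This is only an illustration of the proposed behavior, not how Pugs::Compiler::Rule or PGE implements it; the names isWordChar, wsMandatory and matchWs are made up, and treating the start and end of input as non-word positions is an assumption.

```haskell
import Data.Char (isAlphaNum, isSpace)

-- Word characters in the sense of \w (approximated here).
isWordChar :: Char -> Bool
isWordChar c = isAlphaNum c || c == '_'

-- Whitespace is mandatory (\s+) only between two word characters;
-- next to a non-word character it becomes optional (\s*).
-- Assumption: the edges of the input count as non-word positions.
wsMandatory :: Maybe Char -> Maybe Char -> Bool
wsMandatory (Just a) (Just b) = isWordChar a && isWordChar b
wsMandatory _ _ = False

-- Given the previously matched character and the remaining input,
-- skip leading whitespace. Returns the rest of the input, or Nothing
-- if whitespace was mandatory but absent.
matchWs :: Maybe Char -> String -> Maybe String
matchWs prev input
    | null spaces && wsMandatory prev next = Nothing
    | otherwise = Just rest
  where
    (spaces, rest) = span isSpace input
    next = case rest of
        (c : _) -> Just c
        [] -> Nothing
```

Under this sketch, matchWs (Just 'a') "b" fails because two word characters touch, while matchWs (Just 'a') "+b" and matchWs (Just 'a') " b" both succeed.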

Sunday, May 28, 2006

May 28th, 2006 -- MiniPerl6 Finished Again

Continuing yesterday's work, I finished converting the MiniPerl6 module so that it now works happily with ratchet.

The problem was that I used \w+ to match variable names when I should have used [<alpha>|_]\w* instead (since variables named with numbers are numbered captures, a special category). Another problem had been hidden, or bypassed, by changing the order of alternations, and showed up after I fixed the rule-using problem. It turned out that the named capture parsing code in Pugs::Emitter::Rule::Perl5::Ratchet used a reference to a boolean value in an "if" statement; a reference is always true in Perl 5, so the test never failed. I have to add another pair of ${} around it, at least before switching to Perl 6.
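
The difference between the two variable-name patterns can be sketched with Parsec. This is an illustration only, not the MiniPerl6 grammar itself; looseName, strictName and accepts are hypothetical names, and Parsec's letter and alphaNum only approximate <alpha> and \w.

```haskell
import Text.ParserCombinators.Parsec

-- Roughly \w+ : any run of word characters, so "0" is accepted too.
looseName :: Parser String
looseName = many1 (alphaNum <|> char '_')

-- Roughly [<alpha>|_]\w* : the first character must be a letter or
-- an underscore, so purely numeric names are left to the
-- numbered-capture rule.
strictName :: Parser String
strictName = do
    first <- letter <|> char '_'
    rest <- many (alphaNum <|> char '_')
    return (first : rest)

-- Hypothetical helper: does the parser accept the whole string?
accepts :: Parser a -> String -> Bool
accepts p = either (const False) (const True) . parse (p <* eof) ""
```

With these definitions, accepts looseName "0" is True, while accepts strictName "0" is False and accepts strictName "_x1" is True.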