Sunday, May 28, 2006

May 28th, 2006 -- MiniPerl6 Finished Again

Continue on yesterday's work, I finished converting MiniPerl6 module so that it now work happily with ratchet.

The problem is that I used \w+ to match variable names but I should use [<alpha>|_]\w* instead (since the variables named with numbers are numbered captures, a special category). Another problem was hidden, or bypassed, by changing the order of alternations after I fixed the rule-using problem. It happened to be that the named capture parsing code in Pugs::Emitter::Rule::Perl5::Ratchet used a reference of boolean value in an "if" statement. I have to add another pair of ${} around it, at least before switching to Perl 6.

Saturday, May 27, 2006

May 27th, 2006 -- Ratchet MiniPerl6

The Pugs::Grammar::MiniPerl6 module was broken since the default setting of parsing option of Pugs::Compiler::Rule several days ago so that the ratchet option was turned off in it. Currently compile_p6grammar.pl generates a module used no ratchet feature which is identical to that it used to do. So that I focused on it these days.

I fixed the ratchet Perl 5 emitter on backslashes and named capture yesterday. And, I've almost finished fixing the MiniPerl6 module itself.

The main parts are to revise the grammar and update the translator compile_p6grammar.pl. The former one including changing some rules into "token" and rewrite the others to fit the :sigspace. And the later part is to make it accept "token" and "regex" as well as "rule."

Another big change is a modifier ":p" on rule ProductionRule. Since ProductionRule is the top rule in the grammar, the match of it should start from the very begin of input string. Correspondingly, compile_p6grammar.pl also translate it correctly.

I said that I "almost" finished fixing it. That means some of them are still broken and if you tried "make test" you will find the last test failed. I'll fix the temporary variable declaration as soon as possible. Stay tuned!

Wednesday, May 24, 2006

May 24th, 2006 -- Google Summer of Code!

My proposal of "Pugs Self-hosting Bootstrap From Perl 5 and Rules" was accepted by google summer of code. There are five projects accepted from The Perl Foundation.

My proposal is listed
Shu-Chun Weng

Contact information
email: (hidden)
IRC: freenode.net #perl6 with id "scw"
Blog: http://YAPHB.blogspot.com

Perl 6 Self-hosting Bootstrap From Perl 5 and Rules

Synopsis
This is a subproject of Pugs (http://pugscode.org). One of the goals
of Pugs is to give an implementation of Perl 6 and help it
bootstraping. A new bootstrap plan has been proposed shortly. It
takes four steps and I propose to complete the first two steps in
this subproject. It consists of two new perl 5 modules. One
translates a very restrictive subset of Perl 6, MiniPerl6, to Haskell
and the another translates the Perl 6 Rule in to Parsec (a Haskell
parsing module) parsing code.

This is the easiest way to bootstrap we currently found. Bootstrap
from pure Perl 5 has been proven unbelievable hard due to the lack of
good parsing components. Pugs itself is written in Haskell because of
the Parsec module. It makes the parsing easier but still the parser
has grown too large to debug and maintain. Since the Perl 6 Rule has
been discussed for a long time and became stable and powerful enough,
the Perl 6 seems better be parsed by Perl 6 Rule itself.

Another feature of this project is that there is no "mixed" components
in the whole process. Bootstrap plans often consist of some parts
mixing two languages to get over the language gap before the target
can be used. But we carefully designed the plan and languages to be
used making that every components are written in a pure Perl 5 or Perl
6. This make the project reusable.

Deliverables
Module Pugs::Grammar::MiniPerl6
This module includes a Perl 6 Rule file and a Perl 5 script
translates the rule into Perl 5 parser module with the help of
Pugs::Compiler::Rule.

Module Pugs::Emitter::Rule::Parsec
This is the module used in the second step of bootstrap. It emits
Haskell code from the AST producted by Pugs::Compiler::Rule.

Module Pugs::Compiler::Rule
This is an existing module mainly written by Flavio S. Glock
(FGLOCK). Since it's heavily used is the above two modules and is
not completed, I may spend some time improving it to support more
Perl 6 Rule syntax.

Project Details
The new bootstrap plan is: (original in http://svn.openfoundry.org/pugs/misc/pX/Common/Pugs-Grammar-MiniPerl6/README)

1. This module, Pugs::Grammar::MiniPerl6, uses Pugs::Compiler::Rule
to read a special *mixed* Perl 6 Rule whose production rules
are written in Perl 5 (the current requirement of P::C::Rule).
1a. The rule is used to translate a subset of Perl 6, MiniPerl6,
to haskell.

2. Then, Pugs::Compiler::Emit::Parsec lands and uses this module
to translate the full Perl 6 grammar into a parser. The full
grammar can now write production rules in MiniPerl6 since
P::C::E::Parsec can use this module to translate such production
rules to haskell and makes the final output be pure Haskell.

3. When compiling Pugs, the .hs preprocessor will use
Pugs::Compiler::Emit::Parsec to accept the full Perl 6 grammar
generating Parser.hs. Then GHC will compile it to executable.

4. The executable can now read the full Perl 6 grammar again
generating compiler in PIL. Then self hosting is done.

I propose to finish the first two steps. Note that the first step
seems contain a mixed-language file, but since the Perl 5 code only
appear in production rules and are marked with 'use v5,' it's still
valid Perl 6.

The most challenging parts of this project are:

1. The parsed grammar saved in Perl 5 hash-and-array tree-like data
structure is extremely complex and nearly unreadable. Since the
correctness has not been tested much, debugging modules using it
is a difficult job.

2. We don't know if the Perl 5 Rule parser is powerful and efficient
enough to handle the full grammar. Currently translating a
sample grammar file with five small rules takes one minute CPU
time on a Xeon 2.8GHz machine (may be unprecise due to HTT).

3. We don't know if Parsec is powerful enough to handle all
functionality provided by Perl 6 Rules. Hopefully, we should be
able to use only those which can be translated easily in the
first version of the full grammar of Perl 6. And since Parsec
code was copied into Pugs' source tree for the ease of modifying,
it's not that serious.

The project has it's own directory in the Pugs' subversion repository
now:
http://svn.openfoundry.org/pugs/misc/pX/Common/Pugs-Grammar-MiniPerl6/

And the second part will appear as
misc/pX/Common/Pugs-Compiler-Rule/lib/Pugs/Emitter/Rule/Parsec.pm

The whole project will be licensed under the same licenses of the
Pugs source tree, which are currently dual licensed under GPL-2 and
Artistic-2.0b5.

Project Schedule
The following is the planned schedule:

May 31st -- Pugs::Grammar::MiniPerl6 has complete MiniPerl6 grammar
with correct production rules.
Jun 31st -- Pugs::Compiler::Rule be able to accept all syntax we
want to use in the full Perl 6 grammar.
Jul 1st -- Start organizing full Perl 6 grammar.
Jul 31st -- Pugs::Compiler::Emitter::Rule::Parsec can translate
most rule constructions.
Aug 1st -- Start rewriting exist parsing functions to Perl 6
rules.
Aug 20st -- Parser/*.hs can be replaced by the generated ones.
Sep 15th -- Parser.hs can be replaced by the generated one.

Bio
I am a undergraduated student interested in pragramming language
field as well as one of the commiter of Pugs. I use Perl a lot in
daily works (data processing, crontab tasks, etc.), and with the PL
interest, I noticed and took a look at Perl 6 about two years ago.
I joint Pugs project and became a commiter right after one month of
the birth of it. I focused mainly on parser and added some language
constructions.

Some works listed:

Analysis of ?? :: (currently ?? !!) posted on mailing list:
http://www.nntp.perl.org/group/perl.perl6.compiler/277

Some pugs blog entry mentioned my works
Array slicing http://pugs.blogs.com/pugs/2005/03/day_36_many_thr.html
debian control files and :key(val)
http://pugs.blogs.com/pugs/2005/03/day_56_leopold_.html
eval fix http://pugs.blogs.com/pugs/2005/03/day_54_yapctaip.html
http://pugs.blogs.com/pugs/2005/03/day_50_pugscc_r.html

Link to Further Information:
http://svn.openfoundry.org/pugs/misc/pX/Common/Pugs-Grammar-MiniPerl6/README

Tuesday, May 23, 2006

May 23rd, 2006 -- Ratchet rocks

These days the Pugs repository was unusually silent, so did the channel which is full of joining and leaving messages. The main reason might be the disappearance of Audrey: her last commit was on May 18th, five days ago.

These days I spent most of development time on understanding and fixing ratchet. The goal was to make it handle compile_p6grammar.pl correctly. I achieved it several hours ago by properly handle match objects generated by quantified subrules. The result is:
$ time perl compile_p6grammar.pl MiniPerl6.grammar > MiniPerl6.pm
real 0m0.602s
user 0m0.516s
sys 0m0.027s

which took about two and half a minutes to complete, a two hundred times speed up. Amazing.

Sunday, May 14, 2006

May 14th, 2006 -- Ratchet, don't backtrack

As I mentioned several days ago, the Pugs::Grammar::MiniPerl6 was almost finished so that I turned my attention to Pugs::Compiler::Rule. fglock++ just committed the first version of ratchet parser which will not backtrack by default and will give an approximately 10x+ speedup by eliminating backtracking.

Currently the ratchet support is far from complete. 5 of 24 test cases failed before yesterday. I fixed one and found a serious bug on $pos resetting. I added two test cases for it so that it now fails 6 of 26 test cases.

fglock++ marked the failed tests as TODO and I found that I made a mistake on the meaning of "no backtracking," thanks to him and pmichaud++ correcting my understanding. I'll spend some time on the (?>) and section of backtracking of the perlre manual page.

Wednesday, May 10, 2006

May 10th, 2006 -- Variables! Tests!

As I mentionedyesterday, Makefile.PL and a basic test file were added. But there were still something missing. I fixed the t/basic.t so that the test really passes. And the last two tests are corrected by updating the compile_p6grammar.pl to the one I used which deal with backslashes correctly.

On the another hand, I originally planned to support statements more than a single "return." But I dropped the idea due to two reasons:

  1. I was not familiar enough to rules to write the grammar.

  2. Translating general statements to Haskell is not straight forward.

But I implemented an alternative: temporary variables.

Assignment is not provided so variables can be translated to a simple "let ... in ..." expression, pure lambda one. The old yada example can now be written as
rule yadaLiteral {
$<sym> := [ <'...'> | <'???'> | <'!!!'> ]
{ my $yada = doYada( $<sym> );
my $err = Val( VStr( $<sym> ~ " - not yet implemented") );
return App(
Var( $yada ),
Nothing,
[ $err ]
)
}
}
and translated into (the production rule part)
let yada = (doYada sym) in
let err = (Val (VStr (sym ++ " - not yet implemented"))) in
return $ (App (Var yada) Nothing [err])


Final notes:

  • Now it takes nearly three minutes for compile_p6grammar.pl to translate the grammar to .pm file.

  • There are three kinds of variables now: named capture, numbered capture, and temporary. Some renaming (prefix or suffix adding) may be performed to prevent conflict.

Tuesday, May 09, 2006

May 9th, 2006 -- Grammar and module formed

It was really a long weekend and Monday. And a progressive Monday night and Tuesday. However, I forgot to write the blog entry last night. The numbered captures ($0, $1, ... ) are supported yesterday as well as escaped characters in string. But it didn't work until I commited the part of code dealing with character classes in Pugs::Emitter::Rule::Perl5 several minutes before. And the grammar now accept the array constructor "[1, 2, 3]".

clkao++ added the Makefile.PL and test suit making Pugs::Grammar::MiniPerl6 more like a module instead of the sank box of my own. I have to admit that I haven't written any perl module before and just studied the Makefile.PL in Pugs::Compiler::Rule.

The building process includes an invokation of compile_p6grammar.pl to translate the grammar file to the .pm file. I tried to use lrep instead but found character classes are not supported (just as what happened in Pugs::Compiler::Rule). But fglock++ stop me from switching to lrep and got offline right after that -- could you tell me why should I stay using compile_p6grammar.pl? :)

Currently all language constructions I planned to support in MiniPerl6 are implemented (except the "(a, b)" one but it's not necessary). I'll add more tests for it then move my concentration on Pugs::Compiler::Rule.

Updated: fglock++ left an IRC message to me "lrep is under a rewrite - compile_p6grammar.pl has less features, but it is more up-to-date". Ok, I'll use you script then :)

Friday, May 05, 2006

May 4th, 2006 -- A long silence, and function application

The last post was on midnight of Sunday, 4 days ago. There were three in class midterms in 37 hours from Tuesday morning to noon of Wednesday. They are "special topics on graph algorithms," PDE, and compiler. Yes, I'm taking a compiler course this semester, which is a required course for junior. The exam problem is quite cool: given a grammar of Java-like (or may it's really a snip of java spec) array-newing syntax, write a parser accepting it, 100 points. No other constrains on algorithm. And here comes the bonus: write another one, 50 points. It took me some time to convince myself that write a parser in Perl 6 Rule will not count.

Then I wrote the proposal of summer of code and submitted it in evening. There are some more detail thought about the new plan with not many new ideas. But the may-faced challenging part is one of them:
  1. The parsed grammar saved in Perl 5 hash-and-array tree-like data structure is extremely complex and nearly unreadable. Since the correctness has not been tested much, debugging modules using it is a difficult job.

  2. We don't know if the Perl 5 Rule parser is powerful and efficient enough to handle the full grammar. Currently translating a sample grammar file with five small rules takes one minute CPU time on a Xeon 2.8GHz machine (may be unprecise due to HTT).

  3. We don't know if Parsec is powerful enough to handle all functionality provided by Perl 6 Rules. Hopefully, we should be able to use only those which can be translated easily in the first version of the full grammar of Perl 6. And since Parsec code was copied into Pugs' source tree for the ease of modifying, it's not that serious.
clkao asked my plan on that if the second one really happened.
clkao> scw: what are you going to do if perl5 based rule parser is too slow?
scw> clkao: tune it :p
* clkao grins
I love the idea.

And some real-work reports. I fixed the variable and function naming rule so that it doesn't consume too many things in the afternoon. That makes function application as a operand of summation works. Then a little rewrite of rules of function application with parameters makes it work, too.

The performance problem seems affect my development more and more. I now do changing only on the generated Perl 5 module and only when everything works I'll merge the change to the grammar file and run the translation script again to see if the result is still correct. The reason is that I found the rule I used in the afternoon took one minutes and twelve seconds to translate and the one I committed several minutes before took nearly two minutes. It seems time to hack into Pugs::Compiler::Rule.

On the another hand, fglock++ revised the benchmark between P::C::R, inlined matching functions and Perl 5 regular expression. The performance seem to be ten times better one by one from the last one (P5regex) to the first one (P::C::R). Maybe I'll then write my own translation script in P5regex (a *lot* to do then) or inlined P::C::R structure instead of current one which uses P::C::R.