Wednesday, May 24, 2006

May 24th, 2006 -- Google Summer of Code!

My proposal, "Pugs Self-hosting Bootstrap From Perl 5 and Rules", was accepted for Google Summer of Code. Five projects were accepted from The Perl Foundation.

My proposal is listed
Shu-Chun Weng

Contact information
email: (hidden)
IRC: #perl6 with id "scw"

Perl 6 Self-hosting Bootstrap From Perl 5 and Rules

This is a subproject of Pugs. One of the goals of Pugs is to provide
an implementation of Perl 6 and help it bootstrap. A new bootstrap
plan was proposed recently. It takes four steps, and I propose to
complete the first two in this subproject, which consists of two new
Perl 5 modules. One translates a very restrictive subset of Perl 6,
MiniPerl6, to Haskell, and the other translates Perl 6 Rules into
Parsec (a Haskell parsing module) parsing code.

This is the easiest way to bootstrap that we have found so far.
Bootstrapping from pure Perl 5 has proven unbelievably hard due to the
lack of good parsing components. Pugs itself is written in Haskell
because of the Parsec module. Parsec makes parsing easier, but the
parser has still grown too large to debug and maintain. Since Perl 6
Rules have been discussed for a long time and have become stable and
powerful enough, Perl 6 seems better parsed by Perl 6 Rules themselves.

Another feature of this project is that there are no "mixed"
components in the whole process. Bootstrap plans often include parts
that mix two languages to bridge the language gap before the target
can be used. We carefully designed the plan and the languages used so
that every component is written in pure Perl 5 or pure Perl 6. This
makes the project reusable.

Module Pugs::Grammar::MiniPerl6
This module includes a Perl 6 Rule file and a Perl 5 script that
translates the rule into a Perl 5 parser module with the help of

Module Pugs::Emitter::Rule::Parsec
This is the module used in the second step of bootstrap. It emits
Haskell code from the AST produced by Pugs::Compiler::Rule.

Module Pugs::Compiler::Rule
This is an existing module mainly written by Flavio S. Glock
(FGLOCK). Since it is heavily used in the above two modules and is
not yet complete, I may spend some time improving it to support more
Perl 6 Rule syntax.
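
To make the emitter idea concrete, here is a minimal sketch, in Python
rather than the project's Perl 5, of walking a rule AST and emitting
Parsec source text. The AST shape (the "kind", "value", "items" and
"item" keys) and the names emit_parsec and hs_string are assumptions
made up for illustration; the real tree produced by
Pugs::Compiler::Rule looks different. Every node is emitted as a
recognizer of type Parser (), using only standard Parsec combinators
(string, try, skipMany, <|>).

```python
def hs_string(text):
    """Quote text as a Haskell string literal (handles backslash and quote only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def emit_parsec(node):
    """Return Haskell (Parsec) source text for a hypothetical rule AST node.

    Each emitted expression is fully parenthesized and has type Parser (),
    so results can be combined without precedence or type surprises.
    """
    kind = node["kind"]
    if kind == "literal":
        # Match the literal text and discard the matched string.
        return "(string %s >> return ())" % hs_string(node["value"])
    if kind == "concat":
        # Sequence: run each sub-parser in order.
        return "(" + " >> ".join(emit_parsec(n) for n in node["items"]) + ")"
    if kind == "alt":
        # Alternation: wrap each branch in try so a failed branch backtracks.
        return "(" + " <|> ".join("try " + emit_parsec(n) for n in node["items"]) + ")"
    if kind == "star":
        # Zero or more repetitions, results discarded.
        return "(skipMany " + emit_parsec(node["item"]) + ")"
    raise ValueError("unknown node kind: %r" % (kind,))


if __name__ == "__main__":
    # Roughly the rule: 'ab' followed by any number of 'c' or 'd'.
    ast = {
        "kind": "concat",
        "items": [
            {"kind": "literal", "value": "ab"},
            {"kind": "star", "item": {
                "kind": "alt",
                "items": [
                    {"kind": "literal", "value": "c"},
                    {"kind": "literal", "value": "d"},
                ],
            }},
        ],
    }
    print(emit_parsec(ast))
```

A real emitter would also have to handle captures, named subrules and
the production rules themselves, which this sketch leaves out.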

Project Details
The new bootstrap plan is: (original in

1. This module, Pugs::Grammar::MiniPerl6, uses Pugs::Compiler::Rule
to read a special *mixed* Perl 6 Rule whose production rules
are written in Perl 5 (the current requirement of P::C::Rule).
1a. The rule is used to translate a subset of Perl 6, MiniPerl6,
to Haskell.

2. Then, Pugs::Compiler::Emit::Parsec lands and uses this module
to translate the full Perl 6 grammar into a parser. The full
grammar can now write production rules in MiniPerl6 since
P::C::E::Parsec can use this module to translate such production
rules to Haskell, making the final output pure Haskell.

3. When compiling Pugs, the .hs preprocessor will use
Pugs::Compiler::Emit::Parsec to read the full Perl 6 grammar
and generate Parser.hs. GHC will then compile it into an
executable.

4. The executable can now read the full Perl 6 grammar again and
generate a compiler in PIL. Self-hosting is then done.

I propose to finish the first two steps. Note that the first step
seems to contain a mixed-language file, but since the Perl 5 code only
appears in production rules and is marked with 'use v5', it is still
valid Perl 6.

The most challenging parts of this project are:

1. The parsed grammar, saved in a Perl 5 hash-and-array tree-like
data structure, is extremely complex and nearly unreadable. Since
its correctness has not been tested much, debugging modules that
use it is a difficult job.

2. We don't know if the Perl 5 Rule parser is powerful and efficient
enough to handle the full grammar. Currently, translating a
sample grammar file with five small rules takes one minute of CPU
time on a Xeon 2.8GHz machine (may be imprecise due to HTT).

3. We don't know if Parsec is powerful enough to handle all the
functionality provided by Perl 6 Rules. Hopefully, we should be
able to use only those features which can be translated easily in
the first version of the full grammar of Perl 6. Since the Parsec
code was copied into the Pugs source tree for ease of
modification, this is not that serious.
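
Regarding the first challenge, one way to make a nested
hash-and-array tree easier to inspect is to print it one node per
line with indentation. The sketch below does this in Python, with
dicts and lists standing in for Perl 5 hashes and arrays. The function
name dump_tree and the sample data are hypothetical and not part of
Pugs::Compiler::Rule.

```python
def dump_tree(node, indent=0):
    """Return a list of indented lines describing a nested dict/list tree."""
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        # Sort keys so the output is stable between runs.
        for key in sorted(node):
            value = node[key]
            if isinstance(value, (dict, list)):
                lines.append("%s%s:" % (pad, key))
                lines.extend(dump_tree(value, indent + 1))
            else:
                lines.append("%s%s: %r" % (pad, key, value))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            if isinstance(value, (dict, list)):
                lines.append("%s[%d]" % (pad, index))
                lines.extend(dump_tree(value, indent + 1))
            else:
                lines.append("%s[%d] %r" % (pad, index, value))
    else:
        # A bare scalar at the top level.
        lines.append("%s%r" % (pad, node))
    return lines


if __name__ == "__main__":
    # Made-up sample tree, not an actual parsed grammar.
    sample = {"rule": "ident", "alt": [{"literal": "a"}, "b"]}
    print("\n".join(dump_tree(sample)))
```

Sorting the keys keeps two dumps of the same tree comparable with a
plain diff, which helps when checking whether a parser change altered
the structure.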

The project has its own directory in the Pugs Subversion repository

And the second part will appear as

The whole project will be licensed under the same licenses as the
Pugs source tree, which is currently dual-licensed under GPL-2 and

Project Schedule
The following is the planned schedule:

May 31st -- Pugs::Grammar::MiniPerl6 has a complete MiniPerl6 grammar
with correct production rules.
Jun 30th -- Pugs::Compiler::Rule is able to accept all the syntax we
want to use in the full Perl 6 grammar.
Jul 1st -- Start organizing the full Perl 6 grammar.
Jul 31st -- Pugs::Compiler::Emitter::Rule::Parsec can translate
most rule constructions.
Aug 1st -- Start rewriting existing parsing functions in Perl 6.
Aug 20th -- Parser/*.hs can be replaced by the generated ones.
Sep 15th -- Parser.hs can be replaced by the generated one.

I am an undergraduate student interested in the programming language
field, as well as one of the committers of Pugs. I use Perl a lot in
my daily work (data processing, crontab tasks, etc.), and with my PL
interest, I noticed and took a look at Perl 6 about two years ago.
I joined the Pugs project and became a committer one month after
its birth. I focused mainly on the parser and added some language

Some of my work:

Analysis of ?? :: (currently ?? !!) posted on mailing list:

Some Pugs blog entries that mentioned my work:
Array slicing
Debian control files and :key(val)
eval fix

Link to Further Information:

