comedy.parser — Veeyu parser

This module provides a simple handwritten parser for Veeyu. It is minimal, but sufficient to parse any Veeyu code.

exception comedy.parser.SyntaxError(text, offset, message='')

Exception raised when the text to be parsed has invalid syntax.

text

exception text

offset

exception offset

line

Zero-based line offset, as an int or long.

column

Zero-based column offset, as an int or long.

line_string

The unicode string of the line.

filename

exception filename

lineno

exception lineno

msg

exception msg

print_file_and_line

exception print_file_and_line
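
The line, column and line_string attributes above can all be derived from text and offset. The following is a minimal sketch of one way to do that, not the module's actual implementation: locate and describe are hypothetical helper names, the code is written for Python 3, and it assumes that only '\n' counts as a line break. The rendered message imitates the error output shown in the parse_form examples.

```python
def locate(text, offset):
    """Return (line, column, line_string) for a zero-based offset in text."""
    # Count the newlines before the offset to get the zero-based line.
    line = text.count('\n', 0, offset)
    # The line starts right after the previous newline (or at 0).
    line_start = text.rfind('\n', 0, offset) + 1
    column = offset - line_start
    # The line ends at the next newline, or at the end of the text.
    line_end = text.find('\n', line_start)
    if line_end < 0:
        line_end = len(text)
    return line, column, text[line_start:line_end]


def describe(text, offset, message):
    """Render 'line:column: message', the source line, and a caret."""
    line, column, line_string = locate(text, offset)
    return '{0}:{1}: {2}\n{3}\n{4}^'.format(
        line, column, message, line_string, ' ' * column)
```

For the text '(sym' and offset 4, locate returns (0, 4, '(sym'), and describe places the caret under the position just past the last character.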

comedy.parser.parse_form(text, offset=0, terminates=None)

Parses a form text and returns a comedy.node.Form instance.

Parsing atomic forms e.g. comedy.node.Symbol:

>>> parse_form(u'sym')
(comedy.node.Symbol(u'sym'), 3)

Parsing comedy.node.Quote:

>>> parse_form(u':(sym)')
(comedy.node.Quote(u'sym'), 6)
>>> parse_form(u':sym')
(comedy.node.Quote(u'sym'), 4)

Parsing comedy.node.Definition:

>>> parse_form(u'a = b')  
(comedy.node.Definition(comedy.node.Symbol(u'a'),
                        comedy.node.Symbol(u'b')), 5)
>>> parse_form(u'c:=d')  
(comedy.node.Definition(comedy.node.Symbol(u'c'),
                        comedy.node.Symbol(u'd'),
                        local=True), 4)

Parsing comedy.node.NumberLiteral:

>>> parse_form(u'123')
(comedy.node.NumberLiteral(123), 3)

Parsing comedy.node.StringLiteral:

>>> parse_form(u'"Avishai Cohen Trio"')
(comedy.node.StringLiteral(u'Avishai Cohen Trio'), 20)

Parsing comedy.node.Attribute:

>>> parse_form(u'rcvr.attr')
(comedy.node.Attribute(comedy.node.Symbol(u'rcvr'), u'attr'), 9)
>>> parse_form(u'(rcvr .attr)')
(comedy.node.Attribute(comedy.node.Symbol(u'rcvr'), u'attr'), 12)
>>> parse_form(u'(rcvr). attr')
(comedy.node.Attribute(comedy.node.Symbol(u'rcvr'), u'attr'), 12)
>>> parse_form(u':q . attr')
(comedy.node.Attribute(comedy.node.Quote(u'q'), u'attr'), 9)
>>> parse_form(u':(q) .attr')
(comedy.node.Attribute(comedy.node.Quote(u'q'), u'attr'), 10)
>>> parse_form(u':q.a.b')  
(comedy.node.Attribute(comedy.node.Attribute(comedy.node.Quote(u'q'),
                                             u'a'), u'b'), 6)
>>> a, offset = parse_form(u'a[]')
>>> a  
comedy.node.Attribute(comedy.node.Symbol(u'a'),
                      comedy.node.Index(comedy.node.ArgumentList()))
>>> offset
3
>>> b, offset = parse_form(u'a.[]')
>>> offset
4
>>> b == a
True
>>> a, offset = parse_form(u'a{}')
>>> a  
comedy.node.Attribute(comedy.node.Symbol(u'a'),
                      comedy.node.Block(comedy.node.Program()))
>>> offset
3
>>> b, offset = parse_form(u'a.{}')
>>> offset
4
>>> b == a
True

Parsing comedy.node.Call:

>>> parse_form(u'func()')
(comedy.node.Call(comedy.node.Symbol(u'func')), 6)
>>> parse_form(u'f(x, y)')  
(comedy.node.Call(comedy.node.Symbol(u'f'),
             comedy.node.ArgumentList([comedy.node.Symbol(u'x'),
                                       comedy.node.Symbol(u'y')])),
 7)
>>> parse_form(u'f: x, y')  
(comedy.node.Call(comedy.node.Symbol(u'f'),
             comedy.node.ArgumentList([comedy.node.Symbol(u'x'),
                                       comedy.node.Symbol(u'y')])),
 7)
>>> binop, offset = parse_form(u'a b c')
>>> binop  
comedy.node.Call(...)
>>> binop.function
comedy.node.Attribute(comedy.node.Symbol(u'a'), u'b')
>>> binop.arguments
comedy.node.ArgumentList([comedy.node.Symbol(u'c')])
>>> offset
5

A SyntaxError is raised when the given code is invalid:

>>> parse_form(u'(sym')
Traceback (most recent call last):
  ...
SyntaxError: 0:4: expected end of parentheses ')'
(sym
    ^
>>> parse_form(u'((sym)')
Traceback (most recent call last):
  ...
SyntaxError: 0:6: expected end of parentheses ')'
((sym)
      ^
>>> parse_form(u'()')
Traceback (most recent call last):
  ...
SyntaxError: 0:1: unexpected character u')'
()
 ^
>>> parse_form(u'abc .')
Traceback (most recent call last):
  ...
SyntaxError: 0:5: expected attribute name
abc .
     ^

Parameters:
  • text – a unicode string to be parsed; a str is also accepted
  • offset – the position at which to start parsing, as an int or long
  • terminates – a function object that takes text and offset and determines where parsing terminates; used internally
Returns:

a tuple of the parsed comedy.node.Form instance and the zero-based position where parsing stopped, as an int or long. The position is absolute within text; the given offset is not subtracted from it
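
Because the returned position is absolute, it can be passed straight back as the offset of the next call. The sketch below illustrates that convention with a toy function; parse_word is a hypothetical stand-in written for Python 3, not part of comedy.parser, and it only reads runs of alphanumeric characters.

```python
def parse_word(text, offset=0):
    """Toy stand-in for a parser: return (word, end_offset)."""
    # Skip leading spaces.
    while offset < len(text) and text[offset] == ' ':
        offset += 1
    start = offset
    # Consume one run of alphanumeric characters.
    while offset < len(text) and text[offset].isalnum():
        offset += 1
    # The end offset is absolute: it is measured from the start of text,
    # not from the offset the caller passed in.
    return text[start:offset], offset


text = 'foo bar'
first, end = parse_word(text)        # ('foo', 3)
second, end = parse_word(text, end)  # ('bar', 7)
```

Returning absolute positions keeps the caller free of offset arithmetic when several parsers are chained over the same text.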

comedy.parser.parse_program(text, offset=0, terminates=None)

Parses a program text and returns a comedy.node.Program instance.

>>> parse_program(u'a; b')  
(comedy.node.Program([comedy.node.Symbol(u'a'),
                      comedy.node.Symbol(u'b')]), 4)
>>> parse_program(u'a ;b')  
(comedy.node.Program([comedy.node.Symbol(u'a'),
                      comedy.node.Symbol(u'b')]), 4)
>>> parse_program(u'a ;; b')  
(comedy.node.Program([comedy.node.Symbol(u'a'),
                      comedy.node.Symbol(u'b')]), 6)
>>> parse_program(u' ; a ;; b')  
(comedy.node.Program([comedy.node.Symbol(u'a'),
                      comedy.node.Symbol(u'b')]), 9)
>>> parse_program(u'a\r\nb')  
(comedy.node.Program([comedy.node.Symbol(u'a'),
                      comedy.node.Symbol(u'b')]), 4)
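
The examples above suggest that forms are separated by ';' or by line breaks, and that empty statements are skipped. As a rough illustration of that rule only, here is a toy splitter for Python 3; split_statements is a hypothetical name, and unlike a real parser it ignores nesting and string literals.

```python
import re


def split_statements(text):
    """Toy splitter: break text on ';' and line breaks, dropping blanks."""
    # '\r\n' is listed before '\r' and '\n' so it counts as one break.
    parts = re.split(r';|\r\n|\r|\n', text)
    # Empty pieces (e.g. between ';;') are treated as empty statements.
    return [part.strip() for part in parts if part.strip()]
```

With this sketch, ' ; a ;; b' and 'a\r\nb' both split into the two statements 'a' and 'b'.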

comedy.parser.parse_atomic_form(text, offset=0, terminates=None)

Parses an atomic form text and returns a comedy.node.AtomicForm instance.

Parsing comedy.node.Symbol:

>>> parse_atomic_form(u'sym')
(comedy.node.Symbol(u'sym'), 3)
>>> parse_atomic_form(u'(sym)')
(comedy.node.Symbol(u'sym'), 5)
>>> parse_atomic_form(u'(((sym)))')
(comedy.node.Symbol(u'sym'), 9)

Parsing comedy.node.Quote:

>>> parse_atomic_form(u':(sym)')
(comedy.node.Quote(u'sym'), 6)
>>> parse_atomic_form(u':(( sym) )')
(comedy.node.Quote(u'sym'), 10)
>>> parse_atomic_form(u':sym')
(comedy.node.Quote(u'sym'), 4)

Parsing comedy.node.Block:

>>> parse_atomic_form(u'{ }')
(comedy.node.Block(comedy.node.Program()), 3)
>>> parse_atomic_form(u'''{ a; b
... c(123)
... d}''')  
(comedy.node.Block(comedy.node.Program([comedy.node.Symbol(u'a'),
  comedy.node.Symbol(u'b'),
  comedy.node.Call(comedy.node.Symbol(u'c'),
    comedy.node.ArgumentList([comedy.node.NumberLiteral(123)])),
  comedy.node.Symbol(u'd')])),
 16)

Parsing comedy.node.Index:

>>> parse_atomic_form(u'[ ]')
(comedy.node.Index(comedy.node.ArgumentList()), 3)
>>> index, offset = parse_atomic_form(u'[ 1, 2, 3 ]')
>>> index  
comedy.node.Index(comedy.node.ArgumentList([...]))
>>> offset
11
>>> list(index.arguments)  
[(None, comedy.node.NumberLiteral(1)),
 (None, comedy.node.NumberLiteral(2)),
 (None, comedy.node.NumberLiteral(3))]

Parsing comedy.node.NumberLiteral:

>>> parse_atomic_form(u'123')
(comedy.node.NumberLiteral(123), 3)
>>> parse_atomic_form(u'-3.14')  
(comedy.node.NumberLiteral(-3.14...), 5)
>>> parse_atomic_form(u'10e-4')
(comedy.node.NumberLiteral(0.001), 5)
>>> parse_atomic_form(u'-3.141e+4')
(comedy.node.NumberLiteral(-31410.0), 9)

Parsing comedy.node.StringLiteral:

>>> parse_atomic_form(u'"Avishai Cohen Trio"')
(comedy.node.StringLiteral(u'Avishai Cohen Trio'), 20)
>>> parse_atomic_form(u'"Avishai\\tCohen\\nTrio\n"')
(comedy.node.StringLiteral(u'Avishai\tCohen\nTrio\n'), 23)
>>> parse_atomic_form(u"'Avishai Cohen Trio'")
(comedy.node.StringLiteral(u'Avishai Cohen Trio'), 20)
>>> parse_atomic_form(u"'Avishai\\tCohen\\nTrio\n'")
(comedy.node.StringLiteral(u'Avishai\\tCohen\\nTrio\n'), 23)
>>> parse_atomic_form(u'p"with prefix"')
(comedy.node.StringLiteral(u'with prefix', prefix='p'), 14)
>>> parse_atomic_form(u"p'with prefix'")
(comedy.node.StringLiteral(u'with prefix', prefix='p'), 14)
>>> parse_atomic_form(u'"' u'""Avishai\\tCohen\\nTrio\n""' u'"')
(comedy.node.StringLiteral(u'Avishai\tCohen\nTrio\n'), 27)
>>> parse_atomic_form(u"'''Avishai Cohen Trio'''")
(comedy.node.StringLiteral(u'Avishai Cohen Trio'), 24)
>>> parse_atomic_form(u"'''Avishai\\tCohen\\nTrio\n'''")
(comedy.node.StringLiteral(u'Avishai\\tCohen\\nTrio\n'), 27)
>>> parse_atomic_form(u'p"' u'""with prefix""' u'"')
(comedy.node.StringLiteral(u'with prefix', prefix='p'), 18)
>>> parse_atomic_form(u"p'''with prefix'''")
(comedy.node.StringLiteral(u'with prefix', prefix='p'), 18)

Parameters:
  • text – a unicode string to be parsed; a str is also accepted
  • offset – the position at which to start parsing, as an int or long
  • terminates – a function object that takes text and offset and determines where parsing terminates; used internally
Returns:

a tuple of the parsed comedy.node.AtomicForm instance and the zero-based position where parsing stopped, as an int or long. The position is absolute within text; the given offset is not subtracted from it
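
The string literal examples above show three behaviours: double-quoted literals interpret backslash escapes, single-quoted literals are taken verbatim, and anything before the opening quote becomes the prefix. The sketch below reproduces those behaviours for a complete literal. It is an assumption-laden toy for Python 3: read_string and ESCAPES are hypothetical names, only the escapes listed in ESCAPES are handled, and the real parser's escape set is not documented here.

```python
# Escapes handled by this sketch only; the real set is an unknown.
ESCAPES = {'t': '\t', 'n': '\n', '\\': '\\', '"': '"'}


def read_string(literal):
    """Toy decoder: return (value, prefix) for a complete string literal."""
    # Everything before the first quote character is the prefix, if any.
    start = min(i for i in (literal.find('"'), literal.find("'")) if i >= 0)
    prefix, body = literal[:start] or None, literal[start:]
    quote = body[0]
    # Triple-quoted literals use three quote characters on each side.
    width = 3 if body.startswith(quote * 3) and len(body) >= 6 else 1
    inner = body[width:len(body) - width]
    if quote == "'":
        # Single-quoted literals are taken verbatim.
        return inner, prefix
    # Double-quoted literals interpret backslash escapes.
    out, i = [], 0
    while i < len(inner):
        if inner[i] == '\\' and i + 1 < len(inner):
            out.append(ESCAPES.get(inner[i + 1], inner[i + 1]))
            i += 2
        else:
            out.append(inner[i])
            i += 1
    return ''.join(out), prefix
```

Unlike parse_atomic_form, this helper expects the whole input to be one literal and does not report an end offset.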

comedy.parser.parse_arguments(text, offset=0, named_arguments=True, terminates=None)

Parses an argument list text and returns a comedy.node.ArgumentList instance.

Used internally.

Parsing empty arguments:

>>> parse_arguments('')
(comedy.node.ArgumentList(), 0)
>>> parse_arguments('  ')
(comedy.node.ArgumentList(), 2)

Parsing positional arguments:

>>> parse_arguments('arg')
(comedy.node.ArgumentList([comedy.node.Symbol(u'arg')]), 3)
>>> parse_arguments('a ,b,c')  
(comedy.node.ArgumentList([comedy.node.Symbol(u'a'),
                           comedy.node.Symbol(u'b'),
                           comedy.node.Symbol(u'c')]), 6)
>>> parse_arguments(':abc, :def')  
(comedy.node.ArgumentList([comedy.node.Quote(u'abc'),
                           comedy.node.Quote(u'def')]), 10)

Parsing keyword (named) arguments:

>>> parse_arguments('abc: def')  
(comedy.node.ArgumentList([(u'abc', comedy.node.Symbol(u'def'))]),
 8)
>>> parse_arguments('a::b')  
(comedy.node.ArgumentList([(u'a', comedy.node.Quote(u'b'))]), 4)
>>> parse_arguments('a ::b')  
(comedy.node.ArgumentList([(u'a', comedy.node.Quote(u'b'))]), 5)
>>> parse_arguments('a : :b')  
(comedy.node.ArgumentList([(u'a', comedy.node.Quote(u'b'))]), 6)

Mixed arguments:

>>> parse_arguments('a ,b: c, d,e:f')  
(comedy.node.ArgumentList([comedy.node.Symbol(u'a'),
                           (u'b', comedy.node.Symbol(u'c')),
                           comedy.node.Symbol(u'd'),
                           (u'e', comedy.node.Symbol(u'f'))]), 14)

Complex arguments:

>>> args, offset = parse_arguments(u'x.add(y), x: xs, y: ys, x.cmp(y)')
>>> args  
comedy.node.ArgumentList([...])
>>> offset
32
>>> for name, arg in args:
...     print name, '=>', arg.generate_code_string()
...
None => x.add(y)
comedy.node.Symbol(u'x') => xs
comedy.node.Symbol(u'y') => ys
None => x.cmp(y)

Parameters:
  • text – a unicode string to be parsed; a str is also accepted
  • offset – the position at which to start parsing, as an int or long
  • terminates – a function object that takes text and offset and determines where parsing terminates
Returns:

a tuple of the parsed comedy.node.ArgumentList instance and the zero-based position where parsing stopped, as an int or long. The position is absolute within text; the given offset is not subtracted from it
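
In the examples above, an argument list is written as a mix of bare values and (name, value) tuples, and iterating over it yields (name, value) pairs in which positional arguments have the name None. The following sketch shows one way such a normalization could work. as_pairs is a hypothetical helper for Python 3, not the actual comedy.node.ArgumentList code, and plain strings stand in for node objects.

```python
def as_pairs(arguments):
    """Normalize a mixed argument sequence into (name, value) pairs."""
    pairs = []
    for item in arguments:
        if isinstance(item, tuple):
            # Already a (name, value) pair: a keyword argument.
            name, value = item
        else:
            # A bare value is positional, so it has no name.
            name, value = None, item
        pairs.append((name, value))
    return pairs
```

For the mixed input ['a', ('b', 'c'), 'd', ('e', 'f')], the sketch preserves the original order and fills in None for 'a' and 'd'.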
