Architecture of the Amara Parser

The Amara parser is based on Expat. From Expat's low-level parse it can either build a tree or drive SAX2 events. The sections below cover some key features and characteristics.

What happens where

The starting point for parsing is the parse function in amara/lib/tree.py, which handles basic parsing flags and calls _domlette.parse (function reader_parse in lib/src/expat/reader.c) for the real work of tree building.
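
To picture what "tree building from Expat" means, here is a minimal sketch that uses Python's standard xml.parsers.expat binding rather than Amara's C code. The Node class and the build_tree function are hypothetical names made up for this illustration; they are not part of Amara, and the real reader.c does far more (namespaces, XInclude, entities).

```python
import xml.parsers.expat


class Node:
    """Hypothetical minimal tree node (not an Amara class)."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = []  # child Node objects and text strings


def build_tree(xml_text):
    """Build a small tree by reacting to low-level Expat callbacks."""
    root = Node("#document")
    stack = [root]  # currently open elements, innermost last

    def start(name, attrs):
        node = Node(name, attrs)
        stack[-1].children.append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(data):
        children = stack[-1].children
        # Expat may deliver one text run in several callbacks; merge them.
        if children and isinstance(children[-1], str):
            children[-1] += data
        else:
            children.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return root


doc = build_tree("<a x='1'><b>hi</b><c/></a>")
```

The stack of open elements is the whole trick: each start callback attaches a new node to the innermost open element, and each end callback closes it.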

SAX2 parsing is defined in lib/src/expat/sax_filter.c.
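
For comparison, the same "Expat drives SAX2 events" arrangement exists in Python's standard library, where xml.sax is backed by Expat by default. The sketch below uses only the standard library; the EventLog handler is a hypothetical name for this illustration and says nothing about how sax_filter.c is organized internally.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class EventLog(ContentHandler):
    """Hypothetical handler that records the SAX2 events it receives."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair in namespace mode
        self.events.append(("start", name))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():
            self.events.append(("text", content))


parser = xml.sax.make_parser()  # the stdlib default is the Expat-based reader
parser.setFeature(feature_namespaces, True)  # SAX2-style namespace events
handler = EventLog()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(b'<doc xmlns="urn:x"><item>one</item></doc>'))
```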

Notes for the C code

Most modules have an init and a fini function. The fini (e.g. DomletteReader_Fini) is not technically needed, but is useful if you want to run Valgrind.

Domlette.c calls all the inits (init_domlette) and finis (fini_domlette).

In all the files, private stuff is at the top and public stuff is at the bottom; for one thing, this saves a lot of forward declarations. (GCC trivia: if you forward-declare a static function, it will not get inlined.)

To enable debugging prints in C, edit debug.h and uncomment the #define DEBUG_PARSER line. The debug flag for Python enables some debugging behavior automatically.

reader_parse and reader_parse_entity are convenience entry points to builder_parse in reader.c.

Build document with XInclude processing

# XInclude processing occurs by default
_domlette.parse(isrc)

Build document without XInclude processing

Obsolete

isrc.process_xincludes = False
_domlette.parse(isrc)

Build document with XInclude processing and custom Element classes

Obsolete

# XInclude processing occurs by default
factories = {Node.ELEMENT_NODE_TYPE: MyElement}
cDomlettec.ValParse(isrc, factories)

Build document without XInclude processing and custom Element classes

Obsolete

isrc.processXIncludes = False
factories = {Node.ELEMENT_NODE_TYPE: MyElement}
cDomlettec.ValParse(isrc, factories)

Non-validating Document Building


All Obsolete

- Build document with XInclude processing

- Build document without XInclude processing

- Build document with XInclude processing and without external XML entities

- Build document without XInclude processing and without external XML entities

- Build document with XInclude processing and custom Element classes

- Build document without XInclude processing and custom Element classes

- Build document with XInclude processing and without external XML entities and custom Element classes

- Build document without XInclude processing and without external XML entities and custom Element classes

Parsed Entity Building


- Build entity with XInclude processing

- Build entity without XInclude processing

- Build entity with XInclude processing and with additional in-scope namespaces

- Build entity without XInclude processing and with additional in-scope namespaces

- Build entity with XInclude processing and custom Element classes

- Build entity without XInclude processing and custom Element classes

- Build entity with XInclude processing and with additional in-scope namespaces and custom Element classes

- Build entity without XInclude processing and with additional in-scope namespaces and custom Element classes

Event-based Document Parsing


Note: I'm only covering the extensions that are provided on top of Python's SAX API.

- Parse document as an iterator

class ContentHandler:
    def __init__(self, parser):
        self._parser = parser

    def characterData(self, data):
        # Pass each run of character data back to the iterating caller
        self._parser.setProperty(PROPERTY_YIELD_RESULT, data)

parser = cDomlettec.
parser.setContentHandler(ContentHandler(parser))
# With the generator feature on, parse() can be iterated over
parser.setFeature(FEATURE_GENERATOR, True)
for data in parser.parse(isrc):
    print(data)
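
The general idea of parsing as an iterator can be approximated with the standard library alone. The sketch below is not the Amara mechanism (it does not use PROPERTY_YIELD_RESULT or FEATURE_GENERATOR); it assumes a hypothetical helper, iter_character_data, that feeds a stream to Expat in chunks and yields whatever character data each chunk produced.

```python
import io
import xml.parsers.expat


def iter_character_data(stream, chunk_size=16):
    """Yield character data lazily while the stream is being parsed.

    Hypothetical helper for illustration; text may arrive split into
    several pieces, depending on where the chunk boundaries fall.
    """
    pending = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = pending.append
    while True:
        chunk = stream.read(chunk_size)
        # An empty chunk means end of input, so finalize the parse.
        parser.Parse(chunk, not chunk)
        # Hand over what this chunk produced before reading any further.
        yield from pending
        pending.clear()
        if not chunk:
            break


SAMPLE = b"<r><a>one</a><a>two</a></r>"
pieces = list(iter_character_data(io.BytesIO(SAMPLE)))
```

Because the generator only reads the next chunk when the consumer asks for more data, a caller can stop iterating early without parsing the rest of the document.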

Amara2/Architecture/Parser (last edited 2010-03-03 21:25:47 by UcheOgbuji)