Obsolete information on parser architecture according to use-cases provided by Jeremy

Domlette C impl

All Domlette building starts in reader.c

Parsing (e.g. SAX2 parser) is in xmlparser.c

Most modules have an init and a fini function. The fini (e.g. DomletteReader_Fini) is not technically needed, but it is useful if you want to run Valgrind, since it releases what the init set up.

Domlette.c calls all the inits and finis.

In all the files, private stuff is at the top and public stuff at the bottom (for one thing, this saves a lot of forward declarations). GCC trivia: if you forward-declare a static function, it will not get inlined.

To enable debug prints in C, edit debug.h and uncomment the #define DEBUG_PARSER. The debug flag for Python enables some debugging behavior automatically.

parse_event_handler.c: the expat handlers that drive the DOM builder

parse_document and parse_entity are convenience entry points to builder_parse in parse_event_handler.c

Types provided by Domlette:

Methods provided by module:

Use Cases

Note: Code snippets use isrc which is any valid amara.lib.inputsource.InputSource instance

Validating Document Building

Build document with XInclude processing

# XInclude processing occurs by default
_domlette.val_parse(isrc)

Build document without XInclude processing

isrc.process_xincludes = False
cDomlettec.ValParse(isrc)
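
To make the with/without distinction concrete, here is a rough sketch that uses only the Python standard library (xml.etree.ElementInclude), not Domlette. The build() helper, its process_xincludes argument, and the in-memory RESOURCES table are all made up for illustration; Domlette's own expansion happens in the C parser and may differ in detail.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

MAIN = (
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    '</doc>'
)

# Hypothetical in-memory "files" so the sketch needs no disk access.
RESOURCES = {"chapter.xml": "<chapter>Hello</chapter>"}


def memory_loader(href, parse, encoding=None):
    # ElementInclude calls this once for each xi:include it expands.
    text = RESOURCES[href]
    return ET.fromstring(text) if parse == "xml" else text


def build(source, process_xincludes=True):
    # Hypothetical helper: parse, then optionally expand xi:include elements.
    root = ET.fromstring(source)
    if process_xincludes:
        ElementInclude.include(root, loader=memory_loader)
    return root


expanded = build(MAIN)
untouched = build(MAIN, process_xincludes=False)
print(expanded[0].tag)
print(untouched[0].tag)
```

With processing on, the xi:include element is replaced by the included chapter element; with processing off, it stays in the tree as an ordinary element in the XInclude namespace.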

Build document with XInclude processing and custom Element classes

# XInclude processing occurs by default
factories = {Node.ELEMENT_NODE_TYPE: MyElement}
cDomlettec.ValParse(isrc, factories)
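
The factories argument maps a node type to the class the builder should instantiate. The following is a pure-Python sketch of that idea only; the constants, the Element and Text classes, and make_node() are hypothetical stand-ins, and the real builder does this in C.

```python
# Stand-ins for the node-type constants (e.g. Node.ELEMENT_NODE_TYPE).
ELEMENT_NODE_TYPE = 1
TEXT_NODE_TYPE = 3


class Element:
    def __init__(self, name):
        self.name = name


class Text:
    def __init__(self, data):
        self.data = data


class MyElement(Element):
    """Custom element class supplied by the caller."""

    def describe(self):
        return "custom element <%s>" % self.name


DEFAULT_FACTORIES = {ELEMENT_NODE_TYPE: Element, TEXT_NODE_TYPE: Text}


def make_node(node_type, value, factories=None):
    # Caller-supplied factories override the defaults, one node type at a time.
    merged = dict(DEFAULT_FACTORIES)
    merged.update(factories or {})
    return merged[node_type](value)


factories = {ELEMENT_NODE_TYPE: MyElement}
elem = make_node(ELEMENT_NODE_TYPE, "para", factories)
text = make_node(TEXT_NODE_TYPE, "hello", factories)
print(elem.describe())
```

Node types that are not listed in factories keep their default class, which is why only the element entry needs to be given.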

Build document without XInclude processing, with custom Element classes

isrc.process_xincludes = False
factories = {Node.ELEMENT_NODE_TYPE: MyElement}
cDomlettec.ValParse(isrc, factories)

Non-validating Document Building


- Build document with XInclude processing

- Build document without XInclude processing

- Build document with XInclude processing and without external XML entities

- Build document without XInclude processing and without external XML entities

- Build document with XInclude processing and custom Element classes

- Build document without XInclude processing, with custom Element classes

- Build document with XInclude processing, without external XML entities, with custom Element classes

- Build document without XInclude processing, without external XML entities, with custom Element classes
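
No snippets are recorded for the non-validating cases. As a rough illustration of what "without external XML entities" means, this sketch uses the standard library's xml.sax with the feature_external_ges feature switched off. It is not the Domlette API; the NameCollector class and the helper function are invented for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# The external entity points at a file that does not exist; with external
# general entities disabled, the parser never tries to fetch it.
DOC = b"""<!DOCTYPE doc [
  <!ENTITY ext SYSTEM "file:///no/such/file.xml">
]>
<doc>&ext;</doc>"""


class NameCollector(ContentHandler):
    """Records the name of every element that starts."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(data):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


names = parse_without_external_entities(DOC)
print(names)
```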

Parsed Entity Building


- Build entity with XInclude processing

- Build entity without XInclude processing

- Build entity with XInclude processing and with additional in-scope namespaces

- Build entity without XInclude processing and with additional in-scope namespaces

- Build entity with XInclude processing and custom Element classes

- Build entity without XInclude processing, with custom Element classes

- Build entity with XInclude processing and with additional in-scope namespaces and custom Element classes

- Build entity without XInclude processing and with additional in-scope namespaces and custom Element classes
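
A parsed entity may use prefixes that are declared only by its surrounding context, which is what the additional in-scope namespaces are for. The sketch below imitates that with the standard library by wrapping the entity text in a throwaway element that carries the declarations. This is a workaround for illustration, not how Domlette's parse_entity works, and parse_entity_sketch() is a made-up name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def parse_entity_sketch(text, namespaces=None):
    # Turn {prefix: uri} into xmlns declarations on a throwaway wrapper.
    decls = "".join(
        " xmlns%s=%s" % (":" + prefix if prefix else "", quoteattr(uri))
        for prefix, uri in (namespaces or {}).items()
    )
    wrapper = ET.fromstring("<wrapper%s>%s</wrapper>" % (decls, text))
    # Only the element children are returned; leading text is dropped here.
    return list(wrapper)


nodes = parse_entity_sketch("<x:a>hi</x:a><x:b/>", {"x": "urn:example:x"})
print([node.tag for node in nodes])
```

Without the extra namespaces, the same entity text fails to parse because the x prefix is unbound.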

Event-based Document Parsing


Note: I'm only covering the extensions to Python's SAX API that are provided

- Parse document as an iterator
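
The iterator extension itself is not documented here. For a feel of the style, the standard library's xml.dom.pulldom already exposes a parse as an iterable of (event, node) pairs; the sketch below uses that as a stand-in and makes no claim about the names or events Amara provides.

```python
from xml.dom import pulldom


def iter_start_tags(xml_text):
    # The event stream is consumed lazily, one parse event at a time.
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            yield node.tagName


tags = list(iter_start_tags("<doc><a/><b><c/></b></doc>"))
print(tags)
```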

Amara/Architecture/Parser/Cruft (last edited 2010-12-03 17:56:18 by LuisMiguel)