Architecture of the Amara Parser
The Amara parser is based on Expat. It can either build a tree or drive SAX2 events from Expat's low-level parse. Its key features and characteristics include:
- XInclude support
- DTD validation support
- Specialized tree node classes in place of the defaults
- Signaling of certain tree-building events, such as appending a child element or adding an attribute
- Parser filters (NOT YET IMPLEMENTED)
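As a rough illustration of the tree-building idea, here is a minimal sketch that uses Python's standard xml.parsers.expat module rather than Amara's C implementation. The Node class, the build_tree function, and the on_append callback are hypothetical names chosen for this example; they only mimic the notion of signaling when a child element is appended.

```python
import xml.parsers.expat


class Node(object):
    """Hypothetical minimal tree node (not an Amara class)."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = dict(attributes)
        self.children = []


def build_tree(text, on_append=None):
    """Build a Node tree from Expat events; call on_append(parent, child)
    each time a child element is attached to its parent."""
    roots = []
    stack = []

    def start(name, attrs):
        node = Node(name, attrs)
        if stack:
            parent = stack[-1]
            parent.children.append(node)
            if on_append is not None:
                on_append(parent, node)
        else:
            roots.append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(text, True)
    return roots[0]
```

A caller could pass a function as on_append to record or react to each append as the tree grows.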
What happens where
The starting point for parsing is the parse function in amara/lib/tree.py, which handles basic parsing flags and calls _domlette.parse (function reader_parse in lib/src/expat/reader.c) for the real work of tree building.
SAX2 parsing is defined in lib/src/expat/sax_filter.c.
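For readers unfamiliar with the SAX2 style, the following sketch shows event-driven parsing with Python's standard xml.sax package, which is also backed by Expat. It does not use Amara's sax_filter.c; ElementCounter and count_elements are hypothetical names for this example.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name starts."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(data):
    """Parse a bytes document and return a dict of element counts."""
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.counts
```

No tree is kept in memory here; the handler only sees the stream of events.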
Notes for the C code
Most modules have an init and a fini function. The fini (e.g. DomletteReader_Fini) is not technically needed, but it is useful if you want to run Valgrind.
Domlette.c calls all the inits (init_domlette) and finis (fini_domlette).
In all the files, private stuff is at the top and public stuff at the bottom; for one thing, this saves a lot of forward declarations. (GCC trivia: if you forward-declare a static function, it will not get inlined.)
To enable debugging prints in C, edit debug.h and uncomment the #define DEBUG_PARSER. The debug flag for Python enables some debugging behavior automatically.
The functions reader_parse and reader_parse_entity are convenience entry points to builder_parse in reader.c.
Build document with XInclude processing
# XInclude processing occurs by default
_domlette.parse(isrc)
Build document without XInclude processing
Obsolete
isrc.process_xincludes = False
_domlette.parse(isrc)
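The on/off behavior of XInclude processing can be sketched with the standard library's xml.etree.ElementInclude module. This is only an analogy to the input-source flag shown above, not Amara's implementation; parse_document, loader, and the in-memory DOCS table are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "file system" so the example needs no real files.
DOCS = {"part.xml": "<part>included</part>"}

MAIN = ('<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="part.xml"/></doc>')


def loader(href, parse, encoding=None):
    """Resolve an xi:include href against DOCS."""
    text = DOCS[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text


def parse_document(text, process_xincludes=True):
    """Parse text; expand xi:include elements unless the flag is False."""
    root = ET.fromstring(text)
    if process_xincludes:
        ElementInclude.include(root, loader=loader)
    return root
```

With the flag left at its default, the xi:include element is replaced by the included content; with the flag set to False, the xi:include element stays in the tree untouched.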
Build document with XInclude processing and custom Element classes
Obsolete
# XInclude processing occurs by default
factories = {Node.ELEMENT_NODE_TYPE: MyElement}
cDomlettec.ValParse(isrc, factories)
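The idea of supplying custom element classes to the tree builder has a rough counterpart in the standard library: xml.etree.ElementTree.TreeBuilder accepts an element_factory callable. The sketch below uses that stdlib mechanism, not the cDomlettec factories dictionary; MyElement here is a hypothetical subclass and parse_with_factory is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


class MyElement(ET.Element):
    """Hypothetical specialized element class with one extra method."""

    def child_tags(self):
        return [child.tag for child in self]


def parse_with_factory(text, element_factory):
    """Build a tree whose elements are created by element_factory."""
    builder = ET.TreeBuilder(element_factory=element_factory)
    parser = ET.XMLParser(target=builder)
    parser.feed(text)
    return parser.close()
```

Every element in the resulting tree is an instance of the supplied class, so the extra methods are available on all nodes.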
Build document without XInclude processing and custom Element classes
Obsolete
isrc.processXIncludes = False
factories = {Node.ELEMENT_NODE_TYPE: MyElement}
cDomlettec.ValParse(isrc, factories)
Non-validating Document Building
All Obsolete
- Build document with XInclude processing
    # XInclude processing occurs by default
    cDomlettec.NonvalParse(isrc)
- Build document without XInclude processing
    isrc.processXIncludes = False
    cDomlettec.NonvalParse(isrc)
- Build document with XInclude processing and without external XML entities
    # XInclude processing occurs by default
    cDomlettec.NonvalParse(isrc, False)
- Build document without XInclude processing and without external XML entities
    isrc.processXIncludes = False
    cDomlettec.NonvalParse(isrc, False)
- Build document with XInclude processing and custom Element classes
    # XInclude processing occurs by default
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, nodeFactories=factories)
- Build document without XInclude processing and custom Element classes
    isrc.processXIncludes = False
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, nodeFactories=factories)
- Build document with XInclude processing and without external XML entities and custom Element classes
    # XInclude processing occurs by default
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, False, factories)
- Build document without XInclude processing and without external XML entities and custom Element classes
    isrc.processXIncludes = False
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, False, factories)
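The "without external XML entities" variants above pass a boolean to turn off reading of external entities. A comparable switch exists in Python's standard SAX API as the feature_external_ges feature; the sketch below shows that stdlib feature only, and make_reader is a hypothetical helper name, not part of cDomlettec.

```python
import xml.sax
from xml.sax.handler import feature_external_ges


def make_reader(read_external_entities=True):
    """Create a SAX reader with external general entities on or off."""
    reader = xml.sax.make_parser()
    reader.setFeature(feature_external_ges, read_external_entities)
    return reader
```

Turning the feature off is a common precaution when parsing documents from untrusted sources.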
Parsed Entity Building
- Build entity with XInclude processing
    # XInclude processing occurs by default
    cDomlettec.ParseFragment(isrc)
- Build entity without XInclude processing
    isrc.processXIncludes = False
    cDomlettec.ParseFragment(isrc)
- Build entity with XInclude processing and with additional in-scope namespaces
    # XInclude processing occurs by default
    namespaces = {'ns-prefix': 'ns-uri'}
    cDomlettec.ParseFragment(isrc, namespaces)
- Build entity without XInclude processing and with additional in-scope namespaces
    isrc.processXIncludes = False
    namespaces = {'ns-prefix': 'ns-uri'}
    cDomlettec.ParseFragment(isrc, namespaces)
- Build entity with XInclude processing and custom Element classes
    # XInclude processing occurs by default
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, nodeFactories=factories)
- Build entity without XInclude processing and custom Element classes
    isrc.processXIncludes = False
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, nodeFactories=factories)
- Build entity with XInclude processing and with additional in-scope namespaces and custom Element classes
    # XInclude processing occurs by default
    namespaces = {'ns-prefix': 'ns-uri'}
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, namespaces, factories)
- Build entity without XInclude processing and with additional in-scope namespaces and custom Element classes
    isrc.processXIncludes = False
    namespaces = {'ns-prefix': 'ns-uri'}
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, namespaces, factories)
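To show why a parsed entity may need additional in-scope namespaces, here is a sketch using only the standard xml.parsers.expat module. It wraps the fragment in a synthetic element that declares the extra prefixes, which is one common workaround and not necessarily how ParseFragment does it internally. The function name fragment_element_names and the "wrapper" element are hypothetical.

```python
import xml.parsers.expat
from xml.sax.saxutils import quoteattr


def fragment_element_names(fragment, namespaces=None):
    """Return the expanded names ("uri local") of the elements in an XML
    fragment, resolving prefixes against the given prefix-to-URI mapping."""
    namespaces = namespaces or {}
    declarations = "".join(
        " xmlns:%s=%s" % (prefix, quoteattr(uri))
        for prefix, uri in sorted(namespaces.items()))
    # The synthetic wrapper puts the extra prefixes in scope for the fragment.
    wrapped = "<wrapper%s>%s</wrapper>" % (declarations, fragment)

    names = []
    depth = [0]

    def start(name, attrs):
        if depth[0] > 0:  # skip the synthetic wrapper itself
            names.append(name)
        depth[0] += 1

    def end(name):
        depth[0] -= 1

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(wrapped, True)
    return names
```

Without the mapping, a fragment that uses an undeclared prefix is rejected by Expat when namespace processing is enabled.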
Event-based Document Parsing
Note: I'm only covering the extensions to Python's SAX API that are provided
- Parse document as an iterator
    class ContentHandler:
        def __init__(self, parser):
            self._parser = parser
        def characterData(self, data):
            self._parser.setProperty(PROPERTY_YIELD_RESULT, data)
    parser = cDomlettec.
    parser.setContentHandler(ContentHandler(parser))
    parser.setFeature(FEATURE_GENERATOR, True)
    for data in parser.parse(isrc):
        print(data)
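The generator-style parse above can be approximated with the standard library alone. The sketch below feeds a document to xml.parsers.expat in small chunks and yields character data as soon as the handler collects it; it illustrates the concept only and does not use FEATURE_GENERATOR or PROPERTY_YIELD_RESULT. The name iter_character_data and the chunk_size parameter are hypothetical.

```python
import xml.parsers.expat


def iter_character_data(xml_text, chunk_size=16):
    """Yield pieces of character data while the document is being parsed."""
    pending = []
    parser = xml.parsers.expat.ParserCreate()
    # The handler queues results; the generator drains the queue.
    parser.CharacterDataHandler = pending.append

    for start in range(0, len(xml_text), chunk_size):
        parser.Parse(xml_text[start:start + chunk_size], False)
        while pending:
            yield pending.pop(0)

    parser.Parse("", True)  # signal end of input
    while pending:
        yield pending.pop(0)
```

Expat may deliver one run of text in several pieces, so callers that need whole text nodes should join adjacent results.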
Types provided by Domlette:
CharacterData(node) -- private interface; not actually exposed in module
comment(CharacterData)
text(CharacterData)
Methods provided by module: