Obsolete information on the parser architecture, based on use cases provided by Jeremy
Domlette C implementation
All Domlette building starts in reader.c
Parsing (e.g. the SAX2 parser) is in xmlparser.c
Most modules have an init and a fini function. The fini (e.g. DomletteReader_Fini) is not strictly needed, but it is useful when running under Valgrind, because it releases module-level resources that would otherwise show up as leaks.
Domlette.c calls all the inits and finis.
In all the files, private definitions come first and public ones last (among other things, this saves a lot of forward declarations). (GCC trivia: if you forward-declare a static function, it will not get inlined.)
To enable debug prints in the C code, edit debug.h and uncomment #define DEBUG_PARSER. The Python debug flag enables some debugging behavior automatically.
parse_event_handler.c: connects the Expat handler callbacks to the DOM builder
parse_document and parse_entity are convenience entry points to builder_parse in parse_event_handler.c
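The split between a shared builder_parse and thin parse_document/parse_entity wrappers can be sketched in Python with the standard library's pyexpat bindings. This is an illustrative analogue only, not the C code: the TreeBuilder class and the nested-list node representation are assumptions made for the example, and wrapping an entity in a synthetic element is one common way to parse content that may have several top-level nodes.

```python
import xml.parsers.expat


class TreeBuilder:
    """Hypothetical builder: turns Expat callbacks into [name, attrs, children] lists."""

    def __init__(self):
        self.root = ["#root", {}, []]
        self.stack = [self.root]

    def start_element(self, name, attrs):
        node = [name, attrs, []]
        self.stack[-1][2].append(node)  # attach to the current parent
        self.stack.append(node)  # and make it the new parent

    def end_element(self, name):
        self.stack.pop()

    def character_data(self, data):
        self.stack[-1][2].append(data)


def builder_parse(text):
    """Shared entry point: wire Expat's handlers to a fresh builder."""
    builder = TreeBuilder()
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text in a single callback
    parser.StartElementHandler = builder.start_element
    parser.EndElementHandler = builder.end_element
    parser.CharacterDataHandler = builder.character_data
    parser.Parse(text, True)
    return builder.root


def parse_document(text):
    """Convenience wrapper: a document has a single top-level element."""
    return builder_parse(text)


def parse_entity(text):
    """Convenience wrapper: an external parsed entity may have several
    top-level nodes, so wrap it in a synthetic element and return that
    element's children. Assumes the entity has no text declaration."""
    wrapper = builder_parse("<wrapper>" + text + "</wrapper>")[2][0]
    return wrapper[2]
```

For example, parse_entity("<p/>text<q/>") returns three top-level nodes (p, the text run, q), which parse_document would reject as not well-formed.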
Types provided by Domlette:
- node
- attribute(node)
- processing_instruction(node)
- CharacterData(node) -- private interface; not actually exposed in module
- comment(CharacterData)
- container(node) -- private interface; not actually exposed
- element(container)
- entity(container)
- namespace(node)
- text(CharacterData)
Methods provided by module:
- parse
- parse_fragment
Use Cases
Note: Code snippets use isrc, which is any valid amara.lib.inputsource.InputSource instance, and MyElement, which stands for a user-defined Element class.
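Most of the use cases below differ only in whether XInclude processing is switched on. As a rough stand-in for what that flag controls (not Domlette's own implementation), the standard library's xml.etree.ElementInclude expands xi:include elements in place. The DOCS table and in-memory loader are assumptions so the example needs no files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files", keyed by href.
DOCS = {"chapter.xml": "<chapter>Included text</chapter>"}

MAIN = (
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter.xml"/>'
    "</book>"
)


def loader(href, parse, encoding=None):
    # ElementInclude calls this for each xi:include; parse is "xml" or "text".
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]


def build(process_xincludes=True):
    root = ET.fromstring(MAIN)
    if process_xincludes:
        # Replace each xi:include element with the content it points to.
        ElementInclude.include(root, loader=loader)
    return root
```

With processing off, the xi:include element simply stays in the tree as an ordinary element.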
Validating Document Building
- Build document with XInclude processing
    # XInclude processing occurs by default
    _domlette.val_parse(isrc)
- Build document without XInclude processing
    isrc.process_xincludes = False
    cDomlettec.ValParse(isrc)
- Build document with XInclude processing and custom Element classes
    # XInclude processing occurs by default
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ValParse(isrc, factories)
- Build document without XInclude processing and custom Element classes
    isrc.processXIncludes = False
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ValParse(isrc, factories)

Non-validating Document Building
- Build document with XInclude processing
    # XInclude processing occurs by default
    cDomlettec.NonvalParse(isrc)
- Build document without XInclude processing
    isrc.processXIncludes = False
    cDomlettec.NonvalParse(isrc)
- Build document with XInclude processing and without external XML entities
    # XInclude processing occurs by default
    cDomlettec.NonvalParse(isrc, False)
- Build document without XInclude processing and without external XML entities
    isrc.processXIncludes = False
    cDomlettec.NonvalParse(isrc, False)
- Build document with XInclude processing and custom Element classes
    # XInclude processing occurs by default
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, nodeFactories=factories)
- Build document without XInclude processing and custom Element classes
    isrc.processXIncludes = False
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, nodeFactories=factories)
- Build document with XInclude processing and without external XML entities and custom Element classes
    # XInclude processing occurs by default
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, False, factories)
- Build document without XInclude processing and without external XML entities and custom Element classes
    isrc.processXIncludes = False
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.NonvalParse(isrc, False, factories)
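The two knobs in the last non-validating case (skip external entities, substitute a custom element class through a factories mapping) can be imitated with the standard library's xml.sax. This is a sketch under assumptions: ELEMENT_NODE is a hypothetical key standing in for Node.ELEMENT_NODE_TYPE, nonval_parse and FactoryBuilder are illustrative names, and feature_external_ges is the stdlib's switch for external general entities, not Domlette's.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

ELEMENT_NODE = 1  # hypothetical key, standing in for Node.ELEMENT_NODE_TYPE


class Element:
    def __init__(self, name):
        self.name = name
        self.children = []


class MyElement(Element):
    """A user-supplied element class, as in the use cases above."""


class FactoryBuilder(ContentHandler):
    def __init__(self, factories):
        super().__init__()
        # Look up the element class once; fall back to the default one.
        self.element_class = factories.get(ELEMENT_NODE, Element)
        self.root = None
        self.stack = []

    def startElement(self, name, attrs):
        node = self.element_class(name)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def endElement(self, name):
        self.stack.pop()


def nonval_parse(data, read_external_entities=True, factories=None):
    parser = xml.sax.make_parser()
    # When False, references to external general entities are skipped.
    parser.setFeature(feature_external_ges, read_external_entities)
    builder = FactoryBuilder(factories or {})
    parser.setContentHandler(builder)
    parser.parse(io.BytesIO(data))
    return builder.root
```

Usage: nonval_parse(xml_bytes, False, {ELEMENT_NODE: MyElement}) builds MyElement instances and never opens the file an external entity points to.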
Parsed Entity Building
- Build entity with XInclude processing
    # XInclude processing occurs by default
    cDomlettec.ParseFragment(isrc)
- Build entity without XInclude processing
    isrc.processXIncludes = False
    cDomlettec.ParseFragment(isrc)
- Build entity with XInclude processing and with additional in-scope namespaces
    # XInclude processing occurs by default
    namespaces = {'ns-prefix': 'ns-uri'}
    cDomlettec.ParseFragment(isrc, namespaces)
- Build entity without XInclude processing and with additional in-scope namespaces
    isrc.processXIncludes = False
    namespaces = {'ns-prefix': 'ns-uri'}
    cDomlettec.ParseFragment(isrc, namespaces)
- Build entity with XInclude processing and custom Element classes
    # XInclude processing occurs by default
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, nodeFactories=factories)
- Build entity without XInclude processing and custom Element classes
    isrc.processXIncludes = False
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, nodeFactories=factories)
- Build entity with XInclude processing and with additional in-scope namespaces and custom Element classes
    # XInclude processing occurs by default
    namespaces = {'ns-prefix': 'ns-uri'}
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, namespaces, factories)
- Build entity without XInclude processing and with additional in-scope namespaces and custom Element classes
    isrc.processXIncludes = False
    namespaces = {'ns-prefix': 'ns-uri'}
    factories = {Node.ELEMENT_NODE_TYPE: MyElement}
    cDomlettec.ParseFragment(isrc, namespaces, factories)
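The namespaces argument supplies prefixes the fragment uses but never declares. One way to get that effect, sketched here with ElementTree rather than Domlette, is to declare the extra prefixes on a synthetic wrapper element around the fragment. The function name parse_fragment is hypothetical, and for brevity the sketch drops any text that appears before the first top-level element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def parse_fragment(text, namespaces=None):
    """Parse content that may have several top-level elements.

    `namespaces` maps extra in-scope prefixes to URIs, like the
    {'ns-prefix': 'ns-uri'} dict above. They are declared on a synthetic
    wrapper, so prefixed names inside the fragment resolve.
    """
    decls = "".join(
        " xmlns:%s=%s" % (prefix, quoteattr(uri))
        for prefix, uri in (namespaces or {}).items()
    )
    wrapper = ET.fromstring("<wrapper%s>%s</wrapper>" % (decls, text))
    return list(wrapper)  # top-level elements only; wrapper.text is dropped
```

Without the namespaces mapping, a fragment such as <p:a/> fails with an unbound-prefix error.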
Event-based Document Parsing
Note: I'm only covering the extensions to Python's SAX API that Domlette provides.
- Parse document as an iterator
    class ContentHandler:
        def __init__(self, parser):
            self._parser = parser

        def startElementNS(self, name, qname, attributes):
            # Hand a result to the iterating caller; for example, the element name
            data = name
            self._parser.setProperty(PROPERTY_YIELD_RESULT, data)

    parser = cDomlettec.CreateParser()
    parser.setContentHandler(ContentHandler(parser))
    parser.setFeature(FEATURE_GENERATOR, True)
    for data in parser.parse(isrc):
        print(data)
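The generator pattern above (the handler posts a result, and the caller's loop receives it) can be approximated with the standard library's incremental SAX parser. This is an analogue, not the Domlette feature: YieldingHandler's pending list plays the role of PROPERTY_YIELD_RESULT, and iterparse_names is a hypothetical name. Input is fed chunk by chunk, and queued results are yielded after each chunk.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class YieldingHandler(ContentHandler):
    """Queues one result per start tag for the generator to hand out."""

    def __init__(self):
        super().__init__()
        self.pending = []

    def startElement(self, name, attrs):
        self.pending.append(name)  # what the caller's loop will receive


def iterparse_names(chunks):
    """Generator: feed input piece by piece and yield results as they appear."""
    handler = YieldingHandler()
    parser = xml.sax.make_parser()  # an IncrementalParser with feed()/close()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
        while handler.pending:
            yield handler.pending.pop(0)
    parser.close()
    while handler.pending:  # anything produced while finishing the parse
        yield handler.pending.pop(0)
```

Because results are handed out between chunks, the caller can stop iterating early without the whole document having been parsed.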
