Amara 2 will target Python 2.5 and 2.6. There will be an Amara 3.0 branch targeting the big changes in Python 3K.
Module layout
The top-level namespace is "amara", not "amara2" because a namespace is not a great place for versioning.
This might cause some pain for those who want to use Amara 1.x as well, especially while Amara 2 is not yet ready for production, but long-term it's the right thing to do. Tools such as virtualenv can provide more systematic relief for such users.
+ amara (xml core libraries at this level, including core parsing API ("tree"--a very low level infoset node data model))
| (includes reader.py -- Saxlette)
|
+--+ lib
| |
| +-- iri
| |
| +-- inputsource
|
+-- writer (General purpose writers, essentially the opposites of Saxlette; includes output params)
|
+-- dom (Python DOM binding: builds on amara.tree)
|
+-- bindery (Dynamic data binding: builds on amara.tree)
|
+-- xpath (XPath impl: operates on amara.tree and derivatives)
|
+-- xslt (XSLT impl: operates on amara.tree and derivatives)
| |
| +-- elements (the element classes)
| |
| +-- readers (interpretation of XML structures)
| |
| +-- xpattern (XSLT patterns)
| |
| +-- extensions (functions and elements)
| |
| +-- functions (core functions)
| |
| +-- exslt
| |
| +-- writers (specializations of writers for XSLT)
|
+-- xupdate
|
+-- schematron
|
+-- relaxng
Sample imports:
- from amara import parse #returns domlette root node
- from amara.bindery import parse #returns bindery root node
- from amara.dom import parse #returns dom if ya just gotta have it for some reason
- from amara.lib import iri
- from amara.lib import inputsource
- from amara import xpath
- from amara.xslt import transform
Input sources
An inputsource is loosely based on the DOM Level 3 LSInput interface and represents one specific input source, which is one of the following:
- string
- public identifier (requires an XML catalogue)
- system identifier (IRI)
- byte stream
- character stream
You create an inputsource instance as follows:
from amara.lib import inputsource

# From URI:
isrc = inputsource("http://xmlhack.com/read.php?item=1560")

# From string:
isrc = inputsource("<spam>eggs</spam>", baseiri=u"http://spam.com/base")

# From byte stream:
f = open('test.xml')
isrc = inputsource(f, baseiri=u"file:///spam/test.xml")

# From file:
isrc = inputsource('test.xml')
inputsource object attributes
isrc.baseiri: the base IRI, used for relative resolution of IRIs. Can be None, but this is strongly discouraged.
isrc.resolve(iri): resolves an IRI against this inputsource, generally using the base IRI to determine the IRI of the new inputsource.
isrc.encoding: the character encoding, if relevant and known, otherwise None. The encoding must be a string acceptable for an XML encoding declaration. Generally, an encoding specified by means of this attribute overrides any encoding specified in the XML declaration or the text declaration, or an encoding obtained from a higher-level protocol such as HTTP.
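To illustrate why a missing base IRI is discouraged, here is a minimal sketch of relative resolution. It does not use Amara: SketchInputSource and its absolutize() method are hypothetical stand-ins, and the standard library's urljoin is assumed to be close enough to IRI resolution for ASCII-only references. It is written for current Python 3 rather than the 2.5/2.6 target.

```python
from urllib.parse import urljoin


class SketchInputSource(object):
    """Hypothetical stand-in, NOT the real amara.lib.inputsource class."""

    def __init__(self, data, baseiri=None, encoding=None):
        self.data = data
        self.baseiri = baseiri    # used for relative resolution
        self.encoding = encoding  # None when unknown or irrelevant

    def absolutize(self, iri):
        # Without a base IRI a relative reference cannot be resolved,
        # so it is returned unchanged.
        if self.baseiri is None:
            return iri
        return urljoin(self.baseiri, iri)


with_base = SketchInputSource(b"<spam>eggs</spam>",
                              baseiri="http://spam.com/base/doc.xml")
without_base = SketchInputSource(b"<spam>eggs</spam>")

resolved = with_base.absolutize("chapter2.xml")
unresolved = without_base.absolutize("chapter2.xml")
```

With the base IRI set, the relative reference becomes an absolute IRI under http://spam.com/base/; without it, the reference stays relative and cannot be retrieved reliably.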
Custom resolution
You can provide your own custom entity resolution (e.g. to handle non-URL IRI types) by subclassing inputsource, or by an equivalent mechanism. Generally you just need to override the _resolve() method. The resolve() method is smart enough to use the same class for new inputsources.
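The override pattern described above can be sketched roughly as follows. This is not Amara's implementation: the base class, the constructor signature, and the "mem:" IRI scheme are all assumptions made for illustration. Only the split between resolve() and an overridable _resolve() hook comes from the text. The sketch is written for current Python 3.

```python
import io


class SketchInputSource(object):
    """Hypothetical base class; the real inputsource may differ."""

    def __init__(self, stream, baseiri=None):
        self.stream = stream
        self.baseiri = baseiri

    def _resolve(self, iri):
        # Hook for subclasses: return a byte stream for the given IRI.
        raise NotImplementedError("no default retrieval in this sketch")

    def resolve(self, iri):
        stream = self._resolve(iri)
        # type(self) rather than a hard-coded class name, so a subclass
        # keeps producing instances of itself.
        return type(self)(stream, baseiri=iri)


class MemoryInputSource(SketchInputSource):
    """Resolves made-up 'mem:' IRIs from an in-memory table."""

    documents = {"mem:greeting": b"<spam>eggs</spam>"}

    def _resolve(self, iri):
        return io.BytesIO(self.documents[iri])


root = MemoryInputSource(io.BytesIO(b"<root/>"), baseiri="mem:root")
child = root.resolve("mem:greeting")
```

Because resolve() builds the new object with type(self), anything resolved from child also goes through MemoryInputSource._resolve().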
Printers and writers
- Printer - low level serialization to output stream
- Writer - high level logic at the markup structural level
Note the input-side counterparts:
- Parsers - low level string/stream tokenizers
- Reader - structural interpreter of markup
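A rough sketch of the printer/writer split on the output side follows. The class and method names (SketchPrinter, SketchWriter, start_tag and so on) are invented for this example and are not taken from amara.writer. The printer only knows how to put characters on a stream, while the writer tracks markup structure. It is written for current Python 3.

```python
import io
from xml.sax.saxutils import escape, quoteattr


class SketchPrinter(object):
    """Low level: emits escaped characters to an output stream."""

    def __init__(self, stream):
        self.stream = stream

    def start_tag(self, name, attrs):
        self.stream.write("<" + name)
        for key in sorted(attrs):
            self.stream.write(" %s=%s" % (key, quoteattr(attrs[key])))
        self.stream.write(">")

    def end_tag(self, name):
        self.stream.write("</%s>" % name)

    def text(self, data):
        self.stream.write(escape(data))


class SketchWriter(object):
    """High level: tracks open elements so the output stays balanced."""

    def __init__(self, printer):
        self.printer = printer
        self.open_elements = []

    def start_element(self, name, attrs=None):
        self.printer.start_tag(name, attrs or {})
        self.open_elements.append(name)

    def end_element(self):
        self.printer.end_tag(self.open_elements.pop())

    def text(self, data):
        self.printer.text(data)

    def close(self):
        # Close whatever is still open, innermost first.
        while self.open_elements:
            self.end_element()


out = io.StringIO()
writer = SketchWriter(SketchPrinter(out))
writer.start_element("spam", {"kind": "canned"})
writer.text("eggs & ham")
writer.close()
result = out.getvalue()
```

The caller never writes an end tag by name. The writer knows which element is open, and the printer handles escaping.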
Domlette node families
- Amara 2.0 Domlette has an even thinner relationship to DOM. It is in effect a very low-level node data model that is suitable for use by DOM as well as Bindery. Because Bindery uses it, it avoids DOM method and attribute names in favor of names that minimize any clash with information item names in the source XML, generally by prepending "xml_". The low-level Domlette nodes are subclassed for Amara bindery nodes, and there is a separate Amara DOM module (amara.dom) that provides the Python binding to W3C DOM.
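The reason for the "xml_" prefix can be shown with a small sketch of a binding-style node. SketchNode and its members (xml_name, xml_children, xml_parent, xml_append) are assumptions chosen for illustration, not a statement of the actual Domlette API. The point is only that structural members live under the prefix, so child elements can be bound under their own names.

```python
class SketchNode(object):
    """Hypothetical node: structural members all carry the xml_ prefix."""

    def __init__(self, name):
        self.xml_name = name
        self.xml_parent = None
        self.xml_children = []

    def xml_append(self, child):
        child.xml_parent = self
        self.xml_children.append(child)
        # Bind the first child of each name as a plain attribute, the way
        # a data binding might. Element names such as "name" or "children"
        # cannot collide with the prefixed structural members.
        if not hasattr(self, child.xml_name):
            setattr(self, child.xml_name, child)


# Models <family><name/><children/></family>
family = SketchNode("family")
family.xml_append(SketchNode("name"))
family.xml_append(SketchNode("children"))
```

Here family.name is the child element named "name", while family.xml_name is still the string "family". With unprefixed members such as name or children, the two would overwrite each other.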
