This document is part of Seven days of Amara 2.x (Amara/Seven_days), a rapid introduction, based on practical tasks, to the many new features in the new version of Amara
Using XML model information
XML is eminently flexible, but this flexibility can be a bit of a pain for developers. Amara is all about making XML less of a pain for developers, and in Amara 2.0 you have a powerful new tool. You can control the content model of parsed XML documents, and you can use such information to simplify the parsing, with just a little up-front work. First let's look at the constraint capability.
from amara import bindery, xml_print
from amara.bindery.model import *
MONTY_XML = """<monty>
<python spam="eggs">What do you mean "bleh"</python>
<python ministry="abuse">But I was looking for argument</python>
</monty>"""
doc = bindery.parse(MONTY_XML)
#Add a constraint that `python` elements must have a `ministry` attribute
c = constraint(u'@ministry')
try:
    doc.monty.python.xml_model.add_constraint(c, validate=True)
except bindery.BinderyError, e:
    print e
doc.monty.python.xml_attributes[None, u'ministry'] = u'argument'
doc.monty.python.xml_model.add_constraint(c, validate=True)
Notice that the constraints are expressed using XPath. The closest general-purpose equivalent to Amara 2.0's content model is Schematron (http://www.schematron.com/). I highly recommend that Amara users gain at least some familiarity with this schema language, which is by far the most powerful in the XML world. Chimezie's "Validating XML with Schematron" is a great conceptual intro from a general point of view. If you have a strong XSLT background, see my article "Introducing the Schematron".
Both of these articles cover older versions of Schematron, but they do give you the general idea. If you'd like to dive deeper and learn the current ISO version of the standard, see "A hands-on introduction to Schematron" (free registration required) and "Discover the flexibility of Schematron abstract patterns" (no registration needed). Abstract patterns are a *very* powerful device, and I hope to work them into Amara 2.0's content model facilities somehow.
But back to Amara 2.0. You can add any general constraint along the lines of a Schematron assertion, and it's applicable to any element that uses that same binding class (xml_model is a class property).
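To see why a constraint added through one element applies to its siblings, here is a minimal plain-Python sketch of the "class property" idea. The names ContentModel and PythonElement are hypothetical stand-ins invented for this illustration; they are not Amara classes, and Amara's real binding classes are generated for you.

```python
class ContentModel(object):
    """Hypothetical stand-in for a content model: just a list of constraints."""
    def __init__(self):
        self.constraints = []


class PythonElement(object):
    """Hypothetical binding class for <python> elements."""
    # Class attribute: a single model object shared by every instance
    xml_model = ContentModel()


first = PythonElement()
second = PythonElement()

# Adding a constraint through one instance...
first.xml_model.constraints.append("@ministry")

# ...makes it visible through every other instance of the same class
shared = second.xml_model.constraints
```

Because the model lives on the class rather than on each instance, there is only one list of constraints to keep consistent.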
When I add the constraint I tell Amara to run validation, and since the first python element is missing the specified attribute, Amara raises an exception. So I add the missing attribute by hand, and when I add the same constraint again, validation passes. You can also add a constraint without validating, but this is risky because the document might already violate the constraint without your knowing it.
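If you don't have Amara handy, the validate-then-repair cycle just described can be imitated with the standard library. This is only a sketch of the idea using xml.etree.ElementTree; ConstraintError and check_required_attribute are hypothetical names made up for the illustration, not Amara's API.

```python
import xml.etree.ElementTree as ET

MONTY_XML = """<monty>
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""


class ConstraintError(Exception):
    """Hypothetical error type for this sketch (not Amara's BinderyError)."""


def check_required_attribute(root, tag, attr):
    """Raise ConstraintError if any `tag` element under root lacks `attr`."""
    for elem in root.iter(tag):
        if attr not in elem.attrib:
            raise ConstraintError(
                "<%s> element is missing required attribute %r" % (tag, attr))


root = ET.fromstring(MONTY_XML)

# First attempt: the first <python> element has no ministry attribute
try:
    check_required_attribute(root, "python", "ministry")
    first_check_passed = True
except ConstraintError as e:
    first_check_passed = False
    print(e)

# Repair the document by hand, then check again; no exception this time
root.find("python").set("ministry", "argument")
check_required_attribute(root, "python", "ministry")
```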
Amara constraints are very flexible, and you can add your own. Two specialized constraints that come bundled are constraints on the presence of a child element or an attribute. You can specify the constraint from the above example as follows:
doc = bindery.parse(MONTY_XML)
c = attribute_constraint(None, u'ministry', u'nonesuch')
doc.monty.python.xml_model.add_constraint(c, validate=True)
xml_print(doc)
You specify the namespace and local name of the required attribute. There is also a third argument, u'nonesuch'. This is called the fixup default, and it illustrates another interesting aspect of the content model facilities. Amara can often correct mismatches between models and actual content. In this case, if during validation Amara finds a python element with a missing ministry attribute, it will create one for you with the value u'nonesuch'. If you carefully examine the document as it's printed out after validation, you can see the added attribute.
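The fixup idea is easy to picture outside Amara as well. Here is a rough standard-library sketch, assuming a made-up helper named enforce_attribute: it repairs elements that lack the attribute when a fixup default is given, and complains otherwise. Amara's real implementation surely differs in the details.

```python
import xml.etree.ElementTree as ET

MONTY_XML = """<monty>
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""


def enforce_attribute(root, tag, attr, fixup_default=None):
    """Make sure every `tag` element has `attr`.

    Elements lacking the attribute get `fixup_default` if one is given;
    otherwise a ValueError is raised. Returns the list of repaired elements.
    """
    repaired = []
    for elem in root.iter(tag):
        if attr in elem.attrib:
            continue
        if fixup_default is None:
            raise ValueError(
                "<%s> lacks attribute %r and no fixup was given" % (tag, attr))
        elem.set(attr, fixup_default)
        repaired.append(elem)
    return repaired


root = ET.fromstring(MONTY_XML)
repaired = enforce_attribute(root, "python", "ministry", "nonesuch")

# Collect the attribute values in document order to see the repair
ministries = [e.get("ministry") for e in root.iter("python")]
print(ministries)
```

Only the element that was missing the attribute is touched; the existing ministry="abuse" value is left alone.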
You can do the same with child element constraints:
SVG = """<?xml version="1.0" encoding="utf-8"?>
<svg version="1.1" baseProfile="full"
xmlns="http://www.w3.org/2000/svg">
<title>A pair of lines and a pair of ellipses</title>
<g>
<ellipse cx="150" cy="100" rx="100" ry="50"/>
<line x1="450" y1="50" x2="550" y2="150"/>
</g>
<g>
<title>Rotated shapes</title>
<ellipse cx="150" cy="300" rx="100" ry="50"
transform="rotate(20)"/>
<line x1="350" y1="200" x2="450" y2="300"
transform="rotate(20)"/>
</g>
</svg>
"""
from amara.namespaces import *
doc = bindery.parse(SVG)
c = child_element_constraint(SVG_NAMESPACE, u'title', u'[NO TITLE]')
doc.svg.g.xml_model.add_constraint(c, validate=True)
xml_print(doc)
In this case the fixup for a missing title element is a new element with the content u'[NO TITLE]'. You might now be thinking "hey, that's cool, but do I have to write a ton of manual constraints to build my content model bit by bit?" Well, of course you don't. You can imagine using DTD or RELAX NG or WXS schemata to generate Amara 2.x constraints for you, and all this is planned, but for now Amara supports only one schema language. Luckily it's the simplest of all schema languages: Examplotron, so named because an example document is basically your schema. For an introduction see my article "Introducing Examplotron".
LABEL_MODEL = '''<?xml version="1.0" encoding="utf-8"?>
<labels>
<label>
<name>[Addressee name]</name>
<address>
<street>[Address street info]</street>
<city>[City]</city>
<state>[State abbreviation]</state>
</address>
</label>
</labels>
'''
VALID_LABEL_XML = '''<?xml version="1.0" encoding="utf-8"?>
<labels>
<label>
<name>Thomas Eliot</name>
<address>
<street>3 Prufrock Lane</street>
<city>Stamford</city>
<state>CT</state>
</address>
</label>
</labels>
'''
INVALID_LABEL_XML = '''<?xml version="1.0" encoding="utf-8"?>
<labels>
<label>
<quote>What thou lovest well remains, the rest is dross</quote>
<name>Ezra Pound</name>
<address>
<street>45 Usura Place</street>
<city>Hailey</city>
<state>ID</state>
</address>
</label>
</labels>
'''
from amara.bindery.model import *
label_model = examplotron_model(LABEL_MODEL)
doc = bindery.parse(VALID_LABEL_XML, model=label_model)
doc.xml_validate()
doc = bindery.parse(INVALID_LABEL_XML, model=label_model)
try:
    doc.xml_validate()
except bindery.BinderyError, e:
    print e
xml_print(doc)
Oh yes, LABEL_MODEL, that's really your schema. Nice, eh? You can establish occurrence constraints and such using additional Examplotron features. See the article above for more details. Anyway, you specify the Examplotron XML as your model document, and then you use that model to parse the candidate documents. The first matches the schema just fine, but the second has an unexpected quote element, so it fails validation and raises an exception. The xml_validate() method is the quickest way to validate a document, and it works for elements, too (just validating all elements in that subtree). The coolest thing about this is that you can validate documents after mutating them with the Amara API. Validation can be a bit expensive, speed-wise (though not noticeably unless you're dealing with huge docs), so use it judiciously. The good news is that you only pay the penalty upon actual validation. Mutation, document access and other operations proceed at regular speed.
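To make the "example document as schema" idea concrete, here is a deliberately simplified standard-library sketch. It assumes a made-up function, matches_example, that only checks that each element has the same sequence of child element names as the example; it ignores occurrence annotations, attributes, text and every other Examplotron feature, and it is not how Amara implements examplotron_model.

```python
import xml.etree.ElementTree as ET

LABEL_MODEL = """<labels>
  <label>
    <name>[Addressee name]</name>
    <address>
      <street>[Address street info]</street>
      <city>[City]</city>
      <state>[State abbreviation]</state>
    </address>
  </label>
</labels>"""

VALID_LABEL_XML = """<labels>
  <label>
    <name>Thomas Eliot</name>
    <address>
      <street>3 Prufrock Lane</street>
      <city>Stamford</city>
      <state>CT</state>
    </address>
  </label>
</labels>"""

INVALID_LABEL_XML = """<labels>
  <label>
    <quote>What thou lovest well remains, the rest is dross</quote>
    <name>Ezra Pound</name>
    <address>
      <street>45 Usura Place</street>
      <city>Hailey</city>
      <state>ID</state>
    </address>
  </label>
</labels>"""


def matches_example(example, candidate, path=""):
    """Return a list of mismatch messages; an empty list means a match."""
    here = path + "/" + candidate.tag
    if example.tag != candidate.tag:
        return ["%s: expected <%s>" % (here, example.tag)]
    expected = [child.tag for child in example]
    actual = [child.tag for child in candidate]
    if expected != actual:
        return ["%s: expected children %s, found %s" % (here, expected, actual)]
    problems = []
    # Child lists line up, so compare each pair recursively
    for ex_child, cand_child in zip(example, candidate):
        problems.extend(matches_example(ex_child, cand_child, here))
    return problems


model = ET.fromstring(LABEL_MODEL)
valid_problems = matches_example(model, ET.fromstring(VALID_LABEL_XML))
invalid_problems = matches_example(model, ET.fromstring(INVALID_LABEL_XML))
for message in invalid_problems:
    print(message)
```

The first document yields no problems, while the second is reported at the label element because of the unexpected quote child.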
One last thing I'd like to cover today concerns one of the most common bits of feedback about Amara 1.x. If you had a somewhat irregular XML document, you couldn't freely use bindery property traversal (e.g. doc.labels.label) without risking AttributeError. In Amara 2.0, if you use a model when parsing a document, the model makes the binding smart, and you can set a default value to be returned in cases where a known element happens to be missing somewhere in your instance document.
LABEL_MODEL = '''<?xml version="1.0" encoding="utf-8"?>
<labels>
<label>
<quote>What thou lovest well remains, the rest is dross</quote>
<name>Ezra Pound</name>
<address>
<street>45 Usura Place</street>
<city>Hailey</city>
<state>ID</state>
</address>
</label>
</labels>
'''
TEST_LABEL_XML = '''<?xml version="1.0" encoding="utf-8"?>
<labels>
<label>
<name>Thomas Eliot</name>
<address>
<street>3 Prufrock Lane</street>
<city>Stamford</city>
<state>CT</state>
</address>
</label>
</labels>
'''
from amara.bindery.model import *
label_model = examplotron_model(LABEL_MODEL)
doc = bindery.parse(TEST_LABEL_XML, model=label_model)
print doc.labels.label.quote #None, rather than raising AttributeError
So even though the instance document doesn't have a quote element, Amara knows from the model that it could have one. If you try to access the quote element you get back the default value of None. You can of course override this default if you like.
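For the curious, model-aware traversal of this kind can be approximated in plain Python with __getattr__. The sketch below is an assumption-laden illustration, not Amara's implementation: ModelAwareNode is a hypothetical wrapper around xml.etree.ElementTree elements that returns a default for names the model knows about and raises AttributeError for anything else.

```python
import xml.etree.ElementTree as ET

LABEL_MODEL = """<labels>
  <label>
    <quote>What thou lovest well remains, the rest is dross</quote>
    <name>Ezra Pound</name>
    <address><street>45 Usura Place</street></address>
  </label>
</labels>"""

TEST_LABEL_XML = """<labels>
  <label>
    <name>Thomas Eliot</name>
    <address><street>3 Prufrock Lane</street></address>
  </label>
</labels>"""


class ModelAwareNode(object):
    """Hypothetical wrapper pairing an element with its model element."""

    def __init__(self, elem, model_elem, default=None):
        self._elem = elem
        self._model = model_elem
        self._default = default

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        model_child = self._model.find(name)
        if model_child is None:
            # The model knows nothing about this name: a genuine error
            raise AttributeError(name)
        child = self._elem.find(name)
        if child is None:
            # Known to the model but absent from the instance
            return self._default
        return ModelAwareNode(child, model_child, self._default)

    def __str__(self):
        return self._elem.text or ""


model = ET.fromstring(LABEL_MODEL)
instance = ET.fromstring(TEST_LABEL_XML)

labels = ModelAwareNode(instance, model)
missing_quote = labels.label.quote      # default of None, no AttributeError
addressee = str(labels.label.name)      # element present in the instance

# Overriding the default value
labels_with_default = ModelAwareNode(instance, model, default="[no quote]")
overridden = labels_with_default.label.quote
```

A name that appears in neither the model nor the instance still raises AttributeError, so typos are not silently swallowed.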
