An XML library for PHP you may not hate.
If you are a PHP programmer, chances are that you will need to write and parse XML from time to time. You may even consider this a good thing. Chances are though that dealing with XML has caused you to flock to JSON.
But XML has advantages, and sometimes you simply don’t have an option.
I myself have gone through several stages of this. Back in the day everybody used expat because it was fast. I switched to SimpleXML because it had a friendlier API, and I used the DOM when I needed access to a wider range of XML features. I’ve also simply created XML output by concatenating strings.
But ever since PHP shipped with XMLReader and XMLWriter I’ve wondered if they were a better fit. Early on I was deterred several times because these objects were not very stable.
The XMLReader and XMLWriter objects are nice, but in order to effectively use them, they need a sort of design pattern. I’ve experimented with this concept off and on since 2009, and finally landed on something I’m reasonably happy with.
A few people have randomly stumbled upon this experiment and I got mostly positive feedback. Today I wanted to show it off to everyone. I’ve iterated on the base concept for several years, and tweaked it every time to get a sort of ‘good enough’ API that behaves reasonably sanely in various scenarios.
The library is called sabre/xml, and I hope people are willing to kick its tires and give some feedback.
How it works
sabre/xml extends the XMLReader and XMLWriter classes and adds a bunch of functionality that makes it quick to generate and parse XML.
By default it parses from/to PHP arrays, which is great for quick one-shot parsers/writers, but the biggest feature is that it allows you to intuitively map XML to PHP objects and vice-versa.
This gives this XML library a distinct advantage. It’s very easy to get started, but its design pattern still works for more complex XML applications.
The one caveat is that reading and writing are single-pass by design. Unlike the DOM, you can’t load in a document, make a small modification and save it again.
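To make "single-pass" concrete, here is a minimal sketch that uses PHP's built-in XMLReader directly, not sabre/xml. The cursor only moves forward: every `read()` call advances to the next node, and there is no way to step back or change the document in place. The sample document and the `$names` variable are made up for illustration.

```php
<?php
// Illustrative document; any well-formed XML would do.
$xml = '<article xmlns="http://example.org/">'
     . '<title>Hello world</title>'
     . '<content>Fuzzy Pickles</content>'
     . '</article>';

$reader = new XMLReader();
$reader->XML($xml);

$names = [];
// read() moves the cursor to the next node and returns false at the end.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        // Collect element names in the order the cursor encounters them.
        $names[] = $reader->localName;
    }
}
$reader->close();

print_r($names);
```

Once the loop finishes, the document has been consumed. Reading it again means creating a new reader, which is the trade-off for the low memory usage.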
Writing XML in a nutshell
<?php
$xmlWriter = new Sabre\Xml\Writer();
$xmlWriter->openMemory();
$xmlWriter->startDocument();
$xmlWriter->setIndent(true);
$xmlWriter->namespaceMap = ['http://example.org' => 'b'];
$xmlWriter->write(['{http://example.org}book' => [
    '{http://example.org}title' => 'Cryptonomicon',
    '{http://example.org}author' => 'Neal Stephenson',
]]);
echo $xmlWriter->outputMemory();
?>
Output:
<?xml version="1.0"?>
<b:book xmlns:b="http://example.org">
 <b:title>Cryptonomicon</b:title>
 <b:author>Neal Stephenson</b:author>
</b:book>
As you can see, you can quickly generate complex XML from simple array structures.
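For comparison, this is roughly what a document of the same shape takes with PHP's built-in XMLWriter alone. It is a sketch for contrast, not a description of how sabre/xml works internally. Every element is opened and closed by hand, and the namespace prefix has to be managed manually.

```php
<?php
// Plain XMLWriter (ext/xmlwriter), no sabre/xml involved.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument();
$writer->setIndent(true);

// Open <b:book> and declare the namespace once, on the root element.
$writer->startElementNs('b', 'book', 'http://example.org');
// Passing null as the namespace reuses the prefix without redeclaring xmlns:b.
$writer->writeElementNs('b', 'title', null, 'Cryptonomicon');
$writer->writeElementNs('b', 'author', null, 'Neal Stephenson');
$writer->endElement();
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```

The array-based form scales better once documents get deeper, because the nesting of the PHP array mirrors the nesting of the XML.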
Instead of serializing strings, you can also serialize objects. There’s a Sabre\Xml\XmlSerializable interface included that is meant to work similarly to PHP’s JsonSerializable.
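As a reminder of the pattern being borrowed, here is a small sketch of JsonSerializable in plain PHP. The `Book` class is hypothetical and only serves to show the idea: the object itself decides how it is represented when it gets encoded.

```php
<?php
// Plain PHP, no sabre/xml: the object describes its own serialized form.
final class Book implements JsonSerializable
{
    private $title;
    private $author;

    public function __construct(string $title, string $author)
    {
        $this->title = $title;
        $this->author = $author;
    }

    // Called automatically by json_encode().
    public function jsonSerialize(): array
    {
        return ['title' => $this->title, 'author' => $this->author];
    }
}

$json = json_encode(new Book('Cryptonomicon', 'Neal Stephenson'));
echo $json, "\n";
```

XmlSerializable applies the same idea to XML. Its exact method signature is not shown in this post, so check the documentation at http://sabre.io/xml/ before relying on it.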
Reading XML in a nutshell
This is how you parse an XML document:
<?php
$input = <<<XML
<article xmlns="http://example.org/">
    <title>Hello world</title>
    <content>Fuzzy Pickles</content>
</article>
XML;
$reader = new Sabre\Xml\Reader();
$reader->elementMap = [
    '{http://example.org/}article' => 'Sabre\Xml\Element\KeyValue',
];
$reader->xml($input);
print_r($reader->parse());
?>
This will output something like:
Array
(
    [name] => {http://example.org/}article
    [value] => Array
        (
            [{http://example.org/}title] => Hello world
            [{http://example.org/}content] => Fuzzy Pickles
        )

    [attributes] => Array
        (
        )

)
The key in the last example is that we tell the parser to treat the contents of the article XML node as a key-value structure.
This is optional, but by adding this hint the resulting output becomes a lot simpler.
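To give a feel for what a key-value strategy has to do, here is a rough sketch built on nothing but the built-in XMLReader. `readKeyValue()` is a hypothetical helper and not sabre/xml's implementation. It assumes every child element contains only text, and it builds keys in the same `{namespace}localname` form used above.

```php
<?php
// Hypothetical helper: turns the children of the current element into a
// flat [clark-notation name => text] array. Assumes text-only children.
function readKeyValue(XMLReader $reader): array
{
    $result = [];
    $depth = $reader->depth; // depth of the parent element
    if ($reader->isEmptyElement) {
        return $result;
    }
    while ($reader->read()) {
        // Stop when we reach the end tag of the parent element.
        if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->depth === $depth) {
            break;
        }
        // Only direct children become keys.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth === $depth + 1) {
            $key = '{' . $reader->namespaceURI . '}' . $reader->localName;
            $result[$key] = $reader->readString();
        }
    }
    return $result;
}

$input = <<<XML
<article xmlns="http://example.org/">
  <title>Hello world</title>
  <content>Fuzzy Pickles</content>
</article>
XML;

$reader = new XMLReader();
$reader->XML($input);
// Advance the cursor to the root element.
while ($reader->read() && $reader->nodeType !== XMLReader::ELEMENT);

$article = readKeyValue($reader);
print_r($article);
```

Whitespace between the child elements is skipped because only element nodes are turned into keys.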
The parser comes with a few parsing strategies for common needs, and you can easily create your own by writing deserializer classes, or just by providing a callback:
<?php
$reader->elementMap = [
    '{http://example.org/}article' => function(Sabre\Xml\Reader $reader) {
        // Read the element's contents, and return the result here.
    }
];
?>
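The element-map idea itself can be sketched without the library. The following is a conceptual illustration using the built-in XMLReader. `parseWithMap()` is a hypothetical function, not part of sabre/xml, and it leaves out everything a real parser needs for nested structures. It only shows the dispatch: look up the element's `{namespace}localname` in a map and hand the reader to the matching callback.

```php
<?php
// Hypothetical helper: for each element whose clark-notation name appears in
// $elementMap, call the mapped callback with the reader and collect the result.
function parseWithMap(string $xml, array $elementMap): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $results = [];
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        $clark = '{' . $reader->namespaceURI . '}' . $reader->localName;
        if (isset($elementMap[$clark])) {
            // The callback decides how this element becomes a PHP value.
            $results[$clark][] = $elementMap[$clark]($reader);
        }
    }
    $reader->close();

    return $results;
}

$xml = '<article xmlns="http://example.org/">'
     . '<title>Hello world</title>'
     . '<content>Fuzzy Pickles</content>'
     . '</article>';

$parsed = parseWithMap($xml, [
    '{http://example.org/}title' => function (XMLReader $reader) {
        return strtoupper($reader->readString());
    },
]);
print_r($parsed);
```

Elements without an entry in the map are simply skipped in this sketch. A real parser would fall back to a default strategy instead.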
Element classes and interfaces
Sabre\Xml\XmlSerializable is used to allow an object to serialize itself.
Sabre\Xml\XmlDeserializable turns an object into a factory for parsing and returning a value.
Sabre\Xml\Element is a convenience interface that just extends the previous two.
You can implement these interfaces yourself, but a few standard implementations are included:
Sabre\Xml\Element\Base is the default and turns every element into an array with a name, value, and attributes key.
Sabre\Xml\Element\KeyValue flattens the array, and turns it into a key-value array.
Sabre\Xml\Element\Elements discards element values, and gives you a flat array of element names. Useful for ‘enums’.
Sabre\Xml\Element\CData allows you to easily embed a CDATA structure.
Sabre\Xml\Element\XmlFragment extracts a subtree from XML and gives you a valid XML fragment, including namespace declarations.
The benefits
This type of design pattern has a number of major advantages. It’s possible for users to create PHP classes that represent specific XML elements.
For complex XML applications this is useful, because elements may be re-used in various document types, and now those element classes can be re-used in the same way.
It would also allow someone to publish a set of Element classes for a specific XML format such as Atom on Packagist, and allow someone else to re-use specific parts of that format in a new format. I’m hoping to fulfill the promise of XML extensibility by bringing it to PHP, but that might be too bold of a statement.
At the very least I think it will make your XML parsing code simpler, reusable, extensible and more legible. I also found it more fun to work with XML, but I’m biased.
The full docs can be found on http://sabre.io/xml/, the source on GitHub and it may be installed with:
composer require sabre/xml ~0.4.0
Comments
rkr •
What about XPath? The examples you've shown could be achieved with little overhead using DOM. The only real feature so far (as I got it) is the mapper. Can you make it more clear where the differences between DOM and sabre/xml are?
Evert •
Aside from object mapping, the other benefits are really the same benefits that XMLReader has over the DOM. XMLReader is single-pass, low on memory and I believe that it has a nicer API.
This library is lifted effectively from sabre/dav, which is a library for WebDAV, CalDAV and CardDAV. Those protocols use a ton of XML, and I have been using the DOM for years. For me this was a major upgrade, and will also allow me to do streaming XML responses for large bodies in the future, whereas with the DOM I just had to submit to hundreds of MB in memory usage.
z2z •
PHP really needs something like JAXB. Searching for one...
Petah •
Repeating the namespace is a bit annoying
Ryan Tate •
Repeating the namespace all over is not that DRY. Needs some sort of namespacing tool.
function mkNS(string $ns){
    return function(string $key = '') use($ns){
        return $key ? "{{$ns}}$key" : $ns;
    };
}
$ns = mkNS('http://example.org');
$xmlWriter = new Sabre\Xml\Writer();
$xmlWriter->openMemory();
$xmlWriter->startDocument();
$xmlWriter->setIndent(true);
$xmlWriter->namespaceMap = [$ns() => 'b'];
$xmlWriter->write([
    $ns('book') => [
        $ns('title') => 'Cryptonomicon',
        $ns('author') => 'Neal Stephenson',
    ]
]);