An XML library for PHP you may not hate.

If you are a PHP programmer, chances are that you will need to write and parse XML from time to time. You may even consider this a good thing. Chances are though that dealing with XML has caused you to flock to JSON.

But XML has advantages, and sometimes you simply don’t have an option.

I myself have gone through several stages of this. Back in the day everybody used expat because it was fast. I switched to simplexml because it had a friendlier API, and I used the DOM when I needed access to a wider range of XML features. I’ve also simply created XML output by concatenating strings.

But ever since PHP shipped with XMLReader and XMLWriter I’ve wondered if they were a better fit. Early on I was deterred several times because these objects were not very stable.

The XMLReader and XMLWriter objects are nice, but in order to effectively use them, they need a sort of design pattern. I’ve experimented with this concept off and on since 2009, and finally landed on something I’m reasonably happy with.

A few people have randomly stumbled upon this experiment and I got mostly positive feedback. Today I wanted to show it off to everyone. I’ve iterated on the base concept for several years, and tweaked it every time to get a sort of ‘good enough’ API that behaves reasonably sanely in various scenarios.

The library is called sabre/xml, and I hope people are willing to kick its tires and give some feedback.

How it works

sabre/xml extends the XMLReader and XMLWriter classes and adds a bunch of functionality that makes it quick to generate and parse XML.

By default it parses from/to PHP arrays, which is great for quick one-shot parsers/writers, but the biggest feature is that it allows you to intuitively map XML to PHP objects and vice-versa.

This gives this XML library a distinct advantage. It’s very easy to get started, but its design pattern still works for more complex XML applications.

The one caveat is that reading and writing are single-pass by design. Unlike the DOM, you can’t load in a document, make a small modification and save it again.
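To make the single-pass constraint concrete, here is a minimal sketch that uses PHP's built-in XMLReader directly rather than sabre/xml's own API. The sample document and variable names are made up for illustration. The cursor only moves forward, so each node is seen once and anything you need has to be collected as it goes by.

```php
<?php

// Illustrative input; not taken from the library's documentation.
$xml = '<list><item>a</item><item>b</item></list>';

$reader = new XMLReader();
$reader->XML($xml);

$items = [];

// read() advances the cursor one node at a time; there is no way to rewind.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'item') {
        // readString() returns the text content of the current element.
        $items[] = $reader->readString();
    }
}

$reader->close();

print_r($items);
```

Changing one of the items and saving the document again would require a second pass that writes every node back out, which is exactly the work the DOM does for you at the cost of holding the whole tree in memory.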

Writing XML in a nutshell

<?php

$xmlWriter = new Sabre\Xml\Writer();
$xmlWriter->openMemory();
$xmlWriter->startDocument();
$xmlWriter->setIndent(true);
$xmlWriter->namespaceMap = ['http://example.org' => 'b'];

$xmlWriter->write(['{http://example.org}book' => [
    '{http://example.org}title' => 'Cryptonomicon',
    '{http://example.org}author' => 'Neil Stephenson',
]]);

echo $xmlWriter->outputMemory();

?>

Output:

<?xml version="1.0"?>
<b:book xmlns:b="http://example.org">
 <b:title>Cryptonomicon</b:title>
 <b:author>Neil Stephenson</b:author>
</b:book>

As you can see, you can quickly generate complex xml from simple array structures.
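For comparison, this is roughly what the same document takes with PHP's built-in XMLWriter alone. It is only a sketch to show what the array syntax saves you: every element has to be opened and closed by hand, and the namespace declaration has to be placed explicitly on the root element.

```php
<?php

$writer = new XMLWriter();
$writer->openMemory();
$writer->setIndent(true);
$writer->startDocument();

// Passing the namespace URI here emits the xmlns:b declaration on the root.
$writer->startElementNs('b', 'book', 'http://example.org');

// The children reuse the prefix; null means "do not declare it again".
$writer->writeElementNs('b', 'title', null, 'Cryptonomicon');
$writer->writeElementNs('b', 'author', null, 'Neil Stephenson');

$writer->endElement();
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```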

Instead of serializing strings, you can also serialize objects. There’s a Sabre\Xml\XmlSerializable interface included that is meant to work similarly to PHP’s JsonSerializable.
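A minimal sketch of what such an object could look like. The Book class is hypothetical, and the example assumes the interface has a single xmlSerialize(Writer $writer) method and that the library was installed with Composer. Newer releases may declare a return type on that method, in which case the implementation has to match it.

```php
<?php

require 'vendor/autoload.php';

use Sabre\Xml\Writer;
use Sabre\Xml\XmlSerializable;

// Hypothetical value object, for illustration only.
class Book implements XmlSerializable
{
    public $title;
    public $author;

    public function __construct($title, $author)
    {
        $this->title = $title;
        $this->author = $author;
    }

    // Assumed to be called by the writer when it meets this object as a value.
    // The writer has already opened the parent element at that point, so the
    // object only writes its own contents.
    public function xmlSerialize(Writer $writer)
    {
        $writer->write([
            '{http://example.org}title'  => $this->title,
            '{http://example.org}author' => $this->author,
        ]);
    }
}

$xmlWriter = new Writer();
$xmlWriter->openMemory();
$xmlWriter->startDocument();
$xmlWriter->setIndent(true);
$xmlWriter->namespaceMap = ['http://example.org' => 'b'];

$xmlWriter->write([
    '{http://example.org}book' => new Book('Cryptonomicon', 'Neil Stephenson'),
]);

echo $xmlWriter->outputMemory();
```

If those assumptions hold, this should produce the same document as the array-based example, with the knowledge of how a book is laid out living in the Book class instead of in the calling code.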

Reading XML in a nutshell

This is how you parse an xml document:

<?php

$input = <<<XML
<article xmlns="http://example.org/">
    <title>Hello world</title>
    <content>Fuzzy Pickles</content>
</article>
XML;

$reader = new Sabre\Xml\Reader();
$reader->elementMap = [
    '{http://example.org/}article' => 'Sabre\Xml\Element\KeyValue',
];
$reader->xml($input);

print_r($reader->parse());

?>

This will output something like:

Array
(
    [name] => {http://example.org/}article
    [value] => Array
        (
            [{http://example.org/}title] => Hello world
            [{http://example.org/}content] => Fuzzy Pickles
        )

    [attributes] => Array
        (
        )

)

The key in the last example is that we tell the parser to treat the contents of the article XML node as a key-value structure.

This is optional, but by adding this hint the resulting output becomes a lot simpler.

The parser comes with a few parsing strategies for common needs, and you can easily create your own by writing deserializer classes, or just by providing a callback:

<?php

$reader->elementMap = [
    '{http://example.org/}article' => function(Sabre\Xml\Reader $reader) {
        // Read the element's contents, and return the result here.
    }
];

?>
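Here is one way the callback body might be filled in, as a sketch rather than the canonical approach. It assumes the reader offers a parseInnerTree() method that parses the child nodes and moves the cursor past the closing tag, and that child elements come back in the default name/value/attributes array form. Check the documentation for the version you have installed.

```php
<?php

require 'vendor/autoload.php';

$input = <<<XML
<article xmlns="http://example.org/">
    <title>Hello world</title>
    <content>Fuzzy Pickles</content>
</article>
XML;

$reader = new Sabre\Xml\Reader();
$reader->elementMap = [
    '{http://example.org/}article' => function (Sabre\Xml\Reader $reader) {
        // Assumption: parseInnerTree() consumes this element's children.
        // It may return a string or null when there are no child elements.
        $children = $reader->parseInnerTree();

        $article = [];
        if (is_array($children)) {
            foreach ($children as $child) {
                // Drop the "{namespace}" part of the element name.
                $localName = substr($child['name'], strpos($child['name'], '}') + 1);
                $article[$localName] = $child['value'];
            }
        }

        return $article;
    },
];
$reader->xml($input);

print_r($reader->parse());
```

Whatever the callback returns becomes the value of the article element, so here the namespaced keys from the earlier output would be replaced by plain title and content keys.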

Element classes and interfaces

  • Sabre\Xml\XmlSerializable is used to allow an object to serialize itself.
  • Sabre\Xml\XmlDeserializable turns an object into a factory for parsing and returning a value.
  • Sabre\Xml\Element is a convenience interface that just extends the previous two.

You can implement these interfaces yourself, but a few standard implementations are included:

  • Sabre\Xml\Element\Base is the default and turns every element into an array with a name, value, and attributes key.
  • Sabre\Xml\Element\KeyValue flattens the array, and turns it into a key-value array.
  • Sabre\Xml\Element\Elements discards element values, and gives you a flat array of element names. Useful for ‘enums’.
  • Sabre\Xml\Element\CData allows you to easily embed a CDATA structure.
  • Sabre\Xml\Element\XmlFragment extracts a subtree from XML and gives you a valid xml fragment, including namespace declarations.
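Writing your own implementation could look something like the sketch below. The Article class is hypothetical. It assumes that XmlDeserializable requires a static xmlDeserialize(Reader $reader) method, that a class name can be used as an elementMap value (as with KeyValue above), and that parseInnerTree() is available on the reader.

```php
<?php

require 'vendor/autoload.php';

use Sabre\Xml\Reader;
use Sabre\Xml\XmlDeserializable;

// Hypothetical class representing the <article> element.
class Article implements XmlDeserializable
{
    public $title;
    public $content;

    // Acts as a factory: reads the element and returns a populated object.
    public static function xmlDeserialize(Reader $reader)
    {
        $article = new self();

        // Assumption: returns the child elements as name/value/attributes arrays.
        $children = $reader->parseInnerTree();
        if (is_array($children)) {
            foreach ($children as $child) {
                if ($child['name'] === '{http://example.org/}title') {
                    $article->title = $child['value'];
                } elseif ($child['name'] === '{http://example.org/}content') {
                    $article->content = $child['value'];
                }
            }
        }

        return $article;
    }
}

$reader = new Reader();
$reader->elementMap = [
    '{http://example.org/}article' => 'Article',
];
$reader->xml('<article xmlns="http://example.org/"><title>Hello world</title><content>Fuzzy Pickles</content></article>');

$result = $reader->parse();

// The top-level result wraps the object in the usual name/value/attributes array.
$article = $result['value'];
echo $article->title, "\n";
```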

The benefits

This type of design pattern has a number of major advantages. It’s possible for users to create PHP classes that represent specific XML elements.

For complex XML applications this is useful, because elements may be re-used in various document types, and now those element classes can be re-used in the same way.

It would also allow someone to publish a set of Element classes for a specific XML format such as Atom on Packagist, and allow someone else to re-use specific parts of that format in a new format. I’m hoping to fulfill the promise of XML extensibility by bringing it to PHP, but that might be too bold of a statement.

At the very least I think it will make your XML parsing code simpler, reusable, extensible and more legible. I also found it more fun to work with XML, but I’m biased.

The full docs can be found on http://sabre.io/xml/, the source on GitHub and it may be installed with:

composer require sabre/xml ~0.4.0

Comments

  • rkr

    What about XPath? The examples you've shown could be achieved with little overhead using DOM. The only real feature so far (as I got it) is the mapper. Can you make it more clear where the differences between DOM and sabre/xml are?

    • Evert

      Aside from object mapping, the other benefits are really the same benefits that XMLReader has over the DOM. XMLReader is single-pass, low on memory and I believe that it has a nicer API.

      This library is lifted effectively from sabre/dav, which is a library for webdav, caldav and carddav. Those protocols use a ton of XML, and I have been using the DOM for years. For me this was a major upgrade, and will also allow me to do streaming XML responses for large bodies in the future, whereas with the DOM I just had to submit to hundreds of MB in memory usage.

  • z2z

    PHP really needs something like JAXB. Searching for one...

  • Petah

    Repeating the namespace is a bit annoying

  • Ryan Tate

    Repeating the namespace all over is not that DRY. Needs some sort of namespacing tool.

    function mkNS(string $ns){
        return function(string $key = '') use($ns){
            return $key ? "{{$ns}}$key" : $ns;
        };
    }

    $ns = mkNS('http://example.org');
    $xmlWriter = new Sabre\Xml\Writer();
    $xmlWriter->openMemory();
    $xmlWriter->startDocument();
    $xmlWriter->setIndent(true);
    $xmlWriter->namespaceMap = [$ns() => 'b'];

    $xmlWriter->write([
        $ns('book') => [
            $ns('title') => 'Cryptonomicon',
            $ns('author') => 'Neil Stephenson',
        ]
    ]);