Save memory by switching to generators

Since the release of PHP 5.5, we now have access to generators. Generators are a pretty cool language feature, and can allow you to save quite a bit of memory if they are used in the right places.

To demonstrate, imagine that our model has a function that fetches records from a database:

<?php

function getArticles() {

   $articles = [];
   $result = $this->db->query('SELECT * FROM articles');

   while($record = $result->fetch(PDO::FETCH_ASSOC)) {

       $articles[] = $this->mapToObject($record);

   }

   return $articles;

}

?>

The preceding example is a fairly common pattern in CRUD applications. We fetch a list of records from the database and apply some function to each record before returning them.

Somewhere else in the application, this might be used like this in some view:

<?php

foreach($model->getArticles() as $article) {

    echo "<li>", htmlspecialchars($article->title), "</li>";

}
?>

The memory problem

If our articles table contains a lot of records, we’re storing each one of those in the $articles variable. This means that your “peak memory usage” is dependent on how many records there are. For many smaller use-cases this might not really be an issue, but sometimes you do have to work with a lot of data.

It’s not uncommon in complex applications for the result of a function like our getArticles to be passed to multiple functions that mangle or modify the data further.

Each of these functions tends to have a (foreach) loop, and its memory usage grows as the amount of data goes up.

Generators to the rescue

A generator allows you to return an ‘array-like result’ from a function, but only return one record at a time.

It’s relatively easy to convert our getArticles() function to a generator. Here’s our new function:

<?php

function getArticles() {

   $result = $this->db->query('SELECT * FROM articles');

   while($record = $result->fetch(PDO::FETCH_ASSOC)) {

       yield $this->mapToObject($record);

   }

}

?>

As you can see from this, the function is actually shorter, and the $articles variable no longer exists.

Our earlier “view” code does not need to be modified; it still works exactly as-is:

<?php

foreach($model->getArticles() as $article) {

    echo "<li>", htmlspecialchars($article->title), "</li>";

}
?>

The difference? Every time the getArticles() method ‘generates’ a new record, the function effectively ‘pauses’, and for every iteration of the foreach loop, the function is continued until it hits another yield.
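
The pause-and-resume behaviour can be made visible by logging what happens when. The sketch below uses a hypothetical numbers() generator in place of getArticles() (no database involved) and records each step in a $log array; both names are assumptions made for illustration only.

```php
<?php

// Hypothetical stand-in for getArticles(): yields three values and
// records in $log the moment each one is produced.
function numbers(array &$log) {
    for ($i = 1; $i <= 3; $i++) {
        $log[] = "produce $i";
        yield $i;
    }
}

$log = [];

foreach (numbers($log) as $n) {
    // While this loop body runs, the generator is paused at its yield.
    $log[] = "consume $n";
}

echo implode("\n", $log), "\n";
```

The log alternates between "produce" and "consume" entries instead of listing all three "produce" entries first, which is what the array-building version would do.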

Things to look out for

getArticles() no longer returns an array. It now returns a Generator object, which implements PHP's Iterator interface.

A foreach loop behaves mostly the same, but not everything you can do with an array can also be done with an iterator.

For instance, before we switched to generators, we would have been able to access a specific record from getArticles() like this:

<?php

$fifthArticle = $model->getArticles()[4];

?>

With generators you can no longer do this, and it will result in an error. Switching to generators means that you must access the result in sequence.
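
If a single record by position is really needed, one option is to walk the generator until that position is reached. This is a minimal sketch; nthItem() and letters() are hypothetical helpers, not part of PHP or of the model shown above.

```php
<?php

// Hypothetical stand-in for getArticles().
function letters() {
    foreach (['a', 'b', 'c', 'd', 'e', 'f'] as $letter) {
        yield $letter;
    }
}

// Returns the item at zero-based position $n, or null if the sequence
// is shorter than that. Iteration stops as soon as the item is found.
function nthItem(Traversable $items, $n) {
    $position = 0;
    foreach ($items as $item) {
        if ($position === $n) {
            return $item;
        }
        $position++;
    }
    return null;
}

$fifth = nthItem(letters(), 4);
echo $fifth, "\n"; // e
```

Everything before the requested position is still generated and thrown away, and a generator cannot be rewound once it has advanced, so this only suits one-off lookups.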

The big PHP fuck-up

Unfortunately, PHP also does not allow you to use the array traversal functions on generators, including array_filter, array_map, etc.

You can first convert the result of an iterator back into an array:

<?php

$array = iterator_to_array(
    $model->getArticles()
);

?>

But, converting to an array defeats the point of using generators a little bit.

To me, this is something that can instantly get added to the infamous “fractal of bad design” article. We’ve had generators since PHP 5.5, iterators since PHP 5.0, and array_map since PHP 4.0, so PHP maintainers have had over a decade to fix this shortcoming.
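
In the meantime, one way to stay lazy is to filter and map with more iterators instead of arrays. The sketch below uses SPL's CallbackFilterIterator (also mentioned in the comments) for filtering and a small generator for mapping; lazyMap() and words() are hypothetical names chosen for illustration.

```php
<?php

// Hypothetical stand-in for getArticles().
function words() {
    foreach ([' apple ', '', ' pear', 'fig '] as $word) {
        yield $word;
    }
}

// Lazy counterpart of array_map(): applies $callback to each value
// only when that value is requested, keeping the original keys.
function lazyMap(Traversable $items, callable $callback) {
    foreach ($items as $key => $value) {
        yield $key => $callback($value);
    }
}

// Lazy counterpart of array_filter(): skips the empty string.
$nonEmpty = new CallbackFilterIterator(words(), function ($word) {
    return $word !== '';
});

$result = [];
foreach (lazyMap($nonEmpty, 'trim') as $word) {
    $result[] = $word;
}

echo implode(',', $result), "\n"; // apple,pear,fig
```

Nothing is filtered or trimmed until the final foreach asks for the next value, so only one record is in flight at a time. The $result array exists here only to make the output easy to inspect.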

Comments

  • Matthias Noback

    Although it doesn't fix the issue with array_* functions and iterators, https://github.com/nikic/iter may help bridge the gap.

    • Evert

      Nice! Excellent link =)

  • stuartcarnie

    array_map and array_filter not being supported is not a mistake. Generators allow for efficient composition or pipelining. In fact, PHP already has an analog for array_filter using the \CallbackFilterIterator. It is trivial to write a MapIterator or anything else your heart desires. The problem with array_map and array_filter is they require the full array, and therefore lose the memory efficiency of generators

  • HappyArchLabsUser

    > Unfortunately, PHP also does not allow you to use the array traversal functions on generators, including array_filter, array_map

    why u should use them if u have "foreach" ?

    > foreach($model->getArticles() as $article) {

    • Christopher Thomas

      easy, because I can do this and it's much more compact and conceptually simple

      $result = array_map("trim",$input);

      • philjohn2

        Even better - put it in the generator if you're going to do it in the first place anyway ...

  • Mark Baker

    You can simulate array_filter() and array_map() (and even array_reduce()) pretty easily for Generators:

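    function generator_map(Traversable $filter, callable $callback) {
        foreach ($filter as $key => $value) {
            // Keep the original key, as array_map() does for a single array
            yield $key => call_user_func($callback, $value);
        }
    }

    function generator_reduce(Traversable $filter, callable $callback, $initial = 0.0) {
        $result = $initial;
        foreach ($filter as $value) {
            // Same argument order as array_reduce(): carry first, then item
            $result = call_user_func($callback, $result, $value);
        }
        // Yields the single reduced value, so the caller still iterates
        yield $result;
    }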

  • Johannes

    Mind that the PDOStatement is an Iterator (more precisely: Traversable) already.

    You can therefore simplify getArticles to this:

    function getArticles() {
        return $this->db->query('SELECT * FROM articles');
    }

    No need for generators there.

    Also by using Iterators you can also use a custom Iterator to do the mapping:

    class HTMLListIterator extends IteratorIterator {
        public function __construct(Traversable $inner) {
            // PDOStatement is a Traversable, which doesn't have a "current"
            // method, therefore we wrap it, so we have a current() in the
            // function below
            parent::__construct(new IteratorIterator($inner));
        }

        public function current() {
            // Concatenate with "."; a comma-separated list only works with echo
            return "<li>" . htmlspecialchars($this->getInnerIterator()->current()->title) . "</li>";
        }
    }

    foreach (new HTMLListIterator($model->getArticles()) as $article) {
        echo $article;
    }

    The interesting effect here (maybe not so much in that specific example, but abstract it a bit ...) is that we separate different aspects. If we want to replace the database with an array or some other source we don't have to change any other code. If we want to use the same formatting somewhere else (i.e. to write a rendered cache file) we can re-use it etc.

    Iterators are really powerful, but you need some boilerplate ... and be careful to understand what's going on later on.

  • Matrix AI

    If you use `rowCount()` on a PDOStatement, does this end up counting through all the rows to get you the number, thus defeating the purpose of using the same PDOStatement as an iterator? Or is there another way to find out whether there are any results without doing any iteration?