Save memory by switching to generators
Since the release of PHP 5.5, we now have access to generators. Generators are a pretty cool language feature, and can allow you to save quite a bit of memory if they are used in the right places.
To demonstrate, imagine that our model has a function that fetches records from a database:
<?php
// Method of the model class; $this->db is a PDO connection.
function getArticles() {
    $articles = [];
    $result = $this->db->query('SELECT * FROM articles');
    while ($record = $result->fetch(PDO::FETCH_ASSOC)) {
        $articles[] = $this->mapToObject($record);
    }
    return $articles;
}
?>
The preceding example is a fairly common pattern in CRUD applications. We fetch a list of records from the database and apply a mapping function to each record before returning them.
Elsewhere in the application, a view might use it like this:
<?php
foreach ($model->getArticles() as $article) {
    echo "<li>", htmlspecialchars($article->title), "</li>";
}
?>
The memory problem
If our articles table contains a lot of records, we're storing every one of them in the $articles variable. This means that our “peak memory usage” depends on how many records there are. For many smaller use cases this might not be an issue, but sometimes you do have to work with a lot of data.
In complex applications, the result of a function like our getArticles() is often passed to several other functions that mangle or modify the data further.
Each of these functions tends to have its own foreach loop that builds a new array, so its memory usage also grows as the amount of data goes up.
Generators to the rescue
A generator lets a function return an ‘array-like result’, but produces only one record at a time.
It's relatively easy to convert our getArticles() function to a generator. Here's our new function:
<?php
function getArticles() {
    $result = $this->db->query('SELECT * FROM articles');
    while ($record = $result->fetch(PDO::FETCH_ASSOC)) {
        yield $this->mapToObject($record);
    }
}
?>
As you can see, the function is actually shorter, and the $articles variable no longer exists.
Our earlier “view” code does not need to be modified. It still works exactly as-is:
<?php
foreach ($model->getArticles() as $article) {
    echo "<li>", htmlspecialchars($article->title), "</li>";
}
?>
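Here is a rough, self-contained comparison that shows the effect without a database. The functions allRecords() and streamRecords() are hypothetical stand-ins for the two versions of getArticles(), and each produces 10,000 strings of 1 KB. memory_get_usage() is sampled inside a foreach loop like the view's. The exact numbers depend on the PHP version, so treat them as illustrative only.

```php
<?php
// Hypothetical stand-ins for the two versions of getArticles():
// each "record" is a 1 KB string so the difference is easy to see.
function allRecords(int $count): array
{
    $records = [];
    for ($i = 0; $i < $count; $i++) {
        $records[] = str_repeat('x', 1024);
    }
    return $records;
}

function streamRecords(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield str_repeat('x', 1024);
    }
}

$baseline = memory_get_usage();

// Generator: only the current record is held in memory.
$streamPeak = 0;
foreach (streamRecords(10000) as $record) {
    $streamPeak = max($streamPeak, memory_get_usage() - $baseline);
}

// Array: all 10,000 records are held in memory at once.
$arrayPeak = 0;
foreach (allRecords(10000) as $record) {
    $arrayPeak = max($arrayPeak, memory_get_usage() - $baseline);
}

printf("generator: %d KB, array: %d KB\n", $streamPeak / 1024, $arrayPeak / 1024);
```

The loop body is the same in both cases. Only the array version has to hold every record at once.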
The difference? Every time getArticles() ‘generates’ a new record, the function effectively ‘pauses’. On each iteration of the foreach loop, the function resumes until it hits the next yield.
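A toy generator that logs each step makes the pause/resume order visible. countToThree() is a made-up example. $log is passed by reference only so that the order can be inspected afterwards.

```php
<?php
// Made-up generator that records each step in $log (passed by reference).
function countToThree(array &$log): Generator
{
    $log[] = 'start';
    for ($i = 1; $i <= 3; $i++) {
        $log[] = "yield $i";
        yield $i;                      // execution pauses here...
        $log[] = "resume after $i";    // ...and continues here on the next iteration
    }
    $log[] = 'done';
}

$log = [];
foreach (countToThree($log) as $n) {
    $log[] = "loop body got $n";
}

// start, yield 1, loop body got 1, resume after 1, yield 2, ...
echo implode("\n", $log), "\n";
```

Calling countToThree($log) on its own runs none of the body. The code only starts executing when the foreach loop asks for the first value.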
Things to look out for
getArticles() no longer returns an array. It returns a Generator object, which implements PHP's Iterator interface.
A foreach loop behaves mostly the same, but not everything you can do with an array can also be done with an iterator.
For instance, before we switched to generators, we could access a specific record from getArticles() like this:
<?php
$fifthArticle = $model->getArticles()[4];
?>
This no longer works with generators and results in an error. Switching to generators means that you must access the results in sequence.
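If you only need one record, you can still walk the sequence and stop early. The helper nth() and the fake data source fakeArticles() below are hypothetical and are not built-ins. A generator cannot be rewound once it has moved past its first yield, so each lookup starts from a fresh fakeArticles() call.

```php
<?php
// Hypothetical helper: return the item at zero-based position $n,
// or null if the sequence is shorter. It stops as soon as it finds
// the item, so a generator is never run further than necessary.
function nth(iterable $items, int $n)
{
    $position = 0;
    foreach ($items as $item) {
        if ($position === $n) {
            return $item;
        }
        $position++;
    }
    return null;
}

// Hypothetical stand-in for getArticles(): yields five fake titles.
function fakeArticles(): Generator
{
    for ($i = 1; $i <= 5; $i++) {
        yield "Article $i";
    }
}

$fifthArticle = nth(fakeArticles(), 4); // "Article 5"
```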
The big PHP fuck-up
Unfortunately, PHP also does not allow you to use the array traversal functions on generators, including array_filter, array_map, etc.
You can convert the iterator back into an array first:
<?php
$array = iterator_to_array(
$model->getArticles()
);
?>
But converting to an array partly defeats the point of using generators.
To me, this is something that can instantly get added to the infamous “fractal of bad design” article. We’ve had generators since PHP 5.5, iterators since PHP 5.0, and array_map since PHP 4.0, so PHP maintainers have had over a decade to fix this shortcoming.
Comments
Matthias Noback •
Although it doesn't fix the issue with array_* functions and iterators, https://github.com/nikic/iter may help bridge the gap.
Evert •
Nice! Excellent link =)
stuartcarnie •
array_map and array_filter not being supported is not a mistake. Generators allow for efficient composition or pipelining. In fact, PHP already has an analog for array_filter in \CallbackFilterIterator. It is trivial to write a MapIterator or anything else your heart desires. The problem with array_map and array_filter is that they require the full array, and therefore lose the memory efficiency of generators.
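Here is a sketch of the composition described above, assuming PHP 7 or later. Filtering uses the built-in CallbackFilterIterator. MapIterator is a hypothetical class written here on top of IteratorIterator and is not part of SPL. Values are pulled through both stages one at a time.

```php
<?php
// Hypothetical MapIterator: applies $callback to each value as it is read.
// IteratorIterator accepts any Traversable, including generators.
class MapIterator extends IteratorIterator
{
    private $callback;

    public function __construct(Traversable $inner, callable $callback)
    {
        parent::__construct($inner);
        $this->callback = $callback;
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return ($this->callback)(parent::current());
    }
}

function numbers(): Generator
{
    for ($i = 1; $i <= 6; $i++) {
        yield $i;
    }
}

// Keep even numbers with the built-in CallbackFilterIterator,
// then square them lazily with MapIterator.
$evens = new CallbackFilterIterator(numbers(), function ($value) {
    return $value % 2 === 0;
});
$squares = new MapIterator($evens, function ($value) {
    return $value * $value;
});

foreach ($squares as $square) {
    echo $square, "\n"; // 4, 16, 36
}
```

The combined iterator can only be traversed once, because the generator at its core cannot be rewound after it has finished.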
HappyArchLabsUser •
> Unfortunately, PHP also does not allow you to use the array traversal functions on generators, including array_filter, array_map
Why should you use them if you have "foreach"?
> foreach($model->getArticles() as $article) {
Christopher Thomas •
Easy: because I can do this, and it's much more compact and conceptually simple:
$result = array_map("trim", $input);
philjohn2 •
Even better: put it in the generator, if you're going to do it in the first place anyway...
Mark Baker •
You can simulate array_filter() and array_map() (and even array_reduce()) pretty easily for Generators:
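function generator_map(Traversable $items, callable $callback) {
    // Keep the original keys, as array_map() does for a single array
    foreach ($items as $key => $value) {
        yield $key => $callback($value);
    }
}
function generator_reduce(Traversable $items, callable $callback, $initial = 0.0) {
    // Callback receives ($carry, $item), in the same order as array_reduce()
    $result = $initial;
    foreach ($items as $value) {
        $result = $callback($result, $value);
    }
    // A reduction produces a single value, so it is returned directly
    return $result;
}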
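The comment mentions array_filter() but only shows map and reduce. A matching generator_filter() could look like this. The name is hypothetical. Like array_filter(), it keeps the original keys and drops falsy values when no callback is given. It assumes PHP 7.1 or later for the nullable ?callable type.

```php
<?php
// Hypothetical generator_filter(): lazily keeps the items for which
// $callback returns true (or, without a callback, the truthy items),
// preserving their original keys like array_filter() does.
function generator_filter(iterable $items, ?callable $callback = null): Generator
{
    foreach ($items as $key => $value) {
        $keep = $callback === null ? (bool) $value : $callback($value);
        if ($keep) {
            yield $key => $value;
        }
    }
}

$words = (function () {
    yield 'apple';
    yield '';
    yield 'banana';
})();

foreach (generator_filter($words) as $key => $word) {
    echo "$key: $word\n"; // "0: apple", then "2: banana"
}
```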
Johannes •
Note that PDOStatement is already an Iterator (more precisely, a Traversable).
You can therefore simplify getArticles to this:
function getArticles() {
    // FETCH_OBJ so each row is an object with a ->title property
    return $this->db->query('SELECT * FROM articles', PDO::FETCH_OBJ);
}
No need for generators there.
By using iterators, you can also use a custom iterator to do the mapping:
class HTMLListIterator extends IteratorIterator {
    public function __construct(Traversable $inner) {
        // PDOStatement is a Traversable, which doesn't have a "current"
        // method, therefore we wrap it, so we have a current() in the
        // function below
        parent::__construct(new IteratorIterator($inner));
    }
    public function current(): string {
        // Wrap the row's escaped title in an <li> element
        return "<li>" . htmlspecialchars($this->getInnerIterator()->current()->title) . "</li>";
    }
}
foreach (new HTMLListIterator($model->getArticles()) as $article) {
    echo $article;
}
The interesting effect here (maybe not so much in that specific example, but abstract it a bit...) is that we separate different aspects. If we want to replace the database with an array or some other source, we don't have to change any other code. If we want to use the same formatting somewhere else (e.g. to write a rendered cache file), we can reuse it, etc.
Iterators are really powerful, but you need some boilerplate... and you have to be careful to understand what's going on later on.
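The same source-agnostic idea can also be sketched with a plain function instead of an IteratorIterator subclass. renderList() is a hypothetical helper typed against iterable (PHP 7.1+). It accepts an array, an ArrayIterator or a generator alike, as long as each item has a title property.

```php
<?php
// Hypothetical renderer: depends only on "something iterable that
// yields objects with a title property", not on where the data comes from.
function renderList(iterable $articles): string
{
    $html = '';
    foreach ($articles as $article) {
        $html .= '<li>' . htmlspecialchars($article->title) . '</li>';
    }
    return $html;
}

$fromArray = [(object) ['title' => 'Fish & Chips']];

$fromGenerator = (function () {
    yield (object) ['title' => 'Fish & Chips'];
})();

echo renderList($fromArray), "\n";      // <li>Fish &amp; Chips</li>
echo renderList($fromGenerator), "\n";  // same output
```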
Matrix AI •
If you use `rowCount()` on a PDOStatement, does this end up counting through all the rows to get you the number, thus defeating the purpose of using the same PDOStatement as an iterator? Or is there another way to find out whether there are any results without doing any iteration?
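Here is a minimal sketch for the generator-based getArticles(). The PHP manual's PDOStatement::rowCount() page warns that the value for a SELECT statement is driver-dependent and should not be relied on in portable code. With a generator, one option is to call valid(). This runs the generator up to its first yield, fetching at most one row, and reports whether a value is available. fakeRows() is a hypothetical stand-in for getArticles(). Because the generator has not advanced past its first yield, a following foreach can still rewind it and will see that first row.

```php
<?php
// Hypothetical stand-in for getArticles(): yields the given rows.
function fakeRows(array $rows): Generator
{
    foreach ($rows as $row) {
        yield $row;
    }
}

$articles = fakeRows(['first', 'second']);

// valid() starts the generator and stops at the first yield.
if (!$articles->valid()) {
    echo "No articles found\n";
} else {
    // Still at the first yield, so foreach can rewind and start from 'first'.
    foreach ($articles as $article) {
        echo $article, "\n";
    }
}
```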