PHP-RPC
Update: I found out the difference between pointer and normal references, so I updated the 'data format section. Update2: Got the definition of all data types
Over the past time I've seen several proposals and implementations of people trying to leverage PHP's serialize format for RPC (remote procedure calls).
PHP's format is very compact compared to XML-RPC, not to mention SOAP.. There's no complex XML Parsing involved and its very fast to parse. Consuming a webservices leveraging this format can often be done using 2 or 3 lines of code without use of any external library.
Additionally it allows you to send over typed objects.. You could for example, say, send a object of the 'User' class, and on the other end of the line it would show up with the exact same classname.
Disadvantages:
- Most of the advantages only apply when its used with PHP. There are better ways to communicate when there's other languages involved.
- The classmapping will only be effective when both the client and the server have the same class definitions.
- The serialize structure is not 100% compatible between versions.
- There is no formal standard for both the structure, or the RPC protocol.
I hope, by typing the following document I can fix the last 2 problems in this list. The classmapping issue might also be fixed down the road by adding some kind of negotiation scheme. Another TODO is adding introspection and multiple calls in one request, like most XML-RPC implementations today support.
If there's enough interest in a standard like this, I will change this document into a more 'official' one and detach it from this blog.. If there's not, well, it means I will have a nice set of business requirements for use within our business :). Please note that this an early draft, so subject to change.
The proposal (0.1)
Goals
- Client should be very easy to implement. Server is allowed to be a bit more complex.
- No duplication of the HTTP protocol. For example, HTTP already provides encryption, redirecting and authentication.
- PHP 4/5/6 compatiblity.
- Client and server implementations should be built from the idea 'be strict in what you produce, be liberal in what you accept'
The request
Requests are made using either GET or POST. Both should be accepted. GET is more appropriate for fetching information, whereas POST is used for posting new data. POST has the advantage that it doesn't have any limits in the size of the request and an encoding can be supplied. GET has the advantage that information can be fetched using a one-liner.
When there is no encoding specified, UTF-8 is assumed. Data supplied using POST should be encoded as application/x-www-form-urlencoded (this is how a browser submits data by default).
The method thats called should always be supplied as the 'method' variable. The method can contain periods (.) to seperate namespaces like XML-RPC. Arguments can be specified in two ways, and the API documentation should specify what the appropriate way is. The first way is using named arguments, a GET example would be:
http://www.example.org/services/phprpc?method=getUsers&maxItems=20
The method here is getUsers, the named argument is maxItems and its value is 20.
The second way is using a list of arguments, which might be more appropriate in some cases where you want to directly map services and methods from a class on the server to the api. This is also how XML-RPC works.
http://www.example.org/services/phprpc?method=getUsers&arguments[0]=20&arguments[1]=1
The first argument is 20, the second is 1.
Smart clients should autodetect if the user is trying to use named arguments or a sequence by checking out the type of the keys in the array.
Smart servers should use reflection to automatically map named arguments to the actual arguments in a list.
Clients SHOULD supply the version of PHP they are running. This can be either a complete version number, or just the major version (e.g.: 4, 5, 6). Clients should supply this as the phpVersion parameter. If the versionnumber is not supplied, the current stable PHP version is assumed, which is at the time of writing 5.
Clients SHOULD also supply the version of the PHP-RPC protocol as the 'version' parameter. Currently this is 0.1.
Clients MAY supply a returnClasses parameter. The value for returnClasses is either 0 or 1 and this can tell the server if the client is aware of typed objects that might be sent from the server.
The server
The server MUST allow requests both GET and POST requests. The server MUST treat any incoming text without encoding as UTF-8.
The server SHOULD allow both named arguments and indexed arguments for methods where this is possible.
If the client sent phpVersion the server MUST convert the returned serialized string so it can be read by the server. If the phpVersion is 4 or 5 the server MUST convert all unicode-strings (type U) to binary strings (type s). If the phpVersion is 4 the server MUST convert all private and protected properties to public properties.
Servers SHOULD also convert all typed objects to either STDClass'es or arrays when the client supplied returnClasses is set to 0, if this is appropriate.
When the method-call was successful the server should send HTTP code 200. When an error occurred the server should send an appropriate HTTP error code. (for example 400 for missing arguments, 500 for unexpected exceptions, 401 if the user should authenticate itself first and 403 is the method was not allowed to be called).
The return data is always in PHP's serialize data format. The Content-Type header should always be 'application/x-php-serialized'
When an error occurred the server MUST send back an array, with at least the 'message' property, which should contain a description of the error that occurred. The server MAY supply more information in this array, such as line number, filename, class of the exception, stacktrace, etc..
The serialized data format
All data is serialized using PHP's serialize format. This is an unofficial specification. Although the format is human-readable, it is and should be treated as a binary format.
All items start with an 1 byte type identifier. These are the different types out there:
a | array |
b | boolean |
C | object which implements Serializable |
d | double |
i | integer |
N | null |
o | seems to be a depreciated way to encode objects |
O | object + class |
r | reference |
R | Pointer reference |
s | string |
S | escaped string. PHP6 uses this, but recent versions of PHP5 can also decode it. |
U | Unicode string (PHP6) |
A boolean, double and integer all have the format:
type:value;
Where type is either b, i or d and value is a literal number (e.g. 12, 85.12, or 1 for true, 0 for false).
A null is specified as:
N;
A string is specified with the length of the string, and the actual string between double quotes.
s:10:"helloworld";
A Unicode string works the exact same way, however.. This type is only supported in PHP6. PHP6 differentiates between binary strings and unicode strings. Strings coming from older versions of PHP will therefore always be treated as binary strings in PHP6. Unicode strings will be supplied as UTF-16 and the length specifies the number of bytes, not characters.
Arrays wrap their elements in curly braces { }. The contents of the array are always simply a list of one or the other types (or more arrays.)
a:lengthofarray:{key1 + value1 + key2 + value2}
Note: the + signs are not literals here. Also note that arrays and objects are the only types that do not end with a semi-colon (;).
Example:
a:2:{i:0;s:3:"moo";i:1;s:4:"unox";}
Objects work similar to arrays, but they include the name of the class.
O:classnamelength:"ClassName":propertycount:{key1 + value1 + key2 + value2}
Example:
O:6:"MyClass":2:{s:6:"*prop1";s:6:"value1";s:5:"prop2";s:6:"value2";}
PHP5 introduces private and protected properties. If the name of a property is prepended with a *, it means the property is protected. When a property is private, it includes the name of the defining class and contains 0x00 before and 0x00 after the name of the class.
Written out, that would be:
public s:9:"property1";s:6:"value1";
protected s:10:"*property2";s:6:"value2";
private s:20:"0x00 + ClassName + 0x00 + property3";s:"value3"; // all whitespace and + signs should be ignored in this line
PHP5 also introduced the Serializable interface, which allow a custom encoding of objects. Serializable objects are encoded as:
C:classnamelength:"ClassName":datalength:{data}
So if the serialize method returned "foo", the result could look like this:
C:9:"TestClass":3:{foo}
Lastly, references. When the structure you're serializing contains a reference to a variable that was used earlier, it will be referenced. The main reason for this is, if you would have a structure with a circular reference, the serialization would keep on traversing your structure.. until, well, it would break.. Also, if two variables reference the same data.. that link is actually maintained..
A reference looks like this:
R:19;
There is a second reference type in PHP5. Objects in PHP sort of work like other references, but not completely. To illustrate I will just show an example.
<?php
class MyClass {
var $myProp = 1;
}
$obj1 = new MyClass(); // new object
$obj2 &= $obj1; // pointer reference
$obj3 = $obj1; // value reference
$obj1->myProp = 2;
echo $obj2->myProp, "\n"; // will display 2
echo $obj3->myProp, "\n"; // will display 2
$obj1 = new MyClass();
$obj1->myProp = 3;
echo $obj2->myProp, "\n"; // will display 3
echo $obj3->myProp, "\n"; // will display 2
?>
PHP4 made a copy of every object when it was assigned to a new variable.. to understand the difference, the output of this script in PHP4 would be:
2
1
3
1
Value references in PHP are serialized as:
r:19;
If you want to know which variable this is referencing to, you should be looking for the 19th variable you decoded so far in your structure (you start counting at 1), but excluding other references and property names.. (so array indexes don't count, array values do..).
PHP4 seems to be treating r and R as the same thing (pointer references), so there's no need for conversion for PHP4 clients. I tested this with PHP 4.4.4.
Comments
Lars Strojny •
As argument[n] seems to be pretty unconvinient for me and not really that API-alike, I would propose to utilize reflection fetch the names of a certain parameter for a certain method. Inspect a doc block like:/**
* @param $foo string
*/
Maps foo=mystring to the foo attribute. Though you would have both. Easy inspection and convinient API.
Evert •
You are probably right, it is a bit ugly..But a big advantage is, you can actually make a client API in the form of this:
$service = new PHPRPC_Client('http://gateway','myService');
$result = $service->someRemoteMethod('param1','param2');
Now, how would you send that over? From the simple form:
$data = unserialize(file_get_contents('url?param=value'));
I agree, it is ugly.. which is why I felt there should be support for both approaches and both essentially optional to implement..
The first is more REST-full, the other more RPC-like..