7 Ways to Optimize PHP Applications that Use Long Running Operations
Today I’ll describe ways to optimize long-running PHP operations, based on a project we recently completed in the Azoft web development department. Our experience should be helpful to web developers who want smoothly working PHP applications even when the requirements keep changing during development.
This article doesn’t cover every possible solution. It aims to give an overview of the options available when you face long-running operations, and to save you time.
Project overview
The following technologies were used in this project:
- MVC Framework CakePHP 2.1 (PHP 5.3)
- MySQL
- jQuery
- jQuery UI
- jQuery plugins
- Selenium server
- CentOS
The Goal
The goal was to develop a prototype for a future system. This was a pilot project, so its specifications and priorities often changed during development. For this reason, it was difficult to predict in the early stages that the system would contain lengthy PHP operations.
The Problem
As it turned out, such operations became a major challenge at later stages of development:
- Long operations blocked other operations within the same session.
- Navigating through the system’s links was impossible until the long operation was complete.
Examining the problem at a lower level, we found that all other requests in the session had to wait in a queue until the operation completed and the session data was flushed to the session file, which removed the block.
By default, PHP stores sessions in files. When a session is opened, PHP opens the session file (with a function such as fopen()) and holds an exclusive lock on it, which keeps other processes from reading or writing the file until the lock is released.
While requests and navigation were blocked, the user had no idea what was going on, when the operation would finish, or whether an error had occurred.
Ways to solve this problem
1. Divide the operation into steps
The first solution that might come to mind is to divide the long operation into shorter steps. After each step completes and its result is sent to the user’s browser, the browser automatically initiates the next step by sending the corresponding AJAX request to the server.
In the context of this project, the approach wasn’t successful because the long operations could hardly be split into steps. In addition, we used third-party services running on the server, and working with them required creating and preparing the corresponding objects while a request was being processed. Once the request for the first step had been processed, those objects were destroyed.
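The step-by-step idea can be sketched as follows, assuming the operation can be expressed as an ordered list of callbacks whose state survives between requests. The function name runOperationStep and the response keys done, next and state are hypothetical, not taken from the project.

```php
<?php
/**
 * Runs a single step of a multi-step operation and reports which step
 * the client should request next. All names here are illustrative.
 */
function runOperationStep(array $steps, $index, array $state)
{
    if (!isset($steps[$index])) {
        return array('done' => true, 'next' => null, 'state' => $state);
    }

    // Each step receives the state produced by the previous one.
    $state = call_user_func($steps[$index], $state);
    $next  = $index + 1;
    $done  = !isset($steps[$next]);

    return array(
        'done'  => $done,
        'next'  => $done ? null : $next,
        'state' => $state,
    );
}

// Example: two trivial steps that accumulate a counter.
$steps = array(
    function (array $state) { $state['count'] = 1; return $state; },
    function (array $state) { $state['count'] += 1; return $state; },
);
$first  = runOperationStep($steps, 0, array());
$second = runOperationStep($steps, $first['next'], $first['state']);
```

The sketch only works when the state passed between steps can be stored or sent back by the client, which is exactly what the server-side objects in our project did not allow.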
2. AJAX Polling
Another approach to lengthy PHP operations is to start the operation on the server and poll the server for status updates by sending a series of AJAX requests at regular intervals. On the client side, we can analyze the server’s response (for example, JSON containing “message”, “percentage”, “error” and “redirect”) and draw a progress bar that shows the status of the current operation.
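A minimal sketch of how the server might build such a status response. The four keys are the ones listed above; the function name buildStatusResponse and the clamping of the percentage are assumptions made for illustration.

```php
<?php
/**
 * Builds the JSON status document returned to a polling client.
 * The function name is illustrative, not part of the project.
 */
function buildStatusResponse($message, $percentage, $error = null, $redirect = null)
{
    // Clamp the percentage so a progress bar never receives an out-of-range value.
    $percentage = max(0, min(100, (int) $percentage));

    return json_encode(array(
        'message'    => (string) $message,
        'percentage' => $percentage,
        'error'      => $error,
        'redirect'   => $redirect,
    ));
}

$json = buildStatusResponse('Importing records', 42);
```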
For our project, we tried using this Polling approach by applying two different methods for storing the results of the operation:
- Storing results in a file <session_id> + <operation>.txt
- Storing results in a database in a corresponding table lengthy_operations
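The file-based variant could look roughly like this. It is a sketch under assumptions: the underscore separator in the file name, the JSON file content, and the function names statusFilePath, writeStatus and readStatus are all hypothetical.

```php
<?php
/**
 * Returns the status file path for a session/operation pair.
 * The directory and the "_" separator are assumptions.
 */
function statusFilePath($dir, $sessionId, $operation)
{
    // Keep only characters that are safe in a file name.
    $safe = preg_replace('/[^A-Za-z0-9_-]/', '', $sessionId . '_' . $operation);
    return rtrim($dir, '/') . '/' . $safe . '.txt';
}

function writeStatus($path, array $status)
{
    // LOCK_EX takes an advisory lock so concurrent writers do not interleave.
    return file_put_contents($path, json_encode($status), LOCK_EX) !== false;
}

function readStatus($path)
{
    if (!is_file($path)) {
        return null;
    }
    $decoded = json_decode(file_get_contents($path), true);
    return is_array($decoded) ? $decoded : null;
}

$path = statusFilePath(sys_get_temp_dir(), 'abc123', 'import');
writeStatus($path, array('message' => 'Half way', 'percentage' => 50));
$status = readStatus($path);
unlink($path);
```

The long operation calls writeStatus() as it progresses, and the polling action only needs readStatus(), so it never has to touch the session data itself.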
The user initiated the operation with an AJAX request. Then the client script periodically polled the server and received status updates about the current operation:
/operations/get_status/<operation_id>
However, here we encountered another problem: when the lengthy operation was started it blocked the session file, thus eliminating the possibility of processing concurrent requests within the same session. In other words, after the lengthy operation began, the requests that were supposed to be polling the server had to wait in the queue until the main request was finished with the session file.
To solve this problem, we needed to unlock the session file. For this we used the PHP function session_write_close(), which writes the session data and ends the current session, releasing the lock. In other words, it is possible to launch the operation, read the session data, make changes to it, and then close the session for writing so that other requests can proceed.
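A minimal sketch of this technique: copy what the operation needs out of the session, then release the lock before the long work starts. The function name snapshotSessionAndUnlock and the key user_id are hypothetical.

```php
<?php
/**
 * Copies the requested values out of the session and releases the session
 * lock so that other requests from the same session can proceed.
 */
function snapshotSessionAndUnlock(array $keys)
{
    if (session_id() === '') {
        session_start();          // opens the session and takes the lock
    }

    $snapshot = array();
    foreach ($keys as $key) {
        $snapshot[$key] = isset($_SESSION[$key]) ? $_SESSION[$key] : null;
    }

    session_write_close();        // saves the data and releases the lock
    return $snapshot;
}

// Usage inside a long-running action (not executed here):
// $context = snapshotSessionAndUnlock(array('user_id'));
// runLongOperation($context);   // hypothetical; must not write to $_SESSION
```

After session_write_close(), changes made to $_SESSION are no longer saved unless the session is reopened, which is why this only fits operations that do not need to write to the session along the way.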
In practice, however, we couldn’t do this in our project. Given the existing architecture, the lengthy operation had to write to the session in too many places. Besides, such a solution wouldn’t be ‘clean’, since the user might want to navigate the site and possibly launch another lengthy operation at the same time. We therefore needed an alternative.
The other way was to change the session storage to one that allows working with the session without locking it on open. There are several options for the new storage:
- MySQL Database
- MongoDB
- Memcached
To change the session storage, PHP provides a special function for registering user-level session storage functions: session_set_save_handler(). These user functions store and retrieve the data associated with the session. session_set_save_handler() is useful in many situations, and ready-made classes exist for moving sessions into a database or memcached.
Note: To work with session_set_save_handler(), we set the session.save_handler option to user in the php.ini configuration file.
While working on our project we experimented with MySQL and MongoDB as the session storage.
Note: Suppose you have written a parameter to the session and need it to be available to other requests immediately. In that case, “flush” the session to MySQL or MongoDB so that the session write function runs. To do this, close the session for writing and then reopen it.
public function session_flush() {
    session_write_close();
    session_start();
}
After we implemented these components, the session was no longer locked, and several requests associated with the same session could be processed at the same time. The lock that came with opening the session file via fopen() was gone. On top of that, session performance improved significantly.
Note: To use MongoDB from PHP, you need to install MongoDB as a service, as well as a PHP driver for it.
Example: Installation on CentOS
1. Add a repository
Create a file
/etc/yum.repos.d/10gen.repo
File content:
[10gen]
name=10genRepository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1
2. Install the packages required:
Call
yum update
yum install mongo-10gen mongo-10gen-server
Start service
service mongod start
Install driver:
(run the first two lines in console, add the third to php.ini)
yum -y install php-devel
sudo pecl install mongo
extension=mongo.so
After you complete these steps, you can start using MongoDB.
As an example, here is a listing of a CakePHP component that moves session storage to MongoDB.
        // ... (the start of the listing, up to this point in the constructor, is missing in the source)
        $this->mongo = new Mongo($connection_string);

        /* indexes */
        $this->mongo->{self::MONGO_DATABASE}->{self::MONGO_COLLECTION}->ensureIndex("id", array('id' => 1));
        $this->mongo->{self::MONGO_DATABASE}->{self::MONGO_COLLECTION}->ensureIndex("id", array('id' => 1, "expires" => 1));

        // Register this object as the session handler
        if ($this->forceSaveHandler) {
            session_set_save_handler(
                array($this, "open"),
                array($this, "close"),
                array($this, "read"),
                array($this, "write"),
                array($this, "destroy"),
                array($this, "gc")
            );
        }

        $this->_timeout = Configure::read('Session.timeout') * 60;
    }

    public function __destruct() {
        try {
            $this->mongo->close();
            session_write_close();
        } catch (Exception $e) {
        }
    }

    public function open() {
        return true;
    }

    public function close() {
        $probability = mt_rand(1, 150);
        if ($probability <= 3) {
            $this->gc();
        }
        return true;
    }

    public function read($id) {
        $cursor = $this->mongo->{self::MONGO_DATABASE}->{self::MONGO_COLLECTION}->find(array("id" => $id));
        if ($cursor->count() == 1) {
            $cursor->next();
        } else {
            return false;
        }
        $result = $cursor->current();
        if (!empty($result) && isset($result['data'])) {
            return $result['data'];
        }
    }

    public function write($id, $data) {
        if (!$id) {
            return false;
        }
        $expires = time() + $this->_timeout;
        $session = array("id" => $id, "data" => $data, "expires" => $expires);
        $filter = array("id" => $id);
        $options = array(
            'safe' => true,
            'fsync' => true,
        );
        $collection = $this->mongo->{self::MONGO_DATABASE}->{self::MONGO_COLLECTION};
        if ($collection->findOne($filter) == null) {
            return $collection->insert(am(array("_id" => new MongoId($id)), $session), $options);
        } else {
            return $collection->update($filter, array('$set' => $session), am($options, array('upsert' => false)));
        }
    }

    public function destroy($id) {
        return $this->mongo->{self::MONGO_DATABASE}->{self::MONGO_COLLECTION}->remove(array("id" => $id), true);
    }

    public function gc($time = null) {
        if (empty($time)) {
            $time = time();
        }
        return $this->mongo->{self::MONGO_DATABASE}->{self::MONGO_COLLECTION}->remove(array("expires" => array('$lt' => $time)), true);
    }
}
The component that moves sessions to MySQL is similar; it differs only in the implementation of the functions gc(), destroy(), open(), write(), read() and close().
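For comparison, here is a rough sketch of what a SQL-backed variant might look like when written against PDO. It is not the project’s component: the class name PdoSessionStore, the sessions table and its columns (id, data, expires) are assumptions made for illustration.

```php
<?php
/**
 * Session storage backed by a SQL table through PDO.
 * Assumed table: sessions(id VARCHAR PRIMARY KEY, data TEXT, expires INT).
 */
class PdoSessionStore
{
    private $pdo;
    private $timeout;

    public function __construct(PDO $pdo, $timeoutSeconds)
    {
        $this->pdo = $pdo;
        $this->timeout = (int) $timeoutSeconds;
    }

    public function open($savePath, $name) { return true; }

    public function close() { return true; }

    public function read($id)
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ? AND expires > ?');
        $stmt->execute(array($id, time()));
        $data = $stmt->fetchColumn();
        // The read callback must return a string; empty means "no session data".
        return $data === false ? '' : (string) $data;
    }

    public function write($id, $data)
    {
        // REPLACE INTO (MySQL syntax) inserts the row or overwrites the existing one.
        $stmt = $this->pdo->prepare('REPLACE INTO sessions (id, data, expires) VALUES (?, ?, ?)');
        return $stmt->execute(array($id, $data, time() + $this->timeout));
    }

    public function destroy($id)
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute(array($id));
    }

    public function gc($maxLifetime)
    {
        // Rows carry their own expiry time, so $maxLifetime is not needed here.
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE expires < ?');
        return $stmt->execute(array(time()));
    }

    public function register()
    {
        return session_set_save_handler(
            array($this, 'open'), array($this, 'close'), array($this, 'read'),
            array($this, 'write'), array($this, 'destroy'), array($this, 'gc')
        );
    }
}
```

Because this store takes no lock, concurrent requests in the same session no longer wait for each other, but the last write wins if two of them change the session at the same time. Newer PHP versions also offer SessionHandlerInterface as an object-based way to register such a handler.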
Note: Don’t forget to change the project configuration file core.php:
Configure::write('Session', array(
    'defaults' => 'database',
    'handler' => array('engine' => 'MongoSession')
));
You can also limit the number of concurrent requests within a single session. In our project this was handled by a separate component responsible for lengthy operations as a whole, LengthyOperationsComponent.php. More specifically, it kept track of the number of operations running in a single session and, depending on the settings, allowed only a certain number of operations to be launched.
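The counting logic behind such a limit can be sketched as below. This is an illustrative stand-in, not the code of LengthyOperationsComponent: the class name OperationLimiter and its methods are hypothetical, and the counters live in memory only, whereas a real component would keep them in storage shared between requests.

```php
<?php
/**
 * Tracks running operations per session and enforces an upper bound.
 */
class OperationLimiter
{
    private $limit;
    private $running = array();   // session id => number of running operations

    public function __construct($limit)
    {
        $this->limit = (int) $limit;
    }

    /** Returns true and counts the operation if the session is below its limit. */
    public function tryStart($sessionId)
    {
        $count = isset($this->running[$sessionId]) ? $this->running[$sessionId] : 0;
        if ($count >= $this->limit) {
            return false;
        }
        $this->running[$sessionId] = $count + 1;
        return true;
    }

    /** Releases one slot for the session when an operation ends. */
    public function finish($sessionId)
    {
        if (!empty($this->running[$sessionId])) {
            $this->running[$sessionId]--;
        }
    }
}

$limiter = new OperationLimiter(2);
$results = array(
    $limiter->tryStart('s1'),
    $limiter->tryStart('s1'),
    $limiter->tryStart('s1'),   // rejected: two operations are already running
);
```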
Summing up
- The session storage was changed, so sessions are no longer blocked.
- The client side starts the operation with an AJAX request.
- A series of polling requests provides status updates for the progress bar.
- A separate component manages long operations.
3. Long Polling
This approach is similar to AJAX Polling, with one essential difference. In AJAX Polling the client repeatedly asks the server whether anything has changed, whereas in Long Polling the server holds the request open and responds as soon as a change appears. Long Polling therefore needs a steady network connection between the server and the client. Its advantage is reduced traffic between the client and the server.
How it works. You can think of it this way: the client-side script calls the server and says, “If data appears, I’m ready to take it from you immediately, and after that I’ll connect to you again.” Some server implementations buffer output: the server doesn’t send data immediately but waits, as if to say, “If something else appears now, I’ll send all the data at once.” Such buffering is harmful because it causes delays, and we want maximum speed!
After the browser receives data, it should open a new connection. In theory such a connection can last for several hours, but in practice it is usually kept much shorter, 5 minutes at most, after which a new connection is created. The reason is that servers don’t like such long-lasting connections, and the HTTP protocol is not well suited to this kind of use.
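On the server, the core of a long-polling endpoint is a loop that holds the request until something changes or a timeout expires. The sketch below assumes the status can be read through any callable; the function name waitForChange and its parameters are hypothetical.

```php
<?php
/**
 * Blocks until the status differs from what the client last saw,
 * or until the timeout expires. $readStatus is any callable that
 * returns the current status.
 */
function waitForChange($readStatus, $lastSeen, $timeoutSeconds, $sleepMicroseconds = 200000)
{
    $deadline = microtime(true) + $timeoutSeconds;

    do {
        $current = call_user_func($readStatus);
        if ($current !== $lastSeen) {
            return array('changed' => true, 'status' => $current);
        }
        usleep($sleepMicroseconds);   // avoid a busy loop between checks
    } while (microtime(true) < $deadline);

    // Nothing changed in time; the client simply reconnects.
    return array('changed' => false, 'status' => $lastSeen);
}
```

With file-based sessions, the session would have to be closed with session_write_close() before entering the loop, otherwise the waiting request holds the session lock for the whole timeout.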
As for our project at hand, we certainly considered this approach but didn’t end up using it.
4. Forever IFrame
We experimented with this approach while working on our project.
How it works. First, we set up the HTTP server and PHP so that they can send data in portions while the operation is executing. Then we create a hidden iFrame tag on the page. The iFrame incrementally renders information about the operation’s progress or executes JavaScript.
The client uses this iFrame to start the operation. On the server side, the operation produces data in portions and immediately sends each portion to the client, where the iFrame executes it.
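A minimal sketch of the server side of this technique: each progress update is written into the iFrame as a small script that calls a function in the parent page. The callback name onOperationProgress and both PHP function names are hypothetical.

```php
<?php
/**
 * Formats one progress update as a <script> chunk for a hidden iframe.
 * json_encode() escapes "/" by default, so a "</script>" inside the
 * message cannot terminate the tag early.
 */
function iframeChunk($percentage, $message)
{
    $payload = json_encode(array(
        'percentage' => (int) $percentage,
        'message'    => (string) $message,
    ));
    return '<script>parent.onOperationProgress(' . $payload . ');</script>' . "\n";
}

function sendIframeChunk($percentage, $message)
{
    echo iframeChunk($percentage, $message);
    flush();   // hand the chunk to the web server instead of waiting for the script to end
}
```

The parent page would define onOperationProgress() to update its progress bar. Whether chunks actually reach the browser one by one still depends on output buffering being disabled on the server.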
5. Streaming
This was another approach we experimented with while working on our project.
The idea was to start the operation with an AJAX request and have the server respond by sending data in portions, i.e. stream the data. Each time a portion arrived, an event could fire on the client side. Using that event, we could refresh the corresponding block of the page and display the progress of the operation.
To use this approach, we first needed to configure the Apache server and PHP in a particular way. We searched the Web for the relevant settings and tested them before use. The final settings were as follows:
public function prepare() {
    // Turn off output buffering
    ini_set('output_buffering', 'off');
    // Turn off PHP output compression
    ini_set('zlib.output_compression', false);
    // Implicitly flush the buffer(s)
    ini_set('implicit_flush', true);
    ob_implicit_flush(true);
    // Clear, and turn off output buffering
    while (ob_get_level() > 0) {
        // Get the current level
        $level = ob_get_level();
        // End the buffering
        ob_end_clean();
        // If the current level has not changed, abort
        if (ob_get_level() == $level) break;
    }
    // Disable apache output buffering/compression
    if (function_exists('apache_setenv')) {
        apache_setenv('no-gzip', '1');
        apache_setenv('dont-vary', '1');
    }
}
Then we wrote a component that would make this approach possible.
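What such a component emits might look like the sketch below, which sends one line of JSON per completed step. It is an illustration under assumptions, not our component: the function names encodeStreamChunk and streamOperation and the newline-delimited JSON format are hypothetical.

```php
<?php
/**
 * Encodes one progress update as a single line of JSON, so that the
 * client can split the growing response body on "\n".
 */
function encodeStreamChunk(array $update)
{
    return json_encode($update) . "\n";
}

/**
 * Runs the steps of an operation and streams a chunk after each one.
 * $emit defaults to echo + flush; another callable can be injected.
 */
function streamOperation(array $steps, $emit = null)
{
    if ($emit === null) {
        $emit = function ($chunk) {
            echo $chunk;
            flush();
        };
    }

    $total = count($steps);
    foreach (array_values($steps) as $i => $step) {
        call_user_func($step);
        $percentage = (int) floor(($i + 1) * 100 / $total);
        call_user_func($emit, encodeStreamChunk(array('percentage' => $percentage)));
    }
}
```

On the client, reading these chunks before the request finishes means watching the raw XMLHttpRequest as its response text grows, rather than relying on a callback that fires once at the end.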
After a more detailed analysis, it turned out that this approach was not suitable for our project. The reason was that we had no client-side event that could signal the arrival of a portion of data after the AJAX request had been sent. The onSuccess event on the client fired only when the data had arrived completely, as a whole.
6. Comet-server
According to Wikipedia, Comet is a web application model in which a long-held HTTP request allows a web server to push data to a browser, without the browser explicitly requesting it.
A common characteristic of such models is that they all rely not on proprietary plug-ins but on technologies supported directly by the browser, such as JavaScript.
In this project we tested the Dklab Realplexor Comet-server.
Below is the definition of Dklab Realplexor, which we translated from the project’s official Russian site.
Dklab Realplexor is a Comet server that can handle 1,000,000+ parallel long-held HTTP connections with users’ browsers. JavaScript code running in the browser subscribes to one or several Realplexor channels and sets up a handler for incoming data. The server can write a message to any of these channels at any time. The message is then immediately passed to all subscribers, whether one or thousands, in real time and with minimal load on the server.
7. WebSockets
We also discussed the possibility of using WebSockets.
According to Wikipedia, WebSocket is a web technology providing full-duplex communication channels over a single TCP connection. It is used to exchange messages between the browser and the web server in real time.
But we rejected this approach because it isn’t compatible with older browsers.
Summing up
As a result, we solved the issue using the Polling approach combined with moving sessions to MongoDB.
However, further discussions, together with the growing number and complexity of long-running tasks, led us to a more standard and reliable solution: a queue of operations executed by CRON.
In short, after checking all permissions for executing the operation, we can serialize the OperationContext and save it in the database, in the cron_tasks table. On the server, CRON runs a shell at regular intervals; the shell takes the next task from the queue, changes its status to IN_PROGRESS, and passes it to the corresponding handler (TaskDispatcherComponent). The handler takes the serialized task context and executes it in a separate process. Note that the handler has access to all system models and components.
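The claim-and-dispatch cycle can be sketched with an in-memory array standing in for the rows of cron_tasks. Only the IN_PROGRESS status comes from our project; the function names, the status and context keys, and the PENDING, DONE and FAILED statuses are assumptions made for illustration.

```php
<?php
/**
 * Claims the first pending task: marks it IN_PROGRESS and returns its index,
 * or null when the queue has no pending tasks.
 */
function claimNextTask(array &$tasks)
{
    foreach ($tasks as $i => $task) {
        if ($task['status'] === 'PENDING') {
            $tasks[$i]['status'] = 'IN_PROGRESS';
            return $i;
        }
    }
    return null;
}

/**
 * Executes a claimed task: unserializes its context, passes it to the
 * handler, and records the outcome in the task's status.
 */
function dispatchTask(array &$tasks, $index, $handler)
{
    $context = unserialize($tasks[$index]['context']);
    try {
        call_user_func($handler, $context);
        $tasks[$index]['status'] = 'DONE';
    } catch (Exception $e) {
        $tasks[$index]['status'] = 'FAILED';
    }
}

// One queued task whose context was serialized when the user requested it.
$tasks = array(
    array('status' => 'PENDING', 'context' => serialize(array('op' => 'export'))),
);
```

With a real database table, the claim step would need to be atomic, for example a single UPDATE that only matches rows still in the pending state, so that two overlapping CRON runs cannot pick up the same task.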
To track the progress of a task, you can use AJAX Polling or Long Polling, or provide an overview of the task queue on a separate screen. This approach has proven to be the most reliable and understandable, even though it requires some changes to the system architecture.