Using Rackspace Cloud Files – 1

In Architecture on April 11, 2010 by petrem66

Rackspace Cloud Files (CF) is an inexpensive way to persist private content that needs to remain available over a long period of time, but it is not practical as a shared file system for Rackspace cloud servers. The storage does not expose an internal IP address, so whenever the application loads or saves objects, Rackspace charges network fees for the traffic. To assess the benefits and shortcomings of using Rackspace CF in my design, I conducted a set of tests.

This post is the first in a series on experimenting with CF.

Test premise

My web application is a virtual J2EE-based server consisting of a number of cloud-based façade servers and a grid of Workers that do the actual processing on files. In between, a simple message queue decouples the two groups of servers.

Assume that an action triggered by the user causes an object O1 to be transformed into O2, and that the user expects to be presented with O2. The question is: what is the best way to employ Rackspace Cloud Files? The criteria for best are cost, complexity, and performance (response time).

Possible architecture

I am considering two scenarios of usage of Rackspace CF by the components of the virtual server:

  • virtual file system, when both the application on the Web server and the Worker processing the request can access it to get and store objects (the content of the file)
  • lazy repository, when only the Worker can access it. The application on the Web server gets the content of the file through the Worker at the end of processing

Using Rackspace CF As Virtual File System

  1. User sends request
  2. Web application publishes request for processing
  3. Worker gets the request
  4. Worker gets the input file O1
  5. After processing, the Worker stores the result as O2 to CF
  6. Worker publishes work done with follow-up ID
  7. Web application gets the follow-up for processing completion
  8. Web application gets O2 from CF
  9. Web application replies to the user
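
The nine steps above can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions, not the actual application: a plain dict plays the role of CF, two `queue.Queue` objects play the message queue, O1 is assumed to be read from CF, and `transform`, `worker_step`, and `handle_user_request` are hypothetical names.

```python
import queue

def transform(o1):
    # Placeholder for the real processing of O1 into O2 (hypothetical)
    return o1.upper()

def worker_step(cf, requests, done):
    job = requests.get()                       # step 3: Worker gets the request
    o1 = cf[job["input"]]                      # step 4: assumes O1 is stored in CF
    cf[job["output"]] = transform(o1)          # step 5: store the result as O2 to CF
    done.put({"follow_up_id": job["id"],       # step 6: publish work done
              "output": job["output"]})

def handle_user_request(cf, requests, done, input_name):
    # step 2: publish the request for processing
    requests.put({"id": 1, "input": input_name, "output": input_name + ".out"})
    # The Worker runs inline here; in the real design it is a separate server.
    worker_step(cf, requests, done)
    follow_up = done.get()                     # step 7: processing completed
    return cf[follow_up["output"]]             # steps 8 and 9: read O2 from CF, reply
```

Note that O2 crosses the link to CF twice (once written by the Worker, once read by the Web application) before it is sent to the user.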

Using Rackspace CF As a Lazy Repository

  1. User sends request
  2. The Web application publishes request for processing
  3. The Worker gets the request
  4. The Worker gets the input file O1
  5. The Worker processes O1 and keeps the result locally as O2
  6. The Web application gets the reply from the queue
  7. The Web application gets O2 from the worker and puts it in the response body
  8. The user receives O2
  9. The Worker stores O2 to CF
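
A comparable sketch of the lazy repository flow, again with in-memory stand-ins and hypothetical names (`Worker`, `fetch_result`, `persist`). It assumes O1 is read from CF and uses a byte reversal as a placeholder transformation. The point it illustrates is that the Web application reads O2 from the Worker, and the write to CF happens last.

```python
import queue

class Worker:
    def __init__(self, cf):
        self.cf = cf        # dict standing in for Cloud Files
        self.local = {}     # results kept locally on the Worker

    def process(self, requests, replies):
        job = requests.get()                    # step 3: Worker gets the request
        o1 = self.cf[job["input"]]              # step 4: assumes O1 is stored in CF
        self.local[job["id"]] = o1[::-1]        # step 5: keep O2 locally (placeholder transform)
        replies.put(job["id"])                  # tell the Web application O2 is ready

    def fetch_result(self, job_id):
        return self.local[job_id]               # step 7: O2 served by the Worker

    def persist(self, job_id, name):
        self.cf[name] = self.local.pop(job_id)  # step 9: store O2 to CF

def handle_user_request(worker, requests, replies, input_name):
    requests.put({"id": 1, "input": input_name})  # step 2: publish the request
    worker.process(requests, replies)             # inline here; a separate server in practice
    job_id = replies.get()                        # step 6: reply from the queue
    o2 = worker.fetch_result(job_id)              # step 7: get O2 from the Worker
    # Step 9 would normally run after the response is sent; it is inlined
    # here only to keep the sketch single-threaded.
    worker.persist(job_id, input_name + ".out")
    return o2                                     # step 8: the user receives O2
```

If O2 is transient, the call to `persist` can simply be skipped and the local copy discarded.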

Both scenarios have strengths and weaknesses. With Rackspace CF as a virtual file system, the application can build a transaction-like process to cover the case where the Worker or the queue goes down before the response is propagated back. It is also a true grid architecture, in that the processors are completely isolated from the façade. The price is lower speed and extra cost: communication between Rackspace servers and CF is regarded as external traffic, and Rackspace charges for every GB of inbound and outbound traffic, so each 1 MB of O2 costs 3 x 1 MB of transfer to reach the user’s browser (see steps 5, 8, and 9). Also, if O2 needs to be transient, as some business cases require, storing it in CF is a waste. I need to test the speed penalty of external versus internal traffic for this scenario.

With Rackspace CF as a lazy repository, the cost of transporting O2 over the network is reduced to 2 x 1 MB (see steps 8 and 9). Also, if O2 is transient, the application can discard it after use, saving the cost of storage on CF. However, this configuration is not a true grid, and it would take a bit more work to ensure reprocessing if a Worker is lost.
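
To make the transfer arithmetic concrete, here is a small sketch that compares the two scenarios. The multipliers 3 and 2 come from the discussion above; `price_per_gb` is a hypothetical parameter (no real Rackspace rate is assumed), and 1 GB is taken as 1024 MB.

```python
# Number of times O2 crosses a billed link in each scenario (from the text above)
TRANSFERS_OF_O2 = {"virtual_file_system": 3, "lazy_repository": 2}

def billed_megabytes(scenario, o2_size_mb):
    """Billed traffic in MB for a single request producing an O2 of the given size."""
    return TRANSFERS_OF_O2[scenario] * o2_size_mb

def transfer_cost(scenario, o2_size_mb, requests, price_per_gb):
    """Total transfer cost for a number of requests at a hypothetical per-GB rate."""
    total_gb = billed_megabytes(scenario, o2_size_mb) * requests / 1024
    return total_gb * price_per_gb
```

Whatever the actual rate is, the lazy repository pays two thirds of the transfer cost of the virtual file system for the same O2, and nothing for storage when O2 is discarded.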

