Tuesday, October 23, 2007

Amazon Web Services (AWS)

I went to an event by Amazon on their Amazon Web Services in Santa Monica today. The focus was S3 (their storage service), EC2 (their compute cloud), their queuing service, and their flexible payments service. S3 is not a transactional object store; it's intended for larger objects and larger updates. EC2 is very similar to having a Linux box in a colocation facility. However, there are some interesting differences: you can't control the proximity of multiple boxes, you don't get a fixed IP address, and, probably most interesting, the local storage on the boxes is ephemeral. Basically, your box (and its storage) can go away at any point. You need to push anything truly persistent out to another machine or storage service, e.g., S3.
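
As a rough illustration of that pattern (not anything Amazon showed, and using the modern boto3 Python SDK rather than what existed at the time), here's a minimal sketch of an EC2 box periodically copying its local state out to S3. The bucket, key, and file names are hypothetical.

    # Sketch: periodically push local state off an ephemeral EC2 box to S3.
    # Uses the boto3 SDK; the bucket, key, and file names are hypothetical.
    import time
    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-durable-bucket"       # assumed to already exist
    LOCAL_STATE = "/var/app/state.db"  # whatever must survive instance loss

    def checkpoint_to_s3():
        # Copy the local file up to S3; if the instance disappears,
        # a replacement can pull this object back down on startup.
        key = "checkpoints/state-%d.db" % int(time.time())
        s3.upload_file(LOCAL_STATE, BUCKET, key)
        return key

    if __name__ == "__main__":
        while True:
            checkpoint_to_s3()
            time.sleep(300)  # checkpoint every five minutes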

It wasn't until a couple of the customers who use it spoke that it became clear what's really required. In the patterns they described, there appears to be a single point of failure around distributing load over machines and how DNS is handled. The persistence issue, however, seems to have some well-defined patterns that work pretty well. You definitely have to architect specifically for this environment to work effectively.

There seems to be a very clear case if you are doing batch computing over objects, have widely varying compute needs, or want cheap storage of blobs. I'm assuming that Amazon will get the DNS/load distribution issue figured out, and then this would seem to be a pretty compelling offering.
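
To make the "ramp up for a burst of compute" idea concrete, here's a minimal sketch, again using boto3, that launches a batch of worker instances and terminates them when the run is done. The AMI ID and instance type are placeholders, not anything from the event.

    # Sketch: ramp EC2 capacity up for a burst of batch work, then tear it down.
    # Uses the boto3 SDK; the AMI ID and instance type are placeholders.
    import boto3

    ec2 = boto3.client("ec2")

    def launch_workers(count):
        # Start `count` worker instances from a prebuilt image (hypothetical AMI ID).
        resp = ec2.run_instances(
            ImageId="ami-12345678",
            InstanceType="m1.small",
            MinCount=count,
            MaxCount=count,
        )
        return [i["InstanceId"] for i in resp["Instances"]]

    def shut_down_workers(instance_ids):
        # Terminate the workers once the batch run is finished.
        ec2.terminate_instances(InstanceIds=instance_ids)

    # Usage: bring up ten workers for a contest or media event, then release them.
    # ids = launch_workers(10)
    # ... run the batch jobs ...
    # shut_down_workers(ids)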

As it stands, there are a few cases where it would have applied on projects I've worked on:

  • LivePlanet - could have used it to bulk up capacity for each contest, store videos and scripts on S3, and possibly do format conversions. A lot of the examples that Amazon used were media related.
What's interesting about storing media is that it's not a CDN like Akamai. Rather, it's cheap, reliable storage that may or may not have good retrieval characteristics.

  • eHarmony and MyShape seem like they could use it to distribute load around matching, especially during media events. There are periodic intensive compute cycles and the need to ramp up quickly based on particular needs. This would seem to be a good case for us to separate off matching and make it a more scalable operation on EC2.
This will definitely be something for us to look at going forward. I'm hoping we can have one or two of the users come into our CTO Forum to talk about their experience.
