Scaling Digg And Other Web Applications
Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call.
In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me however was how Joe thought MemcacheDBWill be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. Well learn why Joe is so excited by MemcacheDB at the end of the post.
In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me however was how Joe thought MemcacheDBWill be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. Well learn why Joe is so excited by MemcacheDB at the end of the post.
Impressive Stats
Scaling Strategies
IO. Bottlenecks aren't in the language when you are handling so many simultaneous requests. Making PHP 300% faster won't matter. Don't optimize PHP by using single quotes instead of double quotes when
the database is pegged.
- This approach pushes chunks of processing to another service and let's that service schedule the processing on a grid of processors.
- It's faster and more responsive than cron and only slightly less responsive than real-time.
- For example, issuing 5 synchronous database requests slows you down. Do them in parallel.
- Digg uses Gearman. An example use is to get a permalink. Three operations are done parallel: get the current logged, get the permalink, and grab the comments. All three are then combined to return a combined single answer to the client. It's also used for site crawling and logging. It's a different way of thinking.
- See Flickr - Do the Essential Work Up-front and Queue the Rest and The Canonical Cloud Architecture for more information.
- denormalize
- avoid joins
- avoid large scans across databases by partitioning
- cache
- add read slaves
- don't use NFS
Miscellaneous
decisions. Trust that people have it handled and they'll take care of it. Cuts down on meetings because you know people will do the job right.
/1.0/service/id/xml. Version both internal and external services.
No comments:
Post a Comment