Mustali Kachwala's Blog: Scaling Digg and Other Web Applications

Scaling Digg and Other Web Applications - High Scalability -

Scaling Digg And Other Web Applications

Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call.

In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me, however, was how Joe thought MemcacheDB will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. We'll learn why Joe is so excited by MemcacheDB at the end of the post.

Impressive Stats

80th-100th largest site in the world

26 million uniques a month

30 million users.

Uniques are only half that traffic. Traffic = unique web visitors + APIs + Digg buttons.

2 billion requests a month

13,000 requests a second, peak at 27,000 requests a second.

3 Sys Admins, 2 DBAs, 1 Network Admin, 15 coders, QA team

Lots of servers.

Scaling Strategies

Scaling is specialization. When off the shelf solutions no longer work at a certain scale you have to create systems that work for your particular needs.

Lesson of web 2.0: people love making crap and sharing it with the world.

Web 2.0 sucks for scalability. Web 1.0 was flat with a lot of static files. Additional load is handled by adding more hardware. Web 2.0 is heavily interactive. Content can be created at a crushing rate.

Languages don't scale. 100% of the time bottlenecks are in IO. Bottlenecks aren't in the language when you are handling so many simultaneous requests. Making PHP 300% faster won't matter. Don't optimize PHP by using single quotes instead of double quotes when the database is pegged.

Don’t share state. Decentralize. Partitioning is required to process a high number of requests in parallel.

Scale out instead of up. Expect failures. Just add boxes to scale and avoid the fail.

Database-driven sites need to be partitioned to scale both horizontally and vertically. Horizontal partitioning means storing a subset of rows on different machines. It is used when there's more data than will fit on one machine. Vertical partitioning means putting some columns in one table and some columns in another table. This allows you to add data to the system without downtime.
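The two kinds of partitioning can be sketched roughly as follows. This is only an illustration, not Digg's actual scheme: the shard names, the column groups, and the functions `shard_for` and `split_row` are all hypothetical.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(user_id, shards=SHARDS):
    """Horizontal partitioning: hash the key so rows spread across machines."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Vertical partitioning: narrow, frequently read columns live in one table,
# wide or rarely read columns in another. Both column lists are assumptions.
USER_CORE_COLUMNS = ("id", "username", "email")
USER_PROFILE_COLUMNS = ("id", "bio", "avatar_path")


def split_row(row):
    """Split one logical user row into the two vertically partitioned tables."""
    core = {column: row[column] for column in USER_CORE_COLUMNS}
    profile = {column: row[column] for column in USER_PROFILE_COLUMNS}
    return core, profile
```

Simple modulo hashing like this moves most keys when the number of shards changes, so a production system would likely need a lookup table or consistent hashing instead.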

Data are separated into separate clusters: User Actions, Users, Comments, Items, etc.

Build a data access layer so partitioning is hidden behind an API.
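A minimal sketch of what such a layer might look like, assuming in-memory dictionaries in place of real database connections. The class `UserStore` and its routing rule are hypothetical, chosen only to show that callers never learn which partition holds a record.

```python
class UserStore:
    """Hypothetical data access layer that hides partitioning from callers."""

    def __init__(self, partition_count=4):
        # Each dict stands in for a connection to one database partition.
        self._partitions = [{} for _ in range(partition_count)]

    def _partition(self, user_id):
        # The routing rule is private, so it can change without touching callers.
        return self._partitions[user_id % len(self._partitions)]

    def save(self, user_id, record):
        """Store a copy of the record on whichever partition owns user_id."""
        self._partition(user_id)[user_id] = dict(record)

    def get(self, user_id):
        """Return the record for user_id, or None if it does not exist."""
        return self._partition(user_id).get(user_id)
```

Application code calls only `save` and `get`, so the number of partitions or the routing rule can be changed behind the API.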

With partitioning comes the CAP Theorem: you can only pick two of the following three: Strong Consistency, High Availability, Partition Tolerance.

Partitioned solutions require denormalization, which has become a big problem at Digg. Denormalization means data is copied into multiple objects and must be kept synchronized.

MySQL replication is used to scale out reads.
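One common way to take advantage of replication is to send writes to the master and spread reads across the replicas. The sketch below assumes that approach; the class `ReplicatedRouter` is hypothetical, and plain strings stand in for database connections.

```python
import itertools


class ReplicatedRouter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replica_cycle = itertools.cycle(replicas)

    def connection_for(self, sql):
        """Choose a connection based on the first keyword of the statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            # Round-robin across read replicas.
            return next(self._replica_cycle)
        # INSERT, UPDATE, DELETE and everything else go to the master.
        return self.master
```

Because replication is asynchronous, a read sent to a replica immediately after a write may not see that write yet.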

Use an asynchronous queuing architecture for near-term processing.
- This approach pushes chunks of processing to another service and lets that service schedule the processing on a grid of processors.
- It's faster and more responsive than cron and only slightly less responsive than real-time.
- For example, issuing 5 synchronous database requests slows you down. Do them in parallel.
- Digg uses Gearman. An example use is to get a permalink. Three operations are done in parallel: get the current logged-in user, get the permalink, and grab the comments. All three are then combined to return a single answer to the client. It's also used for site crawling and logging. It's a different way of thinking.
- See Flickr - Do the Essential Work Up-front and Queue the Rest and The Canonical Cloud Architecture for more information.
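The permalink example above can be sketched with a thread pool standing in for Gearman workers. This is not Gearman's API: the three lookup functions and `build_page` are hypothetical placeholders that only show independent requests being issued together and merged into one response.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for three independent lookups.
def get_current_user(session_id):
    return {"user": "session-%s" % session_id}


def get_permalink(item_id):
    return {"permalink": "/story/%d" % item_id}


def get_comments(item_id):
    return {"comments": ["first", "second"]}


def build_page(session_id, item_id):
    """Issue all three lookups at once and merge the results into one answer."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(get_current_user, session_id),
            pool.submit(get_permalink, item_id),
            pool.submit(get_comments, item_id),
        ]
        page = {}
        for future in futures:
            # result() blocks until that lookup has finished.
            page.update(future.result())
    return page
```

With real IO-bound lookups, the total wait is roughly the slowest of the three calls rather than the sum of all three.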

Bottlenecks are in IO so you have to tune the database. When the database is bigger than RAM, the disk is hit all the time, which kills performance. As the database gets larger, the table can't be scanned anymore. So you have to:
- denormalize
- avoid joins
- avoid large scans across databases by partitioning
- cache
- add read slaves
- don't use NFS

Run numbers before you try and fix a problem to make sure things actually will work.

Files such as icons and photos are handled by MogileFS, a distributed file system. Distributed file systems support high request rates because files are distributed and replicated around a network.

Cache forever and explicitly expire.
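A minimal sketch of this rule, assuming a cache with no time-based expiry whose entries are removed only when the code that changes the underlying data says so. The class `ExplicitCache` and its `loader` callback are hypothetical.

```python
class ExplicitCache:
    """Hypothetical cache with no TTL: entries live until explicitly expired."""

    def __init__(self, loader):
        self._loader = loader  # function that fetches a value from the source
        self._values = {}
        self.loads = 0  # how many times the source had to be consulted

    def get(self, key):
        """Return the cached value, loading it from the source on a miss."""
        if key not in self._values:
            self._values[key] = self._loader(key)
            self.loads += 1
        return self._values[key]

    def expire(self, key):
        """Call this from the code path that changes the underlying data."""
        self._values.pop(key, None)
```

The trade-off is that every write path has to remember to call `expire`, otherwise readers keep seeing the old value indefinitely.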

Cache fairly static content in a file based cache.

Cache changeable items in memcached.

Cache rarely changed items in APC. APC is a local cache. It's not distributed, so no other program has access to the values.

For caching use the Chain of Responsibility pattern. Cache in MySQL, memcached, APC, and PHP globals. First check PHP globals as the fastest cache. If the value is not present, check APC, then memcached, and on up the chain.
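The lookup order described here can be sketched as a chain of tiers, each of which answers from its own store or delegates to the next one. The class `CacheTier` is hypothetical and uses dictionaries in place of the real PHP globals, APC, memcached, and MySQL. Backfilling faster tiers on a hit is an assumption, not something the talk states.

```python
class CacheTier:
    """One link in the chain: answer locally or pass the request along."""

    def __init__(self, name, next_tier=None):
        self.name = name
        self.next_tier = next_tier
        self.store = {}  # stands in for the real cache backend

    def get(self, key):
        if key in self.store:
            return self.store[key]
        if self.next_tier is None:
            return None  # end of the chain, value does not exist
        value = self.next_tier.get(key)
        if value is not None:
            # Assumed behavior: remember the value so the next lookup is faster.
            self.store[key] = value
        return value


# Fastest tier first, slowest last.
database = CacheTier("mysql")
memcached = CacheTier("memcached", database)
apc = CacheTier("apc", memcached)
request_globals = CacheTier("php_globals", apc)
```

Callers only ever talk to `request_globals`; each tier decides for itself whether it can answer.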

Digg's recommendation engine is a custom graph database that is eventually consistent. Eventually consistent means that writes to one partition will eventually make it to all the other partitions. After a write, reads made one after another don't have to return the same value, as they could be handled by different partitions. This is a more relaxed constraint than strict consistency, which means changes must be visible at all partitions simultaneously. Under strict consistency, reads made one after another would always return the same value.
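The difference can be made concrete with a toy model. This is not Digg's graph database: the class `EventuallyConsistentStore` is hypothetical, and a manual `propagate` call stands in for background replication.

```python
class EventuallyConsistentStore:
    """Toy model: a write lands on one partition and reaches the others later."""

    def __init__(self, partition_count=3):
        self.partitions = [{} for _ in range(partition_count)]
        self._pending = []  # (partition_index, key, value) not yet applied

    def write(self, key, value, partition=0):
        """Apply the write locally and queue it for every other partition."""
        self.partitions[partition][key] = value
        for index in range(len(self.partitions)):
            if index != partition:
                self._pending.append((index, key, value))

    def read(self, key, partition):
        """Read from one specific partition, which may be stale."""
        return self.partitions[partition].get(key)

    def propagate(self):
        """Stand-in for background replication catching up."""
        for index, key, value in self._pending:
            self.partitions[index][key] = value
        self._pending.clear()
```

Between `write` and `propagate`, two back-to-back reads served by different partitions can return different answers. After `propagate`, all partitions agree.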

Assume 1 million people a day will bang on any new feature, so make it scalable from the start. Example: the About page on Digg did a live query against the master database to show all employees. It was a quick hack to get the page out. Then a spider went crazy and took the site down.

Miscellaneous

Digg buttons were a major key to generating traffic.

Uses Debian Linux, Apache, PHP, MySQL.

Pick a language you enjoy developing in, pick a coding standard, add inline documentation that's extractable, and use a code repository and a bug tracker. Joe likes PHP, Trac, and SVN.

You are only as good as your people. You have to trust that the guy next to you is doing his job. To cultivate trust, empower people to make decisions. Trust that people have it handled and they'll take care of it. This cuts down on meetings because you know people will do the job right.

Completely a Mac shop.

Almost all developers are local. Some people are remote to offer 24 hour support.

Joe's approach is pragmatic. He doesn't have a language fetish. People went from PHP, to Python/Ruby, to Erlang. He uses vim and develops from the command line. He has no idea how people constantly change tool sets. It's not very productive.

Services (SOA) decoupling is a big win. Digg uses REST. Internal services return a vanilla structure that's mapped to JSON, XML, etc. Version in URL because it costs you nothing, for example: /1.0/service/id/xml. Version both internal and external services.
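A rough sketch of how a path like the one above could be handled, assuming its four segments are version, service name, item id, and output format. The functions `parse_path` and `render` and the `response` root element are hypothetical, not Digg's implementation.

```python
import json
from xml.etree.ElementTree import Element, SubElement, tostring


def parse_path(path):
    """Split /<version>/<service>/<id>/<format> into its four parts."""
    version, service, item_id, fmt = path.strip("/").split("/")
    return {"version": version, "service": service, "id": item_id, "format": fmt}


def render(data, fmt):
    """Map one plain dictionary to whichever output format was requested."""
    if fmt == "json":
        return json.dumps(data, sort_keys=True)
    if fmt == "xml":
        root = Element("response")
        for key, value in data.items():
            SubElement(root, key).text = str(value)
        return tostring(root, encoding="unicode")
    raise ValueError("unsupported format: %s" % fmt)
```

The service itself only produces the plain dictionary. The version segment lets an old and a new handler coexist, and the format segment is applied at the very end.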

People don't understand how many moving parts are in a website. Something is going to happen and it will go down.

Saturday, October 11, 2014