Wednesday, April 30, 2014

How Freshdesk Scaled Its Technology (Part I) – Before Sharding

April 29, 2014 · by Guest Author · in Technology

mperham/dalli · GitHub

Dalli

Dalli is a high performance pure Ruby client for accessing memcached servers. It works with memcached 1.4+ only as it uses the newer binary protocol. It should be considered a replacement for the memcache-client gem.

The name is a variant of Salvador Dali for his famous painting The Persistence of Memory.

bdurand/seamless_database_pool · GitHub

Seamless Database Pool provides a simple way in which to add support for a master/slave database cluster to ActiveRecord to allow massive scalability and automatic failover. The guiding design principle behind this code is to make it absolutely trivial to add to an existing, complex application. That way when you have a big, nasty application which needs to scale the database you won't have to stop all feature development just to refactor your database connection code. Let's face it, when the database is having scaling problems, you are in for a world of hurt and the faster you can fix the problem the better.

This code is available as both a Rails plugin and a gem so it will work with any ActiveRecord application.

Database Clusters

In a master/slave cluster you have one master database server which uses replication to feed all changes to one or more slave databases which are set up to only handle reads. Since most applications put most of the load on the server with reads, this setup can scale out an application quite well. You'll need to work with your database of choice to get replication set up. This plugin has an connection adapter which will handle proxying database requests to the right server.

Simple Integration

You can convert a standard Rails application (i.e. one that follows the scaffold conventions) to use a database cluster with three simple steps:

Set up the database cluster (OK maybe this one isn't simple)
Update database.yml settings to point to the servers in the cluster
Add this code to ApplicationController:

include SeamlessDatabasePool::ControllerFilter
use_database_pool :all => :persistent, [:create, :update, :destroy] => :master

If needed you can control how the connection pool is utilized by wrapping your code in some simple blocks.

Failover

One of the other main advantages of using any sort of cluster is that one node can fail without bringing down your application. This plugin automatically handles failing over dead database connections in the read pool. That is if it tries to use a read connection and it is found to be inactive, the connector will try to reconnect. If that fails, it will try another connection in the read pool. After thirty seconds it will try to reconnect the dead connection again.

One limitation on failover is when database servers are down when the pool is being initialized during startup. In this case, the connections cannot be initialized and are not added to the pool. If this happens, you will need to restart your processes once the database servers are back online.

Configuration

The pool configuration

The cluster connections are configured in database.yml using the seamless_database_pool adapter. Any properties you configure for the connection will be inherited by all connections in the pool. In this way, you can configure ports, usernames, etc. once instead of for each connection. One exception is that you can set the pool_adapter property which each connection will inherit as the adapter property. Each connection in the pool uses all the same configuration properties as normal for the adapters.

The read pool

The read pool is specified with a read_pool property in the pool connection definition in database.yml. This should be specified as an array of hashes where each hash is the configuration for each read connection you'd like to use (see below for an example). As noted above, the configuration for the entire pool will be merged in with the options for each connection.

Each connection can be assigned an additional option of pool_weight. This value should be number which indicates the relative weight that the connection should be given in the pool. If no value is specified, it will default to one. Setting the value to zero will keep the connection out of the pool.

If possible, you should set the permissions on the database user for the read connections to one that only has select permission. This can be especially useful in development and testing to ensure that the read connection never have writes sent to them.

The master connection

The master connection is specified with a master_connection property in the pool connection definition in database.yml (see below for an example). The master connection will be used for all non-select statements against the database (i.e. insert, update, delete, etc.). It will also be used for all statements inside a transaction or any reload commands.

By default, the master connection will be included in the read pool. If you would like to dedicate this connection only for write operations, you should set the pool weight to zero. Do not duplicate the master connection in the read pool as this will result in the additional overhead of two connections to the database.

Example configuration

development:
  adapter: seamless_database_pool
  database: mydb_development
  username: read_user
  password: abc123
  pool_adapter: mysql2
  port: 3306
  master:
    host: master-db.example.com
    port: 6000
    username: master_user
    password: 567pass
  read_pool:
    - host: read-db-1.example.com
      pool_weight: 2
    - host: read-db-2.example.com

In this configuration, the master connection will be a mysql connection to master-db.example.com:6000 using the username master_user and the password 567pass.

The read pool will use three mysql connections to master-db, read-db-1, and read-db-2. The master connection will use a different port, username, password for the connection. The read connections will use the same values. Further, the connection read-db-1 will get half the traffic as the other two connections, so presumably it's on a more powerful box.

You must use compatible database adapters for both the master and the read connections. For example, you cannot use an Oracle server as your master and PostgreSQL servers as you read slaves.

Using the read pool

By default, the master connection will be used for everything. This is not terribly useful, so you should really specify a method of using the read pool for the actions that need it. Read connections will only be used for select statements against the database.

This is done with static methods on SeamlessDatabasePool.

Controller Filters

To ease integration into a Ruby on Rails application, several controller filters are provided to invoke the above connection methods in a block. These are not implemented as standard controller filters so that the connection methods can be in effect for other filters.

See SeamlessDatabasePool::ControllerFilter for more details.

Mobile Users at Airports Spend Over 1.6 Hours on Their Devices [Report]

April 29, 2014 · by Darshana S · in Mobile

Monday, April 28, 2014

Why Erlang Is Awesome

Why use Erlang?

built to kick ass, battle-proven, saves time & money, easy to learn

What's so awesome?

lightweight concurrency, transparent distribution, hot code upgrades, OTP & more...

Get started

Official site: erlang.org
Try Erlang right in your browser on tryerlang.org
For a good book try Erlang Programming

built to kick ass

Erlang was developed at Ericsson and was designed from the ground up for writingscalable, fault-tolerant, distributed, non-stop, soft-realtime applications.Everything in the language, runtime and libraries reflects that purpose, which makes Erlang the best platform for developing this kind of software.

Use Erlang if you want your application to:

handle very large number of concurrent activities
be easily distributable over a network of computers
be fault-tolerant to both software & hardware errors
scale with the number of machines on the network
be upgradable & reconfugurable without having to stop & restart
be responsive to users within certain strict timeframes
stay in continuous operation for many years

Because Erlang is oriented around concurrency, it's also naturally good at utilizing modern multicore systems.

Lightweight concurrency, transparent distribution, hot code replacement, and OTP are some of the specific features that make Erlang a joy to work with.

battle-proven

Erlang has been successfully used in production systems for over 20 years (with reported uptimes of 9-nines — that's 31ms of downtime a year). It's been proven to work well in both large-scale industrial software development, and in small agile teams in startups.

Ericsson themselves have used Erlang extensively for many projects of varying sizes, both commercial and internal. The AXD301 ATM, one of Ericsson's flagship products, may be the largest Erlang project in existence at over 1.1 million lines of Erlang.

Erlang users in the telecoms industry: Motorola, Nokia, T-Mobile, BT. Large software companies & startups: Amazon, Yahoo!, Facebook, Last.fm, Klarna,Tail-F, Github, Heroku, Engine Yard, MochiMedia. Open source projects: Flussonic, ejabberd, CouchDb, Riak, Disco, RabbitMQ, Dynomite.

saves time & money

Erlang lets you deliver kick-ass software faster, on smaller budgets and with smaller teams, and reduce TLC & TCO.

This is made possible by a number of reasons:

The OTP libraries provide a complete set of easy-to-use components for building robust distributed applications, that have been used in hundreds of projects, and thouroughly tested & debugged over the last 10 years.
A large number of high-quality open-source libraries is available for many other tasks, such as XML processing or interacting with database systems such as PostgreSQL. Interfacing with existing code in Java, C, Python or Ruby is straightforward too.
Erlang code tends to be conscise & readable, which is made possible by the simplicity of the language & powerful abstraction mechanisms available.
Erlang scales well to large & small teams, and makes both top-down and bottom-up approaches to building software natural.
Erlang is easy to learn. An experienced programmer can start writing useful code after a couple of days of learning Erlang.
Availability of high-quality tools such as documentation generators, testing frameworks, debuggers, graphical diagnostics tools, and IDEs.

easy to learn

Erlang has a simple & consistent core which makes it easy to pick up. Experienced programmers can start writing useful code after a couple of days with Erlang. There are no complicated concepts to understand or arcane theories to master. The syntax may look a little different if you're coming from Ruby, Python, or Java, but it doesn't take long to get used to.

In fact, making the language easy to pick up was one of the original design goals of the Erlang development team. Erlang is very pragmatic & has been made by working programmers, for working programmers.

lightweight concurrency

Processes are very lightweight, with only about 500 bytes of overhead per-process. This means that millions of processes can be created, even on older computers.

Because Erlang's processes are completely independent of OS processes (and aren't managed by the OS scheduler), your programs will behave in exactly the same way regardless of whether they run on Linux, FreeBSD, Windows or any of the other systems that Erlang runs on.

Because of Erlang's great support for concurrency it becomes natural to model applications around multiple independent communicating agents, which is just how things are in the real world.

hot code replacement

In a real-time control system we often don't want to stop the system in order to upgrade the code. In certain real-time control systems we may never be able to turn off the system to perform upgrades, and such systems have to be designed with dynamic code upgrades in mind. An example of such system is the X2000 satellite control system developed by NASA.

When you write your app in Erlang, you get dynamic code upgrade support for freewhen you use OTP. The mechanism itself is very straightforward and easy to understand.

This can save hundreds of hours of time in development:

This is a common Erlang development workflow:

Start the app.
Edit the code.
Recompile. (one keystroke)
That's it! There is no restart step. The app gets updated with the new code while it's running and tests get run automatically to ensure there are no regressions. This of course works great with TDD too.

transparent distribution

Erlang programs can be easily ported from a single computer to a network of computers. With the exception of timing all operations in the distributed system will work in exactly the same way as they worked in a single-node system.

OTP

OTP, the Open Telecom Platform, is a collection of standard libraries that distill years of real-world experience of building scalable, distributed, fault-tolerant applications.

more...

Free & open-source. Erlang is distributed under a permissive open-source license, and is free to use for any open-source, freeware, or commercial projects.
Cross-platform. Erlang runs on Linux, FreeBSD, Windows, Solaris, Mac OS X, and even embedded platforms such as VxWorks.
Well-supported. A dedicated team of engineers is employed by Ericsson to work on Erlang. Commercial support & services are available from Erlang Solutions and a number of other companies. There is also a responsive community around the world, centered around the Erlang Mailing List & IRC (#erlang on Freenode).
Plays well with the outside world. Integration with existing Java, .NET, C, Python, or Ruby code is straightforward. There is an interface to the underlying OS should you need one. Solid libraries to work with XML, JSON, ASN.1, CORBA etc are also available.
HiPE. The High Performance Erlang Compiler can compile Erlang to native code on Windows, Linux and Mac OS X and comes in the standard Erlang distribution.
Static typing, when you need it. You can annotate your code with type information & use Dialyzer, a powerful typechecker, to ensure the correctness of your code and gain performance. Dialyzer comes bundled with Erlang, and also supports gradual typing to give you maximum flexibility.
Bit syntax. Another feature unique to Erlang that makes working with binary data a breeze. Writing programs such as binary file readers or network protocol parsers is easier in Erlang than in any other language. Erlang code that uses the binary syntax is compiled into very efficient machine code, often beating hand-written C code in performance.

Google Finds: Centralized Control, Distributed Data Architectures Work Better than Fully Decentralized Architectures - High Scalability -

Google Finds: Centralized Control, Distributed Data Architectures Work Better Than Fully Decentralized Architectures

MONDAY, APRIL 7, 2014 AT 8:56AM

For years a war has been fought in the software architecture trenches between the ideal of decentralized services and the power and practicality of centralized services. Centralized architectures, at least at themanagement and control plane level, are winning. And Google not only agrees, they are enthusiastic adopters of this model, even in places you don't think it should work.

Here's an excerpt from Google Lifts Veil On “Andromeda” Virtual Networking, an excellent article by Timothy Morgan, that includes a money quote from Amin Vahdat, distinguished engineer and technical lead for networking at Google:

Like many of the massive services that Google has created, the Andromeda network has centralized control. By the way, so did the Google File System and the MapReduce scheduler that gave rise to Hadoop when it was mimicked, so did the BigTable NoSQL data store that has spawned a number of quasi-clones, and even the B4 WAN and the Spanner distributed file system that have yet to be cloned.

"What we have seen is that a logically centralized, hierarchical control plane with a peer-to-peer data plane beats full decentralization,” explained Vahdat in his keynote. “All of these flew in the face of conventional wisdom,” he continued, referring to all of those projects above, and added that everyone was shocked back in 2002 that Google would, for instance, build a large-scale storage system like GFS with centralized control. “We are actually pretty confident in the design pattern at this point. We can build a fundamentally more efficient system by prudently leveraging centralization rather than trying to manage things in a peer-to-peer, decentralized manner.”

The context of the article is Google's impressive home brew SDN (software defined network) system that uses a centralized control architecture instead of the Internet's decentralizedAutonomous System model, which thinks of the Internet as individual islands that connect using routing protocols.

SDN completely changes that model as explained by Greg Ferro:

The major difference between SDN and traditional networking lies in the model of controller-based networking. In a software-defined network, a centralized controller has a complete end-to-end view of the entire network, and knowledge of all network paths and device capabilities resides in a single application. As a result, the controller can calculate paths based on both source and destination addresses; use different network paths for different traffic types; and react quickly to changing networking conditions.

In addition to delivering these features, the controller serves as a single point of configuration. This full programmability of the entire network from a single location, which finally enables network automation, is the most valuable aspect of SDN.

So a centralized controller knows all and sees all and hardwires routes by directly programming routers. In the olden says slow BGP convergence times after a fault was detected would kill performance. With your own SDN on your hardware failure response times can be immediate, as the centralized controller will program routers with a possibly precalculated alternative route. This is a key feature for today's cloud based systems that demand highly available, low latency connections, even across the WAN.

Does this mean the controller is a single process? Not at all. It's logically centralized, but may be split up among numerous machines as is typical in any service architecture. This is how it can scale. With today's big iron, big memory, and fast networks the motivation for adopting a completely decentralized architecture for capacity reasons is not compelling except for all but the largest problems.

At Internet scale, the Autonomous System model of being logically and physically decentralized is still a win, it can scale wonderfully, but at the price of high coordination costs and slow reactions times. Which was fine in the past, but doesn't work for today's networking needs.

Google isn't running an Internet. They are running a special purpose network for their own particular portfolio of needs. Why should they use an over generalized technology meant for a completely different purpose?

We Can See Centralization Winning In The Services That People Choose To Use.

Email and NNTP, both fully decentralized services, while not dead by any means, have given way to centralized services like Twitter, Facebook, G+, WhatsApp, and push notifications. While decentralization plays an important part in the back-end of most every software service, the services themselves are logically centralized.

Centralization makes a lot of things easier. Search, for example. If you want great search you need all the data in one place. That's why Google crawls the web and stashes it in their very large back pocket. Identity is a dish best served centralized. As are things like follow lists, joins, profiles, A/B testing, frequent pushes, iterative design, fraud detection, DDoS mitigation, deep learning, and virtually any kind of high value add feature you want to create.

Also, having a remote entity not under your control as a key component to your product is inviting a high latency and a variable user experience due to failures. Not something you want in your service. End-to-end control is key for creating an experience.

So when you argue for a fully decentralized architecture it's hard to argue based on features or scalability, you have to look elsewhere.

Decentralization Is Also A Political Choice.

Attempts to make a decentralized or federated Twitter service, for example, while technically feasible, have not busted out into general adoption. The simple reason is centralization works and as a user what you want is something that works. That's primary. Secondary qualities like security, owning your own data, resilience, free speech, etc. while of great importance to some, barely register as issues to the many.

But for the few, these secondary qualities are exactly what they prize the most. Doc Searls in articles like Escaping the Black Holes of Centralization makes the case that decentralization is important for human rights and personal sovereignty reasons. A fully distributed and encrypted P2P chat system is a lot harder to compromise than a centralized service run by a large faceless corporation.

When You Are Thinking About The Architecture Of Your Own System...

If it is for personal sovereignty purposes, or it operates at Internet or inter-planetary scale, or it must otherwise operate autonomously then federation is your friend.

If your system is smallish then a completely centralized architecture is still quite attractive.

For the vast middle ground Google has shown centralized management and control combined with distributed data is probably now the canonical architecture. Don't get caught up trying to make distributed everything work. You probably don't need it and it's really really hard.

But then again Oceania has always been at war with Eastasia.

Stop Wasting Users' Time | Smashing Magazine

Stop Wasting Users’ Time

By Paul Boag

Our users are precious about their time and we must stop wasting it. On each project ask two questions: “Am I saving myself time at the expense of the user?” and “How can I save the user time here?” What is the single most precious commodity in Western society? Money? Status? I would argue it is time.

We are protective of our time, and with good reason. There are so many demands on it. We have so much to do. So much pressure. People hate to have their time wasted, especially online. We spend so much of our time online these days, and every interaction demands a slice of our time. One minor inconvenience on a website might not be much, but, accumulated, it is death by a thousand cuts.

Steve Jobs claimed that improving the boot time on the Macintosh would save lives. A 10-second improvement added up to many lifetimes over the millions of users booting their computers multiple times a day.

Steve Jobs was obsessed with saving the user time, and we should be, too. (Large preview)

Millions of people might not use your website, but millions do use the Web as a whole. Together, we are stealing people’s lives through badly designed interactions. When I work on a website, one question is front and center in my mind:

“Am I saving myself time at the expense of the user?”

That is the heart of the problem. In our desire to meet deadlines and stay on budget, we often save ourselves time by taking shortcuts via our users’ time. Let’s explore some examples of what I mean.

Taking Time To Improve Performance

The most obvious example of wasting users’ time is website performance. This is what Jobs was getting at with boot times. If our websites are slow, then we’ll waste our users’ valuable time and start to irritate them. One more cut, so to speak.

The problem is that improving performance is hard. We became lazy as broadband became widespread. We cut corners in image optimization, HTTP requests and JavaScript libraries. Now, users pay the price when they try to access our websites on slow mobile devices over cellular networks.

Optimizing your website for performance not only saves your users time, but improves your search engine rankings. (Large preview)

Making our websites faster takes time and effort, but why should users suffer for our problems? On the subject of making our problems the users’ problem, let’s take a moment to talk about CAPTCHA.

CAPTCHA: The Ultimate Time-Waster

CAPTCHA is the ultimate example of unloading our problems onto users. How many millions of hours have users wasted filling in CAPTCHA forms? Hours wasted because we haven’t addressed the problem of bots.

CAPTCHA forces the user to deal with something that is really our problem.

Just to be clear, I am not just talking about traditional CAPTCHA either. I am talking about any system that forces the user to prove they are human. Why should they have to prove anything? Once again, another inconvenience, another drain on their precious time.

We could solve this problem if we put the time into it. The honeytrap technique helps. There are also server-side solutions for filtering out automated requests. The problem is that throwing a CAPTCHA on a website is easier.

Not that CAPTCHA is the only way that we waste the user’s time when completing forms.

Don’t Make Users Correct “Their” Mistakes In Forms

Sometimes we even waste the user’s time when we are trying to help them. Take postal-code lookup. I have been on websites that try to save me time by asking me to enter my postal code so that it can auto-populate my address. A great idea to save me some time — great if it works, that is.

The problem is that some lookup scripts require the postal code to have no spaces. Instead of the developer configuring the script to remove any spaces, they just return an error, and the user has to correct “their” mistake. Why should the user have to enter the data in a particular way? Why waste their time by requiring them to re-enter their postal code? This doesn’t just apply to postal codes either. Telephone numbers and email addresses come with similar problems.

We also need to better help mobile users interact with forms. Forms are particularly painful on touchscreens, so we need to explore alternative form controls, such as sliders and the credit-card input system in Square’s mobile app.

Then, there are passwords.

Why Are Passwords So Complicated?

Why do we waste so much of the users’ time with creating passwords? Every website I visit these days seems to have ever more complex requirements for my password. Security is important, but can’t we come up with a better solution than an arcane mix of uppercase, numbers and symbols?

Why couldn’t we ask users to type in a long phrase instead of a single word? Why can’t my password be, “This is my password and I defy anyone to guess it”? The length would make it secure, and remembering and typing it would be much easier. If your system doesn’t like the spaces, strip them out. You could even provide an option for people to see what they’re typing.

Example of how longer passwords help security

A long password phrase is as secure as a short password with numbers and symbols yet easier to remember. (Large preview)

If you can’t do that, at least provide instructions when the user tries to log in. Remind them of whether your website wants uppercase or a certain number of characters. That would at least help them remember their password for your website.

The important thing is to recognize that people have to log in all the time. The task demands extra attention so that it is as painless as possible.

Pay Special Attention To Repetitive Tasks

We should ask ourselves not only whether we are unloading our problems onto users, but also how we can save our users time.

Take those common tasks that users do on our websites time and again. How can we shave a quarter of a second off of those tasks? What about search? If the user enters a search term on your website, will hitting the “Return” key submit the query? They shouldn’t have to click the “Search” button.

Drop-down menus are another good example. Navigating country-pickers can be painful. Could we display countries differently, or make the most common countries faster to access? In fact, so much could be done to improve country-pickers if we just take the time.

Something as simple as a country-picker can waste a surprising amount of time, especially if you are British! (Large preview)

For that matter, a more robust solution to “Remember me” functionality would be nice, so that users are, in fact, remembered!

I am aware that this post might sound like a rant against developers. It is not. It is a problem faced by all Web professionals. Designers need to pay close attention to the details of their designs. Web managers need to ensure that the budget exists to refine their user interfaces. And content creators need to optimize their content for fast consumption.

Help Users Process Our Content Faster

We waste so much of our users’ time with verbose, poorly written and dense copy, making it hard for them to find the piece of information they need. The real shame is that we could do so much to help. For a start, we could give the user a sense of approximately how long a page will take to read. I offer this functionality on my personal blog, and it is the feature most commented on. Users love knowing how much of their time a post will take up.

We can also make our content a lot more scannable, with better use of headings, pullout quotes and lists. Finally, we can take a leaf out of Jakob Nielsen’s website. At the beginning of each post, he provides a quick summary of the page.

THE TIP OF THE ICEBERG

We could do so much more in all aspects of Web design to save users’ time. From information architecture to website analytics, we waste too much of it. Sometimes we even know we are doing it! We need to be forever vigilant and always ask ourselves:

“How can I save the user time in this situation?”

What are your thoughts on this topic? Please share your experiences and opinions with us, and join in the discussion in the comments section below.

Search This Blog