
Thursday, August 25, 2011

MySQL: Partition-wise backups with mysqldump

MySQL: Partition-wise backups with mysqldump: To whom it may concern,

in response to a query from André Simões (also known as ITXpander), I slapped together a MySQL script that outputs mysqldump commands for backing up individual partitions of the tables in the current schema. The script is maintained as a snippet at MySQL Forge.

How it works

The script works by querying the information_schema.PARTITIONS system view to generate an appropriate expression for mysqldump's --where option. The generated command also redirects the output to a file with this name pattern:
<schema>.<table>.<partition-name>.sql
For example, for this table (taken from the MySQL reference manual):
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
the script generates the following commands:
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) < 1960" test members > test.members.p0.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1960 and YEAR(joined) < 1970" test members > test.members.p1.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1970 and YEAR(joined) < 1980" test members > test.members.p2.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1980 and YEAR(joined) < 1990" test members > test.members.p3.sql
mysqldump --user=username --password=password --no-create-info --where=" YEAR(joined) >= 1990 and YEAR(joined) < 18446744073709551615" test members > test.members.p4.sql
Tip: in order to obtain directly executable output from the mysql command line tool, run the script with the --skip-column-names (or -N) option.
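
For illustration, here is a rough sketch of the idea behind the snippet (not the actual Forge script), for RANGE-partitioned tables only: each partition's lower bound is looked up from the previous partition via a self-join on information_schema.PARTITIONS, and the MAXVALUE partition is simply written as an open-ended range:

SELECT CONCAT(
         'mysqldump --user=username --password=password --no-create-info',
         ' --where="',
         IF(prev.PARTITION_DESCRIPTION IS NULL, '',
            CONCAT(cur.PARTITION_EXPRESSION, ' >= ', prev.PARTITION_DESCRIPTION)),
         IF(prev.PARTITION_DESCRIPTION IS NULL
            OR cur.PARTITION_DESCRIPTION = 'MAXVALUE', '', ' and '),
         IF(cur.PARTITION_DESCRIPTION = 'MAXVALUE', '',
            CONCAT(cur.PARTITION_EXPRESSION, ' < ', cur.PARTITION_DESCRIPTION)),
         '" ', cur.TABLE_SCHEMA, ' ', cur.TABLE_NAME,
         ' > ', cur.TABLE_SCHEMA, '.', cur.TABLE_NAME, '.', cur.PARTITION_NAME, '.sql')
FROM information_schema.PARTITIONS cur
LEFT JOIN information_schema.PARTITIONS prev
       ON prev.TABLE_SCHEMA = cur.TABLE_SCHEMA
      AND prev.TABLE_NAME = cur.TABLE_NAME
      AND prev.PARTITION_ORDINAL_POSITION = cur.PARTITION_ORDINAL_POSITION - 1
WHERE cur.TABLE_SCHEMA = SCHEMA()
  AND cur.PARTITION_METHOD = 'RANGE';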

Features

Currently, the script supports the RANGE, LIST and HASH partitioning methods.

Limitations

The LINEAR HASH method is currently not supported, but I may implement that in the future.

Currently I do not have plans to implement the KEY and LINEAR KEY partitioning methods, but I may reconsider if and when I have more information about the storage-engine specific partitioning functions used by these methods.

Finally, I should point out that querying the information_schema.PARTITIONS table is dog-slow. This may not be too big an issue; however, it is pretty annoying. If anybody has some tips to increase performance, please let me know.

Acknowledgements

Thanks to André for posing the problem. I had a fun hour of procrastination to implement this, and it made me read part of the MySQL reference manual on partitioning.

I also would like to thank Giuseppe Maxia (the Datacharmer) for providing valuable feedback. If you're interested in either partitioning or the mysql command line, you should visit his tutorials at the MySQL conference, April 12-15, 2010.

MySQL Partitioning – can save you or kill you

MySQL Partitioning – can save you or kill you:

I wanted for a while to write about using MySQL Partitioning for Performance Optimization and I just got a relevant customer case to illustrate it. First you need to understand how partitions work internally. Partitions are, at a low level, separate tables. This means that when you're doing a lookup by the partitioned key you will look at one (or some) of the partitions; however, lookups by other keys have to be performed against all partitions and hence can be a lot slower. The gain for updates typically comes from having a smaller BTREE on the active partition(s), which allows for a much better fit in memory. Having potentially fewer levels in the BTREE is not that significant an issue.


So let's look at an example:



SQL:

CREATE TABLE `tbl` (
  `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  `uu` varchar(255) DEFAULT NULL,
  `data` bigint(20) UNSIGNED DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `uu` (`uu`)
) ENGINE=InnoDB

The access pattern to this table is to look up data by "uu", which holds UUID values, plus a number of deletes by "id" and a bunch of inserts. The deletes are mainly clustered around the most recent id values.

The table (and index) is much larger than buffer pool size.


The first problem was replication lag, which was mainly due to modifying the uu index. This is because UUID() spreads value prefixes very well, effectively giving almost uniform access to the whole BTREE. To solve this problem partitioning was a good choice - PARTITION BY HASH (id div 10000000) PARTITIONS 32 - which splits the data into 32 partitions, placing sequential ranges of 10M id values in the same partition - very handy if you have very active access to values which have been added to the table recently.
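
As a sketch, the partitioning change described above would look roughly like this (the real table has more columns, of course); it is legal here because the partitioning expression only uses id, which is contained in the primary key:

ALTER TABLE tbl
  PARTITION BY HASH (id div 10000000)
  PARTITIONS 32;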


Using this trick, replication could be sped up about 10 times, as the couple of partitions which were actively used could fit in the buffer pool completely, so replication became CPU bound (single thread) instead of IO bound.


You could celebrate, but hey.... you need to check the impact on the master too. The master, in its turn, was getting a lot of lookups by the uu value, which is not part of the partitioning key, and hence we're looking at 32 logical lookups, one per partition. True, only one of the partitions would contain the value, but many of them will require physical IO and going down to the leaf key to verify the value does not exist, which reduced performance for random selects by UUID from 400 to 20 per second (from a single thread).


Decreasing the number of partitions made replication less efficient, but the number of selects the table could deliver increased, and there seems to be a reasonable middle ground which would allow replication to perform better than it does now, while selects are still delivered at the rate the system needs.


What is the takeaway? When you're creating partitions, think clearly about what you're trying to achieve. Partitioning is not some magic feature which just makes everything a lot faster. I've seen some people apply partitioning to basically all of their tables without much thought, and believe me, the results were not pretty.




Entry posted by Peter Zaitsev

Using MySQL Partitioning Instead of MERGE Tables

Using MySQL Partitioning Instead of MERGE Tables:

One common question I get is how to use partitioning instead of MERGE tables. The process I use involves using stored procedures to create and drop partitions. This article will go over the stored procedures I use; special thanks to Roland Bouman for taking a look and giving great feedback to optimize this process.


First, a simple table, not partitioned (yet):



use test;
DROP TABLE IF EXISTS my_part;
CREATE TABLE IF NOT EXISTS my_part (
id int NOT NULL,
creationDate datetime NOT NULL,
PRIMARY KEY (id,creationDate)
) ENGINE=InnoDB;

In real life, there is more to the table than just id and creationDate. The most important part is that the partitioned field(s) need to be part of the primary key.


Now, add the partition definition. This can be done in the CREATE statement, but I have found that it is easier for me to think about what fields and indexes I want first, and then worry about partitioning, so I naturally gravitate towards defining the fields and indexes first and then altering the table to add partitioning. This also helps me think about how to modify an existing table to add partitioning.



ALTER TABLE my_part PARTITION BY RANGE (TO_DAYS(creationDate)) (
partition 2010_07_01 values less than (to_days('2010-07-02')),
partition 2010_07_02 values less than (to_days('2010-07-03')),
partition 2010_07_03 values less than (to_days('2010-07-04')),
partition 2010_07_04 values less than (to_days('2010-07-05'))
);

This makes it pretty clear what is happening — the idea is to give the partition names actual dates that they hold, so that it is easy to see what partitions need to be added and deleted.


Deleting partitions


I find that making stored procedures makes things easy….so I will define a procedure called partition_drop to drop partitions. The partition_drop stored procedure takes in a table name and schema name to drop partitions from, and a date to delete up through and including that date. Here’s the procedure:



DELIMITER ||
DROP PROCEDURE IF EXISTS partition_drop ||

CREATE PROCEDURE partition_drop (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE delete_me varchar(64);
DECLARE notfound BOOL DEFAULT FALSE;
DECLARE pname CURSOR FOR SELECT PARTITION_NAME
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl AND TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)!= 0
AND DATE(PARTITION_NAME) IS NOT NULL
AND DATE(PARTITION_NAME)<=through_date;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET notfound:=TRUE;
OPEN pname;

cursor_loop: LOOP
FETCH pname INTO delete_me;
IF notfound THEN LEAVE cursor_loop; END IF;
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," DROP PARTITION ",delete_me);

# sanity check commented out for production use
# SELECT @alter_stmt;
PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;

END LOOP;
CLOSE pname;
END ||
DELIMITER ;

Go ahead and run CALL partition_drop('2010-07-02','my_part','test'); to verify that SHOW CREATE TABLE my_part; shows that the desired partitions have been dropped.


Adding partitions


Adding partitions is similar to deleting partitions — using a stored procedure that takes in the date to add partitions up to that date. It will not try to add so many partitions that the table will have more than 1024, and it won’t add any partitions that already exist.



DELIMITER ||
DROP PROCEDURE IF EXISTS partition_add ||

CREATE PROCEDURE partition_add (IN through_date date, IN tbl varchar(64), IN db varchar(64))
BEGIN
DECLARE add_me char(10);
DECLARE max_new_parts,add_cnt smallint unsigned default 0;
SELECT 1024-COUNT(*) AS max_new_parts,
SUM(CASE WHEN
DATE(PARTITION_NAME)>=through_date then 1 else 0
END)
INTO max_new_parts, add_cnt
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = db
AND TABLE_NAME = tbl;

IF add_cnt=0 THEN
BEGIN
SELECT MAX(DATE(PARTITION_NAME)) INTO add_me
FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl and TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)<through_date;

# to do: declare handler for exceptions here
IF DATEDIFF(through_date,add_me)+1 < max_new_parts THEN
BEGIN
WHILE add_me<through_date do BEGIN
SET add_me:=DATE_FORMAT(add_me + INTERVAL 1 DAY,"%Y_%m_%d");
SET @alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl," ADD PARTITION (PARTITION ",add_me," VALUES LESS THAN (TO_DAYS('",add_me+INTERVAL 1 DAY, "')))" );

PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;

END;
END WHILE;
END;
END IF;
END;
END IF;
END ||
DELIMITER ;

Here’s how to call that stored procedure:

CALL partition_add ('2010_07_10', 'my_part','test');


Caveat: This stored procedure will only add partitions from the maximum existing partition name/date through the date you pass in; it will not fill in any gaps. However, it is possible to add another input parameter to act as the “start date” and change the stored procedure to add each partition if it does not exist, from the start date through the end date.


Note: It was pointed out by Roland Bouman that it would be better to change both the add and drop stored procedures to do one ALTER TABLE instead of sequential ones. He wrote:


It’s probably better to generate a single statement to drop / add all partitions.


So the general pattern would be:


- generate and concatenate lines for each partition

- one single sequence of prepare, execute, deallocate to execute one DDL statement.


For the drop partition procedure, this approach would allow you to do away with the cursor. You can simply use GROUP_CONCAT in a single SELECT…INTO statement to generate the entire DDL statement.
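
For instance, a minimal sketch of that cursor-free variant of the drop procedure (using the same through_date, tbl and db parameters as partition_drop above) could look like this:

SET SESSION group_concat_max_len = 10240; # in case there are many partitions to drop
SELECT CONCAT("ALTER TABLE ",db,".",tbl," DROP PARTITION ",
GROUP_CONCAT(PARTITION_NAME))
INTO @alter_stmt
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME=tbl AND TABLE_SCHEMA=db
AND DATE(PARTITION_NAME) IS NOT NULL
AND DATE(PARTITION_NAME)<=through_date;

# only run the ALTER if at least one partition matched
IF @alter_stmt IS NOT NULL THEN
PREPARE stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter;
END IF;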


Note: Roland also mentioned that these stored procedures could be generalized to use any interval, for example if each table partition held 7 days. The only limitation is that to get optimal partition performance with dates, use either TO_DAYS(date_field) or YEAR(date_field) as the partitioning function.


Putting it all together


The procedure: daily I would run the following in MySQL, to keep only the previous “x” days:


CALL partition_drop(CURRENT_DATE() - INTERVAL x DAY, 'tbl_name', 'schema_name');


To add new partitions, I would do:

CALL partition_add(CURRENT_DATE() + INTERVAL 32 DAY, 'tbl_name', 'schema_name');


It is run daily just in case; the side benefit is that it will catch any gaps if you do not modify the stored procedure to include a start date for when to add. Adding a partition does not take a long time, because there’s no data to reorganize.


I would make a daily MySQL event, and then everything is stored nicely in the database, and backed up with the database. Others may choose to run the stored procedures in a cron script or scheduled task (Windows), but that requires a password to be stored somewhere — either in the script, somewhere the script can read, or in a config file such as .my.cnf.
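
For example, a minimal sketch of such a daily event (the 30-day retention and 32-day lead are just placeholders, and the event scheduler must be enabled, e.g. SET GLOBAL event_scheduler=ON):

DELIMITER ||
CREATE EVENT IF NOT EXISTS partition_maintenance
ON SCHEDULE EVERY 1 DAY
DO BEGIN
CALL partition_drop(CURRENT_DATE() - INTERVAL 30 DAY, 'my_part', 'test');
CALL partition_add(CURRENT_DATE() + INTERVAL 32 DAY, 'my_part', 'test');
END ||
DELIMITER ;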


Keeping Your Database “Fresh” Using MERGE Engine

Keeping Your Database “Fresh” Using MERGE Engine:

Problem Statement:


As you store more and more records in your tables, your database becomes slower and slower. If you do not have a good archiving strategy and implementation, chances are your Web applications will slow down over time due to time-consuming data scans in large tables. Don't you wish your database looked exactly as new as it did the day you turned it on live?


Good news! There is a way you can achieve a “fresh” feel for your database using the MERGE table engine. In this article we will discuss how. The best part of this solution is that you do not have to change a single line of your Web application to take advantage of this hack!


Solution


The best way to describe this solution would be to use an example scenario. Assume that you have a MySQL table that stores order information. Every day new orders are inserted into this table. If you run a large e-commerce site, you would have a huge order table on your hands in a year or so. As the order table gets larger, your Web application gets incrementally slower and slower.


You could solve this problem by removing or archiving the order information to a different table and changing the part of your Web application responsible for showing order info to handle both tables in the code. This is not a very desirable solution, as the burden of the database-size issue falls on the Web application rather than the database itself.


What would be ideal is a solution which allows you to empty the order table on a schedule — say daily, weekly, monthly, quarterly or even yearly — and requires no change to your Web apps. Imagine if you could empty your order table every night and have all the old records still accessible as easily as before without changing a single line of your Web application! Wow, that would be great. Enter the MERGE Engine from MySQL!


MySQL Merge Table


Benefits of MERGE Engine


When you recreate a table to exist as a union of two or more merged tables using the MySQL MERGE engine type, you can direct all your INSERT traffic to one table in the union so that INSERT traffic competes less with your SELECT, UPDATE and DELETE traffic for older records. How much the competition is reduced depends entirely on your applications.


Recreating your table with MERGE engine


MySQL MERGE table engine is designed to allow you just that. Here is how you can make it work:



  • Create two identical copies of your order table and call them order_live and order_archive

  • Create a new table called order with the same structure with ENGINE=MERGE UNION=(order_live,order_archive) INSERT_METHOD=FIRST; at the end of your create table statement


Now you should have order_live, order_archive and a merge table called order which is a union of the first two tables. Your Web application accesses the order table as usual. All inserts go to the order table, which automatically writes them into order_live (due to the INSERT_METHOD=FIRST option in the create table statement). This means your Web application continues to function exactly as before, but new orders automatically end up in the order_live table.
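
A minimal sketch of that setup (the column list here is made up; your real order table will differ). Note that all member tables must be identical MyISAM tables for the MERGE engine to work, and the merge table declares a plain KEY because MERGE cannot enforce uniqueness across its members:

CREATE TABLE order_live (
order_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
created DATETIME NOT NULL,
total DECIMAL(10,2),
PRIMARY KEY (order_id)
) ENGINE=MyISAM;

CREATE TABLE order_archive LIKE order_live;

CREATE TABLE `order` (
order_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
created DATETIME NOT NULL,
total DECIMAL(10,2),
KEY (order_id)
) ENGINE=MERGE UNION=(order_live,order_archive) INSERT_METHOD=FIRST;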


Setting up a cron job to move old data


Now setup a simple cron job that moves all records from order_live to order_archive using:


LOCK TABLES db_name.order_live WRITE, db_name.order_archive WRITE;

REPLACE INTO db_name.order_archive SELECT * FROM db_name.order_live;
TRUNCATE db_name.order_live;
UNLOCK TABLES;

This cron job locks the order_live and order_archive tables in the db_name database so that no other write operation is permitted while it runs. It then inserts (or replaces, if the records already exist) the order_live records into the order_archive table. Once the data from order_live is copied into the order_archive table, the data in order_live is truncated. This frees up the order_live table. The locks are then released.


At this point, you have a fresh, empty order_live table ready to take the orders of tomorrow. The new orders will insert themselves into order_live via the merge interface provided by the order table, and your older records are safely stored in order_archive for other parts of your Web application to perform SELECT operations on. Since your Web application only deals with the order table itself, there is no need for your application to know where the actual record resides.


For example, say you inserted an order with order_id set to 1000 today and it got moved to order_archive when the cron job ran. Now say that, for some reason, your application needs to perform a rare but necessary update on this record tomorrow: it can still issue a query such as UPDATE order SET field=value WHERE order_id = 1000; without needing to know anything about the physical table location of the record. Beautiful, isn't it?


Caveats of MERGE table


Nothing is ever free from caveats; sorry. The major caveat that we found when using MERGE tables is related to altering merge table fields. Say you want to add a new field to your order table: you will have to add the field to all of the member tables and then recreate the merge table itself.
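
A rough sketch of that sequence, continuing the made-up column layout from above and adding a hypothetical customer_note field (dropping a MERGE table removes only the merge definition, not the underlying data):

ALTER TABLE order_live ADD COLUMN customer_note VARCHAR(255);
ALTER TABLE order_archive ADD COLUMN customer_note VARCHAR(255);
DROP TABLE `order`;
CREATE TABLE `order` (
order_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
created DATETIME NOT NULL,
total DECIMAL(10,2),
customer_note VARCHAR(255),
KEY (order_id)
) ENGINE=MERGE UNION=(order_live,order_archive) INSERT_METHOD=FIRST;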

Tuesday, August 23, 2011

UIEtips: The $300 Million Button

UIEtips: The $300 Million Button:

In today’s UIEtips, I tell a story about a client who found a way to dramatically increase their e-commerce site’s revenues with a couple of simple changes. While the story is interesting, the story-behind-the-story is just as interesting.


The client had hired us because they were concerned about checkout-process abandonment. Their analytics were showing a 13% drop off in sales, which, based on the average value of the abandoned shopping carts, was worth about $1.2 million a year in additional revenue.


Checkout-process abandonment is common in e-commerce sites and something that you can easily detect with your site’s usage logs. You just look at the number of people who get to the first screen and then the number of people who actually complete the transaction. Everyone who doesn’t make it is an abandonment.


When the team contacted us, they’d already pretty much decided what the problem was and how they were going to fix it, even though they had never watched any shoppers make purchases. And they were dead wrong. Not only was their fix not going to help, our research showed that it was going to increase abandonment.


Two weeks of usability testing on the live site (and on competitors’ sites), followed by two weeks of iterative paper prototype testing produced a streamlined checkout process, which, once implemented, showed a dramatic increase in revenues. It’s amazing what you’ll learn when you actually watch your users.


Today’s article, The $300 Million Button, talks about the bulk of that increase — how a simple change to a common screen produced $300,000,000 of additional revenue over the next year. I’m sure you’ll find it interesting.


Improving forms, like a checkout process, can show immediate results in your design’s user experience. We’re fortunate that at this year’s UIE Web App Summit, we have Luke Wroblewski repeating last year’s top-rated Web Application Form Design full-day seminar. If your site has forms (and what site doesn’t these days), this is a must-take course!


Have you seen results from changes to your forms? We’d love to hear your experiences. Share them with us below.

New Approaches To Designing Log-In Forms

New Approaches To Designing Log-In Forms:







For many of us, logging into websites is a part of our daily routine. In fact, we probably do it so often that we’ve stopped having to think about how it’s done… that is, until something goes wrong: we forget our password, our user name, the email address we signed up with, how we signed up, or even if we ever signed up at all.


These experiences are not just frustrating for us, but are bad for businesses as well. How bad? User Interface Engineering’s analysis of a major online retailer found that 45% of all customers had multiple registrations in the system, 160,000 people requested their password every day, and 75% of these people never completed the purchase they started once they requested their password.


To top it off, visitors who are not logged in do not see a personalized view of a website’s content and recommendations, which reduces conversion rates and engagement. So, log-in is a big deal — big enough that some websites have started exploring new design solutions for the old problem.


Is This You?


Gowalla’s sign-in form (below) looks pretty standard: enter your user name or email address and your password, and then sign in. There’s also help for those of us who have forgotten our password or are new to the website. In other words, all of the most common log-in user-interface components are accounted for.



The sign-in form on Gowalla.


But Gowalla has taken the time to include a few more components to help people log in with more confidence if their first attempt hasn’t worked. If you attempt to sign in with a user name (or email address) and password that do not match, the website not only returns an error but returns the profile image and user name of the account you are trying to sign into as well:



A log-in error on Gowalla.


Including a profile picture provides instant visual confirmation: “Yes, this is my account, and I may have forgotten my password,” or “No, this isn’t my account, so I must have entered the wrong user name or email address.” In either case, Gowalla provides a way to resolve the problem: “This isn’t me” or “I don’t know my password.”


The Q&A website Quora takes a similar approach, but it doesn’t wait until you are done trying to sign in before providing feedback. Quora’s log-in form immediately tells you if no account is associated with the email address you have entered, and it gives you the option to create a new account right then and there:



Quora instantly lets you know if there are no matching accounts for the email address you have entered.


If the address you have entered does match an account on Quora, then the account’s profile image and user name will appear to the right of the log-in form. This confirmation is similar to Gowalla’s but comes right away instead of after you’ve submitted the form.



If the email address you enter on Quora matches an account, you get visual confirmation instantly.


Instant Sign-In


Quora’s log-in form also includes an option to “Let me log in without a password on this browser.” Checked by default, this setting does just what it says: it eliminates the need for you to enter a password when re-logging into Quora. All you need to do to enter the website is click on your profile picture or name on the log-in screen.



Quora’s one-click log-in page.


To go back to the standard log-in screen, just click the “x” or “Log in as another user,” and then you can sign in the hard way: by entering your email address and password.


While one-click sign-in on Quora is convenient, it doesn’t really help you across the rest of the Web. For that, many websites are turning to third-party sign-in solutions.


“Single-sign-on” solutions such as Facebook, Twitter, OpenID and more have tried to tackle log-in issues by cutting down on the number of sign-in details that people need to remember across all of the websites that they use. With these services, one account will get you into many different websites.



A sampling of single-sign-on solutions.


Logging in this way is faster, too. When someone connects their Facebook or Twitter account to a website, they simply need to click the “Sign in with Facebook (or Twitter)” button to log in. Of course, they need to be signed into their Facebook or Twitter account in order for it to work with one click. But with 50% of Facebook’s 750 million active users logging into Facebook on any given day, the odds are good that one click is all it takes.


You can see this log-in solution in action on Gowalla (screenshot below). A Gowalla user who has connected their Facebook account needs only to click on the “Log in with Facebook” option in order to sign in — provided they are already signed into Facebook, of course. If they’re not signed into Facebook, they’ll need to do that first (usually in a new dialog box or browser tab). After doing so, they will be instantly redirected to Gowalla and logged in.



Gowalla provides an option to log in using your Facebook account.


New Log-In Problems


But with these new benefits come new problems — usually in the form of too many choices. When faced with multiple sign-in options on a website, people do one of the following:



  1. They remember the service they used to sign up (or that they connected to their account), and they log in instantly. This is the best case scenario.

  2. They assume they can sign in with any third-party service (for which they have an account), regardless of whether they have an account on the website they are trying to log into. The thought process for these folks goes something like this: “It says I can sign in with Facebook. I have a Facebook account. I should be able to sign in.”

  3. They forget which service they used to sign up or if they used one at all, and thus hesitate or fail to log in.


To make matters worse, if someone picks the wrong provider, instead of signing in to the service they’re trying to use, they might end up signing up again, thereby creating a second account. While a website can do its best to match accounts from different services, there’s no completely accurate way (that I know of) to determine whether a Twitter and a Facebook account definitively belong to the same person.


So, while third-party sign-in addresses some problems, it also creates a few new ones. In an attempt to solve some of these new sign-in issues, we’ve been experimenting with new log-in screen designs on Bagcheck.


Our most recent sign-in screen (below) is an attempt to reduce confusion and prevent the types of errors I have just described — admittedly, though, at the expense of one-click sign-in. In this design, people are required to enter their user name or email address to sign in. We use instant search results to match their input to an existing user on the website, so someone needs to type only the first few letters of their name to find their account quickly. This tends to be much faster than typing an entire email address. But because more than one person is likely to have the same name, we provide the ability to sign in with an email address as well.


Once someone selects their name or enters their email address, then their options for signing in are revealed. No sign-in actions are shown beforehand.



The current Bagcheck sign-in screen does not reveal any log-in options until you select your name or enter your email address.


True, in this design people can no longer sign in with one click, because the sign-in buttons are not visible by default. But this may be a trade-off worth making, for the following reasons:



  • We keep people signed in until they explicitly sign out. So, hopefully people will rarely need to go through the sign-in process. Remember: the less people need to log in, the fewer sign-in problems you’ll have!

  • The added amount of effort required to sign in is small: just start typing your name and select a search result, or enter your complete email address, and then click the sign-in button. It’s not one-click, but it’s not a lot of work either.

  • Trying to sign in with an account provider that you have not set up on Bagcheck is no longer possible, because the log-in buttons don’t show up until after you have selected your name. This cuts down on duplicate accounts and confusion over which account you have signed up with or connected (especially on different browsers and computers where a cookie has not been set).


On mobile, however, these trade-offs may not be worth it. Logging into a website on a mobile device by typing is a lot more work than just tapping a button. So, in the Bagcheck mobile Web experience, we’ve kept the third-party sign-in buttons front and center, allowing people to log in with just one tap. It’s just another example of how the constraints and capabilities of different devices can influence design decisions.



The Bagcheck mobile Web experience keeps one-tap sign-in options visible.


Since launching this log-in experience on Bagcheck, we’ve gotten a lot of great feedback and ideas for improving the interactions. Many people have suggested using browser cookies to set a default sign-in option for returning visitors. While this might help people who return to the website using the same browser, we’ve seen many more sign-in issues when people use a different browser or computer. In these cases, a browser cookie won’t help.


Another common question is whether allowing anyone to search the list of Bagcheck users by name or email address reduces security. While this design does somewhat reduce the security of a Bagcheck account (compared to our previous log-in screen design), it’s no worse than many websites that let you sign in with your public user name, like Twitter.


And because all Bagcheck profile pages are public, users can be searched for on Google and on Bagcheck itself. Despite this, we’ve seen a bit of increased concern over this same search capability being on the sign-in screen. So, if you’re thinking about trying this design, make sure your profile pages are public, and be aware that people may still be a bit sensitive about it.


We’ve All Got Email


Although signing into a service with one’s name may be too new for some people, logging in with an email address is common practice for most. Using a solution that brings together a lot of the ideas outlined in this article, Google’s Identity Toolkit and Account Chooser allow people to sign in across the Web using just their email address:



Google’s Identity Toolkit allows people to sign in with a number of email verification options.


When multiple accounts have been accessed in the same Web browser, each account is listed as a sign-in option, making account selection easier. If you want to try out this sign-in solution, you can opt in on Google’s website or implement it on your website with Google’s Toolkit.



Selecting from multiple accounts on Google’s experimental sign-in page.


The Little Things Matter, Too


The Bagcheck and Google examples we just looked at try to rethink log-in pages in big ways. But not all sign-in innovations need to be so comprehensive. Even small changes can have a big impact. For example, I mentioned earlier that inputting text precisely on mobile devices can be harder than on full keyboard computers. Coupled with obscured password fields, this can make logging into a website on a mobile device a challenge.


Facebook’s mobile Web experience tackles this in a small but useful way. If you enter an incorrect password when trying to sign in, the website will change the password field to plain text so that you can actually see your input. Facebook also offers an alternate way to log in, using your email address or phone number (screenshot below). It’s a small enhancement but one that can go a long way on mobile.



Facebook does a lot to help you log in on mobile.


It’s Not Over


As these examples illustrate, even the most common interactions on the Web (like logging in) could benefit from new ideas and design improvements. Not every idea I’ve walked through here will become part of all the log-in forms we encounter on the Web — chances are none of them will. But without trying, we’ll never know.


So, if you have some new ideas for signing in or any other Web interaction we’ve come to take for granted, try them out and let the rest of us know what you’ve learned!






© Luke Wroblewski for Smashing Magazine, 2011.

Friday, August 19, 2011

Free Backup Utility for Windows and Linux

Free Backup Utility for Windows and Linux:

Areca Backup is one of the few backup utilities that I’ve seen that runs on both Windows and Linux operating systems. Like most backup utilities you can choose which files you want to back up and where you want to send them, which includes backing up the files to an FTP/SFTP server. You can also have the backup zipped to help save some space.


Here are some of the features the developer highlights:



  • Archives compression (Zip & Zip64 format)

  • Archives encryption (AES128 & AES256 encryption algorithms)

  • Storage on local hard drive, network drive, USB key, FTP, FTPs (with implicit and explicit SSL / TLS) or SFTP server

  • Source file filters (by extension, subdirectory, regular expression, size, date, status, with AND/OR/NOT logical operators)

  • Incremental, differential and full backup support

  • Support for delta backup (store only modified parts of your files)

  • Archive merges: you can merge contiguous archives into one single archive to save storage space.

  • As-of-date recovery: Areca allows you to recover your archives (or single files) as of a specific date.

  • Transaction mechanism: all critical processes (such as backups or merges) are transactional. This guarantees your backups’ integrity.

  • Backup reports: Areca generates backup reports that can be stored on your disk or sent by email.

  • Post-backup scripts: Areca can launch shell scripts after backup.

  • Files permissions, symbolic links and named pipes can be stored and recovered. (Linux only)


Areca Backup is developed using Java which is likely why it is available for both Windows and Linux. I generally stay away from Java apps, but I haven’t seen a backup utility this powerful for Linux before so I thought it was worth mentioning.


Areca Backup Homepage (Windows/Linux; Freeware)


Copyright © 2011 CyberNetNews.com

How to Compete If Your Job Market Is Rebounding

How to Compete If Your Job Market Is Rebounding: Hiring is bouncing back in certain pockets around the U.S. Use these tips to help recruit the cream of the crop.



7 Ways to Troubleshoot MySQL

7 Ways to Troubleshoot MySQL:

MySQL databases are a great workhorse of the internet.  They back tons of modern websites, from blogs and checkout carts, to huge sites like Facebook.  But these technologies don't run themselves.  When you're faced with a system that is slowing down, you'll need the right tools to diagnose and troubleshoot the problem.  MySQL has a huge community following and that means scores of great tools for your toolbox.


1. Use innotop


Innotop is a great tool for MySQL which, despite the name, monitors MySQL generally as well as InnoDB usage.  It's fairly easy to install: just download the perl script. Be sure to add a [client] section to your local user's .my.cnf file (you have one, don't you?).  Inside that section, place one line with "user=xyz" and one line with "password=abc".
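
In other words, your ~/.my.cnf would contain something like this (the user and password are placeholders, of course):

[client]
user=xyz
password=abc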


If you're concerned that installing something new is too complicated, use the poor man's version:


$ watch 'mysqladmin proc'

2. Analyze & Tune Queries


You've heard it before, so I hope you're already doing it.  Not sure where to start?  Enable your slow query log first.  Set the long_query_time low enough to capture some queries.
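
For example, a my.cnf fragment along these lines (the one-second threshold is just a starting point; pre-5.1 servers use the older log_slow_queries option instead of slow_query_log):

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 1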


Then use the great maatkit tool called mk-query-digest to analyze the slow.log file.  The results will amaze you.  You'll quickly be able to sift for the heaviest four or five queries.
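
Running it is as simple as pointing it at the log, for example:

$ mk-query-digest /var/log/mysql/slow.log > slow-report.txt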


Next do an EXPLAIN for each of those queries from the command line prompt.  Isolate those lines which return a large number of rows.  Attempt to add columns in the WHERE clause to reduce these.  Already have a where clause?  Be sure that column is indexed.  Try to reduce sorting, consider how you are joining tables, and so forth.  Eliminate SELECT * and instead specify which columns you want, and which rows.
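
A hypothetical round of that tuning, with made-up table and column names:

-- see how the heavy query is actually executed
EXPLAIN SELECT order_id, total FROM orders WHERE customer_id = 42\G

-- a large "rows" estimate with key = NULL suggests a missing index
ALTER TABLE orders ADD INDEX idx_customer (customer_id);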


All these changes will make a huge impact on performance.


3. Verify & Troubleshoot Replication


Have you ever wondered if your replication is working properly?  Do you wonder if all the data in your slave database perfectly matches your master database?  As it turns out, the slave can silently drift out of sync with the master.  If we are lucky, MySQL returns an error when this happens.


In either case you'll need a tool to help you.  There are two included in Maatkit which will come to your rescue.  The first, mk-table-checksum, is run periodically on the master.  It creates checksums in a table, and that table's data propagates through replication to the slave.  You then run another check with the same tool to verify the slave.


What happens if I find differences?  Well, then you'll turn to another wonderful Maatkit tool, mk-table-sync, designed specifically for that purpose.  Check the manual for details on usage.
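
From memory of the Maatkit documentation (do check the manual pages for the exact options), the workflow is roughly:

# on the master: write per-chunk checksums into a table that replicates to the slaves
$ mk-table-checksum --replicate=test.checksum h=master_host

# check the slaves (found by recursing from the master) against those checksums
$ mk-table-checksum --replicate=test.checksum --replicate-check=1 h=master_host

# preview, then apply, the statements needed to bring a slave back in sync
$ mk-table-sync --sync-to-master --print h=slave_host
$ mk-table-sync --sync-to-master --execute h=slave_host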


4. Use Aspersa


Another great tool by Baron Schwartz, author of Maatkit, is the Aspersa toolset.  It helps you collect system information and then do performance profiling on that data.  A great diagnostic tool.


5. General Database Tuning


You may want to get a bird's eye view of what is happening in your database.  How many tables are InnoDB versus MyISAM?  Am I using indexes well?  Am I doing too many joins?  Are there some memory settings that are not quite tweaked as well as they could be?


To be sure, much of this type of knowledge is learned with years of experience, but there is a tool which can give you a healthy start.  Enter MySQLTuner!


Grab a copy with wget.  Since they grabbed a .pl top-level domain, the command is quite simple to remember:


$ wget mysqltuner.pl

Then set execute permissions, and go!  Be sure to edit your .my.cnf file and include a [client] section so the tool can pick up your user/pass automatically!
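
Assuming the download landed in a file called mysqltuner.pl, that's just:

$ chmod +x mysqltuner.pl
$ ./mysqltuner.pl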


6. Database Performance and Load Testing


As of the 5.1 release of MySQL, all distributions include a really great tool for doing performance and load testing called mysqlslap.  You can hand it a query and tell it to run that query 10 times, 100 times, or in 10 threads, and so forth.  It will fire consecutive queries at the database, and you can then capture the slow query log, or use innotop and the OS monitoring tools to see how the server responds.
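
A hypothetical invocation (the query, schema and numbers are placeholders):

$ mysqlslap --user=username --password \
    --create-schema=test \
    --query="SELECT COUNT(*) FROM members WHERE joined >= '2000-01-01'" \
    --concurrency=10 --iterations=100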


7. Operating System Monitoring Tools


Most Unix folks are familiar with the interactive monitoring tool top.  If you're not, please fire it up and watch it for a little while.  It will give you insight into what's happening right now on your server.  Please also take a look at iostat to see what is happening as far as disk I/O goes.  Databases are generally disk hungry, so keep a close eye on this output. A tool called mpstat will give insight into what your processors are doing.  Incidentally, top can be configured to show all of the individual processors as well, so check the options there for details.  Lastly, sar can come to your assistance.  The acronym stands for system activity reporter, and it tells you what Linux knows about the activity on the box.
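
Typical invocations look like this (the five-second interval is arbitrary):

$ top                # interactive overview; press 1 to show each CPU individually
$ iostat -x 5        # extended per-device disk I/O statistics every 5 seconds
$ mpstat -P ALL 5    # per-processor utilization
$ sar -u 5 5         # CPU activity: five samples, five seconds apart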

5 Ways to Boost MySQL Scalability

5 Ways to Boost MySQL Scalability:

There are a lot of scalability challenges we see with clients over and over. The list could easily include 20, 50 or even 100 items, but we shortened it down to the biggest five issues we see.


1. Tune those queries


By far the biggest bang for your buck is query optimization. Queries can be functionally correct and meet business requirements without being stress tested for high traffic and high load. This is why we often see clients with growing pains and scalability challenges as their site becomes more popular. This also makes sense: it wouldn't necessarily be a good use of time to tune a query for some page off in a remote corner of your site that doesn't receive real-world traffic. So some amount of reactive tuning is common and appropriate.


Enable the slow query log and watch it. Use mk-query-digest, the great tool from Maatkit to analyze the log. Also make sure the log_queries_not_using_indexes flag is set.  Once you've found a heavy resource intensive query, optimize it!  Use the EXPLAIN facility, use a profiler, look at index usage and create missing indexes, and understand how it is joining and/or sorting.


2. Employ Master-Master Replication


Master-master active-passive replication, otherwise known as circular replication, can be a boon for high availability, but also for scalability.  That's because you immediately have a read-only slave for your application to hit as well.  Many web applications exhibit an 80/20 split, where 80% of activity is read or SELECT and the remainder is INSERT and UPDATE.  Configure your application to send read traffic to the slave or rearchitect so this is possible.  This type of horizontal scalability can then be extended further, adding additional read-only slaves to the infrastructure as necessary.


3. Use Your Memory


It sounds very basic and straightforward, yet the details are often overlooked.  At a minimum, be sure to set these (a sample my.cnf fragment follows the list):



  • innodb_buffer_pool_size

  • key_buffer_size (MyISAM index caching)

  • query_cache_size - though beware of issues on large SMP boxes

  • thread_cache & table_cache

  • innodb_log_file_size & innodb_log_buffer_size

  • sort_buffer_size, join_buffer_size, read_buffer_size, read_rnd_buffer_size

  • tmp_table_size & max_heap_table_size
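
A sketch of the corresponding my.cnf fragment (variable names as of MySQL 5.1; every size below is an arbitrary example and must be adjusted to your RAM and workload):

[mysqld]
innodb_buffer_pool_size = 8G      # the big one on a dedicated InnoDB server
key_buffer_size         = 256M    # MyISAM index caching
query_cache_size        = 64M     # beware of issues on large SMP boxes
thread_cache_size       = 16
table_cache             = 2048
innodb_log_file_size    = 256M
innodb_log_buffer_size  = 8M
sort_buffer_size        = 2M
join_buffer_size        = 2M
read_buffer_size        = 1M
read_rnd_buffer_size    = 4M
tmp_table_size          = 64M
max_heap_table_size     = 64M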


4. RAID Your Disk I/O


What is underneath your database?  You don't know?  Well, please find out!  Are you using RAID 5?  This is a big performance hit.  RAID 5 is slow for inserts and updates.  It is also almost non-functional during a rebuild if you lose a disk - very, very slow performance.  What should you use instead?  RAID 10 mirroring and striping, with as many disks as you can fit in your server or RAID cabinet.  A database does a lot of disk I/O even if you have enough memory to hold the entire database.  Why?  Sorting requires rearranging rows, as do GROUP BY, joins, and so forth.  Plus the transaction log is disk I/O as well!


Are you running on EC2?  In that case EBS is already fault tolerant and redundant, so give your performance a boost by striping only (RAID 0) across a number of EBS volumes using the Linux md software RAID.
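
As a sketch (the device names, the choice of four volumes and the XFS filesystem are assumptions for the example):

# stripe four attached EBS volumes into a single RAID 0 device
$ mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
$ mkfs.xfs /dev/md0
$ mount /dev/md0 /var/lib/mysql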


5. Tune Key Parameters


These additional parameters can also help a lot with performance.


innodb_flush_log_at_trx_commit=2


This speeds up inserts & updates dramatically by being a little bit lazy about flushing the innodb log buffer.  You can do more research yourself but for most environments this setting is recommended.


innodb_file_per_table


InnoDB was developed, like Oracle, with the tablespace model for storage.  Apparently the developers didn't do a very good job of it: the default setting of using a single shared tablespace turns out to be a performance bottleneck, with contention for file descriptors and so forth.  This setting makes InnoDB create a tablespace and underlying data file for each table, just like MyISAM does.

The 5 Biggest Ways to Boost MySQL Scalability

The 5 Biggest Ways to Boost MySQL Scalability:

Sean Hull with some really good MySQL advice:



  1. Tune those queries. Biggest bang for your buck. Enable the slow query log and watch it. Once you've found a heavy resource intensive query, optimize it! Tune what receives real-world traffic. 

  2. Employ Master-Master Replication. Immediately have a read-only slave for your application to hit as well.

  3. Use Your Memory. Set innodb_buffer_pool_size, key_buffer_size and other key options.

  4. RAID Your Disk I/O. Use RAID 10 mirroring and striping. On EC2 striping across a number of EBS volumes using the Linux md software raid.

  5. Tune Key Parameters. Speed up inserts & updates with innodb_flush_log_at_trx_commit=2; create a tablespace and underlying datafile for each table with innodb_file_per_table.


Full details at the original article.  



How To Build A Better Web Application For Your Business

How To Build A Better Web Application For Your Business:







Are you fed up with hearing about yet another Silicon Valley Web application built with fairy dust and funded by magic pixies? If so, this post is for you. Most of us will never get to work on a Web application that is funded by venture capital and for which the business aims are a secondary consideration. For us, developing a Web application is about meeting a particular business need as part of our job working with some large organization.


Whether as an in-house developer or as part of an agency, we work under strict business constraints and with limited budget and time. Personally, I thrive on this. But it is challenging, so finding the right approach is crucial. In my time of working on Web applications for businesses, I have identified three secrets that seem to make things go a lot smoother:



  1. Focus on user tasks and not features,

  2. Don’t try to solve everything and

  3. Ask the right questions early on.


Let’s begin by looking at user tasks:


Focus On User Tasks, Not Features


When you’re asked to build a Web application, you do exactly that: build a Web application. You have not been asked to solve a business problem, nor to make it easy for the user group to complete a particular task. Instead (at least in my experience), your job is to add certain features and build a specific type of application.


Unfortunately, this is a dangerous approach. By focusing on the application you are building, the emphasis is firmly on technology and functionality, not the users’ needs or the underlying problem to be solved.


Take a Step Back


A good development team will step back at the beginning of a project and look at the underlying issues that have led to the application being initiated.



  • Spend time with those who will use the application. Observing how users complete the tasks you are trying to simplify is more enlightening than any specification document.

  • Actually speaking to those who will be interacting with your application on a daily basis will create a much more effective solution than blindly following the directives of whoever commissioned the project.


Make User Testing Part of the Development Process


User testing is key to getting to know the user. Aim to test the application at least once a month throughout the entire development cycle. This does not need to be expensive or time-consuming. Rather, each session needs only three or four users and should be easily completed within a morning. This allows the entire development team to take part in these sessions and be involved in the debriefing, which can happen over lunch.


For more information on this “budget” approach to usability, I recommend Steve Krug’s latest book Rocket Surgery Made Easy.


When it comes to building Web applications for the business, task completion is king. Features merely exist to help users complete tasks.


Which brings me to the next secret…


Don’t Try To Solve Everything


If you fail to stay focused on user needs and business goals, things can get out of hand. These kinds of projects tend to suffer particularly badly from scope creep. Once people in your organization see the potential of the application, they will start suggesting ideas for new functionality. The problem is that with every new feature comes more complexity. This can ultimately undermine the effectiveness of the app. When developing a Web application, I urge our clients to start simple.



  • Predicting how users will respond to your application can be hard, and a lot of time and money can be wasted building features that no one actually uses.


Monitor Your Key Performance Indicators


Once the simple application has launched, move into a phase of monitoring key performance indicators. This will help you judge the success of the app.


The indicators will vary between projects. However, establishing at the beginning of the process how the success of the app will be measured is important. Combined with user feedback, this monitoring provides a clearer picture of where you should go next. But be careful with user feedback.


Don’t Overreact to User Feedback


Users often react negatively to change. Learning a new system takes time, even if it ultimately is easier to use. Users will inevitably complain and make a plethora of suggestions.


Don’t react too quickly to these suggestions. Daniel Burka once told me from his time at Digg that they allow at least two weeks before reacting to user feedback. Allow users time to adjust to the application before making changes.



Users don’t like change, as Facebook has discovered.


But that is not an excuse for ignoring the opinions of users. In fact, you should carefully gather as much feedback as you can.


Sometimes Technology Is Not the Answer


Interestingly, many of the suggestions made by project stakeholders (not users) revolve around management issues, such as reporting, workflow and monitoring.


While these suggestions are sometimes valid, I have found that the simplest solution to these problems is usually managerial, not technical. For example, a number of clients have asked me for workflow functionality in their content management systems, so that documents cannot be published without approval from elsewhere in the organization. Of course, this is entirely possible to build. In fact, it comes standard in most content management systems.


But I usually wonder whether it would be easier just to tell content providers not to publish a document before it’s checked by someone else. Does this really need a technical solution when a simple policy would do the job?


If more features add more complexity, perhaps we should not solve every problem with a new feature. We could always add that functionality later if it really is required. Of course, that depends on whether the application is easy to expand.


Make It Expandable


Because our feature set is likely to change based on user feedback and business aims, building the application in anything but the most flexible way would be unwise, especially if we purposely haven’t added all of the intended functionality for launch.


Making an application flexible is obviously not easy. But if the application has a plug-in infrastructure from the beginning, then adapting it over time becomes easier. The trick is to recognize from the outset of the project that you do need flexibility. Which brings us to the next point:


Ask the Right Questions Early On


When building a Web application, nothing is worse than surprises. Make sure you have all the facts before beginning. Of course, you cannot know what you don’t know. But the trick is to know the right questions to ask before building. Too often, we focus on the wrong types of questions, such as:



  • Will this application get internal approval?

  • How will person X respond if we take this approach?

  • Does this conform to our branding guidelines?

  • How will this content be managed internally?


Focusing on these kinds of internal-facing questions may get the project approved faster, but it will lead to a far less effective application. In my experience, four particular questions, if neglected, will cause most problems in the development process:



  1. What is the hosting environment?

    When dealing with complex Web applications, knowing the hosting environment is important. Without knowing the environment, you cannot replicate it exactly on your development server, which increases the risk of incompatibilities down the line.

  2. How will users be authenticated?

    Most Web applications require users to identify themselves. Realizing late in the game that this authentication has to happen a particular way or be integrated with some legacy system creates all kinds of headaches. Many companies have a central user-authentication system, and your application will probably have to use it.

  3. How will data be backed up?

    Web applications often hold valuable data, some of which is confidential. This means that having a solid back-up plan is both business-critical and potentially complicated. By considering from the outset how to handle back-ups, you keep this from becoming a serious problem later in the development process.

  4. Is there any legacy data?

    Many new applications will replace existing systems that contain a lot of legacy data. Knowing exactly what this data is and having a plan in place to migrate it to the new system is important.


Learn From Your Mistakes


Every Web application presents unique challenges. Over time, though, you learn from your mistakes and discover the key issues. Whether it is focusing on users’ needs, keeping things simple or asking the right questions, these lessons will be invaluable going forward.


However, there is also an opportunity to learn from one another. Unfortunately, many development teams toil away in isolation within large organizations. Articles like this should stimulate discussion and encourage us to share our experiences — both good and bad — of working on these little-heard-of Web apps.


I hope you will take the time to share your experiences in the comments, so that we can come up with new best practices for developing Web applications in our businesses.






© Paul Boag for Smashing Magazine, 2011.

Three things clients and customers want

Three things clients and customers want:

Not just the first one.


And not all three.


But you really need at least one.


1. Results. If you can offer a return on investment, an engineering solution, more sales, no tax audits, a cute haircut, the fastest rollercoaster, a pristine beach, reliable insurance payouts at the best price, peace of mind, productive consulting or any other measurable result, this is a great place to start.


2. Thrills. More difficult to quantify but often as important, partners and customers respond to heroism. We are amazed and drawn to over the top effort, incredible risk taking on our behalf, the blood, sweat and tears that (rarely) comes from a great partner. A smart person working harder on your behalf than you'd be willing to work--that's pretty compelling.


3. Ego. Is it nice to feel important? You bet. When you greet us at the door with a glass of white wine, put our name in the lobby of the hotel, actually treat us better than anyone else does (not just promise it, but do it)... This can get old really fast if you industrialize and systemize it, though.


This explains why the local branch of the big insurance company has trouble growing. It's hard for them to outdeliver the other guys when it comes to the cost effectiveness of their policy (#1). They are unsuited from a personality and organizational point of view to do #2. And they just can't scale the third.


Put just about any business with partners into this matrix and you see how it works. Book publishing, for sure. Hairdressers. Spas. Even real estate.


The Ritz Carlton is all about #3, ego, right? And on a good day, there's a perception that the guys at Apple are hellbent on amazing us yet again, delivering on #2, taking huge career and corporate risks on our behalf. As soon as they stop doing that, the tribe will get bored.


(There's a variation of ego, #3, that comes from being in good company. This is what gets people to sign up for Davos, or to choose ICM as their agent. Your ego is stroked by knowing that only people as cool as you are part of this gig. Sort of the anti-Groucho opportunity. Nice position, if you can get it, because it scales.)


It's tempting, particularly for a small business, to obsess about the first—results—to spend all its time trying to prove that the ROI is higher, the brownies are tastier and the coaching is more effective. You'd be amazed at how far you can go with the other two, if you commit to doing it, not merely talking about it.



"I'm under a lot of pressure..."

"I'm under a lot of pressure...":

The ellipsis hides the most important part of this sentence:


"I'm under a lot of pressure from myself."


When you have a big presentation or a large speech or a spreadsheet due, the pressure you feel is self-induced. How do I know? Because stuff that felt high-pressure a few years ago is old hat to you now. Because it used to be hard for you to speak to ten people, and now it takes a hundred or a thousand for you to feel those butterflies. Because not only do you get used to it, you thrive on it.


Unless you're in a James Bond movie, it's really unlikely that the pressure that you're feeling is anything but self-induced.


What you do with the pressure is up to you. If it's not helping you do great work, don't embrace it. Pressure ignored ceases to be pressure.