Mustali Kachwala's Blog: An Introduction To Programming Type Systems

An Introduction To Programming Type Systems | Smashing Coding:

Static typing is great because it keeps you out of trouble. Dynamic typing is great because it gets out of your way and lets you get your work done faster. The debate between strongly and dynamically typed languages rages on, but understanding the issue starts with weak typing and languages such as C.

C treats everything like a number. A character like aor 7 or % is the number of the ASCII symbol representing it; “true” and “false” are just 1 and 0.

C defines variables with types such as int for integer and char for character, but that just defines how much memory to use. To access the variable and print it out, I need to know the type.

int a = 1;
printf("The number is %i\n", a);

char z = 'z';
printf("The character is %c\n", z);

When I run this program, it shows this:

The number is 1 The character is z

The printf function needs a hint from us to know how to format the variable. If I give the wrong hint…

char z = 'z';
printf("The character is %i\n", z);

… then I get this:

The character is 122

C doesn’t know whether z is a character or an integer; it has a weak type.

Weak typing is fast because there’s no overhead of remembering the different types, but it leads to some nasty bugs. There’s no way to format the z if you don’t know its type ahead of time. Imagine accessing a variable and getting the number 1229799107. The number could be the result of a mathematical calculation (1,229,799,107), the cost of a government program ($1.2 billion) or a date (Saturday, 20 December 2008). When all you have is the number, there’s no way to know that it’s really the code for the letters in my name: zack.

So far, we’ve just covered weak typing. “Weak versus strong” and “static versus dynamic” are often spoken of synonymously, but they each describe a different phase of the same problem. They’re also politicized, with the words carrying an implied value judgment. “Weak versus strong” makes one of them sound a lot better than the other, while “static vs. dynamic” makes one sound stodgy and the other exciting.

(Image: Cris)

These two terms are used interchangeably, but they describe the difference between the way a language defines types and how it figures them out when the program runs. For example:

Weak with no typing	Assembly languages don’t provide any way of specifying or checking types at all. Everything is just a number.
Weak static typing	C lets you define object types as structures, but it doesn’t do much to enforce or remember them. C automatically convert between many types. C++ and Objective-C go further with the definitions but still don’t enforce the resulting types.
Strong static typing	Java forces you to define all types and checks them with a virtual machine.
Strong dynamic typing	Python, JavaScript and Ruby dynamically infer the types of objects, instead of forcing you to define them, and then enforce those types when the program runs in the interpreter. All dynamically typed languages need a strong typing system at runtime or else they won’t be able to resolve the object types.
Weak dynamic typing	Dynamically inferred types don’t work in a weakly typed language because there aren’t any types to infer.

No programming language fits any of these definitions 100%. Java is considered one of the most static languages, but it implemented a comprehensive reflection API that lets you change classes at runtime, thus resembling more dynamic languages. This feature allows the Java Virtual Machine to support very dynamic languages such as Groovy.

Functional programming languages such as Lisp, Erlang and Haskell blur the lines even more.

Usually when people argue about the merits of strong versus weak programming languages, they really mean the varying degrees of weak, strong, static and dynamic philosophies in every language.

(Smashing’s Note: We are organizing a series of full-day workshops with some of the most well-respected Web design experts — taking place in our headquarters in Freiburg, Germany. Seats are very limited, so grab your workshop ticket today!)

Weak Static Languages: C, C++ And Objective-C

These next programming languages use a subset of C’s functionality and strict guidelines to improve the loose nature of the language. C++ and Objective-C compile into the same bytes as C, but they use the compiler to restrict the code you can write.

In C++ and Objective-C, our number (1229799107) has a meaning. I can define it as a string of characters and make sure that no one uses it as a currency or a date. The compiler enforces its proper use.

Static typing supports objects with sets of functionality that always work in a well-defined way. Now I can create a Person object and make sure the getName function always returns the string of someone’s name.

class Person {
    public:
        string getName() {
            return "zack";
        }
};

Now I can call my object like this:

Person p;
printf("The name is %s\n", p.getName().c_str());

Static typing goes a long way to avoid the bugs of weakly typed languages by adding more constraints in the compiler, but it can’t check anything when the program is running because a C++ or Objective-C program is just like C code when it runs. Both languages also leave the option of mixing weakly typed C code with static typed C++ or Objective-C to bypass all of the type checking.

Java goes a step beyond that, adding type checking when the code runs in a virtual machine.

Strong Static Languages: Java

C++ offers some stricter ways of using C; Java makes sure you use them. Java needs everything to be defined so that you know at all times what type of object you have, which functions that object has and whether you’re calling them properly.

Java also stopped supporting C code and other ways of getting out of static typing.

The Person object looks almost the same in Java:

public class Person {
    public String getName() {
        return "zack";
    }
}

I get the name by creating a new object and calling the getName function, like this:

public class Main {
    public static void main (String args[]) {
        Person person = new Person();
        System.out.println("The name is " + person.getName());
    }
}

This code creates a new Person object, assigns it to a variable named person, calls the getNamefunction and prints out the value.

If I try to assign my person variable to a different type, such as a character or integer, then the Java compiler will show an error that these types are incompatible. If I was calling a separate API that had changed since I compiled, then the Java runtime would still find the type error.

Java doesn’t allow code outside of a class. It’s a major reason why people complain that Java forces you to write too much boilerplate.

The popularity of Java and its strong adherence to strong typing made a huge impact on the programming landscape. Strong typing advocates lauded Java for fixing the cracks in C++. But many programmers found Java overly prescriptive and rigid. They wanted a fast way to write code without all of the extra definition of Java.

Strong Dynamic Languages: JavaScript, Python, Ruby And Many More

In JavaScript, I define a variable with the keyword var, instead of a type like int or char. I don’t know the type of this variable and I don’t need to until I actually want to access it.

I can define an object in JavaScript with the getName function.

var person = {
    getName: function() {
        return 'zack';
    }
};

alert('The name is ' + person.getName());

Now I have an object named person, and it has a function named getName. If I call person.getName(), it will result in zack.

I declared person as a var, and I can reassign it to anything.

var person = {
    getName: function() {
        return 'zack';
    }
};

person = 5;

alert('The name is ' + person.getName());

This code creates a variable named person and assigns it to an object with a getPerson function, but then it reassigns that variable to the number 5. When this code runs, the result is TypeError: Object 5 has no method 'getName'. JavaScript says that the object 5 doesn’t have a function named getName. In Java, this error would come up during compilation, but JavaScript makes you wait for runtime.

I can also change the type of the object based on the conditions of the program.

var person = {
    getName: function() {
        return 'zack';
    }
};

if (new Date().getMinutes() > 29) {
    person = 5;
}

alert('The name is ' + person.getName());

Now this code will work at 9:15 but will fail at 9:30. Java would call this a type error, but it’s fine in JavaScript.

The most popular form of dynamic typing is called “duck typing” because the code looks at the object during runtime to determine the type — and if it walks like a duck and quacks like a duck, then it must be a duck.

Duck typing enables you to redefine any object in the middle of the program. It can start as a duck and turn into a swan or goose.

var person = {
    getName: function() {
        return 'zack';
    }
};

person['getBirthday'] = function() {
    return 'July 18th';
};

alert('The name is ' + person.getName() + ' ' + 
      'and the birthday is ' + person.getBirthday());

At any point, I can change the nature of my Person object to add the new getBirthday function or to remove existing functionality. Java won’t allow that because you can’t check object types when they’re always changing. Dynamically redefining objects gives you a lot of power, for good and bad.

C shows errors when the program runs. C++, Objective-C and Java use the compiler to catch errors at compile time. JavaScript pushes those errors back to the runtime of the application. That’s why supporters of strong typing hate JavaScript so much: it seems like a big step backward. They’re always looking for JavaScript alternatives.

Which Is Better?

I’m looking for a program to parse XML, find a particular element, make a change and save the file. On a team of Java programmers, I wrote the code in the dynamic language Python.

import sys
import string
from xml.dom.minidom import parse

dom = parse(sys.argv[1])

for node in dom.getElementsByTagName('property'):
    attr = node.attributes['name'];
    if attr.value == 'my value':
        node.childNodes[0] = dom.createTextNode('my new value');

file = open(sys.argv[1], 'w');
file.write(dom.toxml('UTF-8'));
file.close();

This program finds every property node with the name my value and sets the contents to my new value. I define the variables dom for my XML document, node for each node of XML that I find, andattr for the attribute. Python doesn’t even require the keyword var, and it doesn’t know that nodehas childNodes or that attr has value until I call it.

To change an XML file in Java, I would write a new class, open an input stream, call the DOM parser, traverse the tree, call the right methods on the right elements, and write the file out to an output stream. I could simplify some of those calls with a library, but I’d still have the overhead of defining my static types. All of the extra definition of objects and variables could easily take a hundred lines. Python takes 14.

Dynamic code is generally shorter than static code because it needs less description of what the code is going to do. This program would be shorter in Python than in C++ and shorter in Ruby than in Objective-C.

So, which is better?

The static programmer says:	The dynamic programmer says:
“Static typing catches bugs with the compiler and keeps you out of trouble.”	“Static typing only catches some bugs, and you can’t trust the compiler to do your testing.”
“Static languages are easier to read because they’re more explicit about what the code does.”	“Dynamic languages are easier to read because you write less code.”
“At least I know that the code compiles.”	“Just because the code compiles doesn’t mean it runs.”
“I trust the static typing to make sure my team writes good code.”	“The compiler doesn’t stop you from writing bad code.”
“Debugging an unknown object is impossible.”	“Debugging overly complex object hierarchies is unbearable.”
“Compiler bugs happen at midmorning in my office; runtime bugs happen at midnight for my customers.”	“There’s no replacement for testing, and unit tests find more issues than the compiler ever could.”

Static typing made our Person object easier to understand. We defined a Person with a name and agreed about which fields mattered ahead of time. Establishing everything clearly makes our Personeasier to debug but harder to change.

What happens when someone in our application needs a second email address? In a static language, we’d need to redefine the object so that everyone has two email addresses, even though most people don’t. Add in a birthday, favorite color and a few more items and every Person will have twice as many fields as they need.

Dynamic languages make this problem much easier. We can add a second email field to onePerson without adding it to everyone. Now, each object has only the fields it needs. A static language could handle this with a generic map of values, but then you’re fighting the static environment to write dynamic code. C programmers spent years tearing their hair out over errors in type conversions, corrupt values, and the terrible bugs that come from small typos. They’ve been burnt by weak typing, and dynamic typing looks weak.

Dynamic programmers spent years banging their heads over the rigidity of static languages, and they crave the freedom to make the languages do what they want.

I’ve seen static code get overly complex and become impossible to follow. Try debugging anenterprise JavaBean or understanding all of the details of generics in Java. I’ve seen dynamic code turn into a giant mound of unmaintainable spaghetti. Look at the myriad of terrible JavaScript programs before jQuery. Node.js does some amazing things, but I can’t look at it without traumatic flashbacks of horrible JavaScript that I’ve debugged.

Conclusion

There’s no clear conclusion. Dynamically typed languages are popular now. The pendulum will swing back and forth many times in the coming years. The only solution is flexibility. Learn to work in each environment and you’ll work well with any team.

Mustali Kachwala's Blog

Search This Blog

Thursday, April 18, 2013

An Introduction To Programming Type Systems | Smashing Coding

Weak Static Languages: C, C++ And Objective-C

Strong Static Languages: Java

Strong Dynamic Languages: JavaScript, Python, Ruby And Many More

Which Is Better?

Conclusion

No comments:

Post a Comment