Python is soup-in-a-cup
Introduction: we’ve had smarter language wars than this
Judging from the blogs — at least from Reddit, which is where I get my daily dose of blog — functional programming is frankly on the rise. Part of it probably is due to OOP’s failure to materalize its promise of easy-to-maintain, flexible, reusable code. Part of it is also the rise of FP-centered languages around relatively recent developments in theoretical computer science — like the MLs and Haskell, a little further away from academia, Erlang. There’s finally the visible hand of Moore’s Law making high-level languages ever more practical.
Also judging from the blogs there’s a distinct zeitgeist of rejecting “static” languages in favor of “dynamic” ones. This is partly an artifact of fast computers finally making serious work in dynamic languages feasible, but also a rejection of the Big Architecture discipline of the C ++/Java weltanschauung.
Being as easy as it is to make noise with a blog, this has led to a lot of uninformed ranting. Even Steve Yegge, who’s supposed to be a smart programmer with a track record that shames mine, has done his share of pointless rambling on how “static” languages compile to ‘hardware’, to ‘Pinocchio’-like wooden boys that aren’t “real”, while dynamic languages yield “living” software somehow.
This, of course, makes absolutely no sense.
The Either/Or approach to classifying computer languages
First of all, this dichotomy mixes up type-checking strategies with evaluation models. Dynamically-typed languages check types at runtime, while statically-typed languages do some form of type-checking that ensures no type error will arise once a program manages to compile. On the other hand, statically-compiled languages yield an executable that does what it needs to do, while dynamic languages are either interpreted — that is, require a separate runtime environment that parses source in text form — or allow some form of pseudocompiling that mangles source into some form of object code and bundle a runtime environment.
This distinction deserves repetition.
|at compile-time||at run-time|
|compilation||to a ‘native’ executable||“static” languages (C, C++, Haskell)||???*|
|to object code for some runtime||“VM” static languages (Java)||“Dynamic” languages|
(* To the best of my knowledge, that can’t be done. How would you do runtime type checking without a runtime environment?)
Evidently, there’s some empirical correlation here — enough to cause massive conceptual confusion in Steve Yegge’s brain. The table above nevertheless represents the standard wisdom about conceptual dimensions to be emphasized; this is supposed to be a technical choice based on pros and cons of each approach:
|static typing||dynamic typing||static compilation||PRO: computationally most efficient; amenable to engineering at all levels. Safe safe safe!
CON: major loss of flexibility. Slower development time. Broken type systems in most languages have no concept of inference — and worse yet, often allow type casts.
|PRO: designing one is likely to get you big bucks, a McArthur Genius Award or even beatification by the Roman Catholic church.
CON: Probably doesn’t exist.
|Interpreted/bundled runtime||PRO: portability without sacrificing safety.
CON: Interpreted languages are generally inefficient; bundled runtimes are generally humongous; ultimately no control of what the virtual machine/interpreter at the user’s side might choose to do.
PRO: Faster development process. In noncrippled languages of the type, metaprogramming. Maximum flexiblity! Agility! Ooh, aah!
CON: Computationally inefficient. Hard to catch bugs before they happen in the wild. Architecture is a matter of faith. Haskell hackers mock you.
Conventional wisdom holds that once you choose what to do about compilation and type-checking strategies, you’ve basically bound yourself to the above pros and cons. Fortunately, the conventional wisdom is essentially wrong.
Is type-checking really this much of an issue?
Why do computers bother to have the concept of a type? At some point, machines mangle around bytes, but even in the lowest-level user languages, some facilities to deal with numbers, strings, etc. become desirable — to the point there are even typed assembly languages.
Given that, there’s no algebraically correct method for summing a number and a string. None. The best you can do is to arbitrarily overload the “+” operator to become “concatenate . toString” so
"Oh bitty box!" + 2 becomes
"Oh bitty box!2", but that won’t map so easily to the rest of the ordinary operations over numbers. Worse yet, numbers no longer are a ring, nor an integrity domain, nor an abelian group; mathematicians, who know this stuff better than us, have decided we don’t want to live in a world where ordinary numbers are not an integrity domain.
What I’m trying to tell you here is that computers choke at machine type mismatches, and that’s a fact of life. All the available dynamic languages won’t help you the least when you try to divide a floating point number by a string. To be feasible, a computation must first make sense. When you see things from this point of view, does it make sense not to try and detect all possible machine type clashes at compile-time so nothing sad happens at compile-time? It does if the type model your language offers you is inconvenient enough.
First of all, digital computers being discrete, finite machines, they’re not good at handling numbers in the sense we’ve accustomed ourselves to deal with in real analysis at all. At the machine level, computers can’t handle the real continuum, or abitrarily large numbers as a type because they’re supposed to enclose values in fixed-sized boxes so memory can be managed. That is, computers don’t deal with “numbers” in the flexible sense we use — they deal with variations of floating point numbers and bounded integers.
When computer languages reflect this fact — and it’s often useful that they do; declaring all numbers to be the largest, most precise floating-point type would lead to imprecisions and waste memory space — you often find out you can’t add two numbers!
int a = 2 can’t be summed with
float b=2.0. That has to be a drag, which makes run-time type checking more attractive: just let code compile, and let’s decide at runtime what machine type this “2″ will correspond to. This, in guidospeak, is “duck typing”: if it walks like a duck and quacks like a duck, then it’s a duck.
2.0 + 2 will still fail to run, though — implicit type casts are evil.
Second of all, in the type model available to most languages, you have to explicitly declare every type. This is very, very boring. This is the stuff that makes for entire lives spent in quiet desperation.
Does static typing have to be like this? Not at all; none of these problems relate directly to the fact that machine types are checked at compile-time.
The first problem has to do with the fact that most languages have no concept of “type classes” built around common operations. In a language with a good type model, I can just say
double x = 2*x
The type of this will be
(Num a)=>a->a, which means this function is correct for every type that implements the operations of Num.
The second problem has to do with compilers for static languages being bone-headed (or actually, the typing models not being smart enough to allow for smart behaviour). We’ve stated the type of
double above; does anyone really think that applies to a string? Why can’t compilers know this? Well, this is
1958 1969 1978 1985 1987 1998 2006 already: they can!
Given this, the only advantage of dynamic typing is that it lets you compile massively broken code, only to see it break when people attempt to run it. I can only attribute the fact that people prefer to catch bugs at runtime rather than at compile-time to the fact that they’ve learned that static typing means boring, repetitive, soul-crushing grunt-work. Except there has been a mathematical theory of type-inference since the late 50s, it’s been perfected for decades to the point of a great piece of art — and most programmers don’t know about this! Instead, they rely on unit tests like an engineer who, for lack of knowing that differential calculus does half of his physics for him, runs extensive simulations for months on end before trusting something.
How many roads must a program construction calculus walk down before you call it a language?
Python’s signature war cry is “batteries included”. That’s supposed to mean that it includes many nice libraries with common operations for everything from scientific computing to web development. Python goes further than that, though: Python — as many other like-minded dynamic languages — is soup-in-a-cup.
The word “Python” refers to (a) a programming language standardized in Neuron #385978 inside Guido van Rossum’s brain — that is, a formal system one can specify computations in — (b) a system for manipulating programs in Python — which makes it a dynamic language in the sense of not being statically compiled (c) an object system over Python and (d) an interpreter.
You can’t have some of it. Your soup-in-a-cup comes with carrots, deal with it. You can’t decide you don’t need the benefits of metaprogramming and would rather have a faster program. You can’t decide to use an alternative interpreter, because no one really knows what Python is supposed to work like except the core development group. You can’t even decide to compile it because metaprogramming is bolted-on: every compiled program written in Python would have to include an interpreter because you might want to use “eval” at some point.
In comparison, a Haskell environment of comparable power is a tower of separable pieces. First, you have the core language — not defined in a fanboy mailing list, but in a congress of top academics evaluating the contributions made to the language by recent research. Second, there are Haskell interpreters and compilers around that formal specification — possibly including extensions that are enable if you politely ask it to use
-fglasgow-exts or something equivalent to that. Third, if you want metaprogramming — if you want to write programs that manipulate Haskell programs — you have Template Haskell, which basically converts between standard, meant-for-humans Haskell syntax and an Abstract Syntax Tree, not too different from what’s available in the Lisp world, which you can manipulate at will to spit out Haskell source. Fourth, if you want that generated code to be dynamically-loaded — if you want “eval” — you have hs-plugins, which is roughly equivalent to the “bundle an entire interpreter with the program” strategy of many Lisp and Scheme pseudocompilers.
And if you really want to defer type-checking to runtime, you can always use Data.Dynamic. As I’ve said before, you can go all the way from defining a type model that matches the algebraic semantics in your head — using the deep mathematics of type inference to do part of the grunt work for you — to relying on products and sums from machine types and using type checking just so there isn’t a mortal blow at runtime when the computer tries to divide a date by a string.
Haskell gives you choices: you can prepare the strict meal, with a ritual involving several plates served in order, and you can prefare the free-wheeling buffet, with a bunch of yummy edible stuff laid on a table on the corner of your party.
You can even serve them soup-in-a-cup. It’s very convenient, I know.
Filed under: Uncategorized | 13 Comments