Just Enough Ruby

You’re looking at a draft of a chapter from a work in progress, tentatively titled Scripting Mac Applications With Ruby: An AppleScript Alternative, by Matt Neuburg.
Covers rb-appscript 0.6.1. Last revised Jun 23, 2012. All content ©2012 by the author, all rights reserved.

1. The Object-Oriented Structure of Ruby

1.1. Sending Messages to Objects

1.2. Module: The Most Important Kind of Object

1.3. Defining a Module is Executable Code

1.4. A Module is a Namespace

1.5. The self Keyword

1.6. Class: Module Plus Inheritance

1.7. Top Level

1.8. Instance

1.9. Assignment: Names are Pointers

1.10. Instance Variables

1.11. Mixins

1.12. The Singleton Class

1.13. The Truth? You Couldn’t Handle the Truth!

2. Datatypes and Syntax

2.1. Method Calling and Syntactic Sugar

2.2. Some Cool Built-in Datatype Classes

2.3. Assignment and Parameter Passing

2.4. Blocks

2.5. Ruby Control Structures

2.6. Libraries and Gems

3. Where to Go From Here

If you already know Ruby, skip this chapter. If you don’t know Ruby, skim this chapter. Whatever you do, don’t expect to learn Ruby from this chapter. But do expect to learn something about Ruby from this chapter.

What this chapter can and will do is provide a quick overview of some of Ruby — just enough Ruby, so that if you’re coming from some other language, you’ll be able to follow the examples in this book. But since this chapter isn’t even going to pretend to teach you all of Ruby or even all the basics of Ruby, you shouldn’t read it all that carefully. Don’t memorize it; just look at it. You aren’t trying to learn the language; you’re just trying to get a general picture of what the language is like. Really learning Ruby is something you can do later, on your own. (I’ll talk about that at the end of the chapter.)

Note: My way of describing Ruby is somewhat peculiar, but there is method in my madness. Most discussions of Ruby explain it “from the bottom up” (starting with strings and arrays and hashes and working up to classes and modules); I explain it “from the top down” (starting with modules). That’s a deliberate pedagogical choice. I learned Ruby from the bottom up, and that made Ruby much harder to understand, and much slower to learn; top-down explanation is better, and gives a clearer picture, right away, of how Ruby works. Also, my initial code examples are written in a somewhat stilted style. That, too, is deliberate; it isn’t until later in the discussion that I introduce the syntactical exceptions that permit standard Ruby style, because I see these exceptions as complications that the student shouldn’t have to deal with too soon.

1. The Object-Oriented Structure of Ruby

Ruby is an object-oriented language. If you don’t know what that means, just keep reading; I’ll make it (very) clear as we go. Ruby is so object-oriented that a common Ruby mantra is: “Everything is an object.” It’s a bit tricky to explain what an “object” is; but, loosely speaking, it’s a thing that you send a message to. All the action in Ruby happens because you send a message to an object.

In order for an object to respond meaningfully to a message, it must somehow possess internal knowledge of that message — a pre-existing, primed response, saying what should happen when this particular message arrives. This primed response is called a method. In other words, a method is simply a set of instructions saying what an object should do in response to a particular message.

To send a particular message to an object is to call that method of the object.

1.1. Sending Messages to Objects

In Ruby, you send a message to an object using dot-notation: first the name of the object, then a dot, then the message. For example, if we had an object called Dog, we could tell it to bark like this:
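A sketch of that one-line program follows. The begin/rescue wrapper is added here only so the expected error is reported instead of stopping the run; the line Dog.bark by itself is the program being described.

```ruby
# Send the bark message to an object called Dog.
# No Dog object exists yet, so Ruby raises a NameError.
begin
  Dog.bark
rescue NameError => e
  Kernel.puts(e.class) #=> NameError
end
```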

That’s a legal Ruby program in theory, but in fact it won’t do anything useful all by itself. Right now, if you were to run that as a Ruby program, Ruby would give you an error message. That’s because we don’t have an object called Dog. And even if we did, it wouldn’t necessarily know how to bark. Don’t worry; in a moment, we’ll make a Dog object that does know how to bark.

We do, however, have a built-in object called Kernel. Kernel doesn’t know, out of the box, how to bark either; but it does know how to do something called “puts”.
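Presumably the program meant here is the single line below: the puts message sent to Kernel, with no parameters.

```ruby
# Send the puts message to the built-in Kernel object.
# With no parameters, puts outputs just an empty line.
Kernel.puts
```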

That’s a Ruby program that actually runs, with no error message. But it doesn’t appear to do anything. That’s because puts means: “Output the value of this message’s parameters.” We didn’t provide any parameters, so the output was just an empty line. This time, let’s provide a parameter:

That’s a working Ruby program that actually does something: it outputs the phrase “Hello, world!” In examples in this book, when there’s output from a line of a program, I’ll show you right there in the program what the output is, using a comment. A Ruby comment begins with a hash character (#). By convention, a Ruby comment that tells the reader what is output starts with #=>. So:
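Under that convention, the hello-world program would look like this (a minimal sketch):

```ruby
# Send puts to Kernel with one parameter; the comment shows the output.
Kernel.puts("Hello, world!") #=> Hello, world!
```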

So what we’ve just learned is that there’s a Kernel object, and it has a puts method. And we learned how to call that method. Ruby has many other built-in objects with built-in methods. For example, consider this code:

Ruby comes with a Math object, which has a cos method. Therefore, sending the cos message to the Math object does in fact call the Math object’s cos method. This, as you might expect, causes the Math object to calculate the cosine of the parameter. However, it doesn’t cause Ruby to show us the result of the calculation. Since we know about Kernel.puts, we can fix that:
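A sketch of both stages, using 0 as an assumed parameter (the original value isn’t shown): the first line computes a cosine but shows nothing; the second passes the result to Kernel.puts.

```ruby
# Calculates the cosine of 0, but nothing is output.
Math.cos(0)

# Handing the result to Kernel.puts makes it visible.
Kernel.puts(Math.cos(0)) #=> 1.0
```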

Even a literal expression such as a string is an object. There are lots of built-in methods that a string object knows about. For example:
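For instance (a sketch; upcase and length are real String methods, picked here as illustrations):

```ruby
# A string literal is an object, so we can send it messages.
Kernel.puts("howdy".upcase) #=> HOWDY
Kernel.puts("howdy".length) #=> 5
```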

Method calls can be chained. This makes sense, because if calling a method returns a result, that result is an object (because everything is an object), so it can be sent a method call. For example:
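The chained call described next might look like this (a sketch):

```ruby
# reverse returns a new string, "ydwoh"; that string is then sent upcase.
Kernel.puts("howdy".reverse.upcase) #=> YDWOH
```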

What’s happening there is that first we send the reverse message to the string "howdy", getting back a new string, "ydwoh". Then we send the upcase message to that string.

Where objects come from, and how they come to have methods, is the most important thing to know about Ruby. Ruby has three particularly important types of object: module, class, and instance. We will now discuss how to make objects of all three types, and how to endow them with methods — and, of course, we’ll see why each of these types of object is important and what it’s for.

1.2. Module: The Most Important Kind of Object

Module is the most important kind of object. Before talking about why it’s important and what it’s for, let’s make a module.

That code means: “There’s a Dog object, and it’s a module.” Ruby responds by ensuring there really is a Dog module; if there wasn’t one already, Ruby creates it. To prove that the Dog object now exists, we can proceed to talk about it:
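A minimal sketch of making the Dog module and then mentioning it:

```ruby
# Ensure there is a Dog object, and that it is a module.
module Dog
end

# Dog now exists, so mentioning it causes no error.
Kernel.puts(Dog) #=> Dog
```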

We didn’t get an error on the last line, because by that point in the program the Dog object did indeed exist. Now, the real power here lies in the fact that we can endow an object with methods. To do so, we use the def keyword, sandwiched between the module keyword and its corresponding end line. For example, let’s endow Dog with a bark method:
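A sketch of the bark definition, with the def sandwich inside the module sandwich:

```ruby
module Dog
  # def self.bark defines bark on the Dog object itself.
  def self.bark
    "bow wow"
  end
end
```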

(The keyword def is short for “define”, and introduces the definition of a method. Never mind for now what the self and the dot are doing before the name of the method.) Let’s pause to summarize what’s happening here.

That code tells Ruby, “There’s a bark method.” Not only that; it says what the bark method does. Between the def line and its corresponding end line are sandwiched the lines of code that produce the desired result when bark is called. Here, the desired result is simply the string “bow wow”.

That code combines the two things. The def sandwich is inside the module sandwich. The module sandwich puts the Dog object under discussion, so the def sandwich is about the Dog object. Thus, this arrangement of code endows the Dog object with a bark method and states that when the bark message is sent to the Dog object, the answer “bow wow” should come back.

Great, so let’s try it: let’s actually send the bark message to the Dog object and see if “bow wow” does come back:

Nothing happened. Why didn’t it work? It did work; it’s just that having “bow wow” come back within the program is not the same thing as outputting “bow wow” so that we can see it. But we know how to output something, so it’s easy to modify our program to do so:
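A sketch of the complete program, calling bark and outputting its result:

```ruby
module Dog
  def self.bark
    "bow wow"
  end
end

# Dog.bark returns "bow wow"; Kernel.puts makes it visible.
Kernel.puts(Dog.bark) #=> bow wow
```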

Eureka! We have created an object and endowed it with a method, and then we have successfully called that method.

Let’s sum up. Kernel is an object, and in particular, it is a module. Math is an object, and in particular, it is a module. Dog is an object, and in particular, it is a module. What’s special about Dog is that it didn’t exist until we created it. Moreover, Kernel already had a puts method, and Math already had a cos method, before we came along. But we endowed Dog with a bark method ourselves.

1.3. Defining a Module is Executable Code

A characteristic aspect of Ruby that surprises newcomers is that lines like module Dog and def self.bark are executable code. They are commands, and when our Ruby program runs, they are executed, in the order in which they are encountered. Consider once more the program we just created:

The end lines merely indicate groupings (i.e. where the sandwiches end), so that program actually consists of three executable lines of code, in this order:

It follows that order matters. Our program would not work if the last line were placed first:
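A sketch of the reordered program. The begin/rescue wrapper is added only so the error is reported without crashing the run; notice that the module sandwich inside it is never reached.

```ruby
begin
  # Dog is mentioned before anything has created it...
  Kernel.puts(Dog.bark)
  # ...so Ruby raises NameError, and this sandwich never executes.
  module Dog
    def self.bark
      "bow wow"
    end
  end
rescue NameError => e
  Kernel.puts("halted: #{e.class}") #=> halted: NameError
end
```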

In the first line, a non-existent Dog object is mentioned, and the program comes to a halt. The Dog object would have been brought into existence in the second line, but we never reached it.

Notice too that I have never said that a line like module Dog creates the Dog object. I have said that it asserts or ensures its existence, that it creates it if it didn’t exist already. It is perfectly legal to say module Dog and define some methods on the Dog object even if the Dog object already exists. Not only is it legal, it’s common. Not only is it common, it’s largely the essence of what makes Ruby Ruby. Read the following program in order and be sure you understand how it works:
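A sketch of such a program, reconstructed from the walkthrough that follows (the strings come from that walkthrough):

```ruby
module Dog
  def self.bark
    "bow wow"
  end
end
Kernel.puts(Dog.bark) #=> bow wow

# Open Dog again and add a second method; bark is untouched.
module Dog
  def self.wag
    "wag wag"
  end
end
Kernel.puts(Dog.bark) #=> bow wow
Kernel.puts(Dog.wag)  #=> wag wag

# Open Dog once more and redefine bark; wag is untouched.
module Dog
  def self.bark
    "ruff ruff"
  end
end
Kernel.puts(Dog.bark) #=> ruff ruff
Kernel.puts(Dog.wag)  #=> wag wag
```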

Let’s talk through that code. First, we define a bark method on the Dog object and call it (“bow wow”). Then, we define a wag method on the Dog object. The Dog object already exists and already has a method (the bark method), but that’s not a problem: the bark method continues to exist (“bow wow”) and the wag method now exists too (“wag wag”). Finally, we define the bark method on the Dog object again. Our new definition of the bark method is now in effect, so code that calls the bark method after this moment has a different result from before (“ruff ruff”); meanwhile, the wag method continues to work as before (“wag wag”).

Furthermore, no distinction is drawn between objects that you create and objects that are built into Ruby. We could just as easily give the built-in Kernel object a bark method, if that suited our purposes:
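A sketch of giving Kernel a bark method, exactly as we gave one to Dog:

```ruby
# Open the built-in Kernel module and add a method to it.
module Kernel
  def self.bark
    "bow wow"
  end
end

Kernel.puts(Kernel.bark) #=> bow wow
```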

We could even change the definition of a built-in Kernel method, such as puts. Of course this means that you have the power to alter built-in objects in such a way as to bring Ruby’s normal behavior to its knees. Nevertheless, altering built-in objects is not at all uncommon. When people describe Ruby as highly dynamic, this is the kind of thing they are talking about.
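The bare module-opening code discussed next is just module Dog followed by end. The sketch below runs it after bark has been defined, to show that it removes nothing:

```ruby
module Dog
  def self.bark
    "bow wow"
  end
end

# Opening Dog again with nothing inside...
module Dog
end

# ...leaves its existing methods alone.
Kernel.puts(Dog.bark) #=> bow wow
```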

That code asserts the existence of the Dog module, but it doesn’t define Dog; if it did, Dog would be a module without methods. That is not the case. All modules have built-in methods; and besides, if Dog already exists and has methods, that code doesn’t remove those methods. (It is possible to remove methods — in Ruby, everything is possible — but that isn’t how you do it.) Rubyists frequently refer to code like that as opening the module; it places the module under discussion, so that sandwiched code can define methods.

It is also possible for other executable code to appear in the sandwich. For example, this is legal:
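A sketch of such a sandwich:

```ruby
module Dog
  # Runs at the moment this line is reached, while Dog is open.
  Kernel.puts("opening Dog") #=> opening Dog
end
```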

That works, and it outputs “opening Dog,” not in response to any method being called, but at the moment the Kernel.puts line is encountered. That example is highly artificial, but there are certain types of code that are quite commonly executed in the course of a module-opening sandwich.

1.4. A Module is a Namespace

Modules have many powerful features, and one of them is that they are namespaces. This means that names defined inside a module are hidden from outside the module.

To see what I mean, you need to know that one of the things you can do inside a module is define another module. So, for example:
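A sketch of a module nested inside another, using the names Animals and Dog from the discussion. The begin/rescue wrapper is added only so the failing last line reports its error instead of stopping the run.

```ruby
module Animals
  # Dog is defined inside Animals, so the name Dog lives in Animals.
  module Dog
    def self.bark
      "bow wow"
    end
  end
end

# At top level, the name Dog is not visible.
begin
  Kernel.puts(Dog.bark)
rescue NameError => e
  Kernel.puts(e.class) #=> NameError
end
```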

From where we are in the last line, when we say Dog.bark, the name Dog is not visible. That’s because the name Dog is defined inside the Animals module. However, we are at the same level as the place where Animals was defined, so the name Animals is visible. Now, we can reach the name Dog, if we want to; we just can’t do it directly, because it isn’t directly visible. Instead, we have to do it by way of the name Animals, which is directly visible. The “by way of” operator is two colons (::).
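A sketch of reaching Dog by way of Animals:

```ruby
module Animals
  module Dog
    def self.bark
      "bow wow"
    end
  end
end

# Animals is visible at top level; Dog is reached through it with ::.
Kernel.puts(Animals::Dog.bark) #=> bow wow
```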

Ruby uses modules as namespaces to keep things nicely packaged. A module, in fact, might exist solely as a way to package other modules together, keeping their names from polluting the global namespace. This helps to avoid name collisions, and also just makes it clearer what something is for. That’s what’s happening in this code:
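A one-line sketch:

```ruby
# PI lives inside the Math module, so we reach it by way of Math.
Kernel.puts(Math::PI) #=> 3.141592653589793
```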

The name PI is defined inside the Math module, so you have to dive into the Math module to use it.

The “names” that I’ve been talking about here all begin with a capital letter. This is no coincidence. Module names must begin with a capital letter, because that’s what tells Ruby what rules to follow in looking to see whether the name is defined. Those rules are complicated, but the take-away message is simple: an object whose name begins with a capital letter is visible to code looking upwards to the level where that name is defined. In the previous example, code inside the Dog module can see the name Dog and the name Animals; but code at top level, where the module Animals is defined, can see the name Animals but not the name Dog. Built-in module names, such as Kernel, are implicitly defined at top level, so that all code can see them.

Incidentally, the dot-notation operator that we are using to call a method of an object is itself a “by way of” operator. In the last line of the above program, just as we cannot see Dog directly without first diving into Animals, so too we cannot see bark directly without first diving into Dog. In fact, the double colon (::) can also be used to call a method, so Animals::Dog::bark works just like Animals::Dog.bark; the reverse is not true, though, because the dot can’t be used to reach a name like Dog (Animals.Dog fails).
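A sketch showing the double colon used to call methods:

```ruby
module Animals
  module Dog
    def self.bark
      "bow wow"
    end
  end
end

# :: followed by a lowercase name is a method call, just like the dot.
Kernel.puts(Animals::Dog::bark) #=> bow wow
Kernel.puts(Math::cos(0))       #=> 1.0
```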

1.5. The self Keyword

The self keyword is often used as an object to send a message to, and when so used, it means: “The object we are in, right now.” By “right now,” I mean “at the time the code runs.” Code can thus use self as a way of accessing, from inside as it were, names that other code would have to access from outside using some explicit object name. So, for example, let’s make a Dog module and give it two methods, one of which calls the other:
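A sketch of Dog with a speak method that calls bark by way of self:

```ruby
module Dog
  def self.bark
    "bow wow"
  end

  def self.speak
    # When speak runs, self is Dog, so this sends bark to Dog.
    self.bark
  end
end

Kernel.puts(Dog.speak) #=> bow wow
```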

When we call Dog.speak, the speak method runs, and the keyword self is encountered (in the expression self.bark). Where are we at that moment? We’re in the Dog module, because that’s who the speak message was sent to. Thus, saying self.bark sends the bark message to the Dog module, from inside as it were, in exactly the same way that saying Dog.bark would send the bark message to the Dog module from outside, explicitly, by name.

It is legal for the keyword self, when used in this way, to be omitted. Putting it another way, if a method call appears to be sent to no object, that’s just an illusion, a convenience created by Ruby behind the scenes; in reality, the method call is being sent to self. So, this would work just as well:
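The same sketch with self omitted:

```ruby
module Dog
  def self.bark
    "bow wow"
  end

  def self.speak
    bark # no explicit receiver: the message still goes to self, i.e. Dog
  end
end

Kernel.puts(Dog.speak) #=> bow wow
```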

I find omission of self confusing, and I would prefer to avoid it; to me, explicit use of self is cleaner and clearer. However, there are situations, having to do with a feature of certain methods called “privacy”, where self must be omitted. The idea is that a “private” method can be called only from within an object that is endowed with that method, and Ruby enforces this by balking if you explicitly send a private method call to any object, even self. I regard this as unfortunate (it’s one of the few things about Ruby that I don’t like), but there it is. (Ruby 2.7 later relaxed this rule: a private method may now be called with the literal receiver self.)

1.6. Class: Module Plus Inheritance

The second most important kind of Ruby object is class. A class is a module, but it adds two important features: inheritance and instantiation. We’ll talk first about inheritance.

To make a class, you use the class keyword, in a way that looks just like what you do when you make a module:
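A sketch of a Dog class with a bark class method:

```ruby
# Making a class looks just like making a module.
class Dog
  def self.bark
    "bow wow"
  end
end

Kernel.puts(Dog.bark) #=> bow wow
```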

Thus far, a class simply is a module. But now let’s add inheritance. When we define a class, we are allowed to say what other class it inherits from. This means that our class gets not only its own methods but also any methods belonging to the class it inherits from. Let’s make a class Poodle that inherits from Dog. To do so, we use the < symbol as we define the Poodle class:
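A sketch of Poodle inheriting bark from Dog:

```ruby
class Dog
  def self.bark
    "bow wow"
  end
end

# Poodle inherits from Dog, so it gets Dog's bark method.
class Poodle < Dog
end

Kernel.puts(Poodle.bark) #=> bow wow
```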

Poodle inherits from Dog, so it inherits Dog’s methods. Poodle knows how to bark because Dog knows how to bark. Poodle is a subclass of Dog, and Dog is Poodle’s superclass. (One class can have many subclasses, but can have only one immediate superclass.) It would be superfluous, though, to make a class that inherited everything from its superclass and stopped there; they’d be effectively the same class. The power comes in when a subclass is like its superclass but with a difference. There can be two kinds of obvious difference. First, the subclass might have a method that the superclass lacks:
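A sketch, in which the stand method and its return string are hypothetical:

```ruby
class Dog
  def self.bark
    "bow wow"
  end
end

class Poodle < Dog
  # A method that Poodle has and Dog lacks (hypothetical).
  def self.stand
    "standing on hind legs"
  end
end

Kernel.puts(Poodle.bark)             #=> bow wow
Kernel.puts(Poodle.stand)            #=> standing on hind legs
Kernel.puts(Dog.respond_to?(:stand)) #=> false
```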

In that example, a Poodle can stand on its hind legs, but not every Dog can. The second kind of difference is that the subclass might have a method with the same name as its superclass, but with different functionality. Naturally, when you send a message to the subclass, you get the subclass’s version of what that method does:
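A sketch of such an override; the string "yodel" is a hypothetical choice:

```ruby
class Dog
  def self.bark
    "bow wow"
  end
end

class Basenji < Dog
  # Same name as Dog's method, different behavior.
  def self.bark
    "yodel"
  end
end

Kernel.puts(Dog.bark)     #=> bow wow
Kernel.puts(Basenji.bark) #=> yodel
```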

In that situation, we can say that Basenji’s bark overrides the bark method inherited from Dog.

Sometimes, you want to override an inherited method but incorporate the inherited method’s functionality into the override. So you need a way to call the inherited method from within the method that overrides it. To do so, you use the super keyword. Imagine, for example, that a noisy dog barks louder than a normal dog:
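A sketch of NoisyDog using super:

```ruby
class Dog
  def self.bark
    "bow wow"
  end
end

class NoisyDog < Dog
  def self.bark
    # super calls Dog's bark, which returns "bow wow"; we upcase that.
    super.upcase
  end
end

Kernel.puts(NoisyDog.bark) #=> BOW WOW
```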

Saying super inside NoisyDog’s bark method calls the superclass’s bark method, giving us the string “bow wow”. We then send the upcase message to that string, and the result of doing that is what NoisyDog’s bark method produces.

In a subclass, self can call whatever the subclass inherited: when a class method of the subclass runs, self is the subclass, and the subclass has inherited its superclass’s methods. So:
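A sketch, with a hypothetical speak class method on Poodle:

```ruby
class Dog
  def self.bark
    "bow wow"
  end
end

class Poodle < Dog
  def self.speak
    # Here self is Poodle, which inherited bark from Dog.
    self.bark
  end
end

Kernel.puts(Poodle.speak) #=> bow wow
```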

If you don’t specify a superclass when you define a class, Ruby supplies one for you: the built-in Object class. In other words, writing class Dog is the same as writing class Dog < Object.
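A sketch confirming the implicit superclass; reopening Dog with an explicit < Object is legal precisely because Object already is its superclass:

```ruby
# No superclass specified...
class Dog
end

# ...so Ruby supplied Object.
Kernel.puts(Dog.superclass) #=> Object

# Saying so explicitly describes the very same class.
class Dog < Object
end
```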

If you change a superclass’s functionality, the subclass instantly inherits the change:
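A sketch in which Dog’s bark is redefined after Poodle already exists (the string "ruff ruff" is borrowed from the earlier examples):

```ruby
class Dog
  def self.bark
    "bow wow"
  end
end

class Poodle < Dog
end

Kernel.puts(Poodle.bark) #=> bow wow

# Change the superclass...
class Dog
  def self.bark
    "ruff ruff"
  end
end

# ...and the subclass sees the change at once.
Kernel.puts(Poodle.bark) #=> ruff ruff
```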

(It is illegal to change a subclass by specifying a different superclass than the one it already has. Ruby is highly dynamic, but it isn’t insane.)
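A sketch of that rule; Cat is a hypothetical second class, and the begin/rescue wrapper is added only to report the error:

```ruby
class Dog
end

class Cat
end

class Poodle < Dog
end

# Reopening Poodle with a different superclass raises a TypeError.
begin
  class Poodle < Cat
  end
rescue TypeError => e
  Kernel.puts(e.class) #=> TypeError
end
```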

1.7. Top Level

The Object class is important in another way: Every Ruby program takes place entirely inside it. Everything is an object, after all; well, the top-level object, the “universe” as it were, is the Object class object. Certain special rules apply to the top level world (for example, if you ask for self at the top level, you’re told that it’s main), but ultimately it is as if the whole program were embedded in an implicit class Object sandwich.

One result of this architecture is that it is legal to define methods at top level:
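A sketch of a top-level method, preceded by a look at self at top level:

```ruby
# At top level, self is the special object Ruby reports as main.
Kernel.puts(self) #=> main

# A method defined right at top level, with no module or class...
def bark
  "bow wow"
end

# ...can be called with no explicit receiver.
Kernel.puts(bark) #=> bow wow
```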

Thus, it is possible to program in a “lazy” way, without the overhead of creating any objects just to get some work done. I suppose that as a good little object-oriented programmer I ought to deprecate this style, but in fact it is very handy and many examples in this book will use it.

1.8. Instance

Along with the ability to inherit, a class has the ability to be instantiated. This means that from a class we make an instance.

To make an instance from a class, you send the new message to a class. Now you have a new object, an instance of that class, and you can send messages to it:
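A sketch of the puzzle, with a begin/rescue wrapper added only to report the error:

```ruby
class Dog
  def self.bark
    "bow wow"
  end
end

# new makes an instance of Dog; the class message reports its class.
Kernel.puts(Dog.new.class) #=> Dog

# But the instance doesn't know how to bark.
begin
  Dog.new.bark
rescue NoMethodError => e
  Kernel.puts(e.class) #=> NoMethodError
end
```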

Something odd is going on here. Our Dog.new instance is clearly a Dog; sending it the class message tells us so. So why doesn’t it know how to bark?

To tell you the answer, I have to make a confession. There are actually two kinds of method: class methods and instance methods. All the methods we’ve created so far have been class methods; that’s the significance of the self modifier that appears before the method’s name in the def line. A class method is a method that you call by sending a message to the class (or module), as we’ve been doing. An instance method, on the other hand, is a method that you call by sending a message to an instance of a class (an instance that you generated from that class using new, as we’ve just seen). And it is defined inside the class sandwich without using the self modifier, like this:
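A sketch of Dog with bark as an instance method:

```ruby
class Dog
  # No self. modifier: bark is an instance method.
  def bark
    "bow wow"
  end
end

Kernel.puts(Dog.new.bark) #=> bow wow
```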

Since we’ve changed Dog so that bark is an instance method, we can send the bark message to an instance of the Dog class, but we can no longer send the bark message directly to the Dog class itself:
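A sketch showing that the class itself can no longer bark; the begin/rescue wrapper is added only to report the error:

```ruby
class Dog
  def bark
    "bow wow"
  end
end

# bark is not a class method, so Dog itself doesn't respond to it.
begin
  Dog.bark
rescue NoMethodError => e
  Kernel.puts(e.class) #=> NoMethodError
end
```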

So, if all you want to do with a certain class or module is to send messages directly to that class or module, there is no point whatever in giving that class or module any instance methods. The purpose of giving a class an instance method is so that we can call that method on an instance of that class.

Indeed, although module is the most important kind of object, because it governs the whole structure of how Ruby works, instances that you make from classes are the most common kind of object. When you actually do stuff in a Ruby program, you mostly do not send messages to modules and classes, as we have been doing so far. You mostly send messages to instances generated from classes using new, as we are now doing.

Observe that the keyword self used as an object to send a message to, when instance method code is running, refers to the instance. That’s just a consequence of the fact that self refers to “the object we’re in now”. When what’s running is instance method code, the object it’s in is, ipso facto, an instance. So:
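A sketch of an instance method that calls another instance method through self:

```ruby
class Dog
  def bark
    "bow wow"
  end

  def speak
    # In instance method code, self is the instance that got the message.
    self.bark
  end
end

Kernel.puts(Dog.new.speak) #=> bow wow
```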

The line self.bark executes at a time when the speak message has been sent to the Dog.new instance. Therefore, that instance is self, and self.bark calls the bark instance method.

So what if an instance needs to refer back, as it were, to its class? It can use the class method, which we used at the beginning of this section. (That’s also one way, obviously, of finding out what class an instance is an instance of.) Knowing this, you can see how an instance would also be able to call a class method of the class of which it is an instance:
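A sketch in which an instance method calls a class method by way of self.class (the speak method is hypothetical):

```ruby
class Dog
  # A class method...
  def self.bark
    "bow wow"
  end

  # ...called from an instance method by way of the instance's class.
  def speak
    self.class.bark
  end
end

Kernel.puts(Dog.new.class) #=> Dog
Kernel.puts(Dog.new.speak) #=> bow wow
```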

1.9. Assignment: Names are Pointers

Every time you call new on a class, you create a new and different instance of that class. To see this easily, we have only to send an instance the object_id message. Every object in the Ruby universe at every moment during the running of a program has a unique object_id value, so we can easily detect whether two objects are the same.
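A sketch; the actual numbers vary from run to run, so no output is shown:

```ruby
class Dog
end

# The Dog class is one object, so both lines print the same number.
Kernel.puts(Dog.object_id)
Kernel.puts(Dog.object_id)

# Each call to new makes a new instance, with its own object_id.
Kernel.puts(Dog.new.object_id)
Kernel.puts(Dog.new.object_id)
```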

That program shows that there is only one Dog class object, no matter how many times we refer to it; but saying Dog.new gives us a different instance of Dog every time.

An object that can’t be referred to is useless and goes out of existence automatically in Ruby. So in the above program, the Dog instance 44180 probably doesn’t even exist by the time we have generated the Dog instance 44150. Bringing an object into existence only to have it go out of existence again a moment later is perfectly reasonable, but more often than not, you’re going to want an instance generated with new to persist for longer than that. One common way to make an instance persist is to assign it to a variable name. In Ruby, the assignment operator is an equal sign (=).
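A sketch of assigning a new Dog instance to the name fido; the two lines print the same number, whatever it happens to be:

```ruby
class Dog
end

# Assign the new instance to the name fido, so it persists.
fido = Dog.new
Kernel.puts(fido.object_id)
Kernel.puts(fido.object_id) # same number: still the same instance
```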

As you can see, our Dog instance, now called fido, is persisting from one line to the next. Naturally, if Dog has any instance methods, we can call those methods on fido.

That code might give you the impression that fido is itself a Dog instance. That’s loosely true, but it would be more correct to say that the name fido points to (or refers to) a Dog instance. Objects in Ruby have a kind of independent existence (off in a kind of separate object universe called the object space); names merely point to them. One way to see this is that if you assign an object from one name to another, you end up with two names pointing to the very same object:
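A sketch, in which the second name, rover, is hypothetical:

```ruby
class Dog
end

fido = Dog.new
rover = fido # rover now points to the very same object as fido

Kernel.puts(fido.object_id == rover.object_id) #=> true
```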

Beginners sometimes find this confusing (or downright unnerving). The thing to remember is that assignment changes what object a pointer points to; it doesn’t alter that object. For example:
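A sketch; the starting string "bow wow" is an assumption, since the text says only that the story starts with a lowercase string:

```ruby
noise = "bow wow"
bark = noise       # both names now point to the same string
bark = bark.upcase # upcase makes a new string; bark is repointed to it
Kernel.puts(noise) #=> bow wow
Kernel.puts(bark)  #=> BOW WOW
```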

Let’s deconstruct that program. The story starts with a lowercase string. After the second line, both noise and bark point to that same lowercase string. The third line repoints bark at a different object, namely, an uppercase version of that string. But the lowercase version of the string still exists, and noise is still pointing to it.

On the other hand, it is also possible to alter an object in place. If you do this after pointers to the object are already established, those pointers now all point to the altered object. For example:
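A sketch laid out so that its fourth and fifth lines match the description below (the string is again assumed):

```ruby
noise = "bow wow"
bark = noise       # both names point to the same string object
Kernel.puts(noise) #=> bow wow
bark.upcase!       # alters that one string object in place
Kernel.puts(noise) #=> BOW WOW
```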

In the fourth line, we upcased bark, not noise; nevertheless, the fifth line shows that noise was upcased. How can this be? After the second line, both noise and bark are pointers to the very same string object. The upcase! method causes the string object itself to be upcased (that’s what the exclamation mark hints at). No new assignments take place, so after the string object is upcased, both noise and bark are still pointers to that very same (now upcased) string object.

Oh, one more thing. It should not surprise you to learn that if you change a class, any persistent instances of that class instantly acquire the changes:
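A sketch in which Dog’s bark instance method is redefined while fido already exists:

```ruby
class Dog
  def bark
    "bow wow"
  end
end

fido = Dog.new
Kernel.puts(fido.bark) #=> bow wow

# Reopen Dog and redefine the bark instance method.
class Dog
  def bark
    "ruff ruff"
  end
end

# fido is the same instance, but it now barks differently.
Kernel.puts(fido.bark) #=> ruff ruff
```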

What that code demonstrates is that fido remains the same instance throughout, yet its behavior when told to bark changes, because the definition of the Dog instance method bark changes. The same would be true of every Dog instance, current and future: they would all, from now on, have the changed Dog functionality. This holds up the inheritance chain as well; if Dog’s superclass is Animal and we change the Animal class, all Dog instances, those that exist now and those that come into existence in the future, acquire the change.

1.10. Instance Variables

So far, nothing that we have said explains why instances are needed. An instance is an object, a class is an object, a module is an object, they’re all objects. Why shouldn’t you write an entire Ruby program using just classes and modules? The answer is simple, and is the basis of the entire notion of object-oriented programming. An instance can have instance variables.

An instance variable is a variable belonging to the instance where it is mentioned. When I say “mentioned,” I mean what I meant when I talked about self earlier. When code runs, if it mentions an instance variable, that variable belongs to self, whatever that may be at the moment. Assuming the code in question is instance method code, self is the instance, and so the instance variable belongs to that instance. You and Ruby can always know when code is mentioning an instance variable, because that variable’s name begins with a single at-sign (@).

To see what I mean, we’ll give Dog two instance methods, one that sets an instance variable called @name, and one that reports that instance variable’s value. Then we’ll make two persistent instances of Dog and give them each a different @name:
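A sketch of those two methods; the method names come from the discussion below, while the second dog, rover, is hypothetical:

```ruby
class Dog
  # Setter: stores the parameter in this instance's @name.
  def your_name_is(name)
    @name = name
  end

  # Getter: reports this instance's @name.
  def what_is_your_name
    @name
  end
end

fido = Dog.new
fido.your_name_is("Fido")
rover = Dog.new
rover.your_name_is("Rover")

# Same code, separate instance variables.
Kernel.puts(fido.what_is_your_name)  #=> Fido
Kernel.puts(rover.what_is_your_name) #=> Rover
```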

So instances of the same class share their code, because they get their instance methods from the same class; but they maintain the values of their instance variables separately. And that, believe it or not, is the whole point of instances; in fact, it is the essence of what object-oriented programming is all about.

Observe that, in contrast to some languages, Ruby neither requires nor permits you to declare or initialize your instance variables. An instance variable springs into existence, if it didn’t exist already, when code that assigns it a value runs, like our your_name_is instance method. If you ask an object for the value of an instance variable that has never been assigned a value, the answer is simply the special value nil. It would also be perfectly legal for fido to go through life without a @name; if your_name_is is never called on fido, then fido will never have a @name.
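A sketch of asking for @name before and after it has been assigned:

```ruby
class Dog
  def your_name_is(name)
    @name = name
  end

  def what_is_your_name
    @name
  end
end

fido = Dog.new
# @name has never been assigned, so asking for it yields nil.
Kernel.p(fido.what_is_your_name) #=> nil
fido.your_name_is("Fido")
Kernel.p(fido.what_is_your_name) #=> "Fido"
```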

On the other hand, it is extremely common to wish to endow an instance with some instance variable values as early as possible. It would drive us crazy to have to remember to call your_name_is on every Dog instance, right after calling Dog.new. Therefore Ruby provides an instance method that is automatically called “as early as possible” — in fact, it is called as part of new. That instance method is called initialize, and it follows this rule: whatever parameters you supply in the new call are passed on as parameters of the initialize method. So, you’ll probably never call initialize directly, but you’ll think of new as a way of calling initialize. So it would be much more common to write the preceding program as follows:
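A sketch of the same program rewritten to use initialize (rover is again hypothetical):

```ruby
class Dog
  # Called automatically by new; receives new's parameters.
  def initialize(name)
    @name = name
  end

  def what_is_your_name
    @name
  end
end

fido = Dog.new("Fido")
rover = Dog.new("Rover")
Kernel.puts(fido.what_is_your_name)  #=> Fido
Kernel.puts(rover.what_is_your_name) #=> Rover
```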

Notice that this convention requires you to know what the parameter(s) to new mean. The program was clearer when we had a your_name_is method; nothing about the phrase new("Fido") tells the reader that "Fido" is going to be the instance’s @name. This is a bummer, and Rubyists often complain about it, devise ingenious alternatives, and so on. But for now, just get over it.

An instance method like initialize or your_name_is, which sets an instance variable, or like what_is_your_name, which gets an instance variable, is called an accessor. An accessor is the only way to get and set an instance variable from outside that instance. (Okay, not really; nothing in Ruby is ever “the only way”. It’s the only normal way, okay?) There is no syntax such as fido.@name. Instance variables are considered private; if there isn’t an accessor for an instance variable of fido, you can’t access that instance variable from outside fido. (Of course if you are fido, then all you need in order to access an instance variable @name is to mention @name in an instance method, as we’ve been doing.)

Although an accessor method can have any name you like, it is considered nice Ruby style to follow a naming convention where the accessor’s name matches the instance variable’s name, like this:
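A sketch of a getter and setter named after @name:

```ruby
class Dog
  # Setter: the method's name is name= (including the equal sign).
  def name=(name)
    @name = name
  end

  # Getter: same name as the instance variable, minus the @.
  def name
    @name
  end
end
```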

The method name= is our first example of some extremely cool syntactic magic that Ruby performs behind the scenes. The equal sign is the assignment operator, so Ruby lets you write natural-looking code like fido.name = "Fido". That code looks natural, but taken literally it is meaningless, because there isn’t a variable called fido.name. To endow it with meaning, Ruby translates the line fido.name = "Fido" into the method call fido.name=("Fido").

In other words, Ruby looks for a name= instance method, and passes it "Fido" as parameter. If no such method exists, you can’t talk that way:
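A sketch of both forms, plus a failing assignment to a hypothetical age attribute for which Dog has no age= method (the begin/rescue wrapper is added only to report the error):

```ruby
class Dog
  def name=(name)
    @name = name
  end

  def name
    @name
  end
end

fido = Dog.new
# These two lines do the same thing; Ruby turns the first into the second.
fido.name = "Fido"
fido.name=("Fido")
Kernel.puts(fido.name) #=> Fido

# Dog has no age= method, so this assignment syntax fails.
begin
  fido.age = 3
rescue NoMethodError => e
  Kernel.puts(e.class) #=> NoMethodError
end
```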

Thus, Ruby gives you the feeling that you’re using an operator (the equal sign), but in fact you’re calling a method on an object. This happens in Ruby a lot, and is part of what people mean by “everything is an object”. I’ll give further examples later.

It is quite common for an instance to have a bunch of instance variables that it maintains for various purposes, keeping track of things it needs to keep track of, but for which no accessor is supplied. On the other hand, it is also quite common for an instance to have a bunch of instance variables that it wants the rest of the world to be able to access directly, and for which an accessor is supplied. In fact, this is so common that Ruby actually supplies methods for creating accessors (attr_reader, attr_writer, and attr_accessor), generating, for example, a @name setter called name= and a @name getter called name, on the fly, complete with code, automatically.
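A sketch using attr_accessor, a built-in method of classes that generates both accessors:

```ruby
class Dog
  # Generates a name getter and a name= setter for @name.
  attr_accessor :name
end

fido = Dog.new
fido.name = "Fido"
Kernel.puts(fido.name) #=> Fido
```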

1.11. Mixins

It often happens that you have an instance method or, even more frequently, a bunch of related instance methods, which you would like a certain class to adopt, but where class inheritance does not handle the situation. For example, we might have a class inheritance tree expressing the evolutionary relationship among animals, but how can we express that certain animals, without regard to their place in this tree, can fly? We’d like to “inject”, as it were, a fly method into certain classes, such as Bird and Bat.

Every object-oriented programming language must confront this problem, and those that solve it do so in various ways. Ruby’s solution is particularly elegant, and is called a mixin. Basically, you can “mix” a module into any other module. When you do this, any instance methods in the former effectively become part of the latter as well. This mechanism is another reason why modules, as stated earlier, are so important.

To mix one module into another, you use the include method. The second module will usually be a class, so the whole thing might look something like this:
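A minimal sketch of that architecture. It assumes a Flyer module whose single fly method just returns a string, a hypothetical placeholder for real flying behavior:

```ruby
module Flyer
  def fly()                # an instance method, destined to be mixed in
    "I'm flying!"          # hypothetical placeholder behavior
  end
end

class Bird
  include(Flyer)           # Bird acquires Flyer's instance methods
end

class Bat
  include(Flyer)           # ...and so does Bat, wherever it sits in the tree
end

tweety = Bird.new()
Kernel.puts(tweety.fly())     # I'm flying!
Kernel.puts(Bat.new().fly())  # I'm flying!
```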

Observe that there is effectively no reason to endow a module with an instance method other than to use that module as a mixin. You cannot, after all, call a module’s instance method by sending a message directly to the module; for that, the method would need to be a class method. And you cannot instantiate the module, thus endowing an instance with the instance method; a module is not a class, so it can’t be instantiated. The architecture demonstrated above is in fact the typical mixin architecture: the instance method lives in a module, the module is mixed into a class so that the class acquires the instance method, and the class is instantiated, thus making it possible to call the instance method on an instance.

The line include(Flyer) might need some explanation. The method include is a built-in method of modules. No object is given, so the include method is implicitly sent to self; that’s the class Bird, and a class is a module, so it works. The include method is “private”, so explicitly saying self.include(Flyer) is forbidden. Finally, the include method is called at the moment that line is encountered, while the Bird class is under discussion; I emphasized earlier that other code besides def could appear in a module-opening sandwich, and include is a typical example.

Since the methods within a module that are destined for mixing in are instance methods, they can talk about instance variables. Since those methods will run only after they have been mixed in to some class and that class has been instantiated to make an instance, those instance variables will belong to that instance. Thus, mixed-in methods are just as free to create, access, and manipulate an instance’s instance variables as any other instance methods are.
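Here is a small sketch of a mixed-in method managing an instance variable. The take_off and altitude methods are hypothetical. The @altitude they touch belongs to whichever Bird instance calls them, and it is the same @altitude that Bird’s own initialize sets:

```ruby
module Flyer
  def take_off()           # hypothetical
    @altitude = 100        # sets an instance variable of whatever instance runs this
  end
  def altitude()           # hypothetical
    @altitude
  end
end

class Bird
  include(Flyer)
  def initialize()
    @altitude = 0          # Bird's own method sets the very same @altitude
  end
end

tweety = Bird.new()
Kernel.puts(tweety.altitude())  # 0
tweety.take_off()
Kernel.puts(tweety.altitude())  # 100
```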

As usual, you can modify a mixed-in module and your changes will instantly propagate to any classes that mix it in, and to any instances of those classes:
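A minimal sketch of that propagation. It assumes a Flyer module mixed into a Bird class, plus a hypothetical land method that is added to Flyer only after a Bird instance already exists:

```ruby
module Flyer
  def fly()
    "flap flap"
  end
end

class Bird
  include(Flyer)
end

tweety = Bird.new()        # this instance exists before Flyer changes

module Flyer               # reopen the module and add a method
  def land()               # hypothetical
    "touchdown"
  end
end

Kernel.puts(tweety.land()) # touchdown: the existing instance sees the new method at once
```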

Ruby makes heavy use of mixins in its built-in module/class structure. The most important case is that the Kernel module is mixed into the Object class. Every object (modules, classes, instances) inherits ultimately from Object, so all Kernel instance methods are available everywhere. Thus, Kernel is used as a repository for essential methods. Additionally, Kernel has many internally duplicated methods: they are both class methods and instance methods. This means that in every case in all preceding code examples where I have sent a message to Kernel, I was being unnecessarily stilted; one can do this, but no one ever does. The normal way to call puts is not to say Kernel.puts (calling the puts class method of Kernel) but just to say puts (calling the puts instance method of Kernel mixed into Object):
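Here are the two spellings side by side. Both print Hello, but the second is the one you will see in real Ruby code:

```ruby
Kernel.puts("Hello")   # stilted: sending puts to the Kernel module itself
puts("Hello")          # normal: Kernel's instance method, mixed into Object
```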

(Since puts is a “private” instance method of Kernel, you can’t call it by saying self.puts, even though every self is endowed with puts because Kernel is mixed into Object.)

Since you, too, can modify Kernel, and since Kernel is mixed into Object, and since changes in a mixed-in module are instantly propagated, you can easily inject new functionality throughout Ruby. Use your power for good instead of evil!
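A small sketch of that kind of injection, using a hypothetical shout helper. Once shout is defined in Kernel, it becomes callable from inside any object and at the top level:

```ruby
module Kernel
  def shout(s)           # hypothetical helper injected into Kernel
    s.upcase() + "!"
  end
end

class Dog
  def bark()
    shout("woof")        # works inside any object, because Kernel is mixed into Object
  end
end

Kernel.puts(Dog.new().bark())  # WOOF!
Kernel.puts(shout("hello"))    # HELLO! (at top level, too)
```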

1.12. The Singleton Class

We have seen that an instance acquires instance methods from its class. But you can also endow an instance with instance methods individually. Thus it is possible to have a situation where one particular instance of a class has a certain method, and no other instance of that class does.

One way to do this is by use of a syntax that’s easier to demonstrate than to describe. Let’s suppose we have two dogs, fido and rover; they both know how to bark, but only fido knows how to fetch.
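A minimal sketch of that situation. Both dogs share Dog’s bark, and only fido gets fetch. The string fetch returns is a placeholder:

```ruby
class Dog
  def bark()
    "Woof!"
  end
end

fido = Dog.new()
rover = Dog.new()

class << fido          # open fido's own singleton class
  def fetch()
    "Here's your stick!"
  end
end

Kernel.puts(fido.bark())                # Woof!
Kernel.puts(rover.bark())               # Woof!
Kernel.puts(fido.fetch())               # Here's your stick!
Kernel.puts(rover.respond_to?(:fetch))  # false
```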

As you can see, the way we endow the instance fido with its own personal instance method is to treat fido as itself a class, effectively opening a class sandwich on the instance. It would be wrong to say that fido is a class, though, so it is better to imagine that fido — and every instance — is accompanied by a sort of personal, “shadow” class, a class from which this instance alone derives instance methods. This “shadow” class is called the singleton class. Thus, the line class << fido opens fido’s singleton class for discussion.

There is actually a more compact syntax for doing the same thing, appropriate especially if we are about to endow an instance’s singleton class with just one method:
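The compact form, sketched with the same two hypothetical dogs, defines fetch in fido’s singleton class in a single step:

```ruby
class Dog
  def bark()
    "Woof!"
  end
end

fido = Dog.new()
rover = Dog.new()

def fido.fetch()       # define fetch in fido's singleton class, in one step
  "Here's your stick!"
end

Kernel.puts(fido.fetch())               # Here's your stick!
Kernel.puts(rover.respond_to?(:fetch))  # false
```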

Another way to endow the singleton class with methods is to use the mixin architecture. This is done with the extend method. It’s an instance method of the Object class, so it can be called on any instance; and its effect is similar to include, except that instead of endowing a full-fledged module with another module’s instance methods, it endows an instance’s singleton class with a module’s instance methods.
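A minimal sketch of extend, assuming a hypothetical Fetcher module. The module’s instance methods land in fido’s singleton class only, so neither rover nor the Dog class is affected:

```ruby
module Fetcher         # hypothetical module of fetching skills
  def fetch()
    "Here's your stick!"
  end
end

class Dog
  def bark()
    "Woof!"
  end
end

fido = Dog.new()
rover = Dog.new()
fido.extend(Fetcher)   # mix Fetcher into fido's singleton class only

Kernel.puts(fido.fetch())               # Here's your stick!
Kernel.puts(rover.respond_to?(:fetch))  # false
```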

1.13. The Truth? You Couldn’t Handle the Truth!

The attentive reader may now be thinking: If we are allowed to say def fido.fetch, defining an instance method on fido’s singleton class from outside fido, surely we are allowed to do this for a class or module. Well, let’s try it:
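The experiment, sketched with an empty Dog class and a method defined on Dog from outside its class sandwich:

```ruby
class Dog
end

def Dog.bark()         # defined from outside the Dog class sandwich
  "Woof! (said the class)"
end

Kernel.puts(Dog.bark())  # Woof! (said the class)
```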

Holy kamoly! It looks like we’ve just endowed the Dog class with a class method, bark, from outside the Dog class sandwich. But then surely that’s what we were already doing when we endowed the Dog class with a class method from inside the Dog class sandwich:
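The familiar inside-the-sandwich version, for comparison. Because self is Dog at that point, def self.bark means exactly def Dog.bark:

```ruby
class Dog
  def self.bark()      # inside the sandwich, self is Dog, so this is def Dog.bark
    "Woof!"
  end
end

Kernel.puts(Dog.bark())  # Woof!
```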

The only difference is that because we are now inside the Dog class sandwich, we can say self.bark instead of Dog.bark, because inside the Dog class sandwich, self is Dog.

Well, dear reader, you figured it out. It’s true. There isn’t really such a thing as a class method. All methods are instance methods!

Here’s the truth about the big picture of what’s going on in Ruby. (I’m warning you, don’t read the rest of this section unless you’re willing to risk having your head explode.)

Everything is an object, meaning an instance of some class. A class object is itself an instance of the Class class; a module object is an instance of the Module class. The Class class’s superclass is Module; the Module class’s superclass is Object. All classes have Object as their ultimate superclass (in Ruby 1.8, at any rate; Ruby 1.9 adds a minimal BasicObject class above Object).
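You can check these relationships yourself by asking objects for their class and superclass. A small sketch, using an empty Dog class and an empty Flyer module:

```ruby
class Dog
end

module Flyer
end

Kernel.puts(Dog.class)          # Class
Kernel.puts(Flyer.class)        # Module
Kernel.puts(Class.superclass)   # Module
Kernel.puts(Module.superclass)  # Object
Kernel.puts(Dog.superclass)     # Object
```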

An object is able to respond to a method call, not by virtue of anything inside itself, but because there is a corresponding method in its singleton class, or in its actual class, or in some class further up the inheritance / mixin chain. In the case of something like a Dog instance such as fido, that’s easy to understand: the reason we can send a message to fido is that there are instructions for dealing with that message in fido’s singleton class, or in Dog, or in Object, or in some module mixed into one of those. But exactly the same thing is true of the Class instance Dog: the reason we can send a message to the Dog class (what I’ve been calling a “class method”) is that there are instructions for dealing with that message in Dog’s singleton class, or in Class, or in Module, or in Object, or in some module mixed into one of those.
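You can ask Ruby where methods live, which makes this picture a little more concrete. A small sketch, assuming Ruby 1.9.2 or later for the singleton_class method (under 1.8, the idiom class << Dog; self; end returns the same object). The wag method is hypothetical:

```ruby
class Dog
  def self.bark()      # a "class method"
    "Woof!"
  end
  def wag()            # an ordinary instance method (hypothetical)
    "wag wag"
  end
end

fido = Dog.new()
# bark is really an instance method of Dog's singleton class...
Kernel.puts(Dog.singleton_class.method_defined?(:bark))  # true
Kernel.puts(Dog.method_defined?(:bark))                  # false
# ...just as wag is an instance method of Dog, fido's class
Kernel.puts(Dog.method_defined?(:wag))                   # true
# fido's lookup chain: its class, then the classes and mixins above it
Kernel.puts(Dog.ancestors.inspect())
```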

You don’t have to understand that in order to use Ruby, and indeed I think that my way of describing Ruby, starting with modules and class methods, then proceeding to classes, and finally to instances, instance methods, instance variables, mixins, and singleton classes, is both valid and pedagogically useful. So feel free to ignore this section if you’re having trouble understanding it. I have to confess that I myself read and reread explanations similar to this section for literally years before I understood Ruby in the way I’ve just described it in the preceding two paragraphs. But when I finally did understand it, it was a very satisfying moment. (And then my head exploded.)

2. Datatypes and Syntax

Ruby syntax is fairly straightforward, especially if you already know any other C-like language, such as Perl, REALbasic, or (gasp) C. You can easily acquire the fine points of Ruby syntax on your own, but a few peculiarities may need some special attention up front.

2.1. Method Calling and Syntactic Sugar

Ruby comes with a number of syntactic rules for method calling, designed to make it look a little less like “everything is an object” and a little more like other programming languages.

For example, we’ve already seen that you can use syntax that looks as if you were assigning to a “property” of an object:

Ruby doesn’t have “properties” so there isn’t a real thing on the left side of the equal sign. Instead, Ruby translates this behind the scenes to:
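Here is a sketch of all three spellings, assuming a minimal Dog with generated name accessors. The last line goes one step further and uses send, Ruby’s standard way of sending an object a message by name, to make the “message” explicit:

```ruby
class Dog
  attr_accessor(:name)
end

fido = Dog.new()
fido.name = "Fido"           # looks like assigning to a property...
fido.name=("Fido")           # ...but means calling the name= method
fido.send(:name=, "Fido")    # ...which is sending fido the name= message
puts(fido.name())            # Fido
```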

Now Ruby is sending the name= message to fido, along with the parameter "Fido"; if fido doesn’t have a name= method, there will be an error. So this works (or doesn’t) because fido’s class defines (or doesn’t) an instance method name=.

Most things that look like operators in Ruby work the same way. Take, for example, the plus sign:

There is no addition operator in that expression; Ruby is not “adding” two strings together. Rather, Ruby translates the expression behind the scenes into this:
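The two forms side by side, using the strings from the text:

```ruby
a = "Hel" + "lo"       # looks like an addition operator...
b = "Hel".+("lo")      # ...but is a call to String's + method
puts(a)                # Hello
puts(a == b)           # true
```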

Now Ruby is merely calling the + method on the string "Hel", with "lo" as its parameter. Since a string is an instance of the String class, which defines an instance method +, it works.

The literal 3 is an instance of the Fixnum class, which defines a + instance method. So Ruby translates the above expression to a method call, which works:
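A few numeric expressions written both ways. The operands here are arbitrary. Note that the class of 3 depends on the Ruby version, as the comment says:

```ruby
x = 3 + 4        # sugar
y = 3.+(4)       # the method call Ruby actually makes
puts(x == y)     # true
puts(3.class)    # Fixnum in older Rubies such as 1.8; Integer from Ruby 2.4 on
puts(10.-(3))    # 7
puts(2.*(5))     # 10
puts(7.<(9))     # true: comparison is a method, too
```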

As you can see, it is perfectly legal to write all these expressions as method calls yourself. But no one ever does. Syntactic sugar is, by nature, sweet.

If you’re looking at those symbols and thinking, “Yes, but what do they mean?” you still haven’t got the idea. They don’t mean anything; or rather, you can make them mean whatever you like. However, it’s true that many built-in classes make them mean some very cool stuff, which you’re going to want to learn about; and it’s also true that some of these meanings are conventional, and that you too should follow those conventions, not because you have to, but because it’s convenient if you do and confusing if you don’t. For example, << is used by a number of classes (String, Array, IO) to mean “append”, and many other classes follow suit.
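A quick sketch of << meaning “append” in three built-in classes. The string is dup’ed first so that we append to our own modifiable copy rather than to a literal:

```ruby
s = "Hel".dup            # dup gives us our own modifiable copy of the literal
s << "lo"                # String#<< appends to the string
puts(s)                  # Hello
arr = [1, 2]
arr << 3                 # Array#<< appends an item
puts(arr.length)         # 3
$stdout << "Appended to the output stream\n"  # IO#<< writes to the stream
```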

Another example of the value of obeying conventions is the family of methods connected with the spaceship method, <=>. The convention here is that when instances of a class can be compared with the notions “less than”, “equal to”, and “greater than”, the spaceship method embodies all three: it returns -1, 0, or 1, according to whether the object to which the <=> message is sent is less than, equal to, or greater than the parameter. The cool part is that if your class implements this one method, you can mix in the Comparable module and presto, the other five methods spring to life (not a difficult trick, because they are all trivially defined in terms of the spaceship operator).
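A minimal sketch of that convention. It assumes a hypothetical ordering in which dogs compare by age. Only <=> is written by hand; the comparison operators come from Comparable:

```ruby
class Dog
  include(Comparable)      # supplies <, <=, ==, >=, and > given <=>
  attr_reader(:age)
  def initialize(age)
    @age = age
  end
  def <=>(other)           # -1, 0, or 1, by comparing ages
    age <=> other.age
  end
end

puppy = Dog.new(1)
oldster = Dog.new(12)
puts(puppy < oldster)      # true
puts(puppy >= oldster)     # false
puts(puppy == Dog.new(1))  # true
```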

The square bracket methods at the end of the first line come into play with classes like String and Array. For example, given an array arr, you can fetch the first item of the array by asking for arr[0], and you can set the first item of the array (replacing its previous value, if any) by saying arr[0] = newvalue or similar. Those expressions are syntactic sugar; behind the scenes, they are translated to calls on the [] and []= methods, respectively.
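A short sketch of the square-bracket sugar and its unsweetened equivalents, using a throwaway array:

```ruby
arr = ["a", "b", "c"]
puts(arr[0])           # a
puts(arr.[](0))        # a: the same call, without the sugar
arr[0] = "y"           # sugar for arr.[]=(0, "y")
arr.[]=(1, "z")        # the unsweetened form
puts(arr.join(" "))    # y z c
```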

There are additional operators that are syntactic sugar for combinations of methods. For example:
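For instance, s += "!" is shorthand for a fetch, a method call, and an assignment. Here are the two spellings compared:

```ruby
s = "Hello"
s += "!"             # the shorthand...
t = "Hello"
t = t.+("!")         # ...spelled out as a fetch, a method call, and an assignment
puts(s)              # Hello!
puts(s == t)         # true
```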

That means: “Fetch the value of s; call its + method with "!" as parameter; and assign that back to s.” Or, try this one:

That means: “Call fido’s name method; take the result and call its + method, with "ookums" as parameter; now take that result and use it as the parameter in calling fido’s name= method.”

Many operators have assignment combinations like the above. A common Ruby idiom is something like this:
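The idiom in question is @name ||= "Fido". Here is a minimal sketch placing it in a getter, so that "Fido" acts as a default for any Dog that has never been given a name:

```ruby
class Dog
  def name()
    @name ||= "Fido"   # if @name is nil, set it to "Fido"; either way, return @name
  end
  def name=(s)
    @name = s
  end
end

rex = Dog.new()
puts(rex.name)          # Fido: the default
lassie = Dog.new()
lassie.name = "Lassie"
puts(lassie.name)       # Lassie: an assigned value wins
```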

So, here’s what that code does. If @name is non-nil (i.e., if it has already been assigned a string value), its value counts as true in the logical expression; so evaluation of the logical expression stops, and the value of @name is returned. If, on the other hand, @name is nil (as it will be if it has never been given a value), we go on to the right operand of the logical expression. That value is "Fido", which is assigned to @name, and now the value of @name is returned. So, either way, the value of @name is returned, but if @name is nil it is assigned the value "Fido" first. This is effectively a way of saying that "Fido" is @name’s default value, the value it is to have if it has never been assigned a value.

2.2. Some Cool Built-in Datatype Classes

Ruby comes with a number of extremely elegant built-in classes. It’s impossible to provide all the details in a short space, and besides, that’s the sort of thing that traditional “bottom-up” Ruby tutorials do very well. This section will just provide a quick survey.

Numbers work pretty much the way you would expect them to. There are several numeric classes, and numeric literals are automatically translated into instances of the appropriate class.
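A few quick examples of numbers behaving as you would expect. Integer division stays integral, and big integers need no special handling. (In Ruby 1.8, small integers are Fixnums and large ones are Bignums, with conversion handled automatically. From Ruby 2.4 on, both are simply Integer.)

```ruby
puts(7 / 2)         # 3: two integers, so integer division
puts(7.0 / 2)       # 3.5: a Float is involved, so the result is a Float
puts(1.5.class)     # Float
puts(2 ** 100)      # 1267650600228229401496703205376: big integers just work
```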

Ranges are objects representing a slice of a sequence. A literal range is written with two dots between the endpoints of the slice. For example, 1..5 means all the integers from 1 to 5, inclusive, and 'a'..'c' means all the lowercase letters from a to c. Obviously Ruby has rules for what constitutes a sequence; you can create your own classes for things that behave sequentially, and ranges will work with them. Ranges are of particular value in taking slices of strings and arrays, and in loops, as we shall see.
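A small sketch of ranges, including one used to slice a string:

```ruby
r = 1..5
puts(r.include?(3))                # true
puts(r.to_a.length)                # 5: both endpoints are included
puts(('a'..'c').to_a.join(", "))   # a, b, c
puts("Hello"[1..3])                # ell: a range used to slice a string
```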

Symbols are a little bit like strings, and indeed strings can generally be converted to symbols and vice versa. A literal symbol starts with a colon, as in :this_is_a_symbol. The important difference is that although there can be many string objects equal to "some string", there can be only one symbol equal to :some_symbol. Symbol lookup and comparison is therefore very fast, and symbols are useful wherever a mere token or identifier is needed.
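A small sketch of that difference. equal? asks whether two names point at the very same object, not merely equal ones:

```ruby
s1 = "some string"
s2 = "some string"
puts(s1 == s2)                               # true: equal contents
puts(s1.equal?(s2))                          # false: two distinct String objects
puts(:some_symbol.equal?(:some_symbol))      # true: there is only one such symbol
puts("some_symbol".to_sym == :some_symbol)   # true: string to symbol...
puts(:some_symbol.to_s)                      # some_symbol: ...and back
```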

Strings can be formed using literal delimiters: single-quotes and the equivalent %q{...}, and double-quotes and the equivalent %{...}. (I’m greatly over-simplifying.) There’s a big difference between single-quoted string literals and double-quoted string literals: double-quoted string literals can contain various escaped characters, plus they are candidates for expression interpolation. For example:
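A minimal sketch contrasting the two kinds of literal. The name variable is just something to interpolate:

```ruby
name = "Fido"
single = 'Good dog, #{name}!\n'    # single quotes: no interpolation, no escapes
double = "Good dog, #{name}!\n"    # double quotes: interpolated, and \n is a newline
puts(single)                       # Good dog, #{name}!\n
puts(double)                       # Good dog, Fido!
puts(%q{It's #{name}})             # behaves like single quotes: It's #{name}
puts(%{Say "#{name}"})             # behaves like double quotes: Say "Fido"
```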

Expression interpolation, as you can see from the example, involves expressions surrounded by #{...}. This syntax, familiar if you’re acquainted with shell scripting, Perl, and the like, is of great convenience when assembling strings. The expression inside the #{...} is evaluated in the current context, and if the result is not a string, its to_s method is called. There is no automatic implicit coercion between classes in Ruby, so a class should implement its own to_s if its instances are to be interpolated into a string meaningfully; every object inherits a to_s from Object, but that default merely reports the object’s class name and an identifying number. (The Kernel method puts also calls its parameter’s to_s, which is why we have not had to worry about whether that parameter is a string.) Ruby also implements “here documents”, where a multi-line stretch of a program is taken to be a literal string, so interpolation into a large string is easy:
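A sketch combining both ideas. A hypothetical Dog supplies its own to_s, and a here document (running from <<END to the line containing just END) interpolates a Dog and an arithmetic expression:

```ruby
class Dog
  def initialize(name)
    @name = name
  end
  def to_s()               # interpolation calls this
    "a dog named #{@name}"
  end
end

fido = Dog.new("Fido")
letter = <<END
Dear #{fido},
You have been a good dog for #{3 + 4} years.
END
puts(letter)
```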

The String class comes with many wonderful methods that make string operations a snap. The greatest of these involve square brackets, which let you slice and dice and modify a string easily:
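A few square-bracket operations on a sample string. Note the version caveat on the first line. The copy t is modified so that s stays intact:

```ruby
s = "Hello, world"
puts(s[0])          # H (in Ruby 1.8, s[0] returns the character code 72 instead)
puts(s[0, 5])       # Hello: start at index 0, take 5 characters
puts(s[7..11])      # world: a range of indexes
puts(s[-5..-1])     # world: negative indexes count back from the end
puts(s["world"])    # world: a substring, if present (else nil)
t = s.dup
t[0, 5] = "Goodbye" # replace a slice
puts(t)             # Goodbye, world
```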

Observe that in Ruby, indexes are zero-based; the first item of a series is item zero.

A literal string in backticks, or the equivalent %x{...}, is interpolated like a string in double-quotes and then evaluated by the shell. This is actually syntactic sugar for calling Kernel.`(s), where the backtick is the name of a method and s is the string.
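A small sketch, assuming a Unix-like system where the echo command is available. The backtick method is named by a one-character string here and called through send, which avoids writing the name directly after a dot:

```ruby
name = "world"
out = `echo hello #{name}`                           # interpolated, then run by the shell
puts(out)                                            # hello world
same = %x{echo hello #{name}}                        # the %x{...} form
via_method = Kernel.send("`", "echo hello #{name}")  # the method behind the sugar
puts(out == same && out == via_method)               # true
```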

Regular Expressions are handled through the Regexp class. A Regexp pattern object can be formed using literal delimiters, either forward slashes or the equivalent %r{...}. This literal may be followed by a letter or letters indicating the mode of the matching operation, such as i to ignore case, and m for a multi-line match. The rules for regular expression pattern interpretation and matching are very involved; suffice it to say here that Ruby has some of the best regular expression support in the universe. Regular expressions can be used in a number of string methods; plus, there’s a basic match operator, =~. After a regular expression match, a number of special values are set which tell you about what just happened:
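A small sketch of matching on a sample string. It shows what =~ returns, what $~ holds, and the match method with the i option. The MatchData is saved in md because the next match would replace $~:

```ruby
s = "The quick brown fox"
pos = (s =~ /(qu\w+) (b\w+)/)   # =~ returns the index where the match starts, or nil
puts(pos)                       # 4
md = $~                         # $~ holds the MatchData for the most recent match
puts(md[0])                     # quick brown: the whole match
puts(md[1])                     # quick: the first parenthesized group (also $1)
puts(md.begin(2))               # 10: where the second group starts
m = s.match(/FOX/i)             # the match method returns a MatchData (or nil)
puts(m[0])                      # fox: found despite the case difference
```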

Both $~ and the result of the match operation in the second line are MatchData instances; a MatchData captures all of the information about what happened, and has methods for reporting it (like begin, and the square-bracket method used in the fourth line).

Arrays are ordered lists. A literal array is delimited by square brackets, with the items separated by commas. An item can be any value at all; however, it is a bad idea to give an item the value nil, because that is the default response when you ask for a non-existent item. Arrays can easily be concatenated, combined, sliced and diced, and so on.
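A few basic array operations on an arbitrary sample array. The output uses p, which shows each array in literal form:

```ruby
a = [1, "two", :three]
b = a + [4, 5]        # + makes a new, concatenated array; a is unchanged
p(b)                  # [1, "two", :three, 4, 5]
p(b[1..2])            # ["two", :three]: a slice
p(b[10])              # nil: there is no item 10
b << 6                # append in place
p(b.length)           # 6
```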

(Notice the use of p instead of puts for outputting the literal form of an array. It calls an object’s inspect method instead of its to_s method, and some classes, including Array, implement these differently.)

I could go on and on talking about arrays. You can take the union or intersection of two arrays. There are ways of splitting a string into an array of strings, joining an array of strings into a single string, and so on. Arrays are tremendously important in Ruby, which is one reason why they are so well-endowed with methods; in the next section, we’ll see that arrays play a crucial role in assignment and parameter passing.
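A taste of those operations: union, intersection, splitting, and joining:

```ruby
a = [1, 2, 3]
b = [2, 3, 4]
p(a | b)                 # [1, 2, 3, 4]: union
p(a & b)                 # [2, 3]: intersection
words = "the cat sat".split(" ")
p(words)                 # ["the", "cat", "sat"]
puts(words.join("-"))    # the-cat-sat
```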

One thing that beginners need to watch out for is that an array item is merely a pointer, just like any other name. Assigning into an array doesn’t make a copy. This means that the contents of an array can be changed “behind your back”:
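A minimal sketch of the problem. An array is used as the shared item so that nothing depends on modifying string literals:

```ruby
inner = [1, 2]
outer = [inner, inner]  # both items point at the very same array
inner << 3              # change that array through the name inner...
p(outer)                # [[1, 2, 3], [1, 2, 3]]: ...and outer "changes" too
snapshot = [inner.dup]  # dup makes a (shallow) copy, so snapshot is insulated
inner << 4
p(snapshot)             # [[1, 2, 3]]
p(outer)                # [[1, 2, 3, 4], [1, 2, 3, 4]]
```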

Hashes are unordered collections of key–value pairs (at least in Ruby 1.8; starting with Ruby 1.9, a hash remembers the order in which its keys were inserted). The idea is that you can access a value by way of its key. Keys can be of any class that supports a hash method, but in practice, strings and (preferably) symbols are typically used. A literal hash is delimited by curly braces, with each key preceding its value and separated from it by => (and with the key–value pairs separated from one another by commas):
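A small literal hash with made-up data, mixing symbol keys with a string key. Note that a string and a symbol with the same spelling are different keys:

```ruby
dog = {:name => "Fido", :age => 7, "breed" => "mutt"}
p(dog.size)                # 3
p(dog.has_key?(:age))      # true
p(dog.has_key?("age"))     # false: "age" and :age are different keys
```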

A common way to get and set a value by way of its key is to use the square bracket operator:
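A short sketch of getting and setting through keys, with made-up data. The last line fetches through a key that was never set:

```ruby
dog = {:name => "Fido", :age => 7}
puts(dog[:name])        # Fido
dog[:age] = 8           # replace a value
dog[:color] = "brown"   # add a new key-value pair
p(dog[:weight])         # nil: no such key
```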

As the last line shows, fetching through a non-existent key yields nil by default.

Hashes are very important in Ruby and are richly endowed with methods, including ways to combine hashes, and ways to convert between an array and a hash. Key names can be formed dynamically, and hash access is very efficient, so a hash is a great way to store arbitrary associative data. I often see beginners asking on the Ruby forums how to form variable names dynamically, e.g. “I want to read the text of several files into string variables, naming each variable whatever the name of each file may be.” Dynamic variable naming is a nutty idea; this situation cries out for a hash.
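Here is the hash approach to that forum question, sketched without touching the disk. The file names and contents are hypothetical stand-ins for files that would really be read. Hash[] turns an array of key–value pairs into a hash, one of the array-to-hash conversions just mentioned:

```ruby
names = ["notes.txt", "todo.txt"]           # hypothetical file names
contents = ["buy milk", "walk the dog"]     # hypothetical file contents
texts = Hash[names.zip(contents)]           # pair them up, then make a hash of the pairs
puts(texts["todo.txt"])                     # walk the dog
texts["log" + ".txt"] = "all quiet"         # keys can be computed on the fly
puts(texts["log.txt"])                      # all quiet
```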

Another way to think of a hash is as a lightweight object consisting of instance variables. For example, we could make a lightweight database of people and their favorites, implemented as a hash of hashes:
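A minimal sketch of such a database. The people and their favorites are invented for illustration. The last two lines also show the fragility described next: nothing forces a record to be complete:

```ruby
# hypothetical people and (made-up) favorites
favorites = {
  "Alice" => {:composer => "Bach", :painter => "Monet", :comedian => "Keaton"},
  "Bob"   => {:composer => "Verdi", :painter => "Goya", :comedian => "Chaplin"}
}

puts(favorites["Bob"][:painter])             # Goya
favorites["Alice"][:comedian] = "Lloyd"      # change one "instance variable"
favorites["Carol"] = {:composer => "Ravel"}  # nothing enforces a complete record...
p(favorites["Carol"][:painter])              # nil
```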

From one point of view, that’s an appalling way to behave, because it’s so fragile; if we know for a fact that every person has exactly three favorites (a composer, a painter, and a comedian), and that this set of attributes will never change, we should properly create a Person class and use a hash of Persons instead. (And indeed, a Struct class exists just to make it easy to create classes consisting of nothing but instance variables and their accessors.) Nonetheless, hashes are so convenient that even experienced Rubyists do in fact write code like this all the time.
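A sketch of the Struct alternative, with made-up data. Struct.new generates a Person class with a getter and a setter for each named attribute:

```ruby
# Person has exactly three attributes, each with a getter and a setter
Person = Struct.new(:composer, :painter, :comedian)

bob = Person.new("Verdi", "Goya", "Chaplin")   # hypothetical data
puts(bob.painter)              # Goya
bob.comedian = "Keaton"        # setter generated by Struct
people = {"Bob" => bob}        # a hash of Persons
puts(people["Bob"].comedian)   # Keaton
```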

2.3. Assignment and Parameter Passing

The syntax of assignment and the syntax of parameter passing in a method call are closely related, so they deserve to be studied together. Just as a method can have more than one parameter, so you can assign to more than one variable simultaneously:
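The assignment from the text, followed by a classic bonus: parallel assignment makes swapping two variables trivial:

```ruby
name, age = "Fido", 7
puts(name)    # Fido
puts(age)     # 7
a, b = 1, 2
a, b = b, a   # swap, with no temporary variable
puts(a)       # 2
```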

After that, name is "Fido" and age is 7. Such multiple simultaneous assignment (or parallel assignment) is, of course, a mere convenience; you could just as easily assign to name in one line and to age in the next line. But even a mere convenience can be very convenient.

The real power, however, emerges when the number of names on the left side of the assignment (lvalues) differs from the number of values on the right side (rvalues). If there is just one lvalue and multiple rvalues, the rvalues are combined into an array, and it is this array that is assigned to the lvalue:
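One lvalue and three rvalues, for example:

```ruby
stuff = "Fido", 7, :dog   # one lvalue, three rvalues
p(stuff)                  # ["Fido", 7, :dog]
p(stuff.class)            # Array
```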

If, on the other hand, there are multiple lvalues and just one rvalue, then an attempt is made to treat the rvalue as an array, by calling its to_ary method if it has one. (Very few classes implement to_ary; Array does, simply returning self.) If this attempt succeeds, we now have an array, and in that case the items of the array are distributed over the lvalues:
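Here an array rvalue is distributed over two lvalues. The last two assignments show what happens when the counts don’t match:

```ruby
name, age = ["Fido", 7]    # one array rvalue, two lvalues
puts(name)                 # Fido
puts(age)                  # 7
first, second = [1, 2, 3]  # leftover items are dropped
x, y, z = [1, 2]           # an lvalue with no item to receive gets nil
p(z)                       # nil
```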

Let’s give these behaviors a name. We’ll call the second case, where an array rvalue is distributed over multiple lvalues, splatting the array; and we’ll call the first case, where multiple rvalues are combined into an array, reverse splatting. Then we can say that we have just seen examples of implicit splatting and implicit reverse splatting; the splatting or reverse splatting was performed for us, automatically.

When there are several lvalues and several rvalues, but not the same number of each, you might still want splatting or reverse splatting to occur; it won’t happen automatically (excess rvalues are simply discarded, and excess lvalues are set to nil), so you need a way to ask for it. To make this possible, Ruby provides the splat operator, which is an asterisk. If the asterisk precedes an lvalue, that lvalue mops up all excess rvalues as a single array (reverse splatting):
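A small sketch with and without the asterisk. The starred lvalue is placed last, which works in every Ruby version:

```ruby
first, *rest = 1, 2, 3, 4   # rest mops up the excess
p(first)                    # 1
p(rest)                     # [2, 3, 4]
a, b = 1, 2, 3              # without the asterisk, the 3 is simply lost
p(b)                        # 2
```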

If the asterisk precedes an rvalue, that rvalue is treated as an array (by calling its to_ary method) and, if this succeeds, the elements of that array are distributed over the remaining lvalues (splatting):
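A sketch with and without the asterisk on the rvalue:

```ruby
more = [2, 3]
a, b, c = 1, *more   # splat: more's items are distributed over b and c
p([a, b, c])         # [1, 2, 3]
d, e, f = 1, more    # no splat: e gets the whole array, and f gets nothing
p(e)                 # [2, 3]
p(f)                 # nil
```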

Parameter passing, when calling a method, works in almost the same way. The difference is that when you call a method, there is no implicit splatting. The reason is that when you call a method, there’s an arity check, meaning that the number of parameters passed must match the number of arguments declared by the method definition; if they don’t match, there’s an error.
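A minimal sketch of the arity check, using a hypothetical two-argument greet method. An explicit splat distributes an array over the arguments, but passing the array alone is simply one parameter too few:

```ruby
def greet(name, age)          # hypothetical two-argument method
  "#{name} is #{age}"
end

pair = ["Fido", 7]
puts(greet("Fido", 7))        # Fido is 7
puts(greet(*pair))            # explicit splat: Fido is 7
error = nil
begin
  greet(pair)                 # no implicit splatting: one parameter for two arguments
rescue ArgumentError => e
  error = e
  puts(e.message)             # wrong number of arguments ...
end
```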

We can use reverse splatting to allow a method to accept any number of parameters:
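A minimal sketch, assuming a hypothetical test method that simply hands back the array it receives. (Defining a top-level test method hides Kernel’s own built-in test method, which is harmless in a tiny script like this.)

```ruby
def test(*args)   # args gathers every parameter passed, however many
  args
end

p(test())                  # []
p(test("Fido"))            # ["Fido"]
p(test("Fido", 7, :dog))   # ["Fido", 7, :dog]
```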

No matter how many parameters are passed, there will be no error; all the parameters will be combined into a single array argument, and our method code can now proceed to investigate the situation and behave accordingly.

The picture is made more complicated by the fact that in a method definition we are allowed to specify default values for some or all arguments. In Ruby 1.8.x, such arguments must come after the arguments without default values, and if there is a reverse splatted argument, it must come last of all:
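A sketch of that ordering rule, using a hypothetical test method with two defaulted arguments (weight, then length) followed by a reverse-splatted one:

```ruby
# hypothetical: weight and length have defaults; others mops up anything extra
def test(name, weight = 10, length = 20, *others)
  [name, weight, length, others]
end

p(test("Fido"))                  # ["Fido", 10, 20, []]
p(test("Fido", 15))              # ["Fido", 15, 20, []]
p(test("Fido", 15, 30, :a, :b))  # ["Fido", 15, 30, [:a, :b]]
```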

However, there’s a problem with all this from a usability point of view. Our calls to the test method contain no indication of what the various parameters are for; we have to consult the method definition and figure it out. (I mentioned this earlier in connection with the discussion of new and initialize.) Also, we are tied to supplying the parameters in a fixed order. Also, we have no way to pass a length value but fall back on the default for weight. A common workaround is to define a method to expect a hash. A hash’s keys indicate what the values are for:
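A minimal sketch of the hash workaround, assuming a hypothetical test method that looks up :weight and :length in its hash and falls back on defaults for any missing key. Now length can be given while weight is defaulted, and order no longer matters:

```ruby
# hypothetical: everything but name arrives in a hash h
def test(name, h = {})
  weight = h[:weight] || 10   # fall back on a default for any missing key
  length = h[:length] || 20
  [name, weight, length]
end

p(test("Fido", {:length => 30}))                # ["Fido", 10, 30]
p(test("Fido", :length => 30))                  # the same call, without curly braces
p(test("Fido", :length => 30, :weight => 15))   # ["Fido", 15, 30]: any order works
```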

It would then be up to our implementation of test to analyze the hash h; but this is not difficult to do. The important thing to notice is the absence of curly braces. Ruby has a “syntactic sugar” rule that in the comma-list of parameters in a method call, the last parameter can be a literal hash without curly braces. (Ruby knows it’s a hash because of the => symbol.) This rule is to make it easier to pass a literal hash as the last parameter, and taking advantage of it is extremely common.

One final note on passing parameters. Even when a method call involves parameters, parentheses around the parameter list are optional (unless their omission results in an ambiguous expression, because of the complexity of the context or the parameters themselves). Such omission is quite common. So, no one actually writes, as we have been doing:
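Here are the stilted and the usual styles side by side, applied to the method calls we have been making. The Dog class is a minimal stand-in:

```ruby
class Dog
  attr_accessor(:name)   # the stilted style we have been using
  attr_accessor :age     # the usual style
end

fido = Dog.new
fido.name=("Fido")       # stilted
fido.age = 7             # usual
puts(fido.name)          # stilted
puts fido.age            # usual: 7
```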

Similarly, if the last parameter to be passed to a method call is a literal hash, it is common to omit both the parentheses and the curly braces around the literal hash:
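A sketch of the fully explicit call next to the usual one, assuming a hypothetical test method that returns whatever it received:

```ruby
def test(name, h = {})   # hypothetical: just hands back what it received
  [name, h]
end

x = test("Fido", {:length => 30})   # fully explicit
y = test "Fido", :length => 30      # no parentheses, no curly braces
puts(x == y)                        # true
```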

2.4. Blocks

Most of Ruby’s control structures are easy to understand, and are similar to the control structures of other languages that you may be familiar with; but something explicit needs to be said about blocks. Nothing is so typical of Ruby syntax as a block, and nothing more clearly distinguishes the Ruby Way from the way other languages do things.

A block is basically the body of a function: arguments, and what to do with those arguments. Any method in Ruby can accept a block. The block, if present, is called only if the method executes the keyword yield. Any parameters supplied to yield are passed as arguments to the block. Here’s a trivial (and silly) example:
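Using the names from the walkthrough that follows, a blockTester method with one argument, ss, might look like this. The literal block {|s| puts s} is attached after the parentheses:

```ruby
def blockTester(ss)
  yield(ss)      # call the block, handing it ss as its parameter
end

blockTester("Howdy") {|s| puts s}   # Howdy
```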

The expression {|s| puts s} is a literal block. (It is also possible to supply a variable whose value is a block, but I’m not going to talk about that.) Notice that the literal block is outside the parentheses of the method call. The vertical pipes (|s|) come first inside the block, and give the names of the arguments; then comes the code. So, let’s talk about what happens in that code.

We call blockTester, handing it two things: a string parameter, and a block. The string parameter, "Howdy", becomes the method argument ss; the block just sits there, waiting to see whether the method ever calls yield. The method does call yield, with one parameter, namely the value of ss (which is still "Howdy"). Okay, so now we’re in the block. One parameter arrives and is assigned to the argument s inside the block. Then the code of the block executes, and we output “Howdy”.

For longer blocks, it is common to use a different syntax, with do and end instead of curly braces:
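A sketch in the do / end style. This blockTester is a variation that keeps the value yield returns, reports it, and carries on. The block’s last expression, "Done", is that value:

```ruby
def blockTester(ss)
  result = yield(ss)                    # the block's value becomes yield's value
  puts("The block returned " + result)  # then the method carries on as usual
  result
end

blockTester("Howdy") do |s|
  puts(s)        # Howdy
  "Done"         # the last expression evaluated is the block's value
end
```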

(Apart from a difference in precedence, which can matter when you omit the parentheses around a method call’s parameters, there isn’t any important difference between using curly braces and using do / end to delimit a literal block.) In that example, notice that the value returned by the block (in this case, the string “Done”) is the value returned by the yield call. After the line that calls yield, the method carries on in the normal way.

Now, in your initial use of Ruby you are unlikely to write many methods that expect blocks. But you are very likely to call methods that expect blocks, because such methods are the standard way of looping in Ruby. The basic example is the each method. Many built-in Ruby classes implement each, especially collections and ranges. The each method means: “Do this for each item in the collection.” To tell the each method what you mean by “this,” you pass a block.

For example, suppose you want to fetch the value of column 1, the value of column 2, and so on up to n. I have no idea what a “column” is, or how you fetch the value of one, and I don’t care; I’m only interested in the abstract mechanics of each and a block. If you know any other computer language at all, you are probably tempted to write a for loop, and you’re going to be casting about to find out how to express this in Ruby. Well, you can; but don’t. No one writes for loops in Ruby. The concept “1, then 2, and so on up to n” is expressed by a range: 1..n. The concept “fetch that column” is expressed in a block:
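A minimal sketch, where fetch_column is a hypothetical stand-in for however a column is really fetched and n is set to 3 for illustration:

```ruby
def fetch_column(i)   # hypothetical stand-in for real column fetching
  "value of column #{i}"
end

n = 3
(1..n).each {|i| puts(fetch_column(i))}   # columns 1, 2, and 3, in order
```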

Here’s another example (one of my favorite ways of demonstrating Ruby). Suppose we have a string, and we want to count all the occurrences of each unique word in that string. How would you do this? Don’t think in terms of cycling through the string; think of turning the string into a collection of which you can process each item with each. So, our first step is to bust the string into words:
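With a made-up sample sentence, the crude splitting step looks like this. split breaks the string wherever it finds a run of non-word characters:

```ruby
s = "The cat sat on the mat. The cat was happy."   # hypothetical sample text
words = s.split(/\W+/)
p(words)  # ["The", "cat", "sat", "on", "the", "mat", "The", "cat", "was", "happy"]
```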

Our way of splitting the string into words is crude but cool (involving a regular expression), and now we have an array. An array is a collection, so we’re off to the races. What shall we do with each word? Well, let’s downcase it, so that all our words are lowercase; then, let’s use each word as a key in a hash, which we’ll have prepared beforehand.

How should we hash the item? Well, we’ll fetch the item from the hash. If it’s not there, we’ll get nil, and we’ll assign 1 as a value (because we have just found our first instance of that item). If it’s there, we’ll get the number of times we’ve found that item so far, and we’ll add 1 to that.
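The steps just described can be sketched as follows, with the same made-up sentence and a hash h prepared beforehand:

```ruby
s = "The cat sat on the mat. The cat was happy."   # hypothetical sample text
h = {}
s.split(/\W+/).each do |word|
  w = word.downcase
  if h[w] == nil
    h[w] = 1             # first sighting of this word
  else
    h[w] = h[w] + 1      # seen before: add 1 to its count
  end
end
puts(h["the"])   # 3
puts(h["cat"])   # 2
```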

That actually works, but it is more common to be a little less verbose. Here’s a tighter version of the same code; be sure you can see why it is the same:
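One possible tighter formulation along those lines. Here || 0 supplies a starting count of zero whenever h[w] is still nil:

```ruby
s = "The cat sat on the mat. The cat was happy."   # hypothetical sample text
h = {}
s.split(/\W+/).each {|word| w = word.downcase; h[w] = (h[w] || 0) + 1}
puts(h["the"])   # 3
```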

(It is actually possible to make the code even tighter and even more Ruby-like, but that’s enough of that example.)

Variable scoping in blocks is complicated and poses potential hazards for the beginner. The basic rule is that if a variable name mentioned inside a block — including, in Ruby 1.8.x, one of the argument names in pipes — is already defined and visible in the context surrounding the block, then they are the same variable.

This can be extremely convenient, because it means that information from the surrounding context doesn’t have to be passed into the block. So, in the example just above, we defined an empty hash called h outside the block, and then inside the block we referred to that same h. In fact, that is why we defined h outside the block. If we hadn’t done so, then the only h mentioned would be inside the block, and it would therefore be local to the block, and we would be unable to retrieve its value:
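A sketch of the failure, using a shorter made-up string. Because h is first mentioned inside the block, it is local to the block (in fact, it starts over as a fresh variable on every call of the block). After the block, no h exists:

```ruby
s = "the cat the hat"
s.split(/\W+/).each do |word|
  h ||= {}                       # first mention of h: inside the block, so local to it
  h[word] = (h[word] || 0) + 1
end

error = nil
begin
  p(h)                           # out here, there is no h at all
rescue NameError => e
  error = e
  puts("NameError: " + e.message)
end
```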

But because we first defined h as an empty hash before the block, the h in the block was that h:
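The working version, sketched with the same made-up string. Now h is defined before the block, so the h inside the block is that h:

```ruby
s = "the cat the hat"
h = {}                             # defined before the block...
s.split(/\W+/).each do |word|
  h[word] = (h[word] || 0) + 1     # ...so this h is that h
end
puts(h["the"])                     # 2
puts(h.size)                       # 3
```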

So it is, in fact, a very common technique to define a variable, even just setting it to nil, before a call to a method involving a block, just so that block can alter that variable outside itself. But you can readily see the downside: there is a trap waiting for us here, in that we might accidentally give a block variable the same name as a variable outside the block, and destroy the latter’s value unintentionally.
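Both sides of the coin in one small sketch. The variable names are hypothetical. The first block clobbers an outer variable by accident, and the second sets a deliberately predeclared one:

```ruby
count = 100                    # a value we mean to keep
[1, 2, 3].each do |item|
  count = item * 10            # meant as scratch space, but this IS the outer count
end
puts(count)                    # 30: the outer value has been clobbered

found = nil                    # defined first, on purpose, so the block can set it
[5, 8, 13].each do |n|
  found = n if found == nil && n > 6
end
puts(found)                    # 8
```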

It is important to be clear that when I say that the block can see variable names already defined “in the context surrounding the block”, I mean in the context surrounding where the block is defined. For example, we could have written the first example in this section more concisely, like this:
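A sketch consistent with the description that follows. Here blockTester takes no parameter and sets its own, unrelated local s to "Farewell" before yielding:

```ruby
def blockTester()
  s = "Farewell"   # a local variable of the method, unrelated to the s outside
  yield()
end

s = "Howdy"
blockTester() {puts(s)}   # Howdy: the block sees the s where it was defined
```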

In the last line we call blockTester, passing it a block that refers to s. The block is called inside the blockTester method, where s is "Farewell"; but that doesn’t matter. At the point where the block was defined, s was "Howdy", and that is the s whose value the block has captured. And it hasn’t just captured it; it has access to it, and can change it:
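A minimal sketch of a block changing the outer s. This blockTester does nothing but yield:

```ruby
def blockTester()
  yield()
end

s = "Howdy"
blockTester() {s = "Farewell"}   # the block assigns to the s defined outside it
puts(s)                          # Farewell
```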

In that example, the code s = "Farewell" was executed only because we passed it, as a block, to a method, blockTester, which yielded to it. But the s in the block is the same as the s assigned in the line before, because that is where the block was defined.

(Actually, the situation goes even deeper. A block is a closure. By this I mean that the block doesn’t just access the variables outside itself that it refers to; it keeps them alive for as long as the block itself lives. That doesn’t matter here, because the block’s lifetime is no longer than that of its surroundings; but there is such a thing as a long-lived block.)
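A long-lived block can be sketched with lambda, which turns a block into an object that outlives the method that created it. Here make_counter is a hypothetical helper, not part of Ruby. Its local variable count stays alive after make_counter returns, because the block still refers to it.

```ruby
# make_counter is a hypothetical helper for illustration.
def make_counter
  count = 0                 # a local of make_counter...
  lambda { count += 1 }     # ...captured by the block inside the lambda
end

counter = make_counter      # make_counter has returned, but count lives on
counter.call                # count becomes 1
counter.call                # count becomes 2
p counter.call              # => 3
```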

Finally, I must say something about how you exit prematurely from a block. Do not say return inside a block. (Well, you can say it, but it doesn’t return just from the block; it returns from the method in which the block is defined, which is rarely what you want.) To return from one call of a block, allowing the block to be called again if that’s what the caller wants to do, say next; you can use this to return a value from the block (some methods that expect blocks also expect the block to return a meaningful value). To return from the method that called the block (i.e. the method that said yield), say break; again, you can use this to return a value, which will become the value returned from the method that called the block. If you don’t supply a value with return, break, or next, the value returned is nil.
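Here is a sketch of what return inside a block does. first_negative is a hypothetical helper. The return leaves first_negative itself, not merely the current call of the block.

```ruby
# first_negative is a hypothetical helper for illustration.
def first_negative(numbers)
  numbers.each do |n|
    if n < 0
      return n     # leaves first_negative itself, not just this call of the block
    end
  end
  nil              # reached only if no negative number was found
end

p first_negative([3, -2, 5, -7])   # => -2
p first_negative([1, 2, 3])        # => nil
```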

For example, the map method of a range feeds each item to the block, and returns an array comprising each value returned from the block. We’ll write a block that doubles each number fed to it, but skips odd numbers, and stops with a protesting message if it’s fed an even number larger than 6:
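A sketch of such a block, written with full if structures. The “Too big!” string is a made-up placeholder for the protesting message.

```ruby
result = (1..6).map do |item|
  if item % 2 == 1
    next                  # odd: end this call of the block; it yields nil
  end
  if item > 6
    break "Too big!"      # stop map altogether; map returns this string
  end
  next item * 2           # even and small enough: yield the doubled value
end
p result   # => [nil, 4, nil, 8, nil, 12]
```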

The result shows clearly what happened: we returned nil from the block for odd numbers, and doubled the value for even numbers. But if we change the initial range, things are very different:
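Here is the same block fed the range 1..10 instead. This is again a sketch, with the same placeholder message.

```ruby
result = (1..10).map do |item|
  if item % 2 == 1
    next                  # odd: yield nil
  end
  if item > 6
    break "Too big!"      # reached when item is 8: map returns this instead of an array
  end
  next item * 2
end
p result   # => "Too big!"
```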

The result starts out as if it were going to be the same as in the previous example. When item reaches 8, however, break is executed; this cancels the entire array construction process and forces map to return the protesting message instead.

2.5. Ruby Control Structures

Ruby control structures, as already mentioned, are easy to understand; here’s an extremely quick summary of the ones I most commonly use. Consult a proper language introduction for full details.

Besides the C-like boolean operators &&, ||, and !, Ruby has English boolean operators and, or, and not. The binary boolean operators are evaluated lazily, left to right: the right-hand operand is evaluated only if the left-hand operand doesn’t already settle the result. For this reason logical-and is often used as a poor man’s “if” (and logical-or is often used as a nil test, as shown earlier). The English versions have lower precedence (lower even than assignment), and for this reason they are often used to attach a test or an action to an assignment without parentheses.
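A few sketches of these idioms, using illustrative variable names. The undefined name zork is deliberately never defined.

```ruby
name = nil
name ||= "anonymous"              # || as a nil test: assign only if name is nil (or false)

n = 4
n.even? && puts("#{n} is even")   # && as a poor man's "if"

yes = true
no = false
a = yes && no                     # && binds tighter than =, so a = (yes && no): false
b = yes and no                    # and binds looser than =, so (b = yes) and no: b is true

defined?(zork) or puts("zork is not defined")   # defined? yields nil for an unknown name
```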

(defined? is a keyword; it could not be a method call, because in that case it would choke if its parameter wasn’t defined.)

Instead of if and the negative of something, you can say unless and the positive of that same thing. There is no elsif in an unless structure, and I must warn you that although you can use else in an unless structure, this can make your code quite difficult to understand.
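A small sketch, with illustrative names, showing that if with a negated test and unless with the plain test behave the same:

```ruby
items = ["a", "b"]

status1 = "nothing to do"
if !items.empty?                        # if plus the negative...
  status1 = "have #{items.length} items"
end

status2 = "nothing to do"
unless items.empty?                     # ...means the same as unless plus the positive
  status2 = "have #{items.length} items"
end

p status1   # => "have 2 items"
p status2   # => "have 2 items"
```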

A nice feature of Ruby, similar to Perl, is that simple if and unless conditions can be used as a postfix with a single statement. So, we could rewrite our earlier block example in a more usual Ruby idiom:
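Here is a sketch of the doubling block from the previous subsection, rewritten with postfix conditions. The “Too big!” protest string is a made-up placeholder.

```ruby
result = (1..6).map do |item|
  next if item.odd?              # postfix if: odd numbers yield nil
  break "Too big!" if item > 6   # postfix if: abandon map with the protest
  item * 2                       # the last expression is the block's value
end
p result   # => [nil, 4, nil, 8, nil, 12]
```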

Ruby’s multi-way branch is the case structure, and it needs no break statements: control never falls through from one when clause to the next, as it does in a C switch construct. Comparison between the comparand (the value after case, say n) and each when value uses the === method as defined by the class of the when value (not the comparand!); by default, === is the same as ==, but a class can define it specially. For example, Class defines === to mean “is an instance of this class or of one of its subclasses”, so you can test against various class possibilities like this:
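A sketch in which SomeClass, OtherClass, and ThirdClass are hypothetical placeholder classes. The method describe is illustrative. A Range is added as a second when value because Range also defines === specially, to mean “falls within this range”.

```ruby
# Hypothetical placeholder classes.
class SomeClass; end
class OtherClass; end
class ThirdClass; end

def describe(n)
  case n
  when SomeClass, OtherClass, ThirdClass   # Class#===: is n an instance of one of these?
    "one of our three classes"
  when 1..10                               # Range#===: is n within this range?
    "a number from 1 to 10"
  else
    "something else"
  end
end

p describe(OtherClass.new)   # => "one of our three classes"
p describe(7)                # => "a number from 1 to 10"
p describe("x")              # => "something else"
```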

That’s an elegant way to say “if n is a SomeClass or an OtherClass or a ThirdClass.”

Instead of while, you can say until (until is to while as unless is to if); and both can be used as a postfix with a single statement. There are quite a number of keywords for subverting a while loop. next aborts this iteration and proceeds to the next iteration; redo starts this iteration over again; and break aborts the whole thing. These can also be used in blocks (see above). Blocks are the most Ruby-like way of looping, but while does come in handy very often.
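A sketch of a while loop using next and break, followed by a postfix until. The variable names are illustrative.

```ruby
i = 0
evens = []
while i < 10
  i += 1
  next if i.odd?          # skip the rest of this iteration
  break if i > 6          # abandon the loop entirely
  evens << i
end
p evens   # => [2, 4, 6]

n = 1
n *= 2 until n > 100      # postfix until: keep doubling while n <= 100
p n       # => 128
```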

To jump out of a deeply nested loop, you can use throw and catch. These are really the equivalent of the controversial goto, wearing a different hat. They are methods (of Kernel). catch takes a symbol and a block; if any code at any depth within the block calls throw with that same symbol, the block aborts. throw can also take a second parameter, which becomes the value returned by catch. The textbooks speak of throw and catch as rare, but I have found them extremely useful in my own programming.
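A sketch of escaping from two nested loops at once. The grid data and the :done tag are illustrative.

```ruby
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

found = catch(:done) do
  grid.each do |row|
    row.each do |cell|
      throw :done, cell if cell > 4    # escape both loops at once, handing back cell
    end
  end
  nil                                  # the value of catch if nothing was thrown
end
p found   # => 5
```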

To exit a def prematurely, use return (possibly with a value). Otherwise, a def returns the value of the last executed statement (every statement in Ruby has a value). To exit an entire program, use exit; return is illegal at top level.
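A sketch of an early return combined with an implicit return value. classify is a hypothetical method.

```ruby
# classify is a hypothetical method for illustration.
def classify(n)
  return "negative" if n < 0   # early exit with a value
  if n == 0                    # otherwise the value of this last statement
    "zero"                     # (the whole if structure) is what the def returns
  else
    "positive"
  end
end

p classify(-3)   # => "negative"
p classify(0)    # => "zero"
p classify(5)    # => "positive"
```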

Runtime errors cause an object of class Exception or one of its subclasses to propagate up the call chain; if it is not handled, the program terminates prematurely. Handling an exception requires a structure like this:
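A runnable sketch of the basic structure. The methods something and something_else are hypothetical stand-ins. Here something always fails, so that the rescue branch runs.

```ruby
# Hypothetical stand-ins for "something" and "something_else".
def something
  raise "could not fetch something"   # simulate a failure (a RuntimeError)
end

def something_else
  "fallback value"
end

begin
  x = something
rescue
  x = something_else
end
p x   # => "fallback value"
```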

What that does is to assign something to x, unless the attempt to fetch something causes an error, in which case something_else is assigned to x. This is very elegant and a great time-saver.
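A rescue clause can also name the exception classes it handles and capture the exception in a variable. In this sketch, ExceptionClass1, ExceptionClass2, and fetch are all hypothetical.

```ruby
# Hypothetical exception classes and method for illustration.
class ExceptionClass1 < StandardError; end
class ExceptionClass2 < StandardError; end

def fetch(kind)
  raise ExceptionClass2, "fetch failed" if kind == :bad
  "fetched"
end

begin
  x = fetch(:bad)
rescue ExceptionClass1, ExceptionClass2 => err
  x = "handled #{err.class}: #{err.message}"
end
p x   # => "handled ExceptionClass2: fetch failed"
```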

That means: “If there is an exception, and if it is of class ExceptionClass1 or ExceptionClass2, then handle it, assigning it to the variable err before proceeding.” If you don’t specify an exception class — that is, if you use rescue without qualification — then only exceptions of class StandardError (and its subclasses) are handled. Unfortunately, lots of exception types are not descended from StandardError, so if you say rescue without qualification you won’t handle them. This is a huge “gotcha” waiting to gobble up beginners; in fact, I regard it as the worst aspect of Ruby, a massive flaw in the jewel. I myself have been bitten very often, and now I almost never use bare rescue.
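To see the gotcha in action, here is a sketch in which a bare rescue fails to catch a LoadError. LoadError descends from ScriptError, not from StandardError. The library name is deliberately nonexistent.

```ruby
outcome = begin
  begin
    require "no_such_library_for_this_demo"   # hypothetical, nonexistent library
  rescue                                      # bare rescue: StandardError only
    "caught by bare rescue"
  end
rescue LoadError => err                       # LoadError < ScriptError < Exception
  "missed by bare rescue, caught as #{err.class}"
end
p outcome   # => "missed by bare rescue, caught as LoadError"
```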

You do not, however, have to include the => err part; you can retrieve the exception object as $!. To handle different exception types differently, you can use multiple rescue structures, in order of increasing generality:
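A sketch with several rescue clauses, most specific first. risky and handle are hypothetical methods. One clause reads the exception through $! instead of a named variable.

```ruby
# risky is a hypothetical method that fails in different ways on request.
def risky(kind)
  if kind == :zero
    1 / 0                 # ZeroDivisionError
  elsif kind == :type
    1 + nil               # TypeError
  else
    raise "plain failure" # RuntimeError
  end
end

def handle(kind)
  begin
    risky(kind)
  rescue ZeroDivisionError          # most specific first...
    "division by zero"
  rescue TypeError
    "type problem: #{$!.class}"     # $! holds the exception being handled
  rescue StandardError              # ...most general last
    "something else: #{$!.message}"
  end
end

p handle(:zero)    # => "division by zero"
p handle(:type)    # => "type problem: TypeError"
p handle(:other)   # => "something else: plain failure"
```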

Even this does not express the fullest possible form of a rescue structure, which is actually like this:
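A sketch of the fuller form, with rescue, else, and ensure clauses. attempt is a hypothetical method that records which clauses ran.

```ruby
# attempt is a hypothetical method showing the clauses of a rescue structure.
def attempt(text)
  log = []
  begin
    n = Integer(text)                # may raise ArgumentError
  rescue ArgumentError
    log << "rescue"                  # runs only if that exception occurred
    n = 0
  else
    log << "else"                    # runs only if no exception occurred
  ensure
    log << "ensure"                  # runs no matter what
  end
  [n, log]
end

p attempt("42")     # => [42, ["else", "ensure"]]
p attempt("forty")  # => [0, ["rescue", "ensure"]]
```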

Very elegant stuff can be done with the else and ensure clauses, but I’m not going to elaborate here.

To generate an exception, call the raise method. If you pass it a string, raise creates a new RuntimeError object and assigns the string to its message, which is the description to be output if the exception is not handled. With no parameters inside a rescue clause, raise re-raises the handled exception. A not uncommon technique is to modify the handled exception and raise that:
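Here is one way to do this, sketched with a hypothetical parse_age method. It uses Exception#exception, which, given a string, returns a copy of the exception with that string as its new message, so the original exception class is preserved. The last part shows that raising a plain string produces a RuntimeError.

```ruby
# parse_age is a hypothetical helper for illustration.
def parse_age(text)
  Integer(text)                          # raises ArgumentError for text like "forty"
rescue ArgumentError => err
  # A copy of err, same class, with extra context in its message.
  raise err.exception("bad age #{text.inspect}: #{err.message}")
end

message = begin
  parse_age("forty")
rescue ArgumentError => e
  e.message
end
puts message                             # begins: bad age "forty":

kind = begin
  raise "plain string"                   # a string alone produces a RuntimeError
rescue => e
  e.class
end
p kind   # => RuntimeError
```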

One final word about Ruby control structures. Ruby is very flexible about lineation, and is remarkably forgiving of the use of clauses within larger expressions. Thus it is quite common, for example, to assign an entire if structure to a variable, or to send a message to the result of a block:
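A sketch of both idioms, with illustrative names. An entire if structure is assigned to a variable, and join is sent to whatever a map-with-block returns.

```ruby
n = 7
parity = if n % 2 == 0     # the whole if structure has a value...
  "even"
else
  "odd"
end                        # ...which is assigned to parity

words = %w[apple banana cherry].map do |w|
  w.capitalize
end.join(", ")             # send join to the array that map returns

p parity   # => "odd"
p words    # => "Apple, Banana, Cherry"
```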

You’re a real Rubyist (and a happy programmer) when you’re comfortable talking like that.

2.6. Libraries and Gems

One of Ruby’s nicest features is the ease with which it handles storage of different parts of a program in different files. The chief command here is the require method. Its job is to load and execute a file, then and there, once. By “once” I mean that require keeps a list of the parameters that it has been handed, and if it is handed a parameter that it has already seen, it does nothing. Thus, in a simple-minded way, require tries to ensure that a file is loaded only once.
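A sketch that shows require loading a file only once. It writes a throwaway file to a temporary directory; the file name and the global variable are illustrative. The first require executes the file and returns true. The second does nothing and returns false.

```ruby
require "tmpdir"    # standard library: provides Dir.mktmpdir

results = Dir.mktmpdir do |dir|
  path = File.join(dir, "counter_demo.rb")   # a throwaway file for this demo
  File.write(path, "$counter_demo_loads = ($counter_demo_loads || 0) + 1\n")
  first  = require(path)    # loads and executes the file: true
  second = require(path)    # already seen: does nothing, returns false
  [first, second, $counter_demo_loads]
end
p results   # => [true, false, 1]
```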

The parameter to require is either a full pathname or a simple filename, and if it’s a simple filename it is quite usual to omit the extension “.rb”. If the parameter is a full pathname, it is loaded and executed. If the parameter is a simple filename, a global variable called $: is consulted. Its value is an array of strings, each string being the pathname of a directory. So now require runs through the array, looking for the file whose simple filename was given, appending the extension “.rb” to its name if needed.

The global variable $: is an array like any other, which means that your code can modify it. A frequent technique is to modify $: at the very beginning of a program, appending additional directories where you want require to look. For example, in Ruby 1.8 the current working directory, ".", is included in the default $: list (Ruby 1.9.2 and later no longer include it); but the current working directory is not the same as the directory containing the file that is running now. You might want require to search that directory, or a particular directory within it. You can obtain the directory of the currently running file like this:
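A common idiom, sketched with illustrative variable names. The lib subdirectory is hypothetical.

```ruby
here = File.expand_path(File.dirname(__FILE__))   # absolute path of this file's directory
lib  = File.join(here, "lib")                     # e.g. a hypothetical lib subdirectory
puts here
```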

You can append that, or some directory pathname based on a manipulation of it, to $: in order to affect require’s behavior.
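A self-contained sketch. It writes a tiny library file into a temporary directory, appends that directory to $:, and then requires the file by simple filename. MyCoolClass and my_cool_class.rb are hypothetical. The class definition in the file runs at the moment of the require, after which the class can be instantiated.

```ruby
require "tmpdir"

greeting = Dir.mktmpdir do |lib_dir|
  # Write a tiny library file; MyCoolClass is hypothetical.
  File.write(File.join(lib_dir, "my_cool_class.rb"), <<~RUBY)
    class MyCoolClass
      def greet
        "Hello from MyCoolClass"
      end
    end
  RUBY

  $: << lib_dir                 # let require search lib_dir too
  require "my_cool_class"       # simple filename, no ".rb"; the class sandwich runs now
  $:.delete(lib_dir)            # tidy up, since lib_dir is about to vanish
  MyCoolClass.new.greet
end
p greeting   # => "Hello from MyCoolClass"
```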

Let me pause to emphasize that require executes a file when it loads it. This is no different from what happens when Ruby itself is told to execute a file. We go through the loaded file from start to finish, executing as we go (and treating the loaded file as top-level code). As we’ve already seen, module and class and def sandwiches are executable code; and this sort of thing is typically the point of loading and executing a file. For example, you might have a class MyCoolClass that you use frequently. So, you keep the definition of that class in a file, and whenever you write a Ruby program that needs MyCoolClass, you require that file. The class sandwich that opens MyCoolClass and endows it with methods is executed, and MyCoolClass springs to life, then and there. Subsequent code in your main file can now instantiate MyCoolClass.

Clearly, order matters; and for this reason, it is most common (though by no means necessary) for any require calls to come very close to the start of a program file — so that the rest of that program file can take advantage of the modules and classes that were opened in the required file(s).

A file containing code intended to be used by other programs can be called a library. So far, we’ve been talking as if the only files you would require are your own libraries; but in fact, Ruby itself comes with many libraries that are not loaded by default. For example, suppose you want to use Ruby’s Date class. To do so, you need (on my machine, at least) to require the file that contains it:
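A minimal sketch. The date is illustrative (it is the date of this chapter's last revision).

```ruby
require "date"    # loads the date library from one of the directories in $:

d = Date.new(2012, 6, 23)
puts d.strftime("%A, %B %d, %Y")   # prints Saturday, June 23, 2012
puts d + 10                        # prints 2012-07-03
```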

Without the first line (which loads date.rb from one of the directories listed in $:), there is no Date class.

Thus we see that libraries not only permit large programs to be broken up into multiple files, and endow frequently used code with reusability, but also prevent the global namespace from being unnecessarily overburdened. If you don’t need the Date class in a program, you don’t load it; so there’s no runtime penalty for keeping it around in case you do need it.

Third-party libraries are frequently packaged as gems. (As of this writing, there are about 4500 gems available.) The benefit of this mechanism is that a single command-line command, gem, can go out on the Internet to locate, download, and install the latest version of a gem. An installed gem’s code must be loaded with require (the documentation for the gem will tell you the name of the file you’re after); on my machine (using Ruby 1.8.6), I have to require rubygems before I can require a gem library. (From Ruby 1.9 on, RubyGems is loaded automatically, so that extra step is unnecessary.)

As an example, let’s download and install a gem and use it. I’ll try the rdiscount gem (a C implementation of John Gruber’s Markdown). Here we go. First, at the command line:
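The command-line step is gem install rdiscount (prefixed with sudo if your gems live in a system directory). After that, a minimal Ruby sketch might look like the following. It assumes rdiscount’s RDiscount.new(text).to_html interface, and it falls back gracefully if the gem isn’t installed.

```ruby
begin
  require "rubygems"    # needed on Ruby 1.8 before requiring a gem; harmless later
  require "rdiscount"
  html = RDiscount.new("Hello, *world*!").to_html
rescue LoadError
  html = nil            # the rdiscount gem isn't installed
end
puts html || "rdiscount is not installed"
```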

Truly, before writing this section I had never tried the rdiscount gem before; I’d never even heard of it. Yet in seconds I had it downloaded, installed, and running. This shows how fast and easy gems can be.

3. Where to Go From Here

There are many excellent and compendious introductory books about Ruby out there, and if you want to complete and firm up your knowledge of Ruby, you should read one of them sooner or later. I recommend particularly The Ruby Programming Language, by David Flanagan and Yukihiro Matsumoto (O’Reilly Media, Inc.), and Programming Ruby, by Dave Thomas et al. (The Pragmatic Programmers), widely known as “The Pickaxe Book”.

Ruby is backed by a tremendous depth of built-in core and library-based functionality (quite apart from all the downloadable gems). The built-in modules, classes, and methods are listed in various Web pages to which you can find links at http://ruby-doc.org/. For example, if you’re using Ruby 1.8.6, the links you want to click are called “1.8.6 core” and “The 1.8.6 Standard Library.” The first of those links leads to http://ruby-doc.org/core/, and it is quite amazing what you can learn by just occasionally clicking on a class name (in the second column at the top) and giving that page a good read.

Besides, Ruby has too many classes and methods for you to learn them all, so you may as well get comfortable consulting the documentation. No matter what you want to do, there’s probably an easy way to do it in Ruby; it’s just a question of finding out what it is. No book could cover it all, and anyway that would be pointless; the documentation is the book.

This book took time and effort to write, and no traditional publisher would accept it. If it has been useful to you, please consider a small donation to my PayPal account (matt at tidbits dot com). Thanks!
