You’re looking at a draft of a chapter from a work in progress, tentatively titled Scripting Mac Applications With Ruby: An AppleScript Alternative, by Matt Neuburg.
Covers rb-appscript 0.6.1. Last revised Jun 23, 2012. All content ©2012 by the author, all rights reserved.
Chapter 2: Just Enough Ruby
1. The Object-Oriented Structure of Ruby
1.1. Sending Messages to Objects
1.2. Module: The Most Important Kind of Object
1.3. Defining a Module is Executable Code
1.5. The self Keyword
1.6. Class: Module Plus Inheritance
1.7. Top Level
1.8. Instance
1.9. Assignment: Names are Pointers
1.10. Instance Variables
1.11. Mixins
1.12. The Singleton Class
1.13. The Truth? You Couldn’t Handle the Truth!
2.1. Method Calling and Syntactic Sugar
2.2. Some Cool Built-in Datatype Classes
2.3. Assignment and Parameter Passing
2.4. Blocks
2.6. Libraries and Gems
If you already know Ruby, skip this chapter. If you don’t know Ruby, skim this chapter. Whatever you do, don’t expect to learn Ruby from this chapter. But do expect to learn something about Ruby from this chapter.
What this chapter can and will do is provide a quick overview of some of Ruby — just enough Ruby, so that if you’re coming from some other language, you’ll be able to follow the examples in this book. But since this chapter isn’t even going to pretend to teach you all of Ruby or even all the basics of Ruby, you shouldn’t read it all that carefully. Don’t memorize it; just look at it. You aren’t trying to learn the language; you’re just trying to get a general picture of what the language is like. Really learning Ruby is something you can do later, on your own. (I’ll talk about that at the end of the chapter.)
Note: My way of describing Ruby is somewhat peculiar, but there is method in my madness. Most discussions of Ruby explain it “from the bottom up” (starting with strings and arrays and hashes and working up to classes and modules); I explain it “from the top down” (starting with modules). That’s a deliberate pedagogical choice. I learned Ruby from the bottom up, and that made Ruby much harder, and take much longer, to understand; top-down explanation is better, and gives a clearer picture, right away, of how Ruby works. Also, my initial code examples are written in a somewhat stilted style. That, too, is deliberate; it isn’t until later in the discussion that I introduce the syntactical exceptions that permit standard Ruby style, because I see these exceptions as complications that the student shouldn’t have to deal with too soon.
Ruby is an object-oriented language. If you don’t know what that means, just keep reading; I’ll make it (very) clear as we go. Ruby is so object-oriented that a common Ruby mantra is: “Everything is an object.” It’s a bit tricky to explain what an “object” is; but, loosely speaking, it’s a thing that you send a message to. All the action in Ruby happens because you send a message to an object.
In order for an object to respond meaningfully to a message, it must somehow possess internal knowledge of that message — a pre-existing, primed response, saying what should happen when this particular message arrives. This primed response is called a method. In other words, a method is simply a set of instructions saying what an object should do in response to a particular message.
To send a particular message to an object is to call that method of the object.
In Ruby, you send a message to an object using dot-notation: first the name of the object, then a dot, then the message. For example, if we had an object called Dog, we could tell it to bark like this:
Dog.bark
That’s a legal Ruby program in theory, but in fact it won’t do anything useful all by itself. Right now, if you were to run that as a Ruby program, Ruby would give you an error message. That’s because we don’t have an object called Dog. And even if we did, it wouldn’t necessarily know how to bark. Don’t worry; in a moment, we’ll make a Dog object that does know how to bark.
We do, however, have a built-in object called Kernel. Kernel doesn’t know, out of the box, how to bark either; but it does know how to do something called “puts”.
Kernel.puts
That’s a Ruby program that actually runs, with no error message. But it doesn’t appear to do anything. That’s because puts means: “Output the value of this message’s parameters.” We didn’t provide any parameters, so the output was just an empty line. This time, let’s provide a parameter:
Kernel.puts("Hello, world!")
That’s a working Ruby program that actually does something: it outputs the phrase “Hello, world!” In examples in this book, when there’s output from a line of a program, I’ll show you right there in the program what the output is, using a comment. A Ruby comment begins with a hash character (#). By convention, a Ruby comment that tells the reader what is output starts with #=>. So:
Kernel.puts("Hello, world!") #=> Hello, world!
So what we’ve just learned is that there’s a Kernel object, and it has a puts method. And we learned how to call that method. Ruby has many other built-in objects with built-in methods. For example, consider this code:
Math.cos(0)
Ruby comes with a Math object, which has a cos method. Therefore, sending the cos message to the Math object does in fact call the Math object’s cos method. This, as you might expect, causes the Math object to calculate the cosine of the parameter. However, it doesn’t cause Ruby to show us the result of the calculation. Since we know about Kernel.puts, we can fix that:
Kernel.puts(Math.cos(0)) #=> 1.0
Even a literal expression such as a string is an object. There are lots of built-in methods that a string object knows about. For example:
Kernel.puts("howdy".reverse) #=> ydwoh
Method calls can be chained. This makes sense, because if calling a method returns a result, that result is an object (because everything is an object), so it can be sent a method call. For example:
Kernel.puts("howdy".reverse.upcase) #=> YDWOH
What’s happening there is that first we send the reverse message to the string "howdy", getting back a new string, "ydwoh". Then we send the upcase message to that string.
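To convince yourself that each link in a chain really does hand back an object that can receive the next message, you can simply lengthen the chain. This is only an illustrative sketch; the length method is a built-in string method that I haven’t otherwise discussed, and it reports how many characters a string has:

```ruby
# Each call returns an object, and that object receives the next message.
Kernel.puts("howdy".reverse)               #=> ydwoh
Kernel.puts("howdy".reverse.upcase)        #=> YDWOH
Kernel.puts("howdy".reverse.upcase.length) #=> 5
```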
Where objects come from, and how they come to have methods, is the most important thing to know about Ruby. Ruby has three particularly important types of object: module, class, and instance. We will now discuss how to make objects of all three types, and how to endow them with methods — and, of course, we’ll see why each of these types of object is important and what it’s for.
Module is the most important kind of object. Before talking about why it’s important and what it’s for, let’s make a module.
module Dog
end
That code means: “There’s a Dog object, and it’s a module.” Ruby responds by ensuring there really is a Dog module; if there wasn’t one already, Ruby creates it. To prove that the Dog object now exists, we can proceed to talk about it:
module Dog
end
Kernel.puts(Dog) #=> Dog
We didn’t get an error on the last line, because by that point in the program the Dog object did indeed exist. Now, the real power here lies in the fact that we can endow an object with methods. To do so, we use the def keyword, sandwiched between the module keyword and its corresponding end line. For example, let’s endow Dog with a bark method:
module Dog
def self.bark
"bow wow"
end
end
(The keyword def is short for “define”, and introduces the definition of a method. Never mind for now what the self and the dot are doing before the name of the method.) Let’s pause to summarize what’s happening here.
module Dog
end
That code tells Ruby, “There’s a Dog object, and it’s a module.”
def self.bark
"bow wow"
end
That code tells Ruby, “There’s a bark method.” Not only that; it says what the bark method does. Between the def line and its corresponding end line are sandwiched the lines of code that produce the desired result when bark is called. Here, the desired result is simply the string “bow wow”.
module Dog
def self.bark
"bow wow"
end
end
That code combines the two things. The def sandwich is inside the module sandwich. The module sandwich puts the Dog object under discussion, so the def sandwich is about the Dog object. Thus, this arrangement of code endows the Dog object with a bark method and states that when the bark message is sent to the Dog object, the answer “bow wow” should come back.
Great, so let’s try it: let’s actually send the bark message to the Dog object and see if “bow wow” does come back:
module Dog
def self.bark
"bow wow"
end
end
Dog.bark
Nothing happened. Why didn’t it work? It did work; it’s just that having “bow wow” come back within the program is not the same thing as outputting “bow wow” so that we can see it. But we know how to output something, so it’s easy to modify our program to do so:
module Dog
def self.bark
"bow wow"
end
end
Kernel.puts(Dog.bark) #=> bow wow
Eureka! We have created an object and endowed it with a method, and then we have successfully called that method.
Let’s sum up. Kernel is an object, and in particular, it is a module. Math is an object, and in particular, it is a module. Dog is an object, and in particular, it is a module. What’s special about Dog is that it didn’t exist until we created it. Moreover, Kernel already had a puts method, and Math already had a cos method, before we came along. But we endowed Dog with a bark method ourselves.
A characteristic aspect of Ruby that surprises newcomers is that lines like module Dog and def self.bark are executable code. They are commands, and when our Ruby program runs, they are executed, in the order in which they are encountered. Consider once more the program we just created:
module Dog
def self.bark
"bow wow"
end
end
Kernel.puts(Dog.bark) #=> bow wow
The end lines merely indicate groupings (i.e. where the sandwiches end), so that program actually consists of three executable lines of code, in this order:
module Dog: Ensure the existence of a Dog object, which is a module.
def self.bark: Within the Dog object (because this line appears inside the module Dog sandwich), define a method called “bark.” Note: The line "bow wow" is not executed at this time. That’s because it’s inside a method definition; it is not saying what should happen now, but rather what should happen if and when this method is called.
Kernel.puts(Dog.bark): Call the bark method on the Dog object; take the result of that and, using it as a parameter, call the puts method on the Kernel object.
It follows that order matters. Our program would not work if the last line were placed first:
Kernel.puts(Dog.bark) #=> NameError: uninitialized constant Dog
module Dog
def self.bark
"bow wow"
end
end
In the first line, a non-existent Dog object is mentioned, and the program comes to a halt. The Dog object would have been brought into existence in the second line, but we never reached it.
Notice too that I have never said that a line like module Dog creates the Dog object. I have said that it asserts or ensures its existence, that it creates it if it didn’t exist already. It is perfectly legal to say module Dog and define some methods on the Dog object even if the Dog object already exists. Not only is it legal, it’s common. Not only is it common, it’s largely the essence of what makes Ruby Ruby. Read the following program in order and be sure you understand how it works:
module Dog
def self.bark
"bow wow"
end
end
Kernel.puts(Dog.bark) #=> bow wow
module Dog
def self.wag
"wag wag"
end
end
Kernel.puts(Dog.bark) #=> bow wow
Kernel.puts(Dog.wag) #=> wag wag
module Dog
def self.bark
"ruff ruff"
end
end
Kernel.puts(Dog.bark) #=> ruff ruff
Kernel.puts(Dog.wag) #=> wag wag
Let’s talk through that code. First, we define a bark method on the Dog object and call it (“bow wow”). Then, we define a wag method on the Dog object. The Dog object already exists and already has a method (the bark method), but that’s not a problem: the bark method continues to exist (“bow wow”) and the wag method now exists too (“wag wag”). Finally, we define the bark method on the Dog object again. Our new definition of the bark method is now in effect, so code that calls the bark method after this moment has a different result from before (“ruff ruff”); meanwhile, the wag method continues to work as before (“wag wag”).
Furthermore, no distinction is drawn between objects that you create and objects that are built into Ruby. We could just as easily give the built-in Kernel object a bark method, if that suited our purposes:
module Kernel
def self.bark
"bow wow"
end
end
Kernel.puts(Kernel.bark) #=> bow wow
We could even change the definition of a built-in Kernel method, such as puts. Of course this means that you have the power to alter built-in objects in such a way as to bring Ruby’s normal behavior to its knees. Nevertheless, altering built-in objects is not at all uncommon. When people describe Ruby as highly dynamic, this is the kind of thing they are talking about.
Thus it would be wrong to describe code like this as defining a Dog module:
module Dog
end
That code asserts the existence of the Dog module, but it doesn’t define Dog; if it did, Dog would be a module without methods. That is not the case. All modules have built-in methods; and besides, if Dog already exists and has methods, that code doesn’t remove those methods. (It is possible to remove methods — in Ruby, everything is possible — but that isn’t how you do it.) Rubyists frequently refer to code like that as opening the module; it places the module under discussion, so that sandwiched code can define methods.
It is also possible for other executable code to appear in the sandwich. For example, this is legal:
module Dog
Kernel.puts("opening Dog")
end
That works, and it outputs “opening Dog,” not in response to any method being called, but at the moment the Kernel.puts line is encountered. That example is highly artificial, but there are certain types of code that are quite commonly executed in the course of a module-opening sandwich.
Modules have many powerful features, and one of them is that they are namespaces. This means that names defined inside a module are hidden from outside the module.
To see what I mean, you need to know that one of the things you can do inside a module is define another module. So, for example:
module Animals
module Dog
def self.bark
"bow wow"
end
end
end
Kernel.puts(Dog.bark)
#=> NameError: uninitialized constant Dog
From where we are in the last line, when we say Dog.bark, the name Dog is not visible. That’s because the name Dog is defined inside the Animals module. However, we are at the same level as the place where Animals was defined, so the name Animals is visible. Now, we can reach the name Dog, if we want to; we just can’t do it directly, because it isn’t directly visible. Instead, we have to do it by way of the name Animals, which is directly visible. The “by way of” operator is two colons (::).
module Animals
module Dog
def self.bark
"bow wow"
end
end
end
Kernel.puts(Animals::Dog.bark) #=> bow wow
Ruby uses modules as namespaces to keep things nicely packaged. A module, in fact, might exist solely as a way to package other modules together, keeping their names from polluting the global namespace. This helps to avoid name collisions, and also just makes it clearer what something is for. That’s what’s happening in this code:
Kernel.puts(Math.cos(Math::PI)) #=> -1.0
The name PI is defined inside the Math module, so you have to dive into the Math module to use it.
The “names” that I’ve been talking about here all begin with a capital letter. This is not coincidence. Module names must begin with a capital letter, because that’s what tells Ruby what rules to follow in looking to see whether the name is defined. Those rules are complicated, but the take-away message is simple: an object whose name begins with a capital letter is visible to code looking upwards to the level where that name is defined. In the previous example, code inside the Dog module can see the name Dog and the name Animals; but code at top level, where the module Animals is defined, can see the name Animals but not the name Dog. Built-in module names, such as Kernel, are implicitly defined at top level, so that all code can see them.
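Here is a small sketch of that “looking upwards” rule. The constant SOUND is my own invention for the sake of illustration; like a module name, it begins with a capital letter, so Ruby looks it up in the same way:

```ruby
module Animals
  SOUND = "bow wow" # hypothetical constant, defined at the Animals level
  module Dog
    def self.bark
      SOUND # code inside Dog looks upwards and finds the SOUND in Animals
    end
  end
end
Kernel.puts(Animals::Dog.bark) #=> bow wow
Kernel.puts(Animals::SOUND)    #=> bow wow
```

Code at top level, by contrast, would have to say Animals::SOUND, just as it has to say Animals::Dog.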
Incidentally, the dot-notation operator that we are using to call a method of an object is itself a “by way of” operator. In the last line of the above program, just as we cannot see Dog directly without first diving into Animals, so too we cannot see bark directly without first diving into Dog. In fact, the double-colon (::) and the dot (.) are largely interchangeable.
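As a quick sketch of that interchangeability, using the same Animals and Dog modules as before, the bark method can be called with either operator:

```ruby
module Animals
  module Dog
    def self.bark
      "bow wow"
    end
  end
end
Kernel.puts(Animals::Dog.bark)  #=> bow wow
Kernel.puts(Animals::Dog::bark) #=> bow wow
```

I say “largely” because the substitution works in this direction only: as far as I know, the dot always means a method call, so writing Animals.Dog would not reach the Dog module.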
The self keyword is often used as an object to send a message to, and when so used, it means: “The object we are in, right now.” By “right now,” I mean “at the time the code runs.” Code can thus use self as a way of accessing, from inside as it were, names that other code would have to access from outside using some explicit object name. So, for example, let’s make a Dog module and give it two methods, one of which calls the other:
module Dog
def self.bark
"bow wow"
end
def self.speak
self.bark # send to self
end
end
Kernel.puts(Dog.speak) #=> bow wow
When we call Dog.speak, the speak method runs, and the keyword self is encountered (in the expression self.bark). Where are we at that moment? We’re in the Dog module, because that’s who the speak message was sent to. Thus, saying self.bark sends the bark message to the Dog module, from inside as it were, in exactly the same way that saying Dog.bark would send the bark message to the Dog module from outside, explicitly, by name.
It is legal for the keyword self, when used in this way, to be omitted. Putting it another way, if a method call appears to be sent to no object, that’s just an illusion, a convenience created by Ruby behind the scenes; in reality, the method call is being sent to self. So, this would work just as well:
module Dog
def self.bark
"bow wow"
end
def self.speak
bark # send to self, implicitly
end
end
Kernel.puts(Dog.speak) #=> bow wow
I find omission of self confusing, and I would prefer to avoid it; to me, explicit use of self is cleaner and clearer. However, there are situations, having to do with a feature of certain methods called “privacy”, where self must be omitted. The idea is that a “private” method can be called only from within an object that is endowed with that method, and Ruby enforces this by balking if you explicitly send a private method call to any object, even self. I regard this as unfortunate (it’s one of the few things about Ruby that I don’t like), but there it is.
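As a rough sketch of what privacy looks like in practice, the following code marks bark as private using the built-in private_class_method (a method I haven’t otherwise introduced), so that bark can be called from inside Dog, with self omitted, but not from outside. The begin and rescue lines are just a way of catching the resulting error so that the program can report it and carry on. Exactly how strictly Ruby treats an explicit self can differ between Ruby versions, so this sketch sticks to the two cases that I would expect to behave the same everywhere:

```ruby
module Dog
  def self.bark
    "bow wow"
  end
  private_class_method :bark # from now on, bark is private to Dog

  def self.speak
    bark # implicit self: allowed, because we are inside Dog
  end
end

Kernel.puts(Dog.speak) #=> bow wow

begin
  Dog.bark # explicit call from outside Dog: not allowed
rescue NoMethodError
  Kernel.puts("bark is private")
end
#=> bark is private
```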
The second most important kind of Ruby object is class. A class is a module, but it adds two important features: inheritance and instantiation. We’ll talk first about inheritance.
To make a class, you use the class keyword, in a way that looks just like what you do when you make a module:
class Dog
def self.bark
"bow wow"
end
end
Kernel.puts(Dog.bark) #=> bow wow
Thus far, a class simply is a module. But now let’s add inheritance. When we define a class, we are allowed to say what other class it inherits from. This means that our class gets not only its own methods but also any methods belonging to the class it inherits from. Let’s make a class Poodle that inherits from Dog. To do so, we use the < symbol as we define the Poodle class:
class Dog
def self.bark
"bow wow"
end
end
class Poodle < Dog
end
Kernel.puts(Poodle.bark) #=> bow wow
Poodle inherits from Dog, so it inherits Dog’s methods. Poodle knows how to bark because Dog knows how to bark. Poodle is a subclass of Dog, and Dog is Poodle’s superclass. (One class can have many subclasses, but can have only one immediate superclass.) It would be superfluous, though, to make a class that inherited everything from its superclass and stopped there; they’d be effectively the same class. The power comes in when a subclass is like its superclass but with a difference. There can be two kinds of obvious difference. First, the subclass might have a method that the superclass lacks:
class Dog
def self.bark
"bow wow"
end
end
class Poodle < Dog
def self.stand
"I'm standing on my hind legs"
end
end
Kernel.puts(Poodle.bark) #=> bow wow
Kernel.puts(Poodle.stand) #=> I'm standing on my hind legs
In that example, a Poodle can stand on its hind legs, but not every Dog can. The second kind of difference is that the subclass might have a method with the same name as its superclass, but with different functionality. Naturally, when you send a message to the subclass, you get the subclass’s version of what that method does:
class Dog
def self.bark
"bow wow"
end
end
class Poodle < Dog
end
class Basenji < Dog
def self.bark
"[silence]"
end
end
Kernel.puts(Dog.bark) #=> bow wow
Kernel.puts(Poodle.bark) #=> bow wow
Kernel.puts(Basenji.bark) #=> [silence]
In that situation, we can say that Basenji’s bark overrides the bark method inherited from Dog.
Sometimes, you want to override an inherited method but incorporate the inherited method’s functionality into the override. So you need a way to call the inherited method from within the method that overrides it. To do so, you use the super keyword. Imagine, for example, that a noisy dog barks louder than a normal dog:
class Dog
def self.bark
"bow wow"
end
end
class NoisyDog < Dog
def self.bark
super.upcase
end
end
Kernel.puts(NoisyDog.bark) #=> BOW WOW
Saying super inside NoisyDog’s bark method calls the superclass’s bark method, giving us the string “bow wow”. We then send the upcase message to that string, and the result of doing that is what NoisyDog’s bark method produces.
In a subclass, self naturally embraces the superclass; the subclass inherits methods from the superclass, and therefore so does self. So:
class Dog
def self.bark
"bow wow"
end
end
class Poodle < Dog
def self.speak
self.bark # send to self
end
end
Kernel.puts(Poodle.speak) #=> bow wow
If you don’t specify a superclass when you define a class, Ruby supplies one for you — the built-in Object class. In other words, this code:
class Dog
end
is absolutely identical to this code:
class Dog < Object
end
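You don’t have to take my word for it. Every class responds to the built-in superclass message (not otherwise discussed in this chapter), so a quick check, sketched here with the class names from the earlier examples, looks like this:

```ruby
class Dog
end
class Poodle < Dog
end
Kernel.puts(Dog.superclass)    #=> Object
Kernel.puts(Poodle.superclass) #=> Dog
```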
If you change a superclass’s functionality, the subclass instantly inherits the change:
class Dog
def self.bark
"bow wow"
end
end
class Poodle < Dog
end
Kernel.puts(Poodle.bark) #=> bow wow
class Dog
def self.bark
"ruff ruff"
end
end
Kernel.puts(Poodle.bark) #=> ruff ruff
(It is illegal to change a subclass by specifying a different superclass than the one it already has. Ruby is highly dynamic, but it isn’t insane.)
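To see Ruby refuse, here is a minimal sketch. The Cat class is hypothetical, invented only so that there is a second candidate superclass, and the begin and rescue lines simply catch the error so that the program can report it and carry on:

```ruby
class Dog
end
class Cat # hypothetical second class, just for this illustration
end
class Poodle < Dog
end

begin
  class Poodle < Cat # Poodle already has Dog as its superclass
  end
rescue TypeError => e
  Kernel.puts(e.message) #=> superclass mismatch for class Poodle
end
Kernel.puts(Poodle.superclass) #=> Dog
```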
The Object class is important in another way: Every Ruby program takes place entirely inside it. Everything is an object, after all; well, the top-level object, the “universe” as it were, is the Object class object. Certain special rules apply to the top level world (for example, if you ask for self at the top level, you’re told that it’s main), but ultimately it is as if the whole program were embedded in an implicit class Object sandwich.
One result of this architecture is that it is legal to define methods at top level:
def greet
Kernel.puts("Hello, world!")
end
greet #=> Hello, world!
Thus, it is possible to program in a “lazy” way, without going through the overhead of creating any objects just to get any work done at all. I suppose that as a good little object-oriented programmer I ought to deprecate this style, but in fact it is very handy and many examples in this book will use it.
Along with the ability to inherit, a class has the ability to be instantiated. This means that from a class we make an instance.
To make an instance from a class, you send the new message to a class. Now you have a new object, an instance of that class, and you can send messages to it:
class Dog
def self.bark
"bow wow"
end
end
Kernel.puts(Dog.new.class) #=> Dog
Kernel.puts(Dog.new.bark)
#=> NoMethodError: undefined method `bark' for #<Dog:0x1593c>
Something odd is going on here. Our Dog.new instance is clearly a Dog; sending it the class message tells us so. So why doesn’t it know how to bark?
To tell you the answer, I have to make a confession. There are actually two kinds of method: class methods and instance methods. All the methods we’ve created so far have been class methods, and that’s the significance of the self modifier that appears before the method’s name in the def line. A class method is a method that you call by sending a message to the class (or module), as we’ve been doing. An instance method, on the other hand, is a method that you call by sending a message to an instance of a class (an instance that you generated from that class using new, as we’ve just seen). And it is defined inside the class sandwich without using the self modifier, like this:
class Dog
def bark # this is an instance method
"bow wow"
end
end
Kernel.puts(Dog.new.bark) #=> bow wow
Since we’ve changed Dog so that bark is an instance method, we can send the bark message to an instance of the Dog class, but we can no longer send the bark message directly to the Dog class itself:
class Dog
def bark
"bow wow"
end
end
Kernel.puts(Dog.new.bark) #=> bow wow
Kernel.puts(Dog.bark) #=> NoMethodError: undefined method `bark' for Dog:Class
So, if all you want to do with a certain class or module is to send messages directly to that class or module, there is no point whatever in giving that class or module any instance methods. The purpose of giving a class an instance method is so that we can call that method on an instance of that class.
Indeed, although module is the most important kind of object, because it governs the whole structure of how Ruby works, instances that you make from classes are the most common kind of object. When you actually do stuff in a Ruby program, you mostly do not send messages to modules and classes, as we have been doing so far. You mostly send messages to instances generated from classes using new, as we are now doing.
Observe that the keyword self used as an object to send a message to, when instance method code is running, refers to the instance. That’s just a consequence of the fact that self refers to “the object we’re in now”. When what’s running is instance method code, the object it’s in is, ipso facto, an instance. So:
class Dog
def bark
"bow wow"
end
def speak
self.bark # send to self
end
end
Kernel.puts(Dog.new.speak) #=> bow wow
The line self.bark executes at a time when the speak message has been sent to the Dog.new instance. Therefore, that instance is self, and self.bark calls the bark instance method.
So what if an instance needs to refer back, as it were, to its class? It can use the class method, which we used at the beginning of this section. (That’s also one way, obviously, of finding out what class an instance is an instance of.) Knowing this, you can see how an instance would also be able to call a class method of the class of which it is an instance:
class Dog
def self.bark # a class method
"bow wow"
end
def speak # an instance method
self.class.bark
end
end
Kernel.puts(Dog.new.speak) #=> bow wow
Every time you call new on a class, you create a new and different instance of that class. To see this easily, we have only to send an instance the object_id message. Every object in the Ruby universe at every moment during the running of a program has a unique object_id value, so we can easily detect whether two objects are the same.
class Dog
end
Kernel.puts(Dog.object_id) #=> 44270
Kernel.puts(Dog.object_id) #=> 44270
Kernel.puts(Dog.new.object_id) #=> 44180
Kernel.puts(Dog.new.object_id) #=> 44150
That program shows that there is only one Dog class object, no matter how many times we refer to it; but saying Dog.new gives us a different instance of Dog every time.
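The actual object_id numbers will differ from one run to the next, so if all you want to know is whether two expressions refer to the very same object, it can be easier to ask directly. Here is a sketch using the built-in equal? method (which I haven’t otherwise discussed); it answers true only when both sides are one and the same object:

```ruby
class Dog
end
Kernel.puts(Dog.equal?(Dog))         #=> true
Kernel.puts(Dog.new.equal?(Dog.new)) #=> false
```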
An object that can’t be referred to is useless and goes out of existence automatically in Ruby. So in the above program, the Dog instance 44180 probably doesn’t even exist by the time we have generated the Dog instance 44150. Bringing an object into existence only to have it go out of existence again a moment later is perfectly reasonable, but more often than not, you’re going to want an instance generated with new to persist for longer than that. One common way to make an instance persist is to assign it to a variable name. In Ruby, the assignment operator is an equal sign (=).
class Dog
end
fido = Dog.new
Kernel.puts(fido.object_id) #=> 44360
Kernel.puts(fido.object_id) #=> 44360
As you can see, our Dog instance, now called fido, is persisting from one line to the next. Naturally, if Dog has any instance methods, we can call those methods on fido.
class Dog
def bark
"bow wow"
end
end
fido = Dog.new
Kernel.puts(fido.bark) #=> bow wow
That code might give you the impression that fido is itself a Dog instance. That’s loosely true, but it would be more correct to say that the name fido points to (or refers to) a Dog instance. Objects in Ruby have a kind of independent existence (off in a kind of separate object universe called the object space); names merely point to them. One way to see this is that if you assign an object from one name to another, you end up with two names pointing to the very same object:
class Dog
end
fido = Dog.new
rover = fido
Kernel.puts(fido.object_id) #=> 44310
Kernel.puts(rover.object_id) #=> 44310
The same thing is true of objects generated in other ways:
bark = "bow wow"
noise = bark
Kernel.puts(bark.object_id) #=> 4420
Kernel.puts(noise.object_id) #=> 4420
Beginners sometimes find this confusing (or downright unnerving). The thing to remember is that assignment changes what object a pointer points to; it doesn’t alter that object. For example:
bark = "bow wow"
noise = bark
bark = bark.upcase
Kernel.puts(bark) #=> BOW WOW
Kernel.puts(noise) #=> bow wow
Let’s deconstruct that program. The story starts with a lowercase string. After the second line, both noise and bark point to that same lowercase string. The third line repoints bark at a different object, namely, an uppercase version of that string. But the lowercase version of the string still exists, and noise is still pointing to it.
On the other hand, it is also possible to alter an object in place. If you do this after pointers to the object are already established, those pointers now all point to the altered object. For example:
bark = "bow wow"
noise = bark
Kernel.puts(noise) #=> bow wow
bark.upcase!
Kernel.puts(noise) #=> BOW WOW
In the fourth line, we upcased bark, not noise; nevertheless, the fifth line shows that noise was upcased. How can this be? After the second line, both noise and bark are pointers to the very same string object. The upcase! method causes the string object itself to be upcased (that’s what the exclamation mark hints at). No new assignments take place, so after the string object is upcased, both noise and bark are still pointers to that very same (now upcased) string object.
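If you want noise to be unaffected by later in-place changes, one option is to hand it a copy of the string rather than the string itself. The built-in dup method, which I haven’t otherwise discussed, produces such a copy; treat this as a sketch of the idea rather than a recommendation:

```ruby
bark = "bow wow"
noise = bark.dup # noise points to a separate copy of the string
bark.upcase!     # alters only the object that bark points to
Kernel.puts(bark)               #=> BOW WOW
Kernel.puts(noise)              #=> bow wow
Kernel.puts(bark.equal?(noise)) #=> false
```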
Oh, one more thing. It should not surprise you to learn that if you change a class, any persistent instances of that class instantly acquire the changes:
class Dog
def bark
"bow wow"
end
end
fido = Dog.new
Kernel.puts(fido.object_id) #=> 43860
Kernel.puts(fido.bark) #=> bow wow
class Dog
def bark
"ruff ruff"
end
end
Kernel.puts(fido.object_id) #=> 43860
Kernel.puts(fido.bark) #=> ruff ruff
What that code demonstrates is that fido remains the same instance throughout, yet its behavior when told to bark changes because the definition of the Dog instance method bark changes. And of course this would be true of all present and future Dog instances; they would all, from now on, have the changed Dog functionality. And of course this is true up the inheritance chain as well; if Dog’s superclass is Animal and we change the Animal class, any Dog instances, those that exist now and those that come into existence in the future, all acquire the change.
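The inheritance-chain case can be sketched like this. The breathe method is hypothetical, invented just for the illustration; the point is that it is added to Animal only after fido already exists:

```ruby
class Animal
end
class Dog < Animal
end
fido = Dog.new
class Animal
  def breathe # hypothetical method, added after fido was created
    "inhale, exhale"
  end
end
Kernel.puts(fido.breathe) #=> inhale, exhale
```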
So far, nothing that we have said explains why instances are needed. An instance is an object, a class is an object, a module is an object, they’re all objects. Why shouldn’t you write an entire Ruby program using just classes and modules? The answer is simple, and is the basis of the entire notion of object-oriented programming. An instance can have instance variables.
An instance variable is a variable belonging to the instance where it is mentioned. When I say “mentioned,” I mean what I meant when I talked about self earlier. When code runs, if it mentions an instance variable, that variable belongs to self, whatever that may be at the moment. Assuming the code in question is instance method code, self is the instance, and so the instance variable belongs to that instance. You and Ruby can always know when code is mentioning an instance variable, because that variable’s name begins with a single at-sign (@).
To see what I mean, we’ll give Dog two instance methods, one that sets an instance variable called @name, and one that reports that instance variable’s value. Then we’ll make two persistent instances of Dog and give them each a different @name:
class Dog
def your_name_is(s)
@name = s
end
def what_is_your_name
@name
end
end
fido = Dog.new
rover = Dog.new
fido.your_name_is("Fido")
rover.your_name_is("Rover")
Kernel.puts(fido.what_is_your_name) #=> Fido
Kernel.puts(rover.what_is_your_name) #=> Rover
So instances of the same class share their code, because they get their instance methods from the same class; but they maintain the values of their instance variables separately. And that, believe it or not, is the whole point of instances; in fact, it is the essence of what object-oriented programming is all about.
Observe that, in contrast to some languages, Ruby neither requires nor permits you to declare or initialize your instance variables. An instance variable springs into existence, if it didn’t exist already, when code that assigns to it runs, like our your_name_is instance method. If you ask an object for the value of an instance variable that has never been assigned a value, you simply get the special value nil. It would also be perfectly legal for fido to go through life without a @name; if your_name_is is never called on fido, then fido will never have a @name.
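Here is a minimal sketch of that last point, using the same Dog class (repeated so the example stands alone). It relies on the built-in instance_variable_defined? method, which asks an object whether it currently has a given instance variable:

```ruby
class Dog
  def your_name_is(s)
    @name = s
  end
  def what_is_your_name
    @name
  end
end
fido = Dog.new
Kernel.puts(fido.instance_variable_defined?("@name"))  #=> false
Kernel.puts(fido.what_is_your_name.nil?)               #=> true
fido.your_name_is("Fido")
Kernel.puts(fido.instance_variable_defined?("@name"))  #=> true
Kernel.puts(fido.what_is_your_name)                    #=> Fido
```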
On the other hand, it is extremely common to wish to endow an instance with some instance variable values as early as possible. It would drive us crazy to have to remember to call your_name_is on every Dog instance, right after calling Dog.new. Therefore Ruby provides an instance method that is automatically called “as early as possible”; in fact, it is called as part of new. That instance method is called initialize, and it follows this rule: whatever parameters you supply in the new call are passed on as parameters of the initialize method. So, you’ll probably never call initialize directly, but you’ll think of new as a way of calling initialize. It would therefore be much more common to write the preceding program as follows:
class Dog
def initialize(s)
@name = s
end
def what_is_your_name
@name
end
end
fido = Dog.new("Fido")
rover = Dog.new("Rover")
Kernel.puts(fido.what_is_your_name) #=> Fido
Kernel.puts(rover.what_is_your_name) #=> Rover
Notice that this convention requires you to know what the parameter(s) to new mean. The program was clearer when we had a your_name_is method; nothing about the phrase new("Fido") tells the reader that "Fido" is going to be the instance’s @name. This is a bummer, and Rubyists often complain about it, devise ingenious alternatives, and so on. But for now, just get over it.
An instance method like initialize or your_name_is, which sets an instance variable, or like what_is_your_name, which gets an instance variable, is called an accessor. An accessor is the only way to get and set an instance variable from outside that instance. (Okay, not really; nothing in Ruby is ever “the only way”. It’s the only normal way, okay?) There is no syntax such as fido.@name. Instance variables are considered private; if there isn’t an accessor for an instance variable of fido, you can’t access that instance variable from outside fido. (Of course if you are fido, then all you need in order to access an instance variable @name is to mention @name in an instance method, as we’ve been doing.)
Although an accessor method can have any name you like, it is considered nice Ruby style to follow a naming convention where the accessor’s name matches the instance variable’s name, like this:
class Dog
def name=(s)
@name = s
end
def name
@name
end
end
fido = Dog.new
fido.name = "Fido" # actually calls name=
Kernel.puts(fido.name) #=> Fido
The method name= is our first example of some extremely cool syntactic magic that Ruby performs behind the scenes. The equal-sign is the assignment operator, so Ruby lets you write natural-looking code like fido.name = "Fido". That code looks natural, but it is actually meaningless, because there isn’t a variable called fido.name. To endow it with meaning, Ruby translates this line:
fido.name = "Fido"
to this:
fido.name=("Fido")
In other words, Ruby looks for a name= instance method, and passes it "Fido" as parameter. If no such method exists, you can’t talk that way:
class Dog
end
fido = Dog.new
fido.name = "Fido" #=> NoMethodError: undefined method 'name='
Thus, Ruby gives you the feeling that you’re using an operator (the equal sign), but in fact you’re calling a method on an object. This happens in Ruby a lot, and is part of what people mean by “everything is an object”. I’ll give further examples later.
It is quite common for an instance to have a bunch of instance variables that it maintains for various purposes, keeping track of things it needs to keep track of, but for which no accessor is supplied. On the other hand, it is also quite common for an instance to have a bunch of instance variables that it wants the rest of the world to be able to access directly, and for which accessors are supplied. In fact, this is so common that Ruby actually supplies methods for creating accessors: generating, for example, a @name setter called name= and a @name getter called name, on the fly, complete with code, automatically.
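The text above doesn’t name these methods; in standard Ruby they are attr_reader (getter only), attr_writer (setter only), and attr_accessor (both). As a sketch, the hand-written name and name= accessors shown earlier could be replaced by a single line, executed while the class sandwich is open:

```ruby
class Dog
  attr_accessor(:name)   # generates both name and name=, backed by @name
end
fido = Dog.new
fido.name = "Fido"       # calls the generated name= method
Kernel.puts(fido.name)   #=> Fido
```

The argument :name is a symbol, a kind of lightweight identifier; it names the accessor pair, and the instance variable behind it is @name.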
It often happens that you have an instance method or, even more frequently, a bunch of related instance methods, which you would like a certain class to adopt, but where class inheritance does not handle the situation. For example, we might have a class inheritance tree expressing the evolutionary relationship among animals, but how can we express that certain animals, without regard to their place in this tree, can fly? We’d like to “inject”, as it were, a fly method into certain classes, such as Bird and Bat.
This problem puzzles every object-oriented programming language; those that solve it have various ways of doing so. Ruby’s solution is particularly elegant, and is called a mixin. Basically, you can “mix in” a module into any other module. When you do this, any instance methods in the former also become effectively part of the latter. This mechanism is another reason why modules, as stated earlier, are so important.
To “mix in” a module into another module, you use the include method. The second module will usually be a class, so the whole thing might look something like this:
module Flyer
def fly
"wheeee!"
end
end
class Bird
include(Flyer)
end
eagle = Bird.new
Kernel.puts(eagle.fly) #=> wheeee!
Observe that there is effectively no reason to endow a module with an instance method other than to use that module as a mixin. You cannot, after all, call a module’s instance method by sending a message directly to the module; for that, the method would need to be a class method. And you cannot instantiate the module, thus endowing an instance with the instance method; a module is not a class, so it can’t be instantiated. The architecture demonstrated above is in fact the typical mixin architecture: the instance method lives in a module, the module is mixed into a class so that the class acquires the instance method, and the class is instantiated, thus making it possible to call the instance method on an instance.
The line include(Flyer) might need some explanation. The method include is a built-in method of modules. No object is given, so the include method is implicitly sent to self; that’s the class Bird, and a class is a module, so it works. The include method is “private”, so explicitly saying self.include(Flyer) is forbidden. Finally, the include method is called at the moment that line is encountered, while the Bird class is under discussion; I emphasized earlier that other code besides def could appear in a module-opening sandwich, and include is a typical example.
Since the methods within a module that are destined for mixing in are instance methods, they can talk about instance variables. Since those methods will run only after they have been mixed in to some class and that class has been instantiated to make an instance, those instance variables will belong to that instance. Thus, mixed-in methods are just as free to create, access, and manipulate an instance’s instance variables as any other instance methods are.
module Flyer
def fly
"wheeee!"
end
def wingsize=(s)
@wingsize = s
end
def wingsize
@wingsize
end
end
class Bird
include(Flyer)
end
eagle = Bird.new
eagle.wingsize = "big"
sparrow = Bird.new
sparrow.wingsize = "small"
Kernel.puts(eagle.wingsize) #=> big
Kernel.puts(sparrow.wingsize) #=> small
As usual, you can modify a mixed-in module and your changes will instantly propagate to any classes that mix it in, and to any instances of those classes:
module Flyer
def fly
"wheeee!"
end
end
class Bird
include(Flyer)
end
eagle = Bird.new
Kernel.puts(eagle.fly) #=> wheeee!
module Flyer
def land
"aaaaah"
end
end
Kernel.puts(eagle.land) #=> aaaaah
Ruby makes heavy use of mixins in its built-in module/class structure. The most important case is that the Kernel module is mixed into the Object class. Every object (modules, classes, instances) inherits ultimately from Object, so all Kernel instance methods are available everywhere. Thus, Kernel is used as a repository for essential methods. Additionally, Kernel has many internally duplicated methods: they are both class methods and instance methods. This means that in every case in all preceding code examples where I have sent a message to Kernel, I was being unnecessarily stilted; one can do this, but no one ever does. The normal way to call puts is not to say Kernel.puts (calling the puts class method of Kernel) but just to say puts (calling the puts instance method of Kernel mixed into Object):
puts("Hello, world!")
(Since puts is a “private” instance method of Kernel, you can’t call it by saying self.puts, even though every self is endowed with puts because Kernel is mixed into Object.)
Since you, too, can modify Kernel, and since Kernel is mixed into Object, and since changes in a mixed-in module are instantly propagated, you can easily inject new functionality throughout Ruby. Use your power for good instead of evil!
We have seen that an instance acquires instance methods from its class. But you can also endow an instance with instance methods individually. Thus it is possible to have a situation where one particular instance of a class has a certain method, and no other instance of that class does.
One way to do this is by use of a syntax that’s easier to demonstrate than to describe. Let’s suppose we have two dogs, fido and rover; they both know how to bark, but only fido knows how to fetch.
class Dog
def bark
"bow wow"
end
end
fido = Dog.new
rover = Dog.new
class << fido
def fetch
"pant pant slobber slobber"
end
end
puts(fido.bark) #=> bow wow
puts(rover.bark) #=> bow wow
puts(fido.fetch) #=> pant pant slobber slobber
puts(rover.fetch) #=> NoMethodError: undefined method ‘fetch’
As you can see, the way we endow the instance fido with its own personal instance method is to treat fido as itself a class, effectively opening a class sandwich on the instance. It would be wrong to say that fido is a class, though, so it is better to imagine that fido (like every instance) is accompanied by a sort of personal, “shadow” class, a class from which this instance alone derives instance methods. This “shadow” class is called the singleton class. Thus, the line class << fido opens fido’s singleton class for discussion.
There is actually a more compact syntax for doing the same thing, appropriate especially if we are about to endow an instance’s singleton class with just one method:
class Dog
def bark
"bow wow"
end
end
fido = Dog.new
rover = Dog.new
def fido.fetch
"pant pant slobber slobber"
end
puts(fido.bark) #=> bow wow
puts(rover.bark) #=> bow wow
puts(fido.fetch) #=> pant pant slobber slobber
puts(rover.fetch) #=> NoMethodError: undefined method ‘fetch’
Another way to endow the singleton class with methods is to use the mixin architecture. This is done with the extend method. It’s an instance method of the Object class, so it can be called on any instance; and its effect is similar to include, except that instead of endowing a full-fledged module with another module’s instance methods, it endows an instance’s singleton class with a module’s instance methods.
class Dog
def bark
"bow wow"
end
end
module Fetcher
def fetch
"pant pant slobber slobber"
end
end
fido = Dog.new
rover = Dog.new
fido.extend(Fetcher)
puts(fido.bark) #=> bow wow
puts(rover.bark) #=> bow wow
puts(fido.fetch) #=> pant pant slobber slobber
puts(rover.fetch) #=> NoMethodError: undefined method ‘fetch’
The attentive reader may now be thinking: If we are allowed to say def fido.fetch, defining an instance method on fido’s singleton class from outside fido, surely we are allowed to do this for a class or module. Well, let’s try it:
class Dog
end
def Dog.bark
"bow wow"
end
puts(Dog.bark) #=> bow wow
Holy kamoly! It looks like we’ve just endowed the Dog class with a class method, bark, from outside the Dog class sandwich. But then surely that’s what we were already doing when we endowed the Dog class with a class method from inside the Dog class sandwich:
class Dog
def self.bark
"bow wow"
end
end
puts(Dog.bark) #=> bow wow
The only difference is that because we are now inside the Dog class sandwich, we can say self.bark instead of Dog.bark, because inside the Dog class sandwich, self is Dog.
Well, dear reader, you figured it out. It’s true. There isn’t really such a thing as a class method. All methods are instance methods!
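If that is right, then the class << syntax used earlier on fido ought to work on Dog as well, since Dog is just another object with a singleton class of its own. A quick sketch suggests that it does:

```ruby
class Dog
end
class << Dog          # open the singleton class of the Dog object itself
  def bark
    "bow wow"
  end
end
puts(Dog.bark)                      #=> bow wow
puts(Dog.respond_to?(:bark))        #=> true
puts(Dog.new.respond_to?(:bark))    #=> false (instances of Dog don't get it)
```

So a “class method” behaves exactly like fido’s personal fetch method: it lives in the singleton class of one particular object, which in this case happens to be a class.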
Here’s the truth about the big picture of what’s going on in Ruby. (I’m warning you, don’t read the rest of this section unless you’re willing to risk having your head explode.)
Everything is an object, meaning an instance of some class. A class object is itself an instance of the Class class; a module object is an instance of the Module class. The Class class’s superclass is Module; the Module class’s superclass is Object. All classes have Object as their ultimate superclass.
An object is able to respond to a method call, not by virtue of anything inside itself, but because there is a corresponding method in its singleton class, or in its actual class, or in some class further up the inheritance / mixin chain. In the case of something like a Dog instance such as fido, that’s easy to understand: the reason we can send a message to fido is that there are instructions for dealing with that message in fido’s singleton class, or in Dog, or in Object, or in some module mixed into one of those. But exactly the same thing is true of the Class instance Dog: the reason we can send a message to the Dog class (what I’ve been calling a “class method”) is that there are instructions for dealing with that message in Dog’s singleton class, or in Class, or in Module, or in Object, or in some module mixed into one of those.
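You don’t have to take these relationships on faith; you can ask the objects themselves. A short sketch using the built-in class and superclass methods:

```ruby
class Dog
end
fido = Dog.new
puts(fido.class)          #=> Dog
puts(Dog.class)           #=> Class
puts(Dog.superclass)      #=> Object
puts(Class.superclass)    #=> Module
puts(Module.superclass)   #=> Object
puts(Module.class)        #=> Class
```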
You don’t have to understand that in order to use Ruby, and indeed I think that my way of describing Ruby, starting with modules and class methods, then proceeding to classes, and finally to instances, instance methods, instance variables, mixins, and singleton classes, is both valid and pedagogically useful. So feel free to ignore this section if you’re having trouble understanding it. I have to confess that I myself read and reread explanations similar to this section for literally years before I understood Ruby in the way I’ve just described it in the preceding two paragraphs. But when I finally did understand it, it was a very satisfying moment. (And then my head exploded.)
Ruby syntax is fairly straightforward, especially if you already know any other C-like language, such as Perl, REALbasic, or (gasp) C. You can easily acquire the fine points of Ruby syntax on your own, but a few peculiarities may need some special attention up front.
Ruby comes with a number of syntactic rules for method calling, designed to make it look a little less like “everything is an object” and a little more like other programming languages.
For example, we’ve already seen that you can use syntax that looks as if you were assigning to a “property” of an object:
fido.name = "Fido"
Ruby doesn’t have “properties” so there isn’t a real thing on the left side of the equal sign. Instead, Ruby translates this behind the scenes to:
fido.name=("Fido")
Now Ruby is sending the name= message to fido, along with the parameter "Fido"; if fido doesn’t have a name= method, there will be an error. So this works (or doesn’t) because fido’s class defines (or doesn’t) an instance method name=.
Most things that look like operators in Ruby work the same way. Take, for example, the plus sign:
puts("Hel" + "lo") #=> Hello
There is no addition operator in that expression; Ruby is not “adding” two strings together. Rather, Ruby translates the expression behind the scenes into this:
puts("Hel".+("lo")) #=> Hello
Now Ruby is merely calling the + method on the string "Hel", with "lo" as its parameter. Since a string is an instance of the String class, which defines an instance method +, it works.
The case is no different with numbers:
puts(3 + 4) #=> 7
The literal 3 is an instance of the Fixnum class, which defines a + instance method. So Ruby translates the above expression to a method call, which works:
puts(3.+(4)) #=> 7
As you can see, it is perfectly legal to write all these expressions as method calls yourself. But no one ever does. Syntactic sugar is, by nature, sweet.
Let’s pause to digest some takeaway messages from all this:
It would be wrong to say, as one would say of some other languages, that the + operator is overloaded in Ruby to add numbers but to concatenate strings. There is no + operator so there is nothing to overload. There are just objects and methods (“everything is an object”). There is a + method of the String class that does one thing; there is a completely different + method of the Fixnum class that does another thing.
Clearly there could be a + method of your class that does yet another thing! If you want Dog instances to be “addable” (though I shudder to imagine what you might mean by this), you have only to define a + instance method on Dog.
You can reach right in and change the definition of the + method on an existing class, just as with any method on any class. For example:
class Fixnum
def +(x)
42
end
end
puts(3 + 4) #=> 42
Great, so now adding any two numbers together is going to yield 42! Clearly this was not a very wise thing to do, if you want the arithmetic universe to keep on working the way you were taught in school. But Ruby doesn’t teach you wisdom; it just gives you power.
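To make the second of those takeaways concrete without inflicting arithmetic on dogs, here is a sketch using a hypothetical Weight class (the class and its pounds method are invented purely for illustration). Defining a + instance method is all it takes to make Weight instances “addable”:

```ruby
class Weight
  def initialize(pounds)
    @pounds = pounds
  end
  def pounds
    @pounds
  end
  def +(other)
    Weight.new(@pounds + other.pounds)   # return a new Weight, as numbers do
  end
end
total = Weight.new(30) + Weight.new(12)  # really Weight.new(30).+(Weight.new(12))
puts(total.pounds)                       #=> 42
```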
Some methods that can be defined with symbolic names in this way are:
+ - * / ** % & | ^ << >> =~ [] []=
<=> < <= > >= ==
===
If you’re looking at those symbols and thinking, “Yes, but what do they mean?” you still haven’t got the idea. They don’t mean anything; or rather, you can make them mean whatever you like. However, it’s true that many built-in classes make them mean some very cool stuff, which you’re going to want to learn about; and it’s also true that some of these meanings are conventional, and that you too should follow those conventions, not because you have to, but because it’s convenient if you do and confusing if you don’t. For example, << is used by a number of classes (String, Array, IO) to mean “append”, and many other classes follow suit.
Another example of the value of obeying conventions is the family of methods connected with the spaceship method, <=>. The convention here is that when instances of a class can be compared with the notions “less than”, “equal to”, and “greater than”, the spaceship method embodies all three: it returns -1, 0, or 1, according to whether the object to which the <=> message is sent is less than, equal to, or greater than the parameter. The cool part is that if your class implements this one method, you can mix in the Comparable module and presto, the other five methods spring to life (not a difficult trick, because they are all trivially defined in terms of the spaceship method).
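Here is a sketch of that trick, assuming (purely for illustration) that dogs are to be compared by age. The class defines only the spaceship method; everything else comes from Comparable:

```ruby
class Dog
  include(Comparable)
  def initialize(age)
    @age = age
  end
  def age
    @age
  end
  def <=>(other)
    @age <=> other.age     # delegate to the numbers' own spaceship method
  end
end
pup = Dog.new(2)
old = Dog.new(7)
puts(pup < old)            #=> true
puts(pup >= old)           #=> false
puts(pup == Dog.new(2))    #=> true (same age, so "equal" by our definition)
puts(old <=> pup)          #=> 1
```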
The square bracket methods at the end of the first line come into play with classes like String and Array. For example, given an array arr, you can fetch the first item of the array by asking for arr[0], and you can set the first item of the array (replacing its previous value, if any) by saying arr[0] = newvalue or similar. Those expressions are syntactic sugar; behind the scenes, they are translated to calls on the [] and []= methods, respectively.
There are additional operators that are syntactic sugar for combinations of methods. For example:
s += "!"
That means: “Fetch the value of s; call its + method with "!" as parameter; and assign that back to s.” Or, try this one:
fido.name += "ookums"
That means: “Call fido’s name method; take the result and call its + method, with "ookums" as parameter; now take that result and use it as the parameter in calling fido’s name= method.”
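Spelled out as code, the two forms look like this. The sketch assumes a Dog class with the name and name= accessors from earlier (repeated here so that it runs on its own); the second dog gets the fully unsugared version of the same operation:

```ruby
class Dog
  def name=(s)
    @name = s
  end
  def name
    @name
  end
end
rover = Dog.new
rover.name = "Rover"
rover.name += "ookums"                  # the sugared form
puts(rover.name)                        #=> Roverookums

spot = Dog.new
spot.name=("Spot")
spot.name=(spot.name.+("ookums"))       # what Ruby actually does
puts(spot.name)                         #=> Spotookums
```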
Many operators have assignment combinations like the above. A common Ruby idiom is something like this:
class Dog
def name
@name ||= "Fido"
end
end
This depends upon the following facts:
||, the “logical or” operator, evaluates its arguments from left to right, stopping after the left argument if it is true.
nil is false in a logical expression (and so is false); everything else is true.
The value of an assignment is the new value of the thing assigned to.
So, here’s what that code does. If @name is non-nil (i.e., if it has already been assigned a string value), its value counts as true in the logical expression; so evaluation of the logical expression stops, and the value of @name is returned. If, on the other hand, @name is nil (which, if it has never been given a value, it will be), we go on to the right argument in the logical expression. That value is "Fido", which is assigned to @name, and now the value of @name is returned. So, either way, the value of @name is returned, but if @name is nil it is assigned the value "Fido" first. This is effectively a way of saying that "Fido" is @name’s default value, the value it is to have if it has never been assigned a value.
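To watch the default kick in, here is the same idiom in a runnable sketch; the name= setter is added only so that one of the two dogs can be given a name of its own:

```ruby
class Dog
  def name=(s)
    @name = s
  end
  def name
    @name ||= "Fido"
  end
end
stray = Dog.new
puts(stray.name)      #=> Fido (never named, so the default is assigned)
rover = Dog.new
rover.name = "Rover"
puts(rover.name)      #=> Rover (already set, so the default is ignored)
```

Because false also counts as false in a logical expression, this idiom is not a good fit for an instance variable whose legitimate values include false.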
Ruby comes with a number of extremely elegant built-in classes. It’s impossible to provide all the details in a short space, and besides, that’s the sort of thing that traditional “bottom-up” Ruby tutorials do very well. This section will just provide a quick survey.
Numbers work pretty much the way you would expect them to. There are several numeric classes, and numeric literals are automatically translated into instances of the appropriate class.
Ranges are objects representing a slice of a sequence. A literal range is written with two dots between the endpoints of the slice. For example, 1..5 means all the integers from 1 to 5, inclusive, and 'a'..'c' means all the lowercase letters from a to c. Obviously Ruby has rules for what constitutes a sequence; you can create your own classes for things that behave sequentially, and ranges will work with them. Ranges are of particular value in taking slices of strings and arrays, and in loops, as we shall see.
Symbols are a little bit like strings, and indeed strings can generally be converted to symbols and vice versa. A literal symbol starts with a colon, like this: :this_is_a_symbol. The important difference is that although there can be many string objects equal to "some string", there can be only one symbol equal to :some_symbol. Symbol lookup and comparison is therefore very fast, and symbols are useful wherever a mere token or identifier is needed.
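The difference can be seen with equal?, which tests whether two expressions refer to the very same object (as opposed to ==, which compares values). In this sketch, String.new is used just to be sure of getting two separate string objects:

```ruby
a = String.new("some string")
b = String.new("some string")
puts(a == b)                                     #=> true (same characters)
puts(a.equal?(b))                                #=> false (two distinct objects)
puts(:some_symbol.equal?(:some_symbol))          #=> true (one and the same object)
puts("some_symbol".to_sym.equal?(:some_symbol))  #=> true (string converted to symbol)
puts(:some_symbol.to_s)                          #=> some_symbol (and back again)
```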
Strings can be formed using literal delimiters: single-quotes and the equivalent %q{...}, and double-quotes and the equivalent %{...}. (I’m greatly over-simplifying.) There’s a big difference between single-quoted string literals and double-quoted string literals: double-quoted string literals can contain various escaped characters, plus they are candidates for expression interpolation. For example:
s = 'World'
puts("Hello, #{s}! I greet you #{3 + 4} times!")
#=> Hello, World! I greet you 7 times!
Expression interpolation, as you can see from the example, involves expressions surrounded by #{...}. This syntax, familiar if you’re acquainted with shell scripting, Perl, and the like, is of great convenience when assembling strings. The expression inside the #{...} is evaluated in the current context, and if the result is not a string, its to_s method is called; there is no automatic implicit coercion between classes in Ruby, so a class must implement to_s if it is to be interpolated into a string. (The Kernel method puts also calls its parameter’s to_s, which is why we have not had to worry about whether that parameter is a string.) Ruby also implements “here documents”, where a multi-line stretch of a program is taken to be a literal string, so interpolation into a large string is easy:
m1 = "hey"
m2 = "ho"
m3 = "hey nonny no"
s = <<END
With a #{m1}
and a #{m2}
and a #{m3}
END
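Returning for a moment to the role of to_s in interpolation: here is a sketch in which a Dog class supplies its own to_s, so that a Dog instance can be dropped straight into a string (the wording of the description is, of course, arbitrary):

```ruby
class Dog
  def initialize(s)
    @name = s
  end
  def to_s
    "a dog named " + @name
  end
end
fido = Dog.new("Fido")
puts("I have #{fido}.")   #=> I have a dog named Fido.
puts(fido)                #=> a dog named Fido
```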
The String class comes with many wonderful methods that make string operations a snap. The greatest of these involve square brackets, which let you slice and dice and modify a string easily:
s = 'Hello, world'
s = s + '!' # concatenate
puts(s[0,1]) #=> H; starts at beginning, one character in length
puts(s[7,5]) #=> world; starts at character 7, 5 characters in length
puts(s[-6,5]) #=> world; starts six characters from the end, 5 in length
puts(s[0..4]) #=> Hello; range from characters zero thru 4
s[-6,5] = "everyone" # replace
puts(s) #=> Hello, everyone!
s[0,0] = "Gosh! " # insert
puts(s) #=> Gosh! Hello, everyone!
s = s.sub("ell", "ipp") # find and replace
puts(s) #=> Gosh! Hippo, everyone!
Observe that in Ruby, indexes are zero-based; the first item of a series is item zero.
A literal string in backticks, or the equivalent %x{...}, is interpolated like a string in double-quotes and then evaluated by the shell. This is actually syntactic sugar for calling Kernel.`(s), where the backtick is the name of a method and s is the string.
Regular Expressions are handled through the Regexp class. A Regexp pattern object can be formed using literal delimiters, either forward slashes or the equivalent %r{...}. This literal may be followed by a letter or letters indicating the mode of the matching operation, such as i to ignore case, and m for a multi-line match. The rules for regular expression pattern interpretation and matching are very involved; suffice it to say here that Ruby has some of the best regular expression support in the universe. Regular expressions can be used in a number of string methods; plus, there’s a basic match operator, =~. After a regular expression match, a number of special values are set which tell you about what just happened:
s = 'Hello, world!'
s =~ /(.)\1/ # look for a double letter
puts($&) #=> ll; the entire matched expression
puts($~[1]) #=> l; the contents of the first (and only) parenthesized match
puts($1) #=> l; same thing, more conveniently
puts($`) #=> He; everything preceding the match
puts($') #=> o, world! everything after the match
puts($~.begin(0)) #=> 2; where the entire match starts in the target string
$~ is a MatchData instance, and so is the result of calling the match method (the =~ operator in the second line, by contrast, returns the position where the match starts, or nil if there is no match). A MatchData captures all of the information about what happened, and has methods for reporting it (like begin, and the square-bracket method used in the fourth line).
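If you would rather hold on to a MatchData than rely on the special $ variables, the match method hands you one directly. A sketch repeating the same double-letter search:

```ruby
s = 'Hello, world!'
m = s.match(/(.)\1/)       # match returns a MatchData (or nil if no match)
puts(m.class)              #=> MatchData
puts(m[0])                 #=> ll; the entire matched expression
puts(m[1])                 #=> l; the first parenthesized match
puts(m.pre_match)          #=> He
puts(m.post_match)         #=> o, world!
puts(m.begin(0))           #=> 2
```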
Arrays are ordered lists. A literal array is delimited by square brackets, with the items separated by commas. An item can be any value at all; however, it is a bad idea to give an item the value nil, because that is the default response when you ask for a non-existent item. Arrays can easily be concatenated, combined, sliced and diced, and so on.
arr = [] # an empty array
arr << 1 << 2 # easy way to append to an array
puts(arr[1]) #=> 2
puts(arr[-1]) #=> 2
p(arr[0,2]) #=> [1, 2]
puts(arr[2]) #=> nil
arr.insert(1, 1.5, 1.5, 1.5)
p(arr) #=> [1, 1.5, 1.5, 1.5, 2]
arr = arr.uniq
p(arr) #=> [1, 1.5, 2]
(Notice the use of p instead of puts for outputting the literal form of an array. It calls an object’s inspect method instead of its to_s method, and some classes, including Array, implement these differently.)
I could go on and on talking about arrays. You can take the union or intersection of two arrays. There are ways of splitting a string into an array of strings, joining an array of strings into a single string, and so on. Arrays are tremendously important in Ruby, which is one reason why they are so well-endowed with methods; in the next section, we’ll see that arrays play a crucial role in assignment and parameter passing.
One thing that beginners need to watch out for is that an array index is merely a pointer, just like any other name. Assigning into an array doesn’t make a copy. This means that the contents of an array can be changed “behind your back”:
s1 = "Mannie"
s2 = "Moe"
s3 = "Jack"
arr = [s1, s2, s3]
s1[2] = "r"
p(arr) #=> ["Marnie", "Moe", "Jack"]
Hashes are unordered collections of key–value pairs. The idea is that you can access a value by way of its key. Keys can be of any class that supports a hash method, but in practice, strings and (preferably) symbols are typically used. A literal hash is delimited by curly braces, with each key preceding its value and separated from it by => (and each key–value pair separated by a comma):
favorites = {:composer => "Brahms", :painter => "Van Gogh", :comedian => "Groucho"}
A common way to get and set a value by way of its key is to use the square bracket operator:
favorites = {:composer => "Brahms", :painter => "Van Gogh", :comedian => "Groucho"}
puts(favorites[:composer]) #=> Brahms
favorites[:painter] = "Picasso" # replacement
favorites[:naturalist] = "Darwin" # insertion
p(favorites)
#=> {:painter=>"Picasso", :comedian=>"Groucho", :naturalist=>"Darwin", :composer=>"Brahms"}
puts(favorites[:horse]) #=> nil
As the last line shows, fetching through a non-existent key yields nil by default.
Hashes are very important in Ruby and are richly endowed with methods, including ways to combine hashes, and ways to convert between an array and a hash. Key names can be formed dynamically, and hash access is very efficient, so a hash is a great way to store arbitrary associative data. I often see beginners asking on the Ruby forums how to form variable names dynamically, e.g. “I want to read the text of several files into string variables, naming each variable whatever the name of each file may be.” Dynamic variable naming is a nutty idea; this situation cries out for a hash.
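As a sketch of what the hash solution to that beginner’s question might look like: the file names and contents below are hypothetical and hard-coded so that the example runs anywhere; real code would read each file’s text instead.

```ruby
texts = {}                          # file name => file contents
texts["notes.txt"] = "Buy dog food"
texts["todo.txt"] = "Walk Fido"
which = "todo.txt"                  # a key computed at run time
puts(texts[which])                  #=> Walk Fido
p(texts.keys.sort)                  #=> ["notes.txt", "todo.txt"]
```

The file name is just data used as a key, so no variable ever has to be named on the fly.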
Another way to think of a hash is as a lightweight object consisting of instance variables. For example, we could make a lightweight database of people and their favorites, implemented as a hash of hashes:
people = {}
people[:me] = {:composer => "Brahms", :painter => "Van Gogh", :comedian => "Groucho"}
people[:you] = {:composer => "Ellington", :painter => "Picasso", :comedian => "Carlin"}
From one point of view, that’s an appalling way to behave, because it’s so fragile; if we know for a fact that every person has exactly a favorite composer, a favorite painter, and a favorite comedian, and that this set of attributes will never change, we should properly create a Person class and use a hash of Persons instead. (And indeed, a Struct class exists just to make it easy to create classes consisting of nothing but instance variables and their accessors.) Nonetheless, hashes are so convenient that even experienced Rubyists do in fact write code like this all the time.
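For comparison, here is roughly what the Struct version of the same database might look like. Struct.new takes the attribute names and returns a new class, here assigned to the name Person, whose instances have a getter and a setter for each attribute:

```ruby
Person = Struct.new(:composer, :painter, :comedian)
people = {}
people[:me] = Person.new("Brahms", "Van Gogh", "Groucho")
people[:you] = Person.new("Ellington", "Picasso", "Carlin")
puts(people[:me].composer)          #=> Brahms
people[:you].painter = "Matisse"    # the accessors come for free
puts(people[:you].painter)          #=> Matisse
```

Asking a Person for an attribute it doesn’t have raises a NoMethodError instead of quietly returning nil, which is exactly the robustness the hash-of-hashes approach lacks.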
The syntax of assignment and the syntax of parameter passing in a method call are closely related, so they deserve to be studied together. Just as a method can have more than one parameter, so you can assign to more than one variable simultaneously:
name, age = "Fido", 7
After that, name
is "Fido"
and age
is 7. Such multiple simultaneous assignment (or parallel assignment) is, of course, a mere convenience; you could just as easily assign to name
in one line and to age
in the next line. But even a mere convenience can be very convenient.
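One case where parallel assignment is more than a convenience is swapping. In this small sketch (the variable names are arbitrary), the right side is evaluated in full before anything is assigned, so no temporary variable is needed:

```ruby
first, second = "Fido", "Rex"
first, second = second, first   # both rvalues are fetched before either assignment
puts(first)    #=> Rex
puts(second)   #=> Fido
```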
The real power, however, emerges when the number of names on the left side of the assignment (lvalues) differs from the number of values on the right side (rvalues). If there is just one lvalue and multiple rvalues, the rvalues are combined into an array, and it is this array that is assigned to the lvalue:
name = "Fido", 7
p(name) #=> ["Fido", 7]
If, on the other hand, there are multiple lvalues and just one rvalue, then an attempt is made to treat the rvalue as an array, by calling its to_ary
method if it has one. (Very few classes implement to_ary
; Array does, simply returning self
.) If this attempt succeeds, we now have an array, and in that case the items of the array are distributed over the lvalues:
arr = ["Fido", 7]
name, age = arr # now name is "Fido" and age is 7
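The same distribution happens for any object that supplies to_ary. As a sketch, here is a hypothetical Pet class, invented purely for this illustration:

```ruby
# Pet is a hypothetical class, invented for this illustration.
class Pet
  def initialize(name, age)
    @name = name
    @age = age
  end
  # Supplying to_ary is what lets a Pet stand in for an array.
  def to_ary
    [@name, @age]
  end
end

fido = Pet.new("Fido", 7)
name, age = fido   # to_ary is called for us, and its result is distributed
puts(name)   #=> Fido
puts(age)    #=> 7
```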
Let’s give these behaviors a name. We’ll call the second case, where an array rvalue is distributed over multiple lvalues, splatting the array; and we’ll call the first case, where multiple rvalues are combined into an array, reverse splatting. Then we can say that we have just seen examples of implicit splatting and implicit reverse splatting; the splatting or reverse splatting was performed for us, automatically.
In the case where there are multiple lvalues and multiple rvalues, however, you might want splatting or reverse splatting to occur, and you need a way to indicate this, since it won’t happen automatically. To make this possible, Ruby provides the splat operator, which is an asterisk. If the asterisk precedes an lvalue, that lvalue mops up all excess rvalues as a single array (reverse splatting):
name, *otherstuff = "Fido", 6, 65 # otherstuff is now [6, 65]
If the asterisk precedes an rvalue, that rvalue is treated as an array (by calling its to_ary
method) and, if this succeeds, the elements of that array are distributed over the remaining lvalues (splatting):
arr = [6, 65]
name, age, weight = "Fido", *arr # age is now 6 and weight is now 65
Parameter passing, when calling a method, works in almost the same way. The difference is that when you call a method, there is no implicit splatting. The reason is that when you call a method, there’s an arity check, meaning that the number of parameters passed must match the number of arguments declared by the method definition; if they don’t match, there’s an error.
def test(n)
name, age = n
end
test("Fido", 7) #=> ArgumentError: wrong number of arguments (2 for 1)
But we can get explicit splatting or reverse splatting by asking for it.
def test(*n)
name, age = n
end
test("Fido", 7) # now name is "Fido" and age is 7; do you see why?
Again, here’s the opposite error:
def test(name, age)
end
arr = ["Fido", 7]
test(arr) #=> ArgumentError: wrong number of arguments (1 for 2)
And here’s the cure:
def test(name, age)
end
arr = ["Fido", 7]
test(*arr)
We can use reverse splatting to allow a method to accept any number of parameters:
def test(*n)
end
No matter how many parameters are passed, there will be no error; all the parameters will be combined into a single array argument, and our method code can now proceed to investigate the situation and behave accordingly.
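As a sketch of what "investigating the situation" might look like, here is a method that simply reports what it was handed. The name report is hypothetical:

```ruby
# report is a hypothetical method, for illustration.
def report(*n)
  # n is always an array, even when nothing at all is passed
  "got " + n.length.to_s + " parameter(s): " + n.inspect
end

puts(report())                   #=> got 0 parameter(s): []
puts(report("Fido"))             #=> got 1 parameter(s): ["Fido"]
puts(report("Fido", 7, [1, 2]))  #=> got 3 parameter(s): ["Fido", 7, [1, 2]]
```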
The picture is made more complicated by the fact that in a method definition we are allowed to specify default values for some or all arguments. In Ruby 1.8.x, such arguments must come after the arguments without default values, and if there is a reverse splatted argument, it must come last of all:
def test(name, age, weight = 165, length = 80, *otherstuff)
end
test("Fido", 6)
# args are "Fido", 6, 165, 80, and []
test("Fido", 6, 150, 75, "what", "is", "this")
# args are "Fido", 6, 150, 75, and ["what", "is", "this"]
However, there’s a problem with all this from a usability point of view. Our calls to the test
method contain no indication of what the various parameters are for; we have to consult the method definition and figure it out. (I mentioned this earlier in connection with the discussion of new
and initialize
.) Also, we are tied to supplying the parameters in a fixed order. Also, we have no way to pass a length
value but fall back on the default for weight
. A common workaround is to define a method to expect a hash. A hash’s keys indicate what the values are for:
def test(h)
end
test(:age => 6, :name => "Fido", :length => 75)
It would then be up to our implementation of test
to analyze the hash h
; but this is not difficult to do. The important thing to notice is the absence of curly braces. Ruby has a “syntactic sugar” rule that in the comma-list of parameters in a method call, the last parameter can be a literal hash without curly braces. (Ruby knows it’s a hash because of the =>
symbol.) This rule is to make it easier to pass a literal hash as the last parameter, and taking advantage of it is extremely common.
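Here is one possible way to analyze such a hash, offered as a sketch rather than as the only approach. The method name describe_dog and the default values are hypothetical; the idea is to merge whatever was passed over a hash of defaults, so that any key the caller omits falls back on its default.

```ruby
# describe_dog and its default values are hypothetical, for illustration only.
def describe_dog(h)
  defaults = {:weight => 165, :length => 80}
  h = defaults.merge(h)   # values actually passed override the defaults
  h[:name].to_s + ": age " + h[:age].to_s +
    ", weight " + h[:weight].to_s + ", length " + h[:length].to_s
end

puts(describe_dog(:age => 6, :name => "Fido", :length => 75))
#=> Fido: age 6, weight 165, length 75
```

The call supplies a length but not a weight, in no particular order, which is exactly what the positional version could not do.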
One final note on passing parameters. Even when a method call involves parameters, parentheses around the parameter list are optional (unless their omission results in an ambiguous expression, because of the complexity of the context or the parameters themselves). Such omission is quite common. So, no one actually writes, as we have been doing:
puts("Hello, world!")
The normal way is to write:
puts "Hello, world!"
Similarly, if the last parameter to be passed to a method call is a literal hash, it is common to omit both the parentheses and the curly braces around the literal hash:
class Dog
def initialize(name, h = nil)
@name = name
# deal with hash, if any, here
end
end
fido = Dog.new "Fido", :age => 6, :weight => 65
Most of Ruby’s control structures are easy to understand, and are similar to the control structures of other languages that you may be familiar with; but something explicit needs to be said about blocks. Nothing is so typical of Ruby syntax as a block, nor so distinctive as to the difference between the Ruby Way and the way other languages do things.
A block is basically the body of a function — arguments, and what to do with those arguments. Any method in Ruby can accept a block. The block, if present, is called only if the method calls the keyword yield
. Any parameters supplied to yield
are passed as arguments to the block. Here’s a trivial (and silly) example:
def blockTester(ss)
yield(ss)
end
blockTester("Howdy") {|s| puts s}
The expression {|s| puts s}
is a literal block. (It is also possible to supply a variable whose value is a block, but I’m not going to talk about that.) Notice that the literal block is outside the parentheses of the method call. The vertical pipes (|s|
) come first inside the block, and give the names of the arguments; then comes the code. So, let’s talk about what happens in that code.
We call blockTester
, handing it two things: a string parameter, and a block. The string parameter, "Howdy"
, becomes the method argument ss
; the block just sits there, waiting to see whether the method ever calls yield
. The method does call yield
, with one parameter, namely the value of ss
(which is still "Howdy"
). Okay, so now we’re in the block. One parameter arrives and is assigned to the argument s
inside the block. Then the code of the block executes, and we output “Howdy”.
For longer blocks, it is common to use a different syntax, with do
and end
instead of curly braces:
def blockTester(ss)
result = yield(ss)
puts result
end
blockTester "Howdy" do |s|
puts s
"Done"
end
(But there isn’t actually any important difference between using curly braces and using do
/ end
to delimit a literal block.) In that example, notice that the value returned by the block (in this case, the string “Done”) is the value returned by the yield
call. After the line that calls yield
, the method carries on in the normal way.
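A method may call yield more than once, or not at all. The following sketch uses two hypothetical methods, twice and maybe, to show both possibilities; block_given? is the built-in way for a method to ask whether it was handed a block.

```ruby
# Each call to yield runs the block afresh and hands back the block's value.
def twice(ss)
  first = yield(ss)
  second = yield(first)
  second
end

result = twice("Howdy") {|s| s + "!"}
puts result   #=> Howdy!!

# block_given? lets a method cope gracefully when there is no block.
def maybe(ss)
  if block_given?
    yield(ss)
  else
    ss
  end
end

puts(maybe("Howdy"))                 #=> Howdy
puts(maybe("Howdy") {|s| s.upcase})  #=> HOWDY
```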
Now, in your initial use of Ruby you are unlikely to write many methods that expect blocks. But you are very likely to call methods that expect blocks, because such methods are the standard way of looping in Ruby. The basic example is the each
method. Many built-in Ruby classes implement each
, especially collections and ranges. The each
method means: “Do this for each item in the collection.” To tell the each
method what you mean by “this,” you pass a block.
For example, suppose you want to fetch the value of column 1, the value of column 2, and so on up to n
. I have no idea what a “column” is, or how you fetch the value of one, and I don’t care; I’m only interested in the abstract mechanics of each
and a block. If you know any other computer language at all, you are probably tempted to write a for
loop, and you’re going to be casting about to find out how to express this in Ruby. Well, you can; but don’t. No one writes for
loops in Ruby. The concept “1, then 2, and so on up to n
” is expressed by a range: 1..n
. The concept “fetch that column” is expressed in a block:
(1..n).each do |col|
fetch_column_number(col) # or whatever
end
Here’s another example (one of my favorite ways of demonstrating Ruby). Suppose we have a string, and we want to count all the occurrences of each unique word in that string. How would you do this? Don’t think in terms of cycling through the string; think of turning the string into a collection of which you can process each item with each
. So, our first step is to bust the string into words:
s = <<END
It was a lover and his lass,
with a hey, and a ho, and hey nonny no;
when birds do sing, hey ding a ding ding,
sweet lovers love the spring.
END
arr = s.split(/\W+/)
Our way of splitting the string into words is crude but cool (involving a regular expression), and now we have an array. An array is a collection, so we’re off to the races. What shall we do with each word? Well, let’s downcase it, so that all our words are lowercase; then, let’s use each word as a key in a hash, which we’ll have prepared beforehand.
h = {} # empty hash
arr.each do |item|
item = item.downcase
# hash the item here
end
How should we hash the item? Well, we’ll fetch the item from the hash. If it’s not there, we’ll get nil
, and we’ll assign 1 as a value (because we have just found our first instance of that item). If it’s there, we’ll get the number of times we’ve found that item so far, and we’ll add 1 to that.
count = h[item]
if count
h[item] = count + 1
else
h[item] = 1
end
That actually works, but it is more common to be a little less verbose. Here’s a tighter version of the same code; be sure you can see why it is the same:
h = {} # empty hash
s.split(/\W+/).each do |item|
item = item.downcase
h[item] = (h[item] || 0) + 1
end
(It is actually possible to make the code even tighter and even more Ruby-like, but that’s enough of that example.)
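For the curious, here is one tighter form. It is a sketch of one possibility, not necessarily the version alluded to above, and it uses a shorter test string. It relies on Hash.new(0), which makes a hash whose value for a missing key is 0 rather than nil.

```ruby
s = "with a hey, and a ho, and hey nonny no"
h = Hash.new(0)   # fetching through a non-existent key now yields 0
s.split(/\W+/).each {|item| h[item.downcase] += 1}
p h["hey"]     #=> 2
p h["and"]     #=> 2
p h["lover"]   #=> 0
```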
Variable scoping in blocks is complicated and poses potential hazards for the beginner. The basic rule is that if a variable name mentioned inside a block — including, in Ruby 1.8.x, one of the argument names in pipes — is already defined and visible in the context surrounding the block, then they are the same variable.
This can be extremely convenient, because it means that information from the surrounding context doesn’t have to be passed into the block. So, in the example just above, we defined an empty hash called h
outside the block, and then inside the block we referred to that same h
. In fact, that is why we defined h
outside the block. If we hadn’t done so, then the only h
mentioned would be inside the block, where it has never been given a value; the block would fail the first time it ran, and there would be no h for us to retrieve afterwards:
s.split(/\W+/).each do |item|
item = item.downcase
h[item] = (h[item] || 0) + 1 #=> NameError: undefined local variable or method ‘h’
end
p h # never reached
But because we first defined h
as an empty hash before the block, the h
in the block was that h
:
h = {} # empty hash
s.split(/\W+/).each do |item|
item = item.downcase
h[item] = (h[item] || 0) + 1
end
p h # no problem, the result is output
So it is, in fact, a very common technique to define a variable, even just setting it to nil
, before a call to a method involving a block, just so that block can alter that variable outside itself. But you can readily see the downside: there is a trap waiting for us here, in that we might accidentally give a block variable the same name as a variable outside the block, and destroy the latter’s value unintentionally.
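Here is a small sketch of that trap. The variable names are arbitrary; the point is that the assignment inside the block was meant to create a scratch variable, but it lands on the outer variable instead.

```ruby
total = 100       # a value we meant to keep
[1, 2, 3].each do |n|
  total = n * 2   # intended as a scratch variable, but this is the outer total
end
puts total        #=> 6
```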
It is important to be clear that when I say that the block can see variable names already defined “in the context surrounding the block”, I mean in the context surrounding where the block is defined. For example, we could have written the first example in this section more concisely, like this:
def blockTester
s = "Farewell"
yield
end
s = "Howdy"
blockTester {puts s} #=> Howdy
In the last line we call blockTester
, passing it a block that refers to s
. The block is called inside the blockTester
method, where s
is "Farewell"
; but that doesn’t matter. At the point where the block was defined, s
was "Howdy"
, and that is the s
whose value the block has captured. And it hasn’t just captured it; it has access to it, and can change it:
def blockTester
yield
end
s = "Howdy"
blockTester {s = "Farewell"}
puts s #=> Farewell
In that example, the code s = "Farewell"
was executed only because we passed it in a block into a method, blockTester
, which yielded to it. But the s
in the block is the same as the s
in the line before because that is where the block is defined.
(Actually, the situation is even deeper. A block is a closure. By this I mean that the block doesn’t just access values that it refers to outside of itself; it preserves them for the lifetime of the block. That doesn’t matter here, because the block’s lifetime is no longer than that of its surroundings; but there is such a thing as a long-lived block.)
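To give just a taste of a long-lived block, here is a minimal sketch. It uses lambda, a Kernel method that turns a block into an object that can be stored in a variable and called later. The method name make_counter is hypothetical.

```ruby
# make_counter is a hypothetical method, for illustration.
def make_counter
  count = 0
  lambda { count += 1 }   # the block captures count and keeps it alive
end

counter = make_counter   # make_counter has returned, but its count lives on
counter.call
counter.call
puts counter.call        #=> 3

other = make_counter     # each call to make_counter captures a fresh count
puts other.call          #=> 1
```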
Finally, I must say something about how you exit prematurely from a block. Do not say return
inside a block. (Well, you can say it, but it doesn’t return just from the block, it returns from the method that defines the block, which is rarely what you want.) To return from one call of a block, allowing the block to be called again if that’s what the caller wants to do, say next
; you can use this to return a value from the block (some methods that expect blocks also expect the block to return a meaningful value). To return from the method that called the block (i.e. the method that said yield
), say break
; again, you can use this to return a value, which will become the value returned from the method that called the block. If you don’t supply a value with return
, break
, or next
, the value returned is nil
.
For example, the map
method of a range feeds each item to the block, and returns an array comprising each value returned from the block. We’ll write a block that doubles each number fed to it, but skips odd numbers, and stops with a protesting message if it’s fed an even number larger than 6:
arr = (1..6).map do |item|
if item % 2 == 1
next
end
if item > 6
break "TOO BIG"
end
next item * 2
end
p arr #=> [nil, 4, nil, 8, nil, 12]
The result shows clearly what happened: we returned nil
from the block for odd numbers, and doubled the value for even numbers. But if we change the initial range, things are very different:
arr = (1..8).map do |item|
if item % 2 == 1
next
end
if item > 6
break "TOO BIG"
end
next item * 2
end
p arr #=> "TOO BIG"
The result starts out as if it were going to be the same as in the previous example. When item
reaches 8, however, break
is executed; this cancels the entire array construction process and forces map
to return the protesting message instead.
Ruby control structures, as already mentioned, are easy to understand; here’s an extremely quick summary of the ones I most commonly use. Consult a proper language introduction for full details.
Besides the C-like boolean
operators &&
, ||
, and !
, Ruby has English boolean operators and
, or
, and not
. The binary boolean operators have lazy left-to-right evaluation, and for this reason logical-and is often used as a poor man’s “if” (and logical-or is often used as a nil
test, as shown earlier). The English versions have lower precedence, and for this reason are often used with assignment.
defined?(hsh) and s = hsh[:hey] # set s to hsh[:hey], but only if hsh exists
(defined?
is a keyword; it could not be a method call, because in that case it would choke if its parameter wasn’t defined.)
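The following sketch pulls those idioms together. The settings hash and its keys are invented for illustration.

```ruby
settings = {:color => "red"}

# || as a nil test: use the stored value, or fall back on a default.
color = settings[:color] || "blue"     # "red"
size  = settings[:size]  || "medium"   # "medium", because settings[:size] is nil

# && as a poor man's "if": the right side runs only if the left side is true.
shout = settings[:color] && settings[:color].upcase   # "RED"
quiet = settings[:size]  && settings[:size].upcase    # nil; upcase is never called

# Low-precedence "or": the assignment happens first, then "or" looks at its result.
found = settings[:size] or puts("no size given")      # prints: no size given
```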
Here’s the structure of a conditional:
if condition
# whatever
[elsif condition]
# whatever
[else]
# whatever
end
Instead of if
and the negative of something, you can say unless
and the positive of that same thing. There is no elsif
in an unless
structure, and I must warn you that although you can use else
in an unless
structure, this can make your code quite difficult to understand.
A nice feature of Ruby, similar to Perl, is that simple if
and unless
conditions can be used as a postfix with a single statement. So, we could rewrite our earlier block example in a more usual Ruby idiom:
arr = (1..7).map do |item|
next if item % 2 == 1
break "TOO BIG" if item > 6
next item * 2
end
For multiple comparisons, the case
construct is often used:
case n
when 1, 2
# whatever
when 3
# whatever
[else]
# whatever
end
No break
statements are needed; we never fall through from one when
to the next, as in a C switch
construct. Comparison between the comparand (here, n
) and the possible values uses the ===
method as defined by the class of the when
value (not the comparand!); by default, this is the same as the ==
method, but it can be specially defined. For example, Class defines ===
to mean “is an instance of this class or of one of its subclasses”, so you can test against various class possibilities like this:
case n
when SomeClass, OtherClass, ThirdClass
# whatever
end
That’s an elegant way to say “if n is a SomeClass or an OtherClass or a ThirdClass.”
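Here is a runnable sketch of the same idea using built-in classes. The method name classify is hypothetical. The last two lines show why it matters whose === method is used.

```ruby
# classify is a hypothetical helper, for illustration.
def classify(n)
  case n
  when Integer, Float
    "a number"
  when String, Symbol
    "some text"
  else
    "something else"
  end
end

puts classify(42)       #=> a number
puts classify("Fido")   #=> some text
puts classify([1, 2])   #=> something else

# The when value does the comparing, so the order of operands matters.
p(Integer === 42)   #=> true
p(42 === Integer)   #=> false
```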
Here’s the structure of a while
loop:
while condition do
# whatever
end
Instead of while
, you can say until
(until
is to while
as unless
is to if
); and both can be used as a postfix with a single statement. There are quite a number of keywords for subverting a while
loop. next
aborts this iteration and proceeds to the next iteration; redo
starts this iteration over again; and break
aborts the whole thing. These can also be used in blocks (see above). Blocks are the most Ruby-like way of looping, but while
does come in handy very often.
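As a sketch, here is the doubling example from the discussion of blocks recast as a while loop, followed by until used as a postfix. The variable names are arbitrary.

```ruby
n = 0
found = []
while true
  n += 1
  next if n % 2 == 1   # skip odd numbers; go straight to the next iteration
  break if n > 6       # abandon the loop altogether
  found << n * 2
end
p found   #=> [4, 8, 12]

m = 10
m -= 3 until m < 0     # postfix form, repeated until the condition is true
p m       #=> -2
```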
To jump out of a deeply nested loop, you can use throw
and catch
. These are really the equivalent of the controversial goto
, wearing a different hat. They are methods (of Kernel). catch
takes a symbol and a block; if any code at any depth within the block calls throw
with that same symbol, the block aborts. And throw
can take a second parameter, the value to be returned by the block. The textbooks speak of throw
and catch
as rare, but I have found them extremely useful in my own programming.
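Here is a minimal sketch of escaping from a nested loop this way. The grid data and the symbol :found are invented for illustration.

```ruby
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

result = catch(:found) do
  grid.each do |row|
    row.each do |cell|
      throw(:found, cell) if cell > 4   # jump straight out of both loops
    end
  end
  nil   # the value of the catch block if nothing is ever thrown
end
p result   #=> 5
```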
To exit a def
prematurely, use return
(possibly with a value). Otherwise, a def
returns the value of the last executed statement (every statement in Ruby has a value). To exit an entire program, use exit
; return
is illegal at top level.
Runtime errors cause an object of class Exception or one of its subclasses to propagate up the call chain; if it is not handled, the program terminates prematurely. To handle an exception requires a structure like this:
begin
# stuff that can go wrong
rescue
# what to do if stuff went wrong
end
Or like this:
def my_method
# the whole method
rescue
# what to do if the method went wrong
end
Or like this:
do_something rescue do_something_else
The rescue
keyword here binds more tightly than assignment, so you can talk like this:
x = something rescue something_else
What that does is to assign something
to x
, unless the attempt to fetch something
causes an error, in which case something_else
is assigned to x
. This is very elegant and a great time-saver.
The full form of a rescue
statement is like this:
rescue ExceptionClass1, ExceptionClass2 => err
That means: “If there is an exception, and if it is of class ExceptionClass1 or ExceptionClass2, then handle it, assigning it to the variable err
before proceeding.” If you don’t specify an exception class — that is, if you use rescue
without qualification — then only exceptions of class StandardError (and its subclasses) are handled. Unfortunately, lots of exception types are not descended from StandardError, so if you say rescue
without qualification you won’t handle them. This is a huge “gotcha” waiting to gobble up beginners; in fact, I regard it as the worst aspect of Ruby, a massive flaw in the jewel. I myself have been bitten very often, and now I almost never use bare rescue
.
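The gotcha is easy to demonstrate. In this sketch, the method name risky is hypothetical; NotImplementedError is a built-in exception class that descends from ScriptError rather than from StandardError, so a bare rescue lets it sail right past.

```ruby
# risky is a hypothetical method, for illustration.
def risky
  raise NotImplementedError, "not written yet"
end

caught_by = nil
begin
  begin
    risky
  rescue
    caught_by = "bare rescue"        # never reached
  end
rescue Exception => err
  caught_by = "rescue Exception"     # the exception ends up here instead
end

puts caught_by   #=> rescue Exception
puts err.class   #=> NotImplementedError
```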
You do not, however, have to include the => err
part; you can retrieve the exception object as $!
. To handle different exception types differently, you can use multiple rescue
structures, in order of increasing generality:
begin
# stuff that can go wrong
rescue ExceptionClass1
# what to do
rescue ExceptionClass2
# what to do
rescue Exception
# what to do
end
Even this does not express the fullest possible form of a rescue
structure, which is actually like this:
begin
# stuff that can go wrong
rescue ExceptionClass1
# what to do
rescue ExceptionClass2
# what to do
rescue Exception
# what to do
else
# what to do after the begin clause if nothing went wrong
ensure
# what to do last of all no matter what
end
Very elegant stuff can be done with the else
and ensure
clauses, but I’m not going to elaborate here.
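Still, a tiny sketch may help to show the order in which the clauses run. The method name attempt is hypothetical; it records in an array which clauses were executed. Integer is a built-in Kernel method that raises an ArgumentError if its string parameter doesn't look like a number.

```ruby
# attempt is a hypothetical method; it records in log which clauses ran.
def attempt(value, log)
  begin
    Integer(value)      # raises ArgumentError if value isn't numeric
  rescue ArgumentError
    log << "rescue"
  else
    log << "else"       # runs only if nothing went wrong
  ensure
    log << "ensure"     # runs no matter what
  end
end

log = []
attempt("42", log)
attempt("forty-two", log)
p log   #=> ["else", "ensure", "rescue", "ensure"]
```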
To generate an exception, call the raise
method. If it takes a string parameter, raise
creates a new RuntimeError object and assigns the string to its message
, which is the description to be output if the exception is not handled. With no parameters inside a rescue
clause, raise
re-raises the handled exception. A not uncommon technique is to modify the handled exception and raise that:
begin
die
rescue Exception
$!.message << " and boy does that suck!"
raise
end
#=> NameError: undefined local variable or method ‘die’ for main:Object and boy does that suck!
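The two basic behaviors of raise can be seen separately in this sketch. The message string and variable names are invented for illustration.

```ruby
first = nil
second = nil
begin
  begin
    raise "out of dog food"    # a string parameter yields a RuntimeError
  rescue RuntimeError => first
    raise                      # no parameters: re-raise the handled exception
  end
rescue RuntimeError => second
end

puts second.class           #=> RuntimeError
puts second.message         #=> out of dog food
puts first.equal?(second)   #=> true
```

The last line confirms that what arrives at the outer rescue is the very same exception object, not a copy.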
One final word about Ruby control structures. Ruby is very flexible about lineation, and is remarkably forgiving of the use of clauses within larger expressions. Thus it is quite common, for example, to assign an entire if
structure to a variable, or to send a message to the result of a block:
x = if rand < 0.5
"smaller"
else
"larger"
end
arr = (1..7).map do |x|
next if x % 2 == 1
x * 2
end.compact
You’re a real Rubyist (and a happy programmer) when you’re comfortable talking like that.
One of Ruby’s nicest features is the ease with which it handles storage of different parts of a program in different files. The chief command here is the require
method. Its job is to load and execute a file, then and there, once. By “once” I mean that require
keeps a list of the parameters that it has been handed, and if it is handed a parameter that it has already seen, it does nothing. Thus, in a simple-minded way, require
tries to ensure that a file is loaded only once.
The parameter to require
is either a full pathname or a simple filename, and if it’s a simple filename it is quite usual to omit the extension “.rb”. If the parameter is a full pathname, it is loaded and executed. If the parameter is a simple filename, a global variable called $:
is consulted. Its value is an array of strings, each string being the pathname of a directory. So now require
runs through the array, looking for the file whose simple filename was given, appending the extension “.rb” to its name if needed.
The global variable $:
is an array like any other, which means that your code can modify it. A frequent technique is to modify $:
at the very beginning of a program, appending additional directories where you want require
to look. For example, the current working directory, "."
, is included in the default $:
list; but the current working directory is not the same as the directory containing the file that is running now. You might want require
to search that directory, or a particular directory within it. You can obtain the directory of the currently running file like this:
File.dirname(__FILE__)
You can append that, or some directory pathname based on a manipulation of it, to $:
in order to affect require
’s behavior.
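For instance, a program might begin something like this. It is a sketch only: the "lib" subdirectory is hypothetical, and File.expand_path is used so that the entries are full pathnames.

```ruby
# Add this file's own directory, and a hypothetical "lib" directory
# inside it, to the list of places where require will look.
here = File.expand_path(File.dirname(__FILE__))
lib = File.join(here, "lib")
$: << here unless $:.include?(here)
$: << lib unless $:.include?(lib)
```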
Let me pause to emphasize that require
executes a file when it loads it. This is no different from what happens when Ruby itself is told to execute a file. We go through the loaded file from start to finish, executing as we go (and treating the loaded file as top-level code). As we’ve already seen, module
and class
and def
sandwiches are executable code; and this sort of thing is typically the point of loading and executing a file. For example, you might have a class MyCoolClass that you use frequently. So, you keep the definition of that class in a file, and whenever you write a Ruby program that needs MyCoolClass, you require
that file. The class
sandwich that opens MyCoolClass and endows it with methods is executed, and MyCoolClass springs to life, then and there. Subsequent code in your main file can now instantiate MyCoolClass.
Clearly, order matters; and for this reason, it is most common (though by no means necessary) for any require
calls to come very close to the start of a program file — so that the rest of that program file can take advantage of the modules and classes that were opened in the require
d file(s).
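To watch this happen, here is a self-contained sketch that writes a tiny library file to a temporary location and then requires it twice by full pathname. The file name and the body of MyCoolClass are invented for this illustration; Dir.tmpdir comes from the tmpdir library.

```ruby
require 'tmpdir'

# Write a tiny library file to a temporary location.
path = File.join(Dir.tmpdir, "my_cool_class_#{Process.pid}.rb")
File.open(path, "w") do |f|
  f.puts "class MyCoolClass"
  f.puts "  def greet; 'Howdy'; end"
  f.puts "end"
end

first = require(path)    # loads and executes the file; MyCoolClass now exists
second = require(path)   # already seen, so nothing happens
p first                  #=> true
p second                 #=> false
puts MyCoolClass.new.greet   #=> Howdy

File.delete(path)        # tidy up the temporary file
```

The value returned by require shows whether the file was actually loaded on that occasion.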
A file containing code intended to be used by other programs can be called a library. So far, we’ve been talking as if the only files you would require
are your own libraries; but in fact, Ruby itself comes with many libraries that are not loaded by default. For example, suppose you want to use Ruby’s Date class. To do so, you need (on my machine, at least) to require
the file that contains it:
require 'date'
puts "Today is #{Date::DAYNAMES[Date.today.wday]}"
#=> Today is Tuesday
Without the first line (which loads date.rb from one of the directories listed in $:
), there is no Date class.
Thus we see that libraries not only permit large programs to be broken up into multiple files, and endow frequently used code with reusability, but also prevent the global namespace from being unnecessarily overburdened. If you don’t need the Date class in a program, you don’t load it; so there’s no runtime penalty for keeping it around in case you do need it.
Third-party libraries are frequently packaged as gems. (As of this writing, there are about 4500 gems available.) The benefit of this mechanism is that a single command-line command, gem
, can go out on the Internet to locate, download and install the latest version of a gem. An installed gem’s code must be loaded with require
(the documentation for the gem will tell you the name of the file you’re after); on my machine (using Ruby 1.8.6), I have to require rubygems
before I can require a gem library.
As an example, let’s download and install a gem and use it. I’ll try the rdiscount
gem (a C implementation of John Gruber’s Markdown). Here we go. First, at the command line:
$ sudo gem install rdiscount
Now I’ll write a Ruby program to try it out:
require 'rubygems'
require 'rdiscount'
s = <<END
### Testing
This is a test of Markdown demonstrating:
* headers
* paragraphs
* lists
Let's see whether it works!
END
puts RDiscount.new(s).to_html
And here’s the output:
<h3>Testing</h3>
<p>This is a test of Markdown demonstrating:</p>
<ul>
<li>headers</li>
<li>paragraphs</li>
<li>lists</li>
</ul>
<p>Let's see whether it works!</p>
Truly, before writing this section I had never tried the rdiscount
gem before; I’d never even heard of it. Yet in seconds I had it downloaded, installed, and running. This shows how fast and easy gems can be.
There are many excellent and compendious introductory books about Ruby out there, and if you want to complete and firm up your knowledge of Ruby, you should read one of them sooner or later. I recommend particularly The Ruby Programming Language, by David Flanagan and Yukihiro Matsumoto (O’Reilly Media, Inc.), and Programming Ruby, by Dave Thomas et al. (The Pragmatic Programmers), widely known as “The Pickaxe Book”.
Ruby is backed by a tremendous depth of built-in core and library-based functionality (quite apart from all the downloadable gems). The built-in modules, classes, and methods are listed in various Web pages to which you can find links at http://ruby-doc.org/. For example, if you’re using Ruby 1.8.6, the links you want to click are called “1.8.6 core” and “The 1.8.6 Standard Library.” The first of those links leads to http://ruby-doc.org/core/, and it is quite amazing what you can learn by just occasionally clicking on a class name (in the second column at the top) and giving that page a good read.
Besides, Ruby has too many classes and methods for you to learn them all, so you may as well get comfortable consulting the documentation. No matter what you want to do, there’s probably an easy way to do it in Ruby; it’s just a question of finding out what it is. No book could cover it all, and anyway that would be pointless; the documentation is the book.
This book took time and effort to write, and no traditional publisher would accept it. If it has been useful to you, please consider a small donation to my PayPal account (matt at tidbits dot com). Thanks!