6


It's a Library!



As you've seen, Ruby provides a number of built-in classes for you to use; its huge library covers anything from simple text input and output to networking to multimedia (not counting the boatload of third-party libraries floating about). We've taken a look at some of the more "targeted" classes (e.g., win32, networking, and so on), but I want to take some time for a really quick overview of a few classes that are more miscellaneous in nature.

String Manipulation

Ruby's string support easily rivals that of Perl and other "power" languages. As a matter of fact, I've heard that when Larry Wall goes home at night he secretly moonlights as a Ruby programmer. Don't tell anyone I told you. There are two main ways to manipulate strings: instance methods on string objects and regular expressions.

Instance Methods

Ruby strings are, of course, objects, and as such they offer methods to manipulate themselves in a variety of ways. First we will look at the simplest manipulation of a string: the slice. The first way I'd like to show slicing is the [] operator; it's used just like the array index operator (i.e., it uses the object[index] form to reference and set the value of elements), with a few string-specific extras. For example:

the_alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
puts the_alphabet[0..2]
ABC

puts the_alphabet["HIJ"]
HIJ

puts the_alphabet[1,2]
BC

the_alphabet[1] = "!"
puts the_alphabet
A!CDEFGHIJKLMNOPQRSTUVWXYZ

As an alternative, you can use the slice method to grab elements in a similar manner:

the_alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # reset, since the previous example changed it
puts the_alphabet.slice(0, 3)
ABC

puts the_alphabet.slice(0..5)
ABCDEF
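The [] operator and slice accept a few more forms than the ones above. This short sketch (assuming Ruby 1.9 or later, where a single integer index returns a one-character string rather than a character code) shows negative indexes, a regular expression index, and what happens when nothing matches:

```ruby
# A few more ways to index into a string with [] (slice takes the same arguments)
the_alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

last_three = the_alphabet[-3..-1]  # negative indexes count back from the end
last_one   = the_alphabet[-1]      # a single index gives a one-character string
by_regex   = the_alphabet[/K.M/]   # a regular expression gives the first match
missing    = the_alphabet["xyz"]   # no match gives nil rather than an error
too_far    = the_alphabet[100]     # so does an index past the end

p last_three  # → "XYZ"
p last_one    # → "Z"
p by_regex    # → "KLM"
p missing     # → nil
p too_far     # → nil
```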

Once you have your string data sliced and diced just the way you want it, you can manipulate it in a number of ways; these methods include ways to change case, edit certain content, or generally finagle with the string. Here are some examples:

a_str = "I likE To MesS with STRINGS!"

a_str.downcase
i like to mess with strings!

a_str.upcase
I LIKE TO MESS WITH STRINGS!

a_str.swapcase
i LIKe tO mESs WITH strings!

a_str.capitalize
I like to mess with strings!

a_str.capitalize!
puts a_str
I like to mess with strings!

a_str.squeeze
I like to mes with strings!

The names of these methods should make their usage obvious, with the exception of squeeze, which replaces runs of the same character with a single instance of that character (i.e., "mess" becomes "mes"). Each of these methods also offers an "in place" version whose name ends in !, as demonstrated by capitalize!.
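One gotcha with the in-place versions: they return the string when they change something, but nil when there was nothing to change, so chaining them can blow up on you. A minimal sketch of the difference:

```ruby
shout = "quiet please"
returned = shout.upcase!   # changes shout itself and returns it
p shout                    # → "QUIET PLEASE"

unchanged = "already lower".downcase!  # nothing to change, so the result is nil
p unchanged                # → nil

copy = "already lower".downcase        # non-bang versions always return a new string
p copy                     # → "already lower"
```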

The insert method allows you to insert a string at a given index; be careful because this method will modify the string in place (I'm not sure why this method lacks a !, but it does):

into = "12345"
into.insert(2, "LETTERS!")
puts into
12LETTERS!345

On the other hand, you can use the delete method to remove characters or ranges of characters from a string. Like most string methods, delete returns a new string; use delete! to change the string in place:

gone = "got gone fool!"
gone.delete!("or-v")
puts gone
g gne fl!

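Placing a hyphen between two letters tells Ruby to create an inclusive range of characters (i.e., one that also includes the lower and upper limits, such as r and v in the example). You can use this technique with a number of methods (count, squeeze, and tr accept the same notation), but it is exceptionally useful with delete. Be careful about passing several arguments, though: delete removes only the characters that appear in all of them, so delete("o", "r-v") would remove nothing at all.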
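The same character-set notation works with count, tr, and squeeze, and a leading ^ negates a set. A short sketch, using the string from the example above:

```ruby
phrase = "got gone fool!"

o_count  = phrase.count("o")           # how many o's
rv_count = phrase.count("r-v")         # only the t falls in the range r..v
both     = phrase.delete("or-v")       # one argument: o plus the range r-v
nothing  = phrase.delete("o", "r-v")   # two arguments: only characters in BOTH sets (none here)
only_o   = phrase.delete("a-z", "^o")  # ^ negates: lowercase letters that aren't o
shifted  = "hello".tr("a-y", "b-z")    # tr maps one set of characters onto another
balloon  = "balloon".squeeze("l")      # squeeze only the l's, leave the o's alone

p o_count   # → 4
p rv_count  # → 1
p both      # → "g gne fl!"
p nothing   # → "got gone fool!"
p only_o    # → "o o oo!"
p shifted   # → "ifmmp"
p balloon   # → "baloon"
```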

Removing the last character from a string can be really annoying and much longer than it should be in some languages (I'm looking right at you, PHP); fortunately, like any good language that has strings, Ruby offers chomp and chop:

eat_me = "A lot of letters\n"
eat_me.chomp
A lot of letters

eat_me.chop
A lot of letters

eat_me.chomp.chop
A lot of letter

eat_me.chomp.chop.chomp("ter")
A lot of let

The chomp method removes the record separator (stored in the global variable $/) from the end of a string; if $/ hasn't been changed from the default, this means it will remove a trailing \n, \r\n, or \r (essentially any sort of newline). You can also specify what to chomp by passing it in as a parameter, as in the last example. The chop method simply chops off the last character regardless of what it is (note: it treats a trailing \r\n as one character). Neither method changes the original string; use chomp! or chop! if you want that.
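In practice, chomp mostly shows up when you're processing text line by line. Here's a small sketch that cleans up lines with mixed Unix (\n) and Windows (\r\n) endings; the input string is made up for illustration, and String#lines returning an array assumes Ruby 2.0 or later:

```ruby
raw = "alpha\r\nbeta\ngamma"  # made-up input with mixed line endings

cleaned = raw.lines.map { |line| line.chomp }  # chomp handles both \r\n and \n
p cleaned        # → ["alpha", "beta", "gamma"]

p "gamma".chomp  # → "gamma"  (no trailing separator, so nothing happens)
p "gamma".chop   # → "gamm"   (chop always removes the last character)
```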

If you simply need to remove the whitespace from the beginning and/or end of a string, then use one of the strip methods: lstrip, rstrip, or simply strip. For example:

stripper = " La di da! \t"
puts "[" + stripper.lstrip + "]"
[La di da! ]

puts "[" + stripper.rstrip + "]"
[ La di da!]

puts "[" + stripper.strip + "]"
[La di da!]

If it isn't obvious from the examples, the lstrip and rstrip methods strip off any sort of whitespace (e.g., spaces, tabs, and newlines) from the left or right side of a string, respectively. The strip method strips whitespace from both sides.

The split method will probably be one of your most frequently used string methods; it breaks a string into substrings based on a specified delimiter. That's a bit of a mouthful, so let me show you an example:

breakable = "break,it,down,fool!"
breakable.split(",")
["break", "it", "down", "fool!"]

The split method breaks a string into pieces; these pieces are formed by breaking the string at a delimiter, removing that delimiter, and placing the remaining pieces into an array. This allows you to then use an iterating loop to go over the collection and do various operations on the data within.
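split has a few behaviors worth knowing before you feed it real data: it keeps empty fields in the middle but drops trailing ones, accepts a regular expression as the delimiter, takes an optional limit, and splits on whitespace when given no argument. A short sketch (the input strings are made up):

```ruby
fields = "break, it,down,,fool!"

by_comma   = fields.split(",")                 # delimiter removed; spaces and empty fields kept
by_pattern = fields.split(/,\s*/)              # a regexp delimiter also swallows the spaces
trimmed    = "a,b,,".split(",")                # trailing empty fields are dropped
limited    = "key=value=more".split("=", 2)    # the limit caps the number of pieces
words      = "  lots   of   space ".split      # no argument: split on runs of whitespace

p by_comma    # → ["break", " it", "down", "", "fool!"]
p by_pattern  # → ["break", "it", "down", "", "fool!"]
p trimmed     # → ["a", "b"]
p limited     # → ["key", "value=more"]
p words       # → ["lots", "of", "space"]

# Iterating over the pieces, as described above
"break,it,down,fool!".split(",").each_with_index do |word, i|
  puts "#{i}: #{word}"
end
```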

But what if you need to do an operation that isn't allowed on a string? For example, you want to do some arithmetic on some numbers you read in from a text file, but text from files is always read in as strings. Ruby offers the to_f and to_i methods to allow just that:

numerical_string = "1234"

numerical_string + 6
! TypeError

numerical_string.to_f + 6.7
1240.7

It's important to remember that all data read from sockets, files, and users will be strings; if you plan on doing any sort of math with this data, you must run it through one of these methods first.
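One thing to watch for: to_i and to_f are forgiving to a fault, so junk input quietly becomes 0. If you'd rather reject bad input, Kernel's Integer() and Float() raise an ArgumentError instead. A small sketch (the sample strings are made up):

```ruby
p "12abc".to_i   # → 12   (to_i reads digits until it hits something else)
p "abc".to_i     # → 0    (no number at all quietly becomes 0)
p "3.75".to_f    # → 3.75
p "ff".to_i(16)  # → 255  (to_i takes an optional base)

# Integer() and Float() are stricter and raise ArgumentError on bad input
strict_ok = Integer("42")
begin
  Integer("abc")
  rejected = false
rescue ArgumentError
  rejected = true
end

# Summing numbers read in as text, e.g. one per line from a file
total = "10\n20\n30\n".split("\n").map { |s| s.to_i }.inject(0) { |sum, n| sum + n }
p total  # → 60
```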

Regular Expressions

I mentioned regular expressions in passing earlier so the appearance of the syntax wouldn't confuse you if you were looking at Ruby examples elsewhere; now I'd like to take some time to discuss what regular expressions are and how incredibly useful they can be.

Firstly, let me say that I am not going to give you a grand tour of regular expressions, nor will I bore you with a biography of his third cousin, Samuel del Fuega IV. I will, however, provide some very rudimentary examples whereby I can show the usage of these methods and, for the special price of nothing at all, I am providing an appendix with URLs of pages to visit if you'd like to find out more about the dark craft of regular expressions (it's in Appendix A). Aren't you excited? Let's proceed.

A regular expression (sometimes called a regexp or regex) is a string that describes or matches a set of other strings, according to a set of syntax rules. Their uses range from simple searches to full string transformation involving searching, replacing, and shifting of data. They're a delightful addition to any programming language, a useful tool for string manipulation, and make a delectable topping for any dessert.

Regular expressions have a rich syntax which allows you to do a number of things, but for the sake of brevity and sanity of scope, I will use very simple regular expressions. You will need to go and read an introduction to regular expressions to fully and easily understand what is going on, but perhaps you can gather what's going on without it.

Let's begin by looking at the simplest use of regular expressions: matching. You can search within a string using regular expressions easily; let's say you wanted to see whether one string is a substring of another. You could do this:

matchmaker = "I'm a cowboy, baby!"

matchmaker =~ /cow/
6

matchmaker.match("cow")
#<MatchData:0x61d3120>

The =~ operator returns the index of the first match of the pattern (or nil if there is no match), but the match method returns a MatchData object, which gives you a number of options for accessing the matches. For example:

my_message = "Hello there!"
my_message.match(/(..)(..)/).captures

["He", "ll"]

Using the captures method, you can grab the text matched by each parenthesized group; the to_a method offers similar output but also prepends the fully matched string. That was sort of a silly example, so let's look at a more plausible one. Say you wanted to grab the text between two delimiters; here is one way to do it:

html_element = "<html>"
my_element_text = html_element.match("(<)(.*)(>)").captures
puts my_element_text[1]
html

If you're the curious type, you can look in the Ruby API Documentation for more information on the MatchData class.
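To give you a taste of what's in there, here's a short sketch of a few MatchData methods, plus the nil you get back when nothing matches. The named group example assumes Ruby 1.9 or later:

```ruby
m = "I'm a cowboy, baby!".match(/cow(\w+)/)

p m[0]          # → "cowboy"   (the whole match)
p m[1]          # → "boy"      (the first group)
p m.pre_match   # → "I'm a "
p m.post_match  # → ", baby!"
p m.begin(0)    # → 6          (the same index =~ gives)

# match returns nil when there is no match, so check before using the result
no_match = "I'm a cowboy, baby!".match(/horse/)
p no_match      # → nil

# Named groups (Ruby 1.9+) can be read by name
tag = "<html>".match(/<(?<name>\w+)>/)
p tag[:name]    # → "html"
```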

If you'd like to kick out the middle man and simply get an array of matches back from the method, you can use the scan method:

my_message = "Hello there!"
my_message.scan(/(..)(..)/)

[["He", "ll"], ["o ", "th"], ["er", "e!"]]

Other than the difference in return type between the two methods, match and scan also differ in how many matches they find: match stops at the first match of the pattern, while scan keeps going and returns every (non-overlapping) match in the string. Don't confuse this with "greediness," which in regular expression terms describes how much text a quantifier such as * or + consumes within a single match; quantifiers are greedy by default, and adding a ? after one (as in .*?) makes it match as little as possible.
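Here's a small sketch of both ideas side by side: a greedy versus a lazy quantifier in a single match, and scan collecting every match. The markup string is made up for illustration:

```ruby
markup = "<b>bold</b> and <i>italic</i>"

greedy = markup[/<.+>/]        # .+ is greedy: it runs to the last > it can find
lazy   = markup[/<.+?>/]       # .+? is lazy: it stops at the first >
tags   = markup.scan(/<.+?>/)  # scan finds every match; no groups means a flat array
digits = "a1b22c333".scan(/\d+/)

p greedy  # → "<b>bold</b> and <i>italic</i>"
p lazy    # → "<b>"
p tags    # → ["<b>", "</b>", "<i>", "</i>"]
p digits  # → ["1", "22", "333"]
```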

Another usage of regular expressions is substitution; substitution using regular expressions is very flexible if you use the right combination of potent regular expressions. Again, because I do not discuss advanced regular expressions, the true usefulness of this technique won't really register, but I hope that you will take a serious look at regular expressions and use them to your advantage. To substitute using a regular expression, you use the sub or gsub method of a string instance:

go_away = "My name is Freak Nasty!"

go_away.sub("y", "o")
Mo name is Freak Nasty!

go_away.gsub("y", "o")
Mo name is Freak Nasto!

go_away.sub("Freak Nasty", "Fred Jones")
My name is Fred Jones!

The sub method will only replace the first match for the pattern, but the gsub method is global (I'm sure the g didn't give it away) and will replace every match. What each individual match covers is up to the pattern itself, including how greedy its quantifiers are. The more powerful regular expressions you learn, the more you can do with these methods; remember to check out Appendix A for more information.
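The examples above pass plain strings as patterns, but sub and gsub get much more interesting with real regular expressions. This sketch shows three replacement styles: back-references to capture groups, a block that computes each replacement, and (assuming Ruby 1.9 or later) a hash that maps matched text to replacement text:

```ruby
# Capture groups can be referenced in the replacement as \1, \2, ... (use single quotes)
name    = "Freak Nasty"
swapped = name.sub(/(\w+) (\w+)/, '\2, \1')

# A block receives each match and returns its replacement
doubled = "1 2 3".gsub(/\d/) { |digit| (digit.to_i * 2).to_s }

# A hash maps each matched string to its replacement
leet = "My name is Freak Nasty!".gsub(/[ae]/, { "a" => "4", "e" => "3" })

p swapped  # → "Nasty, Freak"
p doubled  # → "2 4 6"
p leet     # → "My n4m3 is Fr34k N4sty!"
```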

Date/Time

Have you ever found yourself in the deli, browsing the ham and other delectable meat products, and then realized that you left your calendar in Bermuda? That happened to me just last millennium, and let me tell you, I felt vulnerable. No calendar means no days. No days means no nights, which means I could die. Fortunately, Ruby provided me with a fairly useful date and time library to use until I could get my half-brother-in-law's pet monkey's trainer's dog to mail my calendar back yesterday.

There are three date and time classes in the Ruby library: Date, Time, and DateTime. I heard a rumor that Date and Time hooked it up around version 1.4 and got DateTime about 9 months later, but then again, this source also told me that Rails was a gentlemen's club for Ruby programmers.

Dates

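The first class I'd like to cover is Date; this class simply exposes an interface to store, manipulate, and compare dates in a Ruby application. Date (along with DateTime) lives in the standard library rather than the core, so you'll need to require 'date' before using it.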

require 'date'

mydate = Date.new(1999, 6, 4)
1999-06-04

mydatejd = Date.jd(2451334)
1999-06-04

mydateord = Date.ordinal(1999, 155)
1999-06-04

mydatecom = Date.commercial(1999, 22, 5)
1999-06-04

Date.jd_to_civil(2451334)
[1999,6,4]

mydatep = Date.parse("1999-06-04")
1999-06-04

As you can see, creating a Date instance is rather simple in its literal form: simply call new (or the civil method; the two are synonyms), passing the year, month, and day as parameters. This method uses the date form we usually see, but Date also supports other date forms. The jd method creates a Date instance from a Julian day number; the ordinal method creates a Date object from an ordinal date, that is, a year plus the number of the day within that year; commercial creates a Date object from a commercial date, that is, a year, week number, and day of the week. Methods are provided to convert between these date forms also (e.g., commercial_to_jd, jd_to_civil, and so on). These all work well enough, but notice the last example using the parse method, which parses strings into Date objects. After new, I find parse the most intuitive way of creating Date objects.

You can also test input with various class methods, and, once you have a Date object, get all sorts of information from it with a few instance methods.

Date.valid_jd?(3829)
true

Date.valid_civil?(1999, 13, 43)
false

mydate.mon
6

mydate.yday
155

mydate.day
4

As you can see, you can test whether a date is valid in a certain format, and, once you have a Date object, use instance methods to get its components, such as the day of the year (yday), month (mon), and so on; instance methods such as jd also give you the date in the other forms. You can also compare and manipulate dates using standard operators.

date1 = Date.new(1985, 3, 18)
date2 = Date.new(1985, 5, 5)

date1 < date2
true

date1 == date2
false

date3 = date1
date1 == date3
true

date1 << 3
1984-12-18

date2 >> 5
1985-10-05

As you can see, comparing dates is just like comparing standard numerical values: a date that comes before another date is judged to be "less than," and a date that comes after is judged to be "greater than." You can also use the << and >> operators to subtract or add months (see the last two examples). Now that you're familiar with the Date class, let's move on to the Time class.
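Plain arithmetic works on dates too. Here's a short sketch reusing the two dates from above: adding an Integer moves by days, subtracting two dates gives the number of days between them (as a Rational, hence the to_i), and << and >> clamp to the end of the month when the target day doesn't exist:

```ruby
require 'date'

date1 = Date.new(1985, 3, 18)
date2 = Date.new(1985, 5, 5)

later   = date1 + 20                   # adding an Integer moves forward that many days
gap     = (date2 - date1).to_i         # subtracting dates gives a Rational count of days
clamped = Date.new(2001, 3, 31) << 1   # there's no February 31st, so << lands on the last day
label   = date1.strftime("%B %d, %Y")  # Date has strftime too

puts later    # → 1985-04-07
p gap         # → 48
puts clamped  # → 2001-02-28
p label       # → "March 18, 1985"
```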

Times

The Time class is very similar in function to the Date class, except it concentrates on times and timestamps rather than simply dates. Much like the Date class, various constructors are available.

require 'time'  # needed for Time.parse

rightnow = Time.new
Sun Sep 10 21:36:15 Eastern Daylight Time 2006

Time.at(934934932)
Tue Aug 17 20:08:52 Eastern Daylight Time 1999

Time.local(2000,"jan",1,20,15,1)
Sat Jan 01 20:15:01 Eastern Standard Time 2000

Time.utc(2006, 05, 21, 5, 13)
Sun May 21 05:13:00 UTC 2006

Time.parse("Tue Jun 13 14:15:01 2006")
Tue Jun 13 14:15:01 Eastern Daylight Time 2006

As you can see, you can create a new Time object that holds the current time in the local time zone by simply calling new (or, optionally, now; they do the same thing). If you require a certain time, you can use at, which operates on epoch time (i.e., seconds since January 1st, 1970, UTC); you can also use the utc method (or its synonym, gm) to create a time in UTC from the provided parameters, or the local method to use the local time zone. Just like with Date, you can use the parse method to parse a timestamp into a Time object, although for Time you need to require 'time' first.
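Time objects also support arithmetic in seconds. This sketch uses UTC throughout so the results don't depend on the machine's time zone; it also shows the at example from above in UTC, four hours ahead of the Eastern Daylight Time output:

```ruby
start = Time.utc(2006, 5, 21, 5, 13)

an_hour_later = start + 60 * 60     # Time + Numeric adds seconds
elapsed = an_hour_later - start     # Time - Time gives the difference in seconds as a Float
epoch = Time.at(0).utc              # second zero of epoch time
book_time = Time.at(934934932).utc  # the at example from above, in UTC

p an_hour_later.hour  # → 6
p elapsed             # → 3600.0
p epoch.year          # → 1970
p book_time.strftime("%Y-%m-%d %H:%M:%S")  # → "1999-08-18 00:08:52"
```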

The Time class also offers a few instance methods that allow you to get portions of the object's value, convert the value, and output the value in other formats.

rightnow = Time.new
Sun Sep 17 21:42:30 Eastern Daylight Time 2006

rightnow.hour
21

rightnow.mon
9

rightnow.day
17

rightnow.to_i
1158543750

rightnow.to_s
Sun Sep 17 21:42:30 Eastern Daylight Time 2006

As you can see, the methods for getting portions of a Time value are very similar to those on Date; also notice that you can convert a Time object to other classes, such as an Integer count of seconds (to_i) or a String (to_s).

Let's concentrate on one instance method for a moment: strftime, a very useful method that lets you output a timestamp in the format of your choice through a formatting interface. This interface acts very much like printf in C; it uses directives like %m or %Y to indicate where values go in the output string. Here are a few examples:

rightnow = Time.now

rightnow.strftime("%m/%d/%Y")
09/17/2006
rightnow.strftime("%I:%M%p")
09:13PM
rightnow.strftime("The %dth of %B in '%y")
The 17th of September in '06
rightnow.strftime("%x")
09/17/06

The strftime method is one of the most complex in the Time class; check out the Time class's documentation at http://www.ruby-doc.org/core/classes/Time.html if you'd like more information about strftime and what you can do with it.
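One limitation shows up in the "%dth" example: strftime has no directive for English ordinal suffixes, so on the 1st you'd get "01th". A minimal workaround, sketched with a hypothetical ordinal helper, is to build the suffix yourself and let strftime handle the rest:

```ruby
# Hypothetical helper: English ordinal for a day of the month (1st, 2nd, 3rd, 11th, ...)
def ordinal(n)
  return "#{n}th" if (11..13).include?(n % 100)  # 11th, 12th, 13th are exceptions
  case n % 10
  when 1 then "#{n}st"
  when 2 then "#{n}nd"
  when 3 then "#{n}rd"
  else "#{n}th"
  end
end

# Hypothetical helper: the "The 17th of September in '06" format from above
def pretty_date(time)
  time.strftime("The #{ordinal(time.day)} of %B in '%y")
end

puts pretty_date(Time.local(2006, 9, 17))  # → The 17th of September in '06
puts pretty_date(Time.local(2006, 9, 1))   # → The 1st of September in '06
```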

Dates and Times

The DateTime class combines the previous two classes into one convenient yet slightly less efficient class. The DateTime class is really just a subclass of Date with some time functionality slapped in there for good measure; it's a fine endeavour to be sure but not really worth the time if you ask me. Even so, it has some interesting functionality.

require 'date'

rightnow = DateTime.now
2006-09-10T21:56:45-0400

maytime = DateTime.civil(2006, 5, 6)
2006-05-06T00:00:00Z

parsed = DateTime.parse("2006-07-03T11:53:02-0400")
2006-07-03T11:53:02-0400

parsed.hour
11

parsed.day
3

parsed.year
2006

As you can see it works very similarly to the Date class; you can construct using the various date formats or parse a date/time string. Also notice that like the Date and Time classes, you can query various parts of the value inside the object.

You may be scratching your head right now, asking which one you should use and when. Personally, I would never use DateTime, but rather Time or Date if at all possible. Sometimes DateTime is unavoidable, but be aware that using just Date or Time in lieu of DateTime yields approximately 800% better performance. Sometimes performance, like size and Poland, does matter.
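If you do end up with a mix of classes, converting between them is easy once 'date' is loaded, since it adds to_date and to_datetime to Time. A short sketch, using UTC so the results don't depend on the local time zone:

```ruby
require 'date'  # also adds Time#to_date and Time#to_datetime

stamp = Time.utc(2006, 7, 3, 11, 53, 2)

as_date     = stamp.to_date      # keeps the calendar day, drops the time of day
as_datetime = stamp.to_datetime  # keeps both
back_again  = DateTime.new(2006, 7, 3, 11, 53, 2).to_time  # DateTime.new defaults to a +00:00 offset

p as_date.to_s         # → "2006-07-03"
p as_datetime.hour     # → 11
p back_again.utc.hour  # → 11
```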

Hashing and Cryptography

Sometimes you simply don't want people to be able to see your data; I mean, maybe you've got this rash that you don't want people to know about. Or maybe there's something you just want to forget, so you hash it and never worry about it again. Fortunately for me...er, I mean you...Ruby comes stock with a neat little hashing library and has a gem that can be installed to offer cryptography.

Hashing

Think of hashing as one-way encryption: a hash function turns data of any length into a short, fixed-length digest, and there's no practical way to get the original data back from the digest. Typical uses include password verification (i.e., you store the hash in the database, then test user input by hashing it and seeing whether the hashes match) and file verification (i.e., two copies of the same file should have the same hash). Ruby's standard library offers the two most common hash types, MD5 and SHA-1, through its digest library.

MD5

MD5 is the most widely used cryptographic hash function around; it was designed in 1991 as a successor to MD4 by Ronald Rivest, a professor at MIT. It's fallen out of use as a secure hash function because of the vulnerabilities that have been found in it, but it's still useful for matching values and such. Ruby's MD5 functionality isn't quite as easy as something like PHP's (e.g., md5('your data');), but it's still usable and friendly enough.

require 'digest/md5'
md5 = Digest::MD5.digest('I have a fine vest!')
sXm(1r\371\353\027\367\235u!\266\001\262

md5 = Digest::MD5.hexdigest('I have a fine vest!')
73586d283172f9eb17f79d7521b601b2

The MD5 class offers two methods for getting a hash digest; the first is simply the digest method. This returns the raw bytes of the hash digest, which are pretty unsafe to pass around; I say unsafe because you couldn't embed them in something like XML or HTML (or some databases) and expect them to behave properly. A better choice for these situations is the hexdigest method (second example); it encodes each byte of the digest as two hexadecimal characters, which is fancy talk for a method that makes the result safe to print and store.
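To see that the two methods return the same hash in different clothing, this sketch hashes the string "abc" (a standard MD5 test input) and converts the raw bytes to hex by hand with unpack:

```ruby
require 'digest/md5'

raw = Digest::MD5.digest('abc')     # 16 raw bytes
hex = Digest::MD5.hexdigest('abc')  # the same 16 bytes as 32 hex characters

p raw.bytesize                    # → 16
p hex                             # → "900150983cd24fb0d6963f7d28e17f72"
p raw.unpack('H*').first == hex   # → true
```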

SHA-1

The SHA-1 hash algorithm is stronger than MD5, but it isn't considered secure anymore either: practical collision attacks against it have been demonstrated, so it's best kept for non-security uses like checksums. It has been widely used as a hashing algorithm in many security protocols and tools, such as TLS, SSL, PGP, SSH, and IPsec. Ruby offers the same interface to the SHA-1 algorithm as it does to the MD5 algorithm.

require 'digest/sha1'
sha1 = Digest::SHA1.digest('I have a fine vest!')
\225J{{\233\025\236\273\344\003X\233\33 [...]

sha1 = Digest::SHA1.hexdigest('I have a fine vest!')
954a7b7b9b159ebbe403589bdaa8f981003a2fbc

As you can see, it functions exactly the same as the MD5 class, except you get a stronger hash. Now let's get away from hashes and take a look at cryptography.
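If you need something stronger than SHA-1, the digest library also includes the SHA-2 family (require 'digest/sha2') with the same interface. This sketch also shows feeding data in pieces with <<, which is handy when hashing something too big to hold in one string; "abc" is used because its digests are well-known test values:

```ruby
require 'digest/sha1'
require 'digest/sha2'

sha1_hex   = Digest::SHA1.hexdigest('abc')
sha256_hex = Digest::SHA256.hexdigest('abc')

# Incremental hashing: feed the data in pieces, then ask for the digest
sha = Digest::SHA256.new
sha << 'a'
sha << 'bc'

p sha1_hex    # → "a9993e364706816aba3e25717850c26c9cd0d89d"
p sha256_hex  # → "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
p sha.hexdigest == sha256_hex  # → true
```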

Cryptography

Ruby's standard library does include a binding to OpenSSL (the openssl extension), but if you want a pure Ruby option, you have to resort to installing a gem. I guess technically this isn't a Ruby built-in library, but it's important enough to warrant a short mention. The third-party crypt library available at http://crypt.rubyforge.org is a pure Ruby cryptography library. You can install it by issuing the gem install crypt command; look in Appendix A for a link on how to install and set up RubyGems if your Ruby installation doesn't have it already.

The crypt library offers four encryption algorithms: Blowfish, GOST, IDEA, and Rijndael. Fortunately, the interface for each one is much the same. Let's take a look at an example using the Blowfish algorithm, taken from the library's documentation.

require 'crypt/blowfish'
blowfish = Crypt::Blowfish.new("A key up to 56 bytes long")
plainBlock = "ABCD1234"
encryptedBlock = blowfish.encrypt_block(plainBlock)
\267Z\347\331~\344\211\250
decryptedBlock = blowfish.decrypt_block(encryptedBlock)
ABCD1234

This is one of the easiest cryptography libraries out there: simply feed it a key in the constructor, call the encrypt_block method to encrypt the data, and then call decrypt_block to decrypt it. Since the developers went to great lengths to keep the API basically the same for all the algorithms, you can simply substitute the other algorithm names in place of Blowfish to get them working (e.g., put Rijndael in place of Blowfish and it should work much the same). Keep in mind that each algorithm has its own restrictions on key length and block size (Blowfish works on 8-byte blocks, which is why the example encrypts an 8-character string), and there are other methods and functions you can use. Check out http://crypt.rubyforge.org/ to learn more.
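If the crypt gem isn't available to you, a swapped-in alternative is the openssl extension that ships with Ruby. This is a minimal sketch, not a recipe for production key handling: it uses AES-256 in CBC mode, generates a random key and initialization vector (IV) in memory, and relies on OpenSSL's default padding so the input doesn't have to be a whole number of blocks:

```ruby
require 'openssl'

plain_text = "ABCD1234"

# Encrypt: choose algorithm and mode, then generate a random key and IV
cipher = OpenSSL::Cipher.new("aes-256-cbc")
cipher.encrypt
key = cipher.random_key
iv  = cipher.random_iv
encrypted = cipher.update(plain_text) + cipher.final

# Decrypt: same algorithm, same key and IV
decipher = OpenSSL::Cipher.new("aes-256-cbc")
decipher.decrypt
decipher.key = key
decipher.iv  = iv
decrypted = decipher.update(encrypted) + decipher.final

p encrypted.bytesize  # → 16 (padded up to one 16-byte AES block)
p decrypted           # → "ABCD1234"
```

Unlike encrypt_block, update and final handle padding and multiple blocks for you, so longer strings work the same way; you do have to keep the key and IV around to decrypt.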

Unit Testing

Test Driven Development is the new hotness, especially in the Ruby development world. I'm sure the Ruby core team took this into account when they built a unit testing framework into the standard library of the language: Test::Unit. Ruby's unit testing framework is excellent (and other frameworks have since improved on it or replaced it), yet very simple to use. In current Ruby versions it ships as the bundled test-unit gem, so require 'test/unit' still works out of the box.

The basic premise of testing is to make sure that your code behaves correctly in as many contexts as you can simulate programmatically. This might sound stupid, but trust me: you'll catch far more bugs with unit tests than by just playing around with your application, because you know how it is supposed to operate but the computer doesn't. It has no "developer's discrimination" when it comes to using your application. You know what I'm talking about; no one wants their application to break, so they unconsciously tiptoe around what might become a bug. I do it all the time. That's why I use testing.

Ruby's unit testing framework provides a simple interface for performing tests. Let's say you wanted to test your class that stores MAC addresses for your locally networked client applications.

class MacAddr
  def to_s
    return @mac_parts.join(":")
  end

  def initialize(mac)
    if mac.length < 17
      fail "MAC is too short; make sure colons are in place"
    end

    @mac_parts = mac.split(':')
  end

  def [](index)
    return @mac_parts[index]
  end
end

This simple class has three methods: to_s, initialize, and an index method ([]). The constructor, initialize, takes a string with a MAC address in it. If the string is shorter than 17 characters (the length of a full colon-separated MAC address), an exception is raised; otherwise it's broken up on the colons (part of the standard MAC notation) and the parts are placed in an instance variable. The to_s method joins this array together with colons and returns it. The index method ([]) returns the requested element of the MAC address array (@mac_parts). Now that we have something to work with, let's build some tests.

Tests with Test::Unit center on subclassing Test::Unit::TestCase for each test case. So, if we were to create a test case for our MAC address class, we would do something like the following.

require 'test/unit'

class TestMac < Test::Unit::TestCase
end

Simple enough, right? Now that you have a test case class, you need to fill in tests. Tests could be written as a bunch of ugly if statements, but unit testing frameworks do their best to get away from that by providing you with assertions (i.e., wrappers for those conditional statements that hook into the framework in a meaningful way). The first type of assertion we'll look at is the equality test. There are two equality assertions: assert_equal and assert_not_equal. Let's create a couple of those tests now, in the same file in which we created the class.

require 'test/unit'

class TestMac < Test::Unit::TestCase
  def test_tos
    assert_equal("FF:FF:FF:FF:FF:FF",
                 MacAddr.new("FF:FF:FF:FF:FF:FF").to_s)
    assert_not_equal("FF:00:FF:00:FF:FF",
                     MacAddr.new("FF:FF:FF:FF:FF:FF").to_s)
  end
end

We've basically created two tests (two assertions inside one test method). The first makes sure that if we give the class a MAC address, it returns that address properly from to_s. The second does the same thing with the condition inverted: we compare against a different value to make sure the two are not equal. Upon running the tests, we should hopefully see success.

Loaded suite unit_test
Started
.
Finished in 0.0 seconds.

1 tests, 2 assertions, 0 failures, 0 errors

And we do. Excellent. Now let's take a look at another type of assertion: nil assertions. These assertions, assert_nil and assert_not_nil, do basically the same thing as if you did an assert_equal and tested for nil. Let's create a test using assert_not_nil.

require 'test/unit'

class TestMac < Test::Unit::TestCase
  def test_tos
    assert_equal("FF:FF:FF:FF:FF:FF",
                 MacAddr.new("FF:FF:FF:FF:FF:FF").to_s)
    assert_not_equal("FF:00:FF:00:FF:FF",
                     MacAddr.new("FF:FF:FF:FF:FF:FF").to_s)
    assert_not_nil(MacAddr.new("FF:AE:F0:06:05:33"))
  end
end

Again, upon running the tests, we should hopefully see a successful run without any errors or failures.

Loaded suite unit_test
Started
.
Finished in 0.0 seconds.

1 tests, 3 assertions, 0 failures, 0 errors

And we do. Great! Now, let's look at one final type of assertion that deals with exceptions. Ruby's testing framework allows you not only to test the return value of units of code, but also to test whether they raise exceptions or not. We told our class to raise an exception if the MAC address isn't the right length, so let's write a test to test that.

require 'test/unit'

class TestMac < Test::Unit::TestCase
  def test_tos
    assert_equal("FF:FF:FF:FF:FF:FF",
                 MacAddr.new("FF:FF:FF:FF:FF:FF").to_s)
    assert_not_equal("FF:00:FF:00:FF:FF",
                     MacAddr.new("FF:FF:FF:FF:FF:FF").to_s)
    assert_not_nil(MacAddr.new("FF:AE:F0:06:05:33"))
    assert_raise RuntimeError do
      MacAddr.new("AA:FF:AA:FF:AA:FF:AA:FF:AA")
    end
  end
end

Now, if we run these tests again, we'll hopefully see another wonderfully successful run.

Loaded suite unit_test
Started
F
Finished in 0.015 seconds.

1) Failure:
test_tos(TestMac) [test21.rb:27]:
<RuntimeError> exception expected but none was thrown.

1 tests, 4 assertions, 1 failures, 0 errors

Oops! If you look at our constructor, we merely test if the MAC address is too short. Let's switch that < to a != so that it catches it whether it's too short or too long and try these tests again.

Loaded suite unit_test
Started
.
Finished in 0.0 seconds.

1 tests, 4 assertions, 0 failures, 0 errors

Great! We've built a small test suite for our class. Of course, this is just one class in a whole application, and each class should have its own test suite. As test suites grow, you'll inevitably want to break them into separate files, since you wouldn't want to keep all 1,200 of your test cases for your breakdancing panda screen saver in one file. Fortunately, Test::Unit is smart enough to pick up on numerous test files being included into one test run. This means you could do something like the following without any problems.

require 'test/unit'
require 'pandatest'
require 'breakdancetest'
require 'breakdancingpandatest'
require 'somewhatperipheralelementstest'
require 'somewhatperipheralelementsbestfriendsunclestest'

I've just given you a basic rundown of testing; I'm providing a list of links in Appendix A that can take you deeper into Test Driven Development and testing with Ruby. Also be sure to check out the Test::Unit documentation at http://www.ruby-doc.org/stdlib/libdoc/test/unit/rdoc/classes/Test/Unit.html to find out about other available assertions (there are a few I don't cover here because they're not very common).
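To pull the MAC address example together, here's a sketch of a single self-contained file with the length check switched from < to != as described above, split into separate test methods. It also uses setup, a Test::Unit hook that runs before each test method so every test gets a fresh object; the exception message and the test method names are my own choices for illustration:

```ruby
require 'test/unit'

# The MacAddr class from above, with the length check changed from < to !=
class MacAddr
  def initialize(mac)
    if mac.length != 17
      fail "MAC must be 17 characters long; make sure colons are in place"
    end
    @mac_parts = mac.split(':')
  end

  def to_s
    @mac_parts.join(":")
  end

  def [](index)
    @mac_parts[index]
  end
end

class TestMac < Test::Unit::TestCase
  # setup runs before every test method
  def setup
    @mac = MacAddr.new("FF:AE:F0:06:05:33")
  end

  def test_tos
    assert_equal("FF:AE:F0:06:05:33", @mac.to_s)
  end

  def test_index
    assert_equal("FF", @mac[0])
    assert_equal("33", @mac[5])
  end

  def test_bad_length
    assert_raise(RuntimeError) { MacAddr.new("FF:FF") }
    assert_raise(RuntimeError) { MacAddr.new("AA:FF:AA:FF:AA:FF:AA:FF:AA") }
  end
end
```

Running the file with ruby executes all three test methods automatically when the script exits.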