Enumerate All The Collections!!! (Or is it "Collect all the Enumerables?")
posted on November 2, 2015
In Ruby, an Enumerable is a... mixin that provides methods for collections of items (like arrays, hashes, and ranges) so that they can be stepped through, then filtered or searched to find one or more elements that match one or more criteria.
But let's backtrack first!!!!! What even is a mixin?!? (And how is that even pronounced???!!?!?!!!!)
To answer that question: in programming, there is a software design technique called modular programming. In this technique, the functionality of a program is separated into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality. In other words, a module is a thing that contains functionality or methods that several classes or objects have in common. So this "thing" gets shared by the classes and objects that need that functionality, instead of each class or object writing its own. In Ruby, this thing is called a "module", and when it gets mixed into a class it is also called a "mixin" (pronounced "mix-in"). The two names are often used interchangeably.
To create a module in Ruby, we simply start the code block with module NameOfMyModule and close it with end. To put this module into a class that wants to use its functionality, we don't use inheritance. Instead, we put "include NameOfMyModule" at the beginning of the class, right under class NameOfMyClass.
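Here is a minimal sketch of that idea. The names Describable, Robot, and Planet are made up for illustration and are not from any library; the point is just that two unrelated classes can share one module without inheriting from each other.

```ruby
# Hypothetical module: shared behavior that unrelated classes can mix in.
module Describable
  def describe
    "#{name} can describe itself"
  end
end

# Hypothetical class 1: includes the module right under the class line.
class Robot
  include Describable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical class 2: unrelated to Robot, but shares the same module.
class Planet
  include Describable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

puts Robot.new("Bender").describe
puts Planet.new("Mars").describe
```

Neither class inherits from the other; both still have Object as their parent class, and both simply gain the describe method from the module.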
But why can't we just use good ol' inheritance?
Well, inheritance is good most of the time, especially when the class or object really is a kind of the parent class that it inherits from. But there are some cases where classes or objects from VASTLY different parent classes need to share some common actions. For example, a string, an array, and a hash are all different from each other, but they need to share some common actions, like looping through each element. Iterating through a hash is very different from iterating through a string, but they both need those actions. So if I tried to have a String object inherit the looping capabilities of a Hash object (AKA "multiple inheritance"), I could run into all kinds of problems. Wikipedia gives a very good explanation of this phenomenon, called "the diamond problem". In other words, one does not simply use multiple inheritance. Because if your parent classes have methods that override each other, you're gonna have a bad time.
Anyway, for those of you curious to know, this is why a module is sometimes called a mixin, and where the name comes from: its methods get "mixed in" to whichever class includes it.
Specifically for Enumerables in Ruby...
In Ruby, common characteristics among the Collections-type objects tend to reside in the "Enumerable" module. Any class that aspires to have enumerable-like properties must have two important things:
- "include Enumerable" at the beginning of the class, right under class NameOfYourClass
- a .each method whose job it is to yield items to a supplied code block, one at a time.
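Those two requirements can be sketched like this. Playlist is a hypothetical class made up for this example; the only things it does are include Enumerable and define each.

```ruby
# Hypothetical class for illustration only.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # The one method Enumerable needs from us: yield the items one at a time.
  def each
    return enum_for(:each) unless block_given?

    @songs.each { |song| yield song }
    self
  end
end

playlist = Playlist.new("Intro", "Anthem", "Outro")

# None of these methods are defined in Playlist; they all come from Enumerable.
p playlist.map(&:upcase)
p playlist.select { |song| song.start_with?("A") }
p playlist.include?("Outro")
```

The enum_for line is optional. It just follows the common Ruby habit of returning an enumerator when each is called without a block.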
Exactly what .each does will vary from one class to another. In the case of an array, .each yields the first element, then the second, and so forth. In the case of a hash, it yields key/value pairs in the form of two-element arrays. In the case of a filehandle, it yields one line of the file at a time. Ranges iterate by first deciding whether iterating is possible (if the start point is a float, this isn’t possible) and then pretending to be an array.
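A quick way to see those differences is to collect whatever each yields for a few built-in classes. This is only a sketch: the sample values are made up, and StringIO is used as a stand-in for a real filehandle so that the example doesn't need a file on disk.

```ruby
require "stringio"

# Array: yields the elements in order.
array_items = []
[10, 20].each { |n| array_items << n }

# Hash: yields each key/value pair as a two-element array.
hash_items = []
{ a: 1, b: 2 }.each { |pair| hash_items << pair }

# StringIO is only a stand-in here for a real filehandle: yields one line at a time.
line_items = []
StringIO.new("first line\nsecond line\n").each { |line| line_items << line }

# Range starting at an integer: iterates like an array would.
range_items = []
(1..3).each { |n| range_items << n }

# Range starting at a float: iterating is not possible, so Ruby raises TypeError.
float_error = nil
begin
  (1.5..3).each { |n| range_items << n }
rescue TypeError => e
  float_error = e
end

p array_items, hash_items, line_items, range_items
puts float_error.message
```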
If you make your own class, .each can mean whatever you want it to mean, as long as it yields something. So .each has different semantics for different classes. But however each is implemented, the methods in the Enumerable module depend on being able to call it. As long as "include Enumerable" is present and there is an each method inside, NameOfYourClass automatically gets all of the methods listed below, because every one of them is built on top of each.
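One way to convince yourself that these methods really lean on each is to count how often each gets called. CountingBag below is a hypothetical class written just for this experiment; the exact number of calls is an implementation detail, so the sketch only shows that the counter goes up.

```ruby
# Hypothetical class that records how many times its each method is invoked.
class CountingBag
  include Enumerable
  attr_reader :each_calls

  def initialize(*items)
    @items = items
    @each_calls = 0
  end

  def each
    @each_calls += 1
    @items.each { |item| yield item }
  end
end

bag = CountingBag.new(3, 1, 2)

sorted  = bag.sort                 # Enumerable#sort walks the items via each
doubled = bag.map { |n| n * 2 }    # so does Enumerable#map
found   = bag.include?(2)          # and Enumerable#include?

p sorted, doubled, found
puts "each was called #{bag.each_calls} times"
```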
    [1] 2.2.3 > Enumerable.instance_methods.sort
    [:all?, :any?, :chunk, :collect, :collect_concat, :count, :cycle, :detect,
     :drop, :drop_while, :each_cons, :each_entry, :each_slice, :each_with_index,
     :each_with_object, :entries, :find, :find_all, :find_index, :first,
     :flat_map, :grep, :group_by, :include?, :inject, :lazy, :map, :max,
     :max_by, :member?, :min, :min_by, :minmax, :minmax_by, :none?, :one?,
     :partition, :reduce, :reject, :reverse_each, :select, :slice_after,
     :slice_before, :slice_when, :sort, :sort_by, :take, :take_while, :to_a,
     :to_h, :zip]
    [2] 2.2.3 > Enumerable.instance_methods(false).sort
    (identical output: the same 51 methods, :all? through :zip)
I thought putting false inside .instance_methods() would make this list shorter? I don't see the difference. I did this in coderpad.io, but that's probably not why: false only leaves out methods inherited from ancestors, and Enumerable is a module with no ancestors of its own, so both lists come out the same. Oh well...

Arrays serve generically as the containers for most of the results that come back from enumerable selecting and filtering operations, whether or not the object being selected from or filtered is an array. There are some exceptions to this quasi-rule, but it holds widely.
The array is the most generic container and therefore the logical candidate for the role of universal result format. A few exceptions arise. A hash returns a hash from a select or reject operation. Sets return arrays from map, but you can call map! on a set to change the elements of the set in place. For the most part, though, enumerable selection and filtering operations come back to you inside arrays.
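The rule and its exceptions can be checked with a few one-liners. The sample values here are made up for illustration.

```ruby
require "set"

# Range in, Array out.
from_range = (1..6).select(&:even?)

# Hash in, Hash out (one of the exceptions).
from_hash = { a: 1, b: 2 }.select { |_key, value| value > 1 }

# Set in, Array out.
from_set = Set[1, 2, 3].map { |n| n * 10 }

# map! changes the elements of the set in place instead of returning an array.
numbers = Set[1, 2, 3]
numbers.map! { |n| n * 10 }

p from_range.class, from_hash.class, from_set.class, numbers.class
```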
Now, here is a little more detail about the Enumerable method called .cycle:
Ruby-doc.org defines cycle as: "Calls block for each element of enum repeatedly n times or forever if none or nil is given. If a non-positive number is given or the collection is empty, does nothing. Returns nil if the loop has finished without getting interrupted. #cycle saves elements in an internal array so changes to enum after the first pass have no effect. If no block is given, an enumerator is returned instead."
In other words, cycle is basically a combination of .times AND either a for loop or .each inside of it. So these 4 pieces of code are equivalent and print the exact same thing. (Although, it's super-important to put a number inside the parentheses of .cycle(), or else you'll end up with an infinite loop!!!)
    # 1. while loop with a for loop inside
    counter = 0
    while counter < number_of_times_to_redo
      for item in my_generic_enumerable
        puts item
      end
      counter += 1
    end

    # 2. .times with a for loop inside
    number_of_times_to_redo.times {
      for item in my_generic_enumerable
        puts item
      end
    }

    # 3. .times with .each inside
    number_of_times_to_redo.times { my_generic_enumerable.each { |item| puts item } }

    # 4. .cycle on its own
    my_generic_enumerable.cycle(number_of_times_to_redo) { |item| puts item }
I think it's really awesome how all those lines of code can be refactored down to a single call to .cycle!!! ^__^
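To double-check the equivalence claimed above, here is a small sketch that collects the items instead of printing them. The letters array and the variable names are made up for this example; it also pokes at a few details from the Ruby-doc.org description.

```ruby
letters = %w[a b]

# Version with .times and .each
with_times = []
2.times { letters.each { |item| with_times << item } }

# Version with .cycle
with_cycle = []
result = letters.cycle(2) { |item| with_cycle << item }

p with_times == with_cycle   # both visit the same items in the same order
p result                     # cycle returns nil once the loop has finished

# A non-positive number means the block never runs.
skipped = []
letters.cycle(0) { |item| skipped << item }
p skipped

# Without a block, cycle returns an enumerator, so the "forever" version
# can be sampled safely instead of looping infinitely.
sample = letters.cycle.first(5)
p sample
```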