Commanding objects toward immutability

Following the rules for East-oriented Code helps me organize behavior in my code, but it can lead to other benefits as well. As a result of following the rules, I find that my code is better prepared for restrictions like those that immutable objects introduce.

I recently went looking for samples of how people are using instance_eval and instance_exec and ended up with a great example from FactoryGirl thanks to Joshua Clayton. As I was searching, I came upon some code which happened to use instance_eval. Although it was a simple use case for that method, it turned out to be a much better example of commands, immutability, and East-oriented code.

Here are the details...

If we want to use nested blocks to create a tree structure, we might create some pseudo-code like this to illustrate our desired code:

Node.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end

The representation of this set of objects should look something like this:

"['root', ['branch', ['leaf', 'leaf2', 'leaf3']]]"

This shows that the created tree is a pair of a named node and an array of named children (who can also have children).

Imperative approach

A simple solution is to initialize a Node and, using an imperative approach, to change its state; that is to say that we alter its collection of children.

class Node
  def initialize(name, &block)
    @name = name
    @children = []

    instance_eval(&block) if block
  end

  attr_reader :children, :name

  def node(name, &block)
    children << Node.new(name, &block)
  end
end

When each node is created, its collection of children is set to an empty array. With each call to the node method, a new Node object is created and shoveled into the collection of children.

If we refactor our sample to inline the methods and show us exactly what's going on, it would look something like this:

Node.new('root') do
  self.children << Node.new('branch') do
    self.children << Node.new('leaf')
    self.children << Node.new('leaf2')
    self.children << Node.new('leaf3')
  end
end

We can more clearly see what's happening inside of the node method with this change to our code.
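
As a quick check, here is a runnable sketch of the imperative version described above. The class is the same one shown earlier; the local variable names (tree, branch) are only there for illustration.

```ruby
class Node
  attr_reader :children, :name

  def initialize(name, &block)
    @name = name
    @children = []

    # Run the block with self set to this new node, so that bare
    # calls to node(...) inside the block are sent to it.
    instance_eval(&block) if block
  end

  def node(name, &block)
    # Mutates the receiver: the new child is shoveled into its array.
    children << Node.new(name, &block)
  end
end

tree = Node.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end

branch = tree.children.first
p tree.name                   # => "root"
p branch.name                 # => "branch"
p branch.children.map(&:name) # => ["leaf", "leaf2", "leaf3"]
```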

Eastward flow

As I worked with this problem I wondered: what would happen if I started following the 4 rules of East-oriented code?

If our node method returns self, how does that affect our code?

class Node
  # initialize omitted...

  def node(name, &block)
    children << Node.new(name, &block)
    self
  end
end

Fortunately, because our code relies on an imperative approach by changing the state of the children, the code still works.

If we want, we can shrink the space we use by chaining commands together:

t = Node.new("root") do
  node("branch") do
    node("subbranch") do
      node("leaf").node("leaf2").node("leaf3")
    end
  end
end

I think that's actually a little more difficult to read, so we can go back to the regular style:

node("leaf")
node("leaf2")
node("leaf3")

When seeing techniques like returning self to encourage an East-oriented approach, it's easy to fixate on the chaining. But it's commands that we want to introduce, not chaining. The chaining is incidental here.

If you do chain your method calls together, it at least appears more clearly that each subsequent method is operating on the return value of the last one.

If we want to be clear that we're operating on the last return value, we can maintain the readability of the multiline option by writing it like this:

node("leaf").
  node("leaf2").
  node("leaf3")

Each line chains the next by adding the dot character. We don't have a specific need to do this, but it's good to know how it works.
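
To make that concrete, here is a small sketch using the version of node that returns self. While the children array is still mutable, the separate calls and the chained calls should build the same thing. The variable names separate and chained are just for illustration.

```ruby
class Node
  attr_reader :children, :name

  def initialize(name, &block)
    @name = name
    @children = []
    instance_eval(&block) if block
  end

  def node(name, &block)
    children << Node.new(name, &block)
    self # return the receiver so another command can follow
  end
end

# Each call is sent to the implicit self of the block.
separate = Node.new('branch') do
  node('leaf')
  node('leaf2')
  node('leaf3')
end

# Each call is sent to the return value of the previous call,
# which here happens to be the very same object.
chained = Node.new('branch') do
  node('leaf').
    node('leaf2').
    node('leaf3')
end

p separate.children.map(&:name) # => ["leaf", "leaf2", "leaf3"]
p chained.children.map(&:name)  # => ["leaf", "leaf2", "leaf3"]
```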

Not much has changed after introducing our East-oriented approach. We're still updating that collection of children.

Introducing immutability

What will we see if we introduce immutable objects to our solution?

Immutable objects might just help us make our code more predictable. An object which never changes, of course, stays the same. That makes the behavior of the system easier to reason about and, because no object is ever changed, makes a multithreaded approach much less likely to introduce headaches.

The simplest way to add immutability is to freeze objects as they are initialized:

class Node
  def initialize(name, &block)
    @name = name.freeze
    @children = [].freeze

    instance_eval(&block) if block
  end

  attr_reader :children, :name

  def node(name, &block)
    children << Node.new(name, &block).freeze
    self
  end
end

This, of course, breaks everything. Our code relies upon the fact that the children array may be mutated. Instead of doing the mutation, we'll see an error like this (Ruby 2.5 and later report it as a FrozenError, a subclass of RuntimeError):

RuntimeError: can't modify frozen Array

Now what?

If we can't alter the collection, we're left with creating an entirely new one.

One thing we could do is change the constructor to accept a collection of children when the Node is initialized. Instead of altering the children, we'd use a constructor like this: Node.new(name, children). Here's what that looks like:
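
Here is a minimal sketch of that idea with plain arrays, outside of the Node class. The with_child helper is a hypothetical name used only for this illustration; it relies on Array#+, which returns a new array instead of changing the receiver.

```ruby
frozen_children = [].freeze

begin
  frozen_children << 'leaf' # mutation is rejected on a frozen array
rescue => error
  # FrozenError on Ruby 2.5 and later, RuntimeError on older versions.
  puts error.class
end

# Hypothetical helper: builds a new frozen array that includes the
# extra child and leaves the original array untouched.
def with_child(children, child)
  (children + [child]).freeze
end

next_children = with_child(frozen_children, 'leaf')
p frozen_children # => []
p next_children   # => ["leaf"]
```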

class Node
  def initialize(name, children=[], &block)
    @name = name.freeze
    @children = children.freeze

    instance_eval(&block) if block
  end
  # ... omitted code
end

That still doesn't allow us to change anything until we also change the way our node method works (since it is responsible for handling changes to the children).

If the node method created a new Node instead of altering the children, that would get us what we want. Let's break it down.

First, when the node method is called, it needs to create the node to be added to the collection of children:

def node(name, &block)
  new_child = Node.new(name, &block)
  # ... ?
  self
end

Since we're trying to avoid mutating the state of this object, we don't want to just shove the new node into the collection of children (and we can't because we used freeze on it).

So let's create an entirely new node, with an entirely new collection of children. In order to do that, we need to ensure that for every existing child object, we create a corresponding new node.

For each command to the object with node, we'll get the representation of what the children should be. So let's build a method to do that:

def next_children
  children.map{|child| Node.new(child.name, child.next_children) }.freeze
end

When we changed our initializer, that allowed us to set the list of children. Our new next_children method relies on that feature and a recursive call to itself to build the collection of children for that new node with Node.new(child.name, child.next_children).
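
As a sketch of what that recursion produces, the following builds a small tree by hand with the two-argument constructor (no blocks involved) and then asks the root for its next_children. The variable names are only for illustration.

```ruby
class Node
  attr_reader :children, :name

  def initialize(name, children = [])
    @name = name.freeze
    @children = children.freeze
  end

  # Builds a fresh, frozen copy of every child, recursing into grandchildren.
  def next_children
    children.map { |child| Node.new(child.name, child.next_children) }.freeze
  end
end

leaf   = Node.new('leaf')
branch = Node.new('branch', [leaf])
root   = Node.new('root', [branch])

copy = root.next_children
p copy.map(&:name)                # => ["branch"]
p copy.first.children.map(&:name) # => ["leaf"]
p copy.first.equal?(branch)       # => false (a new object, not the original)
p copy.frozen?                    # => true
```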

Looking back at our node method we'll need to break the rules of East-oriented Code. Since we have immutable objects, we'll return a new node instead of self.

def node(name, &block)
  new_child = Node.new(name, &block)
  Node.new(self.name, next_children + [new_child])
end

But there's still a problem left. Our initialized object needs to execute a block, and the constructor new might actually need to return a different object than the one it originally created. The call to node inside the block changes the return value from the instance that new creates to the instance that node creates.
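
To see the problem, here is a sketch that assembles the pieces shown so far: the block is still evaluated inside initialize, and node returns a new Node. Under those assumptions, new hands back the original, childless instance and the node built inside the block is lost.

```ruby
class Node
  attr_reader :children, :name

  def initialize(name, children = [], &block)
    @name = name.freeze
    @children = children.freeze

    # The block's return value is discarded here.
    instance_eval(&block) if block
  end

  def node(name, &block)
    new_child = Node.new(name, &block)
    Node.new(self.name, next_children + [new_child])
  end

  def next_children
    children.map { |child| Node.new(child.name, child.next_children) }.freeze
  end
end

root = Node.new('root') do
  node('branch') # returns a new 'root' node that nobody keeps
end

p root.children.size                       # => 0
p root.node('branch').children.map(&:name) # => ["branch"]
```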

Controlling the constructor

To better handle our immutable objects and the return values from the methods we created, we can alter the way the new method works on our Node class.

Instead of handling a block in the initialize method, we can move it to new.

Here's the new new method:

def self.new(*args, &block)
  instance = super.freeze
  if block
    instance.instance_eval(&block)
  else
    instance
  end
end

The first step is to call super to get an instance the way Ruby normally creates them (as defined in the super class of Node). Then we freeze it.

If we haven't provided a block to the new method, we'll want to return the instance we just created. If we have provided a block, we'll need to evaluate that block in the context of the instance we just created and return its result.

This means that the block can use the node method and whatever is returned by it.

We need to alter the new method this way because we're not always just returning the instance it creates. Since our objects are frozen, we can't allow the block to alter their states.
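
Here is the same technique on a deliberately tiny class, as a sketch. Tally, number, and add are hypothetical names invented for this illustration; only the shape of self.new comes from the code above.

```ruby
class Tally
  def self.new(*args, &block)
    # super builds the instance the usual way; then we freeze it.
    instance = super.freeze
    # With a block, return whatever the block returns instead.
    block ? instance.instance_eval(&block) : instance
  end

  attr_reader :number

  def initialize(number)
    @number = number
  end

  # A command that returns a new object instead of changing this one.
  def add(amount)
    self.class.new(number + amount)
  end
end

plain = Tally.new(1)
p plain.number  # => 1
p plain.frozen? # => true

result = Tally.new(1) { add(2) }
p result.number # => 3 (the object returned by add, not the original)
```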

The way new usually works is roughly like this:

def self.new(*args, &block)
  instance = allocate
  instance.send(:initialize, *args, &block)
  return instance
end

This is why Ruby has you call new on a class while, in practice, you write an initialize method. This structure ensures that no matter the result of your initialize method, new will always return an instance of the class you've used.
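
A short sketch makes this visible. Sample is a hypothetical class, and build is a hypothetical method that repeats the allocate-and-initialize steps by hand; neither comes from the Node code.

```ruby
class Sample
  def initialize
    :ignored # whatever initialize returns is thrown away by new
  end

  # Hand-rolled equivalent of the steps above, under a different name
  # so the real new stays untouched.
  def self.build(*args, &block)
    instance = allocate
    instance.send(:initialize, *args, &block)
    instance
  end
end

p Sample.new.class   # => Sample
p Sample.build.class # => Sample
```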

We're bending the rules to allow us to evaluate the given block and return its result, instead of the instance typically created by new.

After that, we can remove the block evaluation from initialize:

def initialize(name, children=[])
  @name = name.freeze
  @children = children.freeze
end

While the method signature (the list of accepted arguments) has changed for initialize, it's still the same for new: a list of arguments and a block.

Believe it or not, there's still one more problem to solve.

Operating on values

We looked at how returning self allows you to chain your method calls. Although we've broken that rule and are instead returning a new Node object, it's important to consider that chaining.

Our initial code still doesn't work quite right, and it's all because we need to think about operating on the return values of our commands and not relying on an imperative approach to building and changing objects.

First, here's what our Node class looks like:

class Node
  def self.new(*args, &block)
    instance = super.freeze
    if block
      instance.instance_eval(&block)
    else
      instance
    end
  end

  def initialize(name, children=[])
    @name = name.freeze
    @children = children.freeze
  end

  attr_reader :children, :name

  def node(name, &block)
    new_child = self.class.new(name, &block)
    self.class.new(self.name, next_children + [new_child])
  end

  def next_children
    children.map{|child| self.class.new(child.name, child.next_children) }.freeze
  end

  def inspect
    return %{"#{name}"} if children.empty?
    %{"#{name}", #{children}}
  end
end

We didn't discuss it, but there's an inspect method to return either the name of the node if it has no children, or the name and a list of children if it has some.

Here's what the code to create the tree looks like:

Node.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end

If we assign the result of that to a variable and inspect it we'll get a surprising result.

t = Node.new('root') do
  node('branch') do
    node('leaf')
    node('leaf2')
    node('leaf3')
  end
end
puts [t].inspect

The output will only be

["root", ["branch", ["leaf3"]]]

So what happened to the other leaf and leaf2 objects? Why aren't they there?

Remember that each node call returns a new node. With every node a new result is returned. The node('leaf') call returns an object, but node('leaf2') is not a message sent to the object returned by the first. It is a message sent to the original, still-empty 'branch' node in whose context the block is evaluated.

Each of those calls is returned and forgotten. Here it is annotated:

t = Node.new('root') do
  node('branch') do
    node('leaf') # returned and forgotten
    node('leaf2') # returned and forgotten
    node('leaf3') # returned and used as the final result
  end
end
puts [t].inspect
#=> ["root", ["branch", ["leaf3"]]]

The answer to this problem is to command each object to do the next thing. We can achieve this by chaining the methods. The result of one method is the object which will receive the next command.

t = Node.new('root') do
  node('branch') do
    node('leaf'). # dot (.) character added to chain
    node('leaf2'). # executed on the result of the last node
    node('leaf3') # executed on the result of the last node
  end
end
puts [t].inspect
#=> ["root", ["branch", ["leaf", "leaf2", "leaf3"]]]

An alternative way to look at this is to store the result of each command:

t = Node.new('root') do
  node('branch') do
    branch = node('leaf')
    next_branch = branch.node('leaf2')
    final_branch = next_branch.node('leaf3')
  end
end
puts [t].inspect
#=> ["root", ["branch", ["leaf", "leaf2", "leaf3"]]]
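
The same idea can be written as a fold, which is one more way to see that each command goes to the result of the previous one. This is only a sketch: the leaf_names variable and the use of inject are my own choices for illustration, and the inspect method is left out to keep it short.

```ruby
class Node
  def self.new(*args, &block)
    instance = super.freeze
    block ? instance.instance_eval(&block) : instance
  end

  attr_reader :children, :name

  def initialize(name, children = [])
    @name = name.freeze
    @children = children.freeze
  end

  def node(name, &block)
    new_child = self.class.new(name, &block)
    self.class.new(self.name, next_children + [new_child])
  end

  def next_children
    children.map { |child| self.class.new(child.name, child.next_children) }.freeze
  end
end

leaf_names = %w[leaf leaf2 leaf3]

tree = Node.new('root') do
  node('branch') do
    # Start from self (the empty branch) and send each node command
    # to the object returned by the previous one.
    leaf_names.inject(self) { |current, leaf_name| current.node(leaf_name) }
  end
end

p tree.children.first.children.map(&:name) # => ["leaf", "leaf2", "leaf3"]
```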

Following the rules so you know when to break them

What interested me about this was that preparing my code to keep operating on the same object also prepared it for immutable objects. By structuring my code to return self and send the next message to the result of the last, I was able to change the implementation from an imperative style to a functional style.