Cohesive behaviors with data clumps

A good example of how we use context and locality to understand and manage concepts in our code is using a data clump.

A data clump is a collection of two or more bits of information that are consistently used together. You’ll find that your data loses its meaning when you remove items from the clump.

Date ranges are simple examples of how a data clump puts necessary information into context.
Suppose we want to find out whether a question was asked between one month ago and today. Our Question class could implement a query method for this:

    class Question
      def asked_within?(start_date, end_date)
        (start_date..end_date).cover?(asked_date)
      end
    end

Then we can pass in our desired dates to get the answer:

    # using ActiveSupport
    start_date = 1.month.ago
    end_date = Time.now
    question.asked_within?(start_date, end_date)

Discovering whether a question is within this time frame always requires both a start and end date. This is an indication that we can only understand the feature and indeed only implement it when we have this data clump. To better encapsulate the behavior of these values, we can create a class to manage initializing objects that represent them.

    DateRange = Struct.new(:start_date, :end_date)
    last_month = DateRange.new(1.month.ago, Time.now)
    question.asked_within?(last_month)

We can then change our Question class to instead take a date range object for the asked_within? method, but the question’s responsibilities have grown a bit here. A question doesn’t have anything to do with comparing dates, so we can move the control of that information into the data clump that represents them.

    DateRange = Struct.new(:start_date, :end_date) do
      def contains?(date)
        (start_date..end_date).cover?(date)
      end
    end

Now, instead of the question managing its date comparison, the date range can do the work.

    last_month.contains?(question.asked_date)
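To try the finished design without ActiveSupport, here is a minimal sketch that uses only Ruby's standard library. The one-attribute Question struct and the fixed dates are assumptions made for illustration; a real Question class would carry more than its asked date.

```ruby
require "date"

# The data clump: a start and an end date that only make sense together.
DateRange = Struct.new(:start_date, :end_date) do
  def contains?(date)
    (start_date..end_date).cover?(date)
  end
end

# Hypothetical stand-in for the article's Question class.
Question = Struct.new(:asked_date)

# A fixed "today" keeps the example repeatable; Date#<< steps back one month.
today = Date.new(2024, 3, 15)
last_month = DateRange.new(today << 1, today)

recent = Question.new(Date.new(2024, 3, 1))
old = Question.new(Date.new(2023, 12, 25))

recent_inside = last_month.contains?(recent.asked_date)
old_inside = last_month.contains?(old.asked_date)

puts "recent question in range? #{recent_inside}"
puts "old question in range? #{old_inside}"
```

The question only exposes its date; the range alone decides what "within" means.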

When we analyze the individual parts of this date comparison, we have to juggle a bit more in our heads. Considering a range as a complete object rather than a collection of parts is simpler, and we tend not to think of every individual day within a month when doing a mental comparison. A date range is a small system of interacting parts that we understand better as a broader context.

This example shows us the value not only of separating responsibilities, but of bringing objects together. We get more value by putting details into context than we would have if they remained separate.

Things to note

Struct.new returns a class (an instance of Class). Inheriting from the result of Struct.new leaves an anonymous class in the ancestors of your created class:

[DateRange, #<Class:0x...>, Struct, ...]

Instead of class DateRange < Struct.new; end, use DateRange = Struct.new and avoid an anonymous class in the ancestors:

[DateRange, Struct, ...]
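The difference is easy to check in irb. This sketch defines the struct both ways under two hypothetical names, InheritedRange and AssignedRange, so the ancestor chains can be compared side by side:

```ruby
# Subclassing the result of Struct.new keeps the unnamed struct class
# in the ancestor chain.
class InheritedRange < Struct.new(:start_date, :end_date); end

# Assigning the result to a constant names the struct class itself.
AssignedRange = Struct.new(:start_date, :end_date)

# The second entry here is the anonymous class, printed as #<Class:0x...>.
p InheritedRange.ancestors.take(3)

# Here the constant sits directly below Struct.
p AssignedRange.ancestors.take(2)
```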

Additionally, be careful with large ranges. If our code used include? instead of cover?, Ruby may iterate through the range, building an object for each value between the beginning and end (this is what happens with a range of Date objects, for example). As your range grows, the work and allocations needed to calculate the answer will grow too.

Avoid excessive memory and use cover? instead. It will check that your beginning date is less than or equal to the given date, and that the given date is less than or equal to the end date.
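To see the difference between the two methods, the sketch below uses a hypothetical Tick class that counts how often Ruby asks it for the next value. Tick is not part of the article's code; it only stands in for any non-numeric value that a range can step through.

```ruby
# Hypothetical value type that records every call to succ.
class Tick
  include Comparable

  class << self
    attr_accessor :succ_calls
  end
  self.succ_calls = 0

  attr_reader :n

  def initialize(n)
    @n = n
  end

  # Ranges compare their endpoints and candidates with <=>.
  def <=>(other)
    n <=> other.n
  end

  # Ranges step from one value to the next with succ.
  def succ
    self.class.succ_calls += 1
    Tick.new(n + 1)
  end
end

range = Tick.new(1)..Tick.new(10_000)
target = Tick.new(5_000)

Tick.succ_calls = 0
covered = range.cover?(target)
cover_steps = Tick.succ_calls

Tick.succ_calls = 0
included = range.include?(target)
include_steps = Tick.succ_calls

puts "cover?   => #{covered}, succ calls: #{cover_steps}"
puts "include? => #{included}, succ calls: #{include_steps}"
```

Both calls give the same answer, but cover? only compares the target against the two endpoints, while include? walks the range until it reaches the target.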

This article is an excerpt from my book Clean Ruby