Behind the Scenes of the 'Has Many' Active Record Association
In The Rails 4 Way, Obie Fernandez describes what happens when you “wire up” associations (like has_many and belongs_to) between Active Record models:
When these relationship declarations are executed, Rails uses some metaprogramming magic to dynamically add code to your models. In particular, proxy collection objects are created that let you manipulate the relationship easily.
– The Rails 4 Way, page 181
What exactly is this “metaprogramming magic” that goes on behind the scenes? In this blog post, I hope to uncover exactly what happens when has_many is used in an Active Record model.
TL;DR: When a class inherits from ActiveRecord::Base, it gets access to a large number of class and instance methods, including the has_many class method. When has_many is called on a class, a GeneratedAssociationMethods module is created on the fly (if the class doesn't have one yet) and mixed into the class. A number of instance methods are then defined on this module (which ones depends on the type of association). These methods let instances of the original class access and manipulate objects of the associated class, whose name is passed as a symbol argument to the has_many method. Pretty magical if you ask me.
Diving into the Source Code
The Rails source code has always been intimidating for me. There’s a ton of “metaprogramming magic” going on under the hood that appears incomprehensible or esoteric to a beginner. Though it’s all written in Ruby, which I feel reasonably comfortable with, I’m a total noob when it comes to metaprogramming, so my previous attempts to read the Rails source code have not gone very far.
I try not to feel too bad about this. I’ve only been coding for about six months, and Rails has been constantly developed over the last ten years by some of the best programmers out there.
My first step was to go to the source: the Rails source code, that is. I cloned the master branch to my computer and started digging in.
git clone git@github.com:rails/rails.git
I started to feel differently about the code once it was on my local machine. Suddenly I felt like I had more control over the code, just as I would with any other Ruby files open in Sublime Text on my computer. Rather than getting distracted by the complexity and sheer magnitude of the Rails codebase, I relied on techniques that I was already confident with and worked my way through from there.
NOTE: I have pinned the links down to a commit in the Rails 4.1.4 era (when this post was written). Plenty of changes have been made since then, so be sure to check out the latest version of the repo and investigate the code for yourself!
Starting With What I Know
Let’s make a basic Rails app using Game of Thrones as its domain model. At first, I have two classes, House and Character, both of which inherit from ActiveRecord::Base. I’d wire up the associations between these classes like so:
# app/models/house.rb
class House < ActiveRecord::Base
  has_many :characters
end
# app/models/character.rb
class Character < ActiveRecord::Base
  belongs_to :house
end
This domain logic makes sense; in Game of Thrones, characters often introduce themselves in the fashion “I am Arya of House Stark.” Arya can be said to belong to House Stark, while House Stark can be said to have many characters (a collection, if you will) associated with it.
At this point, here’s what I know about the code in the House model:
- The House class inherits from ActiveRecord::Base. This gives the House class, and instances of this class, access to the methods defined in the many modules included in ActiveRecord::Base. We can now say that the House class is an “Active Record model”, as opposed to a “plain old Ruby object”.
- In Rails, an Active Record model is automatically mapped to a table in the database (in this case houses). Attributes (i.e. methods) that correspond to each column in this table are created in the model. Similarly, each instance of an Active Record model “wraps” one row in this table. This is the key to Rails’s adaptation of the Active Record pattern: database logic is effectively encapsulated in methods on the model layer.
- has_many is a “bareword”. In Ruby, barewords can only be local variables, method arguments, keywords (like class, def, and end), and method calls. In this context, it’s neither a local variable nor a method argument, and it’s certainly not a keyword. Therefore it must be a method call.
- If has_many is a method, then in this context its receiver must be the House class, because self inside a class body is the class being defined. That makes has_many a class method. It could also be written as self.has_many, but the receiver is implicit here, so we don’t need to write self.
- It then follows that :characters is a method argument passed to the has_many method. Ruby allows you to leave off the parentheses around method arguments, though it is generally good practice to keep them when defining a method that takes arguments.
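The reasoning above can be checked in plain Ruby, with no Rails involved. This is a minimal sketch. MiniBase and MiniAssociations are hypothetical stand-ins, not Active Record classes, and this has_many only records the names it receives. It shows that a bare has_many inside a class body is an ordinary method call whose implicit receiver is the class being defined.

```ruby
# Hypothetical stand-in for the module that provides has_many.
module MiniAssociations
  # Records each association name on the class that declared it.
  def has_many(name)
    declared_associations << name
  end

  def declared_associations
    @declared_associations ||= []
  end
end

# Hypothetical stand-in for ActiveRecord::Base.
class MiniBase
  extend MiniAssociations # has_many becomes a class method of MiniBase and its subclasses
end

class House < MiniBase
  # Inside the class body, self is House, so this bareword means House.has_many(:characters).
  has_many :characters
  # The same kind of call, written with an explicit receiver and parentheses.
  self.has_many(:banners)
end

p House.declared_associations # => [:characters, :banners]
```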
Knowing that has_many is a class method defined somewhere in Active Record helped to narrow things down. From there, it was easy to track down the method definition with a global search for def has_many across the codebase, using Command + Shift + F in Sublime Text. Only one result appeared: a two-line method in the ActiveRecord::Associations::ClassMethods module:
def has_many(name, scope = nil, options = {}, &extension)
  reflection = Builder::HasMany.build(self, name, scope, options, &extension)
  Reflection.add_reflection self, name, reflection
end
Taking it One Line at a Time
The has_many method signature is actually quite straightforward. There is also a copious amount (~200 lines) of documentation directly above the method definition. It describes what has_many is, gives examples of how to use it, lists the methods that are added when a has_many association is declared, and describes each of the arguments it takes.
Here’s a quick rundown of the arguments that has_many can take:
- name: The only required argument. By convention it is a plural symbol naming a collection of instances of another class. In our case, :characters is passed in as the name parameter. The :characters collection refers to multiple instances of the Character class (or rows of the characters table in the database) that we want to associate with one instance of the House class. This one-to-many relationship is what tells us to use the has_many association in the first place.
- scope: Defaults to nil. A scope must be an object with a call method, so lambdas are generally used here. Scopes let you narrow retrieval to a more targeted set of records. In our example, if we only wanted to associate living characters with their houses, we’d use the following scope:
class House < ActiveRecord::Base
  has_many :characters, -> { where(deceased: false) }
end
(Though this might get complicated when dealing with the Iron Islands…)
- options: Defaults to an empty hash. Here you can further customize the association. Some common options are through: (naming another association to go through, e.g. through: :memberships), dependent: :destroy, as: (for polymorphic associations; polymorphic: true is the matching option on the belongs_to side), and foreign_key: :uuid.
- &extension: An explicit block argument, as indicated by the ampersand. I have never used this before, but the comments suggest that association extensions are “useful for adding new finders, creators and other factory-type methods to be used as part of the association.”
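Because scope and options are both optional positional parameters, a call like has_many :characters, dependent: :destroy binds the options hash to scope, not to options. The plain-Ruby sketch below uses a hypothetical method, record_has_many, with the same parameter list as has_many. It shows how each argument binds, and one way to move a hash that landed in scope over to options. Active Record has to handle the same situation somewhere; this shuffle is an illustration, not its actual code.

```ruby
# Hypothetical method with the same parameter list as has_many.
def record_has_many(name, scope = nil, options = {}, &extension)
  # A hash passed without a scope lands in `scope`; move it to `options`.
  if scope.is_a?(Hash)
    options = scope
    scope = nil
  end
  { name: name, scope: scope, options: options, extension: extension }
end

living = -> { :only_living_characters } # any object responding to #call can act as a scope

plain      = record_has_many(:characters)
with_opts  = record_has_many(:characters, dependent: :destroy)
with_scope = record_has_many(:characters, living, dependent: :destroy)
with_block = record_has_many(:characters) { :extension_body }

# with_opts[:scope] is nil and with_opts[:options] holds dependent: :destroy,
# while with_scope keeps the lambda in :scope and the hash in :options.
puts with_opts[:options].inspect
```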
To figure out the next line, I first tracked down the ActiveRecord::Associations::Builder::HasMany class. It is quite slim, containing just three one-line methods! Unfortunately, build is not one of them. I traversed up the inheritance hierarchy, from ActiveRecord::Associations::Builder::HasMany to ActiveRecord::Associations::Builder::CollectionAssociation, and finally to ActiveRecord::Associations::Builder::Association, where the class method build is defined.
def self.build(model, name, scope, options, &block)
  if model.dangerous_attribute_method?(name)
    raise ArgumentError, "You tried to define an association named #{name} on the model #{model.name}, but " \
                         "this will conflict with a method #{name} already defined by Active Record. " \
                         "Please choose a different association name."
  end
  builder = create_builder model, name, scope, options, &block
  reflection = builder.build(model)
  define_accessors model, reflection
  define_callbacks model, reflection
  define_validations model, reflection
  builder.define_extensions model
  reflection
end
The ActiveRecord::Associations::Builder::Association.build method takes five arguments. The last four are identical to those originally passed to has_many. The first is called model in build’s method signature, and inside the has_many method, the value passed for it is self.
Given that the has_many method is inside the ActiveRecord::Associations::ClassMethods module, I assumed that somewhere along the line this module is being mixed into a class. The value of self inside a class method is the class itself: that is, whichever class called the has_many method to begin with (in our case House).
I had a difficult time tracking down where, if anywhere, this module is mixed into the House class. Then I had a stunning (and obvious) realization: this module is mixed into every single Active Record model! From there, it was easy to hunt down the offending code in (where else?) ActiveRecord::Base.
module ActiveRecord
  class Base
    # ...
    include Associations
    # ...
  end
  # ...
end
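ActiveRecord::Associations extends ActiveSupport::Concern, so including it into ActiveRecord::Base also extends Base with the nested ClassMethods module. Class methods defined on a superclass are available to its subclasses, which is how House ends up with has_many as a class method. Now we know that in our example, House and :characters are passed as the model and name arguments to the build method.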
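The Concern trick can be imitated in a few lines of plain Ruby. This is a simplified sketch built around a hypothetical MiniConcernAssociations module that uses Ruby's Module#included hook. ActiveSupport::Concern does more than this (dependency handling, included blocks), but the core effect shown here is the same: including the module into a base class turns ClassMethods into class methods of every subclass.

```ruby
# Hypothetical module, analogous in shape to ActiveRecord::Associations.
module MiniConcernAssociations
  # Class-level API, analogous to ActiveRecord::Associations::ClassMethods.
  module ClassMethods
    def has_many(name)
      (@associations ||= []) << name
      name
    end

    def associations
      @associations || []
    end
  end

  # Ruby calls this hook when the module is included; `base` is the including class.
  def self.included(base)
    base.extend(ClassMethods)
  end
end

# Hypothetical stand-in for ActiveRecord::Base.
class MiniRecordBase
  include MiniConcernAssociations # analogous to `include Associations`
end

class House < MiniRecordBase
  has_many :characters
end

p House.associations # => [:characters]
p House.singleton_class.include?(MiniConcernAssociations::ClassMethods) # => true
p MiniRecordBase.associations # => []
```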
Locating the “Metaprogramming Magic”
The build method first checks whether the name parameter is “dangerous”, and if so it raises an ArgumentError. The comment above the dangerous_attribute_method? method states that “a method name is ‘dangerous’ if it is already (re)defined by Active Record, but not by any ancestors. (So ‘puts’ is not dangerous but ‘save’ is.)”
Barring any unforeseen danger, the build method then passes all of its arguments to the create_builder method, which is called on an implicit receiver. In this context, self refers to the ActiveRecord::Associations::Builder::HasMany class: build was called on that class, even though build itself is defined in a superclass. The create_builder method checks that the name argument passed to it (:characters) is a symbol, then creates a new instance of the ActiveRecord::Associations::Builder::HasMany class. This object is stored as builder, a local variable in the build method above.
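The way self resolves here is ordinary Ruby class-method inheritance. The sketch below uses hypothetical classes (MiniBuilder, MiniHasManyBuilder) and a made-up DANGEROUS_NAMES list in place of dangerous_attribute_method?. It mirrors the shape of Builder::Association.build and create_builder, not their actual implementation.

```ruby
# Hypothetical builder base class, shaped like Builder::Association.
class MiniBuilder
  # Hypothetical stand-in for names Active Record already uses, such as save.
  DANGEROUS_NAMES = [:save, :destroy, :update].freeze

  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Defined once on the superclass, but `self` is whichever class received .build.
  def self.build(name)
    if DANGEROUS_NAMES.include?(name)
      raise ArgumentError, "#{name} would conflict with an existing method"
    end

    create_builder(name) # implicit receiver: self
  end

  def self.create_builder(name)
    raise ArgumentError, "association names must be a Symbol" unless name.is_a?(Symbol)

    new(name) # self.new, so a subclass instance when called on a subclass
  end
end

# Hypothetical subclass, shaped like Builder::HasMany.
class MiniHasManyBuilder < MiniBuilder
  def macro
    :has_many
  end
end

builder = MiniHasManyBuilder.build(:characters)
p builder.class # => MiniHasManyBuilder
p builder.macro # => :has_many
```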
Next, the build instance method (different from the build class method) is called on the builder object. The build instance method takes a model (House) as its argument.
def build(model)
  ActiveRecord::Reflection.create(macro, name, scope, options, model)
end
The only new thing here is macro, which is a method defined on the ActiveRecord::Associations::Builder::HasMany class.
def macro
  :has_many
end
The ActiveRecord::Reflection.create method chooses a reflection class based on the macro passed to it. In our case, it creates an instance of the ActiveRecord::Reflection::HasManyReflection class, which inherits from the ActiveRecord::Reflection::AssociationReflection class. I’m still not totally clear what reflections do in general, but this bit of documentation was helpful:
Reflection enables interrogating of Active Record classes and objects about their associations and aggregations. This information can, for example, be used in a form builder that takes an Active Record object and creates input fields for all of the attributes depending on their type and displays the associations to other objects.
The value returned is this instance of the ActiveRecord::Reflection::HasManyReflection class. Back in the build class method, this object is stored in the local variable reflection.
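Choosing a reflection class from the macro can be done with a plain case expression. Here is a minimal sketch with hypothetical toy classes inside a MiniReflection module. The class names echo Active Record's, but these versions only store their arguments and are not its implementation.

```ruby
# Hypothetical module with toy reflection classes.
module MiniReflection
  # Toy base class storing what the builder passes in.
  class AssociationReflection
    attr_reader :name, :scope, :options, :active_record

    def initialize(name, scope, options, active_record)
      @name = name
      @scope = scope
      @options = options
      @active_record = active_record
    end
  end

  class HasManyReflection < AssociationReflection; end
  class BelongsToReflection < AssociationReflection; end

  # Picks a reflection class based on the association macro.
  def self.create(macro, name, scope, options, model)
    klass =
      case macro
      when :has_many then HasManyReflection
      when :belongs_to then BelongsToReflection
      else raise ArgumentError, "unknown macro: #{macro.inspect}"
      end
    klass.new(name, scope, options, model)
  end
end

class House; end

reflection = MiniReflection.create(:has_many, :characters, nil, {}, House)
p reflection.class         # => MiniReflection::HasManyReflection
p reflection.name          # => :characters
p reflection.active_record # => House
```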
Once I knew what the model and reflection local variables were, it was relatively easy to locate the define_accessors, define_callbacks, define_validations, and define_extensions methods, as they were all in the same file as the build method. In this blog post, I’ll only get into the define_accessors method.
# Defines the setter and getter methods for the association
# class Post < ActiveRecord::Base
#   has_many :comments
# end
#
# Post.first.comments and Post.first.comments= methods are defined by this method...
def self.define_accessors(model, reflection)
  mixin = model.generated_association_methods
  name = reflection.name
  define_readers(mixin, name)
  define_writers(mixin, name)
end
def self.define_readers(mixin, name)
  mixin.class_eval <<-CODE, __FILE__, __LINE__ + 1
    def #{name}(*args)
      association(:#{name}).reader(*args)
    end
  CODE
end
def self.define_writers(mixin, name)
  mixin.class_eval <<-CODE, __FILE__, __LINE__ + 1
    def #{name}=(value)
      association(:#{name}).writer(value)
    end
  CODE
end
This was it. After much searching, I had found the “metaprogramming magic” at the core of Active Record associations.
Interpreting the “Metaprogramming Magic”
Again, I had to start with what I know here. Two arguments are passed to define_accessors: model and reflection. When the build method calls define_accessors, it uses the same names for these arguments. Back in the build method, model referred to the House Active Record model, and reflection referred to the instance of the ActiveRecord::Reflection::HasManyReflection class.
Moving on to the next line, I found that the generated_association_methods method is called on House and its return value is stored in a local variable, mixin. I tracked down the definition of generated_association_methods in the ActiveRecord::Core::ClassMethods module, alongside classics such as find and find_by.
def generated_association_methods
  @generated_association_methods ||= begin
    mod = const_set(:GeneratedAssociationMethods, Module.new)
    include mod
    mod
  end
end
I was really stumped by this one. For one thing, I was distracted by the begin...end block and the PascalCase symbol. I had to take a step back and remember this is just Ruby code! Soon, I realized that this method memoizes its result in a class instance variable, @generated_association_methods: the ||= operator runs the begin...end block only the first time, and every later call returns the stored module.
As it turns out, const_set is an instance method on the Module class in the Ruby core library. const_set takes two arguments. The first is a string or symbol giving the name of the new constant, and the second is the object that becomes the constant’s value. The constant is then namespaced under the receiver of the const_set message.
In this case, const_set has an implicit receiver (self). It’s tempting to think that self is the ActiveRecord::Core::ClassMethods module. But as above, this module’s methods become class methods of the House class when House inherits from ActiveRecord::Base, so the receiver here is actually House. Thus, this method creates a new module in the House namespace: House::GeneratedAssociationMethods. (const_set returns the value it assigns, so mod is simply the new module. Because that module started out anonymous, assigning it to a constant is also what gives it the name House::GeneratedAssociationMethods.)
On the next line of this method, the House::GeneratedAssociationMethods module is included into the House class. This is pretty awesome: I had no idea that you could call include from inside a class method. It makes sense, though. The receiver is the class itself, so using include here is no different from the typical use of include in a class body.
Finally, the House::GeneratedAssociationMethods module is returned, and so becomes the value of the @generated_association_methods class instance variable. Back in the define_accessors method, this module is stored in the mixin variable.
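The memoized const_set/include sequence works the same way outside Rails. This sketch copies the structure of generated_association_methods into a hypothetical MiniCore module. MiniCore is extended onto a hypothetical MiniRecord base class, standing in for how ActiveRecord::Core::ClassMethods reaches House, so its effects can be inspected directly.

```ruby
# Hypothetical module copying the structure of generated_association_methods.
module MiniCore
  # Same shape as generated_association_methods: create once, include, memoize.
  def generated_association_methods
    @generated_association_methods ||= begin
      mod = const_set(:GeneratedAssociationMethods, Module.new) # receiver is the calling class
      include mod # same as writing `include mod` in that class's body
      mod
    end
  end
end

# Hypothetical stand-in for an Active Record base class.
class MiniRecord
  extend MiniCore
end

class House < MiniRecord; end

mixin = House.generated_association_methods
p mixin.name # => "House::GeneratedAssociationMethods"
p House.include?(mixin) # => true
p House.generated_association_methods.equal?(mixin) # => true (memoized)
```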
Going on to the next line of the define_accessors method, the name message is sent to the reflection variable (our instance of the ActiveRecord::Reflection::HasManyReflection class). This object has an attr_reader for name, which is set upon initialization. If it’s not too far a stretch to remember, this name attribute is set to the original name passed as the first argument of the has_many method. A refresher:
class House < ActiveRecord::Base
  has_many :characters
end
The name is simply :characters!
define_accessors delegates to define_readers and define_writers, passing House::GeneratedAssociationMethods and :characters as the mixin and name arguments to both of these methods. For simplicity, I will just focus on the define_readers method.
A Metaprogrammed Method
From here, it is relatively easy to see what happens next. Let’s look at define_readers again:
def self.define_readers(mixin, name)
  mixin.class_eval <<-CODE, __FILE__, __LINE__ + 1
    def #{name}(*args)
      association(:#{name}).reader(*args)
    end
  CODE
end
Aside from a few things in the first line of this method, this actually looks pretty straightforward: we’ve got a new method definition on our hands!
On the first line of the method, class_eval is called on the House::GeneratedAssociationMethods module. class_eval is another instance method of the Module class in Ruby’s core library. It takes a string as an argument, along with optional filename and line number parameters. From the documentation:
class_eval(string [, filename [, lineno]]) → obj
Evaluates the string or block in the context of mod, except that when a block is given, constant/class variable lookup is not affected. This can be used to add methods to a class.
module_eval returns the result of evaluating its argument. The optional filename and lineno parameters set the text for error messages.
As suspected, class_eval lets us add methods to a module or class. In this case, we are adding methods to the House::GeneratedAssociationMethods module. This module was already included in House, and Ruby looks methods up through it at call time. So these new methods are immediately available to instances of the House class, even though they were defined after the include.
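Here is a plain-Ruby sketch of that behavior. It reproduces the define_readers and define_writers string templates against a hypothetical House class. Its association method returns a hypothetical FakeAssociation rather than a real Active Record association. The module is included first, and the accessors are added to it afterwards.

```ruby
# Hypothetical stand-in for an association object with reader/writer.
class FakeAssociation
  def initialize
    @target = []
  end

  def reader(*_args)
    @target
  end

  def writer(value)
    @target = value
  end
end

class House
  GeneratedAssociationMethods = Module.new
  include GeneratedAssociationMethods # included before any methods exist on it

  # Hypothetical simplification of ActiveRecord::Associations#association.
  def association(name)
    (@associations ||= {})[name] ||= FakeAssociation.new
  end
end

# Same string templates as define_readers and define_writers.
def define_readers(mixin, name)
  mixin.class_eval <<-CODE, __FILE__, __LINE__ + 1
    def #{name}(*args)
      association(:#{name}).reader(*args)
    end
  CODE
end

def define_writers(mixin, name)
  mixin.class_eval <<-CODE, __FILE__, __LINE__ + 1
    def #{name}=(value)
      association(:#{name}).writer(value)
    end
  CODE
end

define_readers(House::GeneratedAssociationMethods, :characters)
define_writers(House::GeneratedAssociationMethods, :characters)

stark = House.new
stark.characters = ["Arya", "Sansa"]
p stark.characters # => ["Arya", "Sansa"]
p House::GeneratedAssociationMethods.instance_methods.sort # => [:characters, :characters=]
```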
Let’s look at the parameters being passed to class_eval in define_readers:
- string: The string passed to class_eval is a heredoc delimited by <<-CODE ... CODE. Its contents are a dynamic method definition that will look like this in our example:
def characters(*args)
  association(:characters).reader(*args)
end
- filename: __FILE__ evaluates to the path of the source file in which it appears. Here, that is the Active Record builder file containing define_readers, not app/models/house.rb. The House::GeneratedAssociationMethods module was created dynamically and has no file of its own. This argument tells Ruby which file to report in error messages and backtraces that come from the generated method.
- lineno: The documentation is sparse here, only stating that __LINE__ refers to “The line number, in the current source file, of the current line.” The heredoc body starts on the line after the class_eval call. Passing __LINE__ + 1 therefore makes the line numbers reported for the generated method match where its source text actually sits in the builder file.
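To see what the filename and lineno arguments buy, here is a small sketch. The module and method names are hypothetical. It generates a failing method with class_eval and inspects where Ruby reports the error. The backtrace points at this file and at the physical line of the raise inside the heredoc, which is what passing __FILE__ and __LINE__ + 1 achieves.

```ruby
# Hypothetical module that receives a generated method.
module GeneratedHelpers; end

# Line number of the class_eval call; the heredoc body starts on the line after it.
CALL_LINE = __LINE__ + 1
GeneratedHelpers.class_eval <<-CODE, __FILE__, __LINE__ + 1
  def boom
    raise "generated method failed"
  end
CODE

# Returns the file and line Ruby reports for the error raised inside #boom.
def failure_location
  Object.new.extend(GeneratedHelpers).boom
rescue RuntimeError => e
  location = e.backtrace_locations.first
  [location.path, location.lineno]
end

path, line = failure_location
puts "#{path}:#{line}" # the line of `raise` above, i.e. CALL_LINE + 2
```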
Almost there! We’ve now got our method definitions, but what do association and reader refer to?
Associations, Caching, and Reflections
It turns out that association is an instance method of the ActiveRecord::Associations module.
def association(name) #:nodoc:
  association = association_instance_get(name)
  if association.nil?
    raise AssociationNotFoundError.new(self, name) unless reflection = self.class._reflect_on_association(name)
    association = reflection.association_class.new(self, reflection)
    association_instance_set(name, association)
  end
  association
end
The association method pulls the association named by its argument (in our case :characters) out of the @association_cache instance variable, if it has already been loaded into memory. As defined in the ActiveRecord::Core module, @association_cache is initialized to an empty Hash. Interestingly, this happens whenever an Active Record object is instantiated, for example with House.new!
If the association is not in the @association_cache (i.e. it hasn’t been loaded into memory), the association method checks the _reflections of the class where the association is defined (House). _reflections is a class_attribute that is also initialized to an empty hash, and the method looks for a reflection registered under the association’s name (:characters). If there is none, it raises an AssociationNotFoundError. The only way to add a reflection to this hash is through the add_reflection module method. Remember where this is called?
def has_many(name, scope = nil, options = {}, &extension)
  reflection = Builder::HasMany.build(self, name, scope, options, &extension)
  Reflection.add_reflection self, name, reflection
end
In our domain model, the second line of the has_many method adds the key "characters" to the _reflections hash. Its value is our reflection from above: the instance of the ActiveRecord::Reflection::HasManyReflection class.
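The look-up-then-cache flow of the association method can be sketched in plain Ruby. Everything here is hypothetical and simplified. A class-level reflections hash keyed by string stands in for _reflections, a Struct stands in for HasManyAssociation, and KeyError stands in for AssociationNotFoundError.

```ruby
# Hypothetical, simplified model base class.
class MiniModel
  # Hypothetical stand-in for the _reflections class attribute (keyed by string).
  def self.reflections
    @reflections ||= {}
  end

  # Registers a reflection under the association name, as add_reflection does.
  def self.has_many(name)
    reflections[name.to_s] = { macro: :has_many, name: name }
  end

  def initialize
    @association_cache = {} # every new instance starts with an empty cache
  end

  # Returns a cached association object, building it on first use.
  def association(name)
    association = @association_cache[name]
    if association.nil?
      reflection = self.class.reflections[name.to_s]
      raise KeyError, "association named #{name} was not found" unless reflection

      association = MiniAssociation.new(self, reflection)
      @association_cache[name] = association
    end
    association
  end
end

# Hypothetical stand-in for HasManyAssociation: remembers its owner and reflection.
MiniAssociation = Struct.new(:owner, :reflection)

class House < MiniModel
  has_many :characters
end

stark = House.new
first = stark.association(:characters)
p first.equal?(stark.association(:characters)) # => true (served from the cache)
p House.new.association(:characters).equal?(first) # => false (each record has its own cache)
```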
Let’s make an instance of the House class.
stark = House.create(surname: "Stark", sigil: "Direwolf", motto: "Winter is coming.")
# => #<House id: 1, surname: "Stark", sigil: "Direwolf", motto: "Winter is coming.", created_at: "2014-10-09 20:05:01", updated_at: "2014-10-09 20:05:01">
If we call stark.characters and the association isn’t cached yet, Rails checks whether "characters" is a key in House’s _reflections hash. If so, a new object is created based on the type of this reflection; for has_many, it’s an instance of the ActiveRecord::Associations::HasManyAssociation class. The association method then calls association_instance_set, passing :characters and this HasManyAssociation instance as its arguments. This key-value pair is added to the @association_cache instance variable:
def association_instance_set(name, association)
  @association_cache[name] = association
end
The association local variable is returned from the association method above, and is then sent the reader message. reader is defined in the ActiveRecord::Associations::CollectionAssociation class, the superclass of ActiveRecord::Associations::HasManyAssociation.
# Implements the reader method, e.g. foo.items for Foo.has_many :items
def reader(force_reload = false)
  if force_reload
    klass.uncached { reload }
  elsif stale_target?
    reload
  end
  @proxy ||= CollectionProxy.create(klass, self)
end
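The reader above optionally reloads, then memoizes the proxy with ||=. Here is a minimal sketch of that pattern. MiniCollectionAssociation and MiniProxy are hypothetical, load_target just counts how often the database would have been hit, and the stale_target? and uncached details are left out.

```ruby
# Hypothetical, simplified stand-in for CollectionAssociation.
class MiniCollectionAssociation
  attr_reader :query_count

  def initialize(rows)
    @rows = rows   # pretend these live in the database
    @target = nil  # loaded records, nil until first needed
    @query_count = 0
  end

  # Loads the records once, counting each simulated query.
  def load_target
    @target ||= begin
      @query_count += 1
      @rows.dup
    end
  end

  # Forgets loaded records and queries again.
  def reload
    @target = nil
    load_target
  end

  # Mirrors the shape of reader: optional reload, then a memoized proxy.
  def reader(force_reload = false)
    reload if force_reload
    @proxy ||= MiniProxy.new(self)
  end
end

# Hypothetical proxy that loads records lazily when asked for them.
MiniProxy = Struct.new(:association) do
  def to_a
    association.load_target
  end
end

assoc = MiniCollectionAssociation.new(["Arya"])
proxy = assoc.reader
p proxy.equal?(assoc.reader) # => true: same proxy object every time
p assoc.query_count          # => 0: nothing loaded until records are needed
proxy.to_a
proxy.to_a
p assoc.query_count          # => 1
assoc.reader(true)
p assoc.query_count          # => 2: force_reload queried again
```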
A Return Value at the End of the Tunnel
The last line of the reader method memoizes @proxy, which is the final value returned by the newly minted characters method. This @proxy instance variable holds exactly the kind of object Obie Fernandez alludes to when he says “proxy collection objects are created that let you manipulate the relationship easily.”
Here is where it gets interesting: the ActiveRecord::Associations::CollectionProxy class inherits from ActiveRecord::Relation, the class Active Record uses to build and run database queries. From here, we have to go all the way up the ancestor chain to the ActiveRecord::Delegation::ClassMethods module to find the create method used here. That create method builds an instance of a CollectionProxy subclass specific to the associated model, which is why the class name printed below is namespaced under Character.
Now when we call stark.characters, we know without a shadow of a doubt what our return value will be!
stark.characters
# => [#<Character id: 1, name: "Arya", age: 11, catchphrase: "Stick them with the pointy end.", house_id: 1, created_at: "2014-10-09 20:07:27", updated_at: "2014-10-09 20:07:27">]
Hmm. From inside the Rails console, this looks like an array. Let’s check out the class of this collection:
stark.characters.class
# => Character::ActiveRecord_Associations_CollectionProxy
Much better.
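The console output looks like an array because the console prints whatever inspect returns. A hypothetical, heavily simplified proxy (not Active Record's CollectionProxy) shows how an object can print like an array while being a different class entirely:

```ruby
# Hypothetical proxy: wraps an array and forwards inspect and iteration to it.
class MiniCollectionProxy
  include Enumerable

  def initialize(records)
    @records = records
  end

  def each(&block)
    @records.each(&block)
  end

  def size
    @records.size
  end

  # The console shows this string, so the proxy prints like the array it wraps.
  def inspect
    @records.inspect
  end
end

characters = MiniCollectionProxy.new(["Arya"])
p characters               # => ["Arya"]
p characters.class         # => MiniCollectionProxy
p characters.is_a?(Array)  # => false
p characters.map(&:upcase) # => ["ARYA"]
```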
Conclusion
This post was a good exercise for me to get more comfortable delving into the Rails source code, to begin to wrap my head around some of the legendary “Rails magic”, and to learn a bit more about metaprogramming in the wild. This is just the tip of the iceberg, but I no longer feel the Rails source code is untouchable. In fact, it’s all just a bunch of great Ruby code!