Radiant Page#find_by_url

by Jim

On a recent project a client asked about overriding Page#find_by_url and when that actually occurs. I think the answer should be explained for everyone working with it.

20 second summary

This is an in-depth look at the method that gathers pages within Radiant. In short, if you want to do special page finding, create a subclass of Page and write your own find_by_url method to adjust the way Radiant behaves. Every page will respond to this method and return appropriate pages according to the requested url. In the admin interface, you can select your special page type to make that page behave as you have specified.

Simple finding

find_by_url is defined both as a class method and an instance method. Let’s look at the class method Page.find_by_url from the Page model:

class << self
  def find_by_url(url, live = true)
    root = find_by_parent_id(nil)
    raise MissingRootPageError unless root
    root.find_by_url(url, live)
  end
  # ...
end

First, it looks for the root page, which is considered the page with no parent_id. If no root page is found it raises a MissingRootPageError exception; otherwise, it calls the instance method find_by_url on the root page.

This class method takes 2 arguments: the url (really the path matched in the routes from request) to be found, and a live flag which defaults to true (more about that later).

Finding the first page

The find_by_url instance method is a bit more complex. Let’s take a look:

def find_by_url(url, live = true, clean = true)
  return nil if virtual?
  url = clean_url(url) if clean
  my_url = self.url
  if (my_url == url) && (not live or published?)
    self
  elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)
    slug_child = children.find_by_slug($1)
    if slug_child
      found = slug_child.find_by_url(url, live, clean)
      return found if found
    end
    children.each do |child|
      found = child.find_by_url(url, live, clean)
      return found if found
    end
    file_not_found_types = ([FileNotFoundPage] + FileNotFoundPage.descendants)
    file_not_found_names = file_not_found_types.collect { |x| x.name }
    condition = (['class_name = ?'] * file_not_found_names.length).join(' or ')
    condition = "status_id = #{Status[:published].id} and (#{condition})" if live
    children.find(:first, :conditions => [condition] + file_not_found_names)
  end
end

Wow. There’s a lot going on there and there’s room for some refactoring, but for now let’s just walk through it.

First, nil will be returned if the page is virtual?. A page, by default, is not virtual. This is stored in the database in a boolean field, but you may override this in any subclass of Page that you create. For now, let’s assume that your page isn’t and won’t be virtual and we’ll get back to what it means.

Next, we clean the url if the clean flag is set to true (which it is by default). clean_url simply ensures that the url being checked is properly formatted: it gets a leading and a trailing slash, and any doubled slashes are collapsed. So ‘right//here////’ becomes ‘/right/here/’.
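To make that concrete, here’s a small standalone sketch of such a cleaner. The method body is my assumption about how a helper like this could be written to get the result described above; it is not copied from Radiant’s source.

```ruby
# Hypothetical standalone url cleaner: wrap the input in slashes,
# then collapse every run of slashes into a single slash.
def clean_url(url)
  "/#{url.strip}/".gsub(%r{/+}, '/')
end

puts clean_url('right//here////')  # prints "/right/here/"
puts clean_url('/')                # prints "/"
```

Normalizing first means the string comparison that follows only ever sees one spelling of a given path.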

The next step shows us why we clean the url. A local variable is set up to compare against the page’s url.

my_url = self.url
if (my_url == url) #...

What is a page’s url? It’s calculated from the page’s slug and the slugs of its ancestors. In short, if your current page’s slug is ‘here’, its parent page’s slug is ‘right’, and that page’s parent is the home page (with a slug of ‘/’), then your current page’s url is ‘/right/here/’.
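Here’s a rough sketch of that calculation using a tiny stand-in object. SketchPage is a name I made up for illustration; it only models a slug and a parent, and is not Radiant’s Page model.

```ruby
# SketchPage is a hypothetical stand-in with just a slug and a parent.
SketchPage = Struct.new(:slug, :parent) do
  def url
    # The root page has no parent, so its url comes from its slug alone.
    base = parent ? "#{parent.url}/#{slug}" : slug
    # Wrap in slashes and collapse any doubled slashes.
    "/#{base}/".gsub(%r{/+}, '/')
  end
end

home  = SketchPage.new('/', nil)
right = SketchPage.new('right', home)
here  = SketchPage.new('here', right)

puts home.url  # prints "/"
puts here.url  # prints "/right/here/"
```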

So we check whether that url is the same as the url in the request. In the same comparison, we also check that either the live flag is false or the page is published?.

This live flag is a bit strange in appearance:

my_url = self.url
if (my_url == url) && (not live or published?)

By default, this not live returns false (since live is true by default and we reverse it with not) so it moves on to published?. You might set live to false in other situations, but for now we’ll just go with this.
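If the double negative is hard to read, it may help to see all four combinations. This sketch pulls the condition out into a method with plain booleans; visible? is a hypothetical name, not something from Radiant.

```ruby
# Same logic as (not live or published?), written with plain booleans.
def visible?(live, published)
  !live || published
end

[[true, true], [true, false], [false, true], [false, false]].each do |live, published|
  puts "live=#{live} published=#{published} -> #{visible?(live, published)}"
end
# prints:
# live=true published=true -> true
# live=true published=false -> false
# live=false published=true -> true
# live=false published=false -> true
```

The only combination that rejects a page is a live request for an unpublished page.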

A page is published? if its status (as stored in the database) is the ‘Published’ Status.

So if the incoming url matches the current page’s url (which at the first pass is the root or home page), then we return with the current page:

my_url = self.url
if (my_url == url) && (not live or published?)
  self

Finding deeper pages

If it isn’t true that the incoming url and the current page’s url are equal, then we move on to the next step:

my_url = self.url
if (my_url == url) && (not live or published?)
  self
elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)

Here it matches the incoming url against a Regexp of the current page’s url. When it starts, we’re matching the root page which has a url of ‘/’. If that’s the incoming url, it would have been caught in the original if block, but we ended up at the elsif. The Regexp that’s used matches the next slug in the incoming url. So if the incoming url is ‘/right/here/’ then it will match the slug ‘right’.
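You can try the same Regexp outside of Radiant. In this sketch the values of url and my_url are hard-coded to match the example above; first_slug and next_slug are just illustrative variable names.

```ruby
url = '/right/here/'

# At the root page my_url is '/', so the capture is the first slug.
my_url = '/'
url =~ /^#{Regexp.quote(my_url)}([^\/]*)/
first_slug = $1
puts first_slug  # prints "right"

# One level down my_url is '/right/', so the capture is the next slug.
my_url = '/right/'
url =~ /^#{Regexp.quote(my_url)}([^\/]*)/
next_slug = $1
puts next_slug   # prints "here"
```

Regexp.quote keeps any special characters in the page’s url from being read as Regexp syntax.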

From that match, we look among the current page’s children for one with that slug (remembering that the current page is the root, with a slug of ‘/’):

elsif (url =~ /^#{Regexp.quote(my_url)}([^\/]*)/)
  slug_child = children.find_by_slug($1)

If it finds that slug_child, then we call find_by_url on that page to walk down the tree toward the final page that we want (which would be the page that responds to the url ‘/right/here/’ or, in this simple case, the page with a slug of ‘here’). If it finds the page, then it returns the found page:

  slug_child = children.find_by_slug($1)
  if slug_child
    found = slug_child.find_by_url(url, live, clean)
    return found if found
  end

In that if slug_child block, the slug_child.find_by_url call is recursive, so it acts as a loop down the tree. Because every page responds to this method and will do exactly what is happening here for the root page, each page will search its children for a slug matching the next slug from the incoming url, and any found page will likewise call find_by_url to search its children as well.
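Here’s a stripped-down sketch of that descent using an in-memory tree. TreePage is a hypothetical class for illustration only: it leaves out the live and clean flags, virtual pages, the fallback over all children and the 404 lookup, and it uses a plain array where Radiant uses an association.

```ruby
# TreePage is a hypothetical in-memory stand-in for Radiant's Page.
class TreePage
  attr_reader :slug, :children
  attr_accessor :parent

  def initialize(slug, children = [])
    @slug = slug
    @children = children
    children.each { |child| child.parent = self }
  end

  def url
    base = parent ? "#{parent.url}/#{slug}" : slug
    "/#{base}/".gsub(%r{/+}, '/')
  end

  # Only the slug-based descent is modelled here.
  def find_by_url(url)
    my_url = self.url
    if my_url == url
      self
    elsif url =~ /^#{Regexp.quote(my_url)}([^\/]*)/
      next_slug = $1
      slug_child = children.find { |child| child.slug == next_slug }
      # Each page repeats the same steps for its own children.
      slug_child && slug_child.find_by_url(url)
    end
  end
end

root = TreePage.new('/', [TreePage.new('right', [TreePage.new('here')])])

puts root.find_by_url('/right/here/').slug  # prints "here"
p root.find_by_url('/nope/')                # prints nil
```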

There is some room here for some optimization in the way we do a lookup for a page, but for now it works and we can get to the refactoring another time.

When no slug is found: customizing the finder

If no slug_child is found, or if the slug_child’s own find_by_url comes back empty, we move to the next step. This is where the magic happens for subclasses of Page:

  children.each do |child|
    found = child.find_by_url(url, live, clean)
    return found if found
  end

It calls find_by_url on each child of the current page and returns the first page that any of them finds.

So even if no page is found by its slug, we still give every child a chance to answer via find_by_url. Why would we do this?

The answer lies in one of the included extensions: Archive.

The ArchivePage is a subclass of Page which provides its own find_by_url method. ArchivePage#find_by_url will inspect the incoming url and, if it meets certain requirements (namely that there is a standard date format in the url such as ‘articles/2010/06/22’), it will find the appropriate page type such as ArchiveDayIndexPage, ArchiveMonthIndexPage or ArchiveYearIndexPage and return the proper page. If none of those are found, it just calls super, which runs the original Page#find_by_url.
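To give a feel for that kind of date matching, here’s a sketch of a helper that decides which index type a url points at. archive_index_type and its pattern are my own illustration of the idea, assuming four-digit years and two-digit months and days; the real extension’s matching may differ.

```ruby
# Hypothetical helper: given the archive page's own url and the incoming
# url, name the index page type that the date portion points at.
def archive_index_type(archive_url, url)
  pattern = %r{\A#{Regexp.quote(archive_url)}(\d{4})(?:/(\d{2})(?:/(\d{2}))?)?/?\z}
  match = pattern.match(url)
  return nil unless match  # not a date url; the caller would fall back to super

  if match[3]
    'ArchiveDayIndexPage'
  elsif match[2]
    'ArchiveMonthIndexPage'
  else
    'ArchiveYearIndexPage'
  end
end

puts archive_index_type('/articles/', '/articles/2010/06/22/')  # prints "ArchiveDayIndexPage"
puts archive_index_type('/articles/', '/articles/2010/06/')     # prints "ArchiveMonthIndexPage"
puts archive_index_type('/articles/', '/articles/2010/')        # prints "ArchiveYearIndexPage"
p archive_index_type('/articles/', '/articles/some-post/')      # prints nil
```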

This can act as a router for your custom page types. Say your url is ‘/products/1234’: you can create a ProductsPage which has its own find_by_url method and finds your ProductDetailsPage to display a standard view for any of your products based upon the slug ‘1234’, which I’d assume would be a product id but could be anything you want.
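As a sketch of that idea, here’s an in-memory version you can run without Radiant. BasePage is a hypothetical, heavily simplified stand-in for Page, and the product_id accessor is my own assumption for illustration. In a real extension you would subclass Page, keep the (url, live, clean) signature, and look up children through the association.

```ruby
# BasePage is a hypothetical, simplified stand-in for Radiant's Page.
class BasePage
  attr_reader :slug, :children
  attr_accessor :parent

  def initialize(slug, children = [])
    @slug = slug
    @children = children
    children.each { |child| child.parent = self }
  end

  def url
    base = parent ? "#{parent.url}/#{slug}" : slug
    "/#{base}/".gsub(%r{/+}, '/')
  end

  # Default finder: match this page, otherwise let every child try.
  def find_by_url(url)
    return self if self.url == url
    return nil unless url.start_with?(self.url)
    children.each do |child|
      found = child.find_by_url(url)
      return found if found
    end
    nil
  end
end

class ProductDetailsPage < BasePage
  attr_accessor :product_id  # assumed accessor, for illustration only
end

class ProductsPage < BasePage
  # Route '/products/<digits>/' to the details child; otherwise use the default.
  def find_by_url(url)
    if url =~ %r{\A#{Regexp.quote(self.url)}(\d+)/?\z}
      id = $1
      details = children.find { |child| child.is_a?(ProductDetailsPage) }
      if details
        details.product_id = id
        return details
      end
    end
    super
  end
end

details  = ProductDetailsPage.new('details')
products = ProductsPage.new('products', [details])
root     = BasePage.new('/', [products])

found = root.find_by_url('/products/1234/')
puts found.class       # prints "ProductDetailsPage"
puts found.product_id  # prints "1234"
```

Falling through to super keeps ordinary urls such as ‘/products/’ working exactly as they would for a plain page.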

Handling 404

Lastly, if none of this finds any pages to return, Radiant has a FileNotFoundPage type which allows you to easily create your own 404 error message for content that isn’t found. You can subclass FileNotFoundPage to provide your own behavior there too. Because each page looks for a FileNotFoundPage among its own children, Radiant will find deeply nested 404 pages. So you can create a FileNotFoundPage as a child of your root page, but you can also create a FileNotFoundPage as a child of your ProductsPage to return an appropriate message to someone looking for ‘/products/not-a-valid-url’.

Here’s the code for that last step:

  file_not_found_types = ([FileNotFoundPage] + FileNotFoundPage.descendants)
  file_not_found_names = file_not_found_types.collect { |x| x.name }
  condition = (['class_name = ?'] * file_not_found_names.length).join(' or ')
  condition = "status_id = #{Status[:published].id} and (#{condition})" if live
  children.find(:first, :conditions => [condition] + file_not_found_names)

The live flag comes into play here again and optionally allows you to find pages that are not published. By default live is true, so in this instance we only check for a FileNotFoundPage that is published.
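To see what those lines actually hand to the finder, here’s the same string building pulled out into a standalone method. not_found_conditions is a hypothetical name, ‘CustomNotFoundPage’ is a made-up subclass, and the status id of 100 is an assumed value for illustration; in Radiant these come from FileNotFoundPage.descendants and Status[:published].id.

```ruby
# Builds the conditions array the same way the snippet above does,
# with the class names and status id passed in instead of looked up.
def not_found_conditions(names, published_id, live)
  condition = (['class_name = ?'] * names.length).join(' or ')
  condition = "status_id = #{published_id} and (#{condition})" if live
  [condition] + names
end

names = ['FileNotFoundPage', 'CustomNotFoundPage']  # second name is hypothetical

p not_found_conditions(names, 100, true)
# prints ["status_id = 100 and (class_name = ? or class_name = ?)", "FileNotFoundPage", "CustomNotFoundPage"]

p not_found_conditions(names, 100, false)
# prints ["class_name = ? or class_name = ?", "FileNotFoundPage", "CustomNotFoundPage"]
```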

Radiant has a ‘development’ mode which would find unpublished pages, but that’s a subject for another discussion.

I hope this gives you a good understanding of how Radiant finds its content, and how you can easily bend it to behave differently by creating a subclass of Page and writing your own find_by_url method. If I’ve left anything out or if you want me to cover some other aspect, let me know in the comments.

Comments

Tristan said on Wednesday, June 23, 2010:

Thanks a lot for the clarification. I recently stumbled upon #find_by_url. Searched the Wiki, but couldn’t find anything useful. Actually I wasn’t sure if it is the correct approach to overwrite this method in extensions.

One problem I came across was that I needed to capture a substring of the url for later reference in custom tags (say, an ID). It didn’t feel right to initialize instance variables in a method called #find_by_url so I refactored it in private methods (say, #product_from_url). But still, this method is called in #find_by_url. Are there any other „hooks“?

What approach would you recommend for this scenario? What do you think about an API for extension developers providing something like params[:id]?

Jim Gay said on Wednesday, June 23, 2010:

Tristan,

You’ll have a @request object within your tags through which you can get the request_uri.

This is from an app developed on 0.6.9 originally (I think) but some example code is:

uri_parts = @request.request_uri.split(/\//)
tag.locals.resource_tag = MetaTag.find_by_name(CGI.unescape(uri_parts[-1]))

But I think we might even have a request method now that does the same thing (instead of the instance variable). I’d need to look back at the code, but allowing an accessor like params[:id] would be attractive if there is a standard use for it. I’m open to the idea, but currently unconvinced that there is a need for it. Do you have a project or extension I could view to get a feel for the use case?

A very long time ago I played with the idea of sending requests to controllers and then back to a Page, but never got anywhere satisfactory with it.

Edmund Haselwanter said on Thursday, September 09, 2010:

Is it possible to return a page_not_found from a tag definition?

I would like to use the uri and a tag parameter to determine if I can display something. if not I would like to do something like super in find_by_url

Jim Gay said on Tuesday, September 14, 2010:

You can use super.

What would you want with a page_not_found from a tag?

If you're collecting pages (for example) with r:page:each you'd just not expand the tag if a given page is not found. But I'm not sure I understand the use case. Can you elaborate?

1999 - 2014 © Saturn Flyer LLC 2321 S. Buchanan St. Arlington, VA 22206