
The ELC Community Blog

A knowledge exchange on Ruby on Rails and Agile Development

March 31, 2008

by jmoline

Advanced db_free_solr

If you've implemented the db_free_solr plugin in your application, it's probably safe to assume that performance is pretty important to you. For those of you who are looking to squeeze just a little bit more efficiency out of your Solr implementation, here's another feature you might want to know about.

The out-of-box functionality that db_free_solr provides, much the same as acts_as_solr, is designed to make things as easy as possible for you. To that end, the default behavior is to index and store every field you call out in your models. The truth, though, is that you won't need every field you index to be returned with the results. Likewise, some of the fields you want to have returned will, no doubt, not be necessary to index for searching purposes.

So, how do we go about avoiding unnecessary indexing and storing?

The answer is actually simpler than you might think. Acts_as_solr allows you to specify the datatypes of the fields you tell it to index (if not specified, the index assumes a datatype of text). Like so:

    acts_as_solr :fields => [{:field1 => :string}, {:field2 => :integer}, {:field3 => :date}]

So, what I've done is add a few datatypes to correspond with the newly available indexing options. As I said above, the default is to both index and store, so what I've added are datatypes to indicate that a field should be marked for storage only ("_so") or indexing only ("_io"). These can be used like so:

    acts_as_solr :fields => [{:field1 => :string_io}, {:field2 => :integer_so}, {:field3 => :date}]

It's also worth noting that you can specify datatypes for each and every field, or only a select few. In other words, this will work just fine:

    acts_as_solr :fields => [:field1, {:field2 => :integer_so}, :field3]

Here's a complete listing of the supported datatypes:

Symbol        Definition
:text         text, indexed and stored
:text_so      text, stored only
:text_io      text, indexed only
:string       string, indexed and stored
:string_so    string, stored only
:string_io    string, indexed only
:date         date, indexed and stored
:date_so      date, stored only
:date_io      date, indexed only
:integer      integer, indexed and stored
:integer_so   integer, stored only
:integer_io   integer, indexed only
:float        float, indexed and stored
:float_so     float, stored only
:float_io     float, indexed only
:boolean      boolean, indexed and stored
:boolean_so   boolean, stored only
:boolean_io   boolean, indexed only
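
To make the suffix convention concrete, here's a rough sketch of how a :fields list could be expanded into explicit settings. The expand_field_specs helper is hypothetical and is not how the plugin is implemented; it only illustrates the rules above: a bare symbol falls back to :text, "_so" switches indexing off, and "_io" switches storage off.

```ruby
# Hypothetical helper: NOT part of acts_as_solr or db_free_solr.
# Expands a :fields list into explicit type / indexed / stored settings.
def expand_field_specs(fields)
  fields.each_with_object({}) do |spec, expanded|
    # A bare symbol means "default datatype"; a hash maps name => datatype.
    pairs = spec.is_a?(Hash) ? spec.to_a : [[spec, :text]]
    pairs.each do |name, datatype|
      # Split an optional "_so" / "_io" suffix off the base datatype.
      base, suffix = datatype.to_s.match(/\A(.+?)(?:_(so|io))?\z/).captures
      expanded[name] = {
        :type    => base.to_sym,
        :indexed => suffix != "so",  # "_so" = stored only
        :stored  => suffix != "io"   # "_io" = indexed only
      }
    end
  end
end

specs = expand_field_specs([:field1, {:field2 => :integer_so}, {:field3 => :string_io}])
specs.each { |name, settings| puts "#{name}: #{settings.inspect}" }
```

Reading the output, field1 ends up as text that is both indexed and stored, field2 is an integer you can display but not search on, and field3 is a string you can search on but won't get back in the results.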

That's all for now. Happy Programming!

March 15, 2008

by jmoline

"Fixing" acts_as_solr

For what is probably an overwhelming majority of projects, the search functionality that is offered by the acts_as_solr plugin is more than adequate. It's really a neat little piece of work, powerful and easily implemented. Part of the reason it's so easy to implement, though, is that it makes some concessions for the sake of simplicity. Concessions that, for a small percentage of projects, may not be in the best interests of the application and its performance.

What are these concessions, you ask? Well, let's begin by quickly summarizing the basic functionality.

After you install the plugin in your application, pretty much all that's left is to define the indexed fields in each model. Once you've done that and built the indexes, you can go into script/console and run a command like this:

    Model.find_by_solr("text")

And it will return a collection of Model objects that relate to your search term.

It's what goes on behind the scenes, though, that should concern us. Acts_as_solr beautifully implements and utilizes the power of the Solr indexes to find relevant results, but before it returns a nice, easy-to-use collection to you, it lazy loads each individual object from the database using the primary key, which is the only thing that gets returned from the Solr search engine.
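
To see why that matters, here's a toy simulation of the pattern in plain Ruby. FakeDatabase and the hard-coded ids are stand-ins made up for this sketch; nothing here is acts_as_solr code. The point is only that the number of database lookups grows with the number of search hits.

```ruby
# Stand-ins for illustration only; none of this comes from acts_as_solr.
class FakeDatabase
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # One primary-key lookup counts as one query.
  def find(id)
    @query_count += 1
    @rows.fetch(id)
  end
end

db = FakeDatabase.new({1 => "First post", 2 => "Second post", 3 => "Third post"})

# Pretend these primary keys are all the search engine handed back.
matching_ids = [1, 3]

# Each hit is then loaded from the database individually.
posts = matching_ids.map { |id| db.find(id) }
puts "#{posts.size} results cost #{db.query_count} database lookups"
```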

It doesn't have to be that way, though. The Solr engine can pretty easily be configured to return more useful, descriptive results that can be rendered to the user without ever hitting the database. Once you know how to set up the Solr engine to do that, though, the question becomes, can the acts_as_solr plugin take advantage of it?

The short answer is no (this is where the concessions I mentioned earlier come into play).

But this would be a really lame post if that were the end of the story, now, wouldn't it?

This seems like a pretty good time to mention db_free_solr, a new plugin that extends acts_as_solr to make it easier to harness the power of the Solr search engine and to minimize those resource-snarfing database hits.

Unfortunately, in rolling back those concessions I mentioned, search engine setup is now going to require a little more work on the part of the developer. Not a lot, mind you, but it's definitely not plug-n-play anymore. Let's run through an example, shall we? (For the sake of this example, I'm going to assume that acts_as_solr is already installed.)

1. To start, you'll need to install the plugin:

    ./script/plugin install http://dbfreesolr.rubyforge.org/svn/tags/1.0/db_free_solr/

2. Next, you'll need to copy the schema.xml file from the db_free_solr/lib/ directory into your acts_as_solr/solr/solr/conf/ directory. Want to know what's different in there? Click here.

3. Reload your Solr indexes. This can be done on individual models by going into ./script/console and typing:

    Model.rebuild_solr_index

4. The next step is to make sure that the db_free_solr plugin doesn't get loaded until after acts_as_solr.

5. In theory, you're good to go. In reality, there's probably a little more work to do.
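
For step 4, one way to pin the load order on Rails 2.x is the config.plugins setting in config/environment.rb. This is a config fragment rather than something from the plugin's own docs, and the two symbols are an assumption: they need to match the plugin directory names under vendor/plugins. Rails normally loads plugins in alphabetical order, so acts_as_solr may already come first, but spelling it out removes the guesswork.

```ruby
# config/environment.rb (Rails 2.x)
Rails::Initializer.run do |config|
  # Load acts_as_solr first, then db_free_solr, then every other plugin.
  # Assumption: these symbols match the directory names in vendor/plugins.
  config.plugins = [ :acts_as_solr, :db_free_solr, :all ]
end
```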

Here's where that extra effort on your part is required. We'll have to do some extra fiddling to account for the limitations of the data returned by Solr. First of all, you need to realize that Solr can only return fields that have been added to the index. What does that mean, exactly? Well, for one thing, it means that we may want to index fields that aren't needed for searching but are needed for display. It also means that you won't be able to access related objects via those convenient Rails associations (has_many, belongs_to, etc.). So, what do you do?

I resolved most of these issues using a little creative indexing. For instance, let's say the search returned a blog post that has an associated user (the author of the post) and I want to display the author's display name along with the title. With out-of-box acts_as_solr, the whole model object was returned, so I could simply type post.user.display_name. But with the new Solr returned results, I can't access that associated user object. So, instead, I create the following method in the post model:

    # Exposes the author's display name as a field that can be indexed.
    def user_display_name
      user.display_name
    end

All you have to do now is add :user_display_name to the indexed fields and voila! The value gets returned along with the rest of the results.
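
Here's a self-contained sketch of the idea, using plain Ruby Structs in place of the ActiveRecord models. The solr_document_for helper is hypothetical; it only illustrates the general notion that each indexed field name is called as a method on the record when the document is built. This version of user_display_name also returns nil for a post with no author rather than raising, which is a choice of mine and not something the plugin requires.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models, for illustration only.
User = Struct.new(:display_name)
Post = Struct.new(:title, :user) do
  # Flattens the association into a simple value that can be indexed.
  # Returns nil when the post has no author.
  def user_display_name
    user ? user.display_name : nil
  end
end

# Hypothetical helper: builds the document to index by calling each
# indexed field name as a method on the record.
def solr_document_for(record, indexed_fields)
  indexed_fields.each_with_object({}) do |field, doc|
    doc[field] = record.send(field)
  end
end

post = Post.new("Hello Solr", User.new("Jane Doe"))
doc  = solr_document_for(post, [:title, :user_display_name])
puts doc.inspect
```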

As another example, let's pretend you're returning results that need to include a thumbnail (as for a profile image, for instance). With out-of-box acts_as_solr, you might have referenced it like this:

    image_tag(results.docs[0].profile_photo.public_filename(:thumb))

Now, you instead create a method in the profile object like so:

    # Exposes the thumbnail path as a plain string that can be indexed.
    def profile_thumb
      profile_photo.public_filename(:thumb).to_s
    end

Add it to the indexed fields (:profile_thumb) and reference it when you display the results like this:

    image_tag(results.docs[0].profile_thumb)

You may be wondering if there's something different or special about how you access this new data. The answer is: not really. I tried to make this change as seamless as possible for you, the developer, to implement. To that end, the search methods return a collection of classes that attempt to behave like the models they represent. They do a pretty decent job, too, until you try to access a method that doesn't correspond to an indexed field.

So, you can call and process the results from Solr exactly like you used to:

    results = Post.find_by_solr("text").docs
    results.each do |doc|
      puts doc.inspect
    end

If you get a missing method error (a NoMethodError), go back and check that the field you're calling is in your indexed fields.
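
If you're curious how an object can "behave like the model" for indexed fields and fail for everything else, here's one way it could be done. SolrResultDoc is a made-up class for this sketch, and the plugin's real result class may work differently; it simply answers any method whose name matches a returned field and lets Ruby raise NoMethodError for the rest.

```ruby
# Hypothetical stand-in for a result object; db_free_solr's real class may differ.
class SolrResultDoc
  def initialize(returned_fields)
    @returned_fields = returned_fields
  end

  # Answer any method whose name matches a field that came back from Solr.
  def method_missing(name, *args)
    return @returned_fields[name] if @returned_fields.key?(name)
    super  # anything else raises NoMethodError
  end

  def respond_to_missing?(name, include_private = false)
    @returned_fields.key?(name) || super
  end
end

doc = SolrResultDoc.new({:title => "Hello Solr", :user_display_name => "Jane Doe"})
puts doc.title

begin
  doc.comments  # never indexed, so there is nothing to return
rescue NoMethodError => e
  puts "Not indexed: #{e.name}"
end
```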

For those interested in performance statistics:

For Model.find_by_solr, running the same query (which returned over 900 results) 1000 times, db_free_solr outperformed acts_as_solr alone by, on average, just shy of 8% (7.938%).

For Model.multi_solr_search, running the same query (which, again, returned over 900 results from different models) 1000 times, db_free_solr outperformed acts_as_solr alone by an average of 51%.
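
If you'd like to run a comparison of your own, a minimal timing harness might look like the sketch below. It is not the script behind the figures above, and how those percentages were calculated isn't stated here, so the percent_faster formula is an assumption. The two lambdas are placeholder workloads; in a Rails console you would swap in the search calls you want to compare.

```ruby
require "benchmark"

# Times `runs` repetitions of the given block; returns total seconds elapsed.
def time_runs(runs)
  Benchmark.realtime { runs.times { yield } }
end

# Percentage by which `candidate` seconds improves on `baseline` seconds.
# Assumed formula: (baseline - candidate) / baseline * 100.
def percent_faster(baseline, candidate)
  (baseline - candidate) / baseline.to_f * 100.0
end

# Placeholder workloads standing in for the two searches being compared.
baseline_query  = lambda { (1..2_000).map { |i| i.to_s }.size }
candidate_query = lambda { (1..1_000).map { |i| i.to_s }.size }

baseline  = time_runs(1000) { baseline_query.call }
candidate = time_runs(1000) { candidate_query.call }
puts format("candidate is %.1f%% faster", percent_faster(baseline, candidate))
```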

That's all I have for you this time. Feel free to comment if you have any questions.

Happy Programming!

