Description: Solr-powered search for Ruby objects

Setting up classes for search and indexing

For objects to be indexed and searched using Sunspot, their class must be configured for search. Configuration tells Sunspot which fields to index, how to get the data for those fields, and a few other class-specific options. To configure a class for Sunspot, use the Sunspot.setup method. Let’s start with an example, a Post model for a blog, followed by a discussion of what it contains:


Sunspot.setup(Post) do
  text :title, :boost => 2.0
  text :body
  text :author_names do
    authors.map { |author| author.full_name }
  end
  string :title, :stored => true
  integer :blog_id, :references => Blog
  integer :category_ids, :references => Category, :multiple => true
  float :average_rating
  time :published_at
  boolean :featured, :using => :featured?
  boost { featured? ? 2.0 : 1.0 }
end

Ways to populate field data

When Sunspot indexes a Ruby object, it extracts data from that object based on the setup and creates a Solr document. Sunspot has two methods of extracting field data, known as attribute extraction and block extraction.

Most of the fields configured in the above setup use attribute extraction, which is to say they simply call a method on the object and index the return value. By default, the method name used is the same as the name of the field; however, a different method name can be specified with the :using option. For example, the above setup indexes the return value of the #featured? method in a field called featured.
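The lookup described above can be pictured with a small plain-Ruby sketch. This is not Sunspot's implementation: the `extract_attribute` helper and the `Post` struct are hypothetical names used only to illustrate how a field name, or the method given with :using, might be turned into a method call.

```ruby
# Hypothetical stand-in for a model; not part of Sunspot.
Post = Struct.new(:title, :is_featured) do
  def featured?
    is_featured
  end
end

# Illustrative helper: call the method named by :using if given,
# otherwise the method with the same name as the field.
def extract_attribute(object, field_name, options = {})
  method_name = options.fetch(:using, field_name)
  object.public_send(method_name)
end

post = Post.new("Hello Solr", true)
extract_attribute(post, :title)                          # calls post.title
extract_attribute(post, :featured, :using => :featured?) # calls post.featured?
```

In this sketch the field keeps its own name (featured) while the data comes from a differently named method, which is the situation the :using option is meant for.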

The :author_names field in the above setup is an example of block extraction: the given block is evaluated in the context of the object being indexed, and its return value is indexed as field data. Block extraction is useful when the data for the field is needed only for indexing; you can keep the logic in the search definition and avoid polluting your object’s method namespace.

Note that in block extraction, as with all of Sunspot’s DSL blocks, your block can take an argument. In that case the object being indexed is passed as the argument and the block is evaluated in the calling context. So the following is equivalent to the :author_names definition above:


text :author_names do |post|
  post.authors.map { |author| author.full_name }
end
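One common way for a Ruby DSL to support both block styles is to check the block's arity. The sketch below assumes that approach; it is not taken from Sunspot's source, and `extract_with_block`, `Author`, and `PostWithAuthors` are hypothetical names.

```ruby
# Hypothetical stand-ins for models; not part of Sunspot.
Author = Struct.new(:full_name)
PostWithAuthors = Struct.new(:authors)

# Illustrative helper: a block with no parameters is evaluated with the
# object as self; a block that declares a parameter is called normally
# (keeping the caller's self) and receives the object as its argument.
def extract_with_block(object, &block)
  if block.arity.zero?
    object.instance_eval(&block)
  else
    block.call(object)
  end
end

post = PostWithAuthors.new([Author.new("Ada Lovelace"), Author.new("Alan Turing")])

implicit = extract_with_block(post) { authors.map { |author| author.full_name } }
explicit = extract_with_block(post) { |p| p.authors.map { |author| author.full_name } }
```

Both calls produce the same list of names. The argument form is the one to choose when the block needs methods or variables from the surrounding code, since its self is left untouched.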

Text Fields

The first three fields defined above, :title, :body, and :author_names, are text fields. When text fields are indexed, they are broken up into their constituent words and then processed using a definable set of filters (with Sunspot’s default Solr installation, they’re just lower-cased). This process is known as tokenization, and it’s what allows text fields to be searched using fulltext matching. You can read more about tokenization and the available filter options on the Solr wiki.
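As a rough picture of what tokenization does, here is a plain-Ruby sketch that splits text into words and lower-cases them. It is only an approximation under simple assumptions; Solr's real analysis chain is configured in its schema and may behave differently, and `tokenize` is a hypothetical helper.

```ruby
# Illustrative only: lower-case the text and pull out runs of word
# characters. Real Solr analysis is configurable and may do much more.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

tokens = tokenize("Solr-powered Search for Ruby Objects")
# tokens is ["solr", "powered", "search", "for", "ruby", "objects"]

# A fulltext match then reduces to a lookup among the tokens.
tokens.include?("ruby") # true, regardless of the original capitalization
tokens.include?("rub")  # false, partial words are not tokens
```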

Boost

When text fields are searched, each document is assigned a relevance score based on where the searched words appear in the document, how many times they appear in the document, and how common they are in the index as a whole. You can shape the relevance score by assigning boost, which at the field level tells Solr to assign higher relevance to search terms found in the field. For example, finding a search term in the :title field above would give a result document a higher score than finding the same search term in the :body field.
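The effect of field boost can be shown with a deliberately simplified toy scorer. This is not Solr's scoring formula: it only counts term occurrences per field and weights them by a boost, ignoring how common the term is across the index. `FIELD_BOOSTS` and `toy_score` are hypothetical names, and the boost values mirror the Post example above.

```ruby
# Toy scoring: count occurrences of the term in each field and weight
# the count by that field's boost. Solr's real scoring also accounts
# for term rarity across the index, which is ignored here.
FIELD_BOOSTS = { :title => 2.0, :body => 1.0 }

def toy_score(document, term)
  FIELD_BOOSTS.sum do |field, boost|
    words = document.fetch(field, "").downcase.scan(/\w+/)
    boost * words.count(term.downcase)
  end
end

title_match = { :title => "Indexing with Solr", :body => "An introduction." }
body_match  = { :title => "An introduction",    :body => "Indexing with Solr." }

toy_score(title_match, "solr") # 2.0
toy_score(body_match, "solr")  # 1.0
```

With the same term appearing once in each document, the one that matches in the boosted :title field ends up with the higher score.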

In addition to field boost, you can assign a document boost, which makes certain documents globally more relevant than others, regardless of search terms. Use the boost method in the DSL to assign document boost, as in the last line of the block in the Post example. As with field definitions, document boost can be extracted using attribute or block extraction; the Post example uses block extraction. Document boost can also be specified statically, giving all objects of the class under setup the same boost:


Sunspot.setup(Comment) do
  boost 1.2
end

Attribute Fields

Attribute fields are the focus of most of the other components of search: scoping, faceting, ordering, etc. The fields :title, :blog_id, :category_ids, :average_rating, :published_at, and :featured are all attribute fields. Unlike text fields, attribute fields are not tokenized: they are indexed and searched verbatim, similarly to how columns are saved and queried in a relational database.

Attribute fields are also typed; the available types are string, integer, float, time, date, and boolean. As illustrated by the example above, the method called in the DSL is the type of the field being defined. There is also a special type class, which is used to store the class name of each indexed object; it should not be used explicitly.

Attribute field definitions can take a number of options:

:multiple
Boolean: Whether the field should index multiple values (the method/block used to generate the data returns an Array). Multiple-value fields cannot be used for sorting, because a document with several values has no single value to order by.
:references
Class: Indicates that the values in this field act as a primary key for the specified class. This allows Sunspot to populate facet rows for this class with the referenced instance. See Drilling down with facets for more information.
:stored
Boolean: If true, the values in this field will be stored as well as indexed. When results are retrieved, stored field values are available in Hit objects, and can be accessed without making a database round-trip to populate the actual result instance.
Last edited by outoftime, Fri Jul 24 05:23:16 -0700 2009