
Apache Solr

Enterprise search platform


from the Apache Lucene project

Rivet Logic Corporation


1800 Alexander Bell Drive
Suite 400
Reston, VA 20191
Ph: 703.955.3480 Fax: 703.234.7711

What is Solr?

Search Server
Built upon Apache Lucene
Very fast
Scalable in both query load and collection size
Interoperable
Extensible
Lucene power exposed over HTTP
Spell checking, highlighting, faceting, etc.
Caching
Replication
Distributed search

How stuff works?

schema.xml

Field types
<fieldType name="text" class="solr.TextField" indexed="true" />

Fields
<field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>

Unique key (optional)


<uniqueKey>id</uniqueKey>

copy fields
<copyField source="developers" dest="df"/>

dynamic fields
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>

similarity configuration
Similarity is the scoring routine for each document vs. a query
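
To make the dynamic field idea concrete, here is a minimal sketch in plain Java of how a pattern such as *_dt could be matched against an incoming field name. It is not Solr's code: the class and method names are hypothetical, only a single leading or trailing * is handled, and the "first matching pattern wins" rule is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a simplified model of dynamicField pattern matching.
class DynamicFieldSketch {

    // Handles "*", "*suffix", "prefix*", or an exact name.
    static boolean matches(String pattern, String fieldName) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    // Returns the type of the first pattern that matches, or null if none does.
    // (Assumption: iteration order decides; real Solr has its own precedence rules.)
    static String resolveType(Map<String, String> dynamicFields, String fieldName) {
        for (Map.Entry<String, String> e : dynamicFields.entrySet()) {
            if (matches(e.getKey(), fieldName)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> dynamicFields = new LinkedHashMap<>();
        dynamicFields.put("*_dt", "date");
        System.out.println(resolveType(dynamicFields, "created_dt"));
        System.out.println(resolveType(dynamicFields, "title"));
    }
}
```

With the *_dt pattern from the slide, a field named created_dt would be treated as a date field without being declared explicitly.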

solrconfig.xml
Lucene indexing parameters
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>

Cache settings
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

Request handler configuration


<requestHandler name="dismax" class="solr.SearchHandler" >

HTTP cache settings


<httpCaching lastModifiedFrom="openTime" etagSeed="Solr">

Search components, response writers, query parsers


<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
<queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
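
The cache settings above refer to a least-recently-used (LRU) eviction policy with a fixed maximum size. The following is a minimal sketch of that policy in plain Java, built on LinkedHashMap. It is not the solr.LRUCache implementation: the class name is hypothetical and autowarming is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a bounded map that evicts the least recently used entry.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int initialSize, int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evict once the configured size is exceeded
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2, 2);
        cache.put("q=solr", "result A");
        cache.put("q=lucene", "result B");
        cache.get("q=solr");               // touch the first entry
        cache.put("q=liferay", "result C"); // evicts "q=lucene"
        System.out.println(cache.keySet());
    }
}
```

In this model, size corresponds to the maximum number of cached entries and initialSize to the initial capacity of the backing map.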

Request Handler
<requestHandler name="/itas" class="solr.SearchHandler">
<lst name="defaults">
<str name="v.template">browse</str>
<str name="v.properties">velocity.properties</str>
<str name="title">Solritas</str>
<str name="wt">velocity</str>
<str name="defType">dismax</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="facet">on</str>
<str name="facet.field">df</str>
<str name="facet.mincount">1</str>
<str name="hl">true</str>
<str name="hl.fl">developers</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
</lst>
</requestHandler>
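
The defaults block supplies parameter values that apply whenever the request itself does not specify them. A small plain-Java sketch of that merging, plus parsing of a qf string into per-field boosts, might look as follows. The class and method names are hypothetical, and the sketch assumes single-valued parameters for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models handler defaults and qf boosts with plain maps.
class HandlerDefaultsSketch {

    // Request parameters win; defaults fill in whatever the request omits.
    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(request);
        return merged;
    }

    // Parses "text^0.5 name^1.2" into {text=0.5, name=1.2}; a bare field gets boost 1.0.
    static Map<String, Float> parseQf(String qf) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String part : qf.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            String[] fieldAndBoost = part.split("\\^");
            float boost = fieldAndBoost.length > 1 ? Float.parseFloat(fieldAndBoost[1]) : 1.0f;
            boosts.put(fieldAndBoost[0], boost);
        }
        return boosts;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("rows", "10");
        defaults.put("wt", "velocity");
        Map<String, String> request = new LinkedHashMap<>();
        request.put("q", "solr");
        request.put("rows", "5");
        System.out.println(effectiveParams(defaults, request));
        System.out.println(parseQf("text^0.5 features^1.0 name^1.2"));
    }
}
```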

Response Writer
A Response Writer generates the formatted response of a search.
The wt parameter selects the Response Writer to be used:
json, php, phps, python, ruby, xml, xslt, velocity
<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>

Analyzers, Tokenizers, Filters


The Analyzer class is a native Lucene concept that determines
how tokens are produced from a piece of text
<fieldType name="nametext" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
</fieldType>

The job of a tokenizer is to break up a stream of text into tokens
A filter looks at each token in the stream sequentially and decides whether to pass it along, replace it or discard it
<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
</analyzer>
</fieldType>
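
The tokenizer-then-filters pipeline can be sketched in plain Java without any Lucene classes. The example below is only an analogy: a whitespace tokenizer followed by a lowercasing filter and a stop-word filter, where a filter returns a replacement token or null to discard. All names and the stop-word list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Illustrative only: a toy analysis chain, not the Lucene Analyzer API.
class AnalysisChainSketch {

    // Tokenizer: breaks the text into tokens on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter step: each token is passed along, replaced, or discarded (null).
    static List<String> applyFilter(List<String> tokens, Function<String, String> filter) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String result = filter.apply(t);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }

    static List<String> analyze(String text) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        List<String> tokens = tokenize(text);
        tokens = applyFilter(tokens, t -> t.toLowerCase(Locale.ROOT));        // replace
        tokens = applyFilter(tokens, t -> stopWords.contains(t) ? null : t);  // discard
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Power of Lucene"));
    }
}
```

The order of filters matters here: lowercasing runs first so that "The" is recognised as a stop word.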

Other features
Highlighting
&hl=true&hl.fl=developers

Synonyms
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
expand="true"/>

Spell check
The spell check component can return a list of alternative spelling
suggestions.
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

Content Streams
Allows the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml

Solr Cell
Uses Tika to extract and index rich documents such as Word, PDF, HTML, and many other types

More like this


http://wiki.apache.org/solr/MoreLikeThis

Indexing with solrJ

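import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Connect to the Solr instance over HTTP
SolrServer solr =
new CommonsHttpSolrServer(
new URL("http://localhost:8983/solr"));
// Build a document field by field and send it to the index
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);
solr.commit(); // after a batch, not per document
solr.optimize(); // periodically, if/when needed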
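
For comparison, the same document can be expressed in Solr's XML update format, the <add><doc> structure used by the files in exampledocs. The sketch below builds such a message with plain Java string handling. The class and method names are hypothetical, and it only produces the message text; it does not send anything to a server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds an <add><doc>...</doc></add> update message as a string.
class UpdateXmlSketch {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "EXAMPLEDOC01");
        doc.put("title", "NOVAJUG SolrJ Example");
        System.out.println(toAddXml(doc));
    }
}
```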

Data Import Handler


Indexes relational databases, XML data, and e-mail sources
Supports full and incremental/delta indexing
Highly extensible with custom data sources, transformers, etc.
http://wiki.apache.org/solr/DataImportHandler

Replication
The master is polled by the replicants
Each replicant pulls the Lucene index and, optionally, Solr configuration files
Query throughput scaling: replicate and load balance
http://wiki.apache.org/solr/SolrReplication
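
The "replicate and load balance" point can be illustrated with a simple round-robin chooser that spreads queries across replicant URLs. This is a sketch under assumptions: the class name and host names are hypothetical, and a real deployment would more likely use a dedicated load balancer with health checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: picks replicant URLs in rotation for query traffic.
class RoundRobinSketch {
    private final List<String> replicaUrls;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> urls) {
        if (urls.isEmpty()) {
            throw new IllegalArgumentException("at least one replica URL is required");
        }
        this.replicaUrls = new ArrayList<>(urls);
    }

    // Thread-safe; floorMod keeps the index valid even after integer overflow.
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), replicaUrls.size());
        return replicaUrls.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical replicant hosts
        RoundRobinSketch balancer = new RoundRobinSketch(Arrays.asList(
                "http://replica1:8983/solr", "http://replica2:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.pick());
        }
    }
}
```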

Demo
Download solr
http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/

Start solr
cd <solr_home>/example
java -jar start.jar

Post documents
cd <solr_home>/example/exampledocs
java -jar post.jar *.xml
java -jar post.jar cw.xml

Access Solr
http://localhost:8983/solr/admin/

Querying solr
http://localhost:8983/solr/select/?q=binesh
http://localhost:8983/solr/select/?q=binny
http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
http://localhost:8983/solr/itas/
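
Query URLs like the ones above are just a base URL plus URL-encoded parameters. A minimal sketch of assembling them in plain Java follows. The class and method names are hypothetical, and the sketch only builds the string rather than issuing the HTTP request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: assembles a /select URL from a parameter map.
class SelectUrlSketch {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append("/select/?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "binesh");
        params.put("facet", "true");
        params.put("facet.field", "df");
        params.put("facet.mincount", "1");
        System.out.println(build("http://localhost:8983/solr", params));
    }
}
```

Encoding matters once a query contains spaces or characters such as the colon in a field:value query.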

Luke
http://www.getopt.org/luke/

Liferay + Solr: Motivation

Centralizing the search index in a clustered Liferay environment
Performance improvement
Re-indexing costs too much for large databases
Often the indexes of Liferay deployments in a cluster are not synchronized

Liferay + Solr: Configuration 1


Install Solr (http://lucene.apache.org/solr)
Setting up environment variables
SOLR_HOME = /${solr installed folder}
JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"

solr.xml
Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content

<?xml version="1.0" encoding="utf-8"?>


<Context docBase="$SOLR_HOME/apache-solr-1.4.0.war"
debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String"
value="$SOLR_HOME" override="true" />
</Context>

Liferay + Solr: Configuration 2


schema.xml
This file tells Solr how to index the data coming from Liferay, and can be
customized for your installation.
Copy this file from solr-web plugin to $SOLR_HOME/conf (you may have
to create the conf directory) in your Solr home folder.
... <fields>
<field name="comments" type="text" indexed="true" stored="true" />
<field name="content" type="text" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="properties" type="text" indexed="true" stored="true" />
<field name="title" type="text" indexed="true" stored="true" />
<field name="uid" type="string" indexed="true" stored="true" />
<field name="url" type="text" indexed="true" stored="true" />
<field name="userName" type="text" indexed="true" stored="true" />
<field name="version" type="text" indexed="true" stored="true" />
<dynamicField name="*" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>uid</uniqueKey>
<defaultSearchField>content</defaultSearchField>
... <copyField source="comments" dest="content"/> ... ...

Liferay + Solr: Configuration 3

Copy WAR file


Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war
into $SOLR_HOME/example, where ${solr.version} is the Solr
version number, e.g., 1.4.0.

Start Liferay/tomcat
Solr will be picked up and "solr" will be deployed automatically under
the ${tomcat}/webapps folder

Install solr-web Liferay plugin


The latest Liferay plugin can be checked out from the following location:
http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web

Build the checked out plugin and deploy it

Liferay + Solr: Configuration 4

Final Step
We need to rebuild Liferay search indexes
Control Panel > Server Administration

Liferay + Solr: How it works

solr-spring.xml (from solr-web plugin)


...
<bean id="solrServer"
class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
<constructor-arg type="java.lang.String"
value="http://localhost:8080/solr" />
</bean>
<bean id="indexSearcher.solr"
class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
<property name="solrServer" ref="solrServer" />
</bean>
<bean id="indexWriter.solr"
class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
<property name="commit" value="true" />
<property name="solrServer" ref="solrServer" />
</bean>
...

Liferay + Solr: Back to the default?

Simply undeploy solr-web plugin


Rebuild the search indexes using the Control Panel, as described
in the previous step
