Why you should use Apache Solr

by Kalyan Bl Analyst
Apache Solr is both a search engine and a distributed document database with SQL support. Here's how to get started.

Apache Solr is a subproject of Apache Lucene, the indexing library behind most of the search and index technology created in recent years. Solr is a search engine at heart, but it is much more than that. It is a NoSQL database with transactional support. It is also a document database that offers SQL support and executes queries in a distributed manner.

Sound interesting? Join me for a closer look. (Full disclosure: I work for Lucidworks, which employs many of the key contributors to the Solr project.)

You need a decent machine (or just use an AWS instance), ideally with 8GB or more of RAM. You can find Solr at http://lucene.apache.org/solr. You also need version 8 of the Java Virtual Machine. Unzip or untar Solr into a directory, then make sure that JAVA_HOME is set and that the java binary is in your path. Change to the directory Solr is in and type bin/solr start -e cloud -noprompt. This starts a two-node cluster on your laptop with a sample collection called gettingstarted already loaded.

A normal startup would just be bin/solr start -c to start Solr in “cloud” mode. But if you’re going to kick the tires you really want to see a multi-node install even if it is on your own laptop. Solr Cloud is the way you want to run a modern Solr install. If you start without the -c you’ll start in legacy mode. That is a bad thing.
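
As a rough illustration of the setup steps above, the following Python sketch checks the two prerequisites mentioned (JAVA_HOME and a java binary on the path) and assembles the two bin/solr invocations from the text. The helper names and the "solr" install directory are hypothetical; only the command-line flags come from the article. It builds the command without running it.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    if which("java") is None:
        problems.append("java binary is not on PATH")
    return problems


def start_command(solr_dir, example_cluster=True):
    """Build the bin/solr invocation described in the text (not executed here)."""
    launcher = os.path.join(solr_dir, "bin", "solr")
    if example_cluster:
        # Two-node sample cluster with the gettingstarted collection.
        return [launcher, "start", "-e", "cloud", "-noprompt"]
    # Plain startup in cloud mode.
    return [launcher, "start", "-c"]


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
    # "solr" is a hypothetical install directory; adjust to where you unpacked it.
    print(" ".join(start_command("solr")))
```

The resulting list could be handed to subprocess.run once the warnings are resolved.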

Documents and collections
Solr is a document-structured database. Entities like “Person” are composed of fields like name, address, and email. These entities are stored as documents, and documents are stored in collections. Collections are the closest analog to tables in a relational database. However, unlike a row in a relational table, a “Person” document can completely contain the entity: if a person has multiple addresses, those addresses can all be stored in one “Person” document. In a relational database you’d need a separate addresses table.
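
To make the contrast concrete, here is a minimal Python sketch of one “Person” document holding several addresses, next to the two sets of rows a relational design would need. The field names and values are invented for illustration, and how a multi-valued field is actually indexed depends on the collection's schema, which is not shown here.

```python
import json

# Hypothetical "Person" document; field names and values are illustrative only.
person = {
    "id": "person-1",
    "name": "Ada Example",
    "email": "ada@example.com",
    "addresses": [
        "1 Main St, Austin, TX",
        "22 Side Ave, Dallas, TX",
    ],
}


def to_relational_rows(doc):
    """Split one document into the two tables a relational design would need."""
    person_row = {key: value for key, value in doc.items() if key != "addresses"}
    address_rows = [
        {"person_id": doc["id"], "address": address}
        for address in doc["addresses"]
    ]
    return person_row, address_rows


if __name__ == "__main__":
    # The document form: one self-contained JSON object.
    print(json.dumps(person, indent=2))
    # The relational form: a person row plus one row per address.
    print(to_relational_rows(person))
```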

Shards, replicas, and cores

Creating a collection

Querying your data

Solr administration


Why Solr?

So clearly you might choose to use Solr if you need a search engine. However, it is also a redundant, distributed document database that offers SQL (out of the box) for those who want to connect tools like Tableau. It is extensible in Java (and other JVM languages), and yet with the REST-like interface you can easily speak JSON or XML to it.
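
The sketch below shows, under some assumptions, what speaking JSON and SQL to Solr over HTTP can look like. It only builds the requests and does not send them. The base URL assumes a local install on Solr's default port 8983, the collection is the gettingstarted sample mentioned earlier, and the id field in the SQL statement is a guess about that collection's schema.

```python
from urllib.parse import urlencode

# Assumed base URL: a local Solr listening on its default port, 8983.
BASE_URL = "http://localhost:8983/solr"


def select_url(collection, query, rows=10):
    """URL for a standard query against /select that asks for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{BASE_URL}/{collection}/select?{urlencode(params)}"


def sql_request(collection, statement):
    """URL and form-encoded body for a POST to the collection's /sql handler."""
    return f"{BASE_URL}/{collection}/sql", urlencode({"stmt": statement})


if __name__ == "__main__":
    # Match-all query against the sample collection.
    print(select_url("gettingstarted", "*:*"))
    # The "id" field is an assumption about the sample schema.
    print(sql_request("gettingstarted", "SELECT id FROM gettingstarted LIMIT 5"))
```

Either request could be sent with urllib.request or any HTTP client once the cluster is running.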

Solr might not be your best choice if you have simple data that you look up by key and mostly write to. Solr carries too much plumbing, built for bigger jobs, to be as efficient at that as a key-value store.

Solr is a clear choice if your search is very text-centric. However, there are other, less obvious cases where it might be a good choice, such as spatial searches on all those people whose cell phones you’ve hacked to track their location. I’m just saying that you, Mr. Putin, might want to choose Solr too.
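
For the spatial case, a query might be assembled along these lines. This is a sketch only: it uses Solr's geofilt filter syntax, where the distance d is given in kilometers, but the field name location and the coordinates are made up, and the collection would need a spatial field defined in its schema.

```python
from urllib.parse import urlencode


def geofilt_params(field, lat, lon, radius_km):
    """Query parameters that keep only documents within radius_km of a point."""
    return {
        "q": "*:*",
        # Doubled braces in the f-string produce the literal { and } of the filter.
        "fq": f"{{!geofilt sfield={field} pt={lat},{lon} d={radius_km}}}",
    }


if __name__ == "__main__":
    # "location" is a hypothetical spatial field; the point is arbitrary.
    params = geofilt_params("location", 45.15, -93.85, 5)
    print(params["fq"])
    print(urlencode(params))
```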

Regardless, just remember that friends don’t let friends write SQL queries of the form bla LIKE '%stuff'.
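
One way to see why is a toy comparison in Python. A pattern with a leading wildcard generally has to be tested against every row, whereas a search engine looks a term up in an inverted index. The code below is a deliberately simplified illustration of that idea, not how Lucene is implemented, and the sample sentences are invented. The two functions also answer slightly different questions (suffix match versus whole-word match), which is part of the point.

```python
import re
from collections import defaultdict

# Invented sample rows, keyed by id.
DOCS = {
    1: "Solr is a search engine",
    2: "A document database with SQL support",
    3: "Search your documents with Solr",
}


def like_scan(docs, suffix):
    """Mimic WHERE text LIKE '%suffix': every row has to be inspected."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if text.lower().endswith(suffix.lower())
    ]


def build_inverted_index(docs):
    """Map each lowercased word to the set of ids of the rows containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def term_lookup(index, term):
    """Find rows containing the word with one dictionary lookup, no scan."""
    return sorted(index.get(term.lower(), ()))


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(like_scan(DOCS, "solr"))
    print(term_lookup(index, "Solr"))
```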



Created on Mar 24th 2018 01:23.
