elasticsearch users

Mozilla Foundation

WarOnOrange is a build system that runs unit and functional test cases whenever a build is committed. The test results are JSON documents indexed in ElasticSearch. Developers can gain further insight into the failures and thereby make better Mozilla products. Here is a sample.

Socorro is Mozilla's crash-reporting system. It collects minidumps from crashes, runs them through minidump-stackwalk to generate JSON and a stack trace, and inserts that data into HBase and Postgres, making it available to query via the front end. Socorro Search Service is a REST-based API that provides real-time indexing with ElasticSearch by retrieving JSON documents from HBase, with HazelCast as a broker, thereby removing the Postgres dependency.

StumbleUpon

StumbleUpon is using elasticsearch, an open-source, scalable data search solution, to power some upcoming new features. Here’s one example of why we love it: say you wanted to retrieve web pages that you’d stored in HBase. To do this with HBase, you’d have to assign unique IDs to each web page and remember these IDs in order to retrieve them later. ElasticSearch can take a web page you’ve submitted and index every word on this page. All you need to do if you need the web page again is remember a keyword present on the page, and the technology will find what you need.
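The keyword lookup described above rests on an inverted index, which maps each word to the documents that contain it. The following is a toy sketch of that idea in Python, not a description of how ElasticSearch or Lucene is implemented. The page IDs, page texts, and function names are made up for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of pages that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical pages, for illustration only.
pages = {
    "page-1": "HBase stores rows by key",
    "page-2": "Lucene indexes every word on a page",
}
index = build_inverted_index(pages)
print(search(index, "lucene"))
```

With an index like this, a single remembered keyword is enough to find the page, with no need to remember its ID.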

elasticsearch also enables our engineering teams to work cross-functionally. Even engineers not trained on searching data sets like HBase can run functions using ElasticSearch technology. For example, our research team uses ElasticSearch when they conduct natural language processing tests to improve recommendation methods. Plus, it’s fast and easy to implement: I was able to integrate ElasticSearch in our analytics dashboard after just an hour.

Sony Computer Entertainment

What were our reasons for using elasticsearch?

  • Open Source, Free.
  • Very active project.
  • Lucene-based - we all know Lucene, right!?
  • nice slice of awesomeness!!

We have lots of information in the search index, from play counts and ratings to the title and other details. Within Infamous 2 you can select filters such as 'newest', which filter the missions in the index on the corresponding criteria. Post-filtering then ensures that only one mission per map grid is returned, and only missions the user has not yet played.
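A post-filtering step of this kind could look roughly like the Python sketch below. It is not the actual game code: the field names `id` and `grid`, the sample missions, and the function name are all assumptions made for illustration.

```python
def post_filter(missions, played_ids):
    """Keep unplayed missions, at most one per map grid, preserving order."""
    seen_grids = set()
    result = []
    for mission in missions:
        if mission["id"] in played_ids:
            continue  # the user has already played this mission
        if mission["grid"] in seen_grids:
            continue  # this map grid already has a mission in the result
        seen_grids.add(mission["grid"])
        result.append(mission)
    return result


# Hypothetical search hits, already ordered by the index (e.g. newest first).
missions = [
    {"id": "m1", "grid": "A1"},
    {"id": "m2", "grid": "A1"},
    {"id": "m3", "grid": "B2"},
    {"id": "m4", "grid": "C3"},
]
filtered = post_filter(missions, played_ids={"m1", "m4"})
print([m["id"] for m in filtered])
```

Because the input order is preserved, the ranking produced by the index query carries through to the final list.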

We also use elasticsearch to provide localized free text searching of both Creators and Missions.

Infochimps

"At Infochimps we recently indexed over 2.5 billion documents for a total of 4TB total indexed size. This would not have been possible without ElasticSearch and the Hadoop..."

Assistly

Assistly is the all-in-one customer support system that turns customer service into Customer Wow. Our product lets companies connect to their customers by Email, Facebook, Twitter, Live Chat and Phone — instantly, and in real time — and deliver “awesomely responsive customer service.”

We use elasticsearch in the following ways:

  • Our customers depend on a fast and efficient workflow and we give them the ability to create Case Filters, which are prioritized lists of support requests. We implement our Case Filters using elasticsearch.
  • We let our customers create an extensive Knowledge Base, both for use by their own customer service agents and for display to the outside world on a searchable Help Center. We use elasticsearch to provide real time search on the Knowledge Base.
  • Our users need quick access to their customer records and cases in Assistly, and we use elasticsearch to serve up customer records and offer fast, efficient case lookup, with sophisticated command-line options that elasticsearch makes easy.
  • Users of Assistly create Business Rules to automate workflow and we use elasticsearch to process timed rules.

We’ve made elasticsearch a critical part of the Assistly product. We evaluated many different search technologies before we made the decision to incorporate elasticsearch. elasticsearch has improved our customer experience and has reduced our implementation costs.

Ataxo Social Insider

Ataxo Social Insider is a web application to monitor and evaluate communication in social networks such as Twitter, Facebook, blogs and other online media in Central Europe. The back-end system, implemented with RabbitMQ and custom Python libraries, retrieves, stores and indexes data from various online services. The front-end system, a Ruby On Rails application, continuously retrieves these „mentions“, based on user preferences („keywords“), from the back-end.

ElasticSearch is used both at the back-end (migrating from Solr) and the front-end (which previously relied on CouchDB-Lucene integration; see our CouchDB case study). All the front-end application logic: filtering by keywords, sources, dates, fulltext searching, computing aggregated metrics and displaying records is powered by ElasticSearch (via the Tire gem for Ruby). In fact, we're using ElasticSearch as a searchable database with powerful aggregation features. The data are being duplicated in CouchDB for durability and redundancy.

ElasticSearch has enabled us to implement some unique and competitive features, which would be hard or impossible to achieve otherwise:

  • Users can use any valid Lucene query syntax for defining the „keyword“, such as apple AND NOT iphone. They're able to fine-tune already defined keywords in this way, and see the impact of their changes immediately.
  • We're able to display real-time, rich, interactive visualizations of various metrics for the currently displayed data, without having to write custom map/reduce functions. See the article on the ElasticSearch blog for details.
  • Using the percolator feature, we can offer users real-time alerts, where notifications are triggered by a certain number of mentions retrieved within a certain period.
  • Last, but not least, we're able to retrieve data very fast. Most searches take around 5 milliseconds in ElasticSearch and most pages load in under 1 second, for a data set in the range of millions of documents and tens of gigabytes.
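To illustrate the first point, a Lucene-syntax keyword such as apple AND NOT iphone can be passed to ElasticSearch through a `query_string` query. The application described here uses the Tire gem for Ruby; the Python sketch below only builds the equivalent JSON request body. Whether the application uses `query_string` in exactly this form is an assumption, and the helper name is made up.

```python
import json


def keyword_query(lucene_query):
    """Wrap a Lucene-syntax string in a query_string search body."""
    return {"query": {"query_string": {"query": lucene_query}}}


# The example keyword from the text above.
body = keyword_query("apple AND NOT iphone")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to an index's search endpoint. Changing the keyword string is all it takes to refine a search.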

Bazaarvoice

At Bazaarvoice, we use ElasticSearch for internal analytics applications due to its ease of use and powerful faceting capabilities.

Klout

Klout is using ElasticSearch to power its main search and the eligibility criteria for Klout Perks.

We needed to create a scalable and robust search solution that would allow us to search across all scored Klout users. Did I mention it had to be fast? Everyone likes to go fast! The problem is that 100 million people have Klout (and that was this past September, an eternity in social media time), which means our search solution had to scale, and scale horizontally.

We chose ElasticSearch because it's designed to be distributed and to use fast, non-blocking IO. It was created using strong foundations like Apache Lucene and JBoss Netty, and it's designed to be easily extended. In summary, it helps us build powerful search now and continue improving search to give our users more relevant results.

Sonian

Sonian uses elasticsearch as a core building block for its cloud powered searching of electronic documents. We adopted it in 2010 based on its underlying support for Lucene, its cloud-friendly architecture, and the rapidly growing community around the project. We currently run 20+ clusters, storing over 5 billion documents, and with index storage in the hundreds of terabytes. Due to our cloud DNA, we have tuned our clusters for a cloud environment, and manage them with full automation. We also are committed to helping support the growing elasticsearch community by providing support and open source plugins.

For more information on Sonian, visit us at www.sonian.net or follow us on Twitter at @sonian.

IGN Entertainment

Elasticsearch powers IGN's search as well as the Video, Object, Social and Article APIs. We run 10 nodes across 2 clusters. We found scaling and monitoring ElasticSearch much easier than with Solr, which we migrated from. We were easily able to migrate our custom score queries, because ElasticSearch uses Lucene at its core. IGN Boards Search is also powered by ElasticSearch, with over 50M documents in its index. The other systems contribute another 6M documents to their respective indices. The API servers use Scala to talk to ElasticSearch over TCP, while the site search uses HTTP calls via PHP controllers/JS. With so much data and a very heavy request volume (~5K RPM), Elasticsearch sits in our datacenter as one of the core architecture components, and we could not be happier with this choice.

Check out IGN Engineering by visiting code.ign.com.

 