As previously blogged, I recently read the ElasticSearch Server book published by Packt Publishing. It was a pleasant and really interesting read, even though I was already familiar with the product.
Writing a book about elasticsearch turns out not to be easy at all. There are lots of features and gems that deserve discussion, which is really hard to do in a book with a reasonable number of pages. Also, the product is evolving rapidly, which makes it extremely hard to keep the content up to date.
I think this book brings something that was missing until now in the elasticsearch ecosystem, since it goes from installing the product and setting it up to using it in real life, also describing potential issues and their solutions. It doesn’t neglect the necessary technical details about the underlying Lucene library and search in general either.
Click here to read the rest of the article I wrote on the Trifork blog.
Posted on April 15, 2013 in elasticsearch
You might have realized from my blogs and tweets that I’m a big elasticsearch fan and I work with it on a daily basis.
That’s why I’m happy to start reading ElasticSearch Server, the first book that’s been published about the project. It will definitely be an interesting read, and I’ll post a review here as soon as I’m done with it.
I recently wrote a couple of articles about the elasticshell, the command line shell for Elasticsearch that I created. If you haven’t heard about it, it’s a JSON-friendly command line tool that lets you quickly interact with Elasticsearch: you can easily index documents, execute queries and make use of all the APIs that Elasticsearch provides. It allows for more advanced use cases as well, since it exposes the power and flexibility of both JavaScript and Java. That’s scary, isn’t it? Let’s see what this means…
Click here to read the rest of the article I wrote on the Trifork blog.
So, as promised, here is a sequel to my previous post, Introducing the elasticshell. Let’s start exactly where we left off…
What about search?
Of course, we need to search against the index we created. We can express queries either as JSON documents or as the Java QueryBuilders that ship with the elasticsearch Java API, which are exposed to the shell as they are.
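To make the "query as a JSON document" idea concrete, here is a minimal Python sketch that builds a search body with a single `match` query. It is not elasticshell syntax: the helper name `build_match_query` and the `title` field are assumptions made up for illustration, and only the JSON that gets printed is what you would hand to the search API.

```python
import json


def build_match_query(field, text, size=10):
    """Return a search request body containing a single match query.

    `field` and `text` are placeholders chosen by the caller; nothing here
    talks to a cluster, it only builds the document.
    """
    return {
        "query": {"match": {field: text}},
        "size": size,
    }


# Hypothetical field name and search text, just to show the resulting JSON.
body = build_match_query("title", "elasticsearch server")
print(json.dumps(body, indent=2))
```

The same structure can be written by hand as plain JSON; building it in code mostly helps when parts of the query come from user input.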
Click here to read the rest of the article I wrote on the Trifork blog.
A few days ago I released the first beta version of the elasticshell, a shell for elasticsearch. The idea I had was to create a command line tool that allows you to easily interact with elasticsearch.
Isn’t elasticsearch easy enough already?
I really do think elasticsearch is already great and really easy to use. On the other hand, there are quite a few APIs available and quite a lot of JSON involved too. Also, interacting with REST APIs requires a tool other than the browser in order to use the proper HTTP methods and so on. There are different solutions available: some of them are generic, like curl or browser plugins, while others are elasticsearch plugins like head or sense, which you can use to send JSON requests and see the result, still in JSON format. What was missing was a command line tool, something that plays the role of the mongo shell in the elasticsearch world. That’s ambitious, isn’t it?
Click here to read the rest of the article I wrote on the Trifork blog.
At a recent Hippo meetup I gave a presentation about enterprise search. Being able to index and search your content, both in the Hippo CMS and in other sources, is of interest to many Hippo users. The presentation does not go into any Hippo specifics. It gives a brief introduction to search, Apache Lucene and concepts like the inverted index, then quickly moves on to the two main open source enterprise search servers: Apache Solr and Elasticsearch. You can find my slides here.
Up until now I have told you why I think elasticsearch is so cool and how you can use it combined with Spring. It’s now time to get to something a little more technical. For example, once you have a search engine running you need to index data, and when it comes to indexing data you usually need to choose between the push and the pull approach. This blog entry details these approaches and goes into writing a river plugin for elasticsearch.
Implementing the push approach means writing your own indexer using your favourite programming language and pushing data to the search engine through some client library or even sending REST requests to it.
On the other hand, implementing the pull approach with elasticsearch means writing a special type of plugin, known as a river, which pulls data from a data source and indexes it. The data source can be any system you can get data from: the file system, a database, and so on.
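As a rough illustration of the push approach, the Python sketch below serializes a batch of documents into the newline-delimited format of the `_bulk` REST endpoint and posts it with the standard library only. The base URL, the `courses` index name and the sample documents are assumptions made up for this example, and the action line follows recent elasticsearch releases (older releases also expected a `_type` there), so treat it as a sketch rather than a drop-in indexer.

```python
import json
import urllib.request


def build_bulk_body(index, docs):
    """Serialize (doc_id, source) pairs into the newline-delimited bulk format.

    Each document becomes two lines: an action line saying where to index it,
    followed by the document source itself.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def push(base_url, index, docs):
    """POST the documents to the _bulk endpoint and return the parsed response."""
    request = urllib.request.Request(
        base_url + "/_bulk",
        data=build_bulk_body(index, docs).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical sample data; only the payload is printed, nothing is sent.
courses = [("1", {"title": "Information Retrieval"}), ("2", {"title": "Databases"})]
print(build_bulk_body("courses", courses))
```

Calling `push("http://localhost:9200", "courses", courses)` would send the batch to a node assumed to be listening locally. A river does the opposite: it runs inside elasticsearch and fetches the data itself.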
Click here to read the rest of the article I wrote on the Trifork blog.
Whenever there’s a new product out there and you start using it or suggesting it to customers or colleagues, you need to be prepared to answer this question: “Why should I use it?”. Well, the answer could be as simple as “Because it’s cool!”, which of course is the case with elasticsearch, but then at some point you may need to explain why. I recently had to answer the question, “So what’s so cool about elasticsearch?”. That’s why I thought it might be worthwhile sharing my own answer in this blog.
Click here to read the rest of the article I wrote on the Trifork blog.
Trifork has a long track record of projects, training and consulting around open source search technologies. Currently we are working on several interesting search projects using elasticsearch. Elasticsearch is an open source, distributed, RESTful search engine built on top of Apache Lucene. In contrast to, for instance, Apache Solr, elasticsearch is built as a highly scalable distributed system from the ground up, allowing you to shard and replicate multiple indices over a large number of nodes. This architecture makes scaling from one server to several hundred a breeze. It turns out, however, that elasticsearch is not only good for what everyone calls “Big Data”: it is also very well suited to indexing small numbers of documents and even to running embedded within an application, while still providing the flexibility to scale up later when needed.
As most developers know, most databases offer full-text search capabilities on the data they store. However, in our experience more is often needed, and that is where Lucene-based solutions come in. Elasticsearch is currently our technology of choice for greenfield projects, as it provides all the features you typically need and combines them with scalability.
For this case, we decided to use elasticsearch as part of a bigger project for the University of Amsterdam (UvA). We use elasticsearch to “cache” course information that is retrieved from a Peoplesoft SiS system and make it searchable. And in this case we decided to fire up a local elasticsearch node within an existing Spring web application, using it as an embedded search engine.
Click here to read the rest of the article I wrote on the Trifork blog.
In this post I’m gonna write about elasticsearch and a way to deploy it within a Java servlet container as a war file. Since I’m playing around with Jelastic, why not try to deploy it to the cloud? Once again, I haven’t done anything distributed in Jelastic: I only deployed a single elasticsearch node. As far as I know, multicast discovery doesn’t work on Jelastic, as on all cloud platforms, but I haven’t yet tried running more than one instance together.
The deployment procedure is pretty straightforward. The really interesting feature I’ve used is deployment via a Jelastic Maven node, directly from a GitHub project.
It’s been a while since I last blogged: about eight months have gone by really fast in Amsterdam! Meanwhile I wrote a few articles on SearchWorkings.org, the community site for search professionals. In fact, as you may have noticed from my tweets, I’m getting more and more involved in enterprise search and open source projects. In this post I’m gonna write about Jelastic, the next generation of Java hosting platforms. I gave it a try by deploying Apache Solr on the cloud, and I thought it might be a good idea to share what I did. But be aware that I haven’t done anything in a distributed manner: I only deployed a single Solr instance.
Nowadays almost every website has a full text search box as well as an auto-suggestion feature that helps users find what they are looking for by typing the fewest characters possible. The example below shows what this feature looks like in Google. It progressively suggests how to complete the current word and/or phrase, and corrects typos. That’s a meaningful example, which contains multi-term suggestions based on the most popular queries, combined with spelling correction.
There are different ways to make auto-complete suggestions with Solr. You can find many articles and examples on the internet, but making the right choice is not always easy. The goal of this post is to compare the available options in order to identify the best solution for your needs, rather than to describe any one specific approach in depth.
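To show what "suggestions based on the most popular queries" boils down to, here is a small, self-contained Python sketch. It is not how Solr implements any of its suggesters, and it leaves out spelling correction entirely; the function name `suggest` and the query counts are made up for illustration.

```python
from bisect import bisect_left


def suggest(prefix, popular_queries, limit=5):
    """Return up to `limit` queries starting with `prefix`, most popular first.

    `popular_queries` maps a lowercase query string to how often it was
    searched. Sorting on every call keeps the sketch short; a real
    implementation would keep the sorted list (or a trie) around.
    """
    prefix = prefix.lower()
    terms = sorted(popular_queries)
    # In a sorted list, all strings sharing a prefix sit next to each other,
    # starting at the position where the prefix itself would be inserted.
    start = bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    # Rank by popularity, breaking ties alphabetically.
    matches.sort(key=lambda term: (-popular_queries[term], term))
    return matches[:limit]


# Hypothetical query log counts.
counts = {"solr": 120, "solr autocomplete": 80, "solar panels": 45, "spring": 60}
print(suggest("sol", counts))
```

The options compared in the article differ mainly in where this lookup happens and in which data the candidates come from.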
Click here to read the rest of the article I wrote on the Trifork blog.
The Data Import Handler is a popular way to import data into a Solr instance. It provides out-of-the-box integration with databases, XML sources, e-mails and documents. A Solr instance often has multiple sources, and the import process is usually expensive in terms of time and resources. Meanwhile, if you make some schema changes you will probably find out you need to reindex all your data; the same happens when you want to upgrade to a Solr version without backward compatibility. We can call it the “re-index bottleneck”: once you’ve done the first data import involving all your external sources, you will never want to do it the same way again, especially on large indexes and complex systems.
Click here to read the rest of the article I wrote on the Trifork blog.
Almost nine months ago I noticed I was losing the passion for what I had been doing in my working life, and this had a major impact on my daily life. I thought: I’m just 28 years old and I need to work with verve, loving my job and my projects. After that, I started thinking about my aims and the relation between my private life and my working one. In fact, they were strongly connected and, maybe I didn’t get the point, but I couldn’t be completely happy at home if my job wasn’t going well. I should certainly work on myself to separate those areas, but why not look for something better to improve my working life?
Some days ago I needed to manually update some MD5 values stored in an Oracle database. I usually generate those fields through a Java application, with the same code I commonly use to hash passwords. I couldn’t run a Java program on the production server and I needed to do the job quickly, hopefully with a single command. I looked for an Oracle function and found the dbms_obfuscation_toolkit package.
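When moving this kind of logic from application code into the database, it helps to have a known value to compare against. The sketch below is in Python rather than the Java mentioned above, purely for brevity. The helper name `md5_hex` and the UTF-8 default are assumptions; whatever the database returns has to be computed over the same bytes and rendered as hex in the same case to match.

```python
import hashlib


def md5_hex(text, encoding="utf-8"):
    """Return the MD5 digest of `text` as a lowercase hexadecimal string.

    The encoding matters: the same characters encoded differently produce a
    different digest, so it must match what the other side uses.
    """
    return hashlib.md5(text.encode(encoding)).hexdigest()


print(md5_hex("hello"))  # 5d41402abc4b2a76b9719d911017c592
```

Checking one or two values like this against the output of the database function is a cheap way to catch encoding or formatting mismatches before updating production rows.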