Geo Location and Search


One of the coolest search technology combinations out there is the ability to combine geo and search. Queries such as “give me all the restaurants that serve meat ([insert your query here]) within 20 miles of me”, or “create a distance heat map of them”, are slowly becoming a must-have for any content website. This is becoming even more relevant now that new browsers support the Geolocation API.

Already in master (and in the upcoming 0.9.1 release), elasticsearch comes with rich support for geo location. Let's take a drive down the geo support path:

Indexing Location Aware Documents

In general, indexed documents do not need a predefined mapping in order to use geo location features, but if no mapping is defined they should conform to a convention. For example, let's take a “pin” whose location we want to index, along with some tags it is associated with:

{
    "pin" : {
        "location" : {
            "lat" : 40.12,
            "lon" : -71.34
        },
        "tag" : ["food", "family"],
        "text" : "my favorite family restaurant"
    }
}

The location element is a “geo enabled” location since it has lat and lon properties. Once one follows the above conventions, all geo location features are enabled for pin.location.

If an explicit setting is still required, it's easy to define a mapping that declares a certain property as a geo_point. Here is an example:

{
    "pin" : {
        "properties" : {
            "location" : {
                "type" : "geo_point"
            }
        }
    }
}

Defining the location property as geo_point means that we can now index location data in many different formats, from the lat/lon example above to a geohash. For information on all the available formats, check out 278.
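To make the geohash format mentioned above a little more concrete, here is a minimal Python sketch of the standard public geohash encoding (interleaved longitude/latitude bits, base 32 alphabet). The function name geohash_encode and its precision parameter are hypothetical, chosen for illustration; this is not elasticsearch's own implementation.

```python
# Standard geohash base 32 alphabet (no "a", "i", "l", "o").
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a lat/lon pair as a geohash string (illustrative sketch)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base 32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    # The pin location from the example document above.
    print(geohash_encode(40.12, -71.34, precision=8))
```

Longer hashes describe smaller cells, so a shared prefix between two hashes generally means the points are close to each other (the reverse does not always hold near cell borders).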

Update: The automatic mapping of “geo enabled” properties has been disabled since publishing this article. You have to provide the correct mapping for geo properties. Please see the documentation.

Find By Location

The first thing to do after indexing location aware documents is to query them. There are several ways to query such information; the simplest one is by distance. Here is an example:

{
    "filtered" : {
        "query" : {
            "field" : { "text" : "restaurant" }
        },
        "filter" : {
            "geo_distance" : {
                "distance" : "12km",
                "pin.location" : {
                    "lat" : 40,
                    "lon" : -70
                }
            }
        }
    }
}

The above will search for all documents whose text matches restaurant and that exist within 12km of the provided location. The location point can accept several different formats as well, detailed at 279.
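As a rough illustration of what a distance filter decides per document, the following Python sketch uses the haversine great-circle formula with a mean earth radius of 6371 km. The helper names (haversine_km, within_distance) and the sample documents are hypothetical, and the formula is an assumption made for the sketch; elasticsearch's internal distance calculation may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(docs, center, distance_km):
    """Keep only the documents whose location is within distance_km of center."""
    return [
        doc for doc in docs
        if haversine_km(center["lat"], center["lon"],
                        doc["location"]["lat"], doc["location"]["lon"]) <= distance_km
    ]


if __name__ == "__main__":
    # Hypothetical sample documents, shaped like the "pin" above.
    docs = [
        {"name": "near", "location": {"lat": 40.05, "lon": -70.05}},
        {"name": "far", "location": {"lat": 40.12, "lon": -71.34}},
    ]
    hits = within_distance(docs, {"lat": 40, "lon": -70}, 12)
    print([doc["name"] for doc in hits])
```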

The next supported query is a bounding box query, which restricts the results to a geo box defined by its top left and bottom right coordinates. Here is an example:

{
    "filtered" : {
        "query" : {
            "field" : { "text" : "restaurant" }
        },
        "filter" : {
            "geo_bounding_box" : {
                "pin.location" : {
                    "top_left" : {
                        "lat" : 40.73,
                        "lon" : -74.1
                    },
                    "bottom_right" : {
                        "lat" : 40.717,
                        "lon" : -73.99
                    }
                }
            }
        }
    }
}
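Conceptually, the bounding box test is a pair of range checks. A minimal Python sketch, assuming the box does not cross the 180 degree meridian and that the edges are inclusive (both are assumptions of the sketch, and in_bounding_box is a hypothetical name):

```python
def in_bounding_box(point, top_left, bottom_right):
    """Return True if point lies inside the box given by its two corners.

    Assumes the box does not wrap around the 180 degree meridian.
    """
    lat_ok = bottom_right["lat"] <= point["lat"] <= top_left["lat"]
    lon_ok = top_left["lon"] <= point["lon"] <= bottom_right["lon"]
    return lat_ok and lon_ok


if __name__ == "__main__":
    # Corners taken from the query above; the test points are hypothetical.
    top_left = {"lat": 40.73, "lon": -74.1}
    bottom_right = {"lat": 40.717, "lon": -73.99}
    print(in_bounding_box({"lat": 40.72, "lon": -74.0}, top_left, bottom_right))
    print(in_bounding_box({"lat": 40.12, "lon": -71.34}, top_left, bottom_right))
```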

The last, and most advanced, form of geo query is a polygon based search. Here is an example:

{
    "filtered" : {
        "query" : {
            "field" : { "text" : "restaurant" }
        },
        "filter" : {
            "geo_polygon" : {
                "pin.location" : {
                    "points" : [
                        {"lat" : 40, "lon" : -70},
                        {"lat" : 30, "lon" : -80},
                        {"lat" : 20, "lon" : -90}
                    ]
                }
            }
        }
    }
}
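One common way to decide whether a point falls inside a polygon is ray casting: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as "inside". The Python sketch below treats lat/lon as flat x/y coordinates, which is a simplification, and is not a claim about how elasticsearch implements the filter. Since the three points in the query above happen to lie on a single line, the sketch uses a hypothetical square instead.

```python
def in_polygon(point, vertices):
    """Ray casting point-in-polygon test on flat lat/lon coordinates."""
    x, y = point["lon"], point["lat"]
    inside = False
    n = len(vertices)
    for i in range(n):
        a = vertices[i]
        b = vertices[(i + 1) % n]  # the last vertex connects back to the first
        x1, y1 = a["lon"], a["lat"]
        x2, y2 = b["lon"], b["lat"]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses the point's latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    # Hypothetical square polygon, not taken from the article.
    square = [
        {"lat": 40, "lon": -80},
        {"lat": 40, "lon": -70},
        {"lat": 30, "lon": -70},
        {"lat": 30, "lon": -80},
    ]
    print(in_polygon({"lat": 35, "lon": -75}, square))
    print(in_polygon({"lat": 45, "lon": -75}, square))
```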

Sorting

The ability to sort results not just by ranking (how relevant the document is to the query) but also by distance allows for much greater geo usability. There is now a new _geo_distance sort type that sorts based on the distance from a specific location:

{
    "sort" : [
        {
            "_geo_distance" : {
                "pin.location" : [-40, 70],
                "order" : "asc",
                "unit" : "km"
            }
        }
    ],
    "query" : {
        "field" : { "text" : "restaurant" }
    }
}

On top of that, elasticsearch will now return, for each hit, the values of the fields sorted on, making it easy to display this important information.
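The idea of sorting by distance while keeping the computed value next to each hit can be sketched in a few lines of Python. The haversine formula, the function names, and the shape of the returned dictionaries (a "sort" list per hit) are assumptions made for this illustration and are not meant as the exact response format.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (mean earth radius of 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def sort_by_distance(docs, origin, order="asc"):
    """Sort docs by distance from origin and attach the distance to each hit."""
    scored = [
        (haversine_km(origin["lat"], origin["lon"],
                      doc["location"]["lat"], doc["location"]["lon"]), doc)
        for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=(order == "desc"))
    # Keep the sort value with the hit so it can be displayed directly.
    return [{"doc": doc, "sort": [distance]} for distance, doc in scored]


if __name__ == "__main__":
    # Hypothetical sample documents.
    docs = [
        {"name": "far", "location": {"lat": 40.12, "lon": -71.34}},
        {"name": "near", "location": {"lat": 40.05, "lon": -70.05}},
    ]
    for hit in sort_by_distance(docs, {"lat": 40, "lon": -70}):
        print(hit["doc"]["name"], round(hit["sort"][0], 1), "km")
```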

Faceting

Faceting, the ability to show aggregated views on top of the search results, goes hand in hand with geo. For example, one would like to get the number of hits matching the search query within 10 miles, within 20 miles, and beyond, from one's location. The geo distance facet provides just that:

{
    "query" : {
        "field" : { "text" : "restaurant" }
    },
    "facets" : {
        "geo1" : {
            "geo_distance" : {
                "pin.location" : {
                    "lat" : 40,
                    "lon" : -70
                },
                "ranges" : [
                    { "to" : 10 },
                    { "from" : 10, "to" : 20 },
                    { "from" : 20, "to" : 100 },
                    { "from" : 100 }
                ]
            }
        }
    }
}
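What the facet computes can be pictured as counting hits per distance range. A small Python sketch, assuming each hit's distance is already known and that "from" is inclusive while "to" is exclusive (the boundary handling and the distance_facet name are assumptions of this sketch):

```python
def distance_facet(distances, ranges):
    """Count how many distances fall into each from/to range."""
    results = []
    for r in ranges:
        lower = r.get("from", float("-inf"))  # open-ended when "from" is absent
        upper = r.get("to", float("inf"))     # open-ended when "to" is absent
        count = sum(1 for d in distances if lower <= d < upper)
        results.append({**r, "count": count})
    return results


if __name__ == "__main__":
    # Same ranges as the facet above; the distances are hypothetical.
    ranges = [
        {"to": 10},
        {"from": 10, "to": 20},
        {"from": 20, "to": 100},
        {"from": 100},
    ]
    for bucket in distance_facet([3, 10, 15, 50, 250], ranges):
        print(bucket)
```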

Summary

The combination of search with geo is a natural one, and is slowly becoming critical to any (web) application, especially with HTML 5 and mobile devices becoming more and more widespread. elasticsearch's upcoming geo support brings this integration to a whole new level, and enables applications to provide rich geo and search functionality easily (ohh, and scale ;) ).

-shay.banon