Subject:[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
From:Ryan McKinley (JIRA) (ji@apache.org)
Date:Mar 26, 2011 10:11:42 am
List:org.apache.lucene.java-dev

[
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011661#comment-13011661
]

Ryan McKinley commented on SOLR-2155:

-------------------------------------

Congratulations on the new baby!

Thinking about spatial support in general, I think we should settle on some
basic APIs and approaches that can be used across many indexing strategies. In
http://code.google.com/p/lucene-spatial-playground/ I'm messing with how we can
use a standard API to index Shapes with various strategies. As always, each
strategy has its tradeoffs, but if we can keep the high level APIs similar,
that makes choosing the right approach easier. In this project I'm looking at
indexing shapes as:
* bounding box -- 4 fields xmin/xmax/ymin/ymax
* prefix grids -- like geohash or [csquares|http://www.marine.csiro.au/csquares/about-csquares.htm]
* in memory spatial index (rtree/quadtree)
* raw WKB geometry tokens
* points -- x,y fields
* etc

To keep things coherent, I'm proposing a high level interface like: https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/SpatialQueryBuilder.java

And then each implementation fills it in: https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/prefix/PrefixGridQueryBuilder.java

This leaves Solr to just handle setup and configuration: http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-solr/src/main/java/org/apache/solr/spatial/prefix/SpatialPrefixGridFieldType.java

In my view geohash is a subset of 'spatial prefix grid' (is there a real name
for this?) -- the interface I'm proposing is: http://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-base/src/main/java/org/apache/lucene/spatial/base/prefix/SpatialPrefixGrid.java
essentially:
{code}
public List<CharSequence> readCells( Shape geo );
{code}

A geohash for a point would just be a list of one token -- for a polygon, it
would be a collection of tokens that fill the space, like csquares.
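
To make the one-token-versus-many-tokens idea concrete, here is a minimal sketch of a geohash-backed readCells. It is not the playground's code: the Shape, Point and Box types, the extra length parameter, and the class name PrefixGridSketch are all assumptions made for illustration. It only handles points and lat/lon boxes that do not cross the dateline, and it covers a box with cells of a single fixed length.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixGridSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Hypothetical stand-ins for the Shape types; not the playground's classes. */
    interface Shape {}

    static final class Point implements Shape {
        final double lat, lon;
        Point(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    static final class Box implements Shape {
        final double minLat, minLon, maxLat, maxLon;
        Box(double minLat, double minLon, double maxLat, double maxLon) {
            this.minLat = minLat; this.minLon = minLon;
            this.maxLat = maxLat; this.maxLon = maxLon;
        }
    }

    /** Standard geohash encoding: bits alternate lon/lat, 5 bits per base-32 character. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder sb = new StringBuilder(length);
        boolean lonBit = true; // the first bit refines longitude
        int bits = 0, ch = 0;
        while (sb.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; minLon = mid; } else { ch <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; minLat = mid; } else { ch <<= 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { sb.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return sb.toString();
    }

    /** Cells of one fixed geohash length that intersect the shape (assumed Point or Box). */
    static List<CharSequence> readCells(Shape geo, int length) {
        List<CharSequence> cells = new ArrayList<>();
        if (geo instanceof Point) {
            Point p = (Point) geo;
            cells.add(encode(p.lat, p.lon, length)); // a point is a single token
            return cells;
        }
        Box b = (Box) geo;
        int lonBits = (5 * length + 1) / 2; // longitude gets the extra bit on odd lengths
        int latBits = (5 * length) / 2;
        double w = 360.0 / (1L << lonBits), h = 180.0 / (1L << latBits);
        long x0 = index(b.minLon + 180, w, lonBits), x1 = index(b.maxLon + 180, w, lonBits);
        long y0 = index(b.minLat + 90, h, latBits), y1 = index(b.maxLat + 90, h, latBits);
        for (long y = y0; y <= y1; y++) {
            for (long x = x0; x <= x1; x++) {
                // Encode the cell centre so rounding at cell edges cannot pick a neighbour.
                cells.add(encode(-90 + (y + 0.5) * h, -180 + (x + 0.5) * w, length));
            }
        }
        return cells;
    }

    /** Grid column or row for an offset from the grid origin, clamped to the grid. */
    private static long index(double offset, double size, int bits) {
        long i = (long) Math.floor(offset / size);
        return Math.max(0, Math.min(i, (1L << bits) - 1));
    }
}
```

With length 3, the point (57.64911, 10.40744) yields the single token u4p, while the box from (57, 10) to (58, 11) yields the two tokens u4p and u4r. A real implementation would probably mix cell lengths to cover a polygon with fewer tokens.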

I aim to get this basic structure in a lucene branch and maybe into trunk in the
next few weeks....

Geospatial search using geohash prefixes

----------------------------------------

Key: SOLR-2155
URL: https://issues.apache.org/jira/browse/SOLR-2155
Project: Solr
Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch

There currently isn't a solution in Solr for doing geospatial filtering on
documents that have a variable number of points. This scenario occurs when
there is location extraction (i.e. via a "gazetteer") occurring on free text.
None, one, or many geospatial locations might be extracted from any given
document and users want to limit their search results to those occurring in a
user-specified area.

I've implemented this by furthering the GeoHash based work in Lucene/Solr with a
geohash prefix based filter. A geohash refers to a lat-lon box on the earth.
Each successive character added further subdivides the box into a 4x8 (or 8x4
depending on the even/odd length of the geohash) grid. The first step in this
scheme is figuring out which geohash grid squares cover the user's search query.
I've added various extra methods to GeoHashUtils (and added tests) to assist in
this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter,
that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid
squares in the index. Once a matching geohash grid is found, the points therein
are compared against the user's query to see if they match. I created an
abstraction GeoShape extended by subclasses named PointDistance... and
CartesianBox.... to support different queried shapes so that the filter need not
care about these details.

This work was presented at LuceneRevolution in Boston on October 8th.
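
The seek-then-verify step described above can be sketched without Lucene. This is an illustration only, not the code in the attached patch: a sorted TreeMap stands in for the index's term dictionary, tailMap() plays the role of TermsEnum.seek(), and the QueryShape interface and the class name GeohashPrefixFilterSketch are hypothetical stand-ins for the GeoShape abstraction. It returns matching terms, whereas a real Filter would collect the documents that contain them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GeohashPrefixFilterSketch {

    /** Hypothetical query shape, standing in for the GeoShape abstraction. */
    interface QueryShape {
        boolean contains(double lat, double lon);
    }

    /**
     * terms:    sorted geohash term -> {lat, lon} of the indexed point (assumed layout)
     * prefixes: geohash grid squares that cover the query area
     * Returns the geohash terms whose points fall inside the query shape.
     */
    static List<String> filter(TreeMap<String, double[]> terms,
                               List<String> prefixes,
                               QueryShape shape) {
        List<String> hits = new ArrayList<>();
        for (String prefix : prefixes) {
            // Jump straight to the first term >= prefix, skipping everything before it.
            for (Map.Entry<String, double[]> e : terms.tailMap(prefix, true).entrySet()) {
                if (!e.getKey().startsWith(prefix)) {
                    break; // sorted order: we have left this grid square
                }
                double[] p = e.getValue();
                // The grid square only covers the query; verify the actual point.
                if (shape.contains(p[0], p[1])) {
                    hits.add(e.getKey());
                }
            }
        }
        return hits;
    }
}
```

For example, with the prefix u4p and a term dictionary holding ezs42, u4p1 and u4pruydqqvj, the seek skips ezs42 entirely and only the two u4p terms are tested against the shape.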