LSH Algorithm and Implementation (E2LSH)
Locality-Sensitive Hashing (LSH) is an algorithm for solving the approximate
or exact near neighbor search problem in high-dimensional spaces. This webpage
links to the newest LSH algorithms in Euclidean and Hamming spaces, as well as
to the E2LSH package, an implementation of an early practical LSH algorithm.

Algorithm description:

Newest (not quite) LSH algorithms (2014):
These algorithms achieve better performance than the classic LSH algorithms by
using data-dependent hashing. They improve over classic LSH algorithms in both
Hamming and Euclidean spaces. However, unlike the classic LSH algorithms, which
use data-independent hashing and hence allow updates to the point set, these
algorithms are not dynamic.
Optimal Data-Dependent Hashing for Approximate Near Neighbors
(by Alexandr Andoni and Ilya Razenshteyn). Manuscript, 2014.
Beyond Locality-Sensitive Hashing
(by Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn). In SODA'14.
Slides: Here are some
slides by Alexandr Andoni on the early version from SODA'14.

Survey of LSH in CACM (2008):
"Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in
High Dimensions" (by Alexandr Andoni and Piotr Indyk).
Communications of the ACM, vol. 51, no. 1, 2008, pp. 117-122.
(CACM disclaimer.) Also available directly from CACM (for free).

Not-so-recent algorithm for Euclidean space (2006):
"Near-Optimal Hashing Algorithms for Near Neighbor Problem in High Dimensions"
(by Alexandr Andoni and Piotr Indyk). In FOCS'06.
Slides on this LSH algorithm from a talk given by Piotr Indyk.

Earlier algorithm for Euclidean space (2006): a good introduction to LSH, and a
description of the state of affairs as of 2006, is in the following book chapter:
"Locality-Sensitive Hashing Scheme Based on p-Stable Distributions" (by Alexandr
Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni), in the
book Nearest Neighbor Methods in Learning and Vision: Theory and Practice,
T. Darrell, P. Indyk, and G. Shakhnarovich (eds.), MIT Press, 2006.
See also the book introduction for a gentle introduction to the NN problem and LSH.
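In the p-stable scheme, a vector v is hashed to h(v) = floor((a·v + b) / w), where a has i.i.d. entries drawn from a p-stable distribution (the Gaussian, which is 2-stable, for Euclidean distance) and b is uniform in [0, w). The Python sketch below is an illustration, not code from the chapter: it estimates how often two points land in the same bucket over many independently drawn hash functions, to show that near points collide much more often than far ones. The helper names (`make_pstable_hash`, `collision_rate`) and the bucket width w = 4.0 are hypothetical choices.

```python
import math
import random

def make_pstable_hash(dim, w, rng):
    """Draw one hash h(v) = floor((a.v + b) / w) with Gaussian (2-stable) a."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)  # random offset in [0, w)

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h

def collision_rate(p, q, w, trials, seed=0):
    """Fraction of independently drawn hash functions under which p and q collide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_pstable_hash(len(p), w, rng)
        if h(p) == h(q):
            hits += 1
    return hits / trials

origin = [0.0] * 20
near = [0.1] * 20  # Euclidean distance ~0.45 from origin
far = [2.0] * 20   # Euclidean distance ~8.9 from origin
print(collision_rate(origin, near, w=4.0, trials=2000))
print(collision_rate(origin, far, w=4.0, trials=2000))
```

With these settings the near pair should collide in roughly nine out of ten trials and the far pair in fewer than one in four; the exact numbers depend on the seed and on w.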

Original LSH algorithm (1999): the previous version of the algorithm for the
Hamming distance is described in the [GIM'99] paper.
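For the Hamming distance, a standard LSH family is bit sampling: a hash function returns one randomly chosen coordinate of the bit vector, so two d-bit vectors at Hamming distance r collide with probability 1 - r/d. The sketch below concatenates k sampled bits per table and uses several tables. It illustrates the idea and is not the exact construction or parameter choice of [GIM'99]; the class name `BitSamplingLSH` and the parameters `k` and `n_tables` are hypothetical. Because the hash functions do not depend on the data, `insert` only needs to hash the new point, which is the dynamic property of classic LSH noted above.

```python
import random

class BitSamplingLSH:
    """Hamming-space LSH: each table keys a point by k randomly sampled bit positions."""

    def __init__(self, n_bits, k, n_tables, seed=0):
        rng = random.Random(seed)
        # One list of k sampled coordinates per table (the hash g_j).
        self.samples = [rng.sample(range(n_bits), k) for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]
        self.points = []

    def _key(self, j, bits):
        return tuple(bits[i] for i in self.samples[j])

    def insert(self, bits):
        """Add a point without rebuilding: the hashes are fixed in advance."""
        idx = len(self.points)
        self.points.append(bits)
        for j, table in enumerate(self.tables):
            table.setdefault(self._key(j, bits), []).append(idx)
        return idx

    def query(self, bits, radius):
        """Indices of points in colliding buckets within Hamming distance `radius`."""
        seen, result = set(), []
        for j, table in enumerate(self.tables):
            for idx in table.get(self._key(j, bits), []):
                if idx in seen:
                    continue
                seen.add(idx)
                dist = sum(x != y for x, y in zip(bits, self.points[idx]))
                if dist <= radius:
                    result.append(idx)
        return sorted(result)

# Usage: index 100 random 64-bit vectors, then query a copy with one bit flipped.
index = BitSamplingLSH(n_bits=64, k=8, n_tables=10, seed=1)
rng = random.Random(2)
data = [[rng.randint(0, 1) for _ in range(64)] for _ in range(100)]
for p in data:
    index.insert(p)
q = list(data[5])
q[0] ^= 1
print(index.query(q, radius=3))
```

Larger k makes buckets more selective, and more tables raise the chance that a true neighbor shares at least one bucket with the query.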

Implementation of LSH: download the E2LSH package (alpha version). The code is
based on the algorithm described in the book chapter (2006) above.
You can also download the manual for the code. The code was developed by Alex Andoni in 2004-2005.
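To give a rough picture of how an index over such hash functions fits together, the Python sketch below builds L hash tables. Each table keys a point by a concatenation g_j(v) = (h_{j,1}(v), ..., h_{j,k}(v)) of k p-stable hashes h(v) = floor((a·v + b) / w), and a query collects the points in its L buckets, keeping those within the given radius. This is not the E2LSH code or its interface. The class name `EuclideanLSH` and the parameter values are hypothetical, and in practice k, L and w have to be tuned to the data.

```python
import math
import random

class EuclideanLSH:
    """Sketch of an LSH index for Euclidean distance built from p-stable hashes.

    Table j keys a point by g_j(v) = (h_j1(v), ..., h_jk(v)),
    where h(v) = floor((a.v + b) / w), a ~ N(0, I), b ~ U[0, w).
    """

    def __init__(self, dim, k, L, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        # params[j][i] = (a, b) for the i-th hash of table j
        self.params = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [{} for _ in range(L)]
        self.points = []

    def _g(self, j, v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.params[j]
        )

    def insert(self, v):
        idx = len(self.points)
        self.points.append(v)
        for j, table in enumerate(self.tables):
            table.setdefault(self._g(j, v), []).append(idx)
        return idx

    def query(self, q, radius):
        """Indices of colliding points whose true distance to q is <= radius."""
        seen, out = set(), []
        for j, table in enumerate(self.tables):
            for idx in table.get(self._g(j, q), []):
                if idx not in seen:
                    seen.add(idx)
                    if math.dist(q, self.points[idx]) <= radius:
                        out.append(idx)
        return sorted(out)

# Usage: index 50 random points in 10 dimensions, then query a perturbed copy.
rng = random.Random(3)
pts = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(50)]
index = EuclideanLSH(dim=10, k=4, L=8, w=4.0, seed=0)
for p in pts:
    index.insert(p)
print(index.query([x + 0.01 for x in pts[7]], radius=0.1))
```

Only candidates from colliding buckets are checked against the true distance, so the query cost depends on bucket sizes rather than on the full data set. This is the source of LSH's speedup over a linear scan.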
This research was supported in part by NSF CAREER Grant #0133849 "Approximate
Algorithms for High-dimensional Geometric Problems".