Short Bytes: FAISS is an open-source library released by Facebook for similarity search and clustering of high-dimensional data. It is useful for complex datasets, such as images and videos, that cannot fit in RAM all at once.
With the advent of highly successful machine learning methods, there has been a boom in big datasets across varied domains. With datasets this large, hardware becomes a bottleneck: processing them requires high memory bandwidth and processing power, and indexing, clustering, and searching the data points become highly demanding.
Researchers at Facebook AI Research (FAIR) recently published a research paper describing an efficient design for clustering and similarity search. Their new algorithmic structure performs much faster than previous state-of-the-art algorithms and uses GPUs for higher memory bandwidth and computational throughput.
Based on their research, they have created a library called FAISS and open-sourced it. Although the algorithms for clustering and similarity search are well known, this library optimizes them to run efficiently on GPUs. Some of the algorithms implemented in the library include –
- Fast K-Nearest Neighbour
- K-Means clustering
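To make the two algorithms listed above concrete, here is a minimal pure-Python sketch of exact (brute-force) k-nearest-neighbour search and of k-means using Lloyd's iterations. This is not FAISS code and does not reflect how FAISS implements either algorithm on GPUs; the function names `squared_l2`, `knn_search` and `kmeans` and the toy data are hypothetical, chosen only for illustration.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_search(database, query, k):
    """Exact k-nearest-neighbour search by scanning every database vector.

    Returns the indices of the k closest vectors, nearest first.
    """
    order = sorted(range(len(database)),
                   key=lambda i: squared_l2(database[i], query))
    return order[:k]


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = random.Random(seed)
    # Start from k distinct points picked at random from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = knn_search(centroids, p, 1)[0]
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy data (assumed for illustration): two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

neighbours = knn_search(data, (0.2, 0.1), k=2)  # indices of the 2 closest points
centres = sorted(kmeans(data, k=2))             # one centroid per group

print(neighbours)
print(centres)
```

Both functions scan every vector for every query, so the cost grows with the size of the dataset multiplied by the number of queries. That is the kind of workload that becomes demanding at the scale the article describes, and that a GPU-optimized library aims to speed up.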
As a test of how the library performs, the following figure shows a case where only the first and the last image are given, and the algorithm finds the intermediate transitional images from a collection of 95 million images.
Top Features of FAISS Library –
- Written in C++ with complete Python wrappers
- Supports single/multiple GPUs
- Highly scalable; vectors typically have tens to hundreds of dimensions
- Built on BLAS and CUDA libraries
- Reported to be up to 8.5x faster than the previous state-of-the-art GPU implementation
Here is the GitHub repo of the FAISS library. So what do you think about the new library? Share your thoughts with us in the comments.