N2 Benchmark¶

This page is a detailed explanation of how we performed benchmark experiments.

You can also see benchmarks of ANN libraries in Python at ann-benchmarks.com. Note that N2 version 0.1.6 is used in ann-benchmarks.com (last checked on October 8th, 2020) and we are continuing our efforts to improve N2 performance.

Benchmark Focus¶

These are some factors that we focus on when developing N2.

Our ANN algorithm should run fast even when dealing with large-scale datasets.
Our ANN algorithm should minimize the time required to build an index file, so that it can be applied to real-world scenarios where the dataset changes frequently (e.g. create/update/delete), such as online content services like news portals.

Therefore, our main benchmark criteria are as follows:

How long does it take to build the index file?
How long does it take to get results from a large dataset?
How much memory does it take to handle a large dataset?

Test Dataset¶

Dataset Description¶

To test with a large amount of data, we use the YouTube dataset, which contains 14,520,986 samples. Each sample is a vector of 40 float32 features.

feature1(float32)	feature2(float32)	……	feature39(float32)	feature40(float32)
-0.167898	0.160478	……	0.104421	0.0503584

How to Download the Benchmark Dataset¶

You can download the benchmark dataset with the script we provide in Download dataset.

We also share the YouTube dataset through Google Drive. It consists of two plain text files, youtube.txt and youtube.txt.vids. youtube.txt contains the dataset samples and youtube.txt.vids contains the dataset metadata: each line of youtube.txt.vids is the metadata for the corresponding sample in youtube.txt.

DSID	VID	Youtube link
34XnPr4YKpo8wE_mEl	Z1Jilm0TZHY	http://www.youtube.com/watch?v=Z1Jilm0TZHY
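
As a rough illustration of how the two files line up, the sketch below pairs line i of youtube.txt with line i of youtube.txt.vids. It assumes that fields in both files are whitespace-separated and that neither file has a header row; neither point is confirmed on this page, and the function names are hypothetical.

```python
import io
from itertools import islice

DIM = 40  # features per sample, as stated in the dataset description


def parse_sample(line, dim=DIM):
    """Parse one line of youtube.txt into a list of floats.

    Assumption: features are separated by whitespace (tabs or spaces).
    """
    values = [float(token) for token in line.split()]
    if len(values) != dim:
        raise ValueError(f"expected {dim} features, got {len(values)}")
    return values


def parse_metadata(line):
    """Parse one line of youtube.txt.vids into (dsid, vid, link).

    Assumption: three whitespace-separated fields, in the order shown
    in the metadata table.
    """
    dsid, vid, link = line.split()
    return dsid, vid, link


def iter_dataset(samples_file, vids_file, limit=None):
    """Yield (vector, metadata) pairs, pairing line i of both files."""
    pairs = zip(samples_file, vids_file)
    for sample_line, vid_line in islice(pairs, limit):
        yield parse_sample(sample_line), parse_metadata(vid_line)


# Demo with in-memory stand-ins; the feature values here are made up.
demo_samples = io.StringIO("\t".join(["0.25"] * DIM) + "\n")
demo_vids = io.StringIO(
    "34XnPr4YKpo8wE_mEl\tZ1Jilm0TZHY\t"
    "http://www.youtube.com/watch?v=Z1Jilm0TZHY\n"
)
for vector, (dsid, vid, link) in iter_dataset(demo_samples, demo_vids):
    print(len(vector), dsid, vid, link)
```

With the real files, you would pass two open file handles instead of the in-memory stand-ins; reading lazily this way avoids loading all 14,520,986 samples at once.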

Test Environment¶

CPU: Intel(R) Xeon(R) CPU E5-2620 v4
Memory: 64GB
Storage: SSD
Dataset: YouTube (5.4GB)
N2 version: 0.1.7
NMSLIB version: 2.0.6
g++ (gcc): 7.3.1

Index Build Time¶

The following is a comparison of index build times for different numbers of threads. Depending on the thread count, N2 takes roughly 10% to 24% less time than NMSLIB to build the index file.

Library	1 Thread	2 Threads	4 Threads	8 Threads	16 Threads
N2 (Index Size: 3.7GB)	4995.73 sec	3018.57 sec	1609.89 sec	905.87 sec	554.81 sec
NMSLIB (Index Size: 3.9GB)	6282.5 sec	3996.88 sec	2080.36 sec	1106.18 sec	613.18 sec
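
The build-time difference can be recomputed from the table. The sketch below interprets "faster" as the relative reduction in build time, (NMSLIB - N2) / NMSLIB, which is an assumption about how the percentage was derived; the variable and function names are ours.

```python
# Build times in seconds, copied from the table above.
THREADS = [1, 2, 4, 8, 16]
N2_SEC = [4995.73, 3018.57, 1609.89, 905.87, 554.81]
NMSLIB_SEC = [6282.5, 3996.88, 2080.36, 1106.18, 613.18]


def time_reduction_pct(n2_sec, nmslib_sec):
    """Relative build-time reduction of N2 versus NMSLIB, in percent."""
    return (nmslib_sec - n2_sec) / nmslib_sec * 100.0


reductions = {
    threads: time_reduction_pct(n2, nmslib)
    for threads, n2, nmslib in zip(THREADS, N2_SEC, NMSLIB_SEC)
}

for threads, pct in reductions.items():
    print(f"{threads:>2} thread(s): N2 build time is {pct:.1f}% lower")
```

Under this interpretation the reduction ranges from about 9.5% (16 threads) to about 24.5% (2 threads).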

Search Speed¶

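The data below shows the tradeoff between search speed and accuracy. Speed is reported as search time, so lower is better; QPS (Queries Per Second) moves in the opposite direction. Overall, N2 and NMSLIB show similar search performance: N2's search time is lower in every row, while NMSLIB is more accurate at the two smallest efSearch values.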

Parameter	Search Time (N2)	Accuracy (N2)	Search Time (NMSLIB)	Accuracy (NMSLIB)
M: 12, efCon: 100, efSearch: 25	0.000130227	0.52136	0.000155903	0.574523
M: 12, efCon: 100, efSearch: 50	0.000168451	0.736898	0.000197621	0.760703
M: 12, efCon: 100, efSearch: 100	0.000235572	0.908154	0.000247012	0.899827
M: 12, efCon: 100, efSearch: 250	0.000439563	0.971894	0.000486722	0.964502
M: 12, efCon: 100, efSearch: 500	0.000805385	0.988616	0.000871604	0.982023
M: 12, efCon: 100, efSearch: 750	0.00114534	0.993323	0.00129876	0.987889
M: 12, efCon: 100, efSearch: 1000	0.00148114	0.995105	0.00166815	0.99014
M: 12, efCon: 100, efSearch: 1500	0.00219379	0.996848	0.00241407	0.991855
M: 12, efCon: 100, efSearch: 2500	0.00348781	0.997529	0.00385025	0.993514
M: 12, efCon: 100, efSearch: 5000	0.00669571	0.99839	0.00744833	0.994425
M: 12, efCon: 100, efSearch: 10000	0.0132182	0.998577	0.014742	0.995269
M: 12, efCon: 100, efSearch: 50000	0.0627954	0.998814	0.0706788	0.995788
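
One way to read the table is to pick an accuracy target and find the smallest efSearch that reaches it for each library. The sketch below does that and converts search time to QPS. It assumes the search time column is seconds per query, which this page does not state explicitly; the helper name is hypothetical.

```python
# (efSearch, N2 time, N2 accuracy, NMSLIB time, NMSLIB accuracy),
# copied from the table above (M: 12, efCon: 100 in every row).
ROWS = [
    (25, 0.000130227, 0.52136, 0.000155903, 0.574523),
    (50, 0.000168451, 0.736898, 0.000197621, 0.760703),
    (100, 0.000235572, 0.908154, 0.000247012, 0.899827),
    (250, 0.000439563, 0.971894, 0.000486722, 0.964502),
    (500, 0.000805385, 0.988616, 0.000871604, 0.982023),
    (750, 0.00114534, 0.993323, 0.00129876, 0.987889),
    (1000, 0.00148114, 0.995105, 0.00166815, 0.99014),
    (1500, 0.00219379, 0.996848, 0.00241407, 0.991855),
    (2500, 0.00348781, 0.997529, 0.00385025, 0.993514),
    (5000, 0.00669571, 0.99839, 0.00744833, 0.994425),
    (10000, 0.0132182, 0.998577, 0.014742, 0.995269),
    (50000, 0.0627954, 0.998814, 0.0706788, 0.995788),
]


def first_reaching(rows, target_accuracy, library):
    """Return (efSearch, search_time, qps) for the smallest efSearch whose
    accuracy meets the target, or None if no row does.

    Assumption: search_time is seconds per query, so QPS = 1 / search_time.
    """
    time_col, acc_col = (1, 2) if library == "n2" else (3, 4)
    for row in rows:  # rows are sorted by efSearch, ascending
        if row[acc_col] >= target_accuracy:
            return row[0], row[time_col], 1.0 / row[time_col]
    return None


for library in ("n2", "nmslib"):
    ef_search, seconds, qps = first_reaching(ROWS, 0.99, library)
    print(f"{library}: efSearch={ef_search}, {seconds:.6f} s, {qps:.0f} QPS")
```

For a 0.99 accuracy target, N2 first reaches it at efSearch 750 and NMSLIB at efSearch 1000, which under the per-query assumption corresponds to roughly 873 and 599 QPS on a single thread.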

Memory Usage¶

The data below shows the amount of memory used to build the index file, measured as the difference between memory usage before and after building the index file. N2 uses about 15% less memory than NMSLIB.

Library	Memory Usage
N2	11222.48 MB
NMSLIB	13212.76 MB
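
A before/after measurement of this kind can be sketched as follows. This is not the script used for the numbers above: the page does not say which tool was used, so the sketch falls back on the peak resident set size reported by Python's standard resource module (Unix only), and the build function is a hypothetical stand-in rather than a real index build.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_memory_delta_mb(build_fn):
    """Run build_fn and return (its result, growth in peak RSS in MB)."""
    before = peak_rss_mb()
    result = build_fn()
    after = peak_rss_mb()
    return result, after - before


def fake_build():
    """Hypothetical stand-in for an index build: allocates a large list."""
    return [float(i) for i in range(1_000_000)]


index, delta_mb = measure_memory_delta_mb(fake_build)
print(f"peak RSS grew by {delta_mb:.1f} MB")

# Relative saving computed from the table above.
N2_MB, NMSLIB_MB = 11222.48, 13212.76
saving_pct = (NMSLIB_MB - N2_MB) / NMSLIB_MB * 100.0
print(f"N2 uses {saving_pct:.1f}% less memory than NMSLIB")
```

Because peak RSS never decreases, the delta only captures growth beyond the previous peak; a measurement of current RSS would be closer to the "before and after" wording but requires platform-specific code.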

Conclusion¶

N2 builds the index file faster and uses less memory than NMSLIB, while offering similar search speed.

The benchmark uses multiple threads for index builds but a single thread for searching. In a real production environment, you will need to run concurrent searches from multiple processes or multiple threads. N2 allows you to search simultaneously from multiple processes, and with its mmap support this works much more efficiently than in other libraries, including NMSLIB.
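
To illustrate the general idea behind mmap (not N2's actual implementation or file format), the sketch below stores float32 vectors in a flat binary file and reads one back through a read-only memory map. The file layout, dimension, and function names are hypothetical. When several processes map the same file read-only, the operating system can serve them from the same cached pages instead of giving each process its own copy.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # small, made-up dimension for the demo
RECORD = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector


def write_vectors(path, vectors):
    """Write vectors back to back as fixed-size float32 records."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))


def read_vector(mapped, index):
    """Decode the vector at position `index` directly from the mapping."""
    return RECORD.unpack_from(mapped, index * RECORD.size)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")
    write_vectors(path, [(0.0, 1.0, 2.0, 3.0), (4.0, 5.0, 6.0, 7.0)])

    # Map the whole file read-only; only the pages actually touched
    # need to be brought into memory.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            second = read_vector(mapped, 1)

print(second)
```

Each worker process in a multi-process search setup would open its own mapping of the same file in the same way.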