Python Interface
Basic Usage
import random
from n2 import HnswIndex

f = 40  # vector dimension
t = HnswIndex(f)  # HnswIndex(f, "angular, L2, or dot")
# Add 1000 random vectors to the index.
for i in range(1000):
    v = [random.gauss(0, 1) for z in range(f)]
    t.add_data(v)
t.build(m=5, max_m0=10, n_threads=4)
t.save('test.hnsw')

# Load the saved index into a new instance and query it.
u = HnswIndex(f)
u.load('test.hnsw')
print(u.search_by_id(0, 1000))
You can see more code examples at examples/python.
Main Interface
Method | Description
add_data | Adds vector v.
build | Builds an HNSW graph with the given configuration.
save | Saves the index to disk.
load | Loads the index from disk.
unload | Unloads (unmaps) the index.
search_by_vector | Returns the k nearest items to a query item given as a vector.
search_by_id | Returns the k nearest items to a query item given by id.
batch_search_by_vectors | Returns the k nearest items to each query item given as a vector (multi-threaded batch search).
batch_search_by_ids | Returns the k nearest items to each query item given by id (multi-threaded batch search).

class n2.HnswIndex(dimension, metric='angular')
__init__(dimension, metric='angular')
- Parameters
dimension (int) -- Dimension of vectors.
metric (string) -- An optional parameter to choose a distance metric. ('angular' | 'L2' | 'dot')
- Returns
An HnswIndex instance.
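To make the three metric names concrete, here is a minimal pure-Python sketch of the quantities they usually refer to. The function names are hypothetical, and the exact formulas are assumptions: N2 may, for example, use a squared L2 distance or a differently scaled angular distance internally.

```python
import math


def dot_product(a, b):
    """Inner product; for a 'dot' metric, a larger value usually means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; a smaller value means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular_distance(a, b):
    """1 - cosine similarity (assumed definition, not confirmed for N2)."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / norm


a = [1.0, 0.0]
b = [0.0, 2.0]
print(l2_distance(a, b), angular_distance(a, b), dot_product(a, b))
```

Whichever metric is chosen at construction time is used for both building and searching the index.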

add_data(v)
Adds vector v.
- Parameters
v (list(float)) -- A vector with the dimension set in __init__().
- Returns
Boolean value indicating whether data addition succeeded or not.
- Return type
bool

batch_search_by_ids(item_ids, k, ef_search=-1, num_threads=4, include_distances=False)
Returns the k nearest items to each query item given by id (multi-threaded batch search).
Note
You can control how threads are scheduled with the OMP_SCHEDULE environment variable. Refer to the GNU libgomp documentation.
- Parameters
item_ids (list(int)) -- Query ids.
k (int) -- k value.
ef_search (int) -- ef_search value (default: 50 * k). Passing -1 selects the default value.
num_threads (int) -- Number of threads to use for search.
include_distances (bool) -- If True, each result is a list of tuples (item_id, distance) instead of a list of ids.
- Returns
A list of lists, each holding the k nearest items for one query item, in the order the queries were passed in item_ids.
- Return type
list(list(int)) or list(list(tuple(int, float)))
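With include_distances=True the batch methods return nested lists of (item_id, distance) tuples. A small sketch of unpacking that shape follows; the helper names and the sample values are hypothetical and only stand in for real search output.

```python
def split_results(results):
    """Split [(item_id, distance), ...] into parallel id and distance lists."""
    ids = [item_id for item_id, _ in results]
    distances = [distance for _, distance in results]
    return ids, distances


def split_batch_results(batch_results):
    """Apply split_results to the result list of every query in a batch."""
    return [split_results(results) for results in batch_results]


# Made-up data shaped like batch_search_by_ids([0, 1], 2, include_distances=True).
sample = [[(0, 0.0), (7, 0.31)], [(1, 0.0), (4, 0.27)]]
for ids, distances in split_batch_results(sample):
    print(ids, distances)
```

The outer list keeps the order of the queries, so result i belongs to query i.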

batch_search_by_vectors(vs, k, ef_search=-1, num_threads=4, include_distances=False)
Returns the k nearest items to each query item given as a vector (multi-threaded batch search).
Note
You can control how threads are scheduled with the OMP_SCHEDULE environment variable. Refer to the GNU libgomp documentation.
- Parameters
vs (list(list(float))) -- Query vectors.
k (int) -- k value.
ef_search (int) -- ef_search value (default: 50 * k). Passing -1 selects the default value.
num_threads (int) -- Number of threads to use for search.
include_distances (bool) -- If True, each result is a list of tuples (item_id, distance) instead of a list of ids.
- Returns
A list of lists, each holding the k nearest items for one query item, in the order the queries were passed in vs.
- Return type
list(list(int)) or list(list(tuple(int, float)))

build(m=None, max_m0=None, ef_construction=None, n_threads=None, mult=None, neighbor_selecting=None, graph_merging=None)
Builds an HNSW graph with the given configuration.
- Parameters
m (int) -- Max number of edges for nodes at level > 0 (default: 12).
max_m0 (int) -- Max number of edges for nodes at level == 0 (default: 24).
ef_construction (int) -- Refer to the HNSW paper (default: 150).
n_threads (int) -- Number of threads for building index.
mult (float) -- Level multiplier. Recommended to use the default value (default: 1 / log(1.0 * M)).
neighbor_selecting (string) -- Neighbor selecting policy.
- Available values
"heuristic" (default): Select neighbors using Algorithm 4 of the HNSW paper (recommended).
"naive": Select closest neighbors (not recommended).
graph_merging (string) -- Graph merging heuristic.
- Available values
"skip" (default): Do not merge (recommended for large-scale data (over 10M)).
"merge_level0": Performs an additional graph build in reverse order, then merges edges at level 0. This takes twice the build time of "skip" but gives slightly higher accuracy (recommended for data under 10M scale).
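The graph_merging recommendation above can be turned into a small helper that picks keyword arguments for build() from the dataset size. This is only a sketch: choose_build_options is a hypothetical name, the numeric values are the documented defaults written out explicitly, and treating exactly 10M items as large-scale is an assumption because the text does not cover that boundary.

```python
LARGE_SCALE_THRESHOLD = 10_000_000  # the "10M" mentioned in the parameter notes


def choose_build_options(n_items, n_threads=4):
    """Return build() keyword arguments following the documented recommendations."""
    # Assumption: a dataset of exactly 10M items is treated as large-scale.
    if n_items >= LARGE_SCALE_THRESHOLD:
        graph_merging = "skip"
    else:
        graph_merging = "merge_level0"
    return {
        "m": 12,                 # documented default
        "max_m0": 24,            # documented default
        "ef_construction": 150,  # documented default
        "n_threads": n_threads,
        "neighbor_selecting": "heuristic",
        "graph_merging": graph_merging,
    }


print(choose_build_options(1000))
```

The returned dictionary could then be passed on as index.build(**choose_build_options(n_items)), where index is an HnswIndex that already holds the data.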

load(fname, use_mmap=True)
Loads the index from disk.
- Parameters
fname (str) -- An index file name.
use_mmap (bool) -- An optional parameter indicating whether to load the index through mmap() (default: True).
- Returns
Boolean value indicating whether model load succeeded or not.
- Return type
bool

save(fname)
Saves the index to disk.
- Parameters
fname (str) -- A file destination where the index will be saved.
- Returns
Boolean value indicating whether model save succeeded or not.
- Return type
bool

search_by_id(item_id, k, ef_search=-1, include_distances=False)
Returns the k nearest items to a query item given by id.
- Parameters
item_id (int) -- A query id.
k (int) -- k value.
ef_search (int) -- ef_search value (default: 50 * k). Passing -1 selects the default value.
include_distances (bool) -- If True, the result is a list of tuples (item_id, distance) instead of a list of ids.
- Returns
A list of k nearest items.
- Return type
list(int) or list(tuple(int, float))
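The ef_search default rule stated in the parameter list can be written out as a few lines of Python. This is a sketch of the documented behaviour only, not N2's own code, and resolve_ef_search is a hypothetical name.

```python
def resolve_ef_search(k, ef_search=-1):
    """Return the effective ef_search: 50 * k when -1 is passed, else the given value."""
    if ef_search == -1:
        return 50 * k
    return ef_search


print(resolve_ef_search(10))       # -1 is passed, so the default 50 * k applies
print(resolve_ef_search(10, 200))  # an explicit value is used as given
```

The same rule is documented for every search method, so one helper of this kind covers the single-query and batch variants alike.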

search_by_vector(v, k, ef_search=-1, include_distances=False)
Returns the k nearest items to a query item given as a vector.
- Parameters
v (list(float)) -- A query vector.
k (int) -- k value.
ef_search (int) -- ef_search value (default: 50 * k). Passing -1 selects the default value.
include_distances (bool) -- If True, the result is a list of tuples (item_id, distance) instead of a list of ids.
- Returns
A list of k nearest items.
- Return type
list(int) or list(tuple(int, float))

unload()
Unloads (unmaps) the index.