Python Interface

Basic Usage

import random

from n2 import HnswIndex

f = 40
t = HnswIndex(f)  # metric defaults to "angular"; pass "L2" or "dot" as the second argument to change it
for i in range(1000):
    v = [random.gauss(0, 1) for z in range(f)]
    t.add_data(v)

t.build(m=5, max_m0=10, n_threads=4)
t.save('test.hnsw')

u = HnswIndex(f)
u.load('test.hnsw')
print(u.search_by_id(0, 1000))

You can see more code examples at examples/python.

Main Interface

`n2.HnswIndex.add_data`	Adds vector v.
`n2.HnswIndex.build`	Builds an HNSW graph with the given configuration.
`n2.HnswIndex.save`	Saves the index to disk.
`n2.HnswIndex.load`	Loads the index from disk.
`n2.HnswIndex.unload`	Unloads (unmaps) the index.
`n2.HnswIndex.search_by_vector`	Returns the k nearest items to a query given as a vector.
`n2.HnswIndex.search_by_id`	Returns the k nearest items to a query given as an item id.
`n2.HnswIndex.batch_search_by_vectors`	Returns the k nearest items to each query vector (multi-threaded batch search).
`n2.HnswIndex.batch_search_by_ids`	Returns the k nearest items to each query id (multi-threaded batch search).

class n2.HnswIndex(dimension, metric='angular')

__init__(dimension, metric='angular')

Parameters

dimension (int) -- Dimension of vectors.
metric (string) -- Optional distance metric, one of 'angular', 'L2', or 'dot' (default: 'angular').

Returns

An HnswIndex instance.
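
To make the three metric names concrete, here is a minimal pure-Python sketch of the textbook definitions they usually refer to. This is an illustration only: the function names are hypothetical, and this page does not confirm how N2 scales its distances internally (for example, whether L2 is squared), so the values may differ from what N2 reports.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance (textbook form; N2's scaling is not confirmed here)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angular_distance(a, b):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Note that the angular form ignores vector length, while 'L2' and 'dot' do not, so the choice of metric matters when vectors are not normalized.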

add_data(v)

Adds vector v.

Parameters: v (list(float)) -- A vector whose length equals the dimension passed to __init__().
Returns: Boolean value indicating whether data addition succeeded or not.
Return type: bool
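
Because add_data() reports failure through its return value, it can be worth checking it when loading many vectors. The helper below is a sketch under assumptions: `add_all` is a hypothetical name, and `index` is any object exposing the add_data(v) method documented above.

```python
def add_all(index, vectors, dimension):
    """Add vectors to an index and return the positions of any that failed.

    A vector is skipped (and reported) when its length does not match
    `dimension` or when add_data() returns False.
    """
    failed = []
    for pos, v in enumerate(vectors):
        if len(v) != dimension or not index.add_data(list(v)):
            failed.append(pos)
    return failed
```

With the index from Basic Usage this might be called as `add_all(t, my_vectors, f)`; an empty list means every vector was accepted.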

batch_search_by_ids(item_ids, k, ef_search=-1, num_threads=4, include_distances=False)

Returns the k nearest items to each query id (multi-threaded batch search).

Note

The OMP_SCHEDULE environment variable controls how threads are scheduled. Refer to the GNU libgomp documentation.

Parameters

item_ids (list(int)) -- Query ids.
k (int) -- Number of nearest neighbors to return for each query.
ef_search (int) -- Size of the candidate list explored during search; larger values trade speed for accuracy (default: 50 * k). Passing -1 selects the default.
num_threads (int) -- Number of threads to use for search.
include_distances (bool) -- If True, each hit is returned as an (item_id, distance) tuple instead of a bare item id.

Returns

A list of lists, each holding the k nearest items for one query, in the order the queries were passed in item_ids.

Return type

list(list(int)) or list(list(tuple(int, float)))
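
As a sketch of the OMP_SCHEDULE note for batch searches, the variable can be set from Python. Two things here are assumptions rather than statements from this page: that the variable needs to be in place before the OpenMP runtime initializes (hence setting it before importing n2), and the value "dynamic,4", which is only an illustrative schedule in libgomp's `type[,chunk]` format.

```python
import os

# Assumption: set the schedule before n2 (and its OpenMP runtime) is imported.
# setdefault() keeps any value already exported in the shell.
os.environ.setdefault("OMP_SCHEDULE", "dynamic,4")

# from n2 import HnswIndex  # import only after the environment is prepared
```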

batch_search_by_vectors(vs, k, ef_search=-1, num_threads=4, include_distances=False)

Returns the k nearest items to each query vector (multi-threaded batch search).

Note

The OMP_SCHEDULE environment variable controls how threads are scheduled. Refer to the GNU libgomp documentation.

Parameters

vs (list(list(float))) -- Query vectors.
k (int) -- Number of nearest neighbors to return for each query.
ef_search (int) -- Size of the candidate list explored during search; larger values trade speed for accuracy (default: 50 * k). Passing -1 selects the default.
num_threads (int) -- Number of threads to use for search.
include_distances (bool) -- If True, each hit is returned as an (item_id, distance) tuple instead of a bare item id.

Returns

A list of lists, each holding the k nearest items for one query, in the order the queries were passed in vs.

Return type

list(list(int)) or list(list(tuple(int, float)))
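
Since batch results come back in the same order as the queries, they can be matched up positionally. The helper below is a small sketch of that idea; `label_results` and the labels themselves are hypothetical and not part of N2.

```python
def label_results(labels, batch_result):
    """Pair each query label with its hit list, relying on preserved input order."""
    if len(labels) != len(batch_result):
        raise ValueError("expected one result list per query")
    return dict(zip(labels, batch_result))
```

For example, `label_results(["q1", "q2"], u.batch_search_by_vectors([v1, v2], 10))` would map each label to the hits for the corresponding vector, where `v1` and `v2` are placeholder query vectors.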

build(m=None, max_m0=None, ef_construction=None, n_threads=None, mult=None, neighbor_selecting=None, graph_merging=None)

Builds an HNSW graph with the given configuration.

Parameters

m (int) -- Max number of edges for nodes at level > 0 (default: 12).
max_m0 (int) -- Max number of edges for nodes at level == 0 (default: 24).
ef_construction (int) -- Refer to the HNSW paper (default: 150).
n_threads (int) -- Number of threads to use for building the index.
mult (float) -- Level multiplier. Using the default value is recommended (default: 1 / log(1.0 * M)).
neighbor_selecting (string) --
Neighbor selecting policy.
- Available values
  - "heuristic" (default): Select neighbors using Algorithm 4 of the HNSW paper (recommended).
  - "naive": Select the closest neighbors (not recommended).
graph_merging (string) --
Graph merging heuristic.
- Available values
  - "skip" (default): Do not merge (recommended for large-scale data, over 10M items).
  - "merge_level0": Performs an additional graph build in reverse order, then merges the edges at level 0. This takes about twice the build time of "skip" but gives slightly higher accuracy (recommended for data under 10M items).
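
The defaults and size guidance above can be expressed as two small helpers. This is a sketch with stated assumptions: both function names are hypothetical, the M in the default formula is taken to be the m parameter, and log is taken to be the natural logarithm.

```python
import math

def default_mult(m=12):
    """Default level multiplier 1 / log(1.0 * m), assuming a natural logarithm."""
    return 1.0 / math.log(1.0 * m)

def suggest_graph_merging(n_items, threshold=10_000_000):
    """Map the 10M-item guidance above to a graph_merging value."""
    return "skip" if n_items > threshold else "merge_level0"
```

With the 1000-item index from Basic Usage, `t.build(m=12, max_m0=24, graph_merging=suggest_graph_merging(1000))` would request the slower but slightly more accurate merge.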

load(fname, use_mmap=True)

Loads the index from disk.

Parameters

fname (str) -- An index file name.
use_mmap (bool) -- Optional. If True, N2 loads the index through mmap() (default: True).

Returns

Boolean value indicating whether model load succeeded or not.

Return type

bool
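
Because load() signals failure with False rather than an exception, a thin wrapper can make failures harder to miss. The sketch below assumes `index` is any object with the load(fname, use_mmap) method documented above; `load_or_raise` is a hypothetical helper, not part of N2.

```python
import os

def load_or_raise(index, fname, use_mmap=True):
    """Load an index file, turning a missing file or a False return into an exception."""
    if not os.path.isfile(fname):
        raise FileNotFoundError(fname)
    if not index.load(fname, use_mmap):
        raise RuntimeError(f"failed to load index from {fname!r}")
    return index
```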

save(fname)

Saves the index to disk.

Parameters: fname (str) -- A file destination where the index will be saved.
Returns: Boolean value indicating whether model save succeeded or not.
Return type: bool

search_by_id(item_id, k, ef_search=-1, include_distances=False)

Returns the k nearest items to a query given as an item id.

Parameters

item_id (int) -- A query id.
k (int) -- Number of nearest neighbors to return.
ef_search (int) -- Size of the candidate list explored during search; larger values trade speed for accuracy (default: 50 * k). Passing -1 selects the default.
include_distances (bool) -- If True, each hit is returned as an (item_id, distance) tuple instead of a bare item id.

Returns

A list of k nearest items.

Return type

list(int) or list(tuple(int, float))
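
The ef_search default rule can be written out in one line. This sketch only mirrors the documented behavior (-1 means 50 * k); `effective_ef_search` is a hypothetical name and N2 applies the rule internally.

```python
def effective_ef_search(k, ef_search=-1):
    """Resolve ef_search as documented: -1 stands for the default, 50 * k."""
    return 50 * k if ef_search == -1 else ef_search
```

So a call such as `u.search_by_id(0, 10)` would search with a value of 500, while `u.search_by_id(0, 10, ef_search=200)` uses 200.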

search_by_vector(v, k, ef_search=-1, include_distances=False)

Returns the k nearest items to a query given as a vector.

Parameters

v (list(float)) -- A query vector.
k (int) -- Number of nearest neighbors to return.
ef_search (int) -- Size of the candidate list explored during search; larger values trade speed for accuracy (default: 50 * k). Passing -1 selects the default.
include_distances (bool) -- If True, each hit is returned as an (item_id, distance) tuple instead of a bare item id.

Returns

A list of k nearest items.

Return type

list(int) or list(tuple(int, float))
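
When include_distances=True, the result is a list of (item_id, distance) tuples, which is often easier to use as two parallel lists. A minimal sketch, with `split_hits` being a hypothetical helper name:

```python
def split_hits(hits):
    """Turn [(item_id, distance), ...] into ([item_id, ...], [distance, ...])."""
    ids = [item_id for item_id, _ in hits]
    distances = [distance for _, distance in hits]
    return ids, distances
```

For example, `ids, distances = split_hits(u.search_by_vector(v, 10, include_distances=True))`, where `v` is a query vector of the index's dimension.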

unload()

Unloads (unmaps) the index.
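
One way to make sure unload() is always called is to pair it with load() in a context manager. This is a sketch, not something N2 provides: `loaded` is a hypothetical helper, and `index` is any object with the load() and unload() methods documented above.

```python
from contextlib import contextmanager

@contextmanager
def loaded(index, fname, use_mmap=True):
    """Load an index for the duration of a with-block, then unload it."""
    if not index.load(fname, use_mmap):
        raise RuntimeError(f"failed to load index from {fname!r}")
    try:
        yield index
    finally:
        # Runs even if the body of the with-block raises.
        index.unload()
```

Usage might look like `with loaded(HnswIndex(f), 'test.hnsw') as u: print(u.search_by_id(0, 10))`.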