Python Interface

Basic Usage

import random

from n2 import HnswIndex

f = 40
t = HnswIndex(f)  # metric defaults to 'angular'; use HnswIndex(f, 'L2') or HnswIndex(f, 'dot') to change it
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_data(v)

t.build(m=5, max_m0=10, n_threads=4)
t.save('test.hnsw')

u = HnswIndex(f)
u.load('test.hnsw')
print(u.search_by_id(0, 1000))

You can see more code examples at examples/python.

Main Interface

n2.HnswIndex.add_data

Adds vector v.

n2.HnswIndex.build

Builds an HNSW graph with the given configuration.

n2.HnswIndex.save

Saves the index to disk.

n2.HnswIndex.load

Loads the index from disk.

n2.HnswIndex.unload

Unloads (unmaps) the index.

n2.HnswIndex.search_by_vector

Returns the k nearest items to a query item given as a vector.

n2.HnswIndex.search_by_id

Returns the k nearest items to a query item given as an id.

n2.HnswIndex.batch_search_by_vectors

Returns the k nearest items to each query item, given as vectors (multi-threaded batch search).

n2.HnswIndex.batch_search_by_ids

Returns the k nearest items to each query item, given as ids (multi-threaded batch search).

class n2.HnswIndex(dimension, metric='angular')
__init__(dimension, metric='angular')
Parameters
  • dimension (int) -- Dimension of vectors.

  • metric (string) -- An optional parameter to choose a distance metric. ('angular' | 'L2' | 'dot')

Returns

An HnswIndex instance.
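
The three metric names correspond to familiar ways of comparing vectors. The sketch below uses textbook definitions (cosine distance, Euclidean distance, and the raw dot product) in plain Python. It is an illustration only: the helper names are hypothetical, and N2's internal values may be scaled or signed differently (for example, a squared L2 distance), which is not confirmed here.

```python
import math


def dot(a, b):
    """Raw dot product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular_distance(a, b):
    """Cosine distance, 1 - cos(theta); assumes neither vector is all zeros."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return 1.0 - dot(a, b) / norm


# Two orthogonal unit vectors (made-up sample data).
a = [1.0, 0.0]
b = [0.0, 1.0]
print(dot(a, b), l2_distance(a, b), angular_distance(a, b))
```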

add_data(v)

Adds vector v.

Parameters

v (list(float)) -- A vector whose length equals the dimension passed to __init__().

Returns

Boolean value indicating whether data addition succeeded or not.

Return type

bool

batch_search_by_ids(item_ids, k, ef_search=-1, num_threads=4, include_distances=False)

Returns the k nearest items to each query item, given as ids (multi-threaded batch search).

Note

With OMP_SCHEDULE environment variable, you can set how threads are scheduled. Refer to GNU libgomp.
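
One way to set OMP_SCHEDULE from Python is through os.environ. The sketch below assumes the variable has to be in place before the OpenMP runtime reads it, so it is set before N2 would be imported. The value "dynamic,8" is only an example of the standard "type[,chunk]" format, not a recommendation from the N2 authors.

```python
import os

# Set the schedule before importing n2 (assumed to be the safe order).
os.environ["OMP_SCHEDULE"] = "dynamic,8"

# The value has the form "type[,chunk]", e.g. static, dynamic, guided, or auto.
kind, _, chunk = os.environ["OMP_SCHEDULE"].partition(",")
print(kind, chunk)

# from n2 import HnswIndex  # import only after the variable is set
```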

Parameters
  • item_ids (list(int)) -- Query ids.

  • k (int) -- Number of nearest items to return for each query.

  • ef_search (int) -- Size of the dynamic candidate list used during search (default: 50 * k). Passing -1 selects the default value.

  • num_threads (int) -- Number of threads to use for search.

  • include_distances (bool) -- If True, each result list holds (item_id, distance) tuples instead of bare item ids.
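
The ef_search default rule described above can be written out as a small helper. This is only a restatement of the documented behavior (-1 means 50 * k), not N2's internal code, and resolve_ef_search is a hypothetical name.

```python
def resolve_ef_search(k, ef_search=-1):
    """Return the ef_search value the documentation says will be used."""
    if ef_search == -1:
        return 50 * k  # documented default
    return ef_search


print(resolve_ef_search(10))       # default derived from k
print(resolve_ef_search(10, 200))  # explicit value is kept
```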

Returns

A list of lists, one per query item, each holding the k nearest items. The outer list follows the order of item_ids.

Return type

list(list(int)) or list(list(tuple(int, float)))

batch_search_by_vectors(vs, k, ef_search=-1, num_threads=4, include_distances=False)

Returns the k nearest items to each query item, given as vectors (multi-threaded batch search).

Note

With OMP_SCHEDULE environment variable, you can set how threads are scheduled. Refer to GNU libgomp.

Parameters
  • vs (list(list(float))) -- Query vectors.

  • k (int) -- Number of nearest items to return for each query.

  • ef_search (int) -- Size of the dynamic candidate list used during search (default: 50 * k). Passing -1 selects the default value.

  • num_threads (int) -- Number of threads to use for search.

  • include_distances (bool) -- If True, each result list holds (item_id, distance) tuples instead of bare item ids.

Returns

A list of lists, one per query item, each holding the k nearest items. The outer list follows the order of vs.

Return type

list(list(int)) or list(list(tuple(int, float)))
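
Since the batch methods return one of two shapes depending on include_distances, calling code often wants to separate ids from distances. A minimal sketch follows. split_batch_results is a hypothetical helper, and the sample results are made up to match the documented return types rather than produced by N2.

```python
def split_batch_results(results, include_distances):
    """Split batch results into parallel lists of ids and distances per query."""
    ids, dists = [], []
    for row in results:
        if include_distances:
            ids.append([item_id for item_id, _ in row])
            dists.append([d for _, d in row])
        else:
            ids.append(list(row))
            dists.append(None)  # no distances were requested
    return ids, dists


# Made-up results for two queries with k=2.
plain = [[3, 7], [1, 4]]
with_dist = [[(3, 0.10), (7, 0.25)], [(1, 0.05), (4, 0.40)]]

ids_a, dists_a = split_batch_results(plain, include_distances=False)
ids_b, dists_b = split_batch_results(with_dist, include_distances=True)
print(ids_b, dists_b)
```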

build(m=None, max_m0=None, ef_construction=None, n_threads=None, mult=None, neighbor_selecting=None, graph_merging=None)

Builds an HNSW graph with the given configuration.

Parameters
  • m (int) -- Max number of edges for nodes at level > 0 (default: 12).

  • max_m0 (int) -- Max number of edges for nodes at level == 0 (default: 24).

  • ef_construction (int) -- Refer to the HNSW paper (default: 150).

  • n_threads (int) -- Number of threads for building index.

  • mult (float) -- Level multiplier. Using the default value is recommended (default: 1 / log(1.0 * M)).

  • neighbor_selecting (string) --

    Neighbor selecting policy.

    • Available values
      • "heuristic" (default): Select neighbors using Algorithm 4 of the HNSW paper (recommended).

      • "naive": Select closest neighbors (not recommended).

  • graph_merging (string) --

    Graph merging heuristic.

    • Available values
      • "skip" (default): Do not merge (recommended for large-scale data, over 10M).

      • "merge_level0": Performs an additional graph build in reverse order, then merges edges at level 0. This takes twice the build time of "skip" but gives slightly higher accuracy (recommended for data under 10M scale).
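
To make the mult default concrete, the sketch below computes 1 / log(m) for the default m of 12 and draws node levels the way the HNSW paper describes, floor(-ln(u) * mult) with u uniform. It assumes that M in the documented default refers to the m parameter and that N2 follows the paper's level rule; neither is verified against N2's source. The function names are hypothetical.

```python
import math
import random


def default_mult(m):
    """Documented default level multiplier, 1 / log(1.0 * M)."""
    return 1.0 / math.log(1.0 * m)


def draw_level(mult, rng):
    """Draw a level as floor(-ln(u) * mult), following the HNSW paper."""
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    return int(-math.log(u) * mult)


rng = random.Random(0)
mult = default_mult(12)
levels = [draw_level(mult, rng) for _ in range(10000)]
share_level0 = levels.count(0) / len(levels)
print(mult, share_level0, max(levels))
```

Under this rule a node lands on level 0 only with probability 1 - 1/m, so most nodes appear in the bottom layer alone and the upper layers stay sparse.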

load(fname, use_mmap=True)

Loads the index from disk.

Parameters
  • fname (str) -- An index file name.

  • use_mmap (bool) -- An optional parameter indicating whether to use mmap() or not (default: True). If True, N2 loads the model through mmap.

Returns

Boolean value indicating whether model load succeeded or not.

Return type

bool

save(fname)

Saves the index to disk.

Parameters

fname (str) -- A file destination where the index will be saved.

Returns

Boolean value indicating whether model save succeeded or not.

Return type

bool

search_by_id(item_id, k, ef_search=-1, include_distances=False)

Returns the k nearest items to a query item given as an id.

Parameters
  • item_id (int) -- A query id.

  • k (int) -- Number of nearest items to return.

  • ef_search (int) -- Size of the dynamic candidate list used during search (default: 50 * k). Passing -1 selects the default value.

  • include_distances (bool) -- If True, the result is a list of (item_id, distance) tuples instead of bare item ids.

Returns

A list of k nearest items.

Return type

list(int) or list(tuple(int, float))

search_by_vector(v, k, ef_search=-1, include_distances=False)

Returns the k nearest items to a query item given as a vector.

Parameters
  • v (list(float)) -- A query vector.

  • k (int) -- Number of nearest items to return.

  • ef_search (int) -- Size of the dynamic candidate list used during search (default: 50 * k). Passing -1 selects the default value.

  • include_distances (bool) -- If True, the result is a list of (item_id, distance) tuples instead of bare item ids.

Returns

A list of k nearest items.

Return type

list(int) or list(tuple(int, float))
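
HNSW search is approximate, so the ids returned by search_by_vector() may differ from the true nearest neighbors. One common sanity check is to compare against an exhaustive search on a small sample. The sketch below shows that comparison in plain Python using Euclidean distance. brute_force_knn and recall are hypothetical helpers, and the "approximate" ids are made up rather than produced by N2.

```python
import heapq
import math


def brute_force_knn(data, query, k):
    """Exact k nearest ids by Euclidean distance, found by scanning every item."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scored = ((l2(v, query), i) for i, v in enumerate(data))
    return [i for _, i in heapq.nsmallest(k, scored)]


def recall(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result found."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
exact_ids = brute_force_knn(data, [0.9, 0.1], k=2)

# Pretend an approximate search returned ids 1 and 3 (made-up values).
score = recall([1, 3], exact_ids)
print(exact_ids, score)
```

With a real index, the first argument to recall would be the list returned by search_by_vector(query, k), and raising ef_search would be the usual way to trade speed for a higher score.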

unload()

Unloads (unmaps) the index.