API Reference

Embeddings

This section contains classes for generating vector embeddings from documents. Users can subclass LocalSearch.backend.embeddings.BaseEmbeddings.BaseEmbedder to implement custom embedding strategies.

class LocalSearch.backend.embeddings.BaseEmbeddings.BaseEmbedder

Bases: ABC

Abstract base class for any text embedding model.

Subclasses must implement methods to encode text into vectors and report embedding dimensionality.

abstract dimension() → int

Return the dimensionality of the embedding vectors produced by this model.

Returns:

Embedding dimension.

Return type:

int

abstract encode(text: str) → ndarray

Convert a single text string into a vector embedding.

Parameters:

text – Input text to embed.

Returns:

The resulting embedding vector (dtype=float32).

Return type:

np.ndarray
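
As a minimal sketch of the contract above, the following toy embedder implements both abstract methods. The BaseEmbedder class here is a local stand-in that mirrors the documented interface so the snippet runs on its own; in a real project it would be imported from LocalSearch.backend.embeddings.BaseEmbeddings. HashingEmbedder and its dim parameter are hypothetical names, and feature hashing is used only for illustration, not because the library does so.

```python
import hashlib
from abc import ABC, abstractmethod

import numpy as np


class BaseEmbedder(ABC):
    """Stand-in mirroring the documented interface (assumption: the real
    class lives in LocalSearch.backend.embeddings.BaseEmbeddings)."""

    @abstractmethod
    def dimension(self) -> int: ...

    @abstractmethod
    def encode(self, text: str) -> np.ndarray: ...


class HashingEmbedder(BaseEmbedder):
    """Hypothetical toy embedder: hashes each token into one of `dim` buckets."""

    def __init__(self, dim: int = 64) -> None:
        self._dim = dim

    def dimension(self) -> int:
        return self._dim

    def encode(self, text: str) -> np.ndarray:
        vec = np.zeros(self._dim, dtype=np.float32)
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "little") % self._dim
            vec[bucket] += 1.0
        # L2-normalise so vectors are comparable; leave all-zero vectors as-is.
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec


embedder = HashingEmbedder(dim=64)
vector = embedder.encode("local semantic search")
```

The value returned by dimension() should match the length of every vector returned by encode(), since a vector store is typically constructed with a fixed dimensionality.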

class LocalSearch.backend.embeddings.SentenceTransformerEmbedder.SentenceTransformerEmbedder(model_name: str = 'all-MiniLM-L6-v2')

Bases: BaseEmbedder

Embedder using a Sentence-Transformers model.

Default model: “all-MiniLM-L6-v2”

dimension() → int

Return the dimensionality of the embedding vectors.

Returns:

Embedding dimension.

Return type:

int

encode(text: str) → ndarray

Encode a text string into a vector embedding.

Parameters:

text – Input string to encode.

Returns:

Embedding vector as float32 array.

Return type:

np.ndarray

LLMs

This section contains classes for interacting with Large Language Models (LLMs). Users can subclass LocalSearch.backend.llms.BaseLLM.BaseLLM to provide custom LLM implementations.

class LocalSearch.backend.llms.BaseLLM.BaseLLM

Bases: ABC

Abstract base class for any LLM (Large Language Model) integration.

abstract generate(prompt: str) → str

Generate text from a prompt using the underlying LLM.

Parameters:

prompt – Input prompt string.

Returns:

Generated text as a string.
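
Because generate() is the only abstract method, a custom LLM can be very small. The sketch below defines a local stand-in for BaseLLM so it runs without the package installed, plus a hypothetical EchoLLM that could serve as an offline stub in tests. Neither EchoLLM nor its prefix parameter is part of LocalSearch.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Stand-in mirroring the documented interface (assumption: the real
    class lives in LocalSearch.backend.llms.BaseLLM)."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class EchoLLM(BaseLLM):
    """Hypothetical offline LLM: returns the prompt with a fixed prefix."""

    def __init__(self, prefix: str = "ECHO: ") -> None:
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        # No network call; useful for exercising the pipeline deterministically.
        return self.prefix + prompt.strip()


llm = EchoLLM()
reply = llm.generate("  What is in my notes?  ")
```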

class LocalSearch.backend.llms.GroqLLM.GroqLLM(api_key: str, model: str = 'openai/gpt-oss-120b')

Bases: BaseLLM

Groq LLM client wrapper.

Uses the Groq SDK to call a specified model and stream output.

generate(prompt: str) → str

Generate text for the given prompt.

Parameters:

prompt – Input string to generate a response for.

Returns:

Generated text as a string.

Vector Stores

This section contains classes for storing and querying document vectors. Users can subclass LocalSearch.backend.vector_store.BaseVectorStore.BaseVectorStore to define a custom vector storage backend.

class LocalSearch.backend.vector_store.BaseVectorStore.BaseVectorStore

Bases: ABC

Abstract base class for any vector store.

Any implementation must provide all of the following methods to be compatible with _process_files and other utilities.

abstract add(vectors: ndarray, ids: ndarray, metadata: List[dict])

Add vectors with corresponding IDs and associated metadata.

Parameters:
  • vectors (np.ndarray) – 2D array of vectors to add.

  • ids (np.ndarray) – 1D array of unique IDs for each vector.

  • metadata (List[dict]) – List of metadata dicts, one per vector.

abstract dimension() → int

Return the dimensionality of vectors supported by this store.

Returns:

Embedding vector dimensionality.

Return type:

int

abstract get_all_ids() → Set[int]

Return a set of all vector IDs currently stored.

Returns:

All IDs in the store.

Return type:

Set[int]

abstract load(path: str)

Load the store from disk.

Parameters:

path (str) – File path or directory to load the store from.

abstract prepare_index(directory_path: str, recursive: bool = True) → IndexPreparation

Prepare or load the index from a directory.

Parameters:
  • directory_path (str) – Directory containing files to index.

  • recursive (bool) – Whether to scan subdirectories.

Returns:

Dict with keys:
  • 'index': internal index object

  • 'current_files': set of files present in the directory

  • 'used_ids': set of vector IDs already in the index

Return type:

IndexPreparation

abstract remove_by_id(vector_id: int) → None

Remove a vector from the store by its ID.

Parameters:

vector_id (int) – ID of the vector to remove.

abstract save(path: str)

Persist the store to disk.

Parameters:

path (str) – File path or directory to save the store.

abstract search(query_vector: ndarray, top_k: int) → List[SearchResult]

Return the top_k nearest neighbors for a query vector.

Parameters:
  • query_vector (np.ndarray) – Single query vector.

  • top_k (int) – Number of nearest neighbors to return.

Returns:

List of search results with keys ‘id’, ‘score’, ‘metadata’.

Return type:

List[SearchResult]
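
To illustrate how add(), remove_by_id(), get_all_ids(), and search() fit together, here is a hedged, brute-force sketch that keeps everything in memory. It covers only a subset of the abstract methods (no save, load, or prepare_index), so it does not subclass BaseVectorStore. InMemoryStore is a hypothetical name, SearchResult is redefined locally to match the documented keys, and using cosine similarity as the score is an assumption; the documentation does not say which metric a store must use.

```python
from typing import Dict, List, Set, TypedDict

import numpy as np


class SearchResult(TypedDict):
    id: int
    metadata: dict
    score: float


class InMemoryStore:
    """Hypothetical brute-force store covering part of the documented interface."""

    def __init__(self, dim: int) -> None:
        self._dim = dim
        self._vectors: Dict[int, np.ndarray] = {}
        self._metadata: Dict[int, dict] = {}

    def dimension(self) -> int:
        return self._dim

    def add(self, vectors: np.ndarray, ids: np.ndarray, metadata: List[dict]) -> None:
        if vectors.ndim != 2 or vectors.shape[1] != self._dim:
            raise ValueError(f"expected vectors of shape (n, {self._dim})")
        if not (len(vectors) == len(ids) == len(metadata)):
            raise ValueError("vectors, ids and metadata must have equal length")
        for vec, vid, meta in zip(vectors, ids, metadata):
            self._vectors[int(vid)] = vec.astype(np.float32)
            self._metadata[int(vid)] = meta

    def remove_by_id(self, vector_id: int) -> None:
        self._vectors.pop(vector_id, None)
        self._metadata.pop(vector_id, None)

    def get_all_ids(self) -> Set[int]:
        return set(self._vectors)

    def search(self, query_vector: np.ndarray, top_k: int) -> List[SearchResult]:
        if not self._vectors:
            return []
        query = query_vector.reshape(-1).astype(np.float32)
        ids = list(self._vectors)
        matrix = np.stack([self._vectors[i] for i in ids])
        # Cosine similarity (assumed metric); epsilon guards against zero norms.
        norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query) + 1e-12
        scores = matrix @ query / norms
        order = np.argsort(-scores)[:top_k]
        return [
            SearchResult(id=ids[i], score=float(scores[i]), metadata=self._metadata[ids[i]])
            for i in order
        ]


store = InMemoryStore(dim=2)
store.add(
    np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32),
    np.array([10, 20]),
    [{"file": "a.txt"}, {"file": "b.txt"}],
)
results = store.search(np.array([[0.9, 0.1]], dtype=np.float32), top_k=1)
```

A linear scan like this is adequate only for small collections; an index such as FAISS exists to avoid comparing the query against every stored vector.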

class LocalSearch.backend.vector_store.BaseVectorStore.IndexPreparation

Bases: TypedDict

TypedDict for the dictionary returned by prepare_index(). Ensures a strict format for generic utilities.

current_files: Set[str]
index: object
used_ids: Set[int]

class LocalSearch.backend.vector_store.BaseVectorStore.SearchResult

Bases: TypedDict

TypedDict for a single search result returned by the vector store.

id: int
metadata: dict
score: float

class LocalSearch.backend.vector_store.FaissVectorStore.FaissVectorStore(dim: int)

Bases: BaseVectorStore

FAISS-based vector store with optional metadata persistence.

add(vectors: ndarray, ids: ndarray, metadata: List[dict] | None = None)

Add vectors with IDs and optional metadata.

Parameters:
  • vectors – Array of shape (n, dim)

  • ids – Array of integer IDs

  • metadata – Optional list of metadata dictionaries

dimension() → int

Return dimensionality of vectors in this store.

get_all_ids() → Set[int]

Return a set of all vector IDs currently stored.

load(path: str)

Load FAISS index and associated metadata.

prepare_index(directory_path: str, recursive: bool = True) → IndexPreparation

Load existing index if available, else create a new one.

Parameters:
  • directory_path – Directory to store/load index.

  • recursive – Whether to scan subdirectories.

Returns:

Dict with keys:
  • index: FAISS index object

  • current_files: set of valid files

  • used_ids: set of IDs already in index

Return type:

IndexPreparation

remove_by_id(vector_id: int) → None

Remove a vector by its ID and delete associated metadata.

save(path: str | None = None)

Save FAISS index and metadata mapping.

Parameters:

path – Optional path to save index (overrides self.index_path)

search(query_vector: ndarray, top_k: int = 5) → List[SearchResult]

Search for nearest neighbors of a query vector.

Parameters:
  • query_vector – Array of shape (1, dim)

  • top_k – Number of neighbors to return

Returns:

List of dictionaries with keys ‘id’, ‘score’, ‘metadata’

Return type:

List[SearchResult]

Metadata Stores

This section contains classes for storing metadata about documents and vector chunks. Users can subclass BaseMetadataStore (defined in the module LocalSearch.backend.metadata_store.BaseMetaDataStore) to implement custom metadata handling.

class LocalSearch.backend.metadata_store.BaseMetaDataStore.BaseMetadataStore

Bases: ABC

Abstract base class for metadata storage backends.

Defines strict output formats for metadata and chunk mapping.

abstract get_file_info(file_path: str) → FileMetadata

Get information about a specific file.

Parameters:

file_path – Path to the file.

Returns:

A FileMetadata dictionary.

abstract is_modified(file_path: str, current_info: FileMetadata) → bool

Determine if a file has changed compared to stored metadata.

Parameters:
  • file_path – Path to the file.

  • current_info – Current FileMetadata dictionary.

Returns:

True if the file is modified, False otherwise.

abstract load_chunk_mapping() → List[ChunkMapping]

Load the chunk mapping list from persistent storage.

Returns:

List of ChunkMapping dictionaries.

abstract load_metadata() → Dict[str, FileMetadata]

Load all metadata from persistent storage.

Returns:

Dictionary mapping file paths to FileMetadata dictionaries.

abstract save_chunk_mapping(chunk_mapping: List[ChunkMapping]) → None

Save the chunk mapping list to persistent storage.

Parameters:

chunk_mapping – List of ChunkMapping dictionaries.

abstract save_metadata(metadata: Dict[str, FileMetadata]) → None

Save metadata to persistent storage.

Parameters:

metadata – Dictionary mapping file paths to FileMetadata dictionaries.

abstract update(file_path: str, file_info: FileMetadata) → None

Update metadata for a specific file.

Parameters:
  • file_path – Path to the file.

  • file_info – FileMetadata dictionary.
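
The change-detection flow implied by get_file_info(), is_modified(), and update() can be sketched as follows. This is an illustration under stated assumptions, not the JsonMetadataStore implementation: InMemoryMetadataStore is a hypothetical class that skips persistence, get_file_info() is assumed to read size and modification time from the filesystem, and a file with no stored record is assumed to count as modified. FileMetadata is redefined locally with the documented fields.

```python
import os
import tempfile
from typing import Dict, TypedDict


class FileMetadata(TypedDict):
    modified: float
    size: int


class InMemoryMetadataStore:
    """Hypothetical store covering get_file_info, is_modified and update only."""

    def __init__(self) -> None:
        self._metadata: Dict[str, FileMetadata] = {}

    def get_file_info(self, file_path: str) -> FileMetadata:
        # Assumption: current info comes from the filesystem.
        stat = os.stat(file_path)
        return FileMetadata(modified=stat.st_mtime, size=stat.st_size)

    def is_modified(self, file_path: str, current_info: FileMetadata) -> bool:
        stored = self._metadata.get(file_path)
        # Assumption: an unknown file is treated as modified so it gets indexed.
        return stored is None or stored != current_info

    def update(self, file_path: str, file_info: FileMetadata) -> None:
        self._metadata[file_path] = file_info


store = InMemoryMetadataStore()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("hello")

    info = store.get_file_info(path)
    first_seen = store.is_modified(path, info)  # no record stored yet
    store.update(path, info)
    unchanged = store.is_modified(path, info)   # record matches current info

    with open(path, "a", encoding="utf-8") as fh:
        fh.write(" world")                      # size changes from 5 to 11
    changed = store.is_modified(path, store.get_file_info(path))
```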

class LocalSearch.backend.metadata_store.BaseMetaDataStore.ChunkMapping

Bases: TypedDict

Structure of a single chunk mapping.

chunk_id: str
end: int
file_path: str
start: int

class LocalSearch.backend.metadata_store.BaseMetaDataStore.FileMetadata

Bases: TypedDict

Structure of metadata stored for each file.

modified: float
size: int

class LocalSearch.backend.metadata_store.JsonMetadataStore.JsonMetadataStore(directory_path: str)

Bases: BaseMetadataStore

JSON-based metadata and chunk mapping persistence.

get_file_info(file_path: str) → dict

Get information about a specific file.

Parameters:

file_path – Path to the file.

Returns:

A FileMetadata dictionary.

is_modified(file_path: str, current_info: dict) → bool

Determine if a file has changed compared to stored metadata.

Parameters:
  • file_path – Path to the file.

  • current_info – Current FileMetadata dictionary.

Returns:

True if the file is modified, False otherwise.

load_chunk_mapping()

Load the chunk mapping list from persistent storage.

Returns:

List of ChunkMapping dictionaries.

load_metadata()

Load all metadata from persistent storage.

Returns:

Dictionary mapping file paths to FileMetadata dictionaries.

save_chunk_mapping(chunk_mapping: list[dict])

Save the chunk mapping list to persistent storage.

Parameters:

chunk_mapping – List of ChunkMapping dictionaries.

save_metadata(metadata: dict)

Save metadata to persistent storage.

Parameters:

metadata – Dictionary mapping file paths to FileMetadata dictionaries.

update(file_path: str, file_info: dict)

Update metadata for a specific file.

Parameters:
  • file_path – Path to the file.

  • file_info – FileMetadata dictionary.

Text Extractors

This section contains classes for extracting text from files. Users can subclass LocalSearch.backend.text_extractor.BaseTextExtractor.BaseTextExtractor to define custom extraction strategies.

class LocalSearch.backend.text_extractor.BaseTextExtractor.BaseTextExtractor

Bases: ABC

Abstract base class for extracting text from documents.

Subclasses must implement methods to determine if a file type is supported and to extract text from files.

abstract can_handle(file_path: str) → bool

Determine if this extractor can handle the given file type.

Parameters:

file_path – Path to the file.

Returns:

True if this extractor can process the file type, False otherwise.

abstract extract_text(file_path: str) → str

Extract and return text from a file.

Parameters:

file_path – Path to the file.

Returns:

Extracted text as a string.

abstract split_text(text: str) → list[str]

Split the input text into smaller chunks suitable for embedding.

The exact chunking strategy (size, overlap, etc.) is implementation-dependent.

Parameters:

text (str) – The full text to split.

Returns:

A list of text chunks.

Return type:

List[str]
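
A custom extractor has to implement all three abstract methods. The sketch below adds support for Markdown files as an example. BaseTextExtractor is a local stand-in mirroring the documented interface, MarkdownExtractor is a hypothetical class, and splitting on blank lines is just one possible chunking strategy. Returning an empty string on read errors follows the behaviour documented for DefaultTextExtractor.

```python
import os
from abc import ABC, abstractmethod
from typing import List


class BaseTextExtractor(ABC):
    """Stand-in mirroring the documented interface (assumption: the real
    class lives in LocalSearch.backend.text_extractor.BaseTextExtractor)."""

    @abstractmethod
    def can_handle(self, file_path: str) -> bool: ...

    @abstractmethod
    def extract_text(self, file_path: str) -> str: ...

    @abstractmethod
    def split_text(self, text: str) -> List[str]: ...


class MarkdownExtractor(BaseTextExtractor):
    """Hypothetical extractor that reads Markdown files as plain text."""

    SUPPORTED_TYPES = [".md", ".markdown"]

    def can_handle(self, file_path: str) -> bool:
        # Compare the extension case-insensitively.
        return os.path.splitext(file_path)[1].lower() in self.SUPPORTED_TYPES

    def extract_text(self, file_path: str) -> str:
        try:
            with open(file_path, "r", encoding="utf-8") as fh:
                return fh.read()
        except (OSError, UnicodeDecodeError):
            return ""

    def split_text(self, text: str) -> List[str]:
        # One chunk per paragraph; empty paragraphs are dropped.
        return [p.strip() for p in text.split("\n\n") if p.strip()]


extractor = MarkdownExtractor()
paragraphs = extractor.split_text("# Title\n\nFirst paragraph.\n\n\n\nSecond paragraph.")
```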

class LocalSearch.backend.text_extractor.DefaultTextExtractor.DefaultTextExtractor(base_path: str, chunk_size: int = 500, chunk_overlap: int = 50)

Bases: BaseTextExtractor

Default text extractor supporting .txt, .pdf, and .html files.

Provides optional text chunking with overlap for downstream processing.

SUPPORTED_TYPES = ['.txt', '.pdf', '.html']

can_handle(file_path: str) → bool

Check if the file extension is supported by this extractor.

Parameters:

file_path – Path to the file.

Returns:

True if the file type is supported, False otherwise.

extract_text(file_path: str) → str

Extract text from a supported file type.

Parameters:

file_path – Path to the file.

Returns:

Extracted text as a single string. Returns empty string on error.

split_text(text: str) → List[str]

Split text into chunks with optional overlap.

Parameters:

text – Full text to split.

Returns:

List of text chunks.
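
Chunking with overlap can be sketched as a sliding window. The function below is an illustration, not the DefaultTextExtractor code: split_with_overlap is a hypothetical helper, and it assumes that chunk_size and chunk_overlap are measured in characters, which the documentation does not specify.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

The overlap means a sentence that straddles a chunk boundary still appears in full in at least one chunk, provided it is no longer than the overlap.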

Engine

This class provides the main interface for searching documents and querying LLMs. Use LocalSearch.backend.engine.SearchEngine to initialize and run searches or start the web interface.

class LocalSearch.backend.engine.SearchEngine(directory_path: str, llm: BaseLLM, embedding_model: BaseEmbedder | None = None, include_file_types: list[str] = ['.txt', '.pdf', '.html'], metadata_store: BaseMetadataStore | None = None, vector_store: BaseVectorStore | None = None, extractor: BaseTextExtractor | None = None, reembed_policy: str = 'modified_only', verbose: bool = True, recursive: bool = True)

Bases: object

Local semantic search engine that wraps file embeddings, metadata, vector store, text extraction, and LLM-based query.

search(query: str, top_k: int = 5) → str

Perform semantic search and return LLM-generated answer using only relevant context.

web(host='127.0.0.1', port=8000)

Serve the local frontend with FastAPI and static files.
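
Putting the pieces together, a typical setup might look like the sketch below. It uses only the constructor and method signatures documented above. The GROQ_API_KEY environment variable, the ./docs directory, and the example query are assumptions made for illustration. Imports are placed inside the function so that the module can be loaded even where LocalSearch is not installed.

```python
import os


def build_engine(directory_path: str):
    """Create a SearchEngine with GroqLLM and the default components."""
    # Import paths follow the class locations documented in this reference.
    from LocalSearch.backend.engine import SearchEngine
    from LocalSearch.backend.llms.GroqLLM import GroqLLM

    # Assumption: the API key is supplied through an environment variable.
    llm = GroqLLM(api_key=os.environ["GROQ_API_KEY"])
    return SearchEngine(
        directory_path=directory_path,
        llm=llm,
        include_file_types=[".txt", ".pdf"],
        recursive=True,
    )


def run_example() -> None:
    engine = build_engine("./docs")  # hypothetical directory
    answer = engine.search("What does the setup guide say about backups?", top_k=3)
    print(answer)
    # To serve the web interface instead:
    # engine.web(host="127.0.0.1", port=8000)
```

Calling run_example() requires the LocalSearch package, a valid Groq API key, and a directory of documents to index.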