API Reference

Embeddings

This section contains classes for generating vector embeddings from documents. Users can subclass BaseEmbedder, defined in LocalSearch.backend.embeddings.BaseEmbeddings, to implement custom embedding strategies.

class LocalSearch.backend.embeddings.BaseEmbeddings.BaseEmbedder

Bases: ABC

Abstract base class for any text embedding model.

Subclasses must implement methods to encode text into vectors and report embedding dimensionality.

abstract dimension() → int

Return the dimensionality of the embedding vectors produced by this model.

Returns:: Embedding dimension.
Return type:: int

abstract encode(text: str) → ndarray

Convert a single text string into a vector embedding.

Parameters:: text – Input text to embed.
Returns:: The resulting embedding vector (dtype=float32).
Return type:: np.ndarray
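
As an illustration of the two abstract methods above, the following is a minimal sketch of a custom embedder. The BaseEmbedder class here is a local stand-in that mirrors the documented interface so the snippet runs without LocalSearch installed. HashingEmbedder, its dim parameter, the token-hashing scheme, and the unit-length normalization are all hypothetical choices made for this example, not part of the library.

```python
import hashlib
from abc import ABC, abstractmethod

import numpy as np


class BaseEmbedder(ABC):
    """Stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def encode(self, text: str) -> np.ndarray: ...

    @abstractmethod
    def dimension(self) -> int: ...


class HashingEmbedder(BaseEmbedder):
    """Hypothetical toy embedder: counts tokens in hashed buckets."""

    def __init__(self, dim: int = 64) -> None:
        self._dim = dim

    def dimension(self) -> int:
        return self._dim

    def encode(self, text: str) -> np.ndarray:
        vec = np.zeros(self._dim, dtype=np.float32)
        for token in text.lower().split():
            # Map each token to a stable bucket index via its SHA-256 digest.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "little") % self._dim
            vec[bucket] += 1.0
        norm = float(np.linalg.norm(vec))
        if norm > 0:
            vec = vec / norm  # unit length is an assumption of this sketch
        return vec.astype(np.float32)


embedder = HashingEmbedder(dim=64)
vector = embedder.encode("local semantic search")
print(vector.shape, vector.dtype, embedder.dimension())
```

Whatever strategy a subclass uses, the length of the array returned by encode() should presumably agree with dimension(), since the vector store is constructed for a fixed dimensionality.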

class LocalSearch.backend.embeddings.SentenceTransformerEmbedder.SentenceTransformerEmbedder(model_name: str = 'all-MiniLM-L6-v2')

Bases: BaseEmbedder

Embedder using a Sentence-Transformers model.

Default model: “all-MiniLM-L6-v2”

dimension() → int

Return the dimensionality of the embedding vectors.

Returns:: Embedding dimension.
Return type:: int

encode(text: str) → ndarray

Encode a text string into a vector embedding.

Parameters:: text – Input string to encode.
Returns:: Embedding vector as float32 array.
Return type:: np.ndarray

LLMs

This section contains classes for interacting with Large Language Models (LLMs). Users can subclass BaseLLM, defined in LocalSearch.backend.llms.BaseLLM, to provide custom LLM implementations.

class LocalSearch.backend.llms.BaseLLM.BaseLLM

Bases: ABC

Abstract base class for any LLM (Large Language Model) integration.

abstract generate(prompt: str) → str

Generate text from a prompt using the underlying LLM.

Parameters:: prompt – Input prompt string.
Returns:: Generated text as a string.
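
Because generate() is the only abstract method, a custom LLM can be very small. The sketch below uses a local stand-in for BaseLLM that mirrors the documented signature. EchoLLM and its prefix parameter are hypothetical and exist only to show the shape of a subclass, for example as a test double that avoids network calls.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class EchoLLM(BaseLLM):
    """Hypothetical offline LLM that returns the prompt with a prefix."""

    def __init__(self, prefix: str = "[echo] ") -> None:
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        # A real implementation would call a model here.
        return self.prefix + prompt.strip()


llm = EchoLLM()
reply = llm.generate("  What is in my notes?  ")
print(reply)
```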

class LocalSearch.backend.llms.GroqLLM.GroqLLM(api_key: str, model: str = 'openai/gpt-oss-120b')

Bases: BaseLLM

Groq LLM client wrapper.

Uses the Groq SDK to call the specified model and stream its output.

generate(prompt: str) → str

Generate text for the given prompt.

Parameters:: prompt – Input string to generate a response for.
Returns:: Generated text as a string.

Vector Stores

This section contains classes for storing and querying document vectors. Users can subclass BaseVectorStore, defined in LocalSearch.backend.vector_store.BaseVectorStore, to define a custom vector storage backend.

class LocalSearch.backend.vector_store.BaseVectorStore.BaseVectorStore

Bases: ABC

Abstract base class for any vector store.

Any implementation must provide all of the following methods to be compatible with _process_files and other utilities.

abstract add(vectors: ndarray, ids: ndarray, metadata: List[dict])

Add vectors with corresponding IDs and associated metadata.

Parameters:

vectors (np.ndarray) – 2D array of vectors to add.
ids (np.ndarray) – 1D array of unique IDs for each vector.
metadata (List[dict]) – List of metadata dicts, one per vector.

abstract dimension() → int

Return the dimensionality of vectors supported by this store.

Returns:: Embedding vector dimensionality.
Return type:: int

abstract get_all_ids() → Set[int]

Return a set of all vector IDs currently stored.

Returns:: All IDs in the store.
Return type:: Set[int]

abstract load(path: str)

Load the store from disk.

Parameters:: path (str) – File path or directory to load the store from.

abstract prepare_index(directory_path: str, recursive: bool = True) → IndexPreparation

Prepare or load the index from a directory.

Parameters:

directory_path (str) – Directory containing files to index.
recursive (bool) – Whether to scan subdirectories.

Returns:

Dict with keys:

‘index’: internal index object
‘current_files’: set of files present in the directory
‘used_ids’: set of vector IDs already in the index

Return type:

IndexPreparation

abstract remove_by_id(vector_id: int) → None

Remove a vector from the store by its ID.

Parameters:: vector_id (int) – ID of the vector to remove.

abstract save(path: str)

Persist the store to disk.

Parameters:: path (str) – File path or directory to save the store.

abstract search(query_vector: ndarray, top_k: int) → List[SearchResult]

Return the top_k nearest neighbors for a query vector.

Parameters:

query_vector (np.ndarray) – Single query vector.
top_k (int) – Number of nearest neighbors to return.

Returns:

List of search results with keys ‘id’, ‘score’, ‘metadata’.

Return type:

List[SearchResult]
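
To make the add/search contract concrete, here is a rough brute-force sketch of the in-memory part of a store. It is not a full BaseVectorStore: load(), save(), and prepare_index() are omitted, and the class does not inherit from the library base. InMemoryVectorStore is a hypothetical name, and scoring by inner product with higher scores ranked first is an assumption of this example; the reference does not state which score convention the library uses.

```python
from typing import Dict, List, Set, TypedDict

import numpy as np


class SearchResult(TypedDict):
    id: int
    score: float
    metadata: dict


class InMemoryVectorStore:
    """Hypothetical brute-force store covering add/search/remove only."""

    def __init__(self, dim: int) -> None:
        self._dim = dim
        self._vectors: Dict[int, np.ndarray] = {}
        self._metadata: Dict[int, dict] = {}

    def dimension(self) -> int:
        return self._dim

    def add(self, vectors: np.ndarray, ids: np.ndarray, metadata: List[dict]) -> None:
        if vectors.ndim != 2 or vectors.shape[1] != self._dim:
            raise ValueError("vectors must have shape (n, dim)")
        if not (len(vectors) == len(ids) == len(metadata)):
            raise ValueError("vectors, ids and metadata must have equal length")
        for vec, vid, meta in zip(vectors, ids, metadata):
            self._vectors[int(vid)] = vec.astype(np.float32)
            self._metadata[int(vid)] = meta

    def get_all_ids(self) -> Set[int]:
        return set(self._vectors)

    def remove_by_id(self, vector_id: int) -> None:
        self._vectors.pop(vector_id, None)
        self._metadata.pop(vector_id, None)

    def search(self, query_vector: np.ndarray, top_k: int) -> List[SearchResult]:
        # Accept both (dim,) and (1, dim) query shapes.
        query = np.asarray(query_vector, dtype=np.float32).reshape(-1)
        scored = [(float(np.dot(query, vec)), vid) for vid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # assumed: higher is closer
        return [
            SearchResult(id=vid, score=score, metadata=self._metadata[vid])
            for score, vid in scored[:top_k]
        ]


store = InMemoryVectorStore(dim=2)
store.add(
    np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], dtype=np.float32),
    np.array([10, 11, 12]),
    [{"file": "a.txt"}, {"file": "b.txt"}, {"file": "c.txt"}],
)
results = store.search(np.array([1.0, 0.0], dtype=np.float32), top_k=2)
print([r["id"] for r in results])
```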

class LocalSearch.backend.vector_store.BaseVectorStore.IndexPreparation

Bases: TypedDict

TypedDict for the dictionary returned by prepare_index(). Ensures a strict format for generic utilities.

current_files: Set[str]

index: object

used_ids: Set[int]
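
One plausible use of the used_ids field is choosing IDs for new vectors that do not collide with existing ones. The sketch below redefines IndexPreparation locally with the documented fields. The allocate_ids helper and the sample values are hypothetical, and how the library actually assigns IDs is not specified in this reference.

```python
from typing import List, Set, TypedDict


class IndexPreparation(TypedDict):
    index: object
    current_files: Set[str]
    used_ids: Set[int]


def allocate_ids(used_ids: Set[int], count: int) -> List[int]:
    """Return the `count` smallest non-negative integers not in `used_ids`."""
    ids: List[int] = []
    candidate = 0
    while len(ids) < count:
        if candidate not in used_ids:
            ids.append(candidate)
        candidate += 1
    return ids


# Example value shaped like the dictionary returned by prepare_index().
prep: IndexPreparation = {
    "index": None,  # placeholder for the backend's index object
    "current_files": {"docs/a.txt", "docs/b.pdf"},
    "used_ids": {0, 1, 3},
}
new_ids = allocate_ids(prep["used_ids"], 3)
print(new_ids)
```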

class LocalSearch.backend.vector_store.BaseVectorStore.SearchResult

Bases: TypedDict

TypedDict for a single search result returned by the vector store.

id: int

metadata: dict

score: float

class LocalSearch.backend.vector_store.FaissVectorStore.FaissVectorStore(dim: int)

Bases: BaseVectorStore

FAISS-based vector store with optional metadata persistence.

add(vectors: ndarray, ids: ndarray, metadata: List[dict] | None = None)

Add vectors with IDs and optional metadata.

Parameters:

vectors – Array of shape (n, dim)
ids – Array of integer IDs
metadata – Optional list of metadata dictionaries

dimension() → int: Return dimensionality of vectors in this store.

get_all_ids() → Set[int]: Return a set of all vector IDs currently stored.

load(path: str): Load FAISS index and associated metadata.

prepare_index(directory_path: str, recursive: bool = True) → IndexPreparation

Load existing index if available, else create a new one.

Parameters:

directory_path – Directory to store/load index.
recursive – Whether to scan subdirectories.

Returns:

Dict with keys:

index: FAISS index object
current_files: set of valid files
used_ids: set of IDs already in index

Return type:

IndexPreparation

remove_by_id(vector_id: int) → None: Remove a vector by its ID and delete associated metadata.

save(path: str | None = None)

Save FAISS index and metadata mapping.

Parameters:: path – Optional path to save index (overrides self.index_path)

search(query_vector: ndarray, top_k: int = 5) → List[SearchResult]

Search for nearest neighbors of a query vector.

Parameters:

query_vector – Array of shape (1, dim)
top_k – Number of neighbors to return

Returns:

List of dictionaries with keys ‘id’, ‘score’, ‘metadata’

Return type:

List[SearchResult]

Metadata Stores

This section contains classes for storing metadata about documents and vector chunks. Users can subclass BaseMetadataStore, defined in LocalSearch.backend.metadata_store.BaseMetaDataStore, to implement custom metadata handling.

class LocalSearch.backend.metadata_store.BaseMetaDataStore.BaseMetadataStore

Bases: ABC

Abstract base class for metadata storage backends.

Defines strict output formats for metadata and chunk mapping.

abstract get_file_info(file_path: str) → FileMetadata

Get information about a specific file.

Parameters:: file_path – Path to the file.
Returns:: A FileMetadata dictionary.

abstract is_modified(file_path: str, current_info: FileMetadata) → bool

Determine if a file has changed compared to stored metadata.

Parameters:

file_path – Path to the file.
current_info – Current FileMetadata dictionary.

Returns:

True if the file is modified, False otherwise.

abstract load_chunk_mapping() → List[ChunkMapping]

Load the chunk mapping list from persistent storage.

Returns:: List of ChunkMapping dictionaries.

abstract load_metadata() → Dict[str, FileMetadata]

Load all metadata from persistent storage.

Returns:: Dictionary mapping file paths to FileMetadata dictionaries.

abstract save_chunk_mapping(chunk_mapping: List[ChunkMapping]) → None

Save the chunk mapping list to persistent storage.

Parameters:: chunk_mapping – List of ChunkMapping dictionaries.

abstract save_metadata(metadata: Dict[str, FileMetadata]) → None

Save metadata to persistent storage.

Parameters:: metadata – Dictionary mapping file paths to FileMetadata dictionaries.

abstract update(file_path: str, file_info: FileMetadata) → None

Update metadata for a specific file.

Parameters:

file_path – Path to the file.
file_info – FileMetadata dictionary.

class LocalSearch.backend.metadata_store.BaseMetaDataStore.ChunkMapping

Bases: TypedDict

Structure of a single chunk mapping.

chunk_id: str

end: int

file_path: str

start: int
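
A chunk mapping list ties each chunk back to its source file, which is useful when a file changes and its chunks must be located again. The sketch below redefines ChunkMapping locally with the documented fields and groups chunk IDs by file. The chunks_by_file helper and the sample entries are hypothetical, and treating start and end as character offsets is an assumption; the reference does not define their units.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class ChunkMapping(TypedDict):
    chunk_id: str
    file_path: str
    start: int
    end: int


def chunks_by_file(mapping: List[ChunkMapping]) -> Dict[str, List[str]]:
    """Group chunk IDs by the file they were extracted from."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in mapping:
        grouped[entry["file_path"]].append(entry["chunk_id"])
    return dict(grouped)


mapping: List[ChunkMapping] = [
    {"chunk_id": "0", "file_path": "docs/a.txt", "start": 0, "end": 500},
    {"chunk_id": "1", "file_path": "docs/a.txt", "start": 450, "end": 950},
    {"chunk_id": "2", "file_path": "docs/b.txt", "start": 0, "end": 320},
]
grouped = chunks_by_file(mapping)
print(grouped)
```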

class LocalSearch.backend.metadata_store.BaseMetaDataStore.FileMetadata

Bases: TypedDict

Structure of metadata stored for each file.

modified: float

size: int
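
The two FileMetadata fields are enough for a simple change check. The following sketch shows one way get_file_info() and is_modified() could work, written as free functions over a plain dictionary instead of a store class. Reading the values from os.stat and treating a file with no stored entry as modified are assumptions of this example, not documented behavior.

```python
import os
import tempfile
from typing import Dict, TypedDict


class FileMetadata(TypedDict):
    modified: float
    size: int


def get_file_info(file_path: str) -> FileMetadata:
    """Build a FileMetadata dict from the file's current stat values."""
    stat = os.stat(file_path)
    return FileMetadata(modified=stat.st_mtime, size=stat.st_size)


def is_modified(stored: Dict[str, FileMetadata], file_path: str,
                current_info: FileMetadata) -> bool:
    """Compare current info against the stored entry, if any."""
    previous = stored.get(file_path)
    if previous is None:
        return True  # assumed: unseen files count as modified
    return (previous["size"] != current_info["size"]
            or previous["modified"] != current_info["modified"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("hello")

    stored: Dict[str, FileMetadata] = {}
    first = get_file_info(path)
    new_file = is_modified(stored, path, first)

    stored[path] = first
    unchanged = is_modified(stored, path, get_file_info(path))

    with open(path, "a", encoding="utf-8") as fh:
        fh.write(" world")  # the size changes, so the check should fire
    changed = is_modified(stored, path, get_file_info(path))

print(new_file, unchanged, changed)
```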

class LocalSearch.backend.metadata_store.JsonMetadataStore.JsonMetadataStore(directory_path: str)

Bases: BaseMetadataStore

JSON-based metadata and chunk mapping persistence.

get_file_info(file_path: str) → dict

Get information about a specific file.

Parameters:: file_path – Path to the file.
Returns:: A FileMetadata dictionary.

is_modified(file_path: str, current_info: dict) → bool

Determine if a file has changed compared to stored metadata.

Parameters:

file_path – Path to the file.
current_info – Current FileMetadata dictionary.

Returns:

True if the file is modified, False otherwise.

load_chunk_mapping()

Load the chunk mapping list from persistent storage.

Returns:: List of ChunkMapping dictionaries.

load_metadata()

Load all metadata from persistent storage.

Returns:: Dictionary mapping file paths to FileMetadata dictionaries.

save_chunk_mapping(chunk_mapping: list[dict])

Save the chunk mapping list to persistent storage.

Parameters:: chunk_mapping – List of ChunkMapping dictionaries.

save_metadata(metadata: dict)

Save metadata to persistent storage.

Parameters:: metadata – Dictionary mapping file paths to FileMetadata dictionaries.

update(file_path: str, file_info: dict)

Update metadata for a specific file.

Parameters:

file_path – Path to the file.
file_info – FileMetadata dictionary.
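
For readers writing their own backend, the load/save/update cycle can be sketched with the standard json module. This is not the JsonMetadataStore implementation: TinyJsonMetadataStore, the metadata.json file name, and the choice to persist on every update() call are assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Dict, TypedDict


class FileMetadata(TypedDict):
    modified: float
    size: int


class TinyJsonMetadataStore:
    """Hypothetical JSON-backed store covering metadata only."""

    METADATA_FILE = "metadata.json"  # assumed file name

    def __init__(self, directory_path: str) -> None:
        self._path = os.path.join(directory_path, self.METADATA_FILE)

    def load_metadata(self) -> Dict[str, FileMetadata]:
        if not os.path.exists(self._path):
            return {}  # nothing persisted yet
        with open(self._path, "r", encoding="utf-8") as fh:
            return json.load(fh)

    def save_metadata(self, metadata: Dict[str, FileMetadata]) -> None:
        with open(self._path, "w", encoding="utf-8") as fh:
            json.dump(metadata, fh, indent=2)

    def update(self, file_path: str, file_info: FileMetadata) -> None:
        # Read-modify-write; assumed to persist immediately.
        metadata = self.load_metadata()
        metadata[file_path] = file_info
        self.save_metadata(metadata)


with tempfile.TemporaryDirectory() as tmp:
    store = TinyJsonMetadataStore(tmp)
    empty = store.load_metadata()
    store.update("docs/a.txt", {"modified": 1700000000.0, "size": 42})
    # A fresh instance sees the data, showing it was written to disk.
    reloaded = TinyJsonMetadataStore(tmp).load_metadata()

print(reloaded)
```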

Text Extractors

This section contains classes for extracting text from files. Users can subclass BaseTextExtractor, defined in LocalSearch.backend.text_extractor.BaseTextExtractor, to define custom extraction strategies.

class LocalSearch.backend.text_extractor.BaseTextExtractor.BaseTextExtractor

Bases: ABC

Abstract base class for extracting text from documents.

Subclasses must implement methods to determine whether a file type is supported, to extract text from files, and to split the extracted text into chunks.

abstract can_handle(file_path: str) → bool

Determine if this extractor can handle the given file type.

Parameters:: file_path – Path to the file.
Returns:: True if this extractor can process the file type, False otherwise.

abstract extract_text(file_path: str) → str

Extract and return text from a file.

Parameters:: file_path – Path to the file.
Returns:: Extracted text as a string.

abstract split_text(text: str) → list[str]

Split the input text into smaller chunks suitable for embedding.

The exact chunking strategy (size, overlap, etc.) is implementation-dependent.

Parameters:: text (str) – The full text to split.
Returns:: A list of text chunks.
Return type:: List[str]
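
The three abstract methods can be implemented in a few lines for simple formats. The sketch below uses a local stand-in for BaseTextExtractor that mirrors the documented interface. MarkdownExtractor, the .md extension, and the paragraph-based splitting are hypothetical choices for this example, not behavior of the library.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import List


class BaseTextExtractor(ABC):
    """Stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def can_handle(self, file_path: str) -> bool: ...

    @abstractmethod
    def extract_text(self, file_path: str) -> str: ...

    @abstractmethod
    def split_text(self, text: str) -> List[str]: ...


class MarkdownExtractor(BaseTextExtractor):
    """Hypothetical extractor that reads .md files and splits on blank lines."""

    def can_handle(self, file_path: str) -> bool:
        return os.path.splitext(file_path)[1].lower() == ".md"

    def extract_text(self, file_path: str) -> str:
        with open(file_path, "r", encoding="utf-8") as fh:
            return fh.read()

    def split_text(self, text: str) -> List[str]:
        # One chunk per non-empty paragraph.
        return [part.strip() for part in text.split("\n\n") if part.strip()]


extractor = MarkdownExtractor()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.md")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("# Title\n\nFirst paragraph.\n\n\nSecond paragraph.\n")
    chunks = extractor.split_text(extractor.extract_text(path))

print(chunks)
```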

class LocalSearch.backend.text_extractor.DefaultTextExtractor.DefaultTextExtractor(base_path: str, chunk_size: int = 500, chunk_overlap: int = 50)

Bases: BaseTextExtractor

Default text extractor supporting .txt, .pdf, and .html files.

Provides optional text chunking with overlap for downstream processing.

SUPPORTED_TYPES = ['.txt', '.pdf', '.html']

can_handle(file_path: str) → bool

Check if the file extension is supported by this extractor.

Parameters:: file_path – Path to the file.
Returns:: True if the file type is supported, False otherwise.

extract_text(file_path: str) → str

Extract text from a supported file type.

Parameters:: file_path – Path to the file.
Returns:: Extracted text as a single string. Returns empty string on error.

split_text(text: str) → List[str]

Split text into chunks with optional overlap.

Parameters:: text – Full text to split.
Returns:: List of text chunks.
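
Chunking with overlap can be pictured as a sliding window that advances by chunk_size minus chunk_overlap. The function below is a sketch of that idea and is not the DefaultTextExtractor implementation: split_with_overlap is a hypothetical name, and measuring chunk_size and chunk_overlap in characters is an assumption, since the reference does not state the unit.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500,
                       chunk_overlap: int = 50) -> List[str]:
    """Split `text` into character windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk after the first begins with the last chunk_overlap characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.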

Engine

This section documents the class that provides the main interface for searching documents and querying LLMs. Use LocalSearch.backend.engine.SearchEngine to initialize and run searches or to start the web interface.

class LocalSearch.backend.engine.SearchEngine(directory_path: str, llm: BaseLLM, embedding_model: BaseEmbedder | None = None, include_file_types: list[str] = ['.txt', '.pdf', '.html'], metadata_store: BaseMetadataStore | None = None, vector_store: BaseVectorStore | None = None, extractor: BaseTextExtractor | None = None, reembed_policy: str = 'modified_only', verbose: bool = True, recursive: bool = True)

Bases: object

Local semantic search engine that wraps file embedding, metadata, vector storage, text extraction, and LLM-based querying.

search(query: str, top_k: int = 5) → str: Perform semantic search and return an LLM-generated answer using only the relevant context.

web(host='127.0.0.1', port=8000): Serve the local frontend with FastAPI and static files.
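
A typical call sequence, pieced together from the signatures above, might look like the following sketch. The import paths follow the class paths given in this reference. The GROQ_API_KEY environment variable name, the answer_question helper, and the example directory and question are assumptions for illustration. The imports sit inside the function so the snippet can be loaded without LocalSearch installed.

```python
import os


def answer_question(directory_path: str, question: str, top_k: int = 5) -> str:
    """Index `directory_path` and return an LLM-generated answer to `question`."""
    # Import paths taken from the class paths documented above.
    from LocalSearch.backend.engine import SearchEngine
    from LocalSearch.backend.llms.GroqLLM import GroqLLM

    # GROQ_API_KEY is an assumed variable name, not one the reference specifies.
    llm = GroqLLM(api_key=os.environ["GROQ_API_KEY"])

    # Optional components are left as None so the engine uses its own defaults.
    engine = SearchEngine(
        directory_path=directory_path,
        llm=llm,
        reembed_policy="modified_only",
        recursive=True,
    )
    return engine.search(question, top_k=top_k)


# Example call (requires LocalSearch, a Groq API key, and a ./docs directory):
# print(answer_question("./docs", "What do my notes say about backups?"))
```

To serve the frontend instead of answering a single query, the same engine object would call web(host='127.0.0.1', port=8000) in place of search().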