API Reference
Embeddings
This section contains classes for generating vector embeddings from documents.
Users can subclass LocalSearch.backend.embeddings.BaseEmbeddings.BaseEmbedder to implement custom embedding strategies.
- class LocalSearch.backend.embeddings.BaseEmbeddings.BaseEmbedder
Bases: ABC
Abstract base class for any text embedding model.
Subclasses must implement methods to encode text into vectors and report embedding dimensionality.
- abstract dimension() → int
Return the dimensionality of the embedding vectors produced by this model.
- Returns:
Embedding dimension.
- Return type:
int
- abstract encode(text: str) → ndarray
Convert a single text string into a vector embedding.
- Parameters:
text – Input text to embed.
- Returns:
The resulting embedding vector (dtype=float32).
- Return type:
np.ndarray
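As a minimal sketch of extending BaseEmbedder, the hypothetical HashingEmbedder below hashes whitespace tokens into a fixed number of buckets and L2-normalizes the result. To keep the block self-contained it declares a local stand-in for BaseEmbedder with the two abstract methods documented above; in a real project you would subclass LocalSearch.backend.embeddings.BaseEmbeddings.BaseEmbedder instead. The hashing scheme (CRC32 over lowercased tokens) is an illustrative assumption, not part of LocalSearch.

```python
import zlib
from abc import ABC, abstractmethod

import numpy as np


class BaseEmbedder(ABC):
    """Local stand-in mirroring the documented BaseEmbedder interface."""

    @abstractmethod
    def dimension(self) -> int: ...

    @abstractmethod
    def encode(self, text: str) -> np.ndarray: ...


class HashingEmbedder(BaseEmbedder):
    """Hypothetical embedder: hashed bag-of-words, L2-normalized."""

    def __init__(self, dim: int = 256):
        self._dim = dim

    def dimension(self) -> int:
        return self._dim

    def encode(self, text: str) -> np.ndarray:
        vec = np.zeros(self._dim, dtype=np.float32)
        for token in text.lower().split():
            # CRC32 is stable across runs, unlike Python's salted hash().
            bucket = zlib.crc32(token.encode("utf-8")) % self._dim
            vec[bucket] += 1.0
        norm = np.linalg.norm(vec)
        if norm > 0:
            vec /= norm
        return vec  # stays float32, matching the documented dtype


emb = HashingEmbedder(dim=64)
v = emb.encode("local semantic search")
print(v.shape, v.dtype)  # (64,) float32
```

Because the output dimension is fixed by the constructor, dimension() can simply report it; a vector store created with the same size would accept these vectors.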
- class LocalSearch.backend.embeddings.SentenceTransformerEmbedder.SentenceTransformerEmbedder(model_name: str = 'all-MiniLM-L6-v2')
Bases: BaseEmbedder
Embedder using a Sentence-Transformers model.
Default model: “all-MiniLM-L6-v2”
- dimension() → int
Return the dimensionality of the embedding vectors.
- Returns:
Embedding dimension.
- Return type:
int
- encode(text: str) → ndarray
Encode a text string into a vector embedding.
- Parameters:
text – Input string to encode.
- Returns:
Embedding vector as float32 array.
- Return type:
np.ndarray
LLMs
This section contains classes for interacting with Large Language Models (LLMs).
Users can subclass LocalSearch.backend.llms.BaseLLM.BaseLLM to provide custom LLM implementations.
- class LocalSearch.backend.llms.BaseLLM.BaseLLM
Bases: ABC
Abstract base class for any LLM (Large Language Model) integration.
- abstract generate(prompt: str) → str
Generate text from a prompt using the underlying LLM.
- Parameters:
prompt – Input prompt string.
- Returns:
Generated text as a string.
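A minimal sketch of subclassing BaseLLM: the hypothetical EchoLLM below returns a canned response instead of calling a model, which can be handy for exercising a pipeline offline. The block declares a local stand-in for BaseLLM with the single documented abstract method; in practice you would subclass LocalSearch.backend.llms.BaseLLM.BaseLLM.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Local stand-in mirroring the documented BaseLLM interface."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class EchoLLM(BaseLLM):
    """Hypothetical offline LLM that echoes the tail of the prompt."""

    def __init__(self, max_chars: int = 80):
        self.max_chars = max_chars
        self.calls: list[str] = []  # prompts received, for inspection in tests

    def generate(self, prompt: str) -> str:
        self.calls.append(prompt)
        # Keep at most max_chars characters from the end of the prompt.
        start = max(len(prompt) - self.max_chars, 0)
        return "ECHO: " + prompt[start:]


llm = EchoLLM(max_chars=5)
print(llm.generate("What is FAISS?"))  # ECHO: AISS?
```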
- class LocalSearch.backend.llms.GroqLLM.GroqLLM(api_key: str, model: str = 'openai/gpt-oss-120b')
Bases: BaseLLM
Groq LLM client wrapper.
Uses the Groq SDK to call a specified model and stream output.
- generate(prompt: str) → str
Generate text for the given prompt.
- Parameters:
prompt – Input string to generate a response for.
- Returns:
Generated text as a string.
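GroqLLM.generate returns a single string even though it streams from the model. The sketch below shows one way a streamed response can be collapsed into a string, assuming the OpenAI-compatible chunk layout that the Groq Python SDK yields with stream=True (each chunk exposes choices[0].delta.content, which can be None on the final chunk). The helper name collect_stream is hypothetical, and the fake chunks stand in for a live API call so the block runs offline; this is not GroqLLM's actual implementation.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices at all
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk typically has content=None
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    # Mimics the assumed layout: chunk.choices[0].delta.content
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


stream = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]
print(collect_stream(stream))  # Hello, world
```

With the real SDK, the iterable would presumably come from a chat.completions.create(..., stream=True) call on a Groq client; check the Groq SDK documentation for the exact call.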
Vector Stores
This section contains classes for storing and querying document vectors.
Users can implement LocalSearch.backend.vector_store.BaseVectorStore.BaseVectorStore to define a custom vector storage backend.
- class LocalSearch.backend.vector_store.BaseVectorStore.BaseVectorStore
Bases: ABC
Abstract base class for any vector store.
Any implementation must provide all of the following methods to be compatible with _process_files and other utilities.
- abstract add(vectors: ndarray, ids: ndarray, metadata: List[dict])
Add vectors with corresponding IDs and associated metadata.
- Parameters:
vectors (np.ndarray) – 2D array of vectors to add.
ids (np.ndarray) – 1D array of unique IDs for each vector.
metadata (List[dict]) – List of metadata dicts, one per vector.
- abstract dimension() → int
Return the dimensionality of vectors supported by this store.
- Returns:
Embedding vector dimensionality.
- Return type:
int
- abstract get_all_ids() → Set[int]
Return a set of all vector IDs currently stored.
- Returns:
All IDs in the store.
- Return type:
Set[int]
- abstract load(path: str)
Load the store from disk.
- Parameters:
path (str) – File path or directory to load the store from.
- abstract prepare_index(directory_path: str, recursive: bool = True) → IndexPreparation
Prepare or load the index from a directory.
- Parameters:
directory_path (str) – Directory containing files to index.
recursive (bool) – Whether to scan subdirectories.
- Returns:
- Dict with keys:
‘index’: internal index object
‘current_files’: set of files present in the directory
‘used_ids’: set of vector IDs already in the index
- Return type:
IndexPreparation
- abstract remove_by_id(vector_id: int) → None
Remove a vector from the store by its ID.
- Parameters:
vector_id (int) – ID of the vector to remove.
- abstract save(path: str)
Persist the store to disk.
- Parameters:
path (str) – File path or directory to save the store.
- abstract search(query_vector: ndarray, top_k: int) → List[SearchResult]
Return the top_k nearest neighbors for a query vector.
- Parameters:
query_vector (np.ndarray) – Single query vector.
top_k (int) – Number of nearest neighbors to return.
- Returns:
List of search results with keys ‘id’, ‘score’, ‘metadata’.
- Return type:
List[SearchResult]
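For orientation, here is a minimal duck-typed sketch of the core BaseVectorStore methods (add, dimension, get_all_ids, remove_by_id, search) backed by NumPy arrays and a brute-force scan. The class name InMemoryVectorStore is hypothetical; save, load, and prepare_index are omitted, so it would need those before it could subclass the real base class. Scoring by inner product (cosine similarity when vectors are L2-normalized) is an assumption; the reference does not say how FaissVectorStore computes its scores.

```python
from typing import Dict, List, Optional, Set

import numpy as np


class InMemoryVectorStore:
    """Hypothetical brute-force store; a higher score means more similar."""

    def __init__(self, dim: int):
        self._dim = dim
        self._vectors: Dict[int, np.ndarray] = {}
        self._metadata: Dict[int, dict] = {}

    def dimension(self) -> int:
        return self._dim

    def add(self, vectors: np.ndarray, ids: np.ndarray,
            metadata: Optional[List[dict]] = None) -> None:
        vectors = np.asarray(vectors, dtype=np.float32).reshape(-1, self._dim)
        if len(vectors) != len(ids):
            raise ValueError("vectors and ids must have the same length")
        metadata = metadata if metadata is not None else [{} for _ in ids]
        for vec, vid, meta in zip(vectors, ids, metadata):
            self._vectors[int(vid)] = vec
            self._metadata[int(vid)] = meta

    def get_all_ids(self) -> Set[int]:
        return set(self._vectors)

    def remove_by_id(self, vector_id: int) -> None:
        self._vectors.pop(vector_id, None)
        self._metadata.pop(vector_id, None)

    def search(self, query_vector: np.ndarray, top_k: int = 5) -> List[dict]:
        if not self._vectors:
            return []
        ids = list(self._vectors)
        matrix = np.stack([self._vectors[i] for i in ids])
        # Accept either shape (dim,) or (1, dim) for the query.
        query = np.asarray(query_vector, dtype=np.float32).reshape(self._dim)
        scores = matrix @ query
        order = np.argsort(-scores)[:top_k]
        return [
            {"id": ids[i], "score": float(scores[i]), "metadata": self._metadata[ids[i]]}
            for i in order
        ]


store = InMemoryVectorStore(dim=3)
store.add(np.eye(3), np.array([10, 11, 12]), [{"file": "a"}, {"file": "b"}, {"file": "c"}])
hits = store.search(np.array([[0.9, 0.1, 0.0]]), top_k=2)
print([h["id"] for h in hits])  # [10, 11]
```

A linear scan is O(n) per query, which is fine for small collections; an index such as FAISS becomes worthwhile as the number of stored vectors grows.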
- class LocalSearch.backend.vector_store.BaseVectorStore.IndexPreparation
Bases: TypedDict
TypedDict for the dictionary returned by prepare_index(). Ensures a strict format for generic utilities.
- current_files: Set[str]
- index: object
- used_ids: Set[int]
- class LocalSearch.backend.vector_store.BaseVectorStore.SearchResult
Bases: TypedDict
TypedDict for a single search result returned by the vector store.
- id: int
- metadata: dict
- score: float
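The two TypedDicts can be written out directly from the fields listed above. This sketch assumes the standard typing.TypedDict class syntax (the reference lists only field names and types). Because TypedDict instances are ordinary dicts at runtime, a store implementation can build them with literal dict syntax; the "file_path" metadata key below is purely illustrative.

```python
from typing import Set, TypedDict


class IndexPreparation(TypedDict):
    index: object            # backend-specific index object
    current_files: Set[str]
    used_ids: Set[int]


class SearchResult(TypedDict):
    id: int
    score: float
    metadata: dict           # contents are store-specific


result: SearchResult = {"id": 7, "score": 0.83, "metadata": {"file_path": "notes.txt"}}
prep: IndexPreparation = {"index": None, "current_files": {"notes.txt"}, "used_ids": {7}}
print(type(result).__name__)  # dict
```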
- class LocalSearch.backend.vector_store.FaissVectorStore.FaissVectorStore(dim: int)
Bases: BaseVectorStore
FAISS-based vector store with optional metadata persistence.
- add(vectors: ndarray, ids: ndarray, metadata: List[dict] | None = None)
Add vectors with IDs and optional metadata.
- Parameters:
vectors – Array of shape (n, dim)
ids – Array of integer IDs
metadata – Optional list of metadata dictionaries
- dimension() → int
Return dimensionality of vectors in this store.
- get_all_ids() → Set[int]
Return a set of all vector IDs currently stored.
- load(path: str)
Load FAISS index and associated metadata.
- prepare_index(directory_path: str, recursive: bool = True) → IndexPreparation
Load existing index if available, else create a new one.
- Parameters:
directory_path – Directory to store/load index.
recursive – Whether to scan subdirectories.
- Returns:
- Dict with keys:
index: FAISS index object
current_files: set of valid files
used_ids: set of IDs already in index
- Return type:
IndexPreparation
- remove_by_id(vector_id: int) → None
Remove a vector by its ID and delete associated metadata.
- save(path: str | None = None)
Save FAISS index and metadata mapping.
- Parameters:
path – Optional path to save index (overrides self.index_path)
- search(query_vector: ndarray, top_k: int = 5) → List[SearchResult]
Search for nearest neighbors of a query vector.
- Parameters:
query_vector – Array of shape (1, dim)
top_k – Number of neighbors to return
- Returns:
List of dictionaries with keys ‘id’, ‘score’, ‘metadata’
- Return type:
List[SearchResult]
Metadata Stores
This section contains classes for storing metadata about documents and vector chunks.
Users can subclass LocalSearch.backend.metadata_store.BaseMetaDataStore.BaseMetadataStore to implement custom metadata handling.
- class LocalSearch.backend.metadata_store.BaseMetaDataStore.BaseMetadataStore
Bases: ABC
Abstract base class for metadata storage backends.
Defines strict output formats for metadata and chunk mapping.
- abstract get_file_info(file_path: str) → FileMetadata
Get information about a specific file.
- Parameters:
file_path – Path to the file.
- Returns:
A FileMetadata dictionary.
- abstract is_modified(file_path: str, current_info: FileMetadata) → bool
Determine if a file has changed compared to stored metadata.
- Parameters:
file_path – Path to the file.
current_info – Current FileMetadata dictionary.
- Returns:
True if the file is modified, False otherwise.
- abstract load_chunk_mapping() → List[ChunkMapping]
Load the chunk mapping list from persistent storage.
- Returns:
List of ChunkMapping dictionaries.
- abstract load_metadata() → Dict[str, FileMetadata]
Load all metadata from persistent storage.
- Returns:
Dictionary mapping file paths to FileMetadata dictionaries.
- abstract save_chunk_mapping(chunk_mapping: List[ChunkMapping]) → None
Save the chunk mapping list to persistent storage.
- Parameters:
chunk_mapping – List of ChunkMapping dictionaries.
- abstract save_metadata(metadata: Dict[str, FileMetadata]) → None
Save metadata to persistent storage.
- Parameters:
metadata – Dictionary mapping file paths to FileMetadata dictionaries.
- abstract update(file_path: str, file_info: FileMetadata) → None
Update metadata for a specific file.
- Parameters:
file_path – Path to the file.
file_info – FileMetadata dictionary.
- class LocalSearch.backend.metadata_store.BaseMetaDataStore.ChunkMapping
Bases: TypedDict
Structure of a single chunk mapping.
- chunk_id: str
- end: int
- file_path: str
- start: int
- class LocalSearch.backend.metadata_store.BaseMetaDataStore.FileMetadata
Bases: TypedDict
Structure of metadata stored for each file.
- modified: float
- size: int
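The reference does not say how a FileMetadata record is produced or compared. Since it carries modified (float) and size (int), one plausible approach is to read both from os.stat and treat any difference from the stored record, or a missing record, as a modification. The free functions below are a hypothetical sketch of get_file_info and is_modified under that assumption, not the library's code; the stored dictionary is passed in explicitly instead of living on a store object.

```python
import os
import tempfile
from typing import Dict, TypedDict


class FileMetadata(TypedDict):
    modified: float  # last-modification time, seconds since the epoch
    size: int        # file size in bytes


def get_file_info(file_path: str) -> FileMetadata:
    st = os.stat(file_path)
    return {"modified": st.st_mtime, "size": st.st_size}


def is_modified(stored: Dict[str, FileMetadata], file_path: str,
                current_info: FileMetadata) -> bool:
    previous = stored.get(file_path)
    if previous is None:
        return True  # never seen before, so it needs (re)embedding
    return (previous["modified"] != current_info["modified"]
            or previous["size"] != current_info["size"])


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello")
info = get_file_info(f.name)
print(is_modified({f.name: info}, f.name, info))  # False
os.remove(f.name)
```

Comparing size as well as mtime catches some edits that land within the filesystem's timestamp resolution; a content hash would be stricter but costs a full read of each file.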
- class LocalSearch.backend.metadata_store.JsonMetadataStore.JsonMetadataStore(directory_path: str)
Bases: BaseMetadataStore
JSON-based metadata and chunk mapping persistence.
- get_file_info(file_path: str) → dict
Get information about a specific file.
- Parameters:
file_path – Path to the file.
- Returns:
A FileMetadata dictionary.
- is_modified(file_path: str, current_info: dict) → bool
Determine if a file has changed compared to stored metadata.
- Parameters:
file_path – Path to the file.
current_info – Current FileMetadata dictionary.
- Returns:
True if the file is modified, False otherwise.
- load_chunk_mapping()
Load the chunk mapping list from persistent storage.
- Returns:
List of ChunkMapping dictionaries.
- load_metadata()
Load all metadata from persistent storage.
- Returns:
Dictionary mapping file paths to FileMetadata dictionaries.
- save_chunk_mapping(chunk_mapping: list[dict])
Save the chunk mapping list to persistent storage.
- Parameters:
chunk_mapping – List of ChunkMapping dictionaries.
- save_metadata(metadata: dict)
Save metadata to persistent storage.
- Parameters:
metadata – Dictionary mapping file paths to FileMetadata dictionaries.
- update(file_path: str, file_info: dict)
Update metadata for a specific file.
- Parameters:
file_path – Path to the file.
file_info – FileMetadata dictionary.
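JsonMetadataStore persists both the per-file metadata and the chunk mapping as JSON under directory_path. A minimal round-trip sketch with the standard json module follows; the class name TinyJsonStore and the file names metadata.json and chunk_mapping.json are hypothetical, since the reference does not state how the files are laid out.

```python
import json
import os
import tempfile
from typing import Dict, List


class TinyJsonStore:
    """Hypothetical JSON persistence; file names are assumptions."""

    def __init__(self, directory_path: str):
        os.makedirs(directory_path, exist_ok=True)
        self.metadata_path = os.path.join(directory_path, "metadata.json")
        self.mapping_path = os.path.join(directory_path, "chunk_mapping.json")

    @staticmethod
    def _read(path: str, default):
        if not os.path.exists(path):
            return default  # first run: nothing persisted yet
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)

    @staticmethod
    def _write(path: str, data) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)

    def load_metadata(self) -> Dict[str, dict]:
        return self._read(self.metadata_path, {})

    def save_metadata(self, metadata: Dict[str, dict]) -> None:
        self._write(self.metadata_path, metadata)

    def update(self, file_path: str, file_info: dict) -> None:
        # Read-modify-write of the whole metadata file.
        metadata = self.load_metadata()
        metadata[file_path] = file_info
        self.save_metadata(metadata)

    def load_chunk_mapping(self) -> List[dict]:
        return self._read(self.mapping_path, [])

    def save_chunk_mapping(self, chunk_mapping: List[dict]) -> None:
        self._write(self.mapping_path, chunk_mapping)


with tempfile.TemporaryDirectory() as tmp:
    store = TinyJsonStore(tmp)
    store.update("docs/a.txt", {"modified": 1700000000.0, "size": 42})
    store.save_chunk_mapping([{"chunk_id": "0", "file_path": "docs/a.txt", "start": 0, "end": 42}])
    print(store.load_metadata())  # {'docs/a.txt': {'modified': 1700000000.0, 'size': 42}}
```

JSON object keys must be strings, which suits a mapping keyed by file path; integer IDs would come back as strings after a round trip.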
Text Extractors
This section contains classes for extracting text from files.
Users can implement LocalSearch.backend.text_extractor.BaseTextExtractor.BaseTextExtractor to define custom extraction strategies.
- class LocalSearch.backend.text_extractor.BaseTextExtractor.BaseTextExtractor
Bases: ABC
Abstract base class for extracting text from documents.
Subclasses must implement methods to determine whether a file type is supported, to extract text from files, and to split that text into chunks.
- abstract can_handle(file_path: str) → bool
Determine if this extractor can handle the given file type.
- Parameters:
file_path – Path to the file.
- Returns:
True if this extractor can process the file type, False otherwise.
- abstract extract_text(file_path: str) → str
Extract and return text from a file.
- Parameters:
file_path – Path to the file.
- Returns:
Extracted text as a string.
- abstract split_text(text: str) → list[str]
Split the input text into smaller chunks suitable for embedding.
The exact chunking strategy (size, overlap, etc.) is implementation-dependent.
- Parameters:
text (str) – The full text to split.
- Returns:
A list of text chunks.
- Return type:
List[str]
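A minimal sketch of a custom extractor, assuming Markdown files read as plain UTF-8 text: can_handle checks the extension, extract_text reads the file, and split_text breaks on blank lines so each paragraph becomes one chunk. The class name MarkdownExtractor and the paragraph-based strategy are illustrative assumptions; a local stand-in for BaseTextExtractor keeps the block self-contained.

```python
import os
import re
from abc import ABC, abstractmethod
from typing import List


class BaseTextExtractor(ABC):
    """Local stand-in mirroring the documented BaseTextExtractor interface."""

    @abstractmethod
    def can_handle(self, file_path: str) -> bool: ...

    @abstractmethod
    def extract_text(self, file_path: str) -> str: ...

    @abstractmethod
    def split_text(self, text: str) -> List[str]: ...


class MarkdownExtractor(BaseTextExtractor):
    """Hypothetical extractor for .md files, one chunk per paragraph."""

    EXTENSIONS = (".md", ".markdown")

    def can_handle(self, file_path: str) -> bool:
        return os.path.splitext(file_path)[1].lower() in self.EXTENSIONS

    def extract_text(self, file_path: str) -> str:
        with open(file_path, "r", encoding="utf-8", errors="replace") as fh:
            return fh.read()

    def split_text(self, text: str) -> List[str]:
        # One or more blank lines separate paragraphs; drop empty pieces.
        paragraphs = re.split(r"\n\s*\n", text)
        return [p.strip() for p in paragraphs if p.strip()]


ex = MarkdownExtractor()
print(ex.can_handle("README.MD"), ex.can_handle("notes.txt"))  # True False
print(ex.split_text("# Title\n\nFirst para.\n\n\nSecond para."))
# ['# Title', 'First para.', 'Second para.']
```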
- class LocalSearch.backend.text_extractor.DefaultTextExtractor.DefaultTextExtractor(base_path: str, chunk_size: int = 500, chunk_overlap: int = 50)
Bases: BaseTextExtractor
Default text extractor supporting .txt, .pdf, and .html files.
Provides optional text chunking with overlap for downstream processing.
- SUPPORTED_TYPES = ['.txt', '.pdf', '.html']
- can_handle(file_path: str) → bool
Check if the file extension is supported by this extractor.
- Parameters:
file_path – Path to the file.
- Returns:
True if the file type is supported, False otherwise.
- extract_text(file_path: str) → str
Extract text from a supported file type.
- Parameters:
file_path – Path to the file.
- Returns:
Extracted text as a single string. Returns an empty string on error.
- split_text(text: str) → List[str]
Split text into chunks with optional overlap.
- Parameters:
text – Full text to split.
- Returns:
List of text chunks.
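The constructor defaults (chunk_size=500, chunk_overlap=50) suggest a sliding window in which consecutive chunks share chunk_overlap units. A minimal sketch, assuming both sizes are measured in characters (the reference does not say whether they count characters, words, or tokens): the window advances by chunk_size - chunk_overlap each step and stops once a window reaches the end of the text.

```python
from typing import List


def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Hypothetical character-based sliding-window chunker."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With this scheme, any phrase no longer than chunk_overlap characters appears intact in at least one chunk, even when it straddles a window boundary.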
Engine
This class provides the main interface for searching documents and querying LLMs.
Use LocalSearch.backend.engine.SearchEngine to initialize and run searches or start the web interface.
- class LocalSearch.backend.engine.SearchEngine(directory_path: str, llm: BaseLLM, embedding_model: BaseEmbedder | None = None, include_file_types: list[str] = ['.txt', '.pdf', '.html'], metadata_store: BaseMetadataStore | None = None, vector_store: BaseVectorStore | None = None, extractor: BaseTextExtractor | None = None, reembed_policy: str = 'modified_only', verbose: bool = True, recursive: bool = True)
Bases: object
Local semantic search engine that ties together file embeddings, metadata storage, a vector store, text extraction, and LLM-based querying.
- search(query: str, top_k: int = 5) → str
Perform a semantic search and return an LLM-generated answer that uses only the relevant retrieved context.
- web(host='127.0.0.1', port=8000)
Serve the local frontend with FastAPI and static files.
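SearchEngine.search is described as running a semantic search and then asking the LLM for an answer grounded only in the retrieved context. The sketch below shows that retrieve-then-generate flow with duck-typed components (anything exposing encode, search, and generate). The helper name answer, the prompt wording, and the "text" metadata key are hypothetical; the reference does not show the engine's actual prompt or how chunk text is recovered from stored metadata.

```python
from typing import List, Protocol


class Embedder(Protocol):
    def encode(self, text: str): ...


class VectorStore(Protocol):
    def search(self, query_vector, top_k: int) -> List[dict]: ...


class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...


def answer(query: str, embedder: Embedder, store: VectorStore, llm: LLM,
           top_k: int = 5) -> str:
    """Hypothetical retrieve-then-generate flow."""
    hits = store.search(embedder.encode(query), top_k=top_k)
    # Assumes each hit's metadata carries the chunk text under "text".
    context = "\n\n".join(h["metadata"].get("text", "") for h in hits)
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm.generate(prompt)


# Stub components so the flow can be exercised offline.
class StubEmbedder:
    def encode(self, text):
        return [len(text)]


class StubStore:
    def search(self, query_vector, top_k):
        docs = [{"id": 1, "score": 0.9, "metadata": {"text": "FAISS indexes vectors."}}]
        return docs[:top_k]


class StubLLM:
    def generate(self, prompt):
        return prompt.splitlines()[-1]  # echo the question line back


print(answer("What does FAISS do?", StubEmbedder(), StubStore(), StubLLM()))
# Question: What does FAISS do?
```

Keeping the instruction to use only the supplied context in the prompt is what limits the answer to retrieved material; the quality of the result still depends on top_k and on how well the embedder ranks relevant chunks.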