Modular: Semantic Search with MAX Engine — BGE-base-en-v1.5 & Batch Inference
AI Impact Summary
This document details the implementation of a semantic search engine built on the MAX Engine, using the BGE-base-en-v1.5 embedding model and the Amazon Multilingual Counterfactual Dataset. The core technique is batched inference with the MAX Engine, which delivers performance improvements of up to 2.8x over PyTorch and ONNX Runtime, particularly at smaller batch sizes on CPUs. This represents a significant optimization opportunity when deploying this semantic search solution.
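To make the retrieval step concrete, here is a minimal sketch of the core of such a semantic search engine: scoring a query embedding against document embeddings in batches and returning the best matches. This is an illustration using NumPy, not the MAX Engine API; the function name `semantic_search` and the batch size are hypothetical, and it assumes embeddings (e.g. from BGE-base-en-v1.5) are already L2-normalized, so cosine similarity reduces to a dot product.

```python
import numpy as np

def semantic_search(query_emb, doc_embs, top_k=3, batch_size=1024):
    """Return (indices, scores) of the top_k most similar documents.

    Assumes both query_emb (shape [d]) and doc_embs (shape [n, d])
    are L2-normalized, as BGE-style sentence embeddings typically
    are, so the dot product equals cosine similarity. Documents are
    scored in batches, mirroring batched inference over the corpus.
    """
    scores = np.empty(len(doc_embs), dtype=np.float32)
    for start in range(0, len(doc_embs), batch_size):
        batch = doc_embs[start:start + batch_size]
        scores[start:start + len(batch)] = batch @ query_emb
    top = np.argsort(scores)[::-1][:top_k]
    return top, scores[top]
```

In a full pipeline, `doc_embs` would be produced by running the corpus through the embedding model in batches (where the batched-inference speedup applies), and `query_emb` by embedding the search query at request time.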
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info