Google Cloud serverless sentiment-analysis pipeline using Cloud Run and PyTorch
AI Impact Summary
The blog documents building a serverless sentiment-analysis microservice on Google Cloud, serving a PyTorch transformer (distilbert-base-uncased-finetuned-sst-2-english) through a Flask app packaged in Docker and deployed on Cloud Run. It covers the migration path from TensorFlow checkpoints to PyTorch, compares AI Platform Prediction, App Engine, and Cloud Run as hosting options, and settles on Cloud Run with increased memory to reduce cold-start latency and improve throughput. The service runs a single Gunicorn worker with a single thread to cap memory usage, trading concurrent request handling for cost efficiency, and exposes a simple GET endpoint protected by an API key. This is a practical pattern for low-traffic NLP inference, and it highlights the critical operational levers: memory sizing, startup latency, and model warm-up to meet latency SLAs.
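The single-worker, single-thread Gunicorn setup described above could be containerized roughly as follows; this is a minimal sketch, and the filenames (`requirements.txt`, `main:app`) and Python version are illustrative assumptions, not details taken from the post.

```dockerfile
# Assumed layout: a Flask app exposed as `app` in main.py,
# with Flask, gunicorn, torch, and transformers in requirements.txt.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# One worker, one thread: only one copy of the model resides in memory,
# at the cost of serving one request at a time. Cloud Run injects $PORT.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 1 --timeout 0 main:app
```

Deploying with extra memory to soften cold starts might then look like `gcloud run deploy sentiment --source . --memory 2Gi` (the service name and the 2 GiB figure are assumptions; the post only says memory was increased).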
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info