Portfolio Project
Smart Sentence Retriever
NLP Embeddings & Serverless Retrieval
Context
I wanted a quick way to retrieve relevant sentences by meaning, not exact wording.
Approach
- Prepared the corpus (Alice in Wonderland), split it into sentences, and precomputed embeddings.
- Benchmarked 6+ embedding models on 800 sentences, clustering the embeddings with k=2–6 and tracking both silhouette score and efficiency (silhouette per million parameters).
- Chose the model with the highest silhouette score and deployed it as a stateless AWS Lambda endpoint, with CORS enabled, that returns the top-k ranked sentences for a query.
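The ranking step behind the endpoint can be sketched roughly as follows. This is a minimal illustration, not the deployed code: it assumes cosine similarity over the precomputed sentence embeddings, and the names `top_k`, `make_handler`, and `embed`, as well as the `query` and `k` request fields, are hypothetical. The `embed` argument stands in for whatever function turns the query text into a vector with the chosen model.

```python
import json
import math
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    sentence_vecs: Sequence[Sequence[float]],
    sentences: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k sentences most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(sentence_vecs, sentences)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in scored[:k]]


def make_handler(
    embed: Callable[[str], Sequence[float]],
    sentence_vecs: Sequence[Sequence[float]],
    sentences: Sequence[str],
):
    """Build a Lambda-style handler; no state is kept between requests."""

    def handler(event, context):
        # Assumed request shape: a JSON body with "query" and optional "k".
        body = json.loads(event.get("body") or "{}")
        results = top_k(
            embed(body.get("query", "")),
            sentence_vecs,
            sentences,
            int(body.get("k", 3)),
        )
        return {
            "statusCode": 200,
            "headers": {
                "Access-Control-Allow-Origin": "*",  # CORS: allow browser callers
                "Content-Type": "application/json",
            },
            "body": json.dumps(
                {"results": [{"sentence": s, "score": sc} for s, sc in results]}
            ),
        }

    return handler
```

Because the sentence embeddings are precomputed, each request only needs to embed the query and score it against the stored vectors. In practice the loop would likely be replaced by a vectorized matrix product.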
Impact
- Best absolute silhouette: 0.313 – Snowflake/snowflake-arctic-embed-l-v2.0 (k=2, 1024-d, ~568M params).
- Best efficiency: 0.0116 per M params – jinaai/jina-embeddings-v3 (k=6, 1024-d, ~12.9M params).
- Model deployed on AWS: Snowflake/snowflake-arctic-embed-l-v2.0, chosen for quality over efficiency; the live demo runs on a lightweight, scalable Lambda API.
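For reference, the two metrics can be computed along these lines. This is a sketch under assumptions rather than the benchmark code: it uses Euclidean distance and takes cluster labels as given, since neither the distance metric nor the clustering algorithm is specified here, and the function names and toy points are illustrative only. The efficiency figure is simply the silhouette score divided by the parameter count in millions.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points (assumed Euclidean metric)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster: coefficient is defined as 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0.0:
            total += (b - a) / denom
    return total / len(points)


def efficiency(silhouette: float, params_millions: float) -> float:
    """Silhouette score per million model parameters."""
    return silhouette / params_millions


# Toy data (not from the benchmark): two tight, well-separated clusters.
toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
toy_labels = [0, 0, 1, 1]
toy_score = silhouette_score(toy_points, toy_labels)

# Efficiency for the deployed model, from the figures reported above.
snowflake_efficiency = efficiency(0.313, 568)
```

With the reported figures, the deployed Snowflake model works out to roughly 0.00055 silhouette per million parameters (0.313 / 568), which is the trade-off accepted by prioritizing quality.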