Embedding API Latency and Caching
An empirical comparison of embedding API latency from OpenAI, Cohere, Google Vertex AI, and Jina, examining temporal and environmental effects and emphasizing caching benefits.
Embeddings are a key ingredient in many modern ML technologies, such as RAG and semantic search. But every time you hit the OpenAI API to embed your search query, are you paying too high a latency toll?
We measured embedding API latency across OpenAI, Cohere, Google Vertex AI, and Jina, and now we know whether the time of day (or even the weather!) affects it. And yes, after seeing the numbers you will always cache embeddings!
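The caching takeaway follows from the fact that embedding calls are deterministic: the same input text always yields the same vector, so a repeated query never needs a second network round trip. A minimal in-memory memoizer in Scala sketches the idea (the `fetchEmbedding` stub stands in for a real OpenAI/Cohere/Vertex/Jina call and is purely illustrative, not the project's actual client):

```scala
import scala.collection.concurrent.TrieMap

object EmbeddingCache {
  // text -> embedding vector; TrieMap is thread-safe for concurrent lookups
  private val cache = TrieMap.empty[String, Vector[Double]]
  var apiCalls = 0 // counts real (uncached) requests, for demonstration

  // Stand-in for a remote embedding API call. In practice this is where
  // the 100ms+ latency toll measured in the talk would be paid.
  private def fetchEmbedding(text: String): Vector[Double] = {
    apiCalls += 1
    text.map(_.toDouble).toVector // dummy vector, not a real embedding
  }

  // Return the cached vector if present; otherwise fetch once and store.
  def embed(text: String): Vector[Double] =
    cache.getOrElseUpdate(text, fetchEmbedding(text))
}
```

With this shape, embedding the same search query twice costs one API call instead of two; for a production system a persistent store (e.g. Redis keyed by a hash of the input) would replace the in-memory map.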
This project is a load tester for embedding APIs.
- Scala: a statically typed, multi-paradigm JVM language designed by Martin Odersky that blends object-oriented and functional programming, with full interoperability with Java libraries. Its concise, type-safe style and strong concurrency support make it a good fit for building fast, distributed tools like this load tester.
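The core of such a load tester boils down to timing repeated requests and summarizing the samples with percentiles. A minimal Scala sketch of that loop (the helper names and the nearest-rank percentile method are illustrative assumptions, not the project's actual implementation):

```scala
// Run `request` n times sequentially, returning per-call latency in ms.
def sampleLatencies(n: Int)(request: () => Unit): Vector[Long] =
  Vector.fill(n) {
    val t0 = System.nanoTime()
    request()
    (System.nanoTime() - t0) / 1000000 // nanoseconds -> milliseconds
  }

// Nearest-rank percentile over the collected samples (p in [0, 100]).
def percentile(samples: Vector[Long], p: Double): Long = {
  val sorted = samples.sorted
  sorted(((p / 100.0) * (sorted.length - 1)).round.toInt)
}
```

Reporting p50 and p95/p99 rather than a mean is what makes temporal effects (time of day, region, provider load) visible in the tail, which is where the comparison across OpenAI, Cohere, Vertex AI, and Jina gets interesting.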
Related projects
How we build the next generation embeddings and rerank model
Berlin
This talk explains the development of a state-of-the-art embeddings and rerank model that surpasses OpenAI's text-embedding-v3, enhancing AI…
Beyond Text: Building a fast Visual Search Engine
Berlin
Learn how to build a production‑grade visual search engine using 1280‑dimensional embeddings, multi‑tenant ingestion, GPU inference, and sub‑second…
High-throughput embedding generation for Vector DB corpus fill
San Francisco
This talk demonstrates an optimized TensorRT-LLM embedding runtime achieving up to twice the performance of alternatives, with code,…
AI Computer
Berlin
Learn how to build a desktop PC with an RTX 3090 for local AI workloads, covering hardware assembly, software…
Embedding Models in Action: From Category Mapping to Visual Search
Hamburg
Learn how embedding models automate product category mapping across marketplaces and power a visual search engine that detects…
The fastest cold starts in the world - a new type of docker registry and kubernetes written in rust
London
Learn how a Rust‑based Docker registry and rebuilt containerd reduce AI model cold start times by 3‑6×, with…