Technology
datasets
Datasets is Hugging Face's library for accessing, sharing, and processing thousands of AI-ready datasets (NLP, computer vision, audio), often loading one in a single line of code.
Datasets provides a unified API for data access and preprocessing in machine learning workflows. Engineers can load over 350,000 datasets (SQuAD, Common Crawl, etc.) directly from the Hugging Face Hub. Its Apache Arrow backend memory-maps data on disk and supports zero-copy reads, so datasets larger than available RAM can still be processed efficiently. This keeps data preparation fast and scalable when training models across domains.
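As a rough illustration of the loading and preprocessing flow described above, here is a minimal sketch. It assumes the `datasets` package is installed, that network access to the Hub is available, and that SQuAD is published under the repository id `rajpurkar/squad` with a `question` column; the helper names `load_squad_sample` and `add_question_length` are hypothetical and exist only for this example.

```python
def load_squad_sample(n_rows: int = 100):
    """Load the first n_rows of the SQuAD training split from the Hub.

    Assumption: the dataset lives at "rajpurkar/squad" on the Hugging Face Hub.
    """
    # Imported lazily so the helpers below can be used without the library installed.
    from datasets import load_dataset

    # The split string uses the library's slicing syntax to limit the rows loaded.
    return load_dataset("rajpurkar/squad", split=f"train[:{n_rows}]")


def add_question_length(example: dict) -> dict:
    """Per-row transform: compute the character length of the question."""
    return {"question_length": len(example["question"])}


if __name__ == "__main__":
    ds = load_squad_sample()
    # map() applies the transform row by row and adds the returned key as a new column.
    ds = ds.map(add_question_length)
    print(ds[0]["question"], ds[0]["question_length"])
```

The transform is kept as a plain function on a dictionary so it can be checked on its own before being passed to `map`.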
16 projects · 15 cities
Recent Talks & Demos
Resource Optimization for LLMs
Bogotá
Transformers
PEFT
AI Cantonese Lyric Generator
Hong Kong
Mar 26
LLM
Machine Learning
Hugging Face RAG: Reduce Hallucinations
Tiruchirappalli
Jan 31
Transformers
RAG
Dlab-852-Mini: Hong Kong Cultural AI
Hong Kong
Dec 18
Python
datasets
DARIA: Multi-modal Assessment Pipeline
Raleigh
Dec 10
React
Python
Foundation Models for Clinical Trials
Lausanne
Dec 3
GPT-4
LangChain
JetBrains AI Codebase Benchmarks
Toronto
Nov 10
Python
OpenAI
Teaching AI Maya Glyphs
Montreal
Oct 21
YOLOv8
ResNet
CantoneseLLM
Hong Kong
Aug 22
Baidu Ernie 4
CantoneseLLM
Intrinsic Dimension: Optimizing Embeddings
Austin
Jul 10
Claude Sonnet 4
datasets
Archetype ECS for AI Simulations
Chicago
Jun 3
Python
Daft
Large Scale Entity Interaction Networks
Milan
May 8
OpenAI API
Anthropic API
GeoAI for Territorial Management
Bogotá
Mar 27
GPT-4
Claude-3
Unstructured data visualization
Atlanta
Feb 27
Transformers
datasets
Modal: LLMs Generate Log Queries
Atlanta
Jan 23
Modal
GPT-4
Configuration-Based LLM Evals
Austin
Jul 11
LLMs
CLI