TF-IDF
TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure that quantifies a term's relevance to a document by multiplying its frequency within that document (TF) by its rarity across the corpus (IDF).
TF-IDF is a core statistical method in information retrieval and text mining: it assigns a numerical weight to a word, signaling its importance within a document relative to a larger corpus. The calculation has two parts: Term Frequency (TF) measures how often a word appears in the document, and Inverse Document Frequency (IDF) scales that value down when the word appears in many documents of the corpus (like 'the' or 'a'). The final TF-IDF score, the product of the two, emphasizes terms that are frequent in a specific document but rare overall (e.g., 'quantum' in a physics paper). Computing this score for every term turns each document into a vector of weights, a representation widely used for search engine relevance ranking and for training machine learning models for text classification.
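The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes TF is the raw count divided by document length and IDF is the unsmoothed natural log of (number of documents / number of documents containing the term). Other weighting variants exist, and libraries often apply smoothing. The function name `tf_idf` and the three-document toy corpus are made up for the example.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    `corpus` is a list of documents, each a list of tokens.
    Assumed variant: TF = count / document length,
    IDF = ln(N / df), with no smoothing.
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in corpus for term in set(doc))

    weights = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the quantum state of the particle".split(),
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
]
scores = tf_idf(docs)

print(scores[0]["the"])      # appears in every document, so IDF = ln(1) = 0
print(scores[0]["quantum"])  # appears in one document only, so it gets a positive weight
```

With this variant, a word that occurs in every document receives a weight of exactly zero, while a word confined to a single document keeps the largest IDF factor, which is the behavior the paragraph above describes for 'the' versus 'quantum'.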