A recipe for 50x faster local LLM inference | AI & ML Monthly
Welcome to Machine Learning & AI Monthly for June 2025.
This is the video version of the newsletter I write every month which covers the latest and greatest (but not always the latest) in the world of AI and ML.
Thumbnail paper link: https://huggingface.co/papers/2506.14111
Read the issues online:
– AI/ML Monthly June 2025 (this video) — https://zerotomastery.io/blog/ai-and-machine-learning-monthly-newsletter-june-2025/
– AI/ML Monthly May 2025 — https://zerotomastery.io/blog/ai-and-machine-learning-monthly-newsletter-may-2025/
– AI/ML Monthly April 2025 — https://zerotomastery.io/blog/ai-and-machine-learning-monthly-newsletter-april-2025/
My links:
Download Nutrify (my startup) – https://nutrify.app
Download KeepTrack (my other startup) – https://keeptrack.app
Learn Hugging Face – https://dbourke.link/ZTM-HF-Text-Classification
Learn AI/ML (beginner-friendly course) – https://dbourke.link/ZTMMLcourse
Learn TensorFlow – https://dbourke.link/ZTMTFcourse
Learn PyTorch – https://dbourke.link/ZTMPyTorch
My ML blog – https://learnml.io
Read my novel Charlie Walks – https://www.charliewalks.com
Personal website – https://www.mrdbourke.com
Timestamps:
00:00 – Intro
00:25 – ZTM Object Detection with Hugging Face Transformers Project: https://www.learnhuggingface.com/notebooks/hugging_face_object_detection_tutorial
01:28 – KeepTrack is now an app: https://keeptrack.app
02:15 – The case for more ambition in AI research by Jack Morris: https://blog.jxmo.io/p/the-case-for-more-ambition
03:56 – Save money on AI audio transcriptions by speeding up the audio: https://george.mand.is/2025/06/openai-charges-by-the-minute-so-make-the-minutes-shorter/
06:16 – Answer.AI release ReadBench to test how well VLMs can read: https://www.answer.ai/posts/2025-06-05-readbench.html
09:06 – Flux.1 Kontext Release: https://bfl.ai/announcements/flux-1-kontext-dev
11:22 – Gemma 3n models designed to run on local devices released in full: https://huggingface.co/blog/gemma3n
18:05 – NuExtract 2.0 for structured data extraction: https://huggingface.co/collections/numind/nuextract-20-67c73c445106c12f2b1b6960
19:17 – 50x faster LLM inference recipe from Essential AI: https://huggingface.co/papers/2506.14111
23:32 – Qwen3 embedding and reranker models: https://huggingface.co/collections/Qwen/qwen3-embedding-6841b2055b99c44d9a4c371f
24:22 – BioCLIP 2: https://huggingface.co/imageomics/bioclip-2
26:28 – V-JEPA 2: https://github.com/facebookresearch/vjepa2
29:34 – GLiNER-X series for any entity detection: https://huggingface.co/collections/knowledgator/gliner-x-684320a3f1220315c651d2f5
30:58 – OCR edges towards its ChatGPT moment (Nanonets-OCR-s): https://nanonets.com/research/nanonets-ocr-s/
34:12 – torchvista – visualizing PyTorch model flows: https://github.com/sachinhosmani/torchvista
35:22 – Ovis-U1-3B combines multimodal understanding, image generation and editing: https://huggingface.co/AIDC-AI/Ovis-U1-3B
38:29 – Baidu release the Ernie 4.5 foundation models: https://huggingface.co/collections/baidu/ernie-45-6861cd4c9be84540645f35c9
39:46 – Google Colab updates (Hugging Face integration & more): https://medium.com/google-colab/launch-hugging-face-models-in-colab-for-faster-ai-exploration-bee261978cf9
42:13 – Apple updates its on-device and server foundation models: https://machinelearning.apple.com/research/apple-foundation-models-2025-updates
49:16 – Anthropic guide on building a multi-agent research system: https://www.anthropic.com/engineering/built-multi-agent-research-system
49:30 – Google Gemini 2.5 Pro and Flash releases: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
52:36 – Andrej Karpathy on Software 3.0, agents & more: https://youtu.be/LCEmiRjPEtQ?si=0mjoM5H9wlih_HdW
55:42 – Pivot to AI YouTube channel: https://www.youtube.com/@PivotToAI
56:04 – Nate B Jones YouTube channel: https://www.youtube.com/@NateBJones