Showing posts with the label Ollama.

只用CPU跑「小型」語言模型可行嗎? / Is Running "Small" Language Models on CPUs Only Feasible?

布丁布丁吃布丁



Many people say that running large language models requires high-end GPUs. But while large models carry a high barrier to entry, small language models have been developing just as rapidly. Recently I tried running Gemma2:2B on a 12-core CPU with 32GB of RAM, and it went surprisingly smoothly.
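As a quick illustration of what a CPU-only run looks like, here is a minimal sketch of a request to a local Ollama server, assuming Ollama is installed and `gemma2:2b` has already been pulled. The prompt text is made up for the example, and the `num_gpu: 0` option (offload zero layers to the GPU) is one way to ask Ollama for pure-CPU inference even on a machine that has a GPU.

```python
# Minimal sketch: CPU-only inference with Gemma2:2B via a local Ollama server.
# Assumes `ollama pull gemma2:2b` has been run; port 11434 is Ollama's default.
import json
import urllib.request

payload = {
    "model": "gemma2:2b",
    "prompt": "Explain in one sentence why small language models can run on CPUs.",
    "stream": False,
    # num_gpu: 0 offloads zero layers to the GPU, forcing CPU-only inference.
    "options": {"num_gpu": 0},
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # With stream=False, Ollama returns a single JSON object whose
    # "response" field holds the full generated text.
    print(json.loads(resp.read())["response"])
```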

(more...)

離開抱抱臉: 讓Dify擁抱Ollama / Leaving Hugging Face: Embracing Ollama with Dify

布丁布丁吃布丁



In the article "Self-Hosting a Large Language Model Application: Dify," I described using the Hugging Face API for text embedding within Dify, but startup wait times and rate limits dragged down efficiency. This time we hand the embedding work over to a self-hosted Ollama instance; let's see what benefits self-hosting actually brings.
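For context, this is roughly the kind of request Dify issues once its embedding provider points at a local Ollama instead of the Hugging Face API. A minimal sketch: `/api/embeddings` is Ollama's embeddings endpoint, while the model name `nomic-embed-text` is assumed purely for illustration; substitute whichever embedding model you actually pulled.

```python
# Minimal sketch: requesting a text embedding from a self-hosted Ollama server.
# The model "nomic-embed-text" is an assumption for illustration; any embedding
# model pulled with `ollama pull` works the same way.
import json
import urllib.request

payload = {
    "model": "nomic-embed-text",
    "prompt": "Self-hosting removes startup latency and rate limits.",
}

req = urllib.request.Request(
    "http://localhost:11434/api/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The response carries one dense vector; a tool like Dify stores these
    # in its vector database for retrieval.
    vector = json.loads(resp.read())["embedding"]

print(len(vector), vector[:5])
```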

(more...)