Showing posts with the label Ollama.

只用CPU跑「小型」語言模型可行嗎? / Is Running "Small" Language Models on CPUs Only Feasible?

布丁布丁吃布丁



Many people say that running large language models requires high-end GPUs. But while large models carry a high barrier to entry, small language models have been developing just as rapidly. Recently I tried running Gemma2:2B on a 12-core CPU with 32GB of RAM, and it went surprisingly smoothly.
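As a quick illustration of what a CPU-only run looks like, here is a minimal sketch of a request to a local Ollama server, assuming Ollama is installed and `gemma2:2b` has already been pulled. The prompt text is made up for the example, and the `num_gpu: 0` option (offload zero layers to the GPU) is one way to ask Ollama for pure-CPU inference even on a machine that has a GPU.

```python
# Minimal sketch: CPU-only inference with Gemma2:2B via a local Ollama server.
# Assumes `ollama pull gemma2:2b` has been run; port 11434 is Ollama's default.
import json
import urllib.request

payload = {
    "model": "gemma2:2b",
    "prompt": "Explain in one sentence why small language models can run on CPUs.",
    "stream": False,
    # num_gpu: 0 offloads zero layers to the GPU, forcing CPU-only inference.
    "options": {"num_gpu": 0},
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # With stream=False, Ollama returns a single JSON object whose
    # "response" field holds the full generated text.
    print(json.loads(resp.read())["response"])
```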

(more...)

離開抱抱臉: 讓Dify擁抱Ollama / Leaving Hugging Face: Embracing Ollama with Dify

布丁布丁吃布丁



In the article "Self-Hosting a Large Language Model Application: Dify," I described using the Hugging Face API for text embedding within Dify, but startup wait times and rate limits dragged down efficiency. This time we hand the embedding work over to a self-hosted Ollama instance; let's see what benefits self-hosting actually brings.
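For context, this is roughly the kind of request Dify issues once its embedding provider points at a local Ollama instead of the Hugging Face API. A minimal sketch: `/api/embeddings` is Ollama's embeddings endpoint, while the model name `nomic-embed-text` is assumed purely for illustration; substitute whichever embedding model you actually pulled.

```python
# Minimal sketch: requesting a text embedding from a self-hosted Ollama server.
# The model "nomic-embed-text" is an assumption for illustration; any embedding
# model pulled with `ollama pull` works the same way.
import json
import urllib.request

payload = {
    "model": "nomic-embed-text",
    "prompt": "Self-hosting removes startup latency and rate limits.",
}

req = urllib.request.Request(
    "http://localhost:11434/api/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The response carries one dense vector; a tool like Dify stores these
    # in its vector database for retrieval.
    vector = json.loads(resp.read())["embedding"]

print(len(vector), vector[:5])
```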

(more...)