# Xiaomi 15 Local AI Tutorial: Llama.cpp + Adreno 830 GPU Acceleration
## Hardware
- Phone: Xiaomi 15
- SoC: Snapdragon 8 Elite
- GPU: Adreno 830
- Max GPU memory: ~1GB
- OS: Android + Termux
---
## Part 1: Install Termux and Set Up the Environment
### 1. Install Termux
Download Termux from **F-Droid** (not Google Play):
```
https://f-droid.org/packages/com.termux/
```
### 2. Update and Install Dependencies
```bash
pkg update && pkg upgrade
pkg install git cmake ninja clang make python curl
```
---
## Part 2: Build Llama.cpp (OpenCL Version)
### 1. Clone the llama.cpp Source
```bash
cd ~
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
```
### 2. Set Up the OpenCL Environment
```bash
# Check that OpenCL is working
export LD_LIBRARY_PATH=/vendor/lib64
clinfo
```
You should see:
```
Platform Name : QUALCOMM Snapdragon(TM)
Device Name : QUALCOMM Adreno(TM) 830
```
### 3. Build (with OpenCL)
```bash
cd ~/llama.cpp
cmake -B build-android \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENCL=ON \
-DGGML_OPENCL_EMBED_KERNELS=ON \
-DGGML_OPENCL_USE_ADRENO_KERNELS=ON \
-DBUILD_SHARED_LIBS=ON
cmake --build build-android -j$(nproc)
```
---
## Part 3: Download a Model
### Recommended Models (given the ~1GB GPU memory limit)
| Model | Size | Memory Footprint |
|-------|------|------------------|
| gemma-2-2b-it-Q4_0 | ~1.5GB | Low, recommended |
| Qwen3-0.6B-Q4_0 | ~500MB | Lowest |
| gemma-3n-E2B-it-Q4_0 | ~1.8GB | Medium |
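As a sanity check on the sizes above, a Q4_0 GGUF's footprint can be estimated from the parameter count: Q4_0 packs 32 weights into 18-byte blocks (32 four-bit values plus one fp16 scale), about 0.5625 bytes per weight. A rough sketch, where the 1.25x overhead factor for non-quantized tensors and runtime buffers is an assumption:

```python
def q4_0_size_gb(n_params: float, overhead: float = 1.25) -> float:
    """Rough memory estimate for a Q4_0-quantized model.

    Q4_0 block layout: 32 x 4-bit weights + 1 fp16 scale = 18 bytes
    per 32 weights, i.e. 0.5625 bytes/weight. `overhead` is an assumed
    fudge factor for non-quantized tensors and runtime buffers.
    """
    bytes_per_weight = 18 / 32  # 0.5625
    return n_params * bytes_per_weight * overhead / 1e9

# A 2B-parameter model lands near the ~1.5GB listed in the table:
print(round(q4_0_size_gb(2e9), 2))
```

This is why the 2B-class models are the sweet spot for a device with roughly 1GB of GPU memory: part of the model stays on the CPU side, and only the offloaded layers need to fit on the GPU.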
### Download the Model
```bash
# Downloads the model into the local cache automatically
export LD_LIBRARY_PATH=/vendor/lib64
./build-android/bin/llama-server -hf unsloth/gemma-2-2b-it-GGUF:Q4_0
```
---
## Part 4: Start the Llama Server
```bash
export LD_LIBRARY_PATH=/vendor/lib64
cd ~/llama.cpp
./build-android/bin/llama-server \
-hf unsloth/gemma-2-2b-it-GGUF:Q4_0 \
-ngl 20 \
-c 4096 \
--host 0.0.0.0 \
--port 8080
```
**Parameter notes:**
- `-ngl 20`: number of layers to offload to the GPU (around 20 works well on the Adreno 830)
- `-c 4096`: context length
- `--host 0.0.0.0`: allow connections from other devices
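llama-server exposes an OpenAI-compatible `POST /v1/chat/completions` endpoint on the host/port above, which is what the search script in Part 5 talks to. A sketch of the request body it expects (built locally here; nothing is actually sent):

```python
import json

def chat_request_body(prompt: str, model: str = "gemma",
                      temperature: float = 0.7, max_tokens: int = 500) -> str:
    """Build the JSON body for llama-server's OpenAI-compatible
    POST /v1/chat/completions endpoint (no network call here)."""
    body = {
        "model": model,  # llama-server serves whatever model it loaded; any name works
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

print(chat_request_body("hello"))
```

You could POST this with `curl` or `requests` to `http://localhost:8080/v1/chat/completions`; the scripts below use the `openai` client, which builds the same payload.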
---
## Part 5: Web Search + AI Chat
### Create the Search Script
Create `~/web_search_llama.py`:
```python
#!/usr/bin/env python3
"""
Web search + llama.cpp AI analysis
"""
import re
import requests
import openai

LLAMA_SERVER = "http://localhost:8080/v1"
MODEL = "gemma"

def web_search(query, num_results=5):
    """Search with DuckDuckGo"""
    print(f"🔍 Searching: {query}")
    url = f"https://duckduckgo.com/html/?q={requests.utils.quote(query)}"
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        resp = requests.get(url, headers=headers, timeout=10)
        links = re.findall(r'<a class="result__a" href="([^"]+)"[^>]*>([^<]+)</a>', resp.text)
        results = []
        for href, title in links[:num_results]:
            if href.startswith('http'):
                results.append({"title": title.strip(), "url": href})
        return results if results else ["No results found"]
    except Exception as e:
        return [f"Search failed: {e}"]

def fetch_web(url):
    """Fetch a page and strip its HTML tags"""
    try:
        response = requests.get(url, timeout=10, headers={"User-Agent": "Mozilla/5.0"})
        text = re.sub(r'<[^>]+>', '', response.text)
        text = ' '.join(text.split())
        return text[:3000]
    except Exception as e:
        return f"Error: {e}"

def chat(message):
    print("\n🔍 Searching...")
    results = web_search(message)
    print(f"Found {len(results)} results")
    all_content = ""
    for i, r in enumerate(results[:3]):
        if isinstance(r, dict) and 'url' in r:
            print(f"{i+1}. {r['title']}")
            content = fetch_web(r['url'])
            all_content += f"\n\n--- {r['title']} ---\n{content}\n"
        else:
            # r is an error / "no results" string
            print(f"{i+1}. {r}")
    prompt = f"Answer the following question based on these search results: {message}\n\n{all_content}"
    try:
        client = openai.OpenAI(api_key="dummy", base_url=LLAMA_SERVER)
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=500
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {e}"

if __name__ == "__main__":
    print("🎯 AI search assistant (type a question, or press Ctrl+C to quit)")
    while True:
        try:
            msg = input("\nYou: ").strip()
            if not msg:
                continue
            print(f"\nAI: {chat(msg)}")
        except KeyboardInterrupt:
            break
```
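The scraping in `web_search` hinges on a single regex over DuckDuckGo's HTML results page. To see what it extracts, and how non-`http` hrefs get filtered out, you can run it against a canned snippet (the markup below is a hypothetical imitation; if the real page changes, the regex needs updating):

```python
import re

# Canned snippet mimicking DuckDuckGo's result markup (hypothetical)
SAMPLE_HTML = '''
<a class="result__a" href="https://example.com/a">First result</a>
<a class="result__a" href="/relative/skip-me">Relative link</a>
<a class="result__a" href="https://example.com/b">Second result</a>
'''

# Same pattern as in web_search(): capture href and link text
links = re.findall(r'<a class="result__a" href="([^"]+)"[^>]*>([^<]+)</a>', SAMPLE_HTML)
results = [{"title": t.strip(), "url": h} for h, t in links if h.startswith('http')]
print(results)
```

The relative link is dropped, leaving two result dicts, which is exactly the shape `chat()` iterates over.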
### Usage
1. Install the Python dependencies the script needs: `pip install requests openai`
2. Start llama-server (terminal 1):
```bash
export LD_LIBRARY_PATH=/vendor/lib64
cd ~/llama.cpp
./build-android/bin/llama-server -hf unsloth/gemma-2-2b-it-GGUF:Q4_0 -ngl 20 -c 4096 --host 0.0.0.0 --port 8080
```
3. Start the search script (terminal 2):
```bash
python ~/web_search_llama.py
```
4. Type a question, for example:
```
You: What's the weather today?
```
---
## Part 6: Test Results
### OpenCL Detected Successfully
```
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name : QUALCOMM Snapdragon(TM)
Device Name : QUALCOMM Adreno(TM) 830
```
### Runtime Output
```
ggml_opencl: selecting device: QUALCOMM Adreno(TM) 830
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
```
---
## FAQ
### Q1: OpenCL not detected?
```bash
# Check that the vendor library exists
ls /vendor/lib64/libOpenCL.so
# Point the loader at it
export LD_LIBRARY_PATH=/vendor/lib64
```
### Q2: Model fails to load?
- Make sure you are on release b5026: `git checkout b5026`
- Or use a known-supported model: gemma-2-2b, Qwen3-0.6B
### Q3: Out of memory?
- Lower the `-ngl` value (e.g. from 20 down to 10)
- Use a smaller model
### Q4: Does gemma-3n need a 64K context?
- gemma-3n-E2B runs fine with a 6400-token context
- Launch it directly: `./build-android/bin/llama-server -hf unsloth/gemma-3n-E2B-it-GGUF:Q4_0 -ngl 20 -c 6400`
---
## Summary
✅ The Xiaomi 15 + Adreno 830 can run local AI
✅ OpenCL acceleration works
✅ Web search + AI analysis works
✅ Models such as gemma-2, gemma-3n, and Qwen3 are supported
There is plenty of room left to explore more models and further performance tuning!
---
**Last updated**: 2025
**Applies to**: Xiaomi 15 / Snapdragon 8 Elite / Adreno 830
---
## Appendix: Enhanced Setup
Start the server with the larger gemma-3n-E4B model (terminal 1):
```bash
cd ~/llama.cpp
export LD_LIBRARY_PATH=/vendor/lib64:$LD_LIBRARY_PATH
./build-android/bin/llama-server \
  -hf unsloth/gemma-3n-E4B-it-GGUF:Q4_0 \
  -ngl 20 \
  -c 6400 \
  --mlock \
  --host 0.0.0.0 \
  --port 8080
```
`--mlock` pins the model in RAM so it is not swapped out. Then run the enhanced search script (terminal 2):
```bash
python web_search_llama.py
```
The enhanced `web_search_llama.py` (overwrite the earlier version):
```python
#!/usr/bin/env python3
"""
Web search + AI analysis (weather, news, stocks, etc.)
Usage: python web_search_llama.py
"""
import re
import requests
import openai
from urllib.parse import quote

LLAMA_SERVER = "http://192.168.5.233:8080/v1"  # replace with your phone's LAN IP
# LLAMA_SERVER = "http://localhost:8080/v1"
MODEL = "gemma"

def get_location():
    """Try to detect the current city from the IP address"""
    try:
        r = requests.get("http://ip-api.com/json/?fields=61445", timeout=5)
        data = r.json()
        return data.get('city', 'Unknown'), data.get('country', 'Unknown')
    except Exception:
        return None, None

def detect_search_type(query):
    """Guess the search type from keywords in the query"""
    query_lower = query.lower()
    if any(kw in query_lower for kw in ['天氣', 'weather', '溫度', '降雨']):
        city = None
        for word in ['台北', '台中', '台南', '高雄', '新北', '桃園', '北京', '上海', '東京', '首爾', '香港']:
            if word in query:
                city = word
                break
        if not city:
            city, _ = get_location()
            if city:
                city = city.replace('City', '')
        return "weather", city or "台灣"
    if any(kw in query_lower for kw in ['新聞', 'news', '時事', '最新']):
        return "news", None
    if any(kw in query_lower for kw in ['股票', 'stock', '股價', '台積電', '鴻海', '股市']):
        return "stock", None
    if any(kw in query_lower for kw in ['匯率', '美元', '日元', '歐元']):
        return "exchange", None
    if any(kw in query_lower for kw in ['翻譯', 'translate', '英文', '日文']):
        return "translate", None
    return "general", None

def web_search(query, num_results=5):
    """Search the web with DuckDuckGo"""
    print(f"🔍 Searching: {query}")
    url = f"https://duckduckgo.com/html/?q={quote(query)}&kp=1"
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        response = requests.get(url, headers=headers, timeout=30)
        links = re.findall(r'<a class="result__a" href="([^"]+)"[^>]*>([^<]+)</a>', response.text)
        results = []
        for href, title in links[:num_results]:
            if href.startswith('http'):
                results.append({"title": title.strip(), "url": href})
        return results
    except Exception as e:
        return [{"error": str(e)}]

def fetch_web(url):
    """Fetch a page and strip its HTML tags"""
    print(f"🌐 Fetching: {url}")
    try:
        response = requests.get(url, timeout=15, headers={"User-Agent": "Mozilla/5.0"})
        text = re.sub(r'<[^>]+>', '', response.text)
        text = ' '.join(text.split())
        return text[:4000]
    except Exception as e:
        return f"Error: {e}"

def ask_ai(prompt):
    """Send the prompt to the local llama-server"""
    print("🤖 AI analyzing...")
    try:
        client = openai.OpenAI(api_key="dummy", base_url=LLAMA_SERVER)
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=600
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"AI error: {e}"

def process_query(query):
    search_type, extra = detect_search_type(query)
    if search_type == "weather":
        city = extra if extra else "台北"
        query = f"{city} 天氣"
    elif search_type == "news":
        query = "最新新聞"
    elif search_type == "stock":
        query = "台灣股票行情"
    elif search_type == "exchange":
        query = "美元兌台幣匯率"
    results = web_search(query)
    if not results or "error" in results[0]:
        print("Search failed")
        return
    print(f"Found {len(results)} results\n")
    all_content = ""
    for i, r in enumerate(results[:3]):
        print(f"{i+1}. {r['title']}")
        content = fetch_web(r['url'])
        all_content += f"\n\n--- {r['title']} ---\n{content}\n"
    prompt = f"Based on the search results below, answer this question in Chinese: {query}\n\n{all_content}"
    result = ask_ai(prompt)
    print("\n📝 AI reply:")
    print(result)

def main():
    print("🎯 AI Search Assistant")
    print("📌 Features: weather, news, stocks, translation, or any question")
    print("📌 Examples: 天氣怎麼樣? 最新新聞? 台積電股價?")
    print("-" * 40)
    while True:
        try:
            query = input("\nYou > ").strip()
            if not query:
                continue
            if query.lower() in ['exit', 'q', 'quit', '離開']:
                break
            process_query(query)
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break

if __name__ == "__main__":
    main()
```
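The keyword routing in `detect_search_type` is easy to exercise offline. This trimmed copy (a hypothetical standalone variant) drops the network-dependent location lookup and keeps only the category matching, so you can verify which branch a query takes:

```python
def detect_type(query: str) -> str:
    """Trimmed copy of detect_search_type: category only, no geo lookup."""
    q = query.lower()
    # Order matters: the first keyword list that matches wins
    if any(kw in q for kw in ['天氣', 'weather', '溫度', '降雨']):
        return "weather"
    if any(kw in q for kw in ['新聞', 'news', '時事', '最新']):
        return "news"
    if any(kw in q for kw in ['股票', 'stock', '股價', '台積電', '鴻海', '股市']):
        return "stock"
    if any(kw in q for kw in ['匯率', '美元', '日元', '歐元']):
        return "exchange"
    if any(kw in q for kw in ['翻譯', 'translate', '英文', '日文']):
        return "translate"
    return "general"

print(detect_type("台北天氣如何"))    # weather
print(detect_type("台積電股價"))      # stock
print(detect_type("tell me a joke"))  # general
```

Anything that matches no list falls through to `"general"` and is searched verbatim, which is what makes the assistant usable for arbitrary questions.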