28 Commits
v0.4.0 ... main

Author     SHA1        Message                                        Date
glidea     094600ee26  update README                                  2025-11-22 14:59:24 +08:00
                       (Removed sponsorship section and updated images with links.)
glidea     c03e4c8359  Merge pull request #31 from Twelveeee/main     2025-11-07 15:59:20 +08:00
                       (feat:add RSSHub RSSHubAccessKey)
Twelveeee  584f94e1ef  feat:add RSSHub RSSHubAccessKey                2025-11-07 14:27:29 +08:00
Twelveeee  6c4223de92  feat:add RSSHub RSSHubAccessKey                2025-11-06 15:58:11 +08:00
Twelveeee  f67db8ea86  feat:add RSSHub RSSHubAccessKey                2025-11-06 11:06:26 +08:00
Twelveeee  bc54cc852e  feat:add RSSHub RSSHubAccessKey                2025-11-05 14:55:01 +00:00
glidea     7cb8069d60  update README.md                               2025-09-08 15:56:13 +08:00
glidea     87b84d94ff  update README.md                               2025-09-06 16:20:32 +08:00
glidea     4d29bae67f  update README                                  2025-08-18 16:41:23 +08:00
glidea     d640e975bd  handle empty response for gemini               2025-08-18 16:33:27 +08:00
glidea     e4bd0ca43b  recommend Qwen/Qwen3-Embedding-4B by default   2025-07-24 10:14:09 +08:00
glidea     8b001c4cdf  update image                                   2025-07-16 11:40:43 +08:00
glidea     6cacb47d3d  update doc                                     2025-07-15 11:31:25 +08:00
glidea     a65d597032  update doc                                     2025-07-14 21:46:20 +08:00
glidea     151bd5f66f  update sponsor                                 2025-07-14 21:32:43 +08:00
glidea     69a9545869  update doc                                     2025-07-14 18:12:17 +08:00
glidea     b01e07e348  fix doc                                        2025-07-14 12:28:52 +08:00
glidea     e92d7e322e  allow empty config for object storage          2025-07-11 21:42:54 +08:00
glidea     7b4396067b  fix ci                                         2025-07-09 21:47:23 +08:00
glidea     00c5dfadee  add podcast                                    2025-07-09 17:28:26 +08:00
glidea     263fcbbfaf  update docs                                    2025-07-02 10:51:45 +08:00
glidea     9783ef693f  update rewrite-zh.md                           2025-06-26 10:52:35 +08:00
glidea     2df7c120a6  fix docs                                       2025-06-24 08:39:39 +08:00
glidea     4ac4667ce9  fix typo                                       2025-06-11 21:37:24 +08:00
glidea     94ac06d9ac  update docs                                    2025-06-10 21:48:05 +08:00
glidea     90148b2fcd  update docs                                    2025-06-10 17:03:43 +08:00
glidea     0fc6d73b04  update docs                                    2025-06-09 20:32:33 +08:00
glidea     55a5a186b9  add desc of telemetry.address                  2025-06-09 17:58:51 +08:00
34 changed files with 2002 additions and 377 deletions

.github/FUNDING.yml (new file)

```diff
@@ -0,0 +1 @@
+custom: https://afdian.com/a/glidea
```

CI workflow

```diff
@@ -2,7 +2,7 @@ name: CI
 on:
   push:
-    branches: [ main ]
+    branches: [ main, dev ]
   pull_request:
     branches: [ main ]
   release:
@@ -27,7 +27,7 @@ jobs:
   build-and-push:
     runs-on: ubuntu-latest
     needs: test
-    if: github.event_name == 'release'
+    if: github.event_name == 'release' || (github.event_name == 'push' && github.ref_name == 'dev')
     steps:
       - uses: actions/checkout@v4
       - name: Set up Docker Buildx
@@ -37,5 +37,9 @@ jobs:
       with:
         username: ${{ secrets.DOCKERHUB_USERNAME }}
         password: ${{ secrets.DOCKERHUB_TOKEN }}
-      - name: Build and push Docker images
-        run: make push
+      - name: Build and push Docker image (main)
+        if: github.event_name == 'release'
+        run: make push
+      - name: Build and push Docker image (dev)
+        if: github.ref_name == 'dev'
+        run: make dev-push
```

.gitignore

```diff
@@ -18,7 +18,7 @@ local_docs/
 .env
 .env.local
 __debug_bin
-config.yaml
+config.*yaml
 data/
 *debug*
 .cursorrules
```
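The pattern change broadens the ignore rule from the single file `config.yaml` to any `config.*yaml` variant (e.g. `config.local.yaml`), since `*` also matches the empty string. A quick sketch of what the new pattern covers, using Python's `fnmatch` as an approximation of gitignore glob matching (close enough for these slash-free patterns):

```python
from fnmatch import fnmatch

pattern = "config.*yaml"
print(fnmatch("config.yaml", pattern))        # the file ignored before
print(fnmatch("config.local.yaml", pattern))  # now ignored too
print(fnmatch("myconfig.yaml", pattern))      # still tracked (pattern is anchored)
```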

README (English)

@@ -1,172 +1,260 @@
zenfeed: Empower RSS with AI, automatically filter, summarize, and push important information for you, say goodbye to information overload, and regain control of reading.
[中文](README.md)
## Preface
<p align="center">
<img src="docs/images/crad.png" alt="zenfeed cover image">
</p>
RSS (Really Simple Syndication) was born in the Web 1.0 era to solve the problem of information fragmentation, allowing users to aggregate and track updates from multiple websites in one place without frequent visits. It pushes website updates in summary form to subscribers for quick information access.
<p align="center">
<a href="https://app.codacy.com/gh/glidea/zenfeed/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/1b51f1087558402d85496fbe7bddde89"/></a>
<a href="https://sonarcloud.io/summary/new_code?id=glidea_zenfeed"><img src="https://sonarcloud.io/api/project_badges/measure?project=glidea_zenfeed&metric=sqale_rating"/></a>
<a href="https://goreportcard.com/badge/github.com/glidea/zenfeed"><img src="https://goreportcard.com/badge/github.com/glidea/zenfeed"/></a>
<a href="https://deepwiki.com/glidea/zenfeed"><img src="https://deepwiki.com/badge.svg"/></a>
</p>
However, with the rise of Web 2.0, social media, and algorithmic recommendations, RSS didn't become mainstream. The shutdown of Google Reader in 2013 was a landmark event. As Zhang Yiming pointed out at the time, RSS demands a lot from users: strong information filtering skills and self-discipline to manage feeds, otherwise it's easy to get overwhelmed by information noise. He believed that for most users, the easier "personalized recommendation" was a better solution, which later led to Toutiao and TikTok.
<h3 align="center">In the torrent of information (Feed), may you maintain your Zen.</h3>
Algorithmic recommendations indeed lowered the bar for information acquisition, but their excessive catering to human weaknesses often leads to filter bubbles and addiction to entertainment. If you want to get truly valuable content from the information stream, you actually need stronger self-control to resist the algorithm's "feeding".
<p align="center">
zenfeed is your <strong>AI information hub</strong>. It's an intelligent RSS reader, a real-time "news" knowledge base, and a personal secretary that helps you monitor "specific events" and delivers analysis reports.
</p>
So, is pure RSS subscription the answer? Not necessarily. Information overload and filtering difficulties (information noise) remain pain points for RSS users.
<p align="center">
<a href="https://zenfeed.xyz"><b>Live Demo (RSS Reading Only)</b></a>
&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
<a href="docs/tech/hld-en.md"><b>Technical Documentation</b></a>
&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
<a href="#-installation-and-usage"><b>Quick Start</b></a>
</p>
Confucius advocated the doctrine of the mean in all things. Can we find a middle ground that combines the sense of control and high-quality sources from active RSS subscription with technological means to overcome its information overload drawbacks?
> [!NOTE]
> The description on DeepWiki is not entirely accurate (and I cannot correct it), but the Q&A quality is decent.
Try zenfeed! **AI + RSS** might be a better way to acquire information in this era. zenfeed aims to leverage AI capabilities to help you automatically filter and summarize the information you care about, allowing you to maintain Zen (calmness) amidst the Feed (information flood).
---
## Project Introduction
**epub2rss**: Convert epub ebooks into RSS feeds that update with a chapter every day, [join waitlist](https://epub2rss.pages.dev/)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/1b51f1087558402d85496fbe7bddde89)](https://app.codacy.com/gh/glidea/zenfeed/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=glidea_zenfeed&metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=glidea_zenfeed)
[![Go Report Card](https://goreportcard.com/badge/github.com/glidea/zenfeed)](https://goreportcard.com/report/github.com/glidea/zenfeed)
---
zenfeed is your intelligent information assistant. It automatically collects, filters, and summarizes news or topics you follow, then sends them to you. But we're not just building another "Toutiao"... 🤔
## 💡 Introduction
![Zenfeed](docs/images/arch.png)
RSS (Really Simple Syndication) was born in the Web 1.0 era to solve the problem of information fragmentation, allowing users to aggregate and track updates from multiple websites in one place without frequent visits. It pushes website updates to subscribers in summary form for quick information retrieval.
**For [RSS](https://en.wikipedia.org/wiki/RSS) Veterans** 🚗
* zenfeed can be your AI-powered RSS reader (works with [zenfeed-web](https://github.com/glidea/zenfeed-web))
* An [MCP](https://mcp.so/) Server for [RSSHub](https://github.com/DIYgod/RSSHub)
* A customizable, trusted RSS data source and an incredibly fast AI search engine
* Similar to [Feedly AI](https://feedly.com/ai)
However, with the rise of Web 2.0, social media, and algorithmic recommendations, RSS never became mainstream. The shutdown of Google Reader in 2013 was a landmark event. As Zhang Yiming (founder of ByteDance) pointed out at the time, RSS demands a lot from its users: strong information filtering skills and self-discipline to manage subscription sources, otherwise it's easy to get drowned in information noise. He believed that for most users, easier "personalized recommendations" were a better solution, which led to the creation of Toutiao and Douyin (TikTok).
Algorithmic recommendations have indeed lowered the barrier to accessing information, but their tendency to over-cater to human weaknesses often leads to filter bubbles and entertainment addiction. If you want to get truly valuable content from your information stream, you need even greater self-control to resist the algorithm's "feed."
So, is pure RSS subscription the answer? Not entirely. Information overload and the difficulty of filtering (information noise) are still major pain points for RSS users.
Confucius spoke of the "Doctrine of the Mean" in all things. Can we find a middle ground that allows us to enjoy the sense of control and high-quality sources from active RSS subscriptions while using technology to overcome the drawback of information overload?
Give zenfeed a try! **AI + RSS** might be a better way to consume information in this era. zenfeed aims to leverage the power of AI to help you automatically filter and summarize the information you care about, allowing you to maintain your Zen in the torrent of information (Feed).
> Reference Article: [AI Revives RSS? - sspai.com (Chinese)](https://sspai.com/post/89494)
---
## ✨ Features
![Zenfeed Architecture](docs/images/arch.png)
**For [RSS](https://en.wikipedia.org/wiki/RSS) Power Users** 🚗
* Your AI-powered RSS reader (use with [zenfeed-web](https://github.com/glidea/zenfeed-web))
* Can act as an [MCP](https://mcp.so/) Server for [RSSHub](https://github.com/DIYgod/RSSHub)
* Customize trusted RSS sources to build a lightning-fast personal AI search engine
* Similar in functionality to [Feedly AI](https://feedly.com/ai)
<details>
<summary>Preview</summary>
<img src="docs/images/feed-list-with-web.png" alt="Feed list with web UI" width="600">
<summary><b>Preview</b></summary>
<br>
<img src="docs/images/feed-list-with-web.png" alt="Feed list" width="600">
<img src="docs/images/chat-with-feeds.png" alt="Chat with feeds" width="500">
</details>
**For Seekers of [WWZZ](https://www.wwzzai.com/) Alternatives** 🔍
* zenfeed also offers [information tracking capabilities](https://github.com/glidea/zenfeed/blob/main/docs/config.md#schedule-configuration-schedules), emphasizing high-quality, customizable data sources.
* Think of it as an RSS-based, flexible, more PaaS-like version of [AI Chief Information Officer](https://github.com/TeamWiseFlow/wiseflow?tab=readme-ov-file).
**For Those Seeking an [Everything Tracker](https://www.wwzzai.com/) Alternative** 🔍
* Possesses powerful [information tracking capabilities](https://github.com/glidea/zenfeed/blob/main/docs/config.md#schedule-configuration-schedules) and emphasizes high-quality, customizable data sources.
* Can serve as an RSS version of [AI Chief Intelligence Officer](https://github.com/TeamWiseFlow/wiseflow?tab=readme-ov-file), but more flexible and closer to an engine.
<details>
<summary>Preview</summary>
<img src="docs/images/monitoring.png" alt="Monitoring preview" width="500">
<img src="docs/images/notification-with-web.png" alt="Notification with web UI" width="500">
<summary><b>Preview</b></summary>
<br>
<img src="docs/images/monitoring.png" alt="Monitoring setup" width="500">
<img src="docs/images/notification-with-web.png" alt="Notification example" width="500">
</details>
**For Information Anxiety Sufferers (like me)** 😌
* "zenfeed" combines "zen" and "feed," signifying maintaining calm (zen) amidst the information flood (feed).
* If you feel anxious and tired from constantly checking information streams, it's because context switching costs more than you think and hinders entering a flow state. Try the briefing feature: receive a summary email at a fixed time each day covering the relevant period. This allows for a one-time, quick, comprehensive overview. Ah, a bit of a renaissance feel, isn't it? ✨
**For Those with Information Anxiety (like me)** 😌
* If you're tired of endlessly scrolling through feeds, try the briefing feature. Receive AI-powered briefings at a scheduled time each day for a comprehensive and efficient overview, eliminating the hidden costs of context switching. A bit of a renaissance feel, don't you think? ✨
* "zenfeed" is a combination of "zen" and "feed," meaning: in the torrent of information (feed), may you maintain your zen.
<details>
<summary>Preview</summary>
<img src="docs/images/daily-brief.png" alt="Daily brief preview" width="500">
<summary><b>Preview</b></summary>
<br>
<img src="docs/images/daily-brief.png" alt="Daily brief example" width="500">
</details>
**For Explorers of AI Content Processing** 🔬
* zenfeed features a custom mechanism for pipelining content processing, similar to Prometheus [Relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config).
* Each piece of content is abstracted as a set of labels (e.g., title, source, body... are labels). At each node in the pipeline, you can process specific label values based on custom prompts (e.g., scoring, classifying, summarizing, filtering, adding new labels...). Subsequently, you can filter based on label queries, [route](https://github.com/glidea/zenfeed/blob/main/docs/config.md#notification-route-configuration-notifyroute-and-notifyroutesub_routes), and [display](https://github.com/glidea/zenfeed/blob/main/docs/config.md#notification-channel-email-configuration-notifychannelsemail)... See [Rewrite Rules](https://github.com/glidea/zenfeed/blob/main/docs/config.md#rewrite-rule-configuration-storagefeedrewrites).
* Crucially, you can flexibly orchestrate all this, giving zenfeed a strong tooling and personalization flavor. Welcome to integrate private data via the Push API and explore more possibilities.
**For Developers** 🔬
* **Pipelined Processing**: Similar to Prometheus's [Relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config), zenfeed abstracts each piece of content into a set of labels. At each stage of the pipeline, you can use custom prompts to process these labels (e.g., scoring, classifying, summarizing, filtering).
* **Flexible Orchestration**: Based on the processed labels, you can freely query, filter, [route](https://github.com/glidea/zenfeed/blob/main/docs/config.md#notification-routing-configuration-notifyroute-and-notifyroutesub_routes), and [send notifications](https://github.com/glidea/zenfeed/blob/main/docs/config.md#notification-channel-email-configuration-notifychannelsemail), giving zenfeed a highly tool-oriented and customizable nature. For details, see [Rewrite Rules](docs/tech/rewrite-en.md).
* **Open APIs**:
* [Query API](/docs/query-api-en.md)
* [RSS Exported API](/docs/rss-api-en.md)
* [Notify Webhook](/docs/webhook-en.md)
* [Extensive Declarative YAML Configuration](/docs/config.md)
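The label-pipeline idea described above can be sketched in a few lines. This is a hypothetical Python model of the concept only, not zenfeed's actual Go implementation or rule schema (see the Rewrite Rules documentation for the real configuration); the `RewriteRule` names and the `"drop"` convention here are illustrative:

```python
# Hypothetical sketch: each feed item is a set of labels; each pipeline
# stage reads one label, transforms it (standing in for an LLM prompt),
# and either writes a new label or filters the item out.
from dataclasses import dataclass
from typing import Callable, Optional

Labels = dict[str, str]  # e.g. {"title": ..., "source": ..., "content": ...}

@dataclass
class RewriteRule:
    source_label: str                   # label to read
    transform: Callable[[str], str]     # stands in for a custom prompt
    target_label: Optional[str] = None  # label to write; None = filter stage

def run_pipeline(feed: Labels, rules: list[RewriteRule]) -> Optional[Labels]:
    for rule in rules:
        value = rule.transform(feed.get(rule.source_label, ""))
        if rule.target_label is None:
            if value == "drop":
                return None  # item filtered out of the pipeline
        else:
            feed[rule.target_label] = value
    return feed

# Example: score the title, then drop low-scoring items.
rules = [
    RewriteRule("title", lambda t: "9" if "AI" in t else "2", "score"),
    RewriteRule("score", lambda s: "drop" if int(s) < 5 else "keep", None),
]
kept = run_pipeline({"title": "AI news", "content": "..."}, rules)
dropped = run_pipeline({"title": "gossip", "content": "..."}, rules)
```

Downstream routing and notification then simply query the labels the pipeline produced.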
<details>
<summary>Preview</summary>
<img src="docs/images/update-config-with-web.png" alt="Update config with web UI" width="500">
<summary><b>Preview</b></summary>
<br>
<img src="docs/images/update-config-with-web.png" alt="Update config via web" width="500">
</details>
**For Onlookers** 🍉
<p align="center">
<a href="docs/preview.md"><b>➡️ See More Previews</b></a>
</p>
Just for the exquisite email styles, install and use it now!
---
<img src="docs/images/monitoring.png" alt="Monitoring email style" width="400">
## 🚀 Installation and Usage
[More Previews](docs/preview.md)
### 1. Prerequisites
## Installation and Usage
> [!IMPORTANT]
> zenfeed uses model services from [SiliconFlow](https://cloud.siliconflow.cn/en) by default.
> * Models: `Qwen/Qwen3-8B` (Free) and `Qwen/Qwen3-Embedding-4B`.
> * If you don't have a SiliconFlow account yet, use this [**invitation link**](https://cloud.siliconflow.cn/i/U2VS0Q5A) to get a **¥14** credit.
> * If you need to use other providers or models, or for more detailed custom deployments, please refer to the [Configuration Documentation](https://github.com/glidea/zenfeed/blob/main/docs/config.md) to edit `docker-compose.yml`.
### 1. Installation
### 2. One-Click Deployment
By default, uses SiliconFlow's Qwen/Qwen3-8B (free) and Pro/BAAI/bge-m3. If you don't have a SiliconFlow account yet, use this [invitation link](https://cloud.siliconflow.cn/i/U2VS0Q5A) to get a ¥14 credit.
> Get the service up and running in as little as one minute.
Support for other vendors or models is available; follow the instructions below.
#### Mac/Linux
#### Mac / Linux
```bash
# Download the configuration file
curl -L -O https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml
# If you need to customize more configuration parameters, directly edit docker-compose.yml#configs.zenfeed_config.content BEFORE running the command below.
# Configuration Docs: https://github.com/glidea/zenfeed/blob/main/docs/config.md
API_KEY=your_apikey TZ=your_local_IANA LANGUAGE=English docker-compose -p zenfeed up -d
# Start the service (replace with your API_KEY)
API_KEY="sk-..." docker-compose -p zenfeed up -d
```
#### Windows
> Use PowerShell to execute
#### Windows (PowerShell)
```powershell
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml" -OutFile ([System.IO.Path]::GetFileName("https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml"))
# Download the configuration file
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml" -OutFile "docker-compose.yml"
# If you need to customize more configuration parameters, directly edit docker-compose.yml#configs.zenfeed_config.content BEFORE running the command below.
# Configuration Docs: https://github.com/glidea/zenfeed/blob/main/docs/config.md
$env:API_KEY = "your_apikey"; $env:TZ = "your_local_IANA"; $env:LANGUAGE = "English"; docker-compose -p zenfeed up -d
# Start the service (replace with your API_KEY)
$env:API_KEY = "sk-..."; docker-compose -p zenfeed up -d
```
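For reference, editing `docker-compose.yml#configs.zenfeed_config.content` relies on Compose's inline-config feature (the `content` field of a top-level `configs` entry). The following is a hypothetical minimal shape of that pattern, not zenfeed's actual file; the keys inside `content` and the image/port values are illustrative, and the real schema lives in the Configuration Documentation:

```yaml
# Hypothetical sketch of the Compose inline-config pattern.
configs:
  zenfeed_config:
    content: |
      timezone: ${TZ}
      llms:
        - name: general
          api_key: ${API_KEY}

services:
  zenfeed:
    image: glidea/zenfeed:latest   # illustrative image tag
    ports:
      - "1400:1400"                # illustrative port mapping
    configs:
      - source: zenfeed_config
        target: /app/config.yaml
```

Environment variables such as `API_KEY` are interpolated into the inline config at `docker-compose up` time, which is why the one-liners above work without editing any file.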
### 2. Using the Web UI
🎉 **Deployment Complete!**
Access it at http://localhost:1400
Access https://zenfeed-web.pages.dev
> If deployed in an environment like a VPS, access https://vps_public_ip:1400 (remember to open the security group port). Do not use the public frontend above.
> ⚠️ zenfeed currently lacks authentication. Exposing it to the public internet might leak your API Key. Please configure your security groups carefully. If you have security concerns, please open an Issue.
> [!WARNING]
> * If you deploy zenfeed on a public server like a VPS, access it via `http://<YOUR_IP>:1400` and ensure that your firewall/security group allows traffic on port `1400`.
> * **Security Notice:** zenfeed does not yet have an authentication mechanism. Exposing the service to the public internet may leak your `API_KEY`. Be sure to configure strict security group rules to allow access only from trusted IPs.
#### Add RSS Feeds
### 3. Getting Started
<img src="docs/images/web-add-source.png" alt="Add source via web UI" width="400">
#### Add RSS Subscription Feeds
> To migrate from Follow, refer to [migrate-from-follow.md](docs/migrate-from-follow.md)
> Requires access to the respective source sites; ensure network connectivity.
> Wait a few minutes after adding, especially if the model has strict rate limits.
<img src="docs/images/web-add-source.png" alt="Add RSS source via web" width="400">
> * To migrate from Follow, please refer to [migrate-from-follow-en.md](docs/migrate-from-follow-en.md).
> * After adding a source, zenfeed needs to access the origin site, so ensure your network is connected.
> * Please wait a few minutes after adding for content to be fetched and processed, especially if the model has strict rate limits.
#### Configure Daily Briefings, Monitoring, etc.
<img src="docs/images/notification-with-web.png" alt="Configure notifications via web UI" width="400">
<img src="docs/images/notification-with-web.png" alt="Configure notifications via web" width="400">
### 3. Configure MCP (Optional)
Using Cherry Studio as an example, configure MCP and connect to Zenfeed, see [Cherry Studio MCP](docs/cherry-studio-mcp.md)
> Default address http://localhost:1301/sse
#### Configure MCP (Optional)
For example, to configure MCP and connect to Zenfeed with Cherry Studio, see [Cherry Studio MCP](docs/cherry-studio-mcp-en.md).
> Default address `http://localhost:1301/sse`
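For MCP clients that are configured via JSON rather than a UI, an entry pointing at that SSE endpoint typically looks like the following. This is a shape sketch only; exact key names vary by client, and Cherry Studio's UI handles this for you:

```json
{
  "mcpServers": {
    "zenfeed": {
      "type": "sse",
      "url": "http://localhost:1301/sse"
    }
  }
}
```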
## Roadmap
* P0 (Very Likely)
* Support generating podcasts, male/female dialogues, similar to NotebookLM
* More data sources
* Email
* Web clipping Chrome extension
* P1 (Possible)
* Keyword search
* Support search engines as data sources
* App?
* The following are temporarily not prioritized due to copyright risks:
* Webhook notifications
* Web scraping
#### More...
The web UI doesn't fully capture zenfeed's flexibility. For more advanced usage, see the [Configuration Documentation](docs/config.md).
## Notice
* Compatibility is not guaranteed before version 1.0.
* The project uses the AGPLv3 license; any forks must also be open source.
* For commercial use, please contact for registration; reasonable support can be provided. Note: Legal commercial use only, gray area activities are not welcome.
* Data is not stored permanently; default retention is 8 days.
---
## Acknowledgments
* Thanks to [eryajf](https://github.com/eryajf) for providing the [Compose Inline Config](https://github.com/glidea/zenfeed/issues/1) idea, making deployment easier to understand.
## 🗺️ Roadmap
## 👏🏻 Contributions Welcome
* No formal guidelines yet, just one requirement: "Code Consistency"; it's very important.
We have some cool features planned. Check out our [Roadmap](/docs/roadmap-en.md) and feel free to share your suggestions!
## Disclaimer
---
**Before using the `zenfeed` software (hereinafter referred to as "the Software"), please read and understand this disclaimer carefully. Your download, installation, or use of the Software or any related services signifies that you have read, understood, and agreed to be bound by all terms of this disclaimer. If you do not agree with any part of this disclaimer, please cease using the Software immediately.**
## 💬 Community and Support
1. **Provided "AS IS":** The Software is provided on an "AS IS" and "AS AVAILABLE" basis, without any warranties of any kind, either express or implied. The project authors and contributors make no warranties or representations regarding the Software's merchantability, fitness for a particular purpose, non-infringement, accuracy, completeness, reliability, security, timeliness, or performance.
> **For usage questions, please prioritize opening an [Issue](https://github.com/glidea/zenfeed/issues).** This helps others with similar problems and allows for better tracking and resolution.
2. **User Responsibility:** You are solely responsible for all actions taken using the Software. This includes, but is not limited to:
* **Data Source Selection:** You are responsible for selecting and configuring the data sources (e.g., RSS feeds, potential future Email sources) you connect to the Software. You must ensure you have the right to access and process the content from these sources and comply with their respective terms of service, copyright policies, and applicable laws and regulations.
* **Content Compliance:** You must not use the Software to process, store, or distribute any content that is unlawful, infringing, defamatory, obscene, or otherwise objectionable.
* **API Key and Credential Security:** You are responsible for safeguarding the security of any API keys, passwords, or other credentials you configure within the Software. The authors and contributors are not liable for any loss or damage arising from your failure to maintain proper security.
* **Configuration and Use:** You are responsible for correctly configuring and using the Software's features, including content processing pipelines, filtering rules, notification settings, etc.
<table>
<tr>
<td align="center">
<img src="docs/images/wechat.png" alt="Wechat QR Code" width="150">
<br>
<strong>Join WeChat Group</strong>
</td>
<td align="center">
<img src="docs/images/sponsor.png" alt="Sponsor QR Code" width="150">
<br>
<strong>Buy Me a Coffee 🧋</strong>
</td>
</tr>
</table>
3. **Third-Party Content and Services:** The Software may integrate with or rely on third-party data sources and services (e.g., RSSHub, LLM providers, SMTP service providers). The project authors and contributors are not responsible for the availability, accuracy, legality, security, or terms of service of such third-party content or services. Your interactions with these third parties are governed by their respective terms and policies. Copyright for third-party content accessed or processed via the Software (including original articles, summaries, classifications, scores, etc.) belongs to the original rights holders, and you assume all legal liability arising from your use of such content.
Since you've read this far, how about giving us a **Star ⭐️**? It's the biggest motivation for me to keep maintaining this project!
4. **No Warranty on Content Processing:** The Software utilizes technologies like Large Language Models (LLMs) to process content (e.g., summarization, classification, scoring, filtering). These processed results may be inaccurate, incomplete, or biased. The project authors and contributors are not responsible for any decisions made or actions taken based on these processed results. The accuracy of semantic search results is also affected by various factors and is not guaranteed.
If you have any interesting AI job opportunities, please contact me!
5. **No Liability for Indirect or Consequential Damages:** In no event shall the project authors or contributors be liable under any legal theory (whether contract, tort, or otherwise) for any direct, indirect, incidental, special, exemplary, or consequential damages arising out of the use or inability to use the Software. This includes, but is not limited to, loss of profits, loss of data, loss of goodwill, business interruption, or other commercial damages or losses, even if advised of the possibility of such damages.
---
6. **Open Source Software:** The Software is licensed under the AGPLv3 License. You are responsible for understanding and complying with the terms of this license.
## 🧩 Ecosystem Projects
7. **Not Legal Advice:** This disclaimer does not constitute legal advice. If you have any questions regarding the legal implications of using the Software, you should consult a qualified legal professional.
### [Ruhang365 Daily](https://daily.ruhang365.com)
Founded in 2017, Ruhang365 aims to build a community for sharing expertise and growing together, starting with industry information exchange. It is dedicated to providing comprehensive career consulting, training, niche community interactions, and resource collaboration services for internet professionals.
8. **Modification and Acceptance:** The project authors reserve the right to modify this disclaimer at any time. Continued use of the Software following any modifications will be deemed acceptance of the revised terms.
*Experimental Content Sources (Updates Paused)*
* [V2EX](https://v2ex.analysis.zenfeed.xyz/)
* [LinuxDO](https://linuxdo.analysis.zenfeed.xyz/)
**Please be aware: Using the Software to fetch, process, and distribute copyrighted content may carry legal risks. Users are responsible for ensuring their usage complies with all applicable laws, regulations, and third-party terms of service. The project authors and contributors assume no liability for any legal disputes or losses arising from user misuse or improper use of the Software.**
---
## 📝 Notes and Disclaimer
### Notes
* **Version Compatibility:** Backward compatibility for APIs and configurations is not guaranteed before version 1.0.
* **Open Source License:** The project uses the AGPLv3 license. Any forks or distributions must also remain open source.
* **Commercial Use:** Please contact the author to register for commercial use. Support can be provided within reasonable limits. We welcome legitimate commercial applications but discourage using this project for illicit activities.
* **Data Storage:** Data is not stored permanently; the default retention period is 8 days.
### Acknowledgements
* Thanks to [eryajf](https://github.com/eryajf) for the [Compose Inline Config](https://github.com/glidea/zenfeed/issues/1) suggestion, which makes deployment easier to understand.
* [![Powered by DartNode](https://dartnode.com/branding/DN-Open-Source-sm.png)](https://dartnode.com "Powered by DartNode - Free VPS for Open Source")
### Contributing
* The contribution guidelines are still a work in progress, but we adhere to one core principle: "Code Style Consistency."
### Disclaimer
<details>
<summary><strong>Click to expand for the full disclaimer</strong></summary>
**Before using the `zenfeed` software (hereinafter "the Software"), please read and understand this disclaimer carefully. By downloading, installing, using the Software or any related services, you acknowledge that you have read, understood, and agree to be bound by all the terms of this disclaimer. If you do not agree with any part of this disclaimer, please cease using the Software immediately.**
1. **"AS IS" BASIS:** The Software is provided on an "as is" and "as available" basis, without any warranties of any kind, either express or implied. The project authors and contributors make no representations or warranties regarding the Software's merchantability, fitness for a particular purpose, non-infringement, accuracy, completeness, reliability, security, timeliness, or performance.
2. **USER RESPONSIBILITY:** You are solely responsible for all your activities conducted through the Software. This includes, but is not limited to:
* **Data Source Selection:** You are responsible for selecting and configuring the data sources (e.g., RSS feeds, future potential Email sources) to be connected. You must ensure that you have the right to access and process the content from these sources and comply with their respective terms of service, copyright policies, and applicable laws and regulations.
* **Content Compliance:** You must not use the Software to process, store, or distribute any illegal, infringing, defamatory, obscene, or otherwise objectionable content.
* **API Key and Credential Security:** You are responsible for safeguarding any API keys, passwords, or other credentials you configure within the Software. The project authors and contributors are not liable for any loss or damage arising from your failure to do so.
* **Configuration and Use:** You are responsible for the correct configuration and use of the Software's features, including content processing pipelines, filtering rules, notification settings, etc.
3. **THIRD-PARTY CONTENT AND SERVICES:** The Software may integrate with or rely on third-party data sources and services (e.g., RSSHub, LLM providers, SMTP services). The project authors and contributors are not responsible for the availability, accuracy, legality, security, or terms of service of such third-party content or services. Your interactions with these third parties are governed by their respective terms and policies. The copyright of third-party content accessed or processed through the Software (including original articles, summaries, classifications, scores, etc.) belongs to the original rights holders. You are solely responsible for any legal liabilities that may arise from your use of such content.
4. **NO GUARANTEE OF PROCESSING ACCURACY:** The Software uses technologies like Large Language Models (LLMs) to process content (e.g., for summaries, classifications, scoring, filtering). These results may be inaccurate, incomplete, or biased. The project authors and contributors are not responsible for any decisions or actions taken based on these processing results. The accuracy of semantic search results is also affected by multiple factors and is not guaranteed.
5. **LIMITATION OF LIABILITY:** In no event shall the project authors or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
6. **OPEN SOURCE SOFTWARE:** The Software is licensed under the AGPLv3 license. You are responsible for understanding and complying with the terms of this license.
7. **NOT LEGAL ADVICE:** This disclaimer does not constitute legal advice. If you have any questions about the legal implications of using the Software, you should consult with a qualified legal professional.
8. **MODIFICATION AND ACCEPTANCE:** The project authors reserve the right to modify this disclaimer at any time. Your continued use of the Software will be deemed acceptance of the modified terms.
**Please be aware: Crawling, processing, and distributing copyrighted content using the Software may carry legal risks. Users are responsible for ensuring that their use complies with all applicable laws, regulations, and third-party terms of service. The project authors and contributors assume no liability for any legal disputes or losses arising from the user's misuse or improper use of the Software.**
</details>

README.md

@@ -1,30 +1,59 @@
[English](README-en.md)
![](docs/images/crad.png)
---
# 合作伙伴
[![image](docs/images/302.jpg)](https://share.302.ai/mFS9MS)
三点:
[302.AI](https://share.302.ai/mFS9MS) 是一个按需付费的 AI 应用平台,提供市面上最全的 AI API 和 AI 在线应用。
* 面向用户:我们提供了 50 多种 AI 应用,涵盖文字、图片和音视频各个领域,无需月费,按用量付费,在线使用。
* 面向开发者:一站式接入几乎所有 AI 应用开发需要用到的模型和 API,一站式付费,统一接入。
* 面向企业:管理与使用界面分离,一人管理多人使用,降低中小企业使用 AI 的门槛和成本。
**1. AI 版 RSS 阅读器**
* 在线服务
* https://zenfeed.xyz
* 或 Folo 搜索 zenfeed
**2. 实时 “新闻” 知识库**
**3. 帮你时刻关注 “指定事件” 的秘书(如 “关税政策变化”“xx 股票波动”)**,并支持整理研究报告
每日研究报告(包含播客)(实验性质) -- 已暂停更新
* [V2EX](https://v2ex.analysis.zenfeed.xyz/)
* [LinuxDO](https://linuxdo.analysis.zenfeed.xyz/)
GitHub 一键登录 [注册一个](https://share.302.ai/mFS9MS) 试试吧!立即获得 1 美元额度
---
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/glidea/zenfeed)
**DeepWiki 的描述并不准确!!!**,但问答质量还行
技术说明文档见:[HLD](docs/tech/hld-zh.md)
# 正文
## 前言
<p align="center">
<img src="docs/images/crad.png" alt="zenfeed cover image">
</p>
<p align="center">
<a href="https://app.codacy.com/gh/glidea/zenfeed/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/1b51f1087558402d85496fbe7bddde89"/></a>
<a href="https://sonarcloud.io/summary/new_code?id=glidea_zenfeed"><img src="https://sonarcloud.io/api/project_badges/measure?project=glidea_zenfeed&metric=sqale_rating"/></a>
<a href="https://goreportcard.com/badge/github.com/glidea/zenfeed"><img src="https://goreportcard.com/badge/github.com/glidea/zenfeed"/></a>
<a href="https://deepwiki.com/glidea/zenfeed"><img src="https://deepwiki.com/badge.svg"/></a>
</p>
<h3 align="center">在信息洪流(Feed)中,愿你保持禅定(Zen)</h3>
<p align="center">
zenfeed 是你的 <strong>AI 信息中枢</strong>。它既是<strong>智能 RSS 阅读器</strong>,也是实时<strong>"新闻"知识库</strong>,更能成为帮你时刻关注"指定事件",并呈递<strong>分析报告</strong>的私人秘书。
</p>
<p align="center">
<a href="https://zenfeed.xyz"><b>在线体验 (仅 RSS 阅读)</b></a>
&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
<a href="https://github.com/xusonfan/zenfeedApp"><b>安卓版体验 (仅 RSS 阅读)</b></a>
&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
<a href="docs/tech/hld-zh.md"><b>技术文档</b></a>
&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;
<a href="#-安装与使用"><b>快速开始</b></a>
</p>
> [!NOTE]
> DeepWiki 的描述并不完全准确
---
**epub2rss**: 把 epub 电子书转成每日更新一个章节的 RSS Feed,[join waitlist](https://epub2rss.pages.dev/)
**one-coffee**: 一款类似 syft、万物追踪的日报产品,差异点:支持播客等多模态、高质量信源(主攻 AI 领域)。下方加我微信加入 waitlist
---
## 💡 前言
RSS(简易信息聚合)诞生于 Web 1.0 时代,旨在解决信息分散的问题,让用户能在一个地方聚合、追踪多个网站的更新,无需频繁访问。它将网站更新以摘要形式推送给订阅者,便于快速获取信息。
@@ -40,161 +69,191 @@ RSS简易信息聚合诞生于 Web 1.0 时代,旨在解决信息分散
> 参考文章:[AI 复兴 RSS - 少数派](https://sspai.com/post/89494)
## 项目介绍
---
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/1b51f1087558402d85496fbe7bddde89)](https://app.codacy.com/gh/glidea/zenfeed/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=glidea_zenfeed&metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=glidea_zenfeed)
[![Go Report Card](https://goreportcard.com/badge/github.com/glidea/zenfeed)](https://goreportcard.com/report/github.com/glidea/zenfeed)
## ✨ 特性
zenfeed 是你的智能信息助手。它自动收集、筛选并总结关注的新闻或话题,然后发送给你。但我们可不是又造了一个 "今日头条"... 🤔
![Zenfeed Architecture](docs/images/arch.png)
![Zenfeed](docs/images/arch.png)
**For [RSS](https://zh.wikipedia.org/wiki/RSS) 老司机** 🚗
* zenfeed 可以是你的 AI 版 RSS 阅读器(配合 [zenfeed-web](https://github.com/glidea/zenfeed-web)
* [RSSHub](https://github.com/DIYgod/RSSHub) 的 [MCP](https://mcp.so/) Server
* 可自定义可信 RSS 数据源,且速度超快的 AI 搜索引擎
* 与 [Feedly AI](https://feedly.com/ai) 类似
**专为 [RSS](https://zh.wikipedia.org/wiki/RSS) 老司机** 🚗
* 你的 AI 版 RSS 阅读器(配合 [zenfeed-web](https://github.com/glidea/zenfeed-web) 使用)
* 可作为 [RSSHub](https://github.com/DIYgod/RSSHub) 的 [MCP](https://mcp.so/) Server
* 可自定义可信 RSS 数据源,打造速度超快的个人 AI 搜索引擎
* 功能与 [Feedly AI](https://feedly.com/ai) 类似
<details>
<summary>预览</summary>
<img src="docs/images/feed-list-with-web.png" alt="" width="600">
<summary><b>预览</b></summary>
<br>
<img src="docs/images/feed-list-with-web.png" alt="Feed list" width="600">
<img src="docs/images/chat-with-feeds.png" alt="Chat with feeds" width="500">
</details>
**For [万物追踪](https://www.wwzzai.com/) 替代品寻觅者** 🔍
* zenfeed 同样拥有 [信息追踪能力](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md#%E8%B0%83%E5%BA%A6%E9%85%8D%E7%BD%AE-scheduls),且更强调高质量,自定义的数据源
* [AI 首席情报官](https://github.com/TeamWiseFlow/wiseflow?tab=readme-ov-file) 的 RSS 版,灵活版,更接近引擎形态
**专为 [万物追踪](https://www.wwzzai.com/) 替代品寻觅者** 🔍
* 拥有强大的[信息追踪能力](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md#%E8%B0%83%E5%BA%A6%E9%85%8D%E7%BD%AE-scheduls),并更强调高质量、可自定义的数据源
* 可作为 [AI 首席情报官](https://github.com/TeamWiseFlow/wiseflow?tab=readme-ov-file) 的 RSS 版,更灵活,更接近引擎形态
<details>
<summary>预览</summary>
<img src="docs/images/monitoring.png" alt="" width="500">
<img src="docs/images/notification-with-web.png" alt="" width="500">
<summary><b>预览</b></summary>
<br>
<img src="docs/images/monitoring.png" alt="Monitoring setup" width="500">
<img src="docs/images/notification-with-web.png" alt="Notification example" width="500">
</details>
**For 信息焦虑症患者(比如我)** 😌
* "zenfeed" 是 "zen" 和 "feed" 的组合,意为在 feed信息洪流愿你保持 zen禅定
* 如果你对时不时地刷信息流感到焦虑疲惫,这是因为上下文切换的成本比想象得高,同时也妨碍了你进入心流。推荐你试试简报功能,每天固定时间收到对应时间段的简报邮件,从而一次性地,快速地,总览地完成阅读。啊哈有点文艺复兴的意味是吗 ✨
**专为 信息焦虑症患者 (比如我)** 😌
* 如果你对频繁刷信息流感到疲惫,试试简报功能。每日定时收取指定时段的 AI 简报,一次性、总览式地高效阅读,告别上下文切换的隐性成本。啊哈有点文艺复兴的意味是吗 ✨
* "zenfeed" 是 "zen" 和 "feed" 的组合,意为在 feed信息洪流愿你保持 zen禅定
<details>
<summary>预览</summary>
<img src="docs/images/daily-brief.png" alt="" width="500">
<summary><b>预览</b></summary>
<br>
<img src="docs/images/daily-brief.png" alt="Daily brief example" width="500">
</details>
**For AI 内容处理的探索者** 🔬
* zenfeed 有一种对内容进行管道化处理的自定义机制,类似 Prometheus [Relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config)
* 每篇内容都被抽象成一个标签集合(比如标题,来源,正文... 都是标签),在管道的每一个节点,可以基于自定义 Prompt 对特定标签值进行处理(比如评分、分类、摘要、过滤、添加新标签等...),而后基于标签查询过滤,[路由](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md#%E9%80%9A%E7%9F%A5%E8%B7%AF%E7%94%B1%E9%85%8D%E7%BD%AE-notifyroute-%E5%8F%8A-notifyroutesub_routes)[展示](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md#%E9%80%9A%E7%9F%A5%E6%B8%A0%E9%81%93-email-%E9%85%8D%E7%BD%AE-notifychannelsemail)... See [Rewrite Rules](docs/tech/rewrite-zh.md)
* 重要的是你可以灵活的编排这一切,这赋予了 zenfeed 浓重的工具化,个性化色彩。欢迎通过 Push API 集成私有数据,探索更多的可能性
**专为 开发者** 🔬
* **管道化处理机制**: 类似 Prometheus 的 [Relabeling](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config)zenfeed 将每篇内容抽象为标签集,你可以在管道的每个节点,通过自定义 Prompt 对标签进行处理(评分、分类、摘要、过滤等)。
* **灵活编排**: 基于处理后的标签,你可以自由地进行查询、过滤、[路由](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md#%E9%80%9A%E7%9F%A5%E8%B7%AF%E7%94%B1%E9%85%8D%E7%BD%AE-notifyroute-%E5%8F%8A-notifyroutesub_routes)和[通知](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md#%E9%80%9A%E7%9F%A5%E6%B8%A0%E9%81%93-email-%E9%85%8D%E7%BD%AE-notifychannelsemail),赋予了 zenfeed 浓厚的工具化、个性化色彩。详情请见 [Rewrite Rules](docs/tech/rewrite-zh.md)
* **开放的 API**:
* [Query API](/docs/query-api-zh.md)
* [RSS Exported API](/docs/rss-api-zh.md)
* [Notify Webhook](/docs/webhook-zh.md)
* [大量声明式 YAML 配置](/docs/config-zh.md)
<details>
<summary>预览</summary>
<img src="docs/images/update-config-with-web.png" alt="" width="500">
<summary><b>预览</b></summary>
<br>
<img src="docs/images/update-config-with-web.png" alt="Update config via web" width="500">
</details>
<p align="center">
<a href="docs/preview.md"><b>➡️ 查看更多效果预览</b></a>
</p>
**For 吃瓜群众** 🍉
---
就冲这精美的邮件样式,请立即安装使用
## 🚀 安装使用
<img src="docs/images/monitoring.png" alt="" width="400">
### 1. 准备工作
[更多效果预览](docs/preview.md)
> [!IMPORTANT]
> zenfeed 默认使用 [硅基流动](https://cloud.siliconflow.cn/) 提供的模型服务。
> * 模型: `Qwen/Qwen3-8B` (免费) 和 `Qwen/Qwen3-Embedding-4B`。
> * **!!!如果你愿意赞助本项目,将获赠一定额度的 Gemini 2.5 Pro/Flash!!! (见下方)**
> * 如果你还没有硅基账号,使用 [**邀请链接**](https://cloud.siliconflow.cn/i/U2VS0Q5A) 可获得 **14 元** 赠送额度。
> * 如果需要使用其他厂商或模型,或进行更详细的自定义部署,请参考 [配置文档](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md) 来编辑 `docker-compose.yml`。
## 安装与使用
### 2. 一键部署
### 1. 安装
> 最快 1min 拉起
> 最快 1 分钟拉起服务。
默认使用硅基流动的 Qwen/Qwen3-8B (免费) 和 Pro/BAAI/bge-m3。如果你还没有硅基账号,使用 [邀请链接](https://cloud.siliconflow.cn/i/U2VS0Q5A) 得 14 元额度
如果需要使用其他厂商或模型,或自定义部署:请编辑下方 **docker-compose.yml**#configs.zenfeed_config.content.
参考 [配置文档](https://github.com/glidea/zenfeed/blob/main/docs/config-zh.md)
#### Mac/Linux
#### Mac / Linux
```bash
# 下载配置文件
curl -L -O https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml
API_KEY=硅基流动apikey docker-compose -p zenfeed up -d
# 启动服务 (请替换你的 API_KEY)
API_KEY="sk-..." docker-compose -p zenfeed up -d
```
#### Windows
> 使用 PowerShell 执行
#### Windows (PowerShell)
```powershell
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml" -OutFile ([System.IO.Path]::GetFileName("https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml"))
# 下载配置文件
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/glidea/zenfeed/main/docker-compose.yml" -OutFile "docker-compose.yml"
$env:API_KEY = "硅基流动apikey"; docker-compose -p zenfeed up -d
# 启动服务 (请替换你的 API_KEY)
$env:API_KEY = "sk-..."; docker-compose -p zenfeed up -d
```
安装完成!访问 https://zenfeed-web.pages.dev
🎉 **部署完成!**
访问 http://localhost:1400
### 2. 使用 Web 端
> [!WARNING]
> * 如果将 zenfeed 部署在 VPS 等公网环境,请通过 `http://<你的IP>:1400` 访问,并确保防火墙/安全组已放行 `1400` 端口。
> * **安全提示:** zenfeed 尚无认证机制,将服务暴露到公网可能会泄露您的 `API_KEY`。请务必配置严格的安全组规则,仅对信任的 IP 开放访问。
> 如果部署在 VPS 等环境,请访问 http://vps_public_ip:1400(记得开放安全组端口),不要使用上方的公共前端
> ⚠️ zenfeed 尚无认证手段,暴露到公网可能会泄露 APIKey,请小心设置安全组。如果你有这方面的安全需求,请提 Issue
### 3. 开始使用
> 安卓版:https://github.com/xusonfan/zenfeedApp
#### 添加 RSS 订阅源
<img src="docs/images/web-add-source.png" alt="" width="400">
<img src="docs/images/web-add-source.png" alt="Add RSS source via web" width="400">
> 从 Follow 迁移过来,参考 [migrate-from-follow.md](docs/migrate-from-follow.md)
> 需要访问对应的源站,请保证网络畅通
> 添加后稍等几分钟,特别是在模型有严格速率限制的情况下
> * 从 Follow 迁移,参考 [migrate-from-follow.md](docs/migrate-from-follow.md)
> * 添加后 zenfeed 需要访问源站,请保证网络畅通
> * 添加后稍等几分钟,等待内容抓取和处理,尤其是在模型有严格速率限制的情况下
#### 配置每日简报监控等
#### 配置每日简报监控等
<img src="docs/images/notification-with-web.png" alt="" width="400">
<img src="docs/images/notification-with-web.png" alt="Configure notifications via web" width="400">
### 3. 配置 MCP(可选)
以 Cherry Studio 为例,配置 MCP 并连接到 Zenfeed,见 [Cherry Studio MCP](docs/cherry-studio-mcp.md)
> 默认地址 http://localhost:1301/sse
#### 配置 MCP(可选)
以 Cherry Studio 为例,配置 MCP 并连接到 Zenfeed,见 [Cherry Studio MCP](docs/cherry-studio-mcp.md)
> 默认地址 `http://localhost:1301/sse`
### 后续
#### More...
页面暂时没法表达 zenfeed 强大的灵活性,更多玩法请查阅[配置文档](docs/config-zh.md)
zenfeed 提供了超多的自定义配置,还有很多玩法等待你挖掘。详细请查阅[文档](/docs/)
---
### Roadmap
## 🗺️ Roadmap
[Roadmap](/docs/roadmap-zh.md)
我们规划了一些很 cool 的功能,欢迎查看 [Roadmap](/docs/roadmap-zh.md) 并提出你的建议!
## 欢迎加群讨论
> 使用问题请提 Issue,谢绝微信私聊,帮助有类似问题的朋友
---
<img src="docs/images/wechat.png" alt="Wechat" width="150">
## 💬 交流与支持
都看到这里了,顺手点个 Star ⭐️ 呗,用于防止我太监掉
> **使用问题请优先提 [Issue](https://github.com/glidea/zenfeed/issues)**,这能帮助到有类似问题的朋友,也能更好地追踪和解决问题。
有好玩的 AI 工作请联系我!
<table>
<tr>
<td align="center">
<img src="https://github.com/glidea/zenfeed/blob/main/docs/images/wechat.png?raw=true" alt="Wechat QR Code" width="300">
<br>
<strong>AI 学习交流社群</strong>
</td>
<td align="center">
<img src="https://github.com/glidea/banana-prompt-quicker/blob/main/images/glidea.png?raw=true" width="250">
<br>
<strong><a href="https://glidea.zenfeed.xyz/">我的其它项目</a></strong>
</td>
</tr>
<tr>
<td align="center" colspan="2">
<img src="https://github.com/glidea/banana-prompt-quicker/blob/main/images/readnote.png?raw=true" width="400">
<br>
<strong><a href="https://www.xiaohongshu.com/user/profile/5f7dc54d0000000001004afb">📕 小红书账号 - 持续分享 AI 原创</a></strong>
</td>
</tr>
</table>
喜欢本项目的话,赞助杯🧋(赛博要饭)
<img src="docs/images/sponsor.png" alt="Wechat" width="150">
都看到这里了,顺手点个 **Star ⭐️** 呗,这是我持续维护的最大动力!
## 生态项目
有好玩的 AI 工作也请联系我!
### [入行365日报](https://daily.ruhang365.com)
---
入行365创立于2017年,希望以入行资讯交流为起点,与大家一起建立一个分享专业、共同成长的社区。
## 📝 注意事项与免责声明
致力于为广大互联网从业人员提供全面的入行咨询、培训、小圈交流、资源协作等相关服务。
### 注意事项
* **版本兼容性:** 1.0 版本之前不保证 API 和配置的向后兼容性。
* **开源协议:** 项目采用 AGPLv3 协议,任何 Fork 和分发都必须保持开源。
* **商业使用:** 商用请联系作者报备,可在合理范围内提供支持。我们欢迎合法的商业用途,不欢迎利用本项目从事灰色产业。
* **数据存储:** 数据不会永久保存,默认只存储 8 天。
## 注意
* 1.0 版本之前不保证兼容性
* 项目采用 AGPL3 协议,任何 Fork 都需要开源
* 商用请联系报备,可提供合理范围内的支持。注意是合法商用哦,不欢迎搞灰色
* 数据不会永久保存,默认只存储 8 天
### 鸣谢
* 感谢 [eryajf](https://github.com/eryajf) 提供的 [Compose Inline Config](https://github.com/glidea/zenfeed/issues/1) 建议,让部署更易理解。
* [![Powered by DartNode](https://dartnode.com/branding/DN-Open-Source-sm.png)](https://dartnode.com "Powered by DartNode - Free VPS for Open Source")
## 鸣谢
* 感谢 [eryajf](https://github.com/eryajf) 提供的 [Compose Inline Config](https://github.com/glidea/zenfeed/issues/1) 让部署更易理解
* [![Powered by DartNode](https://dartnode.com/branding/DN-Open-Source-sm.png)](https://dartnode.com "Powered by DartNode - Free VPS for Open Source")
### 欢迎贡献
* 目前贡献规范尚在完善,但我们坚守一个核心原则:"代码风格一致性"。
## 👏🏻 欢迎贡献
* 目前还没有规范,只要求一点,“代码一致性”,很重要
### 免责声明 (Disclaimer)
## 免责声明 (Disclaimer)
<details>
<summary><strong>点击展开查看完整免责声明</strong></summary>
**在使用 `zenfeed` 软件(以下简称本软件)前,请仔细阅读并理解本免责声明。您的下载、安装、使用本软件或任何相关服务的行为,即表示您已阅读、理解并同意接受本声明的所有条款。如果您不同意本声明的任何内容,请立即停止使用本软件。**
**在使用 `zenfeed` 软件(以下简称"本软件")前,请仔细阅读并理解本免责声明。您的下载、安装、使用本软件或任何相关服务的行为,即表示您已阅读、理解并同意接受本声明的所有条款。如果您不同意本声明的任何内容,请立即停止使用本软件。**
1. **按“原样”提供:** 本软件按“现状”和“可用”的基础提供,不附带任何形式的明示或默示担保。项目作者和贡献者不对本软件的适销性、特定用途适用性、非侵权性、准确性、完整性、可靠性、安全性、及时性或性能做出任何保证或陈述。
1. **"按原样"提供:** 本软件按"现状"和"可用"的基础提供,不附带任何形式的明示或默示担保。项目作者和贡献者不对本软件的适销性、特定用途适用性、非侵权性、准确性、完整性、可靠性、安全性、及时性或性能做出任何保证或陈述。
2. **用户责任:** 您将对使用本软件的所有行为承担全部责任。这包括但不限于:
* **数据源选择:** 您自行负责选择并配置要接入的数据源(如 RSS feeds、未来可能的 Email 源等)。您必须确信您有权访问和处理这些数据源的内容,并遵守其各自的服务条款、版权政策及相关法律法规。
@@ -216,3 +275,5 @@ zenfeed 提供了超多的自定义配置,还有很多玩法等待你挖掘。
**请再次注意:使用本软件抓取、处理和分发受版权保护的内容可能存在法律风险。用户有责任确保其使用行为符合所有适用的法律法规和第三方服务条款。对于任何因用户滥用或不当使用本软件而引起的法律纠纷或损失,项目作者和贡献者不承担任何责任。**
</details>


@@ -31,6 +31,7 @@ services:
ports:
- "1300:1300"
- "1301:1301"
- "9090:9090"
depends_on:
- rsshub
restart: unless-stopped
@@ -59,7 +60,7 @@ configs:
api_key: ${API_KEY:-your-api-key}
- name: embed
provider: siliconflow
embedding_model: Pro/BAAI/bge-m3
embedding_model: Qwen/Qwen3-Embedding-4B
api_key: ${API_KEY:-your-api-key}
scrape:
rsshub_endpoint: http://rsshub:1200


@@ -14,7 +14,7 @@
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :-------------------- | :------- | :----------------------------------------------------------------------------- | :----------- | :------- |
| `telemetry.address` | `string` | 暴露 Prometheus 指标 & pprof。 | | 否 |
| `telemetry.address` | `string` | 暴露 Prometheus 指标 & pprof。 | `:9090` | 否 |
| `telemetry.log` | `object` | Telemetry 相关的日志配置。 | (见具体字段) | 否 |
| `telemetry.log.level` | `string` | Telemetry 相关消息的日志级别, 可选值为 `debug`, `info`, `warn`, `error` 之一。 | `info` | 否 |
@@ -41,6 +41,7 @@
| `llms[].api_key` | `string` | LLM 的 API 密钥。 | | 是 |
| `llms[].model` | `string` | LLM 的模型。例如 `gpt-4o-mini`。如果用于生成任务 (如总结),则不能为空。如果此 LLM 被使用,则不能与 `embedding_model` 同时为空。 | | 条件性必需 |
| `llms[].embedding_model` | `string` | LLM 的 Embedding 模型。例如 `text-embedding-3-small`。如果用于 Embedding则不能为空。如果此 LLM 被使用,则不能与 `model` 同时为空。**注意:** 初次使用后请勿直接修改,应添加新的 LLM 配置。 | | 条件性必需 |
| `llms[].tts_model` | `string` | LLM 的文本转语音 (TTS) 模型。 | | 否 |
| `llms[].temperature` | `float32` | LLM 的温度 (0-2)。 | `0.0` | 否 |
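
结合上表,一个最小的 `llms` 配置示意(`name`、`default` 等字段与仓库内 docker-compose 示例一致,API Key 与模型名为占位值,请按需替换):

```yaml
llms:
  - name: general
    default: true                              # 未显式指定 llm 时的默认选择
    provider: siliconflow
    model: Qwen/Qwen3-8B
    api_key: sk-your-key                       # 占位,请替换
  - name: embed
    provider: siliconflow
    embedding_model: Qwen/Qwen3-Embedding-4B   # 初次使用后请勿直接修改
    api_key: sk-your-key
```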
### Jina AI 配置 (`jina`)
@@ -58,6 +59,7 @@
| `scrape.past` | `time.Duration` | 抓取 Feed 的回溯时间窗口。例如 `1h` 表示只抓取过去 1 小时的 Feed。 | `24h` | 否 |
| `scrape.interval` | `time.Duration` | 抓取每个源的频率 (全局默认值)。例如 `1h`。 | `1h` | 否 |
| `scrape.rsshub_endpoint` | `string` | RSSHub 的端点。你可以部署自己的 RSSHub 服务器或使用公共实例 (参见 [RSSHub 文档](https://docs.rsshub.app/guide/instances))。例如 `https://rsshub.app`。 | | 是 (如果使用了 `rsshub_route_path`) |
| `scrape.rsshub_access_key` | `string` | RSSHub 的访问密钥。用于访问控制。(详情见 [RSSHub 文档:访问控制](https://docs.rsshub.app/deploy/config#access-control-configurations)) | | 否 |
| `scrape.sources` | `对象列表` | 用于抓取 Feed 的源列表。详见下方的 **抓取源配置**。 | `[]` | 是 (至少一个) |
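
一个 `scrape` 片段示意(端点与密钥均为占位值;`sources` 内的字段以下方「抓取源配置」一节为准,此处仅作示意):

```yaml
scrape:
  past: 24h
  interval: 1h
  rsshub_endpoint: http://rsshub:1200    # 也可使用公共实例
  rsshub_access_key: your-access-key     # 可选,对应 RSSHub 的访问控制配置
  sources:
    - rsshub_route_path: hackernews      # 示意值
```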
### 抓取源配置 (`scrape.sources[]`)
@@ -80,10 +82,11 @@
### 存储配置 (`storage`)
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :------------- | :------- | :-------------------------------------------- | :----------- | :------- |
| `storage.dir` | `string` | 所有存储的基础目录。应用运行后不可更改。 | `./data` | 否 |
| `storage.feed` | `object` | Feed 存储配置。详见下方的 **Feed 存储配置**。 | (见具体字段) | 否 |
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :--------------- | :------- | :-------------------------------------------------------------- | :----------- | :------- |
| `storage.dir` | `string` | 所有存储的基础目录。应用运行后不可更改。 | `./data` | 否 |
| `storage.feed` | `object` | Feed 存储配置。详见下方的 **Feed 存储配置** | (见具体字段) | 否 |
| `storage.object` | `object` | 对象存储配置,用于存储播客等文件。详见下方的 **对象存储配置**。 | (见具体字段) | 否 |
### Feed 存储配置 (`storage.feed`)
@@ -95,6 +98,16 @@
| `storage.feed.retention` | `time.Duration` | Feed 的保留时长。 | `8d` | 否 |
| `storage.feed.block_duration` | `time.Duration` | 每个基于时间的 Feed 存储块的保留时长 (类似于 Prometheus TSDB Block)。 | `25h` | 否 |
### 对象存储配置 (`storage.object`)
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :--------------------------------- | :------- | :----------------------------- | :----- | :-------------------- |
| `storage.object.endpoint` | `string` | 对象存储的端点。 | | 是 (如果使用播客功能) |
| `storage.object.access_key_id` | `string` | 对象存储的 Access Key ID。 | | 是 (如果使用播客功能) |
| `storage.object.secret_access_key` | `string` | 对象存储的 Secret Access Key。 | | 是 (如果使用播客功能) |
| `storage.object.bucket` | `string` | 对象存储的存储桶名称。 | | 是 (如果使用播客功能) |
| `storage.object.bucket_url` | `string` | 对象存储的桶访问 URL。 | | 否 |
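
启用播客功能时,`storage.object` 的配置示意(均为占位值):

```yaml
storage:
  object:
    endpoint: https://s3.example.com
    access_key_id: your-access-key-id
    secret_access_key: your-secret-access-key
    bucket: zenfeed-podcasts
    bucket_url: https://cdn.example.com/zenfeed-podcasts   # 可选:桶的访问 URL
```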
### 重写规则配置 (`storage.feed.rewrites[]`)
定义在存储前处理 Feed 的规则。规则按顺序应用。
@@ -109,12 +122,8 @@
| `...rewrites[].match_re` | `string` | 用于匹配 (转换后) 文本的正则表达式。 | `.*` (匹配所有) | 否 (使用 `match` 或 `match_re`) |
| `...rewrites[].action` | `string` | 匹配时执行的操作: `create_or_update_label` (使用匹配/转换后的文本添加/更新标签), `drop_feed` (完全丢弃该 Feed)。 | `create_or_update_label` | 否 |
| `...rewrites[].label` | `string` | 要创建或更新的 Feed 标签名称。 | | 是 (如果 `action` 为 `create_or_update_label`) |
### 重写规则转换配置 (`storage.feed.rewrites[].transform`)
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :--------------------- | :------- | :------------------------------------------------------------------- | :----- | :------- |
| `...transform.to_text` | `object` | 使用 LLM 将源文本转换为文本。详见下方的 **重写规则转换为文本配置**。 | `nil` | 否 |
| `...transform.to_text` | `object` | 使用 LLM 将源文本转换为文本。详见下方的 **重写规则转换为文本配置**。 | `nil` | 否 |
| `...transform.to_podcast` | `object` | 将源文本转换为播客。详见下方的 **重写规则转换为播客配置**。 | `nil` | 否 |
### 重写规则转换为文本配置 (`storage.feed.rewrites[].transform.to_text`)
@@ -126,6 +135,25 @@
| `...to_text.llm` | `string` | **仅当 `type` 为 `prompt` 时有效。** 用于转换的 LLM 名称 (来自 `llms` 部分)。如果未指定,将使用在 `llms` 部分中标记为 `default: true` 的 LLM。 | `llms` 部分中的默认 LLM | 否 |
| `...to_text.prompt` | `string` | **仅当 `type` 为 `prompt` 时有效。** 用于转换的 Prompt。源文本将被注入。可以使用 Go 模板语法引用内置 Prompt: `{{ .summary }}`, `{{ .category }}`, `{{ .tags }}`, `{{ .score }}`, `{{ .comment_confucius }}`, `{{ .summary_html_snippet }}`, `{{ .summary_html_snippet_for_small_model }}`。 | | 是 (如果 `type` 为 `prompt`) |
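
一条 `to_text` 重写规则的示意(引用内置摘要 Prompt;`source_label` 与 `label` 的取值仅为示意):

```yaml
storage:
  feed:
    rewrites:
      - source_label: content
        transform:
          to_text:
            type: prompt
            prompt: "{{ .summary }}"   # 引用内置摘要 Prompt
        label: summary                 # 默认 action 为 create_or_update_label
```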
### 重写规则转换为播客配置 (`storage.feed.rewrites[].transform.to_podcast`)
此配置定义了如何将 `source_label` 的文本转换为播客。
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :------------------------------------------- | :--------- | :-------------------------------------------------------------------------------------------------------- | :---------------------- | :------- |
| `...to_podcast.llm` | `string` | 用于生成播客稿件的 LLM 名称 (来自 `llms` 部分)。 | `llms` 部分中的默认 LLM | 否 |
| `...to_podcast.transcript_additional_prompt` | `string` | 附加到播客稿件生成 Prompt 的额外指令。 | | 否 |
| `...to_podcast.tts_llm` | `string` | 用于文本转语音 (TTS) 的 LLM 名称 (来自 `llms` 部分)。**注意:目前仅支持 `provider` 为 `gemini` 的 LLM**。 | `llms` 部分中的默认 LLM | 否 |
| `...to_podcast.speakers` | `对象列表` | 播客的演讲者列表。详见下方的 **演讲者配置**。 | `[]` | 是 |
#### 演讲者配置 (`...to_podcast.speakers[]`)
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :-------------------- | :------- | :------------------------ | :----- | :------- |
| `...speakers[].name` | `string` | 演讲者的名字。 | | 是 |
| `...speakers[].role` | `string` | 演讲者的角色描述 (人设)。 | | 否 |
| `...speakers[].voice` | `string` | 演讲者的声音。 | | 是 |
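
一条 `to_podcast` 重写规则的示意(LLM 名称、演讲者与声音均为占位值,实际可用声音取决于所用 TTS 模型):

```yaml
storage:
  feed:
    rewrites:
      - source_label: content
        transform:
          to_podcast:
            tts_llm: gemini-tts        # 占位:需为 provider 是 gemini 的 LLM
            speakers:
              - name: 主持人A          # 占位
                role: 负责引导话题
                voice: Kore            # 占位声音名
              - name: 嘉宾B
                role: 负责深入解读
                voice: Puck
        label: podcast                 # 占位,仅为示意
```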
### 调度配置 (`scheduls`)
定义查询和监控 Feed 的规则。
@@ -173,10 +201,11 @@
定义*谁*接收通知。
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :------------------------- | :------- | :------------------------------- | :----- | :------------------ |
| `notify.receivers[].name` | `string` | 接收者的唯一名称。在路由中使用。 | | 是 |
| `notify.receivers[].email` | `string` | 接收者的电子邮件地址。 | | 是 (如果使用 Email) |
| 字段 | 类型 | 描述 | 默认值 | 是否必需 |
| :--------------------------- | :------- | :------------------------------------------------------- | :----- | :-------------------- |
| `notify.receivers[].name` | `string` | 接收者的唯一名称。在路由中使用。 | | 是 |
| `notify.receivers[].email` | `string` | 接收者的电子邮件地址。 | | 是 (如果使用 Email) |
| `notify.receivers[].webhook` | `object` | 接收者的 Webhook 配置。例如: `webhook: { "url": "xxx" }` | | 是 (如果使用 Webhook) |
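
一个 `notify.receivers` 配置示意(地址均为占位值):

```yaml
notify:
  receivers:
    - name: me
      email: me@example.com
    - name: ops
      webhook:
        url: https://example.com/zenfeed-hook
```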
### 通知渠道配置 (`notify.channels`)
@@ -194,4 +223,4 @@
| `...email.from` | `string` | 发件人 Email 地址。 | | 是 |
| `...email.password` | `string` | 发件人 Email 的应用专用密码。(对于 Gmail, 参见 [Google 应用密码](https://support.google.com/mail/answer/185833))。 | | 是 |
| `...email.feed_markdown_template` | `string` | 用于在 Email 正文中格式化每个 Feed 的 Markdown 模板。默认渲染 Feed 内容。不能与 `feed_html_snippet_template` 同时设置。可用的模板变量取决于 Feed 标签。 | `{{ .content }}` | 否 |
| `...email.feed_html_snippet_template` | `string` | 用于格式化每个 Feed 的 HTML 片段模板。不能与 `feed_markdown_template` 同时设置。可用的模板变量取决于 Feed 标签。 | | 否 |
| `...email.feed_html_snippet_template` | `string` | 用于格式化每个 Feed 的 HTML 片段模板。不能与 `feed_markdown_template` 同时设置。可用的模板变量取决于 Feed 标签。 | | 否 |


@@ -14,7 +14,7 @@
| Field | Type | Description | Default Value | Required |
| :-------------------- | :------- | :--------------------------------------------------------------------------------- | :-------------------- | :------- |
| `telemetry.address` | `string` | Exposes Prometheus metrics & pprof. | | No |
| `telemetry.address` | `string` | Exposes Prometheus metrics & pprof. | `:9090` | No |
| `telemetry.log` | `object` | Log configuration related to telemetry. | (See specific fields) | No |
| `telemetry.log.level` | `string` | Log level for telemetry-related messages, one of `debug`, `info`, `warn`, `error`. | `info` | No |
@@ -41,6 +41,7 @@ This section defines the list of available Large Language Models. At least one L
| `llms[].api_key` | `string` | API key for the LLM. | | Yes |
| `llms[].model` | `string` | Model of the LLM. E.g., `gpt-4o-mini`. Cannot be empty if used for generation tasks (e.g., summarization). If this LLM is used, cannot be empty along with `embedding_model`. | | Conditionally Required |
| `llms[].embedding_model` | `string` | Embedding model of the LLM. E.g., `text-embedding-3-small`. Cannot be empty if used for embedding. If this LLM is used, cannot be empty along with `model`. **Note:** Do not modify directly after initial use; add a new LLM configuration instead. | | Conditionally Required |
| `llms[].tts_model` | `string` | The Text-to-Speech (TTS) model of the LLM. | | No |
| `llms[].temperature` | `float32` | Temperature of the LLM (0-2). | `0.0` | No |
### Jina AI Configuration (`jina`)
@@ -58,6 +59,7 @@ This section configures parameters related to the Jina AI Reader API, primarily
| `scrape.past` | `time.Duration` | Time window to look back when scraping feeds. E.g., `1h` means only scrape feeds from the past 1 hour. | `24h` | No |
| `scrape.interval` | `time.Duration` | Frequency to scrape each source (global default). E.g., `1h`. | `1h` | No |
| `scrape.rsshub_endpoint` | `string` | Endpoint for RSSHub. You can deploy your own RSSHub server or use a public instance (see [RSSHub Documentation](https://docs.rsshub.app/guide/instances)). E.g., `https://rsshub.app`. | | Yes (if `rsshub_route_path` is used) |
| `scrape.rsshub_access_key` | `string` | The access key for RSSHub. Used for access control. (see [RSSHub config](https://docs.rsshub.app/deploy/config#access-control-configurations)) | | No |
| `scrape.sources` | `list of objects` | List of sources to scrape feeds from. See **Scrape Source Configuration** below. | `[]` | Yes (at least one) |
### Scrape Source Configuration (`scrape.sources[]`)
@@ -80,10 +82,11 @@ Describes each source to be scraped.
### Storage Configuration (`storage`)
| Field | Type | Description | Default Value | Required |
| :------------- | :------- | :------------------------------------------------------------------------------ | :-------------------- | :------- |
| `storage.dir` | `string` | Base directory for all storage. Cannot be changed after the application starts. | `./data` | No |
| `storage.feed` | `object` | Feed storage configuration. See **Feed Storage Configuration** below. | (See specific fields) | No |
| Field | Type | Description | Default Value | Required |
| :--------------- | :------- | :-------------------------------------------------------------------------------------------------------- | :-------------------- | :------- |
| `storage.dir` | `string` | Base directory for all storage. Cannot be changed after the application starts. | `./data` | No |
| `storage.feed` | `object` | Feed storage configuration. See **Feed Storage Configuration** below. | (See specific fields) | No |
| `storage.object` | `object` | Object storage configuration for storing files like podcasts. See **Object Storage Configuration** below. | (See specific fields) | No |
### Feed Storage Configuration (`storage.feed`)
@@ -95,6 +98,16 @@ Describes each source to be scraped.
| `storage.feed.retention` | `time.Duration` | Retention duration for feeds. | `8d` | No |
| `storage.feed.block_duration` | `time.Duration` | Retention duration for each time-based feed storage block (similar to Prometheus TSDB Block). | `25h` | No |
### Object Storage Configuration (`storage.object`)
| Field | Type | Description | Default Value | Required |
| :--------------------------------- | :------- | :------------------------------------------- | :------------ | :----------------------------- |
| `storage.object.endpoint` | `string` | The endpoint of the object storage. | | Yes (if using podcast feature) |
| `storage.object.access_key_id` | `string` | The access key id of the object storage. | | Yes (if using podcast feature) |
| `storage.object.secret_access_key` | `string` | The secret access key of the object storage. | | Yes (if using podcast feature) |
| `storage.object.bucket` | `string` | The bucket of the object storage. | | Yes (if using podcast feature) |
| `storage.object.bucket_url` | `string` | The URL of the object storage bucket. | | No |
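
A minimal `storage.object` sketch matching the fields above (all values are placeholders):

```yaml
storage:
  object:
    endpoint: https://s3.example.com
    access_key_id: your-access-key-id
    secret_access_key: your-secret-access-key
    bucket: zenfeed-podcasts
    bucket_url: https://cdn.example.com/zenfeed-podcasts   # optional bucket access URL
```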
### Rewrite Rule Configuration (`storage.feed.rewrites[]`)
Defines rules to process feeds before storage. Rules are applied sequentially.
@@ -109,12 +122,8 @@ Defines rules to process feeds before storage. Rules are applied sequentially.
| `...rewrites[].match_re` | `string` | Regular expression to match against the (transformed) text. | `.*` (matches all) | No (use `match` or `match_re`) |
| `...rewrites[].action` | `string` | Action to perform on match: `create_or_update_label` (adds/updates a label with the matched/transformed text), `drop_feed` (discards the feed entirely). | `create_or_update_label` | No |
| `...rewrites[].label` | `string` | Name of the feed label to create or update. | | Yes (if `action` is `create_or_update_label`) |
### Rewrite Rule Transform Configuration (`storage.feed.rewrites[].transform`)
| Field | Type | Description | Default Value | Required |
| :--------------------- | :------- | :--------------------------------------------------------------------------------------------- | :------------ | :------- |
| `...transform.to_text` | `object` | Transforms source text to text using an LLM. See **Rewrite Rule To Text Configuration** below. | `nil` | No |
| `...transform.to_text` | `object` | Transforms source text to text using an LLM. See **Rewrite Rule To Text Configuration** below. | `nil` | No |
| `...transform.to_podcast` | `object` | Transforms source text to a podcast. See **Rewrite Rule To Podcast Configuration** below. | `nil` | No |
### Rewrite Rule To Text Configuration (`storage.feed.rewrites[].transform.to_text`)
@@ -124,7 +133,26 @@ This configuration defines how to transform the text from `source_label`.
| :------------------ | :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------- | :-------------------------- |
| `...to_text.type` | `string` | Type of transformation. Options: <ul><li>`prompt` (default): Uses an LLM and a specified prompt to transform the source text.</li><li>`crawl`: Treats the source text as a URL, directly crawls the web page content pointed to by the URL, and converts it to Markdown format. This method performs local crawling and attempts to follow `robots.txt`.</li><li>`crawl_by_jina`: Treats the source text as a URL, crawls and processes web page content via the [Jina AI Reader API](https://jina.ai/reader/), and returns Markdown. Potentially more powerful, e.g., for handling dynamic pages, but relies on the Jina AI service.</li></ul> | `prompt` | No |
| `...to_text.llm` | `string` | **Only valid if `type` is `prompt`.** Name of the LLM used for transformation (from `llms` section). If not specified, the LLM marked as `default: true` in the `llms` section will be used. | Default LLM in `llms` section | No |
| `...to_text.prompt` | `string` | **Only valid if `type` is `prompt`.** Prompt used for transformation. The source text will be injected. You can use Go template syntax to reference built-in prompts: `{{ .summary }}`, `{{ .category }}`, `{{ .tags }}`, `{{ .score }}`, `{{ .comment_confucius }}`, `{{ .summary_html_snippet }}`. | | Yes (if `type` is `prompt`) |
| `...to_text.prompt` | `string` | **Only valid if `type` is `prompt`.** Prompt used for transformation. The source text will be injected. You can use Go template syntax to reference built-in prompts: `{{ .summary }}`, `{{ .category }}`, `{{ .tags }}`, `{{ .score }}`, `{{ .comment_confucius }}`, `{{ .summary_html_snippet }}`, `{{ .summary_html_snippet_for_small_model }}`. | | Yes (if `type` is `prompt`) |
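To illustrate, a rewrite rule's prompt can embed one of these built-in templates and extend it with custom instructions. A minimal sketch (the `my-llm` name, the `summary` label, and the extra instruction are illustrative, not prescribed by the source):

```yaml
storage:
  feed:
    rewrites:
      - source_label: content
        transform:
          to_text:
            llm: my-llm # must match a name in the llms section
            prompt: |
              {{ .summary }}
              Additionally, keep the summary under 200 words.
        label: summary
```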
### Rewrite Rule To Podcast Configuration (`storage.feed.rewrites[].transform.to_podcast`)
This configuration defines how to transform the text from `source_label` into a podcast.
| Field | Type | Description | Default Value | Required |
| :------------------------------------------- | :---------------- | :--------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------------- | :------- |
| `...to_podcast.llm` | `string` | The name of the LLM (from the `llms` section) to use for generating the podcast script. | Default LLM in `llms` section | No |
| `...to_podcast.transcript_additional_prompt` | `string` | Additional instructions to append to the prompt for generating the podcast script. | | No |
| `...to_podcast.tts_llm` | `string` | The name of the LLM (from the `llms` section) to use for Text-to-Speech (TTS). **Note: Currently only supports LLMs with `provider: gemini`**. | Default LLM in `llms` section | No |
| `...to_podcast.speakers` | `list of objects` | A list of speakers for the podcast. See **Speaker Configuration** below. | `[]` | Yes |
#### Speaker Configuration (`...to_podcast.speakers[]`)
| Field | Type | Description | Default Value | Required |
| :-------------------- | :------- | :----------------------------------- | :------------ | :------- |
| `...speakers[].name` | `string` | The name of the speaker. | | Yes |
| `...speakers[].role` | `string` | The role description of the speaker. | | No |
| `...speakers[].voice` | `string` | The voice of the speaker. | | Yes |
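Putting these fields together, a minimal `to_podcast` rule might look like the following sketch (the LLM names, speaker names, and roles are illustrative; the voices are names from Gemini's prebuilt voice list):

```yaml
storage:
  feed:
    rewrites:
      - source_label: content
        transform:
          to_podcast:
            llm: my-chat-llm # generates the dialogue script
            tts_llm: my-gemini-tts # must be a provider: gemini LLM
            speakers:
              - name: Alice
                role: A curious host who asks clarifying questions.
                voice: Autonoe
              - name: Bob
                role: A veteran commentator with sharp opinions.
                voice: Puck
        label: podcast_url
```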
### Scheduling Configuration (`scheduls`)
@@ -173,10 +201,11 @@ This structure can be nested using `sub_routes`. Feeds will first try to match s
Defines *who* receives notifications.
| Field | Type | Description | Default Value | Required |
| :------------------------- | :------- | :------------------------------------------- | :------------ | :------------------- |
| `notify.receivers[].name` | `string` | Unique name of the receiver. Used in routes. | | Yes |
| `notify.receivers[].email` | `string` | Email address of the receiver. | | Yes (if using Email) |
| Field | Type | Description | Default Value | Required |
| :--------------------------- | :------- | :----------------------------------------------------------------------- | :------------ | :--------------------- |
| `notify.receivers[].name` | `string` | Unique name of the receiver. Used in routes. | | Yes |
| `notify.receivers[].email` | `string` | Email address of the receiver. | | Yes (if using Email) |
| `notify.receivers[].webhook` | `object` | Webhook configuration for the receiver. E.g. `webhook: { "url": "xxx" }` | | Yes (if using Webhook) |
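For example, an email receiver and a webhook receiver could be declared side by side (a sketch; the names and URL are placeholders):

```yaml
notify:
  receivers:
    - name: me-by-email
      email: me@example.com
    - name: me-by-webhook
      webhook:
        url: "https://example.com/zenfeed-hook"
```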
### Notification Channel Configuration (`notify.channels`)

BIN
docs/images/302.jpg Normal file

107
docs/podcast.md Normal file

@@ -0,0 +1,107 @@
# Converting Articles into Podcasts with Zenfeed
Zenfeed's podcast feature automatically turns any article source into an engaging, multi-speaker conversational podcast. It uses a large language model (LLM) to generate the dialogue script, text-to-speech (TTS) to synthesize the audio, and hosts the final audio files in your own object storage.
## How It Works
1. **Extract content**: Zenfeed first extracts the article's full text via a rewrite rule.
2. **Generate the script**: A designated LLM (e.g. GPT-4o-mini) adapts the article into a dialogue script for several virtual hosts. You can define each host's role and persona to shape the conversational style.
3. **Synthesize speech**: A TTS-capable LLM (currently only Google Gemini is supported) converts each line of the script into audio.
4. **Merge audio**: All speech segments are combined into a single WAV file.
5. **Upload**: The generated podcast file is uploaded to the S3-compatible object storage you configured.
6. **Save the link**: Finally, the podcast file's public URL is saved as a new feed label, ready to use in notifications, the API, or anywhere else.
## Configuration Steps
Enabling the podcast feature requires three pieces of configuration: LLMs, object storage, and a rewrite rule.
### 1. Configure the LLMs
You need at least two LLM configurations: one to generate the dialogue script, and one for text-to-speech (TTS).
- **Script-generation LLM**: any reasonably capable chat model, such as OpenAI's `gpt-4o-mini` or Google's `gemini-1.5-pro`.
- **TTS LLM**: converts text into speech. **Note: currently only LLMs with `provider: gemini` are supported.**
**Example `config.yaml`:**
```yaml
llms:
  # LLM for generating the podcast script
  - name: openai-chat
    provider: openai
    api_key: "sk-..."
    model: gpt-4o-mini
    default: true
  # LLM for text-to-speech (TTS)
  - name: gemini-tts
    provider: gemini
    api_key: "..." # your Google AI Studio API key
    tts_model: "gemini-2.5-flash-preview-tts" # Gemini's TTS model
```
### 2. Configure Object Storage
The generated podcast audio files need somewhere to live. Zenfeed supports any S3-compatible object storage service; here we use [Cloudflare R2](https://www.cloudflare.com/zh-cn/products/r2/) as an example.
First, create a bucket in Cloudflare R2, then gather the following:
- `endpoint`: your R2 API endpoint, usually in the form `<account_id>.r2.cloudflarestorage.com`. You can find it on the bucket's overview page.
- `access_key_id` and `secret_access_key`: an R2 API token, created under "R2" -> "Manage R2 API Tokens".
- `bucket`: the name of the bucket you created.
- `bucket_url`: the bucket's public URL. To get one, connect the bucket to a custom domain, or use the `r2.dev` public access URL that R2 provides.
**Example `config.yaml`:**
```yaml
storage:
  object:
    endpoint: "<your_account_id>.r2.cloudflarestorage.com"
    access_key_id: "..."
    secret_access_key: "..."
    bucket: "zenfeed-podcasts"
    bucket_url: "https://pub-xxxxxxxx.r2.dev"
```
### 3. Configure the Rewrite Rule
The last step is a rewrite rule that tells Zenfeed how to turn an article into a podcast: which label supplies the source text, who the speakers are, which voices to use, and so on.
**Key fields:**
- `source_label`: the label containing the article's full text.
- `label`: the name of the new label that will store the final podcast URL.
- `transform.to_podcast`: the core podcast configuration.
  - `llm`: the name of the script-generation LLM (from `llms`).
  - `tts_llm`: the name of the TTS LLM (from `llms`).
  - `speakers`: the podcast's speakers.
    - `name`: the speaker's name.
    - `role`: the speaker's role and persona, which shapes the script.
    - `voice`: the speaker's voice. See the [Gemini TTS documentation](https://ai.google.dev/gemini-api/docs/speech-generation#voices).
**Example `config.yaml`:**
```yaml
storage:
  feed:
    rewrites:
      - source_label: content # based on the full article text
        transform:
          to_podcast:
            estimate_maximum_duration: 3m0s # aim for roughly 3 minutes
            transcript_additional_prompt: Make the dialogue engaging and natural, avoid sounding AI-generated, and reply in Chinese # requirements for the script
            llm: xxxx # the LLM that generates the script
            tts_llm: gemini-tts # only Gemini TTS is supported; https://github.com/glidea/one-balance is recommended for key rotation
            speakers:
              - name: 小雅
                role: >-
                  An experienced tech podcast host with a sweet voice and a lively style. A former financial journalist and media professional, she followed the tech industry for work and later turned her passion and eloquence into a full-time content career. She excels at making complex technical concepts vivid and fun from an ordinary user's perspective, and she is the "instigator" who discovered 老王 and "tricked" him into co-hosting the podcast.
                voice: Autonoe
              - name: 老王
                role: >-
                  A veteran tech commentator and internet-industry old hand. He lived through the Chinese internet's entire journey from wild frontier to tech giants, having worked as a programmer, a product manager, and a founder. As a result he holds his own sharp, sometimes caustic views on every industry "trend" and "concept". Incisive and blunt, he loves to nitpick every product around him. "Conned" onto this "pirate ship", he complains constantly on the surface but secretly enjoys sharing his opinions.
                voice: Puck
        label: podcast_url
```
Once configured, Zenfeed runs this pipeline automatically every time a new article is scraped. You can use the `podcast_url` label in notification templates, or listen directly in the Web UI (the Web UI always reads the `podcast_url` label; if you use a different label name, it will not be picked up).


@@ -5,8 +5,8 @@
* TTS voice quality has only started improving in recent years; long-term, we have to wait for costs to come down
* Short-term, since I personally love podcast summaries (they should also suit everyone's commute), I'll deploy a model locally first and serve it to https://zenfeed.xyz
* ebup2rss
* You've seen rss2ebup, but you've definitely never seen it the other way around
* epub2rss
* You've seen rss2epub, but you've definitely never seen it the other way around
* Strictly speaking this isn't part of zenfeed; at most it's an ecosystem project
* Timeliness aside, books are more valuable than news. But once you've made a "read consistently" resolution, then what?
* This sub-project aims to publish one chapter per day, exposed as RSS, so you finish the book "along the way" while reading your news RSS
@@ -21,9 +21,13 @@
> A flash of inspiration: lately I've enjoyed discussing the news with Doubao; maybe I should share how to feed zenfeed data into Doubao
## Mid-to-Long Term
* Relevance-based aggregated reading on the Web
![](images/web-reading-aggr.png)
* A more usable Web UI (though frankly, this is low priority right now; I'd rather encourage calling the backend API to build a web of your own)
* Topic research reports
* Block or follow the follow-ups of related news
* Relevance-based aggregated reading
![](images/web-reading-aggr.png)
> P.S. This is already implemented, it just hasn't been surfaced in the Web UI yet
---
If you think zenfeed is cool and would like to contribute, please contact me!
If you think zenfeed is cool and would like to contribute, please contact me!


@@ -47,12 +47,12 @@
transform:
to_text:
llm: "qwen-default" # use the LLM configuration named "qwen-default"
prompt: "category" # use the preset "category" prompt template
prompt: "{{ .category }} You can append your own extra requirements here" # use the preset "category" prompt template
match: ".+" # match any non-empty category returned by the LLM
action: "create_or_update_label"
label: "category" # the key of the new label is "category"
```
* **Effect**: If an article is about artificial intelligence, the LLM might return "Technology". After this rule runs, the article's label set gains or updates a label such as `{"category", "Technology"}`.
* **Effect**: If an article is about artificial intelligence, the LLM might return "Technology". After this rule runs, the article's label set gains or updates a label such as `{"category", "Technology"}`. **This can later power things like "query articles categorized as Technology" or "send a daily tech digest based on articles categorized as Technology"...**
### Example 2: Filter Low-Quality Content by LLM Score
@@ -65,7 +65,7 @@
transform:
to_text:
llm: "qwen-default"
prompt: "score" # use the preset "score" prompt template
prompt: "{{ .score }} You can append your own extra requirements here" # use the preset "score" prompt template
match: "^([0-9]|10)$" # ensure the LLM returns a number from 0 to 10
action: "create_or_update_label"
label: "ai_score" # store the score in the "ai_score" label

13
go.mod

@@ -9,6 +9,7 @@ require (
github.com/edsrzf/mmap-go v1.2.0
github.com/gorilla/feeds v1.2.0
github.com/mark3labs/mcp-go v0.17.0
github.com/minio/minio-go/v7 v7.0.94
github.com/mmcdole/gofeed v1.3.0
github.com/nutsdb/nutsdb v1.0.4
github.com/onsi/gomega v1.36.1
@@ -16,6 +17,7 @@ require (
github.com/prometheus/client_golang v1.21.1
github.com/sashabaranov/go-openai v1.40.1
github.com/stretchr/testify v1.10.0
github.com/temoto/robotstxt v1.1.2
github.com/veqryn/slog-dedup v0.5.0
github.com/yuin/goldmark v1.7.8
gopkg.in/gomail.v2 v2.0.0-20160411212932-81ebce5c23df
@@ -32,25 +34,34 @@ require (
github.com/bwmarrin/snowflake v0.3.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/go-ini/ini v1.67.0 // indirect
github.com/goccy/go-json v0.10.5 // indirect
github.com/gofrs/flock v0.8.1 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/klauspost/compress v1.18.0 // indirect
github.com/klauspost/cpuid/v2 v2.2.10 // indirect
github.com/minio/crc64nvme v1.0.1 // indirect
github.com/minio/md5-simd v1.1.2 // indirect
github.com/mmcdole/goxpp v1.1.1-0.20240225020742-a0c311522b23 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/philhofer/fwd v1.1.3-0.20240916144458-20a13a1f6b7c // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_model v0.6.1 // indirect
github.com/prometheus/common v0.62.0 // indirect
github.com/prometheus/procfs v0.15.1 // indirect
github.com/rs/xid v1.6.0 // indirect
github.com/stretchr/objx v0.5.2 // indirect
github.com/temoto/robotstxt v1.1.2
github.com/tidwall/btree v1.6.0 // indirect
github.com/tinylib/msgp v1.3.0 // indirect
github.com/xujiajun/mmap-go v1.0.1 // indirect
github.com/xujiajun/utils v0.0.0-20220904132955-5f7c5b914235 // indirect
github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
golang.org/x/crypto v0.36.0 // indirect
golang.org/x/net v0.38.0 // indirect
golang.org/x/sys v0.31.0 // indirect
golang.org/x/text v0.23.0 // indirect

23
go.sum

@@ -21,12 +21,18 @@ github.com/chewxy/math32 v1.10.1/go.mod h1:dOB2rcuFrCn6UHrze36WSLVPKtzPMRAQvBvUw
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/edsrzf/mmap-go v1.2.0 h1:hXLYlkbaPzt1SaQk+anYwKSRNhufIDCchSPkUD6dD84=
github.com/edsrzf/mmap-go v1.2.0/go.mod h1:19H/e8pUPLicwkyNgOykDXkJ9F0MHE+Z52B8EIth78Q=
github.com/go-ini/ini v1.67.0 h1:z6ZrTEZqSWOTyH2FlglNbNgARyHG8oLW9gMELqKr06A=
github.com/go-ini/ini v1.67.0/go.mod h1:ByCAeIL28uOIIG0E3PJtZPDL8WnHpFKFOtgjp+3Ies8=
github.com/go-logr/logr v1.4.2 h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY=
github.com/go-logr/logr v1.4.2/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-task/slim-sprig/v3 v3.0.0 h1:sUs3vkvUymDpBKi3qH1YSqBQk9+9D/8M2mN1vB6EwHI=
github.com/go-task/slim-sprig/v3 v3.0.0/go.mod h1:W848ghGpv3Qj3dhTPRyJypKRiqCdHZiAzKg9hl15HA8=
github.com/goccy/go-json v0.10.5 h1:Fq85nIqj+gXn/S5ahsiTlK3TmC85qgirsdTP/+DeaC4=
github.com/goccy/go-json v0.10.5/go.mod h1:oq7eo15ShAhp70Anwd5lgX2pLfOS3QCiwU/PULtXL6M=
github.com/gofrs/flock v0.8.1 h1:+gYjHKf32LDeiEEFhQaotPbLuUXjY5ZqxKgXy7n59aw=
github.com/gofrs/flock v0.8.1/go.mod h1:F1TvTiK9OcQqauNUHlbJvyl9Qa1QvF/gOUDKA14jxHU=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
@@ -42,6 +48,9 @@ github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnr
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
github.com/klauspost/compress v1.18.0 h1:c/Cqfb0r+Yi+JtIEq73FWXVkRonBlf0CRNYc8Zttxdo=
github.com/klauspost/compress v1.18.0/go.mod h1:2Pp+KzxcywXVXMr50+X0Q/Lsb43OQHYWRCY2AiWywWQ=
github.com/klauspost/cpuid/v2 v2.0.1/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
github.com/klauspost/cpuid/v2 v2.2.10 h1:tBs3QSyvjDyFTq3uoc/9xFpCuOsJQFNPiAhYdw2skhE=
github.com/klauspost/cpuid/v2 v2.2.10/go.mod h1:hqwkgyIinND0mEev00jJYCxPNVRVXFQeu1XKlok6oO0=
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
@@ -53,6 +62,12 @@ github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/mark3labs/mcp-go v0.17.0 h1:5Ps6T7qXr7De/2QTqs9h6BKeZ/qdeUeGrgM5lPzi930=
github.com/mark3labs/mcp-go v0.17.0/go.mod h1:KmJndYv7GIgcPVwEKJjNcbhVQ+hJGJhrCCB/9xITzpE=
github.com/minio/crc64nvme v1.0.1 h1:DHQPrYPdqK7jQG/Ls5CTBZWeex/2FMS3G5XGkycuFrY=
github.com/minio/crc64nvme v1.0.1/go.mod h1:eVfm2fAzLlxMdUGc0EEBGSMmPwmXD5XiNRpnu9J3bvg=
github.com/minio/md5-simd v1.1.2 h1:Gdi1DZK69+ZVMoNHRXJyNcxrMA4dSxoYHZSQbirFg34=
github.com/minio/md5-simd v1.1.2/go.mod h1:MzdKDxYpY2BT9XQFocsiZf/NKVtR7nkE4RoEpN+20RM=
github.com/minio/minio-go/v7 v7.0.94 h1:1ZoksIKPyaSt64AVOyaQvhDOgVC3MfZsWM6mZXRUGtM=
github.com/minio/minio-go/v7 v7.0.94/go.mod h1:71t2CqDt3ThzESgZUlU1rBN54mksGGlkLcFgguDnnAc=
github.com/mmcdole/gofeed v1.3.0 h1:5yn+HeqlcvjMeAI4gu6T+crm7d0anY85+M+v6fIFNG4=
github.com/mmcdole/gofeed v1.3.0/go.mod h1:9TGv2LcJhdXePDzxiuMnukhV2/zb6VtnZt1mS+SjkLE=
github.com/mmcdole/goxpp v1.1.1-0.20240225020742-a0c311522b23 h1:Zr92CAlFhy2gL+V1F+EyIuzbQNbSgP4xhTODZtrXUtk=
@@ -70,6 +85,8 @@ github.com/onsi/ginkgo/v2 v2.20.1 h1:YlVIbqct+ZmnEph770q9Q7NVAz4wwIiVNahee6JyUzo
github.com/onsi/ginkgo/v2 v2.20.1/go.mod h1:lG9ey2Z29hR41WMVthyJBGUBcBhGOtoPF2VFMvBXFCI=
github.com/onsi/gomega v1.36.1 h1:bJDPBO7ibjxcbHMgSCoo4Yj18UWbKDlLwX1x9sybDcw=
github.com/onsi/gomega v1.36.1/go.mod h1:PvZbdDc8J6XJEpDK4HCuRBm8a6Fzp9/DmhC9C7yFlog=
github.com/philhofer/fwd v1.1.3-0.20240916144458-20a13a1f6b7c h1:dAMKvw0MlJT1GshSTtih8C2gDs04w8dReiOGXrGLNoY=
github.com/philhofer/fwd v1.1.3-0.20240916144458-20a13a1f6b7c/go.mod h1:RqIHx9QI14HlwKwm98g9Re5prTQ6LdeRQn+gXJFxsJM=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
@@ -87,6 +104,8 @@ github.com/remyoudompheng/bigfft v0.0.0-20200410134404-eec4a21b6bb0 h1:OdAsTTz6O
github.com/remyoudompheng/bigfft v0.0.0-20200410134404-eec4a21b6bb0/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ=
github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog=
github.com/rs/xid v1.6.0 h1:fV591PaemRlL6JfRxGDEPl69wICngIQ3shQtzfy2gxU=
github.com/rs/xid v1.6.0/go.mod h1:7XoLgs4eV+QndskICGsho+ADou8ySMSjJKDIan90Nz0=
github.com/sashabaranov/go-openai v1.40.1 h1:bJ08Iwct5mHBVkuvG6FEcb9MDTfsXdTYPGjYLRdeTEU=
github.com/sashabaranov/go-openai v1.40.1/go.mod h1:lj5b/K+zjTSFxVLijLSTDZuP7adOgerWeFyZLUhAKRg=
github.com/sebdah/goldie/v2 v2.5.3 h1:9ES/mNN+HNUbNWpVAlrzuZ7jE+Nrczbj8uFRjM7624Y=
@@ -106,6 +125,8 @@ github.com/temoto/robotstxt v1.1.2 h1:W2pOjSJ6SWvldyEuiFXNxz3xZ8aiWX5LbfDiOFd7Fx
github.com/temoto/robotstxt v1.1.2/go.mod h1:+1AmkuG3IYkh1kv0d2qEB9Le88ehNO0zwOr3ujewlOo=
github.com/tidwall/btree v1.6.0 h1:LDZfKfQIBHGHWSwckhXI0RPSXzlo+KYdjK7FWSqOzzg=
github.com/tidwall/btree v1.6.0/go.mod h1:twD9XRA5jj9VUQGELzDO4HPQTNJsoWWfYEL+EUQ2cKY=
github.com/tinylib/msgp v1.3.0 h1:ULuf7GPooDaIlbyvgAxBV/FI7ynli6LZ1/nVUNu+0ww=
github.com/tinylib/msgp v1.3.0/go.mod h1:ykjzy2wzgrlvpDCRc4LA8UXy6D8bzMSuAF3WD57Gok0=
github.com/veqryn/slog-dedup v0.5.0 h1:2pc4va3q8p7Tor1SjVvi1ZbVK/oKNPgsqG15XFEt0iM=
github.com/veqryn/slog-dedup v0.5.0/go.mod h1:/iQU008M3qFa5RovtfiHiODxJFvxZLjWRG/qf/zKFHw=
github.com/xujiajun/mmap-go v1.0.1 h1:7Se7ss1fLPPRW+ePgqGpCkfGIZzJV6JPq9Wq9iv/WHc=
@@ -123,6 +144,8 @@ golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5y
golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
golang.org/x/crypto v0.22.0/go.mod h1:vr6Su+7cTlO45qkww3VDJlzDn0ctJvRgYbC2NvXHt+M=
golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
golang.org/x/crypto v0.36.0 h1:AnAEvhDddvBdpY+uR+MyHmuZzzNqXSe/GvuDeob5L34=
golang.org/x/crypto v0.36.0/go.mod h1:Y4J0ReaxCR1IMaabaSMugxJES1EpwhBHhv2bDHklZvc=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=

46
main.go

@@ -47,6 +47,7 @@ import (
"github.com/glidea/zenfeed/pkg/storage/feed/block/index/primary"
"github.com/glidea/zenfeed/pkg/storage/feed/block/index/vector"
"github.com/glidea/zenfeed/pkg/storage/kv"
"github.com/glidea/zenfeed/pkg/storage/object"
"github.com/glidea/zenfeed/pkg/telemetry/log"
telemetryserver "github.com/glidea/zenfeed/pkg/telemetry/server"
timeutil "github.com/glidea/zenfeed/pkg/util/time"
@@ -122,18 +123,19 @@ type App struct {
conf *config.App
telemetry telemetryserver.Server
kvStorage kv.Storage
llmFactory llm.Factory
rewriter rewrite.Rewriter
feedStorage feed.Storage
api api.API
http http.Server
mcp mcp.Server
rss rss.Server
scraperMgr scrape.Manager
scheduler schedule.Scheduler
notifier notify.Notifier
notifyChan chan *rule.Result
kvStorage kv.Storage
llmFactory llm.Factory
rewriter rewrite.Rewriter
feedStorage feed.Storage
objectStorage object.Storage
api api.API
http http.Server
mcp mcp.Server
rss rss.Server
scraperMgr scrape.Manager
scheduler schedule.Scheduler
notifier notify.Notifier
notifyChan chan *rule.Result
}
// newApp creates a new application instance.
@@ -164,6 +166,9 @@ func (a *App) setup() error {
if err := a.setupKVStorage(); err != nil {
return errors.Wrap(err, "setup kv storage")
}
if err := a.setupObjectStorage(); err != nil {
return errors.Wrap(err, "setup object storage")
}
if err := a.setupLLMFactory(); err != nil {
return errors.Wrap(err, "setup llm factory")
}
@@ -251,7 +256,8 @@ func (a *App) setupLLMFactory() (err error) {
// setupRewriter initializes the Rewriter factory.
func (a *App) setupRewriter() (err error) {
a.rewriter, err = rewrite.NewFactory().New(component.Global, a.conf, rewrite.Dependencies{
LLMFactory: a.llmFactory,
LLMFactory: a.llmFactory,
ObjectStorage: a.objectStorage,
})
if err != nil {
return err
@@ -282,6 +288,18 @@ func (a *App) setupFeedStorage() (err error) {
return nil
}
// setupObjectStorage initializes the Object storage.
func (a *App) setupObjectStorage() (err error) {
a.objectStorage, err = object.NewFactory().New(component.Global, a.conf, object.Dependencies{})
if err != nil {
return err
}
a.configMgr.Subscribe(a.objectStorage)
return nil
}
// setupTelemetryServer initializes the Telemetry server.
func (a *App) setupTelemetryServer() (err error) {
a.telemetry, err = telemetryserver.NewFactory().New(component.Global, a.conf, telemetryserver.Dependencies{})
@@ -419,7 +437,7 @@ func (a *App) run(ctx context.Context) error {
log.Info(ctx, "starting application components...")
if err := component.Run(ctx,
component.Group{a.configMgr},
component.Group{a.llmFactory, a.telemetry},
component.Group{a.llmFactory, a.objectStorage, a.telemetry},
component.Group{a.rewriter},
component.Group{a.feedStorage},
component.Group{a.kvStorage},


@@ -90,19 +90,22 @@ type LLM struct {
APIKey string `yaml:"api_key,omitempty" json:"api_key,omitempty" desc:"The API key of the LLM. It is required when api.llm is set."`
Model string `yaml:"model,omitempty" json:"model,omitempty" desc:"The model of the LLM. e.g. gpt-4o-mini. Can not be empty with embedding_model at same time when api.llm is set."`
EmbeddingModel string `yaml:"embedding_model,omitempty" json:"embedding_model,omitempty" desc:"The embedding model of the LLM. e.g. text-embedding-3-small. Can not be empty with model at same time when api.llm is set. NOTE: Once used, do not modify it directly, instead, add a new LLM configuration."`
TTSModel string `yaml:"tts_model,omitempty" json:"tts_model,omitempty" desc:"The TTS model of the LLM."`
Temperature float32 `yaml:"temperature,omitempty" json:"temperature,omitempty" desc:"The temperature (0-2) of the LLM. Default: 0.0"`
}
type Scrape struct {
Past timeutil.Duration `yaml:"past,omitempty" json:"past,omitempty" desc:"The lookback time window for scraping feeds. e.g. 1h means only scrape feeds in the past 1 hour. Default: 3d"`
Interval timeutil.Duration `yaml:"interval,omitempty" json:"interval,omitempty" desc:"How often to scrape each source, it is a global interval. e.g. 1h. Default: 1h"`
RSSHubEndpoint string `yaml:"rsshub_endpoint,omitempty" json:"rsshub_endpoint,omitempty" desc:"The endpoint of the RSSHub. You can deploy your own RSSHub server or use the public one (https://docs.rsshub.app/guide/instances). e.g. https://rsshub.app. It is required when sources[].rss.rsshub_route_path is set."`
Sources []ScrapeSource `yaml:"sources,omitempty" json:"sources,omitempty" desc:"The sources for scraping feeds."`
Past timeutil.Duration `yaml:"past,omitempty" json:"past,omitempty" desc:"The lookback time window for scraping feeds. e.g. 1h means only scrape feeds in the past 1 hour. Default: 3d"`
Interval timeutil.Duration `yaml:"interval,omitempty" json:"interval,omitempty" desc:"How often to scrape each source, it is a global interval. e.g. 1h. Default: 1h"`
RSSHubEndpoint string `yaml:"rsshub_endpoint,omitempty" json:"rsshub_endpoint,omitempty" desc:"The endpoint of the RSSHub. You can deploy your own RSSHub server or use the public one (https://docs.rsshub.app/guide/instances). e.g. https://rsshub.app. It is required when sources[].rss.rsshub_route_path is set."`
RSSHubAccessKey string `yaml:"rsshub_access_key,omitempty" json:"rsshub_access_key,omitempty" desc:"The access key for RSSHub. Used for access control. (see [RSSHub config](https://docs.rsshub.app/deploy/config#access-control-configurations))"`
Sources []ScrapeSource `yaml:"sources,omitempty" json:"sources,omitempty" desc:"The sources for scraping feeds."`
}
type Storage struct {
Dir string `yaml:"dir,omitempty" json:"dir,omitempty" desc:"The base directory of the all storages. Default: ./data. It can not be changed after the app is running."`
Feed FeedStorage `yaml:"feed,omitempty" json:"feed,omitempty" desc:"The feed storage config."`
Dir string `yaml:"dir,omitempty" json:"dir,omitempty" desc:"The base directory of the all storages. Default: ./data. It can not be changed after the app is running."`
Feed FeedStorage `yaml:"feed,omitempty" json:"feed,omitempty" desc:"The feed storage config."`
Object ObjectStorage `yaml:"object,omitempty" json:"object,omitempty" desc:"The object storage config."`
}
type FeedStorage struct {
@@ -113,6 +116,14 @@ type FeedStorage struct {
BlockDuration timeutil.Duration `yaml:"block_duration,omitempty" json:"block_duration,omitempty" desc:"How long to keep the feed storage block. Block is time-based, like Prometheus TSDB Block. Default: 25h"`
}
type ObjectStorage struct {
Endpoint string `yaml:"endpoint,omitempty" json:"endpoint,omitempty" desc:"The endpoint of the object storage."`
AccessKeyID string `yaml:"access_key_id,omitempty" json:"access_key_id,omitempty" desc:"The access key id of the object storage."`
SecretAccessKey string `yaml:"secret_access_key,omitempty" json:"secret_access_key,omitempty" desc:"The secret access key of the object storage."`
Bucket string `yaml:"bucket,omitempty" json:"bucket,omitempty" desc:"The bucket of the object storage."`
BucketURL string `yaml:"bucket_url,omitempty" json:"bucket_url,omitempty" desc:"The public URL of the object storage bucket."`
}
type ScrapeSource struct {
Interval timeutil.Duration `yaml:"interval,omitempty" json:"interval,omitempty" desc:"How often to scrape this source. Default: global interval"`
Name string `yaml:"name,omitempty" json:"name,omitempty" desc:"The name of the source. It is required."`
@@ -137,7 +148,8 @@ type RewriteRule struct {
}
type RewriteRuleTransform struct {
ToText *RewriteRuleTransformToText `yaml:"to_text,omitempty" json:"to_text,omitempty" desc:"The transform config to transform the source text to text."`
ToText *RewriteRuleTransformToText `yaml:"to_text,omitempty" json:"to_text,omitempty" desc:"The transform config to transform the source text to text."`
ToPodcast *RewriteRuleTransformToPodcast `yaml:"to_podcast,omitempty" json:"to_podcast,omitempty" desc:"The transform config to transform the source text to podcast."`
}
type RewriteRuleTransformToText struct {
@@ -146,6 +158,20 @@ type RewriteRuleTransformToText struct {
Prompt string `yaml:"prompt,omitempty" json:"prompt,omitempty" desc:"The prompt to transform the source text. The source text will be injected into the prompt above. And you can use go template syntax to refer some built-in prompts, like {{ .summary }}. Available built-in prompts: category, tags, score, comment_confucius, summary, summary_html_snippet."`
}
type RewriteRuleTransformToPodcast struct {
LLM string `yaml:"llm,omitempty" json:"llm,omitempty" desc:"The LLM name to use. Default is the default LLM in llms section."`
EstimateMaximumDuration timeutil.Duration `yaml:"estimate_maximum_duration,omitempty" json:"estimate_maximum_duration,omitempty" desc:"The estimated maximum duration of the podcast. It will affect the length of the generated transcript. e.g. 5m. Default is 5m."`
TranscriptAdditionalPrompt string `yaml:"transcript_additional_prompt,omitempty" json:"transcript_additional_prompt,omitempty" desc:"The additional prompt to add to the transcript. It is optional."`
TTSLLM string `yaml:"tts_llm,omitempty" json:"tts_llm,omitempty" desc:"The LLM name to use for TTS. Only supports gemini now. Default is the default LLM in llms section."`
Speakers []RewriteRuleTransformToPodcastSpeaker `yaml:"speakers,omitempty" json:"speakers,omitempty" desc:"The speakers to use. It is required, at least one speaker is needed."`
}
type RewriteRuleTransformToPodcastSpeaker struct {
Name string `yaml:"name,omitempty" json:"name,omitempty" desc:"The name of the speaker. It is required."`
Role string `yaml:"role,omitempty" json:"role,omitempty" desc:"The role description of the speaker. You can think of it as a character setting."`
Voice string `yaml:"voice,omitempty" json:"voice,omitempty" desc:"The voice of the speaker. It is required."`
}
type SchedulsRule struct {
Name string `yaml:"name,omitempty" json:"name,omitempty" desc:"The name of the rule. It is required."`
Query string `yaml:"query,omitempty" json:"query,omitempty" desc:"The semantic query to get the feeds. NOTE it is optional"`

248
pkg/llm/gemini.go Normal file

@@ -0,0 +1,248 @@
// Copyright (C) 2025 wangyusong
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package llm
import (
"bytes"
"context"
"encoding/base64"
"encoding/json"
"io"
"net/http"
"path/filepath"
"github.com/pkg/errors"
oai "github.com/sashabaranov/go-openai"
"github.com/glidea/zenfeed/pkg/component"
"github.com/glidea/zenfeed/pkg/telemetry"
telemetrymodel "github.com/glidea/zenfeed/pkg/telemetry/model"
"github.com/glidea/zenfeed/pkg/util/wav"
)
type gemini struct {
*component.Base[Config, struct{}]
text
hc *http.Client
embeddingSpliter embeddingSpliter
}
func newGemini(c *Config) LLM {
config := oai.DefaultConfig(c.APIKey)
config.BaseURL = filepath.Join(c.Endpoint, "openai") // OpenAI compatible endpoint.
client := oai.NewClientWithConfig(config)
embeddingSpliter := newEmbeddingSpliter(1536, 64)
base := component.New(&component.BaseConfig[Config, struct{}]{
Name: "LLM/gemini",
Instance: c.Name,
Config: c,
})
return &gemini{
Base: base,
text: &openaiText{
Base: base,
client: client,
},
hc: &http.Client{},
embeddingSpliter: embeddingSpliter,
}
}
func (g *gemini) WAV(ctx context.Context, text string, speakers []Speaker) (r io.ReadCloser, err error) {
ctx = telemetry.StartWith(ctx, append(g.TelemetryLabels(), telemetrymodel.KeyOperation, "WAV")...)
defer func() { telemetry.End(ctx, err) }()
if g.Config().TTSModel == "" {
return nil, errors.New("tts model is not set")
}
reqPayload, err := buildWAVRequestPayload(text, speakers)
if err != nil {
return nil, errors.Wrap(err, "build wav request payload")
}
pcmData, err := g.doWAVRequest(ctx, reqPayload)
if err != nil {
return nil, errors.Wrap(err, "do wav request")
}
return streamWAV(pcmData), nil
}
func (g *gemini) doWAVRequest(ctx context.Context, reqPayload *geminiRequest) ([]byte, error) {
config := g.Config()
body, err := json.Marshal(reqPayload)
if err != nil {
return nil, errors.Wrap(err, "marshal tts request")
}
url := config.Endpoint + "/models/" + config.TTSModel + ":generateContent"
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
if err != nil {
return nil, errors.Wrap(err, "new tts request")
}
req.Header.Set("x-goog-api-key", config.APIKey)
req.Header.Set("Content-Type", "application/json")
resp, err := g.hc.Do(req)
if err != nil {
return nil, errors.Wrap(err, "do tts request")
}
defer func() { _ = resp.Body.Close() }()
if resp.StatusCode != http.StatusOK {
errMsg, _ := io.ReadAll(resp.Body)
return nil, errors.Errorf("tts request failed with status %d: %s", resp.StatusCode, string(errMsg))
}
var ttsResp geminiResponse
if err := json.NewDecoder(resp.Body).Decode(&ttsResp); err != nil {
return nil, errors.Wrap(err, "decode tts response")
}
if len(ttsResp.Candidates) == 0 || len(ttsResp.Candidates[0].Content.Parts) == 0 || ttsResp.Candidates[0].Content.Parts[0].InlineData == nil {
return nil, errors.New("no audio data in tts response")
}
audioDataB64 := ttsResp.Candidates[0].Content.Parts[0].InlineData.Data
pcmData, err := base64.StdEncoding.DecodeString(audioDataB64)
if err != nil {
return nil, errors.Wrap(err, "decode base64")
}
return pcmData, nil
}
func buildWAVRequestPayload(text string, speakers []Speaker) (*geminiRequest, error) {
reqPayload := geminiRequest{
Contents: []*geminiRequestContent{{Parts: []*geminiRequestPart{{Text: text}}}},
Config: &geminiRequestConfig{
ResponseModalities: []string{"AUDIO"},
SpeechConfig: &geminiRequestSpeechConfig{},
},
}
switch len(speakers) {
case 0:
return nil, errors.New("no speakers")
case 1:
reqPayload.Config.SpeechConfig.VoiceConfig = &geminiRequestVoiceConfig{
PrebuiltVoiceConfig: &geminiRequestPrebuiltVoiceConfig{VoiceName: speakers[0].Voice},
}
default:
multiSpeakerConfig := &geminiRequestMultiSpeakerVoiceConfig{}
for _, s := range speakers {
multiSpeakerConfig.SpeakerVoiceConfigs = append(multiSpeakerConfig.SpeakerVoiceConfigs, &geminiRequestSpeakerVoiceConfig{
Speaker: s.Name,
VoiceConfig: &geminiRequestVoiceConfig{
PrebuiltVoiceConfig: &geminiRequestPrebuiltVoiceConfig{VoiceName: s.Voice},
},
})
}
reqPayload.Config.SpeechConfig.MultiSpeakerVoiceConfig = multiSpeakerConfig
}
return &reqPayload, nil
}
func streamWAV(pcmData []byte) io.ReadCloser {
pipeReader, pipeWriter := io.Pipe()
go func() {
defer func() { _ = pipeWriter.Close() }()
if err := wav.WriteHeader(pipeWriter, geminiWavHeader, uint32(len(pcmData))); err != nil {
pipeWriter.CloseWithError(errors.Wrap(err, "write wav header"))
return
}
if _, err := io.Copy(pipeWriter, bytes.NewReader(pcmData)); err != nil {
pipeWriter.CloseWithError(errors.Wrap(err, "write pcm data"))
return
}
}()
return pipeReader
}
var geminiWavHeader = &wav.Header{
SampleRate: 24000,
BitDepth: 16,
NumChannels: 1,
}
type geminiRequest struct {
Contents []*geminiRequestContent `json:"contents"`
Config *geminiRequestConfig `json:"generationConfig"`
}
type geminiRequestContent struct {
Parts []*geminiRequestPart `json:"parts"`
}
type geminiRequestPart struct {
Text string `json:"text"`
}
type geminiRequestConfig struct {
ResponseModalities []string `json:"responseModalities"`
SpeechConfig *geminiRequestSpeechConfig `json:"speechConfig"`
}
type geminiRequestSpeechConfig struct {
VoiceConfig *geminiRequestVoiceConfig `json:"voiceConfig,omitempty"`
MultiSpeakerVoiceConfig *geminiRequestMultiSpeakerVoiceConfig `json:"multiSpeakerVoiceConfig,omitempty"`
}
type geminiRequestVoiceConfig struct {
PrebuiltVoiceConfig *geminiRequestPrebuiltVoiceConfig `json:"prebuiltVoiceConfig,omitempty"`
}
type geminiRequestPrebuiltVoiceConfig struct {
VoiceName string `json:"voiceName,omitempty"`
}
type geminiRequestMultiSpeakerVoiceConfig struct {
SpeakerVoiceConfigs []*geminiRequestSpeakerVoiceConfig `json:"speakerVoiceConfigs,omitempty"`
}
type geminiRequestSpeakerVoiceConfig struct {
Speaker string `json:"speaker,omitempty"`
VoiceConfig *geminiRequestVoiceConfig `json:"voiceConfig,omitempty"`
}
type geminiResponse struct {
Candidates []*geminiResponseCandidate `json:"candidates"`
}
type geminiResponseCandidate struct {
Content *geminiResponseContent `json:"content"`
}
type geminiResponseContent struct {
Parts []*geminiResponsePart `json:"parts"`
}
type geminiResponsePart struct {
InlineData *geminiResponseInlineData `json:"inlineData"`
}
type geminiResponseInlineData struct {
MimeType string `json:"mimeType"`
Data string `json:"data"` // Base64 encoded.
}

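The decode path in doWAVRequest — unmarshal, guard against empty candidates/parts, then base64-decode `inlineData.data` — can be exercised against a hand-written payload. The JSON sample below is illustrative, not a captured API response, and the trimmed structs mirror the ones defined above:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Trimmed mirror of the geminiResponse types above.
type response struct {
	Candidates []struct {
		Content struct {
			Parts []struct {
				InlineData *struct {
					MimeType string `json:"mimeType"`
					Data     string `json:"data"` // Base64 encoded.
				} `json:"inlineData"`
			} `json:"parts"`
		} `json:"content"`
	} `json:"candidates"`
}

// extractPCM applies the same guards as doWAVRequest before decoding.
func extractPCM(raw []byte) ([]byte, error) {
	var r response
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	if len(r.Candidates) == 0 || len(r.Candidates[0].Content.Parts) == 0 ||
		r.Candidates[0].Content.Parts[0].InlineData == nil {
		return nil, fmt.Errorf("no audio data in response")
	}
	return base64.StdEncoding.DecodeString(r.Candidates[0].Content.Parts[0].InlineData.Data)
}

func main() {
	sample := []byte(`{"candidates":[{"content":{"parts":[{"inlineData":{"mimeType":"audio/pcm","data":"cGNt"}}]}}]}`)
	pcm, err := extractPCM(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(pcm)) // prints "pcm"
}
```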

View File

@@ -18,8 +18,10 @@ package llm
import (
"bytes"
"context"
"io"
"reflect"
"strconv"
"strings"
"sync"
"time"
@@ -42,19 +44,33 @@ import (
// --- Interface code block ---
type LLM interface {
component.Component
text
audio
}
type text interface {
String(ctx context.Context, messages []string) (string, error)
EmbeddingLabels(ctx context.Context, labels model.Labels) ([][]float32, error)
Embedding(ctx context.Context, text string) ([]float32, error)
}
type audio interface {
WAV(ctx context.Context, text string, speakers []Speaker) (io.ReadCloser, error)
}
type Speaker struct {
Name string
Voice string
}
type Config struct {
Name string
Default bool
Provider ProviderType
Endpoint string
APIKey string
Model, EmbeddingModel string
Temperature float32
Name string
Default bool
Provider ProviderType
Endpoint string
APIKey string
Model, EmbeddingModel, TTSModel string
Temperature float32
}
type ProviderType string
@@ -72,7 +88,7 @@ var defaultEndpoints = map[ProviderType]string{
ProviderTypeOpenAI: "https://api.openai.com/v1",
ProviderTypeOpenRouter: "https://openrouter.ai/api/v1",
ProviderTypeDeepSeek: "https://api.deepseek.com/v1",
ProviderTypeGemini: "https://generativelanguage.googleapis.com/v1beta/openai",
ProviderTypeGemini: "https://generativelanguage.googleapis.com/v1beta",
ProviderTypeVolc: "https://ark.cn-beijing.volces.com/api/v3",
ProviderTypeSiliconFlow: "https://api.siliconflow.cn/v1",
}
@@ -97,8 +113,8 @@ func (c *Config) Validate() error { //nolint:cyclop
if c.APIKey == "" {
return errors.New("api key is required")
}
if c.Model == "" && c.EmbeddingModel == "" {
return errors.New("model or embedding model is required")
if c.Model == "" && c.EmbeddingModel == "" && c.TTSModel == "" {
return errors.New("model, embedding model, or tts model is required")
}
if c.Temperature < 0 || c.Temperature > 2 {
return errors.Errorf("invalid temperature: %f, should be in range [0, 2]", c.Temperature)
@@ -182,6 +198,7 @@ func (c *FactoryConfig) From(app *config.App) {
APIKey: llm.APIKey,
Model: llm.Model,
EmbeddingModel: llm.EmbeddingModel,
TTSModel: llm.TTSModel,
Temperature: llm.Temperature,
})
}
@@ -207,12 +224,9 @@ func NewFactory(
) (Factory, error) {
if len(mockOn) > 0 {
mf := &mockFactory{}
getCall := mf.On("Get", mock.Anything)
getCall.Run(func(args mock.Arguments) {
m := &mockLLM{}
component.MockOptions(mockOn).Apply(&m.Mock)
getCall.Return(m, nil)
})
m := &mockLLM{}
component.MockOptions(mockOn).Apply(&m.Mock)
mf.On("Get", mock.Anything).Return(m)
mf.On("Reload", mock.Anything).Return(nil)
return mf, nil
@@ -307,11 +321,6 @@ func (f *factory) Get(name string) LLM {
continue
}
if f.llms[name] == nil {
llm := f.new(&llmC)
f.llms[name] = llm
}
return f.llms[name]
}
@@ -320,8 +329,12 @@ func (f *factory) Get(name string) LLM {
func (f *factory) new(c *Config) LLM {
switch c.Provider {
case ProviderTypeOpenAI, ProviderTypeOpenRouter, ProviderTypeDeepSeek, ProviderTypeGemini, ProviderTypeVolc, ProviderTypeSiliconFlow: //nolint:lll
case ProviderTypeOpenAI, ProviderTypeOpenRouter, ProviderTypeDeepSeek, ProviderTypeVolc, ProviderTypeSiliconFlow: //nolint:lll
return newCached(newOpenAI(c), f.Dependencies().KVStorage)
case ProviderTypeGemini:
return newCached(newGemini(c), f.Dependencies().KVStorage)
default:
return newCached(newOpenAI(c), f.Dependencies().KVStorage)
}
@@ -333,14 +346,17 @@ func (f *factory) initLLMs() {
llms = make(map[string]LLM, len(config.LLMs))
defaultLLM LLM
)
for _, llmC := range config.LLMs {
llm := f.new(&llmC)
llms[llmC.Name] = llm
if llmC.Name == config.defaultLLM {
defaultLLM = llm
}
}
f.llms = llms
f.defaultLLM = defaultLLM
}
@@ -392,6 +408,9 @@ func (c *cached) String(ctx context.Context, messages []string) (string, error)
if err != nil {
return "", err
}
if strings.Trim(value, " \n\r\t") == "" {
return "", errors.New("empty response") // Gemini may return this.
}
// TODO: reduce copies.
if err = c.kvStorage.Set(ctx, []byte(keyStr), []byte(value), 65*time.Minute); err != nil {
@@ -482,12 +501,27 @@ func (m *mockLLM) String(ctx context.Context, messages []string) (string, error)
func (m *mockLLM) EmbeddingLabels(ctx context.Context, labels model.Labels) ([][]float32, error) {
args := m.Called(ctx, labels)
if args.Error(1) != nil {
return nil, args.Error(1)
}
return args.Get(0).([][]float32), args.Error(1)
}
func (m *mockLLM) Embedding(ctx context.Context, text string) ([]float32, error) {
args := m.Called(ctx, text)
if args.Error(1) != nil {
return nil, args.Error(1)
}
return args.Get(0).([]float32), args.Error(1)
}
func (m *mockLLM) WAV(ctx context.Context, text string, speakers []Speaker) (io.ReadCloser, error) {
args := m.Called(ctx, text, speakers)
if args.Error(1) != nil {
return nil, args.Error(1)
}
return args.Get(0).(io.ReadCloser), args.Error(1)
}

View File

@@ -18,6 +18,7 @@ package llm
import (
"context"
"encoding/json"
"io"
"github.com/pkg/errors"
oai "github.com/sashabaranov/go-openai"
@@ -31,9 +32,7 @@ import (
type openai struct {
*component.Base[Config, struct{}]
client *oai.Client
embeddingSpliter embeddingSpliter
text
}
func newOpenAI(c *Config) LLM {
@@ -42,18 +41,34 @@ func newOpenAI(c *Config) LLM {
client := oai.NewClientWithConfig(config)
embeddingSpliter := newEmbeddingSpliter(1536, 64)
base := component.New(&component.BaseConfig[Config, struct{}]{
Name: "LLM/openai",
Instance: c.Name,
Config: c,
})
return &openai{
Base: component.New(&component.BaseConfig[Config, struct{}]{
Name: "LLM/openai",
Instance: c.Name,
Config: c,
}),
client: client,
embeddingSpliter: embeddingSpliter,
Base: base,
text: &openaiText{
Base: base,
client: client,
embeddingSpliter: embeddingSpliter,
},
}
}
func (o *openai) String(ctx context.Context, messages []string) (value string, err error) {
func (o *openai) WAV(ctx context.Context, text string, speakers []Speaker) (r io.ReadCloser, err error) {
return nil, errors.New("not supported")
}
type openaiText struct {
*component.Base[Config, struct{}]
client *oai.Client
embeddingSpliter embeddingSpliter
}
func (o *openaiText) String(ctx context.Context, messages []string) (value string, err error) {
ctx = telemetry.StartWith(ctx, append(o.TelemetryLabels(), telemetrymodel.KeyOperation, "String")...)
defer func() { telemetry.End(ctx, err) }()
@@ -91,7 +106,7 @@ func (o *openai) String(ctx context.Context, messages []string) (value string, e
return resp.Choices[0].Message.Content, nil
}
func (o *openai) EmbeddingLabels(ctx context.Context, labels model.Labels) (value [][]float32, err error) {
func (o *openaiText) EmbeddingLabels(ctx context.Context, labels model.Labels) (value [][]float32, err error) {
ctx = telemetry.StartWith(ctx, append(o.TelemetryLabels(), telemetrymodel.KeyOperation, "EmbeddingLabels")...)
defer func() { telemetry.End(ctx, err) }()
@@ -117,7 +132,7 @@ func (o *openai) EmbeddingLabels(ctx context.Context, labels model.Labels) (valu
return vecs, nil
}
func (o *openai) Embedding(ctx context.Context, s string) (value []float32, err error) {
func (o *openaiText) Embedding(ctx context.Context, s string) (value []float32, err error) {
ctx = telemetry.StartWith(ctx, append(o.TelemetryLabels(), telemetrymodel.KeyOperation, "Embedding")...)
defer func() { telemetry.End(ctx, err) }()

View File

@@ -17,9 +17,13 @@ package rewrite
import (
"context"
"fmt"
"html/template"
"io"
"regexp"
"strconv"
"strings"
"time"
"unicode/utf8"
"github.com/pkg/errors"
@@ -30,10 +34,12 @@ import (
"github.com/glidea/zenfeed/pkg/llm"
"github.com/glidea/zenfeed/pkg/llm/prompt"
"github.com/glidea/zenfeed/pkg/model"
"github.com/glidea/zenfeed/pkg/storage/object"
"github.com/glidea/zenfeed/pkg/telemetry"
telemetrymodel "github.com/glidea/zenfeed/pkg/telemetry/model"
"github.com/glidea/zenfeed/pkg/util/buffer"
"github.com/glidea/zenfeed/pkg/util/crawl"
hashutil "github.com/glidea/zenfeed/pkg/util/hash"
)
// --- Interface code block ---
@@ -68,7 +74,8 @@ func (c *Config) From(app *config.App) {
}
type Dependencies struct {
LLMFactory llm.Factory
LLMFactory llm.Factory // NOTE: String() with cache.
ObjectStorage object.Storage
}
type Rule struct {
@@ -120,32 +127,98 @@ func (r *Rule) Validate() error { //nolint:cyclop,gocognit,funlen
}
// Transform.
if r.Transform != nil {
if r.Transform.ToText == nil {
return errors.New("to_text is required when transform is set")
if r.Transform != nil { //nolint:nestif
if r.Transform.ToText != nil && r.Transform.ToPodcast != nil {
return errors.New("to_text and to_podcast cannot be set at the same time")
}
if r.Transform.ToText == nil && r.Transform.ToPodcast == nil {
return errors.New("either to_text or to_podcast must be set when transform is set")
}
switch r.Transform.ToText.Type {
case ToTextTypePrompt:
if r.Transform.ToText.Prompt == "" {
return errors.New("to text prompt is required for prompt type")
if r.Transform.ToText != nil {
switch r.Transform.ToText.Type {
case ToTextTypePrompt:
if r.Transform.ToText.Prompt == "" {
return errors.New("to text prompt is required for prompt type")
}
tmpl, err := template.New("").Parse(r.Transform.ToText.Prompt)
if err != nil {
return errors.Wrapf(err, "parse prompt template %s", r.Transform.ToText.Prompt)
}
buf := buffer.Get()
defer buffer.Put(buf)
if err := tmpl.Execute(buf, prompt.Builtin); err != nil {
return errors.Wrapf(err, "execute prompt template %s", r.Transform.ToText.Prompt)
}
r.Transform.ToText.promptRendered = buf.String()
case ToTextTypeCrawl, ToTextTypeCrawlByJina:
// No specific validation for crawl type here, as the source text itself is the URL.
default:
return errors.Errorf("unknown transform type: %s", r.Transform.ToText.Type)
}
tmpl, err := template.New("").Parse(r.Transform.ToText.Prompt)
if err != nil {
return errors.Wrapf(err, "parse prompt template %s", r.Transform.ToText.Prompt)
}
if r.Transform.ToPodcast != nil {
if len(r.Transform.ToPodcast.Speakers) == 0 {
return errors.New("at least one speaker is required for to_podcast")
}
buf := buffer.Get()
defer buffer.Put(buf)
if err := tmpl.Execute(buf, prompt.Builtin); err != nil {
return errors.Wrapf(err, "execute prompt template %s", r.Transform.ToText.Prompt)
}
r.Transform.ToText.promptRendered = buf.String()
r.Transform.ToPodcast.speakers = make([]llm.Speaker, len(r.Transform.ToPodcast.Speakers))
var speakerDescs []string
var speakerNames []string
for i, s := range r.Transform.ToPodcast.Speakers {
if s.Name == "" {
return errors.New("speaker name is required")
}
if s.Voice == "" {
return errors.New("speaker voice is required")
}
r.Transform.ToPodcast.speakers[i] = llm.Speaker{Name: s.Name, Voice: s.Voice}
case ToTextTypeCrawl, ToTextTypeCrawlByJina:
// No specific validation for crawl type here, as the source text itself is the URL.
default:
return errors.Errorf("unknown transform type: %s", r.Transform.ToText.Type)
desc := s.Name
if s.Role != "" {
desc += " (" + s.Role + ")"
}
speakerDescs = append(speakerDescs, desc)
speakerNames = append(speakerNames, s.Name)
}
speakersDesc := "- " + strings.Join(speakerDescs, "\n- ")
exampleSpeaker1 := speakerNames[0]
exampleSpeaker2 := exampleSpeaker1
if len(speakerNames) > 1 {
exampleSpeaker2 = speakerNames[1]
}
promptSegments := []string{
"Please convert the following article into a podcast dialogue script.",
"The speakers are:\n" + speakersDesc,
}
if r.Transform.ToPodcast.EstimateMaximumDuration > 0 {
wordsPerMinute := 200
totalMinutes := int(r.Transform.ToPodcast.EstimateMaximumDuration.Minutes())
estimatedWords := totalMinutes * wordsPerMinute
promptSegments = append(promptSegments, fmt.Sprintf("The script should be approximately %d words to fit within a %d-minute duration. If the original content is not sufficient, the script can be shorter as appropriate.", estimatedWords, totalMinutes))
}
if r.Transform.ToPodcast.TranscriptAdditionalPrompt != "" {
promptSegments = append(promptSegments, "Additional instructions: "+r.Transform.ToPodcast.TranscriptAdditionalPrompt)
}
promptSegments = append(promptSegments,
"The output format MUST be a script where each line starts with the speaker's name followed by a colon and a space.",
"Do NOT include any other text, explanations, or formatting before or after the script.",
"Do NOT use background music in the script.",
"Do NOT include any greetings or farewells (e.g., 'Hello everyone', 'Welcome to our show', 'Goodbye').",
fmt.Sprintf("Example of the required format:\n%s: Today we are discussing the article's main points.\n%s: Let's start with the first one.", exampleSpeaker1, exampleSpeaker2),
"Now, convert the article.",
)
r.Transform.ToPodcast.transcriptPrompt = strings.Join(promptSegments, "\n\n")
r.Transform.ToPodcast.speakersDesc = speakersDesc
}
}
@@ -160,9 +233,10 @@ func (r *Rule) Validate() error { //nolint:cyclop,gocognit,funlen
r.matchRE = re
// Action.
switch r.Action {
case "":
if r.Action == "" {
r.Action = ActionCreateOrUpdateLabel
}
switch r.Action {
case ActionCreateOrUpdateLabel:
if r.Label == "" {
return errors.New("label is required for create or update label action")
@@ -179,7 +253,7 @@ func (r *Rule) From(c *config.RewriteRule) {
r.If = c.If
r.SourceLabel = c.SourceLabel
r.SkipTooShortThreshold = c.SkipTooShortThreshold
if c.Transform != nil {
if c.Transform != nil { //nolint:nestif
t := &Transform{}
if c.Transform.ToText != nil {
toText := &ToText{
@@ -192,6 +266,25 @@ func (r *Rule) From(c *config.RewriteRule) {
}
t.ToText = toText
}
if c.Transform.ToPodcast != nil {
toPodcast := &ToPodcast{
LLM: c.Transform.ToPodcast.LLM,
EstimateMaximumDuration: time.Duration(c.Transform.ToPodcast.EstimateMaximumDuration),
TranscriptAdditionalPrompt: c.Transform.ToPodcast.TranscriptAdditionalPrompt,
TTSLLM: c.Transform.ToPodcast.TTSLLM,
}
if toPodcast.EstimateMaximumDuration == 0 {
toPodcast.EstimateMaximumDuration = 3 * time.Minute
}
for _, s := range c.Transform.ToPodcast.Speakers {
toPodcast.Speakers = append(toPodcast.Speakers, Speaker{
Name: s.Name,
Role: s.Role,
Voice: s.Voice,
})
}
t.ToPodcast = toPodcast
}
r.Transform = t
}
r.Match = c.Match
@@ -203,7 +296,8 @@ func (r *Rule) From(c *config.RewriteRule) {
}
type Transform struct {
ToText *ToText
ToText *ToText
ToPodcast *ToPodcast
}
type ToText struct {
@@ -220,6 +314,24 @@ type ToText struct {
promptRendered string
}
type ToPodcast struct {
LLM string
EstimateMaximumDuration time.Duration
TranscriptAdditionalPrompt string
TTSLLM string
Speakers []Speaker
transcriptPrompt string
speakersDesc string
speakers []llm.Speaker
}
type Speaker struct {
Name string
Role string
Voice string
}
type ToTextType string
const (
@@ -310,13 +422,9 @@ func (r *rewriter) Labels(ctx context.Context, labels model.Labels) (rewritten m
}
// Transform text if configured.
text := sourceText
if rule.Transform != nil && rule.Transform.ToText != nil {
transformed, err := r.transformText(ctx, rule.Transform, sourceText)
if err != nil {
return nil, errors.Wrap(err, "transform text")
}
text = transformed
text, err := r.transform(ctx, rule.Transform, sourceText)
if err != nil {
return nil, errors.Wrap(err, "transform")
}
// Check if text matches the rule.
@@ -338,18 +446,34 @@ func (r *rewriter) Labels(ctx context.Context, labels model.Labels) (rewritten m
return labels, nil
}
func (r *rewriter) transform(ctx context.Context, transform *Transform, sourceText string) (string, error) {
if transform == nil {
return sourceText, nil
}
if transform.ToText != nil {
return r.transformText(ctx, transform.ToText, sourceText)
}
if transform.ToPodcast != nil {
return r.transformPodcast(ctx, transform.ToPodcast, sourceText)
}
return sourceText, nil
}
// transformText transforms text using the configured LLM or by crawling a URL.
func (r *rewriter) transformText(ctx context.Context, transform *Transform, text string) (string, error) {
switch transform.ToText.Type {
func (r *rewriter) transformText(ctx context.Context, toText *ToText, text string) (string, error) {
switch toText.Type {
case ToTextTypeCrawl:
return r.transformTextCrawl(ctx, r.crawler, text)
case ToTextTypeCrawlByJina:
return r.transformTextCrawl(ctx, r.jinaCrawler, text)
case ToTextTypePrompt:
return r.transformTextPrompt(ctx, transform, text)
return r.transformTextPrompt(ctx, toText, text)
default:
return r.transformTextPrompt(ctx, transform, text)
return r.transformTextPrompt(ctx, toText, text)
}
}
@@ -363,13 +487,13 @@ func (r *rewriter) transformTextCrawl(ctx context.Context, crawler crawl.Crawler
}
// transformTextPrompt transforms text using the configured LLM.
func (r *rewriter) transformTextPrompt(ctx context.Context, transform *Transform, text string) (string, error) {
func (r *rewriter) transformTextPrompt(ctx context.Context, toText *ToText, text string) (string, error) {
// Get LLM instance.
llm := r.Dependencies().LLMFactory.Get(transform.ToText.LLM)
llm := r.Dependencies().LLMFactory.Get(toText.LLM)
// Call completion.
result, err := llm.String(ctx, []string{
transform.ToText.promptRendered,
toText.promptRendered,
text, // TODO: consider placing this first to hit the model cache across different rewrite rules.
})
if err != nil {
@@ -388,6 +512,71 @@ func (r *rewriter) transformTextHack(text string) string {
return text
}
var audioKey = func(transcript, ext string) string {
hash := hashutil.Sum64(transcript)
file := strconv.FormatUint(hash, 10) + "." + ext
return "podcasts/" + file
}
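audioKey derives a content-addressed object key: an identical transcript always maps to the same file, which is what lets transformPodcast below skip the TTS call when the object already exists. A stdlib sketch of the same idea, using hash/fnv as a stand-in for the project's hashutil.Sum64 (whose algorithm may differ):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// podcastKey maps a transcript to a deterministic object key, so
// regenerated podcasts can be served from storage instead of re-running
// TTS. FNV-1a here is only a stand-in for the project's hash.
func podcastKey(transcript, ext string) string {
	h := fnv.New64a()
	h.Write([]byte(transcript))
	return "podcasts/" + strconv.FormatUint(h.Sum64(), 10) + "." + ext
}

func main() {
	k := podcastKey("hello world", "wav")
	fmt.Println(k == podcastKey("hello world", "wav")) // prints "true": stable for identical input
}
```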
func (r *rewriter) transformPodcast(ctx context.Context, toPodcast *ToPodcast, sourceText string) (url string, err error) {
transcript, err := r.generateTranscript(ctx, toPodcast, sourceText)
if err != nil {
return "", errors.Wrap(err, "generate podcast transcript")
}
audioKey := audioKey(transcript, "wav")
url, err = r.Dependencies().ObjectStorage.Get(ctx, audioKey)
switch {
case err == nil:
// A previous run may have been canceled by a reload; reuse the stored audio and return early.
return url, nil
case errors.Is(err, object.ErrNotFound):
// Not found, generate new audio.
default:
return "", errors.Wrap(err, "get audio")
}
audioStream, err := r.generateAudio(ctx, toPodcast, transcript)
if err != nil {
return "", errors.Wrap(err, "generate podcast audio")
}
defer func() {
if closeErr := audioStream.Close(); closeErr != nil && err == nil {
err = errors.Wrap(closeErr, "close audio stream")
}
}()
url, err = r.Dependencies().ObjectStorage.Put(ctx, audioKey, audioStream, "audio/wav")
if err != nil {
return "", errors.Wrap(err, "store podcast audio")
}
return url, nil
}
func (r *rewriter) generateTranscript(ctx context.Context, toPodcast *ToPodcast, sourceText string) (string, error) {
transcript, err := r.Dependencies().LLMFactory.Get(toPodcast.LLM).
String(ctx, []string{toPodcast.transcriptPrompt, sourceText})
if err != nil {
return "", errors.Wrap(err, "llm completion")
}
return toPodcast.speakersDesc +
"\n\nFollowed by the actual dialogue script:\n" +
transcript, nil
}
func (r *rewriter) generateAudio(ctx context.Context, toPodcast *ToPodcast, transcript string) (io.ReadCloser, error) {
audioStream, err := r.Dependencies().LLMFactory.Get(toPodcast.TTSLLM).
WAV(ctx, transcript, toPodcast.speakers)
if err != nil {
return nil, errors.Wrap(err, "calling tts llm")
}
return audioStream, nil
}
type mockRewriter struct {
component.Mock
}

View File

@@ -2,6 +2,8 @@ package rewrite
import (
"context"
"io"
"strings"
"testing"
. "github.com/onsi/gomega"
@@ -12,6 +14,7 @@ import (
"github.com/glidea/zenfeed/pkg/component"
"github.com/glidea/zenfeed/pkg/llm"
"github.com/glidea/zenfeed/pkg/model"
"github.com/glidea/zenfeed/pkg/storage/object"
"github.com/glidea/zenfeed/pkg/test"
)
@@ -19,8 +22,9 @@ func TestLabels(t *testing.T) {
RegisterTestingT(t)
type givenDetail struct {
config *Config
llmMock func(m *mock.Mock)
config *Config
llmMock func(m *mock.Mock)
objectStorageMock func(m *mock.Mock)
}
type whenDetail struct {
inputLabels model.Labels
@@ -173,7 +177,7 @@ func TestLabels(t *testing.T) {
},
ThenExpected: thenExpected{
outputLabels: nil,
err: errors.New("transform text: llm completion: LLM failed"),
err: errors.New("transform: llm completion: LLM failed"),
isErr: true,
},
},
@@ -220,22 +224,163 @@ func TestLabels(t *testing.T) {
isErr: false,
},
},
{
Scenario: "Successfully generate podcast from content",
Given: "a rule to convert content to a podcast with all dependencies mocked to succeed",
When: "processing labels with content to be converted to a podcast",
Then: "should return labels with a new podcast_url label",
GivenDetail: givenDetail{
config: &Config{
{
SourceLabel: model.LabelContent,
Transform: &Transform{
ToPodcast: &ToPodcast{
LLM: "mock-llm-transcript",
TTSLLM: "mock-llm-tts",
Speakers: []Speaker{{Name: "narrator", Voice: "alloy"}},
},
},
Action: ActionCreateOrUpdateLabel,
Label: "podcast_url",
},
},
llmMock: func(m *mock.Mock) {
m.On("String", mock.Anything, mock.Anything).Return("This is the podcast script.", nil).Once()
m.On("WAV", mock.Anything, mock.Anything, mock.AnythingOfType("[]llm.Speaker")).
Return(io.NopCloser(strings.NewReader("fake audio data")), nil).Once()
},
objectStorageMock: func(m *mock.Mock) {
m.On("Put", mock.Anything, mock.AnythingOfType("string"), mock.Anything, "audio/wav").
Return("http://storage.example.com/podcast.wav", nil).Once()
m.On("Get", mock.Anything, mock.AnythingOfType("string")).Return("", object.ErrNotFound).Once()
},
},
WhenDetail: whenDetail{
inputLabels: model.Labels{
{Key: model.LabelContent, Value: "This is a long article to be converted into a podcast."},
},
},
ThenExpected: thenExpected{
outputLabels: model.Labels{
{Key: model.LabelContent, Value: "This is a long article to be converted into a podcast."},
{Key: "podcast_url", Value: "http://storage.example.com/podcast.wav"},
},
isErr: false,
},
},
{
Scenario: "Fail podcast generation due to transcription LLM error",
Given: "a rule to convert content to a podcast, but the transcription LLM is mocked to fail",
When: "processing labels",
Then: "should return an error related to transcription failure",
GivenDetail: givenDetail{
config: &Config{
{
SourceLabel: model.LabelContent,
Transform: &Transform{
ToPodcast: &ToPodcast{LLM: "mock-llm-transcript", Speakers: []Speaker{{Name: "narrator", Voice: "alloy"}}},
},
Action: ActionCreateOrUpdateLabel, Label: "podcast_url",
},
},
llmMock: func(m *mock.Mock) {
m.On("String", mock.Anything, mock.Anything).Return("", errors.New("transcript failed")).Once()
},
},
WhenDetail: whenDetail{inputLabels: model.Labels{{Key: model.LabelContent, Value: "article"}}},
ThenExpected: thenExpected{
outputLabels: nil,
err: errors.New("transform: generate podcast transcript: llm completion: transcript failed"),
isErr: true,
},
},
{
Scenario: "Fail podcast generation due to TTS LLM error",
Given: "a rule to convert content to a podcast, but the TTS LLM is mocked to fail",
When: "processing labels",
Then: "should return an error related to TTS failure",
GivenDetail: givenDetail{
config: &Config{
{
SourceLabel: model.LabelContent,
Transform: &Transform{
ToPodcast: &ToPodcast{LLM: "mock-llm-transcript", TTSLLM: "mock-llm-tts", Speakers: []Speaker{{Name: "narrator", Voice: "alloy"}}},
},
Action: ActionCreateOrUpdateLabel, Label: "podcast_url",
},
},
llmMock: func(m *mock.Mock) {
m.On("String", mock.Anything, mock.Anything).Return("script", nil).Once()
m.On("WAV", mock.Anything, mock.Anything, mock.Anything).Return(nil, errors.New("tts failed")).Once()
},
objectStorageMock: func(m *mock.Mock) {
m.On("Get", mock.Anything, mock.AnythingOfType("string")).Return("", object.ErrNotFound).Once()
},
},
WhenDetail: whenDetail{inputLabels: model.Labels{{Key: model.LabelContent, Value: "article"}}},
ThenExpected: thenExpected{
outputLabels: nil,
err: errors.New("transform: generate podcast audio: calling tts llm: tts failed"),
isErr: true,
},
},
{
Scenario: "Fail podcast generation due to object storage error",
Given: "a rule to convert content to a podcast, but object storage is mocked to fail",
When: "processing labels",
Then: "should return an error related to storage failure",
GivenDetail: givenDetail{
config: &Config{
{
SourceLabel: model.LabelContent,
Transform: &Transform{
ToPodcast: &ToPodcast{LLM: "mock-llm-transcript", TTSLLM: "mock-llm-tts", Speakers: []Speaker{{Name: "narrator", Voice: "alloy"}}},
},
Action: ActionCreateOrUpdateLabel, Label: "podcast_url",
},
},
llmMock: func(m *mock.Mock) {
m.On("String", mock.Anything, mock.Anything).Return("script", nil).Once()
m.On("WAV", mock.Anything, mock.Anything, mock.Anything).Return(io.NopCloser(strings.NewReader("fake audio")), nil).Once()
},
objectStorageMock: func(m *mock.Mock) {
m.On("Put", mock.Anything, mock.Anything, mock.Anything, mock.Anything).Return("", errors.New("storage failed")).Once()
m.On("Get", mock.Anything, mock.AnythingOfType("string")).Return("", object.ErrNotFound).Once()
},
},
WhenDetail: whenDetail{inputLabels: model.Labels{{Key: model.LabelContent, Value: "article"}}},
ThenExpected: thenExpected{
outputLabels: nil,
err: errors.New("transform: store podcast audio: storage failed"),
isErr: true,
},
},
}
for _, tt := range tests {
t.Run(tt.Scenario, func(t *testing.T) {
// Given.
var mockLLMFactory llm.Factory
var mockInstance *mock.Mock // Store the mock instance for assertion
// Create mock factory and capture the mock.Mock instance.
mockOption := component.MockOption(func(m *mock.Mock) {
mockInstance = m // Capture the mock instance.
var mockLLMInstance *mock.Mock
llmMockOption := component.MockOption(func(m *mock.Mock) {
mockLLMInstance = m
if tt.GivenDetail.llmMock != nil {
tt.GivenDetail.llmMock(m)
}
})
mockLLMFactory, err := llm.NewFactory("", nil, llm.FactoryDependencies{}, mockOption) // Use the factory directly with the option
mockLLMFactory, err := llm.NewFactory("", nil, llm.FactoryDependencies{}, llmMockOption)
Expect(err).NotTo(HaveOccurred())
var mockObjectStorage object.Storage
var mockObjectStorageInstance *mock.Mock
objectStorageMockOption := component.MockOption(func(m *mock.Mock) {
mockObjectStorageInstance = m
if tt.GivenDetail.objectStorageMock != nil {
tt.GivenDetail.objectStorageMock(m)
}
})
mockObjectStorageFactory := object.NewFactory(objectStorageMockOption)
mockObjectStorage, err = mockObjectStorageFactory.New("test", nil, object.Dependencies{})
Expect(err).NotTo(HaveOccurred())
// Manually validate config to compile regex and render templates.
@@ -252,7 +397,8 @@ func TestLabels(t *testing.T) {
Instance: "test",
Config: tt.GivenDetail.config,
Dependencies: Dependencies{
LLMFactory: mockLLMFactory, // Pass the mock factory
LLMFactory: mockLLMFactory, // Pass the mock factory
ObjectStorage: mockObjectStorage,
},
}),
}
@@ -280,10 +426,12 @@ func TestLabels(t *testing.T) {
Expect(outputLabels).To(Equal(tt.ThenExpected.outputLabels))
}
// Verify LLM calls if stubs were provided.
if tt.GivenDetail.llmMock != nil && mockInstance != nil {
// Assert expectations on the captured mock instance.
mockInstance.AssertExpectations(t)
// Verify mock calls if stubs were provided.
if tt.GivenDetail.llmMock != nil && mockLLMInstance != nil {
mockLLMInstance.AssertExpectations(t)
}
if tt.GivenDetail.objectStorageMock != nil && mockObjectStorageInstance != nil {
mockObjectStorageInstance.AssertExpectations(t)
}
})
}

View File

@@ -80,6 +80,7 @@ func (c *Config) From(app *config.App) {
URL: app.Scrape.Sources[i].RSS.URL,
RSSHubEndpoint: app.Scrape.RSSHubEndpoint,
RSSHubRoutePath: app.Scrape.Sources[i].RSS.RSSHubRoutePath,
RSSHubAccessKey: app.Scrape.RSSHubAccessKey,
}
}
}
@@ -216,6 +217,10 @@ func (m *manager) reload(config *Config) (err error) {
func (m *manager) runOrRestartScrapers(config *Config, newScrapers map[string]scraper.Scraper) error {
for i := range config.Scrapers {
c := &config.Scrapers[i]
if err := c.Validate(); err != nil {
return errors.Wrapf(err, "validate scraper %s", c.Name)
}
if err := m.runOrRestartScraper(c, newScrapers); err != nil {
return errors.Wrapf(err, "run or restart scraper %s", c.Name)
}

View File

@@ -33,6 +33,7 @@ type ScrapeSourceRSS struct {
URL string
RSSHubEndpoint string
RSSHubRoutePath string
RSSHubAccessKey string
}
func (c *ScrapeSourceRSS) Validate() error {
@@ -46,9 +47,22 @@ func (c *ScrapeSourceRSS) Validate() error {
return errors.New("URL must be a valid HTTP/HTTPS URL")
}
// Append access key as query parameter if provided
c.appendAccessKey()
return nil
}
func (c *ScrapeSourceRSS) appendAccessKey() {
if c.RSSHubEndpoint != "" && c.RSSHubAccessKey != "" && !strings.Contains(c.URL, "key=") {
if strings.Contains(c.URL, "?") {
c.URL += "&key=" + c.RSSHubAccessKey
} else {
c.URL += "?key=" + c.RSSHubAccessKey
}
}
}
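appendAccessKey above concatenates the key directly, which works for simple alphanumeric keys but performs no URL encoding and treats any `key=` substring as "already present". A hedged alternative using net/url — a sketch, not the code in this commit — could look like:

```go
package main

import (
	"fmt"
	"net/url"
)

// appendKey adds key=<accessKey> to rawURL unless a "key" query
// parameter already exists, URL-encoding the value. Illustrative
// alternative to the plain string concatenation in appendAccessKey.
func appendKey(rawURL, accessKey string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	if q.Has("key") {
		return rawURL, nil // already present; leave untouched
	}
	q.Set("key", accessKey)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, _ := appendKey("http://rsshub.app/_/test", "secret")
	fmt.Println(s) // prints "http://rsshub.app/_/test?key=secret"
}
```

Note that url.Values.Encode sorts parameters alphabetically, so the rewritten URL may reorder existing query parameters; the concatenation approach in the commit preserves them verbatim.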
// --- Factory code block ---
func newRSSReader(config *ScrapeSourceRSS) (reader, error) {
if err := config.Validate(); err != nil {

View File

@@ -122,6 +122,55 @@ func TestNewRSS(t *testing.T) {
},
},
},
{
Scenario: "Valid Configuration - RSSHub with Access Key",
Given: "a valid configuration with RSSHub details and access key",
When: "creating a new RSS reader",
Then: "should succeed, construct the URL with access key, and return a valid reader",
GivenDetail: givenDetail{
config: &ScrapeSourceRSS{
RSSHubEndpoint: "http://rsshub.app/",
RSSHubRoutePath: "/_/test",
RSSHubAccessKey: "testkey",
},
},
WhenDetail: whenDetail{},
ThenExpected: thenExpected{
wantErr: false,
validateFunc: func(t *testing.T, r reader) {
Expect(r).NotTo(BeNil())
rssReader, ok := r.(*rssReader)
Expect(ok).To(BeTrue())
Expect(rssReader.config.URL).To(Equal("http://rsshub.app/_/test?key=testkey"))
Expect(rssReader.config.RSSHubEndpoint).To(Equal("http://rsshub.app/"))
Expect(rssReader.config.RSSHubRoutePath).To(Equal("/_/test"))
Expect(rssReader.config.RSSHubAccessKey).To(Equal("testkey"))
},
},
},
{
Scenario: "Valid Configuration - URL with Access Key",
Given: "a valid configuration with URL and access key",
When: "creating a new RSS reader",
Then: "should succeed, append access key to URL, and return a valid reader",
GivenDetail: givenDetail{
config: &ScrapeSourceRSS{
URL: "http://example.com/feed",
RSSHubAccessKey: "testkey",
},
},
WhenDetail: whenDetail{},
ThenExpected: thenExpected{
wantErr: false,
validateFunc: func(t *testing.T, r reader) {
Expect(r).NotTo(BeNil())
rssReader, ok := r.(*rssReader)
Expect(ok).To(BeTrue())
Expect(rssReader.config.URL).To(Equal("http://example.com/feed"))
Expect(rssReader.config.RSSHubAccessKey).To(Equal("testkey"))
},
},
},
}
// --- Run tests ---


@@ -69,6 +69,11 @@ func (c *Config) Validate() error {
if c.Name == "" {
return errors.New("name cannot be empty")
}
if c.RSS != nil {
if err := c.RSS.Validate(); err != nil {
return errors.Wrap(err, "invalid RSS config")
}
}
return nil
}


@@ -244,7 +244,7 @@ func TestNew(t *testing.T) {
WhenDetail: whenDetail{},
ThenExpected: thenExpected{
isErr: true,
wantErrMsg: "creating source: invalid RSS config: URL must be a valid HTTP/HTTPS URL", // Error from newRSSReader via newReader
wantErrMsg: "invalid RSS config: URL must be a valid HTTP/HTTPS URL", // Error from newRSSReader via newReader
},
},
{
@@ -264,7 +264,7 @@ func TestNew(t *testing.T) {
WhenDetail: whenDetail{},
ThenExpected: thenExpected{
isErr: true,
wantErrMsg: "creating source: source not supported", // Error from newReader
wantErrMsg: "source not supported", // Error from newReader
},
},
}


@@ -0,0 +1,229 @@
// Copyright (C) 2025 wangyusong
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package object
import (
"context"
"io"
"net/url"
"strings"
"github.com/minio/minio-go/v7"
"github.com/minio/minio-go/v7/pkg/credentials"
"github.com/pkg/errors"
"github.com/glidea/zenfeed/pkg/component"
"github.com/glidea/zenfeed/pkg/config"
"github.com/glidea/zenfeed/pkg/telemetry"
"github.com/glidea/zenfeed/pkg/telemetry/log"
telemetrymodel "github.com/glidea/zenfeed/pkg/telemetry/model"
)
// --- Interface code block ---
type Storage interface {
component.Component
config.Watcher
Put(ctx context.Context, key string, body io.Reader, contentType string) (url string, err error)
Get(ctx context.Context, key string) (url string, err error)
}
var ErrNotFound = errors.New("not found")
type Config struct {
Endpoint string
AccessKeyID string
SecretAccessKey string
client *minio.Client
Bucket string
BucketURL string
bucketURL *url.URL
}
func (c *Config) Validate() error {
if c.Empty() {
return nil
}
if c.Endpoint == "" {
return errors.New("endpoint is required")
}
c.Endpoint = strings.TrimPrefix(c.Endpoint, "https://") // S3 endpoint should not have https:// prefix.
c.Endpoint = strings.TrimPrefix(c.Endpoint, "http://")
if c.AccessKeyID == "" {
return errors.New("access key id is required")
}
if c.SecretAccessKey == "" {
return errors.New("secret access key is required")
}
client, err := minio.New(c.Endpoint, &minio.Options{
Creds: credentials.NewStaticV4(c.AccessKeyID, c.SecretAccessKey, ""),
Secure: true,
})
if err != nil {
return errors.Wrap(err, "new minio client")
}
c.client = client
if c.Bucket == "" {
return errors.New("bucket is required")
}
if c.BucketURL == "" {
return errors.New("bucket url is required")
}
u, err := url.Parse(c.BucketURL)
if err != nil {
return errors.Wrap(err, "parse bucket url")
}
c.bucketURL = u
return nil
}
func (c *Config) From(app *config.App) *Config {
*c = Config{
Endpoint: app.Storage.Object.Endpoint,
AccessKeyID: app.Storage.Object.AccessKeyID,
SecretAccessKey: app.Storage.Object.SecretAccessKey,
Bucket: app.Storage.Object.Bucket,
BucketURL: app.Storage.Object.BucketURL,
}
return c
}
func (c *Config) Empty() bool {
return c.Endpoint == "" && c.AccessKeyID == "" && c.SecretAccessKey == "" && c.Bucket == "" && c.BucketURL == ""
}
type Dependencies struct{}
// --- Factory code block ---
type Factory component.Factory[Storage, config.App, Dependencies]
func NewFactory(mockOn ...component.MockOption) Factory {
if len(mockOn) > 0 {
return component.FactoryFunc[Storage, config.App, Dependencies](
func(instance string, config *config.App, dependencies Dependencies) (Storage, error) {
m := &mockStorage{}
component.MockOptions(mockOn).Apply(&m.Mock)
return m, nil
},
)
}
return component.FactoryFunc[Storage, config.App, Dependencies](new)
}
func new(instance string, app *config.App, dependencies Dependencies) (Storage, error) {
config := &Config{}
config.From(app)
if err := config.Validate(); err != nil {
return nil, errors.Wrap(err, "validate config")
}
return &s3{
Base: component.New(&component.BaseConfig[Config, Dependencies]{
Name: "ObjectStorage",
Instance: instance,
Config: config,
Dependencies: dependencies,
}),
}, nil
}
// --- Implementation code block ---
type s3 struct {
*component.Base[Config, Dependencies]
}
func (s *s3) Put(ctx context.Context, key string, body io.Reader, contentType string) (publicURL string, err error) {
ctx = telemetry.StartWith(ctx, append(s.TelemetryLabels(), telemetrymodel.KeyOperation, "Put")...)
defer func() { telemetry.End(ctx, err) }()
config := s.Config()
if config.Empty() {
return "", errors.New("not configured")
}
if _, err := config.client.PutObject(ctx, config.Bucket, key, body, -1, minio.PutObjectOptions{
ContentType: contentType,
}); err != nil {
return "", errors.Wrap(err, "put object")
}
return config.bucketURL.JoinPath(key).String(), nil
}
func (s *s3) Get(ctx context.Context, key string) (publicURL string, err error) {
ctx = telemetry.StartWith(ctx, append(s.TelemetryLabels(), telemetrymodel.KeyOperation, "Get")...)
defer func() { telemetry.End(ctx, err) }()
config := s.Config()
if config.Empty() {
return "", errors.New("not configured")
}
if _, err := config.client.StatObject(ctx, config.Bucket, key, minio.StatObjectOptions{}); err != nil {
errResponse := minio.ToErrorResponse(err)
if errResponse.Code == minio.NoSuchKey {
return "", ErrNotFound
}
return "", errors.Wrap(err, "stat object")
}
return config.bucketURL.JoinPath(key).String(), nil
}
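Both Put and Get derive the public URL by joining the configured bucket URL with the object key. The same construction can be sketched stand-alone (the `publicURL` helper name is hypothetical; `url.URL.JoinPath` requires Go 1.19+):

```go
package main

import (
	"fmt"
	"net/url"
)

// publicURL joins a bucket base URL and an object key,
// mirroring config.bucketURL.JoinPath(key) above.
// Illustrative helper, not part of the repo.
func publicURL(bucketURL, key string) (string, error) {
	u, err := url.Parse(bucketURL)
	if err != nil {
		return "", err
	}
	return u.JoinPath(key).String(), nil
}

func main() {
	s, _ := publicURL("https://cdn.example.com/podcasts", "ep1.wav")
	fmt.Println(s) // https://cdn.example.com/podcasts/ep1.wav
}
```

Parsing `BucketURL` once in `Validate` and reusing the cached `*url.URL` avoids re-parsing on every request.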
func (s *s3) Reload(app *config.App) (err error) {
ctx := telemetry.StartWith(s.Context(), append(s.TelemetryLabels(), telemetrymodel.KeyOperation, "Reload")...)
defer func() { telemetry.End(ctx, err) }()
newConfig := &Config{}
newConfig.From(app)
if err := newConfig.Validate(); err != nil {
return errors.Wrap(err, "validate config")
}
s.SetConfig(newConfig)
log.Info(ctx, "object storage reloaded")
return nil
}
// --- Mock code block ---
type mockStorage struct {
component.Mock
}
func (m *mockStorage) Put(ctx context.Context, key string, body io.Reader, contentType string) (string, error) {
args := m.Called(ctx, key, body, contentType)
return args.String(0), args.Error(1)
}
func (m *mockStorage) Get(ctx context.Context, key string) (string, error) {
args := m.Called(ctx, key)
return args.String(0), args.Error(1)
}
func (m *mockStorage) Reload(app *config.App) error {
args := m.Called(app)
return args.Error(0)
}


@@ -122,6 +122,32 @@ func ReadUint32(r io.Reader) (uint32, error) {
return binary.LittleEndian.Uint32(b), nil
}
// WriteUint16 writes a uint16 using a pooled buffer.
func WriteUint16(w io.Writer, v uint16) error {
bp := smallBufPool.Get().(*[]byte)
defer smallBufPool.Put(bp)
b := *bp
binary.LittleEndian.PutUint16(b, v)
_, err := w.Write(b[:2])
return err
}
// ReadUint16 reads a uint16 using a pooled buffer.
func ReadUint16(r io.Reader) (uint16, error) {
bp := smallBufPool.Get().(*[]byte)
defer smallBufPool.Put(bp)
b := (*bp)[:2]
// Read exactly 2 bytes into the slice.
if _, err := io.ReadFull(r, b); err != nil {
return 0, errors.Wrap(err, "read uint16")
}
return binary.LittleEndian.Uint16(b), nil
}
// WriteFloat32 writes a float32 using a pooled buffer.
func WriteFloat32(w io.Writer, v float32) error {
return WriteUint32(w, math.Float32bits(v))

pkg/util/wav/wav.go (new file, 100 lines)

@@ -0,0 +1,100 @@
// Copyright (C) 2025 wangyusong
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package wav
import (
"io"
"github.com/pkg/errors"
binaryutil "github.com/glidea/zenfeed/pkg/util/binary"
)
// Header contains the WAV header information.
type Header struct {
SampleRate uint32
BitDepth uint16
NumChannels uint16
}
// WriteHeader writes the WAV header to a writer.
// pcmDataSize is the size of the raw PCM data in bytes.
func WriteHeader(w io.Writer, h *Header, pcmDataSize uint32) error {
// RIFF Header.
if err := writeRIFFHeader(w, pcmDataSize); err != nil {
return errors.Wrap(err, "write RIFF header")
}
// fmt chunk.
if err := writeFMTChunk(w, h); err != nil {
return errors.Wrap(err, "write fmt chunk")
}
// data chunk.
if _, err := w.Write([]byte("data")); err != nil {
return errors.Wrap(err, "write data chunk marker")
}
if err := binaryutil.WriteUint32(w, pcmDataSize); err != nil {
return errors.Wrap(err, "write pcm data size")
}
return nil
}
func writeRIFFHeader(w io.Writer, pcmDataSize uint32) error {
if _, err := w.Write([]byte("RIFF")); err != nil {
return errors.Wrap(err, "write RIFF")
}
if err := binaryutil.WriteUint32(w, uint32(36+pcmDataSize)); err != nil {
return errors.Wrap(err, "write file size")
}
if _, err := w.Write([]byte("WAVE")); err != nil {
return errors.Wrap(err, "write WAVE")
}
return nil
}
func writeFMTChunk(w io.Writer, h *Header) error {
if _, err := w.Write([]byte("fmt ")); err != nil {
return errors.Wrap(err, "write fmt")
}
if err := binaryutil.WriteUint32(w, uint32(16)); err != nil { // PCM chunk size.
return errors.Wrap(err, "write pcm chunk size")
}
if err := binaryutil.WriteUint16(w, uint16(1)); err != nil { // PCM format.
return errors.Wrap(err, "write pcm format")
}
if err := binaryutil.WriteUint16(w, h.NumChannels); err != nil {
return errors.Wrap(err, "write num channels")
}
if err := binaryutil.WriteUint32(w, h.SampleRate); err != nil {
return errors.Wrap(err, "write sample rate")
}
byteRate := h.SampleRate * uint32(h.NumChannels) * uint32(h.BitDepth) / 8
if err := binaryutil.WriteUint32(w, byteRate); err != nil {
return errors.Wrap(err, "write byte rate")
}
blockAlign := h.NumChannels * h.BitDepth / 8
if err := binaryutil.WriteUint16(w, blockAlign); err != nil {
return errors.Wrap(err, "write block align")
}
if err := binaryutil.WriteUint16(w, h.BitDepth); err != nil {
return errors.Wrap(err, "write bit depth")
}
return nil
}
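The two derived fmt-chunk fields follow directly from the header: byteRate = SampleRate × NumChannels × BitDepth/8, and blockAlign = NumChannels × BitDepth/8. A minimal check of that arithmetic (the `derived` helper is illustrative, not part of the package):

```go
package main

import "fmt"

// derived computes the two fmt-chunk fields that writeFMTChunk
// derives from the user-supplied header values.
func derived(sampleRate uint32, channels, bitDepth uint16) (byteRate uint32, blockAlign uint16) {
	byteRate = sampleRate * uint32(channels) * uint32(bitDepth) / 8
	blockAlign = channels * bitDepth / 8
	return
}

func main() {
	// CD-quality stereo: 44100 Hz, 2 channels, 16-bit.
	br, ba := derived(44100, 2, 16)
	fmt.Println(br, ba) // 176400 4
}
```

These values match the `ByteRate = 176400` and `BlockAlign = 4` bytes asserted in the CD-quality test case below.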

pkg/util/wav/wav_test.go (new file, 161 lines)

@@ -0,0 +1,161 @@
// Copyright (C) 2025 wangyusong
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
package wav
import (
"bytes"
"testing"
. "github.com/onsi/gomega"
"github.com/glidea/zenfeed/pkg/test"
)
func TestWriteHeader(t *testing.T) {
RegisterTestingT(t)
type givenDetail struct{}
type whenDetail struct {
header *Header
pcmDataSize uint32
}
type thenExpected struct {
expectedBytes []byte
expectError bool
}
tests := []test.Case[givenDetail, whenDetail, thenExpected]{
{
Scenario: "Standard CD quality audio",
Given: "a header for CD quality audio and a non-zero data size",
When: "writing the header",
Then: "should produce a valid 44-byte WAV header and no error",
GivenDetail: givenDetail{},
WhenDetail: whenDetail{
header: &Header{
SampleRate: 44100,
BitDepth: 16,
NumChannels: 2,
},
pcmDataSize: 176400,
},
ThenExpected: thenExpected{
expectedBytes: []byte{
'R', 'I', 'F', 'F',
0x34, 0xB1, 0x02, 0x00, // ChunkSize = 36 + 176400 = 176436
'W', 'A', 'V', 'E',
'f', 'm', 't', ' ',
0x10, 0x00, 0x00, 0x00, // Subchunk1Size = 16
0x01, 0x00, // AudioFormat = 1 (PCM)
0x02, 0x00, // NumChannels = 2
0x44, 0xAC, 0x00, 0x00, // SampleRate = 44100
0x10, 0xB1, 0x02, 0x00, // ByteRate = 176400
0x04, 0x00, // BlockAlign = 4
0x10, 0x00, // BitsPerSample = 16
'd', 'a', 't', 'a',
0x10, 0xB1, 0x02, 0x00, // Subchunk2Size = 176400
},
expectError: false,
},
},
{
Scenario: "Mono audio for speech",
Given: "a header for mono speech audio and a non-zero data size",
When: "writing the header",
Then: "should produce a valid 44-byte WAV header and no error",
GivenDetail: givenDetail{},
WhenDetail: whenDetail{
header: &Header{
SampleRate: 16000,
BitDepth: 16,
NumChannels: 1,
},
pcmDataSize: 32000,
},
ThenExpected: thenExpected{
expectedBytes: []byte{
'R', 'I', 'F', 'F',
0x24, 0x7D, 0x00, 0x00, // ChunkSize = 36 + 32000 = 32036
'W', 'A', 'V', 'E',
'f', 'm', 't', ' ',
0x10, 0x00, 0x00, 0x00, // Subchunk1Size = 16
0x01, 0x00, // AudioFormat = 1
0x01, 0x00, // NumChannels = 1
0x80, 0x3E, 0x00, 0x00, // SampleRate = 16000
0x00, 0x7D, 0x00, 0x00, // ByteRate = 32000
0x02, 0x00, // BlockAlign = 2
0x10, 0x00, // BitsPerSample = 16
'd', 'a', 't', 'a',
0x00, 0x7D, 0x00, 0x00, // Subchunk2Size = 32000
},
expectError: false,
},
},
{
Scenario: "8-bit mono audio with zero data size",
Given: "a header for 8-bit mono audio and a zero data size",
When: "writing the header for an empty file",
Then: "should produce a valid 44-byte WAV header with data size 0",
GivenDetail: givenDetail{},
WhenDetail: whenDetail{
header: &Header{
SampleRate: 8000,
BitDepth: 8,
NumChannels: 1,
},
pcmDataSize: 0,
},
ThenExpected: thenExpected{
expectedBytes: []byte{
'R', 'I', 'F', 'F',
0x24, 0x00, 0x00, 0x00, // ChunkSize = 36 + 0 = 36
'W', 'A', 'V', 'E',
'f', 'm', 't', ' ',
0x10, 0x00, 0x00, 0x00, // Subchunk1Size = 16
0x01, 0x00, // AudioFormat = 1
0x01, 0x00, // NumChannels = 1
0x40, 0x1F, 0x00, 0x00, // SampleRate = 8000
0x40, 0x1F, 0x00, 0x00, // ByteRate = 8000
0x01, 0x00, // BlockAlign = 1
0x08, 0x00, // BitsPerSample = 8
'd', 'a', 't', 'a',
0x00, 0x00, 0x00, 0x00, // Subchunk2Size = 0
},
expectError: false,
},
},
}
for _, tt := range tests {
t.Run(tt.Scenario, func(t *testing.T) {
// Given.
var buf bytes.Buffer
// When.
err := WriteHeader(&buf, tt.WhenDetail.header, tt.WhenDetail.pcmDataSize)
// Then.
if tt.ThenExpected.expectError {
Expect(err).To(HaveOccurred())
} else {
Expect(err).NotTo(HaveOccurred())
Expect(buf.Bytes()).To(Equal(tt.ThenExpected.expectedBytes))
}
})
}
}