Google Omni Video Model Leaked Preview: Full Breakdown and Happy Horse vs Seedance 2 Comparison Tests

Video Information

Item	Details
Title	Google Omni is INSANE! (Full-preview)
Tutorial number	09
Video ID	B43xOmhxgtk
Duration	26:24
Channel	AI Samson
Upload date	2026-05-13
Link	https://www.youtube.com/watch?v=B43xOmhxgtk
Topic	Google Omni video model leaked preview; in-depth Happy Horse vs Seedance 2 comparison

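Introduction

“Nobody was supposed to see this leaked preview from Google. And it’s possibly the best AI video model we’ve ever seen.”

A leaked preview that was never meant to be seen has set the AI video world abuzz. A Gemini user noticed that the attribution label on generated videos had quietly changed from “VO 3.1” (Veo 3.1) to “Google Omni”. This suggests that Google is working on a new multimodal video model, and judging by the leaked demos, it may outperform the current competition.

In this video, AI Samson analyzes the details of the Google Omni leak and its striking demos, then pits two of today’s top AI video models, Alibaba’s Happy Horse and Seedance 2, against each other in a head-to-head showdown. Using a range of carefully designed scenario prompts, he tests text-to-video, image-to-video, audio sync, character rendering, motion realism, and more, and gives a verdict on each model’s strengths and weaknesses.

If you want to follow the cutting edge of AI video generation and work out which model best suits your creative needs, this tutorial covers both.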

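Phase 1 - The Google Omni Leak: What Happened?

How the Leak Was Discovered

It started simply: a user working in Gemini noticed that the attribution label for video creation had changed.

“One user was using Gemini and they noticed that the attribution to video creation had changed from VO 3.1 to Google Omni.”

The label that used to read “VO 3.1” (the transcript’s rendering of Google’s Veo 3.1 model) now read “Google Omni”. More notably, the UI also showed some new options, including the ability to edit videos with simple text prompts, similar to Google’s earlier image editing tool known as “nano banana”.

“Some are calling it the nano banana for video.”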

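What the “Omni” Name Implies

The name “Omni” itself gives away a key detail: it signals a multimodal video model.

“This refers to the fact that it can be a multimodal video model, which means that we can use different types of content to inform our video creation.”

In practice, this means you could:
- Upload an existing video as reference material
- Upload only an audio track and have the model generate a matching video
- Edit an existing video with a simple text prompt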

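Why Did It Leak Now?

“One theory is that Google IO, their yearly conference, is starting in just a week. And it’s likely we’re going to get an official release then.”

AI Samson does not think the timing is a coincidence. Google I/O, the company’s annual conference, starts in about a week, and the model will likely be officially released there. The conference is also expected to bring powerful new agentic workflows, along with enhanced image, video, and audio models.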

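Phase 2 - Google Omni Demos: In-Depth Analysis

Demo 1: Deriving Math on a Chalkboard

The first leaked demo video shows a professor deriving a formula on a chalkboard, starting from the basic trigonometric identity sin²θ + cos²θ = 1 and arriving at an identity for the tangent function.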
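
The video does not spell out which tangent identity ends up on the board. A likely candidate, assuming the standard textbook derivation, is the Pythagorean identity for tangent, obtained by dividing both sides by cos²θ (valid wherever cos θ ≠ 0):

```latex
\begin{align*}
\sin^2\theta + \cos^2\theta &= 1 \\
\frac{\sin^2\theta}{\cos^2\theta} + \frac{\cos^2\theta}{\cos^2\theta} &= \frac{1}{\cos^2\theta} \\
\tan^2\theta + 1 &= \sec^2\theta
\end{align*}
```

Each line adds new symbols (fraction bars, exponents, an equals sign), which is what makes this a demanding test of character-level text fidelity.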

“Why this is particularly interesting is because of the level of text fidelity. You can see that we have an incredible amount of accuracy for each of the characters.”

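AI Samson focuses on two aspects:

Text fidelity: every character on the chalkboard is rendered with remarkable accuracy, which has long been a major problem for AI video models
Hand motion tracking: the path of the chalk closely matches the movement of the hand

“Previously, it’s been a real challenge for AI video to accurately track the movements of a hand as it’s implicating onto a surface.”

The demo is not perfect, though. AI Samson spotted one small flaw:

“There is just one moment that I spotted where this doesn’t quite happen, and that’s on the second stroke of this equal sign… it appears instead of gets drawn. It’s almost like he’s using magic to put on the equal sign.”

The second stroke of the equals sign is not drawn; it simply appears, as if by magic. Overall, though, this is still a major step forward.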

Comparison with Seedance 2

To gauge how good Google Omni is, AI Samson ran the exact same prompt through Seedance 2.

“The writing is much less accurate. The way he’s moving his hand across the chalkboard does not make as much sense as it does in the Google Omni model.”

The conclusion is clear: for this specific use case of text fidelity and hand motion tracking, Google Omni clearly outperforms Seedance 2.

Demo 2: Seaside Dining Scene

The second demo shows two people eating pasta by the sea. AI Samson singles out eating, an action that has long been a nightmare for AI video:

“Eating has also been one of the hardest challenges that AI has faced over the years, going from Will Smith eating spaghetti like this, to now we have two individuals who are more than capable of coherently eating a nice bowl of spaghetti bolognese.”

From the classic uncanny “Will Smith eating spaghetti” clip to two people coherently eating a bowl of pasta, the progress is enormous.

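Phase 3 - Google Omni’s Video Editing Potential

Text-Driven Video Editing

One of the most exciting features of Google Omni is the ability to edit video directly with simple text prompts.

“The ability to edit videos with simple text prompts. That could mean simply changing the lighting or adding in different elements.”

This means you could:
- Change a video’s lighting
- Add different visual elements
- Add special effects
- Apply advanced color grading

“This opens up the door for being able to use our existing real life footage and using AI video to directly enhance it in specific ways, like adding special effects and superior color grading.”

This opens a new door: you can take real-world footage and use an AI video model to enhance specific aspects of it, rather than generating everything from scratch.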

Phase 4 - Happy Horse vs Seedance 2: Text-to-Video Showdown

The Competitive Landscape at a Glance

While we wait for Google Omni’s official release, the two main contenders on the market are Alibaba’s Happy Horse and Seedance 2.

“On the Artificial Analysis text-to-video leaderboard, this new model, Happy Horse by Alibaba, is leading the way. It is number one by quite a significant margin.”

On the arena.ai image-to-video leaderboard, however, the picture is different:

“On the image-to-video arena, it’s a much more close-run race and it’s just edged out by Seedance 2.”

So which one is actually stronger? AI Samson designed a series of demanding test scenarios to find out.

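Test 1: Post-Apocalyptic Pianist

Test focus: audio sync, realism of the playing motion

Happy Horse’s result:

“It does look like he might need to move his hands just a little bit more to really recreate the piece of music that we’re hearing.”

Audio and picture are broadly in sync and the pianist’s playing looks plausible, but his hands do not move enough to match the music.

Seedance 2’s result:

“The downside with this version of Seedance is that we did not get any music out at all.”

Seedance 2 generated no music at all. Visually the two are about even, but on audio sync Happy Horse wins outright.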

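Test 2: The Breathing Library (Surreal Scene)

Test focus: believability of surreal effects

Happy Horse:

“It gives us a lovely whimsical feel to the piece… but the walking gait of the characters is too similar. It almost seems like they’re carbon copies of each other.”

The footage has a lovely whimsical feel, but there are two problems: the lettering that flies out of the books is gibberish, and the characters’ walking gaits are so similar that they look copy-pasted.

Seedance 2:

“The atmosphere of this piece is certainly more eloquent, the camera movement is gracefully floating across the scene.”

Seedance 2’s atmosphere is more elegant and its camera movement smoother. It also has an amusing continuity slip:

“He stamps the book and then closes it and suddenly this one book turns into two books.”

One book closes and becomes two. Object consistency remains a stubborn problem for AI video.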

Test 3: Subway Samurai (Realism Test)

Test focus: realism, cinematic feel, detail

“Happy Horse is doing pretty well here. There’s nothing we’d particularly pull out that is unusual or unrealistic.”

Happy Horse does well here, with nothing noticeably unrealistic.

Seedance 2’s problem:

“The audio does not quite suit the piece as accurately… it almost feels like it speeds up here and then slows down.”

In this scene Seedance 2’s audio is a poor match, and the pacing is off, speeding up and then slowing down. Happy Horse narrowly takes this round.

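Test 4: Glowing Crab Migration (Texture and Cinematic Feel)

Test focus: texture rendering, visual style

Happy Horse does poorly in this scene:

“The aesthetic of this piece suddenly has this very generic AI video style… everything almost flattens itself into a graphic art piece.”

The result is oversaturated with low-quality textures, the kind of AI video look that is obviously fake at a glance.

Seedance 2, by contrast, wins decisively:

“Seedance here is vastly superior. Not only are the textures more realistic, the detailed rendering as well as a more cinematic style of camera work, the entire piece knocks Happy Horse out of the water.”

More realistic textures, richer detail, and more cinematic camera work: a clear win across the board.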

Test 5: Futuristic Running Shoe Ad (Sound Effect Sync)

Test focus: sound effect timing, visual detail

This test reveals the core difference between the two models.

Happy Horse excels on sound effects:

“We are getting perfectly timed sound effects of splashing water as it comes up from the road.”

Every splashing footstep lines up with the picture.

Seedance 2 is stronger visually:

“The detail on Seedance 2 here is just much more crisp and cinematic. We get this wonderful close-up, and these individual drops rendered out in this slow-motion spray.”

But Seedance 2’s sound effects have a serious problem:

“This first footstep is very good, but the second one is almost silent. It creates no rhythm and no consistency.”

The first footstep sounds right, but the second is almost silent, so the rhythm falls apart.

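Test 6: Chess on a Plane (Multi-Character Rendering)

Test focus: character variety, detail consistency

Seedance 2:

“There is pretty good variety in the likeness of all the characters… but some of the postures are just too similar, like these two gents are extremely mirrored.”

The characters’ appearances show some variety, but their postures are too mirrored, which breaks the realism.

Happy Horse:

“We also have the same issue of quite a few very exact postures. We have a little bit more variety in the outfits.”

It has the same problem of repeated postures, though with slightly more variety in outfits. Small text rendering is also flawed, for example an incorrect number “23”. AI Samson leans slightly toward Happy Horse in this round.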

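Phase 5 - The Real Potential of AI Video: Lessons from 100 Million Views

Wedg U Studio’s Viral Video

Beyond the technical comparison, AI Samson shares a striking case: an AI short from Wedg U Studio that drew more than 100 million views in a week.

“He created this AI video last week that has gone on to get more than 100 million views in just a week.”

The video shows an asteroid striking a river and triggering a huge tsunami, while in the foreground a quarreling couple turns to a passionate kiss as the end of the world arrives.

“It speaks to perhaps something deeper about human psychology… facing impending doom, we can recognize that we can put our differences aside and focus on love.”

The case supports a core point:

“AI video is still challenging to get exceptional results out, but it is possible with the right process to get truly meaningful works of media.”

The value of AI video lies not in technical perfection but in whether it can tell a story that moves people.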

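Phase 6 - More Comparison Tests: Performance and Motion

Alien Hotel Scene (Facial Expression and Acting)

Seedance 2:

“He looks somewhat motivated, but also disdained about his tough life. He’s interacting with the aliens with a good sense of meaning.”

The expressions are rich and the acting believable, but there is a continuity error: the character walks out of the building and then appears at the front desk in the next shot.

Happy Horse:

“The man’s acting, I would say, is less performative. It’s less interesting.”

The performance is weaker and less interesting. Seedance 2 narrowly takes this round.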

The Lonely Conductor (Emotional Expression)

This test reveals an interesting split:

Happy Horse:

“The expressions of the man are extremely exquisite… I suddenly feel some empathy and relation to this man.”

The facial expressions are extremely refined and drew strong empathy from AI Samson, which is a notable achievement.

Seedance 2:

“We get a little bit more dynamic movement, more intensity… but there’s not quite so much change in emotion. It’s almost as if he maintains this one expression throughout the piece.”

The movement is more forceful, but the expression barely changes and stays nearly the same throughout.

“For the detail and cinematic realism, Seedance is the best, but for the emotional expression of humanity, I’d give it to Happy Horse.”

Verdict: Seedance 2 wins on cinematic feel and detail; Happy Horse wins on emotion and human expression.

Train Station Scene (Physics)

Seedance 2 runs into a major problem in this scene:

“He simply just flies through the door of the train… the action that we get is just so obviously unreal and breaks the rules of physics.”

The character simply “flies” through the train door, in clear violation of physics.

Happy Horse has a different problem:

“Happy Horse has this pretty high saturation feel on all of its pieces… it just feels a little bit garish. It almost has this feel of how the AI art generators were when we first got them.”

The excessive saturation makes the footage look cheap and even recalls the style of early AI image generators.

Tour de France (Complex Group Motion)

Happy Horse:

“There is just a little bit too much repetition in the cadence of all of the riders… they’re almost like synchronized with their up and down leg movements.”

All the riders pedal at nearly the same cadence, which would not happen in a real race.

Seedance 2 does better:

“Seedance is certainly performing this better… I not only enjoy the more rapid cuts that it provides, but also the camera angles.”

Seedance 2 offers more varied cuts and camera angles, and the riders’ movements are more individual.

“This man is definitely at the bottom of his stroke. These two men are in a little bit of a similar position. So it is giving us a little bit more variety.”

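Phase 7 - Higgsfield MCP: AI Creative Workflows in Practice

What Is MCP?

Beyond the video comparison, AI Samson also introduces Higgsfield MCP, a way to connect AI creative tools to AI assistants such as Claude.

“An MCP allows you to leverage Higgsfield along with other AI tools. This means we can create complex agentic workflows.”

Setup

Steps:
1. Open Claude and go to Settings
2. Scroll down to Connectors
3. Visit Higgsfield.ai/mcp
4. Copy the Custom Connector URL
5. In Claude, add a new Connector and choose Custom Connector
6. Enter a name and paste the URL
7. Click Connect and authorize your Higgsfield account
8. Allow the required tool permissions (image/video generation, read-only tools, write/delete tools)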

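Hands-On Demo

AI Samson builds a brand kit from a single prompt:

“Create a brand kit for a protein coffee brand with packaging, socials, and website in Higgsfield.”

Working with Higgsfield, Claude automatically:
- Defined the typography
- Set the tone of voice
- Designed a high-energy aesthetic
- Generated the complete brand kit
- Generated multiple images in parallel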

“We stop being the individual who has to write in one prompt, wait for that to finish, create another prompt… We can now go off and allow the AI to perform much more complex series of tasks while we sit back independently.”
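
The shift described in the quote, from submitting one prompt at a time to letting an agent fan out several generations at once, can be sketched in a few lines. This is only an illustration under stated assumptions: `generate_asset` is a hypothetical stand-in that sleeps instead of calling any real service, and the asset names are invented. It is not Higgsfield's or Claude's actual API.

```python
import asyncio


async def generate_asset(name: str, delay: float = 0.01) -> str:
    """Hypothetical stand-in for one generation request (not a real API)."""
    await asyncio.sleep(delay)  # simulates waiting for a remote job to finish
    return f"{name}: done"


async def build_brand_kit(asset_names: list[str]) -> list[str]:
    """Start every request at once and wait for all of them together."""
    tasks = [generate_asset(name) for name in asset_names]
    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    # Illustrative asset names only
    kit = asyncio.run(build_brand_kit(["packaging", "socials", "website"]))
    for line in kit:
        print(line)
```

With sequential awaits the total wait would be roughly the sum of the individual delays; with `gather` it is closer to the longest single one, which is the practical benefit of handing the whole task to an agent.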

Virality Prediction

Higgsfield MCP also has an interesting feature, the engagement score:

“This is a virality predictor where it can take your content and tell you exactly how well it expects it to perform.”

Thoughts on AI Entrepreneurship

AI Samson also shares an important observation:

“The world is changing from where it’s getting much harder to find a job, but much easier to create a business or a small meaningful project that can generate revenue.”

AI is changing the barrier to entry: finding a job is getting harder, while building a small, worthwhile project or business is getting easier.

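Phase 8 - The Elephant Painter: Final Challenge and Final Verdict

Brushstroke Mapping Test

This test uses a surreal scene of an elephant painting to check how well the models track brushstrokes.

Happy Horse:

“Happy Horse is absolutely failing atrociously at this as the painting is just appearing as if by magic.”

The painting appears out of nowhere, as if by magic, instead of being painted stroke by stroke: a complete failure.

Seedance 2:

“Seedance also has an extremely similar issue, and this is certainly one of the largest challenges that we have.”

Seedance 2 has a very similar problem; brushstroke mapping remains one of the biggest challenges for AI video models. Aesthetically, though, Seedance 2 comes out slightly ahead.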

Final Verdict

“Google Omni looks like it’s going to be the best model we’ve ever seen, but we’re going to have to wait until we get our hands on that for full testing.”

“The new Happy Horse model is certainly giving us a new option, especially for complex audio environments. However, in my testing, I feel it still lets us down on advanced movement aesthetics and realism.”

“For that, the main player in the game, in my opinion, is Seedance 2. But, it’s only a matter of time until it’s dethroned.”
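
The round-by-round outcomes reported across the tests can be tallied to see why no single winner emerges. The sketch below is a reader's summary, not something shown in the video: the round labels are shortened, rounds where the verdict was divided by dimension are marked "split", and rounds where no winner was named are marked "none stated".

```python
from collections import Counter

# Outcome of each head-to-head round as summarized in this write-up.
# "split" = verdict divided by dimension; "none stated" = no winner named.
ROUNDS = {
    "Post-apocalyptic pianist": "Happy Horse",
    "Breathing library": "none stated",
    "Subway samurai": "Happy Horse",
    "Glowing crab migration": "Seedance 2",
    "Running shoe ad": "split",
    "Chess on a plane": "Happy Horse",
    "Alien hotel": "Seedance 2",
    "Lonely conductor": "split",
    "Train station": "none stated",
    "Tour de France": "Seedance 2",
    "Elephant painter": "Seedance 2",
}


def tally(rounds: dict[str, str]) -> Counter:
    """Count how many rounds ended with each outcome."""
    return Counter(rounds.values())


if __name__ == "__main__":
    for outcome, count in tally(ROUNDS).most_common():
        print(f"{outcome}: {count}")
```

On this reading the count is close (four rounds to three, with four undecided), which fits the video's point that the right choice depends on the scenario rather than on an overall ranking.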

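Key Concepts Cheat Sheet

Concept	Description
Google Omni	Google’s upcoming multimodal video generation model; judging by the leaked preview, possibly the strongest AI video model to date
VO 3.1 (Veo 3.1)	Google’s previous video generation model; Omni appears to be its successor
Seedance 2	In AI Samson’s view, the strongest all-round AI video model currently available, with excellent cinematic feel and visual detail
Happy Horse	Alibaba’s AI video model, which stands out for audio sync
Multimodal	Able to accept several input types (text, image, video, audio) to generate video
Text Fidelity	How accurately text is rendered in AI video; a long-standing technical challenge
Audio Sync	How precisely a video’s sound matches the on-screen action
Higgsfield MCP	A protocol-based connection between Higgsfield’s creative tools and AI assistants such as Claude, enabling complex creative workflows
Agentic Workflow	A workflow in which AI carries out multi-step, complex tasks autonomously
Engagement Score	Higgsfield’s virality prediction feature, which estimates how well content is likely to spread
Artificial Analysis	A benchmarking and leaderboard platform for AI video models
arena.ai	Another comparison leaderboard for AI video models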

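Practical Tips

Choose Happy Horse when audio comes first: if your project needs precisely synced sound effects (ads, music videos), Happy Horse was clearly ahead of Seedance 2 on audio generation and sync in these tests
Choose Seedance 2 when the cinematic look comes first: for high-quality camera work, texture detail, and film-grade visuals, Seedance 2 remains the best choice
Wait for Google Omni: if your project involves text rendering or hand motion tracking, it may be worth waiting for Google Omni’s official release
Watch Google I/O: the annual conference is expected to bring the official Omni launch along with other AI announcements
Build creative workflows with Higgsfield MCP: connecting Higgsfield to Claude enables automated batch generation of brand assets
Compare models on the same prompt: before settling on a model, run the exact same prompt through several, because performance varies widely by scenario
Watch for the telltale signs of AI video: excessive saturation, repeated character postures, and physics-defying motion are common giveaways
Storytelling matters more than technical perfection: the AI short with 100 million views shows that a moving narrative is worth far more than flawless visuals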

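Common Misconceptions

“The top-ranked model on the leaderboard is the best model”: Happy Horse ranks first on the Artificial Analysis text-to-video leaderboard but lost to Seedance 2 in several of the hands-on tests. Leaderboards are only a reference; real results vary by scenario
“AI video can already simulate reality perfectly”: even the most advanced models still produce physically implausible errors, such as a character passing through a train door, one book turning into two, or every rider pedaling in sync
“Good visuals mean a good model”: Seedance 2 looks better, but it falls well behind Happy Horse on audio sync. Pick a model based on your specific needs
“Google Omni is already available”: so far there is only a leaked preview; the official release is expected at Google I/O, and the full feature set is still to come
“Text rendering in AI video is solved”: Google Omni’s demo is impressive, but even there the second stroke of the equals sign appears without being drawn. This remains an active research challenge
“One model can cover every need”: the tests show that no single model is best on every dimension. Happy Horse wins on sound, Seedance 2 wins on visuals, and Google Omni may win on text fidelity
“AI creative tools can only be used one at a time”: with MCP, you can chain multiple AI tools into complex creative workflows, which is far more efficient than operating each one by hand
“Characters in AI video all look the same”: character variety is still an issue (especially repeated postures), but both models can already render visibly different characters in the same scene
“AI video can only be generated from scratch, not used to edit existing footage”: according to the leak, one of Google Omni’s core selling points is editing existing video with text prompts, including changing lighting, adding effects, and color grading
“The oversaturated AI video look is unavoidable”: this is mainly a Happy Horse issue; Seedance 2 already delivers more natural, more cinematic color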

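Key Takeaways

Google Omni may be the most capable AI video model so far: in the leaked demos its text fidelity and hand motion tracking were impressive and, on the one prompt compared, clearly ahead of Seedance 2
Omni’s “multimodal” design points to a new way of working: you can guide video generation with a combination of video, audio, images, and text
Text-driven video editing is one of Omni’s most disruptive features, letting you enhance real footage with simple text instructions
Happy Horse excels at audio sync: of the two models tested, it was the only one to deliver precise audio-visual sync in scenes with complex sound
Seedance 2 still leads on visual quality and cinematic feel, with better textures, richer camera work, and more realistic detail
There is no all-round champion: each model wins some scenarios and loses others, so choose based on your specific creative needs
AI video still struggles with physical plausibility: characters passing through solid objects, objects changing in number, and unnaturally synchronized motion are all common
Character variety is a shared weakness: both Happy Horse and Seedance 2 produce overly similar postures in multi-character scenes
The value of AI video lies in storytelling, not technology: the short with 100 million views shows that a good story can overcome technical limits and create real emotional resonance
MCP is reshaping AI creative workflows: connecting creative tools to an AI assistant enables automated batch generation of brand assets
AI is lowering the barrier to starting a business: with the shift from “finding a job” to “creating a project”, AI tools let individuals and small teams do creative work that once required large teams
The crown in AI video changes hands quickly: Seedance 2 leads for now, but Google Omni is on the way, and today’s champion could be replaced tomorrow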

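Conclusion

AI video generation is at a turning point. The Google Omni leak gives a glimpse of what the next generation of AI video models may offer: multimodal input, accurate text rendering, natural motion tracking, and, most importantly, the ability to edit existing video with text prompts.

Meanwhile, the currently available Happy Horse and Seedance 2 each have their strengths: the former is better at audio sync and emotional expression, the latter at visual quality and cinematic feel. No single model leads on every dimension, which shows how much room the field still has to grow.

Perhaps the deepest lesson comes from the doomsday-kiss short with 100 million views: the technology can be imperfect, but the story has to move people. The future of AI video belongs not to those chasing pixel perfection but to creators who can use AI tools to tell meaningful stories.

“AI video is still challenging to get exceptional results out, but it is possible with the right process to get truly meaningful works of media, and that is only going to enhance.”

As AI Samson says, this is only the beginning, and it will only get better.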