
The Future of Voice Search: A New Frontier for CUDA (GPU Computing) Acceleration -- Up to 75x Faster -- Could This Field Give Rise to a "Future Google"?
We recently read about a "speech indexing" start-up called Nexiwave in a posting on gpgpu.org. We wanted to learn more so we contacted CEO Ben Jiang. Here's an excerpt from our email interview:
NVIDIA: Ben, what makes speech indexing a compelling application?
Ben: Ninety percent of human communication is through speech. The amount of spoken words that could potentially be indexed and searched is staggering. Skype callers have logged over 100 billion minutes of talk time. Conference call companies are carrying over a billion minutes of calls per month. There are hundreds of millions of podcasts on the web, with 24 hours of video uploaded to YouTube every minute.
The problem is that today's information retrieval applications, such as internet search, focus on textual content. Information retrieval from speech content still relies primarily on a human's memory. The objective of speech indexing is to enable us to easily extract information from archived audio and video content. Through the Nexiwave system, an end user can easily search the content and locate the exact location of interest, whether it's a word, a phrase or a general topic.
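The data structure behind this kind of search is commonly a time-aligned inverted index: each recognized word maps to the recordings and timestamps where it was spoken, so a query can jump straight to the exact location of interest. A minimal Python sketch (the transcript data and function names here are illustrative, not Nexiwave's actual system):

```python
from collections import defaultdict

# Hypothetical recognizer output: (word, start_second) pairs per recording.
transcripts = {
    "call_2010_06_03.wav": [("budget", 12.4), ("deadline", 13.1), ("budget", 95.0)],
    "standup_monday.wav":  [("deadline", 7.8), ("demo", 8.5)],
}

# Build a time-aligned inverted index: word -> list of (recording, timestamp).
index = defaultdict(list)
for recording, words in transcripts.items():
    for word, start in words:
        index[word].append((recording, start))

def search(word):
    """Return every (recording, timestamp) where the word was spoken."""
    return index.get(word.lower(), [])

print(search("deadline"))
```

A real system layers phrase and topic matching on top, but the word-to-timestamp mapping is the piece that turns archived audio into something searchable.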
Translator's note (Lin Yao-Nan): I imagine Doraemon's translation konjac (翻訳コンニャク) could become real, too.
NVIDIA: What are some of the potentially big markets for speech indexing?
Ben: Think about the conference calls that happen 24x7 at companies around the world. We've all had moments where we thought: "Ahh, John said something really useful in the last call. I wish I could remember exactly what he said." In the future, with speech indexing-enabled conference calls, we will be able to easily do that via a quick search to locate the exact audio snippet. Another interesting market is call centers, where the ability to do a deep search (not just time of call and phone number) will enable companies to find out what customers are really telling them. Other markets are e-discovery (in the legal field), recorded educational media, podcasts and audio-centric enterprises.
NVIDIA: What stage is your technology in?
Ben: Nexiwave 1.0 was released in October 2009. Nexiwave 2.0, our NVIDIA GPU-enabled version, was released on June 3, 2010 and is in production. We offer a SaaS (software as a service) and cloud computing solution as well as software licenses.
NVIDIA: What is the connection between Nexiwave and CMU Sphinx, the speech recognition system from Carnegie Mellon?
Ben: CMU Sphinx is a very popular open source speech processing engine. Our system is built on top of it with many of our own proprietary improvements, such as CUDA-based acoustic scoring (a total re-write of the core acoustic scoring code). We are one of the major commercial companies contributing to it through code fixes, developer resources and user forum support.
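Acoustic scoring is the stage being rewritten for CUDA: for every short audio frame, the engine evaluates the frame's feature vector against many Gaussian densities, and every (frame, Gaussian) evaluation is independent. A simplified, CPU-only sketch of what that computation looks like (diagonal-covariance Gaussians; the model values are made up for illustration and are not Sphinx or Nexiwave code):

```python
def log_gaussian(frame, mean, inv_var):
    """Log-density of a diagonal-covariance Gaussian (constant term dropped) --
    the inner loop of acoustic scoring."""
    return -0.5 * sum(iv * (x - m) ** 2 for x, m, iv in zip(frame, mean, inv_var))

def score_frame(frame, gaussians):
    """Score one feature frame against every Gaussian. Each evaluation is
    independent, which is why this maps well onto one GPU thread per pair."""
    return [log_gaussian(frame, mean, inv_var) for mean, inv_var in gaussians]

# Tiny illustrative model: two 3-dimensional Gaussians (mean, inverse variance).
gaussians = [
    ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    ([1.0, 1.0, 1.0], [0.5, 0.5, 0.5]),
]
frame = [0.0, 0.0, 0.0]
scores = score_frame(frame, gaussians)
best = scores.index(max(scores))  # index of the best-matching Gaussian
```

A production engine evaluates thousands of such densities per 10 ms frame, which is where the "11 million computation loops per audio minute" figure mentioned later comes from.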
NVIDIA: Where does the GPU fit into this?
Ben: Speech indexing is computationally intensive and has traditionally been very expensive. Speech indexing can be efficiently processed in parallel which means the GPU is a perfect fit for it. The GPU will solve the cost issue associated with indexing vast amounts of audio content quickly and accurately.
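The "perfect fit" claim rests on data parallelism: frames carry no dependency on one another, so scoring them is an order-preserving parallel map. A sketch of that structure in plain Python, with a thread pool standing in for GPU threads (the scoring function is a stand-in, and this shows the shape of the work, not GPU-class speed):

```python
from concurrent.futures import ThreadPoolExecutor

def score(frame):
    # Stand-in for per-frame acoustic scoring; any pure function of a single
    # frame works here, since frames are independent of one another.
    return sum(x * x for x in frame)

frames = [[0.1 * i, 0.2 * i] for i in range(1000)]

# Sequential reference.
sequential = [score(f) for f in frames]

# The same work as an order-preserving parallel map; on a GPU, each frame
# (or each frame/Gaussian pair) would instead map to its own thread.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(score, frames))

assert parallel == sequential  # identical results, independent work
```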
NVIDIA: How did you like programming/porting in the CUDA C environment?
Ben: Our experience with programming in CUDA C has been enjoyable. The CUDA Best Practices Guide provided tons of help in performance tuning.
NVIDIA: How does CUDA help you?
Ben: Nexiwave has been able to move 75% of our computing processes (or 11 million computation loops per audio minute) to CUDA C, speeding up our application by more than 75 times. This directly translates into cost reduction (we have released a large number of CPU machines back to our computing provider). The exciting thing about this speedup is that it enables us to move into markets where speech indexing has not been possible before.
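The relationship between the fraction of runtime offloaded and the end-to-end speedup is captured by Amdahl's law, which is a useful sanity check when planning a GPU port of your own. A quick helper (the fractions and per-kernel speedups below are illustrative, not Nexiwave's measurements):

```python
def amdahl(parallel_fraction, kernel_speedup):
    """Overall speedup when `parallel_fraction` of the runtime is accelerated
    by `kernel_speedup` and the remainder is left unchanged (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup)

# End-to-end speedup approaches the kernel speedup only as nearly all of
# the runtime becomes GPU-resident.
print(round(amdahl(0.75, 100), 2))  # 75% of runtime offloaded
print(round(amdahl(0.99, 100), 2))  # 99% of runtime offloaded
```

The takeaway is that large overall speedups like the one reported require the accelerated portion to dominate total runtime; the remaining serial work quickly becomes the bottleneck.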
For more info, see www.nexiwave.com or email eric@nexiwave.com.