
Google at Interspeech 2023

Google Research AI blog

Posted by Catherine Armato, Program Manager, Google. This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland. It is one of the world’s largest conferences on the research and technology of spoken language understanding and processing.


Announcing our $50M Series C to build superhuman Speech AI models

AssemblyAI

Expanded AssemblyAI Docs: We've published new tutorials that use our Speech-to-Text API to build Speech AI applications. Speech-to-Text with Java: Use AssemblyAI's Java SDK to build applications with voice data in Java.



New Punctuation & Casing Model For Real-Time Transcription

AssemblyAI

We recently released a significant improvement to our Punctuation and Truecasing model for asynchronous transcription. The approach is based on joint audio-language pre-training that enhances performance without task-specific fine-tuning.


Universal Speech Model (USM): State-of-the-art speech AI for 100+ languages

Google Research AI blog

Posted by Yu Zhang, Research Scientist, and James Qin, Software Engineer, Google Research Last November, we announced the 1,000 Languages Initiative , an ambitious commitment to build a machine learning (ML) model that would support the world’s one thousand most-spoken languages, bringing greater inclusion to billions of people around the globe.


Understanding Generative and Discriminative Models

Chatbots Life

By applying generative models in these areas, researchers and practitioners can unlock new possibilities in various domains, including computer vision, natural language processing, and data analysis. These models capture temporal dependencies and are widely used in tasks like language translation and speech recognition.
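The generative/discriminative distinction can be made concrete with a toy classifier comparison. The sketch below (not from the article; the data values and helper names are hypothetical) fits a generative model, a Gaussian per class, and classifies by asking which class best explains a point, while the discriminative rule learns only a decision boundary between the classes:

```python
import math
import statistics

# Hypothetical 1-D feature values for two classes.
class_a = [1.0, 1.2, 0.8, 1.1, 0.9]   # class 0
class_b = [3.0, 2.8, 3.2, 3.1, 2.9]   # class 1

def fit_gaussian(xs):
    # Generative approach: model p(x | class) as a Gaussian.
    return statistics.mean(xs), statistics.stdev(xs)

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu_a, sd_a = fit_gaussian(class_a)
mu_b, sd_b = fit_gaussian(class_b)

def generative_predict(x):
    # Bayes rule with equal priors: pick the class whose model best explains x.
    return 0 if gaussian_pdf(x, mu_a, sd_a) >= gaussian_pdf(x, mu_b, sd_b) else 1

def discriminative_predict(x):
    # Discriminative approach: learn only the decision boundary directly
    # (here, simply the midpoint between the two class means).
    boundary = (mu_a + mu_b) / 2
    return 0 if x < boundary else 1

print(generative_predict(1.05), discriminative_predict(1.05))  # both predict class 0
print(generative_predict(2.95), discriminative_predict(2.95))  # both predict class 1
```

The generative model can also synthesize new samples or score how likely an input is, which is what the article's broader point about generative models relies on; the discriminative rule can only separate the classes.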


A journey from hieroglyphs to chatbots: Understanding NLP over Google’s USM updates

Dataconomy

Google, one of the world’s leading technology companies, has been at the forefront of research and development in these areas, with its latest advancements showing tremendous potential for improving the efficiency and effectiveness of NLP and conversational AI systems.


AI for Universal Audio Understanding: Qwen-Audio Explained

AssemblyAI

Researchers from Alibaba Group have introduced Qwen-Audio, a groundbreaking large-scale audio-language model that elevates the way AI systems process and reason about a diverse spectrum of audio signals. This article delves into the key findings from this recent research.
