🔗 Link to the code on GitHub Why Data Cleaning Pipelines? Think of data pipelines like assembly lines in manufacturing. Performance optimization: For large datasets, consider using vectorized operations or parallel processing. Wrapping Up Data pipelines aren't just about cleaning individual datasets.
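As a concrete illustration of the vectorized-operations point, here is a minimal pandas sketch; the price/category columns and cleaning rules are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "price": ["$1,200", "$950", None, "$2,300"],
    "category": ["  Books", "books ", "TOYS", None],
})

# Vectorized cleaning: no Python-level loops over rows
df["price"] = (
    df["price"]
    .str.replace(r"[$,]", "", regex=True)  # strip currency symbols
    .astype(float)
)
df["price"] = df["price"].fillna(df["price"].median())
df["category"] = df["category"].str.strip().str.lower().fillna("unknown")

print(df)
```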
Go vs. Python for Modern Data Workflows: Need Help Deciding?
As the world becomes more interconnected and data-driven, the demand for real-time applications has never been higher. Artificial intelligence (AI) and natural language processing (NLP) technologies are evolving rapidly to manage live data streams.
Scheduled Analysis Replace the Manual Trigger with a Schedule Trigger to automatically analyze datasets at regular intervals, perfect for monitoring data sources that update frequently. This proactive approach helps you identify data pipeline issues before they impact downstream analysis or model performance.
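Tool specifics aside, the idea is just a timer around the analysis step. A minimal Python sketch, assuming a hypothetical analyze_dataset() and a fixed six-hour cadence:

```python
import time

def analyze_dataset():
    # Placeholder for the actual profiling/validation logic
    print("Running scheduled dataset analysis...")

INTERVAL_SECONDS = 6 * 60 * 60  # hypothetical cadence: every 6 hours

while True:
    analyze_dataset()
    time.sleep(INTERVAL_SECONDS)
```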
Chronos is founded on a key insight: both LLMs and time series forecasting aim to decode sequential patterns to predict future events. This parallel allows us to treat time series data as a language to be modeled by off-the-shelf transformer architectures.
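A toy sketch of that "series as language" idea, assuming a Chronos-style scheme of mean-scaling a series and quantizing values into a discrete token vocabulary; the bin range and vocabulary size here are invented for illustration:

```python
import numpy as np

def tokenize_series(values, vocab_size=100):
    """Map real-valued observations to discrete tokens via mean scaling + binning."""
    values = np.asarray(values, dtype=float)
    scaled = values / (np.abs(values).mean() + 1e-8)  # mean scaling
    # Uniform bins over a fixed range; out-of-range values are clipped
    bins = np.linspace(-5, 5, vocab_size - 1)
    return np.digitize(np.clip(scaled, -5, 5), bins)

series = [10.2, 11.0, 9.8, 12.5, 13.1]
tokens = tokenize_series(series)
print(tokens)  # token ids a transformer can model like words
```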
If the question was "What's the schedule for AWS events in December?", AWS usually announces the dates for its upcoming re:Invent event around 6-9 months in advance. Rajesh Nedunuri is a Senior Data Engineer within the Amazon Worldwide Returns and ReCommerce Data Services team.
Ideal for building forecasting models or studying market reactions to events. Global Economic Indicators (2010–2023): Offers data on GDP, inflation, employment, and trade for dozens of countries — perfect for comparative studies. Zillow Economics Data: This dataset captures U.S. housing prices by ZIP code.
Data Engineering for Large Language Models LLMs are artificial intelligence models that are trained on massive datasets of text and code. They are used for a variety of tasks, such as natural language processing, machine translation, and summarization.
Brian Chesky, CEO of Airbnb, spoke at a Y Combinator event this summer. The companies include: Talc AI, a service for assessing large language models. We borrow proven techniques from the latest in NLP (natural language processing) academia to build evaluation tooling that any software engineer can use.
Amazon Kendra uses natural language processing (NLP) to understand user queries and find the most relevant documents. For our final structured and unstructured data pipeline, we observed that Anthropic’s Claude 2 on Amazon Bedrock generated better overall results.
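For reference, a minimal sketch of calling Claude 2 through the Bedrock runtime with boto3; the region, prompt text, and token limit are assumptions, and AWS credentials must already be configured:

```python
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

body = json.dumps({
    "prompt": "\n\nHuman: Summarize the key fields in this record: ...\n\nAssistant:",
    "max_tokens_to_sample": 300,  # assumed limit
})

response = client.invoke_model(
    modelId="anthropic.claude-v2",
    body=body,
)
print(json.loads(response["body"].read())["completion"])
```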
Data Engineering: Building and maintaining data pipelines, ETL (Extract, Transform, Load) processes, and data warehousing. Artificial Intelligence: Concepts of AI include neural networks, natural language processing (NLP), and reinforcement learning.
In these applications, time series data can have heavy-tailed distributions, where the tails represent extreme values. Accurate forecasting in these regions is important in determining how likely an extreme event is and whether to raise an alarm. However, a model that underestimates the tails will assign such an extreme event near-zero probability.
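A small illustration of why the tail model matters, assuming NumPy and SciPy are available: fitting a heavy-tailed Student-t versus a normal distribution to the same synthetic data gives very different probabilities for an extreme threshold (the threshold value is made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = stats.t.rvs(df=3, size=5000, random_state=rng)  # heavy-tailed sample

threshold = 8.0  # hypothetical "extreme event" level

# Normal fit badly underestimates the tail
mu, sigma = stats.norm.fit(data)
p_normal = stats.norm.sf(threshold, mu, sigma)

# Student-t fit captures the heavy tail
df_t, loc, scale = stats.t.fit(data)
p_t = stats.t.sf(threshold, df_t, loc, scale)

print(f"P(X > {threshold}) normal fit: {p_normal:.2e}, t fit: {p_t:.2e}")
```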
Data Engineer Data engineers are responsible for the end-to-end process of collecting, storing, and processing data. They use their knowledge of data warehousing, data lakes, and big data technologies to build and maintain data pipelines.
MLOps aims to bridge the gap between data science and operational teams so they can reliably and efficiently transition ML models from development to production environments, all while maintaining high model performance and accuracy. AIOps integrates these models into existing IT systems to enhance their functions and performance.
Elementl / Dagster Labs Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl’s platform is designed for data engineers, while Dagster Labs’ platform is designed for data scientists.
The Best Tools, Libraries, Frameworks and Methodologies that ML Teams Actually Use – Things We Learned from 41 ML Startups [ROUNDUP] Key use cases and/or user journeys: Identify the main business problems and the data scientist’s needs that you want to solve with ML, and choose a tool that can handle them effectively.
Foundation models: The power of curated datasets Foundation models, also known as “transformers,” are modern, large-scale AI models trained on large amounts of raw, unlabeled data. In addition to natural language, models are trained on various modalities, such as code, time-series, tabular, geospatial and IT events data.
From state-of-the-art language models to innovative AI-driven applications, to new open-source models hoping to take away GPT’s crown, let’s take a tour of some of the most notable AI tools and top LLMs that are working to shape how 2024 concludes, and how AI will shape the future.
Snorkel AI wrapped the second day of our The Future of Data-Centric AI virtual conference by showcasing how Snorkel’s data-centric platform has enabled customers to succeed, taking a deep look at Snorkel Flow’s capabilities, and announcing two new solutions. You need to find a place to park your data.
With proper unstructured data management, you can write validation checks to detect multiple entries of the same data. Continuous learning: In a properly managed unstructured data pipeline, you can use new entries to train a production ML model, keeping the model up-to-date.
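A minimal sketch of such a validation check, assuming records arrive as dicts and using a content hash to flag duplicate entries; the field names are hypothetical:

```python
import hashlib
import json

def record_fingerprint(record: dict) -> str:
    """Stable hash of a record's content, independent of key order."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

seen = set()
records = [
    {"id": 1, "text": "hello world"},
    {"text": "hello world", "id": 1},  # same content, different key order
]

for rec in records:
    fp = record_fingerprint(rec)
    if fp in seen:
        print(f"Duplicate detected: {rec}")
    else:
        seen.add(fp)
```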
This allows proactive and targeted engagement strategies, such as predicting attendance for specific events, tailoring promotions, and optimizing marketing efforts. Integrating natural language processing capabilities allows for more human-like interactions, enhancing the overall fan experience.
Natural Language Processing (NLP) has emerged as a dominant area, with tasks like sentiment analysis, machine translation, and chatbot development leading the way. Data Engineering Data engineering remains integral to many data science roles, with workflow pipelines being a key focus.
DL is particularly effective in processing large amounts of unstructured data, such as images, audio, and text. Natural Language Processing (NLP): NLP is a branch of AI that deals with the interaction between computers and human languages.
Socio-political events have also caused delays and issues, such as a COVID backlog and disruptions to the supply of inert gases for manufacturing from Russia. The benchmark used is RoBERTa-Base, a popular transformer-based model used in natural language processing (NLP) applications.
Long Short-Term Memory (LSTM) A type of recurrent neural network (RNN) designed to learn long-term dependencies in sequential data. Facebook Prophet A user-friendly tool that automatically detects seasonality and trends in time series data. This is vital for agriculture, disaster management, and event planning.
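A quick sketch of the Prophet workflow described above, assuming the prophet package and a DataFrame in Prophet's expected ds/y format; the synthetic series is purely illustrative:

```python
import pandas as pd
from prophet import Prophet

# Prophet expects columns 'ds' (date) and 'y' (value)
df = pd.DataFrame({
    "ds": pd.date_range("2022-01-01", periods=365, freq="D"),
    "y": range(365),  # placeholder series; real data would show trend/seasonality
})

model = Prophet()  # seasonality and trend detected automatically
model.fit(df)

future = model.make_future_dataframe(periods=30)  # forecast 30 days ahead
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```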
Data Scientists can use Azure Data Factory to prepare data for analysis by creating data pipelines that ingest data from multiple sources, clean and transform it, and load it into Azure data stores.
Similar Audio: Audio recordings of the same event or sound but with different microphone placements or background noise. The process could be improved in future by creating a clear audit trail of how duplicate records are identified and handled throughout the data pipeline.
The service will consume the features in real time, generate predictions in near real-time, such as in an event processing pipeline, and write the outputs to a prediction queue. Solution Data lakes and warehouses are the two key components of any data pipeline. Data engineers are mostly in charge of them.
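A minimal sketch of that pattern, assuming in-process queues stand in for the real feature stream and prediction queue, and a deliberately trivial stand-in for the model:

```python
import queue

events = queue.Queue()        # stand-in for the incoming feature stream
predictions = queue.Queue()   # stand-in for the prediction queue

def predict(features: dict) -> float:
    # Hypothetical model call; a real service would load a trained model
    return sum(features.values()) / len(features)

# Feed a few events, then drain the pipeline
for payload in [{"f1": 0.2, "f2": 0.8}, {"f1": 0.5, "f2": 0.1}]:
    events.put(payload)

while not events.empty():
    features = events.get()
    predictions.put(predict(features))  # near real-time scoring
    events.task_done()

while not predictions.empty():
    print("prediction:", predictions.get())
```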
Query Synthesis Scenario: Training a model to classify rare astronomical events using synthetic telescope data. libact: A Python package for active learning that provides implementations of various active learning algorithms like uncertainty sampling, query-by-committee, and density-weighted methods.
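A short uncertainty-sampling sketch with libact, based on its documented Dataset/query-strategy pattern; the toy features and labels are invented, and unlabeled points are marked with None:

```python
from libact.base.dataset import Dataset
from libact.models import LogisticRegression
from libact.query_strategies import UncertaintySampling

X = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5], [0.4, 0.6]]
y = [0, 1, None, None]  # None marks unlabeled examples
ds = Dataset(X, y)

qs = UncertaintySampling(ds, method='lc', model=LogisticRegression())
ask_id = qs.make_query()  # index of the most uncertain unlabeled example
print("label this example:", ask_id)
ds.update(ask_id, 1)      # oracle provides the label (hypothetical answer)
```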
Encora also partners with BFSI clients to develop scalable, AI-driven EWS and real-time data pipelines using cloud-native architectures. Mhaskar highlighted that the company uses natural language processing (NLP) to analyse digital interactions and understand customer behaviour.
Prior to that, I spent a couple of years at First Orion - a smaller data company - helping found and build out a data engineering team as one of the first engineers. We were focused on building data pipelines and models to protect our users from malicious phone calls. I've secured $1.7M ecosystems.
David: My technical background is in ETL, data extraction, data engineering and data analytics. I spent over a decade of my career developing large-scale data pipelines to transform both structured and unstructured data into formats that can be utilized in downstream systems.
Internally within Netflix’s engineering team, Meson was built to manage, orchestrate, schedule, and execute workflows within ML/data pipelines. Meson managed the lifecycle of ML pipelines, providing functionality such as recommendations and content analysis, and leveraged the Single Leader Architecture.
But the most impactful developments may be those focused on governance, middleware, training techniques and data pipelines that make generative AI more trustworthy, sustainable and accessible, for enterprises and end users alike. Here are some important current AI trends to look out for in the coming year.