AI, Data Pipeline and Document - Data Science Current

Building End-to-End Data Pipelines: From Data Ingestion to Analysis

KDnuggets

JULY 15, 2025

By Josep Ferrer, KDnuggets AI Content Specialist, on July 15, 2025 in Data Science. Delivering the right data at the right time is a primary need for any organization in the data-driven society. But let’s be honest: creating a reliable, scalable, and maintainable data pipeline is not an easy task.

Data Pipeline

Data Pipeline Natural Language Processing Data Science SQL

What’s New with Azure Databricks: Unified Governance, Open Formats, and AI-Native Workloads

databricks

JULY 15, 2025

Azure

Azure Power BI AI AI

What’s New: Lakeflow Jobs Provides More Efficient Data Orchestration

databricks

JULY 24, 2025

Data Pipeline

Data Pipeline Data Engineer Data Engineering Data Engineering

Webinars

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Airflow Best Practices for ETL/ELT Pipelines

8 Ways to Scale your Data Science Workloads

KDnuggets

JULY 22, 2025

Every data scientist has been there: downsampling a dataset because it won’t fit into memory or hacking together a way to let a business user interact with a machine learning model. Machine Learning in your Spreadsheets: BQML training and prediction from a Google Sheet. Many data conversations start and end in a spreadsheet.

Data Science

Data Science Natural Language Processing Machine Learning Machine Learning

The Ultimate Guide to Vibe Coding: 6 Powerful Frameworks Transforming Software Development

Data Science Dojo

JULY 24, 2025

At its core, vibe coding means expressing your intent in natural language and letting AI coding assistants translate that intent into working code. Vibe coding is a new paradigm in software development where you use natural language programming to instruct AI coding assistants to generate, modify, and even debug code.

Data Engineer

Data Engineer Data Engineering Data Engineering Data Engineering

5 Fun Generative AI Projects for Absolute Beginners

Flipboard

JULY 23, 2025

New to generative AI?

Natural Language Processing

Natural Language Processing Data Science Python Machine Learning

Building enterprise-scale RAG applications with Amazon S3 Vectors and DeepSeek R1 on Amazon SageMaker AI

Flipboard

JULY 17, 2025

However, standalone LLMs have key limitations such as hallucinations, outdated knowledge, and no access to proprietary data. Retrieval Augmented Generation (RAG) addresses these gaps by combining semantic search with generative AI, enabling models to retrieve relevant information from enterprise knowledge bases before responding.

AI

AI AI Database AWS

A Complete Guide to Matplotlib: From Basics to Advanced Plots

KDnuggets

JULY 21, 2025

Whether you’re visualizing climate data or plotting sales trends, the goal is clarity. The key is to start simple, iterate often, and not fear the documentation. Remember, even experts Google “how to add a second y-axis” sometimes.

Natural Language Processing

Natural Language Processing Data Science Machine Learning Machine Learning

What’s New: Zerobus and Other Announcements Improve Data Ingestion for Lakeflow Connect

databricks

JULY 23, 2025

Database

Database Data Warehouse Data Engineer Data Engineering

Go vs. Python for Modern Data Workflows: Need Help Deciding?

KDnuggets

JUNE 19, 2025

Python

Python Natural Language Processing Data Science Machine Learning

The Lifecycle of Feature Engineering: From Raw Data to Model-Ready Inputs

Flipboard

JULY 16, 2025

Document Everything: Keep clear and versioned documentation of how each feature is created, transformed, and validated. Use Automation: Use tools like feature stores, pipelines, and automated feature selection to maintain consistency and reduce manual errors.

Machine Learning

Machine Learning Machine Learning Natural Language Processing Data Science

What’s New in Lakeflow Declarative Pipelines: July 2025

databricks

JULY 22, 2025

Data Engineer

Data Engineer Data Engineering Data Engineering Data Engineering

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

AWS Machine Learning Blog

DECEMBER 4, 2024

Building generative AI applications presents significant challenges for organizations: they require specialized ML expertise, complex infrastructure management, and careful orchestration of multiple services. The following diagram illustrates the conceptual architecture of an AI assistant with Amazon Bedrock IDE.

AWS

AWS AI AI SQL

Unlocking generative AI for enterprises: How SnapLogic powers their low-code Agent Creator using Amazon Bedrock

AWS Machine Learning Blog

OCTOBER 23, 2024

The landscape of enterprise application development is undergoing a seismic shift with the advent of generative AI. This intuitive platform enables the rapid development of AI-powered solutions such as conversational interfaces, document summarization tools, and content generation apps through a drag-and-drop interface.

AI

AI AI AWS Database

10 Free Online Courses to Master Python in 2025

KDnuggets

JULY 24, 2025

How can you master Python for free?

Python

Python Data Science Natural Language Processing Machine Learning

Real value, real time: Production AI with Amazon SageMaker and Tecton

AWS Machine Learning Blog

DECEMBER 4, 2024

Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production. This post is cowritten with Isaac Cameron and Alex Gnibus from Tecton.

ML

ML ML AWS AI

Evaluate large language models for your machine translation tasks on AWS

AWS Machine Learning Blog

JANUARY 7, 2025

It is critical for AI models to capture not only the context, but also the cultural specificities to produce a more natural-sounding translation. The solution offers two TM retrieval modes for users to choose from: vector and document search. For this post, we use a document store. Choose With Document Store.

AWS

AWS Python AI AI

Align and monitor your Amazon Bedrock powered insurance assistance chatbot to responsible AI principles with AWS Audit Manager

AWS Machine Learning Blog

JANUARY 7, 2025

Generative AI applications are gaining widespread adoption across various industries, including regulated industries such as financial services and healthcare. To address this need, the AWS generative AI best practices framework was launched within AWS Audit Manager, enabling auditing and monitoring of generative AI applications.

AWS

AWS AI AI Database

Transforming network operations with AI: How Swisscom built a network assistant using Amazon Bedrock

AWS Machine Learning Blog

JULY 3, 2025

This challenge led Swisscom, Switzerland’s leading telecommunications provider, to explore how AI can transform their network operations. This solution combines generative AI capabilities with a sophisticated data processing pipeline to help engineers quickly access and analyze network data.

AWS

AWS AI AI SQL

How Dataiku and Snowflake Strengthen the Modern Data Stack

phData

NOVEMBER 4, 2024

It must integrate seamlessly across data technologies in the stack to execute various workflows—all while maintaining a strong focus on performance and governance. Two key technologies that have become foundational for this type of architecture are the Snowflake AI Data Cloud and Dataiku. Let’s say your company makes cars.

Machine Learning

Machine Learning Machine Learning Data Science Data Preparation

AWS Machine Learning: A Beginner’s Guide

How to Learn Machine Learning

DECEMBER 24, 2024

If you’re diving into the world of machine learning, AWS Machine Learning provides a robust and accessible platform to turn your data science dreams into reality. Today, we’ll explore why Amazon’s cloud-based machine learning services could be your perfect starting point for building AI-powered applications.

Machine Learning

Machine Learning Machine Learning AWS ML

HCLTech’s AWS powered AutoWise Companion: A seamless experience for informed automotive buyer decisions with data-driven design

AWS Machine Learning Blog

JANUARY 15, 2025

This post introduces HCLTech’s AutoWise Companion, a transformative generative AI solution designed to enhance customers’ vehicle purchasing journey. Powered by generative AI services on AWS and large language models’ (LLMs) multi-modal capabilities, HCLTech’s AutoWise Companion provides a seamless and impactful experience.

AWS

AWS SQL AI AI

Build a conversational data assistant, Part 2 – Embedding generative business intelligence with Amazon Q in QuickSight

AWS Machine Learning Blog

JULY 11, 2025

In Part 1 of this series, we explored how Amazon’s Worldwide Returns & ReCommerce (WWRR) organization built the Returns & ReCommerce Data Assist (RRDA)—a generative AI solution that transforms natural language questions into validated SQL queries using Amazon Bedrock Agents.

Business Intelligence

Business Intelligence Business Intelligence SQL AWS

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

At the heart of this transformation is the OMRON Data & Analytics Platform (ODAP), an innovative initiative designed to revolutionize how the company harnesses its data assets. The robust security features provided by Amazon S3, including encryption and durability, were used to provide data protection.

AWS

AWS Data Governance Data Silos SQL

How Walmart built an AI platform that makes it beholden to no one (and that 1.5M associates actually want to use)

Flipboard

JUNE 24, 2025

Walmart isn’t buying enterprise AI solutions, they’re creating them in their AI foundry.

AI

AI AI Data Scientist Data Pipeline

Create a generative AI-based application builder assistant using Amazon Bedrock Agents

AWS Machine Learning Blog

OCTOBER 24, 2024

Amazon Bedrock Agents helps you accelerate generative AI application development by orchestrating multistep tasks. With the power of AI automation, you can boost productivity and reduce cost. The generative AI–based application builder assistant from this post will help you accomplish tasks through all three tiers.

AWS

AWS SQL Database AI

Optimizing Matillion Workflows: A Guide to Visual Design and Best Practices

phData

APRIL 28, 2025

We’ll start by breaking down what a Matillion pipeline is, then dive into some best practices to keep your workflows clean, scalable, and easy to maintain. As a bonus, we’ll check out Matillion’s AI Copilot and see how AI can help take workflow design to the next level. Data tables used and their role in the workflow.

AI

AI AI SQL ETL

Scaling globally starts with building smarter, not selling faster

Dataconomy

JUNE 25, 2025

Clean, interoperable data pipelines: Having region-specific analytics, differentiated content such as marketing materials translated into various languages, and numerous CRM instances all add up to global operations. Consistent execution requires defined change management workflows and clearly delineated onboarding documentation.

Analytics

Analytics Analytics Data Pipeline Data Science

What is the Pile Dataset

Pickl AI

DECEMBER 25, 2024

It integrates diverse, high-quality content from 22 sources, enabling robust AI research and development. Its accessibility and scalability make it essential for applications like text generation, summarisation, and domain-specific AI solutions. Its diverse content includes academic papers, web data, books, and code.

Natural Language Processing

Natural Language Processing Machine Learning Machine Learning AI

Data Integration for AI: Top Use Cases and Steps for Success

Precisely

FEBRUARY 20, 2025

Key Takeaways: Trusted data is critical for AI success. Data integration ensures your AI initiatives are fueled by complete, relevant, and real-time enterprise data, minimizing errors and unreliable outcomes that could harm your business. Data integration solves key business challenges.

Data Silos

Data Silos AI AI Data Quality

Streamlining Process Configuration in Machine Learning with Hydra

Pickl AI

NOVEMBER 29, 2024

Use Cases in ML Workflows: Hydra excels in scenarios requiring frequent parameter tuning, such as hyperparameter optimisation, multi-environment testing, and orchestrating pipelines. It also simplifies managing configuration dependencies in Deep Learning projects and large-scale data pipelines.

Machine Learning

Machine Learning Machine Learning ML ML

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

Whether you’re new to AI development or an experienced practitioner, this post provides step-by-step guidance and code examples to help you build more reliable AI applications. The agent knowledge base stores Amazon Bedrock service documentation, while the cache knowledge base contains curated and verified question-answer pairs.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

Big Data vs. Data Science: Demystifying the Buzzwords

Pickl AI

APRIL 21, 2025

Semi-Structured Data: Data that has some organizational properties but doesn’t fit a rigid database structure (like emails, XML files, or JSON data used by websites). Unstructured Data: Data with no predefined format (like text documents, social media posts, images, audio files, videos).

Big Data

Big Data Big Data Data Science Machine Learning

Bringing Declarative Pipelines to the Apache Spark™ Open Source Project

databricks

JUNE 12, 2025

SQL

SQL Data Engineer Data Engineering Data Engineering

LLM app platforms

Dataconomy

MARCH 20, 2025

As AI technologies continue to evolve, understanding the functionalities and development stages of LLM applications is essential for both new and seasoned developers. Data collection and preparation: Quality data is paramount in training an effective LLM. KLU.ai: Offers no-code solutions for smooth data source integration.

Data Preparation

Data Preparation Data Pipeline Data Quality Database

These AI & Data Engineering Sessions Are a Must-Attend at ODSC East 2025

ODSC - Open Data Science

MARCH 19, 2025

As AI and data engineering continue to evolve at an unprecedented pace, the challenge isn’t just building advanced models; it’s integrating them efficiently, securely, and at scale. This session explores open-source tools and techniques for transforming unstructured documents into structured formats like JSON and Markdown.

Data Engineer

Data Engineer Data Engineering Data Engineering Data Engineering

Ask HN: Who wants to be hired? (July 2025)

Hacker News

JULY 1, 2025

Prior to that, I spent a couple years at First Orion - a smaller data company - helping found & build out a data engineering team as one of the first engineers. We were focused on building data pipelines and models to protect our users from malicious phone calls. I am interested in contract work if it is the right fit.

Python

Python AWS SQL ML

How Anomalo solves unstructured data quality issues to deliver trusted assets for AI with AWS

Flipboard

JUNE 17, 2025

Generative AI has rapidly evolved from a novelty to a powerful driver of innovation. From summarizing complex legal documents to powering advanced chat-based assistants, AI capabilities are expanding at an increasing pace. Gartner predicts that 30% of generative AI projects will be abandoned in 2025.

Data Quality

Data Quality AWS AI AI

Enhanced diagnostics flow with LLM and Amazon Bedrock agent integration

Flipboard

JUNE 3, 2025

To solve this, Noodoe has integrated large language models (LLMs) through Amazon Bedrock and Amazon Bedrock Agents to deliver intelligent automation, real-time data access, and multilingual support. In this post, we explore how Noodoe uses AI and Amazon Bedrock to optimize EV charging operations.

AWS

AWS Apache Kafka Database AI

Best Data Engineering Tools Every Engineer Should Know

Pickl AI

MARCH 19, 2025

Summary: Data engineering tools streamline data collection, storage, and processing. Learning these tools is crucial for building scalable data pipelines. offers Data Science courses covering these tools with a job guarantee for career growth. Below are 20 essential tools every data engineer should know.

Data Engineer

Data Engineer Data Engineering Data Engineering Data Engineering

A Field Guide to Rapidly Improving AI Products

Flipboard

APRIL 15, 2025

Most AI teams focus on the wrong things. Here’s a common scene from my consulting work: AI TEAM: Here’s our agent architecture: we’ve got RAG here, a router there, and we’re using this new framework for... ME: [Holding up my hand to pause the enthusiastic tech lead] Can you show me how you’re measuring if any of this actually works?

AI

AI AI Database ML

Go is a good fit for agents

Hacker News

JUNE 4, 2025

Since you’re here, you might be interested in checking out Hatchet, the platform for running background tasks, data pipelines and AI agents at scale. They often involve input from a user (or another agent!)

Python

Python Database Data Pipeline Machine Learning

RAG vs Fine-Tuning for Enterprise LLMs

Towards AI

FEBRUARY 17, 2025

Originally published on Towards AI. As the use of large language models (LLMs) grows within businesses, to automate tasks, analyse data, and engage with customers; adapting these models to specific needs (e.g., Solution: Use overlapping chunks (e.g.,

Database

Database Data Pipeline Data Preparation Data Quality

Generate training data and cost-effectively train categorical models with Amazon Bedrock

AWS Machine Learning Blog

MARCH 27, 2025

In this post, we explore how you can use Amazon Bedrock to generate high-quality categorical ground truth data, which is crucial for training machine learning (ML) models in a cost-sensitive environment. Let’s look at how generative AI can help solve this problem.

AWS

AWS ETL ML ML
