Data Pipeline, Information and ML - Data Science Current

Streaming Data Pipelines: What Are They and How to Build One

Precisely

DECEMBER 28, 2023

More than ever, advanced analytics, ML, and AI are providing the foundation for innovation, efficiency, and profitability. But insights derived from day-old data don’t cut it. Many scenarios call for up-to-the-minute information. Now, information is dynamic. That’s where streaming data pipelines come into play.

Data Pipeline

Data Pipeline Apache Kafka Big Data Big Data

Real value, real time: Production AI with Amazon SageMaker and Tecton

AWS Machine Learning Blog

DECEMBER 4, 2024

Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production. Using SageMaker, you can build, train and deploy ML models.

ML

ML ML AWS AI

Five Important Trends in Big Data Analytics

Flipboard

FEBRUARY 3, 2023

Over the last few years, with the rapid growth of data, pipeline, AI/ML, and analytics, DataOps has become a noteworthy piece of day-to-day business New-age technologies are almost entirely running the world today. Among these technologies, big data has gained significant traction. This concept is …

Big Data Analytics

Big Data Analytics Big Data Analytics Big Data Big Data

Webinars

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

Flipboard

NOVEMBER 24, 2023

With that, the need for data scientists and machine learning (ML) engineers has grown significantly. These skilled professionals are tasked with building and deploying models that improve the quality and efficiency of BMW’s business processes and enable informed leadership decisions.

ML

ML ML AWS AI

The ultimate guide to the Machine Learning Model Deployment

Data Science Dojo

JULY 5, 2023

Machine Learning (ML) is a powerful tool that can be used to solve a wide variety of problems. Getting your ML model ready for action: This stage involves building and training a machine learning model using efficient machine learning algorithms. This involves removing any errors or inconsistencies in the data.

Machine Learning

Machine Learning Machine Learning EDA ML

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

AWS Machine Learning Blog

AUGUST 12, 2024

Instead, organizations are increasingly looking to take advantage of transformative technologies like machine learning (ML) and artificial intelligence (AI) to deliver innovative products, improve outcomes, and gain operational efficiencies at scale. Data is presented to the personas that need access using a unified interface.

ML

ML ML AWS AI

How to Build ETL Data Pipeline in ML

The MLOps Blog

MAY 17, 2023

From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers , needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier.

ETL

ETL Data Pipeline ML ML

Boosting Resiliency with an ML-based Telemetry Analytics Architecture | Amazon Web Services

Flipboard

MARCH 3, 2023

Data proliferation has become a norm and as organizations become more data driven, automating data pipelines that enable data ingestion, curation, …

Data Pipeline

Data Pipeline ML ML Analytics

Journeying into the realms of ML engineers and data scientists

Dataconomy

MAY 16, 2023

Key skills and qualifications for machine learning engineers include: Strong programming skills: Proficiency in programming languages such as Python, R, or Java is essential for implementing machine learning algorithms and building data pipelines.

Data Scientist

Data Scientist ML ML Machine Learning

Use Snowflake as a data source to train ML models with Amazon SageMaker

AWS Machine Learning Blog

MARCH 8, 2023

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. We add this data to Snowflake as a new table.

ML

ML ML AWS Python

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Flipboard

AUGUST 17, 2023

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. SageMaker Studio is the first fully integrated development environment (IDE) for ML. The next step is to build ML models using features selected from one or multiple feature groups.

ML

ML ML AWS Data Warehouse

Edge Impulse Launches “Bring Your Own Model” for ML Engineers

Towards AI

APRIL 4, 2023

Last Updated on April 4, 2023 by Editorial Team Introducing a Python SDK that allows enterprises to effortlessly optimize their ML models for edge devices. With their groundbreaking web-based Studio platform, engineers have been able to collect data, develop and tune ML models, and deploy them to devices.

ML

ML ML Python Machine Learning

Build Data Pipelines: Comprehensive Step-by-Step Guide

Pickl AI

JULY 8, 2024

Summary: This blog explains how to build efficient data pipelines, detailing each step from data collection to final delivery. Introduction Data pipelines play a pivotal role in modern data architecture by seamlessly transporting and transforming raw data into valuable insights.

Data Pipeline

Data Pipeline Data Quality Database Apache Kafka

Feature Platforms?—?A New Paradigm in Machine Learning Operations (MLOps)

IBM Data Science in Practice

MARCH 8, 2023

The growth of the AI and Machine Learning (ML) industry has continued to grow at a rapid rate over recent years. Hidden Technical Debt in Machine Learning Systems More money, more problems — Rise of too many ML tools 2012 vs 2023 — Source: Matt Turck People often believe that money is the solution to a problem. Spark, Flink, etc.)

Machine Learning

Machine Learning Machine Learning ML ML

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

AWS Machine Learning Blog

FEBRUARY 21, 2025

While these models are trained on vast amounts of generic data, they often lack the organization-specific context and up-to-date information needed for accurate responses in business settings. After ingesting the data, you create an agent with specific instructions: agent_instruction = """You are the Amazon Bedrock Agent.

AWS

AWS Natural Language Processing Machine Learning Machine Learning

How to Build Effective Data Pipelines in Snowpark

phData

AUGUST 6, 2024

As today’s world keeps progressing towards data-driven decisions, organizations must have quality data created from efficient and effective data pipelines. For customers in Snowflake, Snowpark is a powerful tool for building these effective and scalable data pipelines.

Data Pipeline

Data Pipeline Python Data Engineering Data Engineer

Pioneering computer vision: Aleksandr Timashov, ML developer

Dataconomy

AUGUST 22, 2024

Aleksandr Timashov is an ML Engineer with over a decade of experience in AI and Machine Learning. In this interview, Aleksandr shares his unique experiences of leading groundbreaking projects in Computer Vision and Data Science at the Petronas global energy group (Malaysia). This does sound intriguing!

ML

ML ML Machine Learning Machine Learning

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

AWS Machine Learning Blog

DECEMBER 4, 2024

Building generative AI applications presents significant challenges for organizations: they require specialized ML expertise, complex infrastructure management, and careful orchestration of multiple services. These components will work together to process and retrieve information for your generative AI application.

AWS

AWS AI AI SQL

Shaping the future: OMRON’s data-driven journey with AWS

AWS Machine Learning Blog

APRIL 3, 2025

Data is one of the most critical assets of many organizations. Theyre constantly seeking ways to use their vast amounts of information to gain competitive advantages. This enables OMRON to extract meaningful patterns and trends from its vast data repositories, supporting more informed decision-making at all levels of the organization.

AWS

AWS Data Governance Data Silos SQL

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

AWS Machine Learning Blog

FEBRUARY 24, 2023

AWS recently released Amazon SageMaker geospatial capabilities to provide you with satellite imagery and geospatial state-of-the-art machine learning (ML) models, reducing barriers for these types of use cases. For more information, refer to Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data.

ML

ML ML AWS Data Pipeline

10 highest-paying AI jobs and careers in 2024

Data Science Dojo

APRIL 16, 2024

Machine learning (ML) engineer Potential pay range – US$82,000 to 160,000/yr Machine learning engineers are the bridge between data science and engineering. Integrating the knowledge of data science with engineering skills, they can design, build, and deploy machine learning (ML) models.

AI

AI AI Machine Learning Machine Learning

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

AWS Machine Learning Blog

AUGUST 22, 2024

This makes managing and deploying these updates across a large-scale deployment pipeline while providing consistency and minimizing downtime a significant undertaking. Generative AI applications require continuous ingestion, preprocessing, and formatting of vast amounts of data from various sources.

ML

ML ML Python AWS

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc. and Pandas or Apache Spark DataFrames.

Machine Learning

Machine Learning Machine Learning ML ML

Failure analysis machine learning

Dataconomy

MAY 6, 2025

With an increasing reliance on ML models across various sectors, identifying potential failures before they manifest is vital for maintaining user trust and operational efficiency. A thorough understanding of these failures can inform better practices and approaches. What is failure analysis machine learning?

Machine Learning

Machine Learning Machine Learning Data Pipeline ML

Streamlining Process Configuration in Machine Learning with Hydra

Pickl AI

NOVEMBER 29, 2024

It enhances scalability, experimentation, and reproducibility, allowing ML teams to focus on innovation. This blog highlights the importance of organised, flexible configurations in ML workflows and introduces Hydra. It also simplifies managing configuration dependencies in Deep Learning projects and large-scale data pipelines.

Machine Learning

Machine Learning Machine Learning ML ML

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

AWS Machine Learning Blog

MARCH 1, 2023

Statistical methods and machine learning (ML) methods are actively developed and adopted to maximize the LTV. In this post, we share how Kakao Games and the Amazon Machine Learning Solutions Lab teamed up to build a scalable and reliable LTV prediction solution by using AWS data and ML services such as AWS Glue and Amazon SageMaker.

AWS

AWS ML ML ETL

Supercharging Your Data Pipeline with Apache Airflow (Part 2)

Heartbeat

NOVEMBER 6, 2023

Image Source — Pixel Production Inc In the previous article, you were introduced to the intricacies of data pipelines, including the two major types of existing data pipelines. You might be curious how a simple tool like Apache Airflow can be powerful for managing complex data pipelines.

Data Pipeline

Data Pipeline Clean Data ETL Python

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

AWS Machine Learning Blog

DECEMBER 15, 2023

We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas , allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Follow the tutorial to load sample restaurant data into Amazon DocumentDB.

Machine Learning

Machine Learning Machine Learning AWS ML

Understanding and predicting urban heat islands at Gramener using Amazon SageMaker geospatial capabilities

AWS Machine Learning Blog

APRIL 5, 2024

By providing authorities with the tools and insights they need to make informed decisions about environmental and social impact, Gramener is playing a vital role in building a more sustainable future. Urban heat islands (UHIs) are areas within cities that experience significantly higher temperatures than their surrounding rural areas.

Clustering

Clustering ML ML AWS

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

AWS Machine Learning Blog

APRIL 19, 2023

Since 2018, our team has been developing a variety of ML models to enable betting products for NFL and NCAA football. These models are then pushed to an Amazon Simple Storage Service (Amazon S3) bucket using DVC, a version control tool for ML models. For more information on the DJL and its other features, see Deep Java Library.

ML

ML ML Deep Learning Deep Learning

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

The ZMP analyzes billions of structured and unstructured data points to predict consumer intent by using sophisticated artificial intelligence (AI) to personalize experiences at scale. For more information, see Zeta Global’s home page. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly.

AWS

AWS Machine Learning Machine Learning ML

Foundational data protection for enterprise LLM acceleration with Protopia AI

AWS Machine Learning Blog

DECEMBER 5, 2023

In their implementation of generative AI technology, enterprises have real concerns about data exposure and ownership of confidential information that may be sent to LLMs. These concerns of privacy and data protection can slow down or limit the usage of LLMs in organizations.

AI

AI AI AWS ML

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

AWS Machine Learning Blog

MARCH 14, 2024

Recent improvements in Generative AI based large language models (LLMs) have enabled their use in a variety of applications surrounding information retrieval. Given the data sources, LLMs provided tools that would allow us to build a Q&A chatbot in weeks, rather than what may have taken years previously, and likely with worse performance.

SQL

SQL AWS AI AI

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

AWS Machine Learning Blog

OCTOBER 18, 2023

Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring. DynamoDB is used to store the pet attributes.

AWS

AWS ML ML Machine Learning

Managing Dataset Versions in Long-Term ML Projects

The MLOps Blog

MARCH 20, 2023

Long-term ML project involves developing and sustaining applications or systems that leverage machine learning models, algorithms, and techniques. An example of a long-term ML project will be a bank fraud detection system powered by ML models and algorithms for pattern recognition. 2 Ensuring and maintaining high-quality data.

ML

ML ML Machine Learning Machine Learning

Building and Scaling Gen AI Applications with Simplicity, Performance and Risk Mitigation in Mind Using Iguazio (acquired by McKinsey) and MongoDB

Iguazio

JULY 22, 2024

AI offers a transformative approach by automating the interpretation of regulations, supporting data cleansing, and enhancing the efficacy of surveillance systems. AI-powered systems can analyze transactions in real-time and flag suspicious activities more accurately, which helps institutions take informed actions to prevent financial losses.

AI

AI AI ML ML

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

AWS Machine Learning Blog

AUGUST 8, 2024

As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads. Data is the foundational layer for all generative AI and ML applications.

SQL

SQL Data Lakes Data Analyst AWS

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

AWS Machine Learning Blog

SEPTEMBER 18, 2023

Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. SageMaker is a fully managed service for building, training, and deploying ML models.

Machine Learning

Machine Learning Machine Learning ML ML

Designing generative AI workloads for resilience

AWS Machine Learning Blog

FEBRUARY 1, 2024

Data pipelines In cases where you need to provide contextual data to the foundation model using the RAG pattern, you need a data pipeline that can ingest the source data, convert it to embedding vectors, and store the embedding vectors in a vector database.

AWS

AWS AI AI Database

What Lays Ahead in 2024? AI/ML Predictions for the New Year

Iguazio

DECEMBER 18, 2023

For data science practitioners, productization is key, just like any other AI or ML technology. However, it's important to contextualize generative AI within the broader landscape of AI and ML technologies. By thinking about the ML process in advance: preparing, managing, and versioning data, reusing components, etc.,

ML

ML ML AI AI

Improving air quality with generative AI

AWS Machine Learning Blog

JUNE 18, 2024

Through evaluations of sensors and informed decision-making support, Afri-SET empowers governments and civil society for effective air quality management. The attempt is disadvantaged by the current focus on data cleaning, diverting valuable skills away from building ML models for sensor calibration.

AWS

AWS Python AI AI

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

AWS Machine Learning Blog

FEBRUARY 13, 2024

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. Their task is to construct and oversee efficient data pipelines.

AWS

AWS ML ML Machine Learning

How Did We Get to ML Model Reproducibility

The MLOps Blog

MARCH 14, 2023

When working on real-world ML projects , you come face-to-face with a series of obstacles. The ml model reproducibility problem is one of them. Instead, we tend to spend much time on data exploration, preprocessing, and modeling. This is indeed an erroneous thing to do when working on ML projects at scale.

ML

ML ML Machine Learning Machine Learning

How Reveal’s Logikcull used Amazon Comprehend to detect and redact PII from legal documents at scale

AWS Machine Learning Blog

NOVEMBER 1, 2023

Today, personally identifiable information (PII) is everywhere. It refers to any data or information that can be used to identify a specific individual. It’s a critical component of modern data management and cybersecurity practices. PII is in emails, slack messages, videos, PDFs, and so on.

AWS

AWS Machine Learning Machine Learning ML

Streaming Data Pipelines: What Are They and How to Build One

Real value, real time: Production AI with Amazon SageMaker and Tecton

Webinars

Trending Sources

Five Important Trends in Big Data Analytics

Webinars

Accelerating AI/ML development at BMW Group with Amazon SageMaker Studio

The ultimate guide to the Machine Learning Model Deployment

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

How to Build ETL Data Pipeline in ML

Boosting Resiliency with an ML-based Telemetry Analytics Architecture | Amazon Web Services

Journeying into the realms of ML engineers and data scientists

Use Snowflake as a data source to train ML models with Amazon SageMaker

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Edge Impulse Launches “Bring Your Own Model” for ML Engineers

Build Data Pipelines: Comprehensive Step-by-Step Guide

Feature Platforms?—?A New Paradigm in Machine Learning Operations (MLOps)

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases

How to Build Effective Data Pipelines in Snowpark

Pioneering computer vision: Aleksandr Timashov, ML developer

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

Shaping the future: OMRON’s data-driven journey with AWS

Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI

10 highest-paying AI jobs and careers in 2024

Fine tune a generative AI application for Amazon Bedrock using Amazon SageMaker Pipeline decorators

MLOps Landscape in 2023: Top Tools and Platforms

Failure analysis machine learning

Streamlining Process Configuration in Machine Learning with Hydra

How Kakao Games automates lifetime value prediction from game data using Amazon SageMaker and AWS Glue

Supercharging Your Data Pipeline with Apache Airflow (Part 2)

Use Amazon DocumentDB to build no-code machine learning solutions in Amazon SageMaker Canvas

Understanding and predicting urban heat islands at Gramener using Amazon SageMaker geospatial capabilities

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

Foundational data protection for enterprise LLM acceleration with Protopia AI

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

Managing Dataset Versions in Long-Term ML Projects

Building and Scaling Gen AI Applications with Simplicity, Performance and Risk Mitigation in Mind Using Iguazio (acquired by McKinsey) and MongoDB

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

Orchestrate Ray-based machine learning workflows using Amazon SageMaker

Designing generative AI workloads for resilience

What Lays Ahead in 2024? AI/ML Predictions for the New Year

Improving air quality with generative AI

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

How Did We Get to ML Model Reproducibility

How Reveal’s Logikcull used Amazon Comprehend to detect and redact PII from legal documents at scale

Stay Connected