Data Modeling, Data Pipeline and Data Scientist

What Do Data Scientists Do? A Guide to AI Maturity, Challenges, and Solutions

DataRobot Blog

SEPTEMBER 13, 2022

According to IDC, 83% of CEOs want their organizations to be more data-driven. Data scientists could be your key to unlocking the potential of the Information Revolution, but what do data scientists do? What Do Data Scientists Do? Data scientists drive business outcomes.

Data Scientist, ML, AI

Here’s Why Automation For Data Lakes Could Be Important

Smart Data Collective

APRIL 2, 2019

These massive storage pools of data are among the most non-traditional methods of data storage around and they came about as companies raced to embrace the trend of Big Data Analytics which was sweeping the world in the early 2010s. The First Problem – Data Ingestion. The Third Problem – Preparation of Data.

Data Lakes, Big Data, Data Scientist

Best Data Engineering Tools Every Engineer Should Know

Pickl AI

MARCH 19, 2025

Summary: Data engineering tools streamline data collection, storage, and processing. Learning these tools is crucial for building scalable data pipelines. offers Data Science courses covering these tools with a job guarantee for career growth. Below are 20 essential tools every data engineer should know.

Data Engineering, Data Engineer

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Some popular end-to-end MLOps platforms in 2023 Amazon SageMaker Amazon SageMaker provides a unified interface for data preprocessing, model training, and experimentation, allowing data scientists to collaborate and share code easily. Can you see the complete model lineage with data/models/experiments used downstream?

Machine Learning, ML

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Overview: Data science vs data analytics Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models and develop artificial intelligence (AI) applications.

Data Science, Analytics, Data Scientist

The Data Dilemma: Exploring the Key Differences Between Data Science and Data Engineering

Pickl AI

JULY 25, 2023

Unfolding the difference between data engineer, data scientist, and data analyst. Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. Role of Data Scientists Data Scientists are the architects of data analysis.

Data Engineering, Data Engineer

Building an efficient MLOps platform with OSS tools on Amazon ECS with AWS Fargate

AWS Machine Learning Blog

SEPTEMBER 18, 2024

It simplifies feature access for model training and inference, significantly reducing the time and complexity involved in managing data pipelines. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly. Saurabh Gupta is a Principal Engineer at Zeta Global.

AWS, Machine Learning, ML

Deploying Gen AI in Production with NVIDIA NIM & MLRun

Iguazio

JUNE 9, 2025

The blog is based on the webinar Deploying Gen AI in Production with NVIDIA NIM & MLRun with Amit Bleiweiss, Senior Data Scientist at NVIDIA, and Yaron Haviv, co-founder and CTO and Guy Lecker, ML Engineering Team Lead at Iguazio (acquired by McKinsey). Ensuring data security, lineage and risk controls.

AI, Data Preparation, ML

Unlocking Tabular Data’s Hidden Potential

ODSC - Open Data Science

MAY 10, 2023

Many mistakenly equate tabular data with business intelligence rather than AI, leading to a dismissive attitude toward its sophistication. Standard data science practices could also be contributing to this issue. Making data engineering more systematic through principles and tools will be key to making AI algorithms work.

Data Scientist, Data Science, Deep Learning

Discover the Most Important Fundamentals of Data Engineering

Pickl AI

NOVEMBER 4, 2024

Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?

Data Engineering, Data Engineer

ML Collaboration: Best Practices From 4 ML Teams

The MLOps Blog

DECEMBER 28, 2022

Data scientists frame the business problem and the objective into a statistical solution and start with the very first step of data exploration. Team composition The team comprises domain experts, data engineers, data scientists, and ML engineers.

ML, Data Scientist, Machine Learning

Building Scalable AI Pipelines with MLOps: A Guide for Software Engineers

ODSC - Open Data Science

OCTOBER 7, 2024

In today’s landscape, AI is becoming a major focus in developing and deploying machine learning models. It isn’t just about writing code or creating algorithms — it requires robust pipelines that handle data, model training, deployment, and maintenance. Model Training: Running computations to learn from the data.

Machine Learning, AI

Find Your AI Solutions at the ODSC West AI Expo

ODSC - Open Data Science

OCTOBER 15, 2023

Elementl / Dagster Labs Elementl and Dagster Labs are both companies that provide platforms for building and managing data pipelines. Elementl’s platform is designed for data engineers, while Dagster Labs’ platform is designed for data scientists. ArangoDB is designed to be scalable, reliable, and easy to use.

Machine Learning, Data Pipeline, AI

DataOps vs. DevOps: What’s the Difference?

Alation

AUGUST 3, 2021

It brings together business users, data scientists, data analysts, IT, and application developers to fulfill the business need for insights. DataOps then works to continuously improve and adjust data models, visualizations, reports, and dashboards to achieve business goals. Using DataOps to Empower Users.

DataOps, Data Pipeline, Data Analyst, Analytics

Architect a mature generative AI foundation on AWS

Flipboard

MAY 30, 2025

Data quality is owned by the consuming applications or data producers. Governance The two key areas of governance are model and data: Model governance Monitor model for performance, robustness, and fairness. Model versions should be managed centrally in a model registry.

AWS, AI, Database

Best 8 Data Version Control Tools for Machine Learning 2024

DagsHub

DECEMBER 11, 2023

DagsHub DagsHub is a centralized GitHub-based platform that allows Machine Learning and Data Science teams to build, manage and collaborate on their projects. In addition to versioning code, teams can also version data, models, experiments and more. It does not support the ‘dvc repro’ command to reproduce its data pipeline.

Machine Learning, Data Lakes, Data Science

Implementing GenAI in Practice

Iguazio

JANUARY 22, 2024

Production App - Build resilient and modular production pipelines with automation, scale, testing, observability, versioning, security, risk handling, etc. Monitoring - Monitor all resources, data, model and application metrics to ensure performance. This helps cleanse the data.

Data Pipeline, ML, Data Warehouse

Why Improving Problem-Solving Skills is Crucial for Data Engineers?

DataSeries

AUGUST 15, 2024

Data Engineering Career: Unleashing The True Potential of Data Problem-Solving Skills Data Engineers are required to possess strong analytical and problem-solving skills to navigate complex data challenges. Understanding these fundamentals is essential for effective problem-solving in data engineering.

Data Engineering, Data Engineer

A Recipe For AI Strategy

ODSC - Open Data Science

FEBRUARY 8, 2024

data sources or simpler data models) of the data products we want to build? Answering these questions allows data scientists to develop useful data products that start out simple and can be improved and made more complex over time until the long-term vision is achieved. What are the dependencies (e.g.

Data Science, AI, Data Scientist

LLMOps vs. MLOps: Understanding the Differences

Iguazio

FEBRUARY 8, 2024

Data engineers, data scientists and other data professional leaders have been racing to implement gen AI into their engineering efforts. Data Pipeline - Manages and processes various data sources. ML Pipeline - Focuses on training, validation and deployment. LLMOps is MLOps for LLMs.

ML, Data Scientist, Machine Learning

What Industries are Hiring for Different Jobs in AI

ODSC - Open Data Science

APRIL 26, 2023

Though just about every industry imaginable utilizes the skills of a data-focused professional, each has its own challenges, needs, and desired outcomes. This is why you’ll often find that there are jobs in AI specific to an industry, or desired outcome when it comes to data. So, what are you waiting for?

Data Analyst, Machine Learning, Power BI

Best 8 Experiment Tracking Tools for Machine Learning 2024

DagsHub

DECEMBER 5, 2023

It helps data scientists keep track of their experiments, reproduce their results, and collaborate with others effectively. Experiment tracking tools enable us to log experiment metadata, such as hyperparameters, dataset/code versions, and model performance metrics. This is where ML experiment tracking comes into play!

Machine Learning, ML

How to Choose MLOps Tools: In-Depth Guide for 2024

DagsHub

APRIL 21, 2024

This collaboration of ML and operations teams is what you call MLOps and focuses on streamlining the process of deploying the ML models to production, along with maintaining and monitoring them. Model Training Frameworks This stage involves the process of creating and optimizing the predictive models with labeled and unlabeled data.

Machine Learning, ML

Demystifying Time Series Database: A Comprehensive Guide

Pickl AI

JULY 8, 2024

Features and Capabilities of Time Series Databases TSDBs offer a rich set of functionalities that empower developers and data scientists to effectively manage and analyse time series data. Here are some key features: High-performance Write and Read Operations TSDBs are optimised for rapid data ingestion and retrieval.

Database, Data Pipeline, Machine Learning

Capital One’s data-centric solutions to banking business challenges

Snorkel AI

MAY 12, 2023

My name is Erin Babinski and I’m a data scientist at Capital One, and I’m speaking today with my colleagues Bayan and Kishore. We’re here to talk to you all about data-centric AI. billion is lost by Fortune 500 companies because of broken data pipelines and communications.

Machine Learning, ML

Discover the Snowflake Architecture With All its Pros and Cons- NIX United

Mlearning.ai

FEBRUARY 16, 2023

Thus, the solution allows for scaling data workloads independently from one another and seamlessly handling data warehousing, data lakes, data sharing, and engineering. Therefore, you’ll be empowered to truncate and reprocess data if bugs are detected and provide an excellent raw data source for data scientists.

Data Warehouse, Business Intelligence, Database

The Ultimate Modern Data Stack Migration Guide

phData

JULY 18, 2023

Enter dbt dbt provides SQL-centric transformations for your data modeling and transformations, which is efficient for scrubbing and transforming your data while being an easy skill set to hire for and develop within your teams. It should also enable easy sharing of insights across the organization.

Data Warehouse, Analytics, Cloud Data

The Evolution of Customer Data Modeling: From Static Profiles to Dynamic Customer 360

phData

SEPTEMBER 27, 2024

Introduction: The Customer Data Modeling Dilemma You know, that thing we’ve been doing for years, trying to capture the essence of our customers in neat little profile boxes? For years, we’ve been obsessed with creating these grand, top-down customer data models. Yeah, that one.

Data Models, Data Modeling, Apache Kafka, Data Lakes

Ask HN: Who wants to be hired? (July 2025)

Hacker News

JULY 1, 2025

Prior to that, I spent a couple years at First Orion - a smaller data company - helping found & build out a data engineering team as one of the first engineers. We were focused on building data pipelines and models to protect our users from malicious phone calls. "[1] type problems.

Python, AWS, SQL, ML

Ask HN: Who is hiring? (July 2025)

Hacker News

JULY 1, 2025

Designing AI data pipelines to process billions of data points. Open roles include: • Senior ML/Data Engineers • Senior AI Consultants • Senior AI Project Managers • Industry Directors • Junior ML/Data Engineers and many more! We have PMF, and it's time to scale!

Python, AWS, ML

How to Build an End-To-End ML Pipeline

The MLOps Blog

MAY 9, 2023

The platform typically includes components for the ML ecosystem like data management, feature stores, experiment trackers, a model registry, a testing environment, model serving, and model management. They include: 1 Data (or input) pipeline. 2 Model (or training) pipeline.

ML, Machine Learning

Data Scientists in the Age of AI Agents and AutoML

Towards AI

JANUARY 22, 2025

Uncomfortable reality: In the era of large language models (LLMs) and AutoML, traditional skills like Python scripting, SQL, and building predictive models are no longer enough for data scientists to remain competitive in the market. Coding skills remain important, but the real value of data scientists today is shifting.

Data Scientist, EDA, Exploratory Data Analysis, AI

Gen AI Trends and Scaling Strategies for 2025

Iguazio

MARCH 20, 2025

This includes responsible AI, Gartner’s concept of AI TRiSM (Trust, Risk and Security in AI Models) and Sovereign AI. AI engineering - AI is being democratized for developers and engineers, expanding beyond the limited pool of data scientists. AI Agents and multi-agent systems.

AI, Data Pipeline, Data Scientist
