2018, Data Science and ML - Data Science Current

Bridging the Gap: New Datasets Push Recommender Research Toward Real-World Scale

KDnuggets

JUNE 11, 2025

Spotify Million Playlist: Released for RecSys 2018, this dataset helps analyze short-term and sequential listening behavior. Read the original article at Turing Post, the newsletter for over 90 000 professionals who are serious about AI and ML. Yelp Open Dataset: Contains 8.6M reviews, but coverage is sparse and city-specific.

Natural Language Processing

Natural Language Processing Data Science Machine Learning Machine Learning

Building Generative AI and ML solutions faster with AI apps from AWS partners using Amazon SageMaker

AWS Machine Learning Blog

DECEMBER 4, 2024

This increases the time it takes for customers to go from data to insights. Our customers want a simple and secure way to find the best applications, integrate the selected applications into their machine learning (ML) and generative AI development environment, and manage and scale their AI projects.

AWS

AWS ML ML AI

The most important unanswered questions of 2018 in Artificial Intelligence (AI) and Machine Learning (ML)

Dataconomy

JULY 16, 2018

(This is the first part of an article series based on a whitepaper by Dataiku.) The year 2018 was supposed to be the one. Let’s find out.

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Machine Learning Machine Learning

GoLang for Data Science

Data Science 101

APRIL 26, 2019

While it is not one of the popular programming languages for data science, the Go programming language (aka Golang) has surfaced for me a few times in the past few years as an option for data science. I decided to do some searching and reach some conclusions about whether Golang is a good choice for data science.

Data Science

Data Science Machine Learning Machine Learning Python

The Rise and Fall of Data Science Trends: A 2018–2024 Conference Perspective

ODSC - Open Data Science

MARCH 12, 2025

The field of data science has evolved dramatically over the past several years, driven by technological breakthroughs, industry demands, and shifting priorities within the community. By analyzing conference session titles and abstracts from 2018 to 2024, we can trace the rise and fall of key trends that shaped the industry.

Data Science

Data Science Machine Learning Machine Learning Data Engineering

Predictive analytics vs. AI: Why the difference matters in 2023?

Data Science Dojo

SEPTEMBER 8, 2023

AI encompasses the creation of intelligent machines capable of autonomous decision-making, while Predictive Analytics relies on data, statistics, and machine learning to forecast future events accurately. Streamline operations. Improve customer service.

Predictive Analytics

Predictive Analytics Analytics Analytics Deep Learning

The Role of DevSecOps in Ensuring Data Privacy and Security in Data Science Projects

ODSC - Open Data Science

APRIL 17, 2023

Purpose of Using DevSecOps in Traditional and ML Applications: DevSecOps practices differ between traditional and ML applications, as each comes with different challenges. The characteristics we saw for DevSecOps in traditional applications also apply to ML-based applications.

Data Science

Data Science ML ML Deep Learning

Google AI

Dataconomy

MARCH 19, 2025

Formerly known as Google Research, it was rebranded during the 2018 Google I/O conference. Data science services: Through Google Cloud, users can access ML infrastructure and data science capabilities, empowering organizations to leverage AI for their specific needs.

AI

AI AI Natural Language Processing Machine Learning

Data Integrity: The Foundation for Trustworthy AI/ML Outcomes and Confident Business Decisions

ODSC - Open Data Science

APRIL 28, 2023

Be sure to check out her talk, “Power trusted AI/ML Outcomes with Data Integrity,” there! Due to the tsunami of data available to organizations today, artificial intelligence (AI) and machine learning (ML) are increasingly important to businesses seeking competitive advantage through digital transformation.

ML

ML ML Data Silos Data Quality

Improving ML Datasets with Cleanlab, a Standard Framework for Data-Centric AI

ODSC - Open Data Science

MARCH 22, 2023

Be sure to check out his session, “Improving ML Datasets with Cleanlab, a Standard Framework for Data-Centric AI,” there! Anybody who has worked on a real-world ML project knows how messy data can be. Everybody knows you need to clean your data to get good ML performance. How does cleanlab work?

ML

ML ML Data Scientist AI

Machine Learning Engineering in the Real World

ODSC - Open Data Science

SEPTEMBER 21, 2023

Secondly, to be a successful ML engineer in the real world, you cannot just understand the technology; you must understand the business. Some typical examples are given in the following table, along with some discussion as to whether or not ML would be an appropriate tool for solving the problem: Figure 1.1:

Machine Learning

Machine Learning Machine Learning ML ML

Best Colleges for Data Science Course Online in India

Pickl AI

APRIL 10, 2023

So, if you are eyeing a career in the data domain, this blog will take you through some of the best colleges for Data Science in India. There is a growing demand for employees with digital skills. The world is drifting towards data-based decision making. In India, a technology analyst can make between ₹ 5.5

Data Science

Data Science Machine Learning Machine Learning Python

Deploy large language models for a healthtech use case on Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 6, 2024

To support overarching pharmacovigilance activities, our pharmaceutical customers want to use the power of machine learning (ML) to automate the adverse event detection from various data sources, such as social media feeds, phone calls, emails, and handwritten notes, and trigger appropriate actions.

AWS

AWS ML ML Data Preparation

Racing beyond DeepRacer: Debut of the AWS LLM League

AWS Machine Learning Blog

APRIL 11, 2025

Announced at re:Invent 2018, it puts machine learning in the hands of every developer through the fun and excitement of developing and racing self-driving remote control cars. Idries is the Product Marketing Manager for AWS AI/ML Gamified Learning Programs.

AWS

AWS Machine Learning Machine Learning ML

How Marubeni is optimizing market decisions using AWS machine learning and analytics

AWS Machine Learning Blog

MARCH 8, 2023

Manager Data Science at Marubeni Power International. MPII is using a machine learning (ML) bid optimization engine to inform upstream decision-making processes in power asset management and trading. This solution helps market analysts design and perform data-driven bidding strategies optimized for power asset profitability.

AWS

AWS Machine Learning Machine Learning Analytics

NLP-Powered Data Extraction for SLRs and Meta-Analyses

Towards AI

JULY 20, 2023

Getting desirable data out of published reports and clinical trials and into systematic literature reviews (SLRs), a process known as data extraction, is just one of a series of incredibly time-consuming, repetitive, and potentially error-prone steps involved in creating SLRs and meta-analyses.

Natural Language Processing

Natural Language Processing ML ML Support Vector Machines

AWS performs fine-tuning on a Large Language Model (LLM) to classify toxic speech for a large gaming company

AWS Machine Learning Blog

AUGUST 7, 2023

AWS ProServe solved this use case through a joint effort between the Generative AI Innovation Center (GAIIC) and the ProServe ML Delivery Team (MLDT). AWS received about 100 samples of labeled data from the customer, which is a lot less than the 1,000 samples recommended for fine-tuning an LLM in the data science community.

AWS

AWS ML ML Data Science

Accelerating large-scale neural network training on CPUs with ThirdAI and AWS Graviton

AWS Machine Learning Blog

FEBRUARY 29, 2024

We are actively working on extending our methods to additional domains, such as computer vision, but be aware that our efficiency improvements do not translate to all ML domains at this time. Graviton Technical Guide is a good resource to consider while evaluating your ML workloads to run on Graviton.

AWS

AWS Deep Learning Deep Learning ML

Taking Pandas To The Next Level With LLMs

Mlearning.ai

MAY 15, 2023

Photo by Andrew Neel on Unsplash. Introduction: If you are working or have worked on any data science task, then you have definitely used pandas. Pandas is a library that helps with data ingestion and transformation. apply(lambda x: x.year) df.groupby('year')['Sales'].mean() Yearly average sales.
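
The truncated fragment in this excerpt points at a common pandas pattern: derive a year from a date column, then group by it. A minimal sketch of that pattern follows, assuming a hypothetical DataFrame with a 'Date' column (not named in the excerpt) alongside the 'Sales' and 'year' columns that the excerpt does mention. The sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records. The 'Date' column name and all values are
# assumptions made for this sketch; only 'Sales' and 'year' appear in the excerpt.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-15", "2021-06-30", "2022-03-10", "2022-11-05"]),
    "Sales": [100.0, 200.0, 300.0, 500.0],
})

# Derive a 'year' column from each timestamp.
df["year"] = df["Date"].apply(lambda x: x.year)

# Average sales per year: 2021 -> 150.0, 2022 -> 400.0
yearly_avg = df.groupby("year")["Sales"].mean()
print(yearly_avg)
```

For larger frames, `df["Date"].dt.year` is the vectorized equivalent of the `apply` call and is usually the faster choice.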

Data Science

Data Science Machine Learning Machine Learning AI

Data Catalogs: A Category of Their Own

Alation

FEBRUARY 20, 2020

While this requires technology – AI, machine learning, log parsing, natural language processing, metadata management – this technology must be surfaced in a form accessible to business users – the data catalog. The Forrester Wave: Machine Learning Data Catalogs, Q2 2018.

Machine Learning

Machine Learning Machine Learning Natural Language Processing Analytics

Identifying defense coverage schemes in NFL’s Next Gen Stats

AWS Machine Learning Blog

FEBRUARY 10, 2023

Through a collaboration between the Next Gen Stats team and the Amazon ML Solutions Lab, we have developed the machine learning (ML)-powered stat of coverage classification that accurately identifies the defense coverage scheme based on the player tracking data. Each season consists of around 17,000 plays.

ML

ML ML Machine Learning Machine Learning

A Study of Real-world AI Model Failures and Their Impact

ODSC - Open Data Science

MAY 4, 2023

Amazon’s AI Resume Screening: In 2018, Amazon abandoned an AI-powered resume screening tool after it was found to be biased against women. If you are curious to know how your AI/ML models in production might be failing under your noses and how to go about fixing them, be sure to catch my session, “Why do AI Models go Rogue?

AI

AI AI Data Science ML

Incorporate offline and online human – machine workflows into your generative AI applications on AWS

AWS Machine Learning Blog

MAY 14, 2024

RLHF is a technique that combines rewards and comparisons with human feedback to pre-train or fine-tune a machine learning (ML) model. Response before RLHF: SageMaker stores code in ML storage volumes. Response after RLHF: SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.

AWS

AWS AI AI Machine Learning

Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 2

AWS Machine Learning Blog

JANUARY 13, 2023

To mitigate these challenges, we propose a federated learning (FL) framework, based on open-source FedML on AWS, which enables analyzing sensitive HCLS data. It involves training a global machine learning (ML) model from distributed health data held locally at different sites. Import the data loader into the training script.
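
To make the idea of a global model trained from locally held data concrete, here is a minimal sketch of federated averaging (FedAvg), a widely used aggregation rule in which each site trains locally and only its model weights, never its raw data, are combined centrally. This is an illustration only: it is not the FedML API and is not claimed to be the method used in the article. The function name, site weights, and sample counts are all hypothetical.

```python
from typing import List


def federated_average(site_weights: List[List[float]], site_sizes: List[int]) -> List[float]:
    """Combine per-site model weights, weighting each site by its sample count.

    Hypothetical helper for illustration; real frameworks operate on tensors.
    """
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical sites with locally trained two-parameter models.
site_weights = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [100, 300]  # assumed number of local training samples per site

global_weights = federated_average(site_weights, site_sizes)
print(global_weights)  # [2.5, 3.5]
```

Weighting by sample count lets larger sites influence the global model more, which is the usual choice when local datasets differ in size.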

AWS

AWS Analytics Analytics Machine Learning

23 Best Free NLP Datasets for Machine Learning

Iguazio

SEPTEMBER 20, 2023

20 Newsgroups: A dataset containing roughly 20,000 newsgroup documents spanning a variety of topics, for text classification, text clustering and similar ML applications. million articles from 20,000 news sources across a seven-day period in 2017 and 2018. Long-Form Content 14. The newsgroups are: comp.graphics, comp.os.ms-windows.misc,

Machine Learning

Machine Learning Machine Learning Database Data Scientist

How to optimize your LinkedIn as a Data Scientist?

Pickl AI

MAY 16, 2023

If you are a Data Scientist, then your LinkedIn profile should be full of information on the latest developments in Data Science, so that it instantly garners the attention of recruiters as well as your contemporaries. In fact, these industries are major employers of Data Scientists.

Data Scientist

Data Scientist Data Science SQL Python

Train a Large Language Model on a single Amazon SageMaker GPU with Hugging Face and LoRA

AWS Machine Learning Blog

JUNE 5, 2023

In this post, we show you how to train the 7-billion-parameter BloomZ model using just a single graphics processing unit (GPU) on Amazon SageMaker, Amazon’s machine learning (ML) platform for preparing, building, training, and deploying high-quality ML models. BloomZ is a general-purpose natural language processing (NLP) model.

AWS

AWS ML ML Machine Learning

Introduction to Autoencoders

Flipboard

JULY 10, 2023

By using our mathematical notation, the entire training process of the autoencoder can be written as follows: Figure 2 demonstrates the basic architecture of an autoencoder: Figure 2: Architecture of Autoencoder (inspired by Hubens, “Deep Inside: Autoencoders,” Towards Data Science, 2018).

Deep Learning

Deep Learning Deep Learning Machine Learning Machine Learning

First Step to Object Detection Algorithms

Heartbeat

FEBRUARY 6, 2023

When you’re working on an enterprise scale, managing your ML models can be tricky. YOLOv3 is a newer version of YOLO and was released in 2018. During training, the model is presented with an image with its own real labels and learns to predict the class and position of each object in the image, as well as the corresponding mask.

Algorithm

Algorithm Deep Learning Deep Learning ML

How SnapLogic built a text-to-pipeline application with Amazon Bedrock to translate business intent into action

Flipboard

NOVEMBER 24, 2023

Iris was designed to use machine learning (ML) algorithms to predict the next steps in building a data pipeline. He currently is working on Generative AI for data integration. Clay Elmore is an AI/ML Specialist Solutions Architect at AWS. He is the author of the upcoming book “What’s Your Problem?”

Database

Database AWS ETL SQL

A Vision for the Future: How Computer Vision is Transforming Robotics

Heartbeat

MARCH 14, 2023

Photo from Data Science Central. Industrial automation, security and surveillance, and service robots are just a few examples of fields that might benefit from robotics’ ability to identify and track objects. A combination of simulated and real-world data was used to train the system, enabling it to generalize to new objects and tasks.

Deep Learning

Deep Learning Deep Learning Algorithm Machine Learning

Present and future of data cubes: an European EO perspective

Mlearning.ai

JANUARY 26, 2023

Priorities for Data Cubes evolution: Users and developers discussed some of the main trends in the evolution of data cubes and best practices moving forward, such as how to overcome bottlenecks, and key technologies to improve efficiency and accessibility. 2/2) What should be the priority for the data cube evolution? 2018, July).

AWS

AWS Database Data Science Clean Data

Top 5 Generative AI Integration Companies to drive Customer Support in 2023

Chatbots Life

MAY 16, 2023

10Clouds is a software consultancy, development, ML, and design house based in Warsaw, Poland. Deeper Insights has six years of experience in building AI solutions for large enterprise and scale-up clients, a suite of AI models, and data visualization dashboards that enable them to quickly analyze and share insights.

AI

AI AI Natural Language Processing Artificial Intelligence

The NLP Cypher | 02.14.21

Towards AI

JULY 19, 2023

The Continuing Story of Neural Magic: Around New Year’s time, I pondered the upcoming sparsity adoption and its consequences on inference w/r/t ML models. The company is Neural Magic.

Natural Language Processing

Natural Language Processing Azure Python Artificial Intelligence

Generative AI in the Enterprise

O'Reilly Media

NOVEMBER 28, 2023

AI is the next generation of what we called “data science” a few years back, and data science represented a merger between statistical modeling and software development. The next most needed skill is operations for AI and ML (54%). That’s not the same as failure, and 2018 significantly predates generative AI.

AI

AI AI Data Analysis Data Analysis

Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart

AWS Machine Learning Blog

MAY 2, 2023

Text generation using RAG with LLMs enables you to generate domain-specific text outputs by supplying specific external data as part of the context fed to LLMs. JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. SageMaker Savings Plans apply only to SageMaker ML Instance usage.

Algorithm

Algorithm Machine Learning Machine Learning Natural Language Processing

The Hidden Cost of Poor Training Data in Machine Learning: Why Quality Matters

How to Learn Machine Learning

OCTOBER 10, 2024

The quality of your training data in Machine Learning (ML) can make or break your entire project. Real-Life Examples of Poor Training Data in Machine Learning. Amazon’s Hiring Algorithm Disaster: In 2018, Amazon made headlines for developing an AI-powered hiring tool to screen job applicants. Sounds great, right?

Machine Learning

Machine Learning Machine Learning Data Quality Algorithm

Google Research, 2022 & beyond: Research community engagement

Google Research AI blog

FEBRUARY 28, 2023

Adherence to such public health programs is a prevalent challenge, so researchers from Google Research and the Indian Institute of Technology, Madras worked with ARMMAN to design an ML system that alerts healthcare providers about participants at risk of dropping out of the health information program. certainty when used correctly.

ML

ML ML Deep Learning Deep Learning

Capital One’s data-centric solutions to banking business challenges

Snorkel AI

MAY 12, 2023

Three experts from Capital One’s data science team spoke as a panel at our Future of Data-Centric AI conference in 2022. Please welcome to the stage, Senior Director of Applied ML and Research, Bayan Bruss; Director of Data Science, Erin Babinski; and Head of Data and Machine Learning, Kishore Mosaliganti.

Machine Learning

Machine Learning Machine Learning ML ML

A Quick Recap of Natural Language Processing

Mlearning.ai

JUNE 7, 2023

I worked on an early conversational AI called Marcel in 2018 when I was at Microsoft. In 2018 when BERT was introduced by Google, I cannot emphasize how much it changed the game within the NLP community. In retrospect, we were slightly ahead of our time because of what came next.

Natural Language Processing

Natural Language Processing AI AI ML

Explainability in AI and Machine Learning Systems: An Overview

Heartbeat

SEPTEMBER 13, 2023

Why We Will Never Open Deep Learning's Black Box || Towards Data Science Brent, M., Explainability and Auditability in ML: Definitions, Techniques, and Tools || Neptune.ai For explainability purposes, you can log the explanations generated by different techniques and associate them with the corresponding model runs.

Machine Learning

Machine Learning Machine Learning AI AI

Unleashing the Power of Deep Learning: Revolutionizing Recommender Systems

Heartbeat

OCTOBER 18, 2023

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Deep Learning

Deep Learning Deep Learning Machine Learning Machine Learning

RoBERTa: A Modified BERT Model for NLP

Heartbeat

MARCH 15, 2023

BERT, an open-source machine learning model for NLP, was developed by Google in 2018. Because it had some limitations, a team at Facebook developed a modified version called RoBERTa (Robustly Optimized BERT Pre-Training Approach) in 2019.

Deep Learning

Deep Learning Deep Learning Machine Learning Machine Learning
