Spotify Million Playlist: Released for RecSys 2018, this dataset helps analyze short-term and sequential listening behavior. Yelp Open Dataset: Contains 8.6M reviews, but coverage is sparse and city-specific. Valuable for local business research, yet not optimal for large-scale generalizable models.
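As a rough illustration of how one might start exploring sequential listening behavior in the Million Playlist Dataset, here is a minimal Python sketch; the slice file name and the JSON fields ("playlists", "tracks", "track_name") follow the dataset's published slice layout but are assumptions to verify against your local copy.

import json
from collections import Counter

# Load one slice of the Million Playlist Dataset (file name is an assumption;
# slices are JSON files, each holding a "playlists" list).
with open("mpd.slice.0-999.json") as f:
    slice_data = json.load(f)

# Count how often each track appears as the *first* song in a playlist,
# a crude proxy for short-term / sequential listening patterns.
first_tracks = Counter()
for playlist in slice_data["playlists"]:
    if playlist["tracks"]:
        first_tracks[playlist["tracks"][0]["track_name"]] += 1

print(first_tracks.most_common(10))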
BI Dashboards Everywhere: After 2018, a new shift happened. Tools like Tableau and Power BI let users perform data analysis with a few clicks and present rich visualizations at once, called dashboards. Complex tasks could now be done in minutes, and this became the new standard. Nate Rosidi is a data scientist working in product strategy.
What are Tensor Processing Units (TPUs)? They are essential for processing large amounts of data efficiently, particularly in deep learning applications. History of Tensor Processing Units: The inception of TPUs can be traced back to 2015, when Google developed them for internal machine learning projects.
GenAI: I serve as the Principal Data Scientist at a prominent healthcare firm, where I lead a small team dedicated to addressing patient needs. Over the past 11 years in the field of data science, I've witnessed significant transformations, and 2023 brought the most substantial of them, marking it as the 'year of AI.'
The dataset only helps if the model is well trained and supervised. A Mongolian pharmaceutical company launched a pilot study in 2018 to detect fake drugs, an initiative with the potential to save hundreds of thousands of lives. Experts train the AI specifically on how to fight counterfeit products.
Natural Language Processing: Getting desirable data out of published reports and clinical trials and into systematic literature reviews (SLRs), a process known as data extraction, is just one of a series of incredibly time-consuming, repetitive, and potentially error-prone steps involved in creating SLRs and meta-analyses.
Transformers, BERT, and GPT: The transformer is a neural network architecture used for natural language processing (NLP) tasks. BERT can be fine-tuned for a variety of NLP tasks, including question answering, natural language inference, and sentiment analysis.
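To make the fine-tuning idea concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries to adapt bert-base-uncased for sentiment classification; the dataset choice (IMDB), the tiny subsets, and the hyperparameters are illustrative assumptions, not details from the original article.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Illustrative choices: IMDB reviews and bert-base-uncased.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-sentiment",
    per_device_train_batch_size=16,
    num_train_epochs=1,  # kept small for illustration only
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=encoded["test"].select(range(500)),
)
trainer.train()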
Text analytics: Text analytics, also known as text mining, deals with unstructured text data, such as customer reviews, social media comments, or documents. It uses natural language processing (NLP) techniques to extract valuable insights from textual data. Poor data integration can lead to inaccurate insights.
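As a small illustration of the kind of text analytics described above, the sketch below pulls the most frequent noun phrases out of a handful of customer reviews with spaCy; the example reviews and the choice of the en_core_web_sm model are assumptions for illustration.

from collections import Counter
import spacy

# Small English pipeline; install with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

reviews = [
    "The delivery was fast but the packaging was damaged.",
    "Great customer service and fast delivery.",
    "Packaging could be better; the product itself is excellent.",
]

# Count noun phrases as a crude proxy for the topics customers mention most.
phrases = Counter()
for doc in nlp.pipe(reviews):
    for chunk in doc.noun_chunks:
        phrases[chunk.lemma_.lower()] += 1

print(phrases.most_common(5))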
The Early Years: Laying the Foundations (2015-2017): In the early years, data science conferences predominantly focused on foundational topics like data analytics, visualization, and the rise of big data. The Deep Learning Boom (2018-2019): Between 2018 and 2019, deep learning dominated the conference landscape.
While this requires technology (AI, machine learning, log parsing, natural language processing, metadata management), that technology must be surfaced in a form accessible to business users: the data catalog. The Forrester Wave: Machine Learning Data Catalogs, Q2 2018.
The images document the land cover, or physical surface features, of ten European countries between June 2017 and May 2018. Jeremy Anderson is a Director & Data Scientist at Travelers on the AI & Automation Accelerator team. His specialty is Natural Language Processing (NLP), and he is passionate about deep learning.
Foundation models can be trained to perform tasks such as data classification, the identification of objects within images (computer vision), and natural language processing (NLP, understanding and generating text) with a high degree of accuracy. Google created BERT, an open-source model, in 2018.
Foundation models are large AI models trained on enormous quantities of unlabeled data, usually through self-supervised learning. This process results in generalized models capable of a wide variety of tasks, such as image classification, natural language processing, and question answering, with remarkable accuracy.
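To give a flavor of what self-supervised learning looks like in practice, the sketch below runs masked-token prediction, the pretraining objective behind models like BERT, using the Hugging Face fill-mask pipeline; the model name and example sentence are illustrative assumptions.

from transformers import pipeline

# Masked-language modelling is a self-supervised objective: the "label" is the
# hidden word itself, so no human annotation is needed.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Foundation models are trained on large amounts of [MASK] data."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")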
But what if there were a way to unravel this language puzzle swiftly and accurately? Enter Natural Language Processing (NLP) and its transformational power.
Since 2018, our team has been developing a variety of ML models to enable betting products for NFL and NCAA football. Our data scientists train the models in Python using tools like PyTorch and save them as PyTorch scripts. Business requirements: We are the US squad of the Sportradar AI department.
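The excerpt does not show the export step itself, but saving a trained PyTorch model as a script typically looks something like the sketch below; the toy model, feature count, and file name are assumptions for illustration, not the Sportradar code.

import torch
import torch.nn as nn

# Toy stand-in for a trained model (the real betting models are not shown here).
class WinProbabilityModel(nn.Module):
    def __init__(self, n_features: int = 12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))

model = WinProbabilityModel()
model.eval()

# Compile to TorchScript and save; the .pt file can later be loaded for serving
# without the original Python class definition.
scripted = torch.jit.script(model)
scripted.save("win_probability.pt")

reloaded = torch.jit.load("win_probability.pt")
print(reloaded(torch.randn(1, 12)))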
Quantitative evaluation: We utilize 2018–2020 season data for model training and validation, and 2021 season data for model evaluation. Haibo Ding is a senior applied scientist at Amazon Machine Learning Solutions Lab. He is broadly interested in Deep Learning and Natural Language Processing.
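A season-based split like the one described is straightforward to express in pandas; the file name and the "season" and "week" columns below are assumptions for illustration, not the actual schema used in the article.

import pandas as pd

# games.csv is a hypothetical table with one row per game, including a "season" column.
games = pd.read_csv("games.csv")

train_val = games[games["season"].between(2018, 2020)]   # training + validation seasons
test = games[games["season"] == 2021]                     # held-out evaluation season

# Keep the validation split chronological by holding out the tail of 2018-2020.
train_val = train_val.sort_values(["season", "week"])     # "week" is also an assumed column
cutoff = int(len(train_val) * 0.9)
train, val = train_val.iloc[:cutoff], train_val.iloc[cutoff:]

print(len(train), len(val), len(test))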
AWS received about 100 samples of labeled data from the customer, far fewer than the roughly 1,000 samples commonly recommended in the data science community for fine-tuning an LLM. Safa Tinaztepe is a full-stack data scientist with AWS Professional Services.
Imagine an AI system that becomes proficient in many tasks through extensive training on each specific problem and a higher-order learning process that distills valuable insights from previous learning endeavors. Natural Language Processing: With meta-learning, language models can be generalized across various languages and dialects.
About the Authors: Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. Her work spans speech recognition, natural language processing, and large language models. Andrew Gordon Wilson before joining Amazon in 2018.
But now, a computer can be taught to comprehend and process human language through Natural Language Processing (NLP), which was developed to make computers capable of understanding spoken and written language. We're committed to supporting and inspiring developers and engineers from all walks of life.
About the Authors: Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering.
In 2018, we did a piece of research where we tried to estimate the value of AI and machine learning across geographies, across use cases, and across sectors. One is that, compared to our first survey conducted in 2018, we see more enterprises investing in AI capability. We need data scientists familiar with deep learning frameworks.
import spacy
from sense2vec import Sense2VecComponent

# Load a spaCy pipeline and attach the sense2vec vectors trained on Reddit.
nlp = spacy.load("en_core_web_sm")
s2v = Sense2VecComponent(nlp.vocab).from_disk("/path/to/s2v_reddit_2015_md")
nlp.add_pipe(s2v)

doc = nlp("A sentence about natural language processing.")
assert doc[3:6].text == "natural language processing"
freq = doc[3:6]._.s2v_freq  # corpus frequency of the phrase's sense

While few data scientists would endorse this as best practice, the qualitative evaluation does have important advantages.
This release of the spaCy Natural Language Processing library includes a huge number of features, improvements, and bug fixes. spaCy is an open-source library for industrial-strength natural language processing in Python. Maybe you're a grad student working on a paper, maybe you're a data scientist working on a prototype.
Large language models are foundation models (a kind of large neural network) that generate or embed text. The text they generate can be conditioned by giving them a starting point or "prompt," enabling them to solve useful tasks expressed in natural language or code. However, modern large language models make it even easier.
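As a quick illustration of conditioning generation on a prompt, here is a minimal sketch with the Hugging Face text-generation pipeline; the model choice (gpt2) and the prompt text are illustrative assumptions.

from transformers import pipeline

# Text generation is conditioned on a starting "prompt"; the model continues it.
generator = pipeline("text-generation", model="gpt2")

prompt = "Write a one-line summary of what a foundation model is:"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(outputs[0]["generated_text"])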
What the plot here is showing, with the years 2018 up to 2022 on the X-axis, is the relative performance of the best system we had, thanks to a good benchmark like MLPerf, and what the improvement was with respect to Moore's Law (because Moore's Law has been slowing). Now, these are the "fundamental benchmarks."
LinkedIn: https://www.linkedin.com/in/edwin-genego/ Throughout my career since 2018 I have primarily been a Python and Django developer, with an unintentional pivot to AI integrations & engineering in 2022/2023. I'm a data scientist looking for hard problems to solve.
Data scientists and developers can quickly prototype and experiment with various ML use cases, accelerating the development and deployment of ML applications. This fine-tuning process involves providing the model with a dataset specific to the target domain. Domain adaptation format: You can fine-tune the Meta Llama 3.2
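For illustration, a domain-adaptation training set is typically just raw, in-domain text. The sketch below writes a small JSON Lines file with a single "text" field per record, one common layout for this kind of fine-tuning; treat the exact field name and file format expected by SageMaker JumpStart as an assumption to verify against the current documentation, and the example documents as hypothetical.

import json

# Hypothetical in-domain documents used for domain adaptation.
documents = [
    "Q3 claims volume rose 12% year over year, driven by weather-related losses.",
    "The underwriting guidelines were revised in 2018 to include telematics data.",
]

# One JSON object per line with a single "text" field; upload the resulting file
# (for example, to S3) as the fine-tuning dataset.
with open("train.jsonl", "w") as f:
    for doc in documents:
        f.write(json.dumps({"text": doc}) + "\n")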