11 Ways to do Machine Learning Better at ODSC West 2023

ODSC - Open Data Science
5 min read · Oct 18, 2023

Many companies are now utilizing data science and machine learning, but there’s still a lot of room for improvement in terms of ROI. A 2021 VentureBeat analysis suggests that 87% of AI models never make it to a production environment, and an MIT Sloan Management Review article found that 70% of companies reported minimal impact from AI projects. Yet despite these difficulties, Gartner forecast that investment in artificial intelligence would reach an unprecedented $62.5 billion in 2022, an increase of 21.3% from 2021.

Nevertheless, we are still left with the question: How can we do machine learning better? To find out, we’ve gathered some of the upcoming tutorials and workshops from ODSC West 2023 and let the experts’ topics guide us toward building better machine learning.

An Introduction to Data Labeling

Chris Hoge | Head of Community | Heartex

In the modern age of Machine Learning, the importance of high-quality, accurately labeled data cannot be overstated. This presentation introduces how to create high-quality, annotated datasets for training machine learning models. In this tutorial, we will use Label Studio, an open-source, multi-type data labeling tool, to explore common methods for annotating raw datasets, including both human and automated labeling techniques.

Uncertainty Quantification: Approaches and Methods

Brian Lucena | Principal | Numeristical

We will begin with an overview of why it is important to quantify the uncertainty around a model’s predictions. We will discuss how UQ differs in classification problems versus regression problems, and introduce the various approaches. This workshop will provide the theoretical context for these methods and then dive into real-world examples of their applications using Jupyter notebooks.
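To make the regression case concrete, here is a minimal sketch of one common UQ approach, split conformal prediction. It is an illustration chosen for this article, not material from the workshop. The `predict` function and the synthetic data are hypothetical stand-ins for a fitted model and a real dataset.

```python
import math
import random


def conformal_halfwidth(residuals, alpha=0.1):
    """Half-width of a split-conformal interval from calibration residuals."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)), capped at n.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]


def predict(x):
    # Hypothetical point predictor standing in for any fitted regression model.
    return 2.0 * x


rng = random.Random(0)


def sample(n):
    # Synthetic data (an assumption for this sketch): y = 2x + Gaussian noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]


calibration = sample(500)
test_points = sample(2000)

# Calibrate the interval width on held-out residuals, then check coverage.
q = conformal_halfwidth([y - predict(x) for x, y in calibration], alpha=0.1)
covered = sum(abs(y - predict(x)) <= q for x, y in test_points)
coverage = covered / len(test_points)
print(f"interval: prediction +/- {q:.2f}, empirical coverage: {coverage:.3f}")
```

With `alpha=0.1` the interval `predict(x) ± q` is built to cover roughly 90% of new observations, and the empirical coverage on the held-out points should land near that target.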

Data Morph: A Cautionary Tale of Summary Statistics

Stefanie Molin | Software Engineer, Data Scientist, Chief Information Security Office, Author of Hands-On Data Analysis with Pandas | Bloomberg

In this talk, Stefanie will discuss Data Morph, an open-source package that builds on previous research from Autodesk using simulated annealing to perturb an arbitrary input dataset into a variety of shapes, while preserving the mean, standard deviation, and correlation to multiple decimal places. She will showcase how it works, discuss the challenges faced during development, and explore the limitations of this approach.
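The core idea can be sketched in a few lines of plain Python. This is a simplified illustration written for this article, not Data Morph's API or implementation: the function names, the circular target shape, the step size, and the cooling schedule are all assumptions. Each iteration nudges one point, keeps the move if it brings the point closer to the target circle (or occasionally at random, as in simulated annealing), and undoes it whenever the rounded summary statistics would change.

```python
import math
import random


def summary(points):
    """Return (mean_x, mean_y, std_x, std_y, pearson_r) for 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in points) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in points) / (n - 1))
    r = sum((x - mx) * (y - my) for x, y in points) / ((n - 1) * sx * sy)
    return (mx, my, sx, sy, r)


def same_stats(a, b, decimals=2):
    """True when every statistic agrees after rounding to `decimals` places."""
    return all(round(u, decimals) == round(v, decimals) for u, v in zip(a, b))


def ring_error(point, center, radius):
    """Distance from a point to the target circle."""
    return abs(math.hypot(point[0] - center[0], point[1] - center[1]) - radius)


def morph_to_circle(points, iterations=5000, step=0.1, seed=0):
    rng = random.Random(seed)
    target = summary(points)
    center = (target[0], target[1])
    # A circle centered on the mean whose spread roughly matches the data.
    radius = math.hypot(target[2], target[3])
    current = list(points)
    for i in range(iterations):
        temperature = 0.4 * (1 - i / iterations)  # cools linearly toward zero
        j = rng.randrange(len(current))
        old = current[j]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        improves = ring_error(new, center, radius) < ring_error(old, center, radius)
        if not improves and rng.random() >= temperature:
            continue  # reject most moves that drift away from the shape
        current[j] = new
        if not same_stats(summary(current), target):
            current[j] = old  # undo any move that changes the rounded statistics
    return current


rng = random.Random(42)
start = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
morphed = morph_to_circle(start)
print("before:", [round(v, 2) for v in summary(start)])
print("after: ", [round(v, 2) for v in summary(morphed)])
```

The two printed lines match by construction even though the points have moved, which is the cautionary tale: identical summary statistics can describe visibly different datasets.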

Causal AI: from Data to Action

Dr. Andre Franca | CTO | connectedFlow

In this talk, we will explore and demystify the world of Causal AI for data science practitioners, with a focus on understanding cause-and-effect relationships within data to drive optimal decisions. Topics include Shapley values, DAGs, discovering causality, and optimal decision-making.

Completing Knowledge Discovery fast at High Quality with AI

Alex Liu, Ph.D. | Founder and Director | RMDS Lab

In this presentation, the speaker will begin with a comprehensive review of common failure factors. He will then present an AI-driven ecosystem approach for knowledge discovery and discuss a real-world test of this approach involving 14 knowledge discovery projects. Attendees will gain insight into the connection between the success of data-driven knowledge discovery projects and advancements in AI technology, and will see how this ecosystem approach can yield gains in speed, quality, and risk mitigation for knowledge discovery initiatives.

Bridging the Interpretability Gap in Customer Segmentation

Evie Fowler | Senior Data Scientist | Fulcrum Analytics

Historically, there have been two main approaches to segmentation: rules-based and machine learning-driven. In this talk, Evie will present a new, hybrid approach that combines the best aspects of both methods. The process begins with a careful observation of customer data and an assessment of whether there are naturally formed clusters in the data. It continues with the selection of a clustering algorithm and the fine-tuning of a model to create clusters. After that, there is additional exploratory data analysis to understand what differentiates each cluster from the others. Finally, a linear approximation is used to create a simple representation of the machine learning clustering algorithm.
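As a small illustration of the last step, the sketch below clusters hypothetical, already-standardized customer features into two segments with plain k-means and then expresses the result as a single linear rule. It is not the speaker's method. For two k-means clusters the linear rule happens to be exact, because the boundary is the perpendicular bisector of the two centroids; with more clusters or other algorithms, a fitted linear model would only approximate the segments. All names and data here are assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(group) for col in zip(*group))


def two_means(points, iterations=20):
    """Plain k-means with k=2; returns the two centroids."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))  # farthest point as second seed
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            groups[sq_dist(p, c1) < sq_dist(p, c0)].append(p)
        c0, c1 = centroid(groups[0]), centroid(groups[1])
    return c0, c1


def linear_rule(c0, c1):
    """Weights w and bias b such that w.x + b > 0 exactly when x is closer to c1."""
    w = tuple(b - a for a, b in zip(c0, c1))
    b = (sum(a * a for a in c0) - sum(a * a for a in c1)) / 2
    return w, b


# Hypothetical standardized features: (spend, visits) for two customer groups.
rng = random.Random(1)
customers = [(rng.gauss(-1, 0.3), rng.gauss(-1, 0.3)) for _ in range(50)]
customers += [(rng.gauss(1, 0.3), rng.gauss(1, 0.3)) for _ in range(50)]

c0, c1 = two_means(customers)
w, b = linear_rule(c0, c1)

cluster_labels = [sq_dist(p, c1) < sq_dist(p, c0) for p in customers]
rule_labels = [sum(wi * xi for wi, xi in zip(w, p)) + b > 0 for p in customers]
print("rule: segment 1 if %.2f*spend + %.2f*visits + %.2f > 0" % (w[0], w[1], b))
print("rule matches clustering:", rule_labels == cluster_labels)
```

The appeal of the linear form is that a business stakeholder can read the weights directly, while the segments themselves still come from the data rather than from hand-written rules.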

Human-Centered AI

Peter Norvig, PhD | Engineering Director, Education Fellow | Google, Stanford Institute for Human-Centered Artificial Intelligence (HAI)

We have seen amazing technical progress in AI applications in recent years. This talk considers the human side rather than the technical side: how can we gain confidence that our applications will be fair, just, truthful, beneficial, and well-suited for their users, the other stakeholders, and society at large?

Representation Learning on Graphs and Networks

Dr. Petar Veličković | Staff Research Scientist, Affiliated Lecturer | DeepMind, University of Cambridge

In this talk, Petar will attempt to provide several “bird’s eye” views on GNNs. Following a quick motivation on the utility of graph representation learning, he will derive GNNs from first principles of permutation invariance and equivariance. He will also discuss how we can build GNNs that are not strictly reliant on the input graph structure. The talk will be geared towards a generic computer science audience, though some basic knowledge of machine learning with neural networks will be a useful prerequisite.
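The two symmetry principles are easy to check on a toy example. The sketch below is an illustration written for this article, not code from the talk; the layer, its fixed weights, and the four-node path graph are assumptions. It applies one sum-aggregation message-passing step to scalar node features, then relabels the nodes and shows that the per-node outputs are relabeled the same way (equivariance) while a sum readout stays unchanged (invariance).

```python
def gnn_layer(features, adjacency, w_self=0.5, w_neigh=0.25):
    """One message-passing step: h_i' = relu(w_self * h_i + w_neigh * sum of neighbours)."""
    n = len(features)
    out = []
    for i in range(n):
        agg = sum(features[j] for j in range(n) if adjacency[i][j])
        out.append(max(0.0, w_self * features[i] + w_neigh * agg))
    return out


def permute(features, adjacency, perm):
    """Relabel nodes so that new node k is old node perm[k]."""
    new_features = [features[p] for p in perm]
    new_adjacency = [[adjacency[p][q] for q in perm] for p in perm]
    return new_features, new_adjacency


# A path graph 0 - 1 - 2 - 3 with one scalar feature per node.
features = [1.0, 2.0, 3.0, 4.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
perm = [2, 0, 3, 1]

out = gnn_layer(features, adjacency)
p_features, p_adjacency = permute(features, adjacency, perm)
p_out = gnn_layer(p_features, p_adjacency)

# Equivariance: permuting the input permutes the node outputs the same way.
print(p_out == [out[p] for p in perm])
# Invariance: a sum readout over nodes ignores the node ordering.
print(sum(p_out) == sum(out))
```

Sum aggregation is what makes this work: the neighbour sum does not depend on the order in which neighbours are visited, so the layer cannot tell two labelings of the same graph apart.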

Machine Learning Has Become Necromancy

Mark Saroufim | Author | Breaking Stagnation

Machine learning has undergone a profound transformation with open source, from a technology that could only do naive curve fitting to one that could potentially end humanity’s dominion. In 2017, Ali Rahimi declared that machine learning is the new alchemy. We’d like to go further and claim that machine learning is the new necromancy: a forgotten science with a passionately strong open-source ethos that was eventually destroyed by the Catholic Church.

Anomaly Detection for CRM Production Data

Geeta Shankar, Software Engineer and Tuli Nivas, Software Engineering Architect | Salesforce

In our technical talk, we’ll demonstrate the value of machine learning and analytical visualizations in solving real-world data analytics challenges. We’ll showcase how our data-driven production system addresses these challenges, emphasizing the importance of data analytics in ensuring reliable systems and building customer trust.

Missing Data: A Synthetic Data Approach for Missing Data Imputation

Fabiana Clemente | Co-founder and CDO | YData

In this talk, we will cover the use of generative models, such as LLMs and GANs, for the generation of smart synthetic data that can be leveraged to impute missing data. By using a generative model to impute missing data, we can generate new samples that are representative of the underlying data distribution, which can help to reduce the impact of missing data on our models. In addition, these models can be fine-tuned to specific datasets, allowing us to generate synthetic data that is tailored to our particular use case.
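The same idea can be shown in miniature with a far simpler generative model than an LLM or a GAN. The sketch below is an illustration for this article, not the speaker's approach. It fits a bivariate Gaussian to the complete rows of a hypothetical two-column dataset and fills each missing `y` by sampling from the fitted conditional distribution of `y` given `x`, so the imputed values keep the natural spread of the data instead of collapsing to a single mean.

```python
import math
import random


def fit_gaussian(rows):
    """Means, variances, and covariance of complete (x, y) rows."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    vx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    vy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return mx, my, vx, vy, cxy


def impute(rows, rng):
    """Fill missing y values by sampling from the fitted conditional y | x."""
    complete = [r for r in rows if r[1] is not None]
    mx, my, vx, vy, cxy = fit_gaussian(complete)
    slope = cxy / vx
    # Conditional standard deviation of y given x under the Gaussian model.
    resid_sd = math.sqrt(max(vy - cxy ** 2 / vx, 0.0))
    filled = []
    for x, y in rows:
        if y is None:
            y = rng.gauss(my + slope * (x - mx), resid_sd)
        filled.append((x, y))
    return filled


# Hypothetical data: y = 3 + 2x + noise, with about a quarter of y values missing.
rng = random.Random(7)
rows = []
for _ in range(200):
    x = rng.uniform(0, 10)
    y = 3 + 2 * x + rng.gauss(0, 1)
    rows.append((x, None if rng.random() < 0.25 else y))

filled = impute(rows, rng)
n_missing = sum(y is None for _, y in rows)
print(f"imputed {n_missing} of {len(rows)} rows")
```

Swapping the Gaussian for a richer generative model changes how the conditional samples are produced, but the workflow stays the same: learn the data distribution from what is observed, then sample the missing pieces from it.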

Learn Better Methods For Better Machine Learning at ODSC West 2023

To dive deeper into these topics, join us at ODSC West 2023 this October 30th to November 2nd, either in-person or virtually. The conference will also feature hands-on training sessions in focus areas, such as machine learning, deep learning, MLOps and data engineering, responsible AI, and more. What’s more, you can extend your immersive training to 4 days with a bootcamp pass. Check out all of our types of passes here.

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Interested in attending an ODSC event? Learn more about our upcoming events here.
