ML Collaboration: Best Practices From 4 ML Teams

Vidhi Chugh

27th July, 2023

According to a report by McKinsey, AI has the potential to contribute USD 13 trillion to the global economy by 2030. The onset of the pandemic triggered a rapid increase in the demand for and adoption of ML technology. ML has found diverse uses, such as improving productivity for operational gains and introducing new features that improve customer experience and engagement, among many others.

Building ML team

Following the surge in ML use cases that have the potential to transform business, the leaders are making a significant investment in ML collaboration, building teams that can deliver the promise of machine learning. A large part of building successful ML teams depends on the size of the organization and its strategic vision.

Size and strategic vision of the organization

Building ML teams is a non-trivial decision and primarily depends on how leadership endorses ML technology and whether they believe ML aligns with the strategic vision of their organization.

There is no golden rule on how to build successful ML teams. Leaders frequently face hurdles and questions such as:

1 What should be the team size?
2 What type of skills should my team possess?
3 How should the mix of specialists and generalists be balanced? Having generalists early in the team-building phase helps in formulating the business problem and surfaces the range of skills the project will need later on.
4 How should the team be organized between horizontal and vertical initiatives?

Focus on operationalising ML

Most companies treat production as an afterthought and are not prepared to handle the scale that production-grade systems require. Hence, ML teams must have a mix of strong data architects and engineering experts who can successfully operationalize the ML model.

How to organize ML team

Centralized ML team

People from different fields like engineering, product, DevOps, and ML all come together under one big team. Such a team becomes a task force for any ML initiatives the organization plans to employ. The ideation cycle from initiation to fruition becomes blazing fast. It is an ideal case of having experts from a breadth of skillsets – thereby forming a win-win for breadth and depth of technical expertise.

However, the downside of such a team organization is that the knowledge gets limited to this forum. With barriers to knowledge dissemination, such a model leads to increased dependency and hinders democratization.

*Illustration of centralized ML team | Source*

Decentralized ML team

It is a small “team of teams” with representation from the technical experts needed to deliver a specialized feature or solution. The structure is very agile: members from different backgrounds come together for a specified deliverable, and the team dissolves after that.

*Illustration of decentralized ML team | Source*

Roles in ML Team and How They Collaborate With Each Other

ML collaboration: why it is important

We will explain each of the diverse roles in the later section of this post, but a quick glance at the team composition already highlights the potential issues that can surface on how different team members collaborate with each other.

Understanding requirements

Quite often, the collaboration aspect of ML projects does not get much attention. This leads to gaps in communicating requirements, which end up neither well understood nor properly documented. As a result, the project might face setbacks such as repeated tasks or, at worst, scrapping work that has already consumed effort.

Pursuing the right direction

A clear focus on ML collaboration among different stakeholders ensures that the project progresses in the intended direction and that any unforeseen risks are communicated in the right forum in good time.

Approvals from stakeholders

ML projects are inherently iterative. Data scientists frame the business problem and its objective as a statistical solution and start with the very first step of data exploration. Exploratory data analysis (EDA) is the pivotal phase of the project where discoveries are made. It is during this stage that the data scientists might find that the data lacks a good-quality signal or pattern, that the metric chosen to measure success needs to change, or that the problem itself is not solvable with ML.

Such findings, when circulated among business leaders, often result in revised business goals. Clear communication in such scenarios is crucial to ensure that all stakeholders are on the same page with the updated project status.

Union of business and data teams

The success of ML projects lies in strong collaboration between the data team and the business team. A continuous alliance with the business team helps the data science team create ML models that have the potential to add significant business value.

Visibility supersedes micro-management

Strong team collaboration brings visibility to how each member will contribute to the final solution. It underpins the significance of ownership of various tasks during the ML project lifecycle.

*Degree of communication based on role | Source*

We reached out to four companies and interviewed them on how they structure their ML teams, which best practices their teams follow to collaborate, what tools they use, and more.

Kindly note that the names of the organizations and those of the point of contact have been disclosed after their approval, wherever possible.

ML collaboration and timely evaluation of system design

Thanks to Abhishek Rai, a data scientist with Gigaforce Inc, for collaborating with me on this interview post and reviewing it before it was published.

Organization

Gigaforce Inc

Industry

InsurTech provider

Team size

Gigaforce built an ML team three years ago in 2020 and has a team size of 5-7.

Team composition

The team comprises domain experts, data engineers, data scientists, and ML engineers.

Machine learning collaboration

Gigaforce allocates work based on the phase of the project. Evidently, not all team members are required throughout the lifecycle of the project. For example, a domain expert plays a crucial role in the project’s initial phase to help the data scientists with business requirements.

The ML team at Gigaforce built and deployed a regression model into production within five months, which is a great feat and speaks volumes about its project management and team organization. So, let us learn how they assigned roles and responsibilities and what measures they took to foster collaboration.

Gigaforce works with enormous amounts of data, which calls for building multiple model versions, aka experiments. The team quickly identified the need for data versioning tools, one of the key challenges the industry faces today while building production-grade solutions. They have evaluated DVC for their data-versioning needs and are working on optimizing their use cases.

“Based on the amount of data we deal with the major challenge that comes with Model versioning is tracking data to reproduce similar experiments in the future. So data versioning is one of the major challenges in the industry and we are trying to be more effective with how we can optimize our tracking of changes in Data.”
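The problem described in the quote, tying each experiment to the exact data it saw, can be illustrated with a minimal sketch. This is not DVC's API and not Gigaforce's setup; it only shows the underlying idea of content-hash fingerprinting using the Python standard library. The function names, record fields, and the `claims.csv` file are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_experiment(name: str, data_path: Path, params: dict) -> dict:
    """Build an experiment record that pins the exact data version used."""
    return {
        "experiment": name,
        "data_file": data_path.name,
        "data_sha256": fingerprint_file(data_path),
        "params": params,
    }


def same_data(record_a: dict, record_b: dict) -> bool:
    """Two experiments saw identical data only if their hashes match."""
    return record_a["data_sha256"] == record_b["data_sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "claims.csv"  # hypothetical dataset
        data.write_text("claim_id,amount\n1,100\n2,250\n")
        first = record_experiment("baseline", data, {"alpha": 0.1})

        # The file changes between runs; the record exposes that.
        data.write_text("claim_id,amount\n1,100\n2,250\n3,75\n")
        second = record_experiment("more-rows", data, {"alpha": 0.1})

        print(json.dumps(first, indent=2))
        print("same data:", same_data(first, second))
```

Storing such a record next to each model version makes it possible to tell, later on, whether two experiments are comparable or whether the data changed underneath them. Dedicated tools add storage, remotes, and pipelines on top of this basic idea.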

It is commonly observed that roughly 80% of proofs of concept never reach production. Contributing factors include a lack of vision, force-fitting ML components onto the business problem, and data without enough signal to learn the pattern. One factor, however, can limit the damage before the project is neck-deep: the ability to foresee the intricacies of integrating the model into the organization's existing tech stack. This is where Gigaforce made the right decision by involving engineers at the POC planning stage to share their technical expertise on the feasibility of the solution.

Tools used

Google Docs for documentation
Confluence for documentation
Slack for async communication
Git for code collaboration
MLflow to track experiments

Tracking progress

Daily standups (or weekly, as needed) were conducted to facilitate collaboration within the team.
Gigaforce owes the success of its rapid development cycle to a tight feedback loop with the business and product teams. This is crucial to ensure that developers do not become victims of a moonshot commitment and that all stakeholders are made aware of any contingencies in time to recalibrate the project direction.

ML collaboration in engineering-focused team

Thanks to Makarand Pandey, ex-product manager with Acquia, for collaborating with me on this interview post and reviewing it before it was published.

Organization

Acquia

Industry

Software-as-a-service

Team size

Acquia built its ML team in 2017 and has a team size of 6.

Team composition

The team comprises data pipeline engineers, ML engineers, full-stack engineers, and data scientists.

Team collaboration

The team composition reflects an emphasis on building robust data and model pipelines, with work such as expanding the capacity of prediction clusters, refining the codebase, and retraining models.

“Roles and responsibilities are assigned usually based on the project mix in the sprints. Exploration projects were mostly handled by the data scientists while operational requests were handled by ML Engineers”

The team is highly self-sufficient: it built, on its own, multiple probabilistic models that predict the likelihood of buying, engaging, and converting, along with customer lifetime value, and iterated on them to improve accuracy over time. With such a high-performing team, Acquia did not feel the need to hire an engineering manager.

Tools used

Jenkins for CI/CD process
Git for code collaboration
MLflow for model versioning

Tracking Progress

Acquia simplified communication through async standups via Slack, a trend commonly seen in the industry since the beginning of hybrid work culture. They use Jira for sprint tracking, Aha! for product management visibility, and Confluence for project documentation.

ML collaboration and playing by the team’s strengths

The company has chosen to remain anonymous but has kindly shared its best practices around ML team collaboration. Our sincere thanks to them for collaborating on this interview post and reviewing it before it was published.

Organization

Anonymized and referred to by the pronoun ‘they’ in the section below.

Industry

Computer Software

Team size

They built a fairly new ML team in 2021 and have a team size of 5.

Team composition

The team comprises full-stack scientists and specialized skills-based experts.

Team collaboration

The team has built a baseline version of anomaly detection in five months and plans to create a treatment-specific “claims scoring system” soon.

While it is difficult to decide how to allocate projects within a team in general, they have a fair approach of playing to the team’s strengths. Project assignment can be decided in various ways, and there is no rule of thumb. It depends on a multitude of factors, such as which skills already exist within the organization and are readily available for onboarding, and how willing the allocated team is to learn and deliver the project if it does not possess those skills beforehand.

The team did not have a solution architect who would orchestrate the work within the team and help them with the progress of the key milestones. This certainly highlights the need for a project manager who can wear multiple hats to make the project a success.

“Unfortunately, we didn’t have a biz arch/product owner/analyst assigned to our team so there was a huge struggle with documentation, and keeping progress updated”

Tools used

H2O.ai for machine learning workflows
Bitbucket for tracking experiments

Tracking progress

They use Confluence, Notion, Slack, and daily stand-ups to share knowledge and status updates with the team.

ML collaboration in the Kanban team

Thanks to Felix Wick, CVP, Data Science, Blue Yonder Inc, for collaborating with me on this interview post and reviewing it before it was published.

Organization

Blue Yonder Inc

Industry

Supply Chain SaaS

Team Size

Blue Yonder Inc has an ML team size of 30 and counting.

Team composition

Over the course of several years, Blue Yonder has built an engineering team for the development of a SaaS product with ML at its core. Blue Yonder’s team organization stands out as it ensures that all team members have a decent overview of the full system, in addition to having a few all-rounders around. It has a team with specific focus areas, such as Data Engineers, Software Engineers, Data Scientists, ML engineers, Full stack, and frontend developers.

Team collaboration

It is certainly difficult to manage a large team, so Blue Yonder has found an efficient way to split the team into a few sub-teams with different technical focuses such as data, model, or UI-centric.

While the members of different sub-teams collaborate with other sub-teams on end-to-end user stories and epics, they have close exchanges within their technical sub-team across stories. This team organization also brings synergies and provides an effective way to ensure that all team members are aware of the entire spectrum of work.

Tools used

Blue Yonder develops and operates the product at the same time, continuously working on scale, improvements, and expansion of scope. It uses modern collaboration platforms to operate at that scale and serve global customers.

“We use a mono repository for all sub-components with a pull request workflow and CI/CD (with unit, integration, system, model, and performance tests). A committed branch with model changes automatically triggers evaluation runs on several data sets, outputting comparisons to the master model upon which a decision can be made if the model is a good candidate or not.”
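The decision step in the quote, comparing a candidate model against the master model across several data sets, could look roughly like the sketch below. The acceptance rule, the `tolerance` parameter, the error metric, and all names are assumptions made for illustration; the source does not describe Blue Yonder's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Error of both models on one evaluation data set (lower is better)."""
    dataset: str
    master_error: float
    candidate_error: float


def is_good_candidate(results: list[EvalResult], tolerance: float = 0.01) -> bool:
    """Hypothetical rule: no data set may regress by more than `tolerance`
    (relative to master), and at least one data set must improve."""
    no_regression = all(
        r.candidate_error <= r.master_error * (1 + tolerance) for r in results
    )
    any_improvement = any(r.candidate_error < r.master_error for r in results)
    return no_regression and any_improvement


def comparison_report(results: list[EvalResult]) -> str:
    """Render a per-data-set comparison that a reviewer can read in CI logs."""
    lines = []
    for r in results:
        delta = r.candidate_error - r.master_error
        lines.append(
            f"{r.dataset}: master={r.master_error:.3f} "
            f"candidate={r.candidate_error:.3f} delta={delta:+.3f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the functions.
    results = [
        EvalResult("dataset_a", master_error=0.20, candidate_error=0.18),
        EvalResult("dataset_b", master_error=0.30, candidate_error=0.30),
    ]
    print(comparison_report(results))
    print("good candidate:", is_good_candidate(results))
```

In a setup like the one quoted, a CI job would produce the report automatically for each branch with model changes, while the final accept-or-reject decision can remain with a human reviewer.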

Tracking progress

It follows a Kanban-like structure combined with a continuous delivery mode that includes:

a roadmap for epics, including a release plan
daily standups, extensive internal and external reviews for releases
regular team and sub-team retrospectives

Bonus nugget

Everything we discussed so far has been from the perspective of medium-sized teams located in one place. It is important to note that some adaptability might be required as employee strength scales up in any company. Large organizations have geographically spread out data science teams that are generally not aware of what their peers are working on. Hence, it is considered good practice to maintain a central database of all the data scientists that can list:

their areas of expertise,
what projects they have worked on in the past,
what the success rate of their past projects was,
what projects they are working on currently, and
what skills and algorithms they aspire to learn and deliver next.

Instead of focusing on specific tools, a good way to start early and reap the benefits of such a repository is to begin with something as simple as a shared Excel spreadsheet. The goal is to create organization-wide visibility of the pool of skills that already exist in-house. Besides, such a repository comes in handy when the project management team is looking to allocate data scientists to an upcoming project, making it a win-win for both the business and the data scientists.
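For teams that outgrow a spreadsheet, the same registry can be sketched as a small data structure. The field names mirror the list above; the class name, the ranking rule, and the sample people are hypothetical and only show how such a record could support project allocation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataScientistProfile:
    """One row of the skills registry."""
    name: str
    expertise: set[str] = field(default_factory=set)
    past_projects: list[str] = field(default_factory=list)
    past_success_rate: Optional[float] = None  # fraction in [0, 1], if tracked
    current_projects: list[str] = field(default_factory=list)
    wants_to_learn: set[str] = field(default_factory=set)


def find_candidates(
    profiles: list[DataScientistProfile], required_skills: set[str]
) -> list[str]:
    """Rank people by how many required skills they already have, then by
    how many they want to learn. People with no overlap are left out."""
    scored = []
    for profile in profiles:
        has = len(required_skills & profile.expertise)
        wants = len(required_skills & profile.wants_to_learn)
        if has or wants:
            scored.append((has, wants, profile.name))
    # Highest skill overlap first; name breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [name for _, _, name in scored]


if __name__ == "__main__":
    team = [
        DataScientistProfile("Asha", expertise={"nlp", "forecasting"}),
        DataScientistProfile(
            "Ben", expertise={"forecasting"}, wants_to_learn={"nlp"}
        ),
        DataScientistProfile("Chen", expertise={"vision"}),
    ]
    print(find_candidates(team, {"nlp", "forecasting"}))
```

Including the skills people want to learn in the ranking reflects the point made above: the registry serves both the business, which needs to staff projects, and the data scientists, who want to grow.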

Takeaways

The discussions with these great organizations gave a sneak peek of what it takes to build robust, scalable, live ML production systems. Through this interview post, our intent is to share their learnings and experiences with our readers. We would like to summarize three key takeaways that might prove beneficial in your production journey:

Machine learning teams are very diverse, with experts from different backgrounds brainstorming their way to success. Organizations need to create a culture that facilitates effective ML collaboration to improve the team’s efficiency.

Do not go by job titles, as they vary from company to company. Instead, be very clear about the skills and tasks needed to achieve the goal. Do not underestimate the contribution of any one team member. “ML projects mean data scientists” is a myth: it takes extensive brain power from a range of tech experts to build world-class systems.
One thing about ML projects is not a myth: they are all about data. So, be ready to embrace the fact that you will have many data versions, which in turn means managing multiple models. There are a number of products that help you manage that chaos. We highly recommend that you do your research and invest in them in good time.
