Data Threads: Pair Analysis in Match 360

Karin Steckler
IBM Data Science in Practice
Jan 4, 2023


Many enterprises strive for a 360° view of their customer data. This view allows them to understand their customers’ needs and history during every interaction. Traditionally, customer data has been scattered and siloed across organizations. With a data fabric and the customer 360 approach, enterprises can connect the data and achieve the 360° view they are looking for.

At the core of creating this 360° view is identifying who your customer really is. Data of the same person often shows up in various places in an organization. For example, an organization may have several divisions which each hold their own customer database. In addition, the person data may be scattered across different marketing, support, and other databases.

IBM Match 360 with Watson is a service on the Cloud Pak for Data platform that allows you to load record data from different sources. This record data can be any type of data your organization considers master data. Most commonly this is person or organization data. Match 360 reads the record data and then groups records into entities, where each entity represents a real-life person or organization. This process is called “matching”.

Customers can choose the attributes of the records that should be used for matching. Alternatively, Match 360 provides a default algorithm to assist the matching process. This algorithm weighs the attributes and determines the difference between two records. It also decides how much that difference affects the decision on whether the records belong to the same entity.
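As a rough illustration of attribute weighting (not Match 360's actual algorithm), the sketch below scores a pair of records as a weighted average of per-attribute similarities. The attribute names, the weights, the sample records, and the use of Python's `difflib` for string similarity are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much each attribute contributes to the overall score.
WEIGHTS = {"name": 0.4, "birth_date": 0.4, "phone": 0.2}


def attribute_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means the two values are identical."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the per-attribute similarities of two records."""
    total = sum(weights.values())
    weighted = sum(
        w * attribute_similarity(rec_a[attr], rec_b[attr])
        for attr, w in weights.items()
    )
    return weighted / total


# Made-up records: slightly different name, same birth date, different phone.
rec_a = {"name": "Jon Smith", "birth_date": "1980-02-14", "phone": "555-0101"}
rec_b = {"name": "John Smith", "birth_date": "1980-02-14", "phone": "555-0199"}
score = match_score(rec_a, rec_b)
```

A higher weight makes a difference in that attribute pull the score down more strongly, which is the lever that the tuning described later adjusts.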

The default algorithm works very well in many situations. But there are cases in which relying on the default algorithm is just not good enough. The pair analysis feature helps customers improve the matching quality of Match 360 so that it addresses their specific needs.

Consider an example in which two records differ in the phone number and very slightly in the name, but the date of birth matches. Now, suppose these records are found in two different organizations.

One organization uses the phone number to stay in contact with its customers but does not rely much on the birthdate. Another organization may have access to the birth records but does not have high data quality for the phone number. These organizations may come to opposite matching decisions for the two records mentioned above.
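To make this concrete, here is a small sketch with made-up numbers: the same per-attribute similarities are scored under two hypothetical weight profiles and compared against an assumed threshold of 0.7. None of these values come from Match 360.

```python
# Assumed per-attribute similarities for the pair described above:
# near-identical name, different phone number, identical date of birth.
similarities = {"name": 0.9, "phone": 0.0, "birth_date": 1.0}

# Hypothetical weight profiles for the two organizations (each sums to 1).
phone_focused = {"name": 0.2, "phone": 0.7, "birth_date": 0.1}
birth_focused = {"name": 0.3, "phone": 0.1, "birth_date": 0.6}

THRESHOLD = 0.7  # assumed cut-off at or above which the records are linked


def weighted_score(sims: dict, weights: dict) -> float:
    """Sum of similarity times weight over all attributes."""
    return sum(weights[attr] * sims[attr] for attr in sims)


def is_match(sims: dict, weights: dict, threshold: float = THRESHOLD) -> bool:
    """Link the pair when its weighted score reaches the threshold."""
    return weighted_score(sims, weights) >= threshold


decision_phone_org = is_match(similarities, phone_focused)  # score 0.28
decision_birth_org = is_match(similarities, birth_focused)  # score 0.87
```

With these assumed weights, the phone-focused organization keeps the records apart while the birthdate-focused organization links them.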

Introducing Pair Analysis

In Cloud Pak for Data 4.5 we introduced a new capability of the Match 360 service called pair analysis that allows organizations to tune the matching algorithm for their individual business needs. The process requires a data steward to tell the service which pairs of records they consider to be matching. This data will then be used in a tuning step to optimize the algorithm.

The pair analysis process contains three phases.

In the first phase, a data engineer requests a new pair review. As part of triggering the process, the engineer needs to decide how many pairs will be generated and subsequently reviewed in phase 2.

Once the pair review has been requested, a process to generate meaningful pairs is started. These pairs are taken from the customer’s data and selected such that the tuning process can learn the most from them.

In phase 2 of the pair analysis process, data stewards will find an open pair review request on their homepage. As part of the pair review, they now need to review the generated pairs and decide if they are matches or not.

In phase 3, once all pairs have been reviewed, an AI-powered tuning process is started. This process treats the reviewed pairs as labeled data and tunes the weights for the individual attribute comparisons. It determines how significant a small difference on any given attribute is in comparison to a larger difference.
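The article does not describe how the tuning works internally. As a stand-in, the sketch below fits one weight per attribute with plain logistic regression trained by gradient descent, which is one common way to learn weights from labeled pairs. The labeled data, the attribute order, and the hyperparameters are all invented for illustration.

```python
import math

# Hypothetical steward-labeled pairs: per-attribute similarities in the order
# (name, phone, birth_date) and the label (1 = match, 0 = no match).
labeled_pairs = [
    ((0.9, 0.0, 1.0), 1),
    ((1.0, 1.0, 1.0), 1),
    ((0.8, 0.1, 1.0), 1),
    ((0.9, 1.0, 0.0), 0),
    ((0.2, 0.0, 0.0), 0),
    ((0.7, 1.0, 0.1), 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(sims, weights, bias) -> float:
    """Estimated probability that a pair with these similarities is a match."""
    return sigmoid(sum(w * s for w, s in zip(weights, sims)) + bias)


def tune_weights(pairs, epochs: int = 2000, lr: float = 0.5):
    """Fit one weight per attribute plus a bias by batch gradient descent."""
    n_attrs = len(pairs[0][0])
    weights = [0.0] * n_attrs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_attrs
        grad_b = 0.0
        for sims, label in pairs:
            err = predict(sims, weights, bias) - label
            for i, s in enumerate(sims):
                grad_w[i] += err * s
            grad_b += err
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(pairs)
    return weights, bias


weights, bias = tune_weights(labeled_pairs)
```

In this toy data set the stewards' labels follow the date of birth, so the fitted weight for `birth_date` ends up larger than the weights for `name` and `phone`.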

Once the weights are optimized, a second tuning step optimizes the autolink threshold. This is a threshold that determines the point at which two records are similar enough to link them to the same entity.
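A threshold search of this kind can be sketched as follows. The scores and labels are made up, and choosing the threshold by accuracy is an assumption for this example; the article does not state which objective Match 360 optimizes.

```python
# Hypothetical (score, label) pairs: the score comes from the tuned weights,
# the label from the data steward's review (1 = match, 0 = no match).
scored_pairs = [
    (0.95, 1), (0.88, 1), (0.81, 1), (0.74, 0),
    (0.69, 1), (0.55, 0), (0.40, 0), (0.12, 0),
]


def accuracy_at(threshold: float, pairs) -> float:
    """Share of pairs whose link/no-link decision agrees with the label."""
    correct = sum((score >= threshold) == bool(label) for score, label in pairs)
    return correct / len(pairs)


def tune_threshold(pairs) -> float:
    """Try each observed score as a threshold and keep the most accurate one.

    Candidates are tried from highest to lowest, so ties go to the higher
    threshold, which favors fewer automatic links.
    """
    candidates = sorted({score for score, _ in pairs}, reverse=True)
    return max(candidates, key=lambda t: accuracy_at(t, pairs))


autolink_threshold = tune_threshold(scored_pairs)  # 0.81 for this data
```

Breaking ties toward the higher threshold is a conservative design choice: a wrongly linked pair merges two real-life entities, which is usually harder to undo than a missed link.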

After the tuning process, the results are shown to the engineer, who can then decide to apply the optimized threshold and weights or continue with the original configuration.

With this process, the matching engine can be optimized for very different customer requirements, helping customers take a large step on their journey towards a customer 360 experience.

If you are already an IBM Cloud Pak for Data customer, try this today!
