Enhancing Data Fabric with SQL Asset in IBM Knowledge Catalog

Published in

IBM Data Science in Practice

7 min readApr 26, 2024

In the era of digital transformation, enterprises are increasingly relying on the power of artificial intelligence (AI) to unlock valuable insights from their vast repositories of data. Within this landscape, Cloud Pak for Data (CP4D) emerges as a pivotal platform. CP4D offers a robust data fabric architecture that ensures the trust and governance necessary for AI-driven decision-making. Data fabric is a cohesive architecture designed to simplify data access and management across diverse environments. At the heart of this architecture lies the IBM Knowledge Catalog, empowering enterprises with intelligent metadata management capabilities. In this blog, we explore how the introduction of dynamic view enhances the metadata enrichment process within the IBM Knowledge Catalog, enhancing data governance and consumption.

Understanding Data Fabric and IBM Knowledge Catalog

A data fabric is an architectural blueprint that helps transcending traditional data silos and complexities. It enables organizations to seamlessly access and utilize data assets irrespective of their location or format. Central to this approach is the IBM Knowledge Catalog, a cloud-based repository that facilitates end-to-end data management. By consolidating infrastructure, technical, and business metadata, the Knowledge Catalog empowers users to govern, trust, protect, and consume data effectively.

Metadata Enrichment: Empowering Data Governance

Data Quality Tab from Metadata Enrichment

Metadata enrichment is a crucial aspect of data governance, enabling organizations to enhance the quality and context of their data assets. Through the IBM Knowledge Catalog’s metadata enrichment feature, users can leverage data quality analysis, profiling, and automatic term assignment and data classification to enrich their data seamlessly. This integrated experience accelerates the delivery of quality data, kick-starts governance initiatives, and fosters a deeper understanding of business data. More on metadata enrichment can be read here.

Introducing SQL Asset Type

A significant enhancement to the metadata enrichment process is the introduction of SQL Asset Type. This feature enables users to enrich and analyze data assets using filtered data, thereby optimizing resource utilization and enhancing data quality measurement. By embedding SQL queries within data assets, enterprises can precisely target metadata enrichment and data quality analysis, aligning with specific business requirements.

Dynamic Views with SQL Asset Type

Incorporating SQL Asset introduces the concept of dynamic views, which are data assets created dynamically based on an SQL query. Depending on the SQL query employed, dynamic views can contain a subset or a superset of columns based on column selection, specific rows based on conditional expressions, or a combination thereof. The ability to dynamically edit SQL queries within dynamic views enhances adaptability in data analysis. Users can fine-tune filtering criteria, incorporate additional data sources, or adjust analysis parameters on-the-fly, responding effectively to evolving research questions or business needs.

The resulting dynamic view can be utilized in various capacities:

Data Science: Data scientists can leverage dynamic views within analytic tools such as notebooks, SPSS Modeler, or Data Refinery, enabling them to conduct in-depth analysis and derive valuable insights from filtered and refined datasets.
Data Stewardship: Data stewards can utilize dynamic views for metadata enrichment, profiling, and data governance activities. By enriching dynamic views with relevant metadata, organizations can improve data discoverability, understandability, and lineage, thereby enhancing data governance and compliance efforts.
Data Quality Analysis: Dynamic views serve as valuable assets for data quality analysts, enabling them to run comprehensive data quality analysis processes. By assessing the quality of data within dynamic views, organizations can identify and address data discrepancies, inconsistencies, and inaccuracies, ensuring the reliability and integrity of their data assets.

Example

Imagine you’re managing a large dataset containing information about customer demographics, purchasing behavior, and product preferences. This dataset spans a wide range of ages, from teenagers to senior citizens. The data scientist on your team is interested in studying the behavior of individuals aged 20–30 years to understand their preferences and sentiments towards certain products.

Traditionally, if you were to conduct metadata enrichment and data quality analysis on the entire dataset, it would involve processing a vast amount of information, including data from age groups outside the target range. This approach could be time-consuming and resource-intensive, with a significant portion of the analysis focusing on irrelevant data.

Now, let’s apply the concept of SQL Asset to streamline this process:

Filtering Data: With SQL Asset, you can create a SQL query that filters the dataset to include only individuals aged 20–30 years. This step effectively reduces the dataset to a subset that is relevant to the data scientist’s analysis objectives.
Metadata Enrichment: Once the data has been filtered, you can then run metadata enrichment specifically on this subset of data. This involves enhancing the metadata associated with each data point, such as adding tags, descriptions, and quality metrics.

Steps to create SQL Asset

Let’s walk through the series of steps to create SQL Asset as described in the example above.

Step 1: Create New asset

Click on “New asset” button after selecting the Project from the IBM Cloud Pak for Data

Step 2: Select SQL Query

Step 3: Create a dynamic view of data

We can create a dynamic view of data by giving the details as shown below, selecting connection, giving tags, and entering the query.

Step 4: View the new SQL Asset

Step 5: Preview the created asset

Click on the asset name to preview it and to validate that it is picking only the relevant data, in our case age group between 20 and 30.

As discussed, we can now run profiling on this subset of data.

Benefits of SQL Asset

From the above example we show how the incorporation of SQL Asset brings forth a myriad of benefits for organizations striving to maximize the value of their data assets:

Precision and Efficiency: By leveraging SQL queries to filter data, organizations can precisely target metadata enrichment and other usecases. This results in enhanced efficiency and accuracy in assessing the quality and relevance of data assets.
Resource Optimization: Traditional metadata enrichment processes often involve analyzing vast datasets, consuming significant time and resources. With SQL Asset, organizations can filter out irrelevant data, focusing resources on the most pertinent datasets.
Tailored Governance: Different data assets may require varying levels of governance based on their importance and sensitivity. SQL Asset allows organizations to tailor metadata enrichment and data quality analysis according to the specific needs of each dataset, ensuring compliance and mitigating risks effectively.
Accelerated Insights: By streamlining the metadata enrichment process, SQL Asset accelerates the generation of insights from data assets. Organizations can swiftly extract valuable insights, driving informed decision-making and fostering innovation.

A note on Data Quality Rules

Data quality rules define the standards and expectations for data accuracy, completeness, consistency, and validity. As part of data quality analysis, organizations often create and apply a set of data quality rules to identify and address discrepancies or anomalies within their datasets. By running data quality rules directly on SQL assets, they efficiently identify and address data discrepancies, ensuring data integrity and reliability. Whether detecting outliers, enforcing regulatory compliance, or validating against benchmarks, SQL Asset facilitates targeted data quality analysis, enhancing trustworthiness and driving better decision-making.

Summary

In the era of data-driven decision-making, effective management and governance of data assets are paramount. The integration of SQL Asset within the metadata enrichment process represents a significant leap forward in enhancing data fabric capabilities. By enabling precise targeting of metadata enrichment and data quality analysis, organizations can unlock the full potential of their data assets while optimizing resources and driving innovation.

Moreover, the versatility of SQL Asset extends beyond metadata enrichment and data quality analysis. Once created, SQL assets can be leveraged for a multitude of operations, functions, and features across the CP4D platform. Whether it’s performing advanced analytics, generating insights, or driving machine learning models, SQL assets offer a flexible and powerful tool for data manipulation and exploration.

Acknowledgement

I would like to thank Ananya Sarkar for helping me with this blog.