Database and Document - Data Science Current

NoSQL Databases and Their Use Cases

KDnuggets

MARCH 16, 2023

Learn about NoSQL Databases and their types like key-value, document, graph and column family with their use cases.

Database

Database SQL

Enabling SSL for Database in IBM SPSS CaDS on Liberty Server — Post-Installation Guide

IBM Data Science in Practice

MAY 19, 2025

If you've recently installed the SPSS Collaboration and Deployment Services (CaDS) on IBM Liberty and are wondering how to securely connect to your database via SSL, this blog is for you. Why Enable SSL for DB Connections? […] Microsoft SQL Server).

Database

Database SQL Data Science

Ask your Documents with Langchain and Deep Lake!

Analytics Vidhya

SEPTEMBER 14, 2023

Introduction: Large language models, paired with tools like LangChain and Deep Lake, have come a long way in document Q&A and information retrieval. These models know a lot about the world, but sometimes they struggle to know when they don’t know something. However, a […]

Analytics

Analytics Database Python

Building Multi-Document Agentic RAG using LLamaIndex

Analytics Vidhya

SEPTEMBER 5, 2024

Enter Multi-Document Agentic RAG – a powerful approach that combines Retrieval-Augmented Generation (RAG) with agent-based systems to create AI that can reason across multiple documents.

Artificial Intelligence

Artificial Intelligence Analytics

A New Era of Text Generation: RAG, LangChain, and Vector Databases

Analytics Vidhya

NOVEMBER 5, 2023

One such groundbreaking approach is Retrieval Augmented Generation (RAG), which combines the power of generative models like GPT (Generative Pretrained Transformer) with the efficiency of vector databases and LangChain.

Database

Database Natural Language Processing Analytics

A Deep Dive into Qdrant, the Rust-Based Vector Database

Analytics Vidhya

NOVEMBER 21, 2023

Introduction: Vector databases have become the go-to place for storing and indexing the representations of unstructured and structured data. In the ever-evolving landscape of […]

Database

Database Deep Learning Analytics

Automate document processing with Amazon Bedrock Prompt Flows (preview)

AWS Machine Learning Blog

OCTOBER 29, 2024

Enterprises in industries like manufacturing, finance, and healthcare are inundated with a constant flow of documents—from financial reports and contracts to patient records and supply chain documents. An AWS Lambda function reads the Amazon Textract response and calls an Amazon Bedrock prompt flow to classify the document.

AWS

AWS ML Machine Learning

Build Semantic Search Applications Using Open Source Vector Database ChromaDB

Analytics Vidhya

JULY 18, 2023

Among such tools, today we will learn about the workings and functions of ChromaDB, an open-source vector database to store embeddings from […]

Database

Database Analytics AI

Automating complex document processing: How Onity Group built an intelligent solution using Amazon Bedrock

AWS Machine Learning Blog

MAY 20, 2025

In the mortgage servicing industry, efficient document processing can mean the difference between business growth and missed opportunities. Onity processes millions of pages across hundreds of document types annually, including legal documents such as deeds of trust where critical information is often contained within dense text.

AWS

AWS ML AI

CRUD Operations in MongoDB

Analytics Vidhya

DECEMBER 13, 2022

Introduction: MongoDB is a NoSQL database that stores data in document format (BSON, or binary JSON). Its advantages over traditional SQL databases include flexible schema design, relaxed ACID properties, and distributed data storage, thus performing better for […].

SQL

SQL Database Data Science Analytics

Building Custom Q&A Applications Using LangChain and Pinecone Vector Database

Analytics Vidhya

AUGUST 19, 2023

One of the fascinating applications of these models is developing custom question-answering systems or chatbots that draw from personal or organizational data sources. […]

Database

Database Artificial Intelligence Analytics

Introduction to Apache CouchDB using Python

Analytics Vidhya

JULY 23, 2022

Introduction: Apache CouchDB is an open-source, document-based NoSQL database developed by the Apache Software Foundation and used by big companies like Apple, GenCorp Technologies, and Wells Fargo. This article was published as a part of the Data Science Blogathon.

Python

Python Database Data Science Analytics

Vector Streaming: Memory-efficient Indexing with Rust

Analytics Vidhya

SEPTEMBER 17, 2024

Introduction: EmbedAnything is introducing vector streaming, a feature designed to optimize large-scale document embedding. Today, I will show how to integrate it with the Weaviate Vector Database for seamless image embedding and search.

Database

Database Analytics

Intelligent document processing

Dataconomy

APRIL 30, 2025

Intelligent document processing (IDP) is transforming the way businesses handle their documentation and data management processes. By harnessing the power of emerging technologies, organizations can automate the extraction and handling of data from various document types, significantly enhancing operational workflows.

Natural Language Processing

Natural Language Processing Machine Learning ML

50+ MongoDB Interview Questions and Answers

Analytics Vidhya

JULY 18, 2024

Introduction: MongoDB is a NoSQL database offering high performance and scalability. It stores data as documents, similar to JSON objects, allowing for complex structures like nested documents and arrays. It also reduces the need for joins with embedded documents and arrays.

Database

Database Analytics

Unify structured data in Amazon Aurora and unstructured data in Amazon S3 for insights using Amazon Q

AWS Machine Learning Blog

NOVEMBER 20, 2024

Whether it’s structured data in databases or unstructured content in document repositories, enterprises often struggle to efficiently query and use this wealth of information. The solution combines data from an Amazon Aurora MySQL-Compatible Edition database and data stored in an Amazon Simple Storage Service (Amazon S3) bucket.

Database

Database AWS SQL ETL

Introduction to Elasticsearch using Python

Analytics Vidhya

JULY 18, 2022

Introduction: Elasticsearch is primarily a document-based NoSQL database, meaning developers do not need any prior knowledge of SQL to use it. Still, it is much more than just a NoSQL database. This article was published as a part of the Data Science Blogathon.

Python

Python SQL Database Data Science

How to Develop Serverless Code Using Azure Functions?

Analytics Vidhya

JANUARY 30, 2023

Whether we are analyzing IoT data streams, managing scheduled events, processing document uploads, or responding to database changes, Azure Functions allow developers […]

Azure

Azure Database Analytics

How To Create An Aggregation Pipeline In MongoDB

Analytics Vidhya

APRIL 12, 2021

Introduction: MongoDB is a free, open-source NoSQL document database. This article was published as a part of the Data Science Blogathon.

SQL

SQL Data Science Database Analytics

Improve search results for AI using Amazon OpenSearch Service as a vector database with Amazon Bedrock

Flipboard

FEBRUARY 21, 2025

Search applications include ecommerce websites, document repository search, customer support call centers, customer relationship management, matchmaking for gaming, and application search. AWS recommends Amazon OpenSearch Service as a vector database for Amazon Bedrock as the building blocks to power your solution for these workloads.

Database

Database AI ML

Multi-tenancy in RAG applications in a single Amazon Bedrock knowledge base with metadata filtering

AWS Machine Learning Blog

APRIL 7, 2025

Additionally, we dive into integrating common vector database solutions available for Amazon Bedrock Knowledge Bases and how these integrations enable advanced metadata filtering and querying capabilities. Using the query embedding and the metadata filter, relevant documents are retrieved from the knowledge base.

Database

Database AWS Natural Language Processing AI
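
The retrieval step described in this snippet (filter by metadata, then rank by the query embedding) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Amazon Bedrock Knowledge Bases API: the `retrieve` function, the `tenant` metadata key, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, documents, metadata_filter, top_k=2):
    """Keep documents whose metadata matches every filter key, then rank by similarity."""
    candidates = [
        doc for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy corpus: two tenants sharing one index.
DOCUMENTS = [
    {"id": "a1", "embedding": [1.0, 0.0], "metadata": {"tenant": "acme"}},
    {"id": "a2", "embedding": [0.6, 0.8], "metadata": {"tenant": "acme"}},
    {"id": "b1", "embedding": [1.0, 0.0], "metadata": {"tenant": "globex"}},
]

results = retrieve([1.0, 0.1], DOCUMENTS, {"tenant": "acme"})
print([doc["id"] for doc in results])
```

Filtering before ranking means a tenant never sees another tenant's documents, even when those documents are closer to the query embedding.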

Master Vector Embeddings with Weaviate – A Comprehensive Series for You!

Data Science Dojo

JANUARY 22, 2025

Here's how embeddings power these advanced systems. Semantic understanding: LLMs use embeddings to represent words, sentences, and entire documents in a way that captures their semantic meaning. The process enables the models to find the most relevant sections of a document or dataset, improving the accuracy and relevance of their outputs.

Database

Database ML AI

Understanding the popular database management system: MySQL

Data Science Dojo

MARCH 25, 2024

MySQL is a popular database management system (DBMS) that is used globally and across different domains. It supports all major operating systems: Linux, macOS, and Windows. Databases are stored on a server, which is typically a remote computer or a cloud server.

Database

Database SQL

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

Flipboard

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis. […] or a later version) database.

ETL

ETL Data Warehouse Analytics

Retrieval augmented generation (RAG) – Elevate your large language models experience

Data Science Dojo

DECEMBER 6, 2023

This process is typically facilitated by document loaders, which provide a “load” method for accessing documents and loading them into memory. This involves splitting lengthy documents into smaller chunks that are compatible with the model and produce accurate and clear results.

Database

Database Data Preparation Algorithm AI
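
The chunking step mentioned in this snippet can be sketched as a fixed-size character splitter with overlap. This is a minimal illustration and not the splitter of any particular library: the function name `split_into_chunks` and the default `chunk_size` and `overlap` values are assumptions chosen for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character windows of at most chunk_size, sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample document: 450 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(450))
chunks = split_into_chunks(sample)
print([len(chunk) for chunk in chunks])
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighbouring chunks; production splitters usually also respect sentence or token boundaries, which this sketch does not.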

Perplexity acquires Carbon, a Seattle startup that helps developers connect data sources to LLMs

Flipboard

DECEMBER 18, 2024

“Carbon will make it easier for Perplexity’s answer engine to be informed by diverse sources of information, whether that data resides in internal databases, cloud storage, or document repositories.” Carbon raised a $1.3 million seed round in 2023.

Computer Science

Computer Science Database AI

Complete roadmap of LlamaIndex to Creating Personalized Q&A Chatbots

Data Science Dojo

SEPTEMBER 28, 2023

It supports a variety of data sources, including APIs, databases, and PDFs. Key components of LlamaIndex: Data connectors: these components allow LlamaIndex to ingest data from a variety of sources, such as APIs, databases, and PDFs.

Natural Language Processing

Natural Language Processing Database Data Science Analytics

Natural Language Processing Using CNNs for Sentence Classification

Analytics Vidhya

SEPTEMBER 2, 2021

This article was published as a part of the Data Science Blogathon. Overview: Sentence classification is one of the simplest NLP tasks and has a wide range of applications, including document classification, spam filtering, and sentiment analysis. A question database will be used for this article and […].

Natural Language Processing

Natural Language Processing Data Science Database Analytics

Building a Multimodal RAG Pipeline using Gemma 3 and Docling

Analytics Vidhya

MARCH 28, 2025

In this tutorial, we explore how to set up and execute a sophisticated retrieval-augmented generation (RAG) pipeline in Google Colab.

Database

Database Analytics

From keywords to conversations: Reimagining document discovery with Amazon Bedrock

Flipboard

MAY 5, 2025

By combining LLMs and RAG on Amazon Bedrock, organizations can transform static document troves into dynamic, intuitive interfaces for discovery. Users must rely on specific phrases and terminology to find relevant documents, which becomes challenging when searching for complex information requiring deeper language understanding.

AWS

AWS Data Silos Database Artificial Intelligence

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

Flipboard

JANUARY 14, 2025

The documents uploaded to the knowledge base on the rack might be private and sensitive, so they won't be transferred to the AWS Region and will remain completely local on the Outpost rack. This vector database will store the vector representations of your documents, serving as a key component of your local Knowledge Base.

AWS

AWS Database AI

A Beginner’s Guide to MongoDB and CRUD Operations

Analytics Vidhya

MAY 31, 2023

Introduction: In this guide, we will explore the fundamentals of MongoDB and delve into the essential CRUD (Create, Read, Update, Delete) operations that form the backbone of any database system.

Database

Database Analytics SQL

MongoRAG: Leveraging MongoDB Atlas as a Vector Database with Databricks-Deployed Embedding Model and LLMs for Retrieval-Augmented Generation

Towards AI

JANUARY 29, 2025

Retrieval-augmented generation generally consists of three major steps, explained briefly below. Information retrieval: the very first step involves retrieving relevant information from a knowledge base, database, or vector database, where we store the embeddings of the data from which we will retrieve information.

Database

Database Clustering Python SQL

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

Flipboard

NOVEMBER 19, 2024

A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. The following diagram depicts the solution architecture.

AWS

AWS AI Big Data

Search enterprise data assets using LLMs backed by knowledge graphs

Flipboard

NOVEMBER 27, 2024

Customers want to search through all of the data and applications across their organization, and they want to see the provenance information for all of the documents retrieved. Enhance the JSON format metadata to JSON-LD format by adding context, and load the data to an Amazon Neptune Serverless database as RDF triples. […] raw_customer".

AWS

AWS Database ML

Fauna Service Winding Down

Hacker News

MARCH 19, 2025

The truly serverless database that combines the power of a relational database with the flexibility of JSON documents.

Database

What is LangChain? Key Features, Tools, and Use Cases

Data Science Dojo

OCTOBER 24, 2024

It also connects effortlessly with collaboration tools like Airtable, Trello, Figma, and Notion, as well as data tools and databases including Pandas, MongoDB, and Microsoft databases. For instance, a healthcare application could integrate patient data from a secure database with the latest medical research.

Database

Database Natural Language Processing AI

Leaked Midjourney artist database could be a moment of reckoning for AI art

Flipboard

JANUARY 4, 2024

Over 16,000 artists are named in the document.

Database

Database AI

Accelerate AWS Well-Architected reviews with Generative AI

Flipboard

MARCH 4, 2025

We demonstrate how to harness the power of LLMs to build an intelligent, scalable system that analyzes architecture documents and generates insightful recommendations based on AWS Well-Architected best practices. An interactive chat interface allows deeper exploration of both the original document and generated content.

AWS

AWS AI Database

Top 10 Python packages you need to master to maximize your coding productivity

Data Science Dojo

MAY 1, 2023

It is designed to simplify the process of working with databases by providing a consistent and high-level interface. It offers a set of utilities and abstractions that make it easier to interact with relational databases using SQL queries. BeautifulSoup BeautifulSoup is a Python library for parsing HTML and XML documents.

Python

Python Machine Learning Data Science

What is an LLM Bootcamp? What Does Data Science Dojo Offer for Your Success?

Data Science Dojo

NOVEMBER 5, 2024

It covers a range of topics including generative AI, LLM basics, natural language processing, vector databases, prompt engineering, and much more. You get a chance to work on various projects that involve practical exercises with vector databases, embeddings, and deployment frameworks.

Data Science

Data Science Azure Natural Language Processing Database

Enhance customer support with Amazon Bedrock Agents by integrating enterprise data APIs

AWS Machine Learning Blog

NOVEMBER 7, 2024

Access to car manuals and technical documentation helps the agent provide additional context for curated guidance, enhancing the quality of customer interactions. The workflow includes the following steps: Documents (owner manuals) are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.

AWS

AWS Python AI

Build Custom Retriever using LLamaIndex and Gemini

Analytics Vidhya

APRIL 30, 2024

Introduction: The retriever is the most important part of the RAG (Retrieval Augmented Generation) pipeline. In this article, you will implement a custom retriever combining keyword and vector search retrievers using LlamaIndex. Chatting with multiple documents using the Gemini LLM is the project use case on which we will build this RAG pipeline.

Analytics

Analytics Database AI
