By Josep Ferrer, KDnuggets AI Content Specialist on June 10, 2025 in Python Image by Author DuckDB is a fast, in-process analytical database designed for modern data analysis. As knowing how to work with data becomes ever more important, today I want to show you how to build a Python workflow with DuckDB and explore its key features.
Go vs. Python for Modern Data Workflows: Need Help Deciding?
By Nate Rosidi, KDnuggets Market Trends & SQL Content Specialist on June 11, 2025 in Language Models Image by Author | Canva If you work in a data-related field, you should update your skills regularly. Instead of generating answers from model parameters alone, RAG can collect relevant information from documents. What is a retriever?
__init__.py # (Optional) to mark the directory as a Python package You can leave the __init__.py file empty, as its main purpose is simply to indicate that this directory should be treated as a Python package. Tools Required (requirements.txt) The necessary libraries are: PyPDF: A pure Python library to read and write PDF files.
Introduction Elasticsearch is primarily a document-based NoSQL database, meaning developers do not need any prior knowledge of SQL to use it. Still, it is much more than just a NoSQL database. The post Introduction to Elasticsearch using Python appeared first on Analytics Vidhya.
By Bala Priya C, KDnuggets Contributing Editor & Technical Content Specialist on June 9, 2025 in Python Image by Author | Ideogram Have you ever spent several hours on repetitive tasks that leave you feeling bored and… unproductive? I totally get it. But you can automate most of this boring stuff with Python. Let’s get started.
Python is a powerful and versatile programming language that has become increasingly popular in the field of data science. NumPy NumPy is a fundamental package for scientific computing in Python. Seaborn Seaborn is a library for creating attractive and informative statistical graphics in Python.
Run it once to generate the model file: python model/train_model.py More On This Topic FastAPI Tutorial: Build APIs with Python in Minutes Build a Data Cleaning & Validation Pipeline in Under 50 Lines of Python Top 5 Machine Learning APIs Practitioners Should Know 5 Machine Learning Models Explained in 5 Minutes 3 APIs to Access Gemini 2.5
Follow the installation process outlined in the documentation, and you will see a new tab in your Jupyter Notebook labeled Nbextensions. By using Python code, we can generate an interactive visualization that enables users to engage in a more intuitive data exploration process. We can see an example of Jupyter Widgets below.
Step 4: Leverage NotebookLM’s Tools Audio Overview This feature converts your document, slides, or PDFs into a dynamic, podcast-style conversation with two AI hosts that summarize and connect key points. Study Guides & Briefing Docs In the “Studio” panel, you can generate structured outputs such as study guides or briefing documents.
10 FREE AI Tools That’ll Save You 10+ Hours a Week. No tech skills needed.
With Modal, you can configure your Python app, including system requirements like GPUs, Docker images, and Python dependencies, and then deploy it to the cloud with a single command. First, install the Modal Python client. Then create a file and add the following code for: defining a vLLM image based on Debian Slim, with Python 3.12
Traditional methods of understanding code structures involve reading through numerous files and documentation, which can be time-consuming and error-prone. GitDiagram offers a solution by converting GitHub repositories into interactive diagrams, providing a visual representation of the codebase’s architecture.
Introduction MongoDB is a free, open-source NoSQL document database. This article was published as a part of the Data Science Blogathon. The post How To Create An Aggregation Pipeline In MongoDB appeared first on Analytics Vidhya.
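A minimal sketch of what such a pipeline looks like (the collection and field names are illustrative; the pymongo call is shown commented out because it needs a live server):

```python
# A typical aggregation pipeline: filter documents, group them, sort the groups.
# The field names ("status", "category", "amount") are illustrative.
pipeline = [
    {"$match": {"status": "active"}},
    {"$group": {"_id": "$category", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]

# Against a live cluster this would run with pymongo:
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# results = list(client.mydb.orders.aggregate(pipeline))
print(len(pipeline))  # 3 stages
```

Each stage transforms the stream of documents produced by the previous one, which is what makes aggregation pipelines composable.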
Such data often lacks the specialized knowledge contained in internal documents available in modern businesses, which is typically needed to get accurate answers in domains such as pharmaceutical research, financial investigation, and customer support. For example, imagine that you are planning next year’s strategy for an investment company.
Cursor AI If you use Cursor for coding or editing, integrating multiple MCP servers has become essential for boosting its capabilities—giving you easy access to the web, databases, documentation, APIs, and external services. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
SQL is one of the key languages widely used across businesses, and it requires an understanding of databases and table metadata. This can be overwhelming for nontechnical users who lack proficiency in SQL. This application allows users to ask questions in natural language and then generates a SQL query for the user’s request.
PDF Data Extraction: Upload a document, highlight the fields you need, and Magical AI will transfer them into online forms or databases, saving you hours of tedious work. You can find detailed step-by-step guides for many different workflows in Magical AI’s own documentation. It even learns your tone over time.
Documentation and Disaster Recovery Made Easy Data is the lifeblood of any organization, and losing it can be catastrophic. The following Terraform script will create an Azure Resource Group, a SQL Server, and a SQL Database. So why use IaC for cloud data infrastructures?
Streamlit is an open source framework for data scientists to efficiently create interactive web-based data applications in pure Python. Solution overview This solution uses the Amazon Bedrock Knowledge Bases chat with document feature to analyze and extract key details from your invoices, without needing a knowledge base.
Structured Query Language (SQL) is a complex language that requires an understanding of databases and metadata. Today, generative AI can enable people without SQL knowledge to query data. This generative AI task is called text-to-SQL, which uses natural language processing (NLP) to generate semantically correct SQL queries from plain text.
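As a toy illustration of the idea (a real system would call an LLM here; the hypothetical template lookup below merely stands in for the model, and the table is illustrative):

```python
import sqlite3

# Hypothetical stand-in for a text-to-SQL model: map a question to a query.
TEMPLATES = {
    "how many employees are there?": "SELECT COUNT(*) FROM employees",
}

def text_to_sql(question: str) -> str:
    return TEMPLATES[question.lower()]

# A tiny in-memory database to run the generated query against
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (name TEXT)")
con.executemany("INSERT INTO employees VALUES (?)", [("Ada",), ("Grace",)])

sql = text_to_sql("How many employees are there?")
count = con.execute(sql).fetchone()[0]
print(sql, "->", count)  # SELECT COUNT(*) FROM employees -> 2
```

The hard part in practice is exactly the step faked here: producing a query that is valid against the real schema.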
Data is normally stored in databases and can be queried using the most common query language, SQL. Constructing SQL queries from natural language isn’t a simple task. The data is stored in a data lake and retrieved by SQL using Amazon Athena. The following figure shows a search query that was translated to SQL and run.
This centralized system consolidates a wide range of data sources, including detailed reports, FAQs, and technical documents. The system integrates structured data, such as tables containing product properties and specifications, with unstructured text documents that provide in-depth product descriptions and usage guidelines.
While Python and R are popular for analysis and machine learning, SQL and database management are often overlooked. However, data is typically stored in databases and requires SQL or business intelligence tools for access. They use Structured Query Language (SQL) for managing and querying data. What is SQL?
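The basics can be tried directly from Python with the standard library’s sqlite3 module (the table here is illustrative):

```python
import sqlite3

# SQL from Python: create a table, insert rows, query them back
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO users (name) VALUES ('Alice'), ('Bob')")

rows = con.execute("SELECT name FROM users ORDER BY name").fetchall()
print(rows)  # [('Alice',), ('Bob',)]
```

The same SELECT/INSERT/CREATE statements carry over to production databases; only the driver and connection string change.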
The agent can generate SQL queries from natural language questions using the database schema DDL (data definition language for SQL) and execute them against a database instance in the database tier. The following are sample user queries: Write a Python function to validate email address syntax.
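The first sample query might yield something like the following (a simple regex sketch, not full RFC 5322 validation):

```python
import re

# Simple email syntax check (illustrative; does not cover every RFC 5322 case)
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(address: str) -> bool:
    return bool(EMAIL_RE.match(address))

print(is_valid_email("user@example.com"))  # True
print(is_valid_email("not-an-email"))      # False
```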
Basically, it’s MongoDB in the cloud. Users can create an account by signing up on their official website, provided below – MongoDB Atlas: Cloud Document Database | MongoDB. After signing in for the very first time, just follow the steps mentioned in the documentation below to spin up a free cluster.
Evolution of data warehouses Data warehouses emerged in the 1980s, designed as structured data repositories conducive to high-performance SQL queries and ACID transactions. SQL performance tuning: On-the-fly optimization of data formats for diverse queries.
Instead, we will leverage LangChain’s SQL Agent to generate complex database queries from human text. The documents should contain data with a bunch of specifications, alongside more fluid, natural language descriptions. Analyze the content of each document using GPT to parse it into JSON objects. I’m using Python 3.11.
Whether we are analyzing IoT data streams, managing scheduled events, processing document uploads, or responding to database changes, Azure Functions allow developers […] The post How to Develop Serverless Code Using Azure Functions? appeared first on Analytics Vidhya.
Text-to-SQL bridges this gap by generating precise, schema-specific queries that empower faster decision-making and foster a data-driven culture. We show how to effectively use Text-to-SQL using Amazon Nova , a foundation model (FM) available in Amazon Bedrock , to derive precise and reliable answers from your data.
7 Popular LLMs Explained in 7 Minutes: Get a quick overview of GPT, BERT, LLaMA, and more!
It aims to boost team efficiency by answering complex technical queries across the machine learning operations (MLOps) lifecycle, drawing from a comprehensive knowledge base that includes environment documentation, AI and data science expertise, and Python code generation. It’s also adept at troubleshooting coding errors.
In this post, we provide an overview of the Meta Llama 3 models available on AWS at the time of writing, and share best practices on developing Text-to-SQL use cases using Meta Llama 3 models. Meta Llama 3’s capabilities enhance accuracy and efficiency in understanding and generating SQL queries from natural language inputs.
Based on the customer query and context, the system dynamically generates text-to-SQL queries, summarizes knowledge base results using semantic search, and creates personalized vehicle brochures based on the customer’s preferences. This seamless process is facilitated by Retrieval Augmented Generation (RAG) and a text-to-SQL framework.
Snowflake Arctic is a family of enterprise-grade large language models (LLMs) built by Snowflake to cater to the needs of enterprise users, exhibiting exceptional capabilities (as shown in the following benchmarks ) in SQL querying, coding, and accurately following instructions. To learn more, refer to API documentation.
Common Challenges in Data Ingestion Pipeline Challenge 1: Data Extraction: Parsing Complex Data Structures: Extracting data from various types of documents, such as PDFs with embedded tables or images, can be challenging. Program synthesis for symbolic reasoning, utilizing languages like Python or SQL.
Python: https://github.com/chonkie-inc/chonkie TypeScript: https://github.com/chonkie-inc/chonkie-ts Here's a video showing our code chunker: https://youtu.be/Xclkh6bU1P0. 200k+ tokens) with many SQL snippets, query results and database metadata (e.g.
Django is a high-level open source framework written in the Python programming language. Its creators built their website in Python and along the way developed their own framework, which they called Django in honor of the jazz musician Django Reinhardt. Django has excellent documentation.
How do you save a trained model in Python? Saving a trained model with pickle: the pickle module can be used to serialize and deserialize Python objects. To save an ML model as a pickle file, you use the pickle module, which already comes with the default Python installation. Now let’s see how we can save our model.
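For example (the dict below stands in for a real trained model object, which would be pickled the same way):

```python
import pickle

# Any picklable Python object stands in for a trained model here
model = {"weights": [0.1, 0.2], "bias": 0.5}

# Save ("dump") the model to a file...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and load it back later
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

One caveat: only unpickle files you trust, since pickle.load can execute arbitrary code.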
However, this also calls for the need to build processes to split large text data from various documents, which is required for RAG applications, into smaller chunks so that they can be embedded and retrieved efficiently as vectors. We can split a large document or text into smaller chunks. Return the chunks as an ARRAY.
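A minimal sketch of such a splitter (fixed-size character chunks with overlap; the sizes are illustrative, and real chunkers usually split on tokens or sentences instead):

```python
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size character chunks, each overlapping the last."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 50 characters with chunk_size=20 and overlap=5 gives 4 chunks
chunks = chunk_text("a" * 50)
print([len(c) for c in chunks])  # [20, 20, 20, 5]
```

The overlap ensures that a sentence cut at a chunk boundary still appears intact in at least one chunk, which helps retrieval quality.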
medium instance with a Python 3 (ipykernel) kernel. For this post, we use a dataset called sql-create-context , which contains samples of natural language instructions, schema definitions and the corresponding SQL query. For details, refer to Creating an AWS account.