Analytics, Data Science and Hadoop - Data Science Current

Hadoop Ecosystem

Analytics Vidhya

OCTOBER 9, 2022

This article was published as a part of the Data Science Blogathon. Apache Hadoop is an open-source framework designed to facilitate interaction with big data. Still, for those unfamiliar with this technology, one question arises: what is big data?

Hadoop

Hadoop Apache Hadoop Big Data Big Data

An Introduction to Hadoop Ecosystem for Big Data

Analytics Vidhya

MAY 27, 2022

This article was published as a part of the Data Science Blogathon. Every day the internet generates billions of bytes of data. Every time you put on a dog filter, watch cat videos, or order food from your favourite restaurant, you generate data.

Hadoop

Hadoop Big Data Big Data Data Science

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

The Project Clinic: Assessing Project Health, Planning, and Execution

MORE WEBINARS

Trending Sources

Integration of Python with Hadoop and Spark

Analytics Vidhya

MAY 30, 2021

This article was published as a part of the Data Science Blogathon. Big data refers to collections of data that are vast in size.

Hadoop

Hadoop Python Big Data Big Data

The Tale of Apache Hadoop YARN!

Analytics Vidhya

MAY 31, 2022

This article was published as a part of the Data Science Blogathon. YARN stands for Yet Another Resource Negotiator, a large-scale distributed data operating system used for Big Data analytics.

Apache Hadoop

Apache Hadoop Hadoop Big Data Analytics Big Data Analytics

Introduction to Hadoop Architecture and Its Components

Analytics Vidhya

JUNE 14, 2022

This article was published as a part of the Data Science Blogathon. Hadoop is an open-source, Java-based framework used to store and process large amounts of data. Data is stored on inexpensive commodity servers that operate as clusters. Developed by Doug Cutting and Michael […].

Hadoop

Hadoop Clustering Data Science Analytics

Frequent Itemset Mining Using MapReduce on Hadoop

Analytics Vidhya

SEPTEMBER 14, 2022

This article was published as a part of the Data Science Blogathon. Every Data Science enthusiast’s journey goes through one of the most classical data problems: Frequent Itemset Mining, also sometimes referred to as Association Rule Mining or Market Basket Analysis.

Hadoop

Hadoop Data Science Analytics Analytics

Apache Spark Vs. Hadoop MapReduce – Top 7 Differences

Analytics Vidhya

JUNE 14, 2022

This article was published as a part of the Data Science Blogathon. Before Spark, Hadoop MapReduce was the main tool for processing large data, with no competitors. Let’s take a […].

Hadoop

Hadoop Data Science Analytics Analytics

HIVE – A DATA WAREHOUSE IN HADOOP FRAMEWORK

Analytics Vidhya

MAY 30, 2021

This article was published as a part of the Data Science Blogathon. Hadoop is […].

Hadoop

Hadoop Data Warehouse Data Science Analytics

Apache Oozie: Scheduler System to Manage & Perform Hadoop Jobs

Analytics Vidhya

MAY 1, 2022

This article was published as a part of the Data Science Blogathon. Apache Oozie is a tool that allows us to run any application or job in any sequence within Hadoop’s distributed environment. With Oozie, we can schedule a job to run at a specified time. What is Apache Oozie? Apache […].

Hadoop

Hadoop Data Science Analytics Analytics

Workings of Hadoop Distributed File System (HDFS)

Analytics Vidhya

MAY 5, 2022

This article was published as a part of the Data Science Blogathon. This article will discuss the Hadoop Distributed File System, its features, components, functions, and benefits. Hadoop is a powerful platform for supporting an enormous variety of data applications.

Hadoop

Hadoop Data Science Analytics Analytics

Data Science Blogathon 30th Edition- Women in Data Science

Analytics Vidhya

MARCH 8, 2023

The Biggest Data Science Blogathon is now live! “Knowledge is power. Sharing knowledge is the key to unlocking that power.” (Martin Uzochukwu Ugwu) Analytics Vidhya is back with the largest data-sharing knowledge competition: The Data Science Blogathon.

Data Science

Data Science Analytics Analytics Apache Hadoop

Data Science Blogathon 28th Edition

Analytics Vidhya

JANUARY 8, 2023

Hey, are you the data science geek who spends hours coding, learning a new language, or just exploring new avenues of data science? Analytics Vidhya is back with its 28th Edition of blogathon, a place where you can share your knowledge about […].

Data Science

Data Science Analytics Analytics Hadoop

Data Science Blogathon 26th Edition

Analytics Vidhya

NOVEMBER 7, 2022

Hello, fellow data science enthusiasts, did you miss imparting your knowledge in the previous blogathon due to a time crunch? Well, it’s okay because we are back with another blogathon where you can share your wisdom on numerous data science topics and connect with the community of fellow enthusiasts.

Data Science

Data Science Analytics Analytics Hadoop

Learn Everything about MapReduce Architecture & its Components

Analytics Vidhya

JULY 5, 2022

This article was published as a part of the Data Science Blogathon. MapReduce is part of the Apache Hadoop ecosystem, a framework for large-scale data processing. Other components of Apache Hadoop include Hadoop Distributed File System (HDFS), YARN, and Apache Pig.

Apache Hadoop

Apache Hadoop Hadoop Data Science Algorithm

Coding vs Data Science: A comprehensive guide to unraveling the differences

Data Science Dojo

JULY 7, 2023

In the technology-driven world we inhabit, two skill sets have risen to prominence and are a hot topic: coding vs data science. Coding goes beyond just software creation, impacting fields as diverse as healthcare, finance, and entertainment. What is Data Science?

Data Science

Data Science Data Scientist Decision Trees Python

Introduction to Apache Sqoop

Analytics Vidhya

JULY 25, 2022

This article was published as a part of the Data Science Blogathon. Apache Sqoop is a big data engine for transferring data between Hadoop and relational database servers. Sqoop can also be […].

Hadoop

Hadoop Big Data Big Data Data Engineering

Architecture and Components of Apache YARN

Analytics Vidhya

JULY 11, 2022

This article was published as a part of the Data Science Blogathon. Previous versions of Hadoop only support […].

Hadoop

Hadoop Data Science Analytics Analytics

Get to Know Apache Flume from Scratch!

Analytics Vidhya

MAY 12, 2022

This article was published as a part of the Data Science Blogathon. Apache Flume, a part of the Hadoop ecosystem, was developed by Cloudera. Initially, it was designed solely to handle log data, but it was later extended to process event data.

Hadoop

Hadoop Data Science Analytics Analytics

22 Widely Used Data Science and Machine Learning Tools in 2020

Analytics Vidhya

JUNE 27, 2020

There is a plethora of data science tools out there: which one should you pick up? Here’s a list of over 20.

Machine Learning

Machine Learning Machine Learning Data Science Analytics

A Brief Introduction to Apache HBase and it’s Architecture

Analytics Vidhya

OCTOBER 12, 2022

This article was published as a part of the Data Science Blogathon. Since the 1970s, relational database management systems have solved the problems of storing and maintaining large volumes of structured data.

Hadoop

Hadoop Big Data Big Data Data Science

Data science vs data analytics: Unpacking the differences

IBM Journey to AI blog

SEPTEMBER 19, 2023

Though you may encounter the terms “data science” and “data analytics” being used interchangeably in conversations or online, they refer to two distinctly different concepts. Meanwhile, data analytics is the act of examining datasets to extract value and find answers to specific questions.

Data Science

Data Science Analytics Analytics Data Scientist

An Introduction to MapReduce with a Word Count Example

Analytics Vidhya

MAY 18, 2022

This article was published as a part of the Data Science Blogathon. Hadoop facilitates the processing of large datasets in a distributed manner and provides the foundation on which other services and applications can be built. MapReduce and HDFS are the two main components of Hadoop.

Hadoop

Hadoop Data Science Analytics Analytics

What is Apache Impala- Features and Architecture

Analytics Vidhya

AUGUST 17, 2022

This article was published as a part of the Data Science Blogathon. Impala is an open-source, native analytics database for Hadoop. It rapidly processes large […].

Hadoop

Hadoop Data Science Database Analytics

A Comprehensive Guide to Apache Spark RDD and PySpark

Analytics Vidhya

OCTOBER 21, 2021

This article was published as a part of the Data Science Blogathon. Hadoop is widely used in the industry to examine large data volumes.

Hadoop

Hadoop Data Science Analytics Analytics

Warehouse, Lake or a Lakehouse – What’s Right for you?

Analytics Vidhya

OCTOBER 10, 2022

This article was published as a part of the Data Science Blogathon. Most of you would know the different approaches for building a data and analytics platform. You would have already worked on systems that used traditional warehouses or Hadoop-based data lakes. Selecting one among […].

Data Lakes

Data Lakes Hadoop Data Science Analytics

Most Frequently Asked Apache HBase Interview Questions

Analytics Vidhya

AUGUST 1, 2022

This article was published as a part of the Data Science Blogathon. HBase is a column-oriented non-relational database management system that operates on Hadoop Distributed File System (HDFS). HBase provides a fault-tolerant way of storing sparse data sets, which are prevalent in several big data use cases.

Hadoop

Hadoop Big Data Big Data Data Science

An Overview on DDL Commands in Apache Hive

Analytics Vidhya

APRIL 29, 2022

This article was published as a part of the Data Science Blogathon. Apache Hadoop is the most used open-source framework in the industry to store and process large data efficiently. Hive is built on top of Hadoop to provide data storage, query, and processing capabilities.

Apache Hadoop

Apache Hadoop Hadoop SQL Data Science

Performance Tuning Practices in Hive

Analytics Vidhya

FEBRUARY 20, 2022

This article was published as a part of the Data Science Blogathon. Apache Hive is a data warehouse system built on top of Hadoop which gives the user the flexibility to write complex MapReduce programs in the form of SQL-like queries.

Hadoop

Hadoop Data Warehouse SQL Data Science

Apache Zookeeper Architecture and Installation

Analytics Vidhya

AUGUST 3, 2022

This article was published as a part of the Data Science Blogathon. Zookeeper in Hadoop can be considered a centralized repository where distributed applications can store data and retrieve it.

Hadoop

Hadoop Data Science Analytics Analytics

Everything About Apache Hive and its Advantages!

Analytics Vidhya

JUNE 29, 2022

This article was published as a part of the Data Science Blogathon. Hive, created at Facebook and later adopted by Apache, is a data storage system created for the purpose of analyzing structured data. Running on the open-source data platform Hadoop, Apache Hive is a software application released in October 2010.

Hadoop

Hadoop Data Science Analytics Analytics

Most Asked Interview Questions on Apache Spark

Analytics Vidhya

AUGUST 26, 2022

This article was published as a part of the Data Science Blogathon. Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark’s in-memory data processing capabilities can make it up to 100 times faster than Hadoop MapReduce for some workloads. The most […].

Hadoop

Hadoop Data Science Analytics Analytics

Introduction to Partitioned hive table and PySpark

Analytics Vidhya

OCTOBER 28, 2021

This article was published as a part of the Data Science Blogathon. What is the need for Hive? The official description of Hive is: ‘Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.’

Apache Hadoop

Apache Hadoop Hadoop Data Warehouse SQL

Partitioning and Bucketing in Hive

Analytics Vidhya

JUNE 30, 2022

This article was published as a part of the Data Science Blogathon. Hive is a popular data warehouse built on top of Hadoop that is used by companies like Walmart, TikTok, and AT&T. It is an important technology for data engineers to learn and master.

Hadoop

Hadoop Data Warehouse Data Engineering Data Engineer

Apache Pig Architecture and Execution Modes

Analytics Vidhya

JULY 10, 2022

This article was published as a part of the Data Science Blogathon. Apache Pig is built on top of Hadoop. It provides data-flow processing for large data sets. Apache Pig offers a high-level language.

Hadoop

Hadoop Data Science Analytics Analytics

An Introduction to Data Analysis using Spark SQL

Analytics Vidhya

AUGUST 30, 2021

This article was published as a part of the Data Science Blogathon. Spark is an analytics engine that is used by data scientists all over the world for Big Data processing. It can run on top of Hadoop and can process batch as well as streaming data.

Data Analysis

Data Analysis Data Analysis SQL Hadoop

3 Reasons Why In-Hadoop Analytics are a Big Deal

Dataconomy

APRIL 21, 2016

Recent technology advances within the Apache Hadoop ecosystem have provided a big boost to Hadoop’s viability as an analytics environment—above and beyond just being a good place to store data. Leveraging these advances, new technologies now support SQL on Hadoop, making in-cluster analytics of data in Hadoop a reality.

Hadoop Analytics

Hadoop Analytics Hadoop Apache Hadoop Analytics

Getting Started with NoSQL Database Called HBase

Analytics Vidhya

MAY 17, 2022

This article was published as a part of the Data Science Blogathon. HBase is developed as a part of the Hadoop ecosystem and runs on top of HDFS. It provides random real-time read and write access to the given data.

Database

Database Hadoop Data Science Analytics

What is Hadoop and How Does It Work?

Pickl AI

JUNE 18, 2023

Hadoop has become a familiar term with the advent of big data in the digital world, where it has successfully established its position. Technological developments in Big Data have dramatically changed the approach to data analysis. But what is Hadoop, and what is its importance in Big Data?

Hadoop

Hadoop Big Data Big Data Clustering

YARN – Yet Another Resource Negotiator

Analytics Vidhya

JANUARY 7, 2022

In today’s world, data is being generated at an ever-growing pace, leading to a boom in demand for Big Data tools such as Hadoop, Pig, Spark, Hive, and many more. The tool that stands out the most is Apache Hadoop, and one of its core components is YARN. Apache Hadoop YARN, or as it is […].

Apache Hadoop

Apache Hadoop Hadoop Big Data Big Data

Cloud Data Science 10

Data Science 101

MARCH 7, 2020

The Cloud Data Science world is keeping busy. Lots of happenings this week. Azure HDInsight now supports Apache analytics projects: this announcement includes Spark, Hadoop, and Kafka.

Cloud Data

Cloud Data Data Science Azure Hadoop

10 reasons to learn Data Science

Pickl AI

FEBRUARY 6, 2024

Summary: Are you still wondering whether or not you should pursue your career as a Data Scientist? This blog breaks the ice and unfolds 10 reasons to learn Data Science. The rapid increase in digitization has created volumes of data. […] million new job opportunities.

Data Science

Data Science Data Scientist Machine Learning Machine Learning

A beginner tale of Data Science

Becoming Human

JANUARY 23, 2023

Data Science: you hear this term all over the internet, and it is also the topic of most concern for newbies who want to enter the world of data but don’t know what it actually means. I’m not saying those are incorrect or wrong, even though every article has its own mindset behind the term ‘Data Science’.

Data Science

Data Science Big Data Big Data Deep Learning
