Hadoop Big Data Training in Koramangala, Bangalore

Information about Hadoop Big Data Training in Koramangala, Bangalore

Published on October 13, 2014

Author: codefrux

Source: authorstream.com

Content

Big Data & Hadoop

Agenda
- Introduction to Big Data
- Why Big Data?
- Big Data Overview
- Hadoop Overview
- Why Hadoop?
- Who can learn Hadoop?
- #Trending: Jobs for Hadoop and Java
- Hadoop: Architecture & Ecosystem

Introduction to Big Data
Big Data is a term for collective data sets with volumes so large and complex that they are hard to interpret and process with traditional data processing applications and tools. Volumes are measured in petabytes (1,024 TB) or exabytes (1,024 PB) and will soon reach zettabytes (1,024 EB).

Why Big Data
- To manage huge volumes of data in a better way.
- To benefit from the speed, capacity and scalability of cloud storage.
- To uncover potential insights through data analysis methods.
- Companies can find new prospects and business opportunities.
- Unlike other methods, Big Data lets business users visualize the data.

Big Data Overview
Big Data includes:
- Traditional structured databases holding inventories, orders and customer information.
- Unstructured data from the web, social networking sites, etc.
The problem with these massive datasets is that they cannot be analyzed with standard tools and procedures. Processing this data appropriately can help an organization gain useful insights into its business prospects.

Unstructured Data Growth
- Emails sent per second: 2.9 million
- Video uploaded to YouTube per minute: 20 hours
- Data processed by Google per day: 20 petabytes
- Tweets per day: 50 million
- Minutes spent on Facebook per month: 700 billion
- Data sent and received by mobile users per day: 1.3 exabytes
- Products ordered on Amazon per second: 73 items
Source: http://www.ibm.com/

Unstructured Data Growth (chart)
Source: http://www.forbes.com/

Hadoop Overview
Hadoop allows batch processing of colossal data sets (petabytes and exabytes) as a series of parallel processes. A Hadoop cluster comprises a number of server "nodes". Nodes store and process data in a parallel and distributed fashion. It is a parallelized, distributed storage and processing framework that can operate on commodity servers.

Commodity Hardware
Commodity hardware means an average amount of computing resources; it implies affordability, not low quality. Hadoop clusters run on commodity servers, which have an average ratio of disk space to memory, unlike specialized servers with high memory or CPU. These servers are not designed specifically for a distributed storage and processing framework, but they fit the purpose.

Benefits of Hadoop
- Scalable – Hadoop can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.
- Failure Tolerance – HDFS replicates files a specified number of times and can automatically re-replicate data blocks from nodes that have failed (see the sketch after this section).
- Cost-Effective – Hadoop is a scale-out architecture that stores all of a company's data for later use, offering computing and storage capabilities at a reasonable price.
- Speed – Hadoop's storage method is based on a distributed file system, resulting in much faster data processing.
- Flexible – Hadoop easily accesses new data sources and different types of data to generate insights.

(Chart)
Source: http://www.datanami.com/
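The Failure Tolerance point above relies on HDFS block replication. As a minimal illustrative sketch (not part of the original slides), the standard Hadoop FileSystem API can be used from Java to set the replication factor for a file; the file path and the factor of 3 below are assumptions chosen only for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        // Default replication for files created by this client (illustrative value)
        conf.set("dfs.replication", "3");

        // Connect to the cluster's default file system (HDFS)
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical file assumed to already exist in HDFS
        Path file = new Path("/data/orders/orders.csv");

        // Ask the NameNode to keep 3 copies of each block of this file;
        // if a DataNode fails, HDFS re-replicates the missing blocks elsewhere.
        fs.setReplication(file, (short) 3);

        System.out.println("Replication factor updated for " + file);
        fs.close();
    }
}

In practice the replication factor is usually left at the cluster default set by the administrator; the API call above simply shows where the "specified number of times" from the slide comes from.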
Why Hadoop
- It provides insights into daily operations.
- It drives new product ideas.
- It is used by companies for research and development and for marketing analysis.
- Image and text processing.
- It analyses huge amounts of data in comparatively less time.
- Network monitoring.
- Log and/or click-stream analysis of various kinds.

Hadoop Forecast (chart)
Source: http://www.alliedmarketresearch.com/

Who Can Learn Hadoop
- Anyone with basic knowledge of Java and Linux.
- Even if you have not used Java and Linux before, you can learn them in parallel with Hadoop.
- Hadoop roles are available for Architects, Developers, Testers and Linux/Network/Hardware Administrators; some need knowledge of Java and some do not.
- SQL knowledge will help in learning HiveQL, which is part of the Hadoop ecosystem.
- Knowledge of Linux will be helpful in understanding Hadoop command-line parameters.
- Even without any prior knowledge of Java and Linux, a few basic classes are enough to get started with Hadoop.

#Trending: Hadoop Jobs (chart)
Source: http://www.the451group.com/

Job Opportunities in Hadoop
MNCs like IBM, Microsoft and Oracle have integrated with Hadoop. Companies like Facebook, Hortonworks, Amazon, eBay and Yahoo! are currently looking for Hadoop professionals, so companies want IT professionals with solid Hadoop MapReduce skills.

Salary Trend in Hadoop (chart)
Source: http://www.itproportal.com/

Hadoop Architecture
The two main components of Hadoop are:
- Hadoop Distributed File System (HDFS), the storage component, which breaks files into blocks, replicates them and stores them across the cluster.
- MapReduce, the processing component, which distributes the workload for operations on files stored in HDFS and automatically restarts failed work (a minimal word-count example appears after this outline).

(Architecture diagram)
Source: http://www.cloudera.com/

Hadoop Ecosystem
- Apache Hadoop Distributed File System offers storage of large files across multiple machines.
- Apache MapReduce is a framework for processing large data sets with a parallel, distributed algorithm on a cluster.
- Apache Hive is a data warehouse over distributed storage, facilitating data summarization, queries and management of large datasets.
- Apache Pig is an engine for executing data flows in parallel on Apache Hadoop.
- Apache HBase is a non-relational distributed database performing real-time operations on large tables.
- Apache Flume aggregates unstructured data into HDFS.
- Apache Sqoop is a system for transferring bulk data between HDFS and relational databases.
- Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
- Apache ZooKeeper is a coordination service with tools for writing correct distributed applications.
- Apache Avro is a framework for modelling, serializing and making Remote Procedure Calls.

Q & A
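To make the MapReduce component described under Hadoop Architecture concrete, here is the standard word-count example in Java, added as an illustration (it is not part of the original slides): the mapper runs in parallel on each input block and emits (word, 1) pairs, and the reducer sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel on each block of input, emits (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner sums map output locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not already exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, it could be submitted with something like "hadoop jar wordcount.jar WordCount /input /output", where the HDFS input and output paths are placeholders; YARN then schedules the map and reduce tasks across the cluster nodes in parallel.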
