Spark and Hadoop Developer Training (CCA)
Course Overview
This four-day course covers the fundamentals of Apache Spark and how it integrates with the broader Hadoop ecosystem. Participants will review the basics of HDFS, learn how to ingest data with Sqoop and Flume, process distributed data with Spark, model data as tables in Impala and Hive, and apply best practices for data storage. You will learn:
• How data is distributed, stored, and processed in a Hadoop cluster
• How to use Sqoop and Flume to ingest data
• How to process distributed data with Apache Spark (see the sketch after this list)
• How to model structured data as tables in Impala and Hive
• How to choose the best data storage format for different data usage patterns
• Best practices for data storage
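To give a flavor of the hands-on exercises, the sketch below shows the kind of task the course works through: counting words in a distributed text file with Spark and saving the result as a table that Hive and Impala can query. This is a minimal illustration written for this overview, not actual course material; the HDFS path and table name are hypothetical placeholders, and it assumes a Spark installation built with Hive support.

# Illustrative sketch only -- not actual course material.
# The HDFS path and table name are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("wordcount-sketch")
         .enableHiveSupport()   # lets saved tables be visible to Hive/Impala
         .getOrCreate())

# Read a text file from HDFS; each line becomes a row in column "value"
lines = spark.read.text("hdfs:///data/sample/shakespeare.txt")

# Split lines on whitespace, explode into one word per row, count in parallel
counts = (lines
          .select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
          .filter(F.col("word") != "")
          .groupBy("word")
          .count())

# Persist as a Parquet-backed table that Hive and Impala can query
counts.write.mode("overwrite").format("parquet").saveAsTable("wordcounts")

spark.stop()

Submitted to a cluster with spark-submit, the groupBy and count run in parallel across executors; this distributed processing model is what the course explores in depth.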
Course Objectives
• Upon completing the course, you will be prepared to take the CCA Spark and Hadoop Developer certification exam. This certification validates core developer skills: the ability to write and maintain Apache Spark and Apache Hadoop projects.
Who Should Attend
• This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.