Hadoop Operations (English reprint edition)
Publication date: 2013 edition
Synopsis
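If you need to maintain large and complex Hadoop clusters, Hadoop Operations (reprint edition) is indispensable. As Hadoop becomes the industry standard for large-scale data processing in the data center, demand for operational guidance has grown sharply. In this book Sammer, a principal solution architect at Cloudera, shows the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance. Rather than enumerating every possible scenario, this operations guide takes a pragmatic approach and describes the steps involved in significant deployments. The book covers: an overview of HDFS and MapReduce, why they exist and how they work; planning a Hadoop deployment, from hardware and OS selection to network requirements; setup and configuration details, organized around a list of critical properties; managing resources by sharing a cluster across multiple groups; runbooks for the most common cluster maintenance tasks; monitoring Hadoop clusters and learning failure detection from real-world examples; using basic tools and techniques to handle backup and catastrophic failure.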
Table of Contents
Preface
1.Introduction
2.HDFS
Goals and Motivation
Design
Daemons
Reading and Writing Data
The Read Path
The Write Path
Managing Filesystem Metadata
Namenode High Availability
Namenode Federation
Access and Integration
Command-Line Tools
FUSE
REST Support
3.MapReduce
The Stages of MapReduce
Introducing Hadoop MapReduce
Daemons
When It All Goes Wrong
YARN
4.Planning a Hadoop Cluster
Picking a Distribution and Version of Hadoop
Apache Hadoop
Cloudera’s Distribution Including Apache Hadoop
What Should I Use?
Hardware Selection
Master Hardware Selection
Worker Hardware Selection
Cluster Sizing
Blades, SANs, and Virtualization
Operating System Selection and Preparation
Deployment Layout
Software
Hostnames, DNS, and Identification
Users, Groups, and Privileges
Kernel Tuning
vm.swappiness
vm.overcommit_memory
Disk Configuration
Choosing a Filesystem
Mount Options
Network Design
Network Usage in Hadoop: A Review
1 Gb versus 10 Gb Networks
Typical Network Topologies
5.Installation and Configuration
Installing Hadoop
Apache Hadoop
CDH
Configuration: An Overview
The Hadoop XML Configuration Files
Environment Variables and Shell Scripts
Logging Configuration
HDFS
Identification and Location
Optimization and Tuning
Formatting the Namenode
Creating a /tmp Directory
Namenode High Availability
Fencing Options
Basic Configuration
Automatic Failover Configuration
Format and Bootstrap the Namenodes
Namenode Federation
MapReduce
Identification and Location
Optimization and Tuning
Rack Topology
Security
6.Identity, Authentication, and Authorization
Identity
Kerberos and Hadoop
Kerberos: A Refresher
Kerberos Support in Hadoop
Authorization
HDFS
MapReduce
Other Tools and Systems
Tying It Together
7.Resource Management
What Is Resource Management?
HDFS Quotas
MapReduce Schedulers
The FIFO Scheduler
The Fair Scheduler
The Capacity Scheduler
The Future
8.Cluster Maintenance
Managing Hadoop Processes
Starting and Stopping Processes with Init Scripts
Starting and Stopping Processes Manually
HDFS Maintenance Tasks
Adding a Datanode
Decommissioning a Datanode
Checking Filesystem Integrity with fsck
Balancing HDFS Block Data
Dealing with a Failed Disk
MapReduce Maintenance Tasks
Adding a Tasktracker
Decommissioning a Tasktracker
Killing a MapReduce Job
Killing a MapReduce Task
Dealing with a Blacklisted Tasktracker
9.Troubleshooting
Differential Diagnosis Applied to Systems