Apache Flume: Distributed Log Collection for Hadoop - Second Edition

Name: Apache Flume: Distributed Log Collection for Hadoop - Second Edition
Author: Steve Hoffman
ISBN: 9781784392178

Publisher: Packt Publishing

ISBN-10: 1784392170

Description

Design and implement a series of Flume agents to send streamed data into Hadoop

About This Book

Construct a series of Flume agents using the Apache Flume service to efficiently collect, aggregate, and move large amounts of event data
Configure failover paths and load balancing to remove single points of failure
Use this step-by-step guide to stream logs from application servers to Hadoop's HDFS

Who This Book Is For

If you are a Hadoop programmer who wants to learn Flume in order to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge of Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.

What You Will Learn

Understand the Flume architecture and learn how to download and install the open source Flume distribution from Apache
Follow along with a detailed example of transporting web logs in near real time (NRT) to Kibana/Elasticsearch and archiving them in HDFS
Learn tips and tricks for transporting logs and data in your production environment
Understand and configure the Hadoop Distributed File System (HDFS) sink
Use a morphline-backed sink to feed data into Solr
Create redundant data flows using sink groups
Configure and use various sources to ingest data
Inspect data records and move them between multiple destinations based on payload content
Transform data en route to Hadoop and monitor your data flows

In Detail

Apache Flume is a distributed, reliable, and available service used to efficiently collect, aggregate, and move large amounts of log data. It is used to stream logs from application servers to HDFS for ad hoc analysis.

This book starts with an architectural overview of Flume and its logical components. It explores channels, sinks, and sink processors, followed by sources and channel selectors. By the end of this book, you will be fully equipped to construct a series of Flume agents to dynamically transport your stream data and logs from your systems into Hadoop.

This step-by-step book guides you through the architecture and components of Flume, covering different approaches that are then pulled together in a real-world, end-to-end use case. It progresses gradually from the simplest to the most advanced features.
