Big Data Integration

bigdata

Data Warehouse System For Hadoop

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.

At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

  • Tools to enable easy data extract/transform/load (ETL)
  • A mechanism to impose structure on a variety of data formats
  • Query execution via MapReduce

Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. QL can also be extended with custom scalar functions (UDF’s), aggregations (UDAF’s), and table functions (UDTF’s).

Apache HBase

Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for the purposes of natural language search. It is now a top-level Apache project and has generated considerable interest.

Facebook elected to implement its new messaging platform using HBase in November 2010.

previous
next