==========================================
set operations
Union, munus, Intersect
Only Union is supported in hive, minus and Intersect are not supported.
We can write the query to achieve minus and Intersect.
==========================================
Sub queries
IN , NOT IN
exists , not exists
Only these 2 are supported in hive
==========================================
Views
Logical table.
Creating the table which will have subset of the columns.
create view viewname as (query)
Will give security.
can create multiple logical table.
hiding complexity of the query.
This query will be stored in metastore.
==========================================
Normalization vs Denormalization
Normalization : Data should not be redundant
Dividing bigger table into multiple smaller table to remove with
the intent to remove redundancy
Normalization is good for transactional db.
Denormalization : keep the data in big table rather than small tables.
it will have issue of data redundancy.
It will help in reducing the number of joins.
Hive Basic Concepts:
Datawarehouse
Data
metadata : metastore
managed table, external table, temporary table
loading data from local and from hdfs
complex data types.
UDF, UDAF, UDTF
Views
Normalization and deNormalization
==========================================
Hive Optimization techniques:
1. table structure level Optimization
Partitioning and bucketing
2. Optimization of hive query
1. table structure level Optimization
Logically segreagate the data
Partitioning and bucketing
Static and Dynamic Partitioning
We can have hierarchy of the folders in partitioning.
We dont have control on the number of partitions created.
Partition is the folder
==========================================
Bucketing tables :
How bucketing works?
Bucket value has to be fixed while creating table.
Each bucket is a file.
hash function used for inserting will be used by searching also to decide
which bucket to read.
Whenever we have lot of distinct values we can use bucketing
==========================================
We can use combination of partitioning and bucketing to organize the data.
set operations
Union, munus, Intersect
Only Union is supported in hive, minus and Intersect are not supported.
We can write the query to achieve minus and Intersect.
==========================================
Sub queries
IN , NOT IN
exists , not exists
Only these 2 are supported in hive
==========================================
Views
Logical table.
Creating the table which will have subset of the columns.
create view viewname as (query)
Will give security.
can create multiple logical table.
hiding complexity of the query.
This query will be stored in metastore.
==========================================
Normalization vs Denormalization
Normalization : Data should not be redundant
Dividing bigger table into multiple smaller table to remove with
the intent to remove redundancy
Normalization is good for transactional db.
Denormalization : keep the data in big table rather than small tables.
it will have issue of data redundancy.
It will help in reducing the number of joins.
Hive Basic Concepts:
Datawarehouse
Data
metadata : metastore
managed table, external table, temporary table
loading data from local and from hdfs
complex data types.
UDF, UDAF, UDTF
Views
Normalization and deNormalization
==========================================
Hive Optimization techniques:
1. table structure level Optimization
Partitioning and bucketing
2. Optimization of hive query
1. table structure level Optimization
Logically segreagate the data
Partitioning and bucketing
Static and Dynamic Partitioning
We can have hierarchy of the folders in partitioning.
We dont have control on the number of partitions created.
Partition is the folder
==========================================
Bucketing tables :
How bucketing works?
Bucket value has to be fixed while creating table.
Each bucket is a file.
hash function used for inserting will be used by searching also to decide
which bucket to read.
Whenever we have lot of distinct values we can use bucketing
==========================================
We can use combination of partitioning and bucketing to organize the data.
No comments:
Post a Comment