Powered By Blogger

Saturday, November 16, 2019

Hive Managed and External Table


----- Types of table-----------------------------
Managed Table :
  Data is owned by hive. If you drop the table, then metadata and actual content wil be dropped.
  location will be by default /user/hive/warehouse



External Table :
  Hive does not own the data. If you drop the table, then metadata will be dropped. Content will
  remain untouched.
    location will be anywhere
If external systems are using this file then use the External Table.


temporary table :
automatically created for intermediate processing. THis will be active untill the session is active.


by default table created will be managed table.

hadoop fs -cat /user/hive/warehouse/basan.db/test/*


[cloudera@quickstart Downloads]$ hadoop fs  -ls /user/hive/warehouse/basan.db/test
Found 1 items
-rwxrwxrwx   1 cloudera supergroup         11 2019-11-16 22:42 /user/hive/warehouse/basan.db/test/000000_0
[cloudera@quickstart Downloads]$ hadoop fs  -ls /user/hive/warehouse/basan.db/test
Found 1 items
-rwxrwxrwx   1 cloudera supergroup         11 2019-11-16 22:42 /user/hive/warehouse/basan.db/test/000000_0
[cloudera@quickstart Downloads]$ hadoop fs -cat /user/hive/warehouse/basan.db/test/*
1234 basin
[cloudera@quickstart Downloads]$ drop table test
bash: drop: command not found
[cloudera@quickstart Downloads]$


hive> drop table test
    > ;
OK
Time taken: 0.642 seconds
hive> dfs -ls /user/hive/warehouse/basan.db
    > ;
Found 2 items
drwxrwxrwx   - cloudera supergroup          0 2019-11-16 21:22 /user/hive/warehouse/basan.db/customers
drwxrwxrwx   - cloudera supergroup          0 2019-11-16 22:31 /user/hive/warehouse/basan.db/orders
hive>

Observe that below file been deleted
-rwxrwxrwx   1 cloudera supergroup         11 2019-11-16 22:42 /user/hive/warehouse/basan.db/test/000000_0




----- Working with external table-----------------------------



[cloudera@quickstart Downloads]$ hadoop fs -mkdir /data/exttable
vi products.csv create the file in Desktop

iphone7, iPhone 7, 950
camera_canon, Canon 570x, 1000
washingmachine_samsung, Samsung Swift, 400
tv_vu, Vu 56 Inch, 600


[cloudera@quickstart ~]$ pwd
/home/cloudera
[cloudera@quickstart ~]$ cd Desktop/
[cloudera@quickstart Desktop]$ vi products.csv
[cloudera@quickstart Desktop]$ hadoop fs -put /home/cloudera/Desktop/products.csv /data/exttable
[cloudera@quickstart Desktop]$ hadoop fs -ls /data/exttable
Found 1 items
-rw-r--r--   1 cloudera supergroup        120 2019-11-16 23:08 /data/exttable/products.csv
[cloudera@quickstart Desktop]$


create external table products (
id string,
title string,
cost float
)
location '/data/exttable/';


hive> create external table products (
    > id string,
    > title string,
    > cost float
    > )
    > location '/data/exttable/';
OK
Time taken: 0.091 seconds
hive> select * from products;
OK
iphone7, iPhone 7, 950 NULL NULL
camera_canon, Canon 570x, 1000 NULL NULL
washingmachine_samsung, Samsung Swift, 400 NULL NULL
tv_vu, Vu 56 Inch, 600 NULL NULL
Time taken: 0.094 seconds, Fetched: 4 row(s)
hive>



Observe entire row is treated as one column, and other columns are NULL. This is not the
results we were expecting.

hadoop fs -ls /user/hive/warehouse/basan.db

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/basan.db
Found 2 items
drwxrwxrwx   - cloudera supergroup          0 2019-11-16 21:22 /user/hive/warehouse/basan.db/customers
drwxrwxrwx   - cloudera supergroup          0 2019-11-16 22:31 /user/hive/warehouse/basan.db/orders
[cloudera@quickstart Desktop]$

As we see the output we dont see the entry for the table products as in the case of managed table.


We need to specify explicitly how to seperate the columns.


hive> drop table products;
OK
Time taken: 0.133 seconds
hive> dfs -ls /data/exttable
    > ;
Found 1 items
-rw-r--r--   1 cloudera supergroup        120 2019-11-16 23:08 /data/exttable/products.csv
hive>

Though we have dropped the table , contents are not dropped. Lets correct the problem to avoid NULL

create external table if not exists products (
id string,
title string,
cost float
)
row format delimited fields terminated by ','
stored as textfile
location '/data/exttable/';



hive> create external table if not exists products (
    > id string,
    > title string,
    > cost float
    > )
    > row format delimited fields terminated by ','
    > stored as textfile
    > location '/data/exttable/';
OK
Time taken: 0.125 seconds
hive> select * from products;
OK
iphone7 iPhone 7 950.0
camera_canon Canon 570x 1000.0
washingmachine_samsung Samsung Swift 400.0
tv_vu Vu 56 Inch 600.0
Time taken: 0.084 seconds, Fetched: 4 row(s)
hive>

hive> describe formatted products;
OK
# col_name            data_type            comment           

id                  string                                 
title                string                                 
cost                float                                   

# Detailed Table Information
Database:            basan               
Owner:              cloudera           
CreateTime:          Sat Nov 16 23:24:47 PST 2019
LastAccessTime:      UNKNOWN             
Protect Mode:        None               
Retention:          0                   
Location:            hdfs://quickstart.cloudera:8020/data/exttable
Table Type:          EXTERNAL_TABLE     
Table Parameters:
COLUMN_STATS_ACCURATE false             
EXTERNAL            TRUE               
numFiles            1                 
numRows              -1                 
rawDataSize          -1                 
totalSize            120               
transient_lastDdlTime 1573975487         

# Storage Information
SerDe Library:      org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:        org.apache.hadoop.mapred.TextInputFormat
OutputFormat:        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:          No                 
Num Buckets:        -1                 
Bucket Columns:      []                 
Sort Columns:        []                 
Storage Desc Params:
field.delim          ,                 
serialization.format ,                 
Time taken: 0.137 seconds, Fetched: 35 row(s)
hive>

Observer
Table Type:          EXTERNAL_TABLE
Location:            hdfs://quickstart.cloudera:8020/data/exttable

No comments:

Post a Comment