
Saturday, November 16, 2019

Hive Internals

Transactional (OLTP)
    Heavy inserts and updates.
    Databases like Oracle and MySQL, which provide ACID properties.



Analytical (OLAP)
    Mostly reads of historical data; inserts/updates are relatively light.
    A data warehouse is the classic example.
    Teradata, Hive


Hive is an open-source data warehouse.

Hive contents can be viewed as tables. The data itself is stored in the HDFS file system. We define the schema separately; at runtime the schema is imposed on the data (schema-on-read).
This metadata is stored in a database called the metastore.
By default Hive uses Derby as the metastore database.

The Cloudera QuickStart VM uses MySQL as the metastore database.

Opening the Hive shell
[cloudera@quickstart Downloads]$ hive

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive>

hive> show databases;
OK
default

---------------- create database -----------

hive> create database basan;
OK
Time taken: 3.037 seconds
hive> show databases;
OK
basan
default
Time taken: 0.102 seconds, Fetched: 2 row(s)
hive> use basan;
OK
Time taken: 0.079 seconds
hive>

---------------- create table -----------

hive> create table customers (
    > id bigint,
    > name string,
    > address string
    > );
OK
Time taken: 0.398 seconds
hive> show tables;
OK
customers
Time taken: 0.047 seconds, Fetched: 1 row(s)

---------------- describe table -----------


hive> describe customers;
OK
id                  bigint                               
name                string                               
address              string                               
Time taken: 0.214 seconds, Fetched: 3 row(s)


---------------- describe formatted table -----------

It gives more details: the kind of table, where it is stored, when it was created, the owner of the table, compression, output format, input format, and the buckets defined.

hive> describe formatted customers;
OK
# col_name            data_type            comment           

id                  bigint                               
name                string                               
address              string                               

# Detailed Table Information
Database:            basan               
Owner:              cloudera           
CreateTime:          Sat Nov 16 21:09:04 PST 2019
LastAccessTime:      UNKNOWN             
Protect Mode:        None               
Retention:          0                   
Location:            hdfs://quickstart.cloudera:8020/user/hive/warehouse/basan.db/customers
Table Type:          MANAGED_TABLE       
Table Parameters:
transient_lastDdlTime 1573967344       

# Storage Information
SerDe Library:      org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:        org.apache.hadoop.mapred.TextInputFormat
OutputFormat:        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:          No                 
Num Buckets:        -1                 
Bucket Columns:      []                 
Sort Columns:        []                 
Storage Desc Params:
serialization.format 1                 
Time taken: 0.117 seconds, Fetched: 28 row(s)
hive>


---------------- inserting into a table -----------
There are three common ways to get data into a Hive table:
1. Create a table and insert rows with INSERT. Not ideal, since each insert runs a MapReduce job internally.
2. Start from an existing file, define a schema over it, and view it as a table.
3. Insert data from one table into another.
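For example, methods 2 and 3 could be sketched as below. The file location and table names here are illustrative, not from the session that follows:

```sql
-- Method 2 (sketch): define a schema over an existing delimited file.
-- The LOCATION path is a hypothetical example.
CREATE EXTERNAL TABLE customers_ext (
  id BIGINT,
  name STRING,
  address STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/data/customers/';

-- Method 3 (sketch): copy data from one table into another.
INSERT INTO TABLE customers
SELECT id, name, address FROM customers_ext;
```

Method 2 avoids MapReduce entirely at load time, since the file is simply read in place.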


insert into customers values(1111 , "John","WA");


hive> insert into customers values(1111 , "John","WA");
Query ID = cloudera_20191116212020_30fd76f7-f1b9-425a-8129-627e330eabc5
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1572771724749_0015, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1572771724749_0015/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1572771724749_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-16 21:20:18,653 Stage-1 map = 0%,  reduce = 0%
2019-11-16 21:20:27,469 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.75 sec
MapReduce Total cumulative CPU time: 2 seconds 750 msec
Ended Job = job_1572771724749_0015
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/basan.db/customers/.hive-staging_hive_2019-11-16_21-20-05_225_3626440419486037600-1/-ext-10000
Loading data to table basan.customers
Table basan.customers stats: [numFiles=1, numRows=1, totalSize=13, rawDataSize=12]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.75 sec   HDFS Read: 4053 HDFS Write: 84 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 750 msec
OK
Time taken: 24.049 seconds

Observe the time taken: 24.049 seconds for a single-row insert, because it runs as a MapReduce job.




hive>
    >
    > insert into customers values (
    > 2222, "Emily", "WA"
    > ), (
    > 3333, "Rick", "WA"
    > ), (
    > 4444, "Jane", "CA"
    > ), (
    > 5555, "Amit", "NJ"
    > ), (
    > 6666, "Nina", "NY"
    > );
Query ID = cloudera_20191116212121_b9112812-fade-45f9-83e9-bc6661083854
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1572771724749_0016, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1572771724749_0016/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1572771724749_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-16 21:21:51,838 Stage-1 map = 0%,  reduce = 0%
2019-11-16 21:22:01,570 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.51 sec
MapReduce Total cumulative CPU time: 2 seconds 510 msec
Ended Job = job_1572771724749_0016
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/basan.db/customers/.hive-staging_hive_2019-11-16_21-21-43_008_458275992706050129-1/-ext-10000
Loading data to table basan.customers
Table basan.customers stats: [numFiles=2, numRows=6, totalSize=79, rawDataSize=73]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.51 sec   HDFS Read: 4202 HDFS Write: 137 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 510 msec
OK
Time taken: 20.162 seconds
hive>


---------------- Location of table and view contents-----------


By default, inserted contents are stored under /user/hive/warehouse:
ex :

/user/hive/warehouse/<database>.db/<table>/
/user/hive/warehouse/basan.db/customers/


hadoop fs -ls /user/hive/warehouse/basan.db/customers/

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/basan.db/customers/

Found 2 items
-rwxrwxrwx   1 cloudera supergroup         13 2019-11-16 21:20 /user/hive/warehouse/basan.db/customers/000000_0
-rwxrwxrwx   1 cloudera supergroup         66 2019-11-16 21:22 /user/hive/warehouse/basan.db/customers/000000_0_copy_1


hadoop fs -cat /user/hive/warehouse/basan.db/customers/*

[cloudera@quickstart ~]$ hadoop fs -cat /user/hive/warehouse/basan.db/customers/*
1111 John WA
2222 Emily WA
3333 Rick WA
4444 Jane CA
5555 Amit NJ
6666 Nina NY
[cloudera@quickstart ~]$


hive> select * from customers;
OK
1111 John WA
2222 Emily WA
3333 Rick WA
4444 Jane CA
5555 Amit NJ
6666 Nina NY
Time taken: 0.077 seconds, Fetched: 6 row(s)

Navigate to
http://localhost:50070/dfshealth.html#tab-overview
Browse the file system
http://localhost:50070/explorer.html#/


---------------- We can see the contents of this metadata in MySQL -----------


mysql -u root -p
cloudera


[cloudera@quickstart ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 153
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| banking            |
| cm                 |
| firehose           |
| hue                |
| metastore          |
| mysql              |
| nav                |
| navms              |
| oozie              |
| retail_db          |
| rman               |
| sentry             |
+--------------------+
13 rows in set (0.01 sec)

mysql> use metastore
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql>


mysql> show tables;
+---------------------------+
| Tables_in_metastore       |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| MASTER_KEYS               |
| METASTORE_DB_PROPERTIES   |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
+---------------------------+
54 rows in set (0.00 sec)

mysql>
mysql> select * from TBLS;
+--------+-------------+-------+------------------+----------+-----------+-------+-----------+---------------+--------------------+--------------------+----------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER    | RETENTION | SD_ID | TBL_NAME  | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
+--------+-------------+-------+------------------+----------+-----------+-------+-----------+---------------+--------------------+--------------------+----------------+
|      1 |  1573967344 |     6 |                0 | cloudera |         0 |     1 | customers | MANAGED_TABLE | NULL               | NULL               |           NULL |
+--------+-------------+-------+------------------+----------+-----------+-------+-----------+---------------+--------------------+--------------------+----------------+
1 row in set (0.00 sec)

TBLS contains the table we created previously.


---------------hive queries -------------

hive> select * from customers;
OK
1111 John WA
2222 Emily WA
3333 Rick WA
4444 Jane CA
5555 Amit NJ
6666 Nina NY
Time taken: 0.272 seconds, Fetched: 6 row(s)
hive> select * from customers where address = "WA";
OK
1111 John WA
2222 Emily WA
3333 Rick WA
Time taken: 0.235 seconds, Fetched: 3 row(s)
hive> select name, address from customers where address = "CA";
OK
Jane CA
Time taken: 0.153 seconds, Fetched: 1 row(s)
hive> select name, address from customers where address = "WA" and id > 2222;
OK
Rick WA
Time taken: 0.188 seconds, Fetched: 1 row(s)
hive> select DISTINCT address from customers;
Query ID = cloudera_20191116214949_011234a5-6b24-40f5-94b4-289e6d05c23a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1572771724749_0017, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1572771724749_0017/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1572771724749_0017
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-11-16 21:50:02,755 Stage-1 map = 0%,  reduce = 0%
2019-11-16 21:50:11,354 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.65 sec
2019-11-16 21:50:19,906 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.84 sec
MapReduce Total cumulative CPU time: 3 seconds 840 msec
Ended Job = job_1572771724749_0017
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.84 sec   HDFS Read: 7273 HDFS Write: 12 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 840 msec
OK
CA
NJ
NY
WA
Time taken: 26.758 seconds, Fetched: 4 row(s)
hive>

hive> select name, address from customers order by address;
Query ID = cloudera_20191116215151_6a479974-9d86-4a12-85b7-f3d7b95b00ef
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1572771724749_0018, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1572771724749_0018/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1572771724749_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-11-16 21:51:26,789 Stage-1 map = 0%,  reduce = 0%
2019-11-16 21:51:36,444 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.65 sec
2019-11-16 21:51:46,136 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.57 sec
MapReduce Total cumulative CPU time: 4 seconds 570 msec
Ended Job = job_1572771724749_0018
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.57 sec   HDFS Read: 6965 HDFS Write: 49 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 570 msec
OK
Jane CA
Amit NJ
Nina NY
Rick WA
Emily WA
John WA
Time taken: 31.2 seconds, Fetched: 6 row(s)
hive>


hive> select count(*) from customers;
Query ID = cloudera_20191116215252_f2fbe779-64c7-415a-b505-41c3d9c073b7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1572771724749_0019, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1572771724749_0019/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1572771724749_0019
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-11-16 21:52:34,030 Stage-1 map = 0%,  reduce = 0%
2019-11-16 21:52:41,690 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.6 sec
2019-11-16 21:52:51,279 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.6 sec
MapReduce Total cumulative CPU time: 3 seconds 600 msec
Ended Job = job_1572771724749_0019
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.6 sec   HDFS Read: 7620 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 600 msec
OK
6
Time taken: 27.953 seconds, Fetched: 1 row(s)
hive>



hive> select address, count(*) from customers group by address;
Query ID = cloudera_20191116215353_fb7a5a60-479f-4e3d-a76f-53ae9b66749b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1572771724749_0020, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1572771724749_0020/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1572771724749_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-11-16 21:53:43,163 Stage-1 map = 0%,  reduce = 0%
2019-11-16 21:53:50,537 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
2019-11-16 21:53:58,026 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.58 sec
MapReduce Total cumulative CPU time: 3 seconds 580 msec
Ended Job = job_1572771724749_0020
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.58 sec   HDFS Read: 7944 HDFS Write: 20 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 580 msec
OK
CA 1
NJ 1
NY 1
WA 3
Time taken: 26.185 seconds, Fetched: 4 row(s)
hive> select address, count(*) as customer_count from customers group by address;
Query ID = cloudera_20191116215454_83e98133-558e-4f60-8fcf-bbbb00fd17eb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1572771724749_0021, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1572771724749_0021/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1572771724749_0021
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-11-16 21:54:16,056 Stage-1 map = 0%,  reduce = 0%
2019-11-16 21:54:24,943 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.62 sec
2019-11-16 21:54:33,428 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.55 sec
MapReduce Total cumulative CPU time: 3 seconds 550 msec
Ended Job = job_1572771724749_0021
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.55 sec   HDFS Read: 7944 HDFS Write: 20 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 550 msec
OK
CA 1
NJ 1
NY 1
WA 3
Time taken: 28.699 seconds, Fetched: 4 row(s)
hive>

hive> select * from customers limit 1;
OK
1111 John WA
Time taken: 0.139 seconds, Fetched: 1 row(s)
hive>

Observe from the logs that some queries run full MapReduce (map and reduce), some run map-only jobs, and some run without MapReduce at all.
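Which queries skip MapReduce is governed by Hive's fetch task conversion setting. A sketch of how to inspect it (the property name is standard Hive; the default value varies by version):

```sql
-- Show the current fetch conversion mode (none / minimal / more).
SET hive.fetch.task.conversion;

-- With 'more', simple SELECT / filter / LIMIT queries are answered by a
-- direct HDFS fetch, with no MapReduce job at all.
SELECT * FROM customers LIMIT 1;

-- Aggregations such as COUNT(*), GROUP BY, DISTINCT and ORDER BY need a
-- reduce phase, so they still launch MapReduce jobs.
SELECT count(*) FROM customers;
```

This matches the transcripts above: the plain SELECTs return in well under a second, while the aggregate queries take 20+ seconds.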

----------------------------- Beeline ---------------------

To enter Beeline:
beeline -u jdbc:hive2://

!q quits Beeline.

beeline -u jdbc:hive2:// -e "select * from basan.customers"



[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2://
scan complete in 2ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.13.0)
Driver: Hive JDBC (version 1.1.0-cdh5.13.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.13.0 by Apache Hive



0: jdbc:hive2://> select * from basan.customers;
OK
+---------------+-----------------+--------------------+--+
| customers.id  | customers.name  | customers.address  |
+---------------+-----------------+--------------------+--+
| 1111          | John            | WA                 |
| 2222          | Emily           | WA                 |
| 3333          | Rick            | WA                 |
| 4444          | Jane            | CA                 |
| 5555          | Amit            | NJ                 |
| 6666          | Nina            | NY                 |
+---------------+-----------------+--------------------+--+
6 rows selected (0.996 seconds)
0: jdbc:hive2://>


Connect to Beeline and run a query directly with -e:
[cloudera@quickstart Downloads]$ beeline -u jdbc:hive2:// -e "select * from basan.customers"

scan complete in 3ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.13.0)
Driver: Hive JDBC (version 1.1.0-cdh5.13.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
OK
+---------------+-----------------+--------------------+--+
| customers.id  | customers.name  | customers.address  |
+---------------+-----------------+--------------------+--+
| 1111          | John            | WA                 |
| 2222          | Emily           | WA                 |
| 3333          | Rick            | WA                 |
| 4444          | Jane            | CA                 |
| 5555          | Amit            | NJ                 |
| 6666          | Nina            | NY                 |
+---------------+-----------------+--------------------+--+
6 rows selected (1.809 seconds)
Beeline version 1.1.0-cdh5.13.0 by Apache Hive
Closing: 0: jdbc:hive2://
[cloudera@quickstart Downloads]$







------------- command to check whether all the services are running ----
sudo service --status-all


httpd is stopped
supervisor (pid  7971) is running...
Impala Catalog Server is running                           [  OK  ]
Impala Server is running                                   [  OK  ]
Impala State Store Server is running                       [  OK  ]
ip6tables: Firewall is not running.
iptables: Firewall is not running.
irqbalance (pid  4542) is running...
Kdump is operational
lvmetad is stopped

------------- beeline with a set of instructions ----

cd Desktop
gedit beelineScriptExec.hql

Below is the content of the file.
    show databases;
    use basan;
    show tables;
    select * from customers;



beeline -u jdbc:hive2:// -f /home/cloudera/Desktop/beelineScriptExec.hql

[cloudera@quickstart Desktop]$ beeline -u jdbc:hive2:// -f /home/cloudera/Desktop/beelineScriptExec.hql
scan complete in 3ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.13.0)
Driver: Hive JDBC (version 1.1.0-cdh5.13.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://> show databases;
OK
+----------------+--+
| database_name  |
+----------------+--+
| basan          |
| default        |
+----------------+--+
2 rows selected (1.241 seconds)
0: jdbc:hive2://> use basan;
OK
No rows affected (0.081 seconds)
0: jdbc:hive2://> show tables;
OK
+------------+--+
|  tab_name  |
+------------+--+
| customers  |
+------------+--+
1 row selected (0.096 seconds)
0: jdbc:hive2://> select * from customers;
OK
+---------------+-----------------+--------------------+--+
| customers.id  | customers.name  | customers.address  |
+---------------+-----------------+--------------------+--+
| 1111          | John            | WA                 |
| 2222          | Emily           | WA                 |
| 3333          | Rick            | WA                 |
| 4444          | Jane            | CA                 |
| 5555          | Amit            | NJ                 |
| 6666          | Nina            | NY                 |
+---------------+-----------------+--------------------+--+
6 rows selected (0.832 seconds)
0: jdbc:hive2://>
0: jdbc:hive2://>
Closing: 0: jdbc:hive2://
[cloudera@quickstart Desktop]$


If we are already in Beeline and want to execute an HQL file, use the commands below:
//for connecting to beeline
beeline -u jdbc:hive2://
//for executing an hql file from the beeline cli
source /home/cloudera/Desktop/beelineScriptExec.hql;


[cloudera@quickstart Desktop]$ beeline -u jdbc:hive2://
scan complete in 4ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.13.0)
Driver: Hive JDBC (version 1.1.0-cdh5.13.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.13.0 by Apache Hive
0: jdbc:hive2://> source /home/cloudera/Desktop/beelineScriptExec.hql
. . . . . . . . > ;
OK
+----------------+--+
| database_name  |
+----------------+--+
| basan          |
| default        |
+----------------+--+
2 rows selected (1.178 seconds)
OK
No rows affected (0.062 seconds)
OK
+------------+--+
|  tab_name  |
+------------+--+
| customers  |
+------------+--+
1 row selected (0.141 seconds)
OK
+---------------+-----------------+--------------------+--+
| customers.id  | customers.name  | customers.address  |
+---------------+-----------------+--------------------+--+
| 1111          | John            | WA                 |
| 2222          | Emily           | WA                 |
| 3333          | Rick            | WA                 |
| 4444          | Jane            | CA                 |
| 5555          | Amit            | NJ                 |
| 6666          | Nina            | NY                 |
+---------------+-----------------+--------------------+--+
6 rows selected (0.732 seconds)
0: jdbc:hive2://>


----- hue ----

Hue is another way of accessing Hive tables, through the browser:

http://quickstart.cloudera:8888/hue/editor/?type=hive
cloudera/cloudera

----- Accessing HDFS from the Hive terminal -----------------------------
We can use the dfs -ls command to access the file system from the Hive terminal.

hive> dfs -ls /
    > ;
Found 10 items
drwxrwxrwx   - hdfs     supergroup          0 2017-10-23 09:15 /benchmarks
drwxr-xr-x   - cloudera supergroup          0 2019-11-03 03:58 /data
drwxr-xr-x   - hbase    supergroup          0 2019-11-03 01:02 /hbase
drwxr-xr-x   - cloudera supergroup          0 2019-11-15 23:18 /mapreduce_input
drwxr-xr-x   - cloudera supergroup          0 2019-11-15 23:45 /mapreduce_output
drwxr-xr-x   - solr     solr                0 2017-10-23 09:18 /solr
drwxr-xr-x   - cloudera supergroup          0 2019-11-09 02:01 /spark_data
drwxrwxrwt   - hdfs     supergroup          0 2019-11-01 08:59 /tmp
drwxr-xr-x   - hdfs     supergroup          0 2019-11-03 01:14 /user
drwxr-xr-x   - hdfs     supergroup          0 2017-10-23 09:17 /var
hive>


----- Types of tables -----------------------------
Managed Table :
  The data is owned by Hive. If you drop the table, both the metadata and the actual content will be dropped.
External Table :
  Hive does not own the data. If you drop the table, only the metadata is dropped; the content remains untouched.
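A minimal sketch of the two DDL forms (the LOCATION path is illustrative, not from the session above):

```sql
-- Managed table: data lives under the warehouse directory
-- (/user/hive/warehouse/...) and is deleted on DROP TABLE.
CREATE TABLE managed_customers (id BIGINT, name STRING, address STRING);

-- External table: Hive only tracks metadata; DROP TABLE leaves the
-- files at the given LOCATION untouched.
CREATE EXTERNAL TABLE external_customers (id BIGINT, name STRING, address STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/customers/';
```

External tables are the safer choice when the files are shared with other tools, since dropping the Hive table cannot destroy the data.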
