Apache Spark SQL / Hive: Create External Table based on File in HDFS
Mike Houngbadji

Publish Date: Jul 13 '23

To create an external table based on a file located in HDFS, we'll proceed as follows:

  • Upload the file/folder to HDFS:
hadoop fs -put /local/source/location /hdfs/destination/location
  • Create the table using the SQL below (a Hive equivalent is sketched after this list):
CREATE TABLE sample_table(
        key STRING,
        data STRING)
USING CSV  -- Match this to the format of your source files
OPTIONS ('delimiter'=',',  -- Only needed for delimited files
        'path'='hdfs:///hdfs/destination/location')  -- Pointing the table at a path makes it external (unmanaged)
  • We can now query our table:
SELECT *
FROM sample_table
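Since the title also mentions Hive: if you create the table from the Hive side instead of Spark SQL, the EXTERNAL keyword is spelled out explicitly. Here's a rough equivalent, assuming the same comma-delimited layout and that the HDFS path is a directory holding the CSV files:

CREATE EXTERNAL TABLE sample_table(
        key STRING,
        data STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///hdfs/destination/location'

To double-check that the table really is external and points at the right location, DESCRIBE FORMATTED works in both Spark SQL and Hive:

DESCRIBE FORMATTED sample_table

The output should include the table type (EXTERNAL) and the storage location, among other details.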

References:
Spark SQL Documentation - Create Table

PS:
I wrote this partly so I can retrieve the solution faster myself.
