Install SQOOP on windows — step by step guide (with bug fixes)

A guide to installing Apache SQOOP on a windows device. This guide includes all the steps to set up sqoop as well as fixes for any bugs you may encounter.

This article is part of the data lake set up tutorial. Sqoop is a tool developed by Apache which allows you to transfer data between an RDBMS and HADOOP. In this tutorial we will see how to set up SQOOP 1.4.7 on Windows 10.

If you do not know what HADOOP is or how it is set up then you can begin here.

Download the SQOOP package

You need to download the following package from it’s official download page; “sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz”. Download it from the following link.

Unpacking

Now unpack this ‘.tar’ file in your destination directory. In my case the destination directory is ‘D:/Data_lake/’. I simply copy the .tar file into this directory and run the following command in the command prompt.

tar -xvzf 'sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz'
Code language: JavaScript (javascript)

You can use any other software to unpack the files.

It might take a few seconds to unpack. 

Setting Environment Variables

We need to add this to the path in System Variables. Search for environment variables in your windows search bar and click on ‘edit the system environment variables’. From their click on “Environment Variables…”. Next click on ‘New…’ under the ‘user variables’ tab.

Name this new variable: “SQOOP_HOME”

The path will be the path to where you unpacked the Sqoop tar files. See below screenshot for my path.

SQOOP Environment variable
SQOOP Environment variable

Configure SQOOP

Ice cream scoop
Photo by Sharon McCutcheon on Unsplash

In order to configure SQOOP, you will need Git Bash. Open the Git Bash terminal and execute the following commands.

**Please note you will need to install Git for this**

cd $SQOOP_HOME\\bin<br>./configure-sqoop
Code language: HTML, XML (xml)

Some warnings may appear in the terminal window but ignore those for now.

Verify Installation

Open command prompt and run the following command.

%SQOOP_HOME%\bin\sqoop.cmd version
Code language: CSS (css)
SQOOP verification

This command should execute without any error (like shown in above screenshot). 

SQOOP is now ready to work.

Connecting SQOOP with SQLServer database

Sqoop uses a jdbc connector to make a connection between a RDBMS and Hadoop. 

There are two ways to connect to SQL Server:

  1. Using Username and Password. For this you need to create a User on your SQL Server. Your connection string would be something like.
"jdbc:sqlserver://localhost:1433;user=<username>;password=<password>"
Code language: HTML, XML (xml)

and your entire sqoop command would be:

%SQOOP_HOME%\bin\sqoop.cmd list-databases --connect "jdbc:sqlserver://localhost:1433;user=<username>;password=<password>"
Code language: HTML, XML (xml)

The above command would list all the databases present in your server.

2. Using Windows Authentication. This method only works if your SQL Server resides on the same machine on which you are running SQOOP and you have the right permissions. 

Your connection string would be:

"jdbc:sqlserver://localhost:1433;integratedSecurity=true"
Code language: JSON / JSON with Comments (json)

And similarly the SQOOP connect statement would be.

%SQOOP_HOME%\bin\sqoop.cmd list-databases --connect "jdbc:sqlserver://localhost:1433;integratedSecurity=true"
Code language: PHP (php)

Once connected to the RDBMS via the connection string, there are a lot of functions in sqoop which you can explore. We see all those in a separate blog post. Stay Tuned!!

Possible errors

  1. Failed to login to user.

In this case double check your username and password. If those are correct then run your command prompt as administrator (permission issues).

2. Missing sqljdbc_auth.dll file.

FIX: Go to the following page and download the sqlserver jdbc connector. Unzip the file and go to the directory.

/sqljdbc_6.0/enu/auth/x64(or x86 depending on your sytem)/sqljdbc_auth.dll

Copy this DLL file to the following path.

%HADOOP_HOME%\bin\

3. Sqoop Import Failed “Java.Lang.Classnotfoundexception: Org.Apache.Commons.Lang.Stringutils”

FIX: Download the commons-lang-2.6.jar file from the following page. Paste this file to the following path.

%SQOOP_HOME%\lib

Total
0
Shares
Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Prev
Install Hadoop on Windows — Step by step guide
Installing hadoop on windows 10

Install Hadoop on Windows — Step by step guide

A guide to setting up APACHE HADOOP on Windows 10 and running hdfs clusters as

Next
Feature Selection Algorithms for Machine Learning
Feature Selection Algorithms for Machine Learning

Feature Selection Algorithms for Machine Learning

Lets talk two powerful feature selection algorithms and how they can help you

You May Also Like