Contents
- Hadoop cluster start
- Three web UI ports
- Architecture of HDFS
  - HDFS Client
  - NameNode
  - DataNode
  - Secondary NameNode
- HDFS command-line usage
- HDFS Java API operations
Hadoop cluster start
```bash
cd /export/servers/hadoop-2.7.5/
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
```
Three web UI ports
- http://node01:50070/explorer.html#/ (HDFS web UI)
- http://node01:8088/cluster (YARN cluster UI)
- http://node01:19888/jobhistory (history of completed jobs)
Architecture of HDFS
HDFS is a master/slave (Master/Slave) architecture.
HDFS consists of four parts: the HDFS Client, the NameNode, the DataNode, and the Secondary NameNode.
HDFS Client: the client.
- File segmentation: when a file is uploaded to HDFS, the client splits it into blocks and then stores them.
- Interacts with the NameNode to obtain file location information.
- Interacts with DataNodes to read or write data.
- Provides commands to manage and access HDFS, such as starting or shutting down HDFS.
NameNode: the master; it supervises and manages the cluster.
- Manages the HDFS namespace
- Manages data block (Block) mapping information
- Configures the replication policy
- Handles client read and write requests
DataNode: the slave. The NameNode issues commands, and the DataNode performs the actual operations.
- Stores the actual data blocks
- Performs read/write operations on data blocks
Secondary NameNode: not a hot standby for the NameNode. When the NameNode dies, it cannot immediately take its place and serve requests.
- Assists the NameNode by sharing part of its workload.
- Periodically merges the fsimage and edits files and pushes the result to the NameNode.
- In an emergency, it can help recover the NameNode.
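A minimal sketch of this division of labor: the client asks the NameNode for metadata only, and each returned BlockLocation names the DataNodes that actually hold a block. The cluster URI and the path /a.txt are assumptions matching the examples later in these notes.

```java
import java.net.URI;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockMapDemo {
    public static void main(String[] args) throws Exception {
        // Assumed cluster address; see the Java API section below
        FileSystem fs = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/a.txt"));
        // The NameNode answers this from its block mapping information,
        // without reading any file data
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " DataNodes=" + Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}
```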
HDFS command-line usage
- ls
  Format: `hdfs dfs -ls URI`
  Function: lists files, similar to the Linux `ls` command.
  Example: `hdfs dfs -ls /`
- lsr
  Format: `hdfs dfs -lsr URI`
  Function: runs `ls` recursively over the entire directory tree, similar to UNIX `ls -R`.
  Example: `hdfs dfs -lsr /`
- mkdir
  Format: `hdfs dfs [-p] -mkdir <paths>`
  Function: creates a directory at each URI in `<paths>`. The `-p` option creates parent directories recursively.
- put
  Format: `hdfs dfs -put <localsrc> ... <dst>`
  Function: copies a single source file `src`, or multiple source files `srcs`, from the local file system to the target path `<dst>` on the target file system. Can also read from standard input and write to the target file system.
  Example: `hdfs dfs -put /root/a.txt /dir1`
- moveFromLocal
  Format: `hdfs dfs -moveFromLocal <localsrc> <dst>`
  Function: similar to `put`, but the local source file `localsrc` is deleted after it is copied.
  Example: `hdfs dfs -moveFromLocal /root/install.log /`
- get
  Format: `hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>`
  Function: copies a file to the local file system. Files that fail CRC verification can be copied with the `-ignorecrc` option; files together with their CRC checksums can be copied with the `-crc` option.
  Example: `hdfs dfs -get /install.log /export/servers`
- mv
  Format: `hdfs dfs -mv URI <dest>`
  Function: moves a file on HDFS from the source path to the target path (the source is removed after the move). This command cannot move files across file systems.
  Example: `hdfs dfs -mv /dir1/a.txt /dir2`
- rm
  Format: `hdfs dfs -rm [-r] [-skipTrash] URI [URI ...]`
  Function: deletes the files given as arguments (multiple arguments allowed); without `-r` it deletes only files, while `-r` also removes directories and their contents recursively. If `-skipTrash` is specified, the files are deleted directly even when the trash is enabled; otherwise, when the trash is enabled, this command moves the files to the trash first.
  Example: `hdfs dfs -rm -r /dir1`
- cp
  Format: `hdfs dfs -cp URI [URI ...] <dest>`
  Function: copies files to the target path. If `<dest>` is a directory, multiple files can be copied into it. The `-f` option overwrites the target if it already exists; the `-p` option preserves file attributes (timestamps, ownership, permissions, ACLs, XAttrs).
  Example: `hdfs dfs -cp /dir1/a.txt /dir2/b.txt`
- cat
  Format: `hdfs dfs -cat URI [URI ...]`
  Function: writes the contents of the given files to stdout.
  Example: `hdfs dfs -cat /install.log`
- chmod
  Format: `hdfs dfs -chmod [-R] URI [URI ...]`
  Function: changes file permissions. With the `-R` option, the change is applied recursively to the whole directory. The user running this command must be the owner of the file or the superuser.
  Example: `hdfs dfs -chmod -R 777 /install.log`
- chown
  Format: `hdfs dfs -chown [-R] URI [URI ...]`
  Function: changes the owner and group of a file. With the `-R` option, the change is applied recursively to the whole directory. The user running this command must be the owner of the file or the superuser.
  Example: `hdfs dfs -chown -R hadoop:hadoop /install.log`
- appendToFile
  Format: `hdfs dfs -appendToFile <localsrc> ... <dst>`
  Function: appends one or more local files to the specified file on HDFS. Can also read input from stdin.
  Example: `hdfs dfs -appendToFile a.xml b.xml /big.xml`
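These shell commands can also be driven from Java through FsShell, the class behind `hdfs dfs`. A minimal sketch, assuming the same node01 cluster as above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class ShellFromJava {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://node01:8020"); // assumed cluster address
        // Equivalent to: hdfs dfs -ls /
        int exitCode = ToolRunner.run(conf, new FsShell(), new String[]{"-ls", "/"});
        System.exit(exitCode);
    }
}
```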
HDFS Java API operations
HDFS URL access: input and output streams
```java
// Imports assumed for the examples in this section:
// import java.io.*;
// import java.net.URL;
// import java.net.URI;
// import java.net.URISyntaxException;
// import org.apache.commons.io.IOUtils;
// import org.apache.hadoop.conf.Configuration;
// import org.apache.hadoop.fs.*;
// import org.junit.Test;

@Test
public void urlHdfs() throws IOException {
    // 1. Register the hdfs:// URL scheme.
    //    A URLStreamHandlerFactory instance is used to construct stream
    //    protocol handlers from protocol names; it can be set only once per JVM.
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    // 2. Open an input stream on the HDFS file
    InputStream inputStream = new URL("hdfs://node01:8020/a.txt").openStream();
    // 3. Open an output stream on the local file
    FileOutputStream outputStream = new FileOutputStream(new File("D:\\hello.txt"));
    // 4. Copy the file
    IOUtils.copy(inputStream, outputStream);
    // 5. Close the streams
    IOUtils.closeQuietly(inputStream);
    IOUtils.closeQuietly(outputStream);
}
```
Four main ways to obtain an HDFS FileSystem object
```java
@Test
public void getFileSystem4() throws IOException, URISyntaxException {
    // 1: Get the specified file system; newInstance always returns a new,
    //    uncached object, unlike FileSystem.get
    FileSystem fileSystem = FileSystem.newInstance(new URI("hdfs://node01:8020"), new Configuration());
    // 2: Print it
    System.out.println(fileSystem);
}

@Test
public void getFileSystem3() throws IOException {
    // 1: Create a Configuration object. Configuration encapsulates client or
    //    server configuration; setting the default file system lets
    //    FileSystem.newInstance return the matching file system implementation.
    Configuration configuration = new Configuration();
    // 2: Set the file system type
    configuration.set("fs.defaultFS", "hdfs://node01:8020");
    // 3: Get the specified file system
    FileSystem fileSystem = FileSystem.newInstance(configuration);
    // 4: Print it
    System.out.println(fileSystem);
}

@Test
public void getFileSystem2() throws IOException, URISyntaxException {
    // 1: Get the specified file system (FileSystem.get may return a cached instance)
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // 2: Print it
    System.out.println(fileSystem);
}

@Test
public void getFileSystem1() throws IOException {
    // 1: Create a Configuration object
    Configuration configuration = new Configuration();
    // 2: Set the file system type
    configuration.set("fs.defaultFS", "hdfs://node01:8020");
    // 3: Get the specified file system
    FileSystem fileSystem = FileSystem.get(configuration);
    // 4: Print it
    System.out.println(fileSystem);
}
```
Traversing files on HDFS
```java
@Test
public void listFiles() throws URISyntaxException, IOException {
    // 1. Get a FileSystem instance
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // 2. Call listFiles to get all file information under / (true = recursive)
    RemoteIterator<LocatedFileStatus> iterator = fileSystem.listFiles(new Path("/"), true);
    // 3. Traverse the iterator
    while (iterator.hasNext()) {
        LocatedFileStatus fileStatus = iterator.next();
        System.out.println("path:" + fileStatus.getPath() + "---" + fileStatus.getPath().getName());
        BlockLocation[] blockLocations = fileStatus.getBlockLocations();
        System.out.println("number of blocks: " + blockLocations.length);
    }
    fileSystem.close();
}
```
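Note that listFiles returns files only. To see directories as well, listStatus can be used; a minimal companion sketch (same cluster URI assumed, test name is hypothetical):

```java
@Test
public void listStatusDemo() throws URISyntaxException, IOException {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // listStatus is non-recursive and returns both files and directories
    FileStatus[] statuses = fileSystem.listStatus(new Path("/"));
    for (FileStatus status : statuses) {
        String kind = status.isDirectory() ? "dir " : "file";
        System.out.println(kind + " " + status.getPath());
    }
    fileSystem.close();
}
```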
Creating folders on HDFS
```java
@Test
public void mkdirs() throws Exception {
    // 1. Get a FileSystem instance
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // 2. Create a folder (parent directories are created as needed)
    boolean mkdirs = fileSystem.mkdirs(new Path("/hello/123"));
    // Also create an empty file inside it; close the returned stream
    fileSystem.create(new Path("/hello/123/a.txt")).close();
    System.out.println(mkdirs);
    // 3. Close the file system
    fileSystem.close();
}
```
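The shell section covers rm, but there is no Java deletion example; for completeness, a minimal sketch using exists and delete (the /hello path matches the directory created above; the test name is hypothetical):

```java
@Test
public void deleteDemo() throws Exception {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    Path dir = new Path("/hello");
    if (fileSystem.exists(dir)) {
        // The second argument enables recursive deletion, like rm -r
        boolean deleted = fileSystem.delete(dir, true);
        System.out.println("deleted: " + deleted);
    }
    fileSystem.close();
}
```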
File download 1 (stream copy)
```java
@Test
public void downloadFile1() throws URISyntaxException, IOException {
    // 1. Get a FileSystem instance
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // 2. Open an input stream on the HDFS file
    FSDataInputStream inputStream = fileSystem.open(new Path("/a.txt"));
    // 3. Open an output stream on the local path
    OutputStream outputStream = new FileOutputStream("D://a.txt");
    // 4. Copy the file
    IOUtils.copy(inputStream, outputStream);
    // 5. Close the streams and the file system
    IOUtils.closeQuietly(inputStream);
    IOUtils.closeQuietly(outputStream);
    fileSystem.close();
}
```
File download 2 (copyToLocalFile)
```java
@Test
public void downloadFile2() throws URISyntaxException, IOException {
    // 1. Get a FileSystem instance
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // 2. Copy the HDFS file to the local file system
    fileSystem.copyToLocalFile(new Path("/a.txt"), new Path("D://b.txt"));
    // 3. Close the file system
    fileSystem.close();
}
```
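copyToLocalFile also has an overload that writes through the raw local file system, skipping the local .crc checksum file; this is a common workaround when the download fails on Windows hosts without the native Hadoop binaries. A sketch of that variant (test name is hypothetical):

```java
@Test
public void downloadFileNoCrc() throws URISyntaxException, IOException {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // delSrc=false keeps the HDFS copy; useRawLocalFileSystem=true skips the .crc file
    fileSystem.copyToLocalFile(false, new Path("/a.txt"), new Path("D://b.txt"), true);
    fileSystem.close();
}
```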
File upload (copyFromLocalFile)
```java
@Test
public void uploadFile() throws URISyntaxException, IOException {
    // 1. Get a FileSystem instance
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // 2. Copy the local file to HDFS
    fileSystem.copyFromLocalFile(new Path("D://b.txt"), new Path("/b.txt"));
    // 3. Close the file system
    fileSystem.close();
}
```
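copyFromLocalFile likewise has overloads controlling whether the local source is deleted and whether an existing target is overwritten. A sketch of the four-argument form (test name is hypothetical):

```java
@Test
public void uploadFileOverwrite() throws URISyntaxException, IOException {
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
    // delSrc=false keeps the local file; overwrite=true replaces /b.txt if it exists
    fileSystem.copyFromLocalFile(false, true, new Path("D://b.txt"), new Path("/b.txt"));
    fileSystem.close();
}
```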
Small file merge upload
Every file and block consumes NameNode memory, so merging many small files into one large file before upload reduces metadata pressure.
```java
@Test
public void mergeFile() throws URISyntaxException, IOException, InterruptedException {
    // 1. Get the HDFS (distributed) file system, acting as user "root"
    FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration(), "root");
    // 2. Create the output stream for the merged HDFS file
    FSDataOutputStream outputStream = fileSystem.create(new Path("/big.txt"));
    // 3. Get the local file system
    LocalFileSystem localFileSystem = FileSystem.getLocal(new Configuration());
    // 4. List the small files in the local directory
    FileStatus[] fileStatus = localFileSystem.listStatus(new Path("D://input"));
    // 5. Open an input stream for each small file
    for (FileStatus file : fileStatus) {
        FSDataInputStream inputStream = localFileSystem.open(file.getPath());
        // 6. Append the small file's data to the merged file
        IOUtils.copy(inputStream, outputStream);
        IOUtils.closeQuietly(inputStream);
    }
    IOUtils.closeQuietly(outputStream);
    localFileSystem.close();
    fileSystem.close();
}
```