Hadoop Core - HDFS Basics

Starting the Hadoop cluster

cd /export/servers/hadoop-2.7.5/
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver

Three web interfaces (ports)

http://node01:50070/explorer.html#/   NameNode web UI: browse HDFS
http://node01:8088/cluster   ResourceManager web UI: view the YARN cluster
http://node01:19888/jobhistory   JobHistory server web UI: view completed (historical) jobs

Architecture of HDFS

HDFS is a master/slave (Master/Slave) architecture.

HDFS consists of four parts: the HDFS Client, the NameNode, the DataNode, and the Secondary NameNode.

HDFS Client: the client.

  • File segmentation. When a file is uploaded to HDFS, the Client splits it into blocks and then stores them.
  • Interacts with the NameNode to obtain file location information.
  • Interacts with DataNodes to read or write data.
  • Provides commands to manage and access HDFS, such as starting or shutting down HDFS.

NameNode: the master; it acts as supervisor and manager.

  • Manages the HDFS namespace
  • Manages data block (Block) mapping information
  • Configures the replication (replica) policy
  • Handles client read and write requests.

DataNode: the slave. The NameNode issues commands, and the DataNode performs the actual operations.

  • Stores the actual data blocks.
  • Performs read/write operations on data blocks.

Secondary NameNode: it is not a hot standby for the NameNode. If the NameNode fails, it cannot immediately take over and provide service.

  • Assists the NameNode by sharing part of its workload.
  • Periodically merges the fsimage and edit log (fsedits) and pushes the result to the NameNode.
  • In an emergency, it can help recover the NameNode.
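
To make this division of labor concrete, here is a minimal sketch (assuming the NameNode at hdfs://node01:8020 used throughout this post): it asks the NameNode for metadata (default block size and replication factor) and lists the DataNodes that actually store the blocks.

    @Test
    public void clusterOverview() throws IOException, URISyntaxException {
        // Minimal sketch: the NameNode answers metadata questions, the DataNodes store the blocks.
        // Assumes the NameNode address hdfs://node01:8020 used in the later examples.
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());

        // Metadata maintained by the NameNode
        System.out.println("default block size: " + fileSystem.getDefaultBlockSize(new Path("/")));
        System.out.println("default replication: " + fileSystem.getDefaultReplication(new Path("/")));

        // DataNodes currently registered with the NameNode (DistributedFileSystem-specific call)
        if (fileSystem instanceof DistributedFileSystem) {
            for (DatanodeInfo dn : ((DistributedFileSystem) fileSystem).getDataNodeStats()) {
                System.out.println("DataNode: " + dn.getHostName());
            }
        }
        fileSystem.close();
    }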

HDFS command-line usage

  • ls

    Format: hdfs dfs -ls URI
    Action: similar to the Linux ls command; displays a list of files
    eg. hdfs dfs -ls /
    
  • lsr

    Format: hdfs dfs -lsr URI
    Action: recursively runs ls over the entire directory tree, similar to ls -R on UNIX
    eg. hdfs dfs -lsr /
    
  • mkdir

    Format: hdfs dfs -mkdir [-p] <paths>
    Action: creates a directory for each URI given in <paths>. The -p option creates parent directories recursively
    
  • put

    Format: hdfs dfs -put <localsrc> ... <dst>
    Action: copies a single source file (or multiple source files) from the local file system to the target file system path <dst>. It can also read input from standard input and write it to the target file system
    hdfs dfs -put /root/a.txt /dir1
    
  • moveFromLocal

    Format: hdfs dfs -moveFromLocal <localsrc> <dst>
    Action: similar to put, but the local source file localsrc is deleted after it is copied
    hdfs dfs -moveFromLocal /root/install.log /
    
  • get

    Format: hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>
    Action: copies a file to the local file system. Files that fail CRC verification can still be copied with the -ignorecrc option; files together with their CRC checksums can be copied with the -crc option
    hdfs dfs -get /install.log /export/servers
    
  • mv

    Format: hdfs dfs -mv URI <dest>
    Action: moves a file on HDFS from its original path to the target path (the source is removed after the move). This command cannot move files across file systems
    hdfs dfs -mv /dir1/a.txt /dir2
    
  • rm

    Format: hdfs dfs -rm [-r] [-skipTrash] URI [URI ...]
    Action: deletes the files specified by the arguments (multiple arguments are allowed); directories require the -r option for recursive deletion.
    If -skipTrash is specified and the trash is enabled, the file is deleted directly, bypassing the trash; otherwise, when the trash is enabled, the HDFS shell first moves the file to the trash.
    hdfs dfs -rm -r /dir1
    
  • cp

    Format: hdfs dfs -cp URI [URI ...] <dest>
    Action: copies files to the target path. If <dest> is a directory, multiple files can be copied into it.
    The -f option overwrites the destination if it already exists.
    The -p option preserves file attributes (timestamps, ownership, permissions, ACLs, XAttrs).
    hdfs dfs -cp /dir1/a.txt /dir2/b.txt
    
  • cat

    Format: hdfs dfs -cat URI [URI ...]
    Action: writes the contents of the specified files to stdout
    hdfs dfs -cat /install.log
    
  • chmod

    Format: hdfs dfs -chmod [-R] <MODE> URI [URI ...]
    Action: changes file permissions. With the -R option, the change is applied recursively to the entire directory. The user running this command must be the owner of the file or the superuser.
    hdfs dfs -chmod -R 777 /install.log
    
  • chown

    Format: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
    Action: changes the owner and group of a file. With the -R option, the change is applied recursively to the entire directory. The user running this command must be the owner of the file or the superuser.
    hdfs dfs -chown -R hadoop:hadoop /install.log
    
  • appendToFile

    Format: hdfs dfs -appendToFile <localsrc> ... <dst>
    Action: appends one or more local files to the specified file on HDFS. Input can also be read from standard input
    hdfs dfs -appendToFile a.xml b.xml /big.xml
    

HDFS Java API Operations

HDFS URL access: input and output streams

    @Test
    public void urlHdfs() throws IOException {
        // 1. Register the hdfs:// URL scheme.
        // A URLStreamHandlerFactory constructs stream protocol handlers from protocol names;
        // FsUrlStreamHandlerFactory adds handlers for Hadoop file system schemes.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
        // 2. Open an input stream on the HDFS file
        InputStream inputStream = new URL("hdfs://node01:8020/a.txt").openStream();

        // 3. Open an output stream on the local file
        FileOutputStream outputStream = new FileOutputStream(new File("D:\\hello.txt"));
        // 4. Copy the file
        IOUtils.copy(inputStream, outputStream);

        // 5. Close the streams
        IOUtils.closeQuietly(inputStream);
        IOUtils.closeQuietly(outputStream);
    }
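
Note: java.net.URL.setURLStreamHandlerFactory can be called at most once per JVM, so this URL-based approach is mainly useful for quick tests; the FileSystem API shown in the following sections is the usual way to access HDFS programmatically.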

Four main ways to get an HDFS FileSystem object

    @Test
    public void getFileSystem4() throws IOException, URISyntaxException {
        // 1: Get the specified file system
        FileSystem fileSystem=FileSystem.newInstance(new URI("hdfs://node01:8020"),new Configuration());
        // 2: output
        System.out.println(fileSystem);
    }

    @Test
    public void getFileSystem3() throws IOException {
        // 1: Create a Configuration object
        // Configuration encapsulates the configuration of the client or server, sets the corresponding file system type, and uses the FileSystem.get method to obtain the corresponding file system object
        //to operate on the file system
        Configuration configuration=new Configuration();
        // 2: Set the file system type
        configuration.set("fs.defaultFS", "hdfs://node01:8020");
        // 3: Get the specified file system
        FileSystem fileSystem=FileSystem.newInstance(configuration);
        // 4: output
        System.out.println(fileSystem);

    }
    @Test
    public void getFileSystem2() throws IOException, URISyntaxException {
        // 1: Get the specified file system
        FileSystem fileSystem=FileSystem.get(new URI("hdfs://node01:8020"),new Configuration());
        // 2: output
        System.out.println(fileSystem);
    }


    @Test
    public void getFileSystem1() throws IOException {
        // 1: Create a Configuration object
        Configuration configuration=new Configuration();
        // 2: Set the file system type
        configuration.set("fs.defaultFS", "hdfs://node01:8020");
        // 3: Get the specified file system
        FileSystem fileSystem=FileSystem.get(configuration);
        // 4: output
        System.out.println(fileSystem);
    }
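
Note: FileSystem.get may return a cached FileSystem instance that is shared for the same URI and configuration, while FileSystem.newInstance always returns a new, independent instance that the caller is responsible for closing.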

Traversing HDFS files

    @Test
    public void listFiles() throws URISyntaxException, IOException {
        // 1. Get the FileSystem instance
        FileSystem fileSystem=FileSystem.get(new URI("hdfs://node01:8020"),new Configuration());

        // 2. Call the method listFiles to get all the file information in the / directory
        RemoteIterator<LocatedFileStatus> iterator=fileSystem.listFiles(new Path("/"),true);
        // 3. Traverse the iterator
        while(iterator.hasNext()){
            LocatedFileStatus fileStatus=iterator.next();
            System.out.println("path:"+(fileStatus.getPath())+"---"+fileStatus.getPath().getName());
            BlockLocation[] blockLocations=fileStatus.getBlockLocations();
            System.out.println("How many pieces are divided into:"+blockLocations.length);
        }
    }

Creating folders on HDFS

    @Test
    public void mkdirs() throws Exception{
        // 1. Get the FileSystem instance
        FileSystem fileSystem=FileSystem.get(new URI("hdfs://node01:8020"),new Configuration());
        // 2. Create a folder
        boolean mkdirs= fileSystem.mkdirs(new Path("/hello/123"));
        fileSystem.create(new Path("/hello/123/a.txt"));
        System.out.println(mkdirs);
        // 3. Close the file system
        fileSystem.close();
    }

File download 1

    @Test
    public void downloadFile1() throws URISyntaxException, IOException {
        // 1. Get FileSystem
        FileSystem fileSystem=FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
        // 2. Get the input stream of hdfs
        FSDataInputStream inputStream=fileSystem.open(new Path("/a.txt"));
        // 3. Get the output stream of the local path
        OutputStream outputStream=new FileOutputStream("D://a.txt");
        // 4. Copy the file
        IOUtils.copy(inputStream, outputStream);
        // 5. Close the stream
        IOUtils.closeQuietly(inputStream);
        IOUtils.closeQuietly(outputStream);
        fileSystem.close();
    }

File download 2

    @Test
    public void downloadFile2() throws URISyntaxException, IOException {
        // 1. Get FileSystem
        FileSystem fileSystem=FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
        // 2. Copy the file to the local file system
        fileSystem.copyToLocalFile(new Path("/a.txt"),new Path("D://b.txt"));
        // 3. Shut down the file system
        fileSystem.close();
    }

File Upload

    @Test
    public void uploadFile() throws URISyntaxException, IOException {
        // 1. Get FileSystem
        FileSystem fileSystem=FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
        // 2. Copy the local file to HDFS
        fileSystem.copyFromLocalFile(new Path("D://b.txt"),new Path("/b.txt"));
        // 3. Shut down the file system
        fileSystem.close();
    }

Merging and uploading small files
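
Each file and block stored in HDFS is tracked as metadata held in the NameNode's memory, so a large number of small files puts pressure on the NameNode. Merging small local files into a single large HDFS file at upload time avoids this; the example below streams every file in a local directory into one HDFS file.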

    @Test
    public void mergeFile() throws URISyntaxException, IOException, InterruptedException {
        //Configuration encapsulates the configuration of the client or server, sets the corresponding file system type, and uses the FileSystem.get method to obtain the corresponding file system object
        //to operate on the file system
        //1. Get the FileSystem distributed file system
        FileSystem fileSystem=FileSystem.get(new URI("hdfs://node01:8020"),new Configuration(),"root");
        //2. Get output stream of hdfs large file
        FSDataOutputStream outputStream=fileSystem.create(new Path("/big.txt"));
        //3. Get the local file system
        LocalFileSystem localFileSystem=FileSystem.getLocal(new Configuration());
        //4. Get a list of files from the local file system as a collection
        FileStatus[] fileStatus=localFileSystem.listStatus(new Path("D://input"));
        //5. Traverse each file and get the input stream of each file
        for(FileStatus file:fileStatus){
            FSDataInputStream inputStream=localFileSystem.open(file.getPath());
            //6. Copy data from small file to large file
            IOUtils.copy(inputStream, outputStream);

            IOUtils.closeQuietly(inputStream);
        }
        IOUtils.closeQuietly(outputStream);
        localFileSystem.close();
        fileSystem.close();
    }
