How to Implement Resumable Upload of Large Files, Baidu Netdisk Style

Background

If you are responsible for an open platform at work, file upload requirements come up all the time. Images under 10 MB can usually go through a single upload interface, but once a file exceeds 100 MB, 1 GB or more, a single request is clearly unfriendly: users expect a progress bar that reports upload progress in real time; they want to be able to pause (for example when the network drops or the wrong file was chosen) and resume at any time; and if the same file is uploaded again, the system should report an instant upload. In other words, we want an upload feature similar to Baidu Netdisk.

  • Demo: whole upload of a small file

  • Demo: chunked upload of a large file

Next, we will build a front end and a server from scratch, implementing small-file upload first and then large-file upload step by step.

The tech stack is mainly React, AntD and TypeScript on the front end, and ts-node with Express on the server...

This article was first published at @lan-react/upload; please credit the source when reposting. Client code repository, server-side code repository.

Implement whole-file upload for small files

Set up the front-end environment

Create a project with create-react-app --template typescript

Install antd with yarn add antd, then run the project with yarn start

Write the upload component

import React, { ChangeEvent, useState, useEffect } from 'react';
import { Row, Col, Input } from 'antd';
interface UploadProps {
  width?: number;
}
interface CurrentFile {
  file: File;
  dataUrl?: string;
  type?: string;
}
const isImage = (type: string = ''): boolean => type.includes('image');
const Upload: React.FC<UploadProps> = (props) => {
  const {
    width = 300,
  } = props;
  const [currentFile, setCurrentFile] = useState<CurrentFile>();
  const onFileChange = (event: ChangeEvent<HTMLInputElement>) => {
    const file: File = event.target.files![0];
    if (file) {
      const reader = new FileReader();
      reader.addEventListener('load', () => {
        setCurrentFile({
          file: file,
          dataUrl: reader.result as string,
          type: file.type,
        })
      });
      reader.readAsDataURL(file);
    }
  }
  return (
    <div>
      <Input type="file" style={{ width: width }} onChange={onFileChange} />
      { isImage(currentFile?.type) ? <img src={currentFile?.dataUrl} style={{ width: 100 }} alt={currentFile?.file.name} /> : null }
    </div>
  )
}
export default Upload;
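
The component can then be mounted in the app entry. A minimal sketch, assuming the component lives in Upload.tsx next to App.tsx (file names and props here are assumptions, not taken from the linked repos):

// App.tsx (hypothetical entry point)
import React from 'react';
import Upload from './Upload';

const App: React.FC = () => <Upload width={300} />;

export default App;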

Small file upload using FormData

return (
  <div>
    <Input type="file" style={{ width: width }} onChange={onFileChange} />
+    <Button type="primary" onClick={() => onFileUpload(UploadType.WHOLE)}>Small file upload as a whole</Button>
  </div>
)

Add a button and handle the upload

// Upload Type
enum UploadType {
  WHOLE,
  PART,
}
// Size Detection
const checkSize = (size: number = 0, maxSize: number = 2 * 1024 * 1024 * 1024): boolean => {
  if (size > maxSize) {
    message.error(`File size cannot exceed 2 GB`)
    return false
  }
  return true
}
const onFileUpload = (type: UploadType = UploadType.WHOLE) => {
  if (!currentFile?.file) {
    message.error('Please select a file~')
    return
  }
  if (!checkSize(currentFile?.file?.size)) return
  switch (type) {
    case UploadType.WHOLE:
      wholeUpload();
      break;
  }
}
// Overall upload
const wholeUpload = async () => {
  const formData = new FormData()
  formData.append('file', currentFile?.file as File)
  formData.append('name', currentFile?.file.name as string)
  const res = await request({
    url: '/wholeUpload',
    method: 'POST',
    data: formData,
  })
  message.success('Upload Successful');
}

Then write a simple request wrapper

export interface Config {
  baseUrl?: string;
  url?: string;
  method?: string;
  headers?: any;
  data?: any;
}
export const request = (conf: Config): Promise<any> => {
  const config: Config = {
    method: 'GET',
    baseUrl: 'http://localhost:8000',
    headers: {},
    data: {},
    ...conf
  }
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open(config.method as string, `${config.baseUrl}${config.url}`);
    for (const key in config.headers) {
      if (Object.prototype.hasOwnProperty.call(config.headers, key)) {
        const value = config.headers[key];
        xhr.setRequestHeader(key, value);
      }
    }
    xhr.responseType = 'json';
    xhr.onreadystatechange = () => {
      if (xhr.readyState === 4) {
        if (xhr.status === 200) {
          resolve(xhr.response);
        } else {
          reject(xhr.response);
        }
      }
    }
    xhr.send(config.data);
  })
}

Set up the server-side environment

Use nodemon and ts-node (cross-env sets the PORT environment variable)

"scripts": {
  "dev": "cross-env PORT=8000 nodemon --exec ts-node --files ./src/www.ts"
},

Write the service using the http module: ./src/www.ts

import app from './app'
import http from 'http'
const port = process.env.PORT || 8000
const server = http.createServer(app)
const onError = (error: any) => {
  console.error(error)
}
const onListening = () => {
  console.log(`Listening on port ${port}`)
}
server.listen(port)
server.on('error', onError)
server.on('listening', onListening)

Then write app.ts

import express, { Request, Response, NextFunction } from 'express'
import path from 'path'
import fs from 'fs-extra'
import logger from 'morgan'
import cors from "cors"
import multiparty from 'multiparty'
import createError from 'http-errors'
import { INTERNAL_SERVER_ERROR } from 'http-status-codes'
const app = express()
const PUBLIC_DIR = path.resolve(__dirname, 'public')
app.use(logger('dev'))
app.use(express.json())
app.use(express.urlencoded({ extended: true }))
app.use(cors())
app.use(express.static(PUBLIC_DIR))
app.post('/wholeUpload', async (req: Request, res: Response, next: NextFunction) => {
  const form = new multiparty.Form()
  form.parse(req, async (err: any, fields, files) => {
    if (err) return next(err)
    const name = fields.name[0]
    const file = files.file[0]
    await fs.move(file.path, path.resolve(PUBLIC_DIR, name), { overwrite: true })
    res.json({
      success: true
    })
  })
})
app.use((_req: Request, _res: Response, next: NextFunction) => {
  next(createError(404))
})
app.use((error: any, _req: Request, res: Response, _next: NextFunction) => {
  res.status(error.status || INTERNAL_SERVER_ERROR)
  res.json({
    success: false,
    error,
  })
})
export default app

The uploaded file is stored locally, in the public directory next to the server code (PUBLIC_DIR).
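
If the directories do not exist yet, they can be created when the server starts. A minimal sketch using fs-extra's ensureDirSync (TEMP_DIR is only introduced in the chunked-upload section below, so it is included here as a forward reference):

// app.ts – make sure the storage directories exist before any upload arrives
import fs from 'fs-extra'

fs.ensureDirSync(PUBLIC_DIR) // whole/merged files end up here
fs.ensureDirSync(TEMP_DIR)   // chunk files are stored here before merging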

Then start the service with npm run dev

With that, small-file upload works end to end (see the first demo above).

Next, let's implement chunked upload of large files.

How to Upload Large Files in Chunks

Approach to Chunked Upload of Large Files

  • The client splits the large file into chunks. (Here the chunk size is 10 MB, and each chunk is named fileName-${index} so the server can merge them in order.)
  • To enable instant upload, a hash of the file contents has to be computed. (Hashing takes time, so it runs in a Web Worker and drives a progress bar.)
  • The client uploads each chunk in turn through the chunk upload interface.
  • The server provides the chunk upload interface. (All chunks are stored in a temporary directory.)
  • Once every chunk is uploaded, the client calls the merge interface.
  • The server provides the merge interface. (Chunks are merged in the order given by their names and the result is stored locally as the complete file.)
  • The client provides pause/resume. (Pause calls xhr.abort(); resume simply uploads again.)
  • In particular, the client calls the verify interface before uploading. (Has the file already been uploaded, and which parts of it are already on the server?)
  • The server provides the verify interface. (It reads the chunks in the temporary directory for that file name and, if any exist, returns the chunk information to the client.)
  • The client acts on the response. (A complete file means instant upload; for existing chunks only the missing parts are uploaded.)
  • Upload complete.

That is the general idea; the implementation details and points to note are described below.

Client-side slicing

const onFileUpload = (type: UploadType = UploadType.WHOLE) => {
  // ...
  switch (type) {
    case UploadType.WHOLE:
      wholeUpload();
      break;
+    case UploadType.PART:
+      partUpload()
+      break;
  }
}
interface Part {
  chunk: Blob;
  size: number;
  fileName?: string;
  chunkName?: string;
  loaded?: number;
  percent?: number;
  xhr?: XMLHttpRequest;
}
const partUpload = async () => {
  setUploadStatus(UploadStatus.UPLOADING);
  // 1. File slicing
  // 2. Compute file hash based on fragmentation
  // 3. Partial upload
  const partList = createChunks(currentFile?.file as File);
  const fileHash = await generateHash(partList);
  console.log(fileHash, 'fileHash');
  const lastDotIdx = currentFile?.file.name.lastIndexOf('.');
  const extName = currentFile?.file.name.slice(lastDotIdx);
  const fileName = `${fileHash}${extName}`;
  partList.forEach((part: Part, index: number) => {
    part.fileName = fileName;
    part.chunkName = `${fileName}-${index}`;
    part.loaded = 0;
    part.percent = 0;
  })
  setFileName(fileName);
  setPartList(partList);
  await uploadParts(partList, fileName)
}
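
partUpload relies on component state, an enum and a helper that the snippets reference but never define (UploadStatus, setUploadStatus, setPartList, setFileName, setHashPercent, reset, DEAFULT_SIZE). A minimal sketch of what they might look like; the names follow the snippets, the exact shapes and values are assumptions:

// Module-level constants (values are assumptions)
const DEAFULT_SIZE = 1024 * 1024 * 10; // 10 MB per chunk, mirroring the server-side constant

enum UploadStatus {
  INIT,       // nothing uploading yet
  UPLOADING,  // chunks are being uploaded
  PAUSE,      // the user pressed pause
}

// Inside the Upload component:
const [uploadStatus, setUploadStatus] = useState<UploadStatus>(UploadStatus.INIT);
const [partList, setPartList] = useState<Part[]>([]);
const [fileName, setFileName] = useState<string>('');
const [hashPercent, setHashPercent] = useState<number>(0);

// reset is assumed to put the component back into its initial state once an upload finishes
const reset = () => {
  setUploadStatus(UploadStatus.INIT);
  setHashPercent(0);
};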

Next, implement the two core methods, createChunks and generateHash.

const createChunks = (file: File, size: number = DEAFULT_SIZE): Part[] => {
  let current: number = 0;
  const partList: Part[] = [];
  while (current < file.size) {
    const chunk: Blob = file.slice(current, current + size);
    partList.push({
      chunk,
      size: chunk.size,
    })
    current += size
  }
  return partList;
}
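
As a quick illustration (a hedged example, not from the original article; selectedFile stands for the chosen File object): with the default 10 MB chunk size, a 25 MB file produces three chunks.

// Hypothetical 25 MB file (26,214,400 bytes) with DEAFULT_SIZE = 10 MB
const parts = createChunks(selectedFile);
console.log(parts.length);              // 3
console.log(parts.map((p) => p.size));  // [10485760, 10485760, 5242880]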

Client computes the hash

const generateHash = (partList: Part[]): Promise<any> => {
  return new Promise((resolve, reject) => {
    const worker = new Worker('/generateHash.js');
    worker.postMessage({ partList });
    worker.onmessage = (event) => {
      const { percent, hash } = event.data;
      setHashPercent(percent);
      if (hash) {
        resolve(hash);
      }
    }
    worker.onerror = error => {
      reject(error);
    }
  })
}

This mainly relies on a Web Worker: create a new generateHash.js file under public.

self.importScripts('https://cdn.bootcss.com/spark-md5/3.0.0/spark-md5.js');
self.onmessage = async (event) => {
  const { partList } = event.data;
  const spark = new self.SparkMD5.ArrayBuffer();
  let percent = 0;
  const perSize = 100 / partList.length;
  const buffers = await Promise.all(partList.map(({ chunk }) => new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.readAsArrayBuffer(chunk);
    reader.onload = (e) => {
      percent += perSize;
      self.postMessage({ percent: Number(percent.toFixed(2)) });
      resolve(e.target.result);
    }
    reader.onerror = reject; // surface read failures instead of leaving the Promise pending
  })));
  buffers.forEach(buffer => spark.append(buffer));
  self.postMessage({ percent: 100, hash: spark.end() });
  self.close();
}

The worker posts each chunk's hashing progress via postMessage, and generateHash syncs it to the UI in real time through setHashPercent.

Client uploads the chunks

const uploadParts = async (partList: Part[], fileName: string) => {
  const res = await request({
    url: `/verify/${fileName}`,
  })
  if (res.code === 200) {
    if (!res.data.needUpload) {
      message.success('Instant upload succeeded');
      setPartList(partList.map((part: Part) => ({
        ...part,
        loaded: DEAFULT_SIZE,
        percent: 100,
      })))
      reset()
      return
    }
    try {
      const { uploadedList } = res.data
      const requestList = createRequestList(partList, uploadedList, fileName);
      const partsRes = await Promise.all(requestList);
      if (partsRes.every(item => item.code === 200)) {
        const mergeRes = await request({
          url: `/merge/${fileName}`,
        })
        if (mergeRes.code === 200) {
          message.success('Upload Successful');
          reset()
        } else {
          message.error('Upload failed, please try again later~');
        }
      } else {
        message.error('Upload failed, please try again later~');
      }
    } catch (error) {
      message.error('Upload failed or paused');
      console.error(error);
    }
  }
}
  • The client calls the verify interface before uploading. (Has the file already been uploaded, and which parts of it are already on the server?)
  • The client uploads each remaining chunk in turn through the chunk upload interface.
  • Once every chunk is uploaded, the client calls the merge interface.

The server implements the verify interface

export const PUBLIC_DIR = path.resolve(__dirname, 'public')
export const TEMP_DIR = path.resolve(__dirname, 'temp')
const DEAFULT_SIZE = 1024 * 1024 * 10;
// ...
app.get('/verify/:fileName', async (req: Request, res: Response, _next: NextFunction) => {
  const { fileName } = req.params
  const filePath = path.resolve(PUBLIC_DIR, fileName)
  const existFile = await fs.pathExists(filePath)
  if (existFile) {
    res.json({
      code: 200,
      msg: 'success',
      data: {
        needUpload: false,
      },
    })
    return
  }
  const folderPath = path.resolve(TEMP_DIR, fileName)
  const existFolder = await fs.pathExists(folderPath)
  let uploadedList: any[] = []
  if (existFolder) {
    uploadedList = await fs.readdir(folderPath)
    uploadedList = await Promise.all(uploadedList.map(async (fileName: string) => {
      const stat = await fs.stat(path.resolve(folderPath, fileName))
      return {
        fileName,
        size: stat.size,
      }
    }))
  }
  res.json({
    code: 200,
    msg: 'success',
    data: {
      needUpload: true,
      uploadedList,
    }
  })
})

The server implements the chunk upload interface

app.post('/partUpload/:fileName/:start/:chunkName', async (req: Request, res: Response, _next: NextFunction) => {
  const { fileName, chunkName, start } = req.params
  const folderPath = path.resolve(TEMP_DIR, fileName)
  const existFolder = await fs.pathExists(folderPath)
  if (!existFolder) {
    await fs.mkdirs(folderPath)
  }
  const filePath = path.resolve(folderPath, chunkName)
  const ws = fs.createWriteStream(filePath, {
    start: Number(start),
    flags: 'a',
  })
  req.on('end', () => {
    ws.close()
    res.json({
      code: 200,
      msg: 'success',
      data: true,
    })
  })
  req.on('error', () => {
    ws.close()
  })
  req.on('close', () => {
    ws.close()
  })
  req.pipe(ws)
})
  • fileName: a temporary directory named after the file is created to hold its chunks
  • chunkName: the chunk's name, in the form fileName-${index}
  • start: how many bytes of this chunk have already been uploaded, so the write can append from that offset

The server implements the merge interface

app.get('/merge/:fileName', async (req: Request, res: Response, _next: NextFunction) => {
  const { fileName } = req.params
  try {
    await mergeChunks(fileName)
    res.json({
      code: 200,
      msg: 'success',
      data: true,
    })
  } catch (error) {
    res.json({
      code: 1,
      msg: 'error',
      data: false,
    })
  }
})

Mirroring the client, the merge rule is simply the reverse of the split rule.

const getIndex = (str: string) => {
  const matched = str.match(/-(\d{1,})$/)
  return matched ? Number(matched[1]) : 0
}

// WriteStream comes from 'fs': import { WriteStream } from 'fs'
const pipeStream = (filePath: string, ws: WriteStream) => new Promise<void>((resolve, _reject) => {
  const rs = fs.createReadStream(filePath)
  rs.on('end', async () => {
    await fs.unlink(filePath)
    resolve()
  })
  rs.pipe(ws)
})

export const mergeChunks = async (fileName: string, size: number = DEAFULT_SIZE) => {
  const filePath = path.resolve(PUBLIC_DIR, fileName)
  const folderPath = path.resolve(TEMP_DIR, fileName)
  const folderFiles = await fs.readdir(folderPath)
  folderFiles.sort((a, b) => getIndex(a) - getIndex(b))
  await Promise.all(folderFiles.map((chunk: string, index: number) => pipeStream(
    path.resolve(folderPath, chunk),
    fs.createWriteStream(filePath, {
      start: index * size
    })
  )))
  await fs.rmdir(folderPath)
}

Streams are used instead of reading whole files into memory: each chunk is piped into the target file at its own byte offset (start: index * size), so the chunks can be merged concurrently.

Client implements pause/resume

<Row>
  <Col span={24}>
    {
      uploadStatus === UploadStatus.INIT && <Button type="primary" onClick={() => onFileUpload(UploadType.PART)}>Upload large file in chunks</Button>
    }
    {
      uploadStatus === UploadStatus.UPLOADING && <Button type="primary" onClick={() => onFilePause()}>Pause</Button>
    }
    {
      uploadStatus === UploadStatus.PAUSE && <Button type="primary" onClick={() => onFileResume()}>Resume</Button>
    }
  </Col>
</Row>

Pausing calls the xhr abort() method, so request.ts first needs to expose the xhr instance so the caller can hold on to it.

// request.ts
if (config.setXhr) {
  config.setXhr(xhr);
}
// upload.tsx
const onFilePause = () => {
  partList.forEach((part: Part) => part.xhr && part.xhr.abort())
  setUploadStatus(UploadStatus.PAUSE)
}
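
createRequestList (shown next) also passes an onProgress callback, so request has to forward upload progress as well. A minimal sketch of the extended Config and the relevant lines inside request; the exact placement in request.ts is an assumption:

// request.ts – optional hooks added to Config
export interface Config {
  baseUrl?: string;
  url?: string;
  method?: string;
  headers?: any;
  data?: any;
  setXhr?: (xhr: XMLHttpRequest) => void;      // expose the xhr instance so the caller can abort it
  onProgress?: (event: ProgressEvent) => void; // report upload progress for this request
}

// inside request, after xhr.open(...):
if (config.setXhr) {
  config.setXhr(xhr);
}
if (config.onProgress) {
  xhr.upload.onprogress = config.onProgress;
}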

Before that, the createRequestList method needs to be fleshed out.

const createRequestList = (partList: Part[], uploadedList: Uploaded[], fileName: string): Promise<any>[] => {
  return partList.filter((part: Part) => {
    const uploadedFile = uploadedList.find(item => item.fileName === part.chunkName);
    if (!uploadedFile) { // This chunk has not been uploaded yet
      part.loaded = 0;
      part.percent = 0;
      return true;
    }
    if (uploadedFile.size < part.chunk.size) { // This chunk is partially Uploaded
      part.loaded = uploadedFile.size;
      part.percent = Number((part.loaded / part.chunk.size * 100).toFixed(2));
      return true;
    }
    // Uploaded
    return false;
  }).map((part: Part) => request({
    url: `/partUpload/${fileName}/${part.loaded}/${part.chunkName}`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/octet-stream',
    },
    setXhr: (xhr: XMLHttpRequest) => { // +Mount
      part.xhr = xhr;
    },
    onProgress: (event: ProgressEvent) => { // +Progress Bar
      part.percent = Number(((part.loaded! + event.loaded) / part.chunk.size * 100).toFixed(2));
      console.log('part percent: ', part.percent)
      setPartList([...partList])
    },
    data: part.chunk.slice(part.loaded),
  }))
}
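
createRequestList types uploadedList with an Uploaded interface that is not shown in the article; judging from the items returned by the verify interface ({ fileName, size }), it is presumably something like:

// Assumed shape of one entry in uploadedList, matching the verify response
interface Uploaded {
  fileName: string; // chunk name on the server, i.e. `${fileName}-${index}`
  size: number;     // bytes of this chunk already stored on the server
}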

Resuming is simply uploading again:

const onFileResume = async () => {
  // Set UPLOADING before awaiting so the pause button is available during the resumed upload
  setUploadStatus(UploadStatus.UPLOADING)
  await uploadParts(partList, fileName)
}

Client implements the progress bar

As in the createRequestList code above, the upload progress is obtained in real time in onProgress.

const columns = [
  {
    title: 'Fragment Name',
    dataIndex: 'chunkName',
    key: 'chunkName',
    width: '20%',
  },
  {
    title: 'Progress bar',
    dataIndex: 'percent',
    key: 'percent',
    width: '80%',
    render: (value: number) => {
      return <Progress percent={value} />
    }
  }
]

const totalPercent = partList.length > 0 
  ? partList.reduce((memo: number, curr: Part) => memo + curr.percent!, 0) / partList.length
  : 0
{ uploadStatus !== UploadStatus.INIT ? (
  <>
    <Row>
      <Col span={4}>
        Hash Progress bar
      </Col>
      <Col>
        <Progress percent={hashPercent} />
      </Col>
    </Row>
    <Row>
      <Col span={4}>
        Total Progress bar
      </Col>
      <Col>
        <Progress percent={totalPercent} />
      </Col>
    </Row>
    <Table 
      columns={columns}
      dataSource={partList}
      rowKey={row => row.chunkName as string}
    />
  </>
) : null }

Client implements instant upload

This is handled in the uploadParts method shown above:

const uploadParts = async (partList: Part[], fileName: string) => {
  const res = await request({
    url: `/verify/${fileName}`,
  })
  if (res.code === 200) {
    if (!res.data.needUpload) {
      message.success('Instant upload succeeded');
      setPartList(partList.map((part: Part) => ({
        ...part,
        loaded: DEAFULT_SIZE,
        percent: 100,
      })))
      reset()
      return
    }
    // ...
  }
}

bingo

The final effect is shown in the demo images at the top of the article.

Summary

  • Small-file upload uses FormData; large-file chunk upload sets 'Content-Type': 'application/octet-stream'. FormData can carry extra parameters, while with octet-stream the parameters are put in the URL.
formData.append('file', currentFile?.file as File)
formData.append('name', currentFile?.file.name as string)

request({ url: `/partUpload/${fileName}/${part.loaded}/${part.chunkName}` })
  • Because File inherits from Blob, the client can split a large file with Blob.slice; the server stores the chunk files and provides a merge interface that joins them in split order (using createWriteStream/createReadStream).
  • To implement instant upload, the file needs a unique identifier; if the server finds the file already uploaded, it returns success (and the access address) directly.

    • Use a Web Worker to compute the large file's unique identifier in the background so the page does not freeze.
    • Use spark-md5 to compute the file's MD5 as that unique identifier.
  • Provide progress bar functionality

    • While computing the MD5, the worker uses postMessage to report progress at chunk granularity
    • When uploading chunks, xhr.upload.onprogress reports the upload progress to the front end in real time
    • The front end renders the progress bars with antd Progress/Table
  • Provide pause/resume functionality

    • Pause terminates the in-flight requests with xhr.abort()
    • On resume, the already-uploaded chunks are fetched first and only the missing parts are re-uploaded: the server reads the chunks in the temporary directory, the client slices again with part.chunk.slice(part.loaded), and the server appends with fs.createWriteStream(filePath, { start: Number(start), flags: 'a' })

Tags: Front-end Interview

Posted by doforumda on Sun, 22 May 2022 20:42:59 +0300