Real-time face recognition and object recognition with Node.js on a Raspberry Pi 4B using tfjs-node

Background

I had this idea a few years ago: I wanted to build an assistant that could recognize faces and objects and respond to simple spoken commands, something like an offline, enhanced version of Xiao Ai (Xiaomi's voice assistant), which could also keep an eye on my cabin while it was at it.

But the end of the year was too busy and I had neither the time nor the energy to tinker. I planned to pick it up at the start of the new year; who could have guessed an epidemic would suddenly hand me so much free time? Unfortunately I hadn't bought the Raspberry Pi in time, so a lot of that time was wasted.

After work resumed there were a few more mishaps along the way, and it was almost the end of the year before I finally overcame my chronic laziness and got the basic functions working.

I'm writing it down here as a record; after all, my memory isn't what it used to be.

Raspberry Pi environment

My Raspberry Pi is a 4B running the official OS. The Node.js version is 10.21.0: a small OLED screen attached to the Pi is also driven from Node.js, and the package its driver depends on is too old to run on Node 12 or above, so I downgraded to 10. Normally, Node versions around 12 can run tfjs fine.

The camera is a roughly 20-yuan CSI-interface camera with a small bracket from Taobao, and it supports 1080p; the price surprised me. You don't need this exact model: any camera that can produce a video stream will do.

If you don't have a Raspberry Pi, everything here also works on a Linux or Windows system.

Libraries used

Face recognition uses face-api.js, a JS library built on tfjs; tfjs is the JavaScript version of TensorFlow and supports both the browser and Node.js. The library's general approach is to extract 68 facial landmark points and compare them. The recognition rate is quite high, and it can also estimate gender, age (about as trustworthy as a beauty-filter camera, so treat it as entertainment) and facial expressions.

The library's last update was eight months ago (I don't know whether the turmoil in the US had anything to do with it), and it hasn't been maintained since. The tfjs core library has moved on to around 2.6, while face-api.js still depends on 1.7.

Note that on an x86 Linux or Windows system this is not a problem, but on ARM systems such as the Raspberry Pi, tfjs core versions before 2.0 are not supported.

I was stuck here for a long time. Everything ran fine on Windows and Linux, but installing the npm package on the Raspberry Pi threw an error. After reading the tfjs source, it turned out that version 1.7 had not yet added support for the ARM architecture. face-api.js can run without the native tfjs core library, but the performance is dismal: a detection that takes about 200 ms with the native backend takes roughly 10 seconds on a Raspberry Pi without it. That was clearly not an option.

So the first step to running on the Raspberry Pi is to download the face-api.js source, bump its tfjs dependency to a newer version, and recompile. When I updated to 2.6, one core-library method had been deprecated; commenting it out was enough. If you'd rather not make the changes yourself, you can use my recompiled build: face-api

Object recognition uses recognizejs, also built on tfjs. Likewise, its tfjs dependency needs to be bumped to 2.0 or above. This library is just a thin wrapper around the tfjs object-recognition models coco-ssd and mobilenet, so downloading it and changing the tfjs version is all it takes.

Getting the camera stream

I tried several approaches here. Since I wanted the whole pipeline in Node.js, recognition has to work on individual frames captured from the camera. At first I used the Raspberry Pi's built-in camera command, but each photo had to wait for the camera to open its viewfinder before capturing, which took about a second per shot and was far too slow. At first I also couldn't get ffmpeg to hand the video stream directly to Node.js, and falling back to Python or the like felt as if it defeated the whole point.

Later I found that ffmpeg can push the stream to Node.js. Node.js is not good at processing raw video directly, but if ffmpeg converts the stream it pushes to MJPEG, then every frame Node.js receives is already a JPEG image and needs no further decoding.
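One detail worth noting: the chunks Node.js reads from a pipe are not guaranteed to line up with frame boundaries. A minimal sketch (my own addition, not from the original code) that reassembles complete JPEG frames by scanning for the JPEG start-of-image (0xFFD8) and end-of-image (0xFFD9) markers:

```javascript
const SOI = Buffer.from([0xff, 0xd8]); // JPEG start-of-image marker
const EOI = Buffer.from([0xff, 0xd9]); // JPEG end-of-image marker

// Stateful splitter: feed it arbitrary byte chunks, get back complete JPEG frames.
function createMjpegSplitter() {
  let pending = Buffer.alloc(0);
  return function feed(chunk) {
    pending = Buffer.concat([pending, chunk]);
    const frames = [];
    for (;;) {
      const start = pending.indexOf(SOI);
      if (start === -1) break;
      const end = pending.indexOf(EOI, start + 2);
      if (end === -1) break; // frame not complete yet, wait for more data
      frames.push(pending.slice(start, end + 2));
      pending = pending.slice(end + 2);
    }
    return frames;
  };
}
```

It would be wired up as `const feed = createMjpegSplitter();` and then, inside the stdout `data` handler, `for (const frame of feed(chunk)) { /* run recognition on frame */ }`.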

First install ffmpeg; there are plenty of guides online, so I won't repeat them here.

Then install the relevant Node.js dependencies:

{
  "dependencies": {
    "@tensorflow/tfjs-node": "^2.6.0",
    "babel-core": "^6.26.3",
    "babel-preset-env": "^1.7.0",
    "canvas": "^2.6.1",
    "nodejs-websocket":"^1.7.2",
    "fluent-ffmpeg": "^2.1.2"
  }
}

Note that the canvas library depends on quite a few system packages when it is installed. You can install them one by one as the error messages prompt you, or look up the packages node-canvas needs on the Raspberry Pi and install them all up front.
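For reference, on Debian-based systems (which includes Raspberry Pi OS), the build prerequisites listed in the node-canvas documentation can be installed up front; exact package names may vary with your distribution version:

```shell
# Build tools plus the Cairo/Pango/JPEG/GIF/SVG headers node-canvas compiles against
sudo apt-get update
sudo apt-get install -y build-essential libcairo2-dev libpango1.0-dev \
  libjpeg-dev libgif-dev librsvg2-dev
```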

Pull the stream from the camera

The approach I use is to have the Pi's built-in camera tool push the stream to port 8090 first, and then have Node.js read the stream through ffmpeg.

Run the command:

raspivid -t 0 -w 640 -h 480 -pf high -fps 24 -b 2000000 -o - | nc -k -l 8090

Now you can test whether the stream is up by playing that address and port.

It can be tested with ffplay:

ffplay tcp://<your-ip>:8090

If all goes well, you should be able to see your own big face.

Pulling the stream in nodejs

First, pull the TCP stream pushed to the port through ffmpeg:

const ffmpeg = require('child_process').spawn("ffmpeg", [
  "-f", "h264",
  "-i", "tcp://" + 'own ip and port',
  "-preset", "ultrafast",
  "-r", "24",
  "-q:v", "3",
  "-f", "mjpeg",
  "pipe:1"
]);


ffmpeg.on('error', function (err) {
  throw err;
});

ffmpeg.on('close', function (code) {
  console.log('ffmpeg exited with code ' + code);
});

// stderr carries ffmpeg's progress log; streams themselves do not emit 'exit',
// the process-level 'close' handler above already covers termination
ffmpeg.stderr.on('data', function (data) {
  // console.log('stderr: ' + data);
});

At this point, Node.js can process each frame pushed through the MJPEG stream.

ffmpeg.stdout.on('data', function (data) {
  // Buffer.from replaces the deprecated new Buffer()
  const frame = Buffer.from(data).toString('base64');
  console.log(frame);
});

The face-recognition and object-recognition processing could all be written into one process, but then an error or memory overflow anywhere would take the whole program down. So I split face recognition and object recognition into two separate files that communicate over sockets; if one process dies, it can be restarted on its own without affecting the rest.

So push the pulled stream to each socket that needs to run recognition, and get ready to receive the recognition results coming back:

const net = require('net');
let isFaceInDet = false, isObjInDet = false, faceBox = [], objBox = [], faceHasBlock = 0, objHasBlock = 0;

let clientArr = [];
const server = net.createServer();
// bind the connection event
server.on('connection', (person) => {
  console.log(clientArr.length);
  // remember which slot this client occupies
  person.id = clientArr.length;
  clientArr.push(person);
  // bind events on the client socket
  person.on('data', (chunk) => {
    if (JSON.parse(chunk.toString()).length > 0) {
      // recognition results came back
      faceBox = JSON.parse(chunk.toString());
    } else {
      // keep stale boxes for a few empty frames before clearing them
      if (faceHasBlock > 5) {
        faceHasBlock = 0;
        faceBox = [];
      } else {
        faceHasBlock++;
      }
    }
    isFaceInDet = false;
  });
  // the close/error callbacks receive a flag or an Error, not the socket,
  // so use the captured `person` to clear the slot
  person.on('close', () => {
    clientArr[person.id] = null;
  });
  person.on('error', () => {
    clientArr[person.id] = null;
  });
});
server.listen(8990);


let clientOgjArr = [];
const serverOgj = net.createServer();
// bind the connection event
serverOgj.on('connection', (person) => {
  console.log(clientOgjArr.length);
  person.id = clientOgjArr.length;
  clientOgjArr.push(person);
  person.on('data', (chunk) => {
    if (JSON.parse(chunk.toString()).length > 0) {
      objBox = JSON.parse(chunk.toString());
    } else {
      if (objHasBlock > 5) {
        objHasBlock = 0;
        objBox = [];
      } else {
        objHasBlock++;
      }
    }
    isObjInDet = false;
  });
  person.on('close', () => {
    clientOgjArr[person.id] = null;
  });
  person.on('error', () => {
    clientOgjArr[person.id] = null;
  });
});
serverOgj.listen(8991);
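A caveat about the two servers above: `JSON.parse(chunk.toString())` assumes every 'data' event carries exactly one complete JSON message, but TCP is a byte stream and may split or merge writes. A defensive sketch (my own addition, not part of the original code) using newline-delimited JSON, which both ends of the socket would have to adopt:

```javascript
// Wrap a socket-like emitter so the callback only ever sees complete,
// newline-terminated JSON messages. The sender must append '\n' to each write.
function onJsonLines(socket, callback) {
  let buffered = '';
  socket.on('data', (chunk) => {
    buffered += chunk.toString();
    let idx;
    while ((idx = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, idx);
      buffered = buffered.slice(idx + 1);
      if (!line.trim()) continue; // skip empty lines
      try {
        callback(JSON.parse(line));
      } catch (e) {
        // malformed message: drop it rather than crash the server
      }
    }
  });
}
```

The recognition side would then write `client.write(JSON.stringify(results) + '\n')` instead of the bare stringify.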

face recognition

I took the official face-api demo and tweaked it slightly.

It first needs to receive the image buffer passed over the socket, process it, and return the recognition results.

const net = require('net');
const faceapi = require('face-api.js');
const { canvas, faceDetectionNet, faceDetectionOptions, saveFile } = require('./commons/index.js');
const { createCanvas } = require('canvas');
const { Image } = canvas;
const canvasCtx = createCanvas(1280, 760);
const ctx = canvasCtx.getContext('2d');

let client;
let img = false;

async function init() {
  if (!img) {
    // preload the models
    await loadRes();
    img = true;
  }

  client = net.connect({ port: 8990, host: '127.0.0.1' }, () => {
    console.log('=-=-=-=');
  });

  client.on('data', (chunk) => {
    // process the incoming image
    detect(chunk);
  });

  client.on('end', () => {});

  client.on('error', (e) => {
    console.log(e.message);
  });
}

init();

async function detect(buffer) {
  // turn the buffer into a drawable image
  const queryImage = new Image();
  queryImage.onload = () => ctx.drawImage(queryImage, 0, 0);
  queryImage.src = buffer;

  let resultsQuery = [];
  try {
    // run detection
    resultsQuery = await faceapi.detectAllFaces(queryImage, faceDetectionOptions);
  } catch (e) {
    console.log(e);
  }

  // return the result over the socket
  client.write(JSON.stringify(resultsQuery));
}
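The raw objects returned by `detectAllFaces` carry a lot of nested data, so before writing to the socket it may be worth reducing them to plain boxes. A helper sketch of my own for illustration (the `box` and `score` fields match what I understand of face-api.js detection objects, but treat the shape as an assumption to verify):

```javascript
// Reduce face-api.js detection results to plain {x, y, width, height, score}
// objects so the JSON sent over the socket stays small.
function toPlainBoxes(detections) {
  return detections.map((d) => ({
    x: Math.round(d.box.x),
    y: Math.round(d.box.y),
    width: Math.round(d.box.width),
    height: Math.round(d.box.height),
    score: d.score,
  }));
}
```

`client.write(JSON.stringify(toPlainBoxes(resultsQuery)))` would then replace the raw stringify.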

More parameter details can be found in the official documentation and examples.

object recognition

This likewise follows the official example and processes the images passed in:

const net = require('net');
const Recognizejs = require('recognizejs');

let client, img = false, myModel;

async function init() {
  if (!img) {
    // It is best to download the models and serve them locally; otherwise every
    // initialization pulls them from the remote server, which wastes a lot of time.
    myModel = new Recognizejs({
      mobileNet: {
        version: 1,
        // modelUrl: 'https://hub.tensorflow.google.cn/google/imagenet/mobilenet_v1_100_224/classification/1/model.json?tfjs-format=file'
        modelUrl: 'http://127.0.0.1:8099/web_model/model.json'
      },
      cocoSsd: {
        base: 'lite_mobilenet_v2',
        // modelUrl: 'https://hub.tensorflow.google.cn/google/imagenet/mobilenet_v1_100_224/classification/1/model.json?tfjs-format=file'
        modelUrl: 'http://127.0.0.1:8099/ssd/model.json'
      },
    });

    await myModel.init(['cocoSsd', 'mobileNet']);
    img = true;
  }

  client = net.connect({ port: 8991, host: '127.0.0.1' }, () => {
    console.log('=-=-=-=');
    client.write(JSON.stringify([]));
  });

  client.on('data', (chunk) => {
    detect(chunk);
  });

  client.on('end', () => {});

  client.on('error', (e) => {
    console.log(e.message);
  });
}

init();

async function detect(imgBuffer) {
  const results = await myModel.detect(imgBuffer);
  // send the results back over the socket
  client.write(JSON.stringify(results));
}
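coco-ssd style results come back as entries with a class label and a confidence score, and many of them are low-confidence noise. Filtering before returning them keeps the downstream drawing code simple; a sketch of mine assuming that result shape (check recognizejs's actual output format before relying on it):

```javascript
// Keep only detections above a confidence threshold, highest score first.
function filterDetections(results, minScore = 0.5) {
  return results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```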

At this point, in the stream-pulling js file, push each acquired frame to these two sockets:

ffmpeg.stdout.on('data', function (data) {
  // only feed one image at a time to each recognizer
  if (!isFaceInDet) {
    isFaceInDet = true;
    // write the frame to every connected face-recognition client,
    // skipping slots nulled out by disconnects
    clientArr.forEach((val) => {
      if (val) val.write(data);
    });
  }
  if (!isObjInDet) {
    isObjInDet = true;
    clientOgjArr.forEach((val) => {
      if (val) val.write(data);
    });
  }
  const frame = Buffer.from(data).toString('base64');
  console.log(frame);
});

At this point, every camera frame and its recognition results are available. They can be sent to a web page over WebSocket, and the page then draws each frame plus the recognition data onto a canvas. With that, the most basic version of the effect is achieved.
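For pushing frames to the page, the nodejs-websocket dependency from the package.json keeps an array of connections, and slots may become stale once a client drops. A broadcast helper sketch of my own (the `sendText` method is what nodejs-websocket connections expose; the function name and shape are mine):

```javascript
// Send one base64-encoded frame to every live websocket-like connection.
// Returns how many clients actually received it.
function broadcastFrame(conns, payloadBase64) {
  let sent = 0;
  for (const conn of conns) {
    if (!conn) continue; // disconnected slots may be nulled out
    try {
      conn.sendText(payloadBase64);
      sent++;
    } catch (e) {
      // a connection that died mid-write shouldn't kill the stream loop
    }
  }
  return sent;
}
```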

Of course, I don't feel comfortable putting this on a public web page, since privacy is involved, so you can wrap it in a shell such as react-native or weex. Weex kept flickering when I used it and the result was poor; I also tried react-native's canvas, whose performance was too weak and froze after running a while. Running it directly in a WebView works better.

Finally

My original plan was to take this further, keep studying tfjs, and build monitoring and early warning for my hut.

For the early warning, you only need to add detection of your own face to the face-recognition step and send yourself a message whenever someone who isn't you is recognized. That gives you a simple monitoring guard.
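For the "is this me?" check, face-api.js represents each face as a numeric descriptor, and two faces are conventionally treated as the same person when the Euclidean distance between descriptors is below roughly 0.6. A sketch of that comparison (the threshold and the descriptor idea come from face-api.js conventions; the functions themselves are my own illustration):

```javascript
// Euclidean distance between two equal-length numeric descriptors.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// True when a detected descriptor is close enough to my stored reference descriptor.
function isMe(referenceDescriptor, detectedDescriptor, threshold = 0.6) {
  return euclideanDistance(referenceDescriptor, detectedDescriptor) < threshold;
}
```

Anything for which `isMe` returns false would trigger the notification.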

I had also wanted to build a robot that responds to simple commands, similar to Xiao Ai, but the hardware on Taobao is basically tied to one vendor's platform or another, with no room for custom programming, so for now I can only build a simple conversational little robot.

Also, the recognition results could be composited onto the frames in Node.js first and then pushed back through ffmpeg as an RTMP stream; an audio stream could be mixed in at the same time, which would make the effect even better. But I have no more brain cells to burn for now, and the current setup is enough for everyday use~. If I find the energy later, the little robot should still be doable; I've scoped out most of the technology stack, my brain just hasn't caught up.

Tags: Javascript node.js Machine Learning Deep Learning face recognition

Posted by ludjer on Fri, 29 Apr 2022 15:52:37 +0300