Backing up CSDN blog posts with Python + shell, part 3: backing up pictures

Previously, we backed up all the blog posts. However, the pictures in the blog are still scattered everywhere. Some are on third-party websites, some are on CSDN servers, and some directly refer to pictures from other places.

A few days ago, I wrote a blog post, "Creating your own blog gallery with github, python 3 and MWeb", in which we set up our own local image hosting service that pushes all the pictures to GitHub. By using GitHub's raw addresses, we get a good image host with no limit on the number of pictures.

So let's keep tinkering: upload all the pictures in my blog, and replace every picture address in the articles with its GitHub address.

Let's do it. First, think through the approach:

Overall idea for backing up pictures

  1. Loop through each line of each blog post document, find all picture paths, and save them to a list.
  2. Use the shell to loop through the list and download all the picture files.
  3. Write another script that loops over all the picture files and posts them to my image hosting service. In addition, the image file name and the returned file name are used to generate a dictionary for the next step.
  4. Loop through each line of each blog post again, and replace the original image path with the new path, using the dictionary generated in the previous step.

Any complex problem can be solved after careful analysis.

Script to find all picture paths

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import os

# Scan one markdown file and append every image URL to imgUrl.txt
def saveImg (mdFile):
    print(mdFile)
    with open(mdFile, 'r', encoding="utf-8") as mdTxt, \
         open('imgUrl.txt', 'a', encoding="utf-8") as out:
        for line in mdTxt:
            if '![' in line and '](' in line:
                # Take the text between '](' and the next ')'
                imgUrl = line.split('](')[1].split(')')[0]
                print('\t' + imgUrl)
                out.write(imgUrl + '\n')

# Collect every .md file in ./markdown/ and process each one
def findMdFile ():
    sdir = './markdown/'
    res = []
    for f in os.listdir(sdir):
        fp = os.path.join(sdir, f)
        if fp.endswith('.md'):
            res.append(fp)

    for i in res:
        saveImg(i)

findMdFile()

OK, this script saves all the image paths to the text file imgUrl.txt.
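One thing the simple split cannot handle is a line that contains more than one picture. As a rough sketch (not what the script above does), a regular expression can pull every URL out of a line. The names IMG_RE and find_img_urls are made up for this example, and the pattern assumes the URL itself contains no spaces or parentheses.

```python
import re

# Matches ![alt](url) or ![alt](url "title"); group 1 is the URL only.
# Assumption: the URL contains no whitespace and no parentheses.
IMG_RE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)')


def find_img_urls(line):
    """Return every image URL found in one line of Markdown."""
    return IMG_RE.findall(line)


if __name__ == '__main__':
    sample = 'intro ![a](http://example.com/1.png) and ![b](http://example.com/2.jpg "title")'
    for url in find_img_urls(sample):
        print(url)
```

If you swap this into saveImg, you would loop over find_img_urls(line) and write each result to imgUrl.txt.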

Download all pictures

I tried to download them in Python, but I always got a 403 response, so I gave up on that.
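I did not dig into the cause, but a 403 on picture downloads is often the server rejecting requests that do not look like a browser (missing User-Agent or Referer). That is only a guess for this case. If you want to stay in Python, a minimal sketch under that assumption could look like this; build_request and download are names made up here, and the Referer value is a placeholder you would need to adjust.

```python
import urllib.request


def build_request(url, referer='https://blog.csdn.net/'):
    """Build a request with browser-like headers.

    Assumption: the 403 comes from a User-Agent or Referer check.
    The default referer is only a placeholder.
    """
    headers = {
        'User-Agent': 'Mozilla/5.0',
        'Referer': referer,
    }
    return urllib.request.Request(url, headers=headers)


def download(url, target_path):
    """Fetch one URL and write the response body to target_path."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        data = resp.read()
    with open(target_path, 'wb') as out:
        out.write(data)
```

If the server checks something else (cookies, signed URLs), this will still fail, and the curl loop is the simpler route.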

Create an img folder dedicated to storing pictures, then create a new script file inside it (the script reads ../imgUrl.txt, so it is meant to run from the img folder). Enter the following contents:

# Read one URL per line and download it into the current folder
while IFS= read -r i; do
  curl -O "$i"
  sleep 1
done < ../imgUrl.txt

After more than ten minutes of waiting, all the pictures have been downloaded.

The specific time depends on the number of pictures in your blog.
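Because every picture lands in the same folder under the name of the remote file, two different URLs that end in the same file name would overwrite each other. A quick check before downloading can be sketched like this; local_name and find_collisions are hypothetical helpers, and the check assumes the saved name is the last path segment of the URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import posixpath


def local_name(url):
    """Last path segment of the URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


def find_collisions(urls):
    """Map each file name shared by more than one distinct URL to those URLs."""
    by_name = defaultdict(set)
    for url in urls:
        by_name[local_name(url)].add(url)
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}


if __name__ == '__main__':
    with open('imgUrl.txt', encoding='utf-8') as f:
        urls = [line.strip() for line in f if line.strip()]
    for name, group in find_collisions(urls).items():
        print(name, group)
```

An empty result means every URL in imgUrl.txt maps to its own file name.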

Upload pictures to my image hosting service

Here I use my own solution. If you use a third-party image host, you can adapt the following script to your needs.

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import os
import imghdr  # note: deprecated since Python 3.11 and removed in 3.13
import requests as req


# Find all the pictures in the source directory and return them as a list
def findImg(sdir):
    res = []
    for f in os.listdir(sdir):
        fp = os.path.join(sdir, f)
        if not os.path.isdir(fp):
            if imghdr.what(fp):
                res.append(fp)
    return res

# Append one line of text to a file
def appendLine(path, text):
    with open(path, 'a', encoding='utf-8') as f:
        f.write(text + '\n')

# Upload every picture; record "old path <TAB> new path" in imgDict.txt
def upImg ():
    imgs = findImg('./img/')
    for i in imgs:
        with open(i, 'rb') as imgFile:
            files = {'file': ('imgName', imgFile, 'image/jpeg')}
            r = req.post('http://localhost:7000/upimg', files=files)
        rJson = r.json()
        if rJson['status'] == 0:
            rPath = rJson['data']['path']
            appendLine('imgDict.txt', i + '\t' + rPath)
            print('Succ: ' + i + ' | ' + rPath)
        else:
            # Keep failed uploads in a separate file for a later retry
            appendLine('imgErr.txt', i)
            print('upErr: ' + i)

upImg()

This script uploads all the pictures to my image hosting service. It also generates a dictionary file, imgDict.txt, which maps the old picture addresses to the new ones.
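If you adapt the upload to a third-party host, the part most likely to change is how the response is read. As a sketch, that step can be isolated in one small function. The response shape below (status 0 plus data.path) is the one my own service returns; parse_upload_response is a name invented for this example.

```python
import json


def parse_upload_response(text):
    """Return the stored path on success, or None on any failure.

    Assumes a JSON body shaped like {"status": 0, "data": {"path": "..."}}.
    """
    try:
        payload = json.loads(text)
    except ValueError:
        # Not JSON at all, for example an HTML error page
        return None
    if not isinstance(payload, dict) or payload.get('status') != 0:
        return None
    data = payload.get('data')
    if not isinstance(data, dict):
        return None
    return data.get('path')
```

With this in place, a non-JSON error page from the server ends up in the error list instead of crashing the loop halfway through.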

Replace the old picture address in all blog posts with the new picture address

The script above outputs a tab-separated dictionary. For convenience, I batch-converted it into a list of tuples and renamed the file to imgDict.py, so that it can be imported by the following script.

The dictionary format is DICT = [('oldname', 'newpath'), ('oldname', 'newpath')]
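I did that conversion by hand in an editor, but it can also be scripted. A minimal sketch, assuming each line of imgDict.txt holds an old name and a new path separated by one tab; load_pairs and to_module_source are hypothetical helper names.

```python
def load_pairs(lines):
    """Turn 'old<TAB>new' lines into a list of (old, new) tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip('\n')
        if '\t' not in line:
            # Skip blank or malformed lines
            continue
        old, new = line.split('\t', 1)
        pairs.append((old, new))
    return pairs


def to_module_source(pairs):
    """Render the pairs as Python source that defines DICT."""
    return 'DICT = ' + repr(pairs) + '\n'


if __name__ == '__main__':
    with open('imgDict.txt', encoding='utf-8') as src:
        pairs = load_pairs(src)
    with open('imgDict.py', 'w', encoding='utf-8') as dst:
        dst.write(to_module_source(pairs))
```

Note that the old side will be whatever the upload script wrote, so you may still need to trim it down to the bare file name.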

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import os
import imgDict

imgFix = 'https://raw.githubusercontent.com/fengcms/articles/master/image'

DICT = imgDict.DICT


# Return the line with its image replaced by the new address, if one matches
def reImg (line):
    if '![' in line and '](' in line:
        for i in DICT:
            if i[0] in line:
                # Keep the trailing newline so lines do not run together
                return '![](' + imgFix + i[1] + ')\n'
    return line

# Rewrite one markdown file into ./calcMarkdown/, line by line
def saveImg (mdFile):
    mdName = os.path.basename(mdFile)
    print(mdName)
    tarFile = './calcMarkdown/' + mdName
    with open(mdFile, 'r', encoding="utf-8") as mdTxt, \
         open(tarFile, 'w', encoding="utf-8") as out:
        for line in mdTxt:
            out.write(reImg(line))

# Collect every .md file in markdown/ and process each one
def findMdFile ():
    sdir = 'markdown/'
    os.makedirs('./calcMarkdown/', exist_ok=True)
    res = []
    for f in os.listdir(sdir):
        fp = os.path.join(sdir, f)
        if fp.endswith('.md'):
            res.append(fp)

    for i in res:
        saveImg(i)

findMdFile()

OK, through the above script, I successfully replaced all the image paths in all blog posts with new image paths and saved them to the calcMarkdown directory.
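One limitation of this approach is that a matching line is replaced as a whole, so the alt text and anything else on that line is lost. If that matters for your posts, a regular expression can swap only the URL. This is a sketch rather than what I actually ran: replace_img_urls is a made-up name, and the pattern assumes URLs without spaces or parentheses.

```python
import re

# Group 1: '![alt](', group 2: the URL, group 3: optional title plus ')'
IMG_RE = re.compile(r'(!\[[^\]]*\]\()([^)\s]+)([^)]*\))')


def replace_img_urls(line, pairs, prefix):
    """Replace only the URL part of each image whose URL contains an old name."""
    def swap(match):
        url = match.group(2)
        for old_name, new_path in pairs:
            if old_name in url:
                return match.group(1) + prefix + new_path + match.group(3)
        # No match in the dictionary: leave the image untouched
        return match.group(0)
    return IMG_RE.sub(swap, line)


if __name__ == '__main__':
    pairs = [('a.png', '/2022/a.png')]  # hypothetical dictionary entry
    line = 'See ![shot](http://example.com/img/a.png) here\n'
    print(replace_img_urls(line, pairs, 'https://example.com/image'), end='')
```

Here pairs has the same shape as DICT and prefix plays the role of imgFix.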

My shell skills are not great, but I feel the last step still takes too much Python code. It could probably be done in two or three lines of shell. However, the goal has been achieved, and I am too lazy to look into it further.

In fact, for readers, the point is not what I wrote, but that all of this is fairly basic Python code. I hope it is helpful.

This article was originally written by FungLeo. It may be reproduced, but the link to the original publication must be retained.

Posted by echelon2010 on Thu, 05 May 2022 20:17:13 +0300