python-based method to copy the corresponding information in an excel file to another out of order excel

foreword

I wanted to write this because I received a task from the teacher today. The teacher gave me two folders. One of the folders has the candidates’ scores and candidates’ names, and the other has the candidates’ names, but they are out of order. What I have to do is to write the scores of these people into a new excel document one by one, but it is too wasteful of eyes, and I am lazy, so I thought of the python method, and it is estimated that there will be many such cases in the future, so Record it, maybe it's useful

refer to

This article thanks the following blogs for reference:

  1. How openpyxl writes dictionary type data into Excel
  2. Python3 dictionary
  3. How to iterate over keys and values ​​in a dictionary in python
  4. python operation xlsx format file
  5. How python pandas reads Excel data and rewrites it to the Excel file according to the specified format
  6. Use python to read and write data in xlsx format [pandas]

There is also a blog I wrote before (the same can be used under windows)
Change the default download source of pip under linux system

thinking logic

First of all, I changed the original excel file, that is, created a new one. The original file contains student number, sorting, name, etc., but because I only need grades and names, I copied grades and names. The column is listed in a new document, that is to say, the excel of my operation below has only two columns, the first column is the name, and the second column is the grade (the one that needs to be imported is empty, but some people already have grades , but does not affect)

My thinking is as follows:

  1. Open the file with existing grades in python, read the data, and convert it into a dictionary
  2. Open the empty grade file also in python, and read the data, I replaced the part without grade with -1, and converted it into a dictionary
  3. Replace the value of the dictionary with grades with the value of the dictionary without grades, of course the value of the corresponding key
  4. Generate the new dictionary into an excel file
  5. Just copy the contents of the new excel file directly to the ungraded excel file, because the order is right, there is nothing wrong with direct copying

Code explanation

Third-party modules that need to be installed

  • openpyxl
  • pandas
  • numpy

Code Breakdown Instructions

import required modules

import openpyxl
import pandas as pd
import numpy as np

read the corresponding file

Use the pandas package to read, here I personally understand the parameters as follows:
The meaning of sheet_name=0 is to return the entire table. If you want to combine multiple tables, it seems to be other parameters, but I have not studied it carefully here.
header=0 means to read the header, the first line is the header
usecols means to read by column, 0 and 1 are to read the first and second columns
The latter is to convert the read things into np format, which is convenient for operation.

fileone = pd.read_excel("newinput.xlsx",sheet_name=0,header=0,usecols=[0,1])
filetwo = pd.read_excel("newresult.xlsx",sheet_name=0,header=0,usecols=[0,1])
raw_data = np.array(fileone)
true_data = np.array(filetwo)

read file content

Here I read it in the form of a dictionary. The first is to read the part of the list with results, which is pro_data. There is nothing to say about the assignment. The following is a list of no grades. If there are no grades, pandas will read a value of nan. Therefore, the np method is used to judge whether it is nan. If it is, it will give a value of -1.

pro_data = {}    # Contents of excel file with grades
for i in range(len(raw_data)):
    pro_data[raw_data[i][0]] = raw_data[i][1]

res_data = {}    # Contents of excel file without grades
for i in range(len(true_data)):
    if np.isnan(true_data[i][1]):
        true_data[i][1] = -1
    res_data[true_data[i][0]] = true_data[i][1]

Grade entry

The essence of this part is to assign the value of a dictionary to another dictionary, and it is also the main part of human work, that is, retrieval, which can be done with two nested for loops in python. When the same key is retrieved (that is, a= =c), will be assigned to value

for a,b in pro_data.items():
    for c,d in res_data.items():
        if a == c:
            res_data[c] = b

Export of new dictionaries

grade = openpyxl.Workbook()
gradee = grade.active
gradee.title = 'test'   # excel file name
n = 1
for key,value in res_data.items():
    gradee['A' + str(n)] = key   # This A is the meaning of the corresponding cell
    gradee['B'+ str(n)] = value
    n = n + 1

grade.save('test.xlsx')

Complete code, easy to copy

import openpyxl
import pandas as pd
import numpy as np

fileone = pd.read_excel("newinput.xlsx",sheet_name=0,header=0,usecols=[0,1])
filetwo = pd.read_excel("newresult.xlsx",sheet_name=0,header=0,usecols=[0,1])

raw_data = np.array(fileone)

pro_data = {}
for i in range(len(raw_data)):
    pro_data[raw_data[i][0]] = raw_data[i][1]

true_data = np.array(filetwo)

res_data = {}
for i in range(len(true_data)):
    if np.isnan(true_data[i][1]):
        true_data[i][1] = -1
    res_data[true_data[i][0]] = true_data[i][1]

print(res_data)

for a,b in pro_data.items():
    for c,d in res_data.items():
        if a == c:
            res_data[c] = b

print(res_data)
print(pro_data)

grade = openpyxl.Workbook()
gradee = grade.active
gradee.title = 'test'
n = 1
for key,value in res_data.items():
    gradee['A' + str(n)] = key
    gradee['B'+ str(n)] = value
    n = n + 1

grade.save('test.xlsx')

Summarize

python is cool
Fire, Burglary and Fraud

Tags: Python Data Analysis pandas

Posted by Orkan on Wed, 12 Oct 2022 01:26:16 +0300