Python full stack series 44 - basic use of pymongo


Depending on the business and operational scenario, a few commonly used databases are unavoidable. Fortunately, Python has a client package for each of them:

Database | Python package | Use
mysql    | pymysql        | Stores relatively static master data, such as user information and reports.
mongo    | pymongo        | Stores unstructured information, including log-level data; for example, the JSON sent by a user.
redis    | redis          | Handles large volumes of high-frequency data access, and can also serve as a simple message queue.
neo4j    | py2neo         | Stores structural information (meta-information), such as the structure of an algorithm, the organization of a website, or a network of relationships.

This post is mainly a review of mongo.

Installation and startup

I once installed it manually on the Mac. I have forgotten the details, but it should be very simple. I will use Docker in the future, so I will skip the installation here.

Startup on mac:
Foreground startup (output is shown in the terminal)

mongod --dbpath ~/data/db

Start in the background. Note two pitfalls here: 1. The log path must point to a file, not a directory; 2. The paths must be absolute and cannot be abbreviated with ~ as in the command above.

mongod  --dbpath=YOURABSPATH/data  --logpath=YOURABSPATH/data/log/mongodb.log --fork
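
The two pitfalls can be checked before the command is run. The following is a minimal sketch, not part of the original setup: the helper name build_mongod_command is hypothetical, and it only assembles the same three options used above (--dbpath, --logpath, --fork) after expanding both paths to absolute form.

```python
from pathlib import Path


def build_mongod_command(data_dir, log_file):
    """Return the argument list for starting mongod in the background.

    Both paths are expanded to absolute form, and the log path is
    rejected if it points at an existing directory.
    """
    data_path = Path(data_dir).expanduser().resolve()
    log_path = Path(log_file).expanduser().resolve()
    if log_path.is_dir():
        raise ValueError(f"--logpath must be a file, not a directory: {log_path}")
    return [
        "mongod",
        f"--dbpath={data_path}",
        f"--logpath={log_path}",
        "--fork",
    ]
```

The returned list could then be handed to subprocess.run(..., check=True); building a list rather than a shell string avoids quoting problems when a path contains spaces.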

Then enter a few basic commands in the mongo shell (no semicolons required)

        > show dbs            | my data is in the baidu database
        > use baidu
        > show collections
        > db.gps.find()       | query all data

mongo does not set a user and password by default. If you need to set them, you can refer to this article.
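
Once a user and password exist, the client has to pass them when connecting. A minimal sketch, assuming the credentials are supplied through a mongodb:// connection string: the helper name build_mongo_uri and the example credentials are hypothetical, and the only point shown is that special characters in the user name or password must be percent-escaped.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017):
    """Build a mongodb:// URI with percent-escaped credentials."""
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be read as URI delimiters.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
```

The resulting string can be passed as the single argument to MongoClient, for example MongoClient(build_mongo_uri('andy', 'p@ss/word')).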

I have done the installation on Ubuntu before, so here are the steps first

  • 1 Install the service: sudo apt-get install mongodb (refer to this article)
  • 2 Start: service mongodb start (starts in the background)
  • 3 Check: pgrep mongo -l
  • 4 Stop: service mongodb stop
  • 5 Modify remote access: sudo vi /etc/mongodb.conf
    • bind_ip changed to
    • A consideration: personal data should not be persisted on the server. If the program runs on the server, the data needs to be sent to a message queue and stored locally.
  • 6 Restart: service mongodb restart

Access via pymongo

from pymongo import MongoClient
from datetime import datetime
conn =  MongoClient('localhost', 27017)

# 1 Select a database. Connect to the my1 database; if it does not exist, it is created automatically on the first write
db = conn.my1 # You can also use conn['my1']
# 2 Select a collection (equivalent to selecting a table). Use the coll collection; if it does not exist, it is created automatically on the first write
test_coll = db['coll']
# 3 Make a piece of data for testing

test_dict1 = {}
test_dict1['name'] = 'andy'
test_dict1['occupation'] = 'Data Scientist'
test_dict1['age'] = 111
test_dict1['opr_ts'] = datetime.now()  # assumption: the original value was lost; the current time is used here
# Add / insert_one and insert_many
res_create = test_coll.insert_one(test_dict1)

# delete / delete_one and delete_many
# res_delete = test_coll.delete_one({'name':'andy'})

# Update / update_one and update_many
# The $set operator updates only the fields given in the dictionary;
# other fields are neither updated nor deleted. Without $set, the whole document is replaced and the other existing fields are deleted.

test_dict2 = {}
test_dict2['age'] = 222
# update without $set replaces the entire document | deprecated and removed in PyMongo 4, so no longer recommended
# update_one with $set changes only the listed fields
# res_update = test_coll.update({'name': 'andy'}, test_dict2)
res_update = test_coll.update_one({'name':'andy'}, {'$set':test_dict2})
# res_update reports whether the write was acknowledged, how many documents matched, how many were modified, etc.
# You can refer to this link, which contains a lot of practical content
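
The update step can be wrapped in a small helper that also unpacks the result. This is only a sketch: set_fields is a hypothetical name, and coll is assumed to be a pymongo collection such as test_coll, whose update_one returns an object with acknowledged, matched_count and modified_count attributes.

```python
def set_fields(coll, query, fields):
    """Update only the given fields of the first document matching query."""
    # Wrapping the fields in $set leaves every other field untouched.
    result = coll.update_one(query, {'$set': fields})
    return {
        'acknowledged': result.acknowledged,
        'matched': result.matched_count,
        'modified': result.modified_count,
    }
```

For example, set_fields(test_coll, {'name': 'andy'}, {'age': 222}) changes only age. A matched count of 1 with a modified count of 0 means the document was found but already held that value.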

# Find: there are many find methods, such as find_one, find_one_and_delete, ... (use the editor's auto-completion to discover the rest)
# res_find = test_coll.find_one({'age':{'$gt':30}})

res_find = test_coll.find({'age': {'$gt': 30}}) # find returns a cursor; wrap it in list() to display the results
# datetime values can also be converted to timestamps with the timestamp() method
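
The timestamp conversion mentioned in the last comment can be sketched as follows. The helper names are hypothetical, and newer_than_filter assumes the queried field stores numeric timestamps rather than datetime objects.

```python
from datetime import datetime, timezone


def to_timestamp(dt):
    """Convert a datetime to seconds since the epoch (a float)."""
    # A naive datetime is interpreted as local time by timestamp().
    return dt.timestamp()


def from_timestamp(ts):
    """Convert seconds since the epoch back to a UTC-aware datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def newer_than_filter(field, dt):
    """Build a filter for documents whose numeric timestamp field is after dt."""
    return {field: {'$gt': to_timestamp(dt)}}
```

Such a filter is passed to find() in the same way as the age filter above.
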
Summary of comparison operators:
1. $lt   less than                 {'age': {'$lt': 20}}
2. $gt   greater than              {'age': {'$gt': 20}}
3. $lte  less than or equal to     {'age': {'$lte': 20}}
4. $gte  greater than or equal to  {'age': {'$gte': 20}}
5. $ne   not equal to              {'age': {'$ne': 20}}
6. $in   in the given list         {'age': {'$in': [20, 23]}}
7. $nin  not in the given list     {'age': {'$nin': [20, 23]}}
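
Several comparison operators can sit in the same condition dictionary, which is how a range is expressed. A minimal sketch with a hypothetical helper name, range_filter, that builds a half-open range (low inclusive, high exclusive):

```python
def range_filter(field, low=None, high=None):
    """Build a filter matching low <= field < high; either bound may be omitted."""
    conditions = {}
    if low is not None:
        conditions['$gte'] = low
    if high is not None:
        conditions['$lt'] = high
    # With no bounds at all, the empty filter {} matches every document.
    return {field: conditions} if conditions else {}
```

For example, range_filter('age', 20, 30) produces {'age': {'$gte': 20, '$lt': 30}}, which can be passed directly to find().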

Symbol  | Meaning                     | Example                                            | Example meaning
$regex  | Regular expression match    | {'name': {'$regex': '^M.*'}}                       | name begins with M
$exists | Whether the property exists | {'name': {'$exists': True}}                        | the name property exists
$type   | Type check                  | {'age': {'$type': 'int'}}                          | age is of type int
$mod    | Modulo operation            | {'age': {'$mod': [5, 0]}}                          | age modulo 5 equals 0
$text   | Text query                  | {'$text': {'$search': 'Mike'}}                     | a text-type attribute contains the string Mike
$where  | Advanced conditional query  | {'$where': 'obj.fans_count == obj.follows_count'}  | the fans count equals the follows count
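
When the prefix for a $regex query comes from user input, its special characters should be escaped first. A minimal sketch with a hypothetical helper name, starts_with_filter; it assumes that the escaping produced by Python's re.escape is also valid for the regular expression engine MongoDB uses, which holds for common punctuation such as the dot.

```python
import re


def starts_with_filter(field, prefix):
    """Build a $regex filter matching values of field that begin with prefix."""
    # re.escape turns characters such as '.' into literals, so the
    # prefix is matched as plain text rather than as a pattern.
    return {field: {'$regex': '^' + re.escape(prefix)}}
```

For the example in the table, starts_with_filter('name', 'M') gives a filter equivalent to {'name': {'$regex': '^M.*'}}, since the trailing .* does not change which names match.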

# Sort
# Call the sort method and pass in the field to sort by and the ascending or descending flag. For example:
import pymongo  # needed for pymongo.ASCENDING
results = test_coll.find().sort('name', pymongo.ASCENDING)
# Offset: to take only some of the elements, use the skip() method; for example, skip(2) ignores the first 2 elements and returns the third and later ones.
results = test_coll.find().sort('name', pymongo.ASCENDING).skip(2)

# Get the values by looping over the cursor
[result['name'] for result in results]
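
Sorting and skipping combine naturally into paging. The following is a sketch under stated assumptions: page_of_names is a hypothetical helper, coll is assumed to be a pymongo collection in which every document has a name field, and the two constants simply mirror the values of pymongo.ASCENDING and pymongo.DESCENDING so that the block has no dependency of its own.

```python
ASCENDING = 1    # same value as pymongo.ASCENDING
DESCENDING = -1  # same value as pymongo.DESCENDING


def page_of_names(coll, page, page_size, direction=ASCENDING):
    """Return the names on one zero-based page, sorted by name."""
    cursor = (
        coll.find({}, {'name': 1, '_id': 0})  # project only the name field
        .sort('name', direction)
        .skip(page * page_size)               # jump over the earlier pages
        .limit(page_size)                     # keep one page of results
    )
    return [doc['name'] for doc in cursor]
```

For example, page_of_names(test_coll, 1, 10) returns the 11th to 20th names. For very large offsets skip() becomes slow, because the server still walks over the skipped documents.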

1. Conditional delete
2. Delete all data
3. Delete the collection
4. Delete the entire database
show dbs;
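
The four kinds of deletion listed above can be sketched with pymongo as follows. The function names are hypothetical; coll is assumed to be a pymongo collection and client a MongoClient, and only the delete_many, drop and drop_database methods are used.

```python
def delete_by_condition(coll, query):
    """1. Conditional delete: remove every document matching query."""
    return coll.delete_many(query).deleted_count


def delete_all_documents(coll):
    """2. Delete all data but keep the (now empty) collection."""
    # The empty filter {} matches every document.
    return coll.delete_many({}).deleted_count


def drop_collection(coll):
    """3. Delete the collection itself, including its indexes."""
    coll.drop()


def drop_database(client, db_name):
    """4. Delete the entire database."""
    client.drop_database(db_name)
```

With the objects from the earlier listing, drop_database(conn, 'my1') removes the whole my1 database, after which show dbs in the shell no longer lists it.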

After mongodb deletes a collection, the disk space is not released; it is released only after a repair with db.repairDatabase().
However, if mongodb crashes during the repair, it cannot be started again until the data has been repaired.
In that case you can use ./mongod --repair --dbpath=/data/mongo/.
If the database is kept in a separate folder, point dbpath at the database to be repaired; the repair may take a long time.
While db.repairDatabase() is running, all reads and writes must be stopped, and mongodb should have a standby machine.
Otherwise, do not run db.repairDatabase() casually. Note also that db.repairDatabase() was removed in MongoDB 4.2, so this advice applies to older versions.

That covers the basic operations; later posts will combine them with practical use.


I have also done an offline TF-IDF calculation before and saved the results with mongo; I will write it up when I have time.

to be continued…

Posted by ajpasetti on Mon, 09 May 2022 13:48:51 +0300