Core technologies and algorithms of Python natural language processing -- Chinese syntactic analysis based on PCFG

This chapter of the book is a little thin, but the author explains that this is an introductory, practice-oriented NLP book and that syntactic analysis is a higher-level NLP problem, so it is not covered in depth. I am also an NLP beginner working through this book; after finishing it, I plan to study statistical natural lang ...
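
The post deals with Chinese parsing, but as a quick illustration of the PCFG idea itself, here is a minimal sketch using NLTK's ViterbiParser on a toy English grammar; the grammar and sentence are invented for illustration and are not taken from the book.

from nltk import PCFG, ViterbiParser

# A toy probabilistic context-free grammar: each rule carries a probability,
# and the probabilities of rules sharing a left-hand side sum to 1.
grammar = PCFG.fromstring("""
    S   -> NP VP            [1.0]
    NP  -> Det N [0.6] | N [0.4]
    VP  -> V NP             [1.0]
    Det -> 'the'            [1.0]
    N   -> 'cat' [0.5] | 'fish' [0.5]
    V   -> 'eats'           [1.0]
""")

parser = ViterbiParser(grammar)
for tree in parser.parse(['the', 'cat', 'eats', 'fish']):
    tree.pretty_print()   # the most probable parse tree
    print(tree.prob())    # its probability under the PCFG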

Posted by forcerecon on Tue, 24 May 2022 12:42:37 +0300

Analysis of named entity recognition task code based on LSTM/BLSTM/CNN-BLSTM -- 1

The original code comes from GitHub: https://github.com/OustandingMan/LSTM-CRF. However, the corpus-reading part is not a ready-made template but code you write yourself to load the data. My understanding of deep learning: data processing: turning the data into a format the network can read; network construction: vari ...
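
Since the corpus reader is hand-written rather than a template, here is a hypothetical reader for a CoNLL-style file (one "token tag" pair per line, blank line between sentences); the actual format used by the LSTM-CRF repository may differ.

def read_corpus(path):
    """Read a CoNLL-style NER file into (sentences, tag_sequences).
    Hypothetical format: one 'token tag' pair per line, blank lines
    separating sentences; adapt the column logic to the real data."""
    sentences, tag_seqs = [], []
    words, tags = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:                  # blank line = sentence boundary
                if words:
                    sentences.append(words)
                    tag_seqs.append(tags)
                    words, tags = [], []
                continue
            token, tag = line.split()[:2]
            words.append(token)
            tags.append(tag)
    if words:                             # file may not end with a blank line
        sentences.append(words)
        tag_seqs.append(tags)
    return sentences, tag_seqs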

Posted by FraggleRock on Mon, 16 May 2022 16:11:02 +0300

NLP05: Sentiment Classification Based on CNN-LSTM

Official account: Notes on Data Mining and Machine Learning. Sentiment classification with CNN-LSTM; here it is a binary classification model. Overall it is divided into the following steps: environment and parameter settings; data preprocessing; model/network construction and training; model usage. 1. Environment and parameter setti ...
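
As a sketch of the model-construction step, here is a minimal CNN-LSTM binary sentiment model in Keras; the vocabulary size, sequence length, and layer sizes are placeholders, not the post's actual settings.

from tensorflow.keras import layers, models

vocab_size, max_len, embed_dim = 20000, 200, 128  # placeholder hyperparameters

model = models.Sequential([
    layers.Input(shape=(max_len,)),                # padded integer word ids
    layers.Embedding(vocab_size, embed_dim),       # learn word vectors
    layers.Conv1D(64, kernel_size=5, activation='relu'),  # local n-gram features
    layers.MaxPooling1D(pool_size=4),              # shorten the sequence
    layers.LSTM(64),                               # model longer-range dependencies
    layers.Dense(1, activation='sigmoid'),         # binary sentiment output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()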

Posted by hkothari on Mon, 16 May 2022 12:42:47 +0300

NLP learning - NLP in practice: text classification - Chinese spam classification - Python 3

1. Implementation steps of text classification: definition stage: define the data and the classification scheme, i.e. which categories are used and which data are needed; data preprocessing: segment the documents into words and remove stop words; feature extraction: reduce the dimensionality of the document matrix and extract the most useful featu ...
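
As an illustration of these steps, here is a minimal jieba + scikit-learn pipeline on two toy sentences; the post's own corpus, features, and classifier may differ, and stop-word removal is omitted for brevity.

import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["恭喜您中奖了，请点击链接领取", "明天上午十点开会，请准时参加"]  # toy examples
labels = [1, 0]  # 1 = spam, 0 = ham

def tokenize(text):
    # jieba segmentation, re-joined with spaces so TfidfVectorizer can split it
    return " ".join(jieba.cut(text))

pipeline = make_pipeline(
    TfidfVectorizer(),   # TF-IDF features over the segmented text
    MultinomialNB(),     # a simple, strong baseline for text classification
)
pipeline.fit([tokenize(t) for t in texts], labels)
print(pipeline.predict([tokenize("点击链接即可免费领取大奖")]))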

Posted by lucerias on Mon, 16 May 2022 00:57:33 +0300

PyTorch collate_fn for variable-length batches - dynamic padding

Note: "batch" here means mini-batch. There are two ways to batch sequences (text, logs). Fixed-length (uniform-length) batches: every sequence in every batch has the same length. For example, with seqs = [[1,2,3,3,4,5,6,7], [1,2,3], [2,4,1,2,3], [1,2,4,1]] and batch_size = 2, the maximum sequence length is 8, and any sequence shorter than 8 is padded ...
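
A minimal sketch of the dynamic-padding alternative with a custom collate_fn, using the example sequences above: each mini-batch is padded only to the length of its own longest sequence, not to a global maximum.

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

seqs = [[1, 2, 3, 3, 4, 5, 6, 7], [1, 2, 3], [2, 4, 1, 2, 3], [1, 2, 4, 1]]

def collate_fn(batch):
    # batch is a list of variable-length sequences from the dataset
    tensors = [torch.tensor(s) for s in batch]
    lengths = torch.tensor([len(s) for s in batch])
    padded = pad_sequence(tensors, batch_first=True, padding_value=0)
    return padded, lengths

loader = DataLoader(seqs, batch_size=2, collate_fn=collate_fn)
for padded, lengths in loader:
    print(padded.shape, lengths)  # each batch padded only to its own max length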

Posted by sonic_2k_uk on Sat, 14 May 2022 05:21:32 +0300

TF-IDF model for NLP text keyword extraction: epidemic text data analysis based on jieba word segmentation and wordcloud

TF-IDF model: analysis of epidemic text data based on jieba word segmentation and wordcloud. Recently we did a text data analysis of China's COVID-19 policies; here I introduce the relevant background to summarize and consolidate it, and hopefully help more people. 1. TF-IDF: keyword extraction. Stop words: stop words are words o ...
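
As a sketch of the TF-IDF keyword-extraction step, here is a minimal example using jieba's built-in TF-IDF extractor plus wordcloud; the sample text and font path are placeholders, not the post's data.

import jieba.analyse
from wordcloud import WordCloud

text = "为统筹推进疫情防控和经济社会发展，各地出台了一系列防控政策……"  # placeholder text

# topK keywords ranked by TF-IDF; withWeight=True returns (word, weight) pairs
keywords = jieba.analyse.extract_tags(text, topK=20, withWeight=True)
for word, weight in keywords:
    print(word, round(weight, 4))

# font_path must point to a font containing Chinese glyphs (placeholder here)
wc = WordCloud(font_path="simhei.ttf", width=800, height=400)
wc.generate_from_frequencies(dict(keywords))
wc.to_file("keywords.png")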

Posted by Dasndan on Fri, 13 May 2022 00:46:36 +0300

[Learning the Keras framework from official cases] seq2seq based on character-level LSTM

[Learning the Keras framework from official cases] seq2seq based on character-level LSTM. Keras official case link; TensorFlow official case link; Paddle official case link; PyTorch official case link. Note: this series is only meant to help you understand and learn quickly, so that you can independently use the relevant framework for deep learning research. Please ...
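
For reference, a condensed sketch of a character-level LSTM encoder-decoder in the style of the Keras seq2seq case; the token counts and latent dimension are placeholders, and the training/inference loop is omitted.

from tensorflow.keras import layers, models

num_encoder_tokens, num_decoder_tokens, latent_dim = 70, 90, 256  # placeholders

# Encoder: consume the one-hot source characters, keep only the final LSTM states.
encoder_inputs = layers.Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate target characters conditioned on the encoder states.
decoder_inputs = layers.Input(shape=(None, num_decoder_tokens))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = layers.Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

model = models.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.summary()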

Posted by sarah on Sun, 08 May 2022 07:29:41 +0300

Knowledge graph tracing 1.1 -- starting from NER

Overview of this article: a reproduction of BERT-NER-pytorch from the knowledge graph (KG) open-source project collection, plus some learning notes written after the project that should be useful for fellow beginners. Materials: for an introduction to the Transformer used in the BERT model, what absolutely must be shared is Jay Alammar's animations; why didn't I see such a good ...

Posted by MartiniMan on Sun, 08 May 2022 05:09:16 +0300

LSTM loss does not decrease in multi-class classification (PyTorch implementation)

Recently I have been using an LSTM for text classification on the THUCNews dataset. The earlier 10-category news classification with the LSTM model converged normally, which suggests the code itself is not at fault. However, when I expanded the news categories to 14, the loss did not decrease: because I don't know ...
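
As an aside (my own checklist, not necessarily the cause the post eventually identifies): when the category count changes from 10 to 14, one quick sanity check is that the classifier's output size and the label range both match the new number of classes.

import torch
import torch.nn as nn

num_classes = 14
logits = torch.randn(32, num_classes)          # model output for a batch of 32
labels = torch.randint(0, num_classes, (32,))  # labels must lie in [0, num_classes)

assert logits.size(1) == num_classes
assert labels.min() >= 0 and labels.max() < num_classes

loss = nn.CrossEntropyLoss()(logits, labels)   # expects raw logits, not softmax output
print(loss.item())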

Posted by narch31 on Tue, 03 May 2022 07:45:01 +0300

[Hands-on learning PyTorch notes] 36. Transformer implementation

Transformer implementation. Putting together the contents of the previous sections: multi-head attention and positional encoding. import math import pandas as pd import torch from torch import nn from d2l import torch as d2l. Position-wise feed-forward network: the name sounds grand, but it is in fact just a single-hidden-layer MLP. #@save class PositionWi ...
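
The excerpt cuts off at "class PositionWi ...", presumably d2l's PositionWiseFFN; a self-contained sketch of that single-hidden-layer MLP:

import torch
from torch import nn

class PositionWiseFFN(nn.Module):
    """Apply the same two-layer MLP independently at every sequence position."""
    def __init__(self, ffn_num_input, ffn_num_hiddens, ffn_num_outputs):
        super().__init__()
        self.dense1 = nn.Linear(ffn_num_input, ffn_num_hiddens)
        self.relu = nn.ReLU()
        self.dense2 = nn.Linear(ffn_num_hiddens, ffn_num_outputs)

    def forward(self, X):
        # X: (batch, sequence_length, ffn_num_input)
        return self.dense2(self.relu(self.dense1(X)))

# Quick shape check: the FFN changes only the last (feature) dimension.
ffn = PositionWiseFFN(4, 8, 4)
print(ffn(torch.ones(2, 3, 4)).shape)   # torch.Size([2, 3, 4])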

Posted by natbrazil on Mon, 25 Apr 2022 20:16:13 +0300