Here is the code:
from sklearn import model_selection, preprocessing, linear_model, naive_bayes, metrics, svm
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn import decomposition, ensemble
import pandas, xgboost, numpy, textblob, string
from keras.preprocessing import text, sequence
from keras import layers, models, optimizers

# load the dataset
data = open('data/corpus').read()
labels, texts = [], []
for i, line in enumerate(data.split("\n")):
    content = line.split()
    if not content:
        # skip blank lines (e.g. a trailing newline), which would otherwise raise IndexError
        continue
    labels.append(content[0])
    texts.append(" ".join(content[1:]))

# create a dataframe using texts and labels
trainDF = pandas.DataFrame()
trainDF['text'] = texts
trainDF['label'] = labels

# split the dataset into training and validation datasets
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(trainDF['text'], trainDF['label'])

# label encode the ta
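The loading loop above assumes each line of the corpus has the form "label word word ...". As a minimal sketch of that parsing step, using only the standard library, the snippet below does the same split on a small in-memory sample and then maps the string labels to integers. The sample lines, the `__label__` prefix, and the helper names `parse_corpus` and `encode_labels` are hypothetical and not taken from the original; the listing is cut off at its final comment, so the integer mapping is only one plausible way to encode labels, not necessarily what the author did.

```python
def parse_corpus(raw):
    """Split 'label word word ...' lines into parallel label and text lists."""
    labels, texts = [], []
    for line in raw.split("\n"):
        content = line.split()
        if not content:
            # Skip blank lines, e.g. the empty string left by a trailing newline.
            continue
        labels.append(content[0])
        texts.append(" ".join(content[1:]))
    return labels, texts


def encode_labels(labels):
    """Map each distinct label to an integer index, in sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# Hypothetical sample in the assumed format; not taken from data/corpus.
sample = "__label__2 great book\n__label__1 poor quality\n\n__label__2 loved it\n"
labels, texts = parse_corpus(sample)
encoded, classes = encode_labels(labels)
```

Keeping the parsing in a function makes it easy to check on a few lines before pointing it at the full file. The resulting lists can be assigned to the 'text' and 'label' columns in the same way as in the listing above.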
