HuggingFace

Published 2023-05-22 19:23:18 Author: guangheli

Pipeline  

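The pipeline module wraps every step (preprocessing, model inference, postprocessing) behind a single call: pass in the raw input and you get the output back.

Example: masked-word filling. As the output shows, the pipeline function returns 5 candidate answers, ranked by score.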

from transformers import pipeline   

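# No model is named, so the task's default checkpoint is downloaded on first use.
unmasker = pipeline("fill-mask")
# The input must contain the mask token of that checkpoint (here <mask>).
y_pred = unmasker("I love <mask> very much.")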

print(y_pred)  
"""
[
    {'score': 0.09382506459951401, 'token': 123, 'token_str': ' him', 'sequence': 'I love him very much.'}, 
    {'score': 0.06408175826072693, 'token': 47, 'token_str': ' you', 'sequence': 'I love you very much.'}, 
    {'score': 0.056255027651786804, 'token': 69, 'token_str': ' her', 'sequence': 'I love her very much.'}, 
    {'score': 0.017606642097234726, 'token': 106, 'token_str': ' them', 'sequence': 'I love them very much.'}, 
    {'score': 0.016162296757102013, 'token': 24, 'token_str': ' it', 'sequence': 'I love it very much.'}
] 
"""
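Where those five candidates come from can be sketched without downloading a model. The snippet below is only an illustration of the last step, under the assumption that the model has already produced one raw score (logit) per vocabulary token at the masked position: the scores go through a softmax and the most probable tokens are kept. The function name `top_k_candidates` and the numbers in `toy_logits` are made up for this sketch; they are not taken from the library or from the run above.

```python
import math

def top_k_candidates(logits, k=5):
    """Softmax over per-token scores, then keep the k most probable tokens."""
    # Subtract the largest logit before exponentiating, for numerical stability.
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Sort by probability, highest first, and truncate to k entries.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [{"token_str": tok, "score": p} for tok, p in ranked[:k]]

# Hypothetical scores for the <mask> position; a real model scores its whole vocabulary.
toy_logits = {"him": 2.1, "you": 1.7, "her": 1.6, "them": 0.4,
              "it": 0.3, "cats": -1.0, "rain": -2.0}

candidates = top_k_candidates(toy_logits, k=5)
for c in candidates:
    print(c["token_str"], round(c["score"], 3))
```

Because the softmax is taken over the whole vocabulary and only the top entries are shown, the reported scores do not add up to 1, which matches the small values in the real output above.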

  

 

Tokenizer  

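A tokenizer preprocesses the input text by splitting it into tokens. A word may be broken into smaller pieces along the way (for example, dogs into dog + s).

As a rule, the tokenizer and the model that consumes its output should be a matched pair, since different models use different splitting schemes.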
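To make the splitting concrete, here is a toy splitter in the spirit of greedy longest-match subword tokenization (the idea behind WordPiece-style tokenizers). It is a sketch, not the library's implementation: `split_word`, `vocab_a`, and `vocab_b` are hypothetical names and vocabularies invented for this example, and the `##` prefix is used here to mark a piece that continues a word.

```python
def split_word(word, vocab):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation of the same word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word is treated as unknown.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces

# Two hypothetical vocabularies, standing in for two different models.
vocab_a = {"dog", "##s", "love", "##d"}
vocab_b = {"dogs", "love", "##d"}

print(split_word("dogs", vocab_a))  # ['dog', '##s']
print(split_word("dogs", vocab_b))  # ['dogs']
```

The same word comes out as different pieces under the two vocabularies. A model only knows the pieces it was trained with, which is why feeding it the output of another model's tokenizer does not work.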