[Most-ai-contest] Pretrained BERT parameters

Wed Dec 11 15:08:22 CST 2019

Hello everyone,

https://drive.google.com/drive/folders/1Gbpg5Idu40wRWooXKJbVBm4g4BZKFVHs?usp=sharing


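The link above points to the fine-tuned BERT model.

The RoBERTa version will be released in the same cloud folder once training has finished. If you need RoBERTa, you may have to test with BERT first.

A survey during the meeting showed that the developers using PyTorch all load pretrained models through from_pretrained,
so usage is unchanged: download the folder, then pass its path so that your library loads the model automatically.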

Example:
bert_model = BertModel.from_pretrained('download_path/bert_chinese/')

In most cases this should load without problems.

Remember to load the tokenizer the same way; otherwise the embedding indices will not match.
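
As a rough sketch of the point above, the tokenizer and the model can both be loaded from the same downloaded folder. This assumes the Hugging Face transformers library and assumes the folder contains the vocabulary and config files next to the weights. The message does not name the library or list the folder contents, and the helper name load_bert is only illustrative.

```python
from pathlib import Path


def load_bert(model_dir):
    """Load the tokenizer and the model from the same downloaded folder."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model folder not found: {model_dir}")

    # Assumption: the Hugging Face transformers library is installed.
    # Imported here so the path check above works even without it.
    from transformers import BertModel, BertTokenizer

    # Both objects read from the same folder, so the token ids produced by
    # the tokenizer index the embedding rows the model was fine-tuned with.
    tokenizer = BertTokenizer.from_pretrained(str(model_dir))
    model = BertModel.from_pretrained(str(model_dir))
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


# Usage, after downloading the folder:
# tokenizer, model = load_bert("download_path/bert_chinese/")
```

Loading the tokenizer from a different vocabulary file than the one the model was trained with would map the same characters to different ids, which is the mismatch the note above warns about.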

=======================================

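A few additional notes.

bert_chinese is the result of splitting the full dataset into 90% training set, with the remainder used as the test set.

bert_chinese_total is the result of training on the full dataset and evaluating on the test set.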
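
For readers who want to reproduce a comparable split, here is a minimal sketch of a 90/10 partition. The message does not say whether the data was shuffled or which seed was used, so the shuffling, the seed, and the function name split_90_10 are all assumptions.

```python
import random


def split_90_10(samples, seed=0):
    """Shuffle a copy of samples and return (train, test) in a 90/10 ratio."""
    shuffled = list(samples)
    # Assumption: a seeded shuffle; the original split procedure is not stated.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


corpus = [f"sentence {i}" for i in range(100)]
train, test = split_90_10(corpus)
print(len(train), len(test))
```

Note that a model trained on the full dataset, as with bert_chinese_total, has already seen the test portion during training, so its test-set score is not a held-out measurement.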

The perplexity (ppl) calculation in yesterday's slides was wrong.

The corrected figures are as follows:

bert-chinese ppl = 2.5567 [Simplified Chinese]

bert-chinese_total ppl = 2.4514 [Simplified Chinese]

original bert-chinese ppl = 3.4818 [Simplified Chinese]

original bert-chinese ppl = 3.4816 [Traditional Chinese]
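
The message does not describe how these ppl values were computed. A common definition, shown here only as a sketch, is the exponential of the mean negative log-likelihood (natural log) over the predicted tokens. The function name perplexity and the toy inputs are illustrative.

```python
import math


def perplexity(token_nlls):
    """Return exp of the mean negative log-likelihood over the tokens."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy input: a model that gives probability 1/4 to every target token.
# Its perplexity equals 4, the size of a uniform choice among 4 options.
nlls = [-math.log(0.25)] * 8
print(round(perplexity(nlls), 4))
```

Under this definition a lower value means the model assigns higher probability to the correct tokens, which fits the fine-tuned models scoring below the original one above.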

======

※ All training runs were carried out after first converting the text to Simplified Chinese.