[Most-ai-contest] zhwiki_20190820 & pbs+cdc+cna+moti(2019.1.1~2019.11.28 broadcast news) corpus for pre-training DNN models
范正忠
jjfan at iis.sinica.edu.tw
Wed Dec 4 12:37:20 CST 2019
This corpus is for 羅上堡 to pre-train NN models and for 吳佳樺 to train N-gram language models.
https://drive.google.com/drive/folders/1y0UCb-n1YKlKUsQk2GJz6iKJqQFJ25rK?usp=sharing
jjfan
From: "范正忠" <jjfan at iis.sinica.edu.tw>
To: "Most-ai Contest" <Most-ai-contest at iis.sinica.edu.tw>
Sent: Thursday, November 28, 2019 4:49:04 PM
Subject: [Most-ai-contest] zhwiki_20190820 & pbs(2019.1.1~2019.11.28 broadcast news) corpus for pre-training DNN models
Dear all,
The following corpus can be used to improve the pre-trained parameters of DNN models and to train N-gram language models.
It is primarily for 羅上堡 to pre-train the XLNet model and for 吳佳樺 to train N-gram models.
https://drive.google.com/drive/folders/1y0UCb-n1YKlKUsQk2GJz6iKJqQFJ25rK?usp=sharing
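As a rough illustration of the N-gram side, the sketch below counts character bigrams and computes an add-alpha smoothed probability. It is only a minimal example under assumptions: the thread does not say how the N-gram models are actually trained, so the per-character tokenization, the boundary markers, the function names, and the toy sentences are all hypothetical.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count context unigrams and bigrams over character-tokenized sentences.

    Assumption: each character is one token (no word segmentation).
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        unigrams.update(tokens[:-1])             # count contexts only
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, cur, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(cur | prev)."""
    return (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)


if __name__ == "__main__":
    toy_corpus = ["今天天氣很好", "今天新聞"]  # toy sentences, not from the corpus
    uni, bi = train_bigram_model(toy_corpus)
    vocab = {ch for s in toy_corpus for ch in s} | {"</s>"}
    print(bigram_prob(uni, bi, "今", "天", len(vocab)))
```

A real run would stream the corpus files line by line instead of holding a list in memory, and would likely use a dedicated language-modeling toolkit for higher orders and better smoothing.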
Please feel free to let me know if you have any questions.
Best,
jjfan
From: "范正忠" <jjfan at iis.sinica.edu.tw>
To: "Most-ai Contest" <Most-ai-contest at iis.sinica.edu.tw>
Sent: Thursday, November 21, 2019 4:06:45 PM
Subject: Release dataset 1.2
Dear all,
Enclosed please find the FGC_Release_1.1 data-set, which includes:
1. DRCD, ASR, Kaggle, Lee
2. FGC_release_A_train, FGC_release_A_dev, FGC_release_A_test
Please note that all data are in the cn language and in FGC format.
The following are the answer-type and answer-mode distributions for each dataset (with fewer "Misc" answer types):
All
Answer Type     Count (1)  Count (2)  % of (2)
YesNo                  53         53     7.10%
Num-Measure            59         87    11.66%
Kinship                83         83    11.13%
Person                 73         77    10.32%
Date-Duration         125        137    18.36%
Location               92         99    13.27%
Organization           83         86    11.53%
Object                 71         79    10.59%
Event                  19         19     2.55%
Misc                   88         26     3.49%
Total                 746        746   100.00%
Answer Mode                               Count (1)  Count (2)  % of (2)
YesNo (yes/no questions)                         53         53     7.10%
Multi-Spans-Extraction (list questions)         101        101    13.54%
Kinship                                          75         75    10.05%
Single-Span-Extraction (single answer)          442        426    57.10%
Date-Duration                                    57         61     8.18%
Arithmetic-Operations                             3          6     0.80%
Counting                                         15         23     3.08%
Comparing-Members                                 0          1     0.13%
Common-Sense                                      0          0     0.00%
Total                                           746        746   100.00%
Train
Answer Type     Count  Percent
YesNo              21    5.79%
Num-Measure        43   11.85%
Kinship            59   16.25%
Person             39   10.74%
Date-Duration      50   13.77%
Location           44   12.12%
Organization       56   15.43%
Object             24    6.61%
Event              11    3.03%
Misc               16    4.41%
Total             363  100.00%
Answer Mode                               Count  Percent
YesNo (yes/no questions)                     21    5.79%
Multi-Spans-Extraction (list questions)      40   11.02%
Kinship                                      59   16.25%
Single-Span-Extraction (single answer)      208   57.30%
Date-Duration                                18    4.96%
Arithmetic-Operations                         3    0.83%
Counting                                     14    3.86%
Comparing-Members                             0    0.00%
Common-Sense                                  0    0.00%
Total                                       363  100.00%
Dev
Answer Type     Count  Percent
YesNo              17    8.13%
Num-Measure        23   11.00%
Kinship            15    7.18%
Person             21   10.05%
Date-Duration      56   26.79%
Location           34   16.27%
Organization       14    6.70%
Object             18    8.61%
Event               2    0.96%
Misc                9    4.31%
Total             209  100.00%
Answer Mode                               Count  Percent
YesNo (yes/no questions)                     17    8.13%
Multi-Spans-Extraction (list questions)      26   12.44%
Kinship                                      12    5.74%
Single-Span-Extraction (single answer)      117   55.98%
Date-Duration                                29   13.88%
Arithmetic-Operations                         2    0.96%
Counting                                      6    2.87%
Comparing-Members                             0    0.00%
Common-Sense                                  0    0.00%
Total                                       209  100.00%
Test
Answer Type     Count  Percent
YesNo              15    8.62%
Num-Measure        21   12.07%
Kinship             9    5.17%
Person             17    9.77%
Date-Duration      31   17.82%
Location           21   12.07%
Organization       16    9.20%
Object             37   21.26%
Event               6    3.45%
Misc                1    0.57%
Total             174  100.00%
Answer Mode                               Count  Percent
YesNo (yes/no questions)                     15    8.67%
Multi-Spans-Extraction (list questions)      35   20.23%
Kinship                                       4    2.31%
Single-Span-Extraction (single answer)      101   58.38%
Date-Duration                                14    8.09%
Arithmetic-Operations                         1    0.58%
Counting                                      3    1.73%
Comparing-Members                             0    0.00%
Common-Sense                                  0    0.00%
Total                                       173  100.00%
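The percentages above are each count divided by its split total. The short sketch below recomputes the answer-type percentages from the Train, Dev, and Test counts listed above and sums the three splits per category. The counts are copied from this message; the variable and function names are only illustrative.

```python
ANSWER_TYPES = ["YesNo", "Num-Measure", "Kinship", "Person", "Date-Duration",
                "Location", "Organization", "Object", "Event", "Misc"]

# Answer-type counts per split, copied from the tables in this message.
COUNTS = {
    "Train": [21, 43, 59, 39, 50, 44, 56, 24, 11, 16],
    "Dev":   [17, 23, 15, 21, 56, 34, 14, 18, 2, 9],
    "Test":  [15, 21, 9, 17, 31, 21, 16, 37, 6, 1],
}


def shares(counts):
    """Return each count as a percentage of the split total, rounded to 2 places."""
    total = sum(counts)
    return [round(100.0 * c / total, 2) for c in counts]


def combine(splits):
    """Sum the per-category counts across all splits."""
    return [sum(column) for column in zip(*splits.values())]


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        print(name, sum(counts), dict(zip(ANSWER_TYPES, shares(counts))))
    print("All", dict(zip(ANSWER_TYPES, combine(COUNTS))), sum(combine(COUNTS)))
```

The same check can be repeated for the answer-mode counts by swapping in those rows.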
Best,
jjfan
From: "范正忠" <jjfan at iis.sinica.edu.tw>
To: "Most-ai Contest" <Most-ai-contest at iis.sinica.edu.tw>
Sent: Tuesday, November 19, 2019 5:04:12 PM
Subject: Re: [Most-ai-contest] refinement of anstype and ansmode for fgc-2019 dataset
Dear all,
Enclosed please find the FGC_Release_1.1 data-set, which includes:
1. DRCD, ASR, Kaggle, Lee
2. FGC_release_A_train_1.1, FGC_release_A_dev_1.1, FGC_release_A_test_1.1
Please use this data-set as the standard benchmark.
Also note that you can use item 1 + FGC_release_A_train_1.1 as your training set, FGC_release_A_dev_1.1 as development set, and FGC_release_A_test_1.1 as testing set.
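The split described above could be assembled roughly as follows. This is only a sketch: loading the FGC-format files is left out because the format is not described here, so the inputs are plain lists of already-loaded examples, and the function name and placeholder records are hypothetical.

```python
def build_splits(aux_sets, fgc_train, fgc_dev, fgc_test):
    """Build train/dev/test splits from already-loaded example lists.

    aux_sets: dict mapping a dataset name (e.g. "DRCD") to a list of examples.
    The auxiliary sets are merged into training only; the development and
    test sets are passed through unchanged.
    """
    train = [ex for name in sorted(aux_sets) for ex in aux_sets[name]]
    train.extend(fgc_train)
    return {"train": train, "dev": list(fgc_dev), "test": list(fgc_test)}


if __name__ == "__main__":
    # Placeholder records standing in for real examples.
    aux = {"DRCD": ["drcd-1"], "ASR": ["asr-1"], "Kaggle": ["kaggle-1"], "Lee": ["lee-1"]}
    splits = build_splits(aux, ["fgc-train-1"], ["fgc-dev-1"], ["fgc-test-1"])
    print({name: len(examples) for name, examples in splits.items()})
```

Keeping the development and test lists out of the merge step is the point of the sketch: only the training set grows when the auxiliary data is added.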
Please feel free to let me know if you have any questions.
Best,
jjfan
From: "范正忠" <jjfan at iis.sinica.edu.tw>
To: "Most-ai Contest" <Most-ai-contest at iis.sinica.edu.tw>
Sent: Monday, November 18, 2019 8:56:28 AM
Subject: Re: [Most-ai-contest] refinement of anstype and ansmode for fgc-2019 dataset
Dear all,
Please send me your list of errors in the Answer-Type and Answer-Mode annotations by the end of today.
I will then divide the FGC release data-set into training, development, and test sets and release them tomorrow for your benchmarking.
Thanks.
Best,
jjfan
From: "Chiangyulun0914" <chiangyulun0914 at iis.sinica.edu.tw>
To: "Most-ai Contest" <Most-ai-contest at iis.sinica.edu.tw>
Sent: Wednesday, November 13, 2019 5:19:49 PM
Subject: Re: [Most-ai-contest] refinement of anstype and ansmode for fgc-2019 dataset
Hi everyone,
Please send the files primarily in .xlsx or .csv format. The attached file is an example. Thanks!
江侑倫 (Yu-Lun Chiang)
Natural Language Understanding Lab
Institute of Information Science, Academia Sinica
Mobile: +886-975279013 (Taiwan)
On Wed, Nov 13, 2019 at 4:57 PM, 江侑倫 <chiangyulun0914 at iis.sinica.edu.tw> wrote:
BQ_BEGIN
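Hi everyone,
The latest fgc-2019 dataset released by Dr. Fan may contain some errors because anstype and ansmode were labeled with a rule-based method. If you find critical errors while using the dataset, please note them down as you go and, following the format and file name of the attached file, send the anstype and ansmode values before and after correction back to Dr. Fan so that he can update the dataset. Also attached are the latest anstype and ansmode categories that Dr. Fan released on 20191112.
If only one of anstype and ansmode needs to be corrected, please still fill in the other, unchanged value in the refined column of the attached file, so that Dr. Fan can update the dataset directly from the information in the refined column.
Thanks!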
江侑倫 (Yu-Lun Chiang)
Natural Language Understanding Lab
Institute of Information Science, Academia Sinica
Mobile: +886-975279013 (Taiwan)
BQ_END
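The correction rule in the quoted message (report the values before and after, and always fill in both refined fields even when only one changes) can be sketched as below. The real column names and file name are defined by the attached example file, which is not reproduced here, so the field names and the question ID in this sketch are hypothetical.

```python
import csv
import io

# Hypothetical column names; the attached example file defines the real ones.
FIELDS = ["qid", "anstype_original", "ansmode_original",
          "anstype_refined", "ansmode_refined"]


def correction_row(qid, anstype, ansmode, new_anstype=None, new_ansmode=None):
    """Build one correction record; an unchanged field is copied into 'refined'."""
    if new_anstype is None and new_ansmode is None:
        raise ValueError("at least one of anstype/ansmode must be corrected")
    return {
        "qid": qid,
        "anstype_original": anstype,
        "ansmode_original": ansmode,
        "anstype_refined": anstype if new_anstype is None else new_anstype,
        "ansmode_refined": ansmode if new_ansmode is None else new_ansmode,
    }


def write_corrections(rows, fh):
    """Write correction records as CSV to an open text file handle."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    row = correction_row("Q001", "Misc", "Single-Span-Extraction", new_anstype="Person")
    write_corrections([row], buf)
    print(buf.getvalue())
```

When writing to a real file rather than a `StringIO` buffer, open it with `newline=""` as the `csv` module recommends.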
_______________________________________________
Most-ai-contest mailing list
Most-ai-contest at iis.sinica.edu.tw
https://www.iis.sinica.edu.tw/mailman/listinfo/most-ai-contest
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jjfan_anstype&ansmode.PNG
Type: image/png
Size: 54300 bytes
Desc: not available
URL: <http://www.iis.sinica.edu.tw/pipermail/most-ai-contest/attachments/20191204/bf549c38/attachment-0001.png>