Taiwan's Cultural Autonomy Under Siege as Chinese-Influenced AI Gains Ground

In Taiwan, "háng" (行, pronounced like "hong") means column and "liè" (列) means row, while in China, it's completely the opposite!

Taiwan and China have distinct knowledge systems and terminology. AI models lack local Taiwanese texts from which to learn the culture, so they rely on Simplified Chinese documents instead. As AI becomes pervasive, Taiwan's culture and language face erosion. DSR4AI collects local texts as "Taiwan Textbooks" so that AI models worldwide can learn about Taiwan.

Why is Taiwanese Culture Endangered in the Age of AI?

Taiwanese texts make up only 0.0012 (about 0.12%) of AI training data. China's approved narratives flood the internet, dwarfing the scarce Taiwanese content. Like a child who attends school briefly but watches hours of TikTok, an AI exposed mainly to English and Simplified Chinese absorbs those perspectives, jeopardizing Taiwan's unique cultural identity.
[Chart: AI training data by language (Traditional Chinese, Simplified Chinese, English); labeled value: < 0.1 GB]

Data Required for AI's Taiwan Culture Textbook

AI training data is heavily skewed towards English (89.7%) and Simplified Chinese, with only 0.13% Traditional Chinese (figures from Meta's Llama 2), and the skew is evident in AI-generated content. Llama 1 was trained on 4,700 GB of text, equivalent to 2.5 million copies of "Dream of the Red Chamber". DSR4AI's two-year goal is to accumulate 100 GB, then aim for 1 TB. Starting in 2024, Taiwan can outpace Wikipedia's 100 GB within five years. Start now.
[Chart: data volume scale, labeled 1 GB and 1 TB]
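The sizes quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal units (1 GB = 10^9 bytes), and the variable and function names are made up for this example. It derives the per-copy size of the novel that the comparison implies, and shows how the 100 GB and 1 TB goals compare with the Llama 1 corpus.

```python
# Back-of-the-envelope check of the quoted figures.
# Assumption: decimal units, 1 GB = 10**9 bytes.
GB = 10**9

llama1_corpus_bytes = 4700 * GB   # Llama 1 training data, as quoted
novel_copies = 2_500_000          # copies of "Dream of the Red Chamber", as quoted
first_goal_bytes = 100 * GB       # DSR4AI two-year goal
second_goal_bytes = 1000 * GB     # follow-up goal of 1 TB

# Size of one copy of the novel implied by the comparison.
bytes_per_copy = llama1_corpus_bytes / novel_copies


def share_of_llama1(n_bytes):
    """Fraction of the Llama 1 corpus that n_bytes represents."""
    return n_bytes / llama1_corpus_bytes


def novel_equivalents(n_bytes):
    """How many copies of the novel n_bytes corresponds to."""
    return n_bytes / bytes_per_copy


print(f"Implied size per copy: {bytes_per_copy / 10**6:.2f} MB")
for label, goal in [("100 GB", first_goal_bytes), ("1 TB", second_goal_bytes)]:
    print(
        f"{label} goal: {share_of_llama1(goal):.1%} of the Llama 1 corpus, "
        f"about {novel_equivalents(goal):,.0f} copies of the novel"
    )
```

Under these assumptions the first goal amounts to roughly 2% of the Llama 1 corpus, which gives a sense of how small a dedicated Taiwanese collection is next to a general-purpose training set.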

DSR4AI "Data Collection" Project

Some argue that governments should provide open AI data because the volume required is massive. But the private sector may do this better and more efficiently. In a democracy, data is private property, so making it public is a donation. Donors can give anonymously or with attribution, and can restrict use to non-commercial purposes, which makes donation more flexible.
  • 1

    Doc-Donate Platform

    Online platform for donating human-readable text: blogs, recipes, guides, books, local records, culture, literature, religious texts. Clear guidelines, intuitive UI. Anonymous or attributed donation options. Respect privacy.
  • 2

    Data Standards

    Define categories, quality standards, file formats. Classification/annotation norms, metadata requirements. Data organization/preprocessing guidelines. Usage authorization scope. Ensure text consistency and usability.
  • 3

    Promote Collection

    Partner with academia and content creators to run text collection campaigns. Encourage sharing of research data and original works. Promote DSR4AI and AI applications to raise awareness of Taiwan's need for open-source data and attract data donations.
  • 4

    Data Pipeline

    Efficient ingestion and storage. Automated cleaning, format conversion, and de-identification. Review for AI training suitability. Filter by type, topic, and sensitivity. Attach metadata. Ensure integrity, security, and traceability. Boost efficiency. Protect privacy and sensitive data. Disclose processing transparently.
  • 5

    Publishing Platform

    Open platform, diverse formats, usage modes. Smart contracts ensure compliance: licensing, terms, Taiwan-benefiting use cases. Version control, update mechanisms. For researchers, developers, stakeholders. Ensures data timeliness, consistency.
  • 6

    Usage Feedback

    Encourage feedback. Share usage, outcomes. Host competitions, challenges. Build community. Enhance quality, practicality. Showcase impact. Drive innovation, value creation. Advance local Taiwan AI.
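The "Data Standards" and "Data Pipeline" steps above can be pictured with a small sketch. This is not DSR4AI's implementation: the record fields, the two de-identification patterns, the minimum-length threshold, and every function name below are hypothetical and chosen only for illustration. The sketch shows one donated text being cleaned, de-identified, checked for suitability, and tagged with metadata, including a hash for traceability.

```python
import hashlib
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical de-identification patterns; a real pipeline would need many more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TW_MOBILE_RE = re.compile(r"(?<!\d)09\d{2}-?\d{3}-?\d{3}(?!\d)")


@dataclass
class DonatedDocument:
    """Hypothetical record for one donated text."""
    text: str
    category: str                 # e.g. "recipe", "blog", "local record"
    license: str                  # e.g. "non-commercial"
    donor: Optional[str] = None   # None means an anonymous donation
    metadata: dict = field(default_factory=dict)


def clean(text: str) -> str:
    """Normalize Unicode, drop control characters, collapse extra whitespace."""
    # NFC keeps full-width CJK punctuation intact (NFKC would rewrite it).
    text = unicodedata.normalize("NFC", text)
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def deidentify(text: str) -> str:
    """Replace e-mail addresses and Taiwanese mobile numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return TW_MOBILE_RE.sub("[PHONE]", text)


def process(doc: DonatedDocument, min_chars: int = 20) -> Optional[DonatedDocument]:
    """Run the steps in order; return None if the text is too short to be useful."""
    text = deidentify(clean(doc.text))
    if len(text) < min_chars:     # hypothetical suitability threshold
        return None
    metadata = {
        "category": doc.category,
        "license": doc.license,
        "anonymous": doc.donor is None,
        "char_count": len(text),
        # Content hash supports integrity checks and traceability.
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    return DonatedDocument(text, doc.category, doc.license, doc.donor, metadata)


if __name__ == "__main__":
    sample = DonatedDocument(
        text="滷肉飯  食譜：先將五花肉切丁。\n聯絡 test@example.com 或 0912-345-678。",
        category="recipe",
        license="non-commercial",
    )
    result = process(sample)
    print(result.text)
    print(result.metadata)
```

Storing a content hash alongside the category and license is one simple way to keep donated files traceable without keeping donor identity; an anonymous donation here is simply a record whose `donor` is `None`.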

Donate Documents

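We welcome donations of Taiwanese text documents. Please review the guidelines first.
Register to upload and manage your files and to receive publication notifications, or upload without registering.

Uploading without registering is an anonymous donation.
No file records are kept, so you cannot manage or delete your files, and you will not receive notifications.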