
WP2TXT 0.8

Free

WP2TXT extracts plain text from Wikipedia dump files.
Latest version: 0.8

WP2TXT extracts plain text from Wikipedia dump files (XML, compressed with bzip2), stripping all MediaWiki markup and other metadata. In addition, the app lets you specify which text elements to extract or convert (title, heading, paragraph, etc.). Character references are converted to UTF-8 characters.
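
For readers curious what this kind of extraction involves, here is a minimal Python sketch of the general idea. It is not WP2TXT's own code: the function names (`strip_markup`, `iter_pages`, `extract`) are hypothetical, and the regular expressions handle only a few common MediaWiki constructs (non-nested templates, internal links, bold/italic quotes, headings).

```python
import bz2
import html
import re
import xml.etree.ElementTree as ET


def strip_markup(wikitext):
    """Remove a few common MediaWiki constructs (deliberately incomplete)."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)                 # non-nested templates
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                              # bold/italic quote runs
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.M)  # == Heading ==
    return html.unescape(text).strip()                             # &eacute; -> é, &#233; -> é


def local_name(tag):
    """Drop the XML namespace prefix that dump files attach to every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, plain_text) for each <page> element in an XML stream."""
    title, body = "", ""
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            body = elem.text or ""
        elif name == "page":
            yield title, strip_markup(body)
            elem.clear()  # release the page's children; dumps are large


def extract(path):
    """Open a bzip2-compressed dump and yield (title, plain_text) pairs."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

With a dump file on disk (the file name here is only a placeholder), `for title, text in extract("dump.xml.bz2"): ...` would walk the pages one at a time without loading the whole file into memory. Real wikitext also contains nested templates, tables, and references, which this sketch leaves untouched.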

The app was originally intended for researchers looking for an easy way to obtain open-source multilingual corpora, but it may be handy for other purposes as well.

Suggestions

Animal Typing
Free

Help children of all ages learn touch-typing.

Bible+
Free

Access, browse, and manage data for Bible studies.

Midnight Planets
Free

Follow the adventure! Midnight Planets is Midnight Martian's new app for visualizing data from spacecraft exploring our solar system...

Essential Anatomy
Free

Provides 3D models of human anatomy elements.

Daily Bible
Free

Read through the Bible using one of 3 great daily plans.

Bookends
Free

Helps you manage bibliographies, literature research materials, etc.
