'AWS Public Data sets' has full Wikipedia available in TSV format
'Amazon Web Services Blog' reports that the AWS Public Data Sets include the Wikipedia Extraction (WEX), which is a processed, machine-readable dump of the English-language section of Wikipedia. At nearly 67 GB, this is a handy and formidable data set. The data is provided in the TSV format as exported by PostgreSQL.
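If "TSV as exported by PostgreSQL" means the text format written by PostgreSQL's COPY command (an assumption on my part, the announcement does not spell it out), then fields are tab-separated, NULL is written as \N, and special characters are backslash-escaped. A minimal Python sketch for reading one such line might look like this; the function name and the sample row are made up for illustration, and octal/hex escapes are not handled:

```python
import re

# Backslash escapes used by PostgreSQL's COPY text format.
# Octal (\123) and hex (\x41) forms are deliberately left out of this sketch.
_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "b": "\b", "f": "\f", "v": "\v", "\\": "\\"}


def parse_copy_line(line):
    """Split one COPY-text line into fields, mapping \\N to None."""
    fields = []
    for raw in line.rstrip("\n").split("\t"):
        if raw == r"\N":
            # An unescaped \N on its own marks a NULL value.
            fields.append(None)
        else:
            # Any other backslashed character stands for itself.
            fields.append(
                re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), raw)
            )
    return fields


# Hypothetical row: an id, a title, a NULL column, and text with an escaped tab.
row = parse_copy_line("12\tAnarchism\t\\N\tfirst\\tsecond\n")
print(row)
```

Tabs inside a value arrive escaped, so splitting on raw tab characters before unescaping keeps the columns intact.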
A number of other data sets are also available; read more here.
They also describe how easily you can use these data sets:
Instantiating these data sets is basically trivial. You create a new EBS volume of the appropriate size, basing it on the snapshot id of the data. Next, you attach the volume to a running EC2 instance in the same availability zone. Finally, you create a mount point and mount the EBS volume on the instance.
Awesome.
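The steps quoted above (create a volume from the snapshot, attach it in the same availability zone, then mount it) can be sketched in Python roughly as follows. This is only a sketch under assumptions: the `ec2` argument is expected to behave like a boto3 EC2 client (`create_volume`, `get_waiter`, `attach_volume`), and the function names, the `/dev/sdf` device, and the `/mnt/wex` mount point are my own placeholders, not anything from the AWS post. The snapshot ID has to come from the data set's listing.

```python
def attach_snapshot_volume(ec2, snapshot_id, instance_id, zone, size_gib, device="/dev/sdf"):
    """Create an EBS volume from a snapshot and attach it to a running instance."""
    # Step 1: create a new volume based on the data set's snapshot.
    # It must live in the same availability zone as the instance.
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=zone,
        Size=size_gib,
    )
    volume_id = volume["VolumeId"]

    # Wait until the volume is available before trying to attach it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Step 2: attach the volume to the running instance.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


def mount_commands(device="/dev/sdf", mount_point="/mnt/wex"):
    """Step 3: shell commands to run on the instance itself (placeholder paths)."""
    return [
        f"sudo mkdir -p {mount_point}",
        f"sudo mount {device} {mount_point}",
    ]
```

With boto3 installed and credentials configured, `ec2` would be `boto3.client("ec2")`. The device name the instance actually sees can differ from the one requested, so check it before mounting.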