There I was, visiting the Sequence and Annotation Downloads page on the UCSC Genome Bioinformatics website. That page links to sequence and annotation data downloads for the genome assemblies featured in the UCSC Genome Browser. There were so many files to choose from, but I was interested in downloading the following file from the human genome assembly data set:
hg38.fa.gz - "Soft-masked" assembly sequence in one file. Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case.
Guess what? That file is greater than 3 GB in size! No worries, you may say. Text editors today can handle massive files, right? I am using Windows, so we're talking about Notepad, WordPad, and Microsoft Office Word, just to name a few.
Well, it seems we have overestimated the abilities of these editors. When I tried the text editors mentioned above, they screamed in agony. Check it out:
[Screenshots: Notepad, WordPad, and Microsoft Office Word each attempting to open the file]
Yikes.
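Since no editor will open the file, one alternative is to never load it whole at all. The following is a minimal sketch, assuming Ruby with its standard zlib library and assuming hg38.fa.gz has already been downloaded. The helper name count_soft_masked is hypothetical and is not taken from the article. It streams the compressed file one line at a time and uses the soft-masking convention described above (lowercase for repeats, uppercase for everything else) to tally bases.

```ruby
require "zlib"

# Hypothetical helper (not from the article): streams a gzipped FASTA file
# line by line, so only one line is held in memory at a time.
# Returns the number of sequence headers, soft-masked (lowercase) bases,
# and unmasked (uppercase) bases. Uppercase N placeholders count as unmasked.
def count_soft_masked(path)
  counts = { sequences: 0, masked: 0, unmasked: 0 }

  Zlib::GzipReader.open(path) do |gz|
    gz.each_line do |line|
      line.chomp!
      if line.start_with?(">")
        # FASTA header line, e.g. ">chr1"
        counts[:sequences] += 1
      else
        counts[:masked]   += line.count("a-z")
        counts[:unmasked] += line.count("A-Z")
      end
    end
  end

  counts
end
```

Calling count_soft_masked("hg38.fa.gz") should return a hash with the three tallies. Expect it to take a while on a file this size, but memory use should stay small because the file is decompressed and scanned as a stream.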
Continue reading: Ruby on Medicine: Handling Large Files
by Abder-Rahman Ali via SitePoint