The NCBI Sequence Read Archive (SRA) is a large (>3 quadrillion base pairs as of 2014) repository for next-generation sequence data. Like many NCBI databases, it is complex, and mastering its use is beyond the scope of this lesson. Very often, as in the Lenski paper, there will be a direct link (perhaps in the supplemental information) to where on the SRA the dataset can be found. The link from the Lenski paper is: Lenski dataset
Sections A and B of these exercises take place on your own laptop - no need to open the shell or connect to your remote computer yet!
You should now have a file called SraRunTable.txt.
Taking a quick glance at the file, you should be able to answer the following questions:
After answering the questions, close the file without saving it; we don't want to make any changes. If you do save it, make sure you save it as a plain .txt file.
Our formal shell lesson comes next. However, we can already try a few simple commands to get you thinking about how the shell can be a useful tool for examining your dataset.
During the shell lesson we will go much more step by step, building up each command before we use it. For this first command, go ahead and copy and paste if you are just getting used to the shell. The other commands are short enough to type and follow.
$ cd /mnt/research/common-data/workshops/genomics/dc_sample_data/sra_metadata
$ cat SraRunTable.txt
$ column -t SraRunTable.txt
Or, we can look at it with a text viewer like less:
$ less SraRunTable.txt
Or, we can look at just a subset of the data:
$ cut -f 1-4 SraRunTable.txt | head -6 | column -t
Or both!
$ cut -f 1-4 SraRunTable.txt | head -6 | column -t | less
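The subsetting pipeline above can be taken apart one stage at a time. The sketch below is only an illustration: it builds a small tab-separated file called demo_table.txt whose name and contents are made up for the demo (it is not the real SraRunTable.txt), and it writes each stage to an intermediate file so you can inspect it. The final column -t step is left out because it only changes how the columns are aligned on screen, not which data are kept.

```shell
# Build a small tab-separated demo file.
# The file name and its contents are hypothetical, not the real SraRunTable.txt.
printf 'Run\tClone\tGeneration\tClade\tNotes\n' > demo_table.txt
printf 'run1\tcloneA\t2000\tunknown\tfirst\n' >> demo_table.txt
printf 'run2\tcloneB\t5000\tunknown\tsecond\n' >> demo_table.txt
printf 'run3\tcloneC\t15000\tUC\tthird\n' >> demo_table.txt

# Stage 1: cut -f 1-4 keeps only the first four tab-delimited fields of every line.
cut -f 1-4 demo_table.txt > step1.txt

# Stage 2: head -n 3 keeps only the first three lines (the header plus two rows).
head -n 3 step1.txt > step2.txt
cat step2.txt

# The same two stages chained with a pipe, with no intermediate file in between.
cut -f 1-4 demo_table.txt | head -n 3 > piped.txt
```

Comparing step2.txt with piped.txt should show that the pipe gives the same result as running the stages separately; the pipe simply hands the output of one command to the next.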
In the next lesson, we will learn how to do a lot more with the shell, including what else the commands we have just used can do.
For the purposes of this workshop, we will be working with the FASTQ reads from 6 of the sequencing runs in this experiment.
SRA Run Number | Clone | Generation | Clade |
---|---|---|---|
SRR098028 | REL1166A | 2,000 | ? |
SRR098281 | ZDB409 | 5,000 | ? |
SRR098283 | ZDB446 | 15,000 | UC |
SRR097977 | CZB152 | 33,000 | Cit+ |
SRR098026 | CZB154 | 33,000 | Cit+ |
SRR098027 | CZB199 | 33,000 | C1 |
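As a rough sketch of how the shell could pull just these six runs out of a larger metadata table, the example below uses grep with a list of accessions. It runs on a small stand-in file, demo_runs.txt, whose name, layout, and extra rows are made up for the demo; only the six run accessions come from the table above. It assumes each run accession appears as a whole word somewhere on its row, which may or may not match the layout of your real SraRunTable.txt.

```shell
# Hypothetical stand-in for a run table: a header plus tab-separated rows.
# Only the six SRR accessions are taken from the lesson; the rest is made up.
printf 'Run\tExample_column\n' > demo_runs.txt
printf 'SRR097977\ta\nSRR098026\tb\nSRR098027\tc\n' >> demo_runs.txt
printf 'SRR098028\td\nSRR098281\te\nSRR098283\tf\n' >> demo_runs.txt
# Placeholder rows standing in for runs that are not part of our subset.
printf 'SRR_PLACEHOLDER_1\tg\nSRR_PLACEHOLDER_2\th\n' >> demo_runs.txt

# One accession per line: the six runs used in this workshop.
printf '%s\n' SRR098028 SRR098281 SRR098283 SRR097977 SRR098026 SRR098027 > wanted_runs.txt

# -F treats each pattern as a fixed string, -w matches whole words only,
# and -f reads the patterns from wanted_runs.txt.
grep -F -w -f wanted_runs.txt demo_runs.txt > subset.txt
cat subset.txt
```

Note that grep prints the matching rows in the order they occur in the table, not in the order of the accession list, and the header line is dropped because it matches none of the accessions.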