Wednesday, June 2, 2010

Parsing rule for Spirulina amino acid database at biotec

1. Open the file to be converted in Windows WordPad (wordpad.exe) and save it as "Text Document - MS-DOS Format" with the extension *.faa (in this case, "Spirulina_KMITT.faa").

2. Create the folder
C:\Inetpub\mascot\sequence\Spirulina_KMITT\current
and copy Spirulina_KMITT.faa into it.

3. Open database maintenance

4. Click New Definition

5. Type in the name "Spirulina_KMITT".
Notice that "Select" changes to the same name.
Select "Inactive".
Select "AA".
Check "Mem map".
Set Taxonomy to "none".

6. Path = C:/Inetpub/mascot/sequence/Spirulina_KMITT/current/Spirulina_KMITT.faa

7. Rule to parse accession string from Fasta file = rule 33

8. Rule to parse description string from Fasta file = rule 34

9. Rule to parse accession string from local reference file: --no local reference file--
Host and path are empty

10. Click "Test this definition" to test the new rules. The result should be two columns: accession number and description.

11. Select "Active"

12. Click "Apply". Done.
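
The folder creation and file copy from the steps above can also be scripted. The following is only a rough sketch in Python, not part of Mascot: the function name stage_fasta and its parameters are made up for illustration, and the Mascot root folder is passed in rather than hard-coded, so use whatever root your installation has.

```python
import shutil
from pathlib import Path


def stage_fasta(source_file, mascot_root, db_name):
    """Create <mascot_root>/sequence/<db_name>/current and copy the FASTA file there.

    stage_fasta is a hypothetical helper; only the folder layout
    (sequence/<database name>/current) comes from the steps above.
    """
    target_dir = Path(mascot_root) / "sequence" / db_name / "current"
    # parents=True builds the whole chain; exist_ok=True makes reruns harmless.
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the copied file inside target_dir.
    return Path(shutil.copy2(source_file, target_dir))


# Example call (mascot_root is whatever folder Mascot is installed in):
# stage_fasta("Spirulina_KMITT.faa", mascot_root, "Spirulina_KMITT")
```

The database definition itself (steps 3 to 12) still has to be entered in Database Maintenance by hand.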

More details on the syntax of the file:

Spirulina_KMITT

C:\Inetpub\mascot\sequence\Spirulina_KMITT\current

>AP01000002/Possible HNH endonuclease: reverse transcriptase
MVRLIETMYPKKANRVQASLVRYADDFVVISPSLDIIEPCKNAIFEWLKPVGLEIKPEKTRVCHTLNPIQYEGRTEEAGFDLLGFNIRQYPVGKYKSGKTGGTASRLIGHKTHIKPSKKAVQAHTEVIKGVIKQHKTAPQSALISRLNPIIRGWENYYSGVVSSETFSKLDDIIWQMLRAWKVSRCGKANIEKLRNYLRPGTVILSNGKERHETWLFRTKDGLQLWKHNWTPIVRHTLIKPEATPYDGNWTYWSNRKGQAIGTPNRVAKLLKKQKGKCTWCGQYLPPYDLVEVDHIVP

>AP01020001/conserved hypothetical protein
MEIIKLKIRADGEGKVILQVPQDLANQELEIAVIYQSASPPSATQTPDELGWPPGFFEQTAGCLADEPLVRYDPGEYQVREEIE
.
.
.

Parsing Rules:

Rule 33: ">\([A-Z]*[0-9]*\)"
Rule 34: "/\(.*\)"
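
To check a file before loading it into Mascot, the two rules can be tried out in Python. This is only a sketch under the assumption that the escaped parentheses \( \) in the Mascot rules mark the part to extract; Python's re module writes the same group as plain ( ). The names ACCESSION_RULE, DESCRIPTION_RULE, parse_header and parse_fasta_headers are made up for this example.

```python
import re

# Python spellings of rule 33 and rule 34 (groups written as ( ) instead of \( \)).
ACCESSION_RULE = re.compile(r">([A-Z]*[0-9]*)")   # letters then digits right after ">"
DESCRIPTION_RULE = re.compile(r"/(.*)")           # everything after the first "/"


def parse_header(line):
    """Return (accession, description) for one FASTA title line."""
    acc = ACCESSION_RULE.search(line)
    desc = DESCRIPTION_RULE.search(line)
    return (acc.group(1) if acc else None,
            desc.group(1) if desc else None)


def parse_fasta_headers(lines):
    """Apply both rules to every title line (lines starting with '>')."""
    return [parse_header(line.rstrip("\r\n"))
            for line in lines if line.startswith(">")]


print(parse_header(">AP01000002/Possible HNH endonuclease: reverse transcriptase"))
# prints ('AP01000002', 'Possible HNH endonuclease: reverse transcriptase')
```

Running parse_fasta_headers over the lines of the .faa file gives the same kind of two-column list (accession, description) that "Test this definition" shows. An empty accession or a missing description points to a title line that does not follow the ">ACCESSION/description" layout.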
