This program extracts the vif gene sequences from a set of HIV sequences by aligning them to the HXB2 reference genome.
- Reads the HXB2 reference sequence from data/hxb2.fasta
- Reads multiple query sequences from data/sequences.fasta
- For each query sequence:
  - Performs a global alignment against HXB2
  - Maps the vif gene coordinates (positions 5243-5619 in HXB2) to the query sequence
  - Extracts and prints the corresponding vif sequence in FASTA format
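The mapping step can be sketched as follows. This is a minimal, hypothetical illustration and not the aligntools API. It assumes the alignment is available as matching lists of half-open, 0-based (start, end) blocks for the reference and the query (similar in form to Biopython's `Alignment.aligned`), and it treats the HXB2 positions as 1-based and inclusive. A vif boundary that falls in an alignment gap is simply reported as unmappable, which is a simplification.

```python
from typing import List, Optional, Tuple

# Hypothetical alignment representation: half-open, 0-based (start, end) blocks,
# where ref_blocks[k] is aligned position-by-position with query_blocks[k].
Block = Tuple[int, int]

# vif coordinates in HXB2 as stated in this README (assumed 1-based, inclusive).
VIF_START, VIF_END = 5243, 5619


def map_ref_position(pos: int, ref_blocks: List[Block], query_blocks: List[Block]) -> Optional[int]:
    """Map a 0-based reference position to the query, or None if it lies in a gap."""
    for (r_start, r_end), (q_start, _q_end) in zip(ref_blocks, query_blocks):
        if r_start <= pos < r_end:
            return q_start + (pos - r_start)
    return None


def extract_region(
    query: str,
    ref_blocks: List[Block],
    query_blocks: List[Block],
    start_1based: int = VIF_START,
    end_1based: int = VIF_END,
) -> Optional[str]:
    """Return the query segment aligned to reference positions start..end, or None."""
    q_start = map_ref_position(start_1based - 1, ref_blocks, query_blocks)
    q_end = map_ref_position(end_1based - 1, ref_blocks, query_blocks)
    if q_start is None or q_end is None or q_end < q_start:
        return None  # unmappable: the caller can skip this sequence
    # Slicing between the mapped endpoints keeps any insertions the query carries.
    return query[q_start:q_end + 1]


# Toy example: the query lacks reference positions 5-6 (1-based).
ref_blocks = [(0, 4), (6, 12)]
query_blocks = [(0, 4), (4, 10)]
print(extract_region("AAAACCCCGG", ref_blocks, query_blocks, 7, 10))  # CCCC
```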
To run the program:
- Ensure you have uv installed.
- Navigate to the project root directory.
- Run the program:
  uv run program1
Output is written to stdout in FASTA format. For each input sequence whose vif region can be mapped, the program outputs:
- A FASTA header: >{sequence_id}_vif
- The extracted vif sequence
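Reading the input FASTA files and writing records in this output format can be sketched in plain Python. This is an illustrative assumption rather than the program's own code: the real program may parse FASTA with Biopython, and taking `sequence_id` to be the first whitespace-delimited word of the header line (as Biopython's `SeqRecord.id` does for FASTA input) is assumed here. The sketch writes each sequence on a single line; whether the real program wraps long sequences is not stated.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs; the ID is the first word of the header."""
    seq_id, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)  # sequence lines may be wrapped; join them
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def format_vif_record(sequence_id: str, vif_seq: str) -> str:
    """Format one output record: a '>{sequence_id}_vif' header, then the sequence."""
    return f">{sequence_id}_vif\n{vif_seq}\n"


# Example: parse a small in-memory FASTA file and print it in the output format.
example = io.StringIO(">seq1 sample A\nACGT\nTTGA\n>seq2\nGGCC\n")
for seq_id, seq in read_fasta(example):
    print(format_vif_record(seq_id, seq), end="")
```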
Notes:
- The program uses global pairwise alignment with Biopython's PairwiseAligner.
- Coordinate mapping is handled by the aligntools library.
- The program skips sequences where the vif region cannot be successfully mapped.
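To show what a global alignment contributes to coordinate mapping, here is a textbook Needleman-Wunsch sketch in plain Python. It is a stand-in for illustration, not Biopython's PairwiseAligner: the linear scoring (match +1, mismatch -1, gap -2) is an arbitrary assumption, and the full O(nm) score table makes it suitable only for short toy sequences, not whole genomes. It returns half-open (start, end) blocks of aligned positions, similar in form to Biopython's `Alignment.aligned`.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open, 0-based (start, end)


def global_align_blocks(
    ref: str, query: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> Tuple[List[Block], List[Block]]:
    """Needleman-Wunsch global alignment with linear gap penalties (toy scoring)."""
    n, m = len(ref), len(query)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning ref[i] with query[j].
        return match if ref[i] == query[j] else mismatch

    # score[i][j] = best score for aligning ref[:i] with query[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i - 1, j - 1),
                score[i - 1][j] + gap,  # reference base deleted in the query
                score[i][j - 1] + gap,  # extra base inserted in the query
            )

    # Trace back from the bottom-right corner, collecting aligned (ref, query) pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()

    # Merge runs of consecutive aligned pairs into (start, end) blocks.
    ref_blocks: List[Block] = []
    query_blocks: List[Block] = []
    for r, q in pairs:
        if ref_blocks and ref_blocks[-1][1] == r and query_blocks[-1][1] == q:
            ref_blocks[-1] = (ref_blocks[-1][0], r + 1)
            query_blocks[-1] = (query_blocks[-1][0], q + 1)
        else:
            ref_blocks.append((r, r + 1))
            query_blocks.append((q, q + 1))
    return ref_blocks, query_blocks


print(global_align_blocks("AAAATTCCCCGG", "AAAACCCCGG"))
# ([(0, 4), (6, 12)], [(0, 4), (4, 10)])
```

The two reference bases missing from the query show up as a jump between reference blocks with no matching jump in the query, which is exactly the information a coordinate mapper needs.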