Note: sorry, the previous, fuller version of this website was lost because I forgot to pay for the web server. Currently, only the GTFtools software package is provided here.
I will improve this website gradually.
You can use the following command from the command line to get help.
python gtftools.py -h
The options are described below:
Options | Commands | Description |
---|---|---|
-h | python gtftools.py -h | show this help message and exit |
-c | python gtftools.py -c 1-5,X,Y demo.gtf | chromosome list to analyze. Chromosomes can be separated by a comma (,) or given as a range with a dash (-). For example, '-c 1-5,X,Y' means chromosomes 1 to 5 plus X and Y. Default: 1-22,X,Y |
-m | python gtftools.py -m merged_exons.bed demo.gtf | output file name for the merged exons from all isoforms of a gene, in BED format |
-e | python gtftools.py -e exon.bed demo.gtf | output file name for the exon coordinates of splice isoforms, in BED format |
-i | python gtftools.py -i intron.bed demo.gtf | output file name for the intron coordinates of splice isoforms, in BED format |
-d | python gtftools.py -d independent_introns.bed demo.gtf | output file name for the independent intron coordinates of genes. Independent introns are introns that do not overlap with any exon of any isoform. They are calculated by merging all exons of a chromosome and then subtracting them from the gene regions. |
-b | python gtftools.py -b intergenic_region.tzt demo.gtf | output file name for the coordinates of intergenic regions, which are calculated by subtracting gene regions from each chromosome. |
-l | python gtftools.py -l gene_length.txt demo.gtf | output file name for gene length. Four types of gene lengths are calculated. The first three are the mean, median, and max length of the isoforms of a gene. The fourth is the length of the non-overlapping exons of all isoforms. |
-r | python gtftools.py -r isoform_length.txt demo.gtf | output file name for isoform lengths. The length of an isoform is calculated as the summed length of its exons. |
-k | python gtftools.py -k masked_intron.txt demo.gtf | output file name for introns that overlap with exons of other isoforms or genes |
-u | python gtftools.py -u UTR.bed demo.gtf | output file name for UTR regions |
-s | python gtftools.py -s isoform.txt demo.gtf | output file name for isoform coordinates and names. |
-q | python gtftools.py -q splice_site.bed demo.gtf | output file name for 5' and 3' splice sites, in BED format. The regions follow MaxEntScan: the 5' donor site is 9 bases long, with 3 bases in the exon and 6 bases in the intron, and the 3' acceptor site is 23 bases long, with 20 bases in the intron and 3 bases in the exon. |
-g | python gtftools.py -g gene.txt demo.gtf | output file name for gene coordinates and names. If the cis-range of each gene needs to be calculated, users can add the -f option to specify the cis-range. |
-p | python gtftools.py -p snp_list.txt demo.gtf | input file containing a list of SNPs with at least three columns: the first is the chromosome, the second is the coordinate, and the third is the SNP name, such as an rs ID. With this option, GTFtools will search for and output the cis-SNPs of each gene annotated in the provided GTF file. |
-f | python gtftools.py -g gene.txt -f 2000-1000 demo.gtf | -f specifies the upstream and downstream distances used to calculate the cis-range of a gene. -f is given in the format 'distup-distdown', where distup is the upstream distance from the TSS and distdown is the downstream distance from the end of the gene. Note that this parameter takes effect only when the '-g' option is used. For example, 'python gtftools.py -g gene.txt -f 2000-1000 demo.gtf' means that 2000 bases upstream and 1000 bases downstream of the gene will be calculated as the cis-range, and the cis-range will be written to the gene.txt file. By default, -f is set to 0-0, meaning that the cis-range is not calculated when -g is used to output gene information. |
-t | python gtftools.py -t TSS.txt demo.gtf | output file name for the region flanking the transcription start site (TSS). It is calculated as (TSS-wup,TSS+wdown), where wup is a user-specified distance upstream of the TSS, say 1000bp, and wdown is the distance downstream of the TSS. wup and wdown are defined by the w parameter given with '-w'. |
-w | python gtftools.py -t TSS.txt -w 1000-300 demo.gtf | w specifies the upstream and downstream distances from the TSS, as described in '-t'. w is given in the format 'wup-wdown', where wup and wdown are the upstream and downstream distances from the TSS. Default w = 1000-300 (that is, 1000 bases upstream of the TSS and 300 bases downstream of the TSS). This range follows the promoter regions used in the dbSNP database (ref: Genome-wide promoter extraction and analysis in human, mouse, and rat, Genome Biology, 2005). |
-v | python gtftools.py -v | show program's version number and exit |
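
The length calculations described for -m, -r, and -l can be illustrated with a minimal sketch. This is not the GTFtools source code: the function names (`merge_intervals`, `isoform_length`, `gene_lengths`) and the demo coordinates are hypothetical, and the sketch assumes 1-based, inclusive exon coordinates as used in GTF files, so that an exon spanning start..end has length end - start + 1.

```python
from statistics import mean, median


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals.

    Assumption: coordinates are 1-based and inclusive, as in GTF.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def isoform_length(exons):
    """Summed length of a list of exons (the -r description)."""
    return sum(end - start + 1 for start, end in exons)


def gene_lengths(isoforms):
    """Four gene lengths (the -l description) from {isoform_id: [exons]}."""
    lengths = [isoform_length(exons) for exons in isoforms.values()]
    all_exons = [exon for exons in isoforms.values() for exon in exons]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "max": max(lengths),
        # Length of the non-overlapping (merged) exons of all isoforms.
        "merged": isoform_length(merge_intervals(all_exons)),
    }


# Hypothetical gene with two isoforms.
demo = {
    "tx1": [(100, 200), (300, 400)],
    "tx2": [(150, 250), (300, 350)],
}
print(gene_lengths(demo))
```

For this demo gene the isoform lengths are 202 and 152, so the mean and median are both 177 and the max is 202. The merged exons are (100, 250) and (300, 400), giving a merged length of 252.
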
Hongdong Li, hongdong@csu.edu.cn
School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China