
python - How do I properly use codonbias.scores.FrequencyOfOptimalCodons? - Stack Overflow


I am trying to write a script that analyzes codon usage in sequences using the codon-bias package.

I am trying to use the class codonbias.scores.FrequencyOfOptimalCodons, but when I do so in my code:

FOC = cb.scores.FrequencyOfOptimalCodons (ref_seq=sequence_list, genetic_code=11)

where sequence_list is a list of str objects containing ORFs, and genetic_code=11 selects the bacterial, archaeal and plant plastid code,

I get the following from my Shell:

Traceback (most recent call last):
  File "I:\R&D\Product Research Group\Metabolic&Regulatory modeling\Codon usage\Python scripts\CodonBias CAI analyzer.py", line 117, in <module>
    codon_df, total_orfs, analyzed_orfs = analyze_codon_usage(orfs_file, trna_file)
  File "I:\R&D\Product Research Group\Metabolic&Regulatory modeling\Codon usage\Python scripts\CodonBias CAI analyzer.py", line 71, in analyze_codon_usage
    FOC = cb.scores.FrequencyOfOptimalCodons (ref_seq=sequence_list, genetic_code=11)
  File "C:\Users\shlomog\AppData\Local\Programs\Python\Python312\Lib\site-packages\codonbias\scores.py", line 199, in __init__
    self.weights = self.weights.droplevel('aa')
  File "C:\Users\shlomog\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\generic.py", line 943, in droplevel
    new_labels = labels.droplevel(level)
  File "C:\Users\shlomog\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\base.py", line 2155, in droplevel
    levnums = sorted(self._get_level_number(lev) for lev in level)[::-1]
  File "C:\Users\shlomog\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\base.py", line 2155, in <genexpr>
    levnums = sorted(self._get_level_number(lev) for lev in level)[::-1]
  File "C:\Users\shlomog\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\multi.py", line 1660, in _get_level_number
    raise ValueError(
ValueError: The name aa occurs multiple times, use a level number

Any idea what I'm doing wrong? I had the program print out the sequences in sequence_list before the call, and they appear in order.

The offending code is in the __init__ of FrequencyOfOptimalCodons (scores.py, line 199 in the traceback above).

asked Mar 3 at 15:45 by Shlomo Goren; edited Mar 4 at 22:44 by Vasilis G.
  • Try asking here: github/alondmnt/codon-bias/issues – pippo1980 Commented Mar 3 at 18:08
  • The error is raised inside pandas, so this issue is likely due to the way the codon-bias package constructs or processes its internal data structure from the reference sequences you provide. Check the content of your sequence_list. You could test with a very simple list and see whether the error still occurs. – Lewis Commented Mar 5 at 6:23
  • There is something wrong in the class definition that stops it from working; let the developer/maintainer know about this bug. Also: the score for a sequence is the fraction of codons in the sequence deemed optimal. The returned vector for a sequence is a binary array where optimal positions contain 1 and non-optimal ones contain 0. These have to be called on FOC, like FOC._calc_score(seq) and FOC._calc_vector(seq). – pippo1980 Commented Mar 5 at 16:55

1 Answer

To figure out what is going on inside the code, I modified FrequencyOfOptimalCodons in scores.py like this:

class FrequencyOfOptimalCodons(ScalarScore, VectorScore):
    """
    Frequency of Optimal Codons (FOP, Ikemura, J Mol Biol, 1981).

    This model determines the optimal codons for each amino acid based
    on their frequency in the given set of reference sequences
    `ref_seq`. Multiple codons may be selected as optimal based on
    `thresh`. The score for a sequence is the fraction of codons in
    the sequence deemed optimal. The returned vector for a sequence is
    a binary array where optimal positions contain 1 and non-optimal
    ones contain 0.

    Parameters
    ----------
    ref_seq : iterable of str
        A set of reference DNA sequences for codon usage statistics.
    thresh : float, optional
        Minimal ratio between the frequency of a codon and the most
        frequent one in order to be set as optimal, by default 0.95
    genetic_code : int, optional
        NCBI genetic code ID, by default 1
    ignore_stop : bool, optional
        Whether STOP codons will be discarded from the analysis, by
        default True
    pseudocount : int, optional
        Pseudocount correction for normalized codon frequencies. this is
        effective when `ref_seq` contains few short sequences. by default 1
    """
    def __init__(self, ref_seq, thresh=0.95, genetic_code=1,
                 ignore_stop=True, pseudocount=1):
        self.thresh = thresh
        self.counter = CodonCounter(genetic_code=genetic_code,
                                    ignore_stop=ignore_stop)
        self.pseudocount = pseudocount
        
        print('self.counter: ', self.counter , type(self.counter),'\n\n')
        
        for i in dir(self.counter):
        
            print(i)
            
        print('self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount) : ', 
        self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount) , 
        type(self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount)),'\n\n')
        
        print("self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount).groupby('aa') : ", 
        self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount).groupby('aa') , 
        type(self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount).groupby('aa')),"\n\n")


        self.weights = self.counter.count(ref_seq)\
            .get_aa_table(normed=True, pseudocount=pseudocount).groupby('aa').transform(lambda x: x / x.max())
            #.groupby('aa').apply(lambda x: x / x.max())
        print('self.weights ####### : ', self.weights , type(self.weights),'\n\n')

        #self.weights = self.counter.count(ref_seq)\
        #    .get_aa_table(normed=True, pseudocount=pseudocount)\
        #    .groupby('aa').apply(lambda x: x / x.max())
        #print('self.weights : ', self.weights , type(self.weights),'\n\n')
        self.weights[self.weights >= self.thresh] = 1  # optimal
        
        print('self.weights : ', self.weights , type(self.weights),'\n\n')
        
        self.weights[self.weights < self.thresh] = 0  # non-optimal
        
        print('self.weights : ', self.weights , type(self.weights),'\n\n')
        
        print(self.weights.to_string())
        
        #self.weights = self.weights.drop_duplicates()
        
        
        
        #print("self.weights.drop_duplicates() : ", self.weights , type(self.weights),'\n\n')
        
        #print(self.weights.to_string())
        
        
        self.weights = self.weights.droplevel('aa')
        
        print("self.weights.droplevel('aa') : ", self.weights , type(self.weights),'\n\n')
        
        print(self.weights.to_string())
        
        print('self.weights.values : \n', self.weights.values)
        
        print('self.weights.keys() : \n', self.weights.keys())

    def _calc_score(self, seq):
        #counts = self.counter.count(seq).counts
        
        #print('\nself.weights : \n', self.weights.to_string())
        
        #print('\ncounts : \n', counts)

        #return mean(self.weights, counts)
        
        print('(i[1] for i in self._calc_vector(seq))', [i[1] for i in self._calc_vector(seq)])
        print('len(seq)/3 ', len(seq)/3)
        
        return sum(i[1] for i in self._calc_vector(seq))/(len(seq)/3)

    def _calc_vector(self, seq):
    
        print('self._get_codon_vector(seq) : \n' , self._get_codon_vector(seq))
        
        #return self.weights.reindex(self._get_codon_vector(seq)).values
        
        return [(i , self.weights.get(key = i.upper())) for i in self._get_codon_vector(seq)]
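
The only functional change to __init__ in the modified class above is swapping .groupby('aa').apply(...) for .groupby('aa').transform(...). A likely explanation for the original error, assuming pandas 2.x behaviour (where group keys are also added for transform-like apply calls), is that apply prepends the group key to an index that already has an 'aa' level, so droplevel('aa') becomes ambiguous. The sketch below reproduces this with plain pandas on a toy Series. The values and the normalize helper are made up for illustration, and group_keys=True is passed explicitly so the effect does not depend on the installed default.

```python
import pandas as pd

# Toy stand-in for the normalized codon table: a Series indexed by (aa, codon).
# The numbers are illustrative, not taken from the package.
index = pd.MultiIndex.from_tuples(
    [("A", "GCC"), ("A", "GCG"), ("K", "AAA"), ("K", "AAG")],
    names=["aa", "codon"],
)
freqs = pd.Series([0.4, 0.6, 0.25, 0.75], index=index, name="count")


def normalize(group):
    """Scale a group so that its most frequent codon gets weight 1."""
    return group / group.max()


# apply() with group keys prepends 'aa' to a result that is already
# indexed by (aa, codon), giving two levels with the same name.
applied = freqs.groupby("aa", group_keys=True).apply(normalize)
print(list(applied.index.names))

try:
    applied.droplevel("aa")
    apply_error = None
except ValueError as exc:
    apply_error = str(exc)
print(apply_error)

# transform() keeps the original index, so dropping 'aa' is unambiguous.
transformed = freqs.groupby("aa").transform(normalize)
weights = transformed.droplevel("aa")
print(list(weights.index))
```

If this is indeed the cause, passing group_keys=False to groupby would be another way to keep the apply call, but that is untested against the package itself.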

Then run this test_code.py:

import codonbias as cb

sequence_list = ["atgccgaaaagcttttatgatgccgtgggcggcgcgaaaacctttgatgcgattgtgagc",
                 "cgcttttatgcgcaggtggcggaagatgaagtgctgcgccgcgtgtatccggaagatgat",
                 "ctggcgggcgcggaagaacgcctgcgcatgtttctggaacagtattggggcggcccgcgc",
                 "aagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaag",
                 "atgatgatggccgccgcc"]

FOC = cb.scores.FrequencyOfOptimalCodons(ref_seq=sequence_list, genetic_code=11)


print('\n\nFOC : \n', FOC)


print('score: \n',FOC._calc_score(sequence_list[0]))

print('calc : \n', FOC._calc_vector(sequence_list[0]))

print('calc : \n', FOC._calc_vector(sequence_list[1]))

print('calc : \n', FOC._calc_vector(sequence_list[2]))

print('calc : \n', FOC._calc_vector(sequence_list[3]))

print('score: \n',FOC._calc_score(sequence_list[3]))

print('calc : \n', FOC._calc_vector(sequence_list[4]))

print('score: \n',FOC._calc_score(sequence_list[4]))

The output is long and worth reading through:

self.counter:  <codonbias.stats.CodonCounter object at 0x7f9b64e1d0d0> <class 'codonbias.stats.CodonCounter'> 


__class__
__delattr__
__dict__
__dir__
__doc__
__eq__
__format__
__ge__
__getattribute__
__gt__
__hash__
__init__
__init_subclass__
__le__
__lt__
__module__
__ne__
__new__
__reduce__
__reduce_ex__
__repr__
__setattr__
__sizeof__
__str__
__subclasshook__
__weakref__
_count
_count_single
_format_counts
_init_table
concat_index
count
genetic_code
get_aa_table
get_codon_table
ignore_stop
k_mer
sum_seqs
self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount) :  aa  codon
A   GCA      0.071429
    GCC      0.357143
    GCG      0.500000
    GCT      0.071429
C   TGC      0.500000
               ...   
V   GTG      0.666667
    GTT      0.111111
W   TGG      1.000000
Y   TAC      0.166667
    TAT      0.833333
Name: count, Length: 61, dtype: float64 <class 'pandas.core.series.Series'> 


self.counter.count(ref_seq).get_aa_table(normed=True, pseudocount=pseudocount).groupby('aa') :  <pandas.core.groupby.generic.SeriesGroupBy object at 0x7f9b64db78e0> <class 'pandas.core.groupby.generic.SeriesGroupBy'> 


self.weights ####### :  aa  codon
A   GCA      0.142857
    GCC      0.714286
    GCG      1.000000
    GCT      0.142857
C   TGC      1.000000
               ...   
V   GTG      1.000000
    GTT      0.166667
W   TGG      1.000000
Y   TAC      0.200000
    TAT      1.000000
Name: count, Length: 61, dtype: float64 <class 'pandas.core.series.Series'> 


self.weights :  aa  codon
A   GCA      0.142857
    GCC      0.714286
    GCG      1.000000
    GCT      0.142857
C   TGC      1.000000
               ...   
V   GTG      1.000000
    GTT      0.166667
W   TGG      1.000000
Y   TAC      0.200000
    TAT      1.000000
Name: count, Length: 61, dtype: float64 <class 'pandas.core.series.Series'> 


self.weights :  aa  codon
A   GCA      0.0
    GCC      0.0
    GCG      1.0
    GCT      0.0
C   TGC      1.0
            ... 
V   GTG      1.0
    GTT      0.0
W   TGG      1.0
Y   TAC      0.0
    TAT      1.0
Name: count, Length: 61, dtype: float64 <class 'pandas.core.series.Series'> 


aa  codon
A   GCA      0.0
    GCC      0.0
    GCG      1.0
    GCT      0.0
C   TGC      1.0
    TGT      1.0
D   GAC      0.0
    GAT      1.0
E   GAA      1.0
    GAG      0.0
F   TTC      0.0
    TTT      1.0
G   GGA      0.0
    GGC      1.0
    GGG      0.0
    GGT      0.0
H   CAC      1.0
    CAT      1.0
I   ATA      0.0
    ATC      0.0
    ATT      1.0
K   AAA      0.0
    AAG      1.0
L   CTA      0.0
    CTC      0.0
    CTG      1.0
    CTT      0.0
    TTA      0.0
    TTG      0.0
M   ATG      1.0
N   AAC      1.0
    AAT      1.0
P   CCA      0.0
    CCC      0.0
    CCG      1.0
    CCT      0.0
Q   CAA      0.0
    CAG      1.0
R   AGA      0.0
    AGG      0.0
    CGA      0.0
    CGC      1.0
    CGG      0.0
    CGT      0.0
S   AGC      1.0
    AGT      0.0
    TCA      0.0
    TCC      0.0
    TCG      0.0
    TCT      0.0
T   ACA      0.0
    ACC      1.0
    ACG      0.0
    ACT      0.0
V   GTA      0.0
    GTC      0.0
    GTG      1.0
    GTT      0.0
W   TGG      1.0
Y   TAC      0.0
    TAT      1.0
self.weights.droplevel('aa') :  codon
GCA    0.0
GCC    0.0
GCG    1.0
GCT    0.0
TGC    1.0
      ... 
GTG    1.0
GTT    0.0
TGG    1.0
TAC    0.0
TAT    1.0
Name: count, Length: 61, dtype: float64 <class 'pandas.core.series.Series'> 


codon
GCA    0.0
GCC    0.0
GCG    1.0
GCT    0.0
TGC    1.0
TGT    1.0
GAC    0.0
GAT    1.0
GAA    1.0
GAG    0.0
TTC    0.0
TTT    1.0
GGA    0.0
GGC    1.0
GGG    0.0
GGT    0.0
CAC    1.0
CAT    1.0
ATA    0.0
ATC    0.0
ATT    1.0
AAA    0.0
AAG    1.0
CTA    0.0
CTC    0.0
CTG    1.0
CTT    0.0
TTA    0.0
TTG    0.0
ATG    1.0
AAC    1.0
AAT    1.0
CCA    0.0
CCC    0.0
CCG    1.0
CCT    0.0
CAA    0.0
CAG    1.0
AGA    0.0
AGG    0.0
CGA    0.0
CGC    1.0
CGG    0.0
CGT    0.0
AGC    1.0
AGT    0.0
TCA    0.0
TCC    0.0
TCG    0.0
TCT    0.0
ACA    0.0
ACC    1.0
ACG    0.0
ACT    0.0
GTA    0.0
GTC    0.0
GTG    1.0
GTT    0.0
TGG    1.0
TAC    0.0
TAT    1.0
self.weights.values : 
 [0. 0. 1. 0. 1. 1. 0. 1. 1. 0. 0. 1. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 1. 0.
 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0.
 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1.]
self.weights.keys() : 
 Index(['GCA', 'GCC', 'GCG', 'GCT', 'TGC', 'TGT', 'GAC', 'GAT', 'GAA', 'GAG',
       'TTC', 'TTT', 'GGA', 'GGC', 'GGG', 'GGT', 'CAC', 'CAT', 'ATA', 'ATC',
       'ATT', 'AAA', 'AAG', 'CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG', 'ATG',
       'AAC', 'AAT', 'CCA', 'CCC', 'CCG', 'CCT', 'CAA', 'CAG', 'AGA', 'AGG',
       'CGA', 'CGC', 'CGG', 'CGT', 'AGC', 'AGT', 'TCA', 'TCC', 'TCG', 'TCT',
       'ACA', 'ACC', 'ACG', 'ACT', 'GTA', 'GTC', 'GTG', 'GTT', 'TGG', 'TAC',
       'TAT'],
      dtype='object', name='codon')


FOC : 
 <codonbias.scores.FrequencyOfOptimalCodons object at 0x7f46da53ebe0>
self._get_codon_vector( atgccgaaaagcttttatgatgccgtgggcggcgcgaaaacctttgatgcgattgtgagc ) : 
 ['atg', 'ccg', 'aaa', 'agc', 'ttt', 'tat', 'gat', 'gcc', 'gtg', 'ggc', 'ggc', 'gcg', 'aaa', 'acc', 'ttt', 'gat', 'gcg', 'att', 'gtg', 'agc']
(i[1] for i in self._calc_vector( atgccgaaaagcttttatgatgccgtgggcggcgcgaaaacctttgatgcgattgtgagc )) [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
len(seq)/3  20.0
self._get_codon_vector( atgccgaaaagcttttatgatgccgtgggcggcgcgaaaacctttgatgcgattgtgagc ) : 
 ['atg', 'ccg', 'aaa', 'agc', 'ttt', 'tat', 'gat', 'gcc', 'gtg', 'ggc', 'ggc', 'gcg', 'aaa', 'acc', 'ttt', 'gat', 'gcg', 'att', 'gtg', 'agc']
score: 
 0.85
self._get_codon_vector( atgccgaaaagcttttatgatgccgtgggcggcgcgaaaacctttgatgcgattgtgagc ) : 
 ['atg', 'ccg', 'aaa', 'agc', 'ttt', 'tat', 'gat', 'gcc', 'gtg', 'ggc', 'ggc', 'gcg', 'aaa', 'acc', 'ttt', 'gat', 'gcg', 'att', 'gtg', 'agc']
calc : 
 [('atg', 1.0), ('ccg', 1.0), ('aaa', 0.0), ('agc', 1.0), ('ttt', 1.0), ('tat', 1.0), ('gat', 1.0), ('gcc', 0.0), ('gtg', 1.0), ('ggc', 1.0), ('ggc', 1.0), ('gcg', 1.0), ('aaa', 0.0), ('acc', 1.0), ('ttt', 1.0), ('gat', 1.0), ('gcg', 1.0), ('att', 1.0), ('gtg', 1.0), ('agc', 1.0)]
self._get_codon_vector( cgcttttatgcgcaggtggcggaagatgaagtgctgcgccgcgtgtatccggaagatgat ) : 
 ['cgc', 'ttt', 'tat', 'gcg', 'cag', 'gtg', 'gcg', 'gaa', 'gat', 'gaa', 'gtg', 'ctg', 'cgc', 'cgc', 'gtg', 'tat', 'ccg', 'gaa', 'gat', 'gat']
calc : 
 [('cgc', 1.0), ('ttt', 1.0), ('tat', 1.0), ('gcg', 1.0), ('cag', 1.0), ('gtg', 1.0), ('gcg', 1.0), ('gaa', 1.0), ('gat', 1.0), ('gaa', 1.0), ('gtg', 1.0), ('ctg', 1.0), ('cgc', 1.0), ('cgc', 1.0), ('gtg', 1.0), ('tat', 1.0), ('ccg', 1.0), ('gaa', 1.0), ('gat', 1.0), ('gat', 1.0)]
self._get_codon_vector( ctggcgggcgcggaagaacgcctgcgcatgtttctggaacagtattggggcggcccgcgc ) : 
 ['ctg', 'gcg', 'ggc', 'gcg', 'gaa', 'gaa', 'cgc', 'ctg', 'cgc', 'atg', 'ttt', 'ctg', 'gaa', 'cag', 'tat', 'tgg', 'ggc', 'ggc', 'ccg', 'cgc']
calc : 
 [('ctg', 1.0), ('gcg', 1.0), ('ggc', 1.0), ('gcg', 1.0), ('gaa', 1.0), ('gaa', 1.0), ('cgc', 1.0), ('ctg', 1.0), ('cgc', 1.0), ('atg', 1.0), ('ttt', 1.0), ('ctg', 1.0), ('gaa', 1.0), ('cag', 1.0), ('tat', 1.0), ('tgg', 1.0), ('ggc', 1.0), ('ggc', 1.0), ('ccg', 1.0), ('cgc', 1.0)]
self._get_codon_vector( aagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaag ) : 
 ['aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag']
calc : 
 [('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0), ('aag', 1.0)]
self._get_codon_vector( aagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaag ) : 
 ['aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag']
(i[1] for i in self._calc_vector( aagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaag )) [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
len(seq)/3  20.0
self._get_codon_vector( aagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaagaag ) : 
 ['aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag', 'aag']
score: 
 1.0
self._get_codon_vector( atgatgatggccgccgcc ) : 
 ['atg', 'atg', 'atg', 'gcc', 'gcc', 'gcc']
calc : 
 [('atg', 1.0), ('atg', 1.0), ('atg', 1.0), ('gcc', 0.0), ('gcc', 0.0), ('gcc', 0.0)]
self._get_codon_vector( atgatgatggccgccgcc ) : 
 ['atg', 'atg', 'atg', 'gcc', 'gcc', 'gcc']
(i[1] for i in self._calc_vector( atgatgatggccgccgcc )) [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
len(seq)/3  6.0
self._get_codon_vector( atgatgatggccgccgcc ) : 
 ['atg', 'atg', 'atg', 'gcc', 'gcc', 'gcc']
score: 
 0.5



I am not sure I got it right, but the results now match what the docs state.
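
To make the output above easier to check by hand, here is a minimal standard-library sketch of the same arithmetic: divide each codon's normalized frequency by the highest frequency for its amino acid, mark it optimal if the ratio reaches thresh, and score a sequence as the fraction of optimal codons. The alanine frequencies are copied from the printed table; optimal_codons and fop_score are hypothetical helper names, not part of codon-bias.

```python
# Normalized codon frequencies per amino acid. Alanine values are copied from
# the printed table above; methionine has a single codon, so its frequency is 1.
aa_table = {
    "A": {"GCA": 0.071429, "GCC": 0.357143, "GCG": 0.500000, "GCT": 0.071429},
    "M": {"ATG": 1.0},
}


def optimal_codons(table, thresh=0.95):
    """Return {codon: 1.0 or 0.0}; 1.0 means frequency / max frequency >= thresh."""
    weights = {}
    for codons in table.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = 1.0 if freq / top >= thresh else 0.0
    return weights


def fop_score(seq, weights):
    """Fraction of complete codons in seq that are marked optimal."""
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing codon
    codons = [seq[i:i + 3].upper() for i in range(0, usable, 3)]
    return sum(weights[c] for c in codons) / len(codons)


weights = optimal_codons(aa_table)
score = fop_score("atgatgatggccgccgcc", weights)
print(weights)
print(score)  # 0.5, matching the last score in the output above
```

Only GCG passes the 0.95 threshold for alanine (GCC reaches about 0.71 of the maximum), which is why the three gcc codons in the last test sequence count as non-optimal.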

As per the comments under your question:

The error is raised inside pandas, so this issue is likely due to the way the codon-bias package constructs or processes its internal data structure from the reference sequences you provide. Check the content of your sequence_list. You could test with a very simple list and see whether the error still occurs.

There is something wrong in the class definition that stops it from working; let the developer/maintainer know about this bug. Also: the score for a sequence is the fraction of codons in the sequence deemed optimal. The returned vector for a sequence is a binary array where optimal positions contain 1 and non-optimal ones contain 0. These have to be called on FOC, like FOC._calc_score(seq) and FOC._calc_vector(seq).

BE WARNED:

Calculating the score of a sequence whose length is not a multiple of three (one that has 1 or 2 extra nucleotides at its 3' end) will throw an error like:

line 202, in _calc_score
    return sum(i[1] for i in self._calc_vector(seq))/(len(seq)/3)
TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

You can change _calc_score to:

def _calc_score(self, seq):
    return sum(i[1] for i in self._calc_vector(seq) if isinstance(i[1], float))/(len(seq)/3)
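
Note that the filtered version still divides by len(seq)/3, which is fractional when the sequence has leftover nucleotides. As a possible alternative, the standard-library sketch below divides by the number of codons that actually received a weight. It assumes, based on the error reported above, that an incomplete trailing codon ends up in the codon vector and has no weight; split_codons and score_known_codons are hypothetical helpers, and the weights dict is illustrative.

```python
def split_codons(seq):
    """Hypothetical helper: split seq into triplets, keeping a trailing partial codon."""
    return [seq[i:i + 3].upper() for i in range(0, len(seq), 3)]


def score_known_codons(seq, weights):
    """Fraction of optimal codons among the codons that have a weight."""
    values = [weights.get(codon) for codon in split_codons(seq)]
    scored = [v for v in values if v is not None]  # drop unknown or partial codons
    if not scored:
        return float("nan")  # nothing to score
    return sum(scored) / len(scored)


# Illustrative binary weights: 1.0 = optimal, 0.0 = non-optimal.
weights = {"ATG": 1.0, "GCG": 1.0, "GCC": 0.0}

full = score_known_codons("atggcggcc", weights)      # three complete codons
ragged = score_known_codons("atggcggccat", weights)  # same, plus a trailing 'at'
print(full, ragged)
```

With this denominator the two extra nucleotides do not change the score, whereas dividing by len(seq)/3 would lower it slightly.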

I strongly suggest opening an issue on the project's GitHub page:

https://github/alondmnt/codon-bias/issues
