| Many antibody drugs have been successfully expressed in prokaryotic systems. However, limited understanding of the mechanisms underlying recombinant protein expression has forced researchers to rely on "trial and error" to improve soluble production, an approach that is time-consuming and labor-intensive and has a low success rate. Considerable effort has gone into predicting expression level from the amino acid sequence alone, and several prediction models have been built, but their application is limited by the heterogeneity of the data sources. Here, we used high-throughput cultivation and protein detection methods to collect standardized data. Antibody fragments such as single-chain antibodies (scFv) and domain antibodies (dAb), which have low molecular weights and a high proportion of constant regions, are ideal model proteins for studying the correlation between primary sequence and soluble expression level. First, we cultured scFv in a 5 L fermentor and obtained purified scFv at a concentration of 1.58 mg·mL⁻¹, which was then used to detect the expression level of dAbs. Adding different amino acids to the C-terminus of a dAb often results in significant variation in soluble expression level. G, P, C, and Q significantly influenced dAb soluble expression in both the supernatant and the pellet; L significantly affected expression in the supernatant, whereas W, F, and Y did so in the pellet. The amino acid sequences of dAbs agree closely with their soluble expression levels, with a consistency of over 70%. By analyzing the soluble expression and sequence data of 65 dAbs using clustering and linear modeling, we show that certain amino acid panels can significantly affect the soluble expression of dAbs; the panels are (S, R, N, D, Q), (G, R, C, N, S), and (R, S, G) for the supernatant, the pellet, and the total amount, respectively.
In addition, polarity was found to be a vital factor affecting the soluble production of dAbs. The findings of this study may directly inform industrial process development for antibody fragments, and have strong potential as a general guide for orthomutation of other therapeutic proteins. |
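
As a rough illustration of the sequence-to-expression analysis summarized above, the sketch below computes the fraction of residues in a sequence that belong to one of the reported amino acid panels and fits a one-variable least-squares line. It is not the authors' pipeline: the function names `panel_fraction` and `fit_line` are hypothetical, the choice of panel fraction as the feature is an assumption, and no real sequences or expression values are used.

```python
from collections import Counter

# Amino acid panels reported in the abstract, keyed by measured fraction.
PANELS = {
    "supernatant": "SRNDQ",
    "pellet": "GRCNS",
    "total": "RSG",
}


def panel_fraction(sequence: str, panel: str) -> float:
    """Return the fraction of residues in `sequence` that belong to `panel`.

    Hypothetical feature: the abstract names the panels but does not say
    how they were encoded for modeling.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    return sum(counts[aa] for aa in set(panel.upper())) / len(sequence)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, one would compute `panel_fraction(seq, PANELS["supernatant"])` for each dAb sequence and pass those values, together with the measured supernatant expression levels, to `fit_line`. A study with many candidate residues would more plausibly use a multivariate model, which this single-feature sketch does not attempt.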