Ron Lab


 

Homologous recombination in ES cells using plasmids with C57BL/6 derived homology arms.

April 1, 2004

In conventional targeting vectors, which employ relatively short homology arms, small differences between the sequence of the targeting vector and that of the target locus may reduce the frequency of homologous recombination (this seems not to be a problem when using BACs). People have therefore gone out of their way to obtain the sequences for the homology arms from isogenic 129 DNA (ES cells are derived from 129 strains). Today, however, it is very easy to obtain addresses for BACs (most of which were derived from C57BL/6 genomic DNA) from the mouse genome database, whereas to find a 129-derived BAC that picks up "your" gene you either have to be lucky or have to screen one of the 129 BAC libraries to get an address (Comment: this state of affairs has changed since this page was first written in 2003, see link).

It turns out, however, that the 129 and C57BL/6 genomes share large blocks of complete identity, alongside other large blocks over which many differences occur (Wade et al., 2002, Nature 420:574). Therefore, if you know that your gene falls in a region where the two strains are identical (a SNP-poor area), it is probably safe to use BACs derived from C57BL/6 as templates to construct the homology arms of the targeting vector. If you know that your gene falls in a region where the two genomes differ at high frequency (a SNP-rich island), it is better to go to the extra trouble of getting 129 DNA.

Go to http://www.broad.mit.edu/personal/claire/mosaic.htm and download the "MSexcel file containing basic haplotype maps for comparisons between C57BL/6J and other strains (129S1/SvImJ, C3H/HeJ, Balb/cByJ)".

Go to UCSC (http://genome.ucsc.edu) and BLAT your sequence against the February 2003 assembly (note that in the Wade & Daly database the locations of the SNPs are expressed in terms of the February 2003 assembly, not the latest October 2003 assembly).

Get the address of "your" gene and check it against the Excel spreadsheet you downloaded (strainsnplist_all.xls) to determine if it lands in a SNP-rich or SNP-poor area.
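The lookup in the last step can also be done programmatically. Below is a minimal sketch in Python, not the procedure used here. It assumes the haplotype blocks have already been extracted from the spreadsheet into (chromosome, start, end, SNP-rich?) records. The column layout of strainsnplist_all.xls is not described on this page, so the `Block` record, the `classify_region` helper, and the toy coordinates are all hypothetical and would need adapting to the real file. The optional `flank` argument widens the query so that a gene sitting close to a block boundary is reported as "mixed" rather than as safely SNP-poor.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One haplotype block (hypothetical layout, not the real spreadsheet columns)."""
    chrom: str
    start: int       # bp, in the same assembly as the gene address (February 2003)
    end: int         # bp, inclusive
    snp_rich: bool   # True for a SNP-rich block, False for a SNP-poor block


def classify_region(blocks: List[Block], chrom: str, start: int, end: int,
                    flank: int = 0) -> str:
    """Classify an interval as 'SNP-poor', 'SNP-rich', 'mixed', or 'no data'."""
    lo, hi = start - flank, end + flank
    # Keep every block on the same chromosome that overlaps the (widened) interval.
    hits = [b for b in blocks if b.chrom == chrom and b.start <= hi and b.end >= lo]
    if not hits:
        return "no data"
    kinds = {b.snp_rich for b in hits}
    if kinds == {False}:
        return "SNP-poor"
    if kinds == {True}:
        return "SNP-rich"
    return "mixed"  # the interval touches both kinds of block (a junction)


# Made-up blocks for illustration only; these are NOT real mouse coordinates.
toy_blocks = [
    Block("chrN", 1_000_000, 5_000_000, False),
    Block("chrN", 5_000_001, 9_000_000, True),
]

# A gene well inside the SNP-poor block.
print(classify_region(toy_blocks, "chrN", 2_000_000, 2_050_000))
# A gene straddling the junction between the two blocks.
print(classify_region(toy_blocks, "chrN", 4_980_000, 5_020_000))
```

With the toy data, the first query prints "SNP-poor" and the second prints "mixed". A "mixed" or "SNP-rich" result would argue for using 129 sequence.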

I found this informative. One of "my" genes fell solidly in a SNP desert at the telomeric end of Chr 19, and I believe it would be safe to use a C57BL/6 BAC to construct the targeting vector for that one. The other gene falls at the junction between a SNP-poor and a SNP-rich region in the middle of Chr 1, and for that gene I believe it is safest to use 129 sequence to construct the targeting vector.

Communicated by David Ron