Sequencing of Hypertrophic and Dilated Cardiomyopathy Genes
Discussion
In the present study we show the feasibility of array based target enrichment of 23 HCM and DCM genes combined with NGS for diagnostic mutation screening in HCM/DCM patients. The use of DNA bar codes enabled pre-capture multiplexing, a step essential for increasing throughput and reducing the costs of analysis. This approach not only enables the simultaneous and comprehensive investigation of multiple target genes but also facilitates the analysis of multiple samples in parallel. We have developed an optimised capture array that is at least as sensitive as the current standard, Sanger sequencing. For HCM, the current capture array analyses three times as many genes as the current Sanger sequencing panel within the same turn-around time. Screening more genes results in a higher diagnostic yield. For DCM patients, but not for HCM patients, we have developed a similar assay for TTN, which enables us to screen TTN in addition to the 23 reported genes in DCM patients only.
We and others have shown that NGS technologies are on the verge of being broadly used in clinical laboratories. It is very likely that these new technologies will replace traditional (Sanger sequencing based and array based) sequencing tests for genetically heterogeneous disorders like HCM/DCM.
Targeted exome massively parallel sequencing has great potential for both research and clinical use. However, sequencing of the entire exome for diseases with 'limited' numbers of genes to be investigated is not practically feasible yet due to relatively high costs, variable depth of exon coverage, the extent of data analysis, and data storage. To overcome these problems, sequence capture based target enrichment for a limited number of genes followed by NGS can be an approach that is logistically and financially feasible.
Our first pilot study with five HCM patients with known pathogenic mutations showed that array based target enrichment and NGS could easily detect different types of known mutations (substitutions, insertions and deletions) and numerous non-pathogenic variants already detected with Sanger sequencing. All 57 variants detected with Sanger sequencing were also found with GS-FLX Titanium sequencing, including a coding 26 bp insertion in the MYBPC3 gene (figure 3). Because we detected 100% of the variants present in these five HCM patients, we proceeded with testing of additional patients. Nevertheless, in this pilot study we observed that exon coverage varies significantly within one sample (determined by array design), resulting in lower confidence in particular variant calls (eg, MYBPC3: NM_000256.3: c.1093-24C>T). A balanced representation of all targeted exons would reduce the average coverage needed to detect variants with high confidence, consequently lowering the false negative rate. Therefore, we designed arrays with a more balanced coverage, as has previously been proposed by others. The rebalanced capture array was used to analyse nine HCM patients and 19 DCM patients. With the rebalanced design, 99.80% of the targeted coding bases were covered at least once and 99.64% at least 16×. We and others have calculated that at 15–16× coverage a 99% sensitivity is obtained. This means that for an experiment with a mean coverage of 100±35× the statistical chance that a variant is missed in a patient is 0.004% (for calculation see supplementary figure S2).
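To give a feel for why a minimum depth in the range of 15–16× can yield high sensitivity for heterozygous variants, the following is a minimal sketch under a simple binomial sampling model. It is not the calculation from supplementary figure S2: the function name, the 50% allele fraction, and the requirement of at least three variant-supporting reads are all illustrative assumptions, and sequencing errors are ignored.

```python
from math import comb


def detection_probability(depth, min_variant_reads=3, allele_fraction=0.5):
    """Probability that at least `min_variant_reads` of `depth` reads carry
    the variant allele, assuming each read samples the variant allele
    independently with probability `allele_fraction` (binomial model).

    All defaults are illustrative assumptions, not values from the study.
    """
    # Probability of seeing fewer variant reads than the calling threshold.
    p_too_few = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_variant_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    for depth in (8, 16, 30):
        print(depth, round(detection_probability(depth), 5))
```

Under these assumptions a heterozygous variant at 16× is sampled often enough to be called in well over 99% of cases, and the probability rises quickly with depth. A stricter read threshold or a skewed allele fraction would lower these values.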
Figure 3.
Mapping of a coding 26 bp duplication in a hypertrophic cardiomyopathy patient. In contrast with short read platforms, a heterozygous insertion of 26 bp is easily mapped and reported.
The bases with low coverage were exon 5 from PRKAG2 (NM_016203.3) in all patients and exon 2 from LAMP2 (NM_002294.2) in about half of the patients. This is likely due to a high GC content, a phenomenon observed before. In the current test these exons are analysed by standard Sanger sequencing.
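Regions at risk of poor capture because of high GC content can be flagged in advance from the reference sequence. The sketch below computes the GC fraction of a sequence and lists sliding windows above a cut-off. The function names, the window size, and the threshold are hypothetical choices for illustration; the study does not state a specific GC cut-off.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def high_gc_windows(sequence, window=50, threshold=0.7):
    """Return 0-based start positions of all windows of length `window`
    whose GC fraction is at least `threshold`.

    `window` and `threshold` are illustrative assumptions.
    """
    return [
        start
        for start in range(0, len(sequence) - window + 1)
        if gc_fraction(sequence[start:start + window]) >= threshold
    ]


if __name__ == "__main__":
    toy_sequence = "ATATGCGCGC"  # toy sequence, not from the study
    print(gc_fraction(toy_sequence))
    print(high_gc_windows(toy_sequence, window=4, threshold=1.0))
```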
For the nine HCM patients, the HCM gene panel had already been analysed with Sanger sequencing. In these patients 99.2% of the variants present were detected with GS-FLX Titanium sequencing. The two undetected non-coding variants (identical in two individuals) were in both cases a single nucleotide insertion in a region with multiple homopolymer stretches of four to six nucleotides. Variant detection in homopolymer stretches, in particular of indels, is a known problem with pyrosequencing due to incorrect base calling in these regions. In our datasets it has become clear that homopolymer stretches up to 5-mers are generally called properly, while 6-mers and longer generally result in improper base calling. Screening all coding regions identified 17 exons containing homopolymers of ≥6 nucleotides. For diagnostic purposes, these exons are analysed by Sanger sequencing until better base calling algorithms are available.
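A screen for homopolymer stretches of six or more nucleotides, as described above, can be sketched with a regular expression. This is a minimal illustration rather than the tool used in the study; the function name and the returned tuple layout are assumptions.

```python
import re


def find_homopolymers(sequence, min_length=6):
    """Return a list of (start, end, base, length) tuples for every run of a
    single base that is at least `min_length` long.

    Coordinates are 0-based and end-exclusive. The default of 6 mirrors the
    >=6-mer cut-off mentioned in the text.
    """
    # One base followed by at least (min_length - 1) repeats of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (m.start(), m.end(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    toy_exon = "ACGTTTTTTACGAAAAAC"  # toy sequence, not from the study
    print(find_homopolymers(toy_exon))
    print(find_homopolymers(toy_exon, min_length=5))
```

Exons returning a non-empty list would be the ones routed to Sanger sequencing under the rule described above.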
Apart from the variants present in the already used gene panel, numerous additional variants were found in the genes that were not Sanger sequenced previously. Two variants (indicated in bold), which both result in amino acid changes (one amino acid change also predicts a splice site loss) (Table 3), are of possible clinical significance. The variant in LDB3 has been reported before while the VCL variant is novel.
In the DCM panel we detected 11 potentially pathogenic mutations in 10 individuals. The genes involved are MYH7 (three mutations), TNNI3 (one mutation), PLN (two identical mutations), LAMP2 (one mutation), LDB3 (one mutation), EMD (one mutation), and SCN5A (two mutations) (for details see Table 2). The variants in TNNI3, PLN, and SCN5A have already been reported in the literature; all other variants are novel. The identification of additional variants in both HCM and DCM patients shows that extending the sequenced gene panel increases diagnostic yield.
Finally, we evaluated 30 index patients in parallel with Sanger sequencing, focusing especially on the diagnostic value of this approach. In 10 patients a single potentially pathogenic variant was found, in two patients two potentially pathogenic variants were found, and in one patient three potentially pathogenic variants were identified. Of these 17 variants, 11 were in the genes regularly screened with Sanger sequencing, and six were found in genes that were screened additionally with GS-FLX Titanium sequencing (Table 4). This shows that the clinical sensitivity can be increased by the addition of extra genes. In this experiment we were able to compare directly the sensitivity of Sanger sequencing and GS-FLX Titanium sequencing. Sensitivity of GS-FLX Titanium sequencing was at least as good as that of Sanger sequencing (99.7% vs 99.4%, respectively), showing that GS-FLX Titanium sequencing can replace Sanger sequencing in a diagnostic setting.
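A head-to-head sensitivity comparison of two methods run on the same samples can be sketched as follows. The sketch assumes that every variant reported by either method is real (for example after confirmation), so that the union of both call sets serves as the reference. That assumption, the function name, and the placeholder variant identifiers are illustrative and not taken from the study.

```python
def compare_call_sets(calls_a, calls_b):
    """Compare two variant call sets obtained on the same samples.

    Assumes the union of both sets is the set of true variants (no false
    positives), which is an illustrative simplification.
    """
    a, b = set(calls_a), set(calls_b)
    truth = a | b
    if not truth:
        raise ValueError("at least one variant call is required")
    return {
        "sensitivity_a": len(a) / len(truth),
        "sensitivity_b": len(b) / len(truth),
        "missed_by_a": sorted(truth - a),
        "missed_by_b": sorted(truth - b),
    }


if __name__ == "__main__":
    # Placeholder identifiers; real input would be normalised variant
    # descriptions (gene, transcript, position, change).
    sanger_calls = {"v1", "v2", "v3"}
    ngs_calls = {"v1", "v2", "v3", "v4"}
    print(compare_call_sets(sanger_calls, ngs_calls))
```

Variants must be described identically in both sets before comparison, since the same indel written with different coordinates would otherwise be counted as two discordant calls.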
Small insertions and deletions (1–50 bp) represent the second most frequent class of variation in the human genome after SNPs. Throughout all the experiments described here we have particularly focused on the detection of indels. Two non-coding variants (MYL2 c.353+20 and c.353+46) were not reported in the GS-FLX Titanium variant list. Both variants are present in a region with multiple homopolymer stretches of four to six nucleotides, likely the underlying reason for improper variant calling. Nevertheless, the other 106 indels present in this study were called properly. In fact, pathogenic mutations caused by an insertion of a G nucleotide, deletion of CT nucleotides, deletion of AGA nucleotides, as well as an insertion of 26 bases were called correctly. Furthermore, multiple known non-coding indels like insertion of a C nucleotide, deletion of a C nucleotide, insertion of AC nucleotides, deletion of CTTCT nucleotides, insertion of ATTTT nucleotides, insertion of ATTTTGTTTT nucleotides, and insertion of ACAG nucleotides were all detected in these patients.
In conclusion, we have shown that on-array multiplexed sequence capture in combination with GS-FLX Titanium sequencing is suitable for reliable variant detection (sensitivity of 99.7%) and will increase clinical sensitivity in cardiomyopathy patients. To date, NGS is used as a research tool with high confidence and ease. In the present paper we demonstrate that due to continuing improvements in throughput, accuracy, cost and ease of data analysis, it has become feasible to apply NGS in a diagnostic setting.