Cells in the hematopoietic system undergo extensive phenotypic changes during differentiation, progressing from stem cells through lineage commitment to maturation. Each of these steps must be tightly controlled by the expression of specific sets of genes in the genome. Genetic alterations, such as those found in leukemia, change the normal pattern of gene expression and lead to pathological transformation. More complete data on gene expression patterns in both normal and leukemic cells are therefore likely to improve our understanding of the genetic control of hematopoietic differentiation and to provide clues to the causes of abnormal differentiation.

We have initiated studies of genome-wide gene expression in both normal progenitor and leukemic cells. For this project we have developed an integrated system consisting of four “wet-bench” techniques and a bioinformatics program: 1) SPGI (Screening PolydA/dT(−) cDNAs for Gene Identification); 2) IPGI (Integrated Procedures for Gene Identification); 3) a modified SAGE (Serial Analysis of Gene Expression) technique; and 4) GLGI (Generation of Longer cDNA fragments from SAGE tags for Gene Identification).
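The tag-extraction step at the core of SAGE can be illustrated with a minimal Python sketch. The abstract does not describe how the SAGE protocol was modified, so this follows the classic protocol instead, under stated assumptions: NlaIII (site CATG) as the anchoring enzyme, 10-bp tags, an acceptance window of 20 to 26 bp for ditags, and removal of repeated ditags as likely PCR artifacts. The names `extract_tags` and `reverse_complement` are hypothetical.

```python
# Minimal sketch of classic SAGE tag extraction (not the modified protocol used in this study).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ANCHOR = "CATG"         # assumed NlaIII anchoring-enzyme site
TAG_LEN = 10            # assumed tag length, excluding the anchor
DITAG_RANGE = (20, 26)  # assumed acceptable ditag lengths (bp)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_tags(concatemers: list[str]) -> list[str]:
    """Pull two tags from every ditag found between consecutive anchor sites.

    A ditag seen more than once in the library is used only once, since in
    classic SAGE repeated ditags are treated as likely PCR artifacts.
    """
    seen_ditags: set[str] = set()
    tags: list[str] = []
    for read in concatemers:
        # Text before the first and after the last anchor is not a complete ditag.
        for ditag in read.upper().split(ANCHOR)[1:-1]:
            if not DITAG_RANGE[0] <= len(ditag) <= DITAG_RANGE[1]:
                continue  # skip fragments of implausible length
            if ditag in seen_ditags:
                continue  # drop repeated ditags
            seen_ditags.add(ditag)
            tags.append(ditag[:TAG_LEN])                       # first tag, read as-is
            tags.append(reverse_complement(ditag[-TAG_LEN:]))  # second tag lies in the opposite orientation
    return tags


ditag = "ACGTACGTAA" + reverse_complement("GGGTTTCCCA")
print(extract_tags(["CATG" + ditag + "CATG", "CATG" + ditag + "CATG"]))
# -> ['ACGTACGTAA', 'GGGTTTCCCA']
```

Because the two tags in a ditag are ligated tail to tail, the second one is read on the opposite strand and is reverse-complemented before counting.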

Our goals are: 1) to qualitatively identify all the genes expressed in the targeted cells; 2) to quantitatively analyze differences in gene expression patterns between normal and leukemic cells; 3) to identify the critical pathways through which altered gene expression leads to leukemia, with the aim of using this information to develop targeted therapeutic strategies; and 4) to apply the whole system to genome-wide questions, for example, determining the correct number of genes.

By applying SAGE, we identified over 27,000 and over 22,000 unique tag sequences from human (CD15+) and mouse (GR-1+) myeloid progenitor cells, respectively. Analysis of these unique tags in both species showed that: 1) over 70% of genes were expressed at low levels, whereas a small number of tags were present at high or intermediate levels; 2) about 60% of tags matched known expressed sequences, while the remaining 40% had no match and represent potentially novel genes; 3) the highly expressed genes included both genes specifically expressed in myeloid progenitor cells and genes ubiquitously expressed in other cell types. These data provide a basis for analyzing genome-wide gene expression during the differentiation of human and mouse myeloid progenitor cells, and a baseline for further characterization of abnormal gene expression in the myeloid lineage in pathological states, especially leukemia.
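A library summary of this kind (unique tag count, distribution across expression levels, and fraction of tags matching known sequences) can be sketched in Python. The abstract does not state the copy-number cutoffs that define "low", "intermediate", and "high" expression, nor the reference used for matching, so the thresholds `low_max` and `high_min`, the `known_tags` set, and the function name `summarize_library` are hypothetical placeholders, and the tag labels in the usage line are toy data.

```python
from collections import Counter


def summarize_library(tags: list[str], known_tags: set[str],
                      low_max: int = 4, high_min: int = 50) -> dict[str, float]:
    """Summarize a SAGE tag library.

    low_max and high_min are hypothetical copy-number cutoffs: a unique tag
    seen at most low_max times is "low", at least high_min times is "high",
    and anything in between is "intermediate".
    """
    counts = Counter(tags)  # copies observed per unique tag
    summary = {"total_tags": len(tags), "unique_tags": len(counts),
               "low": 0, "intermediate": 0, "high": 0}
    for copies in counts.values():
        if copies <= low_max:
            summary["low"] += 1
        elif copies >= high_min:
            summary["high"] += 1
        else:
            summary["intermediate"] += 1
    # Unique tags with no entry in the reference set are candidate novel transcripts.
    matched = sum(1 for tag in counts if tag in known_tags)
    summary["fraction_matched"] = matched / len(counts) if counts else 0.0
    return summary


library = ["TAG_A"] * 60 + ["TAG_B"] * 10 + ["TAG_C"] * 2 + ["TAG_D"]
print(summarize_library(library, known_tags={"TAG_A", "TAG_C"}))
# -> {'total_tags': 73, 'unique_tags': 4, 'low': 2, 'intermediate': 1, 'high': 1, 'fraction_matched': 0.5}
```

In this sketch, expression levels and the match fraction are computed over unique tags rather than total tag copies; weighting by copy number would give different percentages.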