
American Journal of Medical Genetics (Neuropsychiatric Genetics) 81:248–256 (1998)

Integrating Clinical and Laboratory Data in Genetic Studies of Complex Phenotypes: A Network-Based Data Management System

Francis J. McMahon,1* C.J.M. Thomas,1 Rebecca J. Koskela,2 Theresa S. Breschel,1 Tyler C. Hightower,1 Nichole Rohrer,1 Christine Savino,1 Melvin G. McInnis,1 Sylvia G. Simpson,1 and J. Raymond DePaulo1

1 Department of Psychiatry and Behavioral Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland
2 Department of Genetics, Stanford University, Stanford, California

The identification of genes underlying a complex phenotype can be a massive undertaking, and may require a much larger sample size than thought previously. The integration of such large volumes of clinical and laboratory data has become a major challenge. In this paper we describe a network-based data management system designed to address this challenge. Our system offers several advantages. Since the system uses commercial software, it obviates the acquisition, installation, and debugging of privately-available software, and is fully compatible with Windows and other commercial software. The system uses relational database architecture, which offers exceptional flexibility, facilitates complex data queries, and expedites extensive data quality control. The system is particularly designed to integrate clinical and laboratory data efficiently, producing summary reports, pedigrees, and exported files containing both phenotype and genotype data in a virtually unlimited range of formats. We describe a comprehensive system that manages clinical, DNA, cell line, and genotype data, but since the system is modular, researchers can set up only those elements which they need immediately, expanding later as needed. Am. J. Med. Genet. (Neuropsychiatr. Genet.) 81:248–256, 1998. © 1998 Wiley-Liss, Inc.

KEY WORDS: relational database; linkage; data integrity; modular

Contract grant sponsor: Charles A. Dana Foundation; Contract grant sponsor: NIMH; Contract grant sponsor: National Alliance for Research on Schizophrenia and Depression; Contract grant sponsor: Johns Hopkins University Affective Disorders Fund; Contract grant sponsor: Johns Hopkins University George Browne Laboratory Fund.
*Correspondence to: Francis J. McMahon, M.D., Department of Psychiatry and Behavioral Sciences, The Johns Hopkins University School of Medicine, Meyer 3-181, 600 N. Wolfe Street, Baltimore, MD 21287-7381. E-mail: fmcm@welchlink.wech.jhu.edu
Received 10 September 1997; Revised 13 January 1998

INTRODUCTION

The identification of genes underlying a complex phenotype can be a massive undertaking. Data management for such studies must cope with large sample sizes, multiple data storage sites, and some data that change over time. The integration of such large volumes of clinical and laboratory data has become a major challenge in genetic studies of complex phenotypes.

Gene identification in complex phenotypes may require a much larger sample size than thought previously. Large sample sizes may be needed for the initial detection of linkage, and even larger sample sizes for replication of linkage findings [Suarez et al., 1995]. Narrowing the linkage finding to a physically-mappable chromosomal location may require more than 2,000 affected sib-pairs [Kruglyak and Lander, 1995]. Furthermore, each affected subject is typically associated with a large amount of clinical data and the more than 300 genotypes that are generated in genome-wide linkage searches.

Each type of primary data has special storage requirements. Clinical assessments are typically collected on handwritten forms and are often supported by copies of medical records and other documentary evidence in various nonstandard formats. Blood samples and cell lines must be tracked from subject to freezer. Genotype data may exist in the form of autoradiographs or the specialized data files produced by automated genotyping systems.

The data generated in genetic studies are not static, but change over time. Previously unavailable relatives may volunteer for the study or previously studied subjects may die. Clinical data may change after longitudinal follow-up. DNA sample supplies dwindle and must be replaced. Genotype data may require correction if they fail to segregate or when false paternity or sample mix-ups are detected. Thus, regular archiving and updating of data are required to forestall degeneration of the database over time.

In this paper we describe a network-based data management system designed to address the special requirements of family studies of complex phenotypes. The system is expandable, modular, and easily adapted to a wide variety of studies. The system uses existing relational database, pedigree, and networking software and standard PC hardware. The efficient integration of clinical and laboratory data in the form of output files, summary reports, and pedigrees is a major feature.

MATERIALS AND METHODS

User Input

Our first task in designing a data management system focused on the “users,” i.e., the clinicians, research assistants, and laboratory technicians who would be using the system every day. We involved the users in deciding which data elements needed to be considered, the naming of fields and tables, and the design of data entry forms. After each module of the system was made available, users were polled in a series of feedback meetings about whether that module was accomplishing the desired tasks in an efficient and user-friendly fashion. If not, that module was redesigned to better fulfill user needs. After the system was entirely in place, further expansions and modifications were considered as needed.

Hardware Requirements

The hardware requirements for this system depend on the size of the sample to be studied and the modules used. At least one computer is required to house the main database, and (ideally) at least one computer is devoted to each module, located in a spot where the users will have ready and continuous access. As the main computer we use a Pentium 75-MHz machine (Compaq Computer Corp., Houston, TX) with 16 MB of RAM, a 2-gigabyte hard drive, and a Colorado Jumbo 1400 tape drive (Hewlett Packard, Palo Alto, CA). These should be viewed as reasonable minimum requirements, since this computer stores all the data and acts as a server for the entire system. For each module we use at least one 486/66-MHz or Pentium 75-MHz computer with at least 16 MB RAM and a 500-MB hard drive. For the genotype module, a Macintosh computer (Apple Computer, Cupertino, CA) is also needed if automated genotypes will be processed using the GeneScan 2.1fc2 and Genotyper 1.1 programs (Applied Biosystems, Foster City, CA).

If additional computers are being used, then a network must also be in place and each networked computer needs a network card. The network system we use is a departmental local area network (LAN), connecting each computer to a central hub, with structured cabling utilizing 10BaseT Ethernet. The hub also connects the LAN to the campus-wide network using fiber-optic cable, which provides access to the Internet.

Each computer should be connected to a printer, either through the network or through a printer buffer. We use two different types of printers. Our main printer is a Hewlett Packard (HP) LaserJet 4M Plus (Hewlett Packard, Palo Alto, CA), which is linked to the network and is centrally located for multiple users. We also have HP Inkjet 500-series printers attached directly to the computers that serve the DNA, cell line, and genotype modules.

The DNA module is designed to work with a spectrophotometer that measures DNA concentration and quality. We use an HP Diode Array Spectrophotometer (model 8452A) connected to an HP Vectra 486/33N computer with 16 MB RAM. Included with the spectrophotometer is general scanning and quantitation software that enables the user to program desired absorbance wavelengths for DNA quantitation.

Software Requirements

This data management system requires two types of software: commercial or “off the shelf” programs, and customized programs written by us expressly for use with the database software or for software used by individual modules.

The commercial software required includes a relational database program, backup software (usually provided with the backup drive), a pedigree drawing program, and optional network software. The relational database software is the keystone, providing storage, querying, and reporting capabilities. The relational database software must also allow each module to access data both within and between modules on different computers. One of the most useful features of this database is the ability to output data into a pedigree drawing format, so the software chosen for the pedigree drawing should be able to import files. We use Paradox version 5.0 (Borland Intl., Scotts Valley, CA) for the relational database, Colorado Backup version 2.80, Cyrillic 2.1 (Cherwell Scientific, Oxford, UK) for the pedigree drawing, and Windows for Workgroups 3.11 on the server computer. Computers used for the individual modules use either Windows for Workgroups 3.11 or Windows NT 3.51/4.0.

The customized programs come in three types. The first type facilitates the use of the database software. These programs consist of data entry forms, reports, and queries that are part of the relational database program options; only knowledge of that software is needed. The second type of customized program builds on the options within the database software (for Paradox these programs are written in ObjectPal). For example, we have created “smart forms” that aid data entry by filling some fields with prespecified default values, skipping inappropriate fields based on previously entered values (e.g., skipping an IF YES, SPECIFY field when “no” was entered into the previous field), and automatically calculating sums and differences. Other forms provide a “button” that when “pressed” executes other programs, e.g., backing up the data files or archiving old data. These custom programs, which modify and extend features within the database software, call for additional knowledge of that software
and some basic programming skills. The final type of program is more complex. It includes a BASIC program that converts the data output from the spectrophotometer into the correct format for the database, and an ObjectPal program that then appends the spectrophotometric data to the appropriate database table. Another example is the ObjectPal program that gathers all the relevant data and creates an ASCII file that the pedigree drawing program processes into a visual representation of a pedigree with selected information displayed. These programs were written by the authors (C.J.M.T. and R.J.K.) and require a more advanced knowledge of the internal programming of the database as well as programming languages such as BASIC and ObjectPal. The original code for all our customized programs is available upon request to the corresponding author.

Notation

Throughout this manuscript, all variable fields are indicated in SMALL CAPITALS and data tables in italics.

RESULTS

Overall Structure

Fig. 1. Each module is shown by an open circle. The Master Pedigree table joins the individual modules via the subject identifiers. The Alias table stores all alternative identifiers for each subject. Asterisk (*) indicates key field.

The overall structure of the database is illustrated in Figure 1. Each of the four modules is represented by a circle. The Master Pedigree table joins all modules via individual unique identification numbers for each subject. This table also stores pedigree structure information, such as parent identifiers and gender. Each record in the Master Pedigree table is joined via the SUBJECT ID to a record in the Alias table. The Alias table stores all alternative identifiers ever used for that subject,
including numbers assigned to blood or DNA samples, and numbers assigned by other studies in which the subject may also be enrolled. So that the Master Pedigree and Alias tables can be used to join any other tables in queries, all subjects appear once and only once in both the Master Pedigree and Alias tables. To help ensure referential integrity, data may be entered elsewhere in the system only for subjects with an entry in both the Master Pedigree and Alias tables.

Clinical Modules

Structure. The clinical data module is designed to track all the clinical findings and administrative data for each enrolled subject and family. The module consists of three sets of related tables (Fig. 2). The Family table stores information about each family as a whole, such as method of ascertainment, status in the study (from “enrolled” to “all interviews and blood draws complete”), or the phenotype segregating in the family. The Administrative tables store information about each subject for bookkeeping purposes, such as demographics, consent forms, and subject reimbursement, as well as each subject’s status in the study (from “enrolled” to “interview and blood draw complete”). The Phenotype tables store information derived from the clinical evaluation, including symptoms and diagnosis.

Fig. 2. The clinical data module tracks all the clinical findings and administrative data on each subject and family in the study. Tables and groups of tables are designated by rectangles, fields by ellipses attached by dotted lines. Major relationships between tables are indicated by a solid line, with the linking field shown. One to many relationships are shown by crows’ feet. Asterisk (*) indicates key field.
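
The entry rule stated for the Master Pedigree and Alias tables (data may be entered elsewhere only for subjects already present in both) can be illustrated with a minimal sketch. This is not the authors' Paradox/ObjectPal code: it uses Python's built-in sqlite3 module as a stand-in relational database, and every table name, column name, and function name below is a hypothetical choice made for illustration.

```python
import sqlite3

# Hypothetical layout; the paper does not publish its field definitions.
SCHEMA = """
CREATE TABLE master_pedigree (
    subject_id TEXT PRIMARY KEY,
    family_id  TEXT NOT NULL,
    father_id  TEXT REFERENCES master_pedigree(subject_id),
    mother_id  TEXT REFERENCES master_pedigree(subject_id),
    gender     TEXT
);
-- One alias row per subject; it can only exist if the pedigree row exists.
CREATE TABLE alias (
    subject_id      TEXT PRIMARY KEY REFERENCES master_pedigree(subject_id),
    blood_sample_no TEXT,
    other_study_id  TEXT
);
-- Clinical data hang off the alias row, so both core rows are required.
CREATE TABLE phenotype (
    subject_id TEXT NOT NULL REFERENCES alias(subject_id),
    eval_date  TEXT NOT NULL,
    diagnosis  TEXT
);
"""


def open_database():
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks keys only when asked
    conn.executescript(SCHEMA)
    return conn


def enroll_subject(conn, subject_id, family_id, gender, sample_no=None):
    """Place a new subject in both core tables in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO master_pedigree (subject_id, family_id, gender) "
            "VALUES (?, ?, ?)",
            (subject_id, family_id, gender),
        )
        conn.execute(
            "INSERT INTO alias (subject_id, blood_sample_no) VALUES (?, ?)",
            (subject_id, sample_no),
        )


def enter_phenotype(conn, subject_id, eval_date, diagnosis):
    """Return True if the row was stored, False if the subject is unknown."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO phenotype VALUES (?, ?, ?)",
                (subject_id, eval_date, diagnosis),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sketch the database itself rejects an "orphan" phenotype row, so the check does not depend on each data entry form repeating it.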
Depending on the phenotype studied, these tables might also store clinical laboratory data, such as the results of glucose tolerance testing.

Data flow. When a family is ascertained, each individual who appears in the pedigree is given a unique identifier (SUBJECT ID), under which all of that subject’s data are stored. The SUBJECT ID is placed in the Master Pedigree and Alias tables. The family status (“enrolled”) and the method of ascertainment are entered as well.

After completion of the evaluation, the clinical findings are entered into the Phenotype table. Once all available members of the family have been interviewed and blood has been received, the status of the family is changed to “complete” in the Family table.

The administrative tables are used to process subjects through all stages of the study, from ascertainment through follow-up. This allows for a confirmation that each subject has signed a consent form, received reimbursement, undergone clinical evaluation, and so on.

DNA Module

Structure. The DNA data module tracks the location, quantity, quality, and current volume of each vial of DNA for each subject. The module consists of three related tables (Fig. 3). The DNA Sample table tracks the date a particular vial of DNA was extracted, its freezer box location, and the starting volume for every DNA vial. Spectrophotometric data associated with each DNA vial are imported into the DNA Concentration table. These data include the concentration data, 260/280-nm ratio and other DNA quality ratios, and the ASCII filename of the spectrophotometer output file. This SPECTRAL FILE FLAG allows the user to access graphical spectrophotometric data for each vial of DNA. The DNA Volume table records the usage history of each vial of DNA. The Archive DNA Volume table stores all the DNA sample and concentration information as well as the usage history once a DNA vial is emptied.

Data flow. Each time a new vial of DNA is generated, the SUBJECT ID, DNA vial identifiers, and other data are entered into the data base, using a form. During the data entry process, data (in ASCII format) are imported from the spectrophotometer into the DNA database, using a program that also calculates the DNA concentration based on 260-nm absorbance data. Any aliquots of DNA removed are automatically subtracted from the original starting volume, so that there is a continuously-updated record of the current volume and amount of DNA for each vial and for each subject in the study.

Once a vial of DNA becomes empty, this information is entered in the DNA Volume table and a script is activated that simultaneously removes all the entries pertaining to that particular vial from the DNA Concentration, DNA Sample, and DNA Volume tables to Archive DNA Volume. This table can be accessed at will and serves an essential function by preserving all data entered into the DNA module. This allows users to track down key errors and account for other data discrepancies that may arise.

Fig. 3. The DNA data module tracks the location, quality, and current volume of each DNA vial for each subject. See Figure 2 legend for explanation of symbols.
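
The DNA data flow just described (concentration from 260-nm absorbance, running subtraction of aliquots, and archiving of emptied vials) can be sketched in a few lines. This is an illustrative Python stand-in, not the authors' BASIC or ObjectPal programs. The class, function, and field names are hypothetical, and the conversion factor is the common rule of thumb that one A260 unit corresponds to about 50 µg/mL of double-stranded DNA at a 1-cm path length; the paper does not state which factor its program uses.

```python
from dataclasses import dataclass, field

# Assumed rule of thumb for double-stranded DNA (1-cm path); not from the paper.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_concentration(a260, dilution_factor=1.0):
    """Estimate DNA concentration in ug/mL from absorbance at 260 nm."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


@dataclass
class Vial:
    vial_id: str
    subject_id: str
    start_volume_ul: float
    conc_ug_per_ml: float
    withdrawals: list = field(default_factory=list)  # usage history, in uL

    @property
    def current_volume_ul(self):
        return self.start_volume_ul - sum(self.withdrawals)

    @property
    def amount_ug(self):
        return self.conc_ug_per_ml * self.current_volume_ul / 1000.0


class DnaInventory:
    """Active vials plus an archive that keeps emptied vials with their history."""

    def __init__(self):
        self.active = {}
        self.archive = {}

    def add_vial(self, vial):
        self.active[vial.vial_id] = vial

    def withdraw(self, vial_id, volume_ul):
        vial = self.active[vial_id]
        if volume_ul <= 0 or volume_ul > vial.current_volume_ul:
            raise ValueError("withdrawal must be positive and fit the current volume")
        vial.withdrawals.append(volume_ul)
        if vial.current_volume_ul <= 1e-9:
            # Empty vial: move the whole record, history included, to the archive.
            self.archive[vial_id] = self.active.pop(vial_id)

    def subjects_below(self, min_ug):
        """Subjects whose total DNA across active vials is under the threshold."""
        totals = {}
        for vial in self.active.values():
            totals[vial.subject_id] = totals.get(vial.subject_id, 0.0) + vial.amount_ug
        return sorted(s for s, total in totals.items() if total < min_ug)
```

Keeping the withdrawal history on the vial record means the archived entry still shows every aliquot taken. One limitation of this sketch is that `subjects_below` only sees subjects who still have at least one active vial.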
The primary output for the DNA database is a report listing the box location, current volume, and total amount of DNA per vial and per subject. Other reports can be generated that essentially function as flags. For example, reports are generated when a particular DNA vial is of poor quality (e.g., out-of-range 260/280-nm ratio) or when the amount of DNA for a particular subject goes below a user-specified value, thereby alerting the technician to begin cell culture for the extraction of new DNA. The main output file can also be interfaced with the cell line, clinical, and genotype databases.

Cell Line Module

Structure. The cell line module consists of three related tables (Fig. 4). The Growth table contains data about each growth attempt, recording the number of attempts, quality of the growth, and reasons for any failure. The Storage table contains the box and coordinate location data for each cell line vial in the freezer, giving an up-to-date inventory of cell lines available for each subject studied. The Usage History table records any additions or removals to the freezer, thus tracking the usage of cell lines, and facilitating error checks and audits.

Data flow. When a blood sample arrives at the laboratory, an attempt is made to grow a cell line. The vigor and quality of the culture and relevant dates are recorded in the Growth table for every growth attempt. Once a cell line is grown successfully, the storage information is entered into the Usage History table, with a field showing that it is a new addition. These data are automatically copied into the Storage tables. When a cell line is removed from the freezer, this fact is also entered into the Usage History table, with the field showing it as a new removal. The vial is then automatically deleted from the Storage table. As a result, the Storage table always contains an accurate inventory of the available cell lines and their locations. The Usage History table contains a record of all additions and removals, and thus acts as an archive for the Storage table.

Reports are generated to identify subjects needing to have a blood sample redrawn because the culture failed, to summarize the number and locations of cell line vials stored for each individual, and to alert laboratory staff that the supply of cell line vials for an individual has gone below a user-specified value.

Fig. 4. The cell line data module tracks the growth and storage of each vial of lymphoblastoid cells generated for each subject. See Figure 2 legend for explanation of symbols.

Genotype Module

Structure. The genotype module consists of four related tables (Fig. 5). The Genotype table contains the marker genotypes for each subject, in the form of arbitrary allele numbers. The exact allele size in base pairs corresponding to each arbitrary allele number is recorded in the Allele Size table. The Reader table contains the information on who read the genotypes in each family, with MARKER ID specified. Allele Size and Reader are linked to other tables via FAMILY ID, since arbitrary allele numbers are assigned within each family. The last table, Markers, contains the reference information for all markers, linking the MARKER ID to the marker name(s), chromosome, and location (if desired, multiple Markers tables can be used to group markers in a convenient manner, such as by chromosome).
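
Because arbitrary allele numbers are assigned within each family, allele 1 in one family need not be the same size as allele 1 in another, which is why the Allele Size table is linked through FAMILY ID. The sketch below shows one way such a lookup could be written, again using Python's sqlite3 module rather than the authors' Paradox tables. The table layouts, column names, and function names are assumptions made for illustration, and the Reader table is omitted for brevity.

```python
import sqlite3

# Hypothetical layouts loosely following the Markers, Allele Size, and
# Genotype tables described in the text.
SCHEMA = """
CREATE TABLE markers (
    marker_id  TEXT PRIMARY KEY,
    name       TEXT,
    chromosome TEXT
);
CREATE TABLE allele_size (
    family_id TEXT,
    marker_id TEXT,
    allele_no INTEGER,
    size_bp   INTEGER,
    PRIMARY KEY (family_id, marker_id, allele_no)
);
CREATE TABLE genotype (
    subject_id TEXT,
    family_id  TEXT,
    marker_id  TEXT,
    allele1    INTEGER,
    allele2    INTEGER,
    PRIMARY KEY (subject_id, marker_id)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def genotypes_in_bp(conn, marker_id):
    """Translate family-specific allele numbers into allele sizes (base pairs).

    Returns (subject_id, size1, size2) tuples; a size is None when no
    Allele Size row exists for that family, marker, and allele number.
    """
    rows = conn.execute(
        """
        SELECT g.subject_id, s1.size_bp, s2.size_bp
        FROM genotype AS g
        LEFT JOIN allele_size AS s1
               ON s1.family_id = g.family_id
              AND s1.marker_id = g.marker_id
              AND s1.allele_no = g.allele1
        LEFT JOIN allele_size AS s2
               ON s2.family_id = g.family_id
              AND s2.marker_id = g.marker_id
              AND s2.allele_no = g.allele2
        WHERE g.marker_id = ?
        ORDER BY g.subject_id
        """,
        (marker_id,),
    )
    return rows.fetchall()


def unsized_genotypes(conn, marker_id):
    """Subjects whose genotype refers to an allele number with no recorded size."""
    return [
        subject
        for subject, size1, size2 in genotypes_in_bp(conn, marker_id)
        if size1 is None or size2 is None
    ]
```

The join uses the family identifier as part of the key on purpose: dropping it would silently mix allele numbers from different families. The second function is a simple completeness report for genotypes that lack a recorded allele size.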
Fig. 5. The genotype data module tracks the results of the polymorphic DNA marker analyses for each subject as well as descriptive data about the markers used and their chromosomal locations. See Figure 2 legend for explanation of symbols.

Data flow. After a family has been clinically evaluated, the individuals with a DNA sample who are required for linkage analysis are selected for genotyping. This is noted in the Master Pedigree table, in a field called GENOTYPING. Another field, NEED FOR PEDIGREE, designates the individuals selected for genotyping along with individuals required to connect the pedigree structure, e.g., parents of a sib-pair.

Once the genotypes for each marker are determined, they are entered into the Genotype table. If the genotypes were read automatically, these data are set in an importable format by a semiautomated routine. Output from the ABI 373 sequencer is binned using the program Genetic Analysis System (GAS 2.0; GAS © Alan Young, Oxford University, 1993–1995), whose text output is imported into the database tables. If genotypes are read manually, the information is entered directly into the Genotype table. This is accomplished with a form that requires entering MARKER ID only once per family and simplifies data entry by allowing entry into Genotype, Reader, and Allele Size tables all at once. This guarantees complete data for every genotype, e.g., that every arbitrary allele number corresponds to an absolute allele size value in the database.

The primary output from the genotype module is linked with the phenotype data from the clinical module to generate linkage files for analysis. Other outputs can also be generated, e.g., status reports summarizing genotype progress by individual or marker.

Pedigree Reports

The most useful report format for family studies is often the pedigree itself. Therefore, we developed a method for importing any data of interest from the database into a pedigree drawing program, where the data are displayed directly on the pedigree. This approach preserves the flexibility of the relational database while displaying the data in the way that is most intuitive for genetic researchers.

For each report format, a program collects the data of interest from the relevant tables in the database, joins these data with the pedigree structure information in the Master Pedigree table, and formats the joined data for importing into the pedigree drawing program. Residual errors in pedigree structure are easily detected at this point, since any errors will cause the pedigree drawing program to either reject the import file or
draw a discontinuous pedigree structure. After the pedigree is drawn, further adjustments in formatting (but not content) can be applied prior to printing. This ensures that the content of the pedigree reports agrees exactly with the data in the database on the day the report was generated.

Several different pedigree report formats have been developed. For example, one format (Fig. 6) displays the latest phenotype data, DNA availability, and study status for each subject in a family. This format is particularly useful for deciding when to send a family into the laboratory for genotyping. Another format, used for families that have already been genotyped, displays the final clinical diagnosis along with genotype results for a set of DNA markers. This format is useful for checking the results of linkage analyses and constructing haplotypes. Pedigree reports showing only structure and gender can also be generated for use in the laboratory, where it is important to maintain blindness to phenotype.

Fig. 6. Sample pedigree reports. SADS: Clinical interview completed; Best Est: Best estimate diagnosis completed; G: Selected for genotyping; L: Needed for linkage analysis, but not selected for genotyping; C: Cell lines available; D: ≥0.15 mg of DNA available; d: <0.15 mg DNA available. Pedigree drawn by Cyrillic 2.1.

DISCUSSION

We describe a network-based data management system designed to address the special problems posed by family and genetic studies of complex phenotypes. The system uses commercial relational database, pedigree, and networking software, and standard PC hardware. The efficient integration of clinical and laboratory data is a major feature. The system is comprehensive, expandable, modular, and highly adaptable to a wide variety of studies.

Our system offers several advantages for researchers studying the genetics of complex phenotypes. Since it uses commercial software, our system obviates the acquisition, installation, and debugging of privately-available software, and is fully compatible with Windows and most other software, both private and commercial. The system is based on relational database architecture, which offers exceptional flexibility, facilitates complex data queries, and expedites extensive data quality control. Data entry and retrieval forms can be designed to facilitate these functions for users with little training. The system is particularly designed to integrate clinical and laboratory data efficiently in a variety of output formats. Summary reports, pedigrees, and exported files containing both phenotype and genotype data are easily generated in a virtually unlimited range of formats. Since the system is modular, researchers can set up only those elements which they need immediately, expanding later as needed. Although our system works with PC-type machines in a Windows environment, the concepts can be adapted to Macintosh or Unix platforms as well.

This system may not be suitable for all situations. Much of the flexibility comes from the networking of several computers, but networks can be unstable and good maintenance is essential. (This issue appears to be less problematic with newer network protocols and operating systems such as Windows NT.) Our system is based on a central “hub” computer that stores all of the data; the system is thus disabled when the hub machine is down. This could be avoided by distributing the database over several networked computers, but backup procedures would then have to be implemented for each networked computer. The powerful query mechanisms offered by the relational database architecture may pose security problems, since identifiers can readily be joined with other data. Like most commercially available relational database software, Paradox offers a password mechanism that limits which data a particular user can see or edit. This addresses the problem satisfactorily for our system. Finally, as with any relational database, the establishment and maintenance of referential integrity are essential, usually requiring effort from trained staff.

All relational database management systems should contain automatic procedures for ensuring referential integrity and other types of data integrity. Our system automatically implements four types of data integrity checks. To prevent “orphan” entries that cannot be linked to other tables in the database, a limited hierarchy is imposed upon the database tables. Data may be entered into the system only for subjects with entries in the Alias and Master Pedigree tables, and the corresponding rows in these tables must be deleted before entries can be deleted elsewhere in the database. To minimize key errors, a variety of range restrictions are imposed at the point of data entry. Any data editing automatically generates a log entry recording the old and new values, date and time of change, reason for the change, and the log-in name. Finally, the hard drive of the hub computer is automatically backed up every evening, and the backup tape is stored off-site.

Several other data management systems have been developed for genetic studies. DOLINK [Cook et al., 1993] prepares data output formats that are compatible with many genetic analysis programs, but is not a complete data management system. DOLINK and the more recently introduced program Genome Topographer [Marr, 1996] complement our system, since the relevant data can be easily exported in any text format for further manipulation by such programs. The LABMAN/LINKMAN package of data management programs [Adams, 1994] offers many of the same basic functions as our system, but lacks the flexibility of our modular organization, and is currently not fully compatible with Windows. The database system HYPERGENE [Charru et al., 1994] is conceptually quite similar to our system, but is designed for a Macintosh platform and does not offer modules to manage DNA or cell line samples. Another relational system, PhenoDB [Cheung et al., 1996], was developed with the aim of integrating population and linkage genetics. It offers some of the same capabilities as our system on a Unix platform, but was not designed to manage extensive clinical phenotype data.

Like the acquisition and analysis of scientific data itself, data management is a process that evolves over the course of a study. Thus it is essential that a data management system be able to adapt to changing needs. We describe a network-based data management system that is highly adaptable and efficiently integrates clinical and laboratory data. While this system or its concepts may be of value to researchers, they should carefully consider all of the advantages and disadvantages of this and other systems before embarking on a large scale study.

ACKNOWLEDGMENTS

This work was supported by grants from the Charles A. Dana Foundation Consortium on the Genetic Basis of Manic Depressive Illness, the National Institutes of Mental Health, and the National Alliance for Research on Schizophrenia and Depression, and by contributors to the Affective Disorders Fund and the George Browne Laboratory Fund at Johns Hopkins University. We thank Susan Folstein and Thomas G. Marr for conceptual input. Scott Allan, Dean MacKinnon, and Geri Rochino provided user feedback. Diane M. Carroll, Renee Conte, Meryl Cooper, Sara Cushing, Cheryl Eckler, Jo Ellen Green, Barbara Schweitzer, and Eva Vishio contributed to earlier versions of the present system. Matthew Goykers Eliot, Mike Grierson of Survey Research Associates (Falls Church, VA), Phil Marcus of Training to Go (Ellicott City, MD), and Mark Oxenham of Cherwell Scientific (Oxford, UK) provided technical advice and assistance.

REFERENCES

Adams P (1994): A data management system specifically designed for genome searches of complex diseases. Genet Epidemiol 11:87–98.
Charru A, Jeunemaitre X, Soubrier F, Corvol P, Chatellier G (1994): Hypergene: A clinical and genetic database for genetic analysis of human hypertension. J Hypertens 12:981–985.
Cheung K-H, Nadkarni P, Silverstein S, Kidd JR, Pakstis AJ, Miller P, Kidd KK (1996): PhenoDB: An integrated client/server database for linkage and population genetics. Comput Biomed Res 29:327–337.
Cook CCH, Gurling HMD, Curtis D (1993): DOLINK: A computer program to facilitate management of genetic data and analyses. Ann Hum Genet 57:307–310.
Kruglyak L, Lander ES (1995): High-resolution genetic mapping of complex traits. Am J Hum Genet 56:1212–1223.
Marr TG (1996): Genome Topographer: A computerized system for doing genetics, finding genes and studying their function [abstract]. Am J Hum Genet [suppl] 59:308.
Suarez BK, Hampe CL, Van Eerdewegh P (1995): Problems of replicating linkage claims in psychiatry. In Gershon ES, Cloninger CR (eds): “Genetic Approaches to Mental Disorders.” Washington, DC. American Psychiatric Press, pp 23–46.
