Introduction:

Descent computes a variety of statistics about consanguineal kin from a population genealogy, including the coefficient of relatedness between each pair of population members, the inbreeding coefficient of each member, the average coefficient of relatedness of each member to the entire population, the average coefficient of relatedness of each member to their consanguineal kin, the number of kin of each member, patri- and matrilineages, consanguineal connections between members, and the number of kin in particular categories (e.g., sisters) for each member. Descent also computes the average relatedness of population subgroups (e.g., households), as well as the average relatedness of subgroups to each other. These statistics can then be used in other statistical analyses, for example, to determine the importance of various types of kinship for food sharing, political alliances, etc.

Input file format:

None of Descent's functions is available until a file is opened. Because Descent is currently in a pre-1.0 release, make a backup of your genealogy files before opening them in Descent.

The input file format is a simple tab-delimited text file. Each line of the file corresponds to a single member of the population (an 'ego'). Each line must contain a minimum of four fields (columns), separated by tabs:

EGO ID tab FATHER ID tab MOTHER ID tab SEX

Each member of the population must be assigned a unique Ego ID string (numeric IDs will be treated as strings). Ego's biological father and mother are identified with their unique ID codes, or by a 'Missing' code if their identities are unknown (see below). Ego's sex must also be specified. An optional fifth field, LIVING, denoting whether EGO is alive or dead, can be included. The first line of the file can optionally contain column header labels (e.g., EGO, FATHER, etc.). Thus, the first four lines of a typical input file might look like this:

EGO	FATHER	MOTHER	SEX	LIVING
Kennedy John	Kennedy Joe	Fitzgerald Rose	male	No
Kennedy Robert	Kennedy Joe	Fitzgerald Rose	male	No
Kennedy Eunice	Kennedy Joe	Fitzgerald Rose	female	Yes
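
As a rough illustration of this format, the following Python sketch reads such a file into one record per ego. The function name `read_genealogy`, the dictionary keys, and the assumption that the columns appear in the order shown above are illustrative only; they are not part of Descent.

```python
import csv
import io

# The example rows from this section, with fields separated by tabs.
SAMPLE = (
    "EGO\tFATHER\tMOTHER\tSEX\tLIVING\n"
    "Kennedy John\tKennedy Joe\tFitzgerald Rose\tmale\tNo\n"
    "Kennedy Robert\tKennedy Joe\tFitzgerald Rose\tmale\tNo\n"
    "Kennedy Eunice\tKennedy Joe\tFitzgerald Rose\tfemale\tYes\n"
)


def read_genealogy(handle, has_header=True):
    """Return one dict per ego from a tab-delimited stream.

    Assumes the column order EGO, FATHER, MOTHER, SEX, [LIVING].
    """
    rows = list(csv.reader(handle, delimiter="\t"))
    if has_header:
        rows = rows[1:]  # skip the optional column header line
    records = []
    for row in rows:
        if len(row) < 4:
            raise ValueError(f"expected at least 4 fields, got {len(row)}: {row!r}")
        records.append({
            "ego": row[0],        # IDs stay strings, even if they look numeric
            "father": row[1],
            "mother": row[2],
            "sex": row[3],
            "living": row[4] if len(row) > 4 else None,  # optional fifth field
        })
    return records


records = read_genealogy(io.StringIO(SAMPLE))
```

To read a file on disk instead, pass an open file object, for example `open(path, newline="")`, in place of the `io.StringIO` wrapper.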

The input file may contain additional columns (fields). These columns will be displayed but otherwise ignored by Descent. The order of the columns and the optional column header labels need not conform to the above pattern. When a file is opened, an 'Editor' tab will appear displaying the data in a spreadsheet-like grid. If the first row of the file contains column headings, check the checkbox for column headings in the lower left-hand corner of the Editor screen.

Column widths can be adjusted, if desired. Below this checkbox are five column identifier popup menus, one for each of the Ego, Father, Mother, Sex, and Living columns.

Set each of the four required fields (Ego, Father, Mother, Sex) to the appropriate column by selecting the columns from the popup menus. If the current data file does not contain a header row, the popup menus will display column numbers. If the data file does contain a header row, the popup menus will display the column header labels. For example, if the ego IDs appear in column three, and there is no column header label row, set the Ego popup menu to '3'.

If there is no 'Living' column in the data file, or if you wish to treat all population members as living, select the first item of the popup menu: 'All living'.

The codes for Male, Female, Alive, Dead, and Missing can be set by the user. Enter the proper codes for the open data file in the boxes in the lower right-hand corner of the Editor tab.

Warning: Opening or importing a new file will delete all results calculated on the previous file. Because some of these calculations can be quite lengthy (see below), be sure to save results that you want to keep.

Importing and exporting KINDEMCOM Egodata files:

Descent can import the EGODATA genealogy file used by KINDEMCOM (Chagnon and Bryant 1984), a command-line program that performed several of the computations also performed by Descent. To import such a file, choose 'Import KINDEMCOM...' from the 'File' menu. Your tab-delimited data file can also be exported to KINDEMCOM format by choosing 'Export KINDEMCOM' from the 'Export' menu. This export function exports only the following required fields from your data file: EGO ID, FATHER ID, MOTHER ID, SEX, and LIVING (Descent will convert your sex and living codes to the equivalent KINDEMCOM codes). All other field values will be set to KINDEMCOM's missing code for that field (e.g., 0, 99, 999, 9999). Missing codes will be inserted on export even if you just imported a KINDEMCOM file with data in those fields! KINDEMCOM requires that all fathers and mothers also appear as egos in the data file. If a non-missing father or mother does not have an entry as an ego, Descent will add them to the exported KINDEMCOM file, setting their father, mother, and living codes to missing.

Error checking:

Check for basic errors in the data file by clicking the 'Error check' button in the lower right-hand corner of the Editor tab. Descent will compute and display basic population statistics, and then check if all of the following are true of the open data file:

If there are errors, the type of error and the row(s) where errors occur will appear in the Error Log on the right-hand side of the Editor tab. Descent checks for errors when any function is performed on the data, and will not proceed with calculations until error checking finds no errors.

Descent will check if non-missing fathers and mothers appear as egos in the data file. If a non-missing father or mother is not also listed as an ego, Descent will issue a non-fatal warning.
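
If you want to run a comparable check on your own data before opening it in Descent, the following Python sketch shows one way to do it. The record layout, the function name, and the missing code '0' are assumptions made for the example; substitute the codes your file actually uses.

```python
MISSING = "0"  # hypothetical missing code; use the one from your own file

# Hypothetical records: ego "1" has a mother ("11") with no ego row of her own.
RECORDS = [
    {"ego": "1", "father": "10", "mother": "11"},
    {"ego": "2", "father": "1", "mother": "0"},
    {"ego": "10", "father": "0", "mother": "0"},
]


def parents_not_listed_as_egos(records, missing=MISSING):
    """Return (row number, role, parent ID) for each parent lacking an ego row."""
    egos = {rec["ego"] for rec in records}
    found = []
    for row_number, rec in enumerate(records, start=1):
        for role in ("father", "mother"):
            parent = rec[role]
            if parent != missing and parent not in egos:
                found.append((row_number, role, parent))
    return found


warnings = parents_not_listed_as_egos(RECORDS)
```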

If the 'Incest' box is checked, Descent will check for any incestuous matings between first-degree relatives (siblings, parents, and offspring), and issue non-fatal warnings if any such matings are found. Because first-degree incest is rare, its presence might indicate a coding error. Checking for incest in large genealogies can be lengthy (perhaps on the order of tens of seconds).
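
The idea behind such a check can be sketched in Python as follows. This is not Descent's implementation. It assumes a missing code of '0', treats 'siblings' as full siblings (both parents known and shared), and uses a hypothetical `parents` dictionary that maps each ego to a (father, mother) pair.

```python
MISSING = "0"  # hypothetical missing code

# Hypothetical genealogy: ego -> (father, mother)
PARENTS = {
    "A": ("0", "0"),
    "B": ("0", "0"),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),  # parents are full siblings
    "F": ("A", "E"),  # grandfather and granddaughter: not first-degree
    "G": ("C", "E"),  # father C is also the mother's father
}


def incestuous_couples(parents, missing=MISSING):
    """Return sorted (father, mother) pairs who are first-degree relatives."""
    flagged = set()
    for father, mother in parents.values():
        if missing in (father, mother):
            continue  # cannot judge a mating with an unknown partner
        father_parents = parents.get(father, (missing, missing))
        mother_parents = parents.get(mother, (missing, missing))
        # Parent-offspring: the mother is the father's mother,
        # or the father is the mother's father.
        parent_child = father_parents[1] == mother or mother_parents[0] == father
        # Full siblings: both parents known and identical.
        full_siblings = missing not in father_parents and father_parents == mother_parents
        if parent_child or full_siblings:
            flagged.add((father, mother))
    return sorted(flagged)


couples = incestuous_couples(PARENTS)
```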

All computations can still be performed if there are only warnings. In fact, Descent will only check for warnings when 'Error check' is clicked in the 'Editor' panel; unlike error checking, it will not check for warnings prior to performing other functions.

Simple typos can be corrected by double-clicking the appropriate data cell and editing the value (e.g., changing an 'm' to an 'f'). After editing, click the 'Error check' button again. Repeat until there are no errors. The edited data file can be saved by selecting 'Save' or 'Save as...' from the 'File' menu. Because Descent has limited editing capabilities, it may be easier to make extensive changes (e.g., an incorrect sort order for one column) by editing the data file in a spreadsheet like Excel, and then re-opening it in Descent.

Computing relatedness:

Click on the 'Relatedness' tab near the top of the Descent window. You will then see a 'Compute Relatedness Stats' button (near the bottom). Click this button to compute relatedness stats. For large genealogies (e.g., 1000+ individuals), this computation can take several minutes. On a 233 MHz Pentium II, for example, a genealogy of 150 individuals took 7 seconds, and a genealogy of 900 individuals took just under 5 minutes. If there are uncorrected errors in the data file, no stats will be computed and you will be directed to correct the errors in the Editor tab.

Clicking the 'Compute Relatedness Stats' button will compute an NxN matrix, where N = the number of individuals in the population. Each ij entry in the matrix contains Wright's coefficient of relatedness between ego i and ego j. The coefficient of relatedness is the fraction of genes of two individuals that are identical by descent from a recent ancestor (higher values thus indicate closer relationships). This computation assumes that 'father' and 'mother' are ego's biological parents. If they are not, then the coefficients of relatedness will not accurately represent the probability that two individuals share alleles by descent, but they can still be used as an index of genealogical 'closeness' between ego i and ego j.
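
For readers who want to see how such a matrix entry can be derived from a pedigree, here is a minimal Python sketch. It uses the standard recursive kinship coefficient phi and the textbook form of Wright's coefficient, r = 2*phi(i,j) / sqrt((1+F_i)(1+F_j)), where F is the inbreeding coefficient. Whether Descent computes its values this way is not stated in this document, so treat the code, the names, and the missing code '0' as assumptions.

```python
from functools import lru_cache
from math import sqrt

MISSING = "0"  # hypothetical missing code

# Hypothetical genealogy: ego -> (father, mother)
PARENTS = {
    "A": ("0", "0"),
    "B": ("0", "0"),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),  # offspring of full siblings
}


def parents_of(person):
    # Anyone without an ego row is treated as a founder with unknown parents.
    return PARENTS.get(person, (MISSING, MISSING))


@lru_cache(maxsize=None)
def depth(person):
    """Longest chain of known ancestors above person (founders have depth 0)."""
    if person == MISSING:
        return -1
    father, mother = parents_of(person)
    return 1 + max(depth(father), depth(mother))


@lru_cache(maxsize=None)
def kinship(i, j):
    """Probability that random alleles drawn from i and j are identical by descent."""
    if MISSING in (i, j):
        return 0.0
    if i == j:
        father, mother = parents_of(i)
        return 0.5 * (1.0 + kinship(father, mother))
    if depth(i) < depth(j):
        i, j = j, i  # always expand the deeper person, who cannot be j's ancestor
    father, mother = parents_of(i)
    return 0.5 * (kinship(father, j) + kinship(mother, j))


def inbreeding(person):
    father, mother = parents_of(person)
    return kinship(father, mother)


def relatedness(i, j):
    return 2.0 * kinship(i, j) / sqrt((1.0 + inbreeding(i)) * (1.0 + inbreeding(j)))
```

With this sample pedigree, `relatedness("C", "D")` gives 0.5 for the full siblings and `inbreeding("E")` gives 0.25 for their offspring. The recursion is simple but can hit Python's recursion limit on very deep genealogies.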

The relatedness matrix is not displayed, but can be exported to a tab-delimited text file by choosing 'Export R-matrix...' from the 'Export' menu. This file will contain row and column labels.

After completing the computation, Descent will display four summary statistics for each ego calculated from the R-matrix:

  1. FgALL: The average relatedness of ego to every living member of the population.
  2. FgCON: The average relatedness of ego to every living consanguineal relative in the population.
  3. Each ego's number of living relatives in the population.
  4. The inbreeding coefficient for each ego.
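
The first three statistics can be reproduced from an exported R-matrix along the following lines. This Python sketch makes two assumptions that this document does not confirm: ego is excluded from his or her own averages, and a 'relative' is anyone with a coefficient greater than zero. The function name and sample values are hypothetical.

```python
def summary_stats(ego, r_row, living):
    """Return (FgALL, FgCON, number of living relatives) for one ego.

    r_row maps each alter's ID to ego's coefficient of relatedness with them.
    living is the set of IDs coded as alive.
    """
    others = sorted(alter for alter in living if alter != ego)
    kin = [alter for alter in others if r_row.get(alter, 0.0) > 0.0]
    total = sum(r_row.get(alter, 0.0) for alter in others)
    fg_all = total / len(others) if others else 0.0  # mean over all living others
    fg_con = total / len(kin) if kin else 0.0        # mean over living relatives only
    return fg_all, fg_con, len(kin)


# Hypothetical row of the R-matrix for ego "C"; "B" is dead, "X" is unrelated.
r_row_for_c = {"A": 0.5, "B": 0.5, "D": 0.5, "X": 0.0}
living = {"A", "C", "D", "X"}
fg_all, fg_con, n_kin = summary_stats("C", r_row_for_c, living)
```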

These statistics can be exported to a tab-delimited text file by choosing 'Export results...' from the 'Export' menu. Results can also be exported by selecting the desired block of cells and copying and pasting into another application like Excel or SPSS.

Founders:

This tab finds the founders of the population, the number of their descendants, and the number of their living descendants. Note that founders need not appear as egos in the genealogy; they simply must have non-missing IDs. Descendants and living descendants, however, consist only of egos in the genealogy.
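
A minimal Python sketch of this kind of computation follows. It assumes that a founder is anyone with a non-missing ID whose parents are both unknown, whether or not they have an ego row; this document does not spell out Descent's exact definition. The names and the missing code '0' are hypothetical.

```python
MISSING = "0"  # hypothetical missing code

# Hypothetical genealogy: ego -> (father, mother). "A" and "B" have no ego rows.
PARENTS = {
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),
}
LIVING = {"D", "E"}


def founders_and_descendants(parents, living, missing=MISSING):
    """Map each founder to (number of descendants, number of living descendants)."""
    children = {}
    for ego, (father, mother) in parents.items():
        for parent in (father, mother):
            if parent != missing:
                children.setdefault(parent, set()).add(ego)
    everyone = set(parents) | set(children)
    unknown = (missing, missing)
    founders = {p for p in everyone if parents.get(p, unknown) == unknown}
    result = {}
    for founder in sorted(founders):
        seen, stack = set(), [founder]
        while stack:  # depth-first walk down the descent tree
            for child in children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        result[founder] = (len(seen), len(seen & living))
    return result


founder_table = founders_and_descendants(PARENTS, LIVING)
```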

Lineages:

Patrilineages and matrilineages are determined by tracing each ego's patri- and matriline back to the most distant ancestor with a non-missing ID. Clicking 'Compute Lineages' will return a table with the following results for each ego: EGO ID, Patrilineage founder ID, Patriarch, Patrilineage size, Matrilineage founder ID, Matriarch, and Matrilineage size. Patri- and Matriarchs are the most distant living lineal ancestors. Simple summary statistics, including the number of lineages, range of lineage sizes, and mean lineage size, will also be displayed. Lineages identified in this manner may or may not correspond to indigenous lineage classifications. It is quite possible, for example, that individuals identified by Descent as belonging to separate lineages in fact share a common lineal ancestor who is not a member of the population under study, and thus actually belong to the same lineage.
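
The tracing step can be sketched in Python as follows. The function name, the sample data, and the missing code '0' are assumptions; the lineage sizes here count every ego assigned to a founder, including the founder when he has an ego row, which may differ from how Descent counts.

```python
from collections import Counter

MISSING = "0"  # hypothetical missing code

# Hypothetical genealogy: ego -> father. Use ego -> mother for matrilineages.
FATHERS = {"A": "0", "C": "A", "D": "A", "E": "C", "X": "0"}


def lineage_founder(ego, parent_of, missing=MISSING):
    """Follow one parental line back to the most distant ancestor with a known ID."""
    current, seen = ego, {ego}
    while True:
        parent = parent_of.get(current, missing)
        # Stop at an unknown parent, or at a loop caused by a data error.
        if parent == missing or parent in seen:
            return current
        seen.add(parent)
        current = parent


patrilineage = {ego: lineage_founder(ego, FATHERS) for ego in FATHERS}
sizes = Counter(patrilineage.values())
```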

Warnings:

Counting kin:

The number of kin each ego has in several kin categories can be calculated in the 'Kin counter' tab. In this tab, there are six groups of four popup menus. Each group of four defines a kin category, by compounding basic kin categories if desired. For example, to count paternal aunts, first select 'fathers', and then select 'full sisters'.

Up to six compound categories can be defined. Compound categories are computed by first finding all kin in the basic kin category listed first, then finding all the kin of those kin in the basic kin category listed second, and so on. For example, to count brothers-in-law for each ego, select the following from the popup menus in one of the groups of four:

Both the counts of the kin in each category and the IDs of each kin will be displayed. The final two columns will always be a 'Sum' column and a 'Total' column. These columns sum the kin categories for each ego. For example, to count the number of parallel cousins, select 'fathers:full brothers:offspring' as one compound category, and 'mothers:full sisters:offspring' as a second compound category. The 'Sum' column will contain the number of father's brothers' offspring and mother's sisters' offspring (parallel cousins) for each ego. The 'Total' column will contain the IDs of all parallel cousins. The 'Sum' column counts only unique IDs, so if a cousin appears on both sides of the family, he or she will only be counted once. Similarly, if there is, e.g., sibling incest, then the offspring from such matings will be categorized as both siblings and cousins of each other.
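
The compounding rule described above (find the first category's kin, then the next category's kin of those kin, and so on) can be sketched in Python as follows. The helper names, the sex codes 'm' and 'f', the missing code '0', and the sample genealogy are all hypothetical. The sample includes a cousin, Y, who is a parallel cousin on both sides, to show the effect of counting unique IDs.

```python
MISSING = "0"  # hypothetical missing code
UNKNOWN = (MISSING, MISSING, "")

# Hypothetical genealogy: ego -> (father, mother, sex)
PEOPLE = {
    "F": ("GF", "GM", "m"), "U": ("GF", "GM", "m"),    # ego's father and his brother
    "M": ("GF2", "GM2", "f"), "S": ("GF2", "GM2", "f"),  # ego's mother and her sister
    "X": ("F", "M", "m"),  # ego
    "Y": ("U", "S", "f"),  # cousin through both the father's and mother's side
    "Z": ("U", "W", "m"),  # cousin through the father's side only
}


def fathers(person):
    father = PEOPLE.get(person, UNKNOWN)[0]
    return set() if father == MISSING else {father}


def mothers(person):
    mother = PEOPLE.get(person, UNKNOWN)[1]
    return set() if mother == MISSING else {mother}


def full_siblings(person, sex):
    father, mother, _ = PEOPLE.get(person, UNKNOWN)
    if MISSING in (father, mother):
        return set()  # full siblingship needs both parents known
    return {ego for ego, (f, m, s) in PEOPLE.items()
            if ego != person and f == father and m == mother and s == sex}


def full_brothers(person):
    return full_siblings(person, "m")


def full_sisters(person):
    return full_siblings(person, "f")


def offspring(person):
    return {ego for ego, (f, m, _) in PEOPLE.items() if person in (f, m)}


def compound(ego, *steps):
    """Apply each basic kin category in turn to the kin found so far."""
    current = {ego}
    for step in steps:
        current = set().union(*(step(kin) for kin in current))
    return current


paternal = compound("X", fathers, full_brothers, offspring)
maternal = compound("X", mothers, full_sisters, offspring)
parallel_cousins = paternal | maternal  # set union keeps each ID once
```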

Note that basic kin categories are computed solely from the genealogy. For example, mates are individuals who have produced offspring together, not individuals who are married. 'Step' categories (e.g., stepdaughters) are computed using this biological definition of mate (i.e., mate's daughters who are not ego's daughters).

In large genealogies, computing several compound kin categories can be lengthy. Results can be exported to a tab-delimited text file by selecting 'Export results...' from the 'Export' menu. Results can also be exported by selecting the desired block of cells and copying and pasting into another application.

Warnings:

Kin:

Kin relations can be computed only after relatedness stats have been computed in the 'Relatedness' tab.

Select an Ego from the drop-down menu, and then click the 'Compute kin' button. A table will appear. Column one contains the degree of relatedness between Ego and every Alterego (i.e., every other member of the population). Column two contains all ancestors that Ego and Alterego have in common. In parentheses after each common ancestor is the path from Ego to Alterego that includes that ancestor (F=father, M=mother, S=son, D=daughter). For example, if Bob and Tim are brothers, they share a father and mother as common ancestors. After the father ID will be printed (FS), father's son; after the mother ID will be printed (MS), mother's son. Checking 'Omit non-kin' will omit all Alteregos who have no consanguineal connection with Ego.

Note: The 'path' is from Ego to Alterego, not from the common ancestor to either Ego or Alterego. However, the path includes the common ancestor as one of the links. If Ego Martha is the parent of Alterego Brittany, then Martha is not only the Ego, but also the common ancestor, of Brittany. So, in the 'Common ancestor' column, we would have 'Martha (D)', signifying that Martha was the common ancestor, and that the path from Ego Martha to Alterego Brittany was Daughter (i.e., Brittany is Martha's daughter), not that Martha is the daughter of Brittany!
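
To make the path notation concrete, here is a Python sketch that enumerates common ancestors and builds the Ego-to-Alterego path string for each. It is an illustration under stated assumptions, not Descent's algorithm: it counts a path only if no individual other than the common ancestor appears on both the ascending and descending halves, and it uses hypothetical sex codes ('m', 'f') and a hypothetical missing code ('0'). It enumerates every ancestral chain, so it is only practical for small genealogies.

```python
MISSING = "0"  # hypothetical missing code
MALE = "m"     # hypothetical sex code
UNKNOWN = (MISSING, MISSING, "")

# Hypothetical genealogy: ego -> (father, mother, sex)
PEOPLE = {
    "Bob": ("Dad", "Mom", "m"),
    "Tim": ("Dad", "Mom", "m"),
    "Dad": ("Gramps", "0", "m"),
    "Brittany": ("0", "Martha", "f"),
}


def up_chains(person):
    """Yield every chain (person, parent, grandparent, ...) of known ancestors."""
    yield (person,)
    father, mother, _ = PEOPLE.get(person, UNKNOWN)
    for parent in (father, mother):
        if parent != MISSING:
            for chain in up_chains(parent):
                yield (person,) + chain


def kin_paths(ego, alter):
    """Return (common ancestor, path from ego to alter) pairs."""
    results = []
    for up in up_chains(ego):
        for down in up_chains(alter):
            if up[-1] != down[-1]:
                continue  # the two chains end at different ancestors
            if set(up[:-1]) & set(down[:-1]):
                continue  # the chains share someone below the common ancestor
            letters = []
            for child, parent in zip(up, up[1:]):  # climb from ego
                letters.append("F" if PEOPLE[child][0] == parent else "M")
            for person in reversed(down[:-1]):     # descend to alter
                letters.append("S" if PEOPLE[person][2] == MALE else "D")
            results.append((up[-1], "".join(letters)))
    return results


brothers = kin_paths("Bob", "Tim")
mother_daughter = kin_paths("Martha", "Brittany")
```

In this sample, the brothers' grandfather Gramps is not reported as a separate common ancestor, because every path through him also passes through Dad on both sides.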

Groups:

Group relatedness can only be computed after relatedness stats have been computed in the 'Relatedness' tab.

If the genealogy file contains a field (column) that identifies different subgroups by group ID codes, the average relatedness of subgroup members to each other is computed for each subgroup (i.e., the average relatedness of each member of subgroup i to other members of subgroup i), and the average relatedness between subgroups is also computed (i.e., the average relatedness of subgroup i to subgroup j). Unless 'All living' is selected in the 'Editor' tab, dead individuals are excluded from the analyses. The between subgroup matrix is not displayed, but can be exported to a tab-delimited text file using the 'Export G-Matrix' menu. Each ego can belong to only one subgroup. The group ID column is selected in the 'Groups' tab.

For example, if the genealogy contains a 'household' ID code, which assigns each ego to a household, the average relatedness of household members to each other is computed, as is a matrix of the average relatedness of each household to every other household.

The required 'father', 'mother', and 'sex' fields can be used as the 'group' code, if desired. This will compute, respectively, the average relatedness of father's offspring to each other (not including the father!), the average relatedness of mother's offspring to each other, and the average relatedness of the sexes to each other. It will also compute the matrix of the average relatedness of each of the subgroups to each of the other subgroups.
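
The two averages can be sketched in Python as follows. This assumes that the within-group average is taken over all distinct pairs of members (excluding each member's relatedness to him- or herself) and that the between-group average is taken over all cross-group pairs; this document does not state the exact formula Descent uses. The names, group codes, and relatedness values are hypothetical, and the sample assumes everyone is living.

```python
from itertools import combinations, product

# Hypothetical pairwise relatedness values; unlisted pairs are unrelated (0.0).
R = {
    frozenset(("C", "D")): 0.5,
    frozenset(("C", "E")): 0.25,
    frozenset(("D", "E")): 0.25,
}

# Hypothetical group column: each ego belongs to exactly one household.
GROUP_OF = {"C": "house1", "D": "house1", "E": "house2", "G": "house2"}


def r(i, j):
    return R.get(frozenset((i, j)), 0.0)


def members(group):
    return sorted(ego for ego, g in GROUP_OF.items() if g == group)


def within_group(group):
    """Mean relatedness over all distinct pairs inside one group."""
    pairs = list(combinations(members(group), 2))
    return sum(r(i, j) for i, j in pairs) / len(pairs) if pairs else 0.0


def between_groups(group_a, group_b):
    """Mean relatedness over all pairs with one member from each group."""
    pairs = list(product(members(group_a), members(group_b)))
    return sum(r(i, j) for i, j in pairs) / len(pairs) if pairs else 0.0
```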

Plotting:

Although Descent was designed to compute kinship statistics that would then be exported for analysis in other statistical packages, a simple scatterplot function is available for fishing expeditions. Descent checks whether any variables in the main data file are numeric; if so, these variables are available for plotting against other variables either in the main data file or computed by Descent. To be available for plotting, the variables must first be computed in their respective tabs. Thus, relatedness variables will only be available for plotting after the 'Compute relatedness stats' button is clicked in the Relatedness tab. Only ego-based variables can be plotted, which excludes the output of both the Founders and Group tabs.

Note that if categorical variables have numerical labels (e.g., household IDs are labeled 1, 2, 3, etc.), then these variables will be available for plotting, even though such plots would be meaningless. A correlation coefficient is also computed, but no corresponding p-value: because cases may not be independent, the degrees of freedom may not equal the number of cases.
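
If you export the variables and want to reproduce a correlation yourself, a plain Pearson coefficient can be computed as in the Python sketch below. That Descent reports Pearson's r specifically is an assumption; this document says only 'a correlation coefficient'. The function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 cases")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)


r_value = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```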

To plot, select the desired variables in the popup menus and click 'Plot variables'. After the plot is displayed, you can zoom in by clicking and dragging over the points you wish to zoom in on, and you can zoom out by clicking the right mouse button.

Warning: The plot function will sort the source table(s) of the selected variables on the ego column. This ensures that ego A's value on the X variable will be matched with ego A's value on the Y variable. So, if you plot a variable from the main data file displayed in the Editor tab, the Editor table will be sorted on the ego column. This may not be what you want. See the next section on sorting.

Sorting:

All tables can be sorted by column. Simply double-click on the column label (e.g., Ego ID), and the table will be sorted in ascending order on that column. The sort order is as follows: Numeric items will come first, sorted in ascending order. This includes numeric Ego ID codes. Non-numeric items come second, sorted in ascending alphabetical order. Note that upper and lower case items will be sorted differently, with all upper case letters coming before all lower case letters.
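
To predict how a column will be ordered, or to pre-sort a file the same way, the rule above can be approximated in Python with a sort key. This is a sketch of the described ordering, not Descent's code; in particular, treating anything that Python's `float()` accepts as 'numeric' is an assumption.

```python
def descent_like_key(item):
    """Numbers first (ascending), then text in code-point order (upper before lower)."""
    try:
        return (0, float(item), "")
    except ValueError:
        return (1, 0.0, item)


values = ["b", "10", "A", "2", "a", "B"]
ordered = sorted(values, key=descent_like_key)
```

Here `ordered` is `["2", "10", "A", "B", "a", "b"]`: the numeric IDs are compared as numbers rather than as text, and all upper-case items precede all lower-case ones.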

Warning: There is currently no option to restore the original sort order. If the order of your genealogy file is important, and it does not conform to the sort order described above, do not sort the genealogy file (in the Editor tab), or do not save the file if you do sort it. Otherwise, you won't be able to recover your original ordering.

License:

Copyright (c) 2000-2004, Edward H. Hagen
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Change log:

0.1.6.4 (Jan. 30, 2004)

0.1.6.3 (Oct. 7, 2003)

0.1.6.2 (Sept. 20, 2003)

0.1.6.1

0.1.6.0

0.1.5.5

0.1.5.0

0.1.4.5

0.1.4.0

Bugs:

Descent is an early alpha release. THERE ARE BUGS, AND LITTLE TESTING OF COMPUTATIONS HAS BEEN PERFORMED!

Email bug reports to e.hagen@biologie.hu-berlin.de

Known bugs:

Selecting and copying text from the error or lineage logs will cause the log window to not refresh after it is obscured either by another panel or by another window.

Contact info:

The author of this software can be contacted at:

e.hagen@biologie.hu-berlin.de

The latest version of the software can be obtained at:

http://itb.biologie.hu-berlin.de/~hagen/Descent/

Acknowledgements:

Support for the development of Descent provided by:

The Institute for Theoretical Biology
Humboldt University, Berlin

Max-Planck-Institut für ethnologische Forschung
Halle/Saale

Special thanks to John Ziker for discussions on the design of Descent. Special thanks also to Chagnon and Bryant for KINDEMCOM, which provided the inspiration for Descent.