Tuesday, February 09, 2021

Evaluating Your GedMatch Results

You have submitted your DNA to 23andMe.com and have uploaded the results to GedMatch. You have your relatives list, and you wonder: what does it all mean?

This essay will hopefully help you make sense of those results, and help you to use them to further your search. But first, it is important to know what GedMatch is, and what its limitations are.

GedMatch is an open database, as opposed to 23andMe, Ancestry and others, which are closed databases. "Closed" means that you can't put DNA from another source into 23andMe, for example; you must buy their kit to be included in their database. GedMatch is an "open" database since you can upload DNA from a wide variety of processors, including 23andMe, Ancestry, MyHeritage, etc. This is a huge benefit, since it allows people to upload their results from any of these companies and compare their DNA profile with thousands of samples processed by other companies. This saves a ton of money, since we don't have to purchase more than one test. This feature is the primary reason we upload all of our birth parent DNA samples to GedMatch.

So, let's take a look at the results of one of our kits and see what typical results may look like.






So, the basics: Column 1 is the kit number of your relative. You can click on this link to see THEIR relatives list. Column 2 (the "A") prefills your kit and that relative's kit into the one-to-one comparison, where you can run different tests to see which segments of your DNA match. It is not necessary to get too involved with this feature, but just know that it is there. Column 3 is the name of the profile, and column 4 is the email contact for the profile. This is valuable information if you decide to contact this person (more on that below).

Column 5 ("Largest Seg") is the largest continuous segment of DNA you share with the person. Generally speaking, each time a person's DNA is "mixed" through reproduction, the DNA is fractured into pieces. Thus, the further away from another person you are biologically, the smaller the fragments of shared DNA you both received from your common ancestor will be. Column 6 ("Total CM") is the total DNA you share. In other words, it is the sum of all shared segments: the longest segment from column 5 plus all of the smaller segments. Both values are measured in centimorgans (cM), and both will be larger the closer you are in relatedness to the other person.
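To make the relationship between columns 5 and 6 concrete, here is a minimal sketch in Python. The function name and the segment lengths are made up for illustration; they are not GedMatch output, and GedMatch's own calculation may apply thresholds not shown here.

```python
def summarize_segments(segments_cm):
    """Return (largest segment, total shared) for a list of shared
    segment lengths in centimorgans (cM)."""
    if not segments_cm:
        return 0.0, 0.0
    # "Largest Seg" is the longest single segment;
    # "Total CM" is the sum of every shared segment, including the longest.
    return max(segments_cm), sum(segments_cm)


# Hypothetical segment lengths, for illustration only.
closer_match = [40.0, 22.5, 15.0, 9.5]
more_distant_match = [9.0, 7.5]

print(summarize_segments(closer_match))
print(summarize_segments(more_distant_match))
```

In this made-up example the closer match has both a longer largest segment and a larger total, which is the pattern described above.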

Columns 5 and 6 are combined to arrive at the most important column in your list: Column 7, the "Gen" column. This is the most informative column for searching purposes, because it estimates how many generations separate you from the other person. For searching purposes, you are looking for a "Gen" number of less than 3. Without getting too technical, a "3" means that you share a great-grandparent as a common ancestor, meaning that you are second cousins (A great resource for interpreting GedMatch's "Gen" numbers can be found here). A match of "3" or less could possibly be leveraged to locate a birth family, depending on who was tested and how familiar the other person is with their family tree. Assuming the other person has parents still alive, and those parents know their first cousins, it would be possible to network and test the various branches of that small tree to locate birth parents.

Column 8 ("Overlap") is a fairly technical column, and is explained here. We don't use this column at all, but it is useful if you wish to get into the weeds. Column 9 is the date that your sample and the other person's were first compared. When you first upload your sample to GedMatch, every one of your relatives will have the same date -- the date your kit was processed. But, over time, new kits will be added to your list as other people's samples are uploaded. So, if a match appears with a later date than your upload date, that is an indication that the new match is a recent addition to your relatives list.
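If you keep your relatives list in a spreadsheet, the date check for column 9 can be sketched as follows. This is only an illustration: the field names, match names, and dates are hypothetical, and it assumes you have already typed or copied the comparison dates in yourself.

```python
from datetime import date


def newer_matches(matches, upload_date):
    """Return the matches first compared after your own kit was processed."""
    return [m for m in matches if m["date_compared"] > upload_date]


# Hypothetical data: the day your kit was processed, and two relatives.
upload_date = date(2020, 6, 1)
matches = [
    {"name": "Match A", "date_compared": date(2020, 6, 1)},   # there from day one
    {"name": "Match B", "date_compared": date(2021, 1, 15)},  # added later
]

recent = newer_matches(matches, upload_date)
print([m["name"] for m in recent])
```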

Column 10 is also a very important column, and you may want to refer back to the image I included above. I already mentioned that the ability to compare DNA from various platforms is one of GedMatch's strengths, but in a sense it is also a weakness. In computer programming, this weakness is called "GIGO," or "Garbage in, garbage out." Not all DNA processing companies are created equal for matching purposes. The gold standard is 23andMe, which currently tests 640,000 SNPs (single positions in your genome that differ among people). The more SNPs that are tested, the finer the "resolution" is for your DNA. It is like a TV -- the more pixels, the better the picture. 23andMe has the highest number of "pixels" (SNPs) in the industry. Ancestry, FamilyTree, and MyHeritage compare a similar number, but there is small variation between companies as to which SNPs are tested. For matching purposes, it doesn't matter, because with that many data points being compared, true relationships can be accurately determined. Thus, if you match to a relative that used 23andMe, or Ancestry, or one of the other premium testing companies, the match you see will be a solid, accurate match.

The problem is when the match is to a non-US company like 23Mofang or WeGene. When you look at our listing above, our top match is with *Anna, at 3.1 generations. One might normally think this is a solid match, just outside our search window, but close! However, when we see what company *Anna used for her DNA processing, we see that she left it blank. Without knowing which company she used, we can't determine how reliable that relationship is. Why? Let's take a look at our second match to see.

"Theo" tested with 23Mofang, and is our second closest match at 3.5 generations. Why does this name look familiar to most Chinese adoptees? Because this sample appears on many lists.





He Qian is a birth mother living in Hengyang City, Hunan, while Yang Ping lives in Wuhu, Anhui and Yang Man Xiu, the sample at the beginning of this article, lives in Ningdu, Jiangxi. None of these people are related, but all three share "Theo" as a common relative at around 3 generations. 

How can this be?

The problem lies in the fact that 23Mofang and WeGene look at different SNPs than 23andMe, Ancestry, and other western processors. As 23Mofang detailed it, "We undertook modifications to some of the loci of the array to improve its applicability to the genome of the Chinese population." In other words, 23Mofang (as well as WeGene) tests different genetic markers than 23andMe, Ancestry, etc. That is why kits from these Chinese databases often show up near the top of Chinese adoptees' relatives lists. But sadly, these relationships are almost always exaggerated.

So, when you get your results, you should probably ignore any matches that show a blank testing company, or that list 23Mofang or WeGene. Assume these are not valid matches. Common names that appear over and over include:

Acheng Zhaoye (WeGene)

Anna (Blank)

Chinese Korean (Blank)

Dongguan Chen (Blank)

Guangzhou (WeGene)

Huangxin (23Mofang)

Theo (23Mofang)

Zhao Ruming (23Mofang)
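If you have copied your relatives list into a spreadsheet or script, the rule above can be sketched in a few lines of Python. The field names and the third match are hypothetical, and the exact text GedMatch shows in the testing company column may differ from what is assumed here.

```python
# Companies whose matches this essay recommends setting aside.
SUSPECT_COMPANIES = {"23mofang", "wegene"}


def is_suspect(company):
    """True if the testing company is blank, 23Mofang, or WeGene."""
    normalized = (company or "").strip().lower()
    return normalized == "" or normalized in SUSPECT_COMPANIES


def keep_reliable(matches):
    """Drop matches whose testing company is blank or on the suspect list."""
    return [m for m in matches if not is_suspect(m.get("company"))]


matches = [
    {"name": "Anna", "company": "", "gen": 3.1},
    {"name": "Theo", "company": "23Mofang", "gen": 3.5},
    {"name": "Example Match", "company": "23andMe", "gen": 3.8},  # hypothetical row
]

reliable = keep_reliable(matches)
print([m["name"] for m in reliable])
```

Filtering on the company rather than on a fixed list of names means a new suspect kit is caught even if its name has not been seen before.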

So, after eliminating suspect matches, the next thing to do is reach out to your top 5-10 relatives (4 generations or closer) to ask them what part of China they originated from. You are not looking to utilize these matches for searching purposes (only <3 generations will help with that), but to see whether your somewhat near relatives are from the same area of China as you or your child. We all realize by now that being adopted from orphanage X in no way makes it certain that the adoptee's birth family is from that orphanage area. The movement of children from one area to another for adoption was, and is, prevalent. So, if you contact five near relatives and they cluster around Western Guangdong, for example, and you or your child was adopted from an orphanage in Hunan, those relatives suggest you were not born in the immediate orphanage area. It is not certain, of course, but suggestive, and it may help you expand your view of where to search.
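Once replies come in, a simple tally shows whether your relatives cluster somewhere other than the orphanage area. The sketch below uses invented relatives and replies purely for illustration; as noted above, a cluster is a hint, not proof.

```python
from collections import Counter


def tally_regions(replies):
    """Count how many relatives reported each region, skipping non-replies."""
    return Counter(region for region in replies.values() if region)


# Hypothetical replies from five contacted relatives.
replies = {
    "Relative 1": "Guangdong",
    "Relative 2": "Guangdong",
    "Relative 3": "Hunan",
    "Relative 4": "Guangdong",
    "Relative 5": None,  # never answered
}
adoption_province = "Hunan"  # hypothetical orphanage province

counts = tally_regions(replies)
top_region, top_count = counts.most_common(1)[0]

if top_region != adoption_province:
    print(f"{top_count} of {sum(counts.values())} replies point to {top_region}, "
          f"not {adoption_province}: consider widening the search.")
```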

To summarize: GedMatch relatives closer than 3 generations are useful for birth parent searching. Most families inside China are aware of first and second cousins in their families, so networking close relatives is possible to locate a birth family. 

More distant matches are helpful for triangulation purposes, to suggest other areas of China where a birth family may have lived or at least originated. We have seen this in our own family's search activity -- our daughter's first cousin match (an adoptee) is from a small orphanage at the other end of the province. We are now focused on locating this other adoptee's birth family because we know if we locate hers, we will locate our own daughter's.

And matches from 23Mofang and WeGene are not commonly usable for either purpose.