<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Gendered names in Freebase</title>
	<atom:link href="http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/feed/" rel="self" type="application/rss+xml" />
	<link>http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/</link>
	<description>Kirrily Robert&#039;s blog</description>
	<lastBuildDate>Mon, 02 Aug 2010 02:01:58 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: rone</title>
		<link>http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/comment-page-1/#comment-2057</link>
		<dc:creator>rone</dc:creator>
		<pubDate>Sat, 12 Sep 2009 04:09:59 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=372#comment-2057</guid>
		<description>I am amused by the contrasting results for &quot;Lindsey&quot; and &quot;Lindsay&quot;.</description>
		<content:encoded><![CDATA[<p>I am amused by the contrasting results for &#8220;Lindsey&#8221; and &#8220;Lindsay&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Skud</title>
		<link>http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/comment-page-1/#comment-2056</link>
		<dc:creator>Skud</dc:creator>
		<pubDate>Fri, 11 Sep 2009 17:22:34 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=372#comment-2056</guid>
		<description>@James: some of our queues (merge and delete) require three people in consensus before writing, but the genderizer/typewriter don&#039;t.  It has to do with how easily an error can be reverted.  Anyone can fix Rebecca M Riordan&#039;s gender easily, but it&#039;s much harder to undelete or unmerge.  Given that, we went for volume :)  As to why I mis-gendered her... I think that just happens sometimes, when you&#039;re blasting through a big set.  You get a little dazed ;)  I know there was some talk at our last hack day about how to mitigate that effect (ideally in fun ways that don&#039;t slow down data contribution too much).</description>
		<content:encoded><![CDATA[<p>@James: some of our queues (merge and delete) require three people in consensus before writing, but the genderizer/typewriter don&#8217;t.  It has to do with how easily an error can be reverted.  Anyone can fix Rebecca M Riordan&#8217;s gender easily, but it&#8217;s much harder to undelete or unmerge.  Given that, we went for volume :)  As to why I mis-gendered her&#8230; I think that just happens sometimes, when you&#8217;re blasting through a big set.  You get a little dazed ;)  I know there was some talk at our last hack day about how to mitigate that effect (ideally in fun ways that don&#8217;t slow down data contribution too much).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/comment-page-1/#comment-2055</link>
		<dc:creator>James</dc:creator>
		<pubDate>Thu, 10 Sep 2009 06:53:07 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=372#comment-2055</guid>
		<description>The outliers often seem to be misclassifications - eg for John and James there&#039;s a few pen-names of women, or in the case of &lt;a href=&quot;http://www.freebase.com/view/en/rebecca_m_riordan&quot; rel=&quot;nofollow&quot;&gt;Rebecca M Riordan&lt;/a&gt; &lt;a href=&quot;http://www.freebase.com/history/view/en/rebecca_m_riordan&quot; rel=&quot;nofollow&quot;&gt;you&lt;/a&gt; got her gender wrong in Genderizer, assuming her &lt;a href=&quot;http://www.oreillynet.com/pub/au/3257&quot; rel=&quot;nofollow&quot;&gt;O&#039;Reilly profile&lt;/a&gt; is correct. Not to pick on you, it just happened to be the first one I came across. For some reason I thought Genderizer/Typewriter averaged three people&#039;s opinions before changing a record, is that not the case?</description>
		<content:encoded><![CDATA[<p>The outliers often seem to be misclassifications &#8211; eg for John and James there&#8217;s a few pen-names of women, or in the case of <a href="http://www.freebase.com/view/en/rebecca_m_riordan" rel="nofollow">Rebecca M Riordan</a> <a href="http://www.freebase.com/history/view/en/rebecca_m_riordan" rel="nofollow">you</a> got her gender wrong in Genderizer, assuming her <a href="http://www.oreillynet.com/pub/au/3257" rel="nofollow">O&#8217;Reilly profile</a> is correct. Not to pick on you, it just happened to be the first one I came across. For some reason I thought Genderizer/Typewriter averaged three people&#8217;s opinions before changing a record, is that not the case?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Skud</title>
		<link>http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/comment-page-1/#comment-2054</link>
		<dc:creator>Skud</dc:creator>
		<pubDate>Thu, 10 Sep 2009 06:30:39 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=372#comment-2054</guid>
		<description>@Bruce Yeah, geographical distribution is a good one.  You see it quite clearly with &quot;Andrea&quot; where it is a female name in some languages but a male name in others, leading to around 50% overall, but you could guess pretty strongly one way or the other if you were in eg. Australia (female) or Italy (male).

Freebase has &quot;place of birth&quot; but that&#039;s less well filled out than name and gender.  It would be something to consider, though, as an option.</description>
		<content:encoded><![CDATA[<p>@Bruce Yeah, geographical distribution is a good one.  You see it quite clearly with &#8220;Andrea&#8221; where it is a female name in some languages but a male name in others, leading to around 50% overall, but you could guess pretty strongly one way or the other if you were in eg. Australia (female) or Italy (male).</p>
<p>Freebase has &#8220;place of birth&#8221; but that&#8217;s less well filled out than name and gender.  It would be something to consider, though, as an option.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bruce Van Allen</title>
		<link>http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/comment-page-1/#comment-2053</link>
		<dc:creator>Bruce Van Allen</dc:creator>
		<pubDate>Thu, 10 Sep 2009 06:22:12 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=372#comment-2053</guid>
		<description>Thanks for your work on this!

I take a somewhat different approach, which I&#039;ll briefly sketch for added perspective on this challenging problem.

My need for deriving gender from names is for political organizing, when a strategy could involve targeting voters with gender as one attribute among others such as age, party, and so on. In voter registration records, many people do not provide a gendered title such as Ms, Mr, Mrs, or Miss (American English here). So I needed a way to guess their genders with some reliability. 

I wrote a routine -- in Perl, natch -- that looks at the names of people who did provide gendered titles, and uses that to guess the gender of those who didn&#039;t. The first thing I saw, of course, was the ambiguous names, so I added a statistical threshold below which the guess wasn&#039;t allowed. Then I noticed the dimension that your method doesn&#039;t account for, besides the limitations you mentioned: running the routine in different geographical areas gave different results. So for example in one county the name Guadalupe would come out as overwhelmingly female usage and in another county it would be too ambiguous to provide a useful guess.

So my approach differs from yours in that it is based on actual usage within a specific population. Still uses some assumptions, and would only be useful where one has an actual population to look at. But after several years of field testing, this method&#039;s results have been consistently better than others I&#039;ve been shown, measured by fewer people assigned &#039;unknown&#039; gender and by a higher degree of accuracy found in the field.

If I weren&#039;t typing on such a small device, I&#039;d offer some code, and I&#039;d like someday to do some more rigorous statistical analysis of this method. Happy to follow up if anyone is interested.</description>
		<content:encoded><![CDATA[<p>Thanks for your work on this!</p>
<p>I take a somewhat different approach, which I&#8217;ll briefly sketch for added perspective on this challenging problem.</p>
<p>My need for deriving gender from names is for political organizing, when a strategy could involve targeting voters with gender as one attribute among others such as age, party, and so on. In voter registration records, many people do not provide a gendered title such as Ms, Mr, Mrs, or Miss (American English here). So I needed a way to guess their genders with some reliability. </p>
<p>I wrote a routine &#8212; in Perl, natch &#8212; that looks at the names of people who did provide gendered titles, and uses that to guess the gender of those who didn&#8217;t. The first thing I saw, of course, was the ambiguous names, so I added a statistical threshold below which the guess wasn&#8217;t allowed. Then I noticed the dimension that your method doesn&#8217;t account for, besides the limitations you mentioned: running the routine in different geographical areas gave different results. So for example in one county the name Guadalupe would come out as overwhelmingly female usage and in another county it would be too ambiguous to provide a useful guess.</p>
<p>So my approach differs from yours in that it is based on actual usage within a specific population. Still uses some assumptions, and would only be useful where one has an actual population to look at. But after several years of field testing, this method&#8217;s results have been consistently better than others I&#8217;ve been shown, measured by fewer people assigned &#8216;unknown&#8217; gender and by a higher degree of accuracy found in the field.</p>
<p>If I weren&#8217;t typing on such a small device, I&#8217;d offer some code, and I&#8217;d like someday to do some more rigorous statistical analysis of this method. Happy to follow up if anyone is interested.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Fenwick (pjf) 's status on Thursday, 10-Sep-09 02:38:40 UTC - Identi.ca</title>
		<link>http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/comment-page-1/#comment-2052</link>
		<dc:creator>Paul Fenwick (pjf) 's status on Thursday, 10-Sep-09 02:38:40 UTC - Identi.ca</dc:creator>
		<pubDate>Thu, 10 Sep 2009 02:38:51 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=372#comment-2052</guid>
		<description>[...]  http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/        a few seconds ago  from  Gwibber [...]</description>
		<content:encoded><![CDATA[<p>[...]  <a href="http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/" rel="nofollow">http://infotrope.net/blog/2009/09/10/gendered-names-in-freebase/</a>        a few seconds ago  from  Gwibber [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
