<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Open government and parsable data formats</title>
	<atom:link href="http://infotrope.net/blog/2009/11/01/open-government-and-parsable-data-formats/feed/" rel="self" type="application/rss+xml" />
	<link>http://infotrope.net/blog/2009/11/01/open-government-and-parsable-data-formats/</link>
	<description>Kirrily Robert&#039;s blog</description>
	<lastBuildDate>Sat, 17 Jul 2010 20:29:03 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Adam Kennedy</title>
		<link>http://infotrope.net/blog/2009/11/01/open-government-and-parsable-data-formats/comment-page-1/#comment-2440</link>
		<dc:creator>Adam Kennedy</dc:creator>
		<pubDate>Mon, 02 Nov 2009 02:43:09 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=434#comment-2440</guid>
		<description>I talked to a number of government agencies at the GovHack event over the weekend, and they were quite positive about the idea of a tool that could be pointed at a geo data MID/MIF file and validate it as &quot;export grade&quot;, making sure it has a default projection, an appropriate datum, ST_IsValid-passing polygons, and so on.

I&#039;m hoping to produce it (or something like it) shortly.</description>
		<content:encoded><![CDATA[<p>I talked to a number of government agencies at the GovHack event over the weekend, and they were quite positive about the idea of a tool that could be pointed at a geo data MID/MIF file and validate it as &#8220;export grade&#8221;, making sure it has a default projection, an appropriate datum, ST_IsValid-passing polygons, and so on.</p>
<p>I&#8217;m hoping to produce it (or something like it) shortly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Skud</title>
		<link>http://infotrope.net/blog/2009/11/01/open-government-and-parsable-data-formats/comment-page-1/#comment-2439</link>
		<dc:creator>Skud</dc:creator>
		<pubDate>Sun, 01 Nov 2009 21:23:59 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=434#comment-2439</guid>
		<description>Yeah, we mostly use JSON at Freebase.com, and although I was skeptical at first (probably a hangover from early JavaScript trauma) I am starting to really love working with it. XML does have some benefits though, like being able to specify that certain fields are required, or whatever.</description>
		<content:encoded><![CDATA[<p>Yeah, we mostly use JSON at Freebase.com, and although I was skeptical at first (probably a hangover from early JavaScript trauma) I am starting to really love working with it. XML does have some benefits though, like being able to specify that certain fields are required, or whatever.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: J Chris Anderson</title>
		<link>http://infotrope.net/blog/2009/11/01/open-government-and-parsable-data-formats/comment-page-1/#comment-2438</link>
		<dc:creator>J Chris Anderson</dc:creator>
		<pubDate>Sun, 01 Nov 2009 20:56:26 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=434#comment-2438</guid>
		<description>I know we can&#039;t ask old govt data to get future-perfect right away, but for greenfield projects, maybe it makes sense to sell JSON as the simplest option. JSON definitely simplifies the parse-and-go workflow compared to XML, and it isn&#039;t subject to the frame issues of CSV.</description>
		<content:encoded><![CDATA[<p>I know we can&#8217;t ask old govt data to get future-perfect right away, but for greenfield projects, maybe it makes sense to sell JSON as the simplest option. JSON definitely simplifies the parse-and-go workflow compared to XML, and it isn&#8217;t subject to the frame issues of CSV.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Skud</title>
		<link>http://infotrope.net/blog/2009/11/01/open-government-and-parsable-data-formats/comment-page-1/#comment-2437</link>
		<dc:creator>Skud</dc:creator>
		<pubDate>Sun, 01 Nov 2009 20:42:39 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=434#comment-2437</guid>
		<description>Sure, but any kind of CSV format, once inspected, is still easier to write a parser for than PDF. And, arguably, better than XML for most applications. Perl&#039;s &lt;a href=&quot;http://search.cpan.org/perldoc?Text::CSV&quot; rel=&quot;nofollow&quot;&gt;Text::CSV&lt;/a&gt; module, which is available from CPAN, handles all the issues you describe above except those that are Excel&#039;s fault. And quoting fields that aren&#039;t really numeric (e.g. ZIP codes, phone numbers) will fix most of those.

It&#039;s not that you can&#039;t make some godawful messes with CSV, but they&#039;re *still* more parsable than PDF.</description>
		<content:encoded><![CDATA[<p>Sure, but any kind of CSV format, once inspected, is still easier to write a parser for than PDF. And, arguably, better than XML for most applications. Perl&#8217;s <a href="http://search.cpan.org/perldoc?Text::CSV" rel="nofollow">Text::CSV</a> module, which is available from CPAN, handles all the issues you describe above except those that are Excel&#8217;s fault. And quoting fields that aren&#8217;t really numeric (e.g. ZIP codes, phone numbers) will fix most of those.</p>
<p>It&#8217;s not that you can&#8217;t make some godawful messes with CSV, but they&#8217;re *still* more parsable than PDF.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philip Newton</title>
		<link>http://infotrope.net/blog/2009/11/01/open-government-and-parsable-data-formats/comment-page-1/#comment-2436</link>
		<dc:creator>Philip Newton</dc:creator>
		<pubDate>Sun, 01 Nov 2009 19:53:49 +0000</pubDate>
		<guid isPermaLink="false">http://infotrope.net/blog/?p=434#comment-2436</guid>
		<description>&lt;i&gt;CSV is a great format for open data [...]. It’s [...] easy to parse [...].&lt;/i&gt;

Um. Only if you restrict yourself to a subset of CSV, especially if you expect anyone to import the stuff into Excel at some point.

Excel is a bit too smart at times, and you&#039;ll get weird things like ZIP codes being recognised as numeric (so leading zeroes disappear), numbers being recognised as dates or vice versa, etc.

Numbers with decimal places might also cause problems for people in locales where the decimal separator is not a dot - for example, in Germany, exporting to &quot;CSV&quot; from Excel will result in a *semicolon*-separated file with decimal *commas*.

Some applications support line breaks inside a field (some only if the field is surrounded by quotation marks), while others mess up completely in such a situation.

Tab-separated is a bit better, since it avoids the problem of the separator being embedded in the contents (most data is less likely to contain a tab than a comma, semicolon, or quotation mark). But even that will fall prey to the heuristics Excel and friends use to determine data type based on content.</description>
		<content:encoded><![CDATA[<p><i>CSV is a great format for open data [...]. It’s [...] easy to parse [...].</i></p>
<p>Um. Only if you restrict yourself to a subset of CSV, especially if you expect anyone to import the stuff into Excel at some point.</p>
<p>Excel is a bit too smart at times, and you&#8217;ll get weird things like ZIP codes being recognised as numeric (so leading zeroes disappear), numbers being recognised as dates or vice versa, etc.</p>
<p>Numbers with decimal places might also cause problems for people in locales where the decimal separator is not a dot &#8211; for example, in Germany, exporting to &#8220;CSV&#8221; from Excel will result in a *semicolon*-separated file with decimal *commas*.</p>
<p>Some applications support line breaks inside a field (some only if the field is surrounded by quotation marks), while others mess up completely in such a situation.</p>
<p>Tab-separated is a bit better, since it avoids the problem of the separator being embedded in the contents (most data is less likely to contain a tab than a comma, semicolon, or quotation mark). But even that will fall prey to the heuristics Excel and friends use to determine data type based on content.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
