<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-570860519139426322</id><updated>2012-01-20T11:03:03.845-05:00</updated><category term='digital library'/><category term='web analytics'/><category term='visualization'/><category term='computational linguistics'/><category term='wiki'/><category term='information overload'/><category term='hakia'/><category term='character encoding'/><category term='codepage'/><category term='document management'/><category term='vyasa'/><category term='music'/><category term='digital asset management'/><category term='indexing'/><category term='analytics'/><category term='audio'/><category term='enterprise content management'/><category term='semantic search'/><category term='natural language processing'/><category term='search'/><category term='file type'/><category term='unicode'/><category term='information discovery'/><category term='metadata'/><category term='charset'/><category term='semantic analysis'/><category term='google'/><category term='taxonomy'/><category term='discovery'/><title type='text'>Shared Development</title><subtitle type='html'>Sharing the software development experience</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Fred 
Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>18</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-9181098810964368013</id><published>2007-02-11T07:02:00.000-05:00</published><updated>2007-02-11T09:17:51.056-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='file type'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><title type='text'>MagicMimeTypeIdentifier update</title><content type='html'>&lt;p&gt;&lt;a href="http://aperture.sourceforge.net/doc/javadoc/org/semanticdesktop/aperture/mime/identifier/magic/MagicMimeTypeIdentifier.html"&gt;MagicMimeTypeIdentifier&lt;/a&gt;, which scored so highly on &lt;a href="http://fredeaker.blogspot.com/2006/12/file-type-mime-detection.html"&gt;my comparison of Java file type detectors&lt;/a&gt;, has been updated. Here is the email I received from Christian Fluit:&lt;/p&gt;&lt;blockquote&gt;I have just updated Aperture's MagicMimeTypeIdentifier based on the results of your benchmark, in order to achieve the best score possible. 
This led to the addition of the following MIME types:&lt;br /&gt;&lt;br /&gt;audio/x-aiff&lt;br /&gt;audio/x-ms-wma (previously mistakenly labeled as audio/x-ms-wmv)&lt;br /&gt;application/x-ms-wm (artificial supertype of wma and wmv, they share the same magic number and can only be distinguished when they have the proper file name extension or when you interpret the container's contents in more depth)&lt;br /&gt;image/svg&lt;br /&gt;image/x-icon&lt;br /&gt;image/x-raw&lt;br /&gt;image/x-tga&lt;br /&gt;application/x-freemind&lt;br /&gt;&lt;br /&gt;Also, some MIME types had their description updated, e.g. .rmi files are now also labeled as audio/midi.&lt;br /&gt;&lt;br /&gt;Unfortunately, a 100% score is not achievable at the moment, as the magic number of TGA files also matches with that of certain versions of Quattro Pro spreadsheets. At the moment this cannot be expressed in the identifier's config file, you would almost need a rule based language to express this.&lt;/blockquote&gt;&lt;p&gt;Running the test with the new version of MagicMimeTypeIdentifier (&lt;a href="http://sourceforge.net/cvs/?group_id=150969"&gt;available from the CVS repository&lt;/a&gt;) led to a great improvement: &lt;span style="font-weight:bold;"&gt;95% accuracy!&lt;/span&gt; Most of this improvement came from recognition of the SVG and ICO files. Great job, Chris!&lt;/p&gt;&lt;p&gt;Chris also mentions in his email:&lt;/p&gt;&lt;blockquote&gt;A remark about your benchmark: I noticed that all your files had the proper file extensions. Our MIME type identifier primarily uses magic numbers and only switches to checking file extensions when magic number matching fails or when it is unable to discriminate between a family of related file formats (e.g. the MS Office formats). 
I wonder what the outcome of your test would be if you would remove all file name extensions :)&lt;/blockquote&gt;&lt;p&gt;Renaming the files without an extension reduced the accuracy of MagicMimeTypeIdentifier to only 78%--a small decrease from 84% accuracy with file extensions. Still, this beats the heck out of other detectors.&lt;/p&gt;&lt;p&gt;Current project deadlines prevent me from doing more comprehensive testing, but I would like to thank Chris for his response.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
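&lt;p&gt;The magic-number-first strategy Chris describes in his email can be sketched in Java roughly as follows. This is only an illustration under my own assumptions, not Aperture's implementation: the class name MagicFirstDetector, the tiny signature table, and the fallback types are all made up for the example. The application/x-ms-wm supertype comes from Chris's email; video/x-ms-wmv is simply the conventional type for WMV files.&lt;/p&gt;
&lt;pre&gt;
```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical detector for illustration; not Aperture's MagicMimeTypeIdentifier.
public class MagicFirstDetector {

    // Assumed signature table, far smaller than a real configuration file.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] PDF = {'%', 'P', 'D', 'F'};
    // First bytes of the ASF header GUID shared by WMA and WMV files.
    private static final byte[] ASF =
        {0x30, 0x26, (byte) 0xB2, 0x75, (byte) 0x8E, 0x66, (byte) 0xCF, 0x11};

    // copyOf zero-pads short input; every signature here ends in a
    // non-zero byte, so padded input cannot match by accident.
    static boolean startsWith(byte[] data, byte[] magic) {
        return Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Lower-cased text after the last dot, or "" when there is none.
    static String extensionOf(String fileName) {
        if (fileName == null) {
            return "";
        }
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static String detect(byte[] head, String fileName) {
        // Step 1: unambiguous magic numbers win, whatever the file is called.
        if (startsWith(head, PNG)) {
            return "image/png";
        }
        if (startsWith(head, PDF)) {
            return "application/pdf";
        }
        String ext = extensionOf(fileName);
        // Step 2: a shared magic number identifies a family; the extension
        // picks the member, otherwise the supertype is reported.
        if (startsWith(head, ASF)) {
            switch (ext) {
                case "wma": return "audio/x-ms-wma";
                case "wmv": return "video/x-ms-wmv";
                default: return "application/x-ms-wm";
            }
        }
        // Step 3: no magic number matched, so fall back to the extension.
        switch (ext) {
            case "txt": return "text/plain";
            case "svg": return "image/svg";
            default: return "application/octet-stream";
        }
    }

    public static void main(String[] args) {
        byte[] asfHead = {0x30, 0x26, (byte) 0xB2, 0x75, (byte) 0x8E, 0x66, (byte) 0xCF, 0x11};
        System.out.println(detect(asfHead, "song.wma"));
        System.out.println(detect(asfHead, "song"));
    }
}
```
&lt;/pre&gt;
&lt;p&gt;Stripping the extension in this sketch only hurts the second and third steps, which matches the kind of accuracy drop I saw when I renamed the test files.&lt;/p&gt;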
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-9181098810964368013?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/9181098810964368013/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=9181098810964368013' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/9181098810964368013'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/9181098810964368013'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/02/magicmimetypeidentifier-update.html' title='MagicMimeTypeIdentifier update'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-2607311628342867297</id><published>2007-01-24T16:19:00.001-05:00</published><updated>2007-01-24T16:21:28.363-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><title type='text'>Microsoft Photo Info</title><content type='html'>&lt;p&gt;&lt;blockquote&gt;&lt;a href="http://www.microsoft.com/windowsxp/using/digitalphotography/prophoto/photoinfo.mspx"&gt;Microsoft Photo Info&lt;/a&gt; is a new software add-in for Microsoft Windows that allows photographers to add, change and delete common "metadata" properties for digital photographs from inside Windows Explorer.&lt;/blockquote&gt;&lt;/p&gt;&lt;div 
class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-2607311628342867297?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.microsoft.com/windowsxp/using/digitalphotography/prophoto/photoinfo.mspx' title='Microsoft Photo Info'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/2607311628342867297/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=2607311628342867297' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/2607311628342867297'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/2607311628342867297'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/microsoft-photo-info_24.html' title='Microsoft Photo Info'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-1279733212329735779</id><published>2007-01-24T16:02:00.000-05:00</published><updated>2007-01-24T16:14:14.924-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><title type='text'>Visualization Links</title><content type='html'>&lt;p&gt;&lt;a href="http://www.swivel.com/"&gt;Swivel&lt;/a&gt; is a Web site for curious people to explore data. They "use farms of powerful computers and algorithms ... 
to transform a lonely grid of numbers and letters into hundreds - sometimes thousands - of graphs that can be explored and compared with any other public data ... have ratings and comments and publishing shortcuts for bloggers, so folks can share ideas, talk about insights and understand data together ... we transform the sometimes tedious task of reading someone else's spreadsheet into a fun experience of clicking through a Web site full of images, graphs and color."&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.gapminder.org/"&gt;Gapminder&lt;/a&gt; "is a non-profit venture for development and provision of free software that visualize human development. This is done in collaboration with universities, UN organizations, public agencies and non-governmental organizations. The main project during the coming three years is a collaboration with UN Statistic Division with the aim to visualize UN common database..."&lt;/p&gt;&lt;p&gt;The &lt;a href="http://www.research.ibm.com/visual/"&gt;IBM Visual Communication Lab&lt;/a&gt; "develop[s] visualization algorithms that help people see and exchange information in novel ways. Our designs aim to transform visualization from a solitary activity into a collaborative one. Some application areas are online discussions, email archives, social networks, software development, and executive decision support tools. By allowing people to observe and orient themselves in complex information landscapes, our inventions enable faster, more insightful decisions."&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-1279733212329735779?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/1279733212329735779/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=1279733212329735779' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/1279733212329735779'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/1279733212329735779'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/visualization-links.html' title='Visualization Links'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-4319334249875954252</id><published>2007-01-22T07:33:00.000-05:00</published><updated>2007-01-22T08:01:24.944-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic search'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic analysis'/><title type='text'>Bauhaus-Universität Weimar</title><content type='html'>&lt;p&gt;I recently stumbled upon &lt;a href="http://www.uni-weimar.de/cms/"&gt;Bauhaus-Universität Weimar&lt;/a&gt; (&lt;a href="http://www.uni-weimar.de/index.en.php?lang=en"&gt;English&lt;/a&gt;), a university for creative studies in Weimar, Germany. 
The university conducts research in &lt;a href="http://www.uni-weimar.de/cms/webis.69.0.html"&gt;Web Technology and Information Systems&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The site has information about:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;The &lt;a href="http://www.uni-weimar.de/medien/webis/research/aitools/wiki/doku.php"&gt;AItools&lt;/a&gt; suite which addresses text-based information retrieval tasks. It comprises basic and advanced algorithms, data structures, and design patterns to model complex real-world retrieval processes.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The &lt;a href="http://www.uni-weimar.de/cms/AIsearch.99.0.html"&gt;AIsearch&lt;/a&gt; mining tool for the intelligent analysis of document collections. It offers a convenient interface for Web-based search and combines algorithms for the formation, labeling, and visualization of categories along with a smart spelling analysis.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.uni-weimar.de/cms/TIR_Workshop_Series.5933.0.html"&gt;The International Workshop on Text-based Information Retrieval&lt;/a&gt; which addresses researchers, users, and practitioners from different fields: data mining and machine learning, document and knowledge management, semantic technologies, computer linguistics, and information retrieval in general.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The TIR workshop occurs in conjunction with &lt;a href="http://www.dexa.org/"&gt;DEXA&lt;/a&gt; which also hosts other &lt;a href="http://www.dexa.org/ws_groups_list"&gt;interesting workshops&lt;/a&gt; on data processing, data management, data mining and retrieval, semantics, knowledge, self adaption, and autonomic computing.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-4319334249875954252?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/4319334249875954252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=4319334249875954252' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4319334249875954252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4319334249875954252'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/bauhaus-universitt-weimar.html' title='Bauhaus-Universität Weimar'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-5154623126831485239</id><published>2007-01-19T07:23:00.000-05:00</published><updated>2007-01-22T08:02:33.981-05:00</updated><title type='text'>File Type Metadata Discovery, Part 2: Images</title><content type='html'>&lt;p style="font-weight:bold"&gt;File Type Metadata Discovery, Part 2: Images&lt;/p&gt;&lt;p&gt;In a &lt;a href="http://fredeaker.blogspot.com/2006/12/file-type-mime-detection.html"&gt;previous   article&lt;/a&gt;, I evaluated various libraries to determine which most accurately identified a file's type. This article represents part two in a series of articles that explore how to discover metadata about a file after its type has been detected. 
&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Java ImageIO&lt;/p&gt;&lt;p&gt;The primary Java library for image handling is &lt;a href="http://java.sun.com/j2se/1.5.0/docs/guide/imageio/index.html"&gt;javax.imageio&lt;/a&gt;, which provides a pluggable architecture for working with images stored in files or accessed across the network, along with a framework for adding format-specific plugins. Plugins for several common formats are included with Java Image I/O, and third parties can use this API to create their own plugins to handle special formats. &lt;/p&gt;&lt;p&gt;There is also a &lt;a href="https://jai-imageio.dev.java.net/"&gt;jai-imageio&lt;/a&gt; project on &lt;a href="https://www.dev.java.net/"&gt;java.net&lt;/a&gt;, which is a set of ImageReader and ImageWriter plugins for the ImageIO API, primarily built by the JAI team (see &lt;a href="http://forums.java.net/jive/thread.jspa?messageID=64535&amp;tstart=0"&gt;this thread at java.net&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;The &lt;a href="http://java.sun.com/j2se/1.5.0/docs/guide/imageio/index.html"&gt;javax.imageio&lt;/a&gt; package has comprehensive metadata capability, exposed through the &lt;span style="font-family: Courier New;"&gt;ImageReader.getStreamMetadata()&lt;/span&gt; and &lt;span style="font-family: Courier New;"&gt;ImageReader.getImageMetadata()&lt;/span&gt; methods. These methods return an IIOMetadata object, whose structure is described by an &lt;a href="http://java.sun.com/javase/6/docs/api/javax/imageio/metadata/IIOMetadataFormat.html" title="IIOMetadataFormat"&gt;IIOMetadataFormat&lt;/a&gt; and &lt;a href="http://java.sun.com/j2se/1.5.0/docs/guide/imageio/spec/apps.fm5.html" title="whose values are accessible through a DOM tree"&gt;whose values are accessible through a DOM tree&lt;/a&gt;. The amount of image-specific metadata available is staggering, and most of it is probably only useful to someone creating a very image-centric application. 
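&lt;/p&gt;&lt;p&gt;As a rough illustration of that DOM access, here is a minimal sketch that prints the size and the standard metadata tree of the first image in a byte array. The class name ImageMetadataSketch and its dump helper are hypothetical names I made up for this example; the javax.imageio and org.w3c.dom calls are the standard ones. It assumes the reader plugin supports the standard metadata format, which the built-in PNG reader does.&lt;/p&gt;
&lt;pre&gt;
```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataFormatImpl;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper class for illustration; not part of javax.imageio.
public class ImageMetadataSketch {

    // Appends one line per DOM node: indentation, node name, then attributes.
    static void dump(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.getNodeName());
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null) {
            for (int i = 0; i != attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(' ').append(attr.getNodeName())
                   .append('=').append(attr.getNodeValue());
            }
        }
        out.append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            dump(child, indent + "  ", out);
        }
    }

    // Returns the size and the standard metadata tree of the first image.
    public static String describe(byte[] imageBytes) throws IOException {
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(imageBytes))) {
            // next() throws NoSuchElementException if no plugin recognizes the data.
            ImageReader reader = ImageIO.getImageReaders(in).next();
            try {
                reader.setInput(in);
                StringBuilder out = new StringBuilder();
                out.append("size=").append(reader.getWidth(0))
                   .append('x').append(reader.getHeight(0)).append('\n');
                IIOMetadata metadata = reader.getImageMetadata(0);
                dump(metadata.getAsTree(IIOMetadataFormatImpl.standardMetadataFormatName), "", out);
                return out.toString();
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny PNG in memory so the example needs no file on disk.
        ImageIO.setUseCache(false);
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB), "png", png);
        System.out.print(describe(png.toByteArray()));
    }
}
```
&lt;/pre&gt;
&lt;p&gt;Requesting the plugin's native format name instead of the standard one would expose the format-specific tree, which is where most of the detail lives.&lt;/p&gt;&lt;p&gt;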
However, in a &lt;a href="http://en.wikipedia.org/wiki/Digital_asset_management"&gt;digital asset management&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Digital_asset_management_system"&gt;system&lt;/a&gt;, simple metadata such as height and width are readily available from &lt;span style="font-family: Courier New;"&gt;ImageReader.getImageMetadata()&lt;/span&gt;.&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;ImageMetadataDiscoverer&lt;/p&gt;&lt;p&gt;I have created a simple library that gathers all available image metadata into a &lt;a href="http://java.sun.com/javase/6/docs/api/java/util/Map.html"&gt;Map&lt;/a&gt;. Download &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=155397&amp;package_id=173024&amp;amp;release_id=479689" title="ImageMetadataDiscoverer"&gt;ImageMetadataDiscoverer&lt;/a&gt; at Sourceforge or read the &lt;a href="http://vyasa.sourceforge.net/ImageMetadataDiscoverer/"&gt;javadocs&lt;/a&gt;.&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Need More?&lt;/p&gt;&lt;p&gt;More specific needs can be met by the wide variety of image tools available for Java. Marco Schmidt has a nice list of &lt;a href="http://schmidt.devlib.org/java/pixel-image-io-libraries.html" title="raster"&gt;raster&lt;/a&gt; and &lt;a href="http://schmidt.devlib.org/java/vector-image-io-libraries.html" title="vector"&gt;vector&lt;/a&gt; libraries. &lt;a href="http://dmoz.org/Computers/Programming/Languages/Java/Class_Libraries/Graphics/Data_Formats/"&gt;DMOZ&lt;/a&gt; also maintains a directory of libraries.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-5154623126831485239?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/5154623126831485239/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=5154623126831485239' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/5154623126831485239'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/5154623126831485239'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/file-type-metadata-discovery-part-2.html' title='File Type Metadata Discovery, Part 2: Images'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-8862490526753502672</id><published>2007-01-17T15:38:00.000-05:00</published><updated>2007-01-17T15:54:57.123-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic search'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic analysis'/><title type='text'>The Future of Semantic Search</title><content type='html'>&lt;p&gt;  &lt;a href="http://www.gcn.com/print/26_01/42877-1.html?topic=content_management&amp;amp;CMP=OTC-RSS"&gt;Steven Arnold recently stated some facts&lt;/a&gt; that are very closely related to &lt;a href="http://vyasa.sourceforge.net/"&gt;my project&lt;/a&gt; and research interests: 
&lt;/p&gt; &lt;blockquote&gt;...what will carry us into 2007 is a collection of technologies we think of as text mining, where software algorithms look at documents and find the names of people, places and things and attempt to relate them to one another.&lt;/blockquote&gt;&lt;blockquote&gt;...companies like &lt;a href="http://www.attensity.com/"&gt;Attensity Corp.&lt;/a&gt; and &lt;a href="http://www.nstein.com/"&gt;nStein Technologies&lt;/a&gt; ... are focused on figuring out the nuances, relationships and the important concepts in a document. Their systems generate index terms that an enterprise search system can suck in.&lt;/blockquote&gt;&lt;blockquote&gt;...new companies ... are approaching the problem both mathematically and by doing vocabulary and knowledge-based analysis. Their software decomposes sentences into subjects, verbs and adjectives and analyzes the results with the predictive algorithms.&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-8862490526753502672?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/8862490526753502672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=8862490526753502672' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/8862490526753502672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/8862490526753502672'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/future-of-semantic-search.html' title='The Future of Semantic Search'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-7271414414122128610</id><published>2007-01-17T15:02:00.000-05:00</published><updated>2007-01-17T15:10:29.303-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wiki'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><title type='text'>Wikiseek</title><content type='html'>&lt;p&gt;A newly launched service called &lt;a href="http://www.wikiseek.com/" title="Wikiseek"&gt;Wikiseek&lt;/a&gt; focuses on complementing &lt;a href="http://wikipedia.org/" title="Wikipedia"&gt;Wikipedia&lt;/a&gt; by restricting search results to articles and references in the encyclopedia. 
&lt;a href="http://www.wikiseek.com/about/" title="Wikiseek's about page"&gt;Wikiseek's about page&lt;/a&gt; claims that this method makes it "an authoritative source of information less subject to spam and SEO schemes."&lt;/p&gt;&lt;p&gt;Wikiseek suggests search refinements "based on user tagging and categorization within Wikipedia." When I do a search for the word "&lt;a href="http://www.wikiseek.com/results.php?q=apple"&gt;apple&lt;/a&gt;," Wikiseek returns a category cloud displaying the categories in which my search phrase appears most often. The "Apple hardware" category shows most prominently. Actually, I was more interested in the fruit because I had one with lunch today. Clicking on the fruit category gives me a list of those references in Wikipedia that refer to "apple" as a fruit. Pretty nifty! If I go back and choose "Apple hardware" I see the expected list of articles about current and legacy Apple products.&lt;/p&gt;&lt;p&gt;There are some instances of strange results, such as the search for "&lt;a href="http://www.wikiseek.com/results.php?q=wiki"&gt;wiki&lt;/a&gt;," and comments at the &lt;a href="http://www.techcrunch.com/2007/01/16/wikipedia-search-engine-wikiseek-launches/"&gt;TechCrunch article&lt;/a&gt; are quick to point out that since Wikipedia is editable, article spammers now have more incentive if their actions will affect a Wikipedia-specific search engine. &lt;/p&gt; &lt;p&gt;  Personally, I find the category refinement feature and reference results useful, although not enough to build an entire site around. It would be better utilized as a module in a larger package... perhaps Wikipedia itself.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-7271414414122128610?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/7271414414122128610/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=7271414414122128610' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/7271414414122128610'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/7271414414122128610'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/wikiseek.html' title='Wikiseek'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-4475839394133589365</id><published>2007-01-15T13:27:00.000-05:00</published><updated>2007-01-15T14:36:45.577-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='information discovery'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic search'/><title type='text'>The Future of Enterprise Search</title><content type='html'>&lt;p&gt;Dana Gardner of &lt;a href="http://www.interarbor-solutions.com/"&gt;Interarbor Solutions&lt;/a&gt; recently interviewed members of &lt;a href="http://www.fastsearch.com/"&gt;FAST Search &amp; Transfer&lt;/a&gt;. 
Their discussion brought up several interesting topics about semantic, search-centered applications.&lt;/p&gt;&lt;p&gt;Dr. Bjørn Olstad (CTO) makes the following insightful remarks:&lt;/p&gt;&lt;blockquote&gt;The fundamental difference between search and such things as database, content management, and document management systems is that search starts with the user and then does reverse engineering -- what is necessary to realize this user's experience. It's starting with the content and then trying to deduce what should we do with this content. When you do that, you end up with a user-driven experience.&lt;/blockquote&gt;&lt;blockquote&gt;...increasingly the user experience will be driven by algorithms and will be dynamic, so that you can actually optimize the user experience...&lt;/blockquote&gt;&lt;blockquote&gt;...&lt;span style="font-weight:bold;"&gt;search can auto-generate metadata&lt;/span&gt; and find ways to use that structure and to improve the discovery. Then, the allocation elements that the traditional Semantic Web talks about can be aimed at how to improve algorithms, as opposed to starting from scratch. In doing that, I think search has the opportunity to deliver on the premise of the Semantic Web, by applying algorithms as opposed to altering tools.&lt;/blockquote&gt;&lt;blockquote&gt;Technically, if you use an open-ended query like, "List the innovative people in my company," or something, you could get back kind of a menu of people that have been referred to in the documents or discussions where there is talk of innovation, and then get the facts related to these people. 
&lt;span style="font-weight:bold;"&gt;So it’s not coming back with an answer, but it is coming back with an analysis -- and giving you the opportunity to refine the query.&lt;/span&gt;&lt;/blockquote&gt;&lt;p&gt;I really liked Zia Zaman's ideas as well (Vice President, Strategic Market Development):&lt;/p&gt;&lt;blockquote&gt;I often like to talk about Greek philosophers, and Socrates is one of my favorites. The reason is because of what he did with a seeker. &lt;span style="font-weight:bold;"&gt;He didn’t answer a question with an answer. Rather he asked another question, and that allowed the seeker to refine his or her question, until finally they got to what they were looking for.&lt;/span&gt; In many ways, technology -- whether it’s information discovery, search, and business intelligence, whatever it might be -- shouldn’t insult the intelligence of the seeker. Rather it should allow individuals to find the answer that they’re looking for, either through tacit information that’s stored in the enterprise, or semantic information, or structured information, whatever it might be. What we’re trying to do is mimic the type of dialogue that Socrates had.&lt;/blockquote&gt;&lt;p&gt;&lt;a href="http://briefingsdirect.blogspot.com/2007/01/transcript-of-briefingsdirect-podcast_14.html"&gt;Listen to the podcast or read the transcription...&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-4475839394133589365?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://briefingsdirect.blogspot.com/2007/01/transcript-of-briefingsdirect-podcast_14.html' title='The Future of Enterprise Search'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/4475839394133589365/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=4475839394133589365' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4475839394133589365'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4475839394133589365'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/future-of-enterprise-search.html' title='The Future of Enterprise Search'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-3386630860312436358</id><published>2007-01-12T14:45:00.000-05:00</published><updated>2007-01-12T14:58:00.185-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='music'/><category scheme='http://www.blogger.com/atom/ns#' term='audio'/><title type='text'>Free Music Identification and Metadata Service</title><content type='html'>Yesterday &lt;a 
href="http://fredeaker.blogspot.com/2007/01/file-type-metadata-discovery-part-1.html"&gt;I briefly mentioned&lt;/a&gt; that there are services that can help you identify non-technical audio metadata. I just became aware of &lt;a href="http://www.marketwire.com/mw/release_html_b1?release_id=199924"&gt;a press release from MusicIP&lt;/a&gt; announcing a free music identification service that provides track metadata.&lt;/p&gt;&lt;p&gt;Check out &lt;a href="http://www.marketwire.com/mw/release_html_b1?release_id=199924"&gt;the press release&lt;/a&gt; or the &lt;a href="http://www.musicip.com/"&gt;MusicIP&lt;/a&gt; website for more info.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-3386630860312436358?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.marketwire.com/mw/release_html_b1?release_id=199924' title='Free Music Identification and Metadata Service'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/3386630860312436358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=3386630860312436358' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/3386630860312436358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/3386630860312436358'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/free-music-identification-and-metadata.html' title='Free Music Identification and Metadata Service'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-4205019546656539546</id><published>2007-01-11T15:23:00.000-05:00</published><updated>2007-01-11T15:46:00.098-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='web analytics'/><category scheme='http://www.blogger.com/atom/ns#' term='natural language processing'/><category scheme='http://www.blogger.com/atom/ns#' term='computational linguistics'/><category scheme='http://www.blogger.com/atom/ns#' term='document management'/><category 
scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic analysis'/><title type='text'>Flexible Taxonomies</title><content type='html'>&lt;p&gt;In &lt;a href="http://semphonic.blogs.com/semangel/2007/01/web_analysis_to.html"&gt;Web Analytics And Content Group Management&lt;/a&gt;, Gary Angel writes about the use of &lt;a href="http://en.wikipedia.org/wiki/Taxonomy"&gt;taxonomies&lt;/a&gt; in web analytics, stating that "no single taxonomy is likely to support a very wide range of analytic problems. [...] there are only taxonomies appropriate for more or fewer analytic problems."&lt;/p&gt;&lt;p&gt;He mentions the common yet limiting use of navigational taxonomy and points out that the analysis process/application should "be able to construct multiple 'point' taxonomies that can be used for specific analytic purposes [...] The combination of a graphical drag-and-drop interface, ability to apply regex rules and the ability to create analysis specific taxonomies on the fly [...] would make it relatively easy to 'manufacture' a taxonomy for analysis..."&lt;/p&gt;&lt;p&gt;Angel is proposing a method of flexible taxonomy creation, which would be useful not only in web analytics, but also in the analysis of any type of document collection, including business, scientific or humanities collections.&lt;/p&gt;&lt;p&gt;So how is a flexible taxonomy implemented? Angel mentions that a taxonomy often needs to be based on the content of a page rather than its function. 
The semantic content of a page can be determined using &lt;a href="http://en.wikipedia.org/wiki/Computational_linguistics"&gt;computational linguistics&lt;/a&gt;, which includes &lt;a href="http://en.wikipedia.org/wiki/Natural_language_processing"&gt;natural language processing&lt;/a&gt; and semantic &lt;a href="http://en.wikipedia.org/wiki/Semantic_relatedness"&gt;relatedness&lt;/a&gt;/&lt;a href="http://en.wikipedia.org/wiki/Semantic_differential"&gt;differential&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Theoretically, a document (or even a paragraph) could be analyzed to determine its meaning. The semantic data could then be used to build a taxonomy, either automatically, without user intervention, or, as Angel proposes, dynamically, allowing the analyst to determine which characteristics define their custom taxonomic unit.&lt;/p&gt;&lt;p&gt;For example, in building a taxonomy that describes communications technology, an analyst would undoubtedly include Apple's new &lt;a href="http://www.apple.com/iphone/"&gt;iPhone&lt;/a&gt;. Depending on its commercial success, the present and future technology included in the iPhone could heavily influence the rest of the communications industry. Semantic analysis of &lt;a href="http://en.wikipedia.org/wiki/Iphone"&gt;documents describing the iPhone&lt;/a&gt; would reveal that its &lt;a href="http://en.wikipedia.org/wiki/Multi-touch"&gt;multi-touch&lt;/a&gt; interface is, in part, the result of research conducted by &lt;a href="http://cs.nyu.edu/~jhan/"&gt;Jefferson Han&lt;/a&gt; at &lt;a href="http://nyu.edu/"&gt;NYU&lt;/a&gt;. New taxonomic units could then be created that classify the various influences on a particular technology, such as market, military or academic.&lt;/p&gt;&lt;p&gt;I greatly appreciate Angel's thoughtful remarks about flexible taxonomies. The fields of web analytics and document management obviously have much in common.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-4205019546656539546?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/4205019546656539546/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=4205019546656539546' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4205019546656539546'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4205019546656539546'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/flexible-taxonomies.html' title='Flexible Taxonomies'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-125248366393370955</id><published>2007-01-11T07:20:00.000-05:00</published><updated>2007-01-12T15:00:34.840-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='file type'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><title type='text'>File Type Metadata Discovery, Part 1: Audio</title><content type='html'>&lt;p&gt;In a &lt;a href="http://fredeaker.blogspot.com/2006/12/file-type-mime-detection.html"&gt;previous&lt;/a&gt; article, I evaluated various libraries to determine which most accurately identified a file's type. 
This article is part one of a series that explores how to discover metadata about a file after its type has been detected.&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Audio Metadata&lt;/p&gt;&lt;p&gt;Based on the nature of digital audio, &lt;a href="http://en.wikipedia.org/wiki/Audio_file_format"&gt;Wikipedia cites "sample rate, resolution and number of channels"&lt;/a&gt; as important audio file format parameters. This type of technical metadata is important in classifying and organizing digital assets. Determining non-technical metadata, such as title, author and date, is beyond the scope of this article, &lt;a href="http://musicip.com/"&gt;although&lt;/a&gt; &lt;a href="http://www.freedb.org/"&gt;there&lt;/a&gt; &lt;a href="http://musicbrainz.org/"&gt;are&lt;/a&gt; &lt;a href="http://www.allmediaguide.com/lasso/"&gt;many&lt;/a&gt; &lt;a href="http://www.gracenote.com/"&gt;resources&lt;/a&gt; that address this type of metadata discovery.&lt;/p&gt;&lt;p&gt;Much of Java media development revolves around playing, streaming, recording and editing. However, only metadata discovery is considered in digital asset management (and in this article).&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Discovering Audio Metadata with the Java Sound API&lt;/p&gt;&lt;p&gt;According to Sun, the &lt;a href="http://java.sun.com/products/java-media/sound/"&gt;Java Sound API&lt;/a&gt; "provides low-level support for audio operations." The &lt;a style="font-family: courier new;" href="http://java.sun.com/j2se/1.5.0/docs/api/javax/sound/sampled/AudioFormat.html"&gt;javax.sound.sampled.AudioFormat&lt;/a&gt; and &lt;a style="font-family: courier new;" href="http://java.sun.com/j2se/1.5.0/docs/api/javax/sound/sampled/AudioFileFormat.html"&gt;javax.sound.sampled.AudioFileFormat&lt;/a&gt; classes both allow access to an audio file's metadata. 
Using the static getAudioFileFormat() methods of the &lt;a style="font-family: courier new;" href="http://java.sun.com/j2se/1.5.0/docs/api/javax/sound/sampled/AudioSystem.html"&gt;javax.sound.sampled.AudioSystem&lt;/a&gt; class, we can get an AudioFileFormat from a File, InputStream or URL. An AudioFormat is obtained by invoking the AudioFileFormat's getFormat() method. &lt;a href="http://www.jsresources.org/"&gt;Java Sound Resources&lt;/a&gt; provides &lt;a href="http://www.jsresources.org/examples/AudioFileInfo.java.html"&gt;an excellent example&lt;/a&gt; of how this is done. In summary, the following table shows what metadata is accessible through the Java Sound API.&lt;/p&gt;&lt;table style="background-color:#ccddcc;" border="0" cellpadding="5" cellspacing="0" width="100%"&gt;&lt;tbody&gt;&lt;tr style="background-color: rgb(222, 233, 204);"&gt;&lt;td colspan="2"&gt;&lt;p style="font-weight: bold;"&gt;AudioFileFormat&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getType()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;audio file type, such as WAVE or AU&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;properties()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;unmodifiable map of properties that specify additional informational metadata (like an author, copyright, or file duration). 
Properties are optional information, and file reader and file writer implementations are not required to provide or recognize properties&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getByteLength()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;size in bytes of the entire audio file (not just its audio data)&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getFrameLength()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;length of the audio data contained in the file, expressed in sample frames&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="background-color:#dee9cc;"&gt;&lt;td colspan="2"&gt;&lt;p style="font-weight: bold;"&gt;AudioFormat&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getChannels()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;number of channels&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getEncoding()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;type of encoding for sounds in this format&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getFrameRate()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;frame rate in frames per second&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getFrameSize()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;frame size in bytes&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getSampleRate()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;sample rate in samples per second&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;getSampleSizeInBits()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;size of a sample in bits&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;p&gt;properties()&lt;/p&gt;&lt;/td&gt;&lt;td&gt;&lt;p&gt;an unmodifiable map of properties&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="2"&gt;&lt;p  style="font-size:85%;"&gt;&lt;span style="font-weight: bold;"&gt;NOTE:&lt;/span&gt; The duration of an audio file (in seconds) can be computed by dividing the frame length by the frame rate, provided both values are specified. For example, 88,200 frames at 44,100 frames per second is 2 seconds.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;The Java Sound API appears to meet 
our needs well; however, &lt;a href="http://java.sun.com/products/java-media/sound/techReference/javasoundfaq.html#formats"&gt;it only supports AIFF, AU, WAV and some MIDI-based formats!&lt;/a&gt; To remedy this, &lt;a href="http://www.jsresources.org/faq_misc.html#sec_spi"&gt;the API provides a service provider interface (SPI)&lt;/a&gt; to support more file formats. &lt;a href="http://www.javazoom.net/mp3spi/mp3spi.html"&gt;MP3&lt;/a&gt;, &lt;a href="http://www.javazoom.net/vorbisspi/vorbisspi.html"&gt;OGG&lt;/a&gt;, &lt;a href="http://jmac.sourceforge.net/"&gt;APE (Monkey's Audio)&lt;/a&gt;, &lt;a href="http://jflac.sourceforge.net/"&gt;FLAC&lt;/a&gt; and even &lt;a href="http://jspeex.sourceforge.net/"&gt;Speex&lt;/a&gt; SPIs are available. An alternative implementation of the Java Sound API, &lt;a href="http://www.tritonus.org/"&gt;Tritonus&lt;/a&gt;, also provides some SPI &lt;a href="http://www.tritonus.org/plugins.html"&gt;plug-ins&lt;/a&gt; (the OGG SPI requires Tritonus).&lt;/p&gt;&lt;p&gt;It should also be noted that an SPI can populate the map returned by the properties() method of the AudioFileFormat and AudioFormat classes. The properties returned may include non-technical metadata, but this depends entirely on the audio file format.&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;What about other Java media libraries?&lt;/p&gt;&lt;p&gt;According to Sun, the &lt;a href="http://java.sun.com/products/java-media/jmf/index.jsp"&gt;Java Media Framework (JMF)&lt;/a&gt; "can capture, playback, stream, and transcode multiple media formats." In &lt;a href="http://www.jsresources.org/faq_general.html#js_vs_jmf"&gt;comparing the JMF to Java Sound&lt;/a&gt;, we find that the JMF does have more codecs; however, its ability to capture audio-specific metadata is limited. The focus of the JMF is not on metadata, at least from what I can tell. If anyone has found the case to be different, &lt;a href="mailto:fredeaker@gmail.com"&gt;please let me know&lt;/a&gt;. 
The same goes for &lt;a href="http://fmj.sourceforge.net/"&gt;FMJ&lt;/a&gt; and &lt;a href="http://www.alphaworks.ibm.com/tech/emb"&gt;Enterprise Media Beans&lt;/a&gt;.&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Conclusion&lt;/p&gt;&lt;p&gt;Of the libraries considered here, the Java Sound API, extended with SPI plug-ins where needed, currently provides the best means of discovering an audio file's technical metadata.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-125248366393370955?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/125248366393370955/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=125248366393370955' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/125248366393370955'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/125248366393370955'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/file-type-metadata-discovery-part-1.html' title='File Type Metadata Discovery, Part 1: Audio'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-3448250722980473528</id><published>2007-01-05T18:54:00.000-05:00</published><updated>2007-01-05T18:58:10.275-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic search'/><title type='text'>Centiare on the Heels of Wikipedia</title><content type='html'>&lt;blockquote&gt;"semantic web" technology is installed on Centiare. This means registered users can perform amazing searches on Centiare that just wouldn’t be possible on Google, MySpace, or Wikipedia. 
Imagine multi-level searches for very specific things, like:&lt;br /&gt;&lt;br /&gt;* Are there any white males, over 6 feet tall, with a Masters degree, in Pennsylvania?&lt;br /&gt;* Locate all home heating oil companies, at least 50 years in business, in New Jersey.&lt;br /&gt;* Find all Centiare Directory entities situated between the 39th and 40th parallels.&lt;br /&gt;&lt;br /&gt;These searches would be practically impossible in a wiki database without semantic web enabled. It's no wonder that the inventor of the World Wide Web, Tim Berners-Lee, has literally written a roadmap for semantic web -- he knows it's that important. And Internet prognosticators are finally agreeing that in 2007, semantic web will usher in the "Web 3.0" era.&lt;/blockquote&gt;&lt;br /&gt;&lt;a href="http://www.sbwire.com/news/view/9912"&gt;Read more...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-3448250722980473528?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.sbwire.com/news/view/9912' title='Centiare on the Heels of Wikipedia'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/3448250722980473528/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=3448250722980473528' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/3448250722980473528'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/3448250722980473528'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/centiare-on-heels-of-wikipedia.html' title='Centiare on the Heels of Wikipedia'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-8548025330965461178</id><published>2007-01-03T11:25:00.000-05:00</published><updated>2007-01-03T11:31:49.345-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='file type'/><title type='text'>File type detection follow up</title><content type='html'>&lt;a href="http://earl.strain.at/space/comments-2007-01-02"&gt;earl.strain.at&lt;/a&gt; ran the test files used in my &lt;a href="http://fredeaker.blogspot.com/2006/12/file-type-mime-detection.html"&gt;file type detection article&lt;/a&gt; against &lt;a 
href="http://www.darwinsys.com/file/"&gt;file&lt;/a&gt;, "the open source implementation of the file(1) command that ships with every free operating system."&lt;br /&gt;&lt;br /&gt;&lt;a href="http://earl.strain.at/space/comments-2007-01-02"&gt;Read about his results...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-8548025330965461178?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://earl.strain.at/space/comments-2007-01-02' title='File type detection follow up'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/8548025330965461178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=8548025330965461178' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/8548025330965461178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/8548025330965461178'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/file-type-detection-follow-up.html' title='File type detection follow up'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-2641592671587775384</id><published>2007-01-02T13:52:00.000-05:00</published><updated>2007-01-11T08:06:01.444-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><title type='text'>Intellisophic Achieves Patent Milestone on Document Indexing System &amp; Methods</title><content type='html'>&lt;blockquote&gt;&lt;a href="http://www.intellisophic.com/"&gt;Intellisophic Inc.&lt;/a&gt;, a leading provider of information products to the search and text 
mining industry, announced that it has been granted allowance by the U.S. Patent and Trademark Office on the systems and methods used for indexing documents based on relevance to reference publications such as a book, web page, newspaper or encyclopedia. The patent also covers core technology relating to generating keyword indices and topic hierarchy from the reference materials for the purposes of scoring or matching other documents based on relevance.&lt;/blockquote&gt;&lt;br /&gt;&lt;a href="http://www.techweb.com/showPressRelease.jhtml?articleID=X565744"&gt;Read the full article at TechWeb...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-2641592671587775384?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.techweb.com/showPressRelease.jhtml?articleID=X565744' title='Intellisophic Achieves Patent Milestone on Document Indexing System &amp; Methods'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/2641592671587775384/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=2641592671587775384' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/2641592671587775384'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/2641592671587775384'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/intellisophic-achieves-patent-milestone.html' title='Intellisophic Achieves Patent Milestone on Document Indexing System &amp; Methods'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-2410647261285011677</id><published>2007-01-02T13:37:00.000-05:00</published><updated>2007-01-02T13:51:51.819-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic search'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='hakia'/><title type='text'>Hakia: A New Google?</title><content 
type='html'>&lt;blockquote&gt;"...the triumph of a semantic search technology over the irrelevance of syntactic search illustrates why companies like Hakia will garner more attention in 2007."&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.line56.com/articles/default.asp?ArticleID=8103"&gt;An article on Line56.com&lt;/a&gt; discusses the power of semantic search.&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"&gt;&lt;!--
//--&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/570860519139426322-2410647261285011677?l=fredeaker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.line56.com/articles/default.asp?ArticleID=8103' title='Hakia: A New Google?'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/2410647261285011677/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=2410647261285011677' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/2410647261285011677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/2410647261285011677'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/hakia-new-google.html' title='Hakia: A New Google?'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-6741209717263653002</id><published>2007-01-02T11:10:00.000-05:00</published><updated>2007-01-02T13:45:04.935-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='discovery'/><category scheme='http://www.blogger.com/atom/ns#' term='information overload'/><category scheme='http://www.blogger.com/atom/ns#' term='document management'/><category scheme='http://www.blogger.com/atom/ns#' term='digital asset management'/><category scheme='http://www.blogger.com/atom/ns#' term='analytics'/><category scheme='http://www.blogger.com/atom/ns#' term='enterprise 
content management'/><title type='text'>Information Scarcity to Information Overload</title><content type='html'>&lt;blockquote&gt;"The focus in enterprise content management (ECM) is shifting from ending information scarcity to dealing with information overload. This dynamic explains why the disparate technologies of search, records management and analytics are now hot."&lt;/blockquote&gt;&lt;br /&gt;&lt;a href="http://www.dmreview.com/article_sub.cfm?articleId=1072477"&gt;Read full article at DMReview...&lt;/a&gt;</content><link rel='related' href='http://www.dmreview.com/article_sub.cfm?articleId=1072477' title='Information Scarcity to Information Overload'/><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/6741209717263653002/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=6741209717263653002' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/6741209717263653002'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/6741209717263653002'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/information-scarcity-to-information.html' title='Information Scarcity to Information Overload'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-102802340202061634</id><published>2007-01-01T13:06:00.000-05:00</published><updated>2007-01-01T14:11:49.280-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='charset'/><category scheme='http://www.blogger.com/atom/ns#' term='character encoding'/><category scheme='http://www.blogger.com/atom/ns#' term='unicode'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='codepage'/><title type='text'>Character 
encoding detection</title><content type='html'>&lt;p style="font-weight:bold;"&gt;Problem&lt;/p&gt;&lt;p&gt;Documents that are stored as plain text, such as XML or XHTML, are written in a particular character encoding, also known as a character set or codepage. The encoding tells applications how to interpret the stored bytes as characters, and therefore how to display them. Identifying it correctly is especially important for &lt;a href="http://en.wikipedia.org/wiki/CJK"&gt;CJK&lt;/a&gt; languages.&lt;/p&gt;&lt;p&gt;Detailed analysis of the problem is available in &lt;a href="http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html"&gt;A composite approach to language/encoding detection&lt;/a&gt; by Shanjian Li and Katsuhiko Momoi (2001). Li and Momoi's approach has become Mozilla's &lt;a href="http://www.mozilla.org/projects/intl/detectorsrc.html"&gt;Universal Charset Detector&lt;/a&gt;.&lt;/p&gt;&lt;p style="font-weight:bold;"&gt;Implementations&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://jchardet.sourceforge.net/"&gt;jchardet&lt;/a&gt; is a Java port of Mozilla's character set detection algorithm. (MPL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://icu.sourceforge.net/"&gt;International Components for Unicode (ICU)&lt;/a&gt; is a set of C/C++ and Java libraries for Unicode support, software internationalization and globalization (i18n/g11n). It grew out of the JDK 1.1 internationalization APIs, which the ICU team contributed, and the project continues to be developed for the most advanced Unicode/i18n support. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software. (&lt;a href="http://www-306.ibm.com/software/globalization/icu/license.jsp"&gt;X license&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;&lt;a href="http://cpdetector.sourceforge.net/"&gt;cpdetector&lt;/a&gt; is a Java framework for configurable code page detection of documents. 
(MPL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.ebi.ac.uk/~kirsch/monq-doc/monq/stuff/EncodingDetector.html"&gt;monq.stuff.EncodingDetector&lt;/a&gt; is part of the Java Finite Automata class library from the &lt;a href="http://www.ebi.ac.uk/Rebholz-srv/whatizit/software"&gt;European Bioinformatics Institute&lt;/a&gt;. (GPL)&lt;/li&gt;&lt;li&gt;&lt;a href="https://rome.dev.java.net/apidocs/0_8/com/sun/syndication/io/XmlReader.html"&gt;com.sun.syndication.io.XmlReader&lt;/a&gt; handles the character encoding of XML documents in files, raw streams and HTTP streams by offering a wide set of constructors. It is part of the &lt;a href="https://rome.dev.java.net/"&gt;ROME&lt;/a&gt; project for reading RSS and Atom feeds. A nice explanation of how ROME detects character encoding is available &lt;a href="http://wiki.java.net/bin/view/Javawsxml/Rome05CharsetEncoding"&gt;here&lt;/a&gt;. (Apache License)&lt;/li&gt;&lt;li&gt;&lt;a href="http://glaforge.free.fr/wiki/index.php?wiki=GuessEncoding"&gt;com.glaforge.i18n.io.CharsetToolkit&lt;/a&gt; is a utility class that guesses the charset used in a byte buffer. (Unknown license)&lt;/li&gt;&lt;li&gt;&lt;a href="http://msdn.microsoft.com/workshop/misc/mlang/mlang.asp"&gt;MLang&lt;/a&gt; is an MSDN library that lists “detection of which possible code pages and languages text data is written in” as one of its features. (Microsoft Windows DLL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://chsdet.sourceforge.net/"&gt;Charset Detector&lt;/a&gt; is a standalone executable module for automatic charset / encoding detection based on Mozilla's i18n component. It can be compiled for MS Windows using Delphi or Free Pascal, or for Linux using Delphi/Kylix. 
(LGPL)&lt;/li&gt;&lt;/ul&gt;&lt;p style="font-weight:bold;"&gt;Test Results&lt;/p&gt;&lt;p style="text-align:center"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://vyasa.sourceforge.net/blog/20070101/20070101_charset_results.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 350px;" src="http://vyasa.sourceforge.net/blog/20070101/20070101_charset_results.jpg" border="0" alt="Click for full-size image" /&gt;&lt;br /&gt;Click for full-size image&lt;/a&gt;&lt;/p&gt;&lt;p style="text-align:center"&gt;&lt;a href="http://vyasa.sourceforge.net/blog/20070101/2007-01-01_charset_test_results.pdf"&gt;Raw data (PDF)&lt;/a&gt; | &lt;a href="http://vyasa.sourceforge.net/blog/20070101/20070101_CharsetDetectorTester.zip"&gt;Source and test files&lt;/a&gt;&lt;/p&gt;&lt;p&gt;The HTML &lt;a href="http://www.columbia.edu/kermit/csettables.html"&gt;Character Set Tables&lt;/a&gt; from Columbia University all have META tags that specify the encoding. &lt;a href="http://cpdetector.sourceforge.net/"&gt;cpdetector&lt;/a&gt; did an excellent job in recognizing this while the other detectors failed to do so.&lt;/p&gt;&lt;p&gt;The XHTML files created from the &lt;a href="http://www.j-a-b.net/web/char/codepage-test"&gt;Character Encoding Test Page&lt;/a&gt; all have XML prologs and META tags that specify the encoding. 
&lt;a href="http://cpdetector.sourceforge.net/"&gt;cpdetector&lt;/a&gt; and &lt;a href="http://www.ebi.ac.uk/~kirsch/monq-doc/monq/stuff/EncodingDetector.html"&gt;monq.stuff.EncodingDetector&lt;/a&gt; both shine here, while &lt;a href="https://rome.dev.java.net/apidocs/0_8/com/sun/syndication/io/XmlReader.html"&gt;com.sun.syndication.io.XmlReader&lt;/a&gt; generated several java.io.UnsupportedEncodingExceptions.&lt;/p&gt;&lt;p&gt;The Japanese XML files from the &lt;a href="http://www.w3.org/XML/Test/"&gt;W3C&lt;/a&gt; (“pr-” and “weekly-” prefix) were best handled by &lt;a href="http://icu.sourceforge.net/"&gt;ICU&lt;/a&gt; and &lt;a href="https://rome.dev.java.net/apidocs/0_8/com/sun/syndication/io/XmlReader.html"&gt;com.sun.syndication.io.XmlReader&lt;/a&gt;, even though XML prologs were not always present.&lt;/p&gt;&lt;p&gt;Finally, the &lt;a href="http://xml.ascc.net/test/"&gt;Chinese TXT and XML files&lt;/a&gt; (“zh-” prefix), which all have XML prologs, were handled by &lt;a href="http://cpdetector.sourceforge.net/"&gt;cpdetector&lt;/a&gt;, &lt;a href="http://www.ebi.ac.uk/~kirsch/monq-doc/monq/stuff/EncodingDetector.html"&gt;monq.stuff.EncodingDetector&lt;/a&gt; and &lt;a href="https://rome.dev.java.net/apidocs/0_8/com/sun/syndication/io/XmlReader.html"&gt;com.sun.syndication.io.XmlReader&lt;/a&gt;.&lt;/p&gt;&lt;p style="font-weight:bold;"&gt;Conclusion&lt;/p&gt;&lt;p&gt;A character encoding detector's strengths depend on whether it focuses on statistical analysis or on discovering HTML META tags and XML prologs. If you are processing HTML files that have META tags, use &lt;a href="http://cpdetector.sourceforge.net/"&gt;cpdetector&lt;/a&gt;. 
Otherwise, your best bet is either &lt;a href="http://www.ebi.ac.uk/~kirsch/monq-doc/monq/stuff/EncodingDetector.html"&gt;monq.stuff.EncodingDetector&lt;/a&gt; or &lt;a href="https://rome.dev.java.net/apidocs/0_8/com/sun/syndication/io/XmlReader.html"&gt;com.sun.syndication.io.XmlReader&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;What about files that have no META tag or prolog? That remains an open question. If an XML file has neither but mixes languages, such as Chinese content and English markup, a statistical analysis skewed by the more prevalent markup language may not be enough to display the document properly. This is where &lt;a href="http://en.wikipedia.org/wiki/Unicode"&gt;Unicode&lt;/a&gt; comes in. Unicode text with multiple languages is handled by the application used to render the document. Check out &lt;a href="http://en.wikipedia.org/wiki/Help:Multilingual_support"&gt;Wikipedia's entry on Multilingual support&lt;/a&gt;.&lt;/p&gt;&lt;p style="font-weight:bold;"&gt;Test files&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.w3.org/XML/Test/"&gt;Extensible Markup Language (XML) Conformance Test Suites (10 December 2003)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.oasis-open.org/committees/xml-conformance/xml-test-suite.shtml"&gt;OASIS XML Conformance Subcommittee, XML 1.0 Test Suite, Second Edition (15 March 2001)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://xml.ascc.net/test/"&gt;Chinese XML Now! 
Test Files&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.w3.org/MarkUp/Test/"&gt;W3C HTML/XHTML Test Suites&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://lexiconbridge.com/charset/win.htm"&gt;Windows Test page for Academic Russian&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.j-a-b.net/web/char/codepage-test"&gt;Character Encoding Test Page&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.columbia.edu/kermit/csettables.html"&gt;Character Set Tables&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p style="font-weight:bold;"&gt;Other Useful Resources&lt;/p&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.joelonsoftware.com/articles/Unicode.html"&gt;The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)&lt;/a&gt; By Joel Spolsky, Wednesday, October 08, 2003&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.eki.ee/letter/"&gt;Letter Database&lt;/a&gt; lists the characters and corresponding code pages for specific languages.&lt;/li&gt;&lt;li&gt;W3C I18N tutorial: &lt;a href="http://www.w3.org/International/tutorials/tutorial-char-enc/"&gt;Character sets &amp; encodings in XHTML, HTML and CSS&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://groups.google.com/group/netscape.public.mozilla.i18n"&gt;netscape.public.mozilla.i18n&lt;/a&gt; has been abandoned but replaced by &lt;a href="http://groups.google.com/group/mozilla.dev.i18n"&gt;mozilla.dev.i18n&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://schneegans.de/sv/test-cases/"&gt;XHTML Test Cases&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://juicystudio.com/mimetest/character.php"&gt;MIME Test - Character Encoding Test&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;While doing research, I used the following phrase in Google:&lt;br /&gt;&lt;a href='http://www.google.com/search?q=("character+set"+OR+"charset"+OR+"codepage"+OR+"character+encoding")'&gt;("character set" OR "charset" OR "codepage" OR "character encoding")&lt;/a&gt;&lt;/p&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/102802340202061634/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=102802340202061634' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/102802340202061634'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/102802340202061634'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2007/01/character-encoding-detection.html' title='Character encoding detection'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-570860519139426322.post-4708611071830408799</id><published>2006-12-26T14:52:00.000-05:00</published><updated>2007-02-22T16:56:24.015-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vyasa'/><category scheme='http://www.blogger.com/atom/ns#' term='file type'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='digital library'/><category scheme='http://www.blogger.com/atom/ns#' term='digital asset management'/><title type='text'>File type detection</title><content type='html'>&lt;p&gt;&lt;b&gt;NOTE:&lt;/b&gt; Since this article was written, updates have been made to the MagicMimeTypeIdentifier 
in the Aperture Framework. &lt;a href="http://fredeaker.blogspot.com/2007/02/magicmimetypeidentifier-update.html"&gt;Read more...&lt;/a&gt;&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Problem Description&lt;/p&gt;&lt;p&gt;My current project, &lt;a href="http://vyasa.sourceforge.net/"&gt;vyasa&lt;/a&gt;, is a digital library management system. One of the features of a digital library is the ability to recognize the types of files (digital assets) that are loaded into the repository.&lt;/p&gt;&lt;p&gt;The process of detecting a file's type (also known as a file's &lt;a href="http://en.wikipedia.org/wiki/Mime"&gt;MIME&lt;/a&gt; type) is non-trivial, yet the benefits are important. For example, type-related metadata, such as the length and bitrate of an audio file or the size and DPI of an image file, leads to comprehensive asset management.&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Possible Solutions&lt;/p&gt;&lt;p&gt;The problem of file type detection and various solutions to it are briefly explained in &lt;a href="http://en.wikipedia.org/wiki/File_type"&gt;this Wikipedia article&lt;/a&gt;. Detailed coverage is available in the following academic papers (PDF):&lt;/p&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.jmu.edu/cisc/research/publications/HICCSSTFMS04.pdf"&gt;Content Based File Type Detection Algorithms&lt;/a&gt;, Mason McDaniel and M. Hossain Heydari, Computer Science Department, James Madison University, Harrisonburg.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.micsymposium.org/mics_2005/papers/paper7.pdf"&gt;File Type Detection Technology&lt;/a&gt;, Douglas J. Hickok, Daine Richard Lesniak, Michael C. 
Rowe, Ph.D., Computer Science and Software Engineering Department, University of Wisconsin-Platteville.&lt;/li&gt;&lt;/ul&gt;&lt;p style="font-weight: bold;"&gt;Java Implementations&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://java.sun.com/products/javabeans/glasgow/javadocs/javax/activation/FileDataSource.html"&gt;javax.activation.FileDataSource&lt;/a&gt; is part of the JavaBeans(TM) Activation Framework used by the JavaMail(TM) API to manage MIME data.&lt;/li&gt;&lt;li&gt;&lt;a href="http://jmimemagic.sourceforge.net/"&gt;Java Mime Magic&lt;/a&gt; "retrieves file and stream mime types by checking magic headers," &lt;a href="http://www.rgagnon.com/javadetails/java-0487.html"&gt;according to Réal Gagnon&lt;/a&gt;. (LGPL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://schmidt.devlib.org/ffident/"&gt;ffident&lt;/a&gt; is a Java library for metadata extraction and file format identification created by Marco Schmidt. (LGPL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://hul.harvard.edu/jhove/"&gt;JHOVE&lt;/a&gt; (JSTOR/Harvard Object Validation Environment) identifies, validates and characterizes file types. (LGPL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://aperture.sourceforge.net/doc/javadoc/org/semanticdesktop/aperture/mime/identifier/magic/MagicMimeTypeIdentifier.html"&gt;MagicMimeTypeIdentifier&lt;/a&gt; is from the Aperture Framework. It determines the MIME type of a binary resource using magic number-based heuristics. (AFL, OSL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://maven.nuxeo.org/NXMimeType/apidocs/org/nuxeo/ecm/platform/ec/mimetype/service/MimetypeRegistryService.html"&gt;MimetypeRegistryService&lt;/a&gt; is part of the Nuxeo project. 
(LGPL)&lt;/li&gt;&lt;/ul&gt;&lt;p style="font-weight: bold;"&gt;Non-Java Resources&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.koders.com/java/fidB7B3B5648AA854434CFCAB45EDAB4B0995A18AAF.aspx"&gt;complex.filter.detection.typeDetection&lt;/a&gt; is OpenOffice.org's code for detecting file types. (LGPL)&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.magicdb.org/"&gt;Magic DB&lt;/a&gt; is a file containing magic numbers for identifying file types, along with several other kinds of file metadata. The format of the file is specified by &lt;a href="http://www.optimasc.com/products/fileid/index.html"&gt;Optima SC&lt;/a&gt;. (see license on magicdb.org)&lt;/li&gt;&lt;li&gt;Marco Schmidt lists several non-Java resources on his &lt;a href="http://schmidt.devlib.org/file-formats/"&gt;file formats&lt;/a&gt; page.&lt;/li&gt;&lt;li&gt;&lt;a href="http://pldaniels.com/filetype/"&gt;FileType&lt;/a&gt; is a file type detection engine for coders who want a simple-to-use C module.&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.mmbase.org/mmdocs/javadoc/complete/org/mmbase/util/magicfile/package-summary.html"&gt;org.mmbase.util.magicfile&lt;/a&gt; determines file types based on a parsing of the UNIX magic command.&lt;/li&gt;&lt;/ul&gt;&lt;p style="font-weight: bold;"&gt;Character Encoding&lt;/p&gt;&lt;p&gt;Another important aspect of file type detection is character encoding (aka codepage) detection for plain text files such as HTML or XML. I will cover this topic in a &lt;a href="http://fredeaker.blogspot.com/2007/01/character-encoding-detection.html"&gt;future article&lt;/a&gt;.&lt;/p&gt;&lt;p style="font-weight: bold;"&gt;Testing&lt;/p&gt;&lt;p&gt;A series of text, image, audio, video and "other" files was used to test the Java libraries listed above. The details of the files I used are available &lt;a href="http://vyasa.sourceforge.net/FileTypeDetector_20061227.zip"&gt;here&lt;/a&gt;. 
The results of the test are summarized in the chart below:&lt;/p&gt;&lt;p&gt;&lt;img src="http://vyasa.sourceforge.net/2006-12-26_file-type_stats.jpg" alt="file-type_stats.jpg" /&gt;&lt;/p&gt;&lt;p&gt;The detection accuracy of most libraries was less than 50%. However, the Aperture Framework's MagicMimeTypeIdentifier was extremely accurate. It was able to correctly identify many proprietary formats. The code used to perform the actual testing, along with the test files themselves, is available &lt;a href="http://vyasa.sourceforge.net/FileTypeDetector_20061227.zip"&gt;here&lt;/a&gt;. More detail about the results of the tests is available in PDF form &lt;a href="http://vyasa.sourceforge.net/2006-12-26_file_type_test_results.pdf"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;br /&gt;&lt;p style="font-weight: bold;"&gt;Conclusion&lt;/p&gt;&lt;p&gt;&lt;a href="http://aperture.sourceforge.net/doc/javadoc/org/semanticdesktop/aperture/mime/identifier/magic/MagicMimeTypeIdentifier.html"&gt;MagicMimeTypeIdentifier&lt;/a&gt; from the Aperture Framework appears to be the most reliable and accurate file type detector of the Java libraries tested.&lt;/p&gt;</content><link rel='replies' type='application/atom+xml' href='http://fredeaker.blogspot.com/feeds/4708611071830408799/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=570860519139426322&amp;postID=4708611071830408799' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4708611071830408799'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/570860519139426322/posts/default/4708611071830408799'/><link rel='alternate' type='text/html' href='http://fredeaker.blogspot.com/2006/12/file-type-mime-detection.html' title='File type detection'/><author><name>Fred Eaker</name><uri>http://www.blogger.com/profile/06436747828395636097</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='31' height='17' src='http://4.bp.blogspot.com/_NWRXlsa2lMs/STQFTJd5FbI/AAAAAAAAADo/3dLdYtG_HFc/S220/fred.jpg'/></author><thr:total>8</thr:total></entry></feed>
