GoodRelations is a standardized vocabulary for product, price, and company data that (1) can be embedded into existing static and dynamic Web pages and (2) can be processed by other computers. This increases the visibility of your products and services in the latest generation of search engines, recommender systems, and other novel applications.
Martin Hepp (UniBW)
martin.hepp at ebusiness-unibw.org
Sat Oct 10 17:39:52 CEST 2009
Dear all:

The distributed character of the Web makes it very likely that the exact same entity is defined in multiple graphs. In particular, there will be significant redundancy in the definition of

- Business Entities and
- Product Models.

The main cause is that providers of data may define entities locally rather than searching for an authoritative URI on the Web. For example, someone exporting a catalog may want to refer to the manufacturer of a gr:ProductOrServiceModel or gr:ProductOrServicesSomeInstancesPlaceholder without searching for the authoritative URI of that manufacturer.

This is not a major technical problem, since providers of commerce dataspaces will very likely offer entity consolidation as one important feature.

For your own projects, you can start with the following simple SPARQL CONSTRUCT rules to create owl:sameAs statements, so that multiple definitions of the very same entity are treated as one. Note that the current rules assume an exact match of the legal names or of the EAN/UPC codes, respectively. You could use more sophisticated filters to expand the scope of the consolidation, e.g. ignoring capitalization and special characters (e.g. "Miller Ltd." vs. "miller ltd").

    # Consolidate Business Entities that have the exact same legalName
    PREFIX gr:  <http://purl.org/goodrelations/v1#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    CONSTRUCT { ?u2 owl:sameAs ?u1 . }
    WHERE {
      ?u1 a gr:BusinessEntity .
      ?u2 a gr:BusinessEntity .
      ?u1 gr:legalName ?name1 .
      ?u2 gr:legalName ?name2 .
      FILTER (?u1 != ?u2 && ?name1 = ?name2)
    }

    # Consolidate Product Models that have the exact same gr:hasEAN_UCC-13
    PREFIX gr:  <http://purl.org/goodrelations/v1#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    CONSTRUCT { ?u2 owl:sameAs ?u1 . }
    WHERE {
      ?u1 a gr:ProductOrServiceModel .
      ?u2 a gr:ProductOrServiceModel .
      ?u1 gr:hasEAN_UCC-13 ?ean1 .
      ?u2 gr:hasEAN_UCC-13 ?ean2 .
      FILTER (?u1 != ?u2 && ?ean1 = ?ean2 && ?ean1 != "")
    }

Important:

1. Make sure you consolidate only nodes of the same type. For example, two gr:Offering nodes may have the same gr:hasEAN_UCC-13 value, but are of course not the same.

2. For local sets of such statements, you have any degree of freedom, and I encourage you to experiment with different ones. Before publishing such sameAs statements, however, run thorough quality checks. Reckless usage of sameAs can spam the Web of Linked Data, and dataspaces will consequently ignore all your graphs.

Best wishes

Martin Hepp

--
--------------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp at ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================

Webcast:
http://www.heppnetz.de/projects/goodrelations/webcast/

Recipe for Yahoo SearchMonkey:
http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey

Talk at the Semantic Technology Conference 2009:
"Semantic Web-based E-Commerce: The GoodRelations Ontology"
http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287

Overview article on Semantic Universe:
http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html

Project page:
http://purl.org/goodrelations/

Resources for developers:
http://www.ebusiness-unibw.org/wiki/GoodRelations

Tutorial materials:
CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_IEEE_CEC%2709
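
As a possible starting point for the "more sophisticated filters" mentioned above (ignoring capitalization and special characters in legal names), here is a minimal sketch of a relaxed variant of the Business Entity rule. It assumes an engine that supports SPARQL 1.1 (BIND, LCASE, and REPLACE are not available in SPARQL 1.0). The helper variables ?key1 and ?key2 and the character class used for normalization are illustrative choices, not part of GoodRelations.

```sparql
PREFIX gr:  <http://purl.org/goodrelations/v1#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Consolidate Business Entities whose legal names are equal after
# normalization (assumption: SPARQL 1.1 string functions are available).
CONSTRUCT { ?u2 owl:sameAs ?u1 . }
WHERE {
  ?u1 a gr:BusinessEntity ; gr:legalName ?name1 .
  ?u2 a gr:BusinessEntity ; gr:legalName ?name2 .

  # STR() drops any language tag or datatype; REPLACE() removes everything
  # except ASCII letters and digits; LCASE() ignores capitalization.
  BIND (LCASE(REPLACE(STR(?name1), "[^A-Za-z0-9]", "")) AS ?key1)
  BIND (LCASE(REPLACE(STR(?name2), "[^A-Za-z0-9]", "")) AS ?key2)

  # Skip names that normalize to the empty string.
  FILTER (?u1 != ?u2 && ?key1 = ?key2 && ?key1 != "")
}
```

With this normalization, "Miller Ltd." and "miller ltd" both reduce to the key "millerltd" and are therefore linked. Note that the character class also strips non-ASCII letters such as umlauts, so it can merge names that are in fact different. Treat the output as candidate links only, and run the quality checks recommended above before publishing any of them.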