Semantic Hypertext
From EmergiaWiki
Semantic Hypertext (course 2008/09)
Notes for a postgraduate course by Joseba Abaitua.
[ Draft version | Originally written at Littera (consult updates) ]
This is a postgraduate course on Hypertext and the Semantic Web, given in the postgraduate programme in Editorial Management at the University of Deusto, Bilbao (Spain).
Semantic hypertext is the logical evolution of hypertext. Hypertext is the natural way of presenting text in electronic form. This course is about the evolution of editing tools under the influence of both the Web 2.0 and the Semantic Web paradigms. Semantic wikis are one of the major outcomes of this evolution. We will analyse some developments such as Ontoworld.org[1] and the semantic extension to MediaWiki, as well as one of its main achievements, Project Halo[2][3].
A large part of the material we will use to describe different tools and technologies is taken from Wikipedia (and will eventually be returned to it). Wikipedia is an outstanding collaborative hypertext product to which the author[4][5][6] occasionally contributes.
Acknowledgments: The approach presented in the course is based on research conducted for the CollOnBus project and has benefited from the contributions of all its participants, in particular from work by María Legorburu, Inés Jacob, JosuKa Díaz Labrador, David Buján, Diego López de Ipiña, Unai Aguilera and Pablo García Bringas, among others. I would like to express special gratitude to two collaborators, Igor Ruiz and Iker Porto, who helped discover and test both SemanticMediaWiki and Project Halo.
Introduction
Applying the model of the Semantic Web in the way Tim Berners-Lee envisioned is not an easy task.
Objectives
- Find, explore and enjoy an ISBN on Google Book Search
- Assess Wikipedia's metadata and taxonomies
- Register and create a FOAF record in Ontoworld.org
- Review Project Halo
- Develop part of a semantic wiki for an event
Background
Electronic Publishing
Electronic editing tools are now widespread and offer diverse capabilities.
- Editing tools
- Text editors: used for editing plain text files[7].
- Electronic typesetting: involves the presentation of textual material in graphic form on paper or some other medium. Before the advent of desktop publishing, typesetting of printed material was produced in print shops by compositors working by hand, and later with machines[8].
- Word processors: for the production (including composition, editing, formatting, and possibly printing) of any sort of printable material[9].
- Desktop publishing: combines a personal computer and page layout software to create publication documents, either for large-scale publishing or for small-scale local output and distribution on a multifunction peripheral[10].
- Authoring tools: used to create and package content deliverable to end users, commonly to create e-learning modules[11].
- Content management systems: used to manage the content of a Web site[12].
Markup languages
A major breakthrough in electronic publishing is the availability of markup languages.
- TeX, SGML, HTML, XML, XHTML
- Text Encoding Initiative (TEI) is a consortium of institutions and research projects which collectively maintains and develops a standard for the representation of texts in digital form. Originally sponsored by three scholarly societies, the TEI is now an independent membership consortium, hosted by academic institutions in the US and in Europe. Its major deliverable is a set of Guidelines, which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. Since 1994, these guidelines have been a widely used standard for text materials for online research and teaching[13].
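To make the idea of descriptive markup concrete, here is a minimal sketch that reads a small TEI-like fragment with Python's standard library. The fragment is a simplified illustration written for these notes (it has no TEI header or namespace, so it is not a valid TEI document), and the variable names are assumptions of the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like fragment: structure (div, head, p)
# and meaning (name with a type) are marked up, presentation is not.
SAMPLE = """<text>
  <body>
    <div type="chapter">
      <head>Hypertext</head>
      <p><name type="person">Berners-Lee</name> worked at <name type="org">CERN</name>.</p>
    </div>
  </body>
</text>"""

root = ET.fromstring(SAMPLE)

# Headings of every division in the text.
heads = [head.text for head in root.iter("head")]

# Every marked-up name together with its declared type.
names = [(name.get("type"), name.text) for name in root.iter("name")]

# The same paragraph with all markup stripped.
plain = "".join(root.find("./body/div/p").itertext())

print(heads)
print(names)
print(plain)
```

Because the names are tagged by type, a program can list the people or organisations mentioned in a text without any language analysis, which is the kind of task the TEI Guidelines are meant to support.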
Hypertext
Hypertext is the natural way of presenting text in electronic form. It most often refers to text on a computer that will lead the user to other, related information on demand. Hypertext represents a relatively recent innovation to user interfaces, which overcomes some of the limitations of written text. Rather than remaining static like traditional text, hypertext makes possible a dynamic organization of information through links and connections (called hyperlinks). Hypertext can be designed to perform various tasks; for instance when a user "clicks" on it or "hovers" over it, a bubble with a word definition may appear, a web page on a related subject may load, a video clip may run, or an application may open[14].
- Hypertext and the World Wide Web. In the late 1980s, Berners-Lee, then a scientist at CERN, invented the World Wide Web to meet the demand for automatic information-sharing among scientists working in different universities and institutes all over the world. In 1992, Lynx appeared as an early text-based web browser (the first browser, WorldWideWeb, had been written by Berners-Lee himself in 1990). Its ability to provide hypertext links within documents that could reach into documents anywhere on the Internet contributed to the growth of the web[15].
- Timeline of hypertext technology[16]
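The hyperlinks that hold a hypertext together can be read directly from the markup of a page. The following is a minimal sketch, assuming a small hand-written HTML sample; the class name `LinkCollector` and the sample page are inventions of this example, not part of any tool discussed in the course.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (target, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # target of the link currently being read
        self._text = []     # pieces of anchor text seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical sample page with two hyperlinks.
PAGE = (
    '<p>See <a href="http://en.wikipedia.org/wiki/Hypertext">Hypertext</a> '
    'and <a href="http://dublincore.org/">Dublin Core</a>.</p>'
)

collector = LinkCollector()
collector.feed(PAGE)
collector.close()
links = collector.links

for target, text in links:
    print(text, "->", target)
```

Note that a plain hyperlink only says that one document points to another; it does not say why. That missing information is what semantic hypertext tries to add.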
Natural Language Processing
- Text mining tools
- Part-of-speech (POS) taggers: mark up the words in a text as corresponding to a particular part of speech, based on both their definition and their context, i.e., their relationship with adjacent and related words in a phrase, sentence, or paragraph[17].
- Syntactic and semantic parsers: parsing (more formally: syntactic analysis) is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar[18].
- Meaning extraction tools
- Data mining: sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods[19].
- Metadata and content annotators: extra information associated with a particular point in a document or other piece of information[20].
- Web annotation systems: used to add, modify or remove information from a Web resource without modifying the resource itself[21].
- Applications
- Question answering
- Semantic search
- Text summarisation
- Machine translation
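The part-of-speech tagging step listed under text mining tools above relies on both the definition of a word and its context. A toy sketch of that idea follows. The lexicon, the three tags and the single context rule are assumptions made up for this illustration; real taggers are trained on annotated corpora and are far more elaborate.

```python
import re

# Hypothetical mini-lexicon: the "definition" side of tagging.
# Some words are ambiguous between noun and verb.
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "hypertext": {"NOUN"},
    "text": {"NOUN"},
    "editors": {"NOUN"},
    "links": {"NOUN", "VERB"},
    "documents": {"NOUN", "VERB"},
}


def tag(sentence):
    """Return a list of (token, tag) pairs for a lower-cased sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    tagged = []
    for token in tokens:
        # Unknown words are guessed to be nouns.
        candidates = LEXICON.get(token, {"NOUN"})
        if len(candidates) == 1:
            pos = next(iter(candidates))
        else:
            # The "context" side: every ambiguous entry in this toy
            # lexicon is NOUN/VERB, so read it as a verb right after
            # a noun and as a noun everywhere else.
            previous = tagged[-1][1] if tagged else None
            pos = "VERB" if previous == "NOUN" else "NOUN"
        tagged.append((token, pos))
    return tagged


print(tag("Hypertext links documents"))
print(tag("The links"))
```

The word "links" receives a different tag in each sentence, although its lexicon entry is the same; only the neighbouring words decide.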
Digital Resource Management
- Metadata and Markup
- MARC, BibTeX, Dublin Core
- eXtensible Markup Language
- List of XML markup languages
- Metadata, Taxonomies, Ontologies [22]
Dublin Core
Official site: http://dublincore.org/
- Metadata Element Set
- Dublin Core Qualifiers
- "...extending or refining the original 15 elements of the Dublin Core Metadata Element Set (DCMES). The terms or "qualifiers" listed here were identified, generally in working groups of the Dublin Core Metadata Initiative, (DCMI) and judged by the DCMI Usage Board to be in conformance with principles of good practice for the qualification of Dublin Core metadata elements."
Diane Hillmann. (2005, November 07). Using Dublin Core. Retrieved 17:30, April 2, 2008, from http://dublincore.org/documents/usageguide/
Dublin Core. (2008, March 25). In Wikipedia, The Free Encyclopedia. Retrieved 15:20, April 2, 2008, from http://en.wikipedia.org/w/index.php?title=Dublin_Core&oldid=200916641
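As an illustration of the 15-element set mentioned above, the following sketch builds a small Dublin Core description of the usage guide cited in this section and serialises it as XML. The element names and the namespace URI are those of the Dublin Core Metadata Element Set; the `metadata` wrapper element and the function name `dc_record` are assumptions of this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dc_record(fields):
    """Build a record from (element, value) pairs; elements may repeat."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = dc_record([
    ("title", "Using Dublin Core"),
    ("creator", "Diane Hillmann"),
    ("date", "2005-11-07"),
    ("identifier", "http://dublincore.org/documents/usageguide/"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

All elements are optional and repeatable, which is why the record is built from a list of pairs rather than from a fixed form.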
Semantic Web
Web evolution
Web 2.5
Web 2.0 is the social web. The Semantic Web as it was envisioned is rather antisocial. The two cannot be compared, because the Semantic Web was conceived as a technological push rather than as an outcome. Web 2.0 is largely a consequence of semantic (or presemantic) technologies. Web 3.0 will be the mobile, ubiquitous web; the Web of devices (Vázquez et al. 2007).
Semantic Hypertext
Semantic wikis are a typical Web 2.5 outcome. They will never be as widespread as Web 2.0 phenomena such as Wikipedia, nor will they reach the same level of development. However, there may be automatic parsers that partially convert wiki hypertexts into semantic hypertexts.
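A rough idea of what such a conversion involves can be given with Semantic MediaWiki's inline annotation syntax, in which a typed link is written `[[property::value]]`. The sketch below turns the links of a page into subject, property, value triples. The page title, the sample text and the fallback property name "links to" for untyped links are assumptions of this example, and the regular expression covers only the simplest forms of the syntax.

```python
import re

# Matches [[Target]], [[Target|caption]], [[property::Target]]
# and [[property::Target|caption]].
LINK = re.compile(r"\[\[(?:([^\[\]|:]+)::)?([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(page_title, wikitext, default_property="links to"):
    """Return (subject, property, value) triples for every link in the text."""
    triples = []
    for match in LINK.finditer(wikitext):
        prop, target = match.group(1), match.group(2)
        # Untyped links carry no semantics of their own, so they get
        # a generic, hypothetical property name.
        triples.append(
            (page_title, (prop or default_property).strip(), target.strip())
        )
    return triples


# Hypothetical page content mixing typed and untyped links.
TEXT = (
    "A [[member of::Example Project]] who "
    "[[knows::Another Person|a colleague]]. "
    "See also [[Semantic Web]]."
)

for triple in extract_triples("Example Person", TEXT):
    print(triple)
```

Only the first two links say something precise about the page; the plain link is the part an automatic parser would still have to interpret, which is why the conversion can only be partial.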
SemanticMediaWiki
Consider the Ontoworld project.
Project Halo
A step forward: more tools.
Experimenta wiki
Semantic wiki by Feria Madrid es Ciencia
Exercises
- Find an ISBN on Google Book Search
- Consult Hypertext. (2008, February 21). In Wikipedia, The Free Encyclopedia. Retrieved 02:57, February 23, 2008, from http://en.wikipedia.org/w/index.php?title=Hypertext&oldid=193068069
- Click on the ISBN link of Landow, George (2006). Hypertext 3.0 Critical Theory and New Media in an Era of Globalization: Critical Theory and New Media in a Global Era (Parallax, Re-Visions of Culture and Society). Baltimore: The Johns Hopkins University Press. ISBN 0-8018-8257-5.
- Find this book on Google Book Search. Explore and enjoy.
- Wikipedia's metadata and taxonomies: Wikipedia's categories fall under the notions of metadata and taxonomies, and share properties of both. We are going to consider two mechanisms used by Wikipedia to classify and organise information: Templates and Categories [29].
- Debate and justify whether categories and templates from Wikipedia are taxonomies or folksonomies.
- Select and describe two templates
- Select and compare categories and subcategories of one or two articles in different languages
- Review of semantic wiki properties at Ontoworld [30]
- How can FOAF be used [31], [32]
- Property:Foaf:knows [33], example Barry Norton at Ontoworld
- Other semantic relationships: Member_of, Participant_of, etc.
- Review of other projects
- Project Halo
- Review of Experimenta wiki
- Semantic wiki for an event
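The ISBN used in the first exercise is itself a small piece of structured metadata: its last character is a check digit computed from the other nine. As a minimal sketch, the function below applies the standard ISBN-10 rule (weights 10 down to 1, sum divisible by 11); the function name is an invention of this example.

```python
def is_valid_isbn10(isbn):
    """Check an ISBN-10 string; hyphens and spaces are ignored."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10  # "X" stands for 10, only as the check digit
        else:
            return False
        # Weights run from 10 for the first character to 1 for the last.
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("0-8018-8257-5"))  # ISBN of Landow (2006) from the exercise
```

A check of this kind only says that the number is well formed; whether it identifies the right book still has to be confirmed against a catalogue such as Google Book Search.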
References
Electronic publishing. (2008, February 12). In Wikipedia, The Free Encyclopedia. Retrieved 01:51, February 23, 2008, from http://en.wikipedia.org/w/index.php?title=Electronic_publishing&oldid=190925336
Markup language. (2008, February 21). In Wikipedia, The Free Encyclopedia. Retrieved 01:54, February 23, 2008, from http://en.wikipedia.org/w/index.php?title=Markup_language&oldid=192987990
Word processor. (2008, February 22). In Wikipedia, The Free Encyclopedia. Retrieved 03:42, February 23, 2008, from http://en.wikipedia.org/w/index.php?title=Word_processor&oldid=193343984

