Biblioteca 2.0/Curso
Taxonomies, folksonomies and ontologies (2008/09)
Module of the doctoral course I0703-Web semántica (Semantic Web), Doctoral Programme in Information Systems, Facultad de Ingeniería-ESIDE (Faculty of Engineering), Universidad de Deusto, March 2009, taught by Josuka Díaz Labrador and Joseba Abaitua.
Contextualization
- UD's new library
- Resource discovery and knowledge extraction
- Online open access
- Formats and genres: Wikipedia articles, scientific papers
- Content processing, knowledge extraction and aggregation
Questions
- Why are scientific papers the main literary or communicative format used by the scientific community?
- What is their structure? What are the best means for dealing with them?
- Which tools can we use to discover, manage and extract information from them?
- How can these tools be combined to make them more efficient?
- What are the future directions of the field? Which field or fields?
Planning
- March 2. Introduction
- Resource discovery
- Scientific publications: sources, problems, and perspectives
- March 3. SRM and Open Access
- Scholar reference management (practical exercise 1)
- Open Access, zOAZ
- March 4. Information extraction and knowledge management
- Information extraction (practical exercise 2)
- Knowledge representation: Review of YAGO (IE, AI, HAL 9000, OWL/RDFS)
- Project proposal and NLP/KR tools and demo systems (practical exercise 3)
- March 5. Knowledge construction and dissemination
- Project proposal: PaperSqueezer
- Presentation of slides (practical exercises 4 and 5)
- Discussion: What can we do for UD's new library?
Exercises
- Exercise 1. How to deal with scientific references
- March 2. Select a cloud computing tool for managing scientific references
- March 3. Make a full reference of a scientific paper
- March 3. Include short references of six more papers, both forward and backward citations of your main paper (by means of citation search engines)
- Exercise 2. Information extraction from scientific papers
- March 4. Identify in your paper entities, relations, and facts
- March 4. Make a small taxonomy of entities (classes, subclasses)
- March 4. Test your taxonomy against SUMO http://www.ontologyportal.org/
- Exercise 3. Demo and tool testing
- March 4. Test and compare two knowledge extraction tools
- Exercise 4. Knowledge dissemination
- March 4. Include at least one short review of the papers in your SRM account
- March 4. Add three slides [4] with a summary of your paper
- March 5. Add three slides [5] with shallow evaluation of the tested tools
- March 5. Add one slide [6] with a contribution proposal to the new university Library
- March 5. Present your slides [7]
- Exercise 5. Aggregation of shared experience
- March 5. RSS/feed: a room at FriendFeed.com
Resource discovery
What do I do when I am looking for related work or documentation on a topic of my research? Do I use catalogues or search engines? Which type of engines? Do I try other resources?
Resources: reference
Catalogues (mainly book-oriented)
- UD http://catalogo.biblioteca.deusto.es/
- BDDOC-CSIC http://bddoc.csic.es:8080
- WorldCat http://www.worldcat.org/
Online Publishing (journal and conference papers)
- Journals in print in UD http://catalogo.biblioteca.deusto.es/ (select "Journals" on left menu)
- Databases in UD Digital Library
- Publishers
- Springer http://www.springer.com/ Computer Science
- Springer-CS online (UD subscription access, full-text?)
- ACM http://www.acm.org/ Digital Library, free search, toll access (TA) to full-text
- IEEE http://www.ieee.org/, IEEE-Computer Society http://www.computer.org/, idem
- SIAM http://www.siam.org/ SIAM Journals Online, idem
- etc.
- Publishing is their business
Catalogues (scientific paper oriented)
- ISI Web of Knowledge (WoK) http://www.accesowok.fecyt.es/login/
- formerly known as ISI, now known as TMAC (The Mother of All Catalogues)
- because of the "impact factor" (JCR-Journal Citation Reports)
- in any case, it is only a database: it offers only abstracts and metadata (reference) information, no full text, not even TA
Online (self-)publishing (authors, groups)
- DELi publications, bad example, not updated for a long time
- Many researchers/groups have now a publications page, full-text papers indeed
- Very good, but this leads to several problems
- publisher copyright (because of this, authors sometimes publish draft versions)
- nanometric disaggregation of resources (it's the Web!), you have to know the author to access his/her publications
- metadata not present or difficult to (machine-) process
Resources: full-text
A reference (author, title), and an abstract, are not enough to make use of the previous works of others. We need full text.
This conclusion may seem clever, but it is trivial. In fact, references are not merely insufficient; on their own they are almost nothing. We learn from the content. A 10-line abstract lets us decide whether we want to know more about a work, but we need the content to actually learn from it. In the world of scientific databases, catalogues, journal listings, references, etc., we get used to thinking that there is no life beyond the references.
In the past, we used the Library's catalogue to look for book titles of interest, and then we went to the shelves to read the books (not only the titles). In the computer and Internet age, we use digital catalogues to look for paper titles of interest, and then... reading the papers themselves is often difficult. The point to appreciate is that the digital full text of a paper certainly exists: every journal and conference now asks authors to submit a PDF (or another text-preserving format). There are, among others, several ways to reach it:
- You can reach the TA agent (publisher, organization) responsible for the paper publication. If you are lucky, being at the UD, you can access the full text digital libraries of ACM, IEEE Computer Society, SIAM, and several others listed before. In other cases, we are not lucky.
- You could look for the author's or group's homepage to see whether they offer full-text versions of their work. Sometimes it works.
- There are now sites that aggregate document bases of considerable extension, most of them of public access and full text contents (see later Google Scholar, Citeseer, DBLP and others).
The relevant point is that, partly in reaction to the TA (closed access) policies of publishers and other organizations, there is a movement that seeks and promotes open access to scientific publications, much like other well-known phenomena of the digital era such as open source software, free software, Creative Commons licenses, P2P networks, and many others. The open access movement shares a number of factors with them, such as:
- payment for a resource
- copyright problems
- shift from hard (physical) media to digital media, and others
but it also has distinctive factors of its own.
Open Access (OA)
- EPrints for Digital Repositories http://www.eprints.org/
- #1 PubMed Central http://www.pubmedcentral.nih.gov/, 1,530,623 records (2009, March 1)
- #5 ArXiv http://arxiv.org/, 522,053 records (2009, March 1)
- #11 Dialnet http://dialnet.unirioja.es/, 237,030 records (2009, March 1)
- Based partly on self-archiving ("green road to OA")
- "pre-(peer-review)-prints" and "post-prints" are archived
- ROMEO http://romeo.eprints.org/publishers.html, "green publisher"= Publisher's green light to self-archive refereed postprint
- Springer, ACM, IEEE, SIAM are green publishers
- History
- Budapest Open Access Initiative (BOAI)
- Bethesda Statement on Open Access Publishing
- Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities
- Recent declarations of USA Universities, for example Stanford University School of Education’s Open Access Policy
- Similar initiatives
- Public Knowledge Project http://pkp.sfu.ca/, promoting Open Journal Systems http://pkp.sfu.ca/?q=ojs and Open Conference Systems http://pkp.sfu.ca/?q=ocs
- Public Library of Science http://www.plos.org/, with an Open Letter to Scientific Publishers
Technology behind Open Access
- Open Archives Initiative http://www.openarchives.org/
- OAC and OAC-onto (Open Archive Cataloger): study, design, and implementation of a cataloging agent that gives content producers and users a means of sharing the tasks related to the creation, updating, and consumption of these contents, through interchange and distribution protocols based on Open Archives.
- Zope Open Archives Cataloguer on SourceForge!
- Query interface
- Application: publications server
- Joseba Abaitua, Josu Azpillaga, JosuKa Díaz-Labrador, Jon Fernández, Inés Jacob, Txus Sánchez, Fernando Quintana [2005] “Ontology-based browsing of bibliographic metadata”
- JosuKa Díaz, Inés Jacob, Joseba Abaitua, Fernando Quintana, Jon Fernández, Txus Sánchez, Garikoitz Etxebarria, Josu Azpillaga [2005] “OAC: recolección y agregación de metadatos heterogéneos para un proveedor de servicios OAI”
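As a rough illustration of what an OAI service provider does when it harvests metadata, the sketch below builds an OAI-PMH ListRecords request and extracts Dublin Core titles and creators from a response. The repository address, the record identifier and the sample response are invented for the example; only the verb, the oai_dc metadata prefix and the XML namespaces come from the OAI-PMH specification. It is not the OAC implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL of a ListRecords request for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return (identifier, title, creators) for every record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        creators = [c.text for c in record.iterfind(".//dc:creator", NS)]
        records.append((identifier, title, creators))
    return records

# Hypothetical response, trimmed to the elements used above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Ontology-based browsing of bibliographic metadata</dc:title>
          <dc:creator>Abaitua, Joseba</dc:creator>
          <dc:creator>Quintana, Fernando</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    for identifier, title, creators in parse_records(SAMPLE_RESPONSE):
        print(identifier, "|", title, "|", "; ".join(creators))
```

A real harvester would also fetch the URL, follow resumption tokens and handle error responses; those steps are left out to keep the sketch short.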
Open content access
Open access has many benefits. First of all, the universal dissemination of knowledge and all that it implies, as the references above stress. But there is another side to the word open, related to the medium. For example, if a document has been digitized as an image, the content is open to the human eye but closed in every other respect. The vast majority of scientific works are now disseminated as PDF, which is unfortunate because PDF is not an information representation format; but at least PDF preserves textual content as such. We can therefore assume that full text also means the capability of accessing the textual content.
This is the key question, because full text is what makes full-text resource discovery possible. As you may have concluded by now, many of the databases, catalogues and search engines listed above allow reference-based (metadata-based) resource discovery, but in most cases they miss the content. Imagine that Google could only give results based on the title and meta elements of web pages (you may think that the World Wide Web would be less of a mess that way, but that is another question).
The analogy is very relevant, because the head element of a web page plays exactly the same role as the reference or metadata associated with a document, although its expressiveness is very limited. Remember the reasons for Google's success ten years ago: before Google, there were mainly web directories (Yahoo, Lycos, etc.; directory management was a task performed by humans), that is, structured metadata-based resource discovery. Google is an automatic full-text resource discoverer, although it also uses metadata information (the head element).
In any case, if full-text documents (PDF included) are placed on any publicly accessible web page, we have at our disposal the whole rich set of capabilities and tools that now exist on the web for discovery and knowledge extraction (you may think again: as if the web were not enough of a mess already, let's enrich it with the whole scientific production of the planet; well, it is actually being done). As a first example, since Google indexes PDF documents, we can use Google to perform the first step of discovery (and, as a corollary, reference-based engines are mostly condemned to disappear).
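The difference between metadata-based and full-text discovery can be made concrete with a toy inverted index. The sketch below indexes the same two documents twice, once on the title only and once on title plus body, and runs the same queries against both. The documents, field names and function names are invented for the example; real engines add ranking, stemming and much more.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents, fields):
    """Map each word to the set of document ids whose given fields contain it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field in fields:
            for word in tokenize(doc.get(field, "")):
                index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Toy collection; titles and texts are invented for the example.
DOCUMENTS = {
    "p1": {"title": "A large ontology from Wikipedia",
           "body": "We extract facts about entities and link them to WordNet."},
    "p2": {"title": "Tagging data visualised",
           "body": "Tag clouds are built from a folksonomy with a self-organising map."},
}

metadata_index = build_index(DOCUMENTS, fields=["title"])
fulltext_index = build_index(DOCUMENTS, fields=["title", "body"])

if __name__ == "__main__":
    for query in ("ontology", "folksonomy", "ontology wordnet"):
        print(query, "| metadata:", sorted(search(metadata_index, query)),
              "| full text:", sorted(search(fulltext_index, query)))
```

The query "folksonomy" finds nothing in the title-only index but finds p2 in the full-text index, which is the gap between reference-based catalogues and full-text engines described above.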
Indeed, from a strictly scientific point of view, this is very significant. On the one hand, when a piece of research is started, it is essential to know and mention the previous work on the subject ("the shoulders of giants", as Newton put it). Now, by means of open content access, this scientific premise can be fulfilled better than ever.
On the other hand, a piece of research is relevant if it becomes a "giant's shoulder" itself, that is, if others can use and extend it. In that case, the research gets cited. The quantitative assumption that the number of citations equals quality may be questioned (it is the origin of the impact factor, but also of the PageRank mechanism used by Google, as you know), but the qualitative idea holds. Your work can only be cited if it is discovered, so enhancing public access to the content of your work undoubtedly increases the probability of its being cited, provided it is relevant.
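To make the two citation-based measures mentioned above tangible, the following sketch compares a plain citation count with a simplified PageRank-style score on a tiny citation graph. The graph, the damping value of 0.85 and the function names are assumptions made for the example; this is a textbook power iteration, not Google's production algorithm and not the JCR impact factor formula.

```python
def citation_counts(citations):
    """Count how many times each paper is cited."""
    counts = {paper: 0 for paper in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] = counts.get(cited, 0) + 1
    return counts

def pagerank(citations, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a citation graph."""
    papers = sorted(set(citations) | {c for cs in citations.values() for c in cs})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its score on to the papers it cites.
                share = damping * rank[p] / len(cited)
                for c in cited:
                    new_rank[c] += share
            else:
                # A paper that cites nothing spreads its score evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Invented graph: A and B cite C, and C cites D.
CITATIONS = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

if __name__ == "__main__":
    counts = citation_counts(CITATIONS)
    ranks = pagerank(CITATIONS)
    for paper in sorted(ranks, key=ranks.get, reverse=True):
        print(paper, counts[paper], round(ranks[paper], 3))
```

In this graph C has the most citations, yet D receives the highest PageRank-style score because its single citation comes from the well-cited C. Counting citations and weighting them by who cites are therefore different notions of authority.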
Discovery
There is a bulk of scientific and academic information available on the Web that serves our research purposes. The question is how we discover the most relevant materials, and how we filter them to make the optimal selection.
We can benefit from several tools for the discovery, selection and management of the information.
Citation harvesters
- OAIster http://www.oaister.org/
- Dialnet http://dialnet.unirioja.es/
- REBIUN http://www.rebiun.org/
Scholar search engines (public)
- Google Scholar http://scholar.google.es/
- About: Alireza Noruzi (2005), Steven J. Bell (2009), Marilyn Christianson (2007), Daniel Pauly and Konstantinos I. Stergiou (2005)
- Other interesting search engines: Grokker, Clusty
- CiteSeer http://citeseerx.ist.psu.edu/ Wikipedia
- IDEAS http://ideas.repec.org/ (bibliographic database dedicated to Economics)
- 3DGreco http://alfama.sim.ucm.es/3DGreco/ (Colección Digital Complutense)
- DBLP http://www.informatik.uni-trier.de/~ley/db/ (Universität Trier)
- The Collection of Computer Science Bibliographies http://liinwww.ira.uka.de/bibliography/ (Universität Karlsruhe)
Social tagging
- Bookmarks
- Delicious http://delicious.com/joseba_abaitua/
- Twine http://www.twine.com/user/abaitua
- StumbleUpon, Furl, Mr Wong
- Books
- LibraryThing http://www.librarything.es/home/JosebaAbaitua
Scholar reference management
- Scientific papers
- (Group A) Bibsonomy http://www.bibsonomy.org/user/josebaabaitua, Ontoworld
- (Group B) EndNote http://www.endnote.com/
- (Group C) CiteULike http://www.citeulike.org/user/JosebaAbaitua
- (Group D) Zotero http://www.zotero.org/ Wikipedia, Delicious, Twine
- Connotea Home, Wikipedia, Delicious, Twine
- Wikindx [8] Wikipedia
Comparison of reference management software. (2009, March 4). In Wikipedia, The Free Encyclopedia. Retrieved 09:29, March 4, 2009, from http://en.wikipedia.org/w/index.php?title=Comparison_of_reference_management_software&oldid=274835993
Social networks
- Delicious http://delicious.com/joseba_abaitua/network/library2.0
- SlideShare http://www.slideshare.net/group/library20
- Ning http://library20.ning.com/
- Twine http://www.twine.com/twine/11m4r4zkb-yq/library-2-0
- FriendFeed http://friendfeed.com/rooms/library20
- Research Gate https://www.researchgate.net/groups.GroupInfo.html?group=2991
Knowledge extraction
What is knowledge? Can epistemology help us find out?
Knowledge is defined in the Oxford English Dictionary as "(i) expertise, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject, (ii) what is known in a particular field or in total; facts and information or (iii) awareness or familiarity gained by experience of a fact or situation".
Ballard's (2004) descriptive formula "knowledge = theory + information" is a core principle underlying theory-based semantic technologies.
Artificial Intelligence
Artificial intelligence (AI) is the intelligence of machines and the branch of computer science which aims to create it. Major AI textbooks define the field as "the study and design of intelligent agents," where an intelligent agent is a system that perceives its environment and takes actions which maximize its chances of success. John McCarthy, who coined the term in 1956, defines it as "the science and engineering of making intelligent machines."
The problem of simulating (or creating) intelligence has been broken down into a number of specific sub-problems. These consist of particular traits or capabilities that researchers would like an intelligent system to display. The traits described below have received the most attention:
- Perception: ability to use input from sensors (such as cameras, microphones, sonar and others more exotic) to deduce aspects of the world.
- Learning: ability to find patterns in a stream of input.
- Natural language processing: ability to read and understand the languages that human beings speak.
- Knowledge representation: representation of objects, properties, categories and relations between objects; situations, events, states and time; causes and effects; knowledge about knowledge (what we know about what other people know); and many other, less well researched domains.
- Social intelligence: ability to predict the actions of others, by understanding their motives and emotional states
- Deduction, reasoning, problem solving: algorithms that imitate the step-by-step reasoning that human beings use when they solve puzzles, play board games or make logical deductions.
- Creativity: both theoretically (from a philosophical and psychological perspective) and practically (via specific implementations of systems that generate outputs that can be considered creative).
- Planning: ability to set goals and achieve them.
- Motion and manipulation: to handle such tasks as object manipulation and navigation, with sub-problems of localization (knowing where you are), mapping (learning what is around you) and motion planning (figuring out how to get there).
- General intelligence: ability to combine all the skills above and exceed human abilities at most or all of them.
Artificial intelligence. (2009, March 2). In Wikipedia, The Free Encyclopedia. Retrieved 09:09, March 4, 2009, from http://en.wikipedia.org/w/index.php?title=Artificial_intelligence&oldid=274447760
Examples:
- 2001: A Space Odyssey (film) by Arthur C. Clarke and Stanley Kubrick Wikipedia
- The Shutting Down of HAL 9000 YouTube, [9]
Knowledge management
Knowledge management comprises a range of practices used in an organisation to identify, create, represent, distribute and enable adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organisational processes or practice [10].
The literature provides many definitions of knowledge, most of which build the concept from data, to information, to knowledge. Some of the literature even takes this one step further and expands knowledge to understanding and wisdom (Ackoff 1989; Kannegieter 2001; Stewart 1999); however, there is little agreement on a precise definition of knowledge (Biggam 2001, p. 2; Håkanson 2001, p. 3). Unfortunately, data and information are often used interchangeably, and information and knowledge are used as synonyms (Durant-Law Consulting Pty Limited 2004).
An established discipline since 1995, Knowledge Management (KM) includes courses taught in the fields of business administration, information systems, management, and library and information sciences (Alavi & Leidner 1999). More recently, other fields, including those focused on information and media, computer science, public health, and public policy, have also started contributing to KM research.
KM efforts can help individuals and groups to share valuable organisational insights, to reduce redundant work, to avoid reinventing the wheel, to reduce training time for new employees, to retain intellectual capital as employees turn over in an organisation, and to adapt to changing environments and markets (McAdam & McCreedy 2000; Thompson & Walsham 2004).
A basic expectation of scientific method is to document, archive and share all data and methodology so they are available for careful scrutiny by other scientists, thereby allowing other researchers the opportunity to verify results by attempting to reproduce them. This practice, called full disclosure, also allows statistical measures of the reliability of these data to be established.
Readings
- Thompson, Mark P.A. & Geoff Walsham (2004), "Placing Knowledge Management in Context", Journal of Management Studies 41 (5): 725-747
- Knowledge management. (2009, February 27). In Wikipedia, The Free Encyclopedia. Retrieved 12:51, March 1, 2009, from http://en.wikipedia.org/w/index.php?title=Knowledge_management&oldid=273704371
- Epistemology. (2009, March 1). In Wikipedia, The Free Encyclopedia. Retrieved 08:48, March 1, 2009, from http://en.wikipedia.org/w/index.php?title=Epistemology&oldid=274014322
- Scientific method. (2009, February 27). In Wikipedia, The Free Encyclopedia. Retrieved 17:24, March 1, 2009, from http://en.wikipedia.org/w/index.php?title=Scientific_method&oldid=273762798
Related topics
- Data, information, knowledge
- What is a "knowledge unit"? GoogleScholar
- Can we build new knowledge on top of aggregated shared knowledge units?
- How can we parametrize or rank our knowledge units based on:
- peer review (Scholar, Conferences, OAI-PMH)
- cross references (Scholar, CiteSeer)
- other notions of authority? GoogleScholar
Content annotation
Documents may not be the best means for the transmission of knowledge in the semantic web (Chris Welty and J. William Murdock 2006). But they are still the standard method for sharing and disseminating knowledge within the scientific community. It is not clear how that knowledge can be made explicit to computers. There are a number of mechanisms for knowledge representation:
- Formal languages, propositional logic, semantic networks, frames
- Markup, categories, metadata, social tags
- Ontologies, taxonomies, folksonomies
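The contrast between a taxonomy and a folksonomy can be sketched in a few lines. In the example below, a taxonomy is a hierarchy in which each class has one parent, so subclass questions can be answered by walking up the tree, whereas a folksonomy is a flat set of (user, resource, tag) assignments that can only be counted. All class names, users, resources and tags are invented for illustration.

```python
from collections import Counter

# Taxonomy: each class points to its single parent class (None for the root).
TAXONOMY = {
    "Entity": None,
    "Document": "Entity",
    "ScientificPaper": "Document",
    "JournalArticle": "ScientificPaper",
    "Person": "Entity",
    "Author": "Person",
}

def ancestors(cls, taxonomy):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    parent = taxonomy[cls]
    while parent is not None:
        chain.append(parent)
        parent = taxonomy[parent]
    return chain

def is_a(cls, other, taxonomy):
    """True if cls is other or one of its (transitive) subclasses."""
    return cls == other or other in ancestors(cls, taxonomy)

# Folksonomy: flat (user, resource, tag) assignments with no hierarchy.
TAGGINGS = [
    ("ana", "paper-1", "ontology"),
    ("ana", "paper-1", "wikipedia"),
    ("jon", "paper-1", "ontologies"),
    ("jon", "paper-2", "tagging"),
    ("mia", "paper-1", "ontology"),
]

def tags_for(resource, taggings):
    """Count how often each tag was applied to a resource."""
    return Counter(tag for _, res, tag in taggings if res == resource)

if __name__ == "__main__":
    print(ancestors("JournalArticle", TAXONOMY))
    print(is_a("Author", "Person", TAXONOMY), is_a("Author", "Document", TAXONOMY))
    print(tags_for("paper-1", TAGGINGS).most_common())
```

Note that the folksonomy treats "ontology" and "ontologies" as unrelated tags, while the taxonomy knows that a JournalArticle is a Document. An ontology would go one step further and add typed relations between classes, which this sketch does not attempt.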
Readings
- Ronald J. Brachman, Hector J. Levesque (2004) Knowledge Representation and Reasoning, Morgan Kaufmann, ISBN-13: 978-1-55860-932-7. [11]
- Knowledge representation. (2009, February 19). In Wikipedia, The Free Encyclopedia. Retrieved 16:28, March 2, 2009, from http://en.wikipedia.org/w/index.php?title=Knowledge_representation&oldid=271817881
Markup languages
After the publication of the official XML specification in 1998, markup languages based on XML became an efficient way of making the interpretation of data explicit through metadata. RDF is a general method for the conceptual description of information available on the Web.
- DC, Dublin Core Metadata Initiative http://dublincore.org/
- XML, eXtensible Markup Language http://www.w3.org/TR/2000/REC-xml-20001006
- RSS, Really Simple Syndication http://web.resource.org/rss/1.0/ [12]
- TEI, Text Encoding Initiative http://www.tei-c.org/ [13]
- DITA, Darwin Information Typing Architecture http://dita.xml.org/ [14]
- RDF, Resource Description Framework http://www.w3.org/TR/rdf-syntax-grammar/ [15]
- OWL, Web Ontology Language http://www.w3.org/TR/owl-features/ [16]
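As a small illustration of how such markup makes metadata explicit, the sketch below describes one paper with Dublin Core elements, first as an XML record and then as RDF-style (subject, predicate, object) triples. The wrapper element name "record" and the helper names are arbitrary choices for the example; only the Dublin Core namespace and element names (title, creator, date) come from the standard, and the bibliographic values are taken from the YAGO reference listed under Documentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise with the usual "dc:" prefix

def dublin_core_record(fields):
    """Serialise (element name, value) pairs as a flat Dublin Core XML record."""
    root = ET.Element("record")  # wrapper element name is arbitrary here
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")

def as_triples(subject, fields):
    """Express the same metadata as (subject, predicate URI, object) triples."""
    return [(subject, DC_NS + name, value) for name, value in fields]

FIELDS = [
    ("title", "YAGO: A Large Ontology from Wikipedia and WordNet"),
    ("creator", "Fabian M. Suchanek"),
    ("creator", "Gjergji Kasneci"),
    ("creator", "Gerhard Weikum"),
    ("date", "2008"),
]
PAPER_URL = "http://www.mpi-inf.mpg.de/~suchanek/publications/jws2008.pdf"

xml_record = dublin_core_record(FIELDS)
triples = as_triples(PAPER_URL, FIELDS)

if __name__ == "__main__":
    print(xml_record)
    for triple in triples:
        print(triple)
```

The XML form suits document exchange, while the triple form is the shape that RDF tools consume; both carry the same five statements about the paper.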
Content aggregation
Aggregation is a necessary way to overcome information overload. It complements information selection and filtering, particularly when information is redundant. Redundancy, however, helps us detect information that is possibly more relevant. Techniques:
- Tag clouds [17]
- RSS aggregation [18]
- Linked data [19]
- Natural language processing
- Summarization
- Named entity recognition
- Terminology extraction
- Automatic ontology construction
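The first technique in the list, the tag cloud, shows how redundancy can be turned into a relevance signal: the more often a tag is repeated across aggregated sources, the larger it is drawn. The sketch below maps tag frequencies linearly to font sizes. The size range, the function name and the sample tags are assumptions for the example; published approaches such as the self-organising map method cited in the readings are considerably more elaborate.

```python
from collections import Counter

def tag_cloud(tags, min_size=10, max_size=32):
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = Counter(tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    cloud = {}
    for tag, count in counts.items():
        if high == low:
            # All tags are equally frequent: use the middle of the range.
            size = (min_size + max_size) / 2
        else:
            size = min_size + (count - low) * (max_size - min_size) / (high - low)
        cloud[tag] = round(size)
    return cloud

# Invented tags, as if aggregated from several users' bookmarks.
SAMPLE_TAGS = ["ontology"] * 5 + ["wikipedia"] * 3 + ["rss"]

if __name__ == "__main__":
    for tag, size in sorted(tag_cloud(SAMPLE_TAGS).items(), key=lambda kv: -kv[1]):
        print(tag, size)
```

Linear scaling is the simplest choice; with very skewed tag distributions a logarithmic scale is often preferred so that rare tags remain readable.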
Readings
- Salonen, J (2007). Self-organising map based tag clouds - Creating spatially meaningful representations of tagging data. Proceedings of the 1st OPAALS conference, 26-27 November 2007, Rome, Italy
- Tim Berners-Lee (2006). Linked data. Retrieved 2009 March 2 from http://www.w3.org/DesignIssues/LinkedData.html
Natural language processing
Getting explicit semantic content from text has attracted the scientific community for decades. The main computing and Internet companies have strong research groups that work on the field:
- Yahoo research: http://research.yahoo.com [20]
- Google labs: http://labs.google.com/ [21] [22]
- Microsoft group: http://research.microsoft.com [23]
Readings
- Christopher D. Manning, Hinrich Schütze (2003). Foundations of Statistical Natural Language Processing, MIT Press, ISBN 978-0262133609
- Peter Jackson, Isabelle Moulinier (2002). Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization, John Benjamins, ISBN 902724989X, 9789027249890
- Natural language processing. (2009, February 19). In Wikipedia, The Free Encyclopedia. Retrieved 16:47, March 2, 2009, from http://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=271808914
NLP tools
- Powerset http://www.powerset.com
- TextRunner http://turingc.cs.washington.edu:7125/TextRunner/
- WikipediaMiner http://wikipedia-miner.sourceforge.net
Semantic searching
- DBpedia http://dbpedia.org [24] [25]
- YAGO http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/
- NAGA http://www.mpi-inf.mpg.de/~kasneci/naga/
- CALAIS http://www.opencalais.com/
Content aggregators
- EVRI http://www.evri.com
- ANSWERS http://www.answers.com/
Related projects
- Faviki http://www.faviki.com/
- LinkedData http://linkeddata.org/
Documentation
- Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum (2008). YAGO: A Large Ontology from Wikipedia and WordNet. Journal of Web Semantics 6-3: 203-217. Retrieved 2009 January 30 from http://www.mpi-inf.mpg.de/~suchanek/publications/jws2008.pdf [26]
- Wisam Dakka and W. Silviu Cucerzan (2008). Augmenting Wikipedia with Named Entity Tags. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, retrieved 2009 February 19 from http://research.microsoft.com/en-us/people/silviu/ijcnlp08.pdf
- Peter Mika, Massimiliano Ciaramita, Hugo Zaragoza and Jordi Atserias (2008). Learning to tag and tagging to learn: A case study on Wikipedia. Yahoo! Research, Barcelona. Retrieved 2009 February 19 from http://grupoweb.upf.es/hugoz/pdf/mika_ieee08.pdf [27]
- David Milne and Ian H. Witten (2008). An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. Retrieved 2009 February 19 from http://www.aaai.org/Papers/Workshops/2008/WS-08-15/WS08-15-005.pdf
- Silviu Cucerzan. (2007). Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2007), Prague, Czech Republic. [28]
- Caecilia Zirn, Vivi Nastase and Michael Strube (2008). Distinguishing between Instances and Classes in the Wikipedia Taxonomy. The Semantic Web: Research and Applications. 5th European Semantic Web Conference, ESWC 2008, Tenerife. Retrieved 2009 February 27 from http://www.eswc2008.org/final-pdfs-for-web-site/onl-4.pdf [29]
See also
- Erick Schonfeld (2009, February 15). Mining The Thought Stream. TechCrunch. Retrieved February 16 2009 from http://www.techcrunch.com/2009/02/15/mining-the-thought-stream/
- Steven J. Bell (2009, February 17). The Library Web Site of the Future. Inside Higher Ed. Retrieved February 28 2009 from http://www.insidehighered.com/views/2009/02/17/bell
- Vuk Miličić (2008, December 9). Zemanta Launches Public Semantic API. Faviki Blog. The official blog of Faviki, a social bookmarking tool based on semantic Wikipedia tags. Retrieved February 2 2009 from http://faviki.wordpress.com/2008/12/09/zemanta-launches-public-semantic-api/
- Chris Welty and J. William Murdock (2006) Towards Knowledge Acquisition from Information Extraction. Retrieved 2009 March 2 from http://iswc2006.semanticweb.org/items/Welty2006tw.pdf
- Barney Pell (2007). POWERSET - Natural Language and the Semantic Web. The 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 2007. Retrieved January 18 from http://videolectures.net/iswc07_pell_nlpsw/
- Olena Medelyan, Catherine Legg, David Milne and Ian H. Witten (2008, September). Mining meaning from Wikipedia. Working paper, The University of Waikato, retrieved 2009 February from http://arxiv.org/ftp/arxiv/papers/0809/0809.4530.pdf
- Clay Shirky (2005). Ontology Is Overrated, at IMCExpo in April entitled "Folksonomies & Tags: The rise of user-developed classification" retrieved 2009 February from http://www.shirky.com/writings/ontology_overrated.html and http://itc.conversationsnetwork.org/shows/detail470.html
- Alireza Noruzi (2005). Google Scholar: The New Generation of Citation Indexes. International Journal of Libraries and Information Services 55-4: 169-235, retrieved 2009, February from http://www.librijournal.org/pdf/2005-4pp170-180.pdf
- Marilyn Christianson (2007). Ecology Articles in Google Scholar: Levels of Access to Articles in Core Journals. Retrieved 2009 March 1 from http://www.istl.org/07-winter/refereed.html
- OpCit Project (2009). The effect of open access and downloads ('hits') on citation impact: a bibliography of studies. Retrieved 2009 February 23 from http://opcit.eprints.org/oacitation-biblio.html

