Johns Hopkins University, The Sheridan Libraries





Issues in Using Electronic Texts
 


There are two basic components of an electronic text project:

  • the digitization of the text itself. This can be as simple as scanned page images, which are not searchable. It can also include running the scanned pages through Optical Character Recognition (OCR) software, which converts the words on each page image into machine-readable text so that they are searchable.
  • the building of structural systems to allow retrieval and navigation. Metadata is information about the content and nature of the electronic document; it is added to a database and is invisible to the end user. Tagging is also often performed on the data. Tagging is structural metadata that defines the structure of a document or identifies its pieces (such as paragraphs, images, chapters, notes, table of contents, tables, title page, index, etc.).
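As a rough illustration of how tagging supports retrieval, the sketch below parses a small marked-up text with Python's standard `xml.etree.ElementTree` module and searches only within one kind of element. The element names (`text`, `metadata`, `chapter`, `paragraph`, `note`), the sample content, and the helper `find_in_element` are all invented for this example; they are not taken from any real tagging scheme or e-text collection.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and content are invented for illustration.
SAMPLE = """<text>
  <metadata>
    <title>Sample Poems</title>
    <author>A. N. Author</author>
  </metadata>
  <chapter n="1">
    <paragraph>The rose is red.</paragraph>
    <note>An editorial note about the rose.</note>
  </chapter>
  <chapter n="2">
    <paragraph>The violet is blue.</paragraph>
  </chapter>
</text>"""


def find_in_element(xml_source, tag, word):
    """Return the text of every <tag> element that contains `word`."""
    root = ET.fromstring(xml_source)
    hits = []
    for element in root.iter(tag):
        content = "".join(element.itertext())
        if word.lower() in content.lower():
            hits.append(content.strip())
    return hits


# Because the pieces are tagged, the same word can be searched
# separately in the body paragraphs and in the notes.
paragraph_hits = find_in_element(SAMPLE, "paragraph", "rose")
note_hits = find_in_element(SAMPLE, "note", "rose")

# Metadata lives in its own element and can be read without touching the text.
title = ET.fromstring(SAMPLE).findtext("metadata/title")
```

Without the tags, a search for "rose" could not distinguish the poem itself from the editorial note about it.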


Types of E-texts:

  • A database of page images plus a search interface. Some metadata is present, so the citation can be searched, but not the full text.
  • The above content, plus metadata, plus "dirty ASCII" (uncorrected OCR output). In other words, page images plus searchable text. OCR has been performed on the page images, so full-text searching is available.
  • A database of texts that have been keyed in by hand, resulting in a high degree of accuracy. Often these keyed-in texts have also been marked up ("tagged") with SGML or XML, so that searching is quite efficient.
    Example: English Poetry (All US JHU)
  • Keyed-in texts that are also critical editions and that include some critical apparatus. These have also been marked up for better searching.
  • The above database, plus page images.
    Example: Le Roman de la Rose
  • A database of primary sources plus secondary literature. Literary texts are presented in the context of critical material. Sometimes the secondary material can be quite extensive, as in the online archive projects.
    Examples: The Rossetti Archive, Princeton Dante Project, Uncle Tom's Cabin
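The practical difference between "dirty ASCII" and keyed-in text shows up at search time. The sketch below is a minimal illustration, assuming an invented line of OCR output in which "red" was misread as "rcd": an exact search misses the word, while an approximate match from Python's standard `difflib` module can still find it. The sample strings and variable names are hypothetical, and approximate matching is only one possible workaround, not a description of how any particular e-text product handles OCR errors.

```python
import difflib

# Invented example: the same line as keyed in by hand and as uncorrected OCR.
KEYED_TEXT = "the rose is red"
OCR_TEXT = "tbe rose is rcd"   # hypothetical recognition errors

query = "red"

# Exact word search: succeeds on the keyed text, fails on the dirty OCR text.
exact_in_keyed = query in KEYED_TEXT.split()
exact_in_ocr = query in OCR_TEXT.split()

# Approximate search: look for OCR tokens that closely resemble the query.
# cutoff is a similarity threshold between 0 and 1; 0.6 is difflib's default.
fuzzy_in_ocr = difflib.get_close_matches(query, OCR_TEXT.split(), n=3, cutoff=0.6)
```

Here `exact_in_ocr` is `False` even though the word is on the page, which is why the accuracy of the underlying text matters for word searches.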


Search level or depth will vary. Depending on the nature of the e-text collection, you will be able to search on different things:

  • only the metadata, if only page images are present. That means the record or citation for the text, including title, author, publication information, and perhaps subjects.
  • usually the full text of the works, if OCR has been performed on the page images
  • the context as well as the texts themselves, if the database also includes secondary literature
  • a related issue is the ability to search across texts. Sometimes searching is limited to one text at a time; sometimes multiple texts can be searched at once.
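The difference between metadata-only and full-text searching can be sketched in a few lines of Python. The `Record` class, the two sample records, and the `search` function are assumptions made up for this illustration; a record with an empty `full_text` stands in for a work that exists only as page images.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    author: str
    full_text: str = ""   # empty when only page images exist


# Hypothetical collection, invented for illustration.
COLLECTION = [
    Record("Sonnets", "Poet One", "Shall I compare thee to a summer day"),
    Record("Summer Odes", "Poet Two"),  # page images only, no searchable text
]


def search(records, term, full_text=False):
    """Search across all records; include the full text only when asked."""
    term = term.lower()
    hits = []
    for record in records:
        fields = [record.title, record.author]
        if full_text:
            fields.append(record.full_text)
        if any(term in field.lower() for field in fields):
            hits.append(record.title)
    return hits


metadata_hits = search(COLLECTION, "summer")                  # citation only
fulltext_hits = search(COLLECTION, "summer", full_text=True)  # citation + text
```

The metadata-only search finds just the work with "Summer" in its title; the full-text search also finds the work that mentions the word in its body.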


Search engines will vary in sophistication, and may not be present at all.

  • in the absence of a search interface, the browser's Find in Page function can be used. Depending on how the text is displayed, this may limit searching to one page at a time, with no way to search an entire work.
  • a search engine may consist of only a single search box, with no field searching capability
  • advanced search options will often provide a means to search individual fields and/or to combine fields
  • sophisticated search engines sometimes offer further limits, such as date, format, or language
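A minimal sketch of what field searching and limits add over a single search box, written in Python. The records, field names, and the `advanced_search` function are hypothetical and do not reflect the interface of any actual database.

```python
# Hypothetical records, invented for illustration.
RECORDS = [
    {"title": "Garden Songs", "author": "Poet One", "year": 1850, "language": "English"},
    {"title": "Chansons du jardin", "author": "Poet Two", "year": 1790, "language": "French"},
    {"title": "Garden Essays", "author": "Poet Two", "year": 1905, "language": "English"},
]


def advanced_search(records, fields=None, language=None, year_from=None, year_to=None):
    """Match every field term (combined with AND), then apply optional limits."""
    fields = fields or {}
    results = []
    for record in records:
        # Field search: each term must appear in its named field.
        if not all(term.lower() in record[name].lower() for name, term in fields.items()):
            continue
        # Limits: language and date range.
        if language is not None and record["language"] != language:
            continue
        if year_from is not None and record["year"] < year_from:
            continue
        if year_to is not None and record["year"] > year_to:
            continue
        results.append(record["title"])
    return results


by_title = advanced_search(RECORDS, {"title": "garden"})
combined = advanced_search(RECORDS, {"title": "garden", "author": "two"})
limited = advanced_search(RECORDS, {"author": "two"}, language="French", year_to=1800)
```

Combining the title and author fields narrows two title matches to one, and the language and date limits narrow an author search in the same way.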


Uses for e-texts:

  • run word searches to find occurrences, frequency, and contexts of individual words or phrases
  • create concordances
  • trace a theme or motif across a large corpus of texts, or across an individual writer's works
  • find particular passages or quotations in large bodies of text
  • make connections between two or more words, concepts, or themes
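Two of these uses, counting word frequencies and building a concordance, can be sketched with Python's standard library. The sample text, the simple tokenizer, and the function names are assumptions for this illustration; real concordance tools handle punctuation, spelling variation, and much larger corpora with more care.

```python
import re
from collections import Counter

# Invented sample text for illustration.
TEXT = "The rose is red. The violet is blue. A rose by any other name."


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs."""
    return Counter(tokenize(text))


def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


freq = word_frequencies(TEXT)
kwic = concordance(TEXT, "rose")
for left, word, right in kwic:
    print(f"{left:>10} [{word}] {right}")
```

Each concordance line shows the keyword with a few words of context on either side, which is the keyword-in-context layout used in printed concordances.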

Issues in using e-texts:

  • changes in the format and materiality of texts result in changes in their appropriation. How e-texts are read and used may be quite different from how the original texts, in codex form, are read and used.
  • e-texts can be reconfigured, reformed, copied, moved, rewritten, and manipulated in ways that no printed text ever could be.
  • there are many interfaces to contend with in viewing and searching e-texts, and these interfaces change often.
  • e-texts on the open Internet come and go, appear and disappear, or at the very least change their "location".
  • it matters which editions are used as the basis of an e-text. Are the best critical editions chosen? Are the easiest editions (those out of copyright) used? Are only translations used, without access to the original language? Which translations are used?
  • e-texts are most often created by commercial publishers, who think in terms of a product to market, not in terms of research needs or methodologies.
  • perhaps the essential problem with current access to e-texts is their disparate nature. They exist freely all over the Internet, or as very expensive commercial products, or as individual hobby-type projects, or as semi-professional online archives, or as highly sophisticated digital archive projects undertaken by research institutions with major grant funding. Quality control and consistency are almost non-existent.



Sheridan Libraries
3400 North Charles Street, Baltimore, MD 21218
(410)-516-8335
Copyright 2008 | Disclaimer | Privacy Policy