
Propping Open the Document Trapdoor



Computer document processing often starts with an abstract, structural representation, which then enters a processing pipeline that creates the desired layout and appearance. Unfortunately, the whole system resembles a series of steps in a one-way chemical reaction, or the successive irreversible stages by which a compiler turns source code into assembler code.

This 'one-way function' behaviour is most obvious with PDF, which is tied to a completely fixed appearance once a document passes through a one-way 'trapdoor' such as Adobe Distiller. Some formats, such as XHTML, allow a little more wriggle room, but even this breaks down if the appearance changes dramatically (for example, when displaying a Web page on a large monitor). In essence, any attempt to reflow a document, or to view it at some other size, is either frustrating or simply impossible without regenerating the document from a more abstract, higher-level representation.
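As a toy illustration of the trapdoor (not the authors' method, and far simpler than real PDF), the Python sketch below treats a paragraph of plain text as the abstract source and a list of pre-broken lines as the 'distilled' output. The function names `distil`, `view_fixed` and `view_from_source` are hypothetical; only the standard-library `textwrap.wrap` is a real API.

```python
import textwrap

# Hypothetical abstract, structural source: a paragraph kept as a plain run of words.
SOURCE = ("Any attempt to reflow a document, or view it at some other size, "
          "is either frustrating or simply impossible without regenerating it "
          "from a more abstract, higher-level representation.")


def distil(text: str, page_width: int) -> list[str]:
    """One-way step (toy model): commit the paragraph to fixed lines for one page width."""
    return textwrap.wrap(text, width=page_width)


def view_fixed(lines: list[str], screen_width: int) -> list[str]:
    """View a fixed layout on a narrower screen: the line breaks cannot move,
    so each line can only be clipped (or scrolled sideways)."""
    return [line[:screen_width] for line in lines]


def view_from_source(text: str, screen_width: int) -> list[str]:
    """Repurpose from the abstract source: simply re-run the layout step."""
    return textwrap.wrap(text, width=screen_width)


if __name__ == "__main__":
    page = distil(SOURCE, page_width=60)
    print("\n".join(view_fixed(page, screen_width=30)))  # text is cut off
    print()
    print("\n".join(view_from_source(SOURCE, 30)))       # text reflows cleanly
```

In this model the clipped view loses text, while re-running the layout step from the source keeps every word at any width; that asymmetry is the point made above about regenerating from a higher-level representation.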

This limitation has not mattered much over the past 25 years, but it is now hitting us hard. In a world of iPhones, eBook readers, 10" netbooks, laptops, and 30" Cinema Displays (not forgetting the humble printed page), it is no longer safe to assume that a document will be viewed in one fixed presentation. 'Repurposing' (without the need for total re-processing) needs to be the watchword for a modern document format. This, however, leads us to the heart of the problem: current formats do not lend themselves to having their presentational properties partially unpicked and re-engineered.

In this talk, we outline the current state of the art in document formats and their limitations when it comes to repurposing. We describe our attempts at making PDF a more repurposable format, and we outline some necessary features of, and open questions for, future document formats.

Steven R. Bagley & David F. Brailsford, School of Computer Science, University of Nottingham, NOTTINGHAM NG8 1BB, UK
Google Tech Talks
November 5, 2009




