In this paper, we introduce a Document REpresentation for Scalable Structure (DRESS) to help information providers to make composite documents, typically Web pages, scalable in both logic and layout structure to support effective information acquisition in heterogeneous environments. Through this novel document representation structure based on binary slicing trees, the document can dynamically adapt its presentation according to display sizes by maximizing the information throughput to users.
Web browsing, adaptive content delivery, slicing tree, layout optimization
As a great many of new devices with diverse capabilities are making a population boom on the Internet mobile clients, their limited display sizes become a serious obstacle to information accessing. Many efforts have already addressed the problem of Web browsing on small terminals and various solutions have been proposed including some commercial products. Current approaches can be divided into two directions: the first one is to transform existing Web pages such as , while the other attempts to introduce new formats and mechanisms  which make Web pages themselves scalable to different display sizes.
We focus on the problem of defining a scalable and efficient Web document representation structure that is adaptive to various display sizes. In this paper, we propose to use binary slicing trees, a data structure widely used in computer aided design community, to represent a page template that can be controlled by the author. When the display area shrinks, some parts of the Web pages will be compressed in terms of linguistic summaries, and then presented, together with other unsummarized parts, adaptively to end users with aesthetic layouts.
Definition 1: The DRESS for a Web page is a binary slicing tree with N leaf nodes. Each inner node is labeled with either v or h denoting vertical or horizontal split, and each leaf node is an information block defined as follows:
the ith information block in the Web page
IMPi, importance value of Bi
MPSi, minimal perceptible size of Bi
MPHi, minimal perceptible height of Bi
MPWi, minimal perceptible width of Bi
ADJi, whether the aspect ratio of Bi is adjustable
ALTi, alternative of Bi
Figure 1. (a) An example Web page. (b) The slicing tree.
As shown in Definition 1, DRESS is a slicing tree with each leaf node as an information block in the Web page. The label on each inner node determines how the display area is recursively subdivided into sub-rectangles by slicing vertically (v) or horizontally (h). An information block will be placed in the sub-rectangle held by the leaf node. An example Web page and its corresponding slicing tree are shown in Figure 1 (a) and (b) respectively. Our definition of slicing tree template does not cover where to split, i.e. the ratio of each split, which is often referred to as the slicing number. In our approach, all the slicing numbers will be adjusted adaptively to the display size.
We introduce importance value (IMP), a quantified value of author's subjective evaluation on an information block, as an indicator of the weight of each block in contribution to the whole page information. This value is used when choosing the less important blocks for summarization under small displays. We normalize all the weights of information blocks in a single page so that their sum is 1.
Obviously, the information delivery of a block is significantly relying on its area of presentation. If an information block is scaled down too much, it may not be perceptible enough to let users catch the information that authors intend to deliver. Therefore, we introduce minimal perceptible size (MPS), minimal perceptible height (MPH), and minimal perceptible width (MPW), to denote the minimal allowable spatial area, height, and width of an information block, respectively. They are used as thresholds to determine whether an information block should be shrunken or summarized when rendering the adapted view.
Adjustability (ADJ) denotes whether the aspect ratio of an information block is adjustable. For example, if the content block is a pure text block or a mixture of images and texts, e.g. a news paragraph, it can be wrapped and adapted to fit into the aspect ratio of final display region. However, when the information block is a table like navigation bar, or a large image, the aspect ratio is usually fixed. In this case, the value of ADJ should be set to false and we fix its aspect ratio at MPW/MPH.
As regards to those information blocks of less importance such as decorations or advertisements, it is desirable to summarize them in order to save display space for more important blocks. Also when dealing with a block of large MPS which cannot be displayed without excessive shrinking due to the limited display size, a summary with a link to the original contents is more preferable.
Information fidelity introduced here is a subjective comparison of a modified version of content object with the original version. The value of information fidelity is confined between 0 (lowest, all information lost) and 1 (highest, all information kept just as original). For a Web page P consisting of several blocks, the resulting information fidelity is defined as a weighted sum of the information fidelity of all blocks in P, Straightforwardly, we can employ the importance values from DRESS as the weights of contributions to the perceptual quality of the whole page. Thus, the information fidelity of an adapted result is described as
This is used as the object function of our adaptation algorithm. In this paper, we assume IF value only depends on the version of content block, i.e., whether it is summarized or not. If a content block is replaced by its alternative, the IF value of this block is defined to be 0, otherwise it is 1.
We introduce P' as the set of unsummarized information blocks in a Web page P, . Thus, our misson is to find the block set P' that carries the largest information fidelity while meets the display constraints. In order to ensure that all the content blocks are possible to be included in the final presentation, the following constraint should be satisfied.
where Area is the size of target area and size (x) is a function which returns the size of display area needed by ALTi. If the constraint (3) is transformed to
then the Web layout optimization problem becomes:
We can see that problem (5) is equivalent to a traditional NP-complete problem, 0-1 knapsack.
Since constraint (3) does not ensure that the MPH or MPW will be satisfied, we solve the problem by a two-level approach. First we use a branch and bound algorithm to enumerate all possible block set P', then for each block set, we use a capacity ratio based slicing algorithm to test whether a valid layout can be found. By this process, we search among all possible block set P' to select the optimal one, i.e. the scheme that achieves the largest IF value while presents an aesthetic view. The capacity ratio based slicing method includes two steps. First, we go through the slicing tree in a bottom-up way to calculate the capacity, height and width constraints for each node. The second step compares the capacities of two sub-trees of each inner node and let the slicing number of that node be proportional to them. After layout optimization, each block, whether summarized or not, will be assigned a rectangle region for display.
We have implemented a prototype of DRESS-based Web browser to validate the performance of our proposed scheme. With DRESS and the page rendering algorithms, the browser can dynamically select and summarize proper parts of Web pages, then optimize the layout for target area, and finally adapt block contents to fit into the corresponding regions allocated. Experimental results are very encouraging.
By integrating both techniques from computer aided design and information extraction, DRESS allows easy negotiation between author and viewer, hides layout manipulation from author but still keeps the result structure predicable, and represents contents in dynamic layouts to fit various user preferred display sizes. Currently, we are developing a Web authoring tool to create and insert DRESS structure with ease of use. Though we focus on a new document representation, automatic transformation from existing Web contents is also crucial to its incremental deployment. We are looking forward to automatic DRESS generation by leveraging our previous experience on Web page analysis.