
A Proposed Method to Hide Text Inside HTML Web Page File

Prof. Dr. Ala’a H. Al-Hamami*


Mazin S. Al-Hakeem** Mohammed A. Al-Hamami*

Abstract:

Steganography allows information to be embedded in a block of host data under conditions where perceptible modification of the host data is unacceptable. Steganographic techniques are highly dependent on the character of the host data.

A technique for embedding information in images makes subtle changes in hue, while a method for embedding information in audio data could exploit the limitations of the human ear by encoding the hidden information in inaudible frequency ranges. Current implementations of textual steganography exploit tolerances in typesetting by making minute changes in line placement and kerning in order to encapsulate hidden information, which makes them vulnerable to simple retypesetting attacks.

In this research we present a method to hide text in HTML Web page files by exploiting HTML characteristics, without arousing suspicion about the hidden data. Both the embedding process and the extracting process require keys to embed and extract the hidden data.

Keywords:
HTML Web Page, HTML tags, tag attributes, embed data, extract data,
cover-text, stego-text, Web page background, Web page text color, Meta-tags,
Robots/Index META element, Encrypt Secret Data.

1- Introduction:

Every few years, computer security has to reinvent itself. New technologies and new applications bring new threats and force us to invent new protection mechanisms. Cryptography became important when businesses started to build networked computer systems; virus epidemics started once large numbers of personal computer (PC) users were swapping programs; and when the Internet took off, the firewall industry was one of the first to benefit.

* Al-Rafidain University College, Computer Science Dept.
** University of Technology, Computer Science Dept.

One of the newest hot spots in security applications is information hiding. It is driven by two of the biggest policy issues of the information age: copyright protection (watermarking) and state surveillance (steganography).

The general model of hiding data in other data can be described as follows. The embedded data is the message that one wishes to send secretly. It is usually hidden in an innocuous message referred to as a cover-text, cover-image or cover-audio as appropriate, producing the stego-text or other stego-object. A stego-key is used to control the hiding process so as to restrict detection and/or recovery of the embedded data to parties who know it (or who know some derived key value) [1]. Figure(1) shows the core parts of information hiding.

Figure(1): Information Hiding

As Internet technologies progress remarkably, the amount of information transmitted electronically keeps increasing. Many applications process not only plain text but also formatted data written in markup languages such as HTML, and HTML is being used as a fundamental technology for exchanging information on the Web.

Recently, technical methods for protecting the copyright of various digital contents have attracted much attention, and protection techniques suited to HTML content are expected to be developed.

On the other hand, communications surveillance systems that include sophisticated filtering technologies are being developed, and communication techniques that such systems cannot detect have attracted much public attention.

Because documents written in HTML are widely distributed on the Web, and in view of the technical capabilities of such surveillance systems, it is worth examining whether HTML can be used for communication that is difficult to detect.

Technical research addressing these needs falls within the area of information hiding, a part of information security. Briefly, information hiding is the technology of hiding secrets in electronic data. This area includes methods for setting up secret communication channels in which the sender is difficult to identify, methods for keeping the existence of information secret, and methods for digital watermarking [2].

2- HTML Characteristics:

HTML (HyperText Markup Language) is a language that handles not only plain text but also formatted data [3]. HTML is widely regarded as the standard publishing language of the World Wide Web.

HTML gives authors the means to:
• Publish online documents with headings, text, tables, lists, images, etc.
• Retrieve online information via hypertext links.
• Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
• Include spread-sheets, video clips, sound clips, and other applications directly in their documents.

Each HTML file (Web page) must start with an HTML element (tag) that contains a HEAD element (tag) followed by a BODY element (tag) [4].

<HTML>
<HEAD>
<TITLE>A simple web page</TITLE>
... other head elements
</HEAD>
<BODY>
... document body
</BODY>
</HTML>
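As an illustration of this required order, the following Python sketch uses the standard-library `html.parser` module to check that a page opens its HTML, HEAD and BODY elements in that order. The class and function names are hypothetical, and the check only inspects start tags; it does not model the tag-omission rules of the HTML 4.01 specification.

```python
from html.parser import HTMLParser


class SkeletonChecker(HTMLParser):
    """Records the order in which the HTML, HEAD and BODY start tags appear."""

    def __init__(self):
        super().__init__()
        self.order = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <BODY> arrives here as "body".
        if tag in ("html", "head", "body"):
            self.order.append(tag)


def has_basic_skeleton(page):
    """True if the page opens HTML, then HEAD, then BODY, each exactly once."""
    checker = SkeletonChecker()
    checker.feed(page)
    checker.close()
    return checker.order == ["html", "head", "body"]


page = """<HTML>
<HEAD>
<TITLE>A simple web page</TITLE>
</HEAD>
<BODY>
... document body
</BODY>
</HTML>"""
print(has_basic_skeleton(page))  # True
```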

Every HTML document includes the HEAD element (tag). The contents of the document head are an unordered collection of elements (tags) such as:

• The TITLE element (tag): defines the HTML document (Web page file)
title, and is always needed.

• The META element (tag): used to supply meta information as name/content pairs. For example, adding a Robots/Index META element to the HTML document (Web page file) increases the probability that a search engine shows the Web page as a search result for a given request.

Examples:

<META name="Robots" content="Index">
<META name="Robots" content="Follow">
<META name="Robots" content="Not-index">

The Robots/Index META element allows the robot (also called a Web crawler or Web spider) to index the specific Web page. The Robots/Follow META element allows the robot to follow the links in the current page and index the other Web pages they point to, while the Robots/Not-index META element asks the robot not to index the current Web page.
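A crawler reads these directives from the `content` attribute of a META element whose `name` is "Robots". The Python sketch below collects them with the standard-library `html.parser`; the function names are hypothetical. The value most crawlers document for blocking indexing is `noindex`; the sketch treats the paper's spelling `Not-index` as equivalent, which is an assumption rather than something search engines are known to honor.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collects the directives of every <META name="Robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names are lower-cased by HTMLParser; values are not.
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.append(token)


def robots_directives(page):
    reader = RobotsMetaReader()
    reader.feed(page)
    reader.close()
    return reader.directives


def may_be_indexed(page):
    # "noindex" is the widely documented token; "not-index" is the paper's
    # spelling and is treated as equivalent here (an assumption).
    directives = robots_directives(page)
    return "noindex" not in directives and "not-index" not in directives


print(robots_directives('<META name="Robots" content=" Follow">'))  # ['follow']
print(may_be_indexed('<META name="Robots" content="Not-index">'))   # False
```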

Every HTML document must include the BODY element (tag), which contains the document body. The body can contain a wide range of elements (tags) such as:

• The P element: paragraphs.
• The UL element: unordered lists.
• The OL element: ordered (i.e., numbered) lists.
• The CENTER element: centered text alignment.
• The FORM element: fill-out forms.
• The TABLE element: tables.
• The IMG element: images.
• The A element: hyperlinks.

The key attributes of the BODY element (tag) are BACKGROUND, BGCOLOR, TEXT, LINK, VLINK and ALINK. They control how the page is formatted, for example by setting a repeating background image and the background and foreground colors for normal text and hypertext links.

Example:

<body bgcolor=white text=black link=red vlink=maroon alink=fuchsia>

where:
o The bgcolor attribute: Specifies the background color for the document
body.
o The text attribute: Specifies the color used to stroke the document's text.
o The link attribute: Specifies the color used to stroke the text for unvisited
hypertext links.

o The vlink attribute: Specifies the color used to stroke the text for visited
hypertext links.
o The alink attribute: Specifies the highlight color used to stroke the text
for hypertext links at the moment the user clicks on the link.
o The background attribute: Specifies a URL for an image that will be used
to tile the document background.
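The bgcolor and text values are the two stego keys used later by the proposed method: bgcolor when embedding and text when extracting. A minimal Python sketch for reading them from the BODY start tag with the standard-library `html.parser` follows; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class BodyAttributeReader(HTMLParser):
    """Keeps the attributes of the first <BODY> start tag in the page."""

    def __init__(self):
        super().__init__()
        self.body_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.body_attrs is None:
            self.body_attrs = dict(attrs)


def body_attributes(page):
    reader = BodyAttributeReader()
    reader.feed(page)
    reader.close()
    return reader.body_attrs or {}


attrs = body_attributes(
    "<body bgcolor=white text=black link=red vlink=maroon alink=fuchsia>"
)
print(attrs["bgcolor"], attrs["text"])  # white black
```

Unquoted values such as `bgcolor=white` are accepted by the parser, so the example tag from the text can be used unchanged.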

3- The Proposed Method:

The proposed method hides text inside an HTML file (HTML Web page file), using the HTML Web page text as the cover-text. The secret data is embedded inside the HTML Web page text while the meaning of the original Web page text is preserved, and the result is transmitted as a stego-text.

The main idea of the proposed method is to hide secret data in the HTML file by exploiting the white spaces in the Web page text: one character of secret data can be embedded in each white space. The secret data is colored with the same color as the HTML Web page background (Secret Data Color = Web Page Background Color). Since the bgcolor attribute specifies the background color of the Web page, the secret data is colored with the bgcolor attribute value. The colored characters of the secret data are then inserted into the white spaces of the original HTML Web page text.

To increase the security level, the colored secret data can be encrypted, e.g., by using the DES method, before it is inserted into the original HTML Web page.
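Python's standard library has no DES implementation, so the sketch below swaps in a toy XOR stream cipher built from `hashlib.sha256` purely to show where an encryption step fits. It is NOT DES and is not secure (no nonce, no integrity check); all names are hypothetical. The ciphertext is hex-encoded so that it stays printable and can still be embedded character by character, which doubles the number of white spaces needed. The paper lists encryption after coloring; here the cipher is applied to the plain characters before any color markup is added, an assumption made because encrypting the markup itself would corrupt it.

```python
import hashlib


def _keystream(key, length):
    """Toy keystream: SHA-256(key || counter) blocks, truncated to `length` bytes."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def toy_encrypt(secret, key):
    """XOR the UTF-8 secret with the keystream and return it as hex text."""
    data = secret.encode("utf-8")
    return bytes(b ^ k for b, k in zip(data, _keystream(key, len(data)))).hex()


def toy_decrypt(cipher_hex, key):
    """Undo toy_encrypt: XOR with the same keystream and decode as UTF-8."""
    data = bytes.fromhex(cipher_hex)
    return bytes(b ^ k for b, k in zip(data, _keystream(key, len(data)))).decode("utf-8")


c = toy_encrypt("THIS IS MY HIDDEN TEXT", b"shared secret")
print(len(c))                            # 44
print(toy_decrypt(c, b"shared secret"))  # THIS IS MY HIDDEN TEXT
```

A real implementation would replace these two functions with a vetted cipher from a cryptographic library.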

To prevent the Web page from being shown as a search result for a given search request, the Robots/Not-index META element is inserted inside the HTML Web page file. This increases the probability that the Web page will not be shown to other users browsing the net. The embedding process is shown in figure(2).
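The META insertion step can be sketched in Python as a single substitution right after the opening HEAD tag. The function name is hypothetical and the regular expression assumes a conventional `<HEAD ...>` start tag (matched case-insensitively). It defaults to `noindex`, the token crawlers document; passing `content="Not-index"` reproduces the paper's spelling literally.

```python
import re


def insert_robots_meta(page, content="noindex"):
    """Insert <META name="Robots" content=...> just after the opening HEAD tag."""
    meta = f'<META name="Robots" content="{content}">'
    # \b stops the pattern from matching tags such as <header>.
    new_page, count = re.subn(
        r"<head\b[^>]*>",
        lambda match: match.group(0) + "\n" + meta,
        page,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("page has no HEAD start tag")
    return new_page


page = "<HTML>\n<HEAD>\n<TITLE>A simple web page</TITLE>\n</HEAD>\n<BODY></BODY>\n</HTML>"
print(insert_robots_meta(page))
```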

Figure(2): Proposed embedding process

The algorithm of the proposed embedding process is:

Input:-
HTML Web page (as Cover-text) & Secret Data & Stego key
Output:-
HTML Web page (as Stego-text) to transmit
Process:-
Get bgcolor attribute value (as Stego key)
Color Secret Data with Stego key
Encrypt colored Secret Data (optional, e.g., by using DES method)
Repeat
Embed the next character of Secret Data in the next white space
Until Secret Data is finished
Insert Robots/Not-index META tag in HEAD part
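The Repeat loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the cover text is the plain visible text of the page (markup is not parsed), a "white space" is an ASCII space, and each hidden character is placed immediately after its space inside a `<font color=...>` element, since the paper does not specify the exact markup. The `bgcolor` argument is the value obtained in the first step of the algorithm; the function name is hypothetical.

```python
import html


def embed_in_white_spaces(cover_text, secret, bgcolor):
    """Hide one secret character after each space of `cover_text`.

    Each hidden character is wrapped in <font color=bgcolor>, so it is drawn
    in the page background color and is not visible against the background.
    """
    capacity = cover_text.count(" ")
    if len(secret) > capacity:
        raise ValueError(
            f"cover text has {capacity} spaces but the secret needs {len(secret)}"
        )
    pieces = []
    hidden = 0  # number of secret characters embedded so far
    for ch in cover_text:
        pieces.append(html.escape(ch))
        if ch == " " and hidden < len(secret):
            pieces.append(
                f'<font color="{bgcolor}">{html.escape(secret[hidden])}</font>'
            )
            hidden += 1
    return "".join(pieces)


print(embed_in_white_spaces("The quick brown fox", "HI", "white"))
# The <font color="white">H</font>quick <font color="white">I</font>brown fox
```

The capacity check makes the limit of the method explicit: a secret can be no longer than the number of spaces in the cover text.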

To extract the embedded data from the stego-text, we use the HTML Web page text color as a stego key to color the HTML Web page background (Web Page Background Color = Web Page Text Color). Since the text attribute specifies the color used to stroke the document's text, the Web page background is colored with the text attribute value, and the secret data then appears. The extracting process is shown in figure(3).

Figure(3): Proposed extracting process

The algorithm of the proposed extracting process is:

Input:-
Received HTML Web page (Stego-text) & Stego key
Output:-
HTML Web page (Cover-text) & Secret Data
Process:-
Get Text attribute value (as Stego key)
Set bgcolor with Stego key
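The two extraction steps above amount to rewriting one attribute of the BODY tag. The Python sketch below does this with regular expressions; the names are hypothetical, and it assumes a single conventional `<BODY ...>` start tag whose attribute values are quoted or bare words. After the change, ordinary text blends into the new background, while characters drawn in the old background color become visible.

```python
import re


def _attr(name):
    # Matches name=value with a double-quoted, single-quoted or bare value.
    return re.compile(
        rf"""\b{name}\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""", re.IGNORECASE
    )


def reveal_hidden_text(stego_page):
    """Return a copy of the page whose BODY bgcolor is set to its text color."""
    body = re.search(r"<body\b[^>]*>", stego_page, re.IGNORECASE)
    if body is None:
        raise ValueError("page has no BODY start tag")
    tag = body.group(0)
    text_value = _attr("text").search(tag)
    if text_value is None:
        raise ValueError("BODY tag has no text attribute (the stego key)")
    key = next(v for v in text_value.groups() if v is not None)  # Get Text value
    new_attr = f'bgcolor="{key}"'
    bgcolor = _attr("bgcolor")
    if bgcolor.search(tag):
        new_tag = bgcolor.sub(lambda m: new_attr, tag, count=1)  # Set bgcolor
    else:
        new_tag = tag[:-1] + " " + new_attr + ">"
    return stego_page[: body.start()] + new_tag + stego_page[body.end():]


stego = '<BODY bgcolor=white text=black>Hello <font color="white">H</font>world</BODY>'
print(reveal_hidden_text(stego))
# <BODY bgcolor="black" text=black>Hello <font color="white">H</font>world</BODY>
```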

Figures (4) and (5) show the stego-text of the proposed method before and after the extracting process, for the secret data “THIS IS MY HIDDEN TEXT”.

Figure(4): stego-text before extracting process

Figure(5): stego-text after extracting process
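The extraction shown in the figures is visual. A program can also recover the secret directly, which the paper does not describe; the following Python sketch is offered only as an illustration. It assumes the hidden characters are wrapped in `<font>` elements whose `color` string equals the BODY `bgcolor` string (compared case-insensitively, so "white" and "#FFFFFF" would not match). The names are hypothetical. If the secret was encrypted before embedding, the recovered string still has to be decrypted.

```python
from html.parser import HTMLParser


class HiddenTextCollector(HTMLParser):
    """Collects text inside <font> elements whose color equals the BODY bgcolor."""

    def __init__(self):
        super().__init__()
        self.bgcolor = None
        self.depth = 0  # > 0 while inside a matching <font> element
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "body":
            self.bgcolor = (attributes.get("bgcolor") or "").lower()
        elif tag == "font":
            if self.depth:
                self.depth += 1  # nested <font> inside a hidden run
            elif self.bgcolor and (attributes.get("color") or "").lower() == self.bgcolor:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "font" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.hidden.append(data)


def extract_hidden_text(stego_page):
    collector = HiddenTextCollector()
    collector.feed(stego_page)
    collector.close()
    return "".join(collector.hidden)


stego = ('<body bgcolor="white" text="black">Hello <font color="white">H</font>'
         'world <font color="white">I</font>again</body>')
print(extract_hidden_text(stego))  # HI
```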

4- Conclusion:

Steganography, unlike cryptography, does not depend on an algorithm for its implementation. It depends on human behaviour and ways of thinking, and only then uses an algorithm to hide the secret.

The most successful hiding method is an uncommon one. Although the proposed method is simple, it is unexpected, and discovering it requires knowledge and experience.

5- References:

1. Mohammad A. Al-Hamami, "Information Hiding Attack in Images", M.Sc. thesis, the Informatic Institute for Postgraduate Studies of the Iraqi Committee for Computers and Informatic, 2002.
2. Fabien A. P. Petitcolas, Ross J. Anderson, and Markus G. Kuhn, "Information Hiding – A Survey", available at: http://www.cl.cam.ac.uk/~fapp2/publications/ieee99-infohiding.pdf
3. W3C Recommendation, "HTML 4.01 Specification", available at: http://www.w3.org/TR/html401/
4. "HTML Tag Reference", available at: http://developer.netscape.com/docs/manuals/htmlguid/tags2.htm
5. Shingo Inoue, Kyoko Makino, Ichiro Murase, Osamu Takizawa, Tsutomu Matsumoto, Hiroshi Nakagawa, "A Proposal on Information Hiding Methods Using XML", available at: http://afnlp.org/nlprs2001/WS-NLPXML/pdf/8_inoue.pdf
6. Tim Berners-Lee, "Weaving the Web", San Francisco: HarperCollins, 1999.
7. WWW Journal, Issue 3, "The Interview with Tim Berners-Lee", available at: http://www.wwwjournal.com/issue3/timinterview.html
