An advertising program that helps Google to earn billions of dollars of revenue. In essence, it is a keyword purchasing program.
See also Keyword purchasing, Click through rate (CTR), Cost-per-click (CPC)
An advertising program that helps Google to earn billions of dollars of revenue. It is a keyword purchasing program whereby you display Google’s AdWords ads on your site and share revenue with Google. Premium AdSense services are available for large sites, i.e. sites that receive 5 million+ search queries or 20 million+ content page views a month.
See also Keyword purchasing, Click through rate (CTR), Cost-per-click (CPC)
Blog or Web log
A blog (short for “web log”) is a type of web page that serves as a publicly accessible personal journal (or log) for an individual. Typically updated daily, blogs often reflect the personality of the author. Blog software usually keeps a searchable archive of old entries. Blogging software is often used by web pages that provide excellent information on many topics, although very frequently the content is personal and requires careful evaluation.
To follow links in a page, to shop around in a page, exploring what’s there, a bit like window shopping (but you can’t type keywords to search). The opposite of browsing a page is searching it. When you search a page, you find a search box, enter terms, and find all occurrences of the terms throughout the site. When you browse, you have to guess which words on the page pertain to your interests. Searching is usually more efficient, but sometimes you find things by browsing that you might not find because you might not think of the “right” term to search by.
Browsers are software programs that enable you to view documents on the Internet. They “translate” HTML-encoded files into the text, images, sounds, and other features you see. Microsoft Internet Explorer (called simply IE), Netscape, and Mosaic are examples of browsers that enable you to view text and images and many other web features. They are software that must be installed on your computer.
A bot is a software tool for digging through data. You give a bot directions and it brings back answers. The word is short for ‘robot’. Google’s crawler is nicknamed Googlebot, for example.
In browsers, “cache” is used to identify a space where web pages you have visited are stored in your
computer. A copy of documents you retrieve is stored in cache. When you use GO, BACK, or any other
means to revisit a document, the browser first checks to see if it is in cache and will retrieve it from there, because it is much faster than retrieving it from the server.
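The lookup order described above (check the cache first, go to the server only on a miss) can be sketched as follows. This is a minimal illustration, not how any particular browser is implemented; `PageCache` and `fake_server` are hypothetical names, and no real network access takes place.

```python
# Hypothetical sketch of a browser-style page cache.
class PageCache:
    def __init__(self, fetch_from_server):
        self._fetch = fetch_from_server   # function used on a cache miss
        self._store = {}                  # maps URL -> stored copy of the document
        self.server_requests = 0          # how many times the server was contacted

    def get(self, url):
        # Serve the stored copy if the document was retrieved before.
        if url in self._store:
            return self._store[url]
        # Otherwise fetch it from the server and keep a copy for later visits.
        self.server_requests += 1
        document = self._fetch(url)
        self._store[url] = document
        return document


def fake_server(url):
    # Stand-in for a real HTTP request (assumption for illustration).
    return "<html>content of " + url + "</html>"


cache = PageCache(fake_server)
first = cache.get("http://www.example.com/")
second = cache.get("http://www.example.com/")  # answered from the cache
```

After the two calls, the server has been contacted only once; the second visit is served from the stored copy.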
In search results from Google, Yahoo! Search, and some other search engines, there is usually a cached link which allows you to view the version of a page that the search engine has stored in its database. The live page on the web might differ from this cached copy, because the cached copy dates from whenever the search engine’s spider last visited the page and detected modified content. Use the cached link to see when a page was last crawled and, in Google, where your terms are and why you got a page when all of your search terms are not in it.
Cascading style sheets (CSS)
A W3C recommended language for defining style (such as font, size, color, spacing, etc.) for web
documents. This file is a hub that allows you to control style of all your site’s pages from one place.
Learn more about CSS in this CSS tutorial
CGI stands for ‘Common Gateway Interface’ – a standard interface between web server software and other programs running on the same machine. A CGI program is any program which handles its input and output data according to the CGI standard. In practice, CGI programs are used to handle forms and database queries on web pages, and to produce non-static web page content.
A computer, program or process which makes requests for information from another computer, program or process. Web browsers are client programs. Search engine spiders are (or can be said to behave as) clients.
Click through rate (CTR)
The number of times visitors click on a hyperlink (or advertisement) on a page, as a percentage of the
number of times the page has been displayed. Good ranking may be useless if visitors do not click on the
link which leads to the indexed site. The secret here is to provide a good descriptive title and an accurate
and interesting description.
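As a worked example of the definition above, the rate is simply clicks divided by displays, expressed as a percentage. The function name and the figures are illustrative assumptions.

```python
def click_through_rate(clicks, impressions):
    """Return the click-through rate as a percentage of impressions."""
    if impressions == 0:
        return 0.0  # nothing was displayed, so there is no rate to report
    return 100.0 * clicks / impressions


# 25 clicks on a link that was displayed 1000 times
ctr = click_through_rate(clicks=25, impressions=1000)
print(f"CTR: {ctr:.1f}%")
```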
The hiding of page content. Normally carried out to stop page thieves stealing optimized pages.
Information from a web server, stored on your computer by your web browser. The purpose of a cookie is to provide information about your visit to the website for use by the server during a later visit. It’s like us getting a ticket or a customer card at a shop, spa or movie cinema when we go there for the first time. This card ensures that we are remembered when we come back and helps the service provider give us a better (i.e. personalized) service. In technical terms, a message from a web server computer, sent to and stored by your browser on your computer. When your computer consults the originating server computer, the cookie is sent back to the server, allowing it to respond to you according to the cookie’s contents. The main use for cookies is to provide customized Web pages according to a profile of your interests.
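The round trip described above can be sketched with Python's standard `http.cookies` module. The cookie names and values (`visitor_id`, `language`) are made up for illustration; a real site chooses its own.

```python
from http.cookies import SimpleCookie

# First visit: the server prepares a cookie to send to the browser.
outgoing = SimpleCookie()
outgoing["visitor_id"] = "abc123"
set_cookie_value = outgoing["visitor_id"].OutputString()

# Later visit: the browser sends its stored cookies back,
# and the server parses them to recognize the visitor.
incoming = SimpleCookie()
incoming.load("visitor_id=abc123; language=en")
visitor_id = incoming["visitor_id"].value
language = incoming["language"].value
```

The server can then use `visitor_id` and `language` to customize the page it returns.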
Cost per click (CPC)
This is a measure of what you pay Google and other search engines for displaying your ad. As a rule, every time someone clicks on your ad, you get charged a certain amount, i.e. your CPC.
See also AdSense, AdWords
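A small arithmetic sketch of how cost per click adds up. Amounts are kept in whole cents to avoid rounding surprises; the function names and figures are illustrative assumptions, not any ad program's actual pricing.

```python
def total_cost_cents(clicks, cpc_cents):
    """Total charge when every click costs the same amount."""
    return clicks * cpc_cents


def average_cpc_cents(total_cents, clicks):
    """Average cost per click for a given total spend."""
    if clicks == 0:
        return 0.0
    return total_cents / clicks


cost = total_cost_cents(clicks=200, cpc_cents=25)
print(f"Total: ${cost / 100:.2f}")
```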
That part of a search engine which surfs the web, storing the URLs and indexing the keywords and text of each page it finds. Google’s crawler is called Googlebot.
See also Spider, Bot
Data stored in a computer in such a way that a computer program can easily retrieve and manipulate the
data. A database system is a computer program (like MS Access, Oracle, and MySQL) for manipulating data in a database.
An internet link which doesn’t lead to a page or site, probably because the server is down or the page has
moved or no longer exists. Most search engines have techniques for removing such pages from their listings automatically, but as the internet continues to increase in size, it becomes more and more difficult for a search engine to check all the pages in the index regularly. Reporting of dead links helps to keep the indexes clean and accurate, and this can usually be done by submitting the dead link to the search engine.
The removal of pages from a search engine’s index. Removal can occur for various reasons, including unreliability of the machine that hosts a site or perceived attempts at spamdexing.
A name that identifies one or more IP addresses. For example, the domain name microsoft.com represents about a dozen IP addresses. Domain names are used in URLs to identify particular web pages. For example, in the URL http://www.google.com/about.html, the domain name is google.com.
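To illustrate the URL example above, the host can be pulled out of a URL with Python's standard `urllib.parse`. Taking the last two labels as the domain name is a simplifying assumption that works for names like google.com but not for every suffix (for example, names ending in .co.uk).

```python
from urllib.parse import urlsplit


def host_and_domain(url):
    """Return (host, domain) for a URL, using a naive two-label rule."""
    host = urlsplit(url).hostname          # e.g. "www.google.com"
    labels = host.split(".")
    domain = ".".join(labels[-2:])         # naive: keep the last two labels
    return host, domain


host, domain = host_and_domain("http://www.google.com/about.html")
```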
Information on web pages which changes or is changed automatically, e.g. based on database content or user information. You can spot that dynamic content is being used if the URL ends with .asp, .cfm, .cgi or .shtml. But it is also possible to serve dynamic content using standard (normally static) .htm or .html type pages. Search engines will currently index dynamic content in a similar fashion to static content, although they will not usually index URLs which contain the ? character.
Dynamically generated pages
Pages created as the result of a search are called dynamically generated pages. The answer to your query is encased in a web page designed to carry the answer and sent to your computer. Often the page is not
stored anywhere afterward, because its unique content (the answer to your specific query) is probably not of use to many other people. It’s easier for the database to regenerate the page when needed than to keep it around.
The opposite of a dynamic page is a “static” page. Static pages reside on servers, each identified by a
unique URL, and waiting to be retrieved when their URL is invoked. Spiders can find a static page if it is
linked to in any other page they “know” about. They follow links to it and retrieve it much as you would by clicking if you knew the link. Static pages are not invisible, although search engines might choose to omit them for policy reasons discussed below.
An HTML technique for combining two or more separate HTML documents within a single web browser
screen. Compound interacting documents can be created to make a more effective web page presented in
multiple windows or sub-windows.
A framed web site often causes great problems for search engines, and may not be indexed correctly. Search engines will often index only the part of a framed site within the <NOFRAMES> section, so make sure that the <NOFRAMES> section includes relevant text which can be indexed by the spiders. If your site uses frames, include proper scripting to allow search engines to “see” the framed content. Submit the main page, the one containing the <FRAMESET> tag, to the search engines. If you use a gateway page, submit this separately.
See also NOFRAMES tag
File Transfer Protocol. The ability to rapidly transfer entire files from one computer to another, intact for viewing or other purposes.
Google is a play on “Googol”, the mathematical term for a 1 followed by 100 zeros. Google was the name Larry Page and Sergey Brin selected for their future company in September 1998. Google Inc. is the developer of the award-winning Google search engine, which is designed to provide a simple, fast way to search the Internet for information. Offering users access to an index comprising more than 8 billion URLs, Google is the largest search engine on the World Wide Web. In 2004, Google became a publicly traded company, with over 5 billion dollars in revenue. Googlebot is the name of the crawler that builds Google’s index, not of the search engine itself.
Many search engines give extra weight and importance to the text found inside HTML heading sections. It is generally considered good advice to use headings when designing web pages and to place keywords inside headings.
Text on a web page which is visible to search engine spiders but not visible to human visitors. This is
sometimes because the text has been set the same colour as the background, because multiple TITLE tags have been used or because the text is an HTML comment. Hidden text is often used for spamdexing. Many search engines can now detect the use of hidden text and often remove offending pages from their database or lower their positioning.
HyperText Markup Language – the (main) language used to write web pages.
HyperText Transfer Protocol – the (main) protocol used to communicate between web servers and web browsers.
IE stands for ‘Microsoft Internet Explorer’ browser. Browsers are software programs that enable you to view www documents.
Some types of pages and links are excluded from most search engines by policy. Others are excluded
because search engine spiders cannot access them. Pages that are excluded are referred to as the Invisible Web, i.e. what you don’t see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than the visible web.
IP is an Internet Protocol number or Internet Protocol address. This is a unique number consisting of 4 parts separated by dots, e.g. 126.96.36.199. Every machine that is on the Internet has a unique IP address. If a machine does not have an IP number, it is not really on the Internet. Most machines also have one or more Domain names that are easier for people to remember.
See also Domain name
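The “4 parts separated by dots” format can be checked with Python's standard `ipaddress` module. The helper name is hypothetical; the first address is the one used in the entry above.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if text is a well-formed dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        # Raised for a wrong number of parts or a part outside 0-255.
        return False


checks = {
    "126.96.36.199": is_valid_ipv4("126.96.36.199"),
    "256.1.1.1": is_valid_ipv4("256.1.1.1"),
    "1.2.3": is_valid_ipv4("1.2.3"),
}
```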
A computer programming language whose programs can run on a number of different types of computer
and/or operating system. Used extensively to produce applets for web pages.
A simple interpreted computer language used for small programming tasks within HTML web pages. The scripts are normally interpreted (or run) on the client computer by the web browser. Some search engines have been known to index these scripts, presumably erroneously.
A word which forms (part of) a search engine Query.
A property of the text in a web page which indicates how often the keywords appear relative to the total amount of text. Some search engines use this property for positioning. Analyzers are available which allow comparisons between pages. Pages can then be produced with keyword densities similar to those found in high ranking pages.
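Keyword density is usually reported as the percentage of words on a page that match a given keyword. The following is a minimal sketch under that assumption; real analyzers differ in how they split words and handle phrases, and the function name and sample text are illustrative only.

```python
import re


def keyword_density(text, keyword):
    """Percentage of words in text that equal keyword (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = words.count(keyword.lower())
    return 100.0 * hits / len(words)


density = keyword_density("Nurses help nurses learn", "nurses")
```

Here two of the four words match, so `density` is 50.0.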
Keyword domain name
The use of keywords as part of the URL to a website. Positioning is improved on some search engines when keywords are reinforced in the URL.
A phrase which forms (part of) a search engine query.
Glossary of Web
The buying of search keywords from search engines, usually to control banner ad placement.
See also AdWords, AdSense
The repeating of keywords and keyword phrases in links, META tags, copy of the page or elsewhere.
Link is a colloquial term for a hyperlink: a pointer to another document, most often another web page. A hyperlink is also called a hotlink, or sometimes a hypertext connection to another document or web page.
Tags inserted into documents to describe the document. A META Tag is a construct placed in the HTML
header of a web page, providing information which is not visible to browsers (i.e. users). The most common META tags (and those most relevant to search engines) are KEYWORDS and DESCRIPTION.
The KEYWORDS tag allows the author to emphasize the importance of certain words and phrases used
within the page. Some search engines will respond to this information – others will ignore it. Don’t use
quotes around the keywords or key phrases.
The DESCRIPTION tag allows the author to control the text of the summary displayed when the page appears in the results of a search. Again, some search engines will ignore this information. The HTTP-EQUIV meta tag is used to issue HTTP commands, and is frequently used with the REFRESH value to refresh page content after a given number of seconds. Gateway pages sometimes use this technique to force browsers to a different page or site. Most search engines are wise to this, and will index the final page and/or reduce the ranking. Infoseek has a strong policy against this technique, and they might penalize your site, or even ban it by removing it from the index.
Other common meta tags are GENERATOR (usually advertising the software used to generate the page) and
AUTHOR (used to credit the author of the page, and often containing e-mail address, homepage URL, etc.).
<TITLE>PulseHR: Recruitment of Foreign Nurses</TITLE>
<meta name="DESCRIPTION" CONTENT="PulseHR is a recruiting agency specializing in recruitment of foreign nurses into the United States, Canada, and the United Kingdom">
<meta name="KEYWORDS" CONTENT="recruitment of foreign nurses in USA, foreign nurse recruitment, foreing nurse recruitment, foreign nurses to Canada, foreign RNs, recruitment of foreign RNs, recruit foreign nurses, hiring foreign nurses, hiring international nurses, international nurses, international recruitment of foreign nurses, requirements to recruit foreign nurses, nursing immigration, employment based immigration of nurses, immigration of foreign nurses, foreign nurses">
<meta name="robots" content="noarchive">
Also known as Doorway pages or Doorway sites. Multiple copies of identical web sites or web pages, often on different servers. The process of registering these multiple copies with search engines is often treated as spamdexing, because it artificially increases the relevancy of the pages. Many search engines now remove multiple mirrors from the indexes.
It used to be possible to repeat the HTML title tag in the header section of a page several times to improve search engine positioning. Most search engines now detect this trick. Below is an example of what would be considered multiple Titles or Title tags.
<TITLE>PulseHR: Recruitment of Foreign Nurses</TITLE>
<TITLE>PulseHR: Foreign Nurse Recruitment</TITLE>
<TITLE>PulseHR: Recruitment of International Nurses</TITLE>
See also Spamdexing, Tags
Multiple keyword tags
The use of more than one Keywords META tag in order to try to increase the relevancy of the best keywords on a page. This is not recommended. It may be detected as a spamming technique, or all but one of the tags may simply be ignored.
See also META Tags
The NOFRAMES tag allows no-frames browsers to see what’s inside the frame. This is particularly important for search engine optimization. It is best not to use frames when designing a site, but if frames are used, the following example shows how to open the framed content for search engines:
<frameset border="1" cols="200,*" frameBorder="0" frameSpacing="4">
<noframes>
<body>
You should include HTML here to support web crawlers and browsers that don’t support frames.
You may want to include a second copy of your index and set your colors in the BODY statement
above the same as you would in your index file.
</body>
</noframes>
<frame name="left" src="htmlindex.html">
<frame name="right" src="htmlintroduction.html">
</frameset>
The NOSCRIPT tag allows browsers to “see” what users would see when they push a button (i.e. view
dynamic content). NOSCRIPT is shown by script-aware browsers if scripting is disabled or a scripting
language which it did not understand was used. NOSCRIPT supports all core attributes, international
attributes, and events, though they are not needed. If you wish to make pages which are widely compatible and will work with the next generation of browsers without breaking, it is wise to use the Type AND Language attributes, while avoiding the Src, For and Event attributes. You should also provide alternate content. For example:
<SCRIPT TYPE="text/vbscript" LANGUAGE="VBScript">
A measure of the number and quality of links to a particular page (inbound links). Many search engines (and most noticeably Google) are increasingly using this number as part of the positioning process. The number and quality of inbound links is becoming as important as the optimization of page content. A free service to measure page popularity can be found at http://www.linkpopularity.com.
A pop-up window is a small box that appears over a visited page to deliver information or display an ad. Most people find them very annoying and choose to block them through their browser settings (under Tools).
The process of ordering web sites or web pages by a search engine or a directory so that the most relevant sites appear first in the search results for a particular query. There are a number of software programs that can be used to determine how a URL is positioned for a particular search engine when using a particular search phrase.
A method of modifying a web page so that search engines (or a particular search engine) treat the page as more relevant to a particular query (or a set of queries).
A word, a phrase or a group of words, possibly combined with other syntax used to pass instructions to a search engine or a directory in order to locate web pages.
The process of informing a search engine or directory that a new web page or web site should be indexed.
The method a search engine or directory uses to match the keywords in a query with the content of each
web page, so that the web pages found can be ordered suitably in the query results. Each search engine or directory is likely to use a different algorithm, and to change or improve its algorithm from time to time.
Relevancy ranking of search results
The most common method for determining the order in which search results are displayed. Each search tool uses its own unique algorithm. Most use “fuzzy and” combined with such factors as how often your terms occur in documents, whether they occur together as a phrase, and whether they are in Title or how near the top of the text. Popularity is another ranking system.
Repeating the search engine registration process one or more times for the same page or site. Under
certain circumstances, this is regarded with suspicion by the search engines, as it could indicate that
someone is experimenting with spamming techniques.
The Infoseek and Altavista search engines are particularly vulnerable to spamming because they list sites
very quickly, and are thus easy to experiment with. Both engines de-list sites for repeated re-submission
and Infoseek, for example, does not allow more than one submission of the same page in a 24 hour period.
Occasional re-submission of changed pages is not normally a problem.
Any browser program which follows hypertext links and accesses web pages but is not directly under human control. Examples are the search engine spiders, the “harvesting” programs which extract e-mail addresses and other data from web pages and various intelligent web searching programs. A database of web robots is maintained by Webcrawler.
A text file stored in the top level directory of a web site to deny access by robots to certain pages or subdirectories of the site. Only robots which comply with the Robots Exclusion Standard will read and obey the commands in this file. Robots will read this file on each visit, so that pages or areas of sites can be made public or private at any time by changing the content of robots.txt before re-submitting to the search engines.
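How a compliant robot reads this file can be sketched with Python's standard `urllib.robotparser`. The rules, the robot name `ExampleBot`, and the example.com URLs are assumptions for illustration; the file content is parsed from a string here rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# A small robots.txt that keeps all robots out of one subdirectory.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved robot asks before fetching each page.
allowed_public = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")
```

Only robots that choose to perform this check are kept out; the file is a request, not an access control.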
Take a look at this simple example provided by Google:
<META NAME="Googlebot" CONTENT="nofollow">
In this example, a robot should neither index this document, nor analyze it for links:
<META NAME="ROBOTS" CONTENT="noindex, nofollow">
In this example, you are asking robots not to archive your pages, so that your old pages, the ones you have removed, for example, do not get displayed in search results:
<META NAME="ROBOTS" CONTENT="noarchive">
For more information about robots.txt see also HTML Author’s Guide to the Robots META tag.
Note, however, that currently only a few robots support the robots META tag.
See also Meta Tags, Tags
RSS stands for Really Simple Syndication. It is a format for distributing news content. News sites publish via RSS, and individuals and websites then automatically get the updated content.
A server or a collection of servers dedicated to indexing internet web pages, storing the results and
returning lists of pages which match particular queries. Some of the major search engines are Google,
Altavista, MSN, Excite, Hotbot, Infoseek, Lycos, and Webcrawler. Note that Yahoo is a directory, not a search engine.
The term Search Engine is also often used to describe both directories and search engines.
Search Engines for the general web (like all those listed above) do not really search the World Wide Web
directly. Each one searches a database of the full text of web pages selected from the billions of web pages out there residing on servers. When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine’s search results, you retrieve from the server the current version of the page.
Search engine databases are selected and built by computer robot programs called spiders. Spider is that
part of a search engine which surfs the web, storing the URLs and indexing the keywords and text of each page it finds. Google’s spider, also called crawler, is called Googlebot. Although it is said they “crawl” the web in their hunt for pages to include, in truth they stay in one place. They find the pages for potential inclusion by following the links in the pages they already have in their database (i.e., already “know about”). They cannot think or type a URL or use judgment to “decide” to go look something up and see what’s on the web about it.
If a web page is never linked to in any other page, search engine spiders cannot find it. The only way a brand new page – one that no other page has ever linked to – can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included. All search engine companies offer ways to do this.
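The link-following step described above can be sketched with Python's standard `html.parser`: given a page the spider already has, collect the URLs it links to so they can be visited next. The class name, the sample page, and the example.com addresses are hypothetical, and no pages are actually downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


page = (
    '<html><body>'
    '<a href="/about.html">About</a> '
    '<a href="http://www.example.org/">Other site</a>'
    '</body></html>'
)
extractor = LinkExtractor("http://www.example.com/index.html")
extractor.feed(page)
found = extractor.links
```

A page that appears in no other page's `found` list is never reached this way, which is why it has to be submitted by hand.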
After spiders find pages, they pass them on to another computer program for “indexing.” This program identifies the text, links, and other content in the page and stores it in the search engine database’s files so that the database can be searched by keyword and whatever more advanced approaches are offered, and the page will be found if your search matches its content.
Some types of pages and links are excluded from most search engines by policy. Others are excluded
because search engine spiders cannot access them. Pages that are excluded are referred to as the “Invisible Web”, i.e. what you don’t see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than the visible web. (Source: Berkeley University)
SEO or search engine optimization involves changes made to a web page to improve the positioning of that page with search engines. A means of helping potential customers or visitors to find a web site.
Optimization may involve design/layout changes, new text for the title tags, meta tags, alt attributes, headings, and changes to the first 200-250 words of the main text. A large image map at the top of a page should be moved further down the page. Frames should be avoided (unless navigational links are also provided within the frames). Keyword-rich copy should be present on all pages. Quality links, particularly incoming links, should be created. Large sites tend to rank better than small sites.
A computer, program or process which responds to requests for information from a client. On the internet, all web pages are held on servers. This includes those parts of the search engines and directories which are accessible from the internet.
The use of various means to steal another site’s traffic. Techniques used include the wholesale copying of
web pages (with the copied page altered slightly to direct visitors to a different site, and then registered
with the search engines) and the use of keywords or keyword phrases “belonging” to other organizations,
companies or web sites.
The alteration or creation of a document with intent to deceive an electronic catalog or filing system. Any
technique that increases the potential position of a site at the expense of the quality of the search engine’s database can also be regarded as spamdexing, also known as spamming or spoofing.
Spamming is also used more generally to refer to the sending of unsolicited bulk electronic mail, and the
search engine use is derived from this term.
See also Spamdexing
That part of a search engine which surfs the web, storing the URLs and indexing the keywords and text of each page it finds. Please refer to the Search Engine Watch SpiderSpotting Chart for details of individual spiders. See also Robot.
The process of surfing the web, storing URLs and indexing keywords, links and text. Typically, even the
largest search engines cannot spider all of the pages on the net. This is due to the huge amount of data
available, the speed at which the new data appears, the use of politeness windows and practical limits on
the number of pages that can be visited in a given time. The search engines have to make compromises in order to visit as many sites as possible, and they do this in different ways. For example, some only index the home pages of each site, some only visit sites they’re explicitly told about, and some make judgment about the importance of sites (from number and quality of inbound links) before “digging deeper” into the sub-pages of a site.
See also Spamdexing
Static pages reside on servers, each identified by a unique URL, and waiting to be retrieved when their URL is invoked. Spiders can find a static page if it is linked to in any other page they “know” about. They follow links to it and retrieve it much as you would by clicking if you knew the link. Static pages are not invisible, although search engines might choose to omit them for policy reasons discussed below.
The opposite of a static page is a “dynamically generated” page. Pages created as the result of a search are called dynamically generated pages. The answer to your query is encased in a web page designed to carry the answer and sent to your computer. Often the page is not stored anywhere afterward, because its unique content (the answer to your specific query) is probably not of use to many other people. It’s easier for the database to regenerate the page when needed than to keep it around.
See also Dynamically generated page
In database searching, “stop words” are small and frequently occurring words like and, or, in, of that are
often ignored when keyed as search terms. Sometimes putting them in quotes ” ” will allow you to search
them. Sometimes + immediately before them makes them searchable. As a rule, it is advisable to check
with search engines themselves as to what they omit and what not. See how Google works
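A minimal sketch of stop-word filtering, using only the four example words named above. Each search engine keeps its own list, so `STOP_WORDS` here is an assumption for illustration.

```python
# Example stop words taken from the entry above; real lists are longer.
STOP_WORDS = {"and", "or", "in", "of"}


def strip_stop_words(query):
    """Return the query terms that would actually be searched."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


terms = strip_stop_words("recruitment of nurses in Canada")
```

The remaining terms are `recruitment`, `nurses` and `canada`; the words “of” and “in” are dropped.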
Notifications or commands written into a web document. Tags is a colloquial name for ‘HTML Tags’, a code to identify the different parts of a document so that a web browser will know how to display it. Tags are usually included in the brackets: <tag content>. The end of a tag is identified as </tag content>.
Some of the most important tags from the SEO perspective are the following:
Title tag, META tags, NOFRAMES, NOSCRIPT, and NOARCHIVE
The visitors to a web page or web site. Also refers to the number of visitors, hits, accesses etc. over a given period.
A real visitor to a web site. Web servers record the IP addresses of each visitor, and this is used to
determine the number of real people who have visited a web site. If for example, someone visits twenty
pages within a web site, the server will count only one unique visitor (because the page accesses are all
associated with the same IP address) but twenty page accesses.
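The counting rule above can be sketched as follows. The log entries are made-up sample data, using addresses from ranges reserved for documentation.

```python
# Each entry is (visitor IP address, page requested); sample data only.
accesses = [
    ("203.0.113.5", "/index.html"),
    ("203.0.113.5", "/about.html"),
    ("203.0.113.5", "/contact.html"),
    ("198.51.100.7", "/index.html"),
]

page_accesses = len(accesses)                        # every request counts
unique_visitors = len({ip for ip, _page in accesses})  # each IP counts once
```

Four page accesses from two distinct addresses are counted as two unique visitors.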
Uniform Resource Locator. An address which can specify your website uniquely.
For example: www.AnnaTulchinsky.com.
The writing of text especially for a web page. Similar to the writing of copy for any other type of
publication, good web copywriting can have a great effect on search engine positioning, so it forms a major part of optimization. See also SEO (Optimization)
Anna Tulchinsky Website Design and Marketing