Introduction
All of the options that appear on the Advanced tab of the Pro tool are described below.
[Figure: Advanced tab]
Skip
You may not always wish to check certain parts of your site. This option allows you to skip parts of the site by specifying one or more paths below the starting page to ignore.
When you click on the 'Skip' button a dialog appears to allow you to add, remove, and update a list of paths to skip.
The paths that you enter here must start with a '/'. Any pages that lie below the starting page and within the path specified here will be ignored.
For example, if your starting page is 'http://mysite.com/somepath/index.html' and you specify a path to skip of '/otherpath', then pages that start with 'http://mysite.com/somepath/otherpath' will be skipped. Note that this includes pages such as 'http://mysite.com/somepath/otherpath/index.html' and 'http://mysite.com/somepath/otherpathaswell/index.html'. If you wish to skip only the pages within the 'otherpath' folder, such as 'http://mysite.com/somepath/otherpath/index.html', then specify a path to skip of '/otherpath/'.
You can use this option in combination with the 'Include' option to provide further restrictions on what to check. For example you could set a path to skip of '/' to skip everything except the starting page, and then use 'Include' to specify exactly which paths should be validated. You can also use the 'robots.txt' option at the same time for further fine-grained selection of what to validate.
You can use regular expression syntax here, but the first character must always be '/'. Note that '.*' is always automatically added to the end of whatever you enter.
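The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `is_skipped` is hypothetical, and it assumes that skip paths are applied relative to the folder containing the starting page, as the example above suggests.

```python
import re

def is_skipped(url: str, start_page: str, skip_paths: list[str]) -> bool:
    """Return True if url falls under one of the skip paths."""
    # Assumption: paths are relative to the folder that holds the starting page.
    base = start_page.rsplit("/", 1)[0]
    if not url.startswith(base):
        return False
    rest = url[len(base):]
    # Each entry is treated as a regular expression with '.*' appended.
    return any(re.fullmatch(path + ".*", rest) for path in skip_paths)

start = "http://mysite.com/somepath/index.html"
page = "http://mysite.com/somepath/otherpathaswell/index.html"
print(is_skipped(page, start, ["/otherpath"]))   # True: '/otherpath.*' matches
print(is_skipped(page, start, ["/otherpath/"]))  # False: the trailing '/' is required
```

The second call shows why the trailing '/' matters: without it, any path that merely begins with 'otherpath' is caught as well.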
Include
If you specify some paths to skip, or use Disallow within your robots.txt file, you may wish to override this to include some paths within these areas of your website that you do wish to check. Note that it only makes sense for these include paths to be 'below' the paths to skip or 'below' paths disallowed in the robots.txt file.
When you click on the 'Include' button a dialog appears to allow you to add, remove, and update a list of paths to include.
The paths that you enter here must start with a '/' and be more than just a single '/'.
For example if your starting page is 'http://mysite.com/somepath/index.html' and you specify a path to skip of '/otherpath/', and an include path of '/otherpath/subpath/' then pages such as 'http://mysite.com/somepath/otherpath/index.html' will be skipped, but pages such as 'http://mysite.com/somepath/otherpath/subpath/index.html' will be included (as long as you have a link to them from any pages that are validated).
If you wish to skip everything except a single folder you could set a path to skip of '/' (to skip everything except the starting page), and then use 'Include' to specify exactly which paths below this should be validated. You can also use the 'robots.txt' option at the same time for further fine-grained selection of what to validate.
You can use regular expression syntax here, but the first character must always be '/'. Note that '.*' is always automatically added to the end of whatever you enter.
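To see how 'Skip' and 'Include' might interact, here is a minimal Python sketch. It is not the tool's own code: the names `should_check` and `matches` are hypothetical, and it assumes that an include match always overrides a skip match and that the starting page is always checked.

```python
import re

def matches(rest: str, patterns: list[str]) -> bool:
    # Each pattern is a regular expression with '.*' appended.
    return any(re.fullmatch(p + ".*", rest) for p in patterns)

def should_check(url: str, start_page: str,
                 skip: list[str], include: list[str]) -> bool:
    if url == start_page:
        return True  # the starting page itself is always checked
    # Assumption: paths are relative to the folder that holds the starting page.
    base = start_page.rsplit("/", 1)[0]
    if not url.startswith(base):
        return False
    rest = url[len(base):]
    if matches(rest, include):
        return True  # assumption: 'Include' overrides 'Skip'
    return not matches(rest, skip)

start = "http://mysite.com/somepath/index.html"
skip, include = ["/otherpath/"], ["/otherpath/subpath/"]
print(should_check("http://mysite.com/somepath/otherpath/index.html", start, skip, include))          # False
print(should_check("http://mysite.com/somepath/otherpath/subpath/index.html", start, skip, include))  # True
```

Setting `skip` to `["/"]` with a single entry in `include` gives the "everything except one folder" arrangement described above.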
Use robots.txt
An alternative way of specifying which parts of your site to skip is to add a standard robots.txt file to your website. Total Validator will use any rules marked for all user agents with a *, as well as those specifically marked with a user agent of 'TotalValidator'. For example:

    User-agent: *
    Disallow: /blogs

    User-agent: TotalValidator
    Allow: /support/
    Disallow: /support/resources/
Total Validator supports all of the features supported by Google including multiple 'Disallow:' and 'Allow:' statements in any order, wildcards and suffixes.
Note that paths in a robots.txt file are relative to the root of the site, not to the starting page for validation, unlike the paths used by the 'Skip' and 'Include' options. The starting page itself will always be validated even if the robots.txt file disallows it. This option can also be used in combination with the 'Skip' and 'Include' options for fine-grained selection of what to validate.
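The following Python sketch shows one way the example rules could be evaluated. It is an illustration under stated assumptions, not Total Validator's implementation: the function names are hypothetical, wildcards and suffixes are left out, and conflicts are resolved with the longest-match rule that Google documents (the longest matching path wins, and 'Allow' wins a tie).

```python
def parse_rules(robots_txt, agents=("*", "totalvalidator")):
    """Collect (allow, path) rules from every group naming one of the agents."""
    rules = []
    current_agents = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if value and any(a in agents for a in current_agents):
                rules.append((field == "allow", value))
    return rules

def is_allowed(path, rules):
    """Longest matching prefix wins; 'Allow' wins ties (an assumption)."""
    best = None
    for allow, prefix in rules:
        if path.startswith(prefix):
            key = (len(prefix), allow)
            if best is None or key > best:
                best = key
    return True if best is None else best[1]

ROBOTS_TXT = """\
User-agent: *
Disallow: /blogs

User-agent: TotalValidator
Allow: /support/
Disallow: /support/resources/
"""
rules = parse_rules(ROBOTS_TXT)
print(is_allowed("/blogs/post.html", rules))           # False
print(is_allowed("/support/index.html", rules))        # True
print(is_allowed("/support/resources/a.html", rules))  # False
```

Note that the paths passed to `is_allowed` are relative to the root of the site, in line with the paragraph above.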
Follow remote links
When checking more than one page, those pages that don't start with the URL of the starting page will be ignored. This includes pages on remote sites and pages 'above' the starting page or in a different part of the website.
Selecting this option will cause the validator to ignore this restriction, so it will visit all the pages linked from the starting page regardless of their URL. This applies to links on the starting page only; remote links on subsequent pages will still be ignored.
Use this option with care otherwise you may end up checking far more pages than intended. It is expected that in most cases this option will be used with a specially constructed starting page that references different parts of the same website.
Strip query
Some websites dynamically add query parameters to the links on their pages, so that the links are different each time the page is served. This is a problem for Total Validator, which treats these links as being to different pages because the URLs are different. This means that it will test the same page(s) again and again.
If this happens to you then use this option to prevent it. The links will then be stripped of all query parameters before being used. Note that this may mean that not all pages are checked, depending on how the query parameters are used.
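As an illustration of what stripping the query parameters means, here is a short Python sketch using the standard `urllib.parse` module. The function name `strip_query` and the example URLs are made up for this sketch; it is not the tool's own code.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    """Return url with everything from '?' up to any '#fragment' removed."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=""))

# Two links to the same page that differ only in a generated parameter.
links = [
    "http://thewebsite.com/path/page.html?token=abc123",
    "http://thewebsite.com/path/page.html?token=def456",
]
print({strip_query(link) for link in links})
# {'http://thewebsite.com/path/page.html'}
```

Once the parameters are removed, both links collapse to a single URL, so the page only needs to be tested once. The same collapse is why pages that are selected purely by a query parameter may no longer be checked.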
Strip session
Some websites are constructed such that session ids may be dynamically added to links on their pages. These links typically add these session ids to the end of the link using a semicolon ';' to separate them like so:
http://thewebsite.com/path/page.html;jsession=123456
This can sometimes be a problem for Total Validator, which may view two links to the same page as referring to different pages because the URLs are different. This means that it may test the same page(s) again and again.
If this happens to you then use this option to prevent it. The links will then be stripped of the semicolon and everything following it, up to the start of any query parameters or to the end of the URL if there are none.
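The stripping rule can be sketched in Python as follows. The function name `strip_session` is hypothetical and the sketch ignores URL fragments; it only shows the rule as described here, not the tool's implementation.

```python
def strip_session(url: str) -> str:
    """Remove ';...' from the path, keeping any query parameters."""
    head, sep, query = url.partition("?")
    head = head.split(";", 1)[0]  # drop the semicolon and what follows it
    return head + sep + query

print(strip_session("http://thewebsite.com/path/page.html;jsession=123456"))
# http://thewebsite.com/path/page.html
print(strip_session("http://thewebsite.com/path/page.html;jsession=123456?lang=en"))
# http://thewebsite.com/path/page.html?lang=en
```

A semicolon that appears inside the query parameters is left untouched, because only the part before the '?' is examined.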
Validate errors
If you select this option, then whenever your web server returns an error status code such as 404 (page not found), the error page sent by the web server will be validated.
This is a useful way of checking that your error pages also conform to standards.
Save results to
Normally results are saved to your documents folder. This can sometimes cause issues, or is just inconvenient.
Use this option to select an alternative folder to save the results to. You can use the Browse function to select a folder or enter one directly. But note that if the folder doesn't already exist then it will be ignored.
If you do enter a folder but later wish to go back to saving to the default location, then just delete whatever is entered (i.e. leave it blank).
Unique report
When results are saved they are normally saved with the first page called 'TotalValidator.html' and the rest of the pages in a subfolder called 'Results'. So each time you run a new validation the old results are overwritten.
If you select this option the first page will be called TV<timestamp>.html and the subfolder TV<timestamp>, where <timestamp> is in the format YYYYMMDDHHMMSS. Although not guaranteed this should make the results unique so that they are not overwritten each time.
It is expected that this option will be used with the 'Save Results To' option for people wishing to keep old results.
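The naming scheme can be reproduced in Python as shown below, which may be handy if you script around the saved results. The function `report_paths` and the folder name are hypothetical; only the file names and the YYYYMMDDHHMMSS format come from the description above.

```python
from datetime import datetime
from pathlib import Path

def report_paths(folder, unique, now=None):
    """Return (first page, subfolder) for a results folder."""
    if unique:
        # YYYYMMDDHHMMSS, e.g. 20240305140709
        stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
        return Path(folder) / f"TV{stamp}.html", Path(folder) / f"TV{stamp}"
    return Path(folder) / "TotalValidator.html", Path(folder) / "Results"

page, subfolder = report_paths("results", True, datetime(2024, 3, 5, 14, 7, 9))
print(page.name)       # TV20240305140709.html
print(subfolder.name)  # TV20240305140709
```

Because the timestamp has a resolution of one second, two runs that start within the same second would produce the same names, which is why uniqueness is not guaranteed.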
Hide content
Normally the output report displays the content as well as tags on the page. However you can use this option to hide the content of the web page. Setting this option may make it easier to locate and resolve problems.
Note that if you are spell checking the content, then any content containing spelling mistakes will always be displayed, whatever the setting of this option.
Extra/Own dictionary
Use this option to enter the path to a file containing a list of your own words to supplement the standard dictionaries or to replace them.
If you select a language from the Spell check option then the dictionary you supply will be used in addition to the standard dictionary for the selected language.
If you select 'Own dictionary' from the Spell check option, then any dictionary you specify will be the only one used for performing the spell check.
In this way you could check against a language not currently supported, or simply add your own set of industry specific words for the area your website covers.
Your dictionary file must be a plain text file consisting of one word per line, with no duplicates, and must be saved using UTF-8 encoding.
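If you build your word list from another source, a small script can produce a file in the required format. The following Python sketch is one possible approach; the function name, the file name 'mywords.dic' and the sample words are all made up, and sorting the words is a convenience rather than a requirement.

```python
from pathlib import Path

def write_dictionary(words, path):
    """Write one word per line, without duplicates, encoded as UTF-8."""
    unique = sorted({w.strip() for w in words if w.strip()})
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique

words = ["validator", "doctype", "validator", " fieldset ", ""]
print(write_dictionary(words, "mywords.dic"))
# ['doctype', 'fieldset', 'validator']
```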
Spelling options
By default certain types of words are not spell checked. This includes words that are all upper case (e.g. NASA), words that are mixed case (e.g. SpellCheck), and words that contain digits (e.g. Homer6). Use the 'Check upper case', 'Check mixed case' and 'Check words with digits' options respectively to include these types of words in the spell checking.
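The three categories can be illustrated with a short Python sketch. It is a guess at reasonable definitions rather than the tool's actual rules: "mixed case" is assumed to mean an upper case letter anywhere after the first character, and the handling of words that fall into more than one category (such as 'R2D2') is not documented, so the order of the tests below is arbitrary. The function and parameter names are hypothetical.

```python
def should_spell_check(word, check_upper=False, check_mixed=False,
                       check_digits=False):
    """Decide whether a word is spell checked under the given options."""
    if any(ch.isdigit() for ch in word):
        return check_digits   # e.g. Homer6
    if word.isupper():
        return check_upper    # e.g. NASA
    if any(ch.isupper() for ch in word[1:]):
        return check_mixed    # e.g. SpellCheck (assumed definition)
    return True               # ordinary words are always checked

for word in ["NASA", "SpellCheck", "Homer6", "Hello"]:
    print(word, should_spell_check(word))
# NASA False
# SpellCheck False
# Homer6 False
# Hello True
```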
Words within attributes are not normally checked. However you can use the 'Check attributes' option to spell check text within the following displayable attributes: alt, title, summary, label, prompt, and standby.
When a word is not found in the dictionary a list of suggestions is normally presented. But with the 'Ignore suggestions' option you can suppress this list.
By default if the web page has a language set and this language is not compatible with the language chosen for spell checking, then a warning is added to the page and the spell check skipped. But with the 'Ignore web page language' option you can force the spell check to proceed whatever the language of the web page. For reference:
- American or British spell check runs when web page language set to 'en' or 'eng'
- French spell check runs when web page language set to 'fr', 'fre', or 'fra'
- Italian spell check runs when web page language set to 'it' or 'ita'
- Spanish spell check runs when web page language set to 'es', 'esl', or 'spa'
- German spell check runs when web page language set to 'de', 'deu', or 'ger'
To allow you to quickly create your own dictionary of additional words the 'Save misspelt words' option will save all of the words not found by the spell checker into a separate file. This text file may then be edited and used as an extra dictionary. Note that this file is reused each time you run a validation with this option and the result after each validation will be a sorted list of words with no duplicates. So you may wish to use a copy of this file as the extra dictionary. A file is created and used for each language and has the same name as the dictionary used for testing. It may be found within the 'dicts' subfolder next to the place where results pages are saved. For example: 'My Documents\TotalValidatorTool\dicts\ukenglish.dic.new'. Note that this file is saved using UTF-8 encoding.
Timeout
When checking for broken links, a page that doesn't respond within 20 seconds is recorded as broken. If you link to pages on particularly slow web servers, then you can use this option to increase the time Total Validator waits before recording a link as broken.
You can also reduce the time down to 10 seconds if you wish. The advantage of doing this is that the whole process may finish a lot quicker if you do have broken links, but there is a danger that some slow links will be reported as broken.
The value that you enter here must be an integer (whole number) from 10 to 120 representing the number of seconds to wait for a response for a link before reporting it as broken. If you leave it blank it will default to 20 seconds.
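The rules for this value can be summarised in a few lines of Python. This is only a sketch of the validation described above; the function name `parse_timeout` is hypothetical, and raising an error for an invalid value is an assumption about how such input might be handled.

```python
def parse_timeout(text, default=20, low=10, high=120):
    """Return the timeout in seconds for the text entered in the field."""
    text = text.strip()
    if not text:
        return default  # blank means the default of 20 seconds
    if not text.isdigit():
        raise ValueError("timeout must be a whole number")
    value = int(text)
    if not low <= value <= high:
        raise ValueError(f"timeout must be from {low} to {high}")
    return value

print(parse_timeout(""))    # 20
print(parse_timeout("45"))  # 45
```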
Concurrency
When checking for broken links, some routers cannot handle high numbers of requests. Also, some web servers cannot cope with a lot of requests for pages from the same site at the same time. This option allows you to set how many simultaneous link checks will be performed.
If you have a fast router and/or are checking a fast site then increasing this value will generally make the validation run faster. Enter 0 to remove all limits for the fastest results.
The value that you enter here must be an integer of 0 or greater. If you leave it blank it will default to 10.
Report redirects
When checking for broken links you can use this option to additionally report warnings for any links that are 'redirects' to another place.
These are reported as warnings as they are not strictly errors. But because a redirected link can often become obsolete and so broken it may be wise to replace any such links with the ones being redirected to.
Three types of redirects are reported: Permanently Moved (301), Temporarily Moved (302), and See Other (303).
Ignore errors/warnings
If you use the tool and it reports that there are 'errors' in your site that you are happy to live with, then use this option to stop them appearing. In this way you can clean up the reports produced to make them more useful to you. You could also ignore any errors/warnings that you think are errors in the tool itself, although we would prefer it if you could let us know so we can fix them so that everyone will benefit.
The value you supply must be a comma separated list of errors and/or warnings to ignore. For example:
E601, W600, E404
Once you've seen how the errors/warnings are reported we are sure you'll understand what to put in here.
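To illustrate the format of the list, the Python sketch below parses a value like the one above and filters a set of findings with it. The function names, the `(code, message)` tuples and the sample messages are all invented for this example; they do not reflect the tool's internal data or its real message texts.

```python
def parse_ignore_list(text):
    """Turn 'E601, W600, E404' into a set of codes."""
    return {code.strip().upper() for code in text.split(",") if code.strip()}

def filter_findings(findings, ignored):
    """Keep only the findings whose code is not in the ignore set."""
    return [f for f in findings if f[0] not in ignored]

ignored = parse_ignore_list("E601, W600, E404")
findings = [  # hypothetical (code, message) pairs
    ("E601", "example error"),
    ("E700", "another example error"),
    ("W600", "example warning"),
]
print(filter_findings(findings, ignored))
# [('E700', 'another example error')]
```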