GREP 7.2 — User Guide
Find Regular Expressions in Files

Program Dated 13 Jan 2003  /  Document Dated 13 Jan 2003
Copyright © 1986-2003 Stan Brown, Oak Road Systems

Summary:  GREP searches named input files, or the standard input, for lines that contain matches for one or more patterns called regular expressions and displays those matching lines. GREP can also search binary files and display records or buffers that contain matches.

This user guide provides an overview of GREP. Details of the command-line options and the use of regular expressions are in the reference manual, and a full revision history is also provided. This user guide is sometimes revised between software releases. You may want to check for revisions at <http://oakroadsystems.com/sharware/grep.htm>.

Why GREP? Why This GREP?


The DOS filter FIND is useful for finding a given string in one or more files. But what if you want to find the word "the" in caps or lower case, without also finding "other", "There", "then", and so on? You don't really want to search for a specific string. Rather, what you're looking for is a regular expression or regex, namely "the" preceded and followed by something other than a letter. GREP to the rescue!

GREP takes one or more regexes, matches them against the input files, and displays the hits.
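
As a rough illustration of the "the" search described above, here is a small sketch in Python. Python's re module is not GREP's regex engine, and the sample lines are made up; the point is only to show what "preceded and followed by something other than a letter" selects.

```python
import re

# "the" preceded and followed by something other than a letter
# (or by the start or end of the line), in caps or lower case.
WORD_THE = re.compile(r"(^|[^A-Za-z])the([^A-Za-z]|$)", re.IGNORECASE)

# Hypothetical input lines.
lines = [
    "The quick brown fox",      # "The" at the start of the line
    "Put it over there, then",  # only "there" and "then"
    "other things",             # "the" is buried inside "other"
    "is this THE one?",         # "THE" between two spaces
]

hits = [line for line in lines if WORD_THE.search(line)]
for line in hits:
    print(line)
```

Only the first and last sample lines are reported; a plain string search for "the" would have reported all four.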

Oak Road Systems GREP combines most features of UNIX grep, egrep, and fgrep. GREP has many other advantages over FIND besides using regular expressions. Indeed, customers have cited some of these as features they couldn't find in competing GREPs:


Getting Started


System Requirements

The 16-bit version, GREP16, runs under DOS 2.0 or higher, including a DOS box under Windows. The 32-bit version, GREP32, requires a DOS box under Windows 98, Windows 95, or Windows NT 4.0. (I fully expect it to run in all later versions of Windows, but have not tested it.)

The two executables operate the same and have the same features, except that you need GREP32 for long filenames, for extended regular expressions, and for character mapping. If you typically run GREP in a DOS box under Windows 9x or later or Windows NT, GREP32 is the one you want.

Installation and Demo

There is no special installation procedure. Simply move GREP16.EXE, GREP32.EXE, or both to any convenient directory in your path.

An interactive program tour is included; just type TOUR after unZIPping the archive.

You may wish to rename the executable you use more often to the simpler GREP.EXE. All the examples in this user guide will assume you've done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.

Evaluation, License, and Warranty

GREP is shareware. You are encouraged to "try before you buy" with the free download from sites like Simtel, garbo, and the Oak Road Systems site.

The unregistered version displays a three-line registration reminder when you run it. But there is no time delay and you don't have to press any extra keys.

If you use GREP past a 30-day evaluation period, you must register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.

When you register, you get the registered version with these benefits:

Uninstall

There is no special uninstall procedure; simply delete the GREP files. GREP doesn't write any secret files or modify the Windows registry.


Command Line


The basic GREP command form is

        grep [options] [regex] [inputfilespecs] 

As with any command, you can redirect or pipe inputs or output. GREP can return a useful value in ERRORLEVEL, as explained below.

For a summary of operating instructions, type

        grep /? | more 

The help text is over 100 lines long; you might prefer to redirect it to your printer or a file:

        grep /? >prn: 

regex is a regular expression; see Regular Expressions below. A regex is normally required on the command line; however, if you use the /F option, one or more regexes will be taken from a file or the keyboard instead of the command line.

Command-line options can actually appear anywhere, not just before the regex. The first thing that isn't an option is taken as the regex, and everything else that isn't an option is taken as input filespecs. All the options are processed before any files are scanned, so it doesn't matter whether a given option comes before, after, or among the filespecs.

Example:

        grep /I pic[t\s] \proj\*.cob 

will examine every COBOL source file in the PROJ directory and display every line that contains a picture clause ("pic" followed by either "t" or a space) in caps or lower case (the /I option).

        grep /I pic[t\s] \*.cob /S 

will examine every COBOL source file in all directories on the current disk (the /S option).


Inputs


GREP scans either named input files or the standard input. The standard input in turn can be a named file, a pipe, or the keyboard. Thus GREP can take its input from any one of these four sources.

If you name input filespecs on the command line, GREP will take its input from those files. The second section below tells you how GREP handles named input files.

Standard Input and Redirection

If you don't specify any named input files, GREP will take its input from the standard input. That can mean any of these three sources: a file redirected with <, the output of another command through a pipe (|), or the keyboard.

GREP actually can have up to three types of file inputs: regular expressions (/F option), lines to be scanned for matches, and a list of files to scan for matches (/@ option). Any of the three can come from standard input (depending on options), and standard input could be from the keyboard, piped, or redirected. Beginning with release 7.0, when GREP is waiting for keyboard input it will prompt you for the specific type it is expecting.

Example:

        grep /F- inputfilespecs 

tells GREP to read one or more regexes from the keyboard, rather than take a regex from the command line. GREP will prompt you with "regex:" for each regex, then after you've entered your regex(es) it will read the named input filespecs and match them against the regex(es) you typed.

For another example of redirection, please see the /@ option in the reference manual.

Named Input Files

Named input files provide the greatest flexibility. They can be read as text or binary, and you can search subdirectory trees.

GREP will expand any wildcards in input filespecs. Not only DOS-style * and ?, but UNIX-style [...] can be used. For instance, "c:\My Documents\[abc]*doc" tells GREP to examine any file in the indicated directory that starts with A, B, or C and ends with DOC. Please see "Input Filespecs" in the reference manual for complete rules.
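
To see which names a pattern like [abc]*doc selects, here is a minimal sketch using Python's fnmatch module. This is an approximation only: Python's wildcard rules are not guaranteed to agree with GREP's in every detail (see "Input Filespecs" in the reference manual for those), and the file names are hypothetical.

```python
import fnmatch

def matches_filespec(filename, pattern):
    """Case-insensitive wildcard test, since DOS file names ignore case."""
    return fnmatch.fnmatchcase(filename.lower(), pattern.lower())

pattern = "[abc]*doc"
# Hypothetical directory contents.
names = ["Budget.doc", "agenda.DOC", "contract.txt", "draft.doc", "Abcdoc"]

selected = [name for name in names if matches_filespec(name, pattern)]
print(selected)
```

The names starting with A, B, or C and ending with DOC are selected; contract.txt fails on the ending and draft.doc fails on the first letter.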

You can also use the /X option to exclude some files or groups of files from consideration. For instance, if you want all 2001 reports except December, you might specify something like

        grep [options] [regex] *2001* -x*dec2001* 

If you have many input filespecs, you may want to store them in a file; see the /@ option.

GREP32 will use long filenames; GREP16 will use short filenames.

Subdirectory Searches

If you set the /S option, GREP will search not only the filespecs indicated on the command line, but also the same-named files in subdirectories.

For example, with the command

        grep /S regex \hazax* *.c g:\mumble\*.htm 

GREP will examine all files on the entire current drive whose names start with hazax; then it will look at all C source files in the current directory and all subdirectories under it; finally it will look at all HTML files in directory g:\mumble and all subdirectories under it.

Perhaps a more realistic example: you have a document about Vandelay Industries somewhere on your disk, but you can't remember where. You can find it this way:

        grep Vandelay /S \*
or:     grep Vandelay /S \*.* 

(Both * and *.* select all files.) You might want to add the /I option if you can't remember how "Vandelay" was capitalized.

Subdirectory search follows the normal file-searching rules: hidden and system subdirectories are normally ignored. (Yes, you have them if you have Windows 9x.) The /A option also applies during subdirectory search: with /S and /A together, GREP will search every subdirectory. There's no way to search every subdirectory but only normal files, or to search only normal subdirectories but to search for hidden files in them.

You may want to know in what order GREP examines files when the /S option is set. (If not, skip this paragraph and the next.) Ordinarily, GREP examines all files in the first file argument, including the subdirectory tree, then proceeds to the second file argument, and so on. However, when you use the /S option and none of the file arguments contains a path, GREP will look first for all those files in the current directory, then for all of them in the first subdirectory, and so on.

If you specify a list of input files with the /@ option, GREP will process the first filespec in that list and all subdirectories, then process the second filespec and subdirectories, and so on. When the /@ list file is exhausted, GREP will go on to process any filespecs on the command line, in the order given in the preceding paragraph.

(The /S option is fully functional in the registered version, and will search all the way to the bottom of a directory tree. In the unregistered version, GREP will search the named or implied directories and all directories immediately below them, but no further in any one execution. You can either make multiple runs, or register GREP for the convenience of searching the entire directory tree.)

The /D option will show you every directory and wildcard search as GREP performs it. The output also contains lots of other stuff, but the records of file visits all contain the string "GX:".

Binary Files and Text Files

GREP was originally written with plain text files in mind, but you can also use it quite well with binary files.

What's the difference between text and binary modes?

DOS doesn't mark a file as text or binary; the program that reads the file just has to know. GREP "knows" files are binary when you tell it via the /R2 or /R3 option; otherwise it treats input files as text. If GREP reads a file in text mode but the file is actually binary, some matches may be missed. It's important, therefore, to scan binary files in binary mode.
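
One reason matches go missing is that Control-Z (ASCII 26) marks the end of a file read as text. The following Python sketch imitates that rule on made-up data; it is an illustration of the effect, not GREP's actual reading code.

```python
import re

CTRL_Z = b"\x1a"  # ASCII 26

def visible_in_text_mode(data):
    """Return what a text-mode reader sees: everything before the first Control-Z."""
    return data.split(CTRL_Z, 1)[0]

# Hypothetical binary content with a Control-Z ahead of the text we want.
data = b"header\x00\x1a\x02binary stuff Vandelay Industries more bytes"
pattern = re.compile(rb"Vandelay")

found_as_text = pattern.search(visible_in_text_mode(data)) is not None
found_as_binary = pattern.search(data) is not None
print(found_as_text, found_as_binary)
```

Read as text, the data appears to end at the Control-Z and the match is missed; read as binary, the match is found.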

Registered users can use the /R-1 or /R-2 option to have GREP examine each file and decide whether it's text or free-form binary; I recommend /R-1. Please see the /R option for details on how GREP decides.

Here's a comparison of the three ways GREP can read input files.

    line-oriented text (/R0)
    record-oriented binary (/R2)
    free-form binary (/R3)

(/R0) The file is read a line at a time. Any line bigger than the /W option value is read in chunks with each chunk treated as a line.
(/R2) The file is read a record at a time; the record length is given by the /W option.
(/R3) The file is read in overlapping half-buffers. The /W option gives the buffer size; see that option description for recommended buffer size.

(/R0) A line ends with a carriage return or line feed (ASCII 13 or 10) or both.
(/R2, /R3) ASCII 13 and 10 have no special meaning.

(/R0) Control-Z (ASCII 26) marks the end of file.
(/R2, /R3) The file length is given by the directory entry. Control-Z is just another character.

(/R0, /R2) The regex characters ^ and $ mean the start and end of a line or record.
(/R3) The characters ^ and $ in an extended regex match a newline (ASCII 10). In a basic regex they don't match anything useful.

(/R0, /R2) The /V option looks for lines or records that don't contain a match.
(/R3) The /V option makes no sense with free-form binary processing, unless you use it with the /L option to report files that contain no matches to the regex at all.
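
Why overlapping half-buffers? A match that straddles the boundary between two separate buffers would be missed. The Python sketch below shows the general idea on made-up data. The window arithmetic is an assumption for illustration, not a description of GREP's internals, and both function names are hypothetical.

```python
import re

def scan_disjoint(data, pattern, buffer_size):
    """Scan separate, non-overlapping buffers; boundary-straddling matches are lost."""
    offsets = []
    for start in range(0, len(data), buffer_size):
        for m in pattern.finditer(data[start:start + buffer_size]):
            offsets.append(start + m.start())
    return offsets

def scan_overlapping(data, pattern, buffer_size):
    """Scan windows of buffer_size bytes that advance by half a buffer each time."""
    half = buffer_size // 2
    found = set()  # a match inside the overlap is seen twice; keep it once
    for start in range(0, max(len(data) - half, 1), half):
        for m in pattern.finditer(data[start:start + buffer_size]):
            found.add(start + m.start())
    return sorted(found)

# Hypothetical data: "NEEDLE" occupies bytes 6-11, across the boundary at byte 8.
data = b"x" * 6 + b"NEEDLE" + b"y" * 8
pattern = re.compile(rb"NEEDLE")

print(scan_disjoint(data, pattern, 8))
print(scan_overlapping(data, pattern, 8))
```

The separate-buffer scan reports nothing, while the overlapping scan reports the match at byte 6 (counting the first byte as 0).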

The file format not only affects how the file is read (above), but it also affects how hits are displayed:

    line-oriented text (/R0)
    record-oriented binary (/R2)
    free-form binary (/R3)

(/R0, /R2) When a match is found, the matching line or record is displayed, unless you used the /C option, /J option, or /L option.
(/R3) The /C option, /J option, or /L option is strongly recommended. But if you don't use any of them, then when a match is found, GREP displays the buffer that contains it.

(/R0, /R2) With the /N option, GREP displays the line or record number with each hit.
(/R3) With the /N option, GREP displays the starting byte number with each hit. The first byte in the file is numbered 0.

(/R0) Matching lines are output as character streams. GREP doesn't check for control characters like form feed (ASCII 12) and backspace (ASCII 8); if they are output to the terminal, output may be formatted strangely.
(/R2, /R3) Printable characters are displayed normally, and non-printable characters are displayed by their hexadecimal values, such as <1A> for Control-Z (ASCII 26, or 1A hex). GREP16 considers characters 0-31 and 127-255 as non-printable characters; in GREP32 that is the default but you can change it by setting a character mapping with the /M option.

(/R0, /R2) The /P option specifies how many lines or records from the file to display before and after each hit.
(/R3) The /P option is ignored.

Outputs


Normally, GREP will display hits on your screen. "Hits" are the text lines, binary records, or binary buffers that contain matches for the regex(es). As part of the output, GREP will display the filespec (path and name) as a header above the group of hits from that file. You can use various options to display abbreviated or expanded forms of hits or to suppress those headers, move them to the lines with the hits, or display headers even for files that had no hits.

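You can also redirect GREP's output into a file or pipe GREP's output to another command (perhaps another GREP command). To redirect GREP output, follow the DOS rules and put one of these at the end of the GREP command line: >file to write the output to a file, >>file to append the output to a file, or |command to pipe the output to another command.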

You can pipe or redirect output regardless of whether input was piped or redirected.

Only the hits (and filespec headers, if present) are redirected by the above syntax. Errors and warning messages are still sent to the standard error stream. That is usually your screen, though some OSes or shell replacements let you redirect error output. For example, in 4DOS and 4NT type "help piping" or "help redirection" (without quotes) for information.

The /D option lets you create extra debugging output and send it to a named file or the standard error output.


Options


The reference manual describes the options in detail. Here's a one-line summary of what each option does:

Opt  UNIX grep *   DOS FIND *  Effect
 ?   --help        /?          Display help for filespecs, regexes, and options.
 @                             Take input filespecs from keyboard or file.
 A                             Include hidden and system files when expanding wildcards.
 B                             Display a header for every file, even if it contains no hits.
 C   -c            /C          Display the hit count, not the actual hits.
 D                             Display debugging output.
 E   (-E), (-w)                Select extended regular expressions or strings, or search for a word.
 F   (-f)                      Read regexes from keyboard or file.
 H   -h                        Don't display headers (filespecs) in output.
 I   -i            /I          Ignore case when matching.
 J                             Display just the part of each line that matches the regex.
 K                             Report only the first few hits.
 L   -l                        List the files that contain hits, not the actual hits.
 M                             Specify character mapping and define "word".
 N   -n            /N          Show line numbers with hits.
 P   (-A, -B, -C)              Show context lines around matching lines.
 Q   (-s)                      Suppress program logo and some or all warnings.
 R   -U, (-a)                  Read and display input files as binary or text.
 S   -r                        Scan files in subdirectories too.
 U                             UNIX-style output: show filespec with each hit.
 V   -v            /V          Display lines that don't contain a match.
 W                             Specify line width or binary block length.
 X   -x                        Exclude matching files from scan.
 Y                             Multiple regular expressions must all match.
 Z                             Reset all options.
 0                             Set ERRORLEVEL = 0 if any hits were found.
 1   (-v)                      Set ERRORLEVEL = 1 if any hits were found.

* UNIX grep options are case sensitive; GREP and FIND options are not.
(An option is shown in parentheses if the GREP option's effect is similar but not identical.)

How to Specify Options

On the command line, options can appear anywhere, before or after the regex and the input filespecs. All options are processed before any files are read.

You have a lot of freedom about how you enter options: use a leading hyphen or slash, use upper- or lower-case letters, and leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the /P3 option and /B option:

        /p3 -b    /b/P3    /p3B    -B/P3    -P3 -b 

This user guide will always use capital letters for the options, to make it easier to distinguish letter l and figure 1.

For clarity, you should always use a hyphen or slash before the numeric /0 option or /1 option. Example: /E0 means the /E option with a value of 0, but /E/0 means the /E option with no value specified, followed by the /0 option.

Environment Variable

Registered users who use certain options frequently can put them in the ORS_GREP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

Example: If you prefer to have GREP sense the type of each file (/R-1 option) and you prefer UNIX-style output (/U option) with line numbers (/N option), then you want to set the environment variable as

        set ORS_GREP=/R-1UN 

The reference manual gives more information about the environment variable, including instructions for overriding a particular stored option on the command line.


Regular Expressions (Regexes)


A regular expression or regex is a pattern of characters that will be compared to lines, records, or buffers from one or more input files. GREP reports a hit if the input contains a match with the pattern in the regex.

A regex can be a simple text string, like mother, or something more complex. (If you want to search only for simple strings, use the /E0 option and ignore all this regex stuff.)

Regexes by Example

Example 1: If you want both the English and the American spellings of the word grey/gray, use gr[ea]y as your regex. (See Example 5 for colour/color.)

Example 2: The basic regex for any word starting with "moth" is moth[a-z]*, which is the letters "moth" followed by any number of letters a through z. Yes, that regex does match "moth" itself: see * or + for Repetition in the reference manual.

Example 3: A word in double quotes would be matched by "[a-z]+". Read that regex as "a double quote mark, followed by one or more letters, followed by another double quote mark."

Example 4: A U.S. local telephone number has the basic regex

        [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] 

That is three digits, followed by a hyphen, followed by four digits. (You could express it more simply with an extended regex: [0-9]{3}-[0-9]{4} or even \d{3}-\d{4}.)

Example 5: To get the American and English spellings of color/colour is easy with GREP32: specify an extended regex (/E2 option) colou?r. GREP16 doesn't support extended regexes, so you could either use colou*r (which would also match the non-words colouur, colouuuuur, etc.), or else use the /F- option and enter color and colour as two regexes.
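
The five examples can be tried out with the short Python sketch below. Python's re module is a Perl-style engine, so it is closest to GREP's extended regexes, though the basic examples mean the same thing in it. Note one deliberate difference: GREP reports a line if the regex matches anywhere in it, whereas the helper here (a hypothetical name, whole) tests a single word from end to end.

```python
import re

def whole(regex, text):
    """True if the regex matches the entire text, not just part of it."""
    return re.fullmatch(regex, text) is not None

# Example 1: either spelling of grey/gray, but nothing else.
ex1 = (whole(r"gr[ea]y", "grey"), whole(r"gr[ea]y", "gray"), whole(r"gr[ea]y", "groy"))

# Example 2: * allows zero letters, so "moth" itself matches.
ex2 = (whole(r"moth[a-z]*", "moth"), whole(r"moth[a-z]*", "mother"))

# Example 3: + needs at least one letter between the quote marks.
ex3 = (whole(r'"[a-z]+"', '"word"'), whole(r'"[a-z]+"', '""'))

# Example 4: the long form and both extended short forms agree.
phone = "555-1212"
ex4 = (
    whole(r"[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]", phone),
    whole(r"[0-9]{3}-[0-9]{4}", phone),
    whole(r"\d{3}-\d{4}", phone),
)

# Example 5: u? rejects the non-word "colouur", while u* accepts it.
ex5 = (whole(r"colou?r", "color"), whole(r"colou?r", "colour"),
       whole(r"colou?r", "colouur"), whole(r"colou*r", "colouur"))

for result in (ex1, ex2, ex3, ex4, ex5):
    print(result)
```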

Regex Language Summary

A regex, then, is essentially a string of characters with a bunch of operators thrown in to express possibilities like "any of these characters" and "repeated". Here's a quick summary of the characters that have special meaning in a regex. The reference manual has a full description of each.

Char   Name                  Which regexes?  Description

Characters with special meaning outside square brackets:
\      backslash             any             treat any of the listed special characters as normal
\      backslash             extended        (1) character types like \w for a word character;
                                             (2) simple assertions like \b for a word boundary;
                                             (3) back references to parenthesized subexpressions;
                                             (4) character encoding for odd characters like \x3c for <
.      period                any             matches any character
*      asterisk              any             matches 0 or more occurrences of the preceding
+      plus sign             any             matches 1 or more occurrences of the preceding
?      question mark         extended        matches 0 or 1 occurrence of the preceding
{      left brace            extended        repetition count, e.g. {3,} for three or more occurrences of the preceding
[      left square bracket   any             start a character class, e.g. [abcde] to match any one of a, b, c, d, e
^      caret                 any             match start of line in text mode or start of record in binary mode
$      dollar sign           any             match end of line in text mode or end of record in binary mode
|      vertical bar          extended        alternatives, e.g. mother|father to match "mother" or "father"
(...)  parentheses or        extended        subexpressions, e.g. (&nbsp;)+ to match one or more occurrences of "&nbsp;"
       round brackets

Characters with special meaning inside square brackets:
]      right square bracket  any             end the character class
-      minus sign or hyphen  any             character range, e.g. [a-z] to match any lower-case English letter
^      caret                 any             negate the character class, e.g. [^a-z] to match any character except a lower-case English letter
\      backslash             any             treat the next character as normal
\      backslash             extended        character encoding
[:     left square bracket   extended        introduce a named character class, e.g. [[:punct:]0-9] for any punctuation character or a digit
       followed by colon


Return Values (ERRORLEVEL)


GREP returns a status number to DOS, and you can test the return value with IF ERRORLEVEL in a batch file. (In 4DOS, %? gets you the errorlevel on the command line, not just in a batch file.)

If you don't specify the /0 or /1 option, GREP returns one of these values:

255    You specified a bad option in the environment variable or on the command line, specified a bad regex, or made some other error.
254    Your specified file for the /F option or /@ option is not available, or a file-system error occurred while reading either of those or any input file.
253    There was insufficient memory for GREP to run with the options selected. For what you can do if this occurs, see "insufficient memory" in the list of messages in the reference manual.
128    GREP made an error in expanding a regex. Please report this to Oak Road Systems.
4      You listed one or more input filespecs, but none of them matched any existing files.
2      You requested the help message with the /? option.
0      The program read at least one input file and ran to completion, whether or not there were any hits.
 
You might want to use GREP in a batch file or a makefile and take different actions depending on whether hits were found or not. To do this, use the /0 or /1 option; each tells GREP what to return in ERRORLEVEL if any hits were found.
 
The /1 option tells GREP to return these values of ERRORLEVEL:
0      At least one file (or standard input) was read, but no hits were found in any file.
1      One or more hits were found in at least one file.
2-255  (as above)
 
The /0 option is the opposite: it returns these ERRORLEVEL values:
0      One or more hits were found in at least one file.
1      At least one file (or standard input) was read, but no hits were found in any file.
2-255  (as above)
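
The three tables above can be summarized in a few lines of Python. This is only a sketch of the decision logic for a script that has already obtained GREP's return value; the function name describe_status and the wording of the messages are invented for the illustration.

```python
# Return values that mean the same thing with or without the /0 or /1 option.
ERRORS = {
    255: "bad option, bad regex, or other usage error",
    254: "file for /F or /@ not available, or file-system error",
    253: "insufficient memory",
    128: "GREP made an error in expanding a regex",
    4: "no input filespec matched an existing file",
    2: "help message requested with /?",
}

def describe_status(code, hits_option=None):
    """Describe a GREP return value.

    hits_option is None (neither option), "0" (the /0 option), or "1" (the /1 option).
    """
    if hits_option is None:
        if code == 0:
            return "ran to completion, with or without hits"
    else:
        hit_code = int(hits_option)  # value returned when hits were found
        if code == hit_code:
            return "hits found"
        if code == 1 - hit_code:
            return "no hits found"
    return ERRORS.get(code, "unknown status")

print(describe_status(0))
print(describe_status(1, "1"))
print(describe_status(1, "0"))
print(describe_status(4, "1"))
```

In a batch file the same decisions are made with IF ERRORLEVEL, as mentioned at the start of this section.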


Limitations


GREP16 is limited by its 64 KB data segment. You may run into trouble if you use large values for both the /W option and the before number of the /P option.

For basic regexes, GREP is limited to 127 characters compiled into no more than 511. The "compiled" basic regex is GREP's internal representation, after character ranges have been expanded and so on.

For extended regexes, the maximum compiled size is 65,539 (sic) bytes. There can be no more than 65,536 capturing subpatterns, and all kinds of subpatterns can be nested no more than 200 levels deep.


Troubleshooting and How-to


Please share any questions that had you scratching your head. They'll be added to a future version of this user guide, space permitting.

Regex Matching Problems

  1. GREP is missing matches in my Word or WordPerfect files, even though I know they're in there!

    Binary files, including most word-processing files, may contain ASCII 26 (Control-Z) characters. These have no special meaning in a binary file but signal the end of a file being read as text. To read such files, use the /R3 option. Better yet, if you register GREP you can use the /R-1 or /R-2 option and let GREP figure out the type of each file automatically.

  2. How do I search for a word? For example, how do I get "plain" without also getting lines with "explain", "plains", etc.?

    GREP searches for lines that contain the string of characters represented by your regex. If you want that string of characters only when it's a whole word, you have to tell GREP.

    With GREP32, the /E4 option makes this task easy. For example,

            grep plain /e4 file1 file2 

    will find "plain" as a word. Note that the definition of "word" includes letters, digits, and the underscore. For searching most text that doesn't matter, but if your input contains something like "plain55" you might want to define "word" to be just letters, or to be any printing character. See the /M option.

    With GREP16, the task can still be done but it's less convenient. For techniques to find a single word with basic regexes, please see "Finding a Word" in the reference manual.

  3. How do I find all lines that contain "this" but not "that"?

    Use GREP as a filter and execute it twice, the first time to find all lines that contain "this" and the second time with the /V option to filter out any lines that contain "that":

            grep "this" files... | grep /v "that" 

  4. How do I find all files that contain "this" and "that"?

    If you want "this" and "that" on the same line and in that order, use the regex this.*that on the command line.

    If you want files that contain "this" and "that" on the same line in either order, use the /F option to enter the two regexes and the /Y option to make the AND condition.

    To find files that contain "this" and "that" anywhere in the same file, not just on the same line, use two grep calls connected with the "|" pipe character. You'll find an example with the /@ option in the reference manual.

  5. I've got a bunch of backslashes in my regex, and I don't think GREP is interpreting it the way I want.

    You can use the /D option to reveal what GREP is doing with your regex. The output can be voluminous, but you can cut it down to size. Repeat your command with this added at the end:

            /D-|grep "grep GX:" 

    You'll see only the interpretation of the regex.

    If the displayed original regex is different from what you typed, then either DOS or the Microsoft 32-bit startup code has altered some of your characters. Use the /F- option and enter your regex from the keyboard, or see Special Rules for the Command Line in the reference manual.

    If you see a line about a "massaged" regex, you're probably running afoul of the Special Rules for the Command Line. Try entering your regex from keyboard or file with the /F option.

    Other possibilities: check whether you entered extended regex characters but didn't specify the /E2 option to tell GREP you're using extended regexes.

  6. I'm trying to GREP for a character like (, ?, or {, but it doesn't work.

    These have special meanings in extended regular expressions but not in basic regexes. Make sure you have not turned on extended regexes; or use a backslash \ to make GREP match them as normal characters.

  7. GREPping on a word boundary with \< and \> doesn't work.
    or: My subpattern with \( doesn't work!
    or: \| doesn't work for alternatives!

    With extended regular expressions (/E2 option), GREP uses Perl-style regexes: \b for a word boundary, ( ) for subexpressions, and plain | for alternatives.

    With basic regular expressions (/E1 option, or no /E option), a word boundary can't be used directly. However, you can still search for whole words; see "Finding a Word" in the reference manual.

  8. \w, [:alpha:], and similar only take account of English letters. I need to work with 8-bit letters.

    In GREP32, use the /M option to select an appropriate character mapping. In GREP16, your only choice is to code the extra letters explicitly as shown in the character range example.

  9. I used the -w option to find a word, but it didn't work.

    GREP32 uses the /E4 option to search for a regex as a stand-alone word. GREP16 users need to use the techniques shown in "Finding a Word" in the reference manual.

  10. When I enter a character like é in my regex, the search doesn't seem to work.

    This is a problem (in GREP32 only) with how Microsoft's startup code processes the command line. Here are three ways to get around this problem:

     

General Problems

  1. I registered GREP, but it's still prompting me to register.

    The registered and unregistered versions are two separate executables. You need to delete the unregistered executables and unzip the registered version that you downloaded.

  2. What does this error or warning message mean?

    A section of the reference manual lists and explains the messages displayed by GREP.

  3. I got the message "insufficient memory".

    For what you can do if this occurs, see "grep: insufficient memory" in the list of messages in the reference manual.

  4. I put * on the command line, but 16-bit GREP searched every file.

    This is a change between releases 6.9 and 7.0. GREP16 and GREP32 now follow identical wildcard rules, and "*" now means "all files" in GREP16 as it always has in GREP32. If you want files with no extension, "*." will do the trick.

  5. I typed my GREP command and hit the Enter key, and it just sat there.

    Did GREP prompt you for keyboard input? You can halt it by pressing Control-Z then Enter.

    Are you piping GREP output ( | ) to MORE or another command? No output will appear until GREP has scanned all the files and the second command has done its work.

    Is the disk light on your computer flashing? GREP is reading lots of input but not finding any hits.

    Did you enter an extended regex with the | character? DOS interprets that character as a pipe, so it's waiting for GREP to finish and then DOS will run GREP's output through the "second command". Press Control-Z to end GREP. Some systems, like 4DOS, will accept the | if you enclose the whole regex in double quotes " ". Otherwise, use the /F- option and enter your regex from the keyboard; or see Backslash for Character Encoding (extended regex) or Special Rules for the Command Line in the reference manual.

  6. I used the -r option, but GREP won't scan files in subdirectories.

    You need the -s option for subdirectories, not the -r option. GREP diverges from UNIX in this respect.

