TJOIN v2.10 - join two related data tables
Documentation revised 21 Oct 00 - Copyright (c) 1996-2000 by Rune Berg. TextTools Freeware.
Usage
tjoin [log logfile] [options] [infile] and infile2 [to outfile] [$i=$j ...]
See Understanding The Usage Section for details.
Description
tjoin prints, to outfile, the join of the tables in infile and infile2, optionally using predicates to restrict output.
infile and infile2 are ASCII text files. tjoin treats each input line as a row of fields, separated by whitespace by default (see Options); the documentation for tcols describes this in more detail.
tjoin ignores empty (whitespace only) input lines.
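As a rough illustration of the row model described above, the following Python sketch splits text into rows of fields and skips whitespace-only lines. The name parse_rows and its sep parameter are hypothetical and not part of tjoin; the sketch assumes plain splitting and does not reproduce the finer rules documented for tcols.

```python
def parse_rows(text, sep=None):
    """Split text into rows of fields, skipping whitespace-only lines.

    sep=None splits on runs of whitespace (the default described above).
    A single character loosely stands in for the -iC / -aC options
    (an assumption; tjoin's exact separator handling may differ).
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # empty (whitespace-only) lines are ignored
        rows.append(line.split(sep))
    return rows


sample = "john tennis\njohn golf\n\n   \ntim surfing\nal tennis\n"
rows = parse_rows(sample)
print(len(rows))  # 4
print(rows[2])    # ['tim', 'surfing']
```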
Predicates of form $i=$j (where i and j are numbers in the range 1..400) restrict output to the cases where the i'th field in infile compares equal to the j'th field in infile2. tjoin compares fields according to the following rules:
If you don't specify infile, tjoin reads from standard input.
If you don't specify outfile, tjoin writes to standard output.
If you don't specify logfile, tjoin writes error messages to standard error.
tjoin holds infile in memory while reading infile2, so you may want to specify the smaller input file as infile.
The output from tjoin has the following form:

    infile-row-1 infile2-row-1
    infile-row-2 infile2-row-1
    infile-row-3 infile2-row-1
    ...
    infile-row-1 infile2-row-2
    infile-row-2 infile2-row-2
    infile-row-3 infile2-row-2
    ...
    ...

The output rows are not sorted by tjoin; they retain their original order (except unjoined rows from infile when doing a left outer join).
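The output order above can be pictured as a nested loop in which infile2 drives the outer loop and the in-memory infile rows drive the inner loop. The Python sketch below only illustrates that ordering; cross_join is a hypothetical name, and nothing here reflects how tjoin is actually implemented.

```python
def cross_join(infile_rows, infile2_rows):
    """Pair every infile row with every infile2 row, infile2-major."""
    out = []
    for row2 in infile2_rows:      # infile2 is read row by row (outer loop)
        for row1 in infile_rows:   # infile is held in memory (inner loop)
            out.append(row1 + row2)
    return out


pairs = cross_join([["a1"], ["a2"], ["a3"]], [["b1"], ["b2"]])
for row in pairs:
    print(" ".join(row))
# a1 b1
# a2 b1
# a3 b1
# a1 b2
# a2 b2
# a3 b2
```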
Example
For example, consider the file "boys" containing the table:

    john tennis
    john golf
    tim surfing
    al tennis

and the file "girls" containing the table:

    sue golf
    mary rowing
    lisa tennis

The command:
tjoin boys and girls
produces the output below, which contains every possible pairing of rows from the two files:

    john tennis sue golf
    john golf sue golf
    tim surfing sue golf
    al tennis sue golf
    john tennis mary rowing
    john golf mary rowing
    tim surfing mary rowing
    al tennis mary rowing
    john tennis lisa tennis
    john golf lisa tennis
    tim surfing lisa tennis
    al tennis lisa tennis

For example, to find sports partners, use the command:
tjoin boys and girls $2=$2
to produce the output below, which contains every pairing of rows whose second fields are equal:

    john golf sue golf
    john tennis lisa tennis
    al tennis lisa tennis

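To make the effect of a predicate concrete, here is a minimal Python sketch of the $2=$2 join on the "boys" and "girls" tables. The function join and the (i, j) tuple encoding of a $i=$j predicate are hypothetical, and fields are compared with plain string equality, which is an assumption: tjoin's own comparison rules are not modelled here.

```python
def join(infile_rows, infile2_rows, predicates):
    """Keep row pairs for which every (i, j) predicate holds.

    predicates holds 1-based (i, j) pairs mirroring $i=$j.
    Comparison is plain string equality (an assumption).
    """
    out = []
    for row2 in infile2_rows:
        for row1 in infile_rows:
            if all(row1[i - 1] == row2[j - 1] for i, j in predicates):
                out.append(row1 + row2)
    return out


boys = [["john", "tennis"], ["john", "golf"], ["tim", "surfing"], ["al", "tennis"]]
girls = [["sue", "golf"], ["mary", "rowing"], ["lisa", "tennis"]]

for row in join(boys, girls, [(2, 2)]):
    print(" ".join(row))
# john golf sue golf
# john tennis lisa tennis
# al tennis lisa tennis
```

With an empty predicate list, the sketch degenerates to the full pairing of rows, matching the behaviour of the command without predicates.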
Options
tjoin recognizes the following command line options:
Option | Function |
---|---|
-iC | Separate fields in infile by character C (except \). Use \t to form a tab. |
-csvi | Do CSV (comma separated values) style parsing of input fields from infile. Unless the -iC option is given, use a comma as the field separator. |
-aC | Separate fields in infile2 by character C (except \). Use \t to form a tab. |
-csva | Do CSV (comma separated values) style parsing of input fields from infile2. Unless the -aC option is given, use a comma as the field separator. |
-oS | Separate output fields by string S, instead of the default tab character. Use \t to form a tab, \\ to form a backslash. -o recognizes no other escaped characters. |
-csvo | Print output fields CSV (comma separated values) style. Unless the -oS option is given, use a comma as the field separator. |
-fppN | Use floating-point precision N (0..15, default 6) decimal digits for comparisons/output. See separate discussion on floating point numbers for more details. |
-jol | Left outer join. |
-jor | Right outer join. |
-jodS | Use string S as default value in outer joins. Use \t to form a tab, \\ to form a backslash. This option recognizes no other escaped characters. If this option is not used, tjoin uses the string DEFAULT. |
-r | Print a one-line report to standard error (or logfile, if given) after processing. This option has no effect if processing is aborted due to an error. |
-v | Print version banner and usage info to standard error (or logfile, if given), then exit. |
Outer Joins
If you're using predicates to restrict output, but still want to make sure that every input row appears at least once on the output, an "outer join" is the way to go.
A left outer join forces all rows in infile to appear at least once in the output. Unjoined infile rows get printed followed by the output separator and the default value string (which you set using the -jod option).
For example, returning to the example files above, the command:
tjoin -jol -jod=UNJOINED boys and girls $2=$2
produces the output below:

    john golf sue golf
    john tennis lisa tennis
    al tennis lisa tennis
    tim surfing UNJOINED

Note that the unjoined row(s) from infile will always appear as the last output line(s), but in their original relative order.
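The left outer join behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not tjoin's implementation: left_outer_join is a hypothetical name, predicates are encoded as 1-based (i, j) pairs, and fields are compared as plain strings.

```python
def left_outer_join(infile_rows, infile2_rows, predicates, default="DEFAULT"):
    """Join rows, then append each unjoined infile row plus the default string."""
    out = []
    joined = [False] * len(infile_rows)
    for row2 in infile2_rows:
        for k, row1 in enumerate(infile_rows):
            if all(row1[i - 1] == row2[j - 1] for i, j in predicates):
                joined[k] = True
                out.append(row1 + row2)
    # Unjoined infile rows come last, in their original relative order.
    for k, row1 in enumerate(infile_rows):
        if not joined[k]:
            out.append(row1 + [default])
    return out


boys = [["john", "tennis"], ["john", "golf"], ["tim", "surfing"], ["al", "tennis"]]
girls = [["sue", "golf"], ["mary", "rowing"], ["lisa", "tennis"]]

for row in left_outer_join(boys, girls, [(2, 2)], default="UNJOINED"):
    print(" ".join(row))
# john golf sue golf
# john tennis lisa tennis
# al tennis lisa tennis
# tim surfing UNJOINED
```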
A right outer join forces all rows in infile2 to appear at least once in the output. Unjoined infile2 rows get printed preceded by the default value string and the output separator.
For example, returning to the example files above, the command:
tjoin -jor -jod=UNJOINED boys and girls $2=$2
produces the output below:

    john golf sue golf
    UNJOINED mary rowing
    john tennis lisa tennis
    al tennis lisa tennis

Note that the unjoined row(s) from infile2 will always appear in the same position(s) as if they were joined rows, and in their original relative order.
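The positional behaviour of the right outer join can be sketched in Python: each infile2 row is handled in turn, and a row with no match is emitted immediately, prefixed by the default string. As before, right_outer_join is a hypothetical name, predicates are 1-based (i, j) pairs, and string equality is an assumed stand-in for tjoin's comparison rules.

```python
def right_outer_join(infile_rows, infile2_rows, predicates, default="DEFAULT"):
    """Join rows; an unjoined infile2 row is emitted in place, after the default."""
    out = []
    for row2 in infile2_rows:
        matches = [
            row1 + row2
            for row1 in infile_rows
            if all(row1[i - 1] == row2[j - 1] for i, j in predicates)
        ]
        # No match: keep the infile2 row at its natural position.
        out.extend(matches if matches else [[default] + row2])
    return out


boys = [["john", "tennis"], ["john", "golf"], ["tim", "surfing"], ["al", "tennis"]]
girls = [["sue", "golf"], ["mary", "rowing"], ["lisa", "tennis"]]

for row in right_outer_join(boys, girls, [(2, 2)], default="UNJOINED"):
    print(" ".join(row))
# john golf sue golf
# UNJOINED mary rowing
# john tennis lisa tennis
# al tennis lisa tennis
```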
It's perfectly legal to combine left and right outer joins, as in the command:
tjoin -jol -jor -jod=UNJOINED boys and girls $2=$2
which produces output such as:

    john golf sue golf
    UNJOINED mary rowing
    john tennis lisa tennis
    al tennis lisa tennis
    tim surfing UNJOINED

Limitations
tjoin runs out of memory if infile is too large.
See also TextTools General Features
Return Codes
tjoin returns with one of the following codes ("error levels"):
Code | Meaning |
---|---|
0 | Success |
101 | Out of memory |
102 | Incorrect/missing command line arguments |
104 | Error opening file |
105 | I/O Error |
106 | Capacity overrun |
107 | File name clash |
109 | Too few fields to satisfy predicate |
For more details, see TextTools General Features.
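A script that calls tjoin may want to turn these codes into readable messages. The Python sketch below simply restates the table above as a lookup; the names TJOIN_RETURN_CODES and describe_return_code are hypothetical, and codes not listed in the table are reported as unlisted rather than guessed at.

```python
# Return codes copied from the table above.
TJOIN_RETURN_CODES = {
    0: "Success",
    101: "Out of memory",
    102: "Incorrect/missing command line arguments",
    104: "Error opening file",
    105: "I/O Error",
    106: "Capacity overrun",
    107: "File name clash",
    109: "Too few fields to satisfy predicate",
}


def describe_return_code(code):
    """Return the documented meaning of a code, or flag it as unlisted."""
    return TJOIN_RETURN_CODES.get(code, f"Unlisted code {code}")


print(describe_return_code(109))  # Too few fields to satisfy predicate
print(describe_return_code(103))  # Unlisted code 103
```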
Version History
These are the released versions of tjoin:
Version | Date | Changes |
---|---|---|
1.02 | 25-Feb-96 | n/a |
1.10 | 25-Sep-96 | |
1.20 | 8-Mar-97 | |
1.30 | 13-Jul-97 | |
2.00 | 2-Jan-99 | |
2.10 | 21-Oct-00 | |