All facilities of PSORT can be requested from the command line with the following syntax.
Global options are command line switches which apply to the whole sorting process. These indicate record format, delimiter characters, etc. If no global options are specified, it will be assumed that the file consists of variable length text records no more than 511 bytes long. All global options must precede the first key field.
Key fields specify how the records in the input file are to be finally sequenced. If no sorting key fields are specified, the whole record is taken as a sorting field. The first non-printable character terminates the sorting of the record.
The file to be sorted is read from the standard input. The sorted file is written to the standard output. These may be redirected from the DOS command line.
To sort a file named bob and send the results to a new file to be named anne the following command would suffice:
The global options are specified with the following syntax:
-rt [<maximum record size>] Indicates that the file is a text file consisting of a sequence of records delimited by a newline character. If a record larger than the indicated maximum size is encountered, the program will terminate with an error message. The default file type is a text file with records up to 511 characters in length. If the last record in the file is not terminated with a carriage return - line feed sequence, it will be discarded. For text files that include a Ctrl-Z (0x1a) as the last character in the file, this implies that this last Ctrl-Z will not appear in the output file.
-rf <fixed record size> Indicates that the file consists of fixed length records of the indicated size. If a short record is found at the end of the file, it will be discarded. If the file consists of fixed length records each terminated with a carriage return - line feed sequence, don't forget to include the 2 characters in the length of the record.
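The fixed-length rule can be sketched in Python as follows. This only illustrates the behaviour described above and is not PSORT's own code; the function name split_fixed is hypothetical.

```python
def split_fixed(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-length records, dropping a short trailing record."""
    if size <= 0:
        raise ValueError("record size must be positive")
    whole = len(data) // size          # number of complete records
    return [data[i * size:(i + 1) * size] for i in range(whole)]


# Three-character records, each followed by CR LF: the record size is 5, not 3.
records = split_fixed(b"abc\r\ndef\r\ngh", 5)
print(records)                         # the short trailing "gh" is discarded
```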
-rv [<range> [<maximum size>]] Indicates that the file consists of variable length records up to the indicated maximum size. Variable records have one or two bytes reserved for the length of the record. The position of these bytes is specified by the range. If the range is from a higher to a lower number, it is assumed that the record length is little endian. That is, the higher order byte follows the lower order byte. The range may be more than two bytes wide, but only the least significant two bytes are used since a record size cannot exceed approximately 64KB in length in any case. If the range is unspecified, it is assumed that the record length is to be found in the first two bytes. The record length field should contain the number of bytes in the record that follow the record length field. That is, 0 is a valid record length and will correspond to a record with no bytes following the record length field. For purposes of specifying the location of key fields, the whole record including the record length field should be considered. Note that this differs from the length stored in the record length field, which excludes the field itself.
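To make the length-field rules concrete, here is a minimal Python sketch of how such a file could be split into records. It is an illustration, not PSORT's implementation. The function name split_variable is hypothetical, and stopping at a truncated final record is an assumption, since the text does not say what PSORT does in that case.

```python
def split_variable(data: bytes, first: int = 0, last: int = 1) -> list[bytes]:
    """Split data into variable-length records.

    first/last give the range of the length field, most significant byte
    first, so a descending range such as 1-0 means little endian.
    The stored length counts only the bytes after the length field.
    """
    step = 1 if last >= first else -1
    # Only the least significant two bytes of a wider range are used.
    positions = list(range(first, last + step, step))[-2:]
    header = max(first, last) + 1      # bytes up to and including the length field
    records, pos = [], 0
    while pos + header <= len(data):
        length = 0
        for p in positions:            # most significant byte comes first
            length = (length << 8) | data[pos + p]
        end = pos + header + length
        if end > len(data):            # truncated final record: stop (assumption)
            break
        records.append(data[pos:end])  # the record includes its length field
        pos = end
    return records


big_endian = split_variable(b"\x00\x03abc\x00\x00\x00\x01z")
little_endian = split_variable(b"\x03\x00abc", first=1, last=0)
```

Each returned record includes its length field, which matches the rule that key positions are counted from the start of the whole record.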
If no record type is specified, it is assumed that the file consists of text records no more than 511 characters long.
-in <input file>... Normally the input to be sorted is taken from the standard input. When this is inconvenient or there is more than one input file the -in switch may be used to specify the names of the input files. A simple "-" in the file list will expand to the standard input file.
-out <output file> Normally the sorted file will be written to the standard output file. When this is inconvenient the -out switch may be used to specify the output file. Using this switch, the output file can be written to the same file name as the input file. This will decrease the disk space required for the sort. It will also delete the input file during the sort so it will be lost if for any reason the sorting program does not run to completion. The output file should be written over the input file only in those cases where disk space is at a premium and adequate backup exists for the input file.
-w [<dir>] Use the named directory for temporary work files. The default is taken from the TMP, TEMP, or TMPDIR environment variables. If the -w switch is used with no argument, any temporary work file will be created in the current directory. Note that if the environment variable points to a RAM disk, there may not be enough space available to sort a large file. In such cases the program will abort prior to completion, indicating that disk space was exhausted while writing to the work file. If this happens, either use the -w switch or reassign the environment variable to a hard disk directory with sufficient space available. The work file will usually be somewhat larger than the input file.
-u output only records that are unique according to the sorting key fields.
-t [<range> ...] Indicates which characters terminate fields. For example -t '|' . If no -t specification is used the whole record will be considered as one field.
-i invert sorting sequence. Normally records are sorted according to increasing collating value of characters in the key fields. Use this switch to sort according to decreasing collating value instead. This switch applies to every field subsequently specified. However, this switch can be overridden on a field by field basis when keys are specified.
-b <range>... Indicates which characters should be considered blanks to be skipped to find the start of each field.
If PSORT detects records which do not have expected fields, it will normally terminate with an error message. The following switches alter this default behavior.
-mcf If a record is encountered with a sorting field too short to contain the entire sorting key, PSORT will normally terminate with an error message. This action can be overridden by using this switch. If this is done, the pointer to the current character in the sorting field will cease to advance when the end of the field is encountered.
-mcr This switch specifies that when short sorting fields are encountered, the pointer to the current character will cease to advance when and only when the end of the record is encountered. That is, field delimiters will be passed over and the field will be continued towards the end of the record.
-mfr If a record is encountered with a missing sorting field, PSORT will normally terminate with an error message. This action can be overridden with this switch. If so, the pointer to the current character in the sorting field will be advanced to the null character at the end of the record.
The following switches normally need not be used. They are used in special situations, debugging, and fine tuning. Values for buffer and memory sizes have been initially set to values determined through experiment to give good results. In certain cases, modifying these values may decrease sorting time or amount of memory required.
-q suppresses display of program copyright notice when the program starts up.
-v specify visible mode. This displays statistics on each distribution pass in the file. It is used for debugging and fine tuning. If only the top levels of distribution are desired use -v <number of levels>.
-m <memory size> maximum memory in K to be allocated for sort. Normally PSORT will attempt to reserve enough memory from the system to hold the entire file. If the file is too large, all available memory will be reserved and a temporary work file will be created. This switch can be necessary in a multitasking environment to inhibit PSORT from requesting more than the specified amount of memory thereby leaving memory available for other tasks. Memory size can also be specified in megabytes or gigabytes by appending m or g respectively.
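As a sketch of how such an argument might be interpreted, the following Python function converts a -m value to kilobytes. The function name is hypothetical. Treating the suffixes as binary multiples (1 MB = 1024 K) and accepting upper case suffixes are assumptions; the text only states that m and g may be appended.

```python
def memory_kilobytes(arg: str) -> int:
    """Convert a -m argument such as '640', '16m' or '2g' to kilobytes."""
    multipliers = {"m": 1024, "g": 1024 * 1024}   # assumed binary multiples
    suffix = arg[-1:].lower()
    if suffix in multipliers:
        return int(arg[:-1]) * multipliers[suffix]
    return int(arg)                    # plain number: already in K


for text in ("640", "16m", "2g"):
    print(text, "->", memory_kilobytes(text), "K")
```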
-l <allocation size> length of segment used by internal storage in K. It must exceed the longest record in the file. Default value is 39. The maximum permitted size is 63 on 16 bit versions. On 32 bit versions segment size can be as large as available memory. PSORT will terminate if it is unable to allocate at least 4 segments when the program starts.
-ibs <buffer size> specify the attributes of the buffering for the main file input. These attributes include size, buffer count, etc., and are described in more detail below.
-obs <buffer size> same as -ibs but applies to the output file.
-wb <buffer size> specify the attributes of the output buffer used to write data to the temporary file.
-rb <buffer size> specify the attributes of the input buffer used to read data from the temporary file.
-bs <buffer> This is a shortcut equivalent to specifying the attributes of both the input and output buffers.
<buffer size> The size of the buffer in KB. The size can be specified in MB by appending an "m" to the value. If not specified, an environment-dependent default is assigned.
-sync Use synchronous i/o. Suspend PSORT operation while waiting for data to be read and/or written. When coupled with buffered i/o (see below), the operating system will buffer sequential reads and writes, so in practice sorting will overlap i/o and operation will be quite efficient.
-async <buffer count> Use asynchronous i/o. Let PSORT handle buffering of data with the specified number of buffers. For the input work file, the default is asynchronous i/o. This is due to the fact that PSORT can schedule a seek on the next block of data while sorting the most recently acquired block, which can enhance performance in many cases. For other files, experience suggests that performance with either method is comparable.
A key field consists of several optional parts in the following syntax.
In addition to the key collating sequence explained below, a key field consists of the following optional components.
-i Invert the sequence of the sort for the last key specified. Normally fields are sorted in sequence according to the -i global option or a higher level key. Using this switch inverts this sequence for this field. That is, if either the global -i switch or this local -i switch is used, the field will be sorted in descending sequence. If both switches are specified, this local -i switch will re-invert the sense of the global -i switch, resulting in records sorted on this field being in ascending sequence.
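The combined effect of the two switches amounts to an exclusive-or. A minimal Python sketch of that rule follows; the function name is hypothetical and the influence of a higher level key is ignored.

```python
def sorts_descending(global_invert: bool, local_invert: bool) -> bool:
    """Return True when a field ends up sorted in descending sequence."""
    return global_invert != local_invert   # two inversions cancel out


for g in (False, True):
    for l in (False, True):
        order = "descending" if sorts_descending(g, l) else "ascending"
        print(f"global -i={g!s:5} local -i={l!s:5} -> {order}")
```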
-b <range>... Specify additional leading blank characters for this field. These leading blank characters are in addition to the ones specified with the global option -b.
-f <range>... Sort on one or more fields. Fields are groups of characters separated by one of the delimiter characters specified by the global -t switch. If no -t switch has been specified, the whole record is considered as one field and reference to fields in positions greater than 0 will terminate PSORT with an error message. After finding a delimiter, the characters specified by the global or local -b switch are skipped over to determine where the field actually starts. Fields are numbered starting at 0. That is, -f 0 refers to the first field of the record. A field specification may contain a range of fields, as in -f 2-4, to indicate that the sorting sequence is to be determined on the basis of the third, fourth and fifth fields in turn. A range must have a definite end. That is, -f 2- is not permitted. A field range need not be increasing. That is, -f 3-2 is permitted and will sort first by the fourth then by the third field.
-c <range>... Sort on one or more characters within the indicated fields. Start counting character positions from 0. For example, -f 1 -c 2-3 would sort on the third and fourth characters of the second field. Several character ranges may be specified for a given field. For example, -f 2 -c 5-6 -c 3-4 -c 1-2 would specify three sorting fields of 2 characters each within the third delimited field. When specifying a character range within a field, the second number need not be greater than the first. That is, -c 7-3 is permitted and will result in sorting being applied to characters in positions 7, 6, 5, 4, and 3 in that order. As we will see in the examples below, this will be useful in sorting certain types of binary number fields. An indefinite character range can be specified, as in -c 4- . This will indicate all characters starting with the fifth to the end of the field, wherever that might be. A -c -2 would indicate all characters starting at the last one in the field and moving to the left up to and including the third character in the field.
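The range forms accepted by -c can be illustrated with a small Python sketch that expands a range into the ordered list of character positions. The function name expand_range is hypothetical, and the sketch assumes the field is long enough to contain every requested position.

```python
def expand_range(spec: str, field_length: int) -> list[int]:
    """Expand a -c range such as '2-3', '7-3', '4-' or '-2' into positions."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:                        # single position such as '5'
        return [int(spec)]
    last = field_length - 1
    start = int(start_text) if start_text else last   # '-2' starts at the last character
    end = int(end_text) if end_text else last         # '4-' runs to the end of the field
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


print(expand_range("7-3", 10))   # positions in descending order
print(expand_range("4-", 8))     # open-ended: to the end of an 8-character field
print(expand_range("-2", 6))     # from the last character back to position 2
```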
Key collating sequences are used to specify how characters are to be weighted in determining which record, field, or character is "less than" or "greater than" another. There are four kinds of key collating sequences that can be used.
-k [ [ [-r] <range>] ...] specifies a collating sequence. The collating sequence is specified as one or more ranges of values. Characters are assigned collating values in order of their specification. For example, to sort a file containing only lower case alphabetic characters
-r repeats previous collating range. For example, to fold upper case letters to lower case letters for purposes of determining sorting priority use -k 'a'-'z' -r 'A'-'Z'. This would assign the first character following the -r the same collating value as the first one assigned in the previous range. That is 'A' through 'Z' would be assigned collating values 1 through 26 as would 'a' through 'z'. To give varying white space characters equal weight use
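The way -k and -r assign collating values can be modelled in Python as below. This is a sketch of the described numbering only; the function name build_collation and the tuple representation of ranges are hypothetical, and the handling of a repeated range longer than its predecessor is an assumption.

```python
def build_collation(ranges: list[tuple[bool, str, str]]) -> dict[str, int]:
    """Assign collating values from (repeat, first, last) ranges.

    repeat=True models -r: the range reuses the values of the previous range.
    Characters absent from the table have collating value 0.
    """
    table: dict[str, int] = {}
    next_value = 1                     # 0 is kept for unspecified characters
    previous_start = 1
    for repeat, first, last in ranges:
        start = previous_start if repeat else next_value
        step = 1 if ord(last) >= ord(first) else -1
        codes = range(ord(first), ord(last) + step, step)
        for offset, code in enumerate(codes):
            table[chr(code)] = start + offset
        previous_start = start
        next_value = max(next_value, start + len(codes))
    return table


# Models -k 'a'-'z' -r 'A'-'Z': upper case folds onto lower case.
folded = build_collation([(False, "a", "z"), (True, "A", "Z")])
# The same mechanism gives a space and a tab equal weight.
blanks = build_collation([(False, " ", " "), (True, "\t", "\t")])
print(folded["a"], folded["A"], folded["z"], folded["Z"])
```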
-n [ [ [-r] <range> ]... [-d <decimals>] ] Character numeric sort on the key. This is an alternative to -k. Character numeric fields may contain a leading or trailing sign and/or a decimal point. Numeric fields should look like
The -d switch is used to specify the maximum number of digits to the right of the decimal/radix point. It is usually not necessary to use the -d switch but could speed up the sort if the maximum number of digits to the right of the decimal/radix point is known.
Character numeric fields are inherently of variable length. If no fields delimiters are used on the record, it is possible for PSORT to fail to properly determine where the decimal fraction ends and the next field begins. This can be remedied by using the -d switch to specify how many characters after the decimal/radix point the field ends.
-s [ [ [-r] <range>]... ] This is used to specify collating values for bytes corresponding to signs. The first half of the values are assumed to correspond to negative numbers, and subsequent fields are ordered inversely to the sense specified by the global and local -i switches. The second half of the values are assumed to correspond to positive numbers, and subsequent fields are sorted normally. To illustrate this consider the following sequence of records:
Nested Keys The final type of key specification is the nested field. Its syntax is (<sort command>) . In this case, commands enclosed in parentheses are applied to each of the fields defined by the subsequent -b, -i, -f and -c parameters as if each of these were a record. This will be illustrated after macros and include files are described.
Default Key If no -k, -n, -s or nested field is specified a default collating sequence of all printable characters is used. Space and tab (0x09) are considered printable characters. For files containing non-printable characters be sure to include a -k specification.
To clarify the interaction among the global -t and -b switches and the local -b switch, the following sequence of operations is presented.
For each record in the file, a pointer starts at the beginning of the record. Then, for each field:
(1) Blank characters specified by the global -b switch are skipped.
(2) Blank characters specified by the local -b switch are skipped. The pointer now points to the beginning of the field.
(3) Characters not specified by the global -t switch are skipped. The pointer now points to the end of the field.
(4) The next character (the delimiter) is skipped. The pointer now points to the first blank (if any) of the next field.
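The pointer movement described above can be traced with a short Python sketch. It follows the listed steps literally and is not PSORT's code; the function name split_fields and its parameters are hypothetical stand-ins for the characters given to the global -t, global -b and local -b switches.

```python
def split_fields(record: str, terminators: str,
                 global_blanks: str = "", local_blanks: str = "") -> list[str]:
    """Split a record into fields by walking a pointer through it."""
    fields = []
    pos = 0
    while pos < len(record):
        while pos < len(record) and record[pos] in global_blanks:
            pos += 1                   # skip blanks from the global -b
        while pos < len(record) and record[pos] in local_blanks:
            pos += 1                   # skip blanks from the local -b
        start = pos                    # beginning of the field
        while pos < len(record) and record[pos] not in terminators:
            pos += 1                   # advance to the end of the field
        fields.append(record[start:pos])
        pos += 1                       # skip the delimiter itself
    return fields


print(split_fields("smith, john,  42", ",", " "))
print(split_fields("a   b", " \t", " \t"))   # blanks skipped: no null fields
print(split_fields("a   b", " \t"))          # no blanks: adjacent delimiters give null fields
```

When the delimiter characters are also listed as blanks, a run of delimiters collapses into one separator. When they are not, each extra delimiter produces an empty field.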
If it is desired fields be delimited by white space and that adjacent tabs/space not constitute null fields, the following is appropriate:
If it is desired that fields be delimited by white space but that adjacent tabs constitute null fields, the following would be used.
Some systems maintain records with fields in the form "abcd","efgjkjl","irowq",.... To sort this file in descending sequence according to the third field, then in ascending sequence according to the first field, any of the following would suffice.
If no -b switch is used no characters are skipped after the delimiter.
Remember that characters not specified within a collating sequence are taken as collating value zero and that the field is considered terminated when a character with a collating value of zero is recognized. This can result in unexpected behavior when fields are not the same length. Following is the result of sorting a small file with -k 'z'-'a'.
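The effect can be approximated in Python under stated assumptions: the end of a field behaves like collating value zero, and the four sample words are invented for illustration rather than taken from an actual PSORT run. The names collation_key and reverse_alpha are hypothetical.

```python
def collation_key(field: str, table: dict[str, int]) -> tuple[int, ...]:
    """Collating values of a field, cut off at the first zero-valued character."""
    key = []
    for ch in field:
        value = table.get(ch, 0)       # unspecified characters collate as 0
        if value == 0:
            break                      # a zero value terminates the field
        key.append(value)
    return tuple(key)


# Models -k 'z'-'a': 'z' gets value 1 and 'a' gets value 26.
reverse_alpha = {chr(ord("z") - i): i + 1 for i in range(26)}

words = ["ab", "a", "b", "ba"]
result = sorted(words, key=lambda w: collation_key(w, reverse_alpha))
print(result)   # ['b', 'ba', 'a', 'ab']
```

Under these assumptions the output is not the mirror image of an ascending sort: a shorter field still precedes a longer field that begins with the same characters, because the end of the field collates lower than any specified character.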
The DOS command line only permits a maximum of 128 characters. This is not enough to permit all the command parameters that some files might require. In order to accommodate this and other situations the following switches can be used.
-#include <filename> This is used to specify that the contents of the indicated file should be inserted into the string of sorting parameters. Include files can be nested to reasonable depth. Include files have the exact same format as command line parameters with the following exceptions:
(1) The # character is special. It is used for placing comments into the command parameter file. All characters encountered after the # character are discarded. In order to be recognized as a comment character it must be preceded by a space. This prevents constructions such as '#' from being erroneously treated as comment characters.
(2) Command parameters are recognized across record boundaries until the end of the file is detected. There is no need to specify a special character such as '\' to specify continuation on to the next record.
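The two exceptions can be sketched in Python as follows. The function names are hypothetical, and the sketch splits parameters on white space only, so it does not handle quoted characters that themselves contain a space (such as ' '); how PSORT tokenizes those is not described here.

```python
def strip_comment(line: str) -> str:
    """Remove a comment; '#' starts one only when preceded by a space."""
    index = line.find(" #")
    return line if index < 0 else line[:index]


def read_parameters(text: str) -> list[str]:
    """Collect parameters across line boundaries, ignoring comments."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(strip_comment(line).split())
    return tokens


sample = "-t '#'  # field delimiter\n-f 0 -c 2-3\n"
print(read_parameters(sample))
```

Because the comment test requires a preceding space, the quoted '#' survives, and so would switches such as -#include.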
-#define <macro name> <commands>... -#end These are used to create a sequence of commands and assign a name to them. For example, suppose that for a given file of accounting transactions account number is stored in positions 0-11 while the date is stored in positions 12-17. We could create a file named accts.srt containing the following:
Macro definitions are not expanded until they are used. Macro definitions may use other macros that may not have been defined yet. A macro may contain -#define statements so that new macros are created when it is used. Macros may include files with the -#include statement. Included files may define and reference defined macros. If a macro is defined and there already exists one with the same name, future references to that name will expand to the most recently defined macro. Macro invocations, definitions and included files may be nested to reasonable depth. When using nested -#define statements, each -#end will terminate the last unterminated -#define. The number of -#end parameters should match the number of -#define statements.
Macros should not contain references to themselves or to other macros that refer to themselves. This will result in a termination of the sort when the macro is invoked. In other words, macros may not be recursive.
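A simplified Python model of these rules is given below. It is a sketch, not PSORT's parser. It assumes a macro is invoked by writing its name as a parameter, and the macro names -acct, -when and -both are hypothetical. To keep it short, a -#define nested inside a macro body is stored with the body but not processed when the macro is used.

```python
def expand(tokens, macros=None, active=()):
    """Expand macros lazily; raise ValueError on a recursive macro."""
    macros = {} if macros is None else macros
    output = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "-#define":
            name = tokens[i + 1]
            depth, j = 1, i + 2
            while depth:               # each -#end closes the last open -#define
                if j >= len(tokens):
                    raise ValueError("missing -#end")
                if tokens[j] == "-#define":
                    depth += 1
                elif tokens[j] == "-#end":
                    depth -= 1
                j += 1
            macros[name] = tokens[i + 2:j - 1]   # body stored unexpanded
            i = j
        elif token in macros:
            if token in active:
                raise ValueError(f"recursive macro {token}")
            output.extend(expand(macros[token], macros, active + (token,)))
            i += 1
        else:
            output.append(token)
            i += 1
    return output


params = ("-#define -both -acct -when -#end "
          "-#define -acct -c 0-11 -#end "
          "-#define -when -c 12-17 -#end -both").split()
print(expand(params))
```

Here -both refers to two macros that are defined only afterwards, which works because bodies are expanded at the point of use.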
Suppose we have a file of records containing work orders. Each record has a promise date in field 1 and an order date in field 4. The dates are stored in format DDMMYY. The following key specification will result in the fastest sort by promise date as well as account for roll over at the end of the century.
If our system used date fields in the form DDMMYY in several places it would be convenient to create a file named accts.srt which contained:
The displacements specified in the nested field parameter are applied from the first byte in the outer level field and move towards the end of the field. Thus nested fields properly account for outer level fields that have been specified in reverse order. The same nested field specification defined as -date in our previous example will serve just as well for fields where bytes are ordered in reverse order of significance. For example, suppose we have a field where the bytes of the date field were in reversed order, i.e. YYMMDD. If the field occupied positions 19 to 24, the following command would be appropriate:
Fields can be nested to any reasonable depth. In the following example the date field is specified as a combination of previously defined day, month, and year fields.
This technique would now permit us to define a special American date type field.
The complete syntax for nested key fields is