NAME
    List::Parseable - routines to work with lists containing a simple
    language

DESCRIPTION
    This module allows you to treat a list (which can be expressed as an
    actual perl list, or as a string which will be parsed to form a list)
    as a simple program which returns a value. This allows you to do
    several tasks that I run into frequently.

    A task that occurs often is to have a config file or a data file which
    is read in by the program. Most of the time, the data stored in these
    files is fully defined at the time it is read in, but occasionally, the
    data that you want is more complex, and is best determined at run time.

    One obvious way to do this is to build extra logic into the program
    that understands the data stored in the file and which can do
    additional operations such as supplying missing values, checking data
    validity, etc. Often, however, building a knowledge of the exact format
    of the data file into the program is not desired. It leads to added
    complexity in the program, and usually, the types of checks and
    manipulations that are done are very repetitive.

    This module can be used to bypass some of these issues.

    Creating complex values
        If you need to set a variable in a config file or a data file to
        some value which can only be determined at runtime, or which is
        best defined in terms of other values in the config file, the
        value can be set to a list which is interpreted using this module.
        That way, the actual value can be determined at runtime using a
        simple program.

    Supplying missing data
        Defaults for missing values can be supplied.

    Describing valid data
        If you want to describe validity checks for data or config files,
        this can be done using this module. A piece of data can be defined
        as well as a flag which will be evaluated based on the value
        currently in the config file. If the data is invalid, the flag
        will be set accordingly.
        A description of the data can be written as a list which, when
        evaluated, will return true if a piece of data meets the validity
        requirements described there.

    All of this could be done by nesting actual perl code and running eval
    on it, but it is usually desirable to do things in a much safer way.
    Also, it is rarely necessary to have the full power (and complexity)
    of the perl language in this case.

ROUTINES
    new
            use List::Parseable;
            $obj = new List::Parseable;

        Creates a new List::Parseable object.

    version
            $version = $obj->version();

        Returns the module version.

    list, string
            $obj->list(NAME,LIST);
            $obj->string(NAME,STRING);

        The list function takes the arguments and stores them in the
        object under the given NAME. The string function takes a single
        string argument, converts it to a list (string parsing rules are
        described below), and stores it under the given NAME.

    eval
            $val = $obj->eval(NAME);

        This must be called after the list or string method has been used
        to store a list under the given name. It parses the list using
        the list parsing rules described below.

    errors
            $obj->errors(OPTION,OPTION,...)

        When a list is used in any of the operations described below,
        some of the elements may not be valid for that operation. This
        method tells the object how to handle these errors. Allowed values
        for OPTION are:

            exit    : the program halts with an error
            return  : the routine returns an empty set or element
                      (depending on type of return value)
            ignore  : the value is ignored (removed from the list) and
                      the operation continues

        NOTE: only one of the above options may be given. It defaults to
        "ignore".

        In addition, the following options may be included:

            stderr  : send a warning about invalid elements to stderr
            stdout  : send a warning about invalid elements to stdout
            both    : send a warning message to both stdout and stderr
            quiet   : never send warnings

        The default is quiet.
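The option rules for the errors method can be summarized in a few lines of plain Perl. This is only a sketch of the documented rules, not the module's code: resolve_error_options is a hypothetical helper, and the choice that the last reporting option wins when several are given is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of List::Parseable). It models the
# documented rules for the OPTION arguments of errors().
sub resolve_error_options {
    my @opts = @_;
    my %is_action  = map { $_ => 1 } qw(exit return ignore);
    my %is_channel = map { $_ => 1 } qw(stderr stdout both quiet);

    my ($action, $channel);
    for my $opt (@opts) {
        if ($is_action{$opt}) {
            # Only one of exit/return/ignore may be given.
            die "only one of exit/return/ignore may be given\n"
                if defined $action;
            $action = $opt;
        } elsif ($is_channel{$opt}) {
            # Assumption: if several reporting options are given,
            # the last one wins.
            $channel = $opt;
        } else {
            die "unknown option: $opt\n";
        }
    }

    # Documented defaults.
    $action  = 'ignore' unless defined $action;
    $channel = 'quiet'  unless defined $channel;
    return ($action, $channel);
}

my ($action, $channel) = resolve_error_options('return', 'stderr');
print "$action $channel\n";
```

Calling the helper with no arguments yields the documented defaults ("ignore" and "quiet").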
    vars
            $obj->vars(HASH);

        This takes a hash of the form VAR => VAL where each VAR is the
        name of a variable, and each VAL is either a scalar or a list
        reference (which may contain nested list references and scalars).
        It stores these in the object for use in the "getvar" and other
        variable operations described below.

LIST PARSING RULES
    List parsing consists of two steps.

    First, every element is examined. If it is a scalar, it is left
    untouched during the first step. If it is a list reference, the list
    of values is first parsed using the same rules as the parent list. In
    this way, nested lists are parsed to any level.

    As an example of this, the list:

        (count a b)

    would evaluate to a scalar "2" after the second step (since a list
    whose first element is "count" evaluates to the number of elements in
    the list), so the nested structure:

        (foo (count a b) 3 (count x))

    is identical to:

        (foo 2 3 1)

    after the first step is complete on the main list.

    For the second step of parsing, the list is examined again. It must
    consist of zero or more operations (each of which is one of the
    strings described below) followed by zero or more arguments, each of
    which can be either a list reference or a scalar. The types of
    arguments allowed depend on the operation.

    Arguments start with the first element which is not a known
    operation. Alternately, if one of the elements is "--", that element
    signals the end of the operations and the start of the arguments (but
    is otherwise ignored). For example, if "foo" and "bar" are known
    operations, then

        (foo bar a b)

    has two operations and two arguments. This is equivalent to:

        (foo bar -- a b)

    The list:

        (foo -- bar a b)

    has one operation and three arguments since "bar" is not treated as
    an operation in this case.

    Only the first occurrence of "--" is treated this way, and only if it
    follows a set of operations. For example:

        (foo a -- bar b)

    contains one operation (foo) and 4 arguments (a, --, bar, b).
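The split between operations and arguments described above can be sketched in plain Perl. This is an illustration of the stated rule only, not the module's implementation: split_ops_args is a hypothetical helper, the set of known operations is passed in by the caller, and dropping a "--" that appears with no operations in front of it is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of List::Parseable). Given a hash
# reference of known operation names and a list, it returns two array
# references: the leading operations and the remaining arguments.
sub split_ops_args {
    my ($known, @list) = @_;

    # Leading elements that are known operations are the operations.
    my @ops;
    while (@list && $known->{ $list[0] }) {
        push @ops, shift @list;
    }

    # A "--" found right where the operations end marks the start of
    # the arguments and is dropped. (Assumption: this also applies
    # when there were no operations before it.)
    shift @list if @list && $list[0] eq '--';

    return (\@ops, \@list);
}

my %known = (foo => 1, bar => 1);
for my $case ([qw(foo bar a b)], [qw(foo -- bar a b)], [qw(foo a -- bar b)]) {
    my ($ops, $args) = split_ops_args(\%known, @$case);
    printf "(%s): %d operation(s), %d argument(s)\n",
        "@$case", scalar @$ops, scalar @$args;
}
```

Because only the position immediately after the operations is checked, a later "--" (as in the third case) is left in place as an ordinary argument.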
    If no operation is included, it defaults to the "scalar" operation,
    so

        (scalar a b)
        (a b)

    are equivalent.

    If a list includes multiple operations, they are handled one at a
    time, starting with the rightmost. For example:

        (foo bar a b)

    is equivalent to:

        (foo (bar a b))

STRING PARSING RULES
    When parsing a string, sets of list delimiters are checked for. Valid
    list delimiters are:

        parentheses  ()
        brackets     []
        braces       {}

    The list delimiters may be separated from the list elements by
    whitespace, but this is optional except in cases where the first
    element in the list begins with a punctuation mark. In this case, the
    list delimiter must be followed by space. Any string that starts with
    a punctuation mark which immediately follows the left list delimiter
    is treated as an element delimiter.

    Elements in a list are typically separated by spaces, but including
    an element delimiter can change this. An element delimiter is a
    character or string attached to the left list delimiter. The element
    delimiter MUST start with a punctuation mark, but it is not allowed
    to start with "\" which is treated specially.

    For example, the list (a b c) can be written in the following ways:

        '( a b c )'
        '(a b c)'
        '[ a b c ]'
        '{: a:b:c }'

    Any combination of list delimiters can be used to create nested
    lists:

        '(a (b c) [: d:e ] )'

    There is currently no "quoting" mechanism, so there is no way to
    include the element delimiter in an element. If an element can
    contain a space, some other element delimiter must be used. In other
    words, to make a list:

        ("this", "is a", "special list")

    use:

        '(: this:is a:special list)'

    In order to include any list delimiter in an element, the left list
    delimiter must be followed immediately by a "\". It may then be
    followed by any element delimiter. Everything up to the closing list
    delimiter is treated as part of the list of elements. No nested lists
    can be created.
    For example:

        '( (a) (\ x ] ) [\: b:) ] )'

    is the list:

        ( [ 'a' ], [ 'x', ']' ], [ 'b', ')' ] )

    In order to include a parenthesis (either left or right) in a list,
    use one of the other list delimiters with the "\" option.

KNOWN OPERATIONS
    In the following operations, lists of elements may be either scalars
    or list references, but in most cases, some types of elements may not
    be valid. In the event of an invalid element, the behavior is
    dictated by the options set with the "errors" method.

    The following basic operations are known:

    (scalar ELE0 ELE1 ...), (list ELE0 ELE1 ...)
        These two operations determine how to treat the results of the
        parsing. Most operations take a list of scalars as arguments, but
        some take multiple lists. These require that the "list" operation
        be used. For example:

            (foo (scalar a b) (list c d))

        after parsing all of the sublists is equivalent to:

            (foo a b [ c d ])

        where [ c d ] is a list reference.

        All element types are allowed.

    The following operations take a list and return a scalar based on the
    list:

    (count ELE0 ELE1 ...)
        The count operation counts the number of arguments and returns
        it.

            (count a b)           => 2
            (count (list a b) c)  => 2

        All element types are allowed.

    (countval VAL ELE0 ELE1 ...)
        This returns the number of times VAL appears in the list. VAL
        must be a scalar.

            (countval a a b a)    => 2

        All elements should be scalars.

    (minval ELE0 ELE1 ...), (maxval ELE0 ELE1 ...)
        This returns the element whose numerical value is the least or
        greatest.

            (minval 5 7 8)        => 5
            (maxval 5 7 8)        => 8

        All elements should be numeric scalars.

    (nth N ELE0 ELE1 ...)
        This returns the Nth element of the list. Elements are numbered 0
        to M or -(M+1) to -1.

        The first element must be an integer or nothing is returned. All
        element types are allowed for the remaining arguments.

    (case TEST0 VAL0 ... TESTN VALN [DEFAULT_VAL])
        All TEST elements must be scalars or nothing is returned. Values
        can be any type.
        Tests are evaluated, one at a time, and the first one that
        evaluates as true provides the return value. If no test is true,
        the default value is returned. If no default is provided, nothing
        is returned.

    (indexval VAL ELE0 ELE1 ...), (rindexval VAL ELE0 ELE1 ...)
        This returns the index of the first/last occurrence of VAL in the
        list or -1 if it doesn't appear.

        VAL must be a scalar. All other elements should be scalars.

    (join ELE0 ELE1 ...), (join delim DEL ELE0 ELE1 ...)
        This joins all elements into a single string. By default, a space
        is used, but this can be overridden by including the "delim" word
        as the first element in the list followed by the delimiter. DEL
        can be the keyword "_null_" which means to join them with no
        delimiter, "_space_" to join them with a space, "_nl_" to join
        them with a newline, or "_tab_" to join them with a tab.

        DEL must be a scalar. All others should be scalars.

    ( + ELE0 ELE1 ...), ( * ELE0 ELE1 ...)
        These return the result of adding or multiplying all of the
        elements.

        All elements should be numbers.

    ( - ELE0 ELE1 ), ( / ELE0 ELE1 )
        These perform the subtraction (ELE0 - ELE1) or division
        (ELE0/ELE1).

        All elements must be numbers, and in the division case, ELE1 must
        not be zero.

    The following operations return true or false (1 or 0) based on the
    list:

    (mintrue N ELE0 ELE1 ...), (maxtrue N ELE0 ELE1 ...)
        Returns true if at least (or at most) N of the elements evaluate
        to true.

        All elements should be scalars.

    (minfalse N ELE0 ELE1 ...), (maxfalse N ELE0 ELE1 ...)
        Similar to mintrue/maxtrue but tests for false values.

        All elements should be scalars.

    (numtrue N ELE0 ELE1 ...), (numfalse N ELE0 ELE1 ...)
        Returns true if exactly N of the elements evaluate to true (or
        false).

        All elements should be scalars.

    (and ELE0 ELE1 ...)
        Returns true if all elements evaluate to true.

        All elements should be scalars.

    (or ELE0 ELE1 ...)
        Returns true if any element evaluates to true.

        All elements should be scalars.

    (not ELE0 ELE1 ...)
        Returns true if all elements evaluate to false.

        All elements should be scalars.

    (member VAL ELE0 ELE1 ...)
        Returns true if any element in the list is equal to the value.

        All elements should be scalars. The first element MUST be a
        scalar or nothing can be returned.

    (absent VAL ELE0 ELE1 ...)
        Returns true if no element in the list is equal to the value.

        All elements should be scalars. The first element MUST be a
        scalar or nothing can be returned.

    ( > ELE0 ELE1 ); also >= == <= < !=
        This compares ELE0 and ELE1 numerically. It returns true if ELE0
        is greater than ELE1. The other common mathematical operations:
        >=, ==, <=, <, != are also available. Note that space is required
        after the opening list delimiter in order to not confuse them
        with element delimiters.

        Exactly two numerical elements are required in all cases.

    ( gt ELE0 ELE1 ); also ge eq le lt ne
        This compares ELE0 and ELE1 alphabetically. It returns true if
        ELE0 is greater than ELE1. The other common string operations:
        ge, eq, le, lt, ne are also available.

        Exactly two scalar elements are required in all cases.

    (if TEST), (if TEST VAL1), (if TEST VAL1 VAL2)
        This checks to see if TEST evaluates to true. If it is true, it
        returns VAL1 if it is included or true otherwise. If it is false,
        it returns VAL2 if it is included or false otherwise.

        TEST must be a scalar. The other values can be any type.

    (is_equal LIST0 LIST1)
        This takes two list references (which should contain only
        scalars) and checks to make sure that the elements are equal
        (order is ignored). If they are, true is returned. Otherwise,
        false is returned.

        If either argument is not a list reference, or if either list
        contains non-scalars, nothing is returned.

    (not_equal LIST0 LIST1)
        Similar to is_equal, but returns true if the two lists are
        different.

    (iff ELE0 ELE1 ...)
        This returns true if all elements are true or all are false. It
        returns false if they are a mixture of true and false.
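The counting tests described above (mintrue, maxtrue, numtrue) and iff all reduce to counting the true elements. The following plain Perl sketch models that behavior. It is not the module's implementation: the subs are hypothetical stand-ins, "evaluates to true" is assumed to mean ordinary Perl truthiness, and treating an empty list as true for iff is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-ins (NOT the List::Parseable implementation).
# Assumption: an element "evaluates to true" when Perl treats it as
# true, so 0, "0", and "" are false.
sub count_true {
    my $n = grep { $_ } @_;    # grep in scalar context returns a count
    return $n;
}

sub mintrue { my ($n, @ele) = @_; return count_true(@ele) >= $n ? 1 : 0 }
sub maxtrue { my ($n, @ele) = @_; return count_true(@ele) <= $n ? 1 : 0 }
sub numtrue { my ($n, @ele) = @_; return count_true(@ele) == $n ? 1 : 0 }

# True if every element is true or every element is false.
# Assumption: an empty list counts as true.
sub iff {
    my @ele  = @_;
    my $true = count_true(@ele);
    return ($true == 0 || $true == @ele) ? 1 : 0;
}

print mintrue(2, 1, 0, 1), "\n";    # two elements are true
print iff(1, 0), "\n";              # mixture of true and false
```

The false-counting variants (minfalse, maxfalse, numfalse) would follow the same pattern with the grep condition negated.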
    (range NUM X Y); also rangeL rangeR rangeLR
        These check to make sure that NUM is in the range X to Y. All
        three must be numeric, and X must be less than (or equal) to Y.
        It returns true in the following cases:

            range      X <= NUM <= Y
            rangeL     X <  NUM <= Y
            rangeR     X <= NUM <  Y
            rangeLR    X <  NUM <  Y

        and false otherwise.

    The following manipulate a list.

    (flatten ELE0 ELE1 ...)
        This takes all elements (which may be scalars or nested list
        references) and returns a flat list with all of the elements from
        any level.

        All element types are allowed.

    (union ELE0 ELE1 ...)
        This combines all of the members of all of the elements (which
        may be scalars or nested lists) into a single list of elements.
        Only the top level is flattened. So:

            (union a [b] [ [c], [d] ])  => (a b [c] [d])

        All element types are allowed.

    (sort ELE0 ELE1 ...)
        This returns the list sorted alphabetically. The list is
        flattened first.

        All elements should be scalars.

    (sort_by_method METHOD LIST ARG1 ARG2 ...)
        This returns the list sorted by the method given. The method can
        be any method in the Sort::DataTypes module. LIST must be a list
        produced by the (list ELE0 ELE1 ...) operation.

        The first element (METHOD) must be a valid method and LIST must
        be a list reference or nothing can be returned. Other arguments
        must be valid for the sort method, but are not checked in
        advance.

    (unique ELE0 ELE1 ...)
        This removes duplicate elements and returns a list of unique
        elements.

        All elements should be scalars.

    (compact ELE0 ELE1 ...)
        This removes all empty ("") elements and returns a flat list of
        all remaining elements.

        All elements should be scalars.

    (true ELE0 ELE1 ...)
        This removes all elements that evaluate to false and returns a
        flat list of all remaining elements.

        All elements should be scalars.

    (pop ELE0 ELE1 ...), (shift ELE0 ELE1 ...)
        This removes the last/first element from the list and returns the
        resulting list.

        All element types are allowed.

    (pad LENGTH ELE0 ELE1 ...)
        This takes a list of elements and pads it to the right with
        spaces until it is LENGTH characters long. If LENGTH is negative,
        it will pad to the left.

        The first argument must be an integer or nothing will be
        returned. All others should be scalars.

    (padchar LENGTH CHAR ELE0 ELE1 ...)
        This is identical to (pad ...) but it will pad with an arbitrary
        character (the 2nd argument).

        The first argument must be an integer and the 2nd argument must
        be a single character or nothing will be returned. Other elements
        should be scalars.

    (column N LIST0 LIST1 ...)
        This returns a list of the Nth element of each of the listrefs.

        All arguments (except for the first) must be listrefs.

    (reverse ELE0 ELE1 ...)
        This returns the reverse of the list.

        All element types are allowed.

    (rotate N ELE0 ELE1 ...)
        This rotates the list of elements N times. If N is positive, a
        single rotation removes the first element and puts it on the end.
        If N is negative, a single rotation removes the last element and
        moves it to the front.

        N must be an integer or nothing is returned. All element types
        are allowed for the list.

    (delete VAL ELE0 ELE1 ...)
        This removes all occurrences of VAL from the list.

        VAL must be a scalar or nothing is returned. All other elements
        should be scalars.

    (clear ELE0 ELE1 ...)
        This clears all elements from the list and returns an empty list.

    (append STRING ELE0 ELE1 ...), (prepend STRING ELE0 ELE1 ...)
        These append or prepend a string to all elements in the list.

        The first element must be a scalar. All others should be scalars.

    (splice LIST N LEN [ELE0 ELE1 ...])
        This deletes LEN elements from LIST starting at element N and
        inserts the ELE elements in their place.

        LIST must be a list reference, N and LEN must be integers (LEN
        must be zero or positive). The remaining elements can be any
        type.

    (slice N LEN ELE0 ELE1 ...)
        This returns the slice of the list of elements starting with the
        Nth element and including LEN elements.
        N and LEN must be integers (LEN must be zero or positive). The
        remaining elements can be any type.

    (fill LIST N LEN VAL)
        This sets elements in LIST. The first argument is required, and
        must be a list reference. All other arguments are optional.

        Elements are numbered 0 to M or -(M+1) to -1. It is valid to
        refer to elements with an index greater than M (these are
        elements which will be added on to the right of the list), or
        less than -(M+1) (these are elements which will be added to the
        left).

        If N is given, it must be an integer. This is the index of the
        first element of the list to change.

        If LEN is given, it must be an integer. This is the number of
        elements to change starting at the Nth element and moving right.
        If it is negative, it is the number of elements to change
        starting with the Nth element and moving left. If LEN is zero,
        the list is unmodified.

        If VAL is set, all elements to be changed will be set to it.
        Otherwise, they will be set to "".

        The elements to be set depend on N and LEN (and M). The following
        table describes the changes for cases where LEN is not given. L
        refers to an index to the left of the list, R refers to an index
        to the right of the list, and C refers to an index in the list.

            N
            ===
            A    If both N and LEN are omitted, the entire list is
                 filled with "".

            L    This adds blank elements to the left of the list out
                 to (and including) the Lth element. Existing elements
                 are unmodified.

            C    This sets from the Cth element to the end of the list
                 to "".

            R    This adds blank elements to the right of the list out
                 to (and including) the Rth element. Existing elements
                 are unmodified.

        For cases where LEN is given, N and LEN explicitly define the
        start and stop elements to modify. Either or both elements can be
        to the left of the list or to the right of the list. All elements
        in the (N,LEN) range are set to the value. If LEN goes past the
        end of the list, additional elements are added and set to that
        value.
        If N points to elements before the list, additional elements are
        added to the left.

        One possible confusion is when the operation is used to fill
        elements which are outside of the list entirely. For example if
        LIST contains:

            (a a a)

        and N is 4, LEN is 2, VAL is b, the resulting list is:

            (a a a "" b b)

        (so an empty element was added to get the list out to the portion
        that was being set).

    (difference LIST0 LIST1), (d_difference LIST0 LIST1)
        These take two lists and remove all elements in the second list
        from the first (and return the new list). The difference between
        the two is how duplicate entries are handled. In the first call,
        all duplicates are removed. In the second call, only one instance
        per element in the second list is removed.

            (difference [list a a b c] [list a])     => (b c)
            (d_difference [list a a b c] [list a])   => (a b c)

    (intersection LIST0 LIST1), (d_intersection LIST0 LIST1)
        These take two lists and find the intersection of the two. The
        intersection consists of the elements that are in both lists. In
        the first call, duplicates are ignored, returning only a single
        instance of each element in the intersection. In the second call,
        duplicates may be included in the intersection.

            (intersection [list a a b c] [list a a a b])    => (a b)
            (d_intersection [list a a b c] [list a a a b])  => (a a b)

    (symdiff LIST0 LIST1), (d_symdiff LIST0 LIST1)
        These take the symmetric difference between two lists. The
        symmetric difference consists of the elements that are in either
        list but not both. Again, duplicates are allowed in the second
        call.

            (symdiff [list a a b c] [list a a a b])    => (c)
            (d_symdiff [list a a b c] [list a a a b])  => (a c)

    The following variable operations are known:

    (getvar VAR)
        This returns the value of a variable named VAR.

        VAR must be a valid variable name or nothing is returned.

    (setvar VAR VAL)
        This sets the variable VAR to the given value (which may be a
        list or a scalar). Returns VAL.

    (default VAR VAL)
        This sets the value of VAR to VAL unless VAR already has a value.
        Returns the value of VAR.

    (unsetvar VAR)
        Removes VAR from the defined variables. Returns nothing.

    (shiftvar VAR), (popvar VAR)
        These shift or pop a value from the variable. Nothing is returned
        if the operation is not valid. Otherwise, the shifted/popped
        value is returned.

    (unshiftvar VAR VAL), (pushvar VAR VAL)
        These add a new element to the start or end of the given
        variable. If the variable refers to a scalar, it will be
        converted to a list. Nothing is returned.

EXAMPLES
    Reading a config file
        A simple config file might contain two types of lines. The first
        type would be lines which actually set a config variable to a
        value:

            Var = Val

        and the second type would be lines which use this module to set
        more complex values:

            Var : ( ... )

        You might read the file, one line at a time, in the following
        way:

            use List::Parseable;
            $lp = new List::Parseable;

            %CONFIG = ();
            @lines = ...;
            # @lines contains the lines from the config file

            foreach $line (@lines) {
               if ($line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/) {
                  set_value($1,$2);
               } elsif ($line =~ /^\s*(\S+)\s*:\s*(.*?)\s*$/) {
                  parse_value($1,$2);
               }
            }

            # This stores a value in a variable. The value is stored in both
            # a global config hash and in the List::Parseable object so that
            # the config values can be used in setting complex values.
            #
            sub set_value {
               my($var,$val) = @_;
               $::CONFIG{$var} = $val;
               $lp->vars($var,$val);
            }

            # Set a complex variable using List::Parseable.
            #
            sub parse_value {
               my($var,$str) = @_;
               $lp->string("curr",$str);
               my ($val) = $lp->eval("curr");
               set_value($var,$val);
            }

    Setting a complex value
        If you have a config file with three values: ValA, ValB, and Ave,
        and you want Ave to be the average of ValA and ValB, but you
        don't want to build this into the program that reads the data,
        you could have the following (using the program in the example
        for reading a config file):

            ValA = 7
            ValB = 9
            Ave  : ( / ( + (getvar ValA) (getvar ValB) ) 2 )

    Supplying a missing value
        If you have a config file and you want to provide defaults for
        missing values, you can do the following:

            ValA = suppliedA
            ValA : (default ValA defaultA)
            ValB : (default ValB defaultB)

        This will result in ValA being 'suppliedA' and ValB being
        'defaultB'.

BACKWARDS INCOMPATIBILITIES
    1.01 no longer uses <> as list delimiters
        In order to simplify the use of the <, <=, >, and >= operators,
        the <> symbols will no longer be used as list delimiters.

BUGS AND QUESTIONS
    If you find a bug in this module, please send it directly to me (see
    the AUTHOR section below). Alternately, you can submit it on CPAN.
    This can be done at the following URL:

        http://rt.cpan.org/Public/Dist/Display.html?Name=List-Parseable

    Please do not use other means to report bugs (such as usenet
    newsgroups, or forums for a specific OS or linux distribution) as it
    is impossible for me to keep up with all of them.

    When filing a bug report, please include the following information:

    *   The version of the module you are using. You can get this by
        using the script:

            use List::Parseable;
            $obj = new List::Parseable;
            print $obj->version(),"\n";

    *   The output from "perl -V"

    If you have a problem using the module that perhaps isn't a bug
    (can't figure out the syntax, etc.), you're in the right place. Go
    right back to the top of this manual and start reading. If this still
    doesn't answer your question, mail me directly.

KNOWN PROBLEMS
    None at this point.

LICENSE
    This script is free software; you can redistribute it and/or modify
    it under the same terms as Perl itself.

AUTHOR
    Sullivan Beck (sbeck@cpan.org)