Simulation Aids Module by Automation Professionals
E-Mail Support

Dataset / view

Process a dataset into a new dataset, using jython expressions for each column to return, and in optional "Where", "Group By", "Pivot", "Having", "Order By", and "Limit" clauses. Extra arguments are converted into an "args" tuple for efficient use within the jython expressions. Available in all scopes.


view(selectString, fromDataset [, args...]) returns Dataset

ArgumentData TypeDescription
selectStringStringPseudo-SQL query composed of a list of column expressions with output column names, and optional Where, Group By, Pivot, Having, and Order By clauses. Line breaks and whitespace are not significant outside of quotes. When developing, consider using a custom string property here, and editing the string separately from the expression.
fromDatasetDatasetThe data to be used as the implied "From" clause of the Pseudo-SQL query. Column names within this dataset will be converted into python identifiers by the auxiliary function system.dataset.mungeColumnName(), and its row values assigned to these names for use in the column expressions and other clauses.
argsObject[]Zero or more arguments to be packed into a python "args" tuple, supplying varying values to the expression without requiring a recompile, and without having to convert to a string. Intended to supply comparison values in the Where and Having clauses.


The syntax for the selectString argument must use the following basic pattern:

  pyExpression As ColumnName,
  pyExpression As "Column Name" ...
Where pyConditionalExpression
Group By pyExpression, pyExpression ...
Pivot pyExpression For pyStringExpression [In pySequenceExpression]
Having pyConditionalExpression
Order By pyExpression, pyExpression ...
Limit pyIntegerExpression

Within the selectString, the clause keywords are not case sensitive, but everything else is case-sensitive. Whitespace and line breaks outside of quotes collapse to single spaces, so the Pseudo-SQL may be formatted for easy readability. How the column data and intermediate values are used within the pyExpressions varies by clause, as follows:

Column expressions that include an AS clause are processed verbatim. When no AS clause is present, and the pyExpression is a simple munged source column name, the original column name is used as an implied output column name. In that case, if there is also a Group By and/or Pivot clause, a [0] subscript is appended to the expression.

When a column expression has no AS clause, and the expression is not a simple munged column name, the expression is used verbatim. The output column name is then constructed by passing the entire expression through the name munging function.

When a column expression is simply an asterisk, it is replaced with the complete source dataset column list, with original output column names, and with [0] subscripts as above when grouping or pivoting.

The code extracts the values from each row of the source dataset, assigning them to local variables using the munged column names. If a Where clause is present, it is evaluated immediately, and the row discarded if not True. The Where clause may use the munged names of the source columns, objects normally present in the local scope (system.*, etc.), the "args" and "binding" objects as described in the objectScript expression function, and "_r", the row index in the source dataset.

If a Group By clause is present, for any rows that pass the Where condition, the group key is computed next. These Group By key expressions have access to the same variables as the Where clause. A dictionary keyed by group is created. Each entry is a list of lists of row values for that keying tuple. After all source rows are collected into groups, the code iterates through the list of group keys, assigning the lists of row values back into the munged source column names.

If a Pivot clause is present, the column name string expression in the For subclause is used as a final grouping key, but just for the pivoted column expression. The optional In subclause may be either a python list or a python tuple. If a tuple, only the given column headings will be used. Rows generating other column names will be discarded. But if a list, all missing entries will be added after the given ones and no rows will be discarded. The pivot value expressions are computed before the other selected output columns, with values from just the inner group.

The code next computes the output column values using the column expressions in the Select clause. If no Group By or Pivot clause was used, these column expressions may use anything valid in the Where clause. If Group By or Pivot was used, the column expressions must handle the source columns as python lists instead, and may not use "_r". In either case, each computed output value is available to following column expressions using its output column name (munged if necessary).

If a Having clause is present, it is evaluated immediately after the output values are constructed. It may use any variable syntax valid in an output column expression, including the munged output column names and munged pivot column names. If the Having clause is true, or not given at all, an output row is constructed from each set of output column values.

If an Order By clause is present, it is evaluated with each constructed output row to produce an ordering tuple. Like the Having clause, these expressions may use any variables valid in an output column expression. The output rows are collected in a temporary list with the ordering tuples, sorted, then the final row list is extracted from the temporary list. If the Order By clause is omitted, the constructed output rows are added to the final list directly.

If a Limit clause is present, it is evaluated immediately after the output values are sorted and the otherwise final list of output rows generated. The expression must yield an integer, and will take effect only if positive. The list of rows is available as the variable _rows, making it possible to return a percentage of the rows like so:

LIMIT int(0.10 * len(_rows))

The code finishes by calling system.dataset.toDataSet() with the output column names and the final list of rows.

Usage Notes

If you need to programmatically create a selectString for this function, you can get munged column names from your actual column names with the system.dataset.mungeColumnName() auxiliary function. Basically, column names that contain non-alphanumeric characters are converted to alphanumeric only, using underscores.

To make grouping operations as similar as practical to real SQL syntax, common aggregate functions are defined in script.aggregate.*. These functions ignore null/None values as is expected in SQL. Since some have names matching python builtins, they are only imported without prefix within the local scope of the view() function. If you use these in a script module called from view(), you must import them or use the full names.

The aggregate functions are not syntactically identical to real SQL, though, as they can only take a list argument. In real SQL, a user may place a scalar expression inside the function's parenthesis, and the SQL language takes care of the list conversion. In view()'s Pseudo-SQL, the user must handle that themselves. Also, the use of an aggregate function in real SQL will deliver an aggregated result, even if no Group By clause is present. In this Pseudo-SQL, you must use "Group By 0" to trigger grouping of all rows. (Group By 0 is implied if Pivot is used without Group By.)

Unlike grouping in real SQL, you may not use an output column expression that matches a Group By column or expression. I suggest using [0] or [-1] as a subscript to obtain the first or last row in a group for those expressions. (This is done automatically for simple munged column names.)

When in doubt how view() is handling your code and data, set its logger to DEBUG. (com.automation-pros.simaids.View) This will show you the python whenever it re-compiles, along with execution times. Looking at the generated python will also show other variables you may use in your expressions.