Calculations and manipulation

The directive CALCULATE allows arithmetic calculations on the values of any numeric data structure; logical tests can also be done on numerical and textual values. Functions and operators are available for a very wide range of calculations on matrices and tables. Another general directive is EQUATE, which allows values to be copied from one set of data structures to another; the structures must store values of the same mode (for example, numbers or text), but need not be of the same type.

Structure values can be deleted to save space within Genstat; attributes can also be deleted so that the structure can be redefined, for example as another type. Contents of data structures can be compared, to see if they contain the same distinct items, or whether the distinct values in one structure are a subset of those in another. You can also find all the locations where a number, identifier or string occurs within a data structure.

    CALCULATE performs arithmetic and logical calculations
    DELETE allows values and attributes of data structures to be deleted
    EQUATE copies values between sets of data structures
    SETRELATE compares the sets of values in two data structures
    GETLOCATIONS finds locations of an identifier within a pointer, or a string within a factor or text, or a number within any numerical data structure
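
As a minimal sketch of CALCULATE and EQUATE, assuming a few made-up values (the identifiers x, y, z, big, m and v are hypothetical, and option and parameter names should be checked against the reference pages for each directive):

```genstat
" Hypothetical data: two variates of five values each "
VARIATE [VALUES=1,2,3,4,5] x
VARIATE [VALUES=2,4,6,8,10] y

" Arithmetic on whole structures; z is declared implicitly "
CALCULATE z = (x + y) / 2

" Logical test: the result holds 1 where the test is true, 0 where it is false "
CALCULATE big = y > 5
PRINT x,y,z,big

" Copy values between structures of the same mode (numbers) but different type "
MATRIX [ROWS=2; COLUMNS=3; VALUES=1,2,3,4,5,6] m
VARIATE [NVALUES=6] v
EQUATE OLDSTRUCTURES=m; NEWSTRUCTURES=v
PRINT v
```

The EQUATE statement here illustrates the point made above: the matrix and the variate are of different types, but both store numbers.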

There are several general directives for manipulating vectors (variates, factors or texts). Units of vectors can be sorted into systematic order or into random order. Boolean arithmetic can be performed on their contents, or you can form all the ways of partitioning them into subsets. A “restriction” can be associated with a vector, so that subsequent statements operate on only a subset of its units. A default length and labelling can be defined for vectors formed later in the job. Facilities for specific types of vector allow interpolation of values for variates, monotonic regression, calculation of regression quantiles, generation of factor values, and concatenation, editing and searching of text.

    SORT sorts units of vectors into alphabetic or numerical order of an index vector, or forms a factor from a variate or text
    SETCALCULATE performs Boolean set calculations on the contents of vectors and pointers
    SETALLOCATIONS runs through all ways of allocating a set of objects to subsets
    RESTRICT defines a “restriction” on the units of a vector
    UNITS defines default length or labelling for vectors defined subsequently in the job
    INTERPOLATE calculates variates of interpolated values
    FRQUANTILES forms regression quantiles
    MONOTONIC fits an increasing monotonic regression
    GROUPS forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur
    CONCATENATE concatenates together lines of text vectors
    EDIT line editor for units of text vectors
    TXBREAK breaks a text structure into individual words
    TXCONSTRUCT forms a text structure by appending or concatenating values of scalars, variates, texts, factors or pointers; allows the case of letters to be changed or values to be truncated and reversed
    TXFIND finds a subtext within a text structure
    TXINTEGERCODES converts textual characters to and from their corresponding integer codes
    TXPOSITION locates strings within the lines of a text structure
    TXREPLACE replaces a subtext within a text structure
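
The sorting and restriction facilities described above might be used as follows. This is a sketch only, with hypothetical identifiers (score, name, sscore, sname, m) and invented values; the syntax should be checked against the SORT and RESTRICT reference pages.

```genstat
" Hypothetical data: a variate and a text of the same length "
VARIATE [VALUES=3,1,4,1,5] score
TEXT [VALUES='c','a','d','b','e'] name

" Sort both vectors into the numerical order of the index vector score "
SORT [INDEX=score] OLDVECTORS=score,name; NEWVECTORS=sscore,sname
PRINT sname,sscore

" Restrict score so that later statements use only the units where score > 2 "
RESTRICT score; CONDITION=score > 2
CALCULATE m = MEAN(score)
PRINT m

" RESTRICT with no condition removes the restriction again "
RESTRICT score
```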

Another general directive allows you to run many algorithms from the Numerical Algorithms Group Library, for example to build mathematical models.

    NAG calls an algorithm from the NAG Library

Other facilities for vectors are provided by the procedures in the Genstat Procedure Library, including:

    APPEND appends a list of vectors of compatible types
    FACAMEND permutes the levels and labels of a factor
    FACCOMBINATIONS forms a factor to indicate observations with identical values of a set of variates, texts or factors
    FACDIVIDE represents a factor by factorial combinations of a set of factors
    FACEXCLUDEUNUSED redefines the levels and labels of a factor to exclude those that are unused
    FACMERGE merges levels of factors
    FACPRODUCT forms a factor with a level for every combination of other factors
    FACSORT sorts the levels of a factor according to an index vector
    FACLEVSTANDARDIZE redefines a list of factors so that they have the same levels or labels
    FACUNIQUE redefines a factor so that its levels and labels are unique
    FBETWEENGROUPVECTORS forms variates and classifying factors containing within-group summaries to use e.g. in a between-group analysis
    FDISTINCTFACTORS checks sets of factors to remove any that define duplicate classifications
    FMFACTORS forms a pointer of factors representing a multiple-response
    FFREERESPONSEFACTOR forms multiple-response factors from free-response data
    FREGULAR expands vectors onto a regular two-dimensional grid
    FRESTRICTEDSET forms vectors with the restricted subset of a list of vectors
    FROWCANONICALMATRIX puts a matrix into row canonical, or reduced row echelon, form
    FSTRING forms a single string from a list of strings in a text
    FTEXT forms a text structure from a variate
    FUNIQUEVALUES redefines a variate or text so that its values are unique
    FWITHINTERMS forms factors to define terms representing the effects of one factor within another factor
    FVSTRING forms a string listing the identifiers of a set of data structures
    GRANDOM generates pseudo-random numbers from probability distributions
    GRMNOMIAL generates multinomial pseudo-random numbers
    GRMULTINORMAL generates multivariate normal pseudo-random numbers
    JOIN joins or merges two sets of vectors together, based on classifying keys
    MVFILL replaces missing values in a vector with the previous non-missing value
    ORTHPOLYNOMIAL calculates orthogonal polynomials
    QUANTILE calculates quantiles of the values in a variate
    RANK produces ranks, from the values in a variate, allowing for ties
    RESHAPE reshapes a data set with classifying factors for rows and columns, into a reorganized data set with new identifying factors
    SAMPLE samples from a set of units, possibly stratified by factors
    SVSAMPLE constructs stratified random samples
    STACK combines several data sets by “stacking” the corresponding vectors
    STANDARDIZE standardizes columns of a data matrix to have mean 0 and variance 1
    SUBSET forms vectors containing subsets of the values in other vectors
    TXPAD pads strings of a text structure with extra characters so that their lengths are equal
    TXPROGRESSION forms a text containing a progression of strings
    TXSPLIT splits a text into individual texts, at positions on each line marked by separator character(s)
    UNSTACK splits vectors into individual vectors according to levels of a factor
    VEQUATE equates values across a set of data structures
    VINTERPOLATE performs linear and inverse linear interpolation between variates

There are several procedures for calculating or fitting splines, and for manipulating series of observations of a theoretical curve.

    SPLINE calculates a set of basis functions for M-, B- or I-splines
    LSPLINE calculates design matrices to fit a natural polynomial or trigonometric L-spline as a linear mixed model
    NCSPLINE calculates natural cubic spline basis functions (for use e.g. in REML)
    PENSPLINE calculates design matrices to fit a penalized spline as a linear mixed model
    PSPLINE calculates design matrices to fit a P-spline as a linear mixed model
    RADIALSPLINE calculates design matrices to fit a radial-spline surface as a linear mixed model
    TENSORSPLINE calculates design matrices to fit a tensor-spline surface as a linear mixed model
    ALIGNCURVE forms an optimal warping to align an observed series of observations with a standard series
    BASELINE estimates a baseline for a series of numbers whose minimum value is drifting
    PEAKFINDER finds the locations of peaks in an observed series

Directives are available for eigenvalue, QR and singular-value decompositions of matrices, and to form the values of SSPM structures.

    FLRV calculates latent roots and vectors (that is, eigenvalues and eigenvectors)
    QRD calculates QR decompositions of matrices
    SVD calculates singular-value decompositions of matrices
    FSSPM calculates values for SSPM structures (sums of squares and products, means, etc.)
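
A short sketch of two of these decompositions, assuming small made-up matrices (the identifiers s, eig, a, u, d and v are hypothetical, and the option and parameter names should be confirmed in the FLRV and SVD reference pages):

```genstat
" Hypothetical 2x2 symmetric matrix, given as its lower triangle by rows "
SYMMETRICMATRIX [ROWS=2; VALUES=2,1,2] s

" Latent roots and vectors (eigenvalues and eigenvectors), printed and saved "
FLRV [PRINT=roots,vectors] INMATRIX=s; LRV=eig

" Hypothetical 3x2 rectangular matrix "
MATRIX [ROWS=3; COLUMNS=2; VALUES=1,0,0,1,1,1] a

" Singular-value decomposition, saving the three component matrices "
SVD INMATRIX=a; LEFT=u; SINGULAR=d; RIGHT=v
PRINT u,d,v
```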

Procedures in the Library for operating on matrices include:

    FCORRELATION forms the correlation matrix for a list of variates
    PARTIALCORRELATIONS calculates partial correlations for a list of variates
    FHADAMARDMATRIX forms Hadamard matrices
    FPROJECTIONMATRIX forms a projection matrix for a set of model terms
    FRTPRODUCTDESIGNMATRIX forms summation, or relationship, matrices for model terms
    FVCOVARIANCE forms the variance-covariance matrix for a list of variates
    GINVERSE calculates the generalized inverse of a matrix
    LINDEPENDENCE finds the linear relations associated with matrix singularities
    MPOWER forms integer powers of a square matrix
    POSSEMIDEFINITE calculates a positive semi-definite approximation of a non-positive semi-definite symmetric matrix
    VMATRIX copies values and row/column labels from a matrix to variates and texts

Tables can be formed containing summaries of values in variates: totals, minimum and maximum values, quantiles, numbers of missing and non-missing values, means and variances. Manipulations of multi-way structures include the ability to add various types of marginal summaries to tables, and to combine “slices” of tables, of matrices or of variates.

    TABULATE forms tables of summaries of the values of a variate
    MARGIN calculates or deletes margins of tables
    COMBINE combines or omits “slices” of tables, matrices or variates
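
For instance, a two-way table of means could be formed and then given margins roughly as below. The data and the identifiers (block, treat, yield, tmean, tmarg) are invented for illustration, and the syntax should be checked against the TABULATE and MARGIN reference pages.

```genstat
" Hypothetical data: six units classified by two factors "
FACTOR [LEVELS=2; VALUES=1,1,1,2,2,2] block
FACTOR [LEVELS=3; VALUES=1,2,3,1,2,3] treat
VARIATE [VALUES=10,12,14,11,13,15] yield

" Form a block-by-treat table of means and save it in tmean "
TABULATE [CLASSIFICATION=block,treat] yield; MEANS=tmean

" Add marginal means to the table, saving the result in tmarg "
MARGIN [METHOD=means] OLDTABLE=tmean; NEWTABLE=tmarg
PRINT tmarg
```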

Procedures in the Library for operating on tables include:

    BACKTRANSFORM calculates back-transformed means with approximate standard errors and confidence intervals
    MEDIANTETRAD gives robust identification of multiple outliers in 2-way tables
    MTABULATE tabulates data classified by multiple-response factors
    PERCENT expresses the body of a table as percentages of one of its margins
    SVBOOT bootstraps data from random surveys
    SVCALIBRATE performs generalized calibration of survey data
    SVGLM fits generalized linear models to survey data
    SVREWEIGHT modifies survey weights, adjusting them to ensure that their overall sum remains unchanged
    SVSAMPLE constructs stratified random samples
    SVSTRATIFIED analyses stratified random surveys by expansion or ratio raising
    SVTABULATE tabulates data from random surveys, including multistage surveys and surveys with unequal probabilities of selection
    SVWEIGHT forms survey weights
    TABINSERT inserts the contents of a sub-table into a table
    TABMODE forms summary tables of modes of values
    TABSORT sorts tables so their margins are in ascending or descending order
    TCOMBINE combines several tables into a single table
    T%CONTROL expresses tables as percentages of control cells
    VTABLE forms a variate and set of classifying factors from a table

Directives are available for adding and removing branches of trees, and to assist in the construction and use of trees.

    BASSESS assesses potential splits for regression and classification trees
    BCUT cuts a tree at a defined node, discarding nodes and information below it
    BIDENTIFY identifies specimens using a tree
    BJOIN extends a tree by joining another tree to a terminal node
    BGROW adds new branches to a node of a tree

There are also procedures for displaying and pruning trees. These provide basic utilities for tree-based analysis, and are used by the existing procedures for classification trees, identification keys and regression trees (BCLASSIFICATION, BKEY and BREGRESSION).

    BCONSTRUCT constructs a tree
    BGRAPH plots a tree
    BPRINT displays a tree
    BPRUNE prunes a tree using minimal cost complexity

Formulae and expressions can be interpreted, revised or constructed automatically from the contents of pointers.

    FARGUMENTS forms lists of arguments involved in an expression
    FCLASSIFICATION forms classification sets for the terms in a formula or breaks a formula up into separate formulae (one for each term)
    REFORMULATE modifies a formula or an expression to operate on a different set of data structures
    SET2FORMULA forms a model formula using structures supplied in a pointer

Values can be assigned to dummies and pointers.

    ASSIGN sets values of dummies and pointers

Aspects of the “environment” of the current job can be modified, such as whether or not Genstat starts output from a statistical analysis at the top of a new page, or whether it should pause during interactive output. New defaults can be set for options and parameters. Details of the environmental settings can be copied into Genstat data structures. Attributes of data structures can also be accessed.

    SET sets details of the “environment” of a Genstat job
    SETOPTION sets or modifies defaults of options of Genstat directives or procedures
    SETPARAMETER sets or modifies defaults of parameters of Genstat directives or procedures
    GET gets details of the “environment” of a Genstat job
    GETATTRIBUTE accesses attributes of data structures
    GETNAME forms the name of a structure according to its IPRINT attribute

There are also various specialist mathematical facilities.

    BPCONVERT converts bit patterns between integers, pointers of set bits and textual descriptions
    FPARETOSET forms the Pareto optimal set of non-dominated groups
    GALOIS forms addition and multiplication tables for a Galois finite field
    NCONVERT converts integers between base 10 and other bases
    PERMUTE forms all possible permutations of the integers 1…n
    PRIMEPOWER decomposes a positive integer into its constituent prime powers

 

Updated on February 9, 2022
