Standardizes organizational names. Takes either a vector or a column of a table.
Source: R/nstandr.r
standardize.Rd
Usage
standardize(
  x,
  procedures = nstandr_default_procedures_list,
  show_progress = TRUE,
  nrows_min_to_show_progress = 10^3,
  progress_step_nrows = NULL,
  progress_step_in_percent = 1,
  progress_message_use_names = TRUE,
  quite = FALSE,
  save_intermediate_x_to_var = NULL,
  ...
)

make_std_names(
  x,
  procedures = nstandr_default_procedures_list,
  show_progress = TRUE,
  nrows_min_to_show_progress = 10^3,
  progress_step_nrows = NULL,
  progress_step_in_percent = 1,
  progress_message_use_names = TRUE,
  quite = FALSE,
  save_intermediate_x_to_var = NULL,
  ...
)

make_standard_names(
  x,
  procedures = nstandr_default_procedures_list,
  show_progress = TRUE,
  nrows_min_to_show_progress = 10^3,
  progress_step_nrows = NULL,
  progress_step_in_percent = 1,
  progress_message_use_names = TRUE,
  quite = FALSE,
  save_intermediate_x_to_var = NULL,
  ...
)

nstand(
  x,
  procedures = nstandr_default_procedures_list,
  show_progress = TRUE,
  nrows_min_to_show_progress = 10^3,
  progress_step_nrows = NULL,
  progress_step_in_percent = 1,
  progress_message_use_names = TRUE,
  quite = FALSE,
  save_intermediate_x_to_var = NULL,
  ...
)
Arguments
- x
Object to standardize: either a vector or a table (data.frame-like object).
- procedures
The procedures that comprise the standardization algorithm, specified as a list whose elements are (1) names of procedure functions as character strings, (2) calls, where you can provide optional arguments, or (3) nested lists that let the user group procedures. Nesting lists of procedures affects standardization progress reporting and the visualization of algorithms with nstandr_plot. Otherwise a nested list is equivalent to a plain list of procedures: both should produce the same results. Names of the list elements are used for progress messages. For unnamed elements, the name of the procedure's function is used for progress reporting instead. Default is nstandr:::nstandr_default_procedures_list.
- show_progress
Whether to report progress percentage. Default is TRUE.
- nrows_min_to_show_progress
The minimum number of rows x should have for automatic progress estimation. If x has fewer rows, no progress is shown. Default is 10^3.
- progress_step_nrows
If set, x is divided into chunks of this many rows. Default is NULL.
- progress_step_in_percent
Percentage of x that represents one progress step. The value should be between 0.1 and 50. Default is 1, which means the function tries to split x into 100 chunks.
- progress_message_use_names
Whether to use names from the procedures list to report progress. Default is TRUE.
- quite
Suppress all messages. Default is FALSE.
- save_intermediate_x_to_var
For debugging standardization procedures. Saves intermediate results to this variable. If the procedures finish without errors, the variable is removed.
- ...
Arguments passed on to standardize_options:
  - col
    Column of interest (the one to standardize) in the x object, if x is data.frame-like.
  - rows
    Logical vector to filter records of interest. Default is NULL, which means records are not filtered.
  - omitted_rows_value
    If the rows parameter is set, omitted_rows_value is merged with the results (which are filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are merged with the results.
  - output_placement
    Where to insert the results (the standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:
    'omit' :: do not write results back to the table (usually used when append_output_copy is set, for temporary values)
    'prepend_to_col' :: prepend to col
    'append_to_col' :: append to col
    'prepend_to_x' :: prepend to the x data.frame-like object
    'append_to_x' :: append to the x data.frame-like object
  - x_atomic_name
    If x is a vector, use this name for the original column if it appears in the results. Default is "x". If x is a table, the name of col is used.
  - output_col_name
    Use this name for the column with results (standardized values). Parts in curly brackets are substitution strings. Options for substitutions are:
  - append_output_copy
    Whether to append a copy of the result vector to the x object.
  - output_copy_col_name
    How the appended copy will be named.
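
The interaction between rows and omitted_rows_value described above may be easier to follow with a small base R sketch. This is only an illustration of the documented behaviour, not the package's implementation: merge_filtered_results() is a hypothetical helper, and toupper() stands in for a real standardization procedure.

```r
# Hypothetical helper (not part of nstandr): standardize only the records
# selected by `rows`, then merge the untouched records back in.
merge_filtered_results <- function(x, rows = NULL, omitted_rows_value = NULL,
                                   procedure = toupper) {
  # rows = NULL means "do not filter": every record is processed.
  if (is.null(rows)) {
    return(procedure(x))
  }
  stopifnot(is.logical(rows), length(rows) == length(x))

  # Omitted records keep their original values when omitted_rows_value is
  # NULL; otherwise they take the supplied value(s), recycled to length(x).
  result <- if (is.null(omitted_rows_value)) {
    x
  } else {
    rep_len(omitted_rows_value, length(x))
  }

  # Only the selected records are replaced by standardized values.
  result[rows] <- procedure(x[rows])
  result
}

org_names <- c("acme inc", "Globex Corp", "initech llc")
keep <- c(TRUE, FALSE, TRUE)

# Omitted record keeps its original value.
merge_filtered_results(org_names, rows = keep)
# "ACME INC" "Globex Corp" "INITECH LLC"

# Omitted record is replaced by a single value.
merge_filtered_results(org_names, rows = keep, omitted_rows_value = NA_character_)
# "ACME INC" NA "INITECH LLC"
```

In this sketch a single omitted_rows_value is recycled across all records, which mirrors the "single string or character vector of length nrow(x)" wording; how the package itself validates that argument is not shown in this page.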