
Standardizes strings using the exact procedures described in Magerman et al. 2009.

Usage

standardize_magerman(
  x,
  detect_legal_form = FALSE,
  append_output_copy_before_common_words_removal = FALSE,
  condense_words = FALSE,
  ...
)

Arguments

x

table or vector

detect_legal_form

Whether to detect legal forms. Default is FALSE.

append_output_copy_before_common_words_removal

Whether to save a copy of the standardized column before the common.words.removal procedure is applied. Default is FALSE.

condense_words

Whether to remove all spaces in the standardized names. Default is FALSE.

...

Arguments passed on to standardize

procedures

The procedures that make up the standardization algorithm, specified as a list whose elements are (1) names of procedure functions as character strings, (2) calls, where you can provide optional arguments, or (3) nested lists that let the user group procedures. Nesting affects standardization progress reporting and the visualization of algorithms with nstandr_plot. Technically, a nested list is equivalent to a plain list of procedures and should produce the same results. Names of the list elements are used in progress messages. For unnamed elements, the name of the procedure's function is used instead. Default is nstandr:::nstandr_default_procedures_list.
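The equivalence of nested and plain lists can be illustrated with a small self-contained sketch. It does not use nstandr's internals: `to_upper`, `trim_ws`, and `run_procedures` are hypothetical stand-ins defined here only for illustration, and the call form (2) is left out.

```r
# Hypothetical procedure functions (not part of nstandr)
to_upper <- function(x) toupper(x)
trim_ws <- function(x) trimws(x)

# A plain list of procedure names and a nested, partly named equivalent
flat_procedures <- list("to_upper", "trim_ws")
nested_procedures <- list(Cleaning = list("to_upper"), "trim_ws")

# Hypothetical runner: applies procedures in order, descending into nested lists
run_procedures <- function(x, procedures) {
  for (p in procedures) {
    x <- if (is.list(p)) run_procedures(x, p) else do.call(p, list(x))
  }
  x
}

flat_result <- run_procedures("  acme corp ", flat_procedures)
nested_result <- run_procedures("  acme corp ", nested_procedures)

# Grouping changes how steps could be reported, not what they produce
identical(flat_result, nested_result)
```

In this sketch the name `Cleaning` plays the role of a group label that a progress message could use; the result itself does not depend on it.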

show_progress

Whether to report progress percentage. Default is TRUE.

nrows_min_to_show_progress

The minimum number of rows x should have for automatic progress estimation. If x has fewer rows, no progress will be shown. Default is 10^5.

progress_step_nrows

If set, x is divided into chunks of this many rows. Default is NULL.

progress_step_in_percent

The percentage of x that represents one progress step. The value should be between 0.1 and 50. Default is 1, which means it will try to split x into 100 chunks.
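As a rough sketch of the arithmetic behind this parameter, assuming each chunk is a fixed percentage of the rows (the package's internal chunking may differ; `n_rows`, `n_chunks`, and `rows_per_chunk` are illustrative names only):

```r
n_rows <- 250000                  # hypothetical size of x
progress_step_in_percent <- 1     # the documented default

# One progress step per percent step gives 100 chunks at the default
n_chunks <- 100 / progress_step_in_percent

# Rows covered by one progress step under this assumption
rows_per_chunk <- ceiling(n_rows * progress_step_in_percent / 100)
```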

progress_message_use_names

Whether to use names from the procedures list when reporting progress. Default is TRUE.

quite

Suppress all messages. Default is FALSE.

save_intermediate_x_to_var

For debugging standardization procedures. Saves intermediate results to this variable. If the procedures finish without errors, the variable is removed.

Value

standardized names table
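A minimal usage sketch based on the signature shown under Usage. The input names are made up for illustration, and no particular output is claimed here; the call is guarded so the snippet also runs where nstandr is not installed.

```r
# Hypothetical input: any character vector of organization names would do
company_names <- c("Acme Corp.", "ACME CORPORATION", "Acme GmbH & Co. KG")

if (requireNamespace("nstandr", quietly = TRUE)) {
  # Default settings
  standardized <- nstandr::standardize_magerman(company_names)

  # Also detect legal forms and remove spaces from the standardized names
  standardized_condensed <- nstandr::standardize_magerman(
    company_names,
    detect_legal_form = TRUE,
    condense_words = TRUE
  )

  print(standardized)
  print(standardized_condensed)
}
```

Arguments listed under `...`, such as show_progress, are passed on to standardize and can be supplied in the same call.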

References

Magerman et al., 2006 - Data Production Methods for Harmonized Patent Statistics: Patentee Name Standardization