Cleanup Government Organizations (Non-Corporates group)

From the non_corporates.do file. Source: https://sites.google.com/site/patentdataproject/Home/posts/namestandardizationroutinesuploaded

Usage

cockburn_replace_govt(
  x,
  patterns = cockburn_patterns_govt_cleanup,
  patterns_col = 1,
  patterns_mode = "all",
  patterns_mode_col = NULL,
  patterns_type = "fixed",
  patterns_type_col = NULL,
  patterns_replacements_col = 2,
  replacements = if (is.atomic(patterns)) "" else NULL,
  ...
)

Arguments

x

Vector or table to standardize.

patterns

Accepts either a vector or a table. If patterns is a table, it can also include a replacements column.

patterns_col

If patterns is not a vector, which column to use. Default is 1.

patterns_mode

Mode of matching. One of c("all", "first", "last"). The default is "all" (it is 2x faster than "first" and "last" because it uses the stri_replace_all_* functions). A vector of modes (the same length as patterns) can also be passed.
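The three modes can be pictured with a small base R sketch. It assumes that "all", "first" and "last" mean replacing every, the first, or the last occurrence of a pattern. The helper replace_last_fixed and the sample string are hypothetical, and this is not the package's own implementation (which, per the note above, relies on stri_replace_all_* functions).

```r
# Illustration only: base R stand-ins for the three matching modes.
x <- "DEPT OF ENERGY, DEPT OF STATE"

# "all": replace every occurrence of the literal pattern.
all_mode <- gsub("DEPT", "DEPARTMENT", x, fixed = TRUE)

# "first": replace only the first occurrence.
first_mode <- sub("DEPT", "DEPARTMENT", x, fixed = TRUE)

# "last": base R has no direct equivalent, so locate the last match by hand.
# Hypothetical helper; it handles a single string, not a vector.
replace_last_fixed <- function(s, pattern, replacement) {
  starts <- gregexpr(pattern, s, fixed = TRUE)[[1]]
  if (starts[1] == -1) return(s)  # no match: return the input unchanged
  start <- starts[length(starts)]
  paste0(
    substr(s, 1, start - 1),
    replacement,
    substr(s, start + nchar(pattern), nchar(s))
  )
}
last_mode <- replace_last_fixed(x, "DEPT", "DEPARTMENT")

print(c(all = all_mode, first = first_mode, last = last_mode))
```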

patterns_mode_col

Column in the patterns table that holds the mode of matching.

patterns_type

Type of pattern for matching. Default is "fixed" (calling stri_replace_all_fixed). Other options are:

patterns_type_col

Column with the pattern type, for cases where patterns have different types.

patterns_replacements_col

If patterns is not a vector and includes replacements, which column to use for replacements. Default is 2.
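As a rough picture of how a patterns table with a replacements column might be applied, here is a minimal base R sketch. The table contents, the sample names, and the helper apply_fixed_patterns are all hypothetical; the loop only mimics literal ("fixed") matching in "all" mode and is not the package's implementation.

```r
# Hypothetical patterns table: column 1 holds patterns, column 2 replacements.
patterns <- data.frame(
  pattern = c("DEPT", "GOVT", "U.S."),
  replacement = c("DEPARTMENT", "GOVERNMENT", "US"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply each row in order, matching patterns literally.
apply_fixed_patterns <- function(x, patterns, patterns_col = 1,
                                 patterns_replacements_col = 2) {
  for (i in seq_len(nrow(patterns))) {
    x <- gsub(
      patterns[[patterns_col]][i],
      patterns[[patterns_replacements_col]][i],
      x,
      fixed = TRUE  # literal match, so "." in "U.S." is not a wildcard
    )
  }
  x
}

org_names <- c("U.S. DEPT OF ENERGY", "GOVT OF CANADA")
cleaned <- apply_fixed_patterns(org_names, patterns)
print(cleaned)
```

Because rows are applied one after another, the order of rows in the table can matter when one replacement produces text that a later pattern matches.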

replacements

If patterns does not have a column with replacements, provide them here.

...

Arguments passed on to standardize_options

col

Column of interest (the one we need to standardize) in the x object (if it is data.frame-like).

rows

Logical vector to filter records of interest. Default is NULL, which means records are not filtered.

omitted_rows_value

If the rows parameter is set, omitted_rows_value is merged with the results (filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are merged with the results.
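The interplay of rows and omitted_rows_value can be sketched in base R as follows. The data, the stand-in standardize function, and the "SKIPPED" value are hypothetical; the sketch only illustrates the merging behaviour described above under that reading.

```r
x <- c("US DEPT OF ENERGY", "ACME CORP", "US DEPT OF STATE")
rows <- c(TRUE, FALSE, TRUE)  # only records 1 and 3 are standardized

# Hypothetical stand-in for a standardization step.
standardize <- function(v) gsub("DEPT", "DEPARTMENT", v, fixed = TRUE)

# omitted_rows_value = NULL: omitted records keep their original values.
keep_original <- x
keep_original[rows] <- standardize(x[rows])

# omitted_rows_value = "SKIPPED": omitted records get that single string.
with_placeholder <- rep("SKIPPED", length(x))
with_placeholder[rows] <- standardize(x[rows])

print(keep_original)
print(with_placeholder)
```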

output_placement

Where to insert results (the standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:

'omit' :: do not write results back to table (usually used when append_output_copy is set for temporary values)
'prepend_to_col' :: prepend to col
'append_to_col' :: append to col
'prepend_to_x' :: prepend to x data.frame like object
'append_to_x' :: append to x data.frame like object
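One way to read these placement options is as different positions for a result column in a data.frame. The base R sketch below assumes that reading; the sample table and the column name name_std are hypothetical, and the package may name and place columns differently.

```r
x <- data.frame(
  id = 1:2,
  name = c("US DEPT OF ENERGY", "GOVT OF CANADA"),
  year = c(1999, 2001),
  stringsAsFactors = FALSE
)
col <- 2  # column of interest
result <- gsub("DEPT", "DEPARTMENT", x[[col]], fixed = TRUE)

# 'replace_col': overwrite col with the results.
replaced <- x
replaced[[col]] <- result

# 'append_to_col': new column placed right after col (assumed reading).
after_col <- data.frame(
  x[seq_len(col)], name_std = result, x[-seq_len(col)],
  stringsAsFactors = FALSE
)

# 'prepend_to_x' and 'append_to_x': new column first or last in the table.
prepended <- data.frame(name_std = result, x, stringsAsFactors = FALSE)
appended <- data.frame(x, name_std = result, stringsAsFactors = FALSE)

print(names(after_col))
print(names(prepended))
print(names(appended))
```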

x_atomic_name

If x is a vector, use this name for the original column if it appears in the results. Default is "x". If x is a table, the name of col will be used.

output_col_name

Use this name for the column with results (standardized values). Parts in curly brackets are substitute strings. Options for substitutions are:

append_output_copy

Whether to append a copy of result vector to x object

output_copy_col_name

How the appended copy will be named

Value

standardized names table

See also

Other magerman: cockburn_detect_corp(), cockburn_detect_govt(), cockburn_detect_hosp(), cockburn_detect_indiv(), cockburn_detect_inst_conds_1(), cockburn_detect_inst(), cockburn_detect_univ(), cockburn_detect_uspto(), cockburn_remove_standard_names(), cockburn_remove_uspto(), cockburn_replace_compustat_names(), cockburn_replace_compustat(), cockburn_replace_derwent(), cockburn_replace_univ(), magerman_condense(), magerman_detect_characters(), magerman_detect_comma_period_irregularities(), magerman_detect_legal_form_beginning(), magerman_detect_legal_form_end(), magerman_detect_legal_form_middle(), magerman_detect_umlaut(), magerman_remove_common_words_anywhere(), magerman_remove_common_words_at_the_beginning(), magerman_remove_common_words_at_the_end(), magerman_remove_double_quotation_marks_beginning_end(), magerman_remove_double_quotation_marks_irregularities(), magerman_remove_double_spaces(), magerman_remove_html_codes(), magerman_remove_non_alphanumeric_at_the_beginning(), magerman_remove_non_alphanumeric_at_the_end(), magerman_remove_special_characters(), magerman_replace_accented_characters(), magerman_replace_comma_period_irregularities_all(), magerman_replace_comma_period_irregularities(), magerman_replace_legal_form_beginning(), magerman_replace_legal_form_end(), magerman_replace_legal_form_middle(), magerman_replace_proprietary_characters(), magerman_replace_sgml_characters(), magerman_replace_spelling_variation(), standardize_eee_ppat()
