From the non_corporates.do file. Source: https://sites.google.com/site/patentdataproject/Home/posts/namestandardizationroutinesuploaded
Usage
cockburn_detect_inst(
x,
patterns = cockburn_patterns_inst,
patterns_col = 1,
patterns_codes_col = 2,
patterns_type = "fixed",
patterns_type_col = NULL,
patterns_codes = "inst",
output_codes_col_name = "{col_name_}entity_type",
codes_omitted_rows_value = NULL,
no_match_code = NULL,
merge_existing_codes = "append_to_existing",
return_only_codes = FALSE,
return_only_first_detected_code = TRUE,
return_merge_codes_description = FALSE,
...
)
Arguments
- x
Vector or table to detect in.
- patterns
Accepts either a vector or a table. If patterns is a table, it can also include a replacements column.
- patterns_col
If patterns is a table this specifies which column to use. Default is 1.
- patterns_codes_col
If patterns is a table, this specifies which column to use as the codes column. Default is 2.
- patterns_type
Specifies kind(s) of patterns. Default is "fixed" (calling stri_replace_all_fixed). Other options are:
- patterns_type_col
Column in the patterns table where you can specify the type of each pattern. If set, patterns_type is ignored. Default is NULL.
- patterns_codes
If provided, use it as codes. Should be the same length as patterns. Default in this function is "inst" (see Usage).
- output_codes_col_name
If provided, use it as the name of the codes column in the results (a new column if it does not exist, otherwise the one to update).
- codes_omitted_rows_value
If rows is set, use this value to fill the omitted rows. Default is NULL, which means that when an existing codes column is updated, the initial code values are kept for omitted rows. If there is no codes column to update, omitted rows are filled with NA.
- no_match_code
If provided, records that did not match any pattern are coded with this value.
- merge_existing_codes
Whether to merge newly detected codes with existing. Options are:
- return_only_codes
If TRUE, return only the codes vector.
- return_only_first_detected_code
If TRUE, return only the code for the first detected pattern. If FALSE, return a list of all matched codes. Default is TRUE. (Currently this does affect performance.)
- return_merge_codes_description
Return a description of the choices for the merge_existing_codes parameter.
- ...
Arguments passed on to standardize_options
col
Column of interest (the one we need to standardize) in the x object (if it is data.frame-like).
rows
Logical vector to filter records of interest. Default is NULL, which means records are not filtered.
omitted_rows_value
If the rows parameter is set, merge omitted_rows_value with the results (filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are merged with the results.
output_placement
Where to insert the results (standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:
'omit' :: do not write results back to the table (usually used when append_output_copy is set for temporary values)
'prepend_to_col' :: prepend to col
'append_to_col' :: append to col
'prepend_to_x' :: prepend to the x data.frame-like object
'append_to_x' :: append to the x data.frame-like object
x_atomic_name
If x is a vector, use this name for the original column if it is in the results. Default is "x". If x is a table, the name of col will be used.
output_col_name
Use this name for the column with results (standardized values). Parts in curly brackets are substitution strings. Options for substitutions are:
append_output_copy
Whether to append a copy of the result vector to the x object.
output_copy_col_name
How the appended copy will be named.
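To make the interplay of patterns, codes, rows, no_match_code and return_only_first_detected_code more concrete, here is a minimal base-R sketch of fixed-pattern detection. It is not the package's implementation: detect_codes_sketch and its only_first argument are hypothetical names introduced for illustration, the two patterns are made up (the real table is cockburn_patterns_inst), and matching uses base grepl(fixed = TRUE) as a stand-in for the stringi call mentioned above.

```r
# Hypothetical helper (not part of the documented package): assigns a code to
# each record whose text contains one of the literal (fixed) patterns.
detect_codes_sketch <- function(x, patterns, codes,
                                rows = NULL,
                                no_match_code = NA_character_,
                                only_first = TRUE) {
  stopifnot(length(patterns) == length(codes))
  # rows = NULL means "do not filter records"
  if (is.null(rows)) rows <- rep(TRUE, length(x))
  result <- lapply(seq_along(x), function(i) {
    # Omitted rows get NA here, since this sketch has no codes column to update
    if (!rows[i]) return(NA_character_)
    # Literal substring match of every pattern against the current record
    hit <- vapply(patterns, function(p) grepl(p, x[i], fixed = TRUE), logical(1))
    if (!any(hit)) return(no_match_code)
    matched <- codes[hit]
    if (only_first) matched[1] else matched
  })
  # A plain character vector for first-match mode, a list otherwise
  if (only_first) unlist(result) else result
}

org_names <- c("MAX PLANCK INSTITUTE", "ACME WIDGETS INC", "SALK INSTITUTE FOUNDATION")
# Illustrative patterns only; the package ships its own patterns table
patterns <- c(" INSTITUTE", " FOUNDATION")
codes <- c("inst", "inst")

first_codes <- detect_codes_sketch(org_names, patterns, codes)
filtered_codes <- detect_codes_sketch(org_names, patterns, codes,
                                      rows = c(TRUE, TRUE, FALSE))
all_codes <- detect_codes_sketch(org_names, patterns, codes, only_first = FALSE)
```

In this sketch first_codes is "inst", NA, "inst"; filtered_codes leaves the third record as NA because it was excluded by rows; and all_codes is a list in which the third record carries two codes, one per matched pattern.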
See also
detect_patterns
Other magerman:
cockburn_detect_corp(), cockburn_detect_govt(), cockburn_detect_hosp(), cockburn_detect_indiv(), cockburn_detect_inst_conds_1(), cockburn_detect_univ(), cockburn_detect_uspto(), cockburn_remove_standard_names(), cockburn_remove_uspto(), cockburn_replace_compustat_names(), cockburn_replace_compustat(), cockburn_replace_derwent(), cockburn_replace_govt(), cockburn_replace_univ(), magerman_condense(), magerman_detect_characters(), magerman_detect_comma_period_irregularities(), magerman_detect_legal_form_beginning(), magerman_detect_legal_form_end(), magerman_detect_legal_form_middle(), magerman_detect_umlaut(), magerman_remove_common_words_anywhere(), magerman_remove_common_words_at_the_beginning(), magerman_remove_common_words_at_the_end(), magerman_remove_double_quotation_marks_beginning_end(), magerman_remove_double_quotation_marks_irregularities(), magerman_remove_double_spaces(), magerman_remove_html_codes(), magerman_remove_non_alphanumeric_at_the_beginning(), magerman_remove_non_alphanumeric_at_the_end(), magerman_remove_special_characters(), magerman_replace_accented_characters(), magerman_replace_comma_period_irregularities_all(), magerman_replace_comma_period_irregularities(), magerman_replace_legal_form_beginning(), magerman_replace_legal_form_end(), magerman_replace_legal_form_middle(), magerman_replace_proprietary_characters(), magerman_replace_sgml_characters(), magerman_replace_spelling_variation(), standardize_eee_ppat()