Detects legal form in the middle of a name
Source: R/magerman.r (magerman_detect_legal_form_middle.Rd)
Detects legal form in the middle of a name.
Usage
magerman_detect_legal_form_middle(
  x,
  patterns = magerman_patterns_legal_form_middle,
  patterns_col = 1,
  patterns_codes_col = 3,
  patterns_type = "fixed",
  patterns_type_col = NULL,
  patterns_codes = NULL,
  output_codes_col_name = "{col_name_}legal_form",
  codes_omitted_rows_value = NULL,
  no_match_code = NULL,
  merge_existing_codes = "replace_empty",
  return_only_codes = FALSE,
  return_only_first_detected_code = TRUE,
  return_merge_codes_description = FALSE,
  ...
)
Arguments
- x
Vector or table to detect in.
- patterns
Accepts either a vector or a table. If patterns is a table, it can also include a replacements column.
- patterns_col
If patterns is a table, this specifies which column to use. Default is 1.
- patterns_codes_col
If patterns is a table, this specifies which column to use as the codes column. Default is 3, as shown in Usage.
- patterns_type
Specifies kind(s) of patterns. Default is "fixed" (calling stri_replace_all_fixed). Other options are:
- patterns_type_col
Column in the patterns table where you can specify the type of each pattern. If set, patterns_type is ignored. Default is NULL.
- patterns_codes
If provided use it as codes. Should be the same length as patterns. Default is NULL.
- output_codes_col_name
If provided, use it as the name of the codes column in the results (a new column if it does not exist, otherwise the one to update).
- codes_omitted_rows_value
If rows is set, use this value to fill the omitted rows. Default is NULL, which means that when an existing codes column is updated, its initial values are kept for the omitted rows. If there is no codes column to update, omitted rows are filled with NA.
- no_match_code
If provided, records that did not match any pattern are coded with this value.
- merge_existing_codes
Whether to merge newly detected codes with existing ones. Default is "replace_empty". Options are:
- return_only_codes
If TRUE, return only the codes vector. Default is FALSE.
- return_only_first_detected_code
If TRUE then return only codes for the first detected pattern. If FALSE return list of all matched codes. Default is TRUE. (Currently does affect performance)
- return_merge_codes_description
Return a description of the choices for the merge_existing_codes parameter.
- ...
Arguments passed on to standardize_options
  - col
  Column of interest (the one we need to standardize) in the x object (if it is data.frame-like).
  - rows
  Logical vector to filter records of interest. Default is NULL, which means do not filter records.
  - omitted_rows_value
  If the rows parameter is set, then merge omitted_rows_value with the results (filtered by rows). Either a single string or a character vector of length nrow(x). If NULL (the default), the original values of col are merged with the results.
  - output_placement
  Where to insert the results (standardized vector) in the x object. The default option is 'replace_col', which overwrites col in x with the results. Other options:
    'omit' :: do not write results back to the table (usually used when append_output_copy is set for temporary values)
    'prepend_to_col' :: prepend to col
    'append_to_col' :: append to col
    'prepend_to_x' :: prepend to x data.frame-like object
    'append_to_x' :: append to x data.frame-like object
  - x_atomic_name
  If x is a vector, use this name for the original column if it is in the results. Default is "x". If x is a table, the name of col will be used.
  - output_col_name
  Use this name for the column with results (standardized values). Parts in curly brackets are substitute strings. Options for substitutions are:
  - append_output_copy
  Whether to append a copy of the result vector to the x object.
  - output_copy_col_name
  How the appended copy will be named.
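The interplay of patterns_col, patterns_codes_col, patterns_type = "fixed" and return_only_first_detected_code described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the patterns table, the helper detect_codes() and the sample names are hypothetical and made up for illustration, and it assumes that "first detected" means first in the order of the patterns table.

```r
# Hypothetical patterns table: column 1 holds the pattern and column 3 the
# code, mirroring patterns_col = 1 and patterns_codes_col = 3 from Usage.
# These rows are invented; they are NOT magerman_patterns_legal_form_middle.
patterns <- data.frame(
  pattern = c(" LTD ", " GMBH ", " INC "),
  replacement = c(" ", " ", " "),
  code = c("LIMITED", "GMBH", "INCORPORATED"),
  stringsAsFactors = FALSE
)

org_names <- c("ACME LTD OF LONDON", "FOO GMBH UND INC BAR", "PLAIN NAME")

# Hypothetical helper: for one name, collect the codes of every pattern that
# occurs in it as a fixed (literal, non-regex) substring.
detect_codes <- function(name, patterns,
                         patterns_col = 1, patterns_codes_col = 3) {
  hit <- vapply(
    patterns[[patterns_col]],
    function(p) grepl(p, name, fixed = TRUE),
    logical(1)
  )
  patterns[[patterns_codes_col]][hit]
}

# Analogue of return_only_first_detected_code = FALSE: a list with all
# matched codes per name.
all_codes <- lapply(org_names, detect_codes, patterns = patterns)

# Analogue of return_only_first_detected_code = TRUE: keep the first code,
# or NA when no pattern matched.
first_code <- vapply(
  all_codes,
  function(codes) if (length(codes) > 0) codes[1] else NA_character_,
  character(1)
)

print(first_code)
# first_code is c("LIMITED", "GMBH", NA)
```

In this sketch the second name matches two patterns, so all_codes keeps both codes while first_code keeps only "GMBH". The third name matches nothing and is coded NA, which is the case a no_match_code value would be meant to fill.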
See also
detect_patterns
Other magerman:
cockburn_detect_corp(),
cockburn_detect_govt(),
cockburn_detect_hosp(),
cockburn_detect_indiv(),
cockburn_detect_inst_conds_1(),
cockburn_detect_inst(),
cockburn_detect_univ(),
cockburn_detect_uspto(),
cockburn_remove_standard_names(),
cockburn_remove_uspto(),
cockburn_replace_compustat_names(),
cockburn_replace_compustat(),
cockburn_replace_derwent(),
cockburn_replace_govt(),
cockburn_replace_univ(),
magerman_condense(),
magerman_detect_characters(),
magerman_detect_comma_period_irregularities(),
magerman_detect_legal_form_beginning(),
magerman_detect_legal_form_end(),
magerman_detect_umlaut(),
magerman_remove_common_words_anywhere(),
magerman_remove_common_words_at_the_beginning(),
magerman_remove_common_words_at_the_end(),
magerman_remove_double_quotation_marks_beginning_end(),
magerman_remove_double_quotation_marks_irregularities(),
magerman_remove_double_spaces(),
magerman_remove_html_codes(),
magerman_remove_non_alphanumeric_at_the_beginning(),
magerman_remove_non_alphanumeric_at_the_end(),
magerman_remove_special_characters(),
magerman_replace_accented_characters(),
magerman_replace_comma_period_irregularities_all(),
magerman_replace_comma_period_irregularities(),
magerman_replace_legal_form_beginning(),
magerman_replace_legal_form_end(),
magerman_replace_legal_form_middle(),
magerman_replace_proprietary_characters(),
magerman_replace_sgml_characters(),
magerman_replace_spelling_variation(),
standardize_eee_ppat()