Function reference • nstandr

All functions

browse_dot_graph(): Generates a temporary HTML file with a visualization of the given nstandr procedures and opens it in a browser (specified in options('browser')). This requires the DiagrammeR package to be installed. If you do not have it, you can cat() the string returned by `nstandr:::make_dot_graph()` to the R console and paste it into a web tool for dot visualization (e.g., http://magjac.com/graphviz-visual-editor).

check_rows(): Assumes that rows (if logical) has the same length as x

cockburn_combabbrev(): Collapses single character sequences

cockburn_detect_corp(): Detect Corporates (code - 'firm')

cockburn_detect_govt(): Detect Government Organizations (Non-Corporates group)

cockburn_detect_hosp(): Detect Hospitals (Non-Corporates group)

cockburn_detect_indiv(): Detect Individuals (Non-Corporates group)

cockburn_detect_inst(): Detect Non-profit Institutes (Non-Corporates group)

cockburn_detect_inst_conds(): Detects Non-profit institutes with special conditions

cockburn_detect_inst_conds_1(): Detects Non-profit institutes with special conditions

cockburn_detect_inst_conds_2(): Detects Non-profit institutes with special conditions

cockburn_detect_inst_german(): Detects German Non-profit institutes

cockburn_detect_type(): Identifies Entity Type

cockburn_detect_univ(): Detect Universities (Non-Corporates group)

cockburn_detect_uspto(): Detects special USPTO codes and codes them as 'indiv'

cockburn_remove_standard_names(): Creates the so-called stem name (a name with all legal entity identifiers removed)

cockburn_remove_uspto(): Removes special USPTO codes.

cockburn_replace_compustat(): COMPUSTAT specific standardization for organizational names

cockburn_replace_compustat_names(): COMPUSTAT specific standardization for organizational names. Full name replacements.

cockburn_replace_derwent(): Performs Derwent standardization of organizational names

cockburn_replace_govt(): Cleanup Government Organizations (Non-Corporates group)

cockburn_replace_punctuation(): Removes punctuation and standardizes some symbols.

cockburn_replace_standard_names(): Create standard name

cockburn_replace_type(): Cleanup Entity Type

cockburn_replace_univ(): Cleanup Universities (Non-Corporates group)

defactor(): Defactor the object

defactor_vector(): Converts factor to character

detect_legal_form(): Detect legal form

detect_patterns(): Codes strings (e.g., organizational names) based on certain patterns

escape_regex(): Escapes characters that are special in regex

escape_regex_for_type(): Escapes special characters for different types of pattern

escape_regex_for_types(): Escapes regex special characters conditionally

get_dots(): Provides access to arguments of nested functions. Sort of an alternative mechanism to passing ... arguments but with more features.

get_standardize_options(): Gets standardize_options at the point of the call, with consistent updates up through the calling stack.

get_target(): Gets a target vector to standardize.

get_vector(): Gets a vector by column and defactors it if needed. Optionally, one can provide a fallback_value, which is returned if col is not specified.

inset_target(): Insets the target vector back into the input object (x)

is_empty(): Checks if string has something to print

magerman_condense(): Condensing names

magerman_detect_characters(): Detect candidates for characters that need to be cleaned

magerman_detect_comma_period_irregularities(): Detects comma period irregularities

magerman_detect_legal_form(): Detect legal form

magerman_detect_legal_form_beginning(): Detects legal form at the beginning of a name

magerman_detect_legal_form_end(): Detects legal form at the end of a name

magerman_detect_legal_form_middle(): Detects legal form in the middle of a name

magerman_detect_umlaut(): Detect umlauts

magerman_remove_common_words(): Remove common words

magerman_remove_common_words_anywhere(): Removes common words anywhere in a name

magerman_remove_common_words_at_the_beginning(): Removes common words at the beginning of a name

magerman_remove_common_words_at_the_end(): Removes common words at the end of a name

magerman_remove_double_quotation_marks_beginning_end(): Removes double quotation marks at the beginning and end of a name

magerman_remove_double_quotation_marks_irregularities(): Removes double quotation irregularities

magerman_remove_double_spaces(): Removes double spaces

magerman_remove_html_codes(): Removes HTML codes

magerman_remove_legal_form(): Removes legal form

magerman_remove_legal_form_and_clean(): Removes legal form

magerman_remove_non_alphanumeric_at_the_beginning(): Removes non alphanumeric characters at the beginning of a name

magerman_remove_non_alphanumeric_at_the_end(): Removes non alphanumeric characters at the end of a name

magerman_remove_special_characters(): Removes special characters

magerman_replace_accented_characters(): Replaces accented characters

magerman_replace_comma_period_irregularities(): Replaces comma period irregularities

magerman_replace_comma_period_irregularities_all(): Replaces comma and period irregularities

magerman_replace_legal_form_beginning(): Replaces legal form at the beginning of a name

magerman_replace_legal_form_end(): Replaces legal form at the end of a name

magerman_replace_legal_form_middle(): Replaces legal form in the middle of a name

magerman_replace_proprietary_characters(): Replaces proprietary characters

magerman_replace_sgml_characters(): Replaces SGML characters

magerman_replace_spelling_variation(): Replaces spelling variation

magerman_replace_umlaut(): Replaces Umlauts

make_dot_edges(): Makes dot graph edges for visualizing arrows between sequence of procedures.

make_dot_graph(): Generates graph description for visualizing list of procedures in dot format.

make_dot_nodes(): Generates description of dot graph nodes.

paste_dot_node(): Makes a dot node (as html table) from procedure's attributes.

paste_dot_node_tr_td(): Makes TR TD record for dot node TABLE

replace_patterns(): A wrapper for string replacement and cbinding some columns.

save_dot_graph_as(): Saves dot graph as file using system command 'dot' from GraphViz (https://graphviz.org/) if installed.

standardize() make_std_names() make_standard_names() nstand(): Standardizes organizational names. Takes either a vector or a column in a table.

standardize_cockburn(): Standardizes strings using exact procedures described in Cockburn et al. (2009)

standardize_dehtmlize(): Converts HTML characters to UTF-8

standardize_detect_enc(): Detects string encoding

standardize_is_data_empty(): Checks if all elements in vector(s) are either "", NA, NULL or have zero length

standardize_magerman(): Standardizes strings using exact procedures described in Magerman et al. (2009)

standardize_make_procedures_list(): Makes list of procedures calls from table.

standardize_omit_empty(): Removes elements that are either "", NA, NULL or have zero length

standardize_options(): Does nothing but store (as its own default arguments) options that control vector handling through the standardization process. These options are available in most nstandr functions that accept the ... parameter.

standardize_remove_brackets(): Removes brackets and content in brackets

standardize_remove_quotes(): Removes double quotes

standardize_squish_spaces(): Removes redundant whitespace

standardize_toascii(): Translates non-ASCII symbols to their ASCII equivalents

standardize_toupper(): Uppercases vector of interest in the object (table)

standardize_x_split(): Splits the object (table) in chunks by rows

unlist_if_possible(): If a column in the x table is a list, unlists it if possible

visualize(): Visualizes list of procedures.

x_length(): Gets lengths of the object

x_width(): Gets width of the object
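
Example (illustrative sketch)

The entries above describe small clean-up steps that are chained into a standardization procedure. The sketch below mimics three of them (removing brackets, squishing spaces, uppercasing) in base R only, to show the kind of transformation involved. The sketch_* helpers and the sample names are hypothetical and are not the nstandr implementation. The optional call at the end assumes that standardize_magerman() accepts a character vector as its first argument, which the index suggests but does not spell out.

```r
# Hypothetical base-R stand-ins for three simple steps; NOT nstandr's own code.
sketch_remove_brackets <- function(x) gsub("\\([^()]*\\)", "", x)
sketch_squish_spaces   <- function(x) trimws(gsub("[[:space:]]+", " ", x))
sketch_toupper         <- function(x) toupper(x)

# Made-up organizational names with stray spaces and a bracketed part.
org_names <- c("  Acme   (Holdings)  Ltd. ", "acme ltd.")

# Apply the steps in sequence, as a list of procedures would.
cleaned <- sketch_toupper(sketch_squish_spaces(sketch_remove_brackets(org_names)))
print(cleaned)

# Optional: run the packaged Magerman procedure if nstandr is installed.
# Assumption: the first argument accepts a character vector.
if (requireNamespace("nstandr", quietly = TRUE)) {
  print(try(nstandr::standardize_magerman(org_names)))
}
```

Both sample names collapse to the same string in the base-R sketch, which is the point of name standardization: spelling variants of one organization end up with one key.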