`typeddfs.df_typing`

Information about how DataFrame subclasses should be handled.

Module Contents

typeddfs.df_typing.FINAL_DF_TYPING

typeddfs.df_typing.FINAL_IO_TYPING

class typeddfs.df_typing.DfTyping

Contains all information about how to type a DataFrame subclass.

_auto_dtypes :Optional[Mapping[str, Type[Any]]]

_column_series_name :Union[bool, None, str]

_columns_to_drop :Optional[Set[str]]

_index_series_name :Union[bool, None, str]

_io_typing :IoTyping

_more_columns_allowed :bool = True

_more_index_names_allowed :bool = True

_order_dclass :bool = True

_post_processing :Optional[Callable[[T], Optional[T]]]

_required_columns :Optional[Sequence[str]]

_required_index_names :Optional[Sequence[str]]

_reserved_columns :Optional[Sequence[str]]

_reserved_index_names :Optional[Sequence[str]]

_value_dtype :Optional[Type[Any]]

_verifications :Optional[Sequence[Callable[[T], Union[None, bool, str]]]]

property auto_dtypes(self) → Mapping[str, Type[Any]]: A mapping from column/index names to the expected dtype. These are used via pd.Series.astype for automatic conversion. An error will be raised if an astype call fails. Note that Pandas frequently just does not perform the conversion, rather than raising an error. The keys should be contained in known_names, but this is not strictly required.

property column_series_name(self) → Union[bool, None, str]: Intelligently returns df.columns.name. Returns a value that will be forced into df.columns.name on calling convert. If None, will set df.columns.name = None. If False, will not set. (True is treated the same as None.)
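The three kinds of value described above (False, None/True, or a string) can be sketched in plain Python. This is only an illustration of the documented rule; `resolve_series_name` is a hypothetical helper, not part of typeddfs.

```python
from typing import Optional, Union


def resolve_series_name(
    setting: Union[bool, None, str], current: Optional[str]
) -> Optional[str]:
    """Return the name that df.columns.name would end up with (illustrative only)."""
    if setting is False:
        # False: leave the existing name untouched
        return current
    if setting is None or setting is True:
        # None (or True, which is treated the same): force the name to None
        return None
    # A string: force that exact name
    return setting


print(resolve_series_name(False, "old"))
print(resolve_series_name(None, "old"))
print(resolve_series_name("measure", "old"))
```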

property columns_to_drop(self) → Set[str]: Returns the set of columns that are automatically dropped by convert. This does NOT include “level_0” and “index”, which are ALWAYS dropped.

copy(self, **kwargs) → DfTyping

property index_series_name(self) → Union[bool, None, str]: Intelligently returns df.index.name. Returns a value that will be forced into df.index.name on calling convert, only if the DataFrame is multi-index. If None, will set df.index.name = None if df.index.names != [None]. If False, will not set. (True is treated the same as None.)

property io(self) → IoTyping

property is_strict(self) → bool: Returns True if this forbids unspecified index levels and columns.

property known_column_names(self) → Sequence[str]: Returns all columns that are required or reserved. The sort order positions required columns first.

property known_index_names(self) → Sequence[str]: Returns all index levels that are required or reserved. The sort order positions required index levels first.

property known_names(self) → Sequence[str]: Returns all index and column names that are required or reserved. The sort order is: required index, reserved index, required columns, reserved columns.
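The documented ordering can be pictured as a simple concatenation. The sketch below assumes nothing about how typeddfs actually builds the sequence; the function and the example names are hypothetical.

```python
from typing import List, Sequence


def known_names(
    required_index: Sequence[str],
    reserved_index: Sequence[str],
    required_columns: Sequence[str],
    reserved_columns: Sequence[str],
) -> List[str]:
    """Concatenate names in the documented order (illustrative only)."""
    return [
        *required_index,    # 1. required index levels
        *reserved_index,    # 2. reserved index levels
        *required_columns,  # 3. required columns
        *reserved_columns,  # 4. reserved columns
    ]


names = known_names(["id"], ["batch"], ["x", "y"], ["note"])
print(names)
```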

property more_columns_allowed(self) → bool: Returns whether the DataFrame allows columns that are not reserved or required.

property more_indices_allowed(self) → bool: Returns whether the DataFrame allows index levels that are neither reserved nor required.
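As a rough illustration of what these two flags govern, the sketch below checks a list of names against required and reserved names. The helper names are hypothetical and the logic is an assumption about the intended rule, not the library's implementation; the same idea applies to columns and to index levels.

```python
from typing import List, Sequence


def unexpected_names(
    actual: Sequence[str], required: Sequence[str], reserved: Sequence[str]
) -> List[str]:
    """Names present in `actual` that are neither required nor reserved."""
    known = set(required) | set(reserved)
    return [name for name in actual if name not in known]


def names_conform(
    actual: Sequence[str],
    required: Sequence[str],
    reserved: Sequence[str],
    more_allowed: bool,
) -> bool:
    """True if all required names are present and extras are allowed or absent."""
    missing = [name for name in required if name not in actual]
    extra = unexpected_names(actual, required, reserved)
    return not missing and (more_allowed or not extra)


# "z" is neither required nor reserved, so it is only accepted when extras are allowed.
print(names_conform(["x", "y", "z"], ["x"], ["y"], more_allowed=True))
print(names_conform(["x", "y", "z"], ["x"], ["y"], more_allowed=False))
```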

property order_dataclass(self) → bool: Whether the corresponding dataclass can be sorted (has __lt__).

property post_processing(self) → Optional[Callable[[T], Optional[T]]]: A function to be called at the final stage of convert. It is called immediately before verifications are checked. The function takes a copy of the input BaseDf and returns a new copy.

Note

Although a copy is passed as input, the function should not modify it. Technically, doing so will cause problems only if the DataFrame’s internal values are modified. The value passed is a shallow copy (see pd.DataFrame.copy).

property required_columns(self) → Sequence[str]: Returns the list of required column names.

property required_index_names(self) → Sequence[str]: Returns the list of required index level names.

property required_names(self) → Sequence[str]: Returns all index and column names that are required. The sort order is: required index, required columns.

property reserved_columns(self) → Sequence[str]: Returns the list of reserved (optional) column names.

property reserved_index_names(self) → Sequence[str]: Returns the list of reserved (optional) index levels.

property reserved_names(self) → Sequence[str]: Returns all index and column names that are reserved (optional). The sort order is: reserved index, reserved columns.

property value_dtype(self) → Optional[Type[Any]]: A type for “values” in a simple DataFrame. Typically numeric.

property verifications(self) → Sequence[Callable[[T], Union[None, bool, str]]]

Additional requirements for the DataFrame to be conformant.

Returns: A sequence of conditions that map the DataFrame to None or True if the condition passes, or to False or an error message string if it fails.
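One way to read this return convention is sketched below. `collect_failures` is a hypothetical helper, and a plain dict stands in for the DataFrame so the example runs without pandas; how typeddfs itself reports failures is not shown here.

```python
from typing import Any, Callable, List, Sequence, Union

Verification = Callable[[Any], Union[None, bool, str]]


def collect_failures(df: Any, verifications: Sequence[Verification]) -> List[str]:
    """Interpret each result: None/True passes, False or a string fails."""
    failures: List[str] = []
    for check in verifications:
        result = check(df)
        if result is None or result is True:
            continue  # condition passed
        if result is False:
            # No message supplied, so fall back to the function's name
            failures.append(f"{getattr(check, '__name__', 'verification')} failed")
        else:
            failures.append(str(result))  # the string is the error message
    return failures


def non_empty(df):
    return len(df["x"]) > 0


def all_positive(df):
    return None if all(v > 0 for v in df["x"]) else "x must be positive"


def has_y(df):
    return "y" in df


table = {"x": [1, 2, -3]}  # stand-in for a DataFrame
print(collect_failures(table, [non_empty, all_positive, has_y]))
```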

class typeddfs.df_typing.IoTyping

_attrs_json_kwargs :Optional[Mapping[str, Any]]

_attrs_suffix :str = ".attrs.json"

_custom_readers :Optional[Mapping[str, Callable[[pathlib.Path], pandas.DataFrame]]]

_custom_writers :Optional[Mapping[str, Callable[[pandas.DataFrame, pathlib.Path], None]]]

_hash_alg :Optional[str] = "sha256"

_hdf_key :str = "df"

_read_kwargs :Optional[Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]]

_recommended :bool = False

_remap_suffixes :Optional[Mapping[str, typeddfs.file_formats.FileFormat]]

_remapped_read_kwargs :Optional[Mapping[str, Any]]

_remapped_write_kwargs :Optional[Mapping[str, Any]]

_save_hash_dir :bool = False

_save_hash_file :bool = False

_secure :bool = False

_text_encoding :str = "utf-8"

_use_attrs :bool = False

_write_kwargs :Optional[Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]]

property attrs_json_kwargs(self) → Mapping[str, Any]: Keyword arguments for typeddfs.json_utils.JsonUtils.encoder. Used when writing attrs.

property attrs_suffix(self) → str: Filename suffix detailing where to save/load per-DataFrame “attrs” (metadata). Will be appended to the DataFrame filename.
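Appending the suffix to the full filename, rather than replacing the existing extension, might look like the sketch below. `attrs_path` is a hypothetical helper; the default value is the `.attrs.json` default listed for `_attrs_suffix`.

```python
from pathlib import Path


def attrs_path(path: Path, attrs_suffix: str = ".attrs.json") -> Path:
    """Append the suffix to the whole filename, keeping existing extensions."""
    return path.with_name(path.name + attrs_suffix)


print(attrs_path(Path("data/table.csv.gz")))
```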

copy(self, **kwargs) → IoTyping

property custom_readers(self) → Mapping[str, Callable[[pathlib.Path], pandas.DataFrame]]: Mapping from filename suffixes (modulo compression) to custom reading methods.

property custom_writers(self) → Mapping[str, Callable[[pandas.DataFrame, pathlib.Path], None]]: Mapping from filename suffixes (modulo compression) to custom writing methods.

property dir_hash(self) → bool: Whether to save (append) to per-directory hash files by default. Specifically, in typeddfs.abs_df.AbsDf.write_file().

property file_hash(self) → bool: Whether to save per-file hash files by default. Specifically, in typeddfs.abs_df.AbsDf.write_file().

property flexwf_sep(self) → str: The delimiter used when reading “flex-width” format.

Caution

Only checks the read keyword arguments, not the write keyword arguments.

property hash_algorithm(self) → Optional[str]: The hash algorithm used for checksums.

property hdf_key(self) → str: The default key used in typeddfs.abs_df.AbsDf.to_hdf(). The key is also used in typeddfs.abs_df.AbsDf.read_hdf().

property is_text_encoding_utf(self) → bool

property read_kwargs(self) → Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]: Passes kwargs into read functions from read_file. These are keyword arguments that are automatically added into specific read_ methods when called by read_file.

Note

This should rarely be needed.

property read_suffix_kwargs(self) → Mapping[str, Mapping[str, Any]]: Per-suffix kwargs into read functions from read_file. Modulo compression (e.g. .tsv is equivalent to .tsv.gz).
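A lookup that ignores a trailing compression suffix could be sketched as follows. The set of compression suffixes and the `kwargs_for` helper are assumptions made for illustration; the library's own list and lookup logic may differ.

```python
from pathlib import Path
from typing import Any, Mapping

# Assumed compression suffixes; the real list used by the library may differ.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zip"}


def kwargs_for(
    path: Path, suffix_kwargs: Mapping[str, Mapping[str, Any]]
) -> Mapping[str, Any]:
    """Look up per-suffix kwargs after dropping a trailing compression suffix."""
    suffixes = path.suffixes
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]  # ".tsv.gz" is looked up as ".tsv"
    key = suffixes[-1] if suffixes else ""
    return suffix_kwargs.get(key, {})


per_suffix = {".tsv": {"sep": "\t"}}
print(kwargs_for(Path("table.tsv.gz"), per_suffix) == kwargs_for(Path("table.tsv"), per_suffix))
```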

property recommended(self) → bool: Whether to forbid discouraged formats like fixed-width and HDF5. Excludes all insecure formats.

property remap_suffixes(self) → Mapping[str, typeddfs.file_formats.FileFormat]: Returns filename suffixes that have been re-mapped to file formats. These are used in read_file and write_file.

Note

This should rarely be needed. An exception might be .txt to tsv rather than lines; Excel uses this.

property secure(self) → bool: Whether to forbid insecure operations and formats.

property text_encoding(self) → str: Can be an exact encoding like utf-8, “platform”, “utf8(bom)” or “utf16(bom)”. See the docs in TypedDfs.typed().encoding for details.

property toml_aot(self) → str: The name of the Array of Tables (AoT) used when reading TOML.

Caution

Only checks the read keyword arguments, not the write keyword arguments.

property use_attrs(self) → bool: Whether to read and write pd.DataFrame.attrs when passing attrs=None.

property write_kwargs(self) → Mapping[typeddfs.file_formats.FileFormat, Mapping[str, Any]]: Passes kwargs into write functions from write_file. These are keyword arguments that are automatically added into specific to_ methods when called by write_file.

Note

This should rarely be needed.

property write_suffix_kwargs(self) → Mapping[str, Mapping[str, Any]]: Per-suffix kwargs into write functions from write_file. Modulo compression (e.g. .tsv is equivalent to .tsv.gz).
