Last Updated on by Karina Shakya
- Data sets are made up of data objects.
- A data object represents an entity.
- Examples: sales database (customers, store items, sales); medical database (patients, treatments); university database (students, professors, courses).
- Data objects are also called samples, examples, instances, data points, objects, or tuples.
- Data objects are described by attributes.
- Database rows -> data objects; columns -> attributes.
- Attribute: a data field representing a characteristic or feature of a data object.
- The set of attributes used to describe a given object is called an attribute vector.
- The distribution of data involving one attribute is called univariate.
- A bivariate distribution involves two attributes, and so on.
- The type of an attribute is determined by the set of possible values (nominal, binary, ordinal, or numeric) the attribute can have.
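The row/column picture above can be sketched in a few lines of Python. This is only an illustration: the `customers` records, the attribute names, and the `attribute_vector` helper are all made up for this example and do not come from any particular database.

```python
# Hypothetical sales records: each dict is one data object (one database row).
customers = [
    {"customer_id": 1, "age": 34, "income": 52000.0},
    {"customer_id": 2, "age": 45, "income": 61000.0},
    {"customer_id": 3, "age": 29, "income": 48000.0},
]

# The attributes (columns) chosen to describe each object.
attributes = ["age", "income"]


def attribute_vector(obj, names):
    """Return the values of the named attributes for one data object."""
    return tuple(obj[name] for name in names)


# One attribute vector per data object.
vectors = [attribute_vector(c, attributes) for c in customers]

# Univariate view: the values of a single attribute across all objects.
ages = [c["age"] for c in customers]

# Bivariate view: pairs of values for two attributes.
age_income = [(c["age"], c["income"]) for c in customers]
```

Here `ages` is the data behind a univariate distribution, while `age_income` is the data behind a bivariate one.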
- Nominal Attributes
- The values of nominal attributes are symbols or names of things.
- Each value represents some kind of category, code, or state, so nominal attributes are also referred to as categorical.
- The values of a nominal attribute do not have any meaningful order and are not quantitative.
- Examples: Hair_color = {black, brown, grey, red, white},
marital status, occupation, etc.
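Because nominal values have no order and are not quantitative, counting how often each value occurs is meaningful, while averaging is not. A minimal sketch, assuming a made-up list of `hair_colors` observations:

```python
from collections import Counter

# Hypothetical observations of the nominal attribute Hair_color.
hair_colors = ["black", "brown", "black", "red", "grey", "black", "brown"]

# Counting occurrences is meaningful for nominal values.
counts = Counter(hair_colors)
most_common_value, frequency = counts.most_common(1)[0]

# Integer codes can stand in for the names, but they are only labels:
# the numbering is arbitrary, so a mean of these codes would say nothing.
codes = {value: code for code, value in enumerate(sorted(set(hair_colors)))}
encoded = [codes[value] for value in hair_colors]
```

The most frequent value (the mode) is `most_common_value`; the integers in `encoded` should be treated as names, not as quantities.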
- Binary Attributes
- A binary attribute is a nominal attribute with only two categories or states: 0 or 1, where 0 means the attribute is absent and 1 means it is present.
- Binary attributes are referred to as Boolean if the two states correspond to true and false.
- Symmetric binary: both outcomes are equally important,
e.g., gender
- Asymmetric binary: the outcomes are not equally important,
e.g., medical test (positive vs. negative)
Convention: assign 1 to the most important outcome (e.g., HIV positive)
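The coding convention for an asymmetric binary attribute can be sketched as follows. The `results` list and the `POSITIVE_CODE` mapping are hypothetical names chosen for this illustration.

```python
# Hypothetical medical test results for five patients.
results = ["negative", "positive", "negative", "negative", "positive"]

# Asymmetric binary attribute: the more important outcome is coded as 1.
POSITIVE_CODE = {"positive": 1, "negative": 0}
encoded = [POSITIVE_CODE[r] for r in results]

# Because 1 marks the important outcome, the sum counts the positive cases.
positive_count = sum(encoded)

# Boolean view of the same attribute: 1 -> True, 0 -> False.
is_positive = [bool(value) for value in encoded]
```

With this coding, a simple sum or filter picks out the outcome of interest directly.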
- Ordinal Attributes
- An ordinal attribute has possible values with a meaningful order or ranking among them, but the magnitude between successive values is not known.
- Examples: Size = {small, medium, large}, grades, rankings
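Since ordinal values can be ranked but not subtracted, sorting and taking the middle value are meaningful, while differences between ranks are not. A minimal sketch, assuming a made-up `sizes` sample and a hypothetical `RANK` lookup:

```python
# The order of the ordinal attribute Size, from lowest to highest.
SIZE_ORDER = ["small", "medium", "large"]

# Rank positions encode order only, not the distance between values.
RANK = {value: position for position, value in enumerate(SIZE_ORDER)}

# Hypothetical observations.
sizes = ["large", "small", "medium", "small", "large"]

# Sorting by rank is meaningful for ordinal values.
sorted_sizes = sorted(sizes, key=RANK.__getitem__)

# The median (middle element of an odd-length sorted sample) is meaningful too.
median_size = sorted_sizes[len(sorted_sizes) // 2]

# Comparisons use the order; "large minus small" has no defined magnitude.
is_smaller = RANK["small"] < RANK["large"]
```

The numbers in `RANK` only say which value comes first; they should not be read as saying that medium is exactly halfway between small and large.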
- Numeric Attributes
- A numeric attribute is quantitative, i.e. it is a measurable quantity, represented in integer or real values.
- They can be interval-scaled or ratio-scaled.
- Ratio: has an inherent zero-point, i.e. if a measurement is ratio-scaled, we can speak of a value as being a multiple (or ratio) of another value.
- Values are ordered, and the difference between values can be computed, along with the mean, median, and mode.
- e.g., length, counts, monetary quantities
- Interval: measured on a scale of equal-sized units.
- Values have order and can be positive, 0, or negative. Thus, in addition to providing a ranking of values, such attributes allow us to compare and quantify the difference between values.
- e.g., calendar dates
- No true zero-point, so a value cannot be described as a multiple of another value.
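One way to see the difference between the two scales is to change the reference point or the unit and check what survives. The sketch below uses made-up numbers: `OFFSET` stands for a hypothetical calendar that starts counting 500 years later, and the lengths are arbitrary.

```python
def difference_and_ratio(a, b):
    """Return (b - a, b / a) for two numeric values."""
    return b - a, b / a


# Interval-scaled: calendar years, whose zero point is a convention.
year_a, year_b = 1000, 2000
diff_years, ratio_years = difference_and_ratio(year_a, year_b)

# The same two moments on a hypothetical calendar starting 500 years later.
OFFSET = 500
diff_shifted, ratio_shifted = difference_and_ratio(year_a - OFFSET, year_b - OFFSET)

# Ratio-scaled: lengths, which have a true zero (no length at all).
length_a_m, length_b_m = 2.0, 6.0
_, ratio_m = difference_and_ratio(length_a_m, length_b_m)

# Changing the unit (meters to centimeters) keeps the ratio intact.
_, ratio_cm = difference_and_ratio(length_a_m * 100, length_b_m * 100)
```

The difference between the two years is the same on both calendars, but the ratio changes with the choice of zero, so it carries no meaning. For the lengths, the ratio is the same in meters and in centimeters, which is why "three times as long" is a valid statement.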