B.Sc. CSIT 7th Semester Data Warehousing and Data Mining

Data Objects and Attribute Types

  • Data sets are made up of data objects. 
  • A data object represents an entity. 
  • Examples: a sales database holds customers, store items, and sales; a medical database holds patients and treatments; a university database holds students, professors, and courses.
  • Data objects are also called samples, examples, instances, data points, objects, or tuples.
  • Data objects are described by attributes.
  • In a database, rows correspond to data objects and columns correspond to attributes.
  • Attribute: a data field representing a characteristic or feature of a data object.
  • A set of attributes used to describe a given object is called an attribute vector.
  • The distribution of data involving one attribute is called univariate.
  • A bivariate distribution involves two attributes, and so on.
  • The type of an attribute is determined by the set of possible values the attribute can have: nominal, binary, ordinal, or numeric.
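The row/column view above can be sketched in a few lines of Python. The records, attribute names, and the helper `attribute_vector` are hypothetical and chosen only for illustration; they do not come from any particular database.

```python
# Hypothetical customer records: each dict is one data object (a row),
# and each key is an attribute (a column).
customers = [
    {"customer_id": "C1", "hair_color": "black", "smoker": 0, "size": "medium", "age": 34},
    {"customer_id": "C2", "hair_color": "brown", "smoker": 1, "size": "large", "age": 51},
    {"customer_id": "C3", "hair_color": "black", "smoker": 0, "size": "small", "age": 27},
]

# The attribute names, in a fixed order (the "columns").
attribute_names = list(customers[0].keys())


def attribute_vector(obj):
    """Return the attribute vector of one data object as a tuple."""
    return tuple(obj[name] for name in attribute_names)


# Univariate data: the values of a single attribute.
ages = [c["age"] for c in customers]

# Bivariate data: pairs of values taken from two attributes.
age_and_size = [(c["age"], c["size"]) for c in customers]

print(attribute_vector(customers[0]))
print(ages)
print(age_and_size)
```

Here each tuple returned by `attribute_vector` describes one object, while `ages` and `age_and_size` are the raw material for a univariate and a bivariate distribution, respectively.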
  1. Nominal Attributes
  • The values of a nominal attribute are symbols or names of things.
  • Each value represents some kind of category, code, or state, so nominal attributes are also referred to as categorical.
  • The values of a nominal attribute have no meaningful order and are not quantitative.
  • Examples: Hair_color = {black, brown, grey, red, white}, marital status, occupation, etc.
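Because nominal values are not quantitative, counting is the natural operation on them. The following is a minimal sketch with made-up hair-color observations: it finds the most frequent value (the mode) and shows that integer codes assigned to categories are only labels.

```python
from collections import Counter

# Hypothetical observations of the nominal attribute Hair_color.
hair_colors = ["black", "brown", "black", "red", "grey", "black", "brown"]

# Counting occurrences is meaningful for nominal data.
counts = Counter(hair_colors)
mode_value, mode_count = counts.most_common(1)[0]

# Integer codes can stand in for the names, but they are only labels:
# the order of the codes and their differences carry no meaning.
codes = {color: i for i, color in enumerate(sorted(set(hair_colors)))}

print(mode_value, mode_count)
print(codes)
```

Averaging the codes (for example, treating "black" as 0 and "red" as 3) would produce a number with no interpretation, which is why a mean or median is not used for nominal attributes.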

  2. Binary Attributes
  • A nominal attribute with only two categories or states, 0 or 1, where 0 typically means that the attribute is absent and 1 means that it is present.
  • Binary attributes are referred to as Boolean if the two states correspond to true and false.
  • Symmetric binary: both outcomes are equally important, e.g., gender.
  • Asymmetric binary: the outcomes are not equally important, e.g., a medical test (positive vs. negative). By convention, 1 is assigned to the most important outcome (e.g., HIV positive).
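The encoding convention for an asymmetric binary attribute can be sketched as follows. The test results are invented for illustration, and the variable names are assumptions rather than part of any standard API.

```python
# Hypothetical results of a medical test (an asymmetric binary attribute).
test_results = ["negative", "positive", "negative", "negative", "positive"]

# Convention: code the more important outcome (positive) as 1, the other as 0.
encoded = [1 if result == "positive" else 0 for result in test_results]

# With this coding, summing the values counts the important outcome.
positive_count = sum(encoded)

# The same attribute viewed as Boolean: True means "positive".
as_boolean = [bool(value) for value in encoded]

print(encoded)
print(positive_count)
```

Coding the important outcome as 1 keeps simple aggregates, such as the sum above, focused on the cases of interest.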

  3. Ordinal Attributes
  • An attribute whose possible values have a meaningful order or ranking among them, but the magnitude of the difference between successive values is not known.
  • Examples: Size = {small, medium, large}, grades, rankings.
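An ordinal attribute supports comparison and sorting, but not arithmetic on differences. A minimal sketch using the Size example, with hypothetical observations and an assumed rank mapping:

```python
# The meaningful order of the ordinal attribute Size, from lowest to highest.
SIZE_ORDER = ["small", "medium", "large"]

# Map each value to its rank. The ranks encode order only; the gap between
# rank 0 and rank 1 is not claimed to equal the gap between rank 1 and rank 2.
rank = {value: position for position, value in enumerate(SIZE_ORDER)}

# Hypothetical observations (odd count, so the median is a single value).
sizes = ["large", "small", "medium", "medium", "large"]

# Sorting and comparing are meaningful for ordinal data.
sorted_sizes = sorted(sizes, key=lambda value: rank[value])
median_size = sorted_sizes[len(sorted_sizes) // 2]
large_exceeds_small = rank["large"] > rank["small"]

print(sorted_sizes)
print(median_size)
```

The median is well defined here because it depends only on order, whereas a mean of the ranks would assume equal spacing between sizes, which is not known.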
  4. Numeric Attributes
  • A numeric attribute is quantitative, i.e., it is a measurable quantity represented in integer or real values.
  • Numeric attributes can be interval-scaled or ratio-scaled.
  • Ratio: has an inherent zero-point, i.e., if a measurement is ratio-scaled, we can speak of a value as being a multiple (or ratio) of another value.
    • Values are ordered, and the difference between values can be computed, along with the mean, median, and mode.
    • E.g., length, counts, monetary quantities.
  • Interval: measured on a scale of equal-sized units.
    • Values have order and can be positive, 0, or negative. Thus, in addition to providing a ranking of values, such attributes allow us to compare and quantify the difference between values.
    • No true zero-point, so a value cannot be expressed as a multiple of another value.
    • E.g., calendar dates.
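The difference between the two numeric scales can be illustrated with a short sketch. The lengths and dates below are arbitrary example values: lengths are ratio-scaled, so a ratio is meaningful, while calendar dates are interval-scaled, so only differences are meaningful.

```python
from datetime import date

# Ratio-scaled: length has a true zero, so "twice as long" is meaningful.
short_cm, long_cm = 50.0, 100.0
length_ratio = long_cm / short_cm

# Interval-scaled: calendar dates have no true zero-point.
start, end = date(2000, 1, 1), date(2020, 1, 1)

# The difference between two dates is meaningful (equal-sized units: days).
days_between = (end - start).days

# A ratio of two dates is not meaningful. Python's date type reflects this:
# it supports subtraction but raises TypeError on division.
try:
    end / start
    division_supported = True
except TypeError:
    division_supported = False

print(length_ratio)
print(days_between)
print(division_supported)
```

The year numbers 2020 and 2000 could be divided as plain integers, but the result would depend on the arbitrary choice of year 0 and would not describe any real multiple.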
About Author

Karina Shakya