Intro
Hi, this is Dongyan Zhou. I am currently working at ByteDance, focusing on big data and AI.
When optimizing database queries, one crucial factor is estimating the Number of Distinct Values (NDV) after operations like filtering or sampling. Let’s explore how this works in a practical way. What is NDV? NDV (Number of Distinct Values) counts how many unique values exist in a dataset. For instance, in a table with 1000 customer records, a “country” column might have only 50 distinct values since many customers are from the same countries. ...