Cơ sở dữ liệu - Chapter 1: Introduction

pdf 49 trang vanle 2950
Bạn đang xem 20 trang mẫu của tài liệu "Cơ sở dữ liệu - Chapter 1: Introduction", để tải tài liệu gốc về máy bạn click vào nút DOWNLOAD ở trên

Tài liệu đính kèm:

  • pdfco_so_du_lieu_chapter_1_introduction.pdf

Nội dung text: Cơ sở dữ liệu - Chapter 1: Introduction

  1. Chapter 1: Introduction • Pseudocode • Abstract data type • Algorithm efficiency Cao Hoang Tru 1 CSE Faculty - HCMUT 10 September 2008
  2. Pseudocode • What is an algorithm? Cao Hoang Tru 2 CSE Faculty - HCMUT 10 September 2008
  3. Pseudocode • What is an algorithm? –The logical steps to solve a problem. Cao Hoang Tru 3 CSE Faculty - HCMUT 10 September 2008
  4. Pseudocode • What is a program? – Program = Data structures + Algorithms (Niklaus Wirth) Cao Hoang Tru 4 CSE Faculty - HCMUT 10 September 2008
  5. Pseudocode • The most common tool to define algorithms. • English-like representation of the code required for an algorithm. Cao Hoang Tru 5 CSE Faculty - HCMUT 10 September 2008
  6. Pseudocode • Pseudocode = English + Code relaxed syntax being instructions using easy to read basic control structures (sequential, conditional, iterative) Cao Hoang Tru 6 CSE Faculty - HCMUT 10 September 2008
  7. Pseudocode Algorithm Header Algorithm Body Cao Hoang Tru 7 CSE Faculty - HCMUT 10 September 2008
  8. Pseudocode • Algorithm Header: – Name – Parameters and their types – Purpose • what the algorithm does – Precondition • precursor requirements for the parameters – Postcondition • taken action and status of the parameters – Return condition • returned value Cao Hoang Tru 8 CSE Faculty - HCMUT 10 September 2008
  9. Pseudocode • Algorithm Body: – Statements – Statement numbers • decimal notation to express levels – Variables • important data – Algorithm analysis • comments to explain salient points – Statement constructs • sequence, selection, iteration Cao Hoang Tru 9 CSE Faculty - HCMUT 10 September 2008
  10. Example Algorithm average Pre nothing Post numbers read and their average printed 1 i = 0 2 loop (all data not read) 1 i = i + 1 2 read number 3 sum = sum + number 3 average = sum / i 4 print average 5 return End average Cao Hoang Tru 10 CSE Faculty - HCMUT 10 September 2008
  11. Algorithm Design • Divide-and-conquer • Top-down design • Abstraction of instructions • Step-wise refinement Cao Hoang Tru 11 CSE Faculty - HCMUT 10 September 2008
  12. Abstract Data Type • What is a data type? – Class of data objects that have the same properties Cao Hoang Tru 12 CSE Faculty - HCMUT 10 September 2008
  13. Abstract Data Type • Development of programming concepts: – GOTO programming • control flow is like spaghetti on a plate – Modular programming • programs organized into subprograms – Structured programming • structured control statements (sequence, selection, iteration) – Object-oriented programming • encapsulation of data and operations Cao Hoang Tru 13 CSE Faculty - HCMUT 10 September 2008
  14. Abstract Data Type •ADT = Data structures + Operations Cao Hoang Tru 14 CSE Faculty - HCMUT 10 September 2008
  15. Abstract Data Type Interface User knows what a data type can do. Implementation of data and operations How it is done is hidden. Cao Hoang Tru 15 CSE Faculty - HCMUT 10 September 2008
  16. Abstract Data Type data structure internal data function function A data function B Cao Hoang Tru 16 CSE Faculty - HCMUT 10 September 2008
  17. Example: Variable Access • Rectangle: r – length: x – width: y • Rectangle: r – length: x (hidden) – width: y (hidden) – get_length() – get_width() Cao Hoang Tru 17 CSE Faculty - HCMUT 10 September 2008
  18. Example: List •Interface: – Data: • sequence of components of a particular data type – Operations: • accessing • insertion • deletion • Implementation: – Array, or – Linked list Cao Hoang Tru 18 CSE Faculty - HCMUT 10 September 2008
  19. Algorithm Efficiency •How fast an algorithm is? • How much memory does it cost? • Computational complexity: measure of the difficulty degree (time or space) of an algorithm. Cao Hoang Tru 19 CSE Faculty - HCMUT 10 September 2008
  20. Algorithm Efficiency • General format: f(n) n is the size of a problem (the key number that determines the size of input data) Cao Hoang Tru 20 CSE Faculty - HCMUT 10 September 2008
  21. Linear Loops 1 i = 1 1 i = 1 2 loop (i <= 1000) 2 loop (i <= 1000) 1 application code 1 application code 2 i = i + 1 2 i = i + 2 The number of times the body The number of times the body of the loop is replicated is of the loop is replicated is 1000 500 Cao Hoang Tru 21 CSE Faculty - HCMUT 10 September 2008
  22. Linear Loops time f(n) = n.T f(n) = (n/2).T Cao Hoang Tru n 22 CSE Faculty - HCMUT 10 September 2008
  23. Logarithmic Loops Multiply loops 1 i = 1 2 loop (i <= 1000) 1 application code 2 i = i × 2 The number of times the body of the loop is replicated is log2n Cao Hoang Tru 23 CSE Faculty - HCMUT 10 September 2008
  24. Logarithmic Loops Multiply loops Divide loops 1 i = 1 1 i = 1000 2 loop (i = 1) 1 application code 1 application code 2 i = i × 2 2 i = i / 2 The number of times the body of the loop is replicated is log2n Cao Hoang Tru 24 CSE Faculty - HCMUT 10 September 2008
  25. Logarithmic Loops time f(n) = (log2n).T Cao Hoang Tru n 25 CSE Faculty - HCMUT 10 September 2008
  26. Nested Loops Iterations = Outer loop iterations × Inner loop iterations Cao Hoang Tru 26 CSE Faculty - HCMUT 10 September 2008
  27. Linear Logarithmic Loops 1 i = 1 2 loop (i <= 10) 1 j = 1 2 loop (j <= 10) 1 application code 2 j = j × 2 3 i = i + 1 The number of times the body of the loop is replicated is nlog2n Cao Hoang Tru 27 CSE Faculty - HCMUT 10 September 2008
  28. Linear Logarithmic Loops time f(n) = (nlog2n).T Cao Hoang Tru n 28 CSE Faculty - HCMUT 10 September 2008
  29. Quadratic Loops 1 i = 1 2 loop (i <= 10) 1 j = 1 2 loop (j <= 10) 1 application code 2 j = j + 1 3 i = i + 1 The number of times the body of the loop is replicated is n2 Cao Hoang Tru 29 CSE Faculty - HCMUT 10 September 2008
  30. Dependent Quadratic Loops 1 i = 1 2 loop (i <= 10) 1 j = 1 2 loop (j <= i) 1 application code 2 j = j + 1 3 i = i + 1 The number of times the body of the loop is replicated is 1 + 2 + + n = n(n + 1)/2 Cao Hoang Tru 30 CSE Faculty - HCMUT 10 September 2008
  31. Quadratic Loops time f(n) = n2.T Cao Hoang Tru n 31 CSE Faculty - HCMUT 10 September 2008
  32. Asymptotic Complexity • Algorithm efficiency is considered with only big problem sizes. • We are not concerned with an exact measurement of an algorithm's efficiency. • Terms that do not substantially change the function’s magnitude are eliminated. Cao Hoang Tru 32 CSE Faculty - HCMUT 10 September 2008
  33. Big-O Notation •f(n) = c.n⇒ f(n) = O(n). • f(n) = n(n + 1)/2 = n2/2 + n/2 ⇒ f(n) = O(n2). Cao Hoang Tru 33 CSE Faculty - HCMUT 10 September 2008
  34. Big-O Notation • Set the coefficient of the term to one. • Keep the largest term and discard the others. 2 3 k n log2n n nlog2n n n n 2 n! Cao Hoang Tru 34 CSE Faculty - HCMUT 10 September 2008
  35. Standard Measures of Efficiency Efficiency Big-O Iterations Est. Time logarithmic O(log2n) 14 microseconds linear O(n) 10,000 .1 seconds linear logarithmic O(nlog2n) 140,000 2 seconds quadratic O(n2) 10,0002 15-20 min. polynomial O(nk) 10,000k hours exponential O(2n) 210,000 intractable factorial O(n!) 10,000! intractable Assume instruction speed of 1 microsecond and 10 instructions in loop. n = 10,000 Cao Hoang Tru 35 CSE Faculty - HCMUT 10 September 2008
  36. Standard Measures of Efficiency O(n) 2 n nlog2n n log2n Cao Hoang Tru n 36 CSE Faculty - HCMUT 10 September 2008
  37. Big-O Analysis Examples Algorithm addMatrix (val matrix1 , val matrix2 , val size , ref matrix3 ) Add matrix1 to matrix2 and place results in matrix3 Pre matrix1 and matrix2 have data size is number of columns and rows in matrix Post matrices added - result in matrix3 1 r = 1 2 loop (r <= size) 1 c = 1 2 loop (c <= size) 1 matrix3[r, c] = matrix1[r, c] + matrix2[r, c] 2 c = c + 1 3 r = r + 1 3 return End addMatrix Cao Hoang Tru 37 CSE Faculty - HCMUT 10 September 2008
  38. Big-O Analysis Examples Algorithm addMatrix (val matrix1 , val matrix2 , val size , ref matrix3 ) Add matrix1 to matrix2 and place results in matrix3 Pre matrix1 and matrix2 have data size is number of columns and rows in matrix Post matrices added - result in matrix3 1 r = 1 2 loop (r <= size) 1 c = 1 2 loop (c <= size) 1 matrix3[r, c] = matrix1[r, c] + matrix2[r, c] 2 c = c + 1 3 r = r + 1 2 3 return Nested linear loop: f(size) = O(size ) End addMatrix Cao Hoang Tru 38 CSE Faculty - HCMUT 10 September 2008
  39. Time Costing Operations • The most time consuming: data movement to/from memory/storage. • Operations under consideration: – Comparisons – Arithmetic operations – Assignments Cao Hoang Tru 39 CSE Faculty - HCMUT 10 September 2008
  40. Recurrence Equation • An equation or inequality that describes a function in terms of its value on smaller input. Cao Hoang Tru 40 CSE Faculty - HCMUT 10 September 2008
  41. Recurrence Equation • Example: binary search. a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] 4 7 8 10 14 21 22 36 62 77 81 91 Cao Hoang Tru 41 CSE Faculty - HCMUT 10 September 2008
  42. Recurrence Equation • Example: binary search. a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] 4 7 8 10 14 21 22 36 62 77 81 91 f(n) = 1 + f(n/2) ⇒ f(n) = O(log2n) Cao Hoang Tru 42 CSE Faculty - HCMUT 10 September 2008
  43. Best, Average, Worst Cases • Best case: when the number of steps is smallest. • Worst case: when the number of steps is largest. • Average case: in between. Cao Hoang Tru 43 CSE Faculty - HCMUT 10 September 2008
  44. Best, Average, Worst Cases • Example: sequential search. a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] 4 8 7 10 21 14 22 36 62 91 77 81 Best case: f(n) = O(1) Worst case: f(n) = O(n) Cao Hoang Tru 44 CSE Faculty - HCMUT 10 September 2008
  45. Best, Average, Worst Cases • Example: sequential search. a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] a[12] 4 8 7 10 21 14 22 36 62 91 77 81 Average case: f(n) = ∑i.pi pi: probability for the target being at a[i] pi = 1/n ⇒ f(n) = (∑i)/n = O(n) Cao Hoang Tru 45 CSE Faculty - HCMUT 10 September 2008
  46. P and NP Problems •P: Polynomial (can be solved in polynomial time on a deterministic machine). •NP: Nondeterministic Polynomial (can be solved in polynomial time on a non- deterministic machine). Cao Hoang Tru 46 CSE Faculty - HCMUT 10 September 2008
  47. P and NP Problems Travelling Salesman Problem: A salesman has a list of cities, each of which he must visit exactly once. There are direct roads between each pair of cities on the list. Find the route the salesman should follow for the shortest possible round trip that both starts and finishes at any one of the cities. A 110 B DE 55 15 5 C Cao Hoang Tru 47 CSE Faculty - HCMUT 10 September 2008
  48. P and NP Problems Travelling Salesman Problem: Deterministic machine: f(n) = n(n-1)(n-2) 1 = O(n!) ⇒ NP problem A 110 B DE 55 15 5 C Cao Hoang Tru 48 CSE Faculty - HCMUT 10 September 2008
  49. P and NP Problems • NP-complete: NP and every other problem in NP is polynomially reducible to it. • Open question: P = NP? P NP NP-complete Cao Hoang Tru 49 CSE Faculty - HCMUT 10 September 2008