background
What is multithreading, high concurrency, and distributed
Multithreading: a method to complete tasks by implementing multiple threads concurrently in software or hardware;
Distributed: an optimization method used to solve the performance bottleneck of a single physical server;
High concurrency: a state of system operation, which is used to solve a large number of operation requests in a short time
Why introduce multithreading, high concurrency and distributed
Multithreading: focus on how to use programming language to maximize CPU scheduling capacity;
Distributed: from the point of view of physical resources, different machines form an overall external service, so as to realize a system with high concurrency and high throughput;
High concurrency: describe the capability of the system from a business perspective, that is, allow a large number of users to access services at the same time;
What are the problems with multithreading and high concurrency
Security problem: (race condition) in a multithreaded environment, if multiple threads operate on the same variable, the final value of the variable depends on the alternating execution order of multiple thread operations at runtime, which is not what we want to see.
As shown in the following figure, the results of each run cannot be predicted accurately:
Activity problem: one of the forms is the infinite loop caused by accident, so that the code after the loop cannot be executed; In other forms, if thread A is waiting for thread B to release the resource it holds, and thread B will never release the resource, then A will wait forever (deadlock, hunger, livelock, etc.).
Performance problems include many aspects, such as long service time, insensitive response, low throughput, high resource consumption, or low scalability. Because in multithreaded programs, when the thread scheduler temporarily suspends an active thread and runs another thread, context switching operations will occur frequently, which brings great overhead (saving and restoring the execution context, losing locality, and CPU time will be spent on thread scheduling rather than thread running); When threads share data, synchronization mechanisms must be used. These mechanisms often inhibit some compiler optimizations and invalidate the data in the memory cache.
Fundamentals of Java Concurrent Programming
Thread safety
- What is thread safety:
When multiple threads access a class, no matter what scheduling mode the runtime environment adopts or how these threads will execute alternately, and there is no need for any additional synchronization or cooperation in the calling code, this class can show correct behavior.
The necessary synchronization mechanism is encapsulated in the thread safety class, so the client does not need to take further synchronization measures
Stateless objects must be thread safe.
For example, the following example:
/** * Stateless objects must be thread safe: * Because stateless objects do not contain any fields, nor do they contain any references to fields in other classes, * Therefore, the variables involved in such objects are local variables that exist in the private stack memory of the thread, * That is, the thread accessing the statelessfactor will not affect the calculation results of another thread accessing the same statelessfactor, * That is, the two threads do not share state. */ @ThreadSafe public class StatelessFactorizer implements Servlet { public void service(ServletRequest req, ServletResponse resp) { BigInteger i = extractFromRequest(req); BigInteger[] factors = factor(i); encodeIntoResponse(resp, factors); } public BigInteger extractFromRequest(ServletRequest req) { return new BigInteger(""); } public BigInteger [] factor(BigInteger bigInteger) { return new BigInteger[] {}; } public void encodeIntoResponse(ServletResponse resp, BigInteger [] bigIntegers) {} }
- Atomicity:
Race condition: in concurrent programming, this kind of incorrect result due to improper execution timing is a very important situation.
When adding a state to a stateless class, if the state is completely managed by thread safe objects, the class is still thread safe.
For example, add a counter to a stateless class, which is managed by using the thread safety class AtomicLong, so as to ensure the thread safety of the code. The following is an example:
/** * java.util.concurrent.atomic The package contains some atomic variable classes, which are used to implement values and objects * Atomic state transitions on references. It ensures that all access operations to counter states are atomic. Since the state of the Servlet is the state of the counter, * And the counter is thread safe, so the Servlet is also thread safe */ @ThreadSafe public class CountingFactorizer implements Servlet { private final AtomicLong count = new AtomicLong(0); public long getCount() { return count.get(); } public void service(ServletRequest req, ServletResponse resp) { BigInteger i = extractFromRequest(req); BigInteger[] factors = factor(i); count.incrementAndGet(); encodeIntoResponse(resp, factors); } }
However, when the number of state variables changes from one to multiple, it is not as simple as changing the number of state variables from zero to one.
- Locking mechanism:
When multiple variables are involved in the invariance condition, the variables are not independent of each other, but the value of one variable will restrict the value of other variables. Therefore, when updating a variable, you need to update other variables at the same time in the same atomic operation.
For example, the following example is not thread safe:
/** * One of the invariance conditions of this class is that the product of the factors cached in lastFactors should be equal to the value cached in lastNumber. * However, in the process of obtaining these two values by thread A, thread B may modify them, so thread A will also find that the invariance condition is broken */ public class UnsafeCachingFacorizer implements Servlet { private final AtomicReference<BigInteger> lastNumber = new AtomicReference<>(); private final AtomicReference<BigInteger []> lastFactors = new AtomicReference<>(); public void service(ServletRequest req, ServletResponse resp) { BigInteger i = extractFromRequest(req); if (i.equals(lastNumber.get())) encodeIntoResponse(resp, lastFactors.get()); else { BigInteger[] factors = factor(i); lastNumber.set(i); lastFactors.set(factors); encodeIntoResponse(resp, factors); } } }
Built in lock
As the above multiple states are related to each other, it is necessary to perform atomic operations on these related variables.
Java provides a built-in locking mechanism to support atomicity: synchronized blocks. The synchronization code block consists of two parts: an object reference as a lock and a code block protected by the lock. The method decorated with the keyword synchronized is a synchronized code block across the whole method body, in which the lock of the synchronized code block is the object where the method call is located. The static synchronized method takes the Class object as the lock.
synchronized (lock) { // Access or modify the shared state protected by the lock }
Every Java object can have synchronous locks, which are called built-in locks. Java's built-in lock is equivalent to a mutually exclusive lock, which means that at most one thread can hold this lock. However, there are often performance problems, and solutions will be proposed later.
Reentry
"Reentry" means that the granularity of the operation to obtain the lock is "thread", not "call".
Reentry further improves the encapsulation of locking behavior, so it simplifies the development of object-oriented concurrent code.
The following is an example to illustrate the function of reentry lock:
If the built-in lock is not reentrant, this code will deadlock
public class Widget { public synchronized void doSomething() { // ... } } public class LoggingWidget extends Widget { public synchronized void doSomething() { System.out.println(toString() + ": calling doSomething."); super.doSomething(); } }
- Activity and performance
Generally, there are mutual constraints between simplicity and performance. When implementing a synchronization strategy, you must not blindly sacrifice simplicity for performance (which may undermine security)
When performing calculations that take a long time or operations that may not be completed quickly (for example, network I/O or console I/O), be sure not to hold the lock.
Object sharing
- visibility
Memory visibility: when one thread modifies the object state, other threads can see the state changes.
Without synchronization, the compiler, processor and runtime may make some unexpected adjustments to the execution order of operations. In multithreaded programs that lack sufficient synchronization, it is almost impossible to draw a correct conclusion to judge the execution order of memory operations.
For memory visibility, you need to explain the memory model of the JVM, as shown in the following figure:
As can be seen from the above figure, in Java multithreading, each thread has its own working memory, and the assignment and value of variables are directly associated with the working memory. How to execute these operations is defined by the memory model. Through the access rules of these defined variables, the problems caused by multithreading's race conditions for stateful objects can be solved.
For example, the following is a problem caused by multithreading visibility:
/** * Invisibility of variables: * In the code, both the main thread and the read thread will access the shared variables ready and number. * The main thread starts the read thread, then sets number to 42 and ready to true. * The reading thread loops until it finds that the value of ready becomes true, and then outputs the value of number. * Although it looks like it will output 42, in fact, it is likely to output 0, or it cannot be terminated at all */ public class NoVisibility { private static boolean ready; private static int number; private static class ReaderThread extends Thread { public void run() { while (!ready) Thread.yield(); System.out.println(number); } } public static void main(String[] args) throws InterruptedException { new ReaderThread().start(); ready = true; number = 42; } }
The following is an example to illustrate the problem of invalid data:
/** * Non thread safe variable integer class */ public class MutableInteger { private int value; public int get() { return value; } public void set(int value) { this.value = value; } } /** * Thread safe variable integer class */ public class SynchronizedInteger { private int value; public synchronized int get() {return value;} public synchronized void set(int value) {this.value = value;} }
Locking and visibility
The meaning of locking is not limited to mutually exclusive behavior, but also includes memory visibility. To ensure that all threads can see the latest value of the shared variable, all threads performing read or write operations must synchronize on the same lock.
From the perspective of memory visibility, writing volatile variables is equivalent to exiting the synchronous code block, while reading volatile variables is equivalent to entering the synchronous code block. However, the locking operation will not be performed when accessing the volatile variable, so the execution thread will not be blocked. That is, the volatile variable is a lighter synchronization mechanism than the synchronized keyword.
The following is a typical use of volatile variables: used to check a status flag to determine whether to exit the loop.
volatile boolean asleep; // ... while(!asleep) countSomeSheep();
- Release and escape
"Publishing" an object means enabling the object to be used in code outside the current scope.
Escape: when an object that should not be published is published, this situation is called escape.
Do not let this reference escape during construction—— Secure object construction process
The following is an example of using factory methods to prevent this reference from escaping during construction:
/** * Chapter 3: object sharing * Construction process of secure object - > do not use this reference to escape during construction * If you want to register an event listener or start a thread in the constructor, you can use a private constructor and * A common factory method to avoid incorrect construction process */ public class SafeListener { private final EventListener listener; private SafeListener() { listener = new EventListener() { public void onEvent(Event e) { doSomething(e); } }; } public static SafeListener newInstance(EventSource source) { SafeListener safe = new SafeListener(); source.registerListener(safe.listener); return safe; } }
- Thread closure
Stack closure
Because in Java semantics, local variables are thread private. Therefore, stack closure is also called thread internal use or thread local use.
For local variables of basic types, stack closure will not be destroyed in any case.
When maintaining the stack closure of object references, programmers need to do more work to ensure that the referenced objects do not escape.
ThreadLocal class
A more standardized method to maintain thread closure is to use ThreadLocal, which can associate a value in the thread with the object that saves the value. ThreadLocal provides access interfaces or methods such as get and set. These methods store an independent copy for each thread that uses the variable. Therefore, get always returns the latest value set by the current executing thread when calling set.
/** * ThreadLocal is heavily used when implementing the application framework. For example, during an EJB call, * J2EE The container needs to associate a transaction context with an executing thread. * By saving the transaction context in a static ThreadLocal object, this function can be easily realized: when the framework * When the code needs to determine which transaction is currently running, it only needs to read the transaction context from the ThreadLocal object. */
- Invariance
An object is called immutable if its state cannot be modified after it is created.
Referring to the stateless object mentioned earlier, it must be thread safe because it has no competing state
Immutable objects must be thread safe.
The object is immutable when the following conditions are met:
- After an object is created, its status data cannot be modified;
- All fields of the object are of type final;
- The object was created correctly (this reference did not escape during object creation).
Final domain
Fields of final type cannot be modified (but if the objects referenced by the final field are variable, these referenced objects can be modified).
In the Java memory model, the final domain also has special semantics. The final domain ensures the security of the initialization process, so that immutable objects can be accessed without restrictions, and there is no need to synchronize when sharing these objects.
About final keyword:
- Modified class: the modified class cannot be inherited (unless this class will not be used for inheritance in the future or for security reasons, try not to design the class as final class)
- Modified method: the modified method will not be overridden (the private method of the class will be implicitly specified as the final method)
- Modifier variable:
- If the modification is a variable of basic data type, its value cannot be changed after initialization;
- If the modification is a reference variable, it can no longer point to another object after its initialization
Example: use volatile type to publish immutable objects
The following is the optimization of the previous factor decomposition, which provides a weak form of atomicity through immutable objects.
/** * Cache values and their factorization results. For immutable container classes: * Whenever you need to perform an operation atomically on a set of related data, you can consider creating an immutable class to contain the data */ public class OneValueCache { private final BigInteger lastNumber; private final BigInteger[] lastFactors; public OneValueCache(BigInteger i, BigInteger[] factors) { lastNumber = i; lastFactors = Arrays.copyOf(factors, factors.length); } public BigInteger[] getFactors(BigInteger i) { if (lastNumber == null || !lastNumber.equals(i)) return null; else return Arrays.copyOf(lastFactors, lastFactors.length); } }
/** * VolatileCachedFactorizer OneValueCache is used to save cached values and their factors. * When a thread sets a volatile type cache to reference a new OneValueCache, other threads will immediately see the cached data */ public class VolatileCachedFactorizer { private volatile OneValueCache cache = new OneValueCache(null, null); public void service(ServletRequest req, ServletResponse resp) { BigInteger i = extractFromRequest(req); BigInteger[] factors = cache.getFactors(i); if (factors == null) { factors = factor(i); cache = new OneValueCache(i, factors); } encodeIntoResponse(resp, factors); } }
- Security release
Common modes of secure Publishing
- In static initialization function
Initialize an object reference;
- Save the reference of the object to the volatile domain or AtomicRefrance object;
- Save the reference of the object to the final type field of a correctly constructed object;
- Save the reference of the object to a domain protected by a lock
Synchronization within the thread safety container means that when objects are put into a container, the last requirement mentioned above will be met, that is, the container in the thread safety library provides the following security release guarantee:
- By putting a key or value into a HashTable, synchronized map, or ConcurrentMap, you can safely publish it to any thread accessing it from these containers (whether directly or through an iterator);
- By putting an element into Vector, CopyOnWriteArrayList, CopyOnWriteArraySet, synchronizedList or synchronizedSet, the element can be safely published to any thread accessing the element from these containers;
- By putting an element into BlockingQueue or ConcurrentLinkedQueue, the element can be safely published to any thread accessing the element from these queues;
Fact immutable object, mutable object, safely shared object
The publishing requirements of an object depend on its variability:
- Immutable objects can be published through any mechanism;
- Fact immutable objects must be published in a safe way;
- Mutable objects must be published in a safe way, and must be thread safe or protected by a lock;
Thread closure: the object is enclosed in this thread and can only be modified by this thread;
Read only share:
Thread safe sharing: thread safe objects are synchronized internally, so multiple threads can be accessed through the public interface of the object without further synchronization;
Protected object: protected objects can only
Access by holding a specific lock.
Combination of objects
Every memory access is analyzed to ensure thread safety, which is time-consuming and labor-consuming. Therefore, some existing thread safety components are combined into larger components or programs.
-
Designing thread safe classes
-
Instance closure
Thread closure: ensure that objects can only be accessed in a single thread, or protect all access to objects through a lock.
Instance closure: when an object is encapsulated into another object, all access methods of the encapsulated object are known.
Encapsulating the data inside the object can limit the access of the data to the methods of the object, which makes it easier to ensure that the thread always holds the correct lock when accessing the data.
————Java Concurrent Programming Practice 3
For example, the following is an example of thread safety through the closure mechanism:
/** * PersonSet The pair state is managed by the HashSet, but the HashSet is not a thread safe pair. * Since mySet is a private pair and will not escape, HashSet is enclosed in PersonSet, * The only way to access mySet is addPerson and containsPerson * They all need to obtain the lock on the PersonSet. The state of the PersonSet is completely protected by its built-in lock, * Therefore, PersonSet is a thread safe class. */ public class PersonSet { private final Set<Person> mySet = new HashSet<>(); public synchronized void addPerson(Person person) { mySet.add(person); } public synchronized boolean containsPerson(Person person) { return mySet.contains(person); } }
In the Java class library, some basic container classes are not thread safe, such as ArrayList and HashMap, but the class library provides decorator factory methods (such as Collections.synchronizedList and similar methods), so that these non thread safe classes can be used safely in a multithreaded environment. These factory methods encapsulate the container class in a synchronous decorator object through decorator mode, and the decorator object has a unique reference to the underlying container object (that is, enclose the underlying container object in the decorator), then it is thread safe.
————Java Concurrent Programming Practice
-
Delegation of thread safety???
-
Add functionality to existing thread safe classes
Client locking mechanism
/** * Non thread safe "add if not" (do not do this): * The problem is that the program synchronizes on the wrong lock. No matter which lock the List uses to protect its state, * To be sure, this lock is not a lock on the ListHelper, which means that putIfAbsent is relative to * List Is not atomic for other operations, so it is impossible to ensure that putIfAbsent is executed by another thread * The linked list will not be modified. * * @param <E> */ public class ListHelper<E> { public List<E> list = Collections.synchronizedList(new ArrayList<>()); //... public synchronized boolean putIfAbsent(E x) { boolean absent = !list.contains(x); if (absent) list.add(x); return absent; } } /** * For putIfAbsent() to execute correctly, the List must use the same lock when implementing client-side locking or external locking. * Client locking means that for the client code using an object x, the lock used by X itself to protect its state is used to protect the client * code. To use client-side locking, you must instruct object X which lock to use. * * @param <E> */ public class ListHelperBetter<E> { public List<E> list = Collections.synchronizedList(new ArrayList<>()); public boolean putIfAbsent(E t) { synchronized (list) { boolean absent = !list.contains(t); if (absent) list.add(t); return absent; } } }
Basic synchronization method
- Synchronization container class
Synchronization container classes include Vector and Hashtable. These synchronization wrappers are composed of collections Created by factory methods such as synchronizedxxx. The implementation method is to encapsulate their state and synchronize each public method, so that only one thread can access the state of the container at a time.
Synchronization container class problem
Iterators and ConcurrentModificationException
The iteration of synchronization container class (or for each loop) is carried out in the way of Iterator.
The mechanism of the iterator when dealing with concurrent modification: associate the change of the counter with the container. If the counter is modified during the iterator, hasNext or next will throw a ConcurrentModificationException.
The following is an example of the iteration of the List container with the for each loop syntax:
// Iterate List through Iterator List<Widget> widgetList = Collections.synchronizedList(new ArrayList<Widget>()); // ... // Concurrent modificationexception may be thrown for (Widget w : widgetList) doSomething(w);
Hide iterators
Similar to the following:
/** * addTenthings The method may throw a ConcurrentModificationException because during the generation of debugging information, * toString Iterate over the container */ public class HiddenIterator { private final Set<Integer> set = new HashSet<>(); public synchronized void add(Integer i) { set.add(i); } public synchronized void remove(Integer i) { set.remove(i); } public void addTenThings() { Random random = new Random(); for (int i = 0; i < 10; i++) add(random.nextInt()); System.out.println("DEBUG: added ten elements to " + set); } public static void main(String[] args) { HiddenIterator hiddenIterator = new HiddenIterator(); hiddenIterator.addTenThings(); } }
- Concurrent container
Improve the performance of synchronization containers through concurrent container classes.
- Because synchronous containers serialize all access to container state pairs to achieve their thread safety. The cost of this method is to seriously reduce concurrency. When multiple threads compete for container locks, the throughput will be seriously reduced.
- Concurrent container is designed for concurrent access of multiple threads
ConcurrentHashMap
The synchronization container class holds a lock during each operation;
ConcurrentHashMap uses a finer granularity locking mechanism to achieve greater sharing, that is, segmented locking. At this time, any number of reading threads, threads performing read operations and threads performing write operations can access the Map concurrently
- Synchronization tool class
The synchronization tool class can be any object. As long as it coordinates the batch execution of control threads according to its own state, blocking queues and other types (such as semaphores, fences and locks) can be used as synchronization tool classes.
atresia
Locking: it is a tool class that restricts multiple threads to start at the same time when they reach the target state
Function: it is equivalent to a door. No thread will start before the locking reaches the target state. When the target state is reached, the locking limit will be opened and all threads will start at the same time.
Example: CountDownLatch is a flexible locking implementation. The countDown method decrements the counter, indicating that an event has occurred, and the await method waits for the counter to reach zero, indicating that all events that need to wait have occurred. Before that, await will block until the counter is zero, or the waiting thread is interrupted or timed out.
/** * TestHarness Create a certain number of threads and use them to execute the specified tasks concurrently. It uses two latches, representing * "Start gate "and" end gate ". The initial value of the start gate counter is 1, and the initial value of the end gate counter is the number of worker threads. Each * The last thing the worker thread needs to do first is to subtract 1 from the countDown method that calls the end gate */ public class TestHarness { public static long timeTasks(int nThreads, final Runnable task) throws InterruptedException { final CountDownLatch startGate = new CountDownLatch(1); final CountDownLatch endGate = new CountDownLatch(nThreads); for (int i = 0; i < nThreads; i++) { Thread t = new Thread() { public void run() { try { startGate.await(); try { task.run(); } finally { endGate.countDown(); } } catch (InterruptedException e) { e.printStackTrace(); } } }; t.start(); } long start = System.nanoTime(); startGate.countDown(); endGate.await(); long end = System.nanoTime(); return end - start; } }
FutureTask
The calculation represented by FutureTask is realized through Callable, which is equivalent to a Runnable that can generate results, and can be in the following three states: waiting for operation, running and running completion.
FutureTask represents asynchronous tasks in the Executor framework. In addition, it can also be used to represent some long-time calculations.
Semaphore
The count semaphore is used to control the number of operations that access a specific resource at the same time or perform a specified operation at the same time.
Semaphore can be used to implement resource pools, such as database connection pools. When the semaphore is 0, all requests for resources will be blocked until the semaphore increases.
The following is an example of using Semaphore to set boundaries for containers:
/** * Semaphore is used to turn any kind of container into a bounded blocking container, and the count value of semaphore is initialized to the maximum value of container capacity. * add Before adding an element to the underlying container, the operation first needs to obtain a semaphore. If no element is added, a semaphore will be released immediately; * remove Operation releases a semaphore so that more elements can be added to the container. * @param <T> */ public class BoundedHashSet<T> { private final Set<T> set; private final Semaphore sem; public BoundedHashSet(int bound) { this.set = Collections.synchronizedSet(new HashSet<>()); sem = new Semaphore(bound); } public boolean add(T o) throws InterruptedException { sem.acquire(); boolean wasAdded = false; try { wasAdded = set.add(o); return wasAdded; } finally { if (!wasAdded) sem.release(); } } public boolean remove(Object o) { boolean wasRemoved = set.remove(o); if (wasRemoved) sem.release(); return wasRemoved; } }
fence
A fence is similar to a lock. It can block a group of threads until something happens, or wait for a group of related operations to end. Locking is a one-time object. Once it enters the termination state, it cannot be reset.
The key difference between fence and locking: all threads must reach the fence position at the same time before they can continue to execute; Locks are used to wait for events, while fences are used to wait for other threads.
When to use the fence:
- It is unrealistic to simulate cells through the automatic calculation of the fence and assign an independent thread to each element (cell). The reasonable approach is to decompose the problem into a certain number of sub problems, assign a thread to each sub problem for solution, and then combine all the results.
appendix
- Reference articles
[1]. Java multithreaded concurrent program practice
[2]. What are distributed, multithreaded and highly concurrent